2021-03-24
11:20 <arturo> created 80G cinder volume tools-docker-registry-data (T278303) [tools]
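(For reference, an 80G Cinder volume like this is created and attached with the standard OpenStack CLI roughly as below; the attach step and description text are assumptions, not the exact commands used:)
    openstack volume create --size 80 --description "Toolforge Docker registry data (T278303)" tools-docker-registry-data
    # attach to the registry VM (assumed target, based on the entry above)
    openstack server add volume tools-docker-registry-04 tools-docker-registry-data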
11:10 <arturo> starting VM tools-docker-registry-04, which had probably been stopped since 2021-03-09 due to hypervisor draining [tools]
2021-03-23
12:46 <arturo> aborrero@tools-sgegrid-master:~$ sudo systemctl restart gridengine-master.service [tools]
12:15 <arturo> delete & re-create VM tools-sgegrid-shadow as Debian Buster (T277653) [tools]
12:14 <arturo> created puppet prefix 'tools-sgegrid-shadow' and migrated puppet configuration from VM-puppet [tools]
12:13 <arturo> created server group 'tools-grid-master-shadow' with anti-affinity policy [tools]
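(Creating a server group with that policy is a one-liner in the OpenStack CLI; a sketch, not necessarily the exact invocation used:)
    openstack server group create --policy anti-affinity tools-grid-master-shadow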
2021-03-18
19:24 <bstorm> set profile::toolforge::infrastructure across the entire project with login_server set on the bastion and exec node-related prefixes [tools]
16:21 <andrewbogott> enabling puppet tools-wide [tools]
16:20 <andrewbogott> disabling puppet tools-wide to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/672456 [tools]
16:19 <bstorm> added profile::toolforge::infrastructure class to puppetmaster T277756 [tools]
04:12 <bstorm> rebooted tools-sgeexec-0935.tools.eqiad.wmflabs because it forgot how to LDAP...likely root cause of the issues tonight [tools]
03:59 <bstorm> rebooting grid master. sorry for the cron spam [tools]
03:49 <bstorm> restarting sssd on tools-sgegrid-master [tools]
03:37 <bstorm> deleted a massive number of stuck jobs that misfired from the cron server [tools]
03:35 <bstorm> rebooting tools-sgecron-01 to try to clear up the ldap-related errors coming out of it [tools]
01:46 <bstorm> killed the toolschecker cron job, which had an LDAP error, and ran it again by hand [tools]
2021-03-17
20:57 <bstorm> deployed changes to rbac for kubernetes to add kubectl top access for tools [tools]
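(In practice this lets tool accounts query the metrics API that metrics-server exposes, e.g.:)
    kubectl top pods    # per-pod CPU/memory usage in the tool's namespace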
20:26 <andrewbogott> moving tools-elastic-3 to cloudvirt1034; two elastic nodes shouldn't be on the same hv [tools]
2021-03-16
16:31 <arturo> installing jobutils and misctools 1.41 [tools]
15:55 <bstorm> deleted a bunch of messed up grid jobs (9989481,8813,81682,86317,122602,122623,583621,606945,606999) [tools]
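(Grid jobs in that state are normally removed on the grid master with qdel; a sketch using the first job ID above:)
    sudo qdel 9989481
    qstat -u '*' | grep -E 'Eqw|dr'    # check for any remaining jobs stuck in error/deleting states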
12:32 <arturo> add packages jobutils / misctools v1.41 to {stretch,buster}-tools aptly repository in tools-sge-services-03 [tools]
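(Adding packages to an aptly repository and republishing looks roughly like this; the .deb filenames and the published distribution name are assumptions:)
    aptly repo add buster-tools jobutils_1.41_all.deb misctools_1.41_all.deb
    aptly publish update buster-tools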
2021-03-12
23:13 <bstorm> cleared error state for all grid queues [tools]
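(Queue error states are cleared from the grid master; a sketch:)
    qstat -f -explain E    # list queue instances currently in error state
    sudo qmod -c '*'       # clear the error state on all queues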
2021-03-11
17:40 <bstorm> deployed metrics-server:0.4.1 to kubernetes [tools]
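(metrics-server is usually deployed from its upstream release manifest; a sketch assuming the stock components.yaml rather than whatever locally maintained manifest Toolforge uses:)
    kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.4.1/components.yaml
    kubectl -n kube-system rollout status deployment/metrics-server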
16:21 <bstorm> add jobutils 1.40 and misctools 1.40 to stretch-tools [tools]
13:11 <arturo> add misctools 1.37 to buster-tools|toolsbeta aptly repo for T275865 [tools]
13:10 <arturo> add jobutils 1.40 to buster-tools aptly repo for T275865 [tools]
2021-03-10
10:56 <arturo> briefly stopped VM tools-k8s-etcd-7 to disable VMX cpu flag [tools]
2021-03-09
13:31 <arturo> hard-reboot tools-docker-registry-04 because of issues related to T276922 [tools]
12:34 <arturo> briefly rebooting VM tools-docker-registry-04; we need to reboot the hypervisor cloudvirt1038 and the VM failed to migrate away [tools]
2021-03-05
12:30 <arturo> started tools-redis-1004 again [tools]
12:22 <arturo> stop tools-redis-1004 to ease draining of cloudvirt1035 [tools]
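(Stopping a VM ahead of a hypervisor drain and starting it again afterwards is a plain server stop/start; sketch:)
    openstack server stop tools-redis-1004
    # ... drain cloudvirt1035 ...
    openstack server start tools-redis-1004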
2021-03-04
11:25 <arturo> rebooted tools-sgewebgrid-generic-0901, repooled it [tools]
09:57 <arturo> depool tools-sgewebgrid-generic-0901 to reboot VM. It was stuck in MIGRATING state when draining cloudvirt1022 [tools]
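(One way to depool a grid node is to disable its queue instances so no new jobs are scheduled there, then re-enable them after the reboot; a sketch assuming plain qmod rather than any wrapper script:)
    sudo qmod -d '*@tools-sgewebgrid-generic-0901'    # disable all queues on the node
    sudo qmod -e '*@tools-sgewebgrid-generic-0901'    # re-enable once it is healthy again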
2021-03-03
15:17 <arturo> shutting down tools-sgebastion-07 in an attempt to fix nova state and finish hypervisor migration [tools]
15:11 <arturo> tools-sgebastion-07 triggered a neutron exception (unauthorized) while being live-migrated from cloudvirt1021 to 1029. Resetting nova state with `nova reset-state bd685d48-1011-404e-a755-372f6022f345 --active` and trying again [tools]
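(After the reset-state, retrying the live migration with the nova CLI would look roughly like this; the target host is taken from the entry above:)
    nova reset-state --active bd685d48-1011-404e-a755-372f6022f345
    nova live-migration bd685d48-1011-404e-a755-372f6022f345 cloudvirt1029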
14:48 <arturo> killed pywikibot instance running in tools-sgebastion-07 by user msyn [tools]
2021-03-02
15:23 <bstorm> depooling tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs for reboot. It isn't communicating right [tools]
15:22 <bstorm> cleared queue error states...will need to keep a better eye on what's causing those [tools]
2021-02-27
02:23 <bstorm> deployed typo fix to maintain-kubeusers in an innocent effort to make the weekend better T275910 [tools]
02:00 <bstorm> running a script to repair the dumps mount in all podpresets T275371 [tools]
2021-02-26
22:04 <bstorm> cleaned up grid jobs 1230666,1908277,1908299,2441500,2441513 [tools]
21:27 <bstorm> hard rebooting tools-sgeexec-0947 [tools]
21:21 <bstorm> hard rebooting tools-sgeexec-0952.tools.eqiad.wmflabs [tools]
20:01 <bd808> Deleted csr in strange state for tool-ores-inspect [tools]
2021-02-24
18:30 <bd808> `sudo wmcs-openstack role remove --user zfilipin --project tools user` T267313 [tools]
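(The removal can be double-checked by listing the user's remaining role assignments; this assumes the wmcs-openstack wrapper passes arguments through to the standard client:)
    sudo wmcs-openstack role assignment list --user zfilipin --project tools --names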
01:04 <bstorm> hard rebooting tools-k8s-worker-76 because it's in a sorry state [tools]
2021-02-23
23:11 <bstorm> draining a bunch of k8s workers to clean up after dumps changes T272397 [tools]
23:06 <bstorm> draining tools-k8s-worker-55 to clean up after dumps changes T272397 [tools]
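(Draining a worker and pooling it back later is standard kubectl; a sketch with the flag names of that Kubernetes era:)
    kubectl drain tools-k8s-worker-55 --ignore-daemonsets --delete-local-data
    kubectl uncordon tools-k8s-worker-55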
2021-02-22
20:40 <bstorm> repooled tools-sgeexec-0918.tools.eqiad.wmflabs [tools]
19:09 <bstorm> hard rebooted tools-sgeexec-0918 from openstack T275411 [tools]
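(A hard reboot "from openstack" is the API-level reset, roughly:)
    openstack server reboot --hard tools-sgeexec-0918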