2020-09-29
14:47 <arturo> rebooting cloudvirt1012, chasing config weirdness in the linuxbridge agent [admin]
14:05 <andrewbogott> reimaging 1014 over and over in an attempt to get partman right [admin]
13:51 <arturo> rebooting cloudvirt1012 [admin]
2020-09-28
14:55 <arturo> [jbond42] upgraded facter to v3 across the VM fleet [admin]
13:54 <andrewbogott> moving cloudvirt1035 from aggregate 'spare' to 'ceph'. We're going to need all the capacity we can get while converting older cloudvirts to ceph [admin]
2020-09-24
15:47 <arturo> stopping/restarting rabbitmq-server in all cloudcontrol servers [admin]
15:45 <arturo> restarting rabbitmq-server in cloudcontrol1003 [admin]
15:15 <arturo> restarting floating_ip_ptr_records_updater.service in all 3 cloudcontrol servers to reset state after a DNS failure [admin]
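The fleet-wide service restarts above follow a common pattern: run the same systemctl command over ssh on each cloudcontrol host. A minimal dry-run sketch (it prints the command instead of executing it; only cloudcontrol1003 appears in the log, the other hostnames are assumptions):

```shell
# Hypothetical dry run of a fleet-wide unit restart: echo the per-host
# command rather than executing it. Hostnames beyond cloudcontrol1003
# are assumed, not taken from the log.
restart_updater() {
  echo "ssh $1 sudo systemctl restart floating_ip_ptr_records_updater.service"
}

for host in cloudcontrol1003 cloudcontrol1004 cloudcontrol1005; do
  restart_updater "$host"
done
```

In practice WMCS runs this kind of fan-out through tooling such as cumin rather than a hand-written loop; the sketch only shows the shape of the operation.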
2020-09-18
10:16 <arturo> cloudvirt1039 libvirtd service issues were fixed with a reboot [admin]
09:56 <arturo> rebooting cloudvirt1039 (spare) to try to fix some weird libvirtd failure [admin]
09:50 <arturo> enabling puppet in cloudvirts and effectively merging patches from T262979 [admin]
08:59 <arturo> disable puppet in all buster cloudvirts (cloudvirt[1024,1031-1039].eqiad.wmnet) to merge a patch for T263205 and T262979 [admin]
08:50 <arturo> installing iptables from buster-bpo in cloudvirt1036 (T263205 and T262979) [admin]
2020-09-15
20:32 <andrewbogott> rebooting cloudvirt1038 to see if it resolves T262979 [admin]
13:58 <andrewbogott> draining cloudvirt1002 with wmcs-ceph-migrate [admin]
2020-09-14
14:21 <andrewbogott> draining cloudvirt1001, migrating all VMs with wmcs-ceph-migrate [admin]
10:41 <arturo> [codfw1dev] trying to get the bonding working for labtestvirt2003 (T261724) [admin]
09:47 <arturo> installed qemu security update in eqiad1 cloudvirts (T262386) [admin]
09:43 <arturo> [codfw1dev] installed qemu security update in codfw1dev cloudvirts (T262386) [admin]
2020-09-09
18:13 <andrewbogott> restarting ceph-mon@cloudcephmon1003 in hopes that the slow ops reported are phantoms [admin]
18:01 <andrewbogott> restarting ceph-mgr@cloudcephmon1003 in hopes that the slow ops reported are phantoms (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/EOWNO3MDYRUZKAK6RMQBQ5WBPQNLHOPV/) [admin]
17:40 <andrewbogott> giving ceph pg autoscale another chance: ceph osd pool set eqiad1-compute pg_autoscale_mode on [admin]
00:05 <bd808> Running wmcs-novastats-dnsleaks (T262359) [admin]
2020-09-08
21:48 <bd808> Renamed FQDN prefixes to wikimedia.cloud scheme in cloudinfra-db01's labspuppet db (T260614) [admin]
14:29 <andrewbogott> restarting nova-compute on all cloudvirts (everyone is upset from the reset switch failure) [admin]
14:18 <arturo> restarting nova-fullstack service in cloudcontrol1003 [admin]
14:17 <andrewbogott> stopping apache2 on labweb1001 to make sure the Horizon outage is total [admin]
2020-09-03
09:31 <arturo> icinga downtime cloud* servers for 30 mins (T261866) [admin]
2020-09-02
08:46 <arturo> [codfw1dev] reimaging spare server labtestvirt2003 as debian buster (T261724) [admin]
2020-09-01
18:18 <andrewbogott> adding drives on cloudcephosd100[3-5] to ceph osd pool [admin]
13:40 <andrewbogott> adding drives on cloudcephosd101[0-2] to ceph osd pool [admin]
13:34 <andrewbogott> adding drives on cloudcephosd100[1-3] to ceph osd pool [admin]
11:27 <arturo> [codfw1dev] rebooting again cloudnet2002-dev after some network tests, to reset initial state (T261724) [admin]
11:09 <arturo> [codfw1dev] rebooting cloudnet2002-dev after some network tests, to reset initial state (T261724) [admin]
10:49 <arturo> disable puppet in cloudnet servers to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/623569/ [admin]
2020-08-31
23:26 <bd808> Removed stale lockfile at cloud-puppetmaster-03.cloudinfra.eqiad.wmflabs:/var/lib/puppet/volatile/GeoIP/.geoipupdate.lock [admin]
11:20 <arturo> [codfw1dev] livehacking https://gerrit.wikimedia.org/r/c/operations/puppet/+/615161 in the puppetmasters for tests before merging [admin]
2020-08-28
20:12 <bd808> Running `wmcs-novastats-dnsleaks --delete` from cloudcontrol1003 [admin]
2020-08-26
17:12 <bstorm> Running 'ionice -c 3 nice -19 find /srv/tools -type f -size +100M -printf "%k KB %p\n" > tools_large_files_20200826.txt' on labstore1004 T261336 [admin]
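The scan above combines `ionice -c 3` (idle I/O class) and `nice -19` so the walk does not starve NFS clients, with GNU find's `-size` filter and `-printf '%k KB %p\n'` to report disk usage in KiB. A runnable re-creation against a scratch directory (file names, sizes, and the 1 MiB threshold are illustrative; the real command used `+100M` on `/srv/tools`):

```shell
# Hedged demo of the large-file scan in a temporary directory; ionice/nice
# are dropped because they only matter on a busy production filesystem.
dir=$(mktemp -d)
dd if=/dev/zero of="$dir/big.bin"   bs=1024 count=2048 2>/dev/null  # 2 MiB
dd if=/dev/zero of="$dir/small.bin" bs=1024 count=10   2>/dev/null  # 10 KiB

# Same shape as the logged command: files over the size threshold,
# printed as "<KiB used> KB <path>".
large=$(find "$dir" -type f -size +1M -printf '%k KB %p\n')
echo "$large"
```

Only `big.bin` crosses the `+1M` threshold, so a single line is printed; redirecting that output to a file, as in the log entry, gives a sortable inventory of large files.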
2020-08-21
21:34 <andrewbogott> restarting nova-compute on cloudvirt1033; it seems stuck [admin]
2020-08-19
14:21 <andrewbogott> rebooting cloudweb2001-dev, labweb1001, labweb1002 to address mediawiki-induced memleak [admin]
2020-08-06
21:02 <andrewbogott> removing cloudvirt1004/1006 from nova's list of hypervisors; rebuilding them to use as backup test hosts [admin]
20:06 <bstorm> manually stopped the RAID check on cloudcontrol1003 T259760 [admin]
2020-08-04
18:54 <bstorm> restarting mariadb on cloudcontrol1004 to setup parallel replication [admin]
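Parallel replication in MariaDB is controlled by a handful of server variables, and changing the worker-thread count requires restarting the replica threads, which matches the restart logged above. A minimal config sketch (the values are illustrative assumptions, not the settings deployed on cloudcontrol1004):

```ini
# Hypothetical my.cnf fragment for MariaDB parallel replication
[mysqld]
# number of parallel applier worker threads (0 disables parallel replication)
slave_parallel_threads = 4
# one of: optimistic | conservative | aggressive | minimal | none
slave_parallel_mode = optimistic
```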
2020-08-03
17:02 <bstorm> increased db connection limit to 800 across galera cluster because we were clearly hovering at limit [admin]
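Raising the connection cap on a Galera cluster is a per-node change: it can be applied live with `SET GLOBAL max_connections = 800;` on each node, and persisted in config so it survives a restart. A sketch of the persistent half, assuming the standard MariaDB variable (800 is the value from the log; everything else is illustrative):

```ini
# Hypothetical fragment for each Galera node's server config
[mysqld]
max_connections = 800
```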
2020-07-31
19:28 <bd808> wmcs-novastats-dnsleaks --delete (lots of leaked fullstack-monitoring records to clean up) [admin]
2020-07-27
22:17 <andrewbogott> ceph osd pool set compute pg_num 2048 [admin]
22:14 <andrewbogott> ceph osd pool set compute pg_autoscale_mode off [admin]
2020-07-24
19:15 <andrewbogott> ceph mgr module enable pg_autoscaler [admin]
19:15 <andrewbogott> ceph osd pool set compute pg_autoscale_mode on [admin]
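Turning `pg_autoscale_mode` off and pinning `pg_num` by hand, as in the 2020-07-27 entries above, usually follows the Ceph rule of thumb: total PGs ≈ (OSDs × target PGs per OSD) / replica count, rounded to a power of two. A small sketch of that arithmetic; the OSD count, PG target, and replica size below are illustrative, not the eqiad1 cluster's actual numbers:

```shell
# Rule-of-thumb PG count: (osds * pgs_per_osd) / replicas, rounded to
# the nearest power of two. Inputs here are hypothetical.
pg_num_guess() {
  local osds=$1 per_osd=$2 replicas=$3
  local raw=$(( osds * per_osd / replicas ))
  local p=1
  while [ $(( p * 2 )) -le "$raw" ]; do p=$(( p * 2 )); done
  # round up when the next power of two is closer than the current one
  if [ $(( raw - p )) -gt $(( 2 * p - raw )) ]; then p=$(( p * 2 )); fi
  echo "$p"
}

pg_num_guess 60 100 3
```

With 60 OSDs, 3x replication, and a 100-PG-per-OSD target this lands on 2048, the value pinned on the compute pool above; `ceph osd pool set <pool> pg_num <n>` then applies the chosen count to the cluster.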