2025-07-11
22:26 <vriley@cumin1002> START - Cookbook sre.hosts.reimage for host db1259.eqiad.wmnet with OS bookworm [production]
22:10 <vriley@cumin1002> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1259.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED [production]
21:50 <vriley@cumin1002> START - Cookbook sre.hosts.provision for host db1259.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED [production]
21:49 <vriley@cumin1002> END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1259 [production]
21:48 <vriley@cumin1002> START - Cookbook sre.network.configure-switch-interfaces for host db1259 [production]
21:46 <vriley@cumin1002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
21:46 <vriley@cumin1002> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db1259 - vriley@cumin1002" [production]
21:46 <vriley@cumin1002> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db1259 - vriley@cumin1002" [production]
21:43 <vriley@cumin1002> START - Cookbook sre.dns.netbox [production]
21:41 <bd808> Unblock 37.114.160.0/19 (T399236) [releng]
20:59 <bd808> Rebooted deployment-mediawiki14 to clear active load (T399329) [releng]
20:55 <bd808> blocked additional wide IP ranges in an attempt to get the load on deployment-mediawiki14 consistently below 3. (T399329) [releng]
18:38 <andrewbogott> it didn't [admin]
18:32 <andrewbogott> rebooting cloudceph1013 to see if its missing OSD drive reappears [admin]
18:25 <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.ceph.osd.reactivate (exit_code=0) [admin]
18:25 <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.reactivate [admin]
18:24 <andrew@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1008.eqiad.wmnet with OS bullseye [production]
18:24 <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.ceph.osd.reactivate (exit_code=0) [admin]
18:23 <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.reactivate [admin]
18:23 <andrew@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1007.eqiad.wmnet with OS bullseye [production]
18:21 <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.ceph.osd.reactivate (exit_code=0) [admin]
18:20 <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.reactivate [admin]
18:20 <andrew@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1006.eqiad.wmnet with OS bullseye [production]
18:19 <andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.ceph.osd.reactivate (exit_code=99) [admin]
18:19 <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.reactivate [admin]
18:19 <andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.ceph.osd.reactivate (exit_code=99) [admin]
18:19 <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.reactivate [admin]
18:09 <andrew@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage [production]
18:07 <andrew@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1007.eqiad.wmnet with reason: host reimage [production]
18:06 <andrew@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage [production]
18:03 <andrew@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1007.eqiad.wmnet with reason: host reimage [production]
18:03 <andrew@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage [production]
17:57 <andrew@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage [production]
17:48 <andrew@cumin2002> START - Cookbook sre.hosts.reimage for host cloudcephosd1008.eqiad.wmnet with OS bullseye [production]
17:45 <andrew@cumin2002> START - Cookbook sre.hosts.reimage for host cloudcephosd1007.eqiad.wmnet with OS bullseye [production]
17:39 <andrew@cumin2002> START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bullseye [production]
17:32 <andrew@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1035.eqiad.wmnet with OS bullseye [production]
17:29 <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.ceph.osd.reactivate (exit_code=0) [admin]
17:28 <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.reactivate [admin]
17:21 <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55 [tools]
17:11 <andrew@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage [production]
17:10 <sukhe@cumin1003> END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqsin [reason: done testing issues with primary arelion link, T399221] [production]
17:10 <sukhe@cumin1003> START - Cookbook sre.dns.admin DNS admin: pool site eqsin [reason: done testing issues with primary arelion link, T399221] [production]
16:52 <bd808> Reboot deployment-mediawiki14 to clear all open connections (T399329) [releng]
16:51 <andrew@cumin2002> START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bullseye [production]
16:51 <andrew@cumin2002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1035.eqiad.wmnet with OS bullseye [production]
16:46 <andrew@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1036.eqiad.wmnet with OS bullseye [production]
16:45 <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.ceph.osd.reactivate (exit_code=0) [admin]
16:45 <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.reactivate [admin]
16:31 <topranks> drain Arelion CCT from codfw to eqsin - still see minor packet loss which is affecting purged T399221 [production]