2026-03-13
18:58 <jclark@cumin1003> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1034.eqiad.wmnet with reason: host reimage [production]
18:58 <brett@cumin2002> START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie [production]
18:57 <brett@cumin2002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS trixie [production]
18:55 <jclark@cumin1003> START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1035.eqiad.wmnet with reason: host reimage [production]
18:55 <jclark@cumin1003> START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1034.eqiad.wmnet with reason: host reimage [production]
18:47 <brett@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4051.ulsfo.wmnet with reason: host reimage [production]
18:43 <jclark@cumin1003> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage [production]
18:41 <brett@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on cp4051.ulsfo.wmnet with reason: host reimage [production]
18:40 <jclark@cumin1003> START - Cookbook sre.hosts.reimage for host wdqs1035.eqiad.wmnet with OS trixie [production]
18:40 <jclark@cumin1003> START - Cookbook sre.hosts.reimage for host wdqs1034.eqiad.wmnet with OS trixie [production]
18:40 <jclark@cumin1003> START - Cookbook sre.hosts.reimage for host wdqs1033.eqiad.wmnet with OS trixie [production]
18:36 <jclark@cumin1003> START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage [production]
18:35 <brett@cumin2002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp4050.ulsfo.wmnet with reason: firmware updates [production]
18:34 <brett@cumin2002> START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie [production]
18:24 <brett@cumin2002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp4050.ulsfo.wmnet [production]
18:24 <brett@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie [production]
18:22 <jclark@cumin1003> START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS bookworm [production]
18:21 <jclark@cumin1003> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1374.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [production]
18:21 <brett@cumin2002> START - Cookbook sre.hosts.reimage for host cp4051.ulsfo.wmnet with OS trixie [production]
18:21 <brett@cumin2002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4051.ulsfo.wmnet with OS trixie [production]
18:12 <jclark@cumin1003> START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS bookworm [production]
18:10 <jclark@cumin1003> START - Cookbook sre.hosts.provision for host wikikube-worker1374.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [production]
18:10 <jclark@cumin1003> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
18:10 <jclark@cumin1003> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" [production]
18:10 <jclark@cumin1003> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" [production]
18:10 <elukey@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1253.eqiad.wmnet with reason: Host went down and paged, depooled [production]
18:06 <cgoubert@cumin1003> dbctl commit (dc=all): 'Depool db1253', diff saved to https://phabricator.wikimedia.org/P89856 and previous config saved to /var/cache/conftool/dbconfig/20260313-180640-cgoubert.json [production]
18:06 <jclark@cumin1003> START - Cookbook sre.dns.netbox [production]
18:05 <brett@cumin2002> START - Cookbook sre.hosts.reimage for host cp4051.ulsfo.wmnet with OS trixie [production]
18:03 <elukey> powercycle db1253 - host not reachable via ssh, no events logged in racadm getsel, no console com2 available (blank screen) [production]
17:59 <brett@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage [production]
17:56 <brett@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage [production]
17:49 <brett@puppetserver1001> conftool action : set/pooled=yes; selector: name=cp4049.* [production]
17:46 <brett@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4049.ulsfo.wmnet with OS trixie [production]
17:37 <cgoubert@deploy2002> helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply [production]
17:37 <cgoubert@deploy2002> helmfile [codfw] START helmfile.d/services/rest-gateway: apply [production]
17:36 <brett@cumin2002> START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie [production]
17:35 <brett@cumin2002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4050.ulsfo.wmnet with OS trixie [production]
17:35 <cgoubert@deploy2002> helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply [production]
17:34 <cgoubert@deploy2002> helmfile [eqiad] START helmfile.d/services/rest-gateway: apply [production]
17:27 <jforrester@deploy2002> helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply [production]
17:26 <jforrester@deploy2002> helmfile [eqiad] START helmfile.d/services/mw-experimental: apply [production]
17:26 <jforrester@deploy2002> helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply [production]
17:26 <jforrester@deploy2002> helmfile [codfw] START helmfile.d/services/mw-experimental: apply [production]
17:20 <brett@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage [production]
17:17 <brett@cumin2002> START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie [production]
17:17 <cgoubert@deploy2002> helmfile [staging] DONE helmfile.d/services/rest-gateway: apply [production]
17:16 <cgoubert@deploy2002> helmfile [staging] START helmfile.d/services/rest-gateway: apply [production]
17:16 <brett@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage [production]
17:12 <fnegri@cumin1003> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1016.eqiad.wmnet [production]