2026-03-13
18:10 <jclark@cumin1003> START - Cookbook sre.hosts.provision for host wikikube-worker1374.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [production]
18:10 <jclark@cumin1003> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
18:10 <jclark@cumin1003> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" [production]
18:10 <jclark@cumin1003> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" [production]
18:10 <elukey@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1253.eqiad.wmnet with reason: Host went down and paged, depooled [production]
18:06 <cgoubert@cumin1003> dbctl commit (dc=all): 'Depool db1253', diff saved to https://phabricator.wikimedia.org/P89856 and previous config saved to /var/cache/conftool/dbconfig/20260313-180640-cgoubert.json [production]
18:06 <jclark@cumin1003> START - Cookbook sre.dns.netbox [production]
18:05 <brett@cumin2002> START - Cookbook sre.hosts.reimage for host cp4051.ulsfo.wmnet with OS trixie [production]
18:03 <elukey> powercycle db1253 - host not reachable via ssh, no events logged in racadm getsel, no console com2 available (blank screen) [production]
17:59 <brett@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage [production]
17:56 <brett@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage [production]
17:49 <brett@puppetserver1001> conftool action : set/pooled=yes; selector: name=cp4049.* [production]
17:46 <brett@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4049.ulsfo.wmnet with OS trixie [production]
17:37 <cgoubert@deploy2002> helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply [production]
17:37 <cgoubert@deploy2002> helmfile [codfw] START helmfile.d/services/rest-gateway: apply [production]
17:36 <brett@cumin2002> START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie [production]
17:35 <brett@cumin2002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4050.ulsfo.wmnet with OS trixie [production]
17:35 <cgoubert@deploy2002> helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply [production]
17:34 <cgoubert@deploy2002> helmfile [eqiad] START helmfile.d/services/rest-gateway: apply [production]
17:27 <jforrester@deploy2002> helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply [production]
17:26 <jforrester@deploy2002> helmfile [eqiad] START helmfile.d/services/mw-experimental: apply [production]
17:26 <jforrester@deploy2002> helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply [production]
17:26 <jforrester@deploy2002> helmfile [codfw] START helmfile.d/services/mw-experimental: apply [production]
17:20 <brett@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage [production]
17:17 <brett@cumin2002> START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie [production]
17:17 <cgoubert@deploy2002> helmfile [staging] DONE helmfile.d/services/rest-gateway: apply [production]
17:16 <cgoubert@deploy2002> helmfile [staging] START helmfile.d/services/rest-gateway: apply [production]
17:16 <brett@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage [production]
17:12 <fnegri@cumin1003> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1016.eqiad.wmnet [production]
17:12 <fnegri@cumin1003> START - Cookbook sre.hosts.remove-downtime for clouddb1016.eqiad.wmnet [production]
17:11 <fnegri@cumin1003> conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet [production]
17:11 <brett@puppetserver1001> conftool action : set/pooled=yes; selector: name=cp4048.* [production]
17:10 <dhinus> (relogging failed sal) conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet [production]
17:10 <dhinus> (relogging failed sal) DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1016.eqiad.wmnet with reason: Rebooting clouddb1016 T419960 [production]
17:09 <dhinus> (relogging failed sal) END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1015.eqiad.wmnet [production]
17:08 <dhinus> (relogging failed sal) START - Cookbook sre.hosts.remove-downtime for clouddb1015.eqiad.wmnet [production]
17:08 <jforrester@deploy2002> helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply [production]
17:07 <jforrester@deploy2002> helmfile [eqiad] START helmfile.d/services/mw-experimental: apply [production]
17:07 <dhinus> fnegri@cumin1003 conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet [production]
17:07 <jforrester@deploy2002> helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply [production]
17:07 <brett@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie [production]
17:06 <jforrester@deploy2002> helmfile [codfw] START helmfile.d/services/mw-experimental: apply [production]
16:40 <brett@cumin2002> START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS trixie [production]
16:39 <brett@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage [production]
16:36 <fnegri@cumin1003> conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet [production]
16:35 <fnegri@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1015.eqiad.wmnet with reason: Rebooting clouddb1015 T419960 [production]
16:34 <fnegri@cumin1003> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1014.eqiad.wmnet [production]
16:34 <fnegri@cumin1003> START - Cookbook sre.hosts.remove-downtime for clouddb1014.eqiad.wmnet [production]
16:34 <fnegri@cumin1003> conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet [production]
16:29 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt-staging2001.codfw.wmnet [production]