2024-09-24
09:46 <btullis@cumin1002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1001.eqiad.wmnet with OS bookworm [production]
09:33 <jiji@cumin1002> START - Cookbook sre.hosts.provision for host mw2426.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL [production]
09:27 <tappof> upgrade mtail on lists* and ncredir* https://phabricator.wikimedia.org/T375085 [production]
09:25 <arnaudb@cumin1002> END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=93) for the switch from eqiad to codfw [production]
09:25 <arnaudb@cumin1002> START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw [production]
09:21 <jnuche@deploy1003> scap failed: <UnboundLocalError> local variable 'e' referenced before assignment (scap version: 4.104.0-1) (duration: 38m 15s) [production]
09:19 <stevemunene@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1176.eqiad.wmnet with OS bullseye [production]
09:11 <wmbot~dcaro@urcuchillay> START - Cookbook wmcs.ceph.osd.undrain_rack [admin]
09:11 <stevemunene@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1177.eqiad.wmnet with OS bullseye [production]
09:10 <wmbot~dcaro@urcuchillay> END (ERROR) - Cookbook wmcs.ceph.osd.undrain_rack (exit_code=97) [admin]
09:10 <wmbot~dcaro@urcuchillay> START - Cookbook wmcs.ceph.osd.undrain_rack [admin]
09:08 <jiji@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw2427.codfw.wmnet with reason: reimage [production]
09:07 <jiji@cumin1002> START - Cookbook sre.hosts.downtime for 3:00:00 on mw2427.codfw.wmnet with reason: reimage [production]
09:07 <jiji@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw2426.codfw.wmnet with reason: reimage [production]
09:07 <jiji@cumin1002> START - Cookbook sre.hosts.downtime for 3:00:00 on mw2426.codfw.wmnet with reason: reimage [production]
09:05 <jiji@cumin1002> END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2427.codfw.wmnet [production]
09:04 <jiji@cumin1002> START - Cookbook sre.k8s.pool-depool-node depool for host mw2427.codfw.wmnet [production]
09:04 <jiji@cumin1002> END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2426.codfw.wmnet [production]
09:03 <jiji@cumin1002> START - Cookbook sre.k8s.pool-depool-node depool for host mw2426.codfw.wmnet [production]
09:03 <stevemunene@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1176.eqiad.wmnet with reason: host reimage [production]
08:59 <stevemunene@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1176.eqiad.wmnet with reason: host reimage [production]
08:55 <stevemunene@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1177.eqiad.wmnet with reason: host reimage [production]
08:51 <stevemunene@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1177.eqiad.wmnet with reason: host reimage [production]
08:46 <stevemunene@cumin1002> START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye [production]
08:43 <jnuche@deploy1003> Started scap sync-world: testwikis to 1.43.0-wmf.24 refs T373643 [production]
08:37 <stevemunene@cumin1002> START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye [production]
08:36 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetmaster2001.codfw.wmnet with reason: WIP - working on puppet runs [production]
08:36 <elukey@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on puppetmaster2001.codfw.wmnet with reason: WIP - working on puppet runs [production]
08:36 <fnegri@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223 [production]
08:36 <fnegri@cumin1002> START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223 [production]
08:30 <jnuche@deploy1003> Started scap sync-world: testwikis to 1.43.0-wmf.24 refs T373643 [production]
08:20 <stevemunene@cumin1002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye [production]
07:41 <XioNoX> reboot cr3-ulsfo - T375345 [production]
07:28 <dcausse> closing the backport window [production]
07:27 <dcausse@deploy1003> Finished scap sync-world: Backport for [[gerrit:1073565|Add a private variant of the cirrus update stream (T374335)]] (duration: 24m 11s) [production]
07:12 <dcausse@deploy1003> dcausse, ebernhardson: Continuing with sync [production]
07:07 <dcausse@deploy1003> dcausse, ebernhardson: Backport for [[gerrit:1073565|Add a private variant of the cirrus update stream (T374335)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
07:03 <dcausse@deploy1003> Started scap sync-world: Backport for [[gerrit:1073565|Add a private variant of the cirrus update stream (T374335)]] [production]
07:02 <tappof@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1002.eqiad.wmnet [production]
06:35 <tappof@cumin2002> START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet [production]
05:55 <tappof> centrallog1002 upgrade to bookworm in progress https://phabricator.wikimedia.org/T353912 [production]
05:03 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance [production]
05:03 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance [production]
04:06 <mwpresync@deploy1003> Pruned MediaWiki: 1.43.0-wmf.21 (duration: 06m 02s) [production]
03:06 <eevans@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: Server depooled. Has hardware issues [production]
03:06 <eevans@cumin1002> START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: Server depooled. Has hardware issues [production]
03:02 <mwpresync@deploy1003> Started scap sync-world: testwikis to 1.43.0-wmf.24 refs T373643 [production]
01:16 <jclark@cumin1002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
01:14 <jclark@cumin1002> START - Cookbook sre.dns.netbox [production]
01:08 <jclark@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1052.eqiad.wmnet with OS bookworm [production]