2025-06-04
23:55 |
<brett@cumin2002> |
END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on A:ncredir |
[production] |
22:45 |
<brett@cumin2002> |
START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir |
[production] |
22:30 |
<robh@cumin2002> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7016.magru.wmnet |
[production] |
22:27 |
<vriley@cumin1002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye |
[production] |
22:20 |
<robh@cumin2002> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7014.magru.wmnet |
[production] |
22:18 |
<damilare> |
SmashPig upgraded from d08693e5 to 3222a1f3 |
[production] |
22:16 |
<ladsgroup@deploy1003> |
Finished scap sync-world: Backport for [[gerrit:1153725|Bump cache key version in EventStore (T396075)]] (duration: 13m 54s) |
[production] |
22:12 |
<robh@cumin2002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7016.magru.wmnet |
[production] |
22:12 |
<robh@cumin2002> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7015.magru.wmnet |
[production] |
22:12 |
<robh@cumin2002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7015.magru.wmnet |
[production] |
22:11 |
<robh@cumin2002> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7015.magru.wmnet |
[production] |
22:11 |
<brett> |
sudo -i cumin 'A:ncredir' 'depool && apt-get update && apt-get upgrade -y && pool' -b1 -s10 |
[production] |
22:09 |
<ladsgroup@deploy1003> |
ladsgroup: Continuing with sync |
[production] |
22:04 |
<ladsgroup@deploy1003> |
ladsgroup: Backport for [[gerrit:1153725|Bump cache key version in EventStore (T396075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. |
[production] |
22:02 |
<ladsgroup@deploy1003> |
Started scap sync-world: Backport for [[gerrit:1153725|Bump cache key version in EventStore (T396075)]] |
[production] |
22:02 |
<robh@cumin2002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7015.magru.wmnet |
[production] |
22:02 |
<robh@cumin2002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7014.magru.wmnet |
[production] |
22:02 |
<robh@cumin2002> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7013.magru.wmnet |
[production] |
21:58 |
<robh@cumin2002> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7012.magru.wmnet |
[production] |
21:43 |
<robh@cumin2002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7013.magru.wmnet |
[production] |
21:42 |
<robh@cumin2002> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7011.magru.wmnet |
[production] |
21:40 |
<robh@cumin2002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7012.magru.wmnet |
[production] |
21:40 |
<robh@cumin2002> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7010.magru.wmnet |
[production] |
21:39 |
<robh@cumin2002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7010.magru.wmnet |
[production] |
21:35 |
<robh@cumin2002> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7010.magru.wmnet |
[production] |
21:29 |
<ryankemper@cumin2002> |
START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20250526/ using stat1011.eqiad.wmnet) |
[production] |
21:25 |
<robh@cumin2002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7011.magru.wmnet |
[production] |
21:25 |
<robh@cumin2002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7010.magru.wmnet |
[production] |
21:24 |
<ryankemper@cumin2002> |
START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20250526/ using stat1009.eqiad.wmnet) |
[production] |
21:22 |
<robh@cumin2002> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7009.magru.wmnet |
[production] |
21:14 |
<robh@cumin2002> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7008.magru.wmnet |
[production] |
21:07 |
<vriley@cumin1002> |
START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye |
[production] |
21:06 |
<vriley@cumin1002> |
END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED |
[production] |
21:05 |
<robh@cumin2002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7009.magru.wmnet |
[production] |
21:05 |
<robh@cumin2002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7008.magru.wmnet |
[production] |
21:04 |
<cjming> |
end of UTC late backport window |
[production] |
21:04 |
<robh@cumin2002> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7007.magru.wmnet |
[production] |
21:02 |
<cjming@deploy1003> |
Finished scap sync-world: Backport for [[gerrit:1153689|SUL3: Retry local login on failure due to invalid/expired login token (T390784)]], [[gerrit:1153690|SUL3: Retry local login on failure… (follow-ups) (T390784)]], [[gerrit:1153691|SUL3: Retry local login on failure due to invalid/expired login token (T390784)]], [[gerrit:1153692|SUL3: Retry local login on failure… (follow-ups) (T390784)]] (d |
[production] |
21:01 |
<jforrester@deploy1003> |
helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply |
[production] |
20:55 |
<cjming@deploy1003> |
matmarex, cjming: Continuing with sync |
[production] |
20:55 |
<vriley@cumin1002> |
START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED |
[production] |
20:54 |
<cjming@deploy1003> |
matmarex, cjming: Backport for [[gerrit:1153689|SUL3: Retry local login on failure due to invalid/expired login token (T390784)]], [[gerrit:1153690|SUL3: Retry local login on failure… (follow-ups) (T390784)]], [[gerrit:1153691|SUL3: Retry local login on failure due to invalid/expired login token (T390784)]], [[gerrit:1153692|SUL3: Retry local login on failure… (follow-ups) (T390784)]] synced to |
[production] |
20:51 |
<cjming@deploy1003> |
Started scap sync-world: Backport for [[gerrit:1153689|SUL3: Retry local login on failure due to invalid/expired login token (T390784)]], [[gerrit:1153690|SUL3: Retry local login on failure… (follow-ups) (T390784)]], [[gerrit:1153691|SUL3: Retry local login on failure due to invalid/expired login token (T390784)]], [[gerrit:1153692|SUL3: Retry local login on failure… (follow-ups) (T390784)]] |
[production] |
20:51 |
<jforrester@deploy1003> |
helmfile [codfw] START helmfile.d/services/wikifunctions: apply |
[production] |
20:50 |
<robh@cumin2002> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7006.magru.wmnet |
[production] |
20:46 |
<robh@cumin2002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7007.magru.wmnet |
[production] |
20:44 |
<robh@cumin2002> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7005.magru.wmnet |
[production] |
20:40 |
<robh@cumin2002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7006.magru.wmnet |
[production] |
20:38 |
<cjming@deploy1003> |
Finished scap sync-world: Backport for [[gerrit:1153686|Treat File::getShortDesc() as possibly unsafe HTML (T395834)]], [[gerrit:1153687|Treat File::getShortDesc() as possibly unsafe HTML (T395834)]] (duration: 15m 37s) |
[production] |
20:37 |
<robh@cumin2002> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7004.magru.wmnet |
[production] |