|
2025-12-02
|
| 16:57 |
<marostegui@cumin1003> |
dbctl commit (dc=all): 'Repooling after maintenance db2149 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86321 and previous config saved to /var/cache/conftool/dbconfig/20251202-165702-marostegui.json |
[production] |
| 16:54 |
<jhathaway@cumin1003> |
START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm |
[production] |
| 16:53 |
<swfrench@cumin2002> |
END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad (T352245) |
[production] |
| 16:53 |
<swfrench@cumin2002> |
START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad (T352245) |
[production] |
| 16:51 |
<ladsgroup@cumin1003> |
dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P86320 and previous config saved to /var/cache/conftool/dbconfig/20251202-165119-ladsgroup.json |
[production] |
| 16:51 |
<jhathaway@cumin1003> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1005.eqiad.wmnet with OS bookworm |
[production] |
| 16:44 |
<ihurbain@deploy2002> |
Finished scap sync-world: Backport for [[gerrit:1214069|Bump parsoid to v0.23.0-a7.1 on wmf.4 (T411238 T410960)]], [[gerrit:1214070|Bump parsoid to v0.23.0-a7.1 on wmf.4 (T411238 T410960)]] (duration: 09m 21s) |
[production] |
| 16:43 |
<jhathaway@cumin1003> |
START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm |
[production] |
| 16:43 |
<inflatador> |
bking@wmf3062 restart WDQS codfw to resolve lag/possible deadlocks |
[production] |
| 16:39 |
<ihurbain@deploy2002> |
ihurbain: Continuing with sync |
[production] |
| 16:39 |
<jhathaway@cumin1003> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1005.eqiad.wmnet with OS bookworm |
[production] |
| 16:38 |
<btullis@deploy2002> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/analytics-test: apply |
[production] |
| 16:38 |
<btullis@deploy2002> |
helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/analytics-test: apply |
[production] |
| 16:37 |
<ihurbain@deploy2002> |
ihurbain: Backport for [[gerrit:1214069|Bump parsoid to v0.23.0-a7.1 on wmf.4 (T411238 T410960)]], [[gerrit:1214070|Bump parsoid to v0.23.0-a7.1 on wmf.4 (T411238 T410960)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. |
[production] |
| 16:36 |
<ladsgroup@cumin1003> |
dbctl commit (dc=all): 'Repooling after maintenance db2193 (T410589)', diff saved to https://phabricator.wikimedia.org/P86319 and previous config saved to /var/cache/conftool/dbconfig/20251202-163612-ladsgroup.json |
[production] |
| 16:36 |
<marostegui@cumin1003> |
END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1251 gradually with 4 steps - Pool db1251.eqiad.wmnet in after cloning |
[production] |
| 16:35 |
<ihurbain@deploy2002> |
Started scap sync-world: Backport for [[gerrit:1214069|Bump parsoid to v0.23.0-a7.1 on wmf.4 (T411238 T410960)]], [[gerrit:1214070|Bump parsoid to v0.23.0-a7.1 on wmf.4 (T411238 T410960)]] |
[production] |
| 16:30 |
<jhathaway@cumin1003> |
START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm |
[production] |
| 16:27 |
<brett> |
import varnish 7.1.1-2~bpo13+wmf2 into trixie-wikimedia - T401832 |
[production] |
| 16:24 |
<jhathaway@cumin1003> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1005.eqiad.wmnet with OS bookworm |
[production] |
| 16:23 |
<jhathaway@cumin1003> |
START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm |
[production] |
| 16:20 |
<jhathaway@cumin1003> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1005.eqiad.wmnet with OS bookworm |
[production] |
| 16:19 |
<jhathaway@cumin1003> |
START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm |
[production] |
| 16:18 |
<swfrench-wmf> |
restarted navtiming on webperf1003 - T352245 |
[production] |
| 16:14 |
<swfrench-wmf> |
begin rolling restarts of eqiad-associated confds - T352245 |
[production] |
| 16:12 |
<moritzm> |
installing nodejs security updates |
[production] |
| 16:12 |
<swfrench@deploy2002> |
Unlocked for deployment [MediaWiki]: Hold deployments during etcd certificate change - T352245 (duration: 03m 45s) |
[production] |
| 16:12 |
<jhathaway@cumin1003> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1005.eqiad.wmnet with OS bookworm |
[production] |
| 16:10 |
<jhathaway@cumin1003> |
START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm |
[production] |
| 16:08 |
<swfrench@deploy2002> |
Locking from deployment [MediaWiki]: Hold deployments during etcd certificate change - T352245 |
[production] |
| 16:08 |
<swfrench-wmf> |
migrating etcd to PKI certs on conf1008 - T352245 |
[production] |
| 16:08 |
<jhathaway@cumin1003> |
END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL |
[production] |
| 16:02 |
<moritzm> |
installing libsndfile security updates |
[production] |
| 16:01 |
<jhathaway@cumin1003> |
START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL |
[production] |
| 16:00 |
<gehel> |
restarting wdqs@codfw - system overloaded |
[production] |
| 15:58 |
<jhathaway@cumin1003> |
DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on sretest1005.eqiad.wmnet with reason: ipxe |
[production] |
| 15:50 |
<marostegui@cumin1003> |
START - Cookbook sre.mysql.pool db1251 gradually with 4 steps - Pool db1251.eqiad.wmnet in after cloning |
[production] |
| 15:48 |
<moritzm> |
upgrade Envoy on Yarn T405808 |
[production] |
| 15:45 |
<mvernon@cumin1003> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1088.eqiad.wmnet with OS bullseye |
[production] |
| 15:29 |
<mvernon@cumin1003> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1088.eqiad.wmnet with reason: host reimage |
[production] |
| 15:25 |
<mvernon@cumin1003> |
START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1088.eqiad.wmnet with reason: host reimage |
[production] |
| 15:13 |
<moritzm> |
upgrade Envoy on Turnilo T405808 |
[production] |
| 15:12 |
<mvernon@cumin1003> |
START - Cookbook sre.hosts.reimage for host ms-be1088.eqiad.wmnet with OS bullseye |
[production] |
| 14:51 |
<Lucas_WMDE> |
UTC afternoon backport+config window done |
[production] |
| 14:47 |
<urbanecm@deploy2002> |
Finished scap sync-world: Backport for [[gerrit:1213988|[Growth] Enable Add Link for 3 wikis (T407818)]] (duration: 07m 46s) |
[production] |
| 14:43 |
<urbanecm@deploy2002> |
urbanecm: Continuing with sync |
[production] |
| 14:41 |
<urbanecm@deploy2002> |
urbanecm: Backport for [[gerrit:1213988|[Growth] Enable Add Link for 3 wikis (T407818)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. |
[production] |
| 14:41 |
<marostegui@cumin1003> |
dbctl commit (dc=all): 'Depooling db1198 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86314 and previous config saved to /var/cache/conftool/dbconfig/20251202-144148-marostegui.json |
[production] |
| 14:41 |
<marostegui@cumin1003> |
DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance |
[production] |
| 14:41 |
<marostegui@cumin1003> |
dbctl commit (dc=all): 'Repooling after maintenance db1189 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86313 and previous config saved to /var/cache/conftool/dbconfig/20251202-144123-marostegui.json |
[production] |