2025-03-27
§
|
16:05 |
<jhancock@cumin2002> |
START - Cookbook sre.dns.netbox |
[production] |
16:00 |
<elukey> |
`sudo systemctl restart burrow-jumbo-eqiad.service prometheus-burrow-exporter@jumbo-eqiad.service` on kafkamon1003 - attempt to check if the new kafka lag for benthos-webrequest_live is due to burrow - T390029 |
[production] |
15:59 |
<root@cloudcumin1001> |
START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers |
[tools] |
15:57 |
<ebernhardson@deploy1003> |
Finished scap sync-world: Backport for [[gerrit:1131359|Move cirrus traffic to eqiad for platform upgrade (T388610)]] (duration: 12m 49s) |
[production] |
15:53 |
<aborrero@cloudcumin1001> |
END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for all NFS workers |
[tools] |
15:53 |
<aborrero@cloudcumin1001> |
START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers |
[tools] |
15:51 |
<marostegui@cumin1002> |
dbctl commit (dc=all): 'db2181 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74475 and previous config saved to /var/cache/conftool/dbconfig/20250327-155117-root.json |
[production] |
15:50 |
<ebernhardson@deploy1003> |
ebernhardson: Continuing with sync |
[production] |
15:49 |
<ebernhardson@deploy1003> |
ebernhardson: Backport for [[gerrit:1131359|Move cirrus traffic to eqiad for platform upgrade (T388610)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
15:44 |
<bking@cumin2002> |
END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw |
[production] |
15:44 |
<bking@cumin2002> |
START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw |
[production] |
15:44 |
<ebernhardson@deploy1003> |
Started scap sync-world: Backport for [[gerrit:1131359|Move cirrus traffic to eqiad for platform upgrade (T388610)]] |
[production] |
15:44 |
<otto@deploy1003> |
helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply |
[production] |
15:44 |
<otto@deploy1003> |
helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply |
[production] |
15:36 |
<marostegui@cumin1002> |
dbctl commit (dc=all): 'db2181 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74474 and previous config saved to /var/cache/conftool/dbconfig/20250327-153612-root.json |
[production] |
15:28 |
<hashar> |
Restarting Gerrit to raise heap from 32G to 64G (T387223) and to enable pushing notifications to browsers (T389327) |
[production] |
15:28 |
<ottomata> |
upgrading eventgate-logging-external to node20 (using new per stream header enrich setting), first testing in staging. - T383814, T387908 |
[production] |
15:26 |
<dcausse@deploy1003> |
helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply |
[production] |
15:26 |
<dcausse@deploy1003> |
helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply |
[production] |
15:21 |
<marostegui@cumin1002> |
dbctl commit (dc=all): 'db2181 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74473 and previous config saved to /var/cache/conftool/dbconfig/20250327-152106-root.json |
[production] |
15:19 |
<hashar@deploy1003> |
Finished scap sync-world: Sync patch to PrivateSettings.php and removal of unused configs (Gerrit: 1127930 1127889 1127890 1127886 1125095 1127900 1127898 1127887 1127897 1127888 1127929) (duration: 11m 52s) |
[production] |
15:15 |
<elukey> |
update benthos@webrequest-live's config on centrallog nodes to new Kafka topics (haproxy vs varnishkafka) - T390029 |
[production] |
15:07 |
<hashar@deploy1003> |
Started scap sync-world: Sync patch to PrivateSettings.php and removal of unused configs (Gerrit: 1127930 1127889 1127890 1127886 1125095 1127900 1127898 1127887 1127897 1127888 1127929) |
[production] |
15:06 |
<hashar@deploy1003> |
sync-world aborted: Sync patch to PrivateSettings.php and removal of unused configs (Gerrit: 1127930 1127889 1127890 1127886 1125095 1127900 1127898 1127887 1127897 1127888 1127929) (duration: 00m 16s) |
[production] |
15:06 |
<btullis@cumin1002> |
START - Cookbook sre.dns.netbox |
[production] |
15:06 |
<hashar@deploy1003> |
Started scap sync-world: Sync patch to PrivateSettings.php and removal of unused configs (Gerrit: 1127930 1127889 1127890 1127886 1125095 1127900 1127898 1127887 1127897 1127888 1127929) |
[production] |
15:06 |
<marostegui@cumin1002> |
dbctl commit (dc=all): 'db2181 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74472 and previous config saved to /var/cache/conftool/dbconfig/20250327-150601-root.json |
[production] |
15:05 |
<bd808> |
Moved role::acme_chief::cloud from individual instance config to deployment-acme-chief Puppet prefix. |
[releng] |
15:02 |
<brouberol@deploy1003> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
15:02 |
<taavi@cloudcumin1001> |
END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster |
[tools] |
15:02 |
<taavi@cloudcumin1001> |
Added a new k8s worker tools-k8s-worker-111.tools.eqiad1.wikimedia.cloud to the cluster |
[tools] |
15:02 |
<brouberol@deploy1003> |
helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. |
[production] |
15:00 |
<moritzm> |
installing setuptools security updates |
[production] |
14:59 |
<taavi@cloudcumin1001> |
END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-72 |
[tools] |
14:54 |
<tgr_> |
UTC afternoon deploys done |
[production] |
14:52 |
<tgr@deploy1003> |
Finished scap sync-world: Backport for [[gerrit:1131481|Enable SUL3 for temp users on group 0/1 (T384220)]] (duration: 22m 27s) |
[production] |
14:52 |
<taavi@cloudcumin1001> |
START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster |
[tools] |
14:45 |
<tgr@deploy1003> |
tgr: Continuing with sync |
[production] |
14:40 |
<moritzm> |
uploaded Boost 1.83.0-4.1~wmf12u1 (backport of Boost 1.83 to Bookworm, needed by Mapnik 4.0.6) T389776 |
[production] |
14:39 |
<brouberol@deploy1003> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply |
[production] |
14:39 |
<brouberol@deploy1003> |
helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply |
[production] |
14:35 |
<tgr@deploy1003> |
tgr: Backport for [[gerrit:1131481|Enable SUL3 for temp users on group 0/1 (T384220)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
14:34 |
<andrew@cloudcumin1001> |
END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for all services |
[admin] |
14:33 |
<taavi@cloudcumin1001> |
START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-72 |
[tools] |
14:33 |
<taavi@cloudcumin1001> |
END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-72 |
[tools] |
14:33 |
<taavi@cloudcumin1001> |
START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-72 |
[tools] |
14:30 |
<tgr@deploy1003> |
Started scap sync-world: Backport for [[gerrit:1131481|Enable SUL3 for temp users on group 0/1 (T384220)]] |
[production] |
14:26 |
<jmm@deploy1003> |
helmfile [eqiad] DONE helmfile.d/services/thumbor: apply |
[production] |
14:24 |
<tgr@deploy1003> |
Finished scap sync-world: Backport for [[gerrit:1131444|Fix badpass logging for locally nonexistent users]] (duration: 19m 42s) |
[production] |
14:22 |
<andrew@cloudcumin1001> |
START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all services |
[admin] |