2025-05-05
|
09:58 |
<elukey@deploy1003> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' . |
[production] |
09:45 |
<hashar> |
Cleared /srv/docker/overlay2 on contint2002 |
[releng] |
09:41 |
<hashar> |
Cleared /srv/docker/overlay2 on contint1002 (it had a bunch of old layers from April/May 2024) |
[releng] |
09:39 |
<elukey@deploy1003> |
helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. |
[production] |
09:39 |
<elukey@deploy1003> |
helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. |
[production] |
09:38 |
<elukey> |
depool inference/codfw from DNS discovery to safely apply new pod/container security settings - T369493 |
[production] |
09:30 |
<dreamyjazz@deploy1003> |
Finished scap sync-world: Backport for [[gerrit:1141844|[plwiki] Add 'abusefilter-view-private' to sysop (T393353)]] (duration: 13m 04s) |
[production] |
09:23 |
<dreamyjazz@deploy1003> |
dreamyjazz, msz2001: Continuing with sync |
[production] |
09:21 |
<dreamyjazz@deploy1003> |
dreamyjazz, msz2001: Backport for [[gerrit:1141844|[plwiki] Add 'abusefilter-view-private' to sysop (T393353)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
09:17 |
<dreamyjazz@deploy1003> |
Started scap sync-world: Backport for [[gerrit:1141844|[plwiki] Add 'abusefilter-view-private' to sysop (T393353)]] |
[production] |
09:03 |
<godog> |
powercycle vrts1003 + vrts2002 - soft lockup T393357 |
[production] |
08:56 |
<godog> |
powercycle centrallog2002 - cannot log in via ssh or console |
[production] |
08:40 |
<ryankemper@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2015.codfw.wmnet with OS bullseye |
[production] |
08:32 |
<tappof> |
rebooting prometheus2007 - no ssh, com2 via racadm hangs |
[production] |
08:32 |
<godog> |
powercycle centrallog1002 - cannot log in via ssh or console |
[production] |
08:21 |
<ryankemper@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2015.codfw.wmnet with reason: host reimage |
[production] |
08:19 |
<andrew@cloudcumin1001> |
END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99) |
[admin] |
08:17 |
<ryankemper@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2015.codfw.wmnet with reason: host reimage |
[production] |
08:17 |
<tappof> |
powercycle prometheus2008 - no ssh, mgmt console showing systemd units being deactivated, no root login |
[production] |
08:15 |
<elukey> |
powercycle prometheus2005 - no ssh, mgmt console showing systemd units being deactivated, no root login |
[production] |
08:11 |
<elukey> |
powercycle prometheus1008 - no ssh, mgmt console showing cpu soft lockup continuously |
[production] |
08:05 |
<jgiannelos@deploy1003> |
helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply |
[production] |
08:05 |
<jgiannelos@deploy1003> |
helmfile [eqiad] START helmfile.d/services/mobileapps: apply |
[production] |
08:02 |
<tappof> |
rebooting prometheus1005 prometheus1006 and prometheus2006 |
[production] |
08:00 |
<ryankemper@cumin2002> |
END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2015 |
[production] |
08:00 |
<ryankemper@cumin2002> |
END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs2015 |
[production] |
08:00 |
<ryankemper@cumin2002> |
START - Cookbook sre.network.configure-switch-interfaces for host wdqs2015 |
[production] |
08:00 |
<ryankemper@cumin2002> |
END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs2015.codfw.wmnet 209.48.192.10.in-addr.arpa 9.0.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors |
[production] |
08:00 |
<ryankemper@cumin2002> |
START - Cookbook sre.dns.wipe-cache wdqs2015.codfw.wmnet 209.48.192.10.in-addr.arpa 9.0.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors |
[production] |
08:00 |
<ryankemper@cumin2002> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
08:00 |
<ryankemper@cumin2002> |
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2015 - ryankemper@cumin2002" |
[production] |
08:00 |
<ryankemper@cumin2002> |
START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2015 - ryankemper@cumin2002" |
[production] |
07:59 |
<jgiannelos@deploy1003> |
helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply |
[production] |
07:59 |
<jgiannelos@deploy1003> |
helmfile [eqiad] START helmfile.d/services/mobileapps: apply |
[production] |
07:59 |
<jgiannelos@deploy1003> |
helmfile [codfw] DONE helmfile.d/services/mobileapps: apply |
[production] |
07:58 |
<jgiannelos@deploy1003> |
helmfile [codfw] START helmfile.d/services/mobileapps: apply |
[production] |
07:54 |
<Dreamy_Jazz> |
UTC morning backport window finished |
[production] |
07:54 |
<dreamyjazz@deploy1003> |
Finished scap sync-world: Backport for [[gerrit:1141573|nnwiki: enable wgCiteResponsiveReferences (T393299)]], [[gerrit:1141582|ruwikibooks: enable VisualEditorAvailableNamespaces for Рецепт (recipe) namespace (T392803)]], [[gerrit:1141089|Add checkuserwiki favicon (T393246)]], [[gerrit:1141574|nupwiki: add timezone (T390711)]] (duration: 14m 11s) |
[production] |
07:47 |
<dreamyjazz@deploy1003> |
dreamyjazz, bunnypranav, anzx: Continuing with sync |
[production] |
07:44 |
<dreamyjazz@deploy1003> |
dreamyjazz, bunnypranav, anzx: Backport for [[gerrit:1141573|nnwiki: enable wgCiteResponsiveReferences (T393299)]], [[gerrit:1141582|ruwikibooks: enable VisualEditorAvailableNamespaces for Рецепт (recipe) namespace (T392803)]], [[gerrit:1141089|Add checkuserwiki favicon (T393246)]], [[gerrit:1141574|nupwiki: add timezone (T390711)]] synced to the testservers (https://wikitech.wikimedia.org |
[production] |
07:40 |
<dreamyjazz@deploy1003> |
Started scap sync-world: Backport for [[gerrit:1141573|nnwiki: enable wgCiteResponsiveReferences (T393299)]], [[gerrit:1141582|ruwikibooks: enable VisualEditorAvailableNamespaces for Рецепт (recipe) namespace (T392803)]], [[gerrit:1141089|Add checkuserwiki favicon (T393246)]], [[gerrit:1141574|nupwiki: add timezone (T390711)]] |
[production] |
07:31 |
<kartik@deploy1003> |
Finished scap sync-world: Backport for [[gerrit:1140703|Mobile frequent languages entrypoint: Add dependency to sitemapper (T393144 T386223)]] (duration: 17m 27s) |
[production] |
07:25 |
<kartik@deploy1003> |
abi, kartik: Continuing with sync |
[production] |
07:21 |
<ryankemper@cumin2002> |
START - Cookbook sre.dns.netbox |
[production] |
07:21 |
<ryankemper@cumin2002> |
START - Cookbook sre.hosts.move-vlan for host wdqs2015 |
[production] |
07:20 |
<ryankemper@cumin2002> |
START - Cookbook sre.hosts.reimage for host wdqs2015.codfw.wmnet with OS bullseye |
[production] |
07:19 |
<ryankemper@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2014.codfw.wmnet with OS bullseye |
[production] |
07:19 |
<kartik@deploy1003> |
abi, kartik: Backport for [[gerrit:1140703|Mobile frequent languages entrypoint: Add dependency to sitemapper (T393144 T386223)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
07:15 |
<brouberol@deploy1003> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
07:15 |
<brouberol@deploy1003> |
helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. |
[production] |