2024-11-14
16:57 |
<swfrench@cumin2002> |
START - Cookbook sre.discovery.datacenter depool all active/active services in eqiad: Network maintenance - None |
[production] |
16:52 |
<mfossati@deploy2002> |
Finished deploy [airflow-dags/platform_eng@7c4873e]: decouple article-level image suggestions from section-level ones (duration: 00m 53s) |
[production] |
16:51 |
<mfossati@deploy2002> |
Started deploy [airflow-dags/platform_eng@7c4873e]: decouple article-level image suggestions from section-level ones |
[production] |
16:45 |
<swfrench@cumin2002> |
END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None |
[production] |
16:45 |
<swfrench@cumin2002> |
START - Cookbook sre.discovery.datacenter status all services in all: None - None |
[production] |
16:40 |
<cgoubert@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage |
[production] |
16:38 |
<swfrench@cumin2002> |
END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) |
[production] |
16:37 |
<swfrench@cumin2002> |
START - Cookbook sre.discovery.datacenter |
[production] |
16:36 |
<cgoubert@cumin1002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage |
[production] |
16:36 |
<swfrench@cumin2002> |
END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) |
[production] |
16:36 |
<swfrench@cumin2002> |
START - Cookbook sre.discovery.datacenter |
[production] |
16:33 |
<ladsgroup@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1190.eqiad.wmnet with reason: Sad |
[production] |
16:33 |
<ladsgroup@cumin1002> |
START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1190.eqiad.wmnet with reason: Sad |
[production] |
16:33 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'db1190 sad', diff saved to https://phabricator.wikimedia.org/P71044 and previous config saved to /var/cache/conftool/dbconfig/20241114-163317-ladsgroup.json |
[production] |
16:31 |
<klausman@deploy2002> |
helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. |
[production] |
16:31 |
<klausman@deploy2002> |
helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. |
[production] |
16:18 |
<cgoubert@cumin1002> |
START - Cookbook sre.hosts.reimage for host wikikube-worker1305.eqiad.wmnet with OS bullseye |
[production] |
16:04 |
<cmooney@cumin1002> |
END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 151575 |
[production] |
16:03 |
<cmooney@cumin1002> |
START - Cookbook sre.network.peering with action 'configure' for AS: 151575 |
[production] |
16:01 |
<papaul> |
ongoing maintenance on cr1-eqiad |
[production] |
16:00 |
<jhancock@cumin2002> |
START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED |
[production] |
15:57 |
<pt1979@cumin2002> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,re0.cr1-eqiad.mgmt with reason: router upgrade |
[production] |
15:57 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,re0.cr1-eqiad.mgmt with reason: router upgrade |
[production] |
15:56 |
<sukhe@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging |
[production] |
15:56 |
<sukhe@cumin1002> |
START - Cookbook sre.hosts.downtime for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging |
[production] |
15:55 |
<pt1979@cumin2002> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,cr1-eqiad.mgmt with reason: router upgrade |
[production] |
15:55 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,cr1-eqiad.mgmt with reason: router upgrade |
[production] |
15:49 |
<moritzm> |
installing nss security updates |
[production] |
15:47 |
<reedy@deploy2002> |
Synchronized wmf-config/CommonSettings.php: T379834 (duration: 08m 02s) |
[production] |
15:47 |
<sukhe@puppetserver1001> |
conftool action : set/pooled=no; selector: name=cp4043.ulsfo.wmnet |
[production] |
15:47 |
<sukhe@cumin1002> |
END (ERROR) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=97) Rolling upgrade/restart of Apache Traffic Server on P{cp4043*,cp4051*} and A:cp for 9.2.6-1wm1 |
[production] |
15:45 |
<jayme@cumin2002> |
END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl2002.codfw.wmnet |
[production] |
15:45 |
<jayme@cumin2002> |
START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl2002.codfw.wmnet |
[production] |
15:45 |
<jayme@cumin2002> |
END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-ctrl2002.codfw.wmnet |
[production] |
15:45 |
<jayme@cumin2002> |
START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-ctrl2002.codfw.wmnet |
[production] |
15:43 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.network.cf (exit_code=0) |
[production] |
15:43 |
<pt1979@cumin2002> |
START - Cookbook sre.network.cf |
[production] |
15:42 |
<sukhe@cumin1002> |
START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4043*,cp4051*} and A:cp for 9.2.6-1wm1 |
[production] |
15:40 |
<stevemunene@cumin1002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1016.eqiad.wmnet with OS bullseye |
[production] |
15:39 |
<stevemunene@cumin1002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1020.eqiad.wmnet with OS bullseye |
[production] |
15:37 |
<volans> |
installed spicerack v8.16.1 to cumin hosts |
[production] |
15:36 |
<sukhe@cumin1002> |
END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad [reason: junos upgrade, T364092] |
[production] |
15:36 |
<sukhe@cumin1002> |
START - Cookbook sre.dns.admin DNS admin: depool site eqiad [reason: junos upgrade, T364092] |
[production] |
15:35 |
<ladsgroup@deploy2002> |
Finished scap sync-world: Backport for [[gerrit:1091248|Revert "mmv.js: Store comingFromHashChange as a class property" (T379835)]] (duration: 12m 10s) |
[production] |
15:33 |
<sukhe> |
reprepro -C main include bullseye-wikimedia trafficserver_9.2.6-1wm1_amd64.changes: T379797 |
[production] |
15:30 |
<sukhe@cumin1002> |
START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox |
[production] |
15:29 |
<jayme@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: T379719 |
[production] |
15:29 |
<jayme@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: T379719 |
[production] |
15:28 |
<jayme@cumin2002> |
END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-ctrl2002.codfw.wmnet |
[production] |
15:28 |
<jayme@cumin2002> |
START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-ctrl2002.codfw.wmnet |
[production] |