2025-05-05
12:04 <aqu@deploy1003> Started deploy [analytics/refinery@dbfa557] (thin): Deploying new refinery/source artifacts THIN [analytics/refinery@dbfa557d] [production]
12:04 <aqu@deploy1003> Finished deploy [analytics/refinery@dbfa557]: Deploying new refinery/source artifacts [analytics/refinery@dbfa557d] (duration: 03m 17s) [production]
12:04 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet [production]
12:01 <aqu@deploy1003> Started deploy [analytics/refinery@dbfa557]: Deploying new refinery/source artifacts [analytics/refinery@dbfa557d] [production]
12:00 <aqu> Deploying new artifacts in analytics/refinery 0.2.29.4 and 0.2.61 [analytics]
12:00 <aqu@deploy1003> Finished deploy [analytics/refinery@dbfa557] (hadoop-test): Deploying new refinery/source artifacts TEST [analytics/refinery@dbfa557d] (duration: 00m 53s) [production]
11:59 <aqu@deploy1003> Started deploy [analytics/refinery@dbfa557] (hadoop-test): Deploying new refinery/source artifacts TEST [analytics/refinery@dbfa557d] [production]
11:58 <filippo@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet [production]
11:58 <filippo@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2008.codfw.wmnet [production]
11:56 <filippo@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet [production]
11:53 <jmm@cumin2002> START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet [production]
11:49 <filippo@cumin1002> START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet [production]
11:49 <filippo@cumin1002> START - Cookbook sre.hosts.reboot-single for host prometheus2008.codfw.wmnet [production]
11:49 <filippo@cumin1002> START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet [production]
11:46 <filippo@cumin1002> END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host prometheus2006.codfw.wmnet [production]
11:46 <filippo@cumin1002> START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet [production]
11:45 <filippo@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2007.codfw.wmnet [production]
11:44 <filippo@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet [production]
11:38 <filippo@cumin1002> START - Cookbook sre.hosts.reboot-single for host prometheus2007.codfw.wmnet [production]
11:34 <filippo@cumin1002> START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet [production]
11:12 <jelto@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1003.eqiad.wmnet [production]
11:05 <jelto@cumin1002> START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet [production]
11:05 <jynus@cumin1002> DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 6:00:00 on backup[1010-1014].eqiad.wmnet with reason: Upgrade and restart [production]
11:04 <jelto@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet [production]
10:57 <jelto@cumin1002> START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet [production]
10:57 <jelto@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet [production]
10:35 <elukey@puppetserver1001> conftool action : set/pooled=true; selector: dnsdisc=inference,name=codfw [production]
10:32 <jelto@cumin1002> START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet [production]
10:32 <jelto@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet [production]
10:24 <tappof> rebooting prometheus1007 into linux-image-6.1.0-33-amd64 [production]
10:19 <andrew@cloudcumin1001> START - Cookbook wmcs.ceph.osd.depool_and_destroy [admin]
10:17 <jelto@cumin1002> START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet [production]
09:58 <elukey@deploy1003> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' . [production]
09:45 <hashar> Cleared /srv/docker/overlay2 on contint2002 [releng]
09:41 <hashar> Cleared /srv/docker/overlay2 on contint1002 (it had bunch of old layers from April/May 2024) [releng]
09:39 <elukey@deploy1003> helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. [production]
09:39 <elukey@deploy1003> helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. [production]
09:38 <elukey> depool inference/codfw from DNS discovery to safely apply new pod/container security settings - T369493 [production]
09:30 <dreamyjazz@deploy1003> Finished scap sync-world: Backport for [[gerrit:1141844|[plwiki] Add 'abusefilter-view-private' to sysop (T393353)]] (duration: 13m 04s) [production]
09:23 <dreamyjazz@deploy1003> dreamyjazz, msz2001: Continuing with sync [production]
09:21 <dreamyjazz@deploy1003> dreamyjazz, msz2001: Backport for [[gerrit:1141844|[plwiki] Add 'abusefilter-view-private' to sysop (T393353)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
09:17 <dreamyjazz@deploy1003> Started scap sync-world: Backport for [[gerrit:1141844|[plwiki] Add 'abusefilter-view-private' to sysop (T393353)]] [production]
09:03 <godog> powercycle vrts1003 + vrts2002 - soft lockup T393357 [production]
08:56 <godog> powercycle centrallog2002 - can not login on ssh or console [production]
08:40 <ryankemper@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2015.codfw.wmnet with OS bullseye [production]
08:32 <tappof> rebooting prometheus2007 - no ssh, com2 via racadm hangs [production]
08:32 <godog> powercycle centrallog1002 - can not login on ssh or console [production]
08:21 <ryankemper@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2015.codfw.wmnet with reason: host reimage [production]
08:19 <andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99) [admin]
08:17 <ryankemper@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2015.codfw.wmnet with reason: host reimage [production]