2025-05-05
12:43 <klausman@cumin2002> START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet [production]
12:42 <klausman@cumin2002> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet [production]
12:39 <klausman@cumin2002> START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet [production]
12:34 <jmm@cumin2002> END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet [production]
12:34 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet [production]
12:28 <tappof> Rolling reboot of Prometheus nodes in eqiad (1005, 1006, 1008) to rollback the kernel [production]
12:27 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet [production]
12:22 <jmm@cumin2002> START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet [production]
12:10 <jmm@cumin2002> END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1024.eqiad.wmnet [production]
12:10 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet [production]
12:06 <aqu@deploy1003> Finished deploy [analytics/refinery@dbfa557] (thin): Deploying new refinery/source artifacts THIN [analytics/refinery@dbfa557d] (duration: 01m 07s) [production]
12:04 <aqu@deploy1003> Started deploy [analytics/refinery@dbfa557] (thin): Deploying new refinery/source artifacts THIN [analytics/refinery@dbfa557d] [production]
12:04 <aqu@deploy1003> Finished deploy [analytics/refinery@dbfa557]: Deploying new refinery/source artifacts [analytics/refinery@dbfa557d] (duration: 03m 17s) [production]
12:04 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet [production]
12:01 <aqu@deploy1003> Started deploy [analytics/refinery@dbfa557]: Deploying new refinery/source artifacts [analytics/refinery@dbfa557d] [production]
12:00 <aqu@deploy1003> Finished deploy [analytics/refinery@dbfa557] (hadoop-test): Deploying new refinery/source artifacts TEST [analytics/refinery@dbfa557d] (duration: 00m 53s) [production]
11:59 <aqu@deploy1003> Started deploy [analytics/refinery@dbfa557] (hadoop-test): Deploying new refinery/source artifacts TEST [analytics/refinery@dbfa557d] [production]
11:58 <filippo@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet [production]
11:58 <filippo@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2008.codfw.wmnet [production]
11:56 <filippo@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet [production]
11:53 <jmm@cumin2002> START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet [production]
11:49 <filippo@cumin1002> START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet [production]
11:49 <filippo@cumin1002> START - Cookbook sre.hosts.reboot-single for host prometheus2008.codfw.wmnet [production]
11:49 <filippo@cumin1002> START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet [production]
11:46 <filippo@cumin1002> END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host prometheus2006.codfw.wmnet [production]
11:46 <filippo@cumin1002> START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet [production]
11:45 <filippo@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2007.codfw.wmnet [production]
11:44 <filippo@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet [production]
11:38 <filippo@cumin1002> START - Cookbook sre.hosts.reboot-single for host prometheus2007.codfw.wmnet [production]
11:34 <filippo@cumin1002> START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet [production]
11:12 <jelto@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1003.eqiad.wmnet [production]
11:05 <jelto@cumin1002> START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet [production]
11:05 <jynus@cumin1002> DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 6:00:00 on backup[1010-1014].eqiad.wmnet with reason: Upgrade and restart [production]
11:04 <jelto@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet [production]
10:57 <jelto@cumin1002> START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet [production]
10:57 <jelto@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet [production]
10:35 <elukey@puppetserver1001> conftool action : set/pooled=true; selector: dnsdisc=inference,name=codfw [production]
10:32 <jelto@cumin1002> START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet [production]
10:32 <jelto@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet [production]
10:24 <tappof> rebooting prometheus1007 into linux-image-6.1.0-33-amd64 [production]
10:17 <jelto@cumin1002> START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet [production]
09:58 <elukey@deploy1003> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' . [production]
09:39 <elukey@deploy1003> helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. [production]
09:39 <elukey@deploy1003> helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. [production]
09:38 <elukey> depool inference/codfw from DNS discovery to safely apply new pod/container security settings - T369493 [production]
09:30 <dreamyjazz@deploy1003> Finished scap sync-world: Backport for [[gerrit:1141844|[plwiki] Add 'abusefilter-view-private' to sysop (T393353)]] (duration: 13m 04s) [production]
09:23 <dreamyjazz@deploy1003> dreamyjazz, msz2001: Continuing with sync [production]
09:21 <dreamyjazz@deploy1003> dreamyjazz, msz2001: Backport for [[gerrit:1141844|[plwiki] Add 'abusefilter-view-private' to sysop (T393353)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
09:17 <dreamyjazz@deploy1003> Started scap sync-world: Backport for [[gerrit:1141844|[plwiki] Add 'abusefilter-view-private' to sysop (T393353)]] [production]
09:03 <godog> powercycle vrts1003 + vrts2002 - soft lockup T393357 [production]