production SAL

2024-10-24
11:05	<mvernon@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)	[production]
10:56	<jmm@cumin2002>	START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2038.codfw.wmnet	[production]
10:51	<mvernon@cumin2002>	START - Cookbook sre.hosts.reboot-cluster	[production]
10:43	<elukey@cumin2002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bookworm	[production]
10:38	<mvernon@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)	[production]
10:30	<btullis@cumin1002>	END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host an-redacteddb1001.eqiad.wmnet	[production]
10:27	<jayme@cumin1002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1285.eqiad.wmnet with OS bookworm	[production]
10:26	<jynus@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: stopped being the active one, stopping replication	[production]
10:26	<jynus@cumin1002>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: stopped being the active one, stopping replication	[production]
10:23	<mvernon@cumin2002>	START - Cookbook sre.hosts.reboot-cluster	[production]
10:22	<jayme@cumin1002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1288.eqiad.wmnet with OS bookworm	[production]
10:22	<mvernon@cumin2002>	END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)	[production]
10:21	<mvernon@cumin2002>	START - Cookbook sre.hosts.reboot-cluster	[production]
10:21	<Emperor>	reboot apus frontends T376800	[production]
10:19	<jayme@cumin1002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1289.eqiad.wmnet with OS bookworm	[production]
10:18	<btullis@cumin1002>	START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet	[production]
10:17	<jayme@cumin1002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1286.eqiad.wmnet with OS bookworm	[production]
10:11	<jynus@cumin1002>	dbctl commit (dc=all): 'promoting pc1014 as the master of pc5 T378068', diff saved to https://phabricator.wikimedia.org/P70584 and previous config saved to /var/cache/conftool/dbconfig/20241024-101150-jynus.json	[production]
10:08	<jayme@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1285.eqiad.wmnet with reason: host reimage	[production]
10:03	<jayme@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1288.eqiad.wmnet with reason: host reimage	[production]
10:03	<jynus@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1014.eqiad.wmnet with reason: moved pc number	[production]
10:03	<jynus@cumin1002>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1014.eqiad.wmnet with reason: moved pc number	[production]
10:00	<jayme@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1289.eqiad.wmnet with reason: host reimage	[production]
09:59	<jynus>	restart pc1014 T378068	[production]
09:57	<jayme@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1286.eqiad.wmnet with reason: host reimage	[production]
09:57	<jayme@cumin1002>	START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1289.eqiad.wmnet with reason: host reimage	[production]
09:55	<jayme@cumin1002>	START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1288.eqiad.wmnet with reason: host reimage	[production]
09:54	<jayme@cumin1002>	START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1285.eqiad.wmnet with reason: host reimage	[production]
09:54	<jayme@cumin1002>	START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1286.eqiad.wmnet with reason: host reimage	[production]
09:37	<jayme@cumin1002>	START - Cookbook sre.hosts.reimage for host wikikube-worker1289.eqiad.wmnet with OS bookworm	[production]
09:35	<jayme@cumin1002>	START - Cookbook sre.hosts.reimage for host wikikube-worker1288.eqiad.wmnet with OS bookworm	[production]
09:35	<jayme@cumin1002>	START - Cookbook sre.hosts.reimage for host wikikube-worker1286.eqiad.wmnet with OS bookworm	[production]
09:34	<jayme@cumin1002>	START - Cookbook sre.hosts.reimage for host wikikube-worker1285.eqiad.wmnet with OS bookworm	[production]
09:28	<elukey@cumin2002>	START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bookworm	[production]
09:25	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1003.eqiad.wmnet	[production]
09:25	<jayme@cumin1002>	END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1285-1286,1288-1289].eqiad.wmnet	[production]
09:23	<jelto@deploy2002>	helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply	[production]
09:22	<jelto@deploy2002>	helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply	[production]
09:22	<jayme@cumin1002>	START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1285-1286,1288-1289].eqiad.wmnet	[production]
09:21	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netboxdb1003.eqiad.wmnet	[production]
09:18	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2003.codfw.wmnet	[production]
09:14	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netboxdb2003.codfw.wmnet	[production]
09:12	<arnaudb@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on pc[1014,1017].eqiad.wmnet with reason: pc maintenance T378068	[production]
09:12	<arnaudb@cumin1002>	START - Cookbook sre.hosts.downtime for 4:00:00 on pc[1014,1017].eqiad.wmnet with reason: pc maintenance T378068	[production]
08:30	<arnaudb@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70582 and previous config saved to /var/cache/conftool/dbconfig/20241024-083027-arnaudb.json	[production]
08:30	<kevinbazira@deploy2002>	helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .	[production]
08:27	<kevinbazira@deploy2002>	helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .	[production]
08:23	<moritzm>	installing bash/zsh updates from bookworm point release	[production]
08:23	<kevinbazira@deploy2002>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .	[production]
08:22	<jmm@cumin2002>	END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master	[production]