production SAL

2025-03-13
09:56	<elukey@cumin1002>	START - Cookbook sre.hosts.reimage for host restbase1044.eqiad.wmnet with OS bullseye	[production]
09:56	<elukey@cumin1002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1043.eqiad.wmnet with OS bullseye	[production]
09:56	<elukey@cumin1002>	END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"	[production]
09:53	<elukey@cumin1002>	START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"	[production]
09:42	<volans>	uploaded cumin_5.1.0 to apt.wikimedia.org bullseye-wikimedia	[production]
09:40	<elukey@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1043.eqiad.wmnet with reason: host reimage	[production]
09:37	<vgutierrez@cumin1002>	END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs6001.drmrs.wmnet} and A:liberica (T384477)	[production]
09:37	<vgutierrez@cumin1002>	START - Cookbook sre.loadbalancer.admin config_reloading P{lvs6001.drmrs.wmnet} and A:liberica (T384477)	[production]
09:36	<elukey@cumin1002>	START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1043.eqiad.wmnet with reason: host reimage	[production]
09:24	<elukey@cumin1002>	START - Cookbook sre.hosts.reimage for host restbase1043.eqiad.wmnet with OS bullseye	[production]
09:22	<stevemunene@cumin1002>	END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1204.eqiad.wmnet	[production]
09:20	<stevemunene@cumin1002>	START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1204.eqiad.wmnet	[production]
09:15	<elukey@cumin2002>	END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART	[production]
09:12	<gkyziridis@deploy2002>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .	[production]
09:10	<elukey@cumin2002>	START - Cookbook sre.hosts.provision for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART	[production]
09:06	<stevemunene@cumin1002>	END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1204.eqiad.wmnet	[production]
09:04	<stevemunene@cumin1002>	START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1204.eqiad.wmnet	[production]
09:02	<vgutierrez@cumin1002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6001.drmrs.wmnet with OS bookworm	[production]
08:51	<vgutierrez@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6001.drmrs.wmnet with reason: host reimage	[production]
08:48	<vgutierrez@cumin1002>	START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6001.drmrs.wmnet with reason: host reimage	[production]
08:46	<vgutierrez@cumin1002>	START - Cookbook sre.hosts.reimage for host lvs6001.drmrs.wmnet with OS bookworm	[production]
08:45	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1034.eqiad.wmnet with OS bookworm	[production]
08:30	<arnaudb@cumin1002>	DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on gerrit2003.wikimedia.org with reason: testing	[production]
08:28	<kevinbazira@deploy2002>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .	[production]
08:28	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1034.eqiad.wmnet with reason: host reimage	[production]
08:25	<elukey@cumin2002>	END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART	[production]
08:24	<jmm@cumin2002>	START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1034.eqiad.wmnet with reason: host reimage	[production]
08:20	<vgutierrez@cumin1002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs6001.drmrs.wmnet with OS bookworm	[production]
08:15	<stevemunene@cumin1002>	END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1204.eqiad.wmnet	[production]
08:14	<elukey@cumin2002>	START - Cookbook sre.hosts.provision for host restbase1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART	[production]
08:14	<stevemunene@cumin1002>	START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1204.eqiad.wmnet	[production]
08:12	<stevemunene@cumin1002>	END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1204.eqiad.wmnet	[production]
08:10	<vgutierrez@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6001.drmrs.wmnet with reason: host reimage	[production]
08:09	<stevemunene@cumin1002>	START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1204.eqiad.wmnet	[production]
08:06	<vgutierrez@cumin1002>	START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6001.drmrs.wmnet with reason: host reimage	[production]
08:03	<elukey@cumin2002>	END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART	[production]
08:02	<jmm@cumin2002>	START - Cookbook sre.hosts.reimage for host ganeti1034.eqiad.wmnet with OS bookworm	[production]
07:58	<elukey@cumin2002>	START - Cookbook sre.hosts.provision for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART	[production]
07:57	<elukey@cumin1002>	END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART	[production]
07:56	<elukey@cumin1002>	START - Cookbook sre.hosts.provision for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART	[production]
07:50	<vgutierrez@cumin1002>	START - Cookbook sre.hosts.reimage for host lvs6001.drmrs.wmnet with OS bookworm	[production]
07:42	<vgutierrez@cumin1002>	DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6001.drmrs.wmnet with reason: depooled before reimage	[production]
07:42	<krinkle@deploy2002>	Finished scap sync-world: Backport for [[gerrit:1127164|fatal-error: Ensure action=cache max-age is higher than response time]] (duration: 11m 28s)	[production]
07:41	<vgutierrez>	depool lvs6001 before being reimaged - T384477	[production]
07:35	<krinkle@deploy2002>	krinkle: Continuing with sync	[production]
07:33	<krinkle@deploy2002>	krinkle: Backport for [[gerrit:1127164|fatal-error: Ensure action=cache max-age is higher than response time]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)	[production]
07:30	<krinkle@deploy2002>	Started scap sync-world: Backport for [[gerrit:1127164|fatal-error: Ensure action=cache max-age is higher than response time]]	[production]
07:24	<marostegui@cumin1002>	dbctl commit (dc=all): 'es2039 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74215 and previous config saved to /var/cache/conftool/dbconfig/20250313-072403-root.json	[production]
07:21	<marostegui@cumin1002>	dbctl commit (dc=all): 'es1039 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74214 and previous config saved to /var/cache/conftool/dbconfig/20250313-072141-root.json	[production]
07:19	<stevemunene@cumin1002>	END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1200-1208].eqiad.wmnet	[production]