2025-05-28
18:19 <dancy@deploy1003> rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.3 refs T392173 [production]
18:15 <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1041.eqiad.wmnet' (T390914) [admin]
18:11 <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt1040.eqiad.wmnet' (T390914) [admin]
18:07 <swfrench@deploy1003> Finished scap sync-world: Scap deployment to put production in a consistent state - T377121 (duration: 07m 48s) [production]
18:04 <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt1040.eqiad.wmnet' (T390914) [admin]
18:03 <andrew@cloudcumin1001> END (ERROR) - Cookbook wmcs.openstack.restart_openstack (exit_code=97) on deployment eqiad1 for all services [admin]
18:00 <swfrench@deploy1003> Started scap sync-world: Scap deployment to put production in a consistent state - T377121 [production]
17:53 <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all services [admin]
17:52 <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudnet1006.eqiad.wmnet' (T390914) [admin]
17:52 <btullis@cumin1002> START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bullseye [production]
17:48 <btullis@cumin1002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-druid1003.eqiad.wmnet with OS bullseye [production]
17:43 <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudnet1006.eqiad.wmnet' (T390914) [admin]
17:42 <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudnet1005.eqiad.wmnet' (T390914) [admin]
17:37 <hmonroy@deploy1003> hmonroy: Continuing with sync [production]
17:34 <hmonroy@deploy1003> hmonroy: Backport for [[gerrit:1151756|InitialiseSettings: enable multiblocks on group1 (T377121)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [production]
17:33 <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudnet1005.eqiad.wmnet' (T390914) [admin]
17:31 <hmonroy@deploy1003> Started scap sync-world: Backport for [[gerrit:1151756|InitialiseSettings: enable multiblocks on group1 (T377121)]] [production]
17:30 <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudrabbit1001.eqiad.wmnet' (T390914) [admin]
17:29 <btullis@cumin1002> START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bullseye [production]
17:27 <jmm@cumin1003> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
17:24 <jmm@cumin1003> START - Cookbook sre.dns.netbox [production]
17:21 <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudrabbit1001.eqiad.wmnet' (T390914) [admin]
17:21 <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudrabbit1002.eqiad.wmnet' (T390914) [admin]
17:20 <bking@cumin2002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on elastic[1054,1067,1103].eqiad.wmnet with reason: downtime until decom [production]
17:13 <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudrabbit1002.eqiad.wmnet' (T390914) [admin]
17:12 <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudrabbit1003.eqiad.wmnet' (T390914) [admin]
17:03 <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudrabbit1003.eqiad.wmnet' (T390914) [admin]
17:02 <andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=99) on host 'cloudrabbot1003.eqiad.wmnet' (T390914) [admin]
17:02 <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudrabbot1003.eqiad.wmnet' (T390914) [admin]
17:00 <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol1006.eqiad.wmnet' (T390914) [admin]
16:59 <marostegui@cumin1002> dbctl commit (dc=all): 'es1038 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76658 and previous config saved to /var/cache/conftool/dbconfig/20250528-165939-root.json [production]
16:44 <marostegui@cumin1002> dbctl commit (dc=all): 'es1038 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76657 and previous config saved to /var/cache/conftool/dbconfig/20250528-164433-root.json [production]
16:43 <eevans@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2003.codfw.wmnet with OS bullseye [production]
16:42 <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol1006.eqiad.wmnet' (T390914) [admin]
16:42 <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol1007.eqiad.wmnet' (T390914) [admin]
16:40 <xcollazo@deploy1003> Finished deploy [airflow-dags/analytics@afad011]: Deploy latest DAGs for main Airflow instance. T385112. (duration: 00m 39s) [production]
16:40 <xcollazo@deploy1003> Started deploy [airflow-dags/analytics@afad011]: Deploy latest DAGs for main Airflow instance. T385112. [production]
16:36 <fceratto@deploy1003> helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . [production]
16:29 <marostegui@cumin1002> dbctl commit (dc=all): 'es1038 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76656 and previous config saved to /var/cache/conftool/dbconfig/20250528-162928-root.json [production]
16:26 <eevans@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2003.codfw.wmnet with reason: host reimage [production]
16:23 <eevans@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2003.codfw.wmnet with reason: host reimage [production]
16:21 <marostegui@cumin1002> dbctl commit (dc=all): 'db2187 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76655 and previous config saved to /var/cache/conftool/dbconfig/20250528-162142-root.json [production]
16:20 <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol1007.eqiad.wmnet' (T390914) [admin]
16:19 <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol1011.eqiad.wmnet' (T390914) [admin]
16:19 <jmm@cumin1003> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ncredir7003.magru.wmnet [production]
16:19 <jmm@cumin1003> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ncredir7003.magru.wmnet with OS bookworm [production]
16:18 <stevemunene@cumin1002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker1119.eqiad.wmnet with reason: Repair data node volume failure [production]
16:18 <fceratto@cumin1002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance [production]
16:17 <fceratto@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2227 (T395241)', diff saved to https://phabricator.wikimedia.org/P76654 and previous config saved to /var/cache/conftool/dbconfig/20250528-161740-fceratto.json [production]
16:14 <marostegui@cumin1002> dbctl commit (dc=all): 'es1038 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76653 and previous config saved to /var/cache/conftool/dbconfig/20250528-161423-root.json [production]