| 2025-09-17 |
    
  | 09:20 | <elukey@cumin1003> | START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw | [production] | 
            
  | 09:19 | <hnowlan@deploy1003> | helmfile [staging] DONE helmfile.d/services/rest-gateway: apply | [production] | 
            
  | 09:19 | <hnowlan@deploy1003> | helmfile [staging] START helmfile.d/services/rest-gateway: apply | [production] | 
            
  | 09:18 | <elukey@cumin1003> | START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART | [production] | 
            
  | 09:17 | <elukey@cumin1003> | START - Cookbook sre.hosts.provision for host cp2055.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED | [production] | 
            
  | 09:17 | <jmm@cumin2002> | START - Cookbook sre.hosts.reimage for host maps2014.codfw.wmnet with OS bookworm | [production] | 
            
  | 09:14 | <elukey@cumin1003> | END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2054.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED | [production] | 
            
  | 09:11 | <ladsgroup@cumin1003> | dbctl commit (dc=all): 'Bump weight of db1206 in general group (T403966)', diff saved to https://phabricator.wikimedia.org/P83384 and previous config saved to /var/cache/conftool/dbconfig/20250917-091137-ladsgroup.json | [production] | 
            
  | 09:06 | <elukey@cumin1003> | END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART | [production] | 
            
  | 09:02 | <elukey@cumin1003> | START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART | [production] | 
            
  | 08:53 | <elukey@cumin1003> | START - Cookbook sre.hosts.provision for host cp2054.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED | [production] | 
            
  | 08:52 | <jmm@cumin2002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps2013.codfw.wmnet with OS bookworm | [production] | 
            
  | 08:50 | <elukey@cumin1003> | END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2053.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED | [production] | 
            
  | 08:44 | <brouberol@deploy1003> | helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. | [production] | 
            
  | 08:43 | <brouberol@deploy1003> | helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. | [production] | 
            
  | 08:42 | <moritzm> | upgrading Envoy on deployment hosts T403663 | [production] | 
            
  | 08:36 | <kharlan@deploy1003> | Finished scap sync-world: Backport for [[gerrit:1189108|hCaptcha: Track events via Prometheus (T402767)]], [[gerrit:1189107|hCaptcha: Track events via Prometheus (T402767)]], [[gerrit:1189106|hCaptcha: Remove non-existent message]], [[gerrit:1189105|hCaptcha: Remove non-existent message]] (duration: 13m 41s) | [production] | 
            
  | 08:33 | <fabfur> | restart pybal on lvs1019/lvs2013/lvs2014 to clear out alert | [production] | 
            
  | 08:33 | <jmm@cumin2002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2013.codfw.wmnet with reason: host reimage | [production] | 
            
  | 08:31 | <kharlan@deploy1003> | kharlan: Continuing with sync | [production] | 
            
  | 08:30 | <elukey@cumin1003> | START - Cookbook sre.hosts.provision for host cp2053.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED | [production] | 
            
  | 08:30 | <elukey@cumin1003> | END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2052.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED | [production] | 
            
  | 08:28 | <jmm@cumin2002> | START - Cookbook sre.hosts.downtime for 2:00:00 on maps2013.codfw.wmnet with reason: host reimage | [production] | 
            
  | 08:28 | <kharlan@deploy1003> | kharlan: Backport for [[gerrit:1189108|hCaptcha: Track events via Prometheus (T402767)]], [[gerrit:1189107|hCaptcha: Track events via Prometheus (T402767)]], [[gerrit:1189106|hCaptcha: Remove non-existent message]], [[gerrit:1189105|hCaptcha: Remove non-existent message]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. | [production] | 
            
  | 08:27 | <moritzm> | upgrading Envoy on IDM hosts T403663 | [production] | 
            
  | 08:22 | <kharlan@deploy1003> | Started scap sync-world: Backport for [[gerrit:1189108|hCaptcha: Track events via Prometheus (T402767)]], [[gerrit:1189107|hCaptcha: Track events via Prometheus (T402767)]], [[gerrit:1189106|hCaptcha: Remove non-existent message]], [[gerrit:1189105|hCaptcha: Remove non-existent message]] | [production] | 
            
  | 08:15 | <elukey@cumin1003> | START - Cookbook sre.hosts.provision for host cp2052.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED | [production] | 
            
  | 08:13 | <elukey@cumin1003> | END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2051.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED | [production] | 
            
  | 08:10 | <jmm@cumin2002> | START - Cookbook sre.hosts.reimage for host maps2013.codfw.wmnet with OS bookworm | [production] | 
            
  | 08:08 | <jmm@cumin2002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps2012.codfw.wmnet with OS bookworm | [production] | 
            
  | 07:53 | <elukey@cumin1003> | START - Cookbook sre.hosts.provision for host cp2051.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED | [production] | 
            
  | 07:52 | <elukey@cumin1003> | END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED | [production] | 
            
  | 07:49 | <jmm@cumin2002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2012.codfw.wmnet with reason: host reimage | [production] | 
            
  | 07:43 | <jmm@cumin2002> | START - Cookbook sre.hosts.downtime for 2:00:00 on maps2012.codfw.wmnet with reason: host reimage | [production] | 
            
  | 07:32 | <elukey@cumin1003> | START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED | [production] | 
            
  | 07:30 | <kharlan@deploy1003> | Finished scap sync-world: Backport for [[gerrit:1189038|hCaptcha: Enable on phase 1 wikis (T402366)]] (duration: 21m 08s) | [production] | 
            
  | 07:25 | <kharlan@deploy1003> | kharlan: Continuing with sync | [production] | 
            
  | 07:24 | <jmm@cumin2002> | START - Cookbook sre.hosts.reimage for host maps2012.codfw.wmnet with OS bookworm | [production] | 
            
  | 07:15 | <kharlan@deploy1003> | kharlan: Backport for [[gerrit:1189038|hCaptcha: Enable on phase 1 wikis (T402366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. | [production] | 
            
  | 07:13 | <jmm@cumin2002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps2011.codfw.wmnet with OS bookworm | [production] | 
            
  | 07:09 | <kharlan@deploy1003> | Started scap sync-world: Backport for [[gerrit:1189038|hCaptcha: Enable on phase 1 wikis (T402366)]] | [production] | 
            
  | 07:02 | <moritzm> | upgrading Envoy on debmonitor T403663 | [production] | 
            
  | 06:54 | <jmm@cumin2002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2011.codfw.wmnet with reason: host reimage | [production] | 
            
  | 06:50 | <jmm@cumin2002> | START - Cookbook sre.hosts.downtime for 2:00:00 on maps2011.codfw.wmnet with reason: host reimage | [production] | 
            
  | 06:47 | <jmm@cumin2002> | DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps[2012-2014].codfw.wmnet with reason: in setup | [production] | 
            
  | 06:30 | <jmm@cumin2002> | START - Cookbook sre.hosts.reimage for host maps2011.codfw.wmnet with OS bookworm | [production] | 
            
  | 01:12 | <mwpresync@deploy1003> | Finished scap build-images: Publishing wmf/next image (duration: 11m 35s) | [production] | 
            
  | 01:00 | <mwpresync@deploy1003> | Started scap build-images: Publishing wmf/next image | [production] | 
            
  | 00:38 | <rzl@deploy1003> | helmfile [codfw] DONE helmfile.d/services/recommendation-api: apply | [production] | 
            
  | 00:38 | <rzl@deploy1003> | helmfile [codfw] START helmfile.d/services/recommendation-api: apply | [production] |