| 
      
        2025-08-26
      
     | 
  
    
  | 01:08 | 
  <ladsgroup@cumin1002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1244.eqiad.wmnet with reason: Maintenance | 
  [production] | 
            
  | 01:06 | 
  <ladsgroup@cumin1002> | 
  dbctl commit (dc=all): 'Repooling after maintenance db1249 (T391056)', diff saved to https://phabricator.wikimedia.org/P81748 and previous config saved to /var/cache/conftool/dbconfig/20250826-010618-ladsgroup.json | 
  [production] | 
            
  | 00:59 | 
  <ladsgroup@cumin1002> | 
  dbctl commit (dc=all): 'Depooling db1249 (T391056)', diff saved to https://phabricator.wikimedia.org/P81747 and previous config saved to /var/cache/conftool/dbconfig/20250826-005952-ladsgroup.json | 
  [production] | 
            
  | 00:59 | 
  <ladsgroup@cumin1002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1249.eqiad.wmnet with reason: Maintenance | 
  [production] | 
            
  | 00:55 | 
  <ladsgroup@cumin1002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1244.eqiad.wmnet with reason: Maintenance | 
  [production] | 
            
  | 00:50 | 
  <ladsgroup@cumin1002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2207.codfw.wmnet with reason: Maintenance | 
  [production] | 
            
  | 00:49 | 
  <ladsgroup@cumin1002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1222.eqiad.wmnet with reason: Maintenance | 
  [production] | 
            
  | 00:49 | 
  <ladsgroup@cumin1002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2220.codfw.wmnet with reason: Maintenance | 
  [production] | 
            
  | 00:48 | 
  <ladsgroup@cumin1002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1236.eqiad.wmnet with reason: Maintenance | 
  [production] | 
            
  | 00:47 | 
  <ladsgroup@cumin1002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2229.codfw.wmnet with reason: Maintenance | 
  [production] | 
            
  | 00:47 | 
  <ladsgroup@cumin1002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1201.eqiad.wmnet with reason: Maintenance | 
  [production] | 
            
  | 00:39 | 
  <ladsgroup@cumin1002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2205.codfw.wmnet with reason: Maintenance | 
  [production] | 
            
  | 00:36 | 
  <ladsgroup@cumin1002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1223.eqiad.wmnet with reason: Maintenance | 
  [production] | 
            
  | 00:35 | 
  <ladsgroup@cumin1002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2213.codfw.wmnet with reason: Maintenance | 
  [production] | 
            
  | 00:34 | 
  <ladsgroup@cumin1002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1210.eqiad.wmnet with reason: Maintenance | 
  [production] | 
            
  | 00:24 | 
  <ladsgroup@cumin1002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1244.eqiad.wmnet with reason: Maintenance | 
  [production] | 
            
  | 00:16 | 
  <ladsgroup@cumin1002> | 
  END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1244.eqiad.wmnet | 
  [production] | 
            
  | 00:07 | 
  <brett> | 
  Run systemctl reset-failed on disappeared nrpe2nodexp-disk_space.timer units (T395446) | 
  [production] | 
            
  
    | 
      
        2025-08-25
      
     | 
  
    
  | 23:59 | 
  <ladsgroup@cumin1002> | 
  END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1244 - Upgrading db1244.eqiad.wmnet | 
  [production] | 
            
  | 23:59 | 
  <ladsgroup@cumin1002> | 
  START - Cookbook sre.mysql.depool db1244 - Upgrading db1244.eqiad.wmnet | 
  [production] | 
            
  | 23:59 | 
  <ladsgroup@cumin1002> | 
  START - Cookbook sre.mysql.upgrade for db1244.eqiad.wmnet | 
  [production] | 
            
  | 23:48 | 
  <ladsgroup@cumin1002> | 
  dbctl commit (dc=all): 'Depool db1244 T402871', diff saved to https://phabricator.wikimedia.org/P81746 and previous config saved to /var/cache/conftool/dbconfig/20250825-234856-ladsgroup.json | 
  [production] | 
            
  | 23:47 | 
  <ladsgroup@dns1004> | 
  END - running authdns-update | 
  [production] | 
            
  | 23:45 | 
  <ladsgroup@dns1004> | 
  START - running authdns-update | 
  [production] | 
            
  | 23:43 | 
  <ladsgroup@cumin1002> | 
  dbctl commit (dc=all): 'Promote db1160 to s4 primary and set section read-write T402871', diff saved to https://phabricator.wikimedia.org/P81745 and previous config saved to /var/cache/conftool/dbconfig/20250825-234303-ladsgroup.json | 
  [production] | 
            
  | 23:39 | 
  <ladsgroup@cumin1002> | 
  dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T402871', diff saved to https://phabricator.wikimedia.org/P81744 and previous config saved to /var/cache/conftool/dbconfig/20250825-233934-ladsgroup.json | 
  [production] | 
            
  | 23:39 | 
  <Amir1> | 
  Starting s4 eqiad failover from db1244 to db1160 - T402871 | 
  [production] | 
            
  | 23:31 | 
  <ladsgroup@cumin1002> | 
  dbctl commit (dc=all): 'Set db1160 with weight 0 T402871', diff saved to https://phabricator.wikimedia.org/P81743 and previous config saved to /var/cache/conftool/dbconfig/20250825-233128-ladsgroup.json | 
  [production] | 
            
  | 23:30 | 
  <ladsgroup@cumin1002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s4 T402871 | 
  [production] | 
            
  | 23:23 | 
  <jhathaway@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL | 
  [production] | 
            
  | 23:21 | 
  <jhathaway@cumin1002> | 
  START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL | 
  [production] | 
            
  | 23:17 | 
  <jhathaway@cumin1002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest1003.eqiad.wmnet with reason: sleep test | 
  [production] | 
            
  | 23:00 | 
  <maryum> | 
  Deploy security fix for T397396 | 
  [production] | 
            
  | 22:55 | 
  <maryum> | 
  Deploy security fix for T401220 | 
  [production] | 
            
  | 22:28 | 
  <andrew@cloudcumin1001> | 
  END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-81 | 
  [tools] | 
            
  | 22:27 | 
  <maryum> | 
  Deployed security fix for T298690 | 
  [production] | 
            
  | 22:22 | 
  <andrew@cloudcumin1001> | 
  START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-81 | 
  [tools] | 
            
  | 22:20 | 
  <ladsgroup@deploy1003> | 
  Finished scap sync-world: Backport for [[gerrit:1181786|Move update of category members count to a dedicated job (T365303)]] (duration: 12m 26s) | 
  [production] | 
            
  | 22:15 | 
  <ladsgroup@deploy1003> | 
  ladsgroup: Continuing with sync | 
  [production] | 
            
  | 22:14 | 
  <ladsgroup@deploy1003> | 
  ladsgroup: Backport for [[gerrit:1181786|Move update of category members count to a dedicated job (T365303)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. | 
  [production] | 
            
  | 22:08 | 
  <ladsgroup@deploy1003> | 
  Started scap sync-world: Backport for [[gerrit:1181786|Move update of category members count to a dedicated job (T365303)]] | 
  [production] | 
            
  | 22:05 | 
  <ladsgroup@deploy1003> | 
  Sync cancelled. | 
  [production] | 
            
  | 21:53 | 
  <ladsgroup@deploy1003> | 
  ladsgroup: Backport for [[gerrit:1181786|Move update of category members count to a dedicated job (T365303)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. | 
  [production] | 
            
  | 21:47 | 
  <ladsgroup@deploy1003> | 
  Started scap sync-world: Backport for [[gerrit:1181786|Move update of category members count to a dedicated job (T365303)]] | 
  [production] | 
            
  | 21:47 | 
  <sbassett> | 
  Deployed updated security mitigations for T399627 | 
  [production] | 
            
  | 21:23 | 
  <sbassett> | 
  Deployed security mitigations for T402146, T402077, T402095, T400525 | 
  [production] | 
            
  | 21:21 | 
  <jhathaway@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL | 
  [production] | 
            
  | 21:19 | 
  <jhathaway@cumin1002> | 
  START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL | 
  [production] | 
            
  | 21:18 | 
  <jhathaway@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL | 
  [production] | 
            
  | 21:17 | 
  <andrew@cloudcumin1001> | 
  END (ERROR) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=97) (T401693) | 
  [admin] |