| 
      
        2025-08-26
      
     | 
  
    
  | 13:49 | 
  <mvernon@cumin2002> | 
  START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad | 
  [production] | 
            
  | 13:48 | 
  <dcausse@deploy1003> | 
  dcausse: Backport for [[gerrit:1182088|Revert "NetworkSession: Only enable for private wikis" (T373826)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. | 
  [production] | 
            
  | 13:48 | 
  <jclark@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1052.eqiad.wmnet with reason: host reimage | 
  [production] | 
            
  | 13:43 | 
  <fceratto@cumin1002> | 
  dbctl commit (dc=all): 'Repooling after maintenance db2151 (T401906)', diff saved to https://phabricator.wikimedia.org/P81764 and previous config saved to /var/cache/conftool/dbconfig/20250826-134311-fceratto.json | 
  [production] | 
            
  | 13:42 | 
  <dcausse@deploy1003> | 
  Started scap sync-world: Backport for [[gerrit:1182088|Revert "NetworkSession: Only enable for private wikis" (T373826)]] | 
  [production] | 
            
  | 13:42 | 
  <jclark@cumin1002> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1052.eqiad.wmnet with reason: host reimage | 
  [production] | 
            
  | 13:42 | 
  <dcaro> | 
  extended object storage quota to 100G (T402923) | 
  [tools] | 
            
  | 13:42 | 
  <fceratto@cumin1002> | 
  dbctl commit (dc=all): 'Depooling db2151 (T401906)', diff saved to https://phabricator.wikimedia.org/P81763 and previous config saved to /var/cache/conftool/dbconfig/20250826-134201-fceratto.json | 
  [production] | 
            
  | 13:41 | 
  <fceratto@cumin1002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2151.codfw.wmnet with reason: Maintenance | 
  [production] | 
            
  | 13:40 | 
  <mvernon@cumin2002> | 
  END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-fe[1017-1020].eqiad.wmnet | 
  [production] | 
            
  | 13:40 | 
  <mvernon@cumin2002> | 
  START - Cookbook sre.hosts.remove-downtime for ms-fe[1017-1020].eqiad.wmnet | 
  [production] | 
            
  | 13:35 | 
  <lucaswerkmeister-wmde@deploy1003> | 
  mwscript-k8s job started: CentralAuth:FixRenamedUserGlobalEditCount metawiki  # T313900 (dry run) | 
  [production] | 
            
  | 13:35 | 
  <stevemunene@cumin1003> | 
  END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. | 
  [production] | 
            
  | 13:34 | 
  <mvernon@cumin2002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-fe[1017-1020].eqiad.wmnet with reason: reboot before bringing into service | 
  [production] | 
            
  | 13:33 | 
  <lucaswerkmeister-wmde@deploy1003> | 
  Finished scap sync-world: Backport for [[gerrit:1181782|PHPSessionHandler: Better handle objects stored in the session (T402602)]], [[gerrit:1181788|Add maint script to fix global edit count of renamed users (T313900)]], [[gerrit:1181789|Add maint script to fix wrong actors in local log entries for global renames (T398177)]] (duration: 12m 54s) | 
  [production] | 
            
  | 13:28 | 
  <jhancock@cumin1003> | 
  END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host frmx2002 | 
  [production] | 
            
  | 13:28 | 
  <jhancock@cumin1003> | 
  START - Cookbook sre.network.configure-switch-interfaces for host frmx2002 | 
  [production] | 
            
  | 13:28 | 
  <lucaswerkmeister-wmde@deploy1003> | 
  matmarex, lucaswerkmeister-wmde: Continuing with sync | 
  [production] | 
            
  | 13:28 | 
  <jhancock@cumin1003> | 
  END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2039 | 
  [production] | 
            
  | 13:28 | 
  <jhancock@cumin1003> | 
  START - Cookbook sre.network.configure-switch-interfaces for host es2039 | 
  [production] | 
            
  | 13:27 | 
  <jhancock@cumin1003> | 
  END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | 
  [production] | 
            
  | 13:26 | 
  <lucaswerkmeister-wmde@deploy1003> | 
  matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1181782|PHPSessionHandler: Better handle objects stored in the session (T402602)]], [[gerrit:1181788|Add maint script to fix global edit count of renamed users (T313900)]], [[gerrit:1181789|Add maint script to fix wrong actors in local log entries for global renames (T398177)]] synced to the testservers (see https://wikitech.wikim | 
  [production] | 
            
  | 13:26 | 
  <jmm@deploy1003> | 
  helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | 
  [production] | 
            
  | 13:24 | 
  <jhancock@cumin1003> | 
  START - Cookbook sre.dns.netbox | 
  [production] | 
            
  | 13:24 | 
  <jhancock@cumin1003> | 
  END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | 
  [production] | 
            
  | 13:21 | 
  <jhancock@cumin1003> | 
  START - Cookbook sre.dns.netbox | 
  [production] | 
            
  | 13:20 | 
  <lucaswerkmeister-wmde@deploy1003> | 
  Started scap sync-world: Backport for [[gerrit:1181782|PHPSessionHandler: Better handle objects stored in the session (T402602)]], [[gerrit:1181788|Add maint script to fix global edit count of renamed users (T313900)]], [[gerrit:1181789|Add maint script to fix wrong actors in local log entries for global renames (T398177)]] | 
  [production] | 
            
  | 13:20 | 
  <jmm@deploy1003> | 
  helmfile [eqiad] START helmfile.d/services/thumbor: apply | 
  [production] | 
            
  | 13:11 | 
  <jclark@cumin1002> | 
  START - Cookbook sre.hosts.reimage for host cloudcephosd1052.eqiad.wmnet with OS bullseye | 
  [production] | 
            
  | 13:09 | 
  <jclark@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1052.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED | 
  [production] | 
            
  | 13:06 | 
  <jmm@deploy1003> | 
  helmfile [codfw] DONE helmfile.d/services/thumbor: apply | 
  [production] | 
            
  | 13:02 | 
  <jmm@deploy1003> | 
  helmfile [codfw] START helmfile.d/services/thumbor: apply | 
  [production] | 
            
  | 13:02 | 
  <stevemunene> | 
  restart analytics druid jvm to pick up the newly decommissioned hosts and broken links T402814 | 
  [analytics] | 
            
  | 12:57 | 
  <jmm@deploy1003> | 
  helmfile [staging] DONE helmfile.d/services/thumbor: apply | 
  [production] | 
            
  | 12:56 | 
  <jmm@deploy1003> | 
  helmfile [staging] START helmfile.d/services/thumbor: apply | 
  [production] | 
            
  | 12:55 | 
  <jclark@cumin1002> | 
  START - Cookbook sre.hosts.provision for host cloudcephosd1052.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED | 
  [production] | 
            
  | 12:54 | 
  <jclark@cumin1002> | 
  END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | 
  [production] | 
            
  | 12:54 | 
  <jclark@cumin1002> | 
  END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dnscloudcephosd1052  - jclark@cumin1002" | 
  [production] | 
            
  | 12:54 | 
  <jclark@cumin1002> | 
  START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dnscloudcephosd1052  - jclark@cumin1002" | 
  [production] | 
            
  | 12:50 | 
  <jclark@cumin1002> | 
  START - Cookbook sre.dns.netbox | 
  [production] | 
            
  | 12:48 | 
  <stevemunene@cumin1003> | 
  START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. | 
  [production] | 
            
  | 12:16 | 
  <dbrant@deploy1003> | 
  helmfile [codfw] DONE helmfile.d/services/mobileapps: apply | 
  [production] | 
            
  | 12:15 | 
  <dbrant@deploy1003> | 
  helmfile [codfw] START helmfile.d/services/mobileapps: apply | 
  [production] | 
            
  | 12:15 | 
  <dbrant@deploy1003> | 
  helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply | 
  [production] | 
            
  | 12:14 | 
  <dbrant@deploy1003> | 
  helmfile [eqiad] START helmfile.d/services/mobileapps: apply | 
  [production] | 
            
  | 12:12 | 
  <dbrant@deploy1003> | 
  helmfile [staging] DONE helmfile.d/services/mobileapps: apply | 
  [production] | 
            
  | 12:11 | 
  <dbrant@deploy1003> | 
  helmfile [staging] START helmfile.d/services/mobileapps: apply | 
  [production] | 
            
  | 11:55 | 
  <Daimona> | 
  Running queries from T402239#11118710 in x1.wikishared to fix broken event addresses (again) | 
  [production] | 
            
  | 11:25 | 
  <ladsgroup@cumin1002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on es2039.codfw.wmnet with reason: Glow up (T399927) | 
  [production] | 
            
  | 11:25 | 
  <ladsgroup@cumin1002> | 
  DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on es1039.eqiad.wmnet with reason: Glow up (T399927) | 
  [production] |