| 
      
        2021-03-08
      
     | 
  
    
  | 23:22 | 
  <razzi> | 
  rebalance kafka partitions for webrequest_upload partition 12 | 
  [analytics] | 
            
  | 18:49 | 
  <razzi> | 
  rebalance kafka partitions for webrequest_upload partition 11 | 
  [analytics] | 
            
  | 18:11 | 
  <elukey> | 
  drain + reimage an-worker11[15,16] to Buster | 
  [analytics] | 
            
  | 17:12 | 
  <elukey> | 
  drain + reimage an-worker11[13,14] to Buster | 
  [analytics] | 
            
  | 16:17 | 
  <elukey> | 
  drain + reimage an-worker1109/1110 to Buster | 
  [analytics] | 
            
  | 14:54 | 
  <elukey> | 
  drain + reimage an-worker110[7,8] to Buster | 
  [analytics] | 
            
  | 14:52 | 
  <ottomata> | 
  altered topics (eqiad|codfw).mediawiki.client.session_tick to have 2 partitions - T276502 | 
  [analytics] | 
            
  | 13:51 | 
  <elukey> | 
  drain + reimage an-worker110[4,5] to Buster | 
  [analytics] | 
            
  | 10:41 | 
  <elukey> | 
  drain + reimage an-worker1104/1089 to Debian Buster | 
  [analytics] | 
            
  | 09:19 | 
  <elukey> | 
  drain + reimage an-worker108[3,4] to Buster | 
  [analytics] | 
            
  | 08:20 | 
  <elukey> | 
  drain + reimage an-worker108[1,2] to Buster | 
  [analytics] | 
            
  | 07:23 | 
  <elukey> | 
  drain + reimage analytics107[4,5] to Buster | 
  [analytics] | 
            
  
    | 
      
        2021-03-05
      
     | 
  
    
  | 18:30 | 
  <razzi> | 
  run again sudo -i wmf-auto-reimage-host -p T269211 clouddb1021.eqiad.wmnet --new | 
  [analytics] | 
            
  | 18:18 | 
  <razzi> | 
  sudo cookbook sre.dns.netbox -t T269211 "Move clouddb1021 to private vlan" | 
  [analytics] | 
            
  | 18:17 | 
  <razzi> | 
  re-run interface_automation.ProvisionServerNetwork with private vlan | 
  [analytics] | 
            
  | 18:16 | 
  <razzi> | 
  delete non-mgmt interface for clouddb1021 | 
  [analytics] | 
            
  | 17:07 | 
  <razzi> | 
  sudo -i wmf-auto-reimage-host -p T269211 clouddb1021.eqiad.wmnet --new | 
  [analytics] | 
            
  | 16:54 | 
  <razzi> | 
  sudo cookbook sre.dns.netbox -t T269211 "Reimage and rename labsdb1012 to clouddb1021" | 
  [analytics] | 
            
  | 16:52 | 
  <razzi> | 
  run script at https://netbox.wikimedia.org/extras/scripts/interface_automation.ProvisionServerNetwork/ | 
  [analytics] | 
            
  | 16:47 | 
  <razzi> | 
  edit https://netbox.wikimedia.org/dcim/devices/2078/ device name from labsdb1012 to clouddb1021 | 
  [analytics] | 
            
  | 16:30 | 
  <razzi> | 
  delete non-mgmt interfaces for labsdb1012 at https://netbox.wikimedia.org/dcim/devices/2078/interfaces/ | 
  [analytics] | 
            
  | 16:28 | 
  <razzi> | 
  rename https://netbox.wikimedia.org/ipam/ip-addresses/734/ DNS name from labsdb1012.mgmt.eqiad.wmnet to clouddb1021.mgmt.eqiad.wmnet | 
  [analytics] | 
            
  | 16:08 | 
  <razzi> | 
  sudo cookbook sre.hosts.decommission labsdb1012.eqiad.wmnet -t T269211 | 
  [analytics] | 
            
  | 15:52 | 
  <razzi> | 
  stop mariadb on labsdb1012 | 
  [analytics] | 
            
  | 15:39 | 
  <razzi> | 
  rebalance kafka partitions for webrequest_upload partition 10 | 
  [analytics] | 
            
  | 15:07 | 
  <elukey> | 
  drain + reimage analytics1073 and an-worker1086 to Debian Buster | 
  [analytics] | 
            
  | 13:36 | 
  <elukey> | 
  roll restart HDFS Namenodes for the Hadoop cluster to pick up new Xmx settings (https://gerrit.wikimedia.org/r/c/operations/puppet/+/668659) | 
  [analytics] | 
            
  | 10:20 | 
  <elukey> | 
  force run of refinery-druid-drop-public-snapshots to check Druid public's performance | 
  [analytics] | 
            
  | 10:06 | 
  <elukey> | 
  failover HDFS Namenode from 1002 to 1001 (high GC pauses triggered the HDFS zkfc daemon on 1001 and the failover to 1002) | 
  [analytics] | 
            
  | 08:32 | 
  <elukey> | 
  drain + reimage an-worker107[8,9] to Debian Buster (one Journal node included) | 
  [analytics] | 
            
  | 07:22 | 
  <elukey> | 
  drain + reimage analytics107[0-1] to Debian Buster | 
  [analytics] | 
            
  | 07:13 | 
  <elukey> | 
  add analytics1066 back with /dev/sdb removed | 
  [analytics] | 
            
  | 07:01 | 
  <elukey> | 
  stop hadoop daemons on analytics1066 - disk errors on /dev/sdb after reimage | 
  [analytics] | 
            
  
    | 
      
        2021-03-04
      
     | 
  
    
  | 21:19 | 
  <razzi> | 
  rebalance kafka partitions for webrequest_upload partition 9 | 
  [analytics] | 
            
  | 16:27 | 
  <elukey> | 
  drain + reimage analytics106[8,9] to Debian Buster (one is a journalnode) | 
  [analytics] | 
            
  | 15:12 | 
  <elukey> | 
  drain + reimage analytics106[6,7] to Debian Buster | 
  [analytics] | 
            
  | 14:21 | 
  <elukey> | 
  drain + reimage analytics1065 to Debian Buster | 
  [analytics] | 
            
  | 13:32 | 
  <elukey> | 
  drain + reimage analytics10[63,64] to Debian Buster | 
  [analytics] | 
            
  | 12:48 | 
  <elukey> | 
  drain + reimage analytics10[61,62] to Debian Buster | 
  [analytics] | 
            
  | 10:40 | 
  <elukey> | 
  drain + reimage analytics1059/1060 to Debian Buster | 
  [analytics] | 
            
  | 09:32 | 
  <elukey> | 
  reboot an-worker[1097-1101] (GPU workers) to pick up the new kernel (5.10) | 
  [analytics] | 
            
  | 09:02 | 
  <elukey> | 
  kill/start mediawiki-geoeditors-monthly to apply backtick change (hive script) | 
  [analytics] |