| 2021-03-25 |
    
  | 19:30 | <bstorm> | forced deletion of all jobs stuck in a deleting state T277653 | [tools] | 
            
  | 17:46 | <arturo> | rebooting tools-sgeexec-* nodes to account for new grid master (T277653) | [tools] | 
            
  | 16:20 | <arturo> | rebuilding tools-sgegrid-master VM as debian buster (T277653) | [tools] | 
            
  | 16:18 | <arturo> | icinga-downtime toolschecker for 2h | [tools] | 
            
  | 16:05 | <bstorm> | failed over the tools grid to the shadow master T277653 | [tools] | 
            
  | 13:36 | <arturo> | shutdown tools-sge-services-03 (T278354) | [tools] | 
            
  | 13:33 | <arturo> | shutdown tools-sge-services-04 (T278354) | [tools] | 
            
  | 13:31 | <arturo> | point aptly clients to `tools-services-05.tools.eqiad1.wikimedia.cloud` (hiera change) (T278354) | [tools] | 
            
  | 12:58 | <arturo> | created VM `tools-services-05` as Debian Buster (T278354) | [tools] | 
            
  | 12:51 | <arturo> | create cinder volume `tools-aptly-data` (T278354) | [tools] | 
            
  
    | 2021-03-24 |
    
  | 12:46 | <arturo> | shut off the old stretch VMs `tools-docker-registry-03` and `tools-docker-registry-04` (T278303) | [tools] | 
            
  | 12:38 | <arturo> | associate floating IP 185.15.56.67 with `tools-docker-registry-05` and refresh FQDN docker-registry.tools.wmflabs.org accordingly (T278303) | [tools] | 
            
  | 12:33 | <arturo> | attach cinder volume `tools-docker-registry-data` to VM `tools-docker-registry-05` (T278303) | [tools] | 
            
  | 12:32 | <arturo> | snapshot cinder volume `tools-docker-registry-data` into `tools-docker-registry-data-stretch-migration` (T278303) | [tools] | 
            
  | 12:32 | <arturo> | bump cinder storage quota from 80G to 400G (without quota request task) | [tools] | 
            
  | 12:11 | <arturo> | created VM `tools-docker-registry-06` as Debian Buster (T278303) | [tools] | 
            
  | 12:09 | <arturo> | detach cinder volume `tools-docker-registry-data` (T278303) | [tools] | 
            
  | 11:46 | <arturo> | attach cinder volume `tools-docker-registry-data` to VM `tools-docker-registry-03` to format it and pre-populate it with registry data (T278303) | [tools] | 
            
  | 11:20 | <arturo> | created 80G cinder volume tools-docker-registry-data (T278303) | [tools] | 
            
  | 11:10 | <arturo> | starting VM tools-docker-registry-04, which had probably been stopped since 2021-03-09 due to hypervisor draining | [tools] | 
            
  
    | 2021-03-18 |
    
  | 19:24 | <bstorm> | set profile::toolforge::infrastructure across the entire project with login_server set on the bastion and exec node-related prefixes | [tools] | 
            
  | 16:21 | <andrewbogott> | enabling puppet tools-wide | [tools] | 
            
  | 16:20 | <andrewbogott> | disabling puppet tools-wide to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/672456 | [tools] | 
            
  | 16:19 | <bstorm> | added profile::toolforge::infrastructure class to puppetmaster T277756 | [tools] | 
            
  | 04:12 | <bstorm> | rebooted tools-sgeexec-0935.tools.eqiad.wmflabs because it forgot how to LDAP...likely root cause of the issues tonight | [tools] | 
            
  | 03:59 | <bstorm> | rebooting grid master. sorry for the cron spam | [tools] | 
            
  | 03:49 | <bstorm> | restarting sssd on tools-sgegrid-master | [tools] | 
            
  | 03:37 | <bstorm> | deleted a massive number of stuck jobs that misfired from the cron server | [tools] | 
            
  | 03:35 | <bstorm> | rebooting tools-sgecron-01 to try to clear up the ldap-related errors coming out of it | [tools] | 
            
  | 01:46 | <bstorm> | killed the toolschecker cron job, which had an LDAP error, and ran it again by hand | [tools] |