| 
      
        2021-04-14
      
     | 
  
    
  | 14:05 | 
  <elukey> | 
  run build/env/bin/hue migrate on an-tool1009 after the hue upgrade | 
  [analytics] | 
            
  | 13:10 | 
  <elukey> | 
  rollback hue-next to 4.8 - issues not present in staging | 
  [analytics] | 
            
  | 13:00 | 
  <elukey> | 
  upgrade Hue to 4.9 on an-tool1009 - hue-next.wikimedia.org | 
  [analytics] | 
            
  | 10:02 | 
  <elukey> | 
  roll restart yarn nodemanagers on hadoop prod (attempt to see if they entered a weird state, graceful restart) | 
  [analytics] | 
            
  | 09:54 | 
  <elukey> | 
  kill long running mediawiki-job refine erroring out application_1615988861843_166906 | 
  [analytics] | 
            
  | 09:46 | 
  <elukey> | 
  kill application_1615988861843_163186 for the same reason | 
  [analytics] | 
            
  | 09:43 | 
  <elukey> | 
  kill application_1615988861843_164387 to see if any improvement to socket consumption is made | 
  [analytics] | 
            
  | 09:14 | 
  <elukey> | 
  run "sudo kill `pgrep -f sqoop`" on an-launcher1002 to clean up old test processes still running | 
  [analytics] | 
            
  
    | 
      
        2021-04-08
      
     | 
  
    
  | 16:33 | 
  <elukey> | 
  reboot an-worker1100 again to check if all the disks come up correctly | 
  [analytics] | 
            
  | 15:43 | 
  <razzi> | 
  rebalance kafka partitions for webrequest_text partitions 17, 18 | 
  [analytics] | 
            
  | 15:35 | 
  <elukey> | 
  reboot an-worker1100 to see if it helps with the strange BBU behavior in T279475 | 
  [analytics] | 
            
  | 14:07 | 
  <elukey> | 
  drop /var/spool/rsyslog from stat1008 - files corrupted when the root partition filled up caused a SEGV in rsyslog | 
  [analytics] | 
            
  | 11:14 | 
  <hnowlan> | 
  created aqs user and loaded full schemas into analytics wmcs cassandra | 
  [analytics] | 
            
  | 08:35 | 
  <elukey> | 
  apt-get clean on stat1008 to free some space | 
  [analytics] | 
            
  | 07:44 | 
  <elukey> | 
  restart hadoop hdfs masters on an-master100[1,2] to apply the new log4j settings for the audit log | 
  [analytics] | 
            
  | 06:44 | 
  <elukey> | 
  re-deployed refinery to hadoop-test after fixing permissions on an-test-coord1001 | 
  [analytics] | 
            
  
    | 
      
        2021-04-07
      
     | 
  
    
  | 23:03 | 
  <ottomata> | 
  installing anaconda-wmf-2020.02~wmf5 on remaining nodes - T279480 | 
  [analytics] | 
            
  | 22:51 | 
  <ottomata> | 
  installing anaconda-wmf-2020.02~wmf5 on stat boxes - T279480 | 
  [analytics] | 
            
  | 22:47 | 
  <mforns> | 
  finished refinery deployment up to 1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3 | 
  [analytics] | 
            
  | 22:39 | 
  <mforns> | 
  deployment of refinery via scap to hadoop-test failed with Permission denied: '/srv/deployment/analytics/refinery-cache/.config' (deployment to production went fine) | 
  [analytics] | 
            
  | 21:44 | 
  <mforns> | 
  starting refinery deploy up to 1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3 | 
  [analytics] | 
            
  | 21:26 | 
  <mforns> | 
  deployed refinery-source v0.1.4 | 
  [analytics] | 
            
  | 21:25 | 
  <razzi> | 
  sudo apt-get install --reinstall anaconda-wmf on stat1008 | 
  [analytics] | 
            
  | 20:15 | 
  <razzi> | 
  rebalance kafka partitions for webrequest_text partitions 15, 16 | 
  [analytics] | 
            
  | 19:53 | 
  <ottomata> | 
  upgrade anaconda-wmf everywhere to 2020.02~wmf4 with fixes for T279480 | 
  [analytics] | 
            
  | 14:03 | 
  <hnowlan> | 
  setting profile::aqs::git_deploy: true in aqs-test1001 hiera config | 
  [analytics] |