| 
      
        2021-01-07
      
     | 
  
    
  | 18:08 | 
  <elukey> | 
  chown -R analytics:druid /tmp/DataFrameToDruid (was: analytics:hdfs) on hdfs to temporarily unblock Hive2Druid jobs | 
  [analytics] | 
            
  | 16:31 | 
  <elukey> | 
  remove /etc/mysql/conf.d/research-client.cnf from stat100x nodes | 
  [analytics] | 
            
  | 15:40 | 
  <elukey> | 
  deprecate the 'researchers' posix group for good | 
  [analytics] | 
            
  | 11:24 | 
  <elukey> | 
  execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod -R o-rwx /wmf/data/event_sanitized" to fix some file permissions as well | 
  [analytics] | 
            
  | 10:36 | 
  <elukey> | 
  execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod -R o-rwx /wmf/data/event" on an-master1001 to fix some file permissions (an-launcher executed timers during the past hours without the new umask) - T270629 | 
  [analytics] | 
            
  | 09:37 | 
  <elukey> | 
  forced re-run of monitor_refine_event_failure_flags.service on an-launcher1002 to clear alerts | 
  [analytics] | 
            
  | 08:26 | 
  <joal> | 
  Rerunning 4 failed refine jobs (mediawiki_cirrussearch_request, day=6/hour=20|21, day=7/hour=0|2) | 
  [analytics] | 
            
  | 08:14 | 
  <elukey> | 
  re-enable puppet on an-launcher1002 to apply new refine memory settings | 
  [analytics] | 
            
  | 07:59 | 
  <elukey> | 
  re-enabling all oozie jobs previously suspended | 
  [analytics] | 
            
  | 07:54 | 
  <elukey> | 
  restart oozie on an-coord1001 | 
  [analytics] | 
            
  
    | 
      
        2020-12-22
      
     | 
  
    
  | 19:35 | 
  <elukey> | 
  restart hive daemons on an-coord1001 to pick up new settings | 
  [analytics] | 
            
  | 18:13 | 
  <elukey> | 
  failover analytics-hive.eqiad.wmnet to an-coord1002 (to allow maintenance on an-coord1001) | 
  [analytics] | 
            
  | 18:07 | 
  <elukey> | 
  restart hive server on an-coord1002 (current standby - no traffic) to pick up the new config (use the local metastore instead of the one analytics-hive points to) | 
  [analytics] | 
            
  | 17:00 | 
  <mforns> | 
  Deployed refinery as part of weekly train (v0.0.142) | 
  [analytics] | 
            
  | 16:42 | 
  <mforns> | 
  Deployed refinery-source v0.0.142 | 
  [analytics] | 
            
  | 16:30 | 
  <mforns> | 
  Deployed refinery-source v0.0.142 | 
  [analytics] | 
            
  | 15:00 | 
  <razzi> | 
  stopping superset server on analytics-tool1004 | 
  [analytics] | 
            
  | 10:36 | 
  <elukey> | 
  restart presto coordinator to pick up analytics-hive settings | 
  [analytics] | 
            
  | 10:25 | 
  <elukey> | 
  failover analytics-hive.eqiad.wmnet to an-coord1001 | 
  [analytics] | 
            
  | 09:56 | 
  <elukey> | 
  restart hive daemons on an-coord1001 to pick up analytics-hive settings | 
  [analytics] | 
            
  | 07:27 | 
  <elukey> | 
  reboot stat100[4-8] (analytics hadoop clients) for kernel upgrades | 
  [analytics] | 
            
  | 07:23 | 
  <elukey> | 
  move all analytics clients (spark refine, stat100x, hive-site.xml on hdfs, etc.) to analytics-hive.eqiad.wmnet | 
  [analytics] |