| 
      
        2018-01-18
      
     | 
  
    
  | 19:11 | 
  <joal> | 
  Kill-Restart coord_pageviews_top_bycountry_monthly oozie job from 2015-05 | 
  [analytics] | 
            
  | 19:10 | 
  <joal> | 
  Add fake data to cassandra to silence alarms (Thanks again ema) | 
  [analytics] | 
            
  | 18:56 | 
  <joal> | 
  Truncating table "local_group_default_T_top_bycountry"."data" in cassandra before reload | 
  [analytics] | 
            
  | 15:21 | 
  <mforns> | 
  refinery deployment using scap and then deploying onto hdfs finished | 
  [analytics] | 
            
  | 15:07 | 
  <mforns> | 
  starting refinery deployment | 
  [analytics] | 
            
  | 12:43 | 
  <elukey> | 
  piwik on bohrium re-enabled | 
  [analytics] | 
            
  | 12:40 | 
  <elukey> | 
  set piwik in readonly mode and stopped mysql on bohrium (prep step for reboot) | 
  [analytics] | 
            
  | 09:38 | 
  <elukey> | 
  reboot thorium (analytics webserver) for security upgrade - This maintenance will cause temporary unavailability of the Analytics websites | 
  [analytics] | 
            
  | 09:37 | 
  <elukey> | 
  resumed druid hourly index jobs via Hue and restored pivot's configuration | 
  [analytics] | 
            
  | 09:21 | 
  <elukey> | 
  reboot druid1001 for kernel upgrades | 
  [analytics] | 
            
  | 09:00 | 
  <elukey> | 
  suspended hourly druid batch index jobs via Hue | 
  [analytics] | 
            
  | 08:58 | 
  <elukey> | 
  temporarily set druid1002 in superset's druid cluster config (via UI) | 
  [analytics] | 
            
  | 08:53 | 
  <elukey> | 
  temporarily point pivot's configuration to druid1002 (druid1001 needs to be rebooted) | 
  [analytics] | 
            
  | 08:52 | 
  <elukey> | 
  disable druid1001's middlemanager as prep step for reboot | 
  [analytics] | 
            
  | 07:11 | 
  <elukey> | 
  re-run webrequest-load-wf-misc-2018-1-18-3 via Hue | 
  [analytics] | 
            
  
    | 
      
        2018-01-17
      
     | 
  
    
  | 17:33 | 
  <elukey> | 
  killed the banner impression spark job (application_1515441536446_27293) again to force it to respawn (real time indexers not present) | 
  [analytics] | 
            
  | 17:29 | 
  <elukey> | 
  restarted all druid overlords on druid100[123] (weird race condition messages about who was the leader for some task) | 
  [analytics] | 
            
  | 16:24 | 
  <elukey> | 
  re-run all the pageview-druid-hourly failed jobs via Hue | 
  [analytics] | 
            
  | 14:42 | 
  <elukey> | 
  restart druid middlemanager on druid1003 as an attempt to unblock realtime streaming | 
  [analytics] | 
            
  | 14:21 | 
  <elukey> | 
  forced kill of banner impression data streaming job to get it restarted | 
  [analytics] | 
            
  | 11:44 | 
  <elukey> | 
  re-run pageview-druid-hourly-wf-2018-1-17-9 and pageview-druid-hourly-wf-2018-1-17-8 (failed due to druid1002's middlemanager being in a weird state after reboot) | 
  [analytics] | 
            
  | 11:44 | 
  <elukey> | 
  restart druid middlemanager on druid1002 | 
  [analytics] | 
            
  | 10:38 | 
  <elukey> | 
  stopped all crons on hadoop-coordinator-1 | 
  [analytics] | 
            
  | 10:37 | 
  <elukey> | 
  re-run webrequest-druid-hourly-wf-2018-1-17-8 (failed due to druid1002's reboot) | 
  [analytics] | 
            
  | 10:22 | 
  <elukey> | 
  reboot druid1002 for kernel upgrades | 
  [analytics] | 
            
  | 09:53 | 
  <elukey> | 
  disable druid middlemanager on druid1002 as prep step for reboot | 
  [analytics] | 
            
  | 09:46 | 
  <elukey> | 
  rebooted analytics1003 | 
  [analytics] | 
            
  | 09:46 | 
  <elukey> | 
  removed upstart config for brrd on eventlog1001 (failing and spamming syslog, old leftover?) | 
  [analytics] | 
            
  | 08:53 | 
  <elukey> | 
  disabled camus as prep step for analytics1003 reboot | 
  [analytics] |