2023-07-05
| 14:36 | <stevemunene> | enable puppet on analytics1069 to get the host back into PuppetDB, so that the decommission cookbook can be run later | [analytics] |
            
| 11:47 | <btullis> | restarted archiva for T329716 | [analytics] |
            
| 11:45 | <btullis> | restarted the hive-server2 and hive-metastore services on an-coord1002 | [analytics] |
            
| 11:40 | <btullis> | roll-restarting kafka-jumbo brokers for T329716 | [analytics] |
            
| 11:01 | <btullis> | roll-restarting the presto workers for T329716 | [analytics] |
            
| 10:20 | <btullis> | deploying updated spark3 defaults to disable the `spark.shuffle.useOldFetchProtocol` option for T332765 | [analytics] |
            
| 09:45 | <btullis> | failing back the namenode to an-master1001 with `sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet` on an-master1001 | [analytics] |
            
| 09:38 | <btullis> | re-enabled gobblin jobs on an-launcher1002 | [analytics] |
            
| 09:03 | <btullis> | switching the yarn shuffler: running puppet on 87 worker nodes | [analytics] |
            
| 08:44 | <btullis> | disabled gobblin and spark jobs on an-launcher for T332765 | [analytics] |
            
| 08:33 | <btullis> | disabled gobblin jobs with https://gerrit.wikimedia.org/r/c/operations/puppet/+/935425 | [analytics] |
            
| 08:27 | <btullis> | roll-restarting hadoop workers in the test cluster | [analytics] |
            
  
2023-06-27
| 14:53 | <mforns> | deployed airflow analytics to unbreak DataHub's Druid ingestion | [analytics] |
            
| 13:32 | <joal> | Rerun druid_load_pageviews_hourly_aggregated_daily after deploy | [analytics] |
            
| 13:32 | <joal> | druid_load_pageviews_hourly_aggregated_daily rerun | [analytics] |
            
| 13:25 | <joal> | Deploy Airflow | [analytics] |
            
| 11:10 | <joal> | Deploy refinery onto HDFS | [analytics] |
            
| 11:01 | <stevemunene> | upgrading an-test-worker1003 to bullseye, keeping `/srv/hadoop` intact | [analytics] |
            
| 10:55 | <joal> | Deploy refinery using scap | [analytics] |
            
| 09:42 | <stevemunene> | run puppet on hadoop-masters; this refreshes the HDFS nodes | [analytics] |
            
| 09:38 | <stevemunene> | Exclude analytics1061_1069 from HDFS and YARN | [analytics] |
            
| 09:21 | <btullis> | upgrading an-test-worker1002 to bullseye, keeping `/srv/hadoop` intact | [analytics] |
            
| 08:38 | <elukey> | revoked the puppet cert for 'varnishkafka' and cleaned up its cergen files in puppet private | [analytics] |
            
| 07:14 | <elukey> | `` sudo kill `pgrep -u paramd` `` on stat1005 to unblock puppet | [analytics] |