| 
      
        2023-03-08
      
     | 
  
    
  | 11:54 | 
  <ottomata> | 
  Deployed refinery using scap, then deployed onto hdfs | 
  [analytics] | 
            
  | 10:36 | 
  <nfraison> | 
  restart namenode in an-master1002 to take into account new quota init threads setting | 
  [analytics] | 
            
  | 10:25 | 
  <nfraison> | 
  failover namenode in prod from an-master1002-eqiad-wmnet to an-master1001-eqiad-wmnet | 
  [analytics] | 
            
  | 09:59 | 
  <nfraison> | 
  restart namenode in an-master1001 (standby in prod) to take into account new quota init threads setting | 
  [analytics] | 
            
  | 09:53 | 
  <nfraison> | 
  restart namenode in an-test-master1002 to take into account new quota init threads setting | 
  [analytics] | 
            
  | 09:52 | 
  <nfraison> | 
  failover namenode in test from an-test-master1002-eqiad-wmnet to an-test-master1001-eqiad-wmnet | 
  [analytics] | 
            
  | 09:47 | 
  <nfraison> | 
  restart namenode in an-test-master1001 to take into account new quota init threads setting | 
  [analytics] | 
            
  | 09:36 | 
  <nfraison> | 
  restart test hiveserver2: T303168 | 
  [analytics] | 
            
  | 09:13 | 
  <nfraison> | 
  restart prod resourcemanager to take into account new dedicated exclude file | 
  [analytics] | 
            
  | 08:58 | 
  <nfraison> | 
  restart test resourcemanager to take into account new dedicated exclude file | 
  [analytics] | 
            
  | 07:56 | 
  <nfraison> | 
  restart prod jobhistory to take into account: https://gerrit.wikimedia.org/r/c/operations/puppet/+/894481 | 
  [analytics] | 
            
  | 07:47 | 
  <nfraison> | 
  restart test jobhistory to take into account: https://gerrit.wikimedia.org/r/c/operations/puppet/+/894481 | 
  [analytics] | 
            
  
    | 
      
        2023-03-07
      
     | 
  
    
  | 22:03 | 
  <mforns> | 
  deployed airflow analytics again to try and fix druid_load_edit_hourly | 
  [analytics] | 
            
  | 16:55 | 
  <xcollazo> | 
  deployed image-suggestions hotfix to platform_eng Airflow instance. See https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/262. | 
  [analytics] | 
            
  | 15:23 | 
  <btullis> | 
  re-enabling ingestion via gobblin. | 
  [analytics] | 
            
  | 14:59 | 
  <nfraison> | 
  force startup of nodemanager on analytics_cluster | 
  [analytics] | 
            
  | 14:58 | 
  <btullis> | 
  pooled druid1004 | 
  [analytics] | 
            
  | 14:57 | 
  <btullis> | 
  pooling aqs1010 and aqs1016 | 
  [analytics] | 
            
  | 14:56 | 
  <btullis> | 
  pooling datahubsearch1001 | 
  [analytics] | 
            
  | 14:53 | 
  <btullis> | 
  leaving safe mode on hdfs | 
  [analytics] | 
            
  | 13:59 | 
  <btullis> | 
  disabled puppet temporarily on an-master100[1-2] to avoid an automatic restart of yarn | 
  [analytics] | 
            
  | 13:57 | 
  <btullis> | 
  stopped `hadoop-yarn-resourcemanager.service` on both an-master100[1-2] | 
  [analytics] | 
            
  | 13:54 | 
  <btullis> | 
  entering safe mode with `sudo -u hdfs kerberos-run-command hdfs hadoop dfsadmin -safemode enter` on an-master1002 | 
  [analytics] | 
            
  | 12:57 | 
  <btullis> | 
  depooled druid1004 for T329073 | 
  [analytics] | 
            
  | 12:56 | 
  <btullis> | 
  depooled datahubsearch1001 for T329073 | 
  [analytics] | 
            
  | 12:51 | 
  <btullis> | 
  disabled gobblin timers on an-launcher1002 | 
  [analytics] | 
            
  | 12:46 | 
  <btullis> | 
  depooling aqs1016 for T329073 | 
  [analytics] | 
            
  | 12:45 | 
  <btullis> | 
  depooling aqs1010 for T329073 | 
  [analytics] |