| 2023-03-08
      
      | 
    
  | 09:59 | <nfraison> | restart namenode on an-master1001 (standby in prod) to take into account new quota init threads setting | [analytics] | 
            
  | 09:53 | <nfraison> | restart namenode on an-test-master1002 to take into account new quota init threads setting | [analytics] | 
            
  | 09:52 | <nfraison> | failover namenode in test from an-test-master1002-eqiad-wmnet to an-test-master1001-eqiad-wmnet | [analytics] | 
            
  | 09:47 | <nfraison> | restart namenode on an-test-master1001 to take into account new quota init threads setting | [analytics] | 
            
  | 09:36 | <nfraison> | restart test hiveserver2: T303168 | [analytics] | 
            
  | 09:13 | <nfraison> | restart prod resourcemanager to take into account new dedicated exclude file | [analytics] | 
            
  | 08:58 | <nfraison> | restart test resourcemanager to take into account new dedicated exclude file | [analytics] | 
            
  | 07:56 | <nfraison> | restart prod jobhistory to take into account: https://gerrit.wikimedia.org/r/c/operations/puppet/+/894481 | [analytics] | 
            
  | 07:47 | <nfraison> | restart test jobhistory to take into account: https://gerrit.wikimedia.org/r/c/operations/puppet/+/894481 | [analytics] | 
            
  
    | 2023-03-07
      
      | 
    
  | 22:03 | <mforns> | deployed airflow analytics again to try and fix druid_load_edit_hourly | [analytics] | 
            
  | 16:55 | <xcollazo> | deployed image-suggestions hotfix to platform_eng Airflow instance. See https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/262. | [analytics] | 
            
  | 15:23 | <btullis> | re-enabling ingestion via gobblin. | [analytics] | 
            
  | 14:59 | <nfraison> | force startup of nodemanager on analytics_cluster | [analytics] | 
            
  | 14:58 | <btullis> | pooled druid1004 | [analytics] | 
            
  | 14:57 | <btullis> | pooling aqs1010 and aqs1016 | [analytics] | 
            
  | 14:56 | <btullis> | pooling datahubsearch1001 | [analytics] | 
            
  | 14:53 | <btullis> | leaving safe mode on hdfs | [analytics] | 
            
  | 13:59 | <btullis> | disabled puppet temporarily on an-master100[1-2] to avoid an automatic restart of yarn | [analytics] | 
            
  | 13:57 | <btullis> | stopped `hadoop-yarn-resourcemanager.service` on both an-master100[1-2] | [analytics] | 
            
  | 13:54 | <btullis> | entering safe mode with `sudo -u hdfs kerberos-run-command hdfs hadoop dfsadmin -safemode enter` on an-master1002 | [analytics] | 
            
  | 12:57 | <btullis> | depooled druid1004 for T329073 | [analytics] | 
            
  | 12:56 | <btullis> | depooled datahubsearch1001 for T329073 | [analytics] | 
            
  | 12:51 | <btullis> | disabled gobblin timers on an-launcher1002 | [analytics] | 
            
  | 12:46 | <btullis> | depooling aqs1016 for T329073 | [analytics] | 
            
  | 12:45 | <btullis> | depooling aqs1010 for T329073 | [analytics] | 
            
  | 08:00 | <nfraison> | Reimage an-conf1003 to upgrade to bullseye T329362 | [analytics] | 
            
  
    | 2023-03-01
      
      | 
    
  | 22:45 | <mforns> | re-deployed airflow analytics with some forgotten changes | [analytics] | 
            
  | 22:42 | <mforns> | deployed Airflow analytics | [analytics] | 
            
  | 22:30 | <mforns> | finished refinery deployment, although didn't manage to run refinery-deploy-to-hdfs without warnings... | [analytics] | 
            
  | 21:48 | <mforns> | kill edit-hourly-coord in Hue to migrate it to Airflow | [analytics] | 
            
  | 21:26 | <mforns> | starting refinery deploy | [analytics] | 
            
  | 19:38 | <SandraEbele> | rerunning webrequest load text for 2023-03-01-08 hour. | [analytics] | 
            
  | 18:54 | <joal> | Create empty partitions in event.mediawiki_page_move table for codfw datacenter from beginning of week (2023-02-27T00 -> 2023-02-28T13) | [analytics] | 
            
  | 10:25 | <nfraison> | rebooting an-worker1132, which is slower than other nodes (potential issue with RAID card/disks) | [analytics] | 
            
  | 07:59 | <nfraison> | restarted hiveserver2 in analytics-test to take into account the -XX:MaxMetaspaceSize=512m JVM parameter | [analytics] |