| 2022-01-26 |
| 15:54 | <joal> | Add new CH-UA fields to wmf_raw.webrequest and wmf.webrequest | [analytics] |
| 15:44 | <joal> | Kill-restart webrequest oozie job after deploy | [analytics] |
| 15:40 | <joal> | Kill-restart edit-hourly oozie job after deploy | [analytics] |
| 15:27 | <joal> | Deploy refinery to HDFS | [analytics] |
| 15:10 | <elukey> | elukey@cp4036:~$ sudo systemctl restart varnishkafka-eventlogging | [analytics] |
| 15:10 | <elukey> | elukey@cp4036:~$ sudo systemctl restart varnishkafka-statsv | [analytics] |
| 15:06 | <elukey> | elukey@cp4035:~$ sudo systemctl restart varnishkafka-eventlogging.service - metrics showing messages stuck for a poll() | [analytics] |
| 14:56 | <elukey> | elukey@cp4035:~$ sudo systemctl restart varnishkafka-webrequest.service - metrics showing messages stuck for a poll() | [analytics] |
| 14:52 | <joal> | Deploy refinery with scap | [analytics] |
| 10:07 | <btullis> | btullis@cumin1001:~$ sudo cumin 'O:cache::upload or O:cache::text' 'disable-puppet btullis-T296064-T299401' | [analytics] |
            
  
| 2022-01-24 |
| 21:18 | <btullis> | btullis@deploy1002:/srv/deployment/analytics/refinery$ scap deploy -e hadoop-test -l an-test-coord1001.eqiad.wmnet | [analytics] |
| 20:35 | <btullis> | rebooting an-test-coord1001 after recreating the /srv file system. | [analytics] |
| 20:28 | <btullis> | root@an-test-coord1001:~# mke2fs -t ext4 -j -m 0.5 /dev/vg0/srv | [analytics] |
| 19:53 | <btullis> | power cycled an-test-coord1001 from racadm | [analytics] |
| 19:50 | <btullis> | rebooting an-test-coord1001 | [analytics] |
| 19:19 | <ottomata> | kill mysqld on an-test-coord1001 - 19:19:04 [@an-test-coord1001:/etc] $ sudo kill 42433 | [analytics] |
| 19:02 | <razzi> | razzi@an-test-coord1001:~$ sudo systemctl stop presto-server | [analytics] |
| 18:23 | <razzi> | downtime an-coord1001 while attempting to fix /srv partition | [analytics] |
| 11:48 | <elukey> | roll restart of kafka test brokers to pick up the new keystore/tls-certs (1y of validity) | [analytics] |
  
| 2022-01-13 |
| 12:41 | <joal> | rerun failed instances of webrequest-load-coord | [analytics] |
| 11:59 | <btullis> | stopped eventlogging service on eventlog1003 with 1 hour's downtime. | [analytics] |
| 11:52 | <btullis> | Upgrading hive packages on stat1005 | [analytics] |
| 11:26 | <btullis> | restarted hive-metastore and hive-server2 on an-coord1001 after running puppet. | [analytics] |
| 11:23 | <btullis> | btullis@an-coord1001:~$ sudo apt install hive hive-hcatalog hive-jdbc hive-metastore hive-server2 oozie oozie-client | [analytics] |
| 11:18 | <btullis> | btullis@an-coord1002:~$ sudo systemctl restart hive-metastore hive-server2 | [analytics] |
| 09:53 | <btullis> | DNS change deployed, failing over hive to an-coord1002. | [analytics] |
| 09:42 | <btullis> | btullis@an-coord1002:~$ sudo apt install hive hive-hcatalog hive-jdbc hive-metastore hive-server2 oozie-client | [analytics] |
| 08:45 | <joal> | Kill-restart wikidata-json_entity-weekly-coord after deploy | [analytics] |