551-600 of 5989 results (32ms)
2023-08-24 §
18:20 <btullis> attempting another failback of the hadoop namenode services [analytics]
16:47 <btullis> start hadoop namenode on an-master1001 after crash. [analytics]
16:46 <btullis> failback unsuccessful. namenode services still running on an-master1002. [analytics]
16:43 <btullis> going for failback of HDFS namenode service from an-master1002 to an-master1001 [analytics]
16:10 <btullis> about to reboot an-master1001 [analytics]
16:09 <btullis> failing over yarn resourcemanager to an-master1002 [analytics]
16:07 <btullis> failing over hdfs namenode from an-master1001 to an-master1002 [analytics]
12:40 <btullis> rebooting an-coord1001 [analytics]
12:08 <btullis> failing over hive to an-coord1002 in advance of reboot of an-coord1001 [analytics]
11:24 <btullis> btullis@cp3074:~$ sudo systemctl start varnishkafka-webrequest.service [analytics]
2023-08-23 §
14:50 <btullis> rebooting an-launcher1002 [analytics]
08:22 <btullis> beginning a rolling reboot of kafka-jumbo [analytics]
2023-08-22 §
17:24 <joal> Redeploying refinery onto Hadoop-test to try to fix jar issue [analytics]
14:29 <gmodena> deploying refinery with hdfs [analytics]
14:08 <gmodena> deploying refinery using scap [analytics]
13:03 <btullis> deploying the change to the yarn log retention and compression for T342923 [analytics]
2023-08-17 §
15:12 <btullis> failing hive back to an-coord1001 following maintenance [analytics]
14:59 <btullis> restarting hive-server2 and hive-metastore services on an-coord1001 after failover. [analytics]
14:49 <btullis> failing over hive to an-coord1002 to permit restart of hive on an-coord1001 [analytics]
09:29 <btullis> deploying airflow-analytics [analytics]
2023-08-16 §
17:06 <btullis> aqs deploy completed successfully. [analytics]
17:05 <btullis> re-ran efine_eventlogging_analytics failed job and sent follow-up email. [analytics]
16:52 <btullis> deploying aqs again [analytics]
16:43 <btullis> deploying aqs [analytics]
2023-08-14 §
09:27 <btullis> rebooted an-worker1124 due to CPU lockups [analytics]
2023-08-12 §
14:16 <btullis> re-ran refine_event job for 'mediawiki_revision_create|mediawiki_page_create' [analytics]
2023-08-10 §
16:59 <btullis> re-enabled airflow jobs on analytics_test instance [analytics]
08:58 <btullis> rebooting an-db1001 [analytics]
08:57 <btullis> stopped all airflow-scheduler services [analytics]
08:57 <btullis> paused all dags on all airflow instances [analytics]
2023-08-09 §
14:22 <btullis> failing over namenode on test cluster from an-test-master1001 to an-test-master1002 after upgrade of an-test-master1002 to bullseye [analytics]
11:31 <btullis> I did systemctl reset-failed logrotate.service on datahubsearch1002 [analytics]
11:08 <btullis> starting hadoop-hdfs-namenode.service on an-master1002 [analytics]
11:02 <btullis> failing over namenode services to an-master1002 so that I can reboot an-master1001 [analytics]
09:49 <btullis> restarted systemd-timedate service on an-worker1086 [analytics]
2023-08-07 §
17:09 <btullis> deploying new mediawiki_history snapshot to AQS [analytics]
2023-08-02 §
20:42 <xcollazo> deployed latest for Airflow analytics instance. [analytics]
19:30 <xcollazo> deploying refinery to try and fix https://lists.wikimedia.org/hyperkitty/list/data-engineering-alerts@lists.wikimedia.org/thread/QKXYMYKMWXGRNYZ77CENA5F2EGA66QQ2/ [analytics]
12:42 <xcollazo> Redeploy of analytics_product Airflow instance to see it it clears a Spark issue [analytics]
2023-08-01 §
11:37 <btullis> ran apt clean on an-tool1009 to free up disk space [analytics]
06:24 <elukey> roll restart kafka jumbo brokers to apply new threads settings [analytics]
2023-07-31 §
19:03 <xcollazo> Deployed https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/471 for analytics Airflow instance [analytics]
12:25 <btullis> upgrading airflow on an-launcher1002 to 2.6.3 [analytics]
2023-07-28 §
19:38 <xcollazo> Deployed T342926 and https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/469 to analytics Airflow instance [analytics]
14:34 <milimetric> deployed a fix for a sqoop typo [analytics]
2023-07-27 §
18:48 <milimetric> done deploying some simple stuff to refinery (static files and script comment updates) [analytics]
2023-07-25 §
09:42 <stevemunene> powercycle wdqs1013.eqiad.wmnet [analytics]
2023-07-19 §
16:35 <joal> Deploy airflow fixfor cassandra loading jobs [analytics]
13:44 <btullis> restarting hive-server2 and hive-metastore services on an-coord1001 (currently standby) [analytics]
12:38 <joal> deploy Airflow analytics dags - Fullrevampof cassandraloading jobs [analytics]