2023-08-10 §
08:57 <btullis> paused all dags on all airflow instances [analytics]
2023-08-09 §
14:22 <btullis> failing over namenode on test cluster from an-test-master1001 to an-test-master1002 after upgrade of an-test-master1002 to bullseye [analytics]
11:31 <btullis> I did systemctl reset-failed logrotate.service on datahubsearch1002 [analytics]
11:08 <btullis> starting hadoop-hdfs-namenode.service on an-master1002 [analytics]
11:02 <btullis> failing over namenode services to an-master1002 so that I can reboot an-master1001 [analytics]
09:49 <btullis> restarted systemd-timedate service on an-worker1086 [analytics]
2023-08-07 §
17:09 <btullis> deploying new mediawiki_history snapshot to AQS [analytics]
2023-08-02 §
20:42 <xcollazo> deployed latest for Airflow analytics instance. [analytics]
19:30 <xcollazo> deploying refinery to try and fix https://lists.wikimedia.org/hyperkitty/list/data-engineering-alerts@lists.wikimedia.org/thread/QKXYMYKMWXGRNYZ77CENA5F2EGA66QQ2/ [analytics]
12:42 <xcollazo> Redeploy of analytics_product Airflow instance to see if it clears a Spark issue [analytics]
2023-08-01 §
11:37 <btullis> ran apt clean on an-tool1009 to free up disk space [analytics]
06:24 <elukey> roll restart kafka jumbo brokers to apply new threads settings [analytics]
2023-07-31 §
19:03 <xcollazo> Deployed https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/471 for analytics Airflow instance [analytics]
12:25 <btullis> upgrading airflow on an-launcher1002 to 2.6.3 [analytics]
2023-07-28 §
19:38 <xcollazo> Deployed T342926 and https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/469 to analytics Airflow instance [analytics]
14:34 <milimetric> deployed a fix for a sqoop typo [analytics]
2023-07-27 §
18:48 <milimetric> done deploying some simple stuff to refinery (static files and script comment updates) [analytics]
2023-07-25 §
09:42 <stevemunene> powercycle wdqs1013.eqiad.wmnet [analytics]
2023-07-19 §
16:35 <joal> Deploy airflow fix for cassandra loading jobs [analytics]
13:44 <btullis> restarting hive-server2 and hive-metastore services on an-coord1001 (currently standby) [analytics]
12:38 <joal> deploy Airflow analytics dags - Full revamp of cassandra loading jobs [analytics]
11:22 <jennifer_ebe> deploying refinery to hdfs [analytics]
10:57 <jennifer_ebe> deploying refinery using scap [analytics]
10:54 <btullis> migrating hive services to an-coord1002 via DNS for T329716 (to permit restart of hive services on an-coord1001). [analytics]
10:15 <btullis> restarting oozie service on an-coord1001 for T329716 [analytics]
10:14 <btullis> restarting presto-service on an-coord1001 for T329716 [analytics]
10:06 <btullis> restarting java services on an-test-coord1001 for JVM update [analytics]
09:13 <btullis> correction: to an-test-client1002 [analytics]
09:13 <btullis> deploying airflow-dags for analytics_test to an-test-client1001 [analytics]
2023-07-18 §
13:20 <stevemunene> deploy airflow-dags to an-test-client1002 T341700 [analytics]
2023-07-17 §
13:34 <elukey> `kill `pgrep -u appledora`` and `kill `pgrep -u akhatun`` on stat1008 to unblock puppet (offboarded users deletion) [analytics]
13:32 <btullis> proceeding to reimage analytics1072 (journalnode, in addition to datanode) [analytics]
09:31 <btullis> restarted airflow services on an-test-client1002 in order to pick up new versions [analytics]
09:19 <btullis> upgrading airflow on an-test-client1002 to version 2.6.3 [analytics]
2023-07-13 §
20:38 <xcollazo> deployed Airflow DAGs for analytics instance to pick up T335860 [analytics]
2023-07-12 §
16:26 <btullis> `sudo cumin A:wikireplicas-all 'maintain-views --replace-all --all-databases --table revision'` for T339037 [analytics]
14:11 <btullis> roll-restarting zookeeper on druid-public for new JVM version [analytics]
2023-07-11 §
11:00 <btullis> Proceeding to upgrade datahub in production [analytics]
08:59 <btullis> rebooting kafkamon1003 [analytics]
08:54 <btullis> `systemctl start burrow-jumbo-eqiad.service` on kafkamon1003 for T341551 [analytics]
2023-07-10 §
14:04 <btullis> powered on an-worker1145 [analytics]
14:02 <btullis> powered off an-worker1145 for T341481 [analytics]
10:55 <btullis> `sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet` on an-master1001 [analytics]
2023-07-07 §
09:56 <btullis> `sudo systemctl start hadoop-hdfs-namenode.service` on an-master1001 [analytics]
09:28 <stevemunene> running sre.hadoop.roll-restart-masters to restart the masters and completely remove any reference to analytics[1058-1069] T317861 [analytics]
09:15 <stevemunene> run puppet on hadoop masters to pick up changes from recently decommissioned hosts [analytics]
08:12 <elukey> wipe kafka-test cluster (data + zookeeper config) to start clean after the issue that happened yesterday [analytics]
2023-07-06 §
14:51 <elukey> upgraded zookeeper-test1002 to bookworm, but its metadata got re-initialized as well (my bad for this) [analytics]
14:30 <stevemunene> decommission analytics1069.eqiad.wmnet T341209 [analytics]
14:19 <stevemunene> decommission analytics1068.eqiad.wmnet T341208 [analytics]