2023-07-19
|
13:44 |
<btullis> |
restarting hive-server2 and hive-metastore services on an-coord1001 (currently standby) |
[analytics] |
12:38 |
<joal> |
deploy Airflow analytics dags - Full revamp of cassandra loading jobs |
[analytics] |
11:22 |
<jennifer_ebe> |
deploying refinery to hdfs |
[analytics] |
10:57 |
<jennifer_ebe> |
deploying refinery using scap |
[analytics] |
10:54 |
<btullis> |
migrating hive services to an-coord1002 via DNS for T329716 (to permit restart of hive services on an-coord1001). |
[analytics] |
10:15 |
<btullis> |
restarting oozie service on an-coord1001 for T329716 |
[analytics] |
10:14 |
<btullis> |
restarting presto-service on an-coord1001 for T329716 |
[analytics] |
10:06 |
<btullis> |
restarting java services on an-test-coord1001 for JVM update |
[analytics] |
09:13 |
<btullis> |
correction: to an-test-client1002 |
[analytics] |
09:13 |
<btullis> |
deploying airflow-dags for analytics_test to an-test-client1001 |
[analytics] |
2023-07-06
|
14:51 |
<elukey> |
upgraded zookeeper-test1002 to bookworm, but its metadata got re-initialized as well (my bad for this) |
[analytics] |
14:30 |
<stevemunene> |
decommission analytics1069.eqiad.wmnet T341209 |
[analytics] |
14:19 |
<stevemunene> |
decommission analytics1068.eqiad.wmnet T341208 |
[analytics] |
14:06 |
<stevemunene> |
decommission analytics1067.eqiad.wmnet T341207 |
[analytics] |
13:13 |
<stevemunene> |
decommission analytics1066.eqiad.wmnet T341206 |
[analytics] |
13:02 |
<stevemunene> |
decommission analytics1065.eqiad.wmnet T341205 |
[analytics] |
12:35 |
<stevemunene> |
decommission analytics1064.eqiad.wmnet T341204 |
[analytics] |
11:18 |
<stevemunene> |
decommission analytics1063.eqiad.wmnet T339201 |
[analytics] |
10:40 |
<stevemunene> |
decommission analytics1062.eqiad.wmnet T339200 |
[analytics] |
09:57 |
<stevemunene> |
decommission analytics1061.eqiad.wmnet T339199 |
[analytics] |
07:23 |
<stevemunene> |
run puppet agent on hadoop masters |
[analytics] |
07:21 |
<stevemunene> |
Remove analytics1064_1069 from hdfs net_topology |
[analytics] |
07:17 |
<stevemunene> |
stop hadoop-hdfs-datanode service on analytics[1061-1069] Preparing to decommission the hosts - T317861 |
[analytics] |
07:11 |
<stevemunene> |
disable-puppet on analytics[1061-1069] Preparing to decommission the hosts - T317861 |
[analytics] |
2023-07-05
|
14:36 |
<stevemunene> |
enable puppet on analytics1069 to get the host back into puppetdb and hence allow the decommission cookbook to run later |
[analytics] |
11:47 |
<btullis> |
restarted archiva for T329716 |
[analytics] |
11:45 |
<btullis> |
restarted hive-server2 and hive-metastore services on an-coord1002 |
[analytics] |
11:40 |
<btullis> |
roll-restarting kafka-jumbo brokers for T329716 |
[analytics] |
11:01 |
<btullis> |
roll-restarting the presto workers for T329716 |
[analytics] |
10:20 |
<btullis> |
deploying updated spark3 defaults to disable the `spark.shuffle.useOldFetchProtocol` option for T332765 |
[analytics] |
09:45 |
<btullis> |
failing back namenode to an-master1001 with `sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet` on an-master1001 |
[analytics] |
09:38 |
<btullis> |
re-enabled gobblin jobs on an-launcher1002 |
[analytics] |