2021-02-12

19:19 <milimetric> deployed refinery with query syntax fix for the last broken cassandra job and an updated EL whitelist [analytics]
18:34 <razzi> rebalance kafka partitions for atskafka_test_webrequest_text [analytics]
18:31 <razzi> rebalance kafka partitions for __consumer_offsets [analytics]
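The rebalances above were presumably done with Wikimedia's own tooling; as a generic sketch, a partition reassignment with the stock Kafka CLI works by feeding a reassignment plan to `kafka-reassign-partitions`. The broker ids and the single-partition plan below are assumptions for illustration; the script only prints the command since it needs a live cluster to execute.

```shell
#!/bin/sh
# Hypothetical reassignment plan for the topic named in the log.
# Broker ids 1001/1002 are made up for the example.
cat > /tmp/reassign.json <<'EOF'
{"version": 1,
 "partitions": [
   {"topic": "atskafka_test_webrequest_text", "partition": 0, "replicas": [1001, 1002]}
 ]}
EOF
# Print rather than execute: the real command needs a reachable cluster.
echo "would run: kafka-reassign-partitions --reassignment-json-file /tmp/reassign.json --execute"
```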
17:48 <joal> Rerun wikidata-articleplaceholder_metrics-wf-2021-2-10 [analytics]
17:47 <joal> Rerun wikidata-specialentitydata_metrics-wf-2021-2-10 [analytics]
17:43 <joal> Rerun wikidata-json_entity-weekly-wf-2021-02-01 [analytics]
17:08 <elukey> reboot presto workers for kernel upgrade [analytics]
16:32 <mforns> finished deployment of analytics-refinery [analytics]
15:26 <mforns> started deployment of analytics-refinery [analytics]
15:16 <elukey> roll restart druid broker on druid-public to pick up new settings [analytics]
07:54 <elukey> roll restart of druid brokers on druid-public - locked after scheduled datasource deletion [analytics]
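A "roll restart" as logged above restarts brokers one host at a time so the cluster keeps serving queries; in production this is typically driven by a cookbook. A minimal dry-run sketch, with made-up hostnames and an assumed `druid-broker` systemd unit:

```shell
#!/bin/sh
# Rolling restart sketch: one broker at a time, pausing between hosts.
# Hostnames and unit name are assumptions; DRY_RUN=1 avoids touching real hosts.
DRY_RUN=${DRY_RUN:-1}
for host in druid-public-a druid-public-b druid-public-c; do
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run: ssh $host sudo systemctl restart druid-broker"
    else
        ssh "$host" sudo systemctl restart druid-broker
        sleep 60   # let the broker warm up before moving to the next host
    fi
done
```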
07:46 <elukey> force a manual run of refinery-druid-drop-public-snapshots on an-launcher1002 (3d before its natural start) - controlled execution to see how druid + 3x dataset replication reacts [analytics]
2021-02-09

22:04 <razzi> rebalance kafka partitions for eqiad.resource-purge [analytics]
20:51 <joal> Rerun webrequest-load-coord-[text|upload] for 2021-02-09T07:00 after data was imported to camus [analytics]
20:50 <razzi> rebalance kafka partitions for codfw.resource-purge [analytics]
20:31 <joal> Rerun webrequest-load-coord-[text|upload] for 2021-02-09T06:00 after data was imported to camus [analytics]
16:30 <elukey> restart datanode on an-worker1100 [analytics]
16:14 <ottomata> restart datanode on analytics1059 with 16g heap [analytics]
16:08 <ottomata> restart datanode on an-worker1080 with 16g heap [analytics]
15:58 <ottomata> restart datanode on analytics1058 [analytics]
15:55 <ottomata> restart datanode on an-worker1115 [analytics]
15:38 <elukey> restart namenode on an-master1002 [analytics]
15:01 <elukey> restart an-worker1104 with 16g heap size to allow bootstrap [analytics]
15:01 <elukey> restart an-worker1103 with 16g heap size to allow bootstrap [analytics]
14:57 <elukey> restart an-worker1102 with 16g heap size to allow bootstrap [analytics]
14:54 <elukey> restart an-worker1090 with 16g heap size to allow bootstrap [analytics]
14:50 <elukey> restart analytics1072 with 16g heap size to allow bootstrap [analytics]
14:50 <elukey> restart analytics1069 with 16g heap size to allow bootstrap [analytics]
14:08 <elukey> restart analytics1069's datanode with bigger heap size [analytics]
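The run of restarts above bumps the DataNode JVM heap to 16g so the nodes can finish their block-report bootstrap. How the heap is set depends on the cluster's Puppet/hadoop-env setup; a hedged `hadoop-env.sh` fragment, assuming the heap is passed via the datanode's JVM opts variable (Hadoop 2.x naming):

```shell
# hadoop-env.sh fragment (sketch): give the DataNode JVM a 16g heap.
# The exact variable is an assumption; Hadoop 3.x uses HDFS_DATANODE_OPTS instead.
export HADOOP_DATANODE_OPTS="-Xms16g -Xmx16g ${HADOOP_DATANODE_OPTS}"
```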
13:39 <elukey> restart hdfs-datanode on analytics10[65,69] - failed to bootstrap due to issues reading datanode dirs [analytics]
13:38 <elukey> restart hdfs-datanode on an-worker1080 (test canary - not showing up in block report) [analytics]
10:04 <elukey> stop mysql replication an-coord1001 -> an-coord1002, an-coord1001 -> db1108 [analytics]
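Stopping the an-coord1001 replication streams above means pausing the replica threads on each downstream host. A dry-run sketch with the standard MariaDB statement; the host names come from the log, while running the statement over ssh with sudo is an assumption about access:

```shell
#!/bin/sh
# Pause replication on both replicas of an-coord1001 before maintenance.
# DRY_RUN=1 only prints what would run, so this is testable without the hosts.
DRY_RUN=${DRY_RUN:-1}
for replica in an-coord1002 db1108; do
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run on $replica: mysql -e 'STOP SLAVE'"
    else
        ssh "$replica" "sudo mysql -e 'STOP SLAVE'"
    fi
done
```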
08:29 <elukey> leave hdfs safemode to let distcp do its job [analytics]
08:25 <elukey> set hdfs safemode on for the Analytics cluster [analytics]
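Reading the two entries above bottom-up: safemode freezes HDFS writes so the distcp backup copies a consistent namespace, then safemode is lifted. The `hdfs dfsadmin -safemode` subcommands are the standard toggle; running them as the `hdfs` user is an assumption, and the script only prints the commands since it needs a live NameNode.

```shell
#!/bin/sh
# Safemode bracket around a distcp window (sketch).
# DRY_RUN=1 keeps this runnable without a Hadoop cluster.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" -eq 1 ]; then echo "would run: $*"; else "$@"; fi
}
run sudo -u hdfs hdfs dfsadmin -safemode enter   # freeze writes
run sudo -u hdfs hdfs dfsadmin -safemode get     # confirm the state
# ... the distcp copy of the data would run here ...
run sudo -u hdfs hdfs dfsadmin -safemode leave   # resume writes
```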
08:19 <elukey> umount /mnt/hdfs from all nodes using it [analytics]
08:16 <joal> Kill flink yarn app [analytics]
08:08 <elukey> stop jupyterhub on stat100x [analytics]
08:07 <elukey> stop hive on an-coord100[1,2] - prep step for bigtop upgrade [analytics]