2022-01-26
|
15:44 |
<joal> |
Kill-restart webrequest oozie job after deploy |
[analytics] |
15:40 |
<joal> |
Kill-restart edit-hourly oozie job after deploy |
[analytics] |
15:27 |
<joal> |
Deploy refinery to HDFS |
[analytics] |
15:10 |
<elukey> |
elukey@cp4036:~$ sudo systemctl restart varnishkafka-eventlogging |
[analytics] |
15:10 |
<elukey> |
elukey@cp4036:~$ sudo systemctl restart varnishkafka-statsv |
[analytics] |
15:06 |
<elukey> |
elukey@cp4035:~$ sudo systemctl restart varnishkafka-eventlogging.service - metrics showing messages stuck for a poll() |
[analytics] |
14:56 |
<elukey> |
elukey@cp4035:~$ sudo systemctl restart varnishkafka-webrequest.service - metrics showing messages stuck for a poll() |
[analytics] |
14:52 |
<joal> |
Deploy refinery with scap |
[analytics] |
10:07 |
<btullis> |
btullis@cumin1001:~$ sudo cumin 'O:cache::upload or O:cache::text' 'disable-puppet btullis-T296064-T299401' |
[analytics] |
2022-01-24
|
21:18 |
<btullis> |
btullis@deploy1002:/srv/deployment/analytics/refinery$ scap deploy -e hadoop-test -l an-test-coord1001.eqiad.wmnet |
[analytics] |
20:35 |
<btullis> |
rebooting an-test-coord1001 after recreating the /srv file system. |
[analytics] |
20:28 |
<btullis> |
root@an-test-coord1001:~# mke2fs -t ext4 -j -m 0.5 /dev/vg0/srv |
[analytics] |
19:53 |
<btullis> |
power cycled an-test-coord1001 from racadm |
[analytics] |
19:50 |
<btullis> |
rebooting an-test-coord1001 |
[analytics] |
19:19 |
<ottomata> |
kill mysqld on an-test-coord1001 - 19:19:04 [@an-test-coord1001:/etc] $ sudo kill 42433 |
[analytics] |
19:02 |
<razzi> |
razzi@an-test-coord1001:~$ sudo systemctl stop presto-server |
[analytics] |
18:23 |
<razzi> |
downtime an-coord1001 while attempting to fix /srv partition |
[analytics] |
11:48 |
<elukey> |
roll restart of kafka test brokers to pick up the new keystore/tls-certs (1y of validity) |
[analytics] |
2022-01-13
|
12:41 |
<joal> |
rerun failed instances of webrequest-load-coord |
[analytics] |
11:59 |
<btullis> |
stopped eventlogging service on eventlog1003 with 1 hour's downtime. |
[analytics] |
11:52 |
<btullis> |
Upgrading hive packages on stat1005 |
[analytics] |
11:26 |
<btullis> |
restarted hive-metastore and hive-server2 on an-coord1001 after running puppet. |
[analytics] |
11:23 |
<btullis> |
btullis@an-coord1001:~$ sudo apt install hive hive-hcatalog hive-jdbc hive-metastore hive-server2 oozie oozie-client |
[analytics] |
11:18 |
<btullis> |
btullis@an-coord1002:~$ sudo systemctl restart hive-metastore hive-server2 |
[analytics] |
09:53 |
<btullis> |
DNS change deployed, failing over hive to an-coord1002. |
[analytics] |
09:42 |
<btullis> |
btullis@an-coord1002:~$ sudo apt install hive hive-hcatalog hive-jdbc hive-metastore hive-server2 oozie-client |
[analytics] |
08:45 |
<joal> |
Kill-restart wikidata-json_entity-weekly-coord after deploy |
[analytics] |