2022-01-26 §
15:27 <joal> Deploy refinery to HDFS [analytics]
15:10 <elukey> elukey@cp4036:~$ sudo systemctl restart varnishkafka-eventlogging [analytics]
15:10 <elukey> elukey@cp4036:~$ sudo systemctl restart varnishkafka-statsv [analytics]
15:06 <elukey> elukey@cp4035:~$ sudo systemctl restart varnishkafka-eventlogging.service - metrics showing messages stuck for a poll() [analytics]
14:56 <elukey> elukey@cp4035:~$ sudo systemctl restart varnishkafka-webrequest.service - metrics showing messages stuck for a poll() [analytics]
14:52 <joal> Deploy refinery with scap [analytics]
10:07 <btullis> btullis@cumin1001:~$ sudo cumin 'O:cache::upload or O:cache::text' 'disable-puppet btullis-T296064-T299401' [analytics]
2022-01-25 §
19:46 <ottomata> removing hdfs druid deep storage from test cluster [analytics]
19:37 <ottomata> resetting test cluster druid via druid reset-cluster https://druid.apache.org/docs/latest/operations/reset-cluster.html - T299930 [analytics]
14:30 <ottomata> stopping services on an-test-coord1001 - T299930 [analytics]
14:29 <ottomata> stopping druid* on an-test-druid1001 - T299930 [analytics]
11:30 <btullis> pooled aqs1011 T298516 [analytics]
11:29 <btullis> btullis@puppetmaster1001:~$ sudo -i confctl select name=aqs1011.eqiad.wmnet set/pooled=yes [analytics]
2022-01-24 §
21:18 <btullis> btullis@deploy1002:/srv/deployment/analytics/refinery$ scap deploy -e hadoop-test -l an-test-coord1001.eqiad.wmnet [analytics]
20:35 <btullis> rebooting an-test-coord1001 after recreating the /srv/file system. [analytics]
20:28 <btullis> root@an-test-coord1001:~# mke2fs -t ext4 -j -m 0.5 /dev/vg0/srv [analytics]
19:53 <btullis> power cycled an-test-coord1001 from racadm [analytics]
19:50 <btullis> rebooting an-test-coord1001 [analytics]
19:19 <ottomata> kill mysqld on an-test-coord1001 - 19:19:04 [@an-test-coord1001:/etc] $ sudo kill 42433 [analytics]
19:02 <razzi> razzi@an-test-coord1001:~$ sudo systemctl stop presto-server [analytics]
18:23 <razzi> downtime an-coord1001 while attempting to fix /srv partition [analytics]
11:48 <elukey> roll restart of kafka test brokers to pick up the new keystore/tls-certs (1y of validity) [analytics]
2022-01-22 §
08:36 <elukey> `apt-get clean` on an-test-coord1001 to free some space [analytics]
2022-01-21 §
01:03 <milimetric> rerunning the eventlogging_to_druid_network_flows_internal-sanitization_daily timer that failed to get logs [analytics]
2022-01-20 §
11:58 <btullis> re-enabled puppet on all hive nodes, deploying the updated log4j configuration for parquet [analytics]
11:36 <btullis> temporarily disabling puppet on servers with hive installed T297734 [analytics]
07:49 <joal> Rerun failed webrequest jobs (text and upload, 2022-01-19T19:00) [analytics]
2022-01-19 §
15:44 <ottomata> installing anaconda-wmf_2020.02~wmf6_amd64.deb on all analytics cluster nodes. - T292699 [analytics]
14:00 <ottomata> installing anaconda-wmf_2020.02~wmf6_amd64.deb on stat1004 - T292699 [analytics]
2022-01-17 §
07:19 <elukey> launch webrequest bundle from 2022-01-16T01:00 (first hour missing for text) - 0003712-220113112502223-oozie-oozi-B [analytics]
07:17 <elukey> kill webrequest bundle, text coordinator failed (logs/info/etc.. https://hue.wikimedia.org/hue/jobbrowser/#!id=0024621-210701181527401-oozie-oozi-B) [analytics]
07:13 <elukey> umount/mount /mnt/hdfs on an-coord1001 to pick up java upgrades [analytics]
2022-01-16 §
16:43 <elukey> `elukey@an-launcher1002:~$ sudo systemctl reset-failed eventlogging_to_druid_network_internal_flows-sanitization_daily.service eventlogging_to_druid_network_internal_flows_daily.service eventlogging_to_druid_network_internal_flows_hourly.service` [analytics]
2022-01-13 §
12:41 <joal> rerun failed instances of webrequest-load-coord [analytics]
11:59 <btullis> stopped eventlogging service on eventlog1003 with 1 hour's downtime. [analytics]
11:52 <btullis> Upgrading hive packages on stat1005 [analytics]
11:26 <btullis> restarted hive-metastore and hive-server2 on an-coord1001 after running puppet. [analytics]
11:23 <btullis> btullis@an-coord1001:~$ sudo apt install hive hive-hcatalog hive-jdbc hive-metastore hive-server2 oozie oozie-client [analytics]
11:18 <btullis> btullis@an-coord1002:~$ sudo systemctl restart hive-metastore hive-server2 [analytics]
09:53 <btullis> DNS change deployed, failing over hive to an-coord1002. [analytics]
09:42 <btullis> btullis@an-coord1002:~$ sudo apt install hive hive-hcatalog hive-jdbc hive-metastore hive-server2 oozie-client [analytics]
08:45 <joal> Kill-restart wikidata-json_entity-weekly-coord after deploy [analytics]
2022-01-12 §
21:13 <joal> Deploying refinery to HDFS [analytics]
20:46 <joal> Deploying refinery with scap [analytics]
20:35 <joal> refinery-source v0.1.24 released on archiva [analytics]
11:25 <elukey> move kafka-jumbo nodes to fixed kafka uid/gid [analytics]
07:46 <elukey> `systemctl reset-failed product-analytics-movement-metrics.service` on stat1007 [analytics]
2022-01-10 §
13:56 <btullis> Upgrading oozie packages on an-test-coord1001 to test new log4j versions [analytics]
2022-01-08 §
10:51 <elukey> start hive-server2 on an-coord1002 - failed to connect to the metastore due to restart [analytics]
10:41 <elukey> restart hive daemons on an-coord1002 (after my last upgrade/rollback of packages the prometheus agent settings were not picked up, so no metrics) [analytics]