2021-03-01 §
21:24 <razzi> rebalance kafka partitions for webrequest_upload partition 6 [analytics]
18:14 <razzi> restart timer that wasn't running on an-worker1101: sudo systemctl restart prometheus-debian-version-textfile.timer [analytics]
17:40 <elukey> reimage an-worker1098 (GPU worker node) to Buster [analytics]
14:48 <elukey> reimage an-worker1097 (gpu node) to debian buster [analytics]
11:55 <elukey> roll restart druid broker on druid-analytics (again) to enable query cache settings (missing config due to typo) [analytics]
11:34 <elukey> roll restart historical daemons (again) on druid-analytics to remove stale config and enable (finally) segment caching. [analytics]
11:02 <elukey> roll restart druid-broker and druid-historical daemons on druid-analytics to pick up new cache settings (disable segment caching on broker and enable it on historicals) [analytics]
09:11 <elukey> restart hadoop daemons on an-worker1112 to pick up the new disk [analytics]
09:11 <elukey> remount /dev/sdl on an-worker1112 (wasn't able to make it fail) [analytics]
2021-02-26 §
16:03 <razzi> rebalance kafka partitions for webrequest_upload partition 4 [analytics]
12:33 <elukey> reimaged an-worker1096 (GPU node) to Debian buster (preserving datanode dirs) [analytics]
09:51 <elukey> reimaged analytics1058 to debian buster (preserving datanode partitions) [analytics]
07:50 <elukey> attempt to reimage analytics1058 (part of the cluster, not a new worker node) to Buster [analytics]
07:29 <elukey> added journalnode partition to all hadoop workers not having it in the Analytics cluster [analytics]
07:01 <elukey> reboot an-worker1099 to clear out kernel soft lockup errors [analytics]
06:59 <elukey> restart datanode on an-worker1099 - soft lockup kernel errors [analytics]
2021-02-25 §
17:04 <razzi> rebalance kafka partitions for webrequest_upload_3 [analytics]
13:36 <elukey> drop /srv/backup/wikistats from thorium [analytics]
13:35 <elukey> drop /srv/backup/backup_wikistats_1 from thorium [analytics]
11:14 <elukey> add an-worker111[7,8] to Analytics Hadoop (were previously backup worker nodes) [analytics]
08:50 <elukey> move analytics-privatedata/search/product to fixed gid/uid on all buster nodes (including airflow/stat100x/launcher) [analytics]
2021-02-24 §
19:16 <ottomata> service hadoop-yarn-nodemanager start on an-worker1112 [analytics]
16:03 <milimetric> deployed refinery [analytics]
14:09 <elukey> roll restart druid brokers on druid public to pick up caffeine cache settings [analytics]
14:03 <elukey> roll restart druid brokers on druid analytics to pick up caffeine cache settings [analytics]
11:08 <elukey> restart druid-broker on an-druid1001 (used by Turnilo) with caffeine cache [analytics]
09:01 <elukey> roll restart druid brokers on druid public - locked [analytics]
07:47 <elukey> change gid/uid for druid + roll restart of all druid nodes [analytics]
2021-02-23 §
21:20 <ottomata> started nodemanager on an-worker1112 [analytics]
21:15 <razzi> rebalance kafka partitions for webrequest_upload partition 2 [analytics]
19:31 <elukey> roll out new uid/gid for mapred/druid/analytics/yarn/hdfs for all buster nodes (no op for stretch) [analytics]
17:47 <elukey> change uid/gid for yarn/mapred/analytics/hdfs/druid on stat100x, an-presto100x [analytics]
15:57 <elukey> an-launcher1002's timers restored [analytics]
15:28 <elukey> stop timers on an-launcher1002 to change gid/uid for yarn/hdfs/mapred/analytics/druid and to reboot for kernel updates [analytics]
15:23 <elukey> deploy new uid/gid scheme for yarn/mapred/analytics/hdfs/druid on an-tool100[8,9] [analytics]
15:22 <elukey> deploy new uid/gid scheme for yarn/mapred/analytics/hdfs/druid on an-airflow1001, an-test* buster nodes [analytics]
15:05 <klausman> an-master1001 ~ $ sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp analytics-privatedata-users /wmf/data/raw/webrequest/webrequest_text/hourly/2021/02/22/01/webrequest* [analytics]
14:51 <elukey> drop /srv/backup-1007 on stat1008 to free space [analytics]
2021-02-22 §
19:27 <ottomata> restart oozie on an-coord1001 to pick up new spark share lib without hadoop jars - T274384 [analytics]
14:38 <ottomata> upgrade spark2 on analytics cluster to 2.4.4-bin-hadoop2.6-5~wmf0 (hadoop jars removed) - T274384 [analytics]
14:12 <ottomata> upgrade spark2 on an-coord1001 to 2.4.4-bin-hadoop2.6-5~wmf0 (hadoop jars removed), will remove and auto-re add spark-2.4.4-assembly.zip in hdfs after running puppet here [analytics]
14:07 <ottomata> upgrade spark2 on stat1004 to 2.4.4-bin-hadoop2.6-5~wmf0 (hadoop jars removed) [analytics]
09:01 <elukey> reboot stat1005/stat1008 for kernel upgrades [analytics]
2021-02-19 §
15:53 <elukey> restart oozie again to test another setting for role/admins [analytics]
15:43 <ottomata> installing spark 2.4.4 without hadoop jars on analytics test cluster - T274384 [analytics]
15:31 <elukey> restart oozie to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/665352 [analytics]
14:34 <joal> rerun mobile_apps-uniques-daily-wf-2021-2-18 [analytics]
09:16 <elukey> stop and decom the hadoop backup cluster [analytics]
2021-02-18 §
18:38 <razzi> rebalance kafka partition for webrequest_upload partition 1 [analytics]
17:27 <elukey> an-coord1002 back in service with raid1 configured [analytics]