2018-01-18
08:58 <elukey> temporarily set druid1002 in superset's druid cluster config (via UI) [analytics]
08:53 <elukey> temporarily point pivot's configuration to druid1002 (druid1001 needs to be rebooted) [analytics]
08:52 <elukey> disable druid1001's middlemanager as prep step for reboot [analytics]
07:11 <elukey> re-run webrequest-load-wf-misc-2018-1-18-3 via Hue [analytics]
2018-01-17
17:33 <elukey> killed the banner impression spark job (application_1515441536446_27293) again to force it to respawn (real time indexers not present) [analytics]
17:29 <elukey> restarted all druid overlords on druid100[123] (weird race condition messages about who was the leader for some task) [analytics]
16:24 <elukey> re-run all the pageview-druid-hourly failed jobs via Hue [analytics]
14:42 <elukey> restart druid middlemanager on druid1003 as attempt to unblock realtime streaming [analytics]
14:21 <elukey> forced kill of banner impression data streaming job to get it restarted [analytics]
11:44 <elukey> re-run pageview-druid-hourly-wf-2018-1-17-9 and pageview-druid-hourly-wf-2018-1-17-8 (failed due to druid1002's middlemanager being in a weird state after reboot) [analytics]
11:44 <elukey> restart druid middlemanager on druid1002 [analytics]
10:38 <elukey> stopped all crons on hadoop-coordinator-1 [analytics]
10:37 <elukey> re-run webrequest-druid-hourly-wf-2018-1-17-8 (failed due to druid1002's reboot) [analytics]
10:22 <elukey> reboot druid1002 for kernel upgrades [analytics]
09:53 <elukey> disable druid middlemanager on druid1002 as prep step for reboot [analytics]
09:46 <elukey> rebooted analytics1003 [analytics]
09:46 <elukey> removed upstart config for brrd on eventlog1001 (failing and spamming syslog, old leftover?) [analytics]
08:53 <elukey> disabled camus as prep step for analytics1003 reboot [analytics]
2018-01-15
13:39 <elukey> stop eventlogging and reboot eventlog1001 for kernel updates [analytics]
09:58 <elukey> rolling reboots of aqs hosts (1005->1009) for kernel updates [analytics]
09:11 <elukey> reboot aqs1004 for kernel updates [analytics]
2018-01-12
13:03 <joal> Rerun webrequest-load-wf-text-2018-1-12-9 [analytics]
13:02 <joal> Rerun webrequest-load-wf-upload-2018-1-12-9 [analytics]
10:33 <elukey> reboot analytics1066->69 for kernel updates [analytics]
09:07 <elukey> reboot analytics1063->65 for kernel updates [analytics]
2018-01-11
22:35 <ottomata> restarting kafka-jumbo brokers to apply https://gerrit.wikimedia.org/r/403774 [analytics]
22:04 <ottomata> restarting kafka-jumbo brokers to apply https://gerrit.wikimedia.org/r/#/c/403762/ [analytics]
20:57 <ottomata> restarting kafka-jumbo brokers to apply https://gerrit.wikimedia.org/r/#/c/403753/ [analytics]
17:37 <joal> Kill manual banner-streaming job to see it restarted by cron [analytics]
17:11 <ottomata> restart kafka on kafka-jumbo1003 [analytics]
17:08 <ottomata> restart kafka on kafka-jumbo1001...something is not right with my certpath change yesterday [analytics]
14:46 <joal> Deploy refinery onto HDFS [analytics]
14:33 <joal> Deploy refinery with Scap [analytics]
14:07 <joal> Manually restarting banner streaming job to prevent alerting [analytics]
13:23 <joal> Killing banner-streaming job to have it auto-restarted from cron [analytics]
11:45 <elukey> re-run webrequest-load-wf-text-2018-1-11-8 (failed due to reboots) [analytics]
11:39 <joal> rerun mediacounts-load-wf-2018-1-11-8 [analytics]
10:48 <joal> Restarting banner-streaming job after hadoop nodes reboot [analytics]
10:01 <elukey> reboot analytics1059-61 for kernel updates [analytics]
09:34 <elukey> reboot analytics1055->1058 for kernel updates [analytics]
09:04 <elukey> reboot analytics1051->1054 for kernel updates [analytics]
2018-01-10
16:56 <elukey> reboot analytics1048->50 for kernel updates [analytics]
16:23 <ottomata> restarting kafka jumbo brokers to apply java.security certpath restrictions [analytics]
11:51 <elukey> re-run webrequest-load-wf-upload-2018-1-10-10 (failed due to reboots) [analytics]
11:27 <elukey> re-run webrequest-load-wf-text-2018-1-10-10 (failed due to reboots) [analytics]
11:26 <elukey> reboot analytics1044->47 for kernel updates [analytics]
11:03 <elukey> reboot analytics1040->43 for kernel updates [analytics]
2018-01-09
16:53 <joal> Rerun pageview-druid-hourly-wf-2018-1-9-13 [analytics]
15:33 <elukey> stop mysql on dbstore1002 as prep step for shutdown (stop all slaves, mysql stop) [analytics]
15:10 <elukey> reboot analytics1028 (hadoop worker and hdfs journal node) for kernel updates [analytics]