2018-02-06 §
09:58 <elukey> applied https://gerrit.wikimedia.org/r/c/405687/ manually on deployment-eventlog02 for testing [analytics]
2018-02-05 §
15:51 <elukey> live hacked deployment-eventlog02's /srv/deployment/eventlogging/analytics/eventlogging/handlers.py to add poll(0) to the confluent kafka producer - T185291 [analytics]
11:03 <elukey> restart eventlogging/forwarder legacy-zmq on eventlog1001 due to slow memory leak over time (cached memory down to zero) [analytics]
2018-02-02 §
17:09 <joal> Webrequest upload 2018-02-02 hours 9 and 11 dataloss warnings have been checked - they are false positives [analytics]
09:56 <joal> Rerun unique_devices-per_project_family-monthly-wf-2018-1 after failure [analytics]
2018-02-01 §
17:00 <ottomata> killing stuck JsonRefine eventlogging analytics job application_1515441536446_52892, not sure why this is stuck. [analytics]
14:06 <joal> Dataloss alerts for upload 2018-02-01 hours 1, 2, 3 and 5 were false positives [analytics]
12:17 <joal> Restart cassandra monthly bundle after January deploy [analytics]
2018-01-23 §
20:10 <ottomata> hdfs dfs -chmod 775 /wmf/data/archive/mediacounts/daily/2018 for T185419 [analytics]
09:26 <joal> Dataloss warning for upload and text 2018-01-23:06 is confirmed to be a false positive [analytics]
2018-01-22 §
17:36 <joal> Kill-Restart clickstream oozie job after deploy [analytics]
17:12 <joal> deploying refinery onto HDFS [analytics]
17:12 <joal> Refinery deployed from scap [analytics]
2018-01-18 §
19:11 <joal> Kill-Restart coord_pageviews_top_bycountry_monthly oozie job from 2015-05 [analytics]
19:10 <joal> Add fake data to cassandra to silence alarms (Thanks again ema) [analytics]
18:56 <joal> Truncating table "local_group_default_T_top_bycountry"."data" in cassandra before reload [analytics]
15:21 <mforns> refinery deployment using scap and then deploying onto hdfs finished [analytics]
15:07 <mforns> starting refinery deployment [analytics]
12:43 <elukey> piwik on bohrium re-enabled [analytics]
12:40 <elukey> set piwik in readonly mode and stopped mysql on bohrium (prep step for reboot) [analytics]
09:38 <elukey> reboot thorium (analytics webserver) for security upgrade - This maintenance will cause temporary unavailability of the Analytics websites [analytics]
09:37 <elukey> resumed druid hourly index jobs via hue and restored pivot's configuration [analytics]
09:21 <elukey> reboot druid1001 for kernel upgrades [analytics]
09:00 <elukey> suspended hourly druid batch index jobs via Hue [analytics]
08:58 <elukey> temporarily set druid1002 in superset's druid cluster config (via UI) [analytics]
08:53 <elukey> temporarily point pivot's configuration to druid1002 (druid1001 needs to be rebooted) [analytics]
08:52 <elukey> disable druid1001's middlemanager as prep step for reboot [analytics]
07:11 <elukey> re-run webrequest-load-wf-misc-2018-1-18-3 via Hue [analytics]
2018-01-17 §
17:33 <elukey> killed the banner impression spark job (application_1515441536446_27293) again to force it to respawn (real time indexers not present) [analytics]
17:29 <elukey> restarted all druid overlords on druid100[123] (weird race condition messages about who was the leader for some task) [analytics]
16:24 <elukey> re-run all the pageview-druid-hourly failed jobs via Hue [analytics]
14:42 <elukey> restart druid middlemanager on druid1003 as an attempt to unblock realtime streaming [analytics]
14:21 <elukey> forced kill of banner impression data streaming job to get it restarted [analytics]
11:44 <elukey> re-run pageview-druid-hourly-wf-2018-1-17-9 and pageview-druid-hourly-wf-2018-1-17-8 (failed due to druid1002's middlemanager being in a weird state after reboot) [analytics]
11:44 <elukey> restart druid middlemanager on druid1002 [analytics]
10:38 <elukey> stopped all crons on hadoop-coordinator-1 [analytics]
10:37 <elukey> re-run webrequest-druid-hourly-wf-2018-1-17-8 (failed due to druid1002's reboot) [analytics]
10:22 <elukey> reboot druid1002 for kernel upgrades [analytics]
09:53 <elukey> disable druid middlemanager on druid1002 as prep step for reboot [analytics]
09:46 <elukey> rebooted analytics1003 [analytics]
09:46 <elukey> removed upstart config for brrd on eventlog1001 (failing and spamming syslog, old leftover?) [analytics]
08:53 <elukey> disabled camus as prep step for analytics1003 reboot [analytics]
2018-01-15 §
13:39 <elukey> stop eventlogging and reboot eventlog1001 for kernel updates [analytics]
09:58 <elukey> rolling reboots of aqs hosts (1005->1009) for kernel updates [analytics]
09:11 <elukey> reboot aqs1004 for kernel updates [analytics]
2018-01-12 §
13:03 <joal> Rerun webrequest-load-wf-text-2018-1-12-9 [analytics]
13:02 <joal> Rerun webrequest-load-wf-upload-2018-1-12-9 [analytics]
10:33 <elukey> reboot analytics1066->69 for kernel updates [analytics]
09:07 <elukey> reboot analytics1063->65 for kernel updates [analytics]
2018-01-11 §
22:35 <ottomata> restarting kafka-jumbo brokers to apply https://gerrit.wikimedia.org/r/403774 [analytics]