2018-10-29 §
14:27 <ottomata> ran kafka-preferred-replica-election on kafka jumbo-eqiad cluster (this successfully rebalanced webrequest_text partition leadership) T207768 [analytics]
10:23 <joal> Kill yarn application application_1540747790951_1429 to prevent more cluster errors (eating too many resources) [analytics]
08:56 <elukey> bounce yarn resource managers to pick up new zookeeper session timeout settings [analytics]
2018-10-28 §
17:30 <elukey> restart yarn resource manager on an-master1002 to force failover to an-master1001 [analytics]
2018-10-26 §
18:33 <andrewbogott> region migration finished [analytics]
13:36 <andrewbogott> migrating project to eqiad1 [analytics]
11:49 <joal> Rerun failed oozie jobs (pageview and projectview) [analytics]
06:18 <elukey> add AAAA DNS records for aqs and matomo1001 [analytics]
05:55 <elukey> reportupdater hadoop migrated to stat1007 [analytics]
2018-10-25 §
21:06 <ottomata> bouncing eventlogging-processor client side* to pick up mysql whitelist change for ContentTranslationAbuseFilter (https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/469419/) [analytics]
18:14 <joal> Manually resume the bunch of suspended jobs (mostly from ebernhardson and chelsyx - our apologies for not noticing earlier) [analytics]
18:13 <joal> Manually copy /etc/hive/conf/hive-site.xml to hdfs:///user/hive and set permissions to 644 to allow all users to run oozie jobs [analytics]
15:36 <elukey> shutdown aqs1006 to replace one broken disk [analytics]
14:28 <elukey> upgrade druid on druid100[4-6] to Druid 0.12.3 [analytics]
14:24 <elukey> added AAAA DNS records to all the druid nodes [analytics]
10:36 <joal> Resuming oozie webrequest and pageview druid hourly indexation jobs [analytics]
10:35 <elukey> upgraded Druid on druid100[1-3] to 0.12.3-1 [analytics]
09:16 <elukey> upgrade turnilo to 1.8.1 [analytics]
08:56 <elukey> restart hive-server on an-coord1001 to pick up new prometheus settings [analytics]
08:10 <joal> Suspend webrequest-druid-hourly and pageview-druid-hourly oozie jobs [analytics]
07:52 <joal> Manually add za.wikimedia to pageview-whitelist (patch merged: https://gerrit.wikimedia.org/r/469557) [analytics]
2018-10-23 §
16:25 <ottomata> altering topic eventlogging_ReadingDepth to increase partitions from 1 to 12 [analytics]
06:42 <elukey> restart yarn and hdfs daemon on analytics1068 to pick up correct config (the host was down since before we swapped the Hadoop masters due to hw failures) [analytics]
2018-10-22 §
17:24 <elukey> upgraded camus jar version in an-coord1001's crontab (via puppet) [analytics]
17:21 <elukey> deploy refinery to hdfs (via stat1005) [analytics]
17:12 <elukey> deploy refinery (new version of camus) [analytics]
15:09 <mforns> Finished deployment of refinery using scap and refinery-deploy-to-hdfs [analytics]
14:51 <mforns> Starting deployment of refinery using scap and refinery-deploy-to-hdfs [analytics]
14:50 <mforns> Finished deployment of refinery-source using jenkins [analytics]
14:24 <mforns> Starting deployment of refinery-source using jenkins [analytics]
2018-10-16 §
12:32 <joal> rerun pageview-hourly-wf-2018-10-15-17 [analytics]
2018-10-15 §
19:45 <mforns> Finished refinery deployment with scap and refinery-deploy-to-hdfs [analytics]
19:10 <mforns> Started refinery deployment with scap and refinery-deploy-to-hdfs [analytics]
19:09 <mforns> Finished refinery-source deployment [analytics]
18:42 <mforns> Started refinery-source deployment [analytics]
15:20 <mforns> Finished refinery deployment with scap and refinery-deploy-to-hdfs [analytics]
14:52 <mforns> Started refinery deployment with scap [analytics]
14:47 <mforns> Finished refinery-source deployment [analytics]
14:19 <mforns> Started refinery-source deployment [analytics]
14:05 <elukey> swapped cobalt's ip with gerrit.wikimedia.org's one in analytics-in(4|6) firewall filters on the eqiad routers for https://phabricator.wikimedia.org/T206331#4666622. This should not cause git pulls to fail but let me know in case it does. [analytics]
2018-10-14 §
09:15 <elukey> restart yarn resource manager on an-coord1002 (failover happened due to jvm issues) [analytics]
09:15 <elukey> restart apps-session-metrics with spark 2.3.1 oozie libs (modified the coordinator.properties file manually on disk) [analytics]
2018-10-12 §
07:32 <elukey> cleaned up all september files from eventlog1002's srv el archive to free some space (disk alerts) [analytics]
2018-10-11 §
14:20 <elukey> reboot eventlog1002 for kernel upgrades [analytics]
2018-10-10 §
19:27 <joal> Restart webrequest-load oozie bundle [analytics]
18:23 <joal> kill Webrequest-load bundle [analytics]
18:04 <joal> Kill webrequest-load-coord-upload [analytics]
07:23 <elukey> add ipv6 mapped addresses (and DNS PTRs) to analytics-tools* [analytics]
07:23 <joal> Full restart of browser-general oozie job [analytics]
07:19 <joal> patch mediacount-archive job in prod [analytics]