2021-04-15
14:56 <elukey> deploy refinery via scap - weekly train [analytics]
09:50 <elukey> roll back hue on an-tool1009 to 4.8; 4.9 still seems to have issues [analytics]
06:32 <elukey> move hue.wikimedia.org to an-tool1009 (from analytics-tool1001) [analytics]
01:36 <razzi> rebalance kafka partitions for webrequest_text partitions 21,22 [analytics]
2021-04-14
14:05 <elukey> run build/env/bin/hue migrate on an-tool1009 after the hue upgrade [analytics]
13:10 <elukey> roll back hue-next to 4.8 - issues not present in staging [analytics]
13:00 <elukey> upgrade Hue to 4.9 on an-tool1009 - hue-next.wikimedia.org [analytics]
10:02 <elukey> roll restart yarn nodemanagers on hadoop prod (graceful restart, to check whether they had entered a weird state) [analytics]
09:54 <elukey> kill long-running mediawiki-job refine application_1615988861843_166906, which was erroring out [analytics]
09:46 <elukey> kill application_1615988861843_163186 for the same reason [analytics]
09:43 <elukey> kill application_1615988861843_164387 to see if socket consumption improves [analytics]
09:14 <elukey> run "sudo kill `pgrep -f sqoop`" on an-launcher1002 to clean up old test processes still running [analytics]
2021-04-13
16:17 <razzi> rebalance kafka partitions for webrequest_text partitions 19, 20 [analytics]
13:18 <ottomata> Refine now uses refinery-job 0.1.4; RefineFailuresChecker has been removed and its function rolled into RefineMonitor - [analytics]
10:23 <hnowlan> deploying aqs with updated cassandra libraries to aqs1004 while depooled [analytics]
06:17 <elukey> kill application_1615988861843_158645 to free space on analytics1070 [analytics]
06:10 <elukey> kill application_1615988861843_158592 on analytics1061 to let the space recover (the truncate was, of course, stuck in D state) [analytics]
06:05 <elukey> truncate logs for application_1615988861843_158592 on analytics1061 - one partition full [analytics]
2021-04-12
14:21 <ottomata> stop using http proxies for produce_canary_events_job - T274951 [analytics]
2021-04-08
16:33 <elukey> reboot an-worker1100 again to check if all the disks come up correctly [analytics]
15:43 <razzi> rebalance kafka partitions for webrequest_text partitions 17, 18 [analytics]
15:35 <elukey> reboot an-worker1100 to see if it helps with the strange BBU behavior in T279475 [analytics]
14:07 <elukey> drop /var/spool/rsyslog from stat1008 - files corrupted when the root partition filled up caused rsyslog to SEGV [analytics]
11:14 <hnowlan> created aqs user and loaded full schemas into analytics wmcs cassandra [analytics]
08:35 <elukey> apt-get clean on stat1008 to free some space [analytics]
07:44 <elukey> restart hadoop hdfs masters on an-master100[1,2] to apply the new log4j settings for the audit log [analytics]
06:44 <elukey> re-deployed refinery to hadoop-test after fixing permissions on an-test-coord1001 [analytics]
2021-04-07
23:03 <ottomata> installing anaconda-wmf-2020.02~wmf5 on remaining nodes - T279480 [analytics]
22:51 <ottomata> installing anaconda-wmf-2020.02~wmf5 on stat boxes - T279480 [analytics]
22:47 <mforns> finished refinery deployment up to 1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3 [analytics]
22:39 <mforns> deployment of refinery via scap to hadoop-test failed with Permission denied: '/srv/deployment/analytics/refinery-cache/.config' (deployment to production went fine) [analytics]
21:44 <mforns> starting refinery deploy up to 1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3 [analytics]
21:26 <mforns> deployed refinery-source v0.1.4 [analytics]
21:25 <razzi> sudo apt-get install --reinstall anaconda-wmf on stat1008 [analytics]
20:15 <razzi> rebalance kafka partitions for webrequest_text partitions 15, 16 [analytics]
19:53 <ottomata> upgrade anaconda-wmf everywhere to 2020.02~wmf4 with fixes for T279480 [analytics]
14:03 <hnowlan> setting profile::aqs::git_deploy: true in aqs-test1001 hiera config [analytics]
2021-04-06
22:34 <razzi> rebalance kafka partitions for webrequest_text partitions 13, 14 [analytics]
09:37 <elukey> reimage an-coord1002 to Debian Buster [analytics]
2021-04-05
16:07 <razzi> remove old hive logs on an-coord1001: sudo rm /var/log/hive/hive-*.log.2021-02-* [analytics]
14:54 <razzi> remove empty /var/log/sqoop on an-launcher1002 (logs go in /var/log/refinery); sudo rmdir /var/log/sqoop [analytics]
14:51 <razzi> rebalance kafka partitions for webrequest_text partitions 11, 12 [analytics]
2021-04-02
16:28 <razzi> rebalance kafka partitions for webrequest_text partitions 9,10 [analytics]
16:19 <elukey> all of the Hadoop test cluster is now on Debian Buster [analytics]
07:28 <elukey> manual fix for an-worker1080's interface in netbox (xe-4/0/11), moved by mistake to public-1b [analytics]
2021-04-01
20:27 <razzi> restore superset_production from backup superset_production_1617306805.sql [analytics]
20:14 <razzi> manually run bash /srv/deployment/analytics/superset/deploy/create_virtualenv.sh as analytics_deploy on an-tool1010, since it apparently didn't run during the scap deployment [analytics]
20:01 <razzi> sudo chown -R analytics_deploy:analytics_deploy /srv/deployment/analytics/superset/venv since it's owned by root and needs to be removed upon deployment [analytics]
19:54 <razzi> dump superset production to an-coord1001.eqiad.wmnet:/home/razzi/superset_production_1617306805.sql just in case [analytics]
16:50 <razzi> rebalance kafka partitions for webrequest_text partitions 7 and 8 [analytics]
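The webrequest_text rebalances logged above move two partitions per session (7 and 8 on 2021-04-01, up through 21 and 22 on 2021-04-15). The log does not record which brokers received the replicas, so the following is only a sketch under stated assumptions: the broker IDs, the replication factor of 3, and the round-robin placement rule are all hypothetical. It builds the JSON plan format that Kafka's `kafka-reassign-partitions.sh --reassignment-json-file` accepts, for one session's pair of partitions.

```python
import json

# Hypothetical broker IDs and replication factor; the log does not say
# which brokers the webrequest_text replicas were actually moved to.
BROKERS = [1001, 1002, 1003, 1004, 1005, 1006]
REPLICATION_FACTOR = 3


def reassignment_plan(topic, partitions, brokers=BROKERS, rf=REPLICATION_FACTOR):
    """Build a reassignment plan dict for the given partitions of a topic.

    Replicas are placed round-robin: partition p starts at brokers[p % n],
    so consecutive partitions get different preferred leaders. This
    placement rule is an illustrative assumption, not the one used in the log.
    """
    n = len(brokers)
    if rf > n:
        raise ValueError("replication factor exceeds broker count")
    entries = []
    for p in partitions:
        start = p % n
        replicas = [brokers[(start + i) % n] for i in range(rf)]
        entries.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": entries}


if __name__ == "__main__":
    # One session's worth of work, e.g. partitions 21 and 22.
    plan = reassignment_plan("webrequest_text", [21, 22])
    print(json.dumps(plan, indent=2))
```

The printed plan could be saved to a file and passed to `kafka-reassign-partitions.sh --reassignment-json-file <file> --execute`. Working in small batches like this limits how much replica data is copied at once, which is a plausible (but unconfirmed) reason for the two-partitions-per-session pace.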