2021-04-19 §
06:50 <elukey> move /var/lib/hadoop/name partition under /srv/hadoop/name on an-master1001 - T265126 [analytics]
05:45 <elukey> cleanup Lex's jupyter notebooks on stat1007 to allow puppet to clean up [analytics]
2021-04-18 §
07:25 <elukey> run "PURGE BINARY LOGS BEFORE '2021-04-11 00:00:00';" on an-coord1001 to free some space - T280367 [analytics]
2021-04-16 §
15:14 <elukey> execute PURGE BINARY LOGS BEFORE '2021-04-09 00:00:00'; on an-coord1001 to free space for /var/lib/mysql - T280367 [analytics]
15:13 <elukey> execute PURGE BINARY LOGS BEFORE '2021-04-09 00:00:00'; [analytics]
07:54 <elukey> drop all the cloudera packages from our repositories [analytics]
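The `PURGE BINARY LOGS` entries above all follow the same pattern: drop binlogs older than a rolling cutoff to free space under /var/lib/mysql. A minimal sketch of how such a statement could be generated; the retention window parameter is an assumption (the log shows cutoffs chosen ad hoc by hand):

```python
from datetime import datetime, timedelta

def purge_statement(retention_days, now=None):
    """Build a PURGE BINARY LOGS statement keeping a rolling
    window of `retention_days` days (hypothetical helper)."""
    now = now or datetime.utcnow()
    cutoff = (now - timedelta(days=retention_days)).strftime("%Y-%m-%d 00:00:00")
    return f"PURGE BINARY LOGS BEFORE '{cutoff}';"

print(purge_statement(7, datetime(2021, 4, 18)))
# → PURGE BINARY LOGS BEFORE '2021-04-11 00:00:00';
```

With a 7-day window and the 2021-04-18 run date, this reproduces the exact statement logged above for an-coord1001.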
2021-04-15 §
21:13 <razzi> rebalance kafka partitions for webrequest_text partition 23 [analytics]
14:56 <elukey> deploy refinery via scap - weekly train [analytics]
09:50 <elukey> rollback hue on an-tool1009 to 4.8, it seems that 4.9 still has issues [analytics]
06:32 <elukey> move hue.wikimedia.org to an-tool1009 (from analytics-tool1001) [analytics]
01:36 <razzi> rebalance kafka partitions for webrequest_text partitions 21,22 [analytics]
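The recurring "rebalance kafka partitions" entries in this period work through the webrequest_text partitions a couple at a time. Kafka's partition reassignment is driven by a JSON file; a sketch of what such a file could look like for the two partitions above — the broker ids and replica placement are purely illustrative, not the actual cluster layout:

```json
{
  "version": 1,
  "partitions": [
    {"topic": "webrequest_text", "partition": 21, "replicas": [1001, 1002, 1003]},
    {"topic": "webrequest_text", "partition": 22, "replicas": [1002, 1003, 1004]}
  ]
}
```

A file like this is passed to Kafka's `kafka-reassign-partitions.sh` tool with `--reassignment-json-file` and `--execute`, and the move is checked afterwards with `--verify`.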
2021-04-14 §
14:05 <elukey> run build/env/bin/hue migrate on an-tool1009 after the hue upgrade [analytics]
13:10 <elukey> rollback hue-next to 4.8 - issues not present in staging [analytics]
13:00 <elukey> upgrade Hue to 4.9 on an-tool1009 - hue-next.wikimedia.org [analytics]
10:02 <elukey> roll restart yarn nodemanagers on hadoop prod (graceful restart, to check whether they entered a weird state) [analytics]
09:54 <elukey> kill long running mediawiki-job refine erroring out application_1615988861843_166906 [analytics]
09:46 <elukey> kill application_1615988861843_163186 for the same reason [analytics]
09:43 <elukey> kill application_1615988861843_164387 to see if any improvement to socket consumption is made [analytics]
09:14 <elukey> run "sudo kill `pgrep -f sqoop`" on an-launcher1002 to clean up old test processes still running [analytics]
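The cleanup above relies on `pgrep -f`, which matches the pattern against the full command line rather than just the executable name — that is what lets it catch every stale sqoop process regardless of how it was launched. A self-contained illustration using a throwaway `sleep` in place of the sqoop processes:

```shell
# Launch a throwaway long-running process to stand in for the
# stale sqoop processes in the entry above
sleep 300 &
SLEEP_PID=$!

# pgrep -f matches the whole command line; plain pgrep would only
# match the process name. Our sleep's pid must be among the matches.
pgrep -f 'sleep 300' | grep -qw "$SLEEP_PID"

# The log entry killed every match; here we scope the kill to our
# own pid for safety
kill "$SLEEP_PID"
```

The production one-liner (`sudo kill `pgrep -f sqoop``) kills every match at once, so the pattern needs to be specific enough not to catch unrelated processes.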
2021-04-13 §
16:17 <razzi> rebalance kafka partitions for webrequest_text partitions 19, 20 [analytics]
13:18 <ottomata> Refine now uses refinery-job 0.1.4; RefineFailuresChecker has been removed and its function rolled into RefineMonitor - [analytics]
10:23 <hnowlan> deploying aqs with updated cassandra libraries to aqs1004 while depooled [analytics]
06:17 <elukey> kill application application_1615988861843_158645 to free space on analytics1070 [analytics]
06:10 <elukey> kill application_1615988861843_158592 on analytics1061 to allow space to recover (truncate of course in D state) [analytics]
06:05 <elukey> truncate logs for application_1615988861843_158592 on analytics1061 - one partition full [analytics]
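Truncating (rather than deleting) the container logs above is the right move when a partition fills up while a process still holds the file open: `truncate -s 0` zeroes the file in place so the blocks are reclaimed immediately, whereas `rm` leaves them allocated until the last file descriptor closes. A small demonstration with an illustrative path:

```shell
# Create a 1 MiB stand-in for a runaway container log
LOG=/tmp/demo-container.log
head -c 1048576 /dev/zero > "$LOG"

# Zero it in place: the inode (and any open file descriptors on it)
# survive, but the space is freed immediately
truncate -s 0 "$LOG"
```

The same reasoning explains the follow-up kills in the entries above: a writer stuck in D state can keep regrowing the file, so freeing the partition for good also meant killing the application.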
2021-04-12 §
14:21 <ottomata> stop using http proxies for produce_canary_events_job - T274951 [analytics]
2021-04-08 §
16:33 <elukey> reboot an-worker1100 again to check if all the disks come up correctly [analytics]
15:43 <razzi> rebalance kafka partitions for webrequest_text partitions 17, 18 [analytics]
15:35 <elukey> reboot an-worker1100 to see if it helps with the strange BBU behavior in T279475 [analytics]
14:07 <elukey> drop /var/spool/rsyslog from stat1008 - corrupted files due to root partition filled up caused a SEGV for rsyslog [analytics]
11:14 <hnowlan> created aqs user and loaded full schemas into analytics wmcs cassandra [analytics]
08:35 <elukey> apt-get clean on stat1008 to free some space [analytics]
07:44 <elukey> restart hadoop hdfs masters on an-master100[1,2] to apply the new log4j settings for the audit log [analytics]
06:44 <elukey> re-deployed refinery to hadoop-test after fixing permissions on an-test-coord1001 [analytics]
2021-04-07 §
23:03 <ottomata> installing anaconda-wmf-2020.02~wmf5 on remaining nodes - T279480 [analytics]
22:51 <ottomata> installing anaconda-wmf-2020.02~wmf5 on stat boxes - T279480 [analytics]
22:47 <mforns> finished refinery deployment up to 1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3 [analytics]
22:39 <mforns> deployment of refinery via scap to hadoop-test failed with Permission denied: '/srv/deployment/analytics/refinery-cache/.config' (deployment to production went fine) [analytics]
21:44 <mforns> starting refinery deploy up to 1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3 [analytics]
21:26 <mforns> deployed refinery-source v0.1.4 [analytics]
21:25 <razzi> sudo apt-get install --reinstall anaconda-wmf on stat1008 [analytics]
20:15 <razzi> rebalance kafka partitions for webrequest_text partitions 15, 16 [analytics]
19:53 <ottomata> upgrade anaconda-wmf everywhere to 2020.02~wmf4 with fixes for T279480 [analytics]
14:03 <hnowlan> setting profile::aqs::git_deploy: true in aqs-test1001 hiera config [analytics]
2021-04-06 §
22:34 <razzi> rebalance kafka partitions for webrequest_text partitions 13, 14 [analytics]
09:37 <elukey> reimage an-coord1002 to Debian Buster [analytics]
2021-04-05 §
16:07 <razzi> remove old hive logs on an-coord1001: sudo rm /var/log/hive/hive-*.log.2021-02-* [analytics]
14:54 <razzi> remove empty /var/log/sqoop on an-launcher1002 (logs go in /var/log/refinery); sudo rmdir /var/log/sqoop [analytics]
14:51 <razzi> rebalance kafka partitions for webrequest_text partitions 11, 12 [analytics]
2021-04-02 §
16:28 <razzi> rebalance kafka partitions for webrequest_text partitions 9,10 [analytics]