analytics SAL

2024-04-18 §
17:42	<joal>	Rerun canary-events on previous hour to test patch	[analytics]
17:37	<joal>	Deploy airflow for canary-event scaling	[analytics]
16:57	<btullis>	switching matomo service from matomo1002 to matomo1003	[analytics]
14:07	<btullis>	restarted the hadoop-yarn-resourcemanager.service on an-master100[3-4] to pick up new queue settings for T361499	[analytics]
11:41	<btullis>	adding new 'launchers' yarn queue and renaming 'fifo' to 'gpus' for T361499	[analytics]
09:30	<mforns>	finished refinery deployment for commons impact metrics changes (0.2.36)	[analytics]
08:10	<mforns>	starting refinery deployment for commons impact metrics changes (0.2.36)	[analytics]
2024-04-17 §
21:47	<mforns>	don't have time to deploy refinery today, will do it tomorrow first thing	[analytics]
21:40	<mforns>	Deployed refinery-source using jenkins	[analytics]
08:40	<aqu>	Deployed refinery using scap, then deployed onto hdfs	[analytics]
08:00	<stevemunene>	enable puppet on an-test-client1002, done testing new conda analytics deb T362648	[analytics]
07:39	<aqu>	analytics/refinery deploy begin (added source jars 0.2.35)	[analytics]
07:37	<stevemunene>	disable puppet on an-test-client1002 to test new conda analytics deb T362648	[analytics]
2024-04-16 §
20:08	<aqu>	Weekly deploy of refinery using scap, then deployed onto hdfs	[analytics]
15:00	<btullis>	kicked off a rolling restart of the hadoop worker datanode and nodemanager processes for T356382	[analytics]
14:40	<btullis>	failed back HDFS namenode from an-master1004 to an-master1003.	[analytics]
11:02	<stevemunene>	upgrade datahub to v0.12.1 T361688	[analytics]
09:16	<btullis>	restarting mapreduce history service on an-master1003 for T356382	[analytics]
2024-04-15 §
11:05	<btullis>	sudo systemctl start hadoop-hdfs-namenode.service on an-master1003 after failed failback operation.	[analytics]
10:45	<btullis>	roll-restarting hadoop masters on the prod cluster for T356382	[analytics]
08:54	<btullis>	roll-restarting hadoop masters on test cluster for T356382	[analytics]
08:36	<btullis>	roll-restarting druid on test cluster for T356382	[analytics]
2024-04-11 §
15:25	<btullis>	restarting hive-server2 and hive-metastore on an-test-coord1001 for T356382	[analytics]
14:10	<elukey>	move cassandra instances on aqs1010 to PKI TLS certs	[analytics]
12:21	<btullis>	deploying editor-analytics with the new aqs-http-gateway chart	[analytics]
2024-04-09 §
13:20	<btullis>	shut down stat1010 to have the GPU power connected for T336040	[analytics]
12:56	<gmodena>	successfully deployed refinery to hadoop and hadoop-test	[analytics]
12:06	<gmodena>	starting a refinery deployment for 2024-04-09	[analytics]
2024-04-08 §
15:43	<btullis>	decommissioning dumpsdata1002 for T362065	[analytics]
15:25	<btullis>	decommissioning dumpsdata1001	[analytics]
12:00	<btullis>	rebooting stat1011 due to unresponsiveness	[analytics]
2024-04-03 §
11:46	<stevemunene>	disable puppet on `an-test-client1002` to test new conda-analytics version T356231	[analytics]
2024-03-28 §
18:04	<btullis>	deploying refinery to HDFS.	[analytics]
16:22	<btullis>	deploying refinery to test the git-lfs integration with scap for T328472	[analytics]
15:00	<elukey>	remove GPU labels in Hadoop Yarn for an-worker[1096-1099] (the hosts don't have a GPU anymore) - T361225	[analytics]
2024-03-27 §
15:14	<brouberol>	decommissioning an-tool1009 now that hue is fully offline - T341895	[analytics]
15:02	<brouberol>	dropping the hue.wikimedia.org CNAME - T341895	[analytics]
2024-03-25 §
15:02	<btullis>	updating the ssl_provider for eventstreams schema servers to cfssl for T360412	[analytics]
2024-03-22 §
13:17	<elukey>	`elukey@cumin1002:~$ sudo cumin 'stat100[4,5,8,9]*' 'kill `pgrep -u kcv-wikimf`'` to unblock puppet on various stat nodes	[analytics]
10:44	<btullis>	shut down an-worker1168 to investigate disk controller failure for T360594	[analytics]
2024-03-20 §
10:50	<brouberol>	superset.wikimedia.org is now migrated to the DSE k8s cluster, CAS errors have receded	[analytics]
10:20	<brouberol>	migrating superset to Kubernetes. Some CAS errors are expected during ~15 minutes	[analytics]
2024-03-07 §
14:01	<btullis>	deploying updated mediawiki_history_reduced snapshots to AQS 2.0	[analytics]
2024-03-04 §
12:22	<btullis>	restarting hive-server2 and hive-metastore service on an-coord1003	[analytics]
12:00	<btullis>	migrating analytics-hive from an-coord1003 to an-coord1004 with https://gerrit.wikimedia.org/r/c/operations/dns/+/1008414	[analytics]
10:32	<btullis>	restart hive-server2 and hive-metastore service on an-coord1004	[analytics]
2024-02-29 §
14:06	<btullis>	sudo systemctl reset-failed refinery-sqoop-whole-mediawiki.service	[analytics]
09:59	<joal>	Deploying refinery with scap (fix sqoop for tomorrow)	[analytics]
09:25	<brouberol>	decommissioning an-tool1005 now that superset-next is migrated to k8s - T358706	[analytics]
2024-02-28 §
11:08	<btullis>	reimaging dbstore1007 to bookworm for T356961	[analytics]