analytics SAL

2023-09-01 §
07:43	<stevemunene>	powercycle an-worker1145.eqiad.wmnet host cpus soft lockup T345413	[analytics]
2023-08-31 §
13:02	<aqu>	Deployed refinery using scap, then deployed onto hdfs	[analytics]
12:01	<aqu>	About to deploy analytics refinery (weekly train)	[analytics]
2023-08-30 §
15:43	<stevemunene>	restart hadoop-yarn-nodemanager.service on an-worker11[29-48].eqiad.wmnet in batches of 2 with 3 minutes in between	[analytics]
14:46	<stevemunene>	restart hadoop-yarn-nodemanager.service on an-worker11[00-28].eqiad.wmnet in batches of 2 with 3 minutes in between	[analytics]
14:08	<stevemunene>	restart hadoop-yarn-nodemanager.service on an-worker10[78-99].eqiad.wmnet in batches of 2 with 3 minutes in between	[analytics]
12:41	<stevemunene>	disable puppet on an-worker1147 to test hadoop-yarn log aggregation compression algorithm. The compression was set to gzip but should have been set to gz	[analytics]
12:26	<stevemunene>	restart hadoop-yarn-nodemanager.service on an-worker1147	[analytics]
2023-08-29 §
11:01	<joal>	Update mediawiki_history_check_denormalize airflow job variables to send job-reports to both data-engineering-alerts and product-analytics	[analytics]
10:52	<joal>	Deploy airflow-dags/analytics	[analytics]
2023-08-24 §
18:20	<btullis>	attempting another failback of the hadoop namenode services	[analytics]
16:47	<btullis>	start hadoop namenode on an-master1001 after crash.	[analytics]
16:46	<btullis>	failback unsuccessful. namenode services still running on an-master1002.	[analytics]
16:43	<btullis>	going for failback of HDFS namenode service from an-master1002 to an-master1001	[analytics]
16:10	<btullis>	about to reboot an-master1001	[analytics]
16:09	<btullis>	failing over yarn resourcemanager to an-master1002	[analytics]
16:07	<btullis>	failing over hdfs namenode from an-master1001 to an-master1002	[analytics]
12:40	<btullis>	rebooting an-coord1001	[analytics]
12:08	<btullis>	failing over hive to an-coord1002 in advance of reboot of an-coord1001	[analytics]
11:24	<btullis>	btullis@cp3074:~$ sudo systemctl start varnishkafka-webrequest.service	[analytics]
2023-08-23 §
14:50	<btullis>	rebooting an-launcher1002	[analytics]
08:22	<btullis>	beginning a rolling reboot of kafka-jumbo	[analytics]
2023-08-22 §
17:24	<joal>	Redeploying refinery onto Hadoop-test to try to fix jar issue	[analytics]
14:29	<gmodena>	deploying refinery with hdfs	[analytics]
14:08	<gmodena>	deploying refinery using scap	[analytics]
13:03	<btullis>	deploying the change to the yarn log retention and compression for T342923	[analytics]
2023-08-17 §
15:12	<btullis>	failing hive back to an-coord1001 following maintenance	[analytics]
14:59	<btullis>	restarting hive-server2 and hive-metastore services on an-coord1001 after failover.	[analytics]
14:49	<btullis>	failing over hive to an-coord1002 to permit restart of hive on an-coord1001	[analytics]
09:29	<btullis>	deploying airflow-analytics	[analytics]
2023-08-16 §
17:06	<btullis>	aqs deploy completed successfully.	[analytics]
17:05	<btullis>	re-ran refine_eventlogging_analytics failed job and sent follow-up email.	[analytics]
16:52	<btullis>	deploying aqs again	[analytics]
16:43	<btullis>	deploying aqs	[analytics]
2023-08-14 §
09:27	<btullis>	rebooted an-worker1124 due to CPU lockups	[analytics]
2023-08-12 §
14:16	<btullis>	re-ran refine_event job for 'mediawiki_revision_create\|mediawiki_page_create'	[analytics]
2023-08-10 §
16:59	<btullis>	re-enabled airflow jobs on analytics_test instance	[analytics]
08:58	<btullis>	rebooting an-db1001	[analytics]
08:57	<btullis>	stopped all airflow-scheduler services	[analytics]
08:57	<btullis>	paused all dags on all airflow instances	[analytics]
2023-08-09 §
14:22	<btullis>	failing over namenode on test cluster from an-test-master1001 to an-test-master1002 after upgrade of an-test-master1002 to bullseye	[analytics]
11:31	<btullis>	I did systemctl reset-failed logrotate.service on datahubsearch1002	[analytics]
11:08	<btullis>	starting hadoop-hdfs-namenode.service on an-master1002	[analytics]
11:02	<btullis>	failing over namenode services to an-master1002 so that I can reboot an-master1001	[analytics]
09:49	<btullis>	restarted systemd-timedate service on an-worker1086	[analytics]
2023-08-07 §
17:09	<btullis>	deploying new mediawiki_history snapshot to AQS	[analytics]
2023-08-02 §
20:42	<xcollazo>	deployed latest for Airflow analytics instance.	[analytics]
19:30	<xcollazo>	deploying refinery to try and fix https://lists.wikimedia.org/hyperkitty/list/data-engineering-alerts@lists.wikimedia.org/thread/QKXYMYKMWXGRNYZ77CENA5F2EGA66QQ2/	[analytics]
12:42	<xcollazo>	Redeploy of analytics_product Airflow instance to see if it clears a Spark issue	[analytics]
2023-08-01 §
11:37	<btullis>	ran apt clean on an-tool1009 to free up disk space	[analytics]