analytics SAL

2022-06-23 §
13:48	<btullis>	started the namenode service on an-master1001 after failback failure	[analytics]
13:41	<btullis>	The failback didn't work again.	[analytics]
13:39	<btullis>	attempting failback of namenode service from an-master1002 to an-master1001	[analytics]
13:07	<btullis>	restarted hadoop-hdfs-namenode service on an-master1001	[analytics]
11:25	<joal>	kill oozie mediawiki-geoeditors-monthly-coord in favor of airflow job	[analytics]
08:52	<joal>	Deploy airflow	[analytics]
2022-06-22 §
20:55	<aqu>	`scap deploy -f analytics/refinery` because of a crash during `git-fat pull`	[analytics]
19:30	<aqu>	Deploying analytics/refinery	[analytics]
2022-06-21 §
14:56	<aqu>	RefineSanitize from an-launcher1002: sudo -u analytics kerberos-run-command analytics spark2-submit --class org.wikimedia.analytics.refinery.job.refine.RefineSanitize --master yarn --deploy-mode client /srv/deployment/analytics/refinery/artifacts/org/wikimedia/analytics/refinery/refinery-job-0.1.15.jar --config_file /home/aqu/refine.properties --since "2022-06-19T09:52:00+0000" --until	[analytics]
13:33	<aqu>	sudo systemctl start monitor_refine_event_sanitized_main_immediate.service on an-launcher1002	[analytics]
10:47	<btullis>	proceeding with the hadoop.roll-restart-masters cookbook	[analytics]
2022-06-20 §
07:14	<SandraEbele>	Started Airflow 3 Wikidata metrics jobs (Articleplaceholder, Reliability and SpecialEntityData metrics).	[analytics]
07:12	<SandraEbele>	Started Airflow3 Wikidata metrics jobs (Articleplaceholder, Relia)	[analytics]
07:11	<SandraEbele>	killed Oozie wikidata-articleplaceholder_metrics-coord, wikidata-reliability_metrics-coord, and wikidata-specialentitydata_metrics-coord jobs.	[analytics]
2022-06-17 §
12:35	<SandraEbele>	deployed daily airflow dag for 3 Wikidata metrics.	[analytics]
08:36	<btullis>	power cycled an-worker1109 as it was stuck with CPU soft lockups	[analytics]
2022-06-16 §
06:49	<joal>	Rerun webrequest-load-wf-upload-2022-6-15-22 after weird oozie failure	[analytics]
2022-06-15 §
14:48	<btullis>	deploying datahub 0.8.38	[analytics]
2022-06-14 §
10:48	<joal>	unpause renamed dags	[analytics]
10:44	<joal>	Deploy Airflow	[analytics]
10:12	<btullis>	manually failing back hdfs-namenode to an-master1001 after fixing typo	[analytics]
09:36	<joal>	deploy refinery onto HDFS	[analytics]
08:48	<btullis>	roll-restarting hadoop masters T310293	[analytics]
08:40	<joal>	Deploying using scap again after failure cleanup on an-launcher1002	[analytics]
07:45	<joal>	deploy refinery using scap	[analytics]
2022-06-13 §
14:00	<btullis>	restarting presto service on an-coord1001	[analytics]
13:20	<btullis>	btullis@datahubsearch1001:~$ sudo systemctl reset-failed ifup@ens13.service T273026	[analytics]
13:09	<btullis>	restarting oozie service on an-coord1001	[analytics]
12:59	<btullis>	having failed over hive to an-coord1002 10 minutes ago, I'm restarting hive services on an-coord1001	[analytics]
12:26	<btullis>	restarting hive-server2 and hive-metastore on an-coord1002	[analytics]
09:54	<joal>	rerun failed refine for network_flows_internal	[analytics]
09:54	<joal>	Rerun failed refine for mediawiki_talk_page_edit events	[analytics]
09:51	<joal>	Manually rerun webrequest_text load for hour 2022-06-13T03:00	[analytics]
07:18	<joal>	Manually rerun webrequest_text load for hour 2022-06-12T08:00	[analytics]
2022-06-10 §
17:00	<ottomata>	applied change to airflow instances to bump scheduler parsing_processes = # of cpu processors	[analytics]
08:58	<btullis>	cookbook sre.hadoop.roll-restart-workers analytics	[analytics]
2022-06-09 §
17:17	<joal>	Rerun refine for failed datasets	[analytics]
14:15	<btullis>	manually failing back HDFS namenode from an-master1002 to an-master1001	[analytics]
13:15	<btullis>	roll-restarting the hadoop masters to pick up new JRE	[analytics]
2022-06-08 §
18:06	<joal>	Restart airflow after deploy for dag reprocessing	[analytics]
18:02	<joal>	deploying Airflow dags	[analytics]
13:45	<btullis>	deploying refinery	[analytics]
2022-06-07 §
13:45	<btullis>	deploying updated eventgate images to all remaining deployments.	[analytics]
11:33	<btullis>	deployed an updated version of eventgate to eventgate-analytics-external to address the timing mis-calculation.	[analytics]
10:51	<btullis>	restart the eventlogging_to_druid_netflow-sanitization_daily service on an-launcher1002	[analytics]
2022-06-06 §
13:45	<btullis>	restarting archiva service for new JRE	[analytics]
06:31	<elukey>	restart memcached on an-tool1005 to pick up puppet settings and clear an alert in icinga	[analytics]
2022-06-05 §
03:14	<milimetric>	rerunning mw history since the last failure just looked like a fluke	[analytics]
2022-06-04 §
11:41	<joal>	Manually launch refinery-sqoop-mediawiki-production after manual fix of refinery-sqoop-mediawiki	[analytics]
11:39	<joal>	Manually sqoop enwiki:user and commonswiki:user and add _SUCCESS flag for following job to kick off	[analytics]