analytics SAL

2023-03-08 §
09:59	<nfraison>	restart namenode in an-master1001 (standby in prod) to take into account new quota init threads setting	[analytics]
09:53	<nfraison>	restart namenode in an-test-master1002 to take into account new quota init threads setting	[analytics]
09:52	<nfraison>	failover namenode in test from an-test-master1002-eqiad-wmnet to an-test-master1001-eqiad-wmnet	[analytics]
09:47	<nfraison>	restart namenode in an-test-master1001 to take into account new quota init threads setting	[analytics]
09:36	<nfraison>	restart test hiveserver2: T303168	[analytics]
09:13	<nfraison>	restart prod resourcemanager to take into account new dedicated exclude file	[analytics]
08:58	<nfraison>	restart test resourcemanager to take into account new dedicated exclude file	[analytics]
07:56	<nfraison>	restart prod jobhistory to take into account: https://gerrit.wikimedia.org/r/c/operations/puppet/+/894481	[analytics]
07:47	<nfraison>	restart test jobhistory to take into account: https://gerrit.wikimedia.org/r/c/operations/puppet/+/894481	[analytics]
2023-03-07 §
22:03	<mforns>	deployed airflow analytics again to try and fix druid_load_edit_hourly	[analytics]
16:55	<xcollazo>	deployed image-suggestions hotfix to platform_eng Airflow instance. See https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/262.	[analytics]
15:23	<btullis>	re-enabling ingestion via gobblin.	[analytics]
14:59	<nfraison>	force startup of nodemanager on analytics_cluster	[analytics]
14:58	<btullis>	pooled druid1004	[analytics]
14:57	<btullis>	pooling aqs1010 and aqs1016	[analytics]
14:56	<btullis>	pooling datahubsearch1001	[analytics]
14:53	<btullis>	leaving safe mode on hdfs	[analytics]
13:59	<btullis>	disabled puppet temporarily on an-master100[1-2] to avoid an automatic restart of yarn	[analytics]
13:57	<btullis>	stopped `hadoop-yarn-resourcemanager.service` on both an-master100[1-2]	[analytics]
13:54	<btullis>	entering safe mode with `sudo -u hdfs kerberos-run-command hdfs hadoop dfsadmin -safemode enter` on an-master1002	[analytics]
12:57	<btullis>	depooled druid1004 for T329073	[analytics]
12:56	<btullis>	depooled datahubsearch1001 for T329073	[analytics]
12:51	<btullis>	disabled gobblin timers on an-launcher1002	[analytics]
12:46	<btullis>	depooling aqs1016 for T329073	[analytics]
12:45	<btullis>	depooling aqs1010 for T329073	[analytics]
08:00	<nfraison>	Reimage an-conf1003 to upgrade to bullseye T329362	[analytics]
2023-03-06 §
23:12	<mforns>	deployed airflow analytics to unbreak druid-load-edit-hourly	[analytics]
15:26	<mforns>	deployed airflow analytics to unbreak druid-load-edit-hourly	[analytics]
13:53	<btullis>	failing over the production hadoop cluster namenode service to an-master1002	[analytics]
13:17	<btullis>	failing over analytics test cluster namenode service to an-test-master1002 T329073	[analytics]
12:26	<nfraison>	Reimage an-conf1002 to upgrade to bullseye T329362	[analytics]
10:15	<ottomata>	deploy mediawiki_history_reduced_2023_02 snapshot to AQS	[analytics]
09:23	<nfraison>	Reimage an-conf1001 to upgrade to bullseye T329362	[analytics]
2023-03-03 §
16:48	<xcollazo>	Deleted snapshot=2023-02-20 for tables image_suggestions_search_index_full, image_suggestions_search_index_delta, image_suggestions_lead_image_data and image_suggestions_wikidata_data from the analytics_platform_eng schema. This data will be regenerated. See https://phabricator.wikimedia.org/T330688.	[analytics]
15:53	<mforns>	deployed airflow analytics to unbreak edit_hourly_dag	[analytics]
15:44	<xcollazo>	Deploying latest image_suggestions DAG on platform_eng Airflow instance	[analytics]
07:29	<elukey>	truncate /var/log/auth.log.1 on krb1001 to free space (root partition almost filled up)	[analytics]
2023-03-02 §
13:27	<nfraison>	airflow on an-test-client1001 is migrated to version 2.5.1	[analytics]
12:32	<joal>	Rerun mediawiki-history-denormalize-wf-2023-02	[analytics]
10:00	<btullis>	commencing second attempt to upgrade airflow on an-test-client1001 to version 2.5.1	[analytics]
2023-03-01 §
22:45	<mforns>	re-deployed airflow analytics with some forgotten changes	[analytics]
22:42	<mforns>	deployed Airflow analytics	[analytics]
22:30	<mforns>	finished refinery deployment, although didn't manage to run refinery-deploy-to-hdfs without warnings...	[analytics]
21:48	<mforns>	kill edit-hourly-coord in Hue to migrate it to Airflow	[analytics]
21:26	<mforns>	starting refinery deploy	[analytics]
19:38	<SandraEbele>	rerunning webrequest load text for 2023-03-01-08 hour.	[analytics]
18:54	<joal>	Create empty partitions in event.mediawiki_page_move table for codfw datacenter from beginning of week (2023-02-27T00 -> 2023-02-28T13)	[analytics]
10:25	<nfraison>	rebooting an-worker1132, which is slower than other nodes (potential issue with raid card/disks)	[analytics]
07:59	<nfraison>	restarted hiveserver2 in analytics-test to take into account -XX:MaxMetaspaceSize=512m JVM parameter	[analytics]
2023-02-28 §
21:33	<xcollazo>	Deploying section_image_recommendations DAG to platform_eng Airflow instance	[analytics]