2021-09-07
10:25 <hnowlan> truncating data tables on aqs_next cluster [analytics]
10:12 <joal> Kill cassandra-hourly loading job for cluster-migration first step [analytics]
2021-09-03
11:43 <joal> Deploying refinery to hotfix mediarequest cassandra3 loading jobs (second) [analytics]
09:57 <joal> Deploy AQS on new AQS servers [analytics]
09:45 <joal> Kill-restart mediarequest-top cassandra loading jobs after deploy [analytics]
09:12 <joal> Rerun mediawiki-history-denormalize-wf-2021-08 after failure [analytics]
09:07 <joal> Deploying refinery to hotfix mediarequest cassandra3 loading jobs [analytics]
2021-09-01
16:44 <mforns> finished one-off deployment of refinery to fix cassandra3 loading [analytics]
15:57 <joal> Kill cassandra loading jobs and restart them after deploy [analytics]
15:55 <mforns> starting one-off deployment of refinery to fix cassandra3 loading [analytics]
13:15 <joal> Restart cassandra jobs to load cassandra3 with spark [analytics]
08:21 <joal> Rerun webrequest-load-wf-upload-2021-9-1-0 [analytics]
2021-08-31
23:25 <mforns> finished deployment of refinery (regular weekly train v0.1.17) successfully; only an-test-coord1001.eqiad.wmnet failed [analytics]
22:41 <mforns> starting deployment of refinery (regular weekly train v0.1.17) [analytics]
22:27 <mforns> Deployed refinery-source using jenkins [analytics]
10:30 <hnowlan> sudo cookbook sre.aqs.roll-restart aqs-next [analytics]
2021-08-30
06:53 <elukey> drop an-airflow1001's old airflow logs to fix root partition almost filled up [analytics]
2021-08-26
06:22 <elukey> root@an-launcher1002:/var/lib/puppet/clientbucket# find -type d -empty -delete [analytics]
06:21 <elukey> root@an-launcher1002:/var/lib/puppet/clientbucket# find -type f -mtime +60 -delete [analytics]
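A minimal sketch of this retention-style cleanup, run against a throwaway directory rather than the production clientbucket path. Note that in GNU find, `-delete` is an action evaluated in expression order, so it must come after the `-mtime` test; listed first, it deletes every file found.

```shell
# Demonstration only: a temp directory stands in for the real path.
tmp=$(mktemp -d)
touch "$tmp/new.txt" "$tmp/old.txt"
touch -d '90 days ago' "$tmp/old.txt"   # backdate one file past a 60-day cutoff
find "$tmp" -type f -mtime +60 -delete  # test first, then the -delete action
find "$tmp" -type d -empty -delete      # prune any now-empty subdirectories
```

Only the backdated file is removed; the recent one survives the cutoff.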
2021-08-25
13:40 <joal> Kill-restart pageview-monthly_dump job and 2 backfilling jobs [analytics]
13:34 <joal> Deploy refinery onto HDFS [analytics]
13:09 <joal> Deploying refinery using scap [analytics]
2021-08-24
10:30 <btullis> btullis@an-launcher1002:~$ sudo systemctl start hdfs-balancer.service [analytics]
2021-08-20
08:46 <btullis> btullis@druid1001:~$ sudo systemctl stop druid-broker druid-coordinator druid-historical druid-middlemanager druid-overlord [analytics]
2021-08-19
19:05 <razzi> razzi@deploy1002:/srv/deployment/analytics/aqs/deploy$ scap deploy "Deploy aqs 9c062f2" [analytics]
19:02 <razzi> note that the aqs-deploy repo's commit message DOES NOT include the changes of aqs in its changes list (though it has the correct SHA in the first line) [analytics]
18:26 <razzi> Beginning aqs deploy process [analytics]
17:55 <razzi> razzi@labstore1007:~$ sudo systemctl start analytics-dumps-fetch-geoeditors_dumps.service [analytics]
17:53 <razzi> sudo systemctl start analytics-dumps-fetch-geoeditors_dumps.service on labstore1006 [analytics]
2021-08-18
17:37 <btullis> on an-coord1001: MariaDB [superset_production]> update clusters set broker_host='an-druid1001.eqiad.wmnet' where cluster_name='analytics-eqiad'; [analytics]
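A hypothetical wrapper for this one-off Superset metadata fix. The table, column, and host names come from the log entry itself; the `mysql` invocation is an assumption and is left commented out so the statement can be reviewed before running.

```shell
# Values taken from the log entry above; the client call is an assumption.
new_broker='an-druid1001.eqiad.wmnet'
cluster='analytics-eqiad'
sql="UPDATE clusters SET broker_host='${new_broker}' WHERE cluster_name='${cluster}';"
echo "$sql"
# sudo mysql superset_production -e "$sql"   # hypothetical: run only on an-coord1001
```

Building the statement in a variable first makes the change auditable in the shell history before it touches the production database.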
15:08 <joal> Restart oozie jobs loading druid to use new druid-host [analytics]
08:55 <joal> Deploying refinery with scap [analytics]
2021-08-13
16:46 <elukey> cleanup /srv/discovery on stat1007 after https://gerrit.wikimedia.org/r/c/operations/puppet/+/712422 [analytics]
15:16 <milimetric> reran the other three failed jobs successfully [analytics]
14:52 <milimetric> rerunning webrequest-druid-hourly-wf-2021-8-13-13 because of failure to connect to Hive metastore [analytics]
2021-08-12
14:46 <btullis> btullis@druid1002:/etc/zookeeper/conf$ sudo systemctl disable druid-broker druid-coordinator druid-historical druid-middlemanager druid-overlord [analytics]
14:45 <btullis> btullis@druid1002:/etc/zookeeper/conf$ sudo systemctl stop druid-broker druid-coordinator druid-historical druid-middlemanager druid-overlord [analytics]
2021-08-11
19:43 <btullis> btullis@druid1003:~$ sudo systemctl stop druid-overlord && sudo systemctl disable druid-overlord [analytics]
19:41 <btullis> btullis@druid1003:~$ sudo systemctl stop druid-historical && sudo systemctl disable druid-historical [analytics]
19:40 <btullis> btullis@druid1003:~$ sudo systemctl stop druid-coordinator && sudo systemctl disable druid-coordinator [analytics]
19:37 <btullis> btullis@druid1003:~$ sudo systemctl stop druid-broker && sudo systemctl disable druid-broker [analytics]
19:30 <btullis> btullis@druid1003:~$ curl -X POST http://druid1003.eqiad.wmnet:8091/druid/worker/v1/disable [analytics]
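Read bottom-up (the log is newest-first), the druid1003 entries above drain the middlemanager worker before stopping services. A sketch of that order, using an echo wrapper so the sequence can be shown without touching a live cluster; the host, port, and endpoint are from the log entries, the wrapper is illustrative.

```shell
run() { echo "+ $*"; }   # dry-run wrapper; swap for run() { "$@"; } to execute
host=druid1003.eqiad.wmnet
# 1. Ask the middlemanager worker to stop accepting new tasks (drain).
run curl -X POST "http://${host}:8091/druid/worker/v1/disable"
# 2. Then stop and disable each remaining Druid service, as in the log.
for svc in druid-broker druid-coordinator druid-historical druid-overlord; do
  run sudo systemctl stop "$svc" && run sudo systemctl disable "$svc"
done
```

Draining first avoids killing in-flight indexing tasks when the services go down.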
12:13 <btullis> migration of zookeeper from druid1002 to an-druid1002 complete, with quorum and two synced followers. Re-enabling puppet on all druid nodes. [analytics]
09:48 <btullis> suspended the following oozie jobs in hue: webrequest-druid-hourly-coord, pageview-druid-hourly-coord, edit-hourly-druid-coord [analytics]
09:45 <btullis> btullis@an-launcher1002:~$ sudo systemctl disable eventlogging_to_druid_editattemptstep_hourly.timer eventlogging_to_druid_navigationtiming_hourly.timer eventlogging_to_druid_netflow_hourly.timer eventlogging_to_druid_prefupdate_hourly.timer [analytics]
09:21 <elukey> run "sudo find /var/log/airflow -type f -mtime +15 -delete" on an-airflow1001 to free space (root partition almost full) [analytics]
2021-08-10
17:27 <razzi> resume the following schedules in hue: edit-hourly-druid-coord, pageview-druid-hourly-coord, webrequest-druid-hourly-coord [analytics]
17:10 <razzi> sudo cookbook sre.druid.roll-restart-workers analytics (errored out) [analytics]
09:04 <btullis> btullis@an-launcher1002:~$ sudo systemctl restart eventlogging_to_druid_prefupdate_hourly.service [analytics]
09:04 <btullis> btullis@an-launcher1002:~$ sudo systemctl restart eventlogging_to_druid_netflow_daily.service [analytics]