2022-04-25 §
09:39 <btullis> failover to an-master1002 successful at 3rd attempt [analytics]
09:30 <btullis> 2nd attempt to switch HDFS services to an-master1002 [analytics]
09:13 <btullis> switching HDFS services to an-master1002 [analytics]
08:53 <btullis> rebooting an-master1002 - T304938 [analytics]
2022-04-23 §
09:38 <elukey> `apt-get clean` on an-airflow1001 to free some space [analytics]
2022-04-21 §
22:26 <mforns> killed browser_general oozie job and started corresponding airflow job [analytics]
2022-04-13 §
16:40 <razzi> reboot an-launcher1002 for security updates [analytics]
2022-04-12 §
22:12 <milimetric> deployed and synced refinery-source 0.1.26 to hdfs [analytics]
2022-04-11 §
12:35 <aqu> About to deploy analytics/refinery "Migrate mediarequest hourly from Oozie to Airflow" (replace previous msg) [analytics]
12:35 <aqu> About to deploy refinery/source "Migrate mediarequest hourly from Oozie to Airflow" [analytics]
2022-04-06 §
20:53 <razzi> roll restart aqs to deploy new mediawiki history snapshot [analytics]
15:51 <mforns> deployed airflow to analytics (big refactor) [analytics]
15:23 <mforns> deployed Airflow to analytics_test (big refactor) [analytics]
09:18 <btullis> restarted eventlogging_to_druid_netflow_hourly on an-launcher1002 [analytics]
2022-04-05 §
20:41 <razzi> deploying refinery for https://gerrit.wikimedia.org/r/c/analytics/refinery/+/776269/ [analytics]
15:54 <razzi> razzi@cumin1001:~$ sudo cookbook sre.hosts.reimage --os bullseye -t T299481 dbstore1005 [analytics]
15:10 <razzi> razzi@cumin1001:~$ sudo cookbook sre.hosts.reimage --os bullseye -t T299481 dbstore1003 [analytics]
15:02 <razzi> set dbstore1003.eqiad.wmnet to downtime for upgrade T299481 [analytics]
15:01 <razzi> set dbstore1003.eqiad.wmnet to downtime for upgrade [analytics]
2022-04-01 §
09:05 <btullis> restarted varnishkafka-eventlogging.service on cp3050 T300246 [analytics]
2022-03-29 §
20:08 <joal> rerun cassandra editors_bycountry_monthly for month 2022-02 [analytics]
20:08 <mforns> restarted webrequest bundle [analytics]
19:57 <mforns> restarted mediawiki-geoeditors-public_monthly-coord [analytics]
19:56 <mforns> finished refinery deployment (regular weekly train) scap and hdfs [analytics]
19:53 <joal> Add new columns to wmf.webrequest (high entropy CH-UA) [analytics]
19:16 <joal> Drop/recreate wmf_raw.webrequest for schema change (high-entropy CH-UA) [analytics]
19:13 <mforns> starting refinery deployment (regular weekly train) [analytics]
19:11 <joal> kill webrequest-load oozie bundle for webrequest schema change [analytics]
17:13 <razzi> razzi@cumin1001:~$ sudo cookbook sre.hosts.downtime an-tool1005.eqiad.wmnet -D 1 -r 'Testing deploy of superset 1.4.2 to staging' [analytics]
15:38 <ntsako> Stopped geoeditor Airflow DAGs to check on data quality [analytics]
14:13 <btullis> correction: restarted hadoop-yarn-nodemanager.service on an-worker1128 [analytics]
14:13 <btullis> restarted hadoop-yarn-nodemanager.service on an-worker1238 [analytics]
2022-03-24 §
11:15 <btullis> roll-restarting kafka-jumbo brokers T300626 [analytics]
2022-03-21 §
18:10 <razzi> sudo systemctl restart jupyter-bearloga-singleuser on stat1008 [analytics]
2022-03-17 §
17:10 <ottomata> restart webrequest and pageview_actor data purge - https://gerrit.wikimedia.org/r/c/operations/puppet/+/771389 [analytics]
14:07 <btullis> shutdown analytics1063 and analytics1067 with 120 minutes of downtime T303151 [analytics]
06:46 <elukey> kill remaining hanging processes for ppche*lko and accra*ze on an-test-client1001 to allow the users to be offboarded (puppet broken) [analytics]
2022-03-16 §
19:14 <ottomata> deploying refinery to hadoop-test cluster with new gobblin-wmf-core jar [analytics]
18:00 <razzi> sudo cookbook sre.hosts.downtime -D 3 -r 'Setting up karapace for the first time' karapace1001.eqiad.wmnet [analytics]
17:57 <btullis> restarted mediawiki-history-drop-snapshot service on an-launcher1002 [analytics]
16:03 <aqu> analytics/refinery - scap deploy "Migrate session_length/daily from Oozie to Airflow" [analytics]
10:26 <btullis> rerunning failed mediawiki_structured_task_article_link_suggestion_interaction refine job [analytics]
2022-03-15 §
22:16 <razzi> upload karapace_2.1.3-py3.7-1_amd64.deb to apt.wikimedia.org [analytics]
19:58 <razzi> upload karapace_2.1.3-py3.7-0_amd64.deb to apt.wikimedia.org [analytics]
17:24 <ottomata> also change stats uid and gid to 918 on an-web1001 - T291384 [analytics]
14:35 <ottomata> change stats uid and gid on all stat boxes to 918 - T291384 [analytics]
13:59 <ottomata> roll restarting kafka jumbo brokers to set max.incremental.fetch.session.cache.slots=2000 - T303324 [analytics]
2022-03-14 §
21:05 <razzi> `sudo kill -9 15674` to stop unresponsive hive query [analytics]
2022-03-09 §
21:05 <ottomata> fix group ownership of cchen.db/new_editors/cohort=2021-12 after reverting T291664 - sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp -R analytics-privatedata-users /user/hive/warehouse/cchen.db/new_editors/cohort=2021-12 [analytics]
18:33 <ottomata> fix group ownership of wmf_product.db/new_editors/cohort=2021-12 after reverting T291664 - sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp -R analytics-privatedata-users /user/hive/warehouse/wmf_product.db/new_editors/cohort=2021-12 [analytics]