2023-06-15
|
19:27 |
<btullis> |
restarting aqs service on A:aqs in batches of 2, 10 seconds apart |
[analytics] |
17:02 |
<joal> |
Deploying airflow (again) to fix memory issues |
[analytics] |
15:58 |
<joal> |
Rerun druid indexation for mediawiki_history_reduced |
[analytics] |
15:56 |
<joal> |
Deploy airflow to fix druid loading jobs using snapshot |
[analytics] |
15:53 |
<milimetric> |
refinery-source 0.2.17 deployed, refinery updated and synced to hdfs |
[analytics] |
12:47 |
<stevemunene> |
running sre.hadoop.roll-restart-masters to completely remove any reference to analytics1058-1060 for T317861 |
[analytics] |
12:34 |
<joal> |
Deploy analytics-airflow to patch mediawiki_history_reduced druid loading |
[analytics] |
09:05 |
<elukey> |
move varnishkafka instances in ulsfo to PKI |
[analytics] |
2023-06-13
|
19:27 |
<btullis> |
restarting the hive-server2 and hive-metastore services on an-coord1001 |
[analytics] |
19:03 |
<btullis> |
freeing up space in /srv on an-launcher1002 with `btullis@an-launcher1002:/srv/airflow-analytics/logs/scheduler$ find -maxdepth 1 -type d -mtime +15 -print0 | xargs -0 sudo rm -rf` for T339002 |
[analytics] |
16:41 |
<ottomata> |
deploying refinery for weekly train |
[analytics] |
15:45 |
<SandraEbele> |
Deployed refinery-source using jenkins |
[analytics] |
15:19 |
<ottomata> |
drop event.mediawiki_page_outlink_topic_prediction_change table and data - T337395 |
[analytics] |
15:13 |
<SandraEbele> |
deploying refinery source |
[analytics] |
15:05 |
<ottomata> |
dropping hive table event.mediawiki_page_change_v1 to pick up backwards incompatible schema change - T337395 |
[analytics] |
15:03 |
<btullis> |
failing over the analytics-hive cname to an-coord1002 |
[analytics] |
13:45 |
<elukey> |
fixed broken graphs in the varnishkafka dashboard |
[analytics] |
13:37 |
<btullis> |
restarting hive-server2 and hive-metastore on an-coord1002 prior to failover. |
[analytics] |
13:00 |
<btullis> |
rolled out conda-analytics 0.0.18 to analytics-airflow and hadoop-coordinator |
[analytics] |
12:25 |
<btullis> |
beginning rollout of conda-analytics 0.0.18 to hadoop-workers |
[analytics] |
07:10 |
<elukey> |
move varnishkafka instances on cp4037 to PKI TLS certs |
[analytics] |
2023-06-06
|
15:52 |
<elukey> |
restart yarn resourcemanager on an-master1002 to restore the Yarn UI (which works only when the active yarn RM is on 1001) |
[analytics] |
15:07 |
<mforns> |
deployed airflow analytics to try and fix the edit_hourly DAG again |
[analytics] |
13:09 |
<ottomata> |
EventStreamConfig - temporarily Disable canary events and hadoop ingestion for development.network.probe stream - T332024 |
[analytics] |
11:29 |
<stevemunene> |
service hadoop-yarn-resourcemanager restart for T317861 |
[analytics] |
11:13 |
<btullis> |
restart airflow-scheduler service on an-test-client1001 for analytics_test instance |
[analytics] |
11:12 |
<btullis> |
restart airflow-scheduler service on an-airflow1006 for product_analytics instance |
[analytics] |
11:12 |
<btullis> |
restart airflow-scheduler service on an-airflow1005 for search instance |
[analytics] |
11:08 |
<btullis> |
restart airflow-scheduler service on an-airflow1002 for research instance |
[analytics] |