2023-10-17 §
09:07 <btullis> pausing all 3 active dags on airflow-research instance [analytics]
09:07 <btullis> pausing all 28 active airflow dags on airflow-search instance [analytics]
09:03 <btullis> pausing all airflow dags on analytics instance [analytics]
2023-10-16 §
13:05 <brouberol> deploying mw-page-content-change-enrich with the new kafka broker list T336044 [analytics]
10:06 <btullis> deploying presto version 0.283 to production for T342343 with `sudo debdeploy deploy -u 2023-10-12-presto.yaml -Q 'P{O:analytics_cluster::presto::server} or P{O:analytics_cluster::coordinator} or A:stat'` [analytics]
08:49 <brouberol> redeploying datahub with the new kafka broker list T336044 [analytics]
08:42 <brouberol> redeploying eventgate-analytics-external with the new kafka broker list T336044 [analytics]
08:38 <brouberol> redeploying eventgate-analytics with the new kafka broker list T336044 [analytics]
08:34 <brouberol> redeploying eventstreams-internal with the new kafka broker list T336044 [analytics]
2023-10-12 §
13:22 <btullis> rebooting archiva1002.wikimedia.org for T344671 [analytics]
12:00 <btullis> pushing out presto version 0.283 to the test cluster. [analytics]
09:31 <btullis> rebooting an-coord1002 for T344671 [analytics]
09:18 <btullis> power cycling an-master1002 to address unresponsiveness [analytics]
2023-10-11 §
09:27 <btullis> trigger rolling-restart of aqs services with `sudo cumin -b 2 -s 20 A:aqs 'systemctl restart aqs'` [analytics]
2023-10-09 §
18:35 <mforns> deployed airflow analytics [analytics]
10:46 <btullis> started rolling restart of an-worker1[078-156] for T344587 [analytics]
08:55 <btullis> started rolling restart of analytics10[70-77] for T344587 [analytics]
2023-10-05 §
15:30 <btullis> failed over test cluster hadoop namenode services to an-test-master1002 [analytics]
2023-10-04 §
06:19 <Surbhi_> Deployed refinery using scap, then deployed onto hdfs [analytics]
2023-10-02 §
16:45 <joal> Silenced the "High Kafka consumer lag for mw_page_content_change_enrich in codfw" alert for 3 days [analytics]
13:40 <stevemunene> roll-restart druid public workers to pick up a new worker node. T336042 [analytics]
13:28 <joal> Manually mark wikidata_item_page_link_weekly.wait_for_mediawiki_page_move task successful (with note) to overcome datacenter switchover sensor issue [analytics]
13:27 <joal> Manually mark wikidata_item_page_link_weeklywait_for_mediawiki_page_move [analytics]
07:36 <joal> deploying mw-page-content-change-enrich on codfw after kafka has finished synchronizing its replicas [analytics]
2023-09-29 §
13:10 <btullis> systemctl reset-failed on kafka-mirror-main-eqiad_to_jumbo-eqiad@0.service on kafka-jumbo1001 [analytics]
12:07 <joal> mw_page_content_change_enrich alert silenced for the weekend, the app is down, more investigation next week [analytics]
12:06 <joal> Various restarts of mw_page_content_change_enrich k8s app since yesterday - the app is failing to send data to kafka [analytics]
2023-09-28 §
16:38 <btullis> rebooting eventlog1003 for T344671 [analytics]
15:50 <btullis> failed back namenode services from an-master1002 to an-master1001 [analytics]
13:57 <brouberol> started the evacuation of a subset of topics away from kafka-10[01-06].eqiad.wmnet T336044 [analytics]
10:56 <btullis> sudo systemctl start hadoop-hdfs-namenode.service on an-master1001 after cookbook failback failure [analytics]
10:27 <btullis> roll-restarting hadoop namenodes to pick up new heap settings. [analytics]
2023-09-27 §
14:56 <xcollazo> Deploy latest Airflow DAGs to analytics instance [analytics]
14:14 <btullis> removing downtime for kafka-jumbo [analytics]
14:12 <btullis> re-enabled and ran puppet on the rest of kafka-jumbo to bring the mirror-makers back to where they should be. [analytics]
14:07 <btullis> deploying kafka-mirror-maker exclusion patch to kafka-jumbo100[1-6] [analytics]
13:44 <aqu> Deployed refinery using scap, then deployed onto hdfs [analytics]
13:12 <aqu> Deployed weekly train of analytics-refinery (including new refinery-source version) [analytics]
12:18 <btullis> added 3 more hours downtime to kafka-jumbo101[0-5].eqiad.wmnet [analytics]
08:29 <elukey> `elukey@cumin1001:~$ sudo cumin 'kafka-jumbo10[01-05]*' 'systemctl start kafka-mirror' -b 1 -s 30` [analytics]
08:28 <elukey> `elukey@cumin1001:~$ sudo cumin 'kafka-jumbo10[06-15]*' 'systemctl stop kafka-mirror'` [analytics]
08:13 <elukey> slowly start mirror maker on one instance at a time on all jumbo nodes [analytics]
08:11 <elukey> start kafka mirror on jumbo1002 [analytics]
08:08 <elukey> stop all mirror maker on jumbo, start only one on jumbo1001 [analytics]
07:47 <elukey> roll restart mirror maker instances on kafka jumbo [analytics]
2023-09-26 §
10:43 <btullis> deploying conda-analytics v0.0.23 to stats servers for T337258 [analytics]
10:36 <btullis> deploying conda-analytics v0.0.23 to analytics-airflow for T337258 [analytics]
10:34 <btullis> deploying conda-analytics v0.0.23 to hadoop-all for T337258 [analytics]
10:28 <btullis> upgrading outdated bigtop packages on stat1009 with `dpkg -l |egrep "\-deb11"|awk '{print $2}'|xargs sudo apt install` for T337465 [analytics]
10:11 <btullis> running `dpkg -l |egrep "\-deb11"|awk '{print $2}'|xargs sudo apt install` on an-test-client1002 for T337465 [analytics]