2023-09-21 §
15:02 <milimetric> deployed aqs 1.0 to enable etags on all endpoints - so far everything looks ok [analytics]
08:56 <joal> Rerun edit-hourly druid indexation to fix corrupted data file [analytics]
08:10 <brouberol> redeploying eventgate-analytics in staging T336041 [analytics]
2023-09-19 §
14:19 <jennifer_ebe> airflow analytics deployment with scap successful [analytics]
13:57 <btullis> pushing out https://gerrit.wikimedia.org/r/c/operations/puppet/+/955893 for new refinery job jar files [analytics]
13:43 <jennifer_ebe> deploying airflow analytics dag [analytics]
13:32 <jennifer_ebe> deployment successful [analytics]
13:07 <jennifer_ebe> redeploying refinery from deployment.eqiad.wmnet using scap [analytics]
12:02 <jennifer_ebe> deploying refinery from deployment.eqiad.wmnet [analytics]
09:40 <btullis> commencing rolling restart of all brokers in kafka-jumbo [analytics]
09:27 <btullis> deploying change to kafka-jumbo settings for T344688 [analytics]
08:17 <brouberol> redeploying eventstream-analytics in eqiad T336041 [analytics]
08:05 <brouberol> redeploying eventstream-internal in staging T336041 [analytics]
08:02 <brouberol> redeploying eventgate-analytics-external in staging T336041 [analytics]
07:59 <brouberol> redeploying eventgate-analytics in staging T336041 [analytics]
2023-09-18 §
15:38 <btullis> deploying Superset 2.1.1 to an-tool1005 for superset-next.wikimedia.org [analytics]
13:14 <brouberol> Puppet ran successfully on kafka-jumbo1010.eqiad.wmnet. The kafka service is running. T336041 [analytics]
10:45 <stevemunene> deploy datahub in eqiad to pick up new changes T305874 [analytics]
10:42 <stevemunene> deploy datahub in codfw to pick up new changes T305874 [analytics]
09:51 <stevemunene> disable auth_jaas and native login to datahub then enable oidc authentication to production in eqiad T305874 [analytics]
09:43 <stevemunene> disable auth_jaas and native login to datahub then enable oidc authentication to production in codfw T305874 [analytics]
2023-09-14 §
21:40 <btullis> executed apt-get clean on hadoop-test [analytics]
21:31 <btullis> deploying conda-analytics version 0.0.21 to hadoop-test for T337258 [analytics]
18:28 <xcollazo> Deployed latest DAGs to analytics Airflow instance T340861 [analytics]
14:13 <stevemunene> powercycle an-worker1138, investigating failures related to reimage T332570 [analytics]
11:42 <btullis> deploying conda-analytics version 0.0.20 to the test cluster for T337258 [analytics]
2023-09-12 §
14:59 <btullis> successfully failed back the HDFS namenode services to an-master1001 [analytics]
11:21 <btullis> demonstrated the use of SAL for T343762 [analytics]
09:54 <btullis> btullis@an-master1001:~$ sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet [analytics]
2023-09-07 §
16:55 <btullis> restarting the aqs service on all aqs* servers in batches to pick up new MW_history snapshot. [analytics]
13:43 <mforns> (actual timestamp: 2023-09-06, 19:10:29 UTC) cleared airflow task mediawiki_history_reduced.check_mediawiki_history_reduced_error_folder (and subsequent tasks) for snapshot=2023-08. This was due to false positive errors having been generated by the checker. [analytics]
2023-09-05 §
14:26 <btullis> completed eventstreams and eventstreams-internal deployments. [analytics]
14:23 <btullis> deploying eventstreams for T344688 [analytics]
14:15 <btullis> deploying eventstreams-internal for T344688 [analytics]
12:35 <stevemunene> power cycle an-worker1132. Host is stuck on debian install after a failed reimage. [analytics]
10:35 <joal> Rerun cassandra_load_pageview_top_articles_monthly [analytics]
10:35 <joal> Clear airflow false-failed tasks for pageview_hourly (log-aggregation issue) [analytics]
2023-09-01 §
07:43 <stevemunene> powercycle an-worker1145.eqiad.wmnet: host CPUs in soft lockup T345413 [analytics]
2023-08-31 §
13:02 <aqu> Deployed refinery using scap, then deployed onto hdfs [analytics]
12:01 <aqu> About to deploy analytics refinery (weekly train) [analytics]
2023-08-30 §
15:43 <stevemunene> restart hadoop-yarn-nodemanager.service on an-worker11[29-48].eqiad.wmnet in batches of 2 with 3 minutes in between [analytics]
14:46 <stevemunene> restart hadoop-yarn-nodemanager.service on an-worker11[00-28].eqiad.wmnet in batches of 2 with 3 minutes in between [analytics]
14:08 <stevemunene> restart hadoop-yarn-nodemanager.service on an-worker10[78-99].eqiad.wmnet in batches of 2 with 3 minutes in between [analytics]
12:41 <stevemunene> disable puppet on an-worker1147 to test the hadoop-yarn log aggregation compression algorithm. The compression was set to gzip but should have been set to gz [analytics]
12:26 <stevemunene> restart hadoop-yarn-nodemanager.service on an-worker1147 [analytics]
2023-08-29 §
11:01 <joal> Update mediawiki_history_check_denormalize airflow job variables to send job-reports to both data-engineering-alerts and product-analytics [analytics]
10:52 <joal> Deploy airflow-dags/analytics [analytics]
2023-08-24 §
18:20 <btullis> attempting another failback of the hadoop namenode services [analytics]
16:47 <btullis> start hadoop namenode on an-master1001 after crash. [analytics]
16:46 <btullis> failback unsuccessful. namenode services still running on an-master1002. [analytics]