2023-03-21 §
17:48 <joal> rerun failed airflow tasks [analytics]
17:39 <joal> Deploy airflow, hopefully fixing HDFSArchiver jobs [analytics]
13:21 <nfraison_> deploy last changes on k8s dse cluster (dse-k8s-eqiad: flink-operator should watch rdf-streaming-updater, enable spark operator mutation webhook, Allow communication from spark pods to HDFS/Hive) [analytics]
11:01 <joal> Deploy analytics airflow code [analytics]
10:49 <nfraison_> deployment of last changes on k8s dse cluster failed: certificate secret creation timed out contacting pki.discovery.wmnet [analytics]
10:41 <joal> Unpause pageview_actor airflow dag [analytics]
10:41 <joal> Alter wmf.pageview_actor table adding referer_data field [analytics]
10:31 <nfraison_> deploy last changes on k8s dse cluster (dse-k8s-eqiad: flink-operator should watch rdf-streaming-updater, enable spark operator mutation webhook, Allow communication from spark pods to HDFS/Hive) [analytics]
10:26 <joal> Deploy refinery onto HDFS [analytics]
10:25 <joal> Pause pageview_actor airflow job during HDFS refinery deploy and alter table update [analytics]
10:13 <joal> Deploy refinery with scap sorry [analytics]
10:13 <joal> Deploy refinery with sqoop [analytics]
2023-03-17 §
07:45 <nfraison_> reset failed session-c624.scope as last issue was on March 14 on an-worker1132 [analytics]
07:42 <joal> Rerun failed refine_event job [analytics]
2023-03-16 §
17:00 <btullis> enabling puppet on an-airflow1004 to restart airflow services. [analytics]
16:51 <btullis> upgrading airflow package on an-airflow1004 [analytics]
16:29 <btullis> stopping puppet and airflow services on an-airflow1004 for the upgrade. [analytics]
2023-03-15 §
18:37 <joal> Manually creating partitions for event.mediawiki_client_session_tick (datacenter=eqiad/year=2023/month=3/day=7/hour=[10,11,12,13,14]) [analytics]
13:10 <btullis> rerunning eventlogging_legacy failed job [analytics]
11:18 <btullis> stopping the matomo database replica on db1108 [analytics]
2023-03-14 §
14:57 <btullis> deploying ceph mon and mgr daemons to cephosd100[1-5] T328123 [analytics]
11:48 <btullis> reran refine_event_sanitized_analytics_immediate for netflow year=2023/month=3/day=8/hour=6 [analytics]
10:23 <btullis> deploying airflow package version 2.5.1-py3.10-20230228 to stats hosts [analytics]
2023-03-13 §
17:14 <nfraison_> restart jobhistory in prod cluster to take into account https://gerrit.wikimedia.org/r/c/operations/puppet/+/896305 [analytics]
17:08 <nfraison_> restart jobhistory in test cluster to take into account https://gerrit.wikimedia.org/r/c/operations/puppet/+/896305 [analytics]
13:53 <milimetric> killing pageview-monthly_dump-coord, pageview-daily_dump-coord, and pageview-hourly-coord oozie jobs to migrate to airflow [analytics]
13:24 <btullis> restarting an-worker1140 [analytics]
2023-03-10 §
20:04 <milimetric> deployed refinery with new pageview jobs, patched in a manual copy of static_data/pageview/whitelist/whitelist.tsv because that file was renamed in the most recent version and would have broken jobs otherwise [analytics]
2023-03-09 §
19:47 <btullis> shutting down an-worker1078 for RAID BBU replacement T331544 [analytics]
18:51 <mforns> deployed airflow analytics (2.5) with the T326194_airflow_deb_creation_with_gitlab_ci branch [analytics]
17:55 <joal> Force kill druid indexing task to unlock druid_load_navigationtiming_daily__load_to_druid__20230228 [analytics]
17:46 <btullis> deploying spark-operator once more [analytics]
16:49 <btullis> deploying updated spark-operator to dse-k8s cluster. [analytics]
14:04 <btullis> airflow services were started automatically. airflow db check was successful. [analytics]
14:00 <btullis> running puppet on an-launcher1002 to pull the new package after https://gerrit.wikimedia.org/r/c/operations/puppet/+/896098 is merged. [analytics]
13:06 <steve_munene> upgrading analytics airflow to 2.5.1 on an-launcher1002 [analytics]
2023-03-08 §
11:54 <ottomata> Deployed refinery using scap, then deployed onto hdfs [analytics]
10:36 <nfraison> restart namenode in an-master1002 to take into account new quota init threads setting [analytics]
10:25 <nfraison> failover namenode in prod from an-master1002-eqiad-wmnet to an-master1001-eqiad-wmnet [analytics]
09:59 <nfraison> restart namenode in an-master1001 (standby in prod) to take into account new quota init threads setting [analytics]
09:53 <nfraison> restart namenode in an-test-master1002 to take into account new quota init threads setting [analytics]
09:52 <nfraison> failover namenode in test from an-test-master1002-eqiad-wmnet to an-test-master1001-eqiad-wmnet [analytics]
09:47 <nfraison> restart namenode in an-test-master1001 to take into account new quota init threads setting [analytics]
09:36 <nfraison> restart test hiveserver2: T303168 [analytics]
09:13 <nfraison> restart prod resourcemanager to take into account new dedicated exclude file [analytics]
08:58 <nfraison> restart test resourcemanager to take into account new dedicated exclude file [analytics]
07:56 <nfraison> restart prod jobhistory to take into account: https://gerrit.wikimedia.org/r/c/operations/puppet/+/894481 [analytics]
07:47 <nfraison> restart test jobhistory to take into account: https://gerrit.wikimedia.org/r/c/operations/puppet/+/894481 [analytics]
2023-03-07 §
22:03 <mforns> deployed airflow analytics again to try and fix druid_load_edit_hourly [analytics]
16:55 <xcollazo> deployed image-suggestions hotfix to platform_eng Airflow instance. See https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/262. [analytics]