2024-03-28 §
15:00 <elukey> remove GPU labels in Hadoop Yarn for an-worker[1096-1099] (the hosts don't have a GPU anymore) - T361225 [analytics]
2024-03-27 §
15:14 <brouberol> decommissioning an-tool1009 now that hue is fully offline - T341895 [analytics]
15:02 <brouberol> dropping the hue.wikimedia.org CNAME - T341895 [analytics]
2024-03-25 §
15:02 <btullis> updating the ssl_provider for eventstreams schema servers to cfssl for T360412 [analytics]
2024-03-22 §
13:17 <elukey> `elukey@cumin1002:~$ sudo cumin 'stat100[4,5,8,9]*' 'kill `pgrep -u kcv-wikimf`'` to unblock puppet on various stat nodes [analytics]
10:44 <btullis> shut down an-worker1168 to investigate disk controller failure for T360594 [analytics]
2024-03-20 §
10:50 <brouberol> superset.wikimedia.org is now migrated to the DSE k8s cluster, CAS errors have receded [analytics]
10:20 <brouberol> migrating superset to Kubernetes. Some CAS errors are expected during ~15 minutes [analytics]
2024-03-07 §
14:01 <btullis> deploying updated mediawiki_history_reduced snapshots to AQS 2.0 [analytics]
2024-03-04 §
12:22 <btullis> restarting hive-server2 and hive-metastore service on an-coord1003 [analytics]
12:00 <btullis> migrating analytics-hive from an-coord1003 to an-coord1004 with https://gerrit.wikimedia.org/r/c/operations/dns/+/1008414 [analytics]
10:32 <btullis> restart hive-server2 and hive-metastore service on an-coord1004 [analytics]
2024-02-29 §
14:06 <btullis> sudo systemctl reset-failed refinery-sqoop-whole-mediawiki.service [analytics]
09:59 <joal> Deploying refinery with scap (fix sqoop for tomorrow) [analytics]
09:25 <brouberol> decommissioning an-tool1005 now that superset-next is migrated to k8s - T358706 [analytics]
2024-02-28 §
11:08 <btullis> reimaging dbstore1007 to bookworm for T356961 [analytics]
09:48 <joal> Deploying refinery onto HDFS [analytics]
09:28 <joal> Deploying Refinery for T357859 [analytics]
2024-02-27 §
18:14 <tchin> deploying eventstreams [analytics]
2024-02-22 §
11:52 <brouberol> redeploying the spark-history server with expanded egress rules for hadoop workers - T358206 [analytics]
2024-02-21 §
21:21 <joal> Update airflow variable for pageview_actor_hourly leading to 64 written files instead of 32 - this should ease the job resource consumption and prevent failures [analytics]
19:51 <joal> Rerun pageview_actor_hourly for hour 2024-02-20T07:00 [analytics]
2024-02-20 §
22:52 <sfaci> Deployed refinery using scap, then deployed onto hdfs [analytics]
22:18 <sfaci> Starting refinery deployment [analytics]
15:57 <xcollazo> deployed latest Airflow DAG updates for the analytics instance [analytics]
2024-02-19 §
11:14 <sfaci> rerunning the compute_pageview_actor_hourly task in the pageview_actor_hourly DAG 2024-02-17 08:00:00 UTC [analytics]
2024-02-13 §
09:03 <brouberol> attempting a reimage of apifeatureusage1001 to bookworm - T346053 [analytics]
2024-02-09 §
14:01 <brouberol> superset was successfully deployed once the MySQL password was updated - T347710 [analytics]
13:47 <brouberol> deploying superset/superset-next services in dse-k8s-eqiad - T347710 [analytics]
2024-02-08 §
09:50 <stevemunene> failover hadoop namenode back to an-master1003 T353776 [analytics]
2024-02-07 §
20:17 <joal> Relaunch session_length_daily failed task [analytics]
20:09 <joal> Relaunch druid_load_unique_devices_per_domain_daily_aggregated_monthly after deploy [analytics]
19:49 <joal> deploying Refinery onto HDFS [analytics]
19:49 <joal> Deployed refinery using scap [analytics]
19:49 <joal> Release refinery-source v0.2.32 [analytics]
17:26 <btullis> roll-restarting kafka-jumbo for T356382 [analytics]
15:35 <btullis> rolling out a change of the discovery-uri to presto workers and clients https://gerrit.wikimedia.org/r/c/operations/puppet/+/998425 [analytics]
13:01 <stevemunene> failover hadoop namenode back to an-master1003 after the jvm service restart to pick up new JDK and T353776 [analytics]
12:48 <stevemunene> restart jvm services on an-master1003 for T353776 and to pick up new JDK [analytics]
12:36 <stevemunene> failover hadoop namenode to an-master1004 for jvm service restart to pick up new JDK and T353776 [analytics]
12:24 <stevemunene> restart jvm services on an-master1004 for T353776 and to pick up new JDK [analytics]
2024-02-06 §
19:57 <joal> Deploy refinery onto HDFS [analytics]
19:34 <joal> Deploying refinery using scap [analytics]
19:34 <joal> Refinery-source v0.2.31 released to archiva [analytics]
14:57 <btullis> roll-restarting the presto workers for T356382 [analytics]
14:04 <joal> Rerun mediawiki-history-reduced druid indexation after airflow variable update [analytics]
13:39 <brouberol> add new TLS SANs to the superset/superset-next certificates in dse-k8s-eqiad - T356481 [analytics]
13:29 <stevemunene> roll restart hadoop masters to pick up the right rack assignment for new hosts T353776 [analytics]
11:45 <stevemunene> add new an-workers to analytics_cluster hadoop worker role analytics_cluster::hadoop::worker T353776 [analytics]
11:03 <btullis> reimaging an-web1001 to bullseye for T349398 [analytics]