2022-05-31 §
15:36 <milimetric> dropped razzi databases and deleted HDFS directories (in trash) [analytics]
06:26 <elukey> `elukey@an-master1001:~$ sudo systemctl reset-failed hadoop-clean-fairscheduler-event-logs.service` [analytics]
2022-05-30 §
20:19 <SandraEbele> Restarted oozie job pageview-druid-daily-coord [analytics]
11:28 <joal> deploy airflow spark3 aqs_hourly [analytics]
2022-05-25 §
21:09 <joal> Resume aqs_hourly job in airflow test [analytics]
20:33 <joal> Pausing aqs_hourly job in airflow test until we fix the spark3 issue [analytics]
06:20 <elukey> `elukey@an-tool1011:~$ sudo systemctl reset-failed ifup@ens13.service` - T273026 [analytics]
2022-05-24 §
19:54 <SandraEbele> Deployed refinery using scap, then deployed onto hdfs successfully. [analytics]
18:34 <SandraEbele> Deploying refinery, regular weekly deployment [analytics]
13:18 <joal> Release refinery-source v0.2.0 to archiva [analytics]
10:21 <btullis> restarted hadoop-yarn-nodemanager on an-worker1139 [analytics]
2022-05-23 §
18:27 <mforns> killed mobile_apps-session_metrics-coord (Airflow job is taking over) [analytics]
2022-05-21 §
15:52 <joal> Kill yarn app application_1651744501826_83884 in order to prevent the HDFS alerts [analytics]
2022-05-19 §
16:59 <ottomata> deploying airflow-dags analytics with new artifact names, first clearing artifacts cache dir - T307115 [analytics]
2022-05-18 §
10:57 <btullis> upgrading datahub to version 0.8.34 [analytics]
2022-05-17 §
21:32 <razzi> sudo systemctl reset-failed ifup@ens13.service on an-tool1007 [analytics]
08:54 <btullis> booted an-tool1007 from network to begin buster upgrade [analytics]
2022-05-12 §
14:49 <razzi> undo the 2 previous confctl changes to repool dbproxy1019 to wikireplicas-b only [analytics]
14:35 <razzi> razzi@cumin1001:~$ sudo confctl select service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet set/pooled=yes # for T298940 [analytics]
2022-05-11 §
18:20 <razzi> disregard the above log; wrote out the command but then saw there was a warning for cr2-eqiad [analytics]
18:15 <razzi> razzi@lvs1019:~$ systemctl stop pybal.service to apply change https://gerrit.wikimedia.org/r/c/operations/puppet/+/779915 [analytics]
18:06 <razzi> razzi@lvs1020:~$ systemctl stop pybal.service to apply change https://gerrit.wikimedia.org/r/c/operations/puppet/+/779915 [analytics]
13:29 <mforns> restarted oozie jobs after deployment: mediarequest_top_files, pageview_top_articles, unique_devices_per_domain_monthly, unique_devices_per_project_family_monthly [analytics]
2022-05-10 §
20:32 <mforns> finished refinery deploy (regular weekly train) [analytics]
19:34 <mforns> starting refinery deploy (regular weekly train) [analytics]
2022-05-09 §
15:06 <SandraEbele> killed ‘apis-coord’ oozie job and started corresponding airflow job ‘apis_metrics_to_graphite’ [analytics]
2022-05-06 §
09:11 <joal> kill cassandra-monthly-wf-local_group_default_T_mediarequest_top_files-2022-4 again [analytics]
08:44 <joal> Rerun cassandra-monthly-wf-local_group_default_T_mediarequest_top_files-2022-4 with SRE watching network [analytics]
08:29 <joal> kill cassandra-monthly-wf-local_group_default_T_mediarequest_top_files-2022-4 as it was probably saturating network [analytics]
2022-05-05 §
18:53 <btullis> restarting airflow-scheduler@platform_eng.service on an-airflow1003 [analytics]
18:53 <btullis> restarted airflow-scheduler@research.service on an-airflow1002 [analytics]
18:49 <btullis> restarting airflow-scheduler@analytics service on an-launcher1002 [analytics]
12:26 <aqu> Regular analytics weekly train [analytics/refinery@cc4b2bd] [analytics]
09:53 <btullis> roll-restarting hadoop masters to pick up new heap size [analytics]
09:16 <btullis> re-enabling gobblin jobs now [analytics]
09:15 <btullis> restarting failed eventlogging_to_druid_ services on an-launcher1002 [analytics]
09:00 <btullis> restarting an-coord1001 [analytics]
08:53 <btullis> stopping oozie on an-coord1001 [analytics]
2022-05-04 §
08:47 <btullis> rebooting an-coord1002 to pick up new kernel [analytics]
2022-05-03 §
18:24 <razzi> remove /etc/apache2/sites-available/50-superset-wikimedia-org.conf from an-tool1005 (superset staging) since it was removed from puppet but has no ensure: absent [analytics]
2022-04-27 §
19:37 <ottomata> restarting airflow services on all airflow instances after installing updated airflow debian package [analytics]
2022-04-26 §
19:02 <aqu> About to deploy analytics/refinery: Weekly deployment train + Artifacts to 0.1.27 [analytics]
12:02 <joal> Rerun cassandra-daily-wf-local_group_default_T_mediarequest_per_file-2022-4-23 [analytics]
2022-04-25 §
20:09 <ottomata> dropping event.ios_notification_interaction hive table and data for backwards incompatible schema change in T290920 [analytics]
11:51 <btullis> failing back hdfs active role to an-master1001 [analytics]
11:49 <btullis> restarted hadoop-yarn-resourcemanager on an-master1002 to force the active role back to an-master1001 [analytics]
11:01 <btullis> rebooting an-master1001 [analytics]
10:25 <btullis> restarting the `check_webrequest_partitions` service on an-launcher1002 [analytics]
09:39 <btullis> failover to an-master1002 successful at 3rd attempt [analytics]
09:30 <btullis> 2nd attempt to switch HDFS services to an-master1002 [analytics]