2022-07-06
§
|
09:50 |
<btullis> |
roll-restarting hadoop workers on the test cluster. |
[analytics] |
09:46 |
<btullis> |
restarting refinery-drop-webrequest-raw-partitions.service on an-test-coord1001 |
[analytics] |
09:44 |
<btullis> |
restarting refinery-drop-webrequest-refined-partitions.service on an-test-coord1001 |
[analytics] |
09:42 |
<btullis> |
restarted drop_event.service on an-test-coord1001 |
[analytics] |
09:35 |
<btullis> |
restarting hive-server2 and hive-metastore on an-test-coord1001 |
[analytics] |
2022-07-05
§
|
11:01 |
<btullis> |
sudo cookbook sre.hadoop.roll-restart-masters test |
[analytics] |
2022-07-04
§
|
16:14 |
<btullis> |
systemctl restart airflow-scheduler@research.service (on an-airflow1002) |
[analytics] |
08:04 |
<elukey> |
kill leftover processes of user `mewoph` on stat100x to allow puppet runs |
[analytics] |
2022-06-29
§
|
17:27 |
<mforns> |
killed mediawiki-history-load bundle in Hue, and started corresponding mediawiki_history_load DAG in Airflow |
[analytics] |
13:12 |
<mforns> |
re-deployed refinery with scap and refinery-deploy-to-hdfs |
[analytics] |
11:51 |
<btullis> |
btullis@an-master1001:~$ sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet |
[analytics] |
2022-06-28
§
|
20:57 |
<mforns> |
refinery deploy failed and I rolled back successfully, will try and repeat tomorrow when other people are present :] |
[analytics] |
20:19 |
<mforns> |
starting refinery deployment for refinery-source v0.2.2 |
[analytics] |
20:19 |
<mforns> |
starting refinery deployment |
[analytics] |
17:25 |
<ottomata> |
installing presto 0.273.3 on an-test-coord1001 and an-test-presto1001 |
[analytics] |
12:48 |
<milimetric> |
deploying airflow-dags/analytics to work on the metadata ingestion jobs |
[analytics] |
2022-06-27
§
|
20:33 |
<btullis> |
systemctl reset-failed jupyter-aarora-singleuser and jupyter-seddon-singleuser on stat1005 |
[analytics] |
20:16 |
<btullis> |
checking and restarting prometheus-mysqld-exporter on an-coord1001 |
[analytics] |
15:25 |
<btullis> |
upgraded conda-base-env on an-test-client1001 from 0.0.1 to 0.0.4 |
[analytics] |
2022-06-24
§
|
15:14 |
<ottomata> |
backfilled eventlogging data lost during failed gobblin job - T311263 |
[analytics] |
2022-06-23
§
|
13:48 |
<btullis> |
started the namenode service on an-master1001 after failback failure |
[analytics] |
13:41 |
<btullis> |
The failback failed again. |
[analytics] |
13:39 |
<btullis> |
attempting failback of namenode service from an-master1002 to an-master1001 |
[analytics] |
13:07 |
<btullis> |
restarted hadoop-hdfs-namenode service on an-master1001 |
[analytics] |
11:25 |
<joal> |
kill oozie mediawiki-geoeditors-monthly-coord in favor of airflow job |
[analytics] |
08:52 |
<joal> |
Deploy Airflow |
[analytics] |
2022-06-22
§
|
20:55 |
<aqu> |
`scap deploy -f analytics/refinery` because of a crash during `git-fat pull` |
[analytics] |
19:30 |
<aqu> |
Deploying analytics/refinery |
[analytics] |
2022-06-21
§
|
14:56 |
<aqu> |
RefineSanitize from an-launcher1002: sudo -u analytics kerberos-run-command analytics spark2-submit --class org.wikimedia.analytics.refinery.job.refine.RefineSanitize --master yarn --deploy-mode client /srv/deployment/analytics/refinery/artifacts/org/wikimedia/analytics/refinery/refinery-job-0.1.15.jar --config_file /home/aqu/refine.properties --since "2022-06-19T09:52:00+0000" --until |
[analytics] |
13:33 |
<aqu> |
sudo systemctl start monitor_refine_event_sanitized_main_immediate.service on an-launcher1002 |
[analytics] |
10:47 |
<btullis> |
proceeding with the hadoop.roll-restart-masters cookbook |
[analytics] |
2022-06-20
§
|
07:14 |
<SandraEbele> |
Started Airflow 3 Wikidata metrics jobs (Articleplaceholder, Reliability and SpecialEntityData metrics). |
[analytics] |
07:12 |
<SandraEbele> |
Started Airflow 3 Wikidata metrics jobs (Articleplaceholder, Relia) |
[analytics] |
07:11 |
<SandraEbele> |
killed Oozie wikidata-articleplaceholder_metrics-coord, wikidata-reliability_metrics-coord, and wikidata-specialentitydata_metrics-coord jobs. |
[analytics] |
2022-06-17
§
|
12:35 |
<SandraEbele> |
deployed daily airflow dag for 3 Wikidata metrics. |
[analytics] |
08:36 |
<btullis> |
power cycled an-worker1109 as it was stuck with CPU soft lockups |
[analytics] |
2022-06-16
§
|
06:49 |
<joal> |
Rerun webrequest-load-wf-upload-2022-6-15-22 after weird oozie failure |
[analytics] |
2022-06-15
§
|
14:48 |
<btullis> |
deploying datahub 0.8.38 |
[analytics] |
2022-06-14
§
|
10:48 |
<joal> |
unpause renamed dags |
[analytics] |
10:44 |
<joal> |
Deploy Airflow |
[analytics] |
10:12 |
<btullis> |
manually failing back hdfs-namenode to an-master1001 after fixing typo |
[analytics] |
09:36 |
<joal> |
deploy refinery onto HDFS |
[analytics] |
08:48 |
<btullis> |
roll-restarting hadoop masters T310293 |
[analytics] |
08:40 |
<joal> |
Deploying using scap again after failure cleanup on an-launcher1002 |
[analytics] |
07:45 |
<joal> |
deploy refinery using scap |
[analytics] |
2022-06-13
§
|
14:00 |
<btullis> |
restarting presto service on an-coord1001 |
[analytics] |
13:20 |
<btullis> |
btullis@datahubsearch1001:~$ sudo systemctl reset-failed ifup@ens13.service T273026 |
[analytics] |
13:09 |
<btullis> |
restarting oozie service on an-coord1001 |
[analytics] |
12:59 |
<btullis> |
having failed over hive to an-coord1002 10 minutes ago, I'm restarting hive services on an-coord1001 |
[analytics] |
12:26 |
<btullis> |
restarting hive-server2 and hive-metastore on an-coord1002 |
[analytics] |