2021-11-22
|
12:18 |
<btullis> |
failed back the hive services to an-coord1001 via CNAME change |
[analytics] |
11:36 |
<btullis> |
btullis@an-coord1001:~$ sudo systemctl restart hive-server2 hive-metastore |
[analytics] |
10:44 |
<btullis> |
deploying DNS change to switch hive to the standby server. |
[analytics] |
10:18 |
<btullis> |
btullis@an-coord1002:~$ sudo systemctl restart hive-server2 hive-metastore |
[analytics] |
2021-11-18
|
17:26 |
<elukey> |
varnishkafka-webrequest on cp3050 is running with /etc/ssl/localcerts/wmf_trusted_root_CAs.pem |
[analytics] |
10:03 |
<elukey> |
restart prometheus-druid-exporter on Druid Analytics to clear unnecessary metrics |
[analytics] |
07:32 |
<elukey> |
restart prometheus-druid-exporter on Druid Public to see metrics difference |
[analytics] |
2021-11-17
|
16:01 |
<btullis> |
roll-restarting kafka-test brokers |
[analytics] |
12:12 |
<btullis> |
roll-restarting the presto analytics workers |
[analytics] |
11:44 |
<btullis> |
btullis@archiva1002:~$ sudo systemctl restart archiva.service |
[analytics] |
07:29 |
<elukey> |
`apt-get clean` on an-tool1005 to free space in the root partition |
[analytics] |
07:28 |
<elukey> |
`sudo pkill -U jmixter` on stat100[5,8] to allow puppet to run and remove the offboarded user |
[analytics] |
2021-11-16
|
19:40 |
<joal> |
Deploying refinery to HDFS |
[analytics] |
19:15 |
<joal> |
Deploying refinery with scap |
[analytics] |
18:23 |
<joal> |
Releasing refinery-source v0.1.21 |
[analytics] |
11:32 |
<btullis> |
btullis@cumin1001:~$ sudo cookbook sre.druid.roll-restart-workers public |
[analytics] |
10:20 |
<btullis> |
roll-restarting hadoop masters |
[analytics] |
2021-11-15
|
16:37 |
<joal> |
Rerun failed mediawiki-wikitext-history-wf-2021-10 |
[analytics] |
2021-11-11
|
06:56 |
<elukey> |
`systemctl start prometheus-mysqld-exporter@analytics_meta` on db1108 |
[analytics] |
2021-11-10
|
18:20 |
<btullis> |
btullis@an-launcher1002:~$ sudo systemctl reset-failed monitor_refine_event_sanitized_analytics_delayed.service |
[analytics] |
10:19 |
<btullis> |
btullis@an-launcher1002:~$ sudo systemctl reset-failed monitor_refine_event_sanitized_analytics_delayed |
[analytics] |
2021-11-09
|
16:52 |
<razzi> |
restart presto server on an-coord1001 to apply change for T292087 |
[analytics] |
16:30 |
<razzi> |
set superset presto version to 0.246 in ui |
[analytics] |
16:30 |
<razzi> |
set superset presto timeout to 170s: {"connect_args":{"session_props":{"query_max_run_time":"170s"}}} for T294771 |
[analytics] |
12:23 |
<btullis> |
btullis@an-launcher1002:~$ sudo systemctl reset-failed monitor_refine_event_sanitized_analytics_delayed |
[analytics] |
07:23 |
<elukey> |
`apt-get clean` on stat1006 to free some space (root partition full) |
[analytics] |
2021-11-08
|
19:51 |
<ottomata> |
an-coord1002: drop user 'admin'@'localhost'; start slave; to fix broken replication - T284150 |
[analytics] |
19:44 |
<razzi> |
create admin user on an-coord1001 for T284150 |
[analytics] |
18:07 |
<razzi> |
run `create user 'admin'@'localhost' identified by <password>; grant all privileges on *.* to admin;` to allow milimetric to access mysql on an-coord1002 for T284150 |
[analytics] |
2021-11-04
|
16:39 |
<razzi> |
add "can sql json on superset" permission to Alpha role on superset.wikimedia.org |
[analytics] |
16:14 |
<razzi> |
drop and restore superset_staging database to test permissions as they are in production |
[analytics] |
2021-11-03
|
17:07 |
<razzi> |
razzi@an-tool1010:~$ sudo systemctl stop superset |
[analytics] |
16:57 |
<razzi> |
dump mysql in preparation for superset upgrade |
[analytics] |
02:23 |
<milimetric> |
deployed refinery with regular train |
[analytics] |
2021-10-29
|
23:04 |
<btullis> |
deleted all remaining old cassandra snapshots on aqs100x servers. |
[analytics] |
22:58 |
<btullis> |
deleted old snapshots from aqs1006 and aqs1009 |
[analytics] |
17:45 |
<razzi> |
set presto_analytics_hive extra parameter engine_params.connect_args.session_props.query_max_run_time to 55s on superset.wikimedia.org |
[analytics] |
10:39 |
<elukey> |
roll restart of kafka-test to pick up new truststore (root PKI added) |
[analytics] |
2021-10-28
|
19:13 |
<ottomata> |
re-enable hdfs-cleaner for /wmf/gobblin |
[analytics] |
2021-10-26
|
09:01 |
<btullis> |
reverted hive services back to an-coord1001. |
[analytics] |
2021-10-25
|
16:03 |
<btullis> |
btullis@an-coord1001:~$ sudo systemctl restart hive-server2 hive-metastore |
[analytics] |
13:02 |
<btullis> |
btullis@an-coord1002:~$ sudo systemctl restart hive-server2 hive-metastore |
[analytics] |
12:51 |
<btullis> |
btullis@aqs1007:~$ sudo nodetool-a clearsnapshot |
[analytics] |
2021-10-21
|
14:05 |
<ottomata> |
rerun refine_eventlogging_analytics refine_eventlogging_legacy and refine_event with -ignore-done-flag=true --since=2021-10-21T01:00:00 --until=2021-10-21T04:00:00 for backfill of missing data after gobblin problems |
[analytics] |
13:39 |
<btullis> |
btullis@an-launcher1002:~$ sudo systemctl restart gobblin-event_default |
[analytics] |
10:35 |
<joal> |
Re-refine netflow data after gobblin pulled data fix |
[analytics] |
08:41 |
<joal> |
Rerun webrequest-load jobs for hour 2021-10-21T02:00 |
[analytics] |
2021-10-20
|
18:11 |
<razzi> |
Deployed refinery using scap, then deployed onto hdfs |
[analytics] |
16:36 |
<razzi> |
deploy refinery change for https://phabricator.wikimedia.org/T287084 |
[analytics] |
07:15 |
<joal> |
rerun webrequest-load-wf-upload-2021-10-20-1 after node issue |
[analytics] |