2021-01-08
|
18:54 |
<joal> |
Restart jobs for permissions-fix (clickstream, mediacounts-archive, geoeditors-public_monthly, geoeditors-yearly, mobile_app-uniques-[daily|monthly], pageview-daily_dump, pageview-hourly, projectview-geo, unique_devices-[per_domain|per_project_family]-[daily|monthly]) |
[analytics] |
18:14 |
<joal> |
Restart projectview-hourly job (permissions test) |
[analytics] |
18:03 |
<joal> |
Deploy refinery onto HDFS |
[analytics] |
17:50 |
<joal> |
deploy refinery with scap |
[analytics] |
10:01 |
<elukey> |
restart varnishkafka-webrequest on cp5001 - timeouts to kafka-jumbo1001, librdkafka does not seem to recover very well |
[analytics] |
08:46 |
<elukey> |
force restart of check_webrequest_partitions.service on an-launcher1002 |
[analytics] |
08:44 |
<elukey> |
force restart of monitor_refine_eventlogging_legacy_failure_flags.service |
[analytics] |
08:18 |
<elukey> |
raise default max executor heap size for Spark refine to 4G |
[analytics] |
2021-01-07
|
18:22 |
<elukey> |
chown -R analytics:analytics-privatedata-users /tmp/analytics (tmp dir for data quality stats tables) |
[analytics] |
18:21 |
<elukey> |
"sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chown -R analytics:analytics-privatedata-users /wmf/data/wmf/data_quality_stats" |
[analytics] |
18:10 |
<elukey> |
temporarily disable hdfs-cleaner.timer to prevent /tmp/DataFrameToDruid from being dropped |
[analytics] |
18:08 |
<elukey> |
chown -R analytics:druid /tmp/DataFrameToDruid (was: analytics:hdfs) on hdfs to temporarily unblock Hive2Druid jobs |
[analytics] |
16:31 |
<elukey> |
remove /etc/mysql/conf.d/research-client.cnf from stat100x nodes |
[analytics] |
15:40 |
<elukey> |
deprecate the 'researchers' posix group for good |
[analytics] |
11:24 |
<elukey> |
execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod -R o-rwx /wmf/data/event_sanitized" to fix some file permissions as well |
[analytics] |
10:36 |
<elukey> |
execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod -R o-rwx /wmf/data/event" on an-master1001 to fix some file permissions (an-launcher executed timers during the past hours without the new umask) - T270629 |
[analytics] |
09:37 |
<elukey> |
forced re-run of monitor_refine_event_failure_flags.service on an-launcher1002 to clear alerts |
[analytics] |
08:26 |
<joal> |
Rerunning 4 failed refine jobs (mediawiki_cirrussearch_request, day=6/hour=20|21, day=7/hour=0|2) |
[analytics] |
08:14 |
<elukey> |
re-enable puppet on an-launcher1002 to apply new refine memory settings |
[analytics] |
07:59 |
<elukey> |
re-enabling all oozie jobs previously suspended |
[analytics] |
07:54 |
<elukey> |
restart oozie on an-coord1001 |
[analytics] |
2020-12-22
|
19:35 |
<elukey> |
restart hive daemons on an-coord1001 to pick up new settings |
[analytics] |
18:13 |
<elukey> |
failover analytics-hive.eqiad.wmnet to an-coord1002 (to allow maintenance on an-coord1001) |
[analytics] |
18:07 |
<elukey> |
restart hive server on an-coord1002 (current standby - no traffic) to pick up the new config (use the local metastore instead of the one analytics-hive points to) |
[analytics] |
17:00 |
<mforns> |
Deployed refinery as part of weekly train (v0.0.142) |
[analytics] |
16:42 |
<mforns> |
Deployed refinery-source v0.0.142 |
[analytics] |
16:30 |
<mforns> |
Deployed refinery-source v0.0.142 |
[analytics] |
15:00 |
<razzi> |
stopping superset server on analytics-tool1004 |
[analytics] |
10:36 |
<elukey> |
restart presto coordinator to pick up analytics-hive settings |
[analytics] |