2025-04-03
13:25 <cgoubert@deploy1003> helmfile [eqiad] START helmfile.d/services/mw-cron: apply [production]
13:20 <taavi@deploy1003> ihurbain, taavi: Continuing with sync [production]
13:17 <taavi@deploy1003> ihurbain, taavi: Backport for [[gerrit:1133113|Enable Parsoid Read Views on 13 wiktionaries (T390680)]], [[gerrit:1133141|Enable Parsoid Read Views to incubator and dagwiki mobile frontend (T380768 T381002)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
13:07 <brouberol@deploy1003> helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply [production]
13:07 <jmm@cumin2002> END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad [production]
13:07 <taavi@deploy1003> Started scap sync-world: Backport for [[gerrit:1133113|Enable Parsoid Read Views on 13 wiktionaries (T390680)]], [[gerrit:1133141|Enable Parsoid Read Views to incubator and dagwiki mobile frontend (T380768 T381002)]] [production]
13:07 <brouberol@deploy1003> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply [production]
13:06 <brouberol@deploy1003> helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply [production]
13:06 <brouberol@deploy1003> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply [production]
13:05 <jmm@cumin2002> START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-eqiad [production]
13:04 <jmm@cumin2002> END (FAIL) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=1) rolling restart_daemons on A:thanos-fe [production]
13:02 <jmm@cumin2002> START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe [production]
12:56 <moritzm> prune now obsolete nginx packages from testreduce1002 T329529 [production]
12:55 <godog> move k8s instances from prometheus1006 to prometheus1008 - T383232 [production]
12:55 <jmm@cumin2002> END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public [production]
12:54 <klausman@deploy1003> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. [production]
12:53 <jmm@cumin2002> START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public [production]
12:53 <klausman@deploy1003> helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. [production]
12:48 <klausman@deploy1003> helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. [production]
12:47 <klausman@deploy1003> helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. [production]
12:42 <jmm@cumin2002> END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all [production]
12:28 <jmm@cumin2002> START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all [production]
12:25 <klausman@deploy1003> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. [production]
12:24 <klausman@deploy1003> helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. [production]
12:22 <jmm@cumin2002> END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-test [production]
12:21 <jmm@cumin2002> START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-test [production]
12:16 <moritzm> installing libxslt security updates [production]
11:58 <moritzm> installing Intel microcode security updates [production]
11:56 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet [production]
11:50 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet [production]
11:46 <moritzm> installing Django security updates on Bullseye [production]
11:37 <moritzm> installing Python 3.9 security updates [production]
11:33 <topranks> reboot cr2-eqord to complete JunOS upgrade T364092 [production]
11:31 <topranks> disable EBGP sessions to internet peers on cr2-eqord to prep for JunOS upgrade T364092 [production]
11:30 <cmooney@cumin1002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr2-codfw,cr2-eqiad,cr2-eqord,cr2-eqord IPv6,cr3-ulsfo with reason: Upgrade cr2-eqord JunOS [production]
11:07 <moritzm> installing nodejs security updates [production]
11:06 <topranks> prepend AS paths announced to codfw/eqiad from eqord to prep for JunOS upgrade T364092 [production]
11:02 <ladsgroup@deploy1003> Finished scap sync-world: Backport for [[gerrit:1133862|Bump thumbnail steps to 65% (T360589)]] (duration: 16m 34s) [production]
10:55 <ladsgroup@deploy1003> ladsgroup: Continuing with sync [production]
10:54 <mvernon@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apus-fe2003.codfw.wmnet with OS bookworm [production]
10:54 <mvernon@cumin2002> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002" [production]
10:53 <ladsgroup@deploy1003> ladsgroup: Backport for [[gerrit:1133862|Bump thumbnail steps to 65% (T360589)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
10:51 <mvernon@cumin2002> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002" [production]
10:50 <topranks> drain transport circuits to eqord (Chicago network pop) to prep for Junos upgrade cr2-eqord T364092 [production]
10:48 <moritzm> remove nodejs from aqs* hosts, no longer used/needed and spares us needless security rollouts T350143 [production]
10:46 <ladsgroup@deploy1003> Started scap sync-world: Backport for [[gerrit:1133862|Bump thumbnail steps to 65% (T360589)]] [production]
10:32 <mvernon@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apus-fe2003.codfw.wmnet with reason: host reimage [production]
10:27 <mvernon@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on apus-fe2003.codfw.wmnet with reason: host reimage [production]
10:22 <akosiaris@deploy1003> helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. [production]
10:22 <akosiaris@deploy1003> helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. [production]