|
2025-09-15
|
| 14:01 |
<ladsgroup@cumin1003> |
dbctl commit (dc=all): 'Removing db1259 from dumps/vslow group (T403966)', diff saved to https://phabricator.wikimedia.org/P83325 and previous config saved to /var/cache/conftool/dbconfig/20250915-140115-ladsgroup.json |
[production] |
| 13:59 |
<lucaswerkmeister-wmde@deploy1003> |
jforrester, lucaswerkmeister-wmde: Continuing with sync |
[production] |
| 13:59 |
<lucaswerkmeister-wmde@deploy1003> |
jforrester, lucaswerkmeister-wmde: Backport for [[gerrit:1188342|SECURITY: Do not let getErrorMessages() etc. return HTML ever, at least for now (T404392)]], [[gerrit:1188343|SECURITY: Do not let error type labels or arguments return HTML either (T404392)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. |
[production] |
| 13:53 |
<jmm@deploy1003> |
helmfile [codfw] DONE helmfile.d/services/thumbor: apply |
[production] |
| 13:50 |
<Dreamy_Jazz> |
Created suggested investigation database tables on test2wiki - T404594 |
[production] |
| 13:49 |
<jmm@deploy1003> |
helmfile [codfw] START helmfile.d/services/thumbor: apply |
[production] |
| 13:45 |
<ladsgroup@cumin1003> |
dbctl commit (dc=all): 'Fix weight of s3 replicas in codfw (T403966)', diff saved to https://phabricator.wikimedia.org/P83324 and previous config saved to /var/cache/conftool/dbconfig/20250915-134537-ladsgroup.json |
[production] |
| 13:42 |
<ladsgroup@cumin1003> |
dbctl commit (dc=all): 'Bump weight of db1211 (T403966)', diff saved to https://phabricator.wikimedia.org/P83323 and previous config saved to /var/cache/conftool/dbconfig/20250915-134220-ladsgroup.json |
[production] |
| 13:37 |
<jmm@deploy1003> |
helmfile [staging] DONE helmfile.d/services/thumbor: apply |
[production] |
| 13:37 |
<jmm@deploy1003> |
helmfile [staging] START helmfile.d/services/thumbor: apply |
[production] |
| 13:36 |
<jmm@deploy1003> |
helmfile [staging] DONE helmfile.d/services/thumbor: apply |
[production] |
| 13:36 |
<jmm@deploy1003> |
helmfile [staging] START helmfile.d/services/thumbor: apply |
[production] |
| 13:36 |
<jmm@deploy1003> |
helmfile [staging] START helmfile.d/services/thumbor: apply |
[production] |
| 13:36 |
<lucaswerkmeister-wmde@deploy1003> |
Started scap sync-world: Backport for [[gerrit:1188342|SECURITY: Do not let getErrorMessages() etc. return HTML ever, at least for now (T404392)]], [[gerrit:1188343|SECURITY: Do not let error type labels or arguments return HTML either (T404392)]] |
[production] |
| 13:33 |
<fceratto@cumin1002> |
dbctl commit (dc=all): 'Depool db2229 T404586', diff saved to https://phabricator.wikimedia.org/P83322 and previous config saved to /var/cache/conftool/dbconfig/20250915-133322-fceratto.json |
[production] |
| 13:31 |
<fceratto@cumin1002> |
dbctl commit (dc=all): 'Promote db2214 to s6 primary T404586', diff saved to https://phabricator.wikimedia.org/P83321 and previous config saved to /var/cache/conftool/dbconfig/20250915-133108-fceratto.json |
[production] |
| 13:28 |
<federico3> |
Starting s6 codfw failover from db2229 to db2214 - T404586 |
[production] |
| 13:24 |
<jgleeson> |
SmashPig upgraded from 4206f06c to 70316e96 |
[production] |
| 12:59 |
<fceratto@cumin1002> |
dbctl commit (dc=all): 'Set db2214 with weight 0 T404586', diff saved to https://phabricator.wikimedia.org/P83320 and previous config saved to /var/cache/conftool/dbconfig/20250915-125903-fceratto.json |
[production] |
| 12:49 |
<fceratto@cumin1002> |
DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s6 T404586 |
[production] |
| 11:55 |
<ladsgroup@cumin1003> |
dbctl commit (dc=all): 'Repooling after maintenance db2207 (T402925)', diff saved to https://phabricator.wikimedia.org/P83313 and previous config saved to /var/cache/conftool/dbconfig/20250915-115527-ladsgroup.json |
[production] |
| 11:40 |
<ladsgroup@cumin1003> |
dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P83312 and previous config saved to /var/cache/conftool/dbconfig/20250915-114020-ladsgroup.json |
[production] |
| 11:35 |
<fabfur> |
restarting pybal on lvs1020 to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1188309 (T404388) |
[production] |
| 11:35 |
<ladsgroup@cumin1003> |
END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1160* gradually with 4 steps - Work done |
[production] |
| 11:32 |
<btullis@cumin1003> |
END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons. |
[production] |
| 11:26 |
<btullis@cumin1003> |
START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons. |
[production] |
| 11:25 |
<ladsgroup@cumin1003> |
dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P83310 and previous config saved to /var/cache/conftool/dbconfig/20250915-112512-ladsgroup.json |
[production] |
| 11:10 |
<ladsgroup@cumin1003> |
dbctl commit (dc=all): 'Repooling after maintenance db2207 (T402925)', diff saved to https://phabricator.wikimedia.org/P83308 and previous config saved to /var/cache/conftool/dbconfig/20250915-111005-ladsgroup.json |
[production] |
| 11:08 |
<jiji@cumin1003> |
END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs |
[production] |
| 10:55 |
<jiji@cumin1003> |
START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs |
[production] |
| 10:44 |
<ladsgroup@cumin1003> |
dbctl commit (dc=all): 'Depooling db2207 (T402925)', diff saved to https://phabricator.wikimedia.org/P83305 and previous config saved to /var/cache/conftool/dbconfig/20250915-104420-ladsgroup.json |
[production] |
| 10:44 |
<ladsgroup@cumin1003> |
DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance |
[production] |
| 10:40 |
<jayme@cumin1002> |
END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - jayme@cumin1002" |
[production] |
| 10:40 |
<jayme@cumin1002> |
END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - jayme@cumin1002 |
[production] |
| 10:39 |
<jayme@cumin1002> |
START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - jayme@cumin1002 |
[production] |
| 10:39 |
<jayme@cumin1002> |
START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - jayme@cumin1002" |
[production] |
| 10:38 |
<btullis@cumin1003> |
END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons. |
[production] |
| 10:20 |
<btullis@cumin1003> |
START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons. |
[production] |
| 10:17 |
<jiji@cumin1003> |
END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs |
[production] |
| 10:15 |
<ladsgroup@deploy1003> |
Finished scap sync-world: Backport: [[gerrit:1188285|Reduce db lock timeout in LinksUpdate and CategoryMembershipChangeJob]] (T366938) (duration: 42m 06s) |
[production] |
| 10:14 |
<ladsgroup@cumin1003> |
START - Cookbook sre.mysql.pool db1160* gradually with 4 steps - Work done |
[production] |
| 10:09 |
<jiji@cumin1003> |
START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs |
[production] |
| 10:04 |
<elukey@deploy1003> |
helmfile [codfw] DONE helmfile.d/services/wikifeeds: sync |
[production] |
| 10:02 |
<elukey@deploy1003> |
helmfile [codfw] START helmfile.d/services/wikifeeds: sync |
[production] |
| 09:57 |
<elukey@deploy1003> |
helmfile [codfw] DONE helmfile.d/services/termbox: sync |
[production] |
| 09:56 |
<elukey@deploy1003> |
helmfile [codfw] START helmfile.d/services/termbox: sync |
[production] |
| 09:55 |
<elukey@deploy1003> |
helmfile [codfw] DONE helmfile.d/services/mobileapps: sync |
[production] |
| 09:54 |
<elukey@deploy1003> |
helmfile [codfw] START helmfile.d/services/mobileapps: sync |
[production] |
| 09:53 |
<elukey@deploy1003> |
helmfile [codfw] DONE helmfile.d/services/linkrecommendation: sync |
[production] |
| 09:52 |
<elukey@deploy1003> |
helmfile [codfw] START helmfile.d/services/linkrecommendation: sync |
[production] |