2025-05-12
08:47 | <dcaro@cloudcumin1001> | START - Cookbook wmcs.toolforge.component.deploy for component volume-admission | [tools]
08:46 | <dcaro@cloudcumin1001> | END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission | [toolsbeta]
08:43 | <mvernon@cumin1002> | START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on P{thanos-fe200[4-7]*} or P{thanos-fe1*} and (A:thanos-fe or A:thanos-fe-codfw or A:thanos-fe-eqiad) | [production]
08:39 | <mvernon@cumin1002> | END (FAIL) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=1) rolling restart_daemons on A:thanos-fe | [production]
08:39 | <mvernon@cumin1002> | START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe | [production]
08:35 | <jayme@deploy1003> | helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | [production]
08:34 | <dcaro@cloudcumin1001> | START - Cookbook wmcs.toolforge.component.deploy for component volume-admission | [toolsbeta]
08:34 | <jayme@deploy1003> | helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | [production]
08:33 | <jayme@deploy1003> | helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | [production]
08:31 | <jayme@deploy1003> | helmfile [staging-codfw] START helmfile.d/admin 'apply'. | [production]
08:29 | <mvernon@cumin1002> | conftool action : set/pooled=yes; selector: service=apus,name=apus-fe1003.eqiad.wmnet | [production]
08:29 | <mvernon@cumin1002> | conftool action : set/weight=40; selector: service=apus,name=apus-fe1003.eqiad.wmnet | [production]
08:28 | <hashar> | Disabled https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/ due to a failure with Etcd/expired certificate # T393855 | [releng]
08:17 | <taavi> | powercycle cloudservices2005-dev.codfw.wmnet | [admin]
08:15 | <hashar> | Updated jobs for "Replace all uses of `$(pwd)` with `$PWD`" | https://gerrit.wikimedia.org/r/c/integration/config/+/1143967/ | [releng]
08:10 | <jayme@deploy1003> | helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | [production]
08:09 | <jayme@deploy1003> | helmfile [staging-codfw] START helmfile.d/admin 'apply'. | [production]
08:07 | <wmbot~valeriobozzolan@tools-bastion-13> | toolforge jobs: itwiki-orphanizerbot, itwiki-deletionbot: adopt timeout 7200 seconds for [[w:it:Special:PermaLink/144874982#Bot_Fermo_3]] | [tools.itwiki]
07:58 | <hashar> | Disabled https://integration.wikimedia.org/ci/job/beta-scap-sync-world/ due to a failure with Etcd/expired certificate # T393855 | [releng]
07:57 | <slyngshede@cumin1002> | DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Cicalese out of all services on: 2402 hosts | [production]
07:20 | <brouberol@deploy1003> | helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. | [production]
07:20 | <brouberol@deploy1003> | helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. | [production]
07:12 | <slyngshede@cumin1002> | DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Debt out of all services on: 2402 hosts | [production]
03:15 | <andrew@cloudcumin1001> | END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment codfw1dev for all services | [admin]
03:07 | <andrew@cloudcumin1001> | START - Cookbook wmcs.openstack.restart_openstack on deployment codfw1dev for all services | [admin]
02:36 | <andrewbogott> | rebooting cloudnet2005-dev from mgmt -- ssh is failing and the console shows a user prompt but not a password prompt. | [admin]
02:36 | <andrew@cloudcumin1001> | END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-19 | [tools]
02:32 | <andrew@cloudcumin1001> | START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19 | [tools]
2025-05-10
19:58 | <lucaswerkmeister> | webservice restart (per request on behalf of tool maintainer, as the bastion is having issues atm) | [tools.letaxobot]
17:35 | <lucaswerkmeister> | root@tools-bastion-13:~# systemctl restart sssd-sudo{,.socket} # looks like the reset-failed didn’t work properly, systemd didn’t even try to start the service again afaict (T393732) | [tools]
17:33 | <lucaswerkmeister> | root@tools-bastion-13:~# systemctl reset-failed sssd-{pam,sudo}.service && systemctl restart sssd-pam{,-priv}.socket # try to reset the rate limits this way (T393732) | [tools]
16:22 | <lucaswerkmeister> | systemctl restart sssd-{pam{,-priv},sudo}.socket # service-start-limit-hit, T393732? | [tools]
14:10 | <lucaswerkmeister> | root@tools-bastion-13:~# systemctl restart sssd-sudo.socket # service-start-limit-hit, T393732? | [tools]
11:53 | <lucaswerkmeister> | T393732 note: restart of sssd-pam.service actually failed, “may be requested by dependency only”; overall it still seems to have worked though (so next time restarting the sockets is probably sufficient) | [tools]
11:52 | <lucaswerkmeister> | root@tools-bastion-13:~# systemctl restart sssd-pam{,{,-priv}.socket} # all three failed with start-limit-hit / Start request repeated too quickly; T393732? | [tools]
00:41 | <dani@deploy1003> | helmfile [codfw] DONE helmfile.d/services/miscweb: apply | [production]
00:41 | <dani@deploy1003> | helmfile [codfw] START helmfile.d/services/miscweb: apply | [production]
00:41 | <dani@deploy1003> | helmfile [eqiad] DONE helmfile.d/services/miscweb: apply | [production]
00:41 | <dani@deploy1003> | helmfile [eqiad] START helmfile.d/services/miscweb: apply | [production]
00:41 | <dani@deploy1003> | helmfile [staging] DONE helmfile.d/services/miscweb: apply | [production]
00:41 | <dani@deploy1003> | helmfile [staging] START helmfile.d/services/miscweb: apply | [production]
00:23 | <dani@deploy1003> | helmfile [codfw] DONE helmfile.d/services/miscweb: apply | [production]
00:22 | <dani@deploy1003> | helmfile [codfw] START helmfile.d/services/miscweb: apply | [production]
00:22 | <dani@deploy1003> | helmfile [eqiad] DONE helmfile.d/services/miscweb: apply | [production]
00:22 | <dani@deploy1003> | helmfile [eqiad] START helmfile.d/services/miscweb: apply | [production]
00:22 | <dani@deploy1003> | helmfile [staging] DONE helmfile.d/services/miscweb: apply | [production]