|
2020-02-27
|
| 18:53 |
<bstorm_> |
hard rebooted a rather stuck tools-sgecron-01 |
[tools] |
| 18:20 |
<bd808> |
Building tools-k8s-worker-[36-55] |
[tools] |
| 17:56 |
<bd808> |
Deleted instances tools-worker-10[21-40] |
[tools] |
| 16:14 |
<bd808> |
Decommissioning tools-worker-10[21-40] |
[tools] |
| 16:02 |
<bd808> |
Drained tools-worker-1021 |
[tools] |
| 15:51 |
<bd808> |
Drained tools-worker-1022 |
[tools] |
| 15:44 |
<bd808> |
Drained tools-worker-1023 (there is no tools-worker-1024) |
[tools] |
| 15:39 |
<bd808> |
Drained tools-worker-1025 |
[tools] |
| 15:39 |
<bd808> |
Drained tools-worker-1026 |
[tools] |
| 15:11 |
<bd808> |
Drained tools-worker-1027 |
[tools] |
| 15:09 |
<bd808> |
Drained tools-worker-1028 (there is no tools-worker-1029) |
[tools] |
| 15:07 |
<bd808> |
Drained tools-worker-1030 |
[tools] |
| 15:06 |
<bd808> |
Uncordoned tools-worker-10[16-20]. Was overly optimistic about repacking the legacy Kubernetes cluster into 15 instances. Will keep 20 for now. |
[tools] |
| 15:00 |
<bd808> |
Drained tools-worker-1031 |
[tools] |
| 14:54 |
<bd808> |
Hard reboot tools-worker-1016. Direct virsh console unresponsive. Stuck in shutdown since 2020-01-22? |
[tools] |
| 14:44 |
<bd808> |
Uncordoned tools-worker-1009.tools.eqiad.wmflabs |
[tools] |
| 14:41 |
<bd808> |
Drained tools-worker-1032 |
[tools] |
| 14:37 |
<bd808> |
Drained tools-worker-1033 |
[tools] |
| 14:35 |
<bd808> |
Drained tools-worker-1034 |
[tools] |
| 14:34 |
<bd808> |
Drained tools-worker-1035 |
[tools] |
| 14:33 |
<bd808> |
Drained tools-worker-1036 |
[tools] |
| 14:33 |
<bd808> |
Drained tools-worker-10{39,38,37} yesterday but did not !log |
[tools] |
| 00:29 |
<bd808> |
Drained tools-worker-1009 for reboot (NFS flaky) |
[tools] |
| 00:11 |
<bd808> |
Uncordoned tools-worker-1009.tools.eqiad.wmflabs |
[tools] |
| 00:08 |
<bd808> |
Uncordoned tools-worker-1002.tools.eqiad.wmflabs |
[tools] |
| 00:02 |
<bd808> |
Rebooting tools-worker-1002 |
[tools] |
| 00:00 |
<bd808> |
Draining tools-worker-1002 to reboot for NFS problems |
[tools] |
|
2020-02-26
|
| 23:42 |
<bd808> |
Drained tools-worker-1040 |
[tools] |
| 23:41 |
<bd808> |
Cordoned tools-worker-10[16-40] in preparation for shrinking the legacy Kubernetes cluster |
[tools] |
| 23:12 |
<bstorm_> |
replacing all tool limit-ranges in the 2020 cluster with a version that has a lower CPU request |
[tools] |
| 22:29 |
<bstorm_> |
deleted pod maintain-kubeusers-6d9c45f4bc-5bqq5 to deploy new image |
[tools] |
| 21:06 |
<bstorm_> |
deleting loads of stuck grid jobs |
[tools] |
| 20:27 |
<jeh> |
rebooting tools-worker-[1008,1015,1021] |
[tools] |
| 20:15 |
<bstorm_> |
rebooting tools-sgegrid-master because it actually had the permissions thing going on still |
[tools] |
| 18:03 |
<bstorm_> |
downtimed toolschecker for NFS maintenance |
[tools] |