2020-09-17 §
15:34 <andrewbogott> depooling tools-k8s-worker-70 and tools-k8s-worker-66 for flavor remapping [tools]
15:30 <andrewbogott> repooling tools-sgeexec-0909, 0908, 0907, 0906, 0904 [tools]
15:21 <andrewbogott> depooling tools-sgeexec-0909, 0908, 0907, 0906, 0904 for flavor remapping [tools]
13:55 <andrewbogott> depooled tools-sgewebgrid-lighttpd-0917 and tools-sgewebgrid-lighttpd-0920 [tools]
13:55 <andrewbogott> repooled tools-sgeexec-0937 after move to ceph [tools]
13:45 <andrewbogott> depooled tools-sgeexec-0937 for move to ceph [tools]
2020-09-16 §
23:20 <andrewbogott> repooled tools-sgeexec-0941 and tools-sgeexec-0939 after move to ceph [tools]
23:03 <andrewbogott> depooled tools-sgeexec-0941 and tools-sgeexec-0939 for move to ceph [tools]
23:02 <andrewbogott> uncordoned tools-k8s-worker-58, tools-k8s-worker-56, tools-k8s-worker-42 after migration to ceph [tools]
22:29 <andrewbogott> draining tools-k8s-worker-58, tools-k8s-worker-56, tools-k8s-worker-42 for migration to ceph [tools]
17:37 <andrewbogott> service gridengine-master restart on tools-sgegrid-master [tools]
2020-09-10 §
15:37 <arturo> hard-rebooting tools-proxy-05 [tools]
15:33 <arturo> rebooting tools-proxy-05 to try flushing local DNS caches [tools]
15:25 <arturo> detected missing DNS record for k8s.tools.eqiad1.wikimedia.cloud which means the k8s cluster is down [tools]
10:22 <arturo> enabling ingress dedicated worker nodes in the k8s cluster (T250172) [tools]
2020-09-09 §
11:12 <arturo> new ingress nodes added to the cluster, and tainted/labeled per the docs https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Deploying#ingress_nodes (T250172) [tools]
10:50 <arturo> created puppet prefix `tools-k8s-ingress` (T250172) [tools]
10:42 <arturo> created VMs tools-k8s-ingress-1 and tools-k8s-ingress-2 in the `tools-ingress` server group (T250172) [tools]
10:38 <arturo> created server group `tools-ingress` with soft anti affinity policy (T250172) [tools]
2020-09-08 §
23:24 <bstorm> clearing grid queue error states blocking job runs [tools]
22:53 <bd808> forcing puppet run on tools-sgebastion-07 [tools]
2020-09-02 §
18:13 <andrewbogott> moving tools-sgeexec-0920 to ceph [tools]
17:57 <andrewbogott> moving tools-sgeexec-0942 to ceph [tools]
2020-08-31 §
19:58 <andrewbogott> migrating tools-sgeexec-091[0-9] to ceph [tools]
17:19 <andrewbogott> migrating tools-sgeexec-090[4-9] to ceph [tools]
17:19 <andrewbogott> repooled tools-sgeexec-0901 [tools]
16:52 <bstorm> the `apt install uwsgi` in the previous log entry was run on tools-checker-03 T261677 [tools]
16:51 <bstorm> running `apt install uwsgi` with --allow-downgrades to fix the puppet setup there T261677 [tools]
14:26 <andrewbogott> depooling tools-sgeexec-0901, migrating to ceph [tools]
2020-08-30 §
00:57 <Krenair> also ran qconf -ds on each [tools]
00:34 <Krenair> Tidied up SGE problems (it was spamming root@ every minute for hours) following host deletions some hours ago - removed tools-sgeexec-0921 through 0931 from @general, ran qmod -rj on all jobs registered for those nodes, then qdel -f on the remainders, then qconf -de on each deleted node [tools]
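The cleanup order described in the two entries above (reschedule with `qmod -rj`, force-delete the remainders with `qdel -f`, then `qconf -de` and `qconf -ds` per deleted node) can be sketched roughly as follows. This is a dry-run illustration only, not what was actually run: the helper name `sge_cleanup_commands` and the job IDs are hypothetical, the hostgroup removal from @general is left out because the log does not say how it was done, and nothing is executed against a grid master.

```python
from typing import Iterable, List


def sge_cleanup_commands(
    hosts: Iterable[str],
    job_ids: Iterable[str],
    stuck_job_ids: Iterable[str],
) -> List[List[str]]:
    """Build (but do not run) the cleanup commands named in the log entries.

    hosts         -- exec nodes that were already deleted
    job_ids       -- jobs still registered on those nodes (hypothetical input)
    stuck_job_ids -- jobs that remained after rescheduling (hypothetical input)
    """
    hosts = list(hosts)
    commands: List[List[str]] = []

    # Step 1: ask the scheduler to reschedule jobs registered on the dead nodes.
    for job_id in job_ids:
        commands.append(["qmod", "-rj", job_id])

    # Step 2: force-delete the jobs that were left over.
    for job_id in stuck_job_ids:
        commands.append(["qdel", "-f", job_id])

    # Step 3: remove each deleted node from the exec host list.
    for host in hosts:
        commands.append(["qconf", "-de", host])

    # Step 4: remove each deleted node from the submit host list.
    for host in hosts:
        commands.append(["qconf", "-ds", host])

    return commands


# tools-sgeexec-0921 through tools-sgeexec-0931, as named in the log.
deleted_hosts = [f"tools-sgeexec-{n:04d}" for n in range(921, 932)]

if __name__ == "__main__":
    # Job IDs here are placeholders for illustration only.
    for cmd in sge_cleanup_commands(deleted_hosts, ["101"], ["102"]):
        print(" ".join(cmd))
```

Printing the commands rather than running them keeps the sketch safe to read through before anything touches the grid master.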
2020-08-29 §
16:02 <bstorm> deleting "tools-sgeexec-0931", "tools-sgeexec-0930", "tools-sgeexec-0929", "tools-sgeexec-0928", "tools-sgeexec-0927" [tools]
16:00 <bstorm> deleting "tools-sgeexec-0926", "tools-sgeexec-0925", "tools-sgeexec-0924", "tools-sgeexec-0923", "tools-sgeexec-0922", "tools-sgeexec-0921" [tools]
2020-08-26 §
21:08 <bd808> Disabled puppet on tools-proxy-06 to test fixes for a bug in the new T251628 code [tools]
08:54 <arturo> merged several patches by bryan for toolforge front proxy (cleanups, etc) example: https://gerrit.wikimedia.org/r/c/operations/puppet/+/622435 [tools]
2020-08-25 §
19:38 <andrewbogott> deleting tools-sgeexec-0943.tools.eqiad.wmflabs, tools-sgeexec-0944.tools.eqiad.wmflabs, tools-sgeexec-0945.tools.eqiad.wmflabs, tools-sgeexec-0946.tools.eqiad.wmflabs, tools-sgeexec-0948.tools.eqiad.wmflabs, tools-sgeexec-0949.tools.eqiad.wmflabs, tools-sgeexec-0953.tools.eqiad.wmflabs — they are broken and we're not very curious why; will retry this exercise when everything is standardized on [tools]
15:03 <andrewbogott> removing non-ceph nodes tools-sgeexec-0921 through tools-sgeexec-0931 [tools]
15:02 <andrewbogott> added new sge-exec nodes tools-sgeexec-0943 through tools-sgeexec-0953 (for real this time) [tools]
2020-08-19 §
21:29 <andrewbogott> shutting down and removing tools-k8s-worker-20 through tools-k8s-worker-29; this load can now be handled by new nodes on ceph hosts [tools]
21:15 <andrewbogott> shutting down and removing tools-k8s-worker-1 through tools-k8s-worker-19; this load can now be handled by new nodes on ceph hosts [tools]
18:40 <andrewbogott> creating 13 new xlarge k8s worker nodes, tools-k8s-worker-67 through tools-k8s-worker-79 [tools]
2020-08-18 §
15:24 <bd808> Rebuilding all Docker containers to pick up newest versions of installed packages [tools]
2020-07-30 §
16:28 <andrewbogott> added new xlarge ceph-hosted worker nodes: tools-k8s-worker-61, 62, 63, 64, 65, 66. T258663 [tools]
2020-07-29 §
23:24 <bd808> Pushed a copy of docker-registry.wikimedia.org/wikimedia-jessie:latest to docker-registry.tools.wmflabs.org/wikimedia-jessie:latest in preparation for the upstream image going away [tools]
2020-07-24 §
22:33 <bd808> Removed a few more ancient docker images: grrrit, jessie-toollabs, and nagf [tools]
21:02 <bd808> Running cleanup script to delete the non-sssd toolforge images from docker-registry.tools.wmflabs.org [tools]
20:17 <bd808> Forced garbage collection on docker-registry.tools.wmflabs.org [tools]
20:06 <bd808> Running cleanup script to delete all of the old toollabs-* images from docker-registry.tools.wmflabs.org [tools]
2020-07-22 §
23:24 <bstorm> created server group 'tools-k8s-worker' to create any new worker nodes in so that they have a low chance of being scheduled together by openstack unless it is necessary T258663 [tools]
23:22 <bstorm> running puppet and NFS 4.2 remount on tools-k8s-worker-[56-60] T257945 [tools]