| 2019-02-25
      
    
  | 23:20 | <bstorm_> | Depooled tools-sgeexec-0914 and tools-sgeexec-0915 for T217066 | [tools] | 
            
  | 21:41 | <andrewbogott> | depooling tools-sgeexec-0911, tools-sgeexec-0912, tools-sgeexec-0913 to test T217066 | [tools] | 
            
  | 13:11 | <chicocvenancio> | PAWS: Stopped AABot notebook pod T217010 | [tools] | 
            
  | 12:54 | <chicocvenancio> | PAWS: Restarted Criscod notebook pod T217010 | [tools] | 
            
  | 12:21 | <chicocvenancio> | PAWS: killed proxy and hub pods in an attempt to get it to see routes to open notebook servers, to no avail. Restarted BernhardHumm's notebook pod T217010 | [tools] | 
            
  | 09:50 | <gtirloni> | rebooted tools-sgeexec-09{16,22,40} (T216988) | [tools] | 
            
  | 09:41 | <gtirloni> | rebooted tools-sgeexec-09{16,22,40} | [tools] | 
            
  | 08:37 | <zhuyifei1999_> | uncordon tools-worker-1015.tools.eqiad.wmflabs | [tools] | 
            
  | 08:34 | <legoktm> | hard rebooted tools-worker-1015 via horizon | [tools] | 
            
  | 07:48 | <zhuyifei1999_> | systemd stuck in D state. :( | [tools] | 
            
  | 07:44 | <zhuyifei1999_> | I saved dmesg and process list to a few files in /root if that helps debugging | [tools] | 
            
  | 07:43 | <zhuyifei1999_> | Processes in D state are not responding to SIGKILL. Will reboot. | [tools] | 
            
  | 07:37 | <zhuyifei1999_> | tools-worker-1015.tools.eqiad.wmflabs having severe NFS issues (all NFS-accessing processes are stuck in D state). Draining. | [tools] | 
            
  
    | 2019-02-20
      
    
  | 23:30 | <zhuyifei1999_> | begin rebuilding all docker images T178601 T193646 T215683 | [tools] | 
            
  | 23:25 | <zhuyifei1999_> | upgraded toollabs-webservice on tools-bastion-02 to 0.44 (newly-built version) | [tools] | 
            
  | 23:19 | <zhuyifei1999_> | this was built for stretch. hopefully it works for all distros | [tools] | 
            
  | 23:17 | <zhuyifei1999_> | begin building new tools-webservice package T178601 T193646 T215683 | [tools] | 
            
  | 21:57 | <andrewbogott> | moving tools-static-13 to a new virt host | [tools] | 
            
  | 21:34 | <andrewbogott> | moving the tools-static IP from tools-static-13 to tools-static-12 | [tools] | 
            
  | 19:17 | <andrewbogott> | moving tools-bastion-02 to labvirt1004 | [tools] | 
            
  | 16:56 | <andrewbogott> | moving tools-paws-worker-1003 | [tools] | 
            
  | 15:53 | <andrewbogott> | moving tools-worker-1017, tools-worker-1027, tools-worker-1028 | [tools] | 
            
  | 15:03 | <andrewbogott> | moving tools-exec-1413 and tools-exec-1442 | [tools] | 
            
  
    | 2019-02-16
      
    
  | 05:00 | <zhuyifei1999_> | fixed by restarting flannel. another puppet run simply started kubelet | [tools] | 
            
  | 04:58 | <zhuyifei1999_> | puppet logs: https://phabricator.wikimedia.org/P8097. Docker is failing with 'Failed to load environment files: No such file or directory' | [tools] | 
            
  | 04:52 | <zhuyifei1999_> | copied the resolv.conf from tools-k8s-master-01, removing secondary DNS to make sure puppet fixes that, and starting puppet | [tools] | 
            
  | 04:48 | <zhuyifei1999_> | that host's resolv.conf is badly broken https://phabricator.wikimedia.org/P8096. The last Puppet run was at Thu Feb 14 15:21:09 UTC 2019 (2247 minutes ago) | [tools] | 
            
  | 04:44 | <zhuyifei1999_> | puppet is also failing bad here 'Error: Could not request certificate: getaddrinfo: Name or service not known' | [tools] | 
            
  | 04:43 | <zhuyifei1999_> | this one has logs full of 'Can't contact LDAP server' | [tools] | 
            
  | 04:41 | <zhuyifei1999_> | nslcd also broken on tools-worker-1005 | [tools] | 
            
  | 04:34 | <zhuyifei1999_> | uncordon tools-worker-1014.tools.eqiad.wmflabs | [tools] | 
            
  | 04:33 | <zhuyifei1999_> | the issue was that /var/run/nslcd/socket was somehow a directory, AFAICT | [tools] | 
            
  | 04:31 | <zhuyifei1999_> | then started nslcd via systemctl, and `id zhuyifei1999` now returns correct results | [tools] | 
            
  | 04:30 | <zhuyifei1999_> | `nslcd -nd` complains about 'nslcd: bind() to /var/run/nslcd/socket failed: Address already in use'. SIGTERMed a background nslcd, `rmdir /var/run/nslcd/socket`, and `nslcd -nd` seemingly starts to work | [tools] |