| 
      
        2019-02-25
      
     | 
  
    
  | 23:20 | 
  <bstorm_> | 
  Depooled tools-sgeexec-0914 and tools-sgeexec-0915 for T217066 | 
  [tools] | 
            
  | 21:41 | 
  <andrewbogott> | 
  depooling tools-sgeexec-0911, tools-sgeexec-0912, tools-sgeexec-0913 to test T217066 | 
  [tools] | 
            
  | 13:11 | 
  <chicocvenancio> | 
  PAWS: Stopped AABot notebook pod T217010 | 
  [tools] | 
            
  | 12:54 | 
  <chicocvenancio> | 
  PAWS: Restarted Criscod notebook pod T217010 | 
  [tools] | 
            
  | 12:21 | 
  <chicocvenancio> | 
  PAWS: killed proxy and hub pods in an attempt to get them to see routes to open notebook servers, to no avail. Restarted BernhardHumm's notebook pod T217010 | 
  [tools] | 
            
  | 09:50 | 
  <gtirloni> | 
  rebooted tools-sgeexec-09{16,22,40} (T216988) | 
  [tools] | 
            
  | 09:41 | 
  <gtirloni> | 
  rebooted tools-sgeexec-09{16,22,40} | 
  [tools] | 
            
  | 08:37 | 
  <zhuyifei1999_> | 
  uncordon tools-worker-1015.tools.eqiad.wmflabs | 
  [tools] | 
            
  | 08:34 | 
  <legoktm> | 
  hard rebooted tools-worker-1015 via horizon | 
  [tools] | 
            
  | 07:48 | 
  <zhuyifei1999_> | 
  systemd stuck in D state. :( | 
  [tools] | 
            
  | 07:44 | 
  <zhuyifei1999_> | 
  I saved dmesg and process list to a few files in /root if that helps debugging | 
  [tools] | 
            
  | 07:43 | 
  <zhuyifei1999_> | 
  D states are not responding to SIGKILL. Will reboot. | 
  [tools] | 
            
  | 07:37 | 
  <zhuyifei1999_> | 
  tools-worker-1015.tools.eqiad.wmflabs having severe NFS issues (all NFS accessing processes are stuck in D state). Draining. | 
  [tools] | 
            
  
    | 
      
        2019-02-20
      
     | 
  
    
  | 23:30 | 
  <zhuyifei1999_> | 
  begin rebuilding all docker images T178601 T193646 T215683 | 
  [tools] | 
            
  | 23:25 | 
  <zhuyifei1999_> | 
  upgraded toollabs-webservice on tools-bastion-02 to 0.44 (newly-built version) | 
  [tools] | 
            
  | 23:19 | 
  <zhuyifei1999_> | 
  this was built for stretch. hopefully it works for all distros | 
  [tools] | 
            
  | 23:17 | 
  <zhuyifei1999_> | 
  begin building new tools-webservice package T178601 T193646 T215683 | 
  [tools] | 
            
  | 21:57 | 
  <andrewbogott> | 
  moving tools-static-13 to a new virt host | 
  [tools] | 
            
  | 21:34 | 
  <andrewbogott> | 
  moving the tools-static IP from tools-static-13 to tools-static-12 | 
  [tools] | 
            
  | 19:17 | 
  <andrewbogott> | 
  moving tools-bastion-02 to labvirt1004 | 
  [tools] | 
            
  | 16:56 | 
  <andrewbogott> | 
  moving tools-paws-worker-1003 | 
  [tools] | 
            
  | 15:53 | 
  <andrewbogott> | 
  moving tools-worker-1017, tools-worker-1027, tools-worker-1028 | 
  [tools] | 
            
  | 15:03 | 
  <andrewbogott> | 
  moving tools-exec-1413 and tools-exec-1442 | 
  [tools] | 
            
  
    | 
      
        2019-02-16
      
     | 
  
    
  | 05:00 | 
  <zhuyifei1999_> | 
  fixed by restarting flannel. another puppet run simply started kubelet | 
  [tools] | 
            
  | 04:58 | 
  <zhuyifei1999_> | 
  puppet logs: https://phabricator.wikimedia.org/P8097. Docker is failing with 'Failed to load environment files: No such file or directory' | 
  [tools] | 
            
  | 04:52 | 
  <zhuyifei1999_> | 
  copied the resolv.conf from tools-k8s-master-01, removing secondary DNS to make sure puppet fixes that, and starting puppet | 
  [tools] | 
            
  | 04:48 | 
  <zhuyifei1999_> | 
  that host's resolv.conf is badly broken https://phabricator.wikimedia.org/P8096. The last Puppet run was at Thu Feb 14 15:21:09 UTC 2019 (2247 minutes ago) | 
  [tools] | 
            
  | 04:44 | 
  <zhuyifei1999_> | 
  puppet is also failing badly here: 'Error: Could not request certificate: getaddrinfo: Name or service not known' | 
  [tools] | 
            
  | 04:43 | 
  <zhuyifei1999_> | 
  this one has logs full of 'Can't contact LDAP server' | 
  [tools] | 
            
  | 04:41 | 
  <zhuyifei1999_> | 
  nslcd also broken on tools-worker-1005 | 
  [tools] | 
            
  | 04:34 | 
  <zhuyifei1999_> | 
  uncordon tools-worker-1014.tools.eqiad.wmflabs | 
  [tools] | 
            
  | 04:33 | 
  <zhuyifei1999_> | 
  the issue was, /var/run/nslcd/socket was somehow a directory, AFAICT | 
  [tools] | 
            
  | 04:31 | 
  <zhuyifei1999_> | 
  then started nslcd via systemctl and `id zhuyifei1999` returns correct results | 
  [tools] |