| 2016-01-21 |
  | 21:00 | <YuviPanda> | stop gridengine master | [tools] | 
            
  | 20:51 | <YuviPanda> | correction to last message: repooled exec nodes on labvirt1007 | [tools] | 
            
  | 20:51 | <YuviPanda> | repooled exec nodes on labvirt1006 | [tools] | 
            
  | 20:39 | <YuviPanda> | failover tools-static to tools-web-static-01 | [tools] | 
            
  | 20:38 | <YuviPanda> | failover tools-checker to tools-checker-01 | [tools] | 
            
  | 20:32 | <YuviPanda> | depooled exec nodes on 1007 | [tools] | 
            
  | 20:32 | <YuviPanda> | repooled exec nodes on 1006 | [tools] | 
            
  | 20:14 | <YuviPanda> | depooled all exec nodes in labvirt1006 | [tools] | 
            
  | 20:11 | <YuviPanda> | repooled exec nodes on 1005 | [tools] | 
            
  | 19:53 | <YuviPanda> | depooled exec nodes on labvirt1005 | [tools] | 
            
  | 19:49 | <YuviPanda> | repooled exec nodes from labvirt1004 | [tools] | 
            
  | 19:48 | <YuviPanda> | failed over proxy to tools-proxy-01 again | [tools] | 
            
  | 19:31 | <YuviPanda> | depooled exec nodes from labvirt1004 | [tools] | 
            
  | 19:29 | <YuviPanda> | repooled exec nodes from labvirt1003 | [tools] | 
            
  | 19:13 | <YuviPanda> | depooled instances on labvirt1003 | [tools] | 
            
  | 19:06 | <YuviPanda> | re-enabled queues on exec nodes that were on labvirt1002 | [tools] | 
            
  | 19:02 | <YuviPanda> | failed over tools proxy to tools-proxy-02 | [tools] | 
            
  | 18:46 | <YuviPanda> | drained and disabled queues on all nodes on labvirt1002 | [tools] | 
            
  | 18:38 | <YuviPanda> | restarted all restartable jobs in instances on labvirt1001 and deleted all non-restartable ghost jobs. these were already dead | [tools] | 
            
  
| 2016-01-11 |
  | 22:19 | <valhallasw`cloud> | reset maxujobs 0->128, job_load_adjustments none->np_load_avg=0.50, load_ad... -> 0:7:30 | [tools] | 
            
  | 22:12 | <YuviPanda> | restarted gridengine master again | [tools] | 
            
  | 22:07 | <valhallasw`cloud> | set job_load_adjustments from np_load_avg=0.50 to none and load_adjustment_decay_time to 0:0:0 | [tools] | 
            
  | 22:05 | <valhallasw`cloud> | set maxujobs back to 0, but doesn't help | [tools] | 
            
  | 21:57 | <valhallasw`cloud> | reset to 7:30 | [tools] | 
            
  | 21:57 | <valhallasw`cloud> | that cleared the measure, but jobs still not starting. Ugh! | [tools] | 
            
  | 21:55 | <valhallasw`cloud> | set load_adjustment_decay_time = 0:0:0 | [tools] | 
            
  | 21:45 | <YuviPanda> | restarted gridengine master | [tools] | 
            
  | 21:43 | <valhallasw`cloud> | qstat -j <jobid> shows all queues overloaded; seems to have started just after a load test for the new maxujobs setting | [tools] | 
            
  | 21:42 | <valhallasw`cloud> | resetting to 0:7:30, as it's not having the intended effect | [tools] | 
            
  | 21:41 | <valhallasw`cloud> | currently 353 jobs in qw state | [tools] | 
            
  | 21:40 | <valhallasw`cloud> | that's load_adjustment_decay_time | [tools] | 
            
  | 21:40 | <valhallasw`cloud> | temporarily sudo qconf -msconf to 0:0:1 | [tools] | 
            
  | 19:59 | <YuviPanda> | Set maxujobs (max concurrent jobs per user) on gridengine to 128 | [tools] | 
            
  | 17:51 | <YuviPanda> | kill all queries running on labsdb1003 | [tools] | 
            
  | 17:20 | <YuviPanda> | stopped webservice for quentinv57-tools | [tools] |