2014-08-18
| 22:22 | <^d> | dropped apache01/02 instances, unused and need the resources | [releng] |
| 18:23 | <manybubbles> | finished upgrading elasticsearch in beta - everything seems ok so far | [releng] |
| 18:15 | <bd808> | Restarted salt-minion on deployment-mediawiki01 & deployment-rsync01 | [releng] |
| 18:15 | <bd808> | Ran `sudo pkill python` on deployment-rsync01 to kill hundreds of grain-ensure processes | [releng] |
| 18:12 | <bd808> | Ran `sudo pkill python` on deployment-mediawiki01 to kill hundreds of grain-ensure processes | [releng] |
| 18:10 | <manybubbles> | finally restarting beta's elasticsearch servers now that they have new jars | [releng] |
| 17:56 | <bd808> | Manually ran trebuchet fetches on deployment-elastic0* | [releng] |
| 17:49 | <bd808> | Forcing puppet run on deployment-elastic01 | [releng] |
| 17:47 | <godog> | upgraded hhvm on mediawiki02 to 3.3-dev+20140728+wmf5 | [releng] |
| 17:44 | <bd808> | Trying to restart minions again with `salt '*' -b 1 service.restart salt-minion` | [releng] |
| 17:39 | <bd808> | Restarting minions via `salt '*' service.restart salt-minion` | [releng] |
| 17:38 | <bd808> | Restarted salt-master service on deployment-salt | [releng] |
| 17:19 | <bd808> | 16:37 Restarted Apache and HHVM on deployment-mediawiki02 to pick up removal of /etc/php5/conf.d/mail.ini (logged in prod SAL by mistake) | [releng] |
| 16:59 | <manybubbles|lunc> | upgrading Elasticsearch in beta to 1.3.2 | [releng] |
| 16:11 | <bd808> | Manually applied https://gerrit.wikimedia.org/r/#/c/141287/12/templates/mail/exim4.minimal.erb on deployment-mediawiki02 and restarted exim4 service | [releng] |
| 15:28 | <bd808> | Puppet failing for deployment-mathoid due to duplicate definition error in trebuchet config | [releng] |
| 15:15 | <bd808> | Reinstated puppet patch to depool deployment-mediawiki01 and forced puppet run on all deployment-cache-* hosts | [releng] |
| 15:04 | <bd808> | Puppet run failing on deployment-mediawiki01 (apache won't start); Puppet disabled on deployment-mediawiki02 ('reason not specified'). Probably needs to wait until Giuseppe is back from vacation for a fix. | [releng] |
| 15:00 | <bd808> | Rebooting deployment-eventlogging02 via wikitech; console filling with OOM killer messages and puppet runs failing with "Cannot allocate memory - fork(2)" | [releng] |
| 14:29 | <bd808> | Forced puppet run on deployment-cache-upload02 | [releng] |
| 14:27 | <bd808> | Forced puppet run on deployment-cache-text02 | [releng] |
| 14:24 | <bd808> | Forced puppet run on deployment-cache-mobile03 | [releng] |
| 14:20 | <bd808> | Forced puppet run on deployment-cache-bits01 | [releng] |
  
2014-08-15
| 21:57 | <legoktm> | set $wgVERPsecret in PrivateSettings.php | [releng] |
| 21:42 | <hashSpeleology> | Beta cluster database updates are broken due to CentralNotice. The fix is {{gerrit|154231}} | [releng] |
| 20:57 | <hashSpeleology> | deployment-rsync01: deleting /usr/local/apache/common-local content, then `ln -s /srv/common-local /usr/local/apache/common-local` as set by beta::common, which is not applied on that host for some reason. {{bug|69590}} | [releng] |
| 20:55 | <hashSpeleology> | puppet administratively disabled on mediawiki02. Assuming some work in progress on that host. Leaving it untouched | [releng] |
| 20:54 | <hashSpeleology> | puppet is proceeding on mediawiki01 | [releng] |
| 20:52 | <hashSpeleology> | attempting to unbreak mediawiki code update {{bug|69590}} by cherry-picking {{gerrit|154329}} | [releng] |
| 20:39 | <hashSpeleology> | In case it is not in the SAL: MediaWiki is no longer synced to the app servers {{bug|69590}} | [releng] |
| 20:20 | <hashSpeleology> | rebooting mediawiki01; /var refuses to clear out and is stuck at 100% usage | [releng] |
| 20:16 | <hashSpeleology> | cleaning up /var/log on deployment-mediawiki02 | [releng] |
| 20:14 | <hashSpeleology> | on deployment-mediawiki01 deleting /var/log/apache2/access.log.1 | [releng] |
| 20:13 | <hashSpeleology> | on deployment-mediawiki01 deleting /var/log/apache2/debug.log.1 | [releng] |
| 20:13 | <hashSpeleology> | bunch of instances have a full /var/log :-/ | [releng] |
| 11:37 | <ori> | deployment-cache-bits01 unresponsive; console shows OOMs: https://dpaste.de/LDRi/raw . rebooting | [releng] |
| 03:20 | <jeremyb> | 02:46:37 UTC <ebernhardson> !log beta /dev/vda1 full. moved /srv-old to /mnt/srv-old and freed up 2.1G | [releng] |