2016-09-20
|
19:00 |
<thcipriani> |
cherry-picked https://gerrit.wikimedia.org/r/#/c/311760/ to deployment-puppetmaster to fix failing beta-scap-eqiad job, had to manually start rsync, puppet failed to start |
[releng] |
18:38 |
<hashar> |
on tin: `sudo -u jenkins-deploy -H SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh mwdeploy@deployment-mira02.deployment-prep.eqiad.wmflabs` - T144006 |
[releng] |
18:33 |
<hashar> |
on deployment-mira02 ran `sudo -u jenkins-deploy -H SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh mwdeploy@deployment-mediawiki04.deployment-prep.eqiad.wmflabs` per T144006 |
[releng] |
18:01 |
<marxarelli> |
deployed mediawiki-config changes on beta cluster. back in read/write mode using new database instances |
[releng] |
17:37 |
<marxarelli> |
deployment-db04 restored from backup and replication started |
[releng] |
16:54 |
<marxarelli> |
upgraded package and data to mariadb 10 on deployment-db03 |
[releng] |
16:31 |
<marxarelli> |
cherry picking operations/puppet patches (T138778) to deployment-puppetmaster |
[releng] |
16:30 |
<moritzm> |
rebooting deployment-mira02 |
[releng] |
16:23 |
<marxarelli> |
applied innodb transaction logs to deployment-db1 backup and successfully restored on deployment-db03 |
[releng] |
15:47 |
<marxarelli> |
completed innobackupex on deployment-db1. copying backup to deployment-db03 for restoration |
[releng] |
14:54 |
<hashar> |
beta: cherry-picking fix-up for the jobrunner logging https://gerrit.wikimedia.org/r/#/c/311702/ and https://gerrit.wikimedia.org/r/311719 T146040 |
[releng] |
14:44 |
<marxarelli> |
entering read-only mode on beta cluster |
[releng] |
14:27 |
<elukey> |
stopped puppet, jobrunner and jobchron on deployment-jobrunner01 |
[releng] |
14:20 |
<marxarelli> |
disabling beta cluster jenkins jobs in preparation for data migration (T138778) |
[releng] |
13:07 |
<godog> |
add deployment-prometheus01 instance T53497 |
[releng] |
11:20 |
<elukey> |
applied beta::deployaccess, role::labs::lvm::srv, role::mediawiki::jobrunner to jobrunner02 |
[releng] |
10:45 |
<elukey> |
created deployment-jobrunner02 in deployment-prep |
[releng] |
2016-09-19
|
22:01 |
<legoktm> |
shutdown integration-puppetmaster |
[releng] |
21:29 |
<yuvipanda> |
regenerated client certs only on integration-puppetmaster01, seems ok now |
[releng] |
20:46 |
<yuvipanda> |
re-enable puppet everywhere |
[releng] |
20:43 |
<yuvipanda> |
enable puppet and run on integration-slave-trusty-1003.eqiad.wmflabs |
[releng] |
20:42 |
<yuvipanda> |
accidentally deleted /var/lib/puppet/ssl on integration-puppetmaster01 as well, causing it to lose its keys. Reprovisioning by pointing it to the labs puppetmaster |
[releng] |
20:34 |
<yuvipanda> |
rm -rf /var/lib/puppet/ssl on all integration nodes |
[releng] |
20:34 |
<yuvipanda> |
copied /etc/puppet/puppet.conf from integration-slave-trusty-1001 to all integration nodes |
[releng] |
20:25 |
<yuvipanda> |
delete /etc/puppet/puppet.conf.d/10-self.conf and /var/lib/puppet/ssl on integration-slave-trusty-1001 |
[releng] |
20:20 |
<yuvipanda> |
re-enabled puppet on integration-slave-trusty-1001 |
[releng] |
20:08 |
<yuvipanda> |
reset puppetmaster of integration-puppetmaster01 to be labs puppetmaster |
[releng] |
20:03 |
<yuvipanda> |
disable puppet across integration project, moving puppetmasters |
[releng] |
19:49 |
<legoktm> |
T144951: enabled role::puppetmaster::standalone on integration-puppetmaster01 |
[releng] |
19:33 |
<legoktm> |
T144951: creating integration-puppetmaster01 instance using m1.small and Debian Jessie |
[releng] |
15:11 |
<hashar> |
beta: updating jobrunner service 0dc341f..a0e8216 |
[releng] |
2016-09-16
|
21:03 |
<hashar> |
deployment-tin: did a git gc on /srv/deployment/ores. That freed up disk space and cleared an alarm on co-master mira02 |
[releng] |
21:00 |
<hashar> |
deleted deployment-parsoid05 |
[releng] |
20:52 |
<hashar> |
fixed puppet on deployment-parsoid05. Temporary instance, will delete it later to clear it out of shinken.wmflabs.org |
[releng] |
20:27 |
<hashar> |
beta: force running puppet in batches of 4 instances: salt --batch 4 -v 'deployment-*' cmd.run 'puppet agent -tv' |
[releng] |
20:13 |
<hashar> |
beta: restarted puppetmaster |
[releng] |
20:07 |
<hashar> |
beta: salt -v '*' cmd.run 'rm -fR /var/lib/puppet/client/ssl/' |
[releng] |
20:07 |
<hashar> |
beta: stopping puppetmaster, rm -f /var/lib/puppet/server/ssl/ca/signed/* |
[releng] |
19:53 |
<hashar> |
beta: created instance "deployment-parsoid05". Should be deleted later; it exists merely to purge the hostname from Shinken (http://shinken.wmflabs.org/host/deployment-parsoid05) |
[releng] |
11:42 |
<hashar> |
beta: apt-get upgrade on deployment-jobrunner01 |
[releng] |
11:36 |
<hashar> |
apt-get upgrade on deployment-tin, bringing in a new hhvm version and other packages |
[releng] |
2016-09-15
|
22:29 |
<legoktm> |
sudo salt '*precise*' cmd.run 'service mysql start', all mysql instances are down |
[releng] |
16:45 |
<godog> |
install xenial kernel on deployment-zotero01 and reboot T145793 |
[releng] |
16:18 |
<hashar> |
prometheus enabled on all beta cluster instances. It does not support Precise, hence puppet will fail on the last two Precise instances, deployment-db1 and deployment-db2, until they are migrated to Jessie (T138778) |
[releng] |
15:53 |
<godog> |
add role::prometheus::node_exporter to classes in hiera:deployment-prep T144502 |
[releng] |
15:10 |
<hashar> |
beta: Applying puppet class role::prometheus::node_exporter to mira02 just like mira. That is for godog |
[releng] |
15:08 |
<hashar> |
T144006 Disabled Jenkins job beta-scap-eqiad. On mira02 rm -fR /srv/* . Applying puppet for role::labs::lvm::srv |
[releng] |
15:05 |
<hashar> |
T144006 Applying class role::labs::lvm::srv to mira02 (it is out of disk space :D ) |
[releng] |
14:45 |
<hashar> |
T144006 sudo -u jenkins-deploy -H SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh mwdeploy@mira02.deployment-prep.eqiad.wmflabs |
[releng] |