2021-02-04 §
10:12 <dcaro> Increasing the memory limit of osds in eqiad from 8589934592(8G) to 12884901888(12G) (T273851) [admin]
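The byte values in the entry above match whole binary gigabytes (GiB). A minimal sketch of the arithmetic, assuming the limit is expressed in bytes as the entry shows; the helper name `gib_to_bytes` is hypothetical and not part of any Ceph tooling:

```python
# Sketch only: checks the byte counts quoted in the log entry.
# gib_to_bytes is an illustrative helper, not a Ceph API.
GIB = 1024 ** 3  # one binary gigabyte, in bytes


def gib_to_bytes(gib: int) -> int:
    """Convert a whole number of GiB to bytes."""
    return gib * GIB


old_limit = gib_to_bytes(8)   # previous limit quoted in the entry
new_limit = gib_to_bytes(12)  # new limit quoted in the entry
print(old_limit, new_limit)
```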
2021-02-03 §
09:59 <dcaro> Doing a full vm backup on cloudvirt1024 with the new script (T260692) [admin]
01:50 <bstorm> icinga-downtime cloudnet1004 for a week T271058 [admin]
2021-02-02 §
17:14 <dcaro> Changed osd memory limit from 4G to 8G (T273649) [admin]
11:00 <arturo> icinga-downtime cloudvirt-wdqs1001 for 1 week (T273579) [admin]
03:12 <andrewbogott> running /usr/local/sbin/wmcs-purge-backups and /usr/local/sbin/wmcs-backup-instances on cloudvirt1024 to see why the backup job paged [admin]
2021-01-29 §
15:36 <andrewbogott> disabling puppet and some services on eqiad1 cloudcontrol nodes; replacing nova-placement-api with placement-api [admin]
2021-01-28 §
19:44 <andrewbogott> shutting down cloudcontrol2001-dev because it's in a partially upgraded state; will revive when it's time for Train [admin]
2021-01-27 §
00:50 <bstorm> icinga-downtime cloudnet1004 for a week T271058 [admin]
2021-01-22 §
16:44 <andrewbogott> upgrading designate on cloudvirt1003/1004 to OpenStack 'train' [admin]
11:29 <dcaro> While doing some tests, removed the cloudcontrol1003 puppet cert; regenerating... [admin]
2021-01-21 §
11:35 <arturo> merging core router firewall changes https://gerrit.wikimedia.org/r/c/operations/homer/public/+/657439 (T209082) [admin]
11:30 <arturo> merging core router firewall changes https://gerrit.wikimedia.org/r/c/operations/homer/public/+/657358 (T272486, T209082) [admin]
2021-01-20 §
10:49 <arturo> merging core router firewall change https://gerrit.wikimedia.org/r/c/operations/homer/public/+/657302 (T209082) [admin]
10:05 <dcaro> Everything looks ok, created a new vm with a volume in ceph without issues, and no warnings/errors on ceph status, closing (T272303) [admin]
09:55 <dcaro> Eqiad ceph cluster upgraded, doing sanity checks (T272303) [admin]
09:46 <dcaro> 75% of the eqiad cluster upgraded... continuing (T272303) [admin]
09:37 <dcaro> 25% of the eqiad cluster upgraded... continuing (T272303) [admin]
09:24 <dcaro> Mgr daemons upgraded and running, upgrading osd daemons on servers cloudcephosd1*, this may take a bit longer (T272303) [admin]
09:22 <dcaro> Mon daemons upgraded and running, upgrading mgr daemons on servers cloudcephmon1* (T272303) [admin]
09:16 <dcaro> Starting eqiad ceph upgrade, upgrading the mon servers cloudcephmon1* (T272303) [admin]
09:01 <dcaro> Will start the ceph upgrade in 15 min, no downtime nor performance impact is expected (T272303) [admin]
2021-01-19 §
10:17 <arturo> icinga-downtime cloudnet1004 for 1 week (T271058) [admin]
2021-01-18 §
16:00 <dcaro> Codfw1 ceph cluster upgraded, will wait until tomorrow to see if there's any instability, but everything looks fine (T272303) [admin]
15:38 <dcaro> Upgraded mgr services on codfw ceph cluster, starting with osd ones (T272303) [admin]
15:35 <dcaro> Upgraded mon services on codfw ceph cluster, starting with mgr ones (T272303) [admin]
15:21 <dcaro> Starting upgrade of ceph mon nodes on codfw (T272303) [admin]
15:06 <dcaro> re-enabling puppet on cloudcephosd2* hosts [admin]
13:53 <dcaro> disabling puppet on cloudcephosd2* to resume perf tests [admin]
10:50 <dcaro> re-enabling puppet on cloudcephosd2* (codfw) [admin]
10:07 <dcaro> disabling puppet on cloudcephosd2* (codfw) to do some performance tests [admin]
09:00 <dcaro> Enabling custom application 'cinder' on pool codfw1dev-cinder to get rid of health warnings [admin]
2021-01-17 §
16:53 <arturo> icinga downtime labstore1004 /srv/tools space check for 3 days (T272247) [admin]
2021-01-15 §
13:41 <arturo> icinga downtime labstore1004 maintain-dbusers alert until 2021-01-19 (T272125) [admin]
09:47 <arturo> labstore1004 maintain-dbusers affected by T272127 and T272125 [admin]
09:22 <arturo> restart maintain-dbusers.service in labstore1004 [admin]
08:19 <dcaro> Merging the patch to disable write caches on ceph osds (T271527) [admin]
2021-01-13 §
17:03 <arturo> add cloudvirt1013 cloudvirt1032 cloudvirt1037 to the 'toobusy' host aggregate to prevent further CPU oversubscribing [admin]
12:40 <arturo> try increasing systemd watchdog timeout for conntrackd in cloudnet1004 (T268335) [admin]
11:45 <dcaro> https://gerrit.wikimedia.org/r/c/operations/puppet/+/654419 merged and deployed (and tested) (T268877) [admin]
11:40 <dcaro> merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/654419 that might affect the encapi service (puppet on cloud environment), no downtime expected though (T268877) [admin]
10:56 <arturo> trying to cleanup dpkg package mess in cloudnet2002-dev [admin]
10:02 <arturo> prevent floating IP allocation from neutron transport subnet: root@cloudcontrol1005:~# neutron subnet-update --allocation-pool start=185.15.56.244,end=185.15.56.244 cloud-instances-transport1-b-eqiad1 (T271867) [admin]
2021-01-12 §
10:33 <arturo> reboot cloudnet1004 [admin]
10:32 <arturo> update firmware-bnx2x from 20190114-2 to 20200918-1~bpo10+1 on cloudnet1004 (T271058) [admin]
2021-01-11 §
10:22 <arturo> doubling size of conntrack table in cloudnet servers https://gerrit.wikimedia.org/r/c/operations/puppet/+/655407 (T271058) [admin]
10:07 <arturo> manually cleanup conntrack table in cloudnet1004 (T271058) [admin]
09:19 <dcaro> cleaned up ~1800 snapshots, 109 remaining only, one for each host x image combination (plus some ephemeral ones while doing backups), closing the task (T270478) [admin]
08:39 <dcaro> cleaning up dangling snapshots now that we have the new suffixed ones (T270478) [admin]
2021-01-10 §
16:02 <andrewbogott> restarting rabbitmq-server on all eqiad1 cloudcontrols [admin]