2018-06-29 §
17:11 <bd808> Rescheduled jobs away from tools-exec-1404 where linkwatcher is currently stealing most of the CPU (T123121) [tools]
16:46 <bd808> Killed orphan tool owned processes running on the job grid. Mostly jembot and wsexport php-cgi processes stuck in deadlock following an OOM. T182070 [tools]
2018-06-28 §
19:50 <chasemp> tools-clushmaster-01:~$ clush -w @all 'sudo umount -fl /mnt/nfs/dumps-labstore1006.wikimedia.org' [tools]
18:02 <chasemp> tools-clushmaster-01:~$ clush -w @all "sudo umount -fl /mnt/nfs/dumps-labstore1007.wikimedia.org" [tools]
17:53 <chasemp> tools-clushmaster-01:~$ clush -w @all "sudo puppet agent --disable 'labstore1007 outage'" [tools]
17:20 <chasemp> tools-worker-1007:~# /sbin/reboot [tools]
16:48 <arturo> rebooting tools-docker-registry-01 [tools]
16:42 <andrewbogott> rebooting tools-worker-<everything> to get NFS unstuck [tools]
16:40 <andrewbogott> rebooting tools-worker-1012 and tools-worker-1015 to get their nfs mounts unstuck [tools]
2018-06-21 §
13:18 <chasemp> tools-bastion-03:~# bash -x /data/project/paws/paws-userhomes-hack.bash [tools]
2018-06-20 §
15:09 <bd808> Killed orphan processes on webgrid nodes (T182070); most owned by jembot and croptool [tools]
2018-06-14 §
14:20 <chasemp> timeout 180s bash -x /data/project/paws/paws-userhomes-hack.bash [tools]
2018-06-11 §
10:11 <arturo> T196137 `aborrero@tools-clushmaster-01:~$ clush -w@all 'sudo wc -l /var/log/exim4/paniclog 2>/dev/null | grep -v ^0 && sudo rm -rf /var/log/exim4/paniclog && sudo service prometheus-node-exporter restart || true'` [tools]
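Note on the one-liner above: it deletes the paniclog only on nodes where `wc -l` reports a non-zero line count, and the trailing `|| true` keeps clush from reporting a failure on nodes with nothing to clean. A minimal per-node sketch of the same idea follows. The function name `clear_paniclog` is hypothetical, and it tests file size with `[ -s ]` rather than line count, so it is an approximation of the logged command, not a copy of it. It also leaves out `sudo` and the `prometheus-node-exporter` restart.

```shell
# Hypothetical helper (not from the log): remove a paniclog only when it
# is non-empty. Assumption: file size stands in for the line count that
# the logged `wc -l | grep -v ^0` pipeline checks.
clear_paniclog() {
    log="$1"
    # -s is true only if the file exists and is larger than zero bytes
    if [ -s "$log" ]; then
        rm -f "$log"
        echo "removed $log"
    else
        echo "nothing to do for $log"
    fi
}
```

Called as `clear_paniclog /var/log/exim4/paniclog`, it always returns success, which mirrors the `|| true` in the original command.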
2018-06-08 §
07:46 <arturo> T196137 more rootspam today, restarting `prometheus-node-exporter` again and force-rotating the exim4 paniclog on 12 nodes [tools]
2018-06-07 §
11:01 <arturo> T196137 force rotate all exim paniclog files to avoid rootspam `aborrero@tools-clushmaster-01:~$ clush -w@all 'sudo logrotate /etc/logrotate.d/exim4-paniclog -f -v'` [tools]
2018-06-06 §
22:00 <bd808> Scripting a restart of webservice for tools that are still in CrashLoopBackOff state after 2nd attempt (T196589) [tools]
21:10 <bd808> Scripting a restart of webservice for 59 tools that are still in CrashLoopBackOff state after last attempt (P7220) [tools]
20:25 <bd808> Scripting a restart of webservice for 175 tools that are in CrashLoopBackOff state (P7220) [tools]
19:04 <chasemp> tools-bastion-03 is virtually unusable [tools]
09:49 <arturo> T196137 aborrero@tools-clushmaster-01:~$ clush -w@all 'sudo service prometheus-node-exporter restart' <-- procs using the old uid [tools]
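Note on the CrashLoopBackOff restarts logged above: the script actually used is the one referenced as P7220 and is not reproduced here. As a rough sketch of the first step only, finding the affected pods, the filter below picks out rows whose STATUS column reads `CrashLoopBackOff`. The function name `crashlooping_pods` is hypothetical, and the sketch assumes the default column layout of `kubectl get pods --all-namespaces` (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical filter (not the P7220 script): read pod listings on stdin
# and print "namespace pod" for each pod in CrashLoopBackOff.
# Assumed input columns: NAMESPACE NAME READY STATUS RESTARTS AGE
crashlooping_pods() {
    # NR > 1 skips the header row; $4 is the STATUS column
    awk 'NR > 1 && $4 == "CrashLoopBackOff" { print $1, $2 }'
}
```

A possible use would be `kubectl get pods --all-namespaces | crashlooping_pods`, with the resulting list then fed to whatever restart procedure applies.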
2018-06-05 §
18:02 <bd808> Forced puppet run on tools-bastion-03 to re-enable logins by debenben (T196486) [tools]
17:39 <arturo> T196137 clush: delete `prometheus` user and re-create it locally. Then, chown prometheus dirs [tools]
17:38 <bd808> Added grid engine quota to limit user debenben to 2 concurrent jobs (T196486) [tools]
2018-06-04 §
10:28 <arturo> T196006 installing sqlite3 package in exec nodes [tools]
2018-06-03 §
10:19 <zhuyifei1999_> Grid is full. qdel'ed all jobs belonging to tools.dibot except lighttpd, and all jobs belonging to tools.mbh whose job names start with 'comm_delin' or 'delfilexcl' T195834 [tools]
2018-05-31 §
11:31 <zhuyifei1999_> building & pushing python/web docker image T174769 [tools]
11:13 <zhuyifei1999_> force puppet run on tools-worker-1001 to check the impact of https://gerrit.wikimedia.org/r/#/c/433101 [tools]
2018-05-30 §
10:52 <zhuyifei1999_> undid both changes to tools-bastion-05 [tools]
10:50 <zhuyifei1999_> also making /proc/sys/kernel/yama/ptrace_scope 0 temporarily on tools-bastion-05 [tools]
10:45 <zhuyifei1999_> installing mono-runtime-dbg on tools-bastion-05 to produce debugging information; was previously installed on tools-exec-1413 & 1441. Might be a good idea to uninstall them once we can close T195834 [tools]
2018-05-28 §
12:09 <arturo> T194665 adding mono packages to apt.wikimedia.org for jessie-wikimedia and stretch-wikimedia [tools]
12:06 <arturo> T194665 adding mono packages to apt.wikimedia.org for trusty-wikimedia [tools]
2018-05-25 §
05:31 <zhuyifei1999_> Edit /data/project/.system/gridengine/default/common/sge_request, h_vmem 256M -> 512M, release precise -> trusty T195558 [tools]
2018-05-22 §
11:53 <arturo> running puppet to deploy https://gerrit.wikimedia.org/r/#/c/433996/ for T194665 (mono framework update) [tools]
2018-05-18 §
16:36 <bd808> Restarted bigbrother on tools-services-02 [tools]
2018-05-16 §
21:01 <zhuyifei1999_> maintain-kubeusers is stuck in infinite sleeps of 10 seconds [tools]
2018-05-15 §
04:28 <andrewbogott> depooling, rebooting, re-pooling tools-exec-1414. It's hanging for unknown reasons. [tools]
04:07 <zhuyifei1999_> Draining unresponsive tools-exec-1414 following Portal:Toolforge/Admin#Draining_a_node_of_Jobs [tools]
04:05 <zhuyifei1999_> Force deletion of grid job 5221417 (tools.giftbot sga), host tools-exec-1414 not responding [tools]
2018-05-12 §
10:09 <Hauskatze> tools.quentinv57-tools@tools-bastion-02:~$ webservice stop | T194343 [tools]
2018-05-11 §
14:34 <andrewbogott> repooling labvirt1001 tools instances [tools]
13:59 <andrewbogott> depooling a bunch of things before rebooting labvirt1001 for T194258: tools-exec-1401 tools-exec-1407 tools-exec-1408 tools-exec-1430 tools-exec-1431 tools-exec-1432 tools-exec-1435 tools-exec-1438 tools-exec-1439 tools-exec-1441 tools-webgrid-lighttpd-1402 tools-webgrid-lighttpd-1407 [tools]
2018-05-10 §
18:55 <andrewbogott> depooling, rebooting, repooling tools-exec-1401 to test a kernel update [tools]
2018-05-09 §
21:11 <Reedy> Added Tim Starling as member/admin [tools]
2018-05-07 §
21:02 <zhuyifei1999_> re-building all docker images T190893 [tools]
20:48 <zhuyifei1999_> building, signing, and publishing toollabs-webservice 0.39 T190893 [tools]
00:25 <zhuyifei1999_> `renice -n 15 -p 28865` (`tar cvzf` of `tools.giftbot`) on tools-bastion-02, been hogging the NFS IO for a few hours [tools]
2018-05-05 §
23:37 <zhuyifei1999_> regenerate k8s creds for tools.zhuyifei1999-test because I messed up while testing [tools]
2018-05-03 §
14:48 <arturo> uploaded a new ruby docker image to the registry with the libmysqlclient-dev package T192566 [tools]
2018-05-01 §
14:05 <andrewbogott> moving tools-webgrid-lighttpd-1406 to labvirt1016 (routine rebalancing) [tools]