the number one rule of systems administration is to type carefully. tonight, I halted the wrong machine, which prompted a mad scramble to bring everything back up.

there are a few timeout issues with LDAP and other boot scripts, notably udev. there are also fun issues with things like rsyslog and slapd not failing gracefully if they can’t make their network connections.

on the bright side, the main downtime was under 15 minutes, and there was no data loss or DB corruption. I also used the time to fix a few issues that required reboots. and the grsec kernel I was working on does work.

I need another server.

PS: public shell server coming soon….

  • MacBook3,1 (OSX/Debian/Gentoo) with power adapter, double-padded.
  • IBM ThinkPad T42 (Debian/Fedora), with power adapter
  • 7ft CAT5e cable
  • 1GB flash drive (data/apps), 4GB flash drive (portage tree lulz.)
  • TI-83+
  • a textbook or two
  • assorted squished papers
  • leather work gloves (servers have sharp edges!)
  • business cards, rack screws, matches, chapstick and Excedrin migraine.

I like to push my traffic around in absurd ways. today, I VPN’d from work to home… then VPN’d to my server at work… then SSH’d back to my home server… then ran speedtest.net to my work.

let’s look at some traceroutes…

						loss	sent	avg
	rtr3.vl101.[redacted]			0.00%	6	1.4
	ge-7-12.r01.sttlwa01.us.bb.gin.n	0.00%	6	2.2
	ae-2.r21.sttlwa01.us.bb.gin.ntt.	0.00%	6	1.7
	0.so-0-0-0.BR1.SEA7.ALTER.NET		0.00%	6	2.5
	0.so-0-2-0.XT1.SEA7.ALTER.NET		0.00%	6	2.5
	0.so-6-1-0.SEA01-BB-RTR1.verizon	0.00%	6	3.6
	108.57.128.195				0.00%	5	4
	184.19.242.37				0.00%	5	5.7
	pool-[redacted].sttlwa.fios.		0.00%	5	12.6
						loss	sent	avg
	elysium.local				0.00%	7	13.6
	router.neoice.net			0.00%	7	21
	L100.STTLWA-VFTTP-23.verizon-gni	0.00%	7	25.6
	184.19.242.36				0.00%	7	22.9
	108.57.128.194				0.00%	7	21.6
	0.so-7-1-0.XT1.SEA7.ALTER.NET		0.00%	6	28.4
	0.so-6-0-0.BR1.SEA7.ALTER.NET		0.00%	6	20.6
	204.255.169.74				0.00%	6	26.3
	po-1.r01.sttlwa01.us.bb.gin.ntt.	0.00%	6	34.8
	ge-7-12.r01.sttlwa01.us.ce.gin.n	0.00%	6	30.9
	[redacted]				0.00%	6	34.6
						loss	sent	avg
	rtr3.vl101.[redacted]			0.00%	8	1.4
	ge-7-12.r01.sttlwa01.us.bb.gin.ntt.net	0.00%	8	61.5
	ae-2.r21.sttlwa01.us.bb.gin.ntt.net	0.00%	8	1.7
	0.so-0-0-0.BR1.SEA7.ALTER.NET		0.00%	7	2.7
	0.so-0-2-0.XT1.SEA7.ALTER.NET		0.00%	7	2.5
	0.so-6-1-0.SEA01-BB-RTR1.verizon-gni.ne	0.00%	7	8.7
	108.57.128.195				0.00%	7	4.1
	184.19.242.37				0.00%	7	5.4
	pool-[redacted].sttlwa.fios.verizon	0.00%	7	21.3

the headers (loss/sent/avg) mark the start of each leg’s trace. here’s the final result:

not too bad, considering that the slowest link (my personal server) is 2Mbps symmetric.
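The three traceroute tables above are plain whitespace-separated columns, with each `loss sent avg` header starting a new trace. As a rough sketch (the `Hop`, `split_traces` and `worst_hop` names are hypothetical, not part of mtr or any other tool), a few lines of Python can split that text back into per-trace hop lists and pick out the slowest hop in each:

```python
from typing import List, NamedTuple


class Hop(NamedTuple):
    host: str
    loss_pct: float
    sent: int
    avg_ms: float


def split_traces(text: str) -> List[List[Hop]]:
    """Group hop rows into traces; each 'loss sent avg' header starts a new one."""
    traces: List[List[Hop]] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue                      # blank line
        if fields == ["loss", "sent", "avg"]:
            traces.append([])             # header row: start a new trace
            continue
        if not traces:
            traces.append([])             # rows that appear before any header
        host, loss, sent, avg = fields
        traces[-1].append(Hop(host, float(loss.rstrip("%")), int(sent), float(avg)))
    return traces


def worst_hop(trace: List[Hop]) -> Hop:
    """The hop with the highest average latency in one trace."""
    return max(trace, key=lambda hop: hop.avg_ms)
```

Fed the third table, `worst_hop` would pick the `ge-7-12.r01.sttlwa01.us.bb.gin.ntt.net` row at 61.5 ms.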

so it’s been an exciting weekend. first up, re-racking a live server. my 1U needed to be “properly mounted”. this involved taking off the current (crappy) rails, inserting shelf railing and then racking the server. the goal was zero downtime, since I’m awesome.

turns out, phonebooks are about 1U high.

second, I had to run a database restore! yippee! I got a little careless and rm’d /var/lib/mysql. luckily, I have weekly rsyncs to my home server. unfortunately, the rsync only runs on Sundays. I don’t really see a need to run it more often, since the only things that change regularly are IRC logs aaaand the SQL db. I have been meaning to set up more regular SQL backups, and this sure lit a fire under my ass. I’ve been wanting to do hourly local dumps with rotation, but never felt like writing the script. the restore was pretty painless (tar the backup, netcat it across the VPN, untar, then mysql REPAIR TABLE) and I only lost a blog post or so. I now have a script for daily mysql dumps, complete with weekly and monthly rotation. close enough.
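A daily dump with weekly and monthly rotation can be sketched in Python along these lines. This is not the script mentioned above: the backup directory, the retention counts (7 daily, 4 weekly, 12 monthly) and the Sunday/first-of-the-month schedule are all assumptions, and it expects `mysqldump` to find its credentials on its own (for example in `~/.my.cnf`).

```python
import datetime
import shutil
import subprocess
from pathlib import Path
from typing import List

BACKUP_DIR = Path("/var/backups/mysql")            # hypothetical location
KEEP = {"daily": 7, "weekly": 4, "monthly": 12}    # hypothetical retention


def tiers_for(day: datetime.date) -> List[str]:
    """Return the rotation tiers that should hold a copy of this day's dump."""
    tiers = ["daily"]
    if day.weekday() == 6:   # Sunday
        tiers.append("weekly")
    if day.day == 1:         # first of the month
        tiers.append("monthly")
    return tiers


def prune(directory: Path, keep: int) -> List[Path]:
    """Delete all but the newest `keep` dumps (ISO dates sort chronologically)."""
    if not directory.is_dir():
        return []
    dumps = sorted(directory.glob("*.sql"))
    doomed = dumps[:-keep] if keep > 0 else dumps
    for path in doomed:
        path.unlink()
    return doomed


def run(day: datetime.date) -> None:
    name = f"{day.isoformat()}.sql"
    daily = BACKUP_DIR / "daily" / name
    daily.parent.mkdir(parents=True, exist_ok=True)
    # Dump every database into today's file; --single-transaction avoids
    # locking InnoDB tables while the dump runs.
    with open(daily, "wb") as out:
        subprocess.run(["mysqldump", "--all-databases", "--single-transaction"],
                       stdout=out, check=True)
    # Copy the same dump into the weekly/monthly tiers when they are due.
    for tier in tiers_for(day)[1:]:
        dest = BACKUP_DIR / tier / name
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(daily, dest)
    # Trim every tier back to its retention count.
    for tier, keep in KEEP.items():
        prune(BACKUP_DIR / tier, keep)
```

Calling `run(datetime.date.today())` once a day from cron would do the job. Note that `--single-transaction` only gives a consistent snapshot for transactional tables such as InnoDB, not MyISAM.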

(apologies to my wife, whose blogpost I nuked)

I realized that I could modify my Automator script to upload arbitrary files… enjoy!
I even uploaded both of my scripts using this action :D

### LOGIN DETAILS
HOST="thule.neoice.net"
USER="status"
KEY="/Users/neoice/Library/Services/automator-key"

# Automator passes each selected file as an argument
for file in "$@"
do
	### LOCAL DETAILS
	LOCALFILE="$file"
	BASE=$(basename "$file")

	### REMOTE DETAILS
	REMOTEFILE="/var/status/upload/$BASE"

	### upload... (quoted so local paths with spaces aren't split)
	scp -i "$KEY" "$LOCALFILE" "$USER@$HOST:$REMOTEFILE"
done

### clipboard
URL="http://thule.neoice.net/~neoice/upload/"
echo "$URL" | pbcopy
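The upload script above copies the directory URL rather than a link to the file itself. A Python take on the same loop can build a per-file link and hand scp its arguments as a list, so nothing gets word-split. This is a hypothetical sketch, not the script above: `plan` and `main` are made-up names, and it assumes `/var/status/upload/` is what the web server publishes at `~neoice/upload/`. One caveat: with older scp versions the remote path still goes through the remote shell, so names with spaces may need extra escaping on that side.

```python
import os
import subprocess
import sys
from typing import List, Tuple
from urllib.parse import quote

### LOGIN DETAILS (same values as the shell version)
HOST = "thule.neoice.net"
USER = "status"
KEY = "/Users/neoice/Library/Services/automator-key"
REMOTE_DIR = "/var/status/upload/"
# Assumption: the web server publishes REMOTE_DIR at this URL.
UPLOAD_URL = "http://thule.neoice.net/~neoice/upload/"


def plan(local_file: str) -> Tuple[List[str], str]:
    """Return the scp argv and the public link for one local file."""
    base = os.path.basename(local_file)
    # A list argv is never word-split, so spaces in the local path are safe.
    argv = ["scp", "-i", KEY, local_file, f"{USER}@{HOST}:{REMOTE_DIR}{base}"]
    # Percent-encode the name so the link survives spaces and odd characters.
    return argv, UPLOAD_URL + quote(base)


def main(files: List[str]) -> None:
    urls = []
    for local_file in files:
        argv, url = plan(local_file)
        subprocess.run(argv, check=True)
        urls.append(url)
    if urls:
        # pbcopy is the macOS clipboard tool the shell version pipes into.
        subprocess.run(["pbcopy"], input="\n".join(urls).encode(), check=True)


if __name__ == "__main__":
    main(sys.argv[1:])
```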

I noticed that one of the most frequent things I do is take screenshots and then SFTP them to my web server. I got sick of manually doing this, so inspired by a blind hacker on IRC and a Reddit thread, I looked into OSX Automator. you can create a little service, bind it to a key command and then use it at will. you pretty much need 2 actions: “Take Screenshot” (obviously) and “Run Shell Script.” I also used “Show Growl Notification”, but that’s just icing on the cake.

now, my first version was pretty simple, just using scp to send the file. then a few symlinks on the web server finished the job. but about a week later, I wanted to link to a screenshot that had been my temp screenshot a few days earlier, and it was already overwritten. so today, I rewrote my script to keep timestamped copies. I’d like to use /usr/bin/scponly, but I don’t think scp can create symlinks.

### LOGIN DETAILS
HOST="thule.neoice.net"
USER="status"
KEY="/Users/neoice/Library/Services/automator-key"

### LOCAL DETAILS
LOCALFILE="/Users/neoice/Desktop/tmpScreen.png"

### REMOTE DETAILS
# timestamped name, so older screenshots keep their own file
REMOTEFILE="/var/status/screenshots/$(date +'%Y-%m-%d-%H%M%S')-tmpScreen.png"
URL="http://thule.neoice.net/external/tmpScreen.png"

### upload...
scp -i "$KEY" "$LOCALFILE" "$USER@$HOST:$REMOTEFILE"

### reorganize...
### TODO: switch to scponly!
# unquoted EOF: $REMOTEFILE is expanded locally before the commands are sent
ssh -i "$KEY" "$USER@$HOST" <<EOF
rm -f tmpScreen.png
ln -s "$REMOTEFILE" tmpScreen.png
exit
EOF

### clean and paste.
rm ~/Desktop/tmpScreen.png
echo "$URL" | pbcopy
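The `rm` followed by `ln -s` in the heredoc leaves a brief moment where `tmpScreen.png` doesn’t exist. A common alternative is to create the new symlink under a temporary name and rename it over the old one, because rename(2) replaces the destination atomically on the same filesystem. Here is a minimal server-side sketch in Python; `timestamped_name`, `repoint` and the `.new` suffix are hypothetical choices, not part of the setup above.

```python
import os
from datetime import datetime


def timestamped_name(now: datetime, suffix: str = "tmpScreen.png") -> str:
    # Same naming scheme as the shell script: YYYY-mm-dd-HHMMSS-tmpScreen.png
    return now.strftime("%Y-%m-%d-%H%M%S") + "-" + suffix


def repoint(link: str, target: str) -> None:
    """Point `link` at `target` without a window where `link` is missing."""
    tmp = link + ".new"
    if os.path.lexists(tmp):
        os.remove(tmp)          # leftover from an interrupted run
    os.symlink(target, tmp)
    os.replace(tmp, link)       # atomic rename over the old symlink
```

Something like `repoint("/var/status/tmpScreen.png", "/var/status/screenshots/" + timestamped_name(datetime.now()))` would stand in for the two heredoc commands. It still needs a real shell (or Python) on the server, so it doesn’t help with the scponly TODO.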

obviously, change the variables to meet your needs. as you can see, I have a system user that is designed for unattended jobs. I’d highly recommend against using a passwordless key on your own account. (really, I’d also recommend setting the system user’s shell to /usr/bin/scponly, but it’s a sacrifice I’m willing to make.) you could even use passwords if you changed the arguments around, but I actually turn password login off completely on my servers.
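Turning password login off comes down to `PasswordAuthentication no` in sshd_config. To sanity-check an existing config, here is a small Python sketch (the function name is hypothetical). It relies on two sshd_config(5) rules: keywords are case-insensitive, and for most keywords the first value obtained wins. It stops at the first `Match` block, ignores `Include`, and only looks at this one keyword, so treat it as a quick check rather than a substitute for `sshd -T`.

```python
from typing import Iterable


def password_auth_enabled(config_lines: Iterable[str]) -> bool:
    """Report whether PasswordAuthentication is left on in sshd_config text."""
    for raw in config_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                          # blank line or comment
        # tolerate an optional '=' between keyword and value
        parts = line.replace("=", " ", 1).split()
        keyword = parts[0].lower()
        if keyword == "match":
            break                             # conditional settings: out of scope
        if keyword == "passwordauthentication" and len(parts) > 1:
            return parts[1].lower() == "yes"  # first value wins
    return True                               # OpenSSH's default is "yes"
```

For example, `password_auth_enabled(open("/etc/ssh/sshd_config"))` should come back `False` on a box configured like mine.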

I don’t like open directories, but for linking, it’s convenient to browse the index. the ‘screenshots’ index is hidden, but /external/tmpScreen.png can be easily found, so just for paranoia’s sake, I went and made a little cronjob. it’s just a simple Python script that relinks the image to a placeholder if it’s older than 30 minutes, so wandering eyes can’t view my browser tabs or Terminal windows.

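#!/usr/bin/env python3

# stdlib
import os
from datetime import datetime, timedelta

# symlink path and the placeholder it gets pointed at
symlink = '/var/status/tmpScreen.png'
placeholder = '/var/www/error/media/expired.png'

# find out when the symlink itself was last updated
# (os.lstat reads the link; os.stat would follow it to the image)
modtime = os.lstat(symlink).st_mtime
age = datetime.now() - datetime.fromtimestamp(modtime)

# if it's over 30 minutes old and not already expired, rewrite the link
if age > timedelta(minutes=30) and os.readlink(symlink) != placeholder:
    os.remove(symlink)
    os.symlink(placeholder, symlink)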

today, I was met with some strange output…

root@collective:~# w
 15:03:28 up 67 days,  7:18,  2 users,  load average: 0.15, 0.09, 0.07
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
neoice   pts/0    shambhala.local  15:03    0.00s  0.02s  0.02s sshd: neoice [priv]

top reported 2 users as well. I checked last and only saw myself and an FTP user since the first of the year. ps aux, lsof, iftop and netstat all looked normal. what the hell could be going on…

root@collective:~# md5sum /usr/bin/w
20910d08ee903de072a914dafe5043a6  /usr/bin/w
root@thule:~# md5sum /usr/bin/w
c4ee78c19dafccbdd64d773d55510fe4  /usr/bin/w

well that’s worrying…

so I netcat thule’s binary over and…

root@collective:~# ./w 
-su: ./w: No such file or directory

what the hell…

root@collective:~# file w 
w: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, stripped
root@collective:~# file /usr/bin/w
/usr/bin/w: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, stripped

thule’s w is a 32-bit build, and thule has ia32-libs!
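Why would a file that’s clearly there fail with “No such file or directory”? Most likely because the kernel couldn’t find the 32-bit program interpreter the binary asks for (`/lib/ld-linux.so.2` in the ldd output), which a 64-bit box only has once the 32-bit compatibility libraries are installed. It also explains the md5 mismatch: the two `w` binaries are simply different builds. `file` tells them apart by reading the ELF header. A minimal Python sketch of that check (hypothetical function names), covering only the class byte and the machine field:

```python
import struct

EM_NAMES = {3: "Intel 80386", 62: "x86-64"}   # e_machine values seen above


def describe_elf(header: bytes) -> str:
    """Summarize the first 20 bytes of an ELF file, roughly like `file` does."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    bits = {1: 32, 2: 64}.get(header[4])              # e_ident[EI_CLASS]
    if bits is None:
        raise ValueError("unknown ELF class")
    order = "<" if header[5] == 1 else ">"            # e_ident[EI_DATA]: 1 = little-endian
    (machine,) = struct.unpack(order + "H", header[18:20])   # e_machine
    return f"{bits}-bit {EM_NAMES.get(machine, 'machine %d' % machine)}"


def elf_summary(path: str) -> str:
    with open(path, "rb") as f:
        return describe_elf(f.read(20))
```

Run against the two binaries, `elf_summary` should report `64-bit x86-64` for collective’s /usr/bin/w and `32-bit Intel 80386` for thule’s copy, matching the `file` output.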

root@collective:~# aptitude install ia32-libs
root@collective:~# ldd w 
	linux-gate.so.1 =>  (0xf7f1f000)
	libproc-3.2.7.so => not found
	libc.so.6 => /lib32/libc.so.6 (0xf7da2000)
	/lib/ld-linux.so.2 (0xf7f20000)

doh! libproc is part of procps, which is installed as 64-bit but not 32-bit. so I netcat the library across and drop it into /lib32…

root@collective:~# ldd w
	linux-gate.so.1 =>  (0xf7f7b000)
	libproc-3.2.7.so => /lib32/libproc-3.2.7.so (0xf7f50000)
	libc.so.6 => /lib32/libc.so.6 (0xf7dfe000)
	/lib/ld-linux.so.2 (0xf7f7c000)
root@collective:~# ./w 
 15:13:26 up 67 days,  7:28,  2 users,  load average: 0.18, 0.14, 0.10
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
neoice   pts/0    shambhala.local  15:03    0.00s  0.06s  0.02s sshd: neoice [priv]

hooray… oh wait.

at this point, I’m really stumped. I’ve determined that it’s not a malicious binary, but what the hell IS it? this is a production machine, so I can’t really just reboot to fix it. I had one last idea; hopefully my log rotation is configured correctly…

root@collective:~# last -f /var/log/wtmp.1 
neoice   hvc0                          Fri Dec 18 22:50    gone - no logout 
neoice   hvc0                          Fri Dec 18 22:50 - 22:50  (00:00)

that looks promising: a console login on hvc0 that never logged out. google seems to confirm it. ghosts in the machine.
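The “2 users” count that w and top show comes from login records in utmp, not from running processes, so a session whose logout never got written (like that hvc0 console login) can keep being counted. A rough Python sketch for spotting such records, under stated assumptions: it uses the glibc x86-64 `struct utmp` layout (384-byte records), reads only the fields it needs, and calls a record a ghost when its login PID no longer exists. The names are hypothetical, and other platforms lay the struct out differently.

```python
import os
import struct
from typing import Callable, Iterator, List, NamedTuple

# glibc x86-64 struct utmp: ut_type, (pad), ut_pid, ut_line[32], ut_id[4],
# ut_user[32], ut_host[256], ut_exit, ut_session, ut_tv, ut_addr_v6, unused
UTMP_FORMAT = "<h2xi32s4s32s256shhiii4i20s"
UTMP_SIZE = struct.calcsize(UTMP_FORMAT)   # 384 bytes
USER_PROCESS = 7                           # ut_type of a normal login


class Login(NamedTuple):
    user: str
    line: str
    pid: int


def _text(field: bytes) -> str:
    # Fixed-width C strings: keep everything before the first NUL.
    return field.split(b"\0", 1)[0].decode(errors="replace")


def read_logins(data: bytes) -> Iterator[Login]:
    """Yield the USER_PROCESS records from raw utmp bytes."""
    for offset in range(0, len(data) - UTMP_SIZE + 1, UTMP_SIZE):
        ut_type, pid, line, _ut_id, user = struct.unpack_from(UTMP_FORMAT, data, offset)[:5]
        if ut_type == USER_PROCESS:
            yield Login(_text(user), _text(line), pid)


def pid_alive(pid: int) -> bool:
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)        # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True            # exists, but belongs to another user
    return True


def ghosts(data: bytes, alive: Callable[[int], bool] = pid_alive) -> List[Login]:
    """Login records whose process has already gone away."""
    return [login for login in read_logins(data) if not alive(login.pid)]
```

On Debian, `ghosts(open("/var/run/utmp", "rb").read())` would list them; actually clearing a stale record is a separate job.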

I’ve had an idea kicking around for a while now. I finally got around to implementing it.

tonight’s boredom-induced project was a script to take input from fail2ban and email the contacts listed in the IP’s WHOIS record. I feel like kind of a dick sending automated mails to other sysadmins, but I also feel like I’d be grateful to receive this kind of alert myself. most of the automated attacks come from China or the Eastern Bloc.
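The fail2ban-to-WHOIS idea could look roughly like this. Everything here is an assumption rather than the script described above: it shells out to the system `whois` client, prefers addresses on lines that mention “abuse” (RIPE-style `abuse-mailbox:`, ARIN-style `OrgAbuseEmail:`), falls back to any address in the record, and drafts the mail with Python’s `email` package. The function names and the sender address are placeholders.

```python
import re
import subprocess
from email.message import EmailMessage
from typing import List

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
ABUSE_RE = re.compile(r"abuse", re.IGNORECASE)


def whois_text(ip: str) -> str:
    """Raw WHOIS output from the system `whois` client."""
    result = subprocess.run(["whois", ip], capture_output=True, text=True, check=False)
    return result.stdout


def contacts(whois: str) -> List[str]:
    """Addresses on 'abuse' lines if any exist, otherwise every address found."""
    abuse: List[str] = []
    other: List[str] = []
    for line in whois.splitlines():
        bucket = abuse if ABUSE_RE.search(line) else other
        for addr in EMAIL_RE.findall(line):
            if addr not in bucket:
                bucket.append(addr)
    return abuse or other


def notice(ip: str, jail: str, to: List[str],
           sender: str = "fail2ban@example.net") -> EmailMessage:
    """Draft a short, polite report; `sender` is a placeholder address."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(to)
    msg["Subject"] = f"Automated abuse report for {ip}"
    msg.set_content(
        f"Hello,\n\n{ip} was banned by fail2ban (jail: {jail}) after repeated "
        "failed login attempts against our server.\n\n"
        "This message was generated automatically.\n"
    )
    return msg
```

Sending is then `smtplib.SMTP("localhost").send_message(msg)`, and a custom fail2ban action could call the script from its `actionban` line using the `<ip>` tag.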

computers, kaizoku @ 04 October 2009

the past month has been occupied almost exclusively with Xen goodness and infrastructure implementation. between Puppet, LDAP, custom logcheck rules, VPN settings, certificate management, iptables and Nagios, I’ve had my hands full with just building everything. this is on top of learning the ins and outs of Xen and LVM and running/migrating production services. it’s been a total blast.

coming up, I’ll have more Xen goodness and more infrastructure setup (an office this time). I’m also planning to cross-post (and supplement) from eRepublik. I’ve been running the website for a certain eCountry and got talked into playing by my good eFriend (President of said eCountry). you can run a newspaper (for experience points or some shit), so I dedicated mine to server blogging. why not?

I’ve been considering moving neoice.net itself away from my current provider as well. the price is right ($1/mo) and it’s nice to have a website that I DON’T manage… but I’ve outgrown this nook. we’ll see how that goes over the next month or so. I wouldn’t be surprised if I started hosting neoice.net on my personal server and gave the whole place a major overhaul (my productivity in remote vim is orders of magnitude higher than local editing+ftp). there’s a lot for me to do…

see you starside.