SCP + spaces

When copying a file from one host to another using SCP (as one does), if the pathname of the “to” location contains spaces it has to be double-escaped, since the path is interpreted twice: once by your system and again by the “from” system.

For example, copying files from quasar to singularity, where the destination path contains spaces, requires:

scp -v quasar.entropy.me.uk:/somepath/somefile.tar.gz singularity.entropy.me.uk:"'/somewhere/A\ Directory\ With\ Spaces/'"

The first system (in this case my laptop) interprets the outer quotes, then the remote system we’re copying a file from interprets the contents of the inner ones.

Heavens help you if you want to copy a file with a quote in its name!
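If you'd rather not work out the quoting by hand, a short Python sketch can generate it with `shlex.quote`, which also copes with quotes in file names. It assumes every remote hop that re-parses the path is a POSIX shell. The `layers` count is an assumption too: two here, mirroring the single-quote and backslash layers that remain after your local shell has stripped the outer double quotes in the command above. Newer OpenSSH releases that use the SFTP protocol by default may not pass the path through a remote shell at all, so check what your version does. `quote_for_shells` and `scp_command` are hypothetical helpers.

```python
import shlex


def quote_for_shells(path: str, layers: int) -> str:
    """Wrap `path` in POSIX shell quoting `layers` times, innermost first.

    Each shell that later parses the string strips exactly one layer.
    """
    for _ in range(layers):
        path = shlex.quote(path)
    return path


def scp_command(src: str, dest_host: str, dest_path: str, layers: int = 2) -> list[str]:
    """Build an scp argv list; the number of remote shell layers is an assumption."""
    # Returning a list (rather than a command string) means no *local* shell
    # is involved, so only the remote layers need quoting here.
    return ["scp", "-v", src, f"{dest_host}:{quote_for_shells(dest_path, layers)}"]


if __name__ == "__main__":
    cmd = scp_command(
        "quasar.entropy.me.uk:/somepath/somefile.tar.gz",
        "singularity.entropy.me.uk",
        "/somewhere/A Directory With Spaces/",
    )
    # shlex.join adds one more layer, so the printed line is safe to paste locally.
    print(shlex.join(cmd))
```

To perform the copy from Python instead, pass the list straight to `subprocess.run(cmd, check=True)`.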

FreeBSD

Our file server, Singularity, has long run OpenBSD. This has been alright, but we’ve constantly bumped up against the limits of its filesystem and other architectural issues which make it tedious to use for multi-terabyte storage. We decided to move it to FreeBSD, which has much better support for such a task as well as generally higher kernel performance.

The downside to FreeBSD is that it has a less coherent userland experience. The default shell isn’t very usable, the installer downright sucks, and the places it puts some files are quite strange. For example, installing software from either packages (binaries) or ports (source) puts it under /usr/local/: binaries go in /usr/local/bin/ or /usr/local/sbin/, config files in /usr/local/etc/ and init scripts in /usr/local/etc/rc.d/, while the base system keeps its own under /usr/bin/, /etc/ and /etc/rc.d/.

Aside from being quite strange, this makes finding things twice as annoying, as there are two places they could be hiding.
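A tiny helper can check both places at once. This is a minimal sketch, assuming base-system config lives under /etc and ports/packages config under /usr/local/etc (which also holds the rc.d scripts); `find_config` is a hypothetical name, and the roots are a parameter so you can add others.

```python
from pathlib import Path

# Assumed search roots: base system first, then ports/packages.
DEFAULT_ROOTS = ("/etc", "/usr/local/etc")


def find_config(name: str, roots=DEFAULT_ROOTS) -> list[Path]:
    """Return every file called `name` under any of `roots`, in root order."""
    hits: list[Path] = []
    for root in roots:
        base = Path(root)
        if base.is_dir():  # /usr/local/etc may not exist on a fresh install
            hits.extend(sorted(p for p in base.rglob(name) if p.is_file()))
    return hits
```

For example, `find_config("rc.conf")` lists every rc.conf it can find in either tree.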

These are only minor annoyances, however. I actually find the rc.d system fairly nicely implemented compared to Debian’s init.d. Once you install Bash and set your shell up, the system is perfectly usable; it’s just that you have to do this yourself, whereas OpenBSD is comfortable to use out of the box.

The migration process involved:

1. Making a copy of all the data, along with a backup copy “just in case”
2. Swapping the main storage array drives for larger ones
3. Installing FreeBSD on the system disk
4. Setting up the volume, restoring from the copy
5. Setting up access services: AFP, SMB and mDNSResponder (to allow the server to be discovered automatically by Mac OS X’s Finder)
6. Setting up a new backup scheme (which will feature in a later blog post)

As a bit of background, Singularity is a fairly normal ATX-based PC with a hardware RAID card from Areca which supports RAID configurations of up to 4 disks. The previous setup was a 4x1TB RAID5 (3TB effective storage before overheads, about 2.5TB after). The new setup will be 4x2TB RAID5 (6TB, or about 5TB in practice). The system also has a 4-disk backplane which lets us swap between two backup sets of 4 hard disks each.
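The capacity figures above can be reproduced with a little arithmetic. This is only one plausible accounting, not necessarily what was measured: it assumes the “overheads” are the gap between decimal terabytes (as drives are sold) and binary tebibytes (as most tools report), plus the 8% free-space reserve that FreeBSD’s newfs applies to UFS by default, and it ignores other filesystem metadata. `raid5_usable_tib` is a hypothetical helper.

```python
TB = 10**12          # decimal terabyte, as drives are sold
TIB = 2**40          # binary tebibyte, as most tools report sizes
UFS_MINFREE = 0.08   # FreeBSD newfs default reserve (assumed to apply)


def raid5_usable_tib(disks: int, disk_tb: float, reserve: float = UFS_MINFREE) -> float:
    """Usable space in TiB; one disk's worth of capacity goes to distributed parity."""
    data_bytes = (disks - 1) * disk_tb * TB
    return data_bytes / TIB * (1 - reserve)


if __name__ == "__main__":
    for disk_tb in (1, 2):
        nominal = (4 - 1) * disk_tb
        print(f"4x{disk_tb}TB RAID5: {nominal}TB nominal, "
              f"~{raid5_usable_tib(4, disk_tb):.2f} TiB usable")
```

This prints roughly 2.51 TiB for the old array and 5.02 TiB for the new one, in line with the “about 2.5” and “5TB in practice” figures.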

Stage 1
This was fairly easy: we just split the data into 4 folders (two of 1TB and two of 500GB, as these were the disk sizes available) and copied the data over. We had experimented with simply migrating the main storage volume over to FreeBSD, but found incompatibilities in the disklabel structure between OpenBSD and FreeBSD. It looks like OpenBSD extended the disklabel format to support 2TB+ partitions in a way FreeBSD doesn’t understand (probably because FreeBSD uses GUID partition tables natively instead).

Stage 2
Owing to the sensible design of the case, this was fairly easy. I’ll put up pictures later.

Stage 3
FreeBSD installation was fairly standard. We avoided installing X11 since this is a headless system and the shell is perfectly adequate. Post-installation, we did various userland configuration and set up networking and so on.

Stage 4
Setting up the volume was again very straightforward. We used the standard GUID partition table along with a UFS2 filesystem. Since it will be used exclusively for storage, the volume doesn’t permit execution or setuid. It’s mounted at the root of the filesystem as /store/.

During this stage of the process we started to copy the data back from the OpenBSD-formatted disks. The two 500GB disks worked perfectly; however, the 1TB ones appeared to have a different disklabel format that FreeBSD wasn’t happy to read. From the looks of it, OpenBSD had set up the disklabel with an OpenBSD-specific partition in DOS partition 4, then put another partition table inside that one with the more usual scheme. Quite why it did this I don’t know. The eventual solution was to install OpenBSD temporarily onto another machine and use that to copy the files across the network.

Stage 5
Key services which needed to be configured were:
AFP (netatalk) – Mac-style file sharing
SMB (samba) – Windows-style file sharing
mDNSResponder – a form of zeroconf networking technology which permits local network service discovery

I’ll go into each of these in more detail in future posts.

Overall, the process so far has been an interesting learning experience. If I were doing this again I’d definitely consider simply copying the files over the network to begin with. It’s quite annoying that it wasn’t the filesystem itself that caused us issues, but incompatibilities in the disklabel format between the two OSes.

IMP – Or how I learned to stop worrying and love the crypto

So it appears our government wants to resurrect plans for the Interception Modernisation Programme, which was dropped by our previous government owing to its massive cost and huge controversy. If they do want to return to the same idea of recording every contact between individuals on the Internet (which was the main gist of the original proposal), then the next couple of years could get pretty interesting for the Internet in the UK.

One of the biggest issues with such a plan is its scope, or rather the boundaries of that scope. The plan was always only to track the who, when and where of Internet communications – not the what. This is similar to how phone records are currently stored: the intelligence services can pull up records of who contacted whom, when and (to a limited extent) where. So they can tell that Jane contacted Jill on December 21st at 5pm, and that the call went from a landline phone (whose address is known) to a mobile (which can be located using base-station triangulation).

This in itself is a massive amount of information which, properly analysed, can tell you a lot about a person’s movements over time. This information is obviously very valuable to the intelligence services as it allows them to determine what relationships groups of people have to one another by who they contact.

This works for phones because they carry a single medium, voice calls (text messaging works similarly, since it uses the same endpoints). When Jane calls Jill it is very easy to record the time, endpoints, duration and (if needed) location of the endpoints. The content of the call isn’t intercepted (unless the intelligence services have a wiretapping order) and so cannot be analysed “after the fact”.

For communications on the Internet, a similar form of recording does not work. This is because the content of a message (the what) and the details of who, when and where are too closely linked together, owing to the nature of TCP/IP and the “flows” of data carried over higher-level protocols on top of it.

For example, say Jane is communicating with Jill using an instant messaging application. Jane is on her home broadband connection; Jill is at work behind her corporate network. We intercept the communication at some midpoint (a so-called “black box” installed in the infrastructure of Jane’s ISP). What can we see? At the IP level we see a series of packets moving backwards and forwards between two endpoints; however, the endpoints are not easily identified as Jane and Jill.

At Jane’s end we’d see the public IP address of Jane’s broadband connection. This could be the same for all members of Jane’s household, so recording only the source and destination IP addresses would lead to communications from Jane being confused with communications from her son, John. Similarly, on Jill’s end it’s likely that the entire corporate network will be behind a NAT and firewall, and so share a single IP address among potentially thousands of employees.

Almost all IP endpoints in the UK are behind some kind of NAT (very few home broadband connections lack one), and this makes simple IP-based tracking impossible. Similar issues are present for mobile data networks and ISP-level NAT, but these can be worked around by installing a “black box” in the network to track which subscriber maps to which IP address at any time. This technique could not be used for every home network (at least not without massive government interference, which I would hope people might rebel against!).
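A toy model makes the NAT point concrete: two people in one household talking to the same server look identical to an observer upstream of the router, and only the NAT’s own translation table can tell them apart. `NatBox` is a hypothetical, heavily simplified port-translating NAT (real routers also track protocol, timeouts and port reuse); the public addresses come from the reserved documentation ranges and the private ones from a typical home LAN range.

```python
from dataclasses import dataclass, field


@dataclass
class NatBox:
    """Toy port-translating NAT; a hypothetical model, not any real router's logic."""

    public_ip: str
    next_port: int = 40000  # arbitrary start of the public port pool
    table: dict[int, tuple[str, int]] = field(default_factory=dict)

    def outbound(self, src_ip: str, src_port: int,
                 dst_ip: str, dst_port: int) -> tuple[str, int, str, int]:
        """Translate an outgoing connection; return what an upstream observer sees."""
        public_port = self.next_port
        self.next_port += 1
        self.table[public_port] = (src_ip, src_port)  # only the NAT keeps this
        return (self.public_ip, public_port, dst_ip, dst_port)

    def who_was(self, public_port: int) -> tuple[str, int]:
        """Map a public port back to the private host, if you can see the table."""
        return self.table[public_port]


if __name__ == "__main__":
    home = NatBox("198.51.100.7")
    print(home.outbound("192.168.1.10", 51000, "203.0.113.5", 443))  # Jane
    print(home.outbound("192.168.1.11", 51000, "203.0.113.5", 443))  # John
```

Both printed tuples start with the same public address; only the port differs, and nothing in them says which one was Jane.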

The other problem comes from the lack of any real meaning carried by an IP packet. All you can tell by looking at it is that it’s from somewhere (which could be “fake” info, e.g. a NAT address) and it’s going somewhere (with the same issue). No other information is readily available without looking inside the packet. (You could probably tell what protocol it is using from the source and destination ports, but that isn’t particularly useful either.)

Of course, modern networking equipment can track “flows” of information, e.g. TCP sessions, and give more information about them. The IM conversation may take place over one or more TCP sessions between ports on the endpoints, and these sessions can be tracked. Without looking inside the content of the messages, though, it is impossible to reliably say that it was Jane talking to Jill.
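Flow tracking of this kind can be sketched by grouping packets on a direction-independent 5-tuple (protocol, addresses and ports). This toy version, with a hypothetical `Packet` record and `summarise` function, shows what such a summary actually contains: endpoints, ports, and packet and byte counts, with nothing identifying who was typing.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int
    proto: str
    length: int


def flow_key(p: Packet) -> tuple:
    """Direction-independent 5-tuple, so both halves of a conversation share a key."""
    a, b = (p.src, p.sport), (p.dst, p.dport)
    lo, hi = (a, b) if a <= b else (b, a)  # any consistent ordering will do
    return (p.proto, *lo, *hi)


def summarise(packets) -> dict:
    """Per-flow packet and byte counts: metadata only, no payload."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        stats = flows[flow_key(p)]
        stats["packets"] += 1
        stats["bytes"] += p.length
    return dict(flows)
```

Feeding it one request, its reply and an unrelated second connection from the same public address yields two flows, keyed only by addresses and ports.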

For this kind of tracking to work for all forms of Internet communication as it does for phones, the tracking equipment will need to understand the higher-level protocols used by each messaging system. They’ll need to understand SMTP, to tell who email is being sent from and to. They’ll need to understand proprietary protocols (e.g. Skype). They’ll need to understand things like Facebook too.

The other thing they’ll have to understand is the links between these services and the people using them. They’ll need to correlate Jane’s Facebook, MSN, Skype and Twitter accounts with her phone, postal address and so forth. The amount of information, and the complexity of obtaining it, quickly grows enormously compared to simply logging phone calls.

One other issue is tracking non-immediate forms of communication, for example postings on message boards or social networking services. Here a person can make a post which is viewed by thousands of other people. If these connections are not tracked, then these kinds of systems would be easy ways to exchange information. The message “My cat Jess just had kittens” may seem innocuous enough, but it could easily be code directing a terrorist cell to launch an attack. These kinds of techniques have been used since before the Cold War, and they are made all the more effective by the sheer size of the Internet. The larger the haystack, the harder it is to find the needle.

All this isn’t even taking into account encryption, which will present a real impediment to tracking communications. IPsec can encrypt the contents of IP packets, meaning that none of this information will even be available. Many protocols have crypto built into them at higher levels as well (the one most people use most often being SSL/TLS, as used in HTTPS). It’s true that many people don’t know anything about encryption, but it is very easy to find out about it and to start using systems which conceal not only what you are saying but who you are talking to as well (for example, Tor).

It’s also true that the people most likely to use such technologies are the ones who have something to hide. The phrase “if you have nothing to hide you have nothing to fear” works both ways – if you have something to fear you damn well better hide it. Criminals and terrorists know how to use cryptographic software and steganography techniques to hide their communications – the only reason we catch these people at the moment is that they are careless or incompetent (and we have to be thankful that they are!)

It’s my opinion that this plan is doomed to failure one way or another, and we can only hope that it fails before much money is spent on it. My main objection to it isn’t on cost grounds or the civil liberties issues it raises (though I object strongly to both); it’s that it is just such an impossibly stupid proposal.

It won’t work and cannot work. I only hope that others realise this soon.

Who am I now?

Apparently some kinds of network can cause your Mac’s hostname to change. I noticed this at work, with the name of the machine in Terminal changing every time I connected to their domain network. The simple fix is to edit /etc/hostconfig and add the line:

HOSTNAME=foo

Save the file and your Mac can remember who it is.

Info from: http://reviews.cnet.com/8301-13727_7-10362633-263.html
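If you need to make the same change on several machines, a small script can do it idempotently. This is a minimal sketch, assuming /etc/hostconfig is a simple KEY=value file as the line above suggests; `set_hostname` is a hypothetical helper, and you should back the file up first and run it as root.

```python
from pathlib import Path


def set_hostname(config: Path, name: str) -> None:
    """Set HOSTNAME=<name> in a KEY=value file, replacing any existing HOSTNAME line."""
    lines = config.read_text().splitlines() if config.exists() else []
    entry = f"HOSTNAME={name}"
    for i, line in enumerate(lines):
        if line.strip().startswith("HOSTNAME="):
            lines[i] = entry  # replace the first existing setting
            break
    else:
        lines.append(entry)   # no HOSTNAME line yet, so add one
    config.write_text("\n".join(lines) + "\n")


# Example (as root, after backing the file up):
# set_hostname(Path("/etc/hostconfig"), "foo")
```

Running it twice leaves the file unchanged the second time, which makes it safe to use from a login or provisioning script.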

Mixed Debian

Debian’s package management system is pretty cool; it can even solve Sudoku puzzles. Debian has three “current” branches: Stable (currently Lenny), Testing (which will become Squeeze) and Unstable (always called Sid). See Wikipedia for more.

Generally speaking Stable is stable, Testing is pretty stable and Unstable is unstable. The idea is that the Stable branch provides a rock-solid “just works” platform for use in production systems. Testing provides a way to work towards these Stable releases, and is a mostly solid system but lacking the polish of a proper Stable release. This sacrifice in stability is offset by it having much newer versions of most of the packages included with Debian.

Sometimes you want a more up-to-date package than Stable provides; an example I came across recently was Apache. Apache is everyone’s favorite web server, and the current version is 2.2.16, while the version in Stable is 2.2.9. This is fine for most purposes: the package has been tested a great deal and there are few issues. However, for my uses the newer version has become necessary because I need SNI (Server Name Indication), a nifty feature which allows you to host multiple SSL/TLS sites on the same IP address using different SSL certificates.

This feature isn’t available in the Stable package (except by using GnuTLS, which I don’t trust anywhere near as much as the OpenSSL library). It is, however, available in the package from Testing.

Upgrading a full production system to Testing isn’t a good idea in my opinion, so I decided to install just the Apache package from Testing on my existing Lenny/Stable server. This is actually fairly easy to do once you know how.

First the following needs to be added to /etc/apt/sources.list:

deb http://ftp2.de.debian.org/debian/ testing main
deb-src http://ftp2.de.debian.org/debian/ testing main
deb http://security.debian.org/ testing/updates main

Next, to prevent Debian from simply upgrading everything in the system to Testing, you need to tell it which release to prefer. This is done by editing /etc/apt/apt.conf to add this line:

APT::Default-Release "stable";

This means that the package manager will only consider packages from sources other than Stable if you explicitly tell it to (unless you already have a package from another release installed, in which case it will process updates for it as normal). So if I install Apache2 from Testing, it’ll update that package from Testing in future.
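To see why Default-Release keeps the rest of the system on Stable while letting an already-installed Testing package keep following Testing, here is a toy model of the selection rules described in apt_preferences(5): versions from the default release get priority 990, other releases 500, the installed version 100, and a version older than the installed one is never chosen unless its priority is 1000 or more. It is a simplification (real APT compares Debian version strings, honours explicit pins and has further special cases); versions are plain integer tuples, and `candidate` is a hypothetical function, not part of any APT API. Passing `-t testing` to apt-get or aptitude effectively makes Testing the default release for that one run.

```python
from typing import Optional

Version = tuple[int, ...]  # simplified stand-in for a Debian version string


def candidate(available: dict[str, Version], default_release: str,
              installed: Optional[Version] = None) -> Version:
    """Pick the version a simplified APT policy would install or keep."""
    options: list[tuple[int, Version]] = []
    if installed is not None:
        options.append((100, installed))
    for release, version in available.items():
        priority = 990 if release == default_release else 500
        if installed is not None and version < installed:
            continue  # priorities below 1000 never cause a downgrade
        options.append((priority, version))
    # Highest priority wins; ties go to the newer version.
    return max(options)[1]
```

With nothing installed and Stable as the default, Stable’s 2.2.9 wins at 990. Once a Testing build is installed, Stable’s older version is ruled out as a downgrade, so a newer Testing version (500) beats the installed one (100) and updates keep coming from Testing.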

The last thing to do then is to instruct Aptitude to install the package:

aptitude -t testing install apache2-mpm-worker

This will (if Apache is already installed) replace the existing version. A note here on config file merging: if you choose to keep your existing config file when prompted (the “N” or “O” option), the package manager will place a copy of the new file it would have replaced your version with in the same directory, with a .dpkg-dist suffix. This helps you manually go through and apply any changes you want after the upgrade.
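Going through those saved copies by hand gets tedious after a big upgrade. This is a minimal sketch that diffs each saved copy against the live file it sits next to, assuming dpkg’s usual `.dpkg-dist` suffix for a new config file that was set aside; `pending_conffile_diffs` is a hypothetical helper, and reading files under /etc may require root.

```python
import difflib
from pathlib import Path


def pending_conffile_diffs(root: Path, suffix: str = ".dpkg-dist"):
    """Yield (live_config, unified_diff_text) for each <file><suffix> under root."""
    for new in sorted(root.rglob(f"*{suffix}")):
        live = new.with_name(new.name[: -len(suffix)])
        if not live.exists():
            continue  # nothing to compare the packaged copy against
        diff = difflib.unified_diff(
            live.read_text().splitlines(keepends=True),
            new.read_text().splitlines(keepends=True),
            fromfile=str(live),
            tofile=str(new),
        )
        yield live, "".join(diff)
```

For example, `for live, diff in pending_conffile_diffs(Path("/etc/apache2")): print(diff)` shows, file by file, what the packaged config would have changed.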

All went fairly smoothly, though the mod_cband package I had installed doesn’t seem to be compatible, so I’ll have to find a new version of that. When Squeeze becomes Stable this will of course become a moot point: the references to Testing can then be removed from sources.list and the Squeeze package used from that point onwards.

Overall then an interesting insight into the Debian package management system!

TFTP + WDS

WDS logging can be enabled by setting this registry value to 1:

HKLM\SOFTWARE\Microsoft\Tracing\WDSSERVER\EnableFileTracing

This then logs to %WINDIR%\tracing\WDSServer.log

One thing which can go wrong with TFTP is that WDS tries to use a temporary range of UDP ports. If any of these are already in use, instead of gracefully failing the connection and trying again on another port, it simply borks and fails silently (unless you enable the log…).

The logging in question is:

[8436] 12:01:36: [698808][WDSPXE] [WDSPXE][UDP][Ep:10.10.0.11:4011] Sent To:10.10.0.114:68 Len:1024
[8436] 12:01:36: [d:\longhorn\base\ntsetup\opktools\wds\wdssrv\server\src\udphandler.cpp:369] Expression: , Win32 Error=2
[8436] 12:01:36: [WDSTFTP][UDP][Ep=0] Registration Failed (rc=2)
[8436] 12:01:36: [d:\longhorn\base\ntsetup\opktools\wds\wdssrv\server\src\ifhandler.cpp:238] Expression: , Win32 Error=2
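To spot these failures without reading the whole trace, a small sketch can tally the `Win32 Error=` codes and pick out TFTP registration failures. The patterns are taken from this excerpt and are assumptions about the log format in general; `scan_wds_log` is a hypothetical helper.

```python
import re
from collections import Counter

# Patterns taken from the WDSServer.log excerpt; other builds may log differently.
WIN32_ERROR = re.compile(r"Win32 Error=(\d+)")
TFTP_REG_FAIL = re.compile(r"\[WDSTFTP\].*Registration Failed \(rc=(\d+)\)")


def scan_wds_log(lines):
    """Tally Win32 error codes and collect TFTP registration-failure lines."""
    errors = Counter()
    registration_failures = []
    for line in lines:
        match = WIN32_ERROR.search(line)
        if match:
            errors[int(match.group(1))] += 1
        if TFTP_REG_FAIL.search(line):
            registration_failures.append(line.strip())
    return errors, registration_failures
```

For example, `with open(log_path, errors="replace") as fh: errors, failures = scan_wds_log(fh)`, where `log_path` points at %WINDIR%\tracing\WDSServer.log. The file’s text encoding is an assumption, hence `errors="replace"`.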

Oddly, it seems that under “normal” operation you get a lot of these:

[9488] 12:42:17: [d:\longhorn\base\ntsetup\opktools\wds\wdssrv\server\src\udpendpoint.cpp:811] Expression: , Win32 Error=5023

I’ll have to investigate why.