2011 March 25
You've probably heard of Square, the company that makes it easy for you to do credit card processing through your cellphone. They send out a little dongle that plugs into your phone's mic jack and reads swiped cards. They also provide an app which takes what the dongle sends as audio and decodes it into the credit card info. But of course, that only works for credit cards, and it doesn't actually show you all the data that it can read.
I recently broke my first reader, and to make the best of a broken reader, here's an image of the insides.
As you can see, it's very simple. It's just a read head and two wires connected to mic and ground.
A couple of weeks ago, I solved an online puzzle to get invited to a Hacker Underground meetup at SXSW. Fortunately for me, I live in Austin, so getting there was no problem. I even brought some beer. Anyway, everyone who attended had to have a "hack" to show off, and I'd been intending to play with this Square reader for a while, so I whipped up an app to dump card info.
Let's talk about that magnetic stripe on the back of your credit cards (and driver's licenses, and store club cards, and gift cards, and...). The physical specs of those cards are standardized. I got all of my info from an old article in Phrack which I found online. That article has everything you need to know about the data format on the cards. I'll restate some of it here anyway.
The magnetic stripe consists of 3 physically separated "tracks". Track 1 is closest to the bottom of the card, and track 3 is the highest. Square's reader is positioned to read track 2. Track 2 is the most commonly used track, but most credit cards also use track 1. Track 2 includes card numbers and expiration dates. Track 1 includes that plus names. There may be other data too, depending on the particular card. These tracks are specced to be 0.11 inches wide, so to read track 1 with Square's reader, we just need to raise the card by 0.11 inches so that track 1 lines up with the read head. We can do that by cutting a 0.11 inch (or in my case 1/8 inch) slice from a card we do not care about, and putting that at the bottom of the reader. This is called a "shim".
So, now that we know where and what the data is, let's talk about how it's encoded. Again, this is all from that Phrack article. Data in each track is encoded via magnetic domain flipping. Long story short: the series of domain flips encodes a waveform, and that waveform is interpreted as binary. A binary 0 in this encoding is some arbitrary frequency. A 1 is twice that frequency.
Since it's a waveform, the magnetic read head of the Square device can send that over the mic channel to your phone just like any other audio.
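To make the encoding concrete, here is a toy Python model of it (an illustration only, not code from the app). It turns a bit sequence into the spacing between flips: a 0 is one full-width interval, and a 1 is two half-width intervals in the same space. The function name `bits_to_intervals` and the default `bit_width` of 20 samples are made up for the example; on a real card the width depends on swipe speed.

```python
def bits_to_intervals(bits, bit_width=20):
    """Model the stripe as a list of interval lengths between flips.

    bit_width is a hypothetical number of samples per bit cell.
    """
    half = bit_width // 2
    intervals = []
    for bit in bits:
        if bit:
            # A 1 has an extra flip in the middle of the bit cell.
            intervals.extend([half, bit_width - half])
        else:
            # A 0 is one uninterrupted bit cell.
            intervals.append(bit_width)
    return intervals


if __name__ == "__main__":
    print(bits_to_intervals([0, 1, 0]))
```

Both kinds of bit take up the same total width, which is why a 1 looks like "twice the frequency" of a 0.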
So, now we've got a bunch of audio on our device. How do we decode it? I based my code on an Android tutorial which shows how to record data and then play it back. In my case, I made sure to save that audio as 16-bit PCM. I sampled at 44100 Hz. On Android (and elsewhere, I suppose) 16-bit PCM data means that each sample is a signed 16-bit value. Since we only care about the frequency, we only need to care about how much time there is between "zero-crossings". A zero-crossing is when the signal goes from positive to negative or vice versa. A 0 bit will be represented by the space between 2 crossings, and a 1 will have an extra crossing in approximately the same time period.
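Here is a minimal Python sketch of the zero-crossing idea (the app itself was written for Android, so this is an illustration rather than the original code). It assumes the raw recording is little-endian signed 16-bit PCM, which may not hold on every device, and the names `pcm16_to_samples` and `zero_crossing_intervals` are invented for the example.

```python
import struct


def pcm16_to_samples(raw):
    """Unpack raw bytes into signed 16-bit samples (assumes little-endian)."""
    count = len(raw) // 2
    return list(struct.unpack("<%dh" % count, raw[:count * 2]))


def zero_crossing_intervals(samples):
    """Return the number of samples between successive zero-crossings."""
    intervals = []
    last_cross = None
    prev_positive = None
    for i, sample in enumerate(samples):
        positive = sample >= 0
        if prev_positive is not None and positive != prev_positive:
            # The sign changed, so the signal crossed zero here.
            if last_cross is not None:
                intervals.append(i - last_cross)
            last_cross = i
        prev_positive = positive
    return intervals


if __name__ == "__main__":
    demo = [100] * 10 + [-100] * 10 + [100] * 5 + [-100] * 5 + [100] * 10
    print(zero_crossing_intervals(demo))
```

The samples before the first crossing and after the last one are dropped, since there is no way to know how long those partial intervals really were.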
Card data in each track starts off with some (variable) number of 0s, to establish the base frequency. What I did was listen for the first sample above a certain "quiet" threshold, then count the number of samples between zero-crossings. That number becomes the base value for a 0. Since these cards are hand-swiped, the actual frequencies will change somewhat from the start of the scan to the end. So, I made a simple method which determines if the number of samples since the last zero-crossing is closer to the base frequency or twice the base frequency (half the base number of samples). It then adjusts the expected base frequency accordingly. This works well, so long as the changes between any two logical bits are fairly small. And they almost certainly will be.
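The adaptive classification can be sketched roughly like this in Python. This is a reconstruction of the idea, not the app's code: it takes a list of sample counts between zero-crossings, uses the first one as the base width of a 0, and then decides for each interval whether it is closer to the base width or to half of it. The update rule (replace the base with the width of the bit just seen) is an assumption, and `intervals_to_bits` is a made-up name.

```python
def intervals_to_bits(intervals):
    """Classify zero-crossing intervals as 0 bits (full width) or 1 bits
    (two half-width intervals), tracking swipe speed as we go."""
    if not intervals:
        return []
    base = float(intervals[0])  # width of a 0, taken from the leading zeros
    bits = []
    i = 0
    while i < len(intervals):
        width = intervals[i]
        if abs(width - base) <= abs(width - base / 2):
            # Closer to a full bit cell: this is a 0.
            bits.append(0)
            base = float(width)
            i += 1
        else:
            # Closer to half a bit cell: a 1 is this interval plus the next.
            if i + 1 >= len(intervals):
                break  # incomplete 1 at the end of the recording
            bits.append(1)
            base = float(width + intervals[i + 1])
            i += 2
    return bits


if __name__ == "__main__":
    # The swipe slows down slightly: bit width drifts from 20 to 24 samples.
    print(intervals_to_bits([20, 20, 20, 10, 10, 20, 11, 11, 22, 22, 12, 12]))
```

Because the base follows the most recent bit, a gradual speed change is absorbed one bit at a time. A sudden jump of more than about 25% between neighbouring bits would still be misread.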
Okay, after some hand-waving, we now have a binary sequence of data, which we want to turn back into ASCII. The most common encoding (and the only one I wrote a handler for) encodes each character as some number of bits, plus one parity bit. In the case of track 2, that's 4 bits for the character, and 1 for parity, making 5-bit groups. The bits are read from least significant to most, with the parity bit last. The parity bit is set to make the number of 1s in the group odd. In my implementation, I just disregard the parity bit, but it would help determine whether the read was good or not. In track 1, it's 6 bits for the character, plus the parity bit, making 7-bit groups.
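As a small Python illustration of one group (again a sketch, with `decode_group` being a name invented here), the data bits are read least significant first, and the odd-parity check is included even though my app skipped it:

```python
def decode_group(group):
    """Decode one character group: data bits LSB first, parity bit last.

    Returns (value, parity_ok). Works for 5-bit (track 2) and
    7-bit (track 1) groups alike.
    """
    data = group[:-1]
    value = sum(bit << position for position, bit in enumerate(data))
    # Odd parity: the total number of 1s, parity bit included, must be odd.
    parity_ok = sum(group) % 2 == 1
    return value, parity_ok


if __name__ == "__main__":
    print(decode_group([1, 1, 0, 1, 0]))  # a 5-bit track 2 group
    print(decode_group([1, 1, 0, 0, 0]))  # same width, but parity fails
```

A group with a failed parity check is a hint that the swipe was misread and should probably be retried.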
The character sets of the two tracks differ too, but both are subsets of ASCII with some offset. In the case of track 2, which only encodes digits and a few symbols, the character set starts at 48, which is the ASCII code for "0". So if we get 0,0,0,0,1 as our group (four 0 data bits plus a parity bit of 1), we turn that into 0, add 48, and get 48, which is ASCII "0". Similarly, 1,0,0,0,0 is 1, and 1 + 48 = 49 = ASCII "1".
For track 1, the character set starts with " " (space) which is ASCII 32. So we add 32 to the decoded numeric value and get our ASCII character. After that, we have the data, so all that remains is hooking up the UI glue.
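Putting the offsets together, a rough Python sketch of the bits-to-text step might look like the following. It is not the app's code, and it leans on two details that are not covered above but that come from the standard track formats as I understand them: the first character after the leading 0s (the start sentinel, ";" on track 2 and "%" on track 1) begins with a 1 bit, so skipping to the first 1 aligns the groups, and the data ends with "?". The names `TRACK_FORMATS` and `bits_to_text` are made up for the example.

```python
# track number -> (data bits per character, ASCII offset)
TRACK_FORMATS = {1: (6, 32), 2: (4, 48)}


def bits_to_text(bits, track):
    """Turn a decoded bit sequence into text, ignoring the parity bits."""
    data_bits, offset = TRACK_FORMATS[track]
    width = data_bits + 1  # data bits plus one parity bit
    if 1 not in bits:
        return ""
    # Assumption: the start sentinel's first bit is the first 1 we see.
    start = bits.index(1)
    chars = []
    for pos in range(start, len(bits) - width + 1, width):
        group = bits[pos:pos + width]
        value = sum(bit << i for i, bit in enumerate(group[:data_bits]))
        char = chr(value + offset)
        chars.append(char)
        if char == "?":
            break  # assumed end sentinel; what follows is not card text
    return "".join(chars)


if __name__ == "__main__":
    # ";1?" on track 2, padded with leading and trailing zeros.
    demo = [0] * 6 + [1, 1, 0, 1, 0] + [1, 0, 0, 0, 0] + [1, 1, 1, 1, 1] + [0] * 4
    print(bits_to_text(demo, 2))
```

If the swipe was read backwards or a bit was dropped, the output will be garbage, which is where checking the parity bits would earn its keep.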