Sound and Software: Build a better WAVE reader, pt 1

I have recently been working on creating an optimal audio I/O library in Haskell. By optimal, I mean the library should do all of the following:

Have a clean, functional, usable interface. Operations should be composable, and the user should not have to deal with recursion, folds or the like. Procedural-style interfaces (anything exposing filehandle-type operations to the user) are out too.
Be space efficient. Some Haskell implementations leak space outright, and others leak if the user isn't careful about which data stays referenced.
Have performance comparable to hsndfile, a Haskell binding to libsndfile.
Be implemented purely in Haskell. This makes it interesting.

The standard for audio I/O performance in Haskell has been hsndfile, as was discussed on haskell-cafe not long ago. Testing shows that using hsndfile to read data in a lazy stream fashion (similar to lazy ByteStrings) is about 10 times faster than the next-fastest implementation. Not good for Haskell's claim of having speed comparable to C! Still, I'm convinced it's possible to do better.

My test case consists of finding the peak normalized amplitude of a 16-bit stereo WAVE file. The audio duration is about 6 minutes, with a total file size of about 66 MB. This task involves reading and processing the entire file. The processing is quite minimal, so the speed of an implementation should depend primarily on the efficiency of reading and normalizing audio data.
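As a rough sketch of what the test computes, here is the peak calculation over samples that are already in memory. The names `normalize` and `peak` are mine, and dividing by 32768 is an assumption about the normalization; this says nothing about how the file gets read, which is the hard part.

```haskell
import Data.Int (Int16)
import Data.List (foldl')

-- Map a signed 16-bit sample into the range [-1, 1).
-- Assumption: normalization divides by 2^15 = 32768.
normalize :: Int16 -> Double
normalize s = fromIntegral s / 32768

-- Peak absolute normalized amplitude over all samples, with both
-- channels interleaved in one list. The conversion to Double happens
-- before 'abs' so that minBound :: Int16 does not overflow. The strict
-- left fold keeps only the running maximum.
peak :: [Int16] -> Double
peak = foldl' (\m s -> max m (abs (normalize s))) 0
```

For example, `peak [16384, -8192]` evaluates to `0.5`.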

Using hsndfile in a lazy stream does provide an interface which is familiar to many Haskell users, but I'm not particularly fond of it. If the user is not careful, the entire file can be retained in memory after processing. For a large audio file, this is unacceptable. It should be impossible for data to be retained without explicit action from the user. Also, dependencies on foreign libraries can be difficult for some users (e.g. on Windows) to resolve.

So having decided to create a native Haskell implementation, the first place to start is with the semantics and interface. I recently read Oleg Kiselyov's presentations on Iteratee processing in Haskell, and I'm convinced this is the best model for functional I/O. It just feels right; what else can I say? Read the paper, see the code.

A preliminary version of Iteratee-based processing was benchmarked as "Enumerator" in the Haskell-art discussion. It was roughly comparable with other Haskell solutions, but had some problems. Notably, it didn't support seeking within the file, which would be nice. After I developed that, Oleg released a new version of Iteratee which does support seek operations via the RBIO monad. So I dropped Enumerator in favor of the new Iteratee + RBIO.

Oleg helpfully provides a TIFF reader with his Iteratee code. I based my WAVE reader on that, using the same technique of storing Enumerators for each WAVE sub-chunk in an IntMap. Since they aren't tagged, a list probably would have served just as well, but most WAVE files only have 2 or 3 chunks anyway, so it doesn't seem to matter much. Changes here aren't going to have a large performance impact.
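To make the chunk dictionary idea concrete, here is a simplified sketch. It is not my actual reader: instead of storing an Enumerator per sub-chunk, it stores each sub-chunk's tag, payload offset and length, keyed by position in the file. The names `ChunkInfo`, `le32` and `chunkTable` are made up for this illustration, the input is a plain list of bytes, and the 12-byte RIFF/WAVE header is skipped without being validated.

```haskell
import qualified Data.IntMap as IM
import Data.Bits (shiftL, (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- One entry per sub-chunk: its four-character tag, the offset of its
-- payload from the start of the file, and the payload length in bytes.
data ChunkInfo = ChunkInfo
  { chunkTag    :: String
  , chunkOffset :: Int
  , chunkLength :: Int
  } deriving (Eq, Show)

-- Decode a little-endian unsigned 32-bit integer from the first four bytes.
le32 :: [Word8] -> Int
le32 bs = foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0 (take 4 bs)

-- Walk the sub-chunks that follow the 12-byte "RIFF"/size/"WAVE" header
-- and number them in file order. Scanning stops at the first position
-- where fewer than 8 header bytes remain.
chunkTable :: [Word8] -> IM.IntMap ChunkInfo
chunkTable file = IM.fromList (zip [0 ..] (go 12 (drop 12 file)))
  where
    go pos bytes
      | length hdr < 8 = []
      | otherwise =
          ChunkInfo tag (pos + 8) len
            : go (pos + 8 + padded) (drop (8 + padded) bytes)
      where
        hdr    = take 8 bytes
        tag    = map (chr . fromIntegral) (take 4 hdr)
        len    = le32 (drop 4 hdr)
        padded = len + len `mod` 2  -- RIFF chunks are padded to an even size
```

For a canonical file with a 16-byte "fmt " chunk followed by "data", this yields entry 0 with payload offset 20 and entry 1 with payload offset 44.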

The first version to use Iteratee + RBIO executes in about 45 seconds. Ouch.

RBIO uses IORefs to communicate seek requests and answers between Iteratees and Enumerators. Unfortunately IORefs are slow, so I'll start by getting rid of them. The data type

    data IterateeG el m a = IE_done a (StreamG el)
                          | IE_cont (StreamG el -> IterateeGM el m a)

can be changed to

    data IterateeG el m a = IE_done a (StreamG el)
                          | IE_cont (StreamG el -> IterateeGM el m a)
                          | IE_seek FileOffset (StreamG el -> IterateeGM el m a)

This also necessitates adding one more case to (>>=) on Iteratees:

    iter_bind m f = m >>== docase
      where
        docase (IE_done a (Chunk [])) = f a
        docase (IE_done a stream) = f a >>== (\r -> case r of
            IE_done x _ -> liftI $ IE_done x stream
            IE_cont k   -> k stream
            iter        -> liftI iter)
        docase (IE_cont k) = liftI $ IE_cont ((>>= f) . k)
        docase (IE_seek off k) = liftI $ IE_seek off ((>>= f) . k)

After these changes, the provided Iteratees can be adapted to use the new structure relatively easily. The enumerator that reads from a file, enum_fd_random, must also be updated to understand seek requests. At this point we can remove RBIO entirely, and our execution time is about 33 seconds. Adding a few carefully chosen INLINE and SPECIALIZE pragmas gets runtime down to 25 seconds. Better, but we're very far from the goal of being comparable with C.
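To show how a seek constructor and a seek-aware enumerator fit together, here is a stripped-down model. It is a sketch, not the library code: the iteratee is pure rather than monadic, a list stands in for the file, and every name (`Stream`, `Iter`, `bindI`, `returnI`, `seekI`, `headI`, `enumList`) is hypothetical. The real enum_fd_random works on a file descriptor and does actual I/O.

```haskell
-- Input handed to an iteratee: a chunk of elements, or end of stream.
data Stream el = Chunk [el] | EOF

-- A pure iteratee with an explicit seek request, mirroring the shape
-- of IE_done / IE_cont / IE_seek.
data Iter el a
  = Done a (Stream el)
  | Cont (Stream el -> Iter el a)
  | Seek Int (Stream el -> Iter el a)

-- Sequencing, following the same case analysis as iter_bind.
bindI :: Iter el a -> (a -> Iter el b) -> Iter el b
bindI (Done a (Chunk [])) f = f a
bindI (Done a s) f = case f a of
  Done x _ -> Done x s
  Cont k   -> k s
  other    -> other  -- a seek request: the leftover input is dropped
bindI (Cont k) f     = Cont (\s -> k s `bindI` f)
bindI (Seek off k) f = Seek off (\s -> k s `bindI` f)

returnI :: a -> Iter el a
returnI x = Done x (Chunk [])

-- Ask the enumerator to continue from the given element offset.
seekI :: Int -> Iter el ()
seekI off = Seek off (\s -> Done () s)

-- Read one element, or Nothing at end of stream.
headI :: Iter el (Maybe el)
headI = Cont step
  where
    step (Chunk [])       = Cont step
    step (Chunk (x : xs)) = Done (Just x) (Chunk xs)
    step EOF              = Done Nothing EOF

-- Enumerator over a list, feeding chunks of at most chunkSize elements
-- and honoring seek requests by moving its read position.
enumList :: Int -> [el] -> Iter el a -> Maybe a
enumList chunkSize input = go 0
  where
    go _ (Done a _)   = Just a
    go pos (Cont k)   = feed pos k
    go _ (Seek off k) = feed off k
    feed pos k = case take chunkSize (drop pos input) of
      [] -> finish (k EOF)
      xs -> go (pos + length xs) (k (Chunk xs))
    finish (Done a _) = Just a
    finish _          = Nothing  -- still wants input after EOF
```

With this model, `enumList 4 [10, 20 .. 100] (seekI 7 `bindI` \_ -> headI)` evaluates to `Just (Just 80)`: the enumerator jumps to offset 7 and hands the next chunk to the continuation, with no mutable reference involved.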

The next step is to change the Iteratee's internal representation of data.


Monday, February 2, 2009
