Tuesday, February 3, 2009

Build a better WAVE reader, pt 2

In the last post, I looked at the beginnings of using an Iteratee to create a WAVE file reader. I achieved a runtime of 25 seconds, compared to 1.9 seconds for my benchmark code.

Oleg's Iteratee code uses a Stream type which internally represents data as Chunk [a]. Haskell lists have many great properties, but they are not particularly efficient for this type of numerical processing, since every element is a separately allocated cons cell pointing to a boxed value. There are many better options, including Arrays, UVectors, and StorableVectors.

Unfortunately, all of the options I would want to use impose a type class constraint on the element type (Storable, in the case of StorableVector). This isn't a problem for sound files, but it does greatly limit the general utility of the Iteratees. One of the Iteratee examples shows how Iteratees can be layered for text processing, with IterateeGM String m a iteratees layered on IterateeGM Char m a iteratees to provide operations on the words of a text stream. This is no longer possible, because String is not an instance of the type classes required by these more compact data structures. In the end, the performance benefits should be worth it.

I chose StorableVector to replace the basic list, because it seemed handy while I was experimenting. StorableVector turns out to have another good property which is key to achieving the goal of performance close to C.

Adapting the Iteratees to use StorableVector is straightforward. Unfortunately, once that's done, performance is actually worse! In fact, it's as bad as the original RBIO-based version. It turns out that one key function is the culprit: conv_stream.

conv_stream converts between two streams. It repeatedly runs a user-supplied conversion iteratee, which consumes several elements of the input stream and returns a group of elements for the output stream (or Nothing to stop). The WAVE file reader uses conv_stream to convert a stream of Word8s into a stream of Doubles, so the user can operate on Doubles while the library handles all conversions transparently. The original type of conv_stream is:
conv_stream :: Monad m =>
    IterateeGM el m (Maybe [el']) -> EnumeratorN el el' m a
Unfortunately, this means our data needs to be unpacked from the StorableVector Word8 into separate Word8s, those Word8s assembled into Just [Double], and finally packed into a StorableVector Double for the next Iteratee. This is very inefficient, but easily fixed. Just change the type of conv_stream to
conv_stream :: (Storable el, Storable el', Monad m) =>
    IterateeGM el m (Maybe (Vec.Vector el')) -> EnumeratorN el el' m a

and we're done. In fact, the implementation doesn't change at all from the original; only the type does. Now no packing or unpacking needs to take place.
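To make the conversion step concrete, here is a minimal list-based model of what a 16-bit conversion function has to do as chunks arrive: convert every complete byte pair and carry a dangling odd byte into the next chunk. This is a sketch, not the library code: the names splitEven, pairsLE and convertChunks are hypothetical, plain lists stand in for StorableVectors, and little-endian byte pairs are assumed because that is the byte order of WAVE sample data.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word16)

-- Hypothetical helper: split bytes into the longest even-length prefix
-- and the 0- or 1-byte remainder.
splitEven :: [Word8] -> ([Word8], [Word8])
splitEven bs = splitAt (length bs - length bs `mod` 2) bs

-- Hypothetical helper: combine little-endian byte pairs into Word16 values.
pairsLE :: [Word8] -> [Word16]
pairsLE (lo : hi : rest) = (fromIntegral lo .|. (fromIntegral hi `shiftL` 8)) : pairsLE rest
pairsLE _ = []

-- Hypothetical model: convert a sequence of byte chunks into Word16 chunks.
-- A trailing odd byte is carried into the next chunk; a chunk that yields
-- no complete pair produces no output. A byte still pending at the end of
-- input is dropped.
convertChunks :: [[Word8]] -> [[Word16]]
convertChunks = go []
  where
    go _ [] = []
    go carry (c : cs) =
      let (ready, pending) = splitEven (carry ++ c)
      in if null ready
           then go pending cs
           else pairsLE ready : go pending cs
```

For example, convertChunks [[0x01], [0x00, 0x02], [0x00]] yields [[0x0001], [0x0002]]: the first chunk holds only half a sample, so nothing is emitted until the next chunk arrives. The "no pending byte" and "one pending byte" cases correspond roughly to step and step' in unroll_16.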

With the new conv_stream, the conversion function (the IterateeGM which converts Word8s into Doubles) also has to change. I have actually split this code into two parts: a function called unroller (because the original list-based version was a manually unrolled endian_read), which converts Word8s into Word16s or Word32s as necessary, and a normalization function, which normalizes the result and converts from Words to Doubles. The normalization function should be fine as is, so only the unroller needs to change. Ideally, it would operate on StorableVectors directly, and fortunately this is possible: all that's necessary is to get a ForeignPtr Word8 from a StorableVector and cast it to a ForeignPtr Wordn. A somewhat hackish 16-bit converter can be implemented as follows:
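import Data.Word (Word8, Word16)
import System.IO.Unsafe (unsafePerformIO)
import qualified Foreign.Ptr as FP
import qualified Foreign.ForeignPtr as FFP
import qualified Foreign.ForeignPtr.Unsafe as FFPU  -- unsafeForeignPtrToPtr lives here in current base
import qualified Data.StorableVector as Vec
import qualified Data.StorableVector.Base as VB     -- toForeignPtr / fromForeignPtr
-- IterateeGM, liftI, IE_cont, IE_done and Chunk come from the
-- StorableVector-based version of the Iteratee code (not shown).

unroll_16 :: (Monad m) => IterateeGM Word8 m (Maybe (Vec.Vector Word16))
unroll_16 = liftI $ IE_cont step
  where
    -- No byte pending from an earlier chunk.
    step (Chunk vec)
      | Vec.null vec = unroll_16
      | Vec.length vec == 1 = liftI $ IE_cont $ step' vec   -- wait for the second byte
      | Vec.length vec `rem` 2 == 0 = liftI $ IE_done (convert_vec vec) (Chunk Vec.empty)
      | True = let (h, t) = Vec.splitAt (Vec.length vec - 1) vec
               in liftI $ IE_done (convert_vec h) (Chunk t)   -- leave the odd byte in the stream
    step stream = liftI $ IE_done Nothing stream             -- EOF or error: stop converting
    -- One byte i pending from the previous chunk: prepend it.
    step' i (Chunk vec)
      | Vec.null vec = liftI $ IE_cont $ step' i
      | Vec.length vec `rem` 2 == 1 = let vec' = Vec.append i vec
                                      in liftI $ IE_done (convert_vec vec') (Chunk Vec.empty)
      | True = let (h, t) = Vec.splitAt (Vec.length vec - 1) vec
               in liftI $ IE_done (convert_vec $ Vec.append i h) (Chunk t)
    step' _i stream = liftI $ IE_done Nothing stream
    -- Reinterpret the bytes as Word16s in place: no copy, host byte order.
    convert_vec vec =
      let (fp, off, len) = VB.toForeignPtr vec
          f = FP.plusPtr (FFPU.unsafeForeignPtrToPtr fp) off
          fp' = (FFP.castForeignPtr $ unsafePerformIO $ FFP.newForeignPtr_ f) :: FFP.ForeignPtr Word16
      in Just $ VB.fromForeignPtr fp' (len `div` 2)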

and that's it! The big hack is that this only works on little-endian platforms: WAVE sample data is stored little-endian, and convert_vec simply reinterprets the bytes in the host's byte order. To be correct, the bytes would need to be swapped on big-endian systems only. Implementing a suitable swap function is currently an exercise for the reader. Note that unsafeForeignPtrToPtr should be safe, as there is no finalizer associated with the ForeignPtr, and in any case the memory pointed to won't be GC'd until after the Iteratee is complete.
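One possible answer to the swap exercise is to let the compiler report the host byte order and swap only when it is big-endian. The sketch below assumes a reasonably recent GHC (GHC.ByteOrder and Data.Word.byteSwap16 are part of base), uses lists instead of StorableVectors, and adds a hypothetical normalize16 standing in for the normalization step, assuming signed 16-bit samples scaled by 1/32768. The names fromLE16, castDecode16 and normalize16 are illustrative, not part of the original code.

```haskell
import Data.Int (Int16)
import Data.Word (Word8, Word16, byteSwap16)
import Foreign.Marshal.Array (allocaArray, peekArray, pokeArray)
import Foreign.Ptr (Ptr, castPtr)
import GHC.ByteOrder (ByteOrder (..), targetByteOrder)

-- Turn a Word16 read in little-endian (WAVE) order into host order:
-- the identity on little-endian hosts, a byte swap on big-endian ones.
fromLE16 :: Word16 -> Word16
fromLE16 = case targetByteOrder of
  LittleEndian -> id
  BigEndian    -> byteSwap16

-- Reinterpret raw bytes as Word16s by pointer cast (the same idea as
-- convert_vec, but copying through a Word16-aligned buffer), then fix the
-- byte order. A trailing odd byte is ignored.
castDecode16 :: [Word8] -> IO [Word16]
castDecode16 bytes =
  allocaArray n $ \p16 -> do
    pokeArray (castPtr p16 :: Ptr Word8) (take (2 * n) bytes)
    map fromLE16 <$> peekArray n p16
  where
    n = length bytes `div` 2

-- Hypothetical normalization: treat the word as a signed 16-bit sample
-- and scale it into [-1, 1).
normalize16 :: Word16 -> Double
normalize16 w = fromIntegral (fromIntegral w :: Int16) / 32768
```

On a little-endian host fromLE16 is just id, so the fast path is unchanged; in the StorableVector version the same function could be applied with Vec.map only when targetByteOrder is BigEndian.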

At this point, the data is handled entirely as StorableVectors, with no conversion to or from intermediate lists at any point. Here are the results from running the test:
$ time ./test_iter_sb
Just (AudioFormat {numberOfChannels = 2, sampleRate = 44100, bitDepth = 16})
Just 0.977568895535142

real 0m1.661s
user 0m1.335s
sys 0m0.140s
And the proof is in the pudding. This is the fastest Haskell implementation yet, even faster than hsndfile in a lazy stream.

A bit of work remains to be done. The code should be cleaned up, with the hack fixed properly. Also, so far I've only implemented unroll_16. For 24-bit audio, I expect that a Word24 type with a Storable instance will be necessary in order to employ the same technique. However, I think I've proven that a pure-Haskell, high-performance audio I/O library is possible.
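A Word24 type with a Storable instance might look something like the sketch below. Everything here is hypothetical: the type is a newtype over Word32, sizeOf is 3 so that packed 24-bit samples can be read straight out of a byte buffer, and peek/poke assemble the bytes in little-endian order explicitly, which also sidesteps the endianness hack. bytesToWord24 is an illustrative helper, not part of any library.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word8, Word32)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (Storable (..))

-- Hypothetical 24-bit sample: the value lives in the low 24 bits of a Word32.
newtype Word24 = Word24 Word32
  deriving (Eq, Show)

instance Storable Word24 where
  sizeOf _ = 3      -- packed: three bytes per sample, no padding
  alignment _ = 1   -- bytes are accessed one at a time
  peek p = do
    let bp = castPtr p :: Ptr Word8
    b0 <- peekByteOff bp 0 :: IO Word8
    b1 <- peekByteOff bp 1 :: IO Word8
    b2 <- peekByteOff bp 2 :: IO Word8
    -- Assemble little-endian bytes regardless of host byte order.
    return $ Word24 (fromIntegral b0
                     .|. (fromIntegral b1 `shiftL` 8)
                     .|. (fromIntegral b2 `shiftL` 16))
  poke p (Word24 w) = do
    let bp = castPtr p :: Ptr Word8
    pokeByteOff bp 0 (fromIntegral w :: Word8)
    pokeByteOff bp 1 (fromIntegral (w `shiftR` 8) :: Word8)
    pokeByteOff bp 2 (fromIntegral (w `shiftR` 16) :: Word8)

-- Read packed 24-bit samples out of a byte buffer; leftover bytes are ignored.
bytesToWord24 :: [Word8] -> IO [Word24]
bytesToWord24 bytes = withArrayLen bytes $ \len p ->
  peekArray (len `div` 3) (castPtr p :: Ptr Word24)
```

Because sizeOf is 3, element offsets advance in 3-byte strides, so a StorableVector Word24 could in principle reuse the ForeignPtr cast from unroll_16. Whether the unaligned, packed layout costs any speed is untested here.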
