My AltiVec code is coming along great. My standard procedure for writing SIMD code is to create a duplicate code path that uses a simple but mathematically accurate scalar algorithm. Then I run the two paths side by side on the same input and compare the results, making sure that the SIMD code is doing what it is supposed to be doing. This approach works great, and is especially good at rooting out the off-by-one errors that show up when switching from 16.16 fixed point to float, with the rounding differences that come with it.
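Here is roughly what that kind of check can look like, as a minimal sketch in portable C rather than my actual code. Everything in it is made up for illustration: `scale_ref`, `scale_fast` and `count_mismatches` are hypothetical names, the operation (scaling an 8-bit sample by a gain) is just an example, and `scale_fast` is a plain 16.16 fixed-point routine standing in for where a real AltiVec path would go. It assumes the gain is non-negative and small (4.0 or less) so the fixed-point math cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference path: scale a sample by a gain in plain float math,
   rounding to nearest. Slow, but easy to verify by hand. */
static uint8_t scale_ref(uint8_t sample, float gain)
{
    int v = (int)((float)sample * gain + 0.5f);
    if (v > 255) v = 255;
    return (uint8_t)v;
}

/* Stand-in for the optimized path (hypothetical): the same operation
   in 16.16 fixed point. A real SIMD routine would be plugged in here. */
static uint8_t scale_fast(uint8_t sample, float gain)
{
    int32_t g = (int32_t)(gain * 65536.0f + 0.5f);      /* gain as 16.16 */
    int32_t v = ((int32_t)sample * g + 0x8000) >> 16;   /* round to nearest */
    if (v > 255) v = 255;
    return (uint8_t)v;
}

/* Run both paths over the same input and count the samples whose
   results differ by more than `tolerance`. If first_bad is not NULL,
   it receives the index of the first offending sample. */
static size_t count_mismatches(const uint8_t *in, size_t n, float gain,
                               int tolerance, size_t *first_bad)
{
    size_t bad = 0;
    assert(in != NULL);
    for (size_t i = 0; i < n; i++) {
        int diff = (int)scale_ref(in[i], gain) - (int)scale_fast(in[i], gain);
        if (diff < 0) diff = -diff;
        if (diff > tolerance) {
            if (bad == 0 && first_bad != NULL) *first_bad = i;
            bad++;
        }
    }
    return bad;
}
```

Calling `count_mismatches` with a tolerance of 0 demands bit-exact agreement; a tolerance of 1 lets the expected rounding differences between float and fixed point through while still flagging anything worse. Reporting the first bad index makes it easy to drop into the debugger on the exact sample that went wrong.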
Even though my AltiVec code is working, I'm not sure the overall performance has increased that much. Next up, I'm going to write a simple function-timing routine. I suspect that one of the bottlenecks is the 'get frame' routine provided by QuickTime.
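A timing routine like that could be as simple as the sketch below. This is not the routine I ended up with, just one way to do it in standard C: `time_function` and `dummy_work` are hypothetical names, and `dummy_work` is a placeholder for whatever is being measured. Note the assumption baked in: `clock()` reports CPU time, not wall-clock time, so a call that spends its time waiting on the disk would look cheaper than it really is and would need a wall-clock timer instead.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Signature for anything we want to time; ctx carries its arguments. */
typedef void (*timed_fn)(void *ctx);

/* Call fn(ctx) `iterations` times and return the average CPU time
   per call, in seconds. Returns 0.0 if iterations is not positive. */
static double time_function(timed_fn fn, void *ctx, int iterations)
{
    assert(fn != NULL);
    if (iterations <= 0) return 0.0;

    clock_t start = clock();
    for (int i = 0; i < iterations; i++)
        fn(ctx);
    clock_t end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC / iterations;
}

/* Hypothetical workload standing in for the code under test.
   The volatile pointer keeps the compiler from deleting the loop. */
static void dummy_work(void *ctx)
{
    volatile unsigned long *sum = ctx;
    for (unsigned long i = 0; i < 1000; i++)
        *sum += i;
}
```

Averaging over many iterations matters because `clock()` has a coarse resolution on many systems, so a single short call can easily measure as zero. Timing each stage separately (frame fetch, conversion, filter) with the same helper should show where the time actually goes.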