Is well-supported open source software always the best choice?
Not in my case. I'm working on AltiVec optimizations right now in my Mac products, and I started out with a foray into Apple's sample code using the freely available Project Builder application, which is based on gcc. Maybe it's because I'm using a crappy, outmoded 400MHz G4 Titanium, but that development environment is dawg-slow. After spending about 20 excruciating minutes with it, I switched back to CodeWarrior and was surprised at how much faster it is. Probably two orders of magnitude at least.
Speaking of SIMD optimizations, the time is ripe for some sort of proper language tool for expressing SIMD code. The C language extensions are fine for now, and I am looking forward to seeing how the compiler generates SIMD code. I hand-coded all the MMX routines in the Windows products in assembler. I can't see that it is much faster to write code with the C extensions, since most of the time is spent figuring out the dataflow and which instructions to use. Also, simple things like shifting an entire vector register are very arcane in AltiVec; it seems like Intel did a better job in that arena, even as early as SSE. However, one great thing about AltiVec is that there are 32 registers. After squeezing code into the 8 MMX registers, it seems like a 'huge tract of land' to be working with. In my current routine, it looks like the compiler is squeezing it into 7 registers so far. Generally speaking, in order to avoid contention for registers and dependencies (like having to wait for the result of an operation, which stalls the pipeline), I try to code one or two algorithms in parallel. Usually this means processing two pixels or four pixels at once.
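To make the "two pixels at once" idea concrete, here is a rough sketch in plain scalar C. It is not AltiVec or MMX code and not my actual routine. The function names (scale_channel, scale_pixels_x2) and the 8.8 fixed-point gain are made up for illustration. The point is only the shape of the loop: the a and b chains never read each other's results, so the work for pixel b can fill the slots where pixel a would otherwise be waiting on its previous operation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scale one 8-bit value by an 8.8 fixed-point gain
 * (256 means 1.0), clamping the result to 255. Assumes a small gain so the
 * 32-bit multiply cannot overflow. */
static inline uint8_t scale_channel(uint8_t value, uint32_t gain)
{
    uint32_t scaled = ((uint32_t)value * gain) >> 8;
    return (uint8_t)(scaled > 255u ? 255u : scaled);
}

/* Hypothetical routine: same operation, but two pixels per iteration.
 * The a-chain and the b-chain are independent, so neither has to wait
 * for the other's intermediate results. */
void scale_pixels_x2(const uint8_t *src, uint8_t *dst, size_t count, uint32_t gain)
{
    size_t i = 0;

    for (; i + 1 < count; i += 2) {
        uint32_t a = src[i];
        uint32_t b = src[i + 1];

        a *= gain;                  /* multiply step for pixel a */
        b *= gain;                  /* multiply step for pixel b, independent of a */

        a >>= 8;                    /* drop the fractional bits */
        b >>= 8;

        a = a > 255u ? 255u : a;    /* clamp to the 8-bit range */
        b = b > 255u ? 255u : b;

        dst[i]     = (uint8_t)a;
        dst[i + 1] = (uint8_t)b;
    }

    /* Leftover pixel when count is odd. */
    if (i < count) {
        dst[i] = scale_channel(src[i], gain);
    }
}
```

The cost is that each interleaved chain needs its own temporaries, which is why register count matters so much: this is cramped in 8 registers and comfortable in 32.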