First of all: thanks a lot for all the great input in this discussion! Sorry for not replying earlier; I'm damn busy getting on with the handbook, so everything else is taking a little longer than I would like ...
You're perfectly right: everything I'm doing in Starling is a trade-off between ease of use and efficiency. Especially when designing the architecture of Starling 2, I had to make some really hard decisions, and I didn't make them lightly. For example, deciding between ByteArray and Vector.<Number> easily took me a month, switching back and forth between one branch for one solution and one for the other.
In the end, ByteArray won because of its flexibility and good-enough performance in *most* situations. That MovieClip test of yours is one of those extreme situations where it clearly loses.
In cases like that, where the existing code just does not cut it, you can at least always override the default rendering mechanisms completely. If you override the "render" method in any display object, you take over the rendering of that object. An example of this can be found in the render method of the ParticleSystem extension. That class extends "Mesh" and directly calls the render methods of the "style" and "effect" instances. You could also go one step further, extend "DisplayObject" and not use Starling's other mechanisms at all. That is, of course, much more work, and it should only be necessary in rare cases. Still, I think it's good to have that option.
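To make the idea concrete, here is a minimal sketch of the pattern. It is written in TypeScript rather than ActionScript, and "Painter", "DisplayObject" and "CustomObject" are simplified stand-ins I made up for illustration, not Starling's real classes. It only shows that an object overriding "render" bypasses the default path while everything else keeps using it.

```typescript
// Hypothetical stand-ins for illustration only; NOT Starling's real classes.
class Painter {
    // Records what was drawn so the effect of the override is visible.
    readonly log: string[] = [];

    draw(what: string): void {
        this.log.push(what);
    }
}

class DisplayObject {
    // The default rendering path provided by the framework.
    render(painter: Painter): void {
        painter.draw("default");
    }
}

class CustomObject extends DisplayObject {
    // Overriding render() replaces the default path for this object only.
    override render(painter: Painter): void {
        painter.draw("custom");
    }
}

const painter = new Painter();
const children: DisplayObject[] = [new DisplayObject(), new CustomObject()];

// The render loop does not need to know which objects took over.
for (const child of children) {
    child.render(painter);
}
```

The caller treats every child the same way; only the object that opted out pays for (and controls) its own rendering.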
@bwhiting, as for your "fast memory" experiments: definitely keep me updated on this! I have toyed around with the idea, but decided against it for several reasons.
1) To make the code readable and easy to handle (→ usability!), it would be necessary to write a wrapper class (like "starling.ByteArray") that forwards calls to the internally used (fast memory) byte array. Because of that wrapper, a lot of the gained performance would already go to waste (additional method calls!).
2) You can't subclass "ByteArray", so you can't make that change transparent. You'd always have to decide which type to use; and when calling a standard-API method, you'd have to pass offsets around.
3) I also see Starling as kind of a benchmark library for AIR and Stage3D in general. If it turns out that a part of the standard API is just too slow, I'd rather try to get Adobe to fix it (so that all other AIR software can benefit from the change, too) than try to reinvent the wheel. (That's also the reason why I'm using "Array" and "Vector" instead of custom linked list implementations, which would have advantages in some areas.)
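Points 1) and 2) are easier to see in code, so here is a rough sketch. Again it is TypeScript, not ActionScript, and it is not Starling code: "FastMemory", "ByteView" and "readFirstFloat" are hypothetical names, and the shared buffer merely plays the role of the fast-memory byte array. It assumes every region lives inside one big buffer and is addressed by an offset.

```typescript
// Hypothetical sketch: one shared buffer stands in for the "fast memory" byte array.
class FastMemory {
    readonly buffer: ArrayBuffer;
    readonly view: DataView;
    private next = 0;

    constructor(size: number) {
        this.buffer = new ArrayBuffer(size);
        this.view = new DataView(this.buffer);
    }

    // Reserves a region and returns its start offset within the shared buffer.
    alloc(bytes: number): number {
        if (this.next + bytes > this.buffer.byteLength) {
            throw new RangeError("out of fast memory");
        }
        const offset = this.next;
        this.next += bytes;
        return offset;
    }
}

// Point 1: the wrapper keeps call sites readable,
// but every access now goes through an additional method call.
class ByteView {
    private readonly memory: FastMemory;
    readonly offset: number;
    readonly length: number;

    constructor(memory: FastMemory, length: number) {
        this.memory = memory;
        this.length = length;
        this.offset = memory.alloc(length);
    }

    writeFloat(position: number, value: number): void {
        this.memory.view.setFloat32(this.offset + position, value, true);
    }

    readFloat(position: number): number {
        return this.memory.view.getFloat32(this.offset + position, true);
    }
}

// Point 2: a stand-in for a standard API that only understands raw buffers.
// It knows nothing about ByteView, so offset and length must be passed along.
function readFirstFloat(buffer: ArrayBuffer, offset: number, length: number): number {
    return new DataView(buffer, offset, length).getFloat32(0, true);
}

const memory = new FastMemory(64);
const a = new ByteView(memory, 16);
const b = new ByteView(memory, 16);
a.writeFloat(0, 1.5);
b.writeFloat(0, 2.5);

const fromStandardApi = readFirstFloat(memory.buffer, b.offset, b.length);
```

The wrapper cannot pose as the native type, so each call into a standard API has to unwrap it into buffer, offset and length by hand.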
That said: I'd love to see what you come up with! At the very least, it might be a good way to show Adobe that their ByteArray implementation still has room for improvement.