If performance were a big concern, I would suggest an ANE (AIR Native Extension) to get native-code performance. That only makes sense if the performance gain outweighs the extra overhead of handing the data off to the ANE.
It's only going to be an issue if you are processing a lot of data in each ByteArray. Starling uses ByteArrays internally, and they are a bottleneck there, but that's because you might be updating thousands of pieces of data in them every frame before uploading them to the GPU.
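One rough way to check whether your ByteArray work is heavy enough to matter is to time it with getTimer(). This is only a minimal sketch: the 1 MB zero-filled buffer and the byte-summing loop are stand-ins for your real data and processing, and the class name ByteArrayTiming is hypothetical.

```actionscript
package {
    import flash.display.Sprite;
    import flash.utils.ByteArray;
    import flash.utils.getTimer;

    // Hypothetical test harness, not part of any library.
    public class ByteArrayTiming extends Sprite {
        public function ByteArrayTiming() {
            // Assumption: 1 MB of zero-filled data stands in for the real payload.
            var bytes:ByteArray = new ByteArray();
            bytes.length = 1024 * 1024;
            bytes.position = 0;

            var start:int = getTimer();

            // Placeholder workload: read every byte once and sum the values.
            var sum:uint = 0;
            while (bytes.bytesAvailable > 0) {
                sum += bytes.readUnsignedByte();
            }

            var elapsed:int = getTimer() - start;
            trace("Processed", bytes.length, "bytes in", elapsed, "ms (sum=" + sum + ")");
        }
    }
}
```

If the time you measure for your real workload is small compared to your frame budget, the hand-off cost of an ANE is unlikely to pay for itself.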