There are all kinds of optimizations I thought about doing.
One is the ability to have more than one viewPort, like Unity does.
That way DisplayObjects with no transform would render much faster, and you could avoid re-uploading and re-batching them.
Also, have a viewPort that supports rotation.
Next, have something like Unity's SortingLayer.
There are also optimizations to be done in the particle system: instead of having one DisplayObject
for each PDParticleSystem, have an array of PDParticleSystems in one DisplayObject, so you
don't need to batch them at all. While we're at it, we could allow more than one MeshBatch, so there would be no
particle count limit per DisplayObject.
Next is to move the PDParticleSystem to a more SoA (structure of arrays) layout.
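
To make the SoA idea concrete, here is a rough sketch in TypeScript (not AS3, and not the actual PDParticleSystem code). The class name, field names and the swap-remove strategy are all my own assumptions for illustration. Each attribute lives in its own flat array, so the update loop only touches the data it needs.

```typescript
// Hypothetical SoA particle storage. Field names are illustrative and
// do not correspond to real PDParticleSystem members.
class ParticlesSoA {
  readonly x: Float32Array;
  readonly y: Float32Array;
  readonly vx: Float32Array;
  readonly vy: Float32Array;
  readonly life: Float32Array;
  count = 0; // number of live particles, stored in slots [0, count)

  constructor(readonly capacity: number) {
    this.x = new Float32Array(capacity);
    this.y = new Float32Array(capacity);
    this.vx = new Float32Array(capacity);
    this.vy = new Float32Array(capacity);
    this.life = new Float32Array(capacity);
  }

  // Adds a particle; returns false when the storage is full.
  spawn(x: number, y: number, vx: number, vy: number, life: number): boolean {
    if (this.count >= this.capacity) return false;
    const i = this.count++;
    this.x[i] = x;
    this.y[i] = y;
    this.vx[i] = vx;
    this.vy[i] = vy;
    this.life[i] = life;
    return true;
  }

  // Ages and moves every live particle; dead ones are swap-removed so the
  // live range stays dense.
  advance(dt: number): void {
    let i = 0;
    while (i < this.count) {
      this.life[i] -= dt;
      if (this.life[i] <= 0) {
        // Move the last live particle into slot i and re-process that slot.
        const last = --this.count;
        this.x[i] = this.x[last];
        this.y[i] = this.y[last];
        this.vx[i] = this.vx[last];
        this.vy[i] = this.vy[last];
        this.life[i] = this.life[last];
        continue;
      }
      this.x[i] += this.vx[i] * dt;
      this.y[i] += this.vy[i] * dt;
      i++;
    }
  }
}
```

Because the live particles stay packed in the first `count` slots, the position arrays could be copied into a vertex buffer in one pass without skipping over dead entries.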
Next is better reuse of data.
Let's say all your particles have the same UV and color: we ask Starling for a MeshBatch with
the color and size we want, and give it back to Starling when we are done with it. Same thing for IndexData.
Or, if we know the particles do not change position, reuse the VertexData from the previous frame.
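
The "ask for it, then give it back" part could look roughly like the pool below. This is only a TypeScript sketch of the idea: `BatchPool`, `PooledBatch`, `acquire` and `release` are made-up names, and Starling does not expose an API like this as far as I know. The point is that the color data is filled once when the buffer is first allocated, and a reused buffer only needs new positions.

```typescript
// Hypothetical pooled buffer: positions change per frame, colors do not.
interface PooledBatch {
  readonly key: string;
  readonly positions: Float32Array; // x, y per vertex, rewritten by the user
  readonly colors: Uint32Array;     // one color per vertex, filled once
}

// Hypothetical pool keyed by (color, vertex count).
class BatchPool {
  private free = new Map<string, PooledBatch[]>();
  allocations = 0; // how many buffers were actually created

  acquire(color: number, vertexCount: number): PooledBatch {
    const key = `${color}:${vertexCount}`;
    const bucket = this.free.get(key);
    if (bucket !== undefined && bucket.length > 0) {
      // Reuse: colors are already set, only positions need writing.
      return bucket.pop()!;
    }
    this.allocations++;
    return {
      key,
      positions: new Float32Array(vertexCount * 2),
      colors: new Uint32Array(vertexCount).fill(color),
    };
  }

  release(batch: PooledBatch): void {
    const bucket = this.free.get(batch.key);
    if (bucket !== undefined) bucket.push(batch);
    else this.free.set(batch.key, [batch]);
  }
}
```

The same pattern would apply to index data, where the contents for N quads are identical every time and never need rewriting after the first fill.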
Another inefficiency is that for each Image we render, we call a function in the MeshStyle.
Instead, have a DisplayObjectContainer whose children must all have the same style,
then pass the array of children to the MeshStyle.
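
A tiny TypeScript sketch of the difference, with a made-up `SharedStyle` class standing in for the style (this is not the MeshStyle API, and `writeOne` / `writeAll` are hypothetical names). Both paths produce the same vertex data; the batched one just crosses the style boundary once per container instead of once per child.

```typescript
interface Quad { x: number; y: number; w: number; h: number; }

// Hypothetical style object that counts how often it is called.
class SharedStyle {
  calls = 0;

  // Current approach: one call per child.
  writeOne(q: Quad, out: number[]): void {
    this.calls++;
    SharedStyle.pushCorners(q, out);
  }

  // Proposed approach: one call for all children of the container.
  writeAll(children: readonly Quad[], out: number[]): void {
    this.calls++;
    for (const q of children) SharedStyle.pushCorners(q, out);
  }

  // Appends the four corner positions of a quad (x, y pairs).
  private static pushCorners(q: Quad, out: number[]): void {
    out.push(
      q.x, q.y,
      q.x + q.w, q.y,
      q.x, q.y + q.h,
      q.x + q.w, q.y + q.h,
    );
  }
}
```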
The next level after that is finding a way to have a DisplayObjectContainer whose
children each have a static vertex size and must be visible. That way you can calculate
the MeshBatch size up front and avoid ByteArray resizing. That would also be nice for my Starling version that uses DomainMemory opcodes.
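
As a sketch of the size calculation (TypeScript again, with hypothetical names `StaticChild`, `totalFloats` and `packChildren`, and assuming each vertex is just x and y): if every child has a fixed vertex count and is always visible, the total is a plain sum that only changes when children are added or removed, so the output buffer can be allocated once and each child written at a known offset.

```typescript
// Hypothetical child with a fixed-size position array; always visible.
interface StaticChild {
  readonly positions: Float32Array; // x, y per vertex, length never changes
}

// Computed once, when the child list changes.
function totalFloats(children: readonly StaticChild[]): number {
  let total = 0;
  for (const c of children) total += c.positions.length;
  return total;
}

// Run every frame: copies each child into the preallocated buffer.
// The buffer never has to grow because its size was known up front.
function packChildren(children: readonly StaticChild[], out: Float32Array): void {
  let offset = 0;
  for (const c of children) {
    out.set(c.positions, offset);
    offset += c.positions.length;
  }
}
```

A fixed-size buffer like this is also the shape a raw memory approach tends to want, since the offsets are known before any data is written.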