Runtime efficiency, underlying VM, MonkeyC

I'd like to understand how to get the most from the ConnectIQ / MonkeyC environment.
Put another way, I'd like to know what is expensive at runtime and what is not, so that I can write code that is as responsive as possible and uses as little power as possible.
With that in mind, I'd like to understand how the underlying "virtual machine" does business, and what's going on behind the scenes when I write MonkeyC code and make various API calls.
I've looked for write-ups about the ConnectIQ VM and MonkeyC that address these questions, but have not found any.

In my short time working with MonkeyC, I've found that, unlike the C environments I'm used to, "compiled" MonkeyC code seems to depend on a runtime interpreter.
That suggests some operations are probably very expensive at runtime.

So, is there a white paper or something similar that describes which operations are compiled and which are interpreted at runtime?
Which operations are fast and efficient, and which are costly?
How do I write highly efficient code for the platform?

Thanks