Techniques for faster pixel-level drawing?

Is there any non-obvious way to push multiple, arbitrary pixel values into a bitmap/display that's faster than making `setColor` and `drawPoint` calls for each pixel?

My watchface wants to fill a relatively large area of the screen with pixel values that it computes on the fly. Specifically, I have some source grayscale pixel data which I'm scaling, clipping, rotating, and dithering to generate a different image every few minutes.

I've done a lot of work to make my pixel-transforming code reasonably quick, to the point where much of each frame's execution time is now just the calls to `Dc.setColor` and `Dc.drawPoint` that actually push pixels into an offscreen bitmap. I've verified this in the Simulator's Profiler view, and also by commenting out those calls and running on the device.

Right now I can generate and draw ~100 pixels on each update before my watchface uses too much time and gets killed. But it seems like there are enough cycles to generate many more pixels than that, if I could get them into a Bitmap or the display more efficiently.

For example:

  • Somehow access the raw memory used by an offscreen bitmap, say as an `Array<Number>`?
  • Put some pixel values in an array and copy (blit) them all at once to a Bitmap?
  • Use a smaller palette, fitting more pixels into each Number, and set multiple pixels at once using a single Number value?
  • Construct a bit mask and use it to fill multiple pixels with the same color in one call?
  • Somehow reduce the per-call overhead of repeatedly mapping the same few Color values to 4-bit raw pixel values, which I imagine is eating time?

Lower-level pixel operations like this are often available on performance-constrained platforms, but I can't find anything useful in the docs.
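
For reference, since a couple of the ideas above involve the offscreen buffer, here's roughly how I'm setting it up (a sketch assuming a CIQ 4 device like the Fenix 7; the helper name is just for illustration):

```
import Toybox.Graphics;
import Toybox.Lang;

// Sketch: a 4-color offscreen buffer allocated from the CIQ 4
// graphics pool; with this palette each pixel needs only 2 bits.
// Pre-4.0 devices would use new Graphics.BufferedBitmap() instead.
function createGrayBuffer(width as Number, height as Number) as BufferedBitmap {
    var ref = Graphics.createBufferedBitmap({
        :width => width,
        :height => height,
        :palette => [
            Graphics.COLOR_BLACK,
            Graphics.COLOR_DK_GRAY,
            Graphics.COLOR_LT_GRAY,
            Graphics.COLOR_WHITE
        ]
    });
    return ref.get() as BufferedBitmap;
}
```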

Some things that I've tried that aren't faster:

  • Drawing to an offscreen Bitmap vs. directly to the display (same speed)
  • Using a palette with just WHITE, LT_GRAY, DK_GRAY, BLACK, and TRANSPARENT (same)
  • Writing all pixels of each color in a single pass to avoid most of the setColor calls (the overhead of managing bit vectors, etc. outweighs the savings)
  • Batching runs of similar values and drawing them with drawLine (doesn't really apply to my dithered pixels, which rarely form runs)

For the time being, I'm splitting the rendering across multiple update cycles. That's pretty tricky to manage, and it isn't always responsive enough, since the device sometimes decides not to call onUpdate while I still have drawing to finish.
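
The bookkeeping looks roughly like this (a sketch; the budget number and `drawRow` are illustrative, not from my real pipeline):

```
import Toybox.Graphics;
import Toybox.Lang;
import Toybox.System;
import Toybox.WatchUi;

// Sketch: render rows until we get close to the watchdog budget,
// then remember where to resume. The 350ms budget is illustrative.
const BUDGET_MS = 350;

var nextRow as Number = 0;

function renderSome(dc as Dc, width as Number, height as Number) as Void {
    var start = System.getTimer();    // milliseconds
    while (nextRow < height) {
        drawRow(dc, nextRow, width);  // hypothetical row renderer
        nextRow++;
        if (System.getTimer() - start > BUDGET_MS) {
            // Out of budget; ask for another onUpdate to finish.
            // (Not guaranteed to arrive promptly, which is the problem.)
            WatchUi.requestUpdate();
            return;
        }
    }
}
```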

I realize I may be pushing the platform in a way it wasn't really intended to be pushed, but I really like the results I'm getting, and if such a technique existed, it would unlock lots of interesting possibilities (Doom, anyone?).

  • Dang, I'm learning the APIs, so this was all new to me.  But it's a bummer they don't offer access as a mutable ByteArray or similar.

    My only thought was: if you had a custom font resource where each character was a single pixel, could you use text rendering to draw a whole row of pixels faster? It seems like it should be slower, since font rendering is way more work than setting a single pixel. But if the VM imposes a lot of per-call overhead on each setColor/drawPoint call, then maybe rendering an entire row of pixels from a string ends up faster?

  • Maybe each character is several pixels wide? A 16-character font can encode every possible row of 4 pixels. StringUtil.convertEncodedString can convert a ByteArray to a hex string. If you added newline characters, you could render the whole display with a single call to Dc.drawText.
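
    Untested, but the row conversion might look something like this (a sketch; the font parameter is a hypothetical custom font whose glyphs '0'..'f' are the sixteen 4-pixel patterns):

    ```
    import Toybox.Graphics;
    import Toybox.Lang;
    import Toybox.StringUtil;

    // Sketch: pack each group of 4 pixels into a nibble, convert the
    // row's ByteArray to a hex string, and draw the whole row with one
    // drawText call using a font whose glyphs '0'..'f' are 4-pixel patterns.
    function drawPackedRow(dc as Dc, y as Number, packed as ByteArray, font as FontType) as Void {
        var hex = StringUtil.convertEncodedString(packed, {
            :fromRepresentation => StringUtil.REPRESENTATION_BYTE_ARRAY,
            :toRepresentation => StringUtil.REPRESENTATION_STRING_HEX
        }) as String;
        dc.drawText(0, y, font, hex, Graphics.TEXT_JUSTIFY_LEFT);
    }
    ```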

  • When you say your watch face gets killed, I'm thinking it's killed by the watchdog timer, which limits how long you can run without returning to the VM. And that limit varies by device.

    Another thing to consider: each time onUpdate() is called, you need to update the entire display; you can't just do parts. You might not see an issue in the sim or on some devices. You can update part of the screen in onPartialUpdate(), but there you're also limited to the 30ms average time (see "View watch face diagnostic" while in low power mode).

    There is no access to the raw memory with CIQ.

  • Some other thoughts, maybe you tried these:

    Along the lines of "batch runs of similar values": you can avoid calling setColor when the previous pixel's color is the same, while still using drawPoint rather than drawLine. Not sure if your comment about dithering means that you never have runs of pixels.

    I wonder if it's slightly faster to store setColor and drawPoint as Method objects in local variables, then call them via dcSetColorMethod.invoke(). You pay extra for the .invoke resolution and overhead, but you might shave cycles by not resolving setColor and drawPoint off the relatively large Dc interface each time. My understanding is that method lookups can be slightly slower for objects with more methods, since the VM has to do a dynamic lookup of the symbol on the object.
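
    Untested, but something like this sketch (the checkerboard is just a stand-in pixel source):

    ```
    import Toybox.Graphics;
    import Toybox.Lang;

    // Sketch: bind the Dc methods once, skip redundant setColor calls,
    // and invoke through the cached Method objects in the hot loop.
    function drawRowViaMethods(dc as Dc, y as Number, width as Number) as Void {
        var setColor = dc.method(:setColor);
        var drawPoint = dc.method(:drawPoint);
        var last = -1;
        for (var x = 0; x < width; x++) {
            // Stand-in pixel source; imagine the dithered value here.
            var c = ((x + y) % 2 == 0) ? Graphics.COLOR_WHITE : Graphics.COLOR_BLACK;
            if (c != last) {
                setColor.invoke(c, Graphics.COLOR_TRANSPARENT);
                last = c;
            }
            drawPoint.invoke(x, y);
        }
    }
    ```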

  • I like the font idea a lot, thanks! I'll do some experiments and see how far I get with it.

  • I am skipping setColor calls when consecutive pixels are the same color. That's actually a big win for un-dithered images, but you're correct that after dithering there aren't many such runs.

    I hadn't thought to try Method as a way of saving cycles. That's easy to try, thanks!

  • That's right, on Fenix 7, which is my initial target, the face seems to get killed if it takes longer than about 500ms to update, so I have code to keep track of time and suspend drawing before I get close to the limit. On the simulator the limit is a lot shorter, but the principle is the same.

    I haven't played much with onPartialUpdate, which isn't as relevant to my watchface, but it seems the limits there would be an even bigger problem.

  • Specifically, I have some source grayscale pixel data which I'm scaling, clipping, rotating, and dithering to generate a different image every few minutes

    Isn't Dc.drawBitmap2 exactly what you're looking for? It can do the transformation and tinting in one go. Admittedly that's a 4.2.1 feature, but you said later that you're targeting the Fenix 7, so that should work...
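
    Something along these lines (untested sketch; coordinates, angle, and scale are arbitrary):

    ```
    import Toybox.Graphics;
    import Toybox.Lang;

    // Sketch: scale and rotate a source bitmap in a single
    // drawBitmap2 call (CIQ 4.2.1+). The angle is in radians.
    function drawTransformed(dc as Dc, src as BitmapType) as Void {
        var xf = new Graphics.AffineTransform();
        xf.rotate(0.5);
        xf.scale(2.0, 2.0);
        dc.drawBitmap2(50, 50, src, {
            :transform => xf,
            :filterMode => Graphics.FILTER_MODE_BILINEAR
        });
    }
    ```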

  • I haven't experimented with drawBitmap2, but just from the docs I don't see how I could achieve the result I want. The point is to dither the image after it's been transformed to screen pixels. If the API provided FILTER_MODE_DITHER, I guess I would just use that (no doubt with much better performance than I'm ever going to achieve). Note: I'm assuming _POINT means nearest neighbor and _BILINEAR is bilinear interpolation.

    Am I missing something?
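
    For concreteness, the post-transform dithering I mean is roughly this kind of thing (a sketch; a 4x4 Bayer threshold applied per screen pixel, after the gray value has already been sampled through the scale/rotate transform):

    ```
    import Toybox.Graphics;
    import Toybox.Lang;

    // Sketch: 4x4 ordered (Bayer) dithering. The gray value has
    // already been sampled through the scale/rotate transform.
    const BAYER4 = [
        [ 0,  8,  2, 10],
        [12,  4, 14,  6],
        [ 3, 11,  1,  9],
        [15,  7, 13,  5]
    ];

    // Map a 0..255 gray level to black/white at screen position (x, y).
    function ditherPixel(gray as Number, x as Number, y as Number) as Number {
        var threshold = (BAYER4[y % 4][x % 4] * 255) / 16;
        return (gray > threshold) ? Graphics.COLOR_WHITE : Graphics.COLOR_BLACK;
    }
    ```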

  • A 16 character font can encode every possible row of 4 pixels.

    I have some potentially interesting results with this approach. I generated a font with all the combinations of 6 pixels (so, 64 glyphs). It wasn't too tricky to use that to draw each group of 6 pixels at once with one call to setColor/drawText.

    However, it's slower: I've traded (up to) 6 setColor/drawPoint calls for (up to) 4 more expensive drawText calls, since each glyph renders in a single color and a dithered group of 6 pixels can contain several. This is using a lookup table to avoid converting runs of pixels into String values on the fly, because those conversions seem to be very slow (e.g. StringUtil.utf8ArrayToString).

    I think I could increase the number of bits per char, within reason. Probably 8 or 9 bits would work, if I can sort out the Unicode encoding issues. I assume at some point the font gets too big for the device to handle.

    With additional restructuring of my code, I could assemble larger runs of bits into multiple chars. If there's a lot of per-call overhead in drawText, this might pay off, but I'm not super optimistic.

    I'll keep experimenting, but I'm hopeful that this will lead to something good, thanks!
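
    In case it helps anyone following along, the lookup-table part looks roughly like this (a sketch; the glyph ordering and font parameter are illustrative stand-ins for my actual resources):

    ```
    import Toybox.Graphics;
    import Toybox.Lang;

    // Sketch: precompute one single-glyph string per 6-bit pattern so
    // the hot loop never builds strings. The character ordering must
    // match how the font resource maps glyphs to pixel patterns.
    const GLYPHS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/";

    var glyphStrings = new [64];

    function initGlyphs() as Void {
        var chars = GLYPHS.toCharArray();
        for (var i = 0; i < 64; i++) {
            glyphStrings[i] = chars[i].toString();
        }
    }

    // font: the 64-glyph resource, loaded once with WatchUi.loadResource.
    function drawSixPixels(dc as Dc, x as Number, y as Number, bits as Number, font as FontType) as Void {
        dc.drawText(x, y, font, glyphStrings[bits & 0x3F] as String, Graphics.TEXT_JUSTIFY_LEFT);
    }
    ```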