Techniques for faster pixel-level drawing?

Is there any non-obvious way to push multiple, arbitrary pixel values into a bitmap/display that's faster than making setColor and drawPoint calls on each pixel?

My watchface wants to fill a relatively large area of the screen with pixel values that it computes on the fly. Specifically, I have some source grayscale pixel data which I'm scaling, clipping, rotating, and dithering to generate a different image every few minutes.

I've done a lot of work to make my pixel-transforming code reasonably quick, such that much of the execution time of each frame is now just the calls to `Dc.setColor` and `Dc.drawPoint` that actually push pixels into an offscreen bitmap. I've verified that in the Simulator's Profiler view, and also by commenting out those calls and running on the device.

Right now I can generate and draw ~100 pixels on each update before my watchface uses too much time and gets killed. But it seems like there are enough cycles to generate many more pixels than that, if I could get them into a Bitmap or the display more efficiently.

For example:

  • Somehow access the raw memory used by an offscreen bitmap, say as an `Array<Number>`?
  • Put some pixel values in an array and copy (blit) them all at once to a Bitmap?
  • Use a smaller palette, fitting more pixels into each Number, and set multiple pixels at once using a single Number value?
  • Construct a bit mask and use it to fill multiple pixels with the same color in one call?
  • Somehow reduce the overhead of repeatedly mapping the same few Color values to 4-bit raw pixel values that I imagine is eating time?

Lower-level pixel operations like this are often available on performance-constrained platforms, but I can't find anything useful in the docs.

Some things that I've tried that aren't faster:

  • Offscreen Bitmap vs drawing to the display (same speed)
  • Use a palette with just WHITE, LT_GRAY, DK_GRAY, BLACK, and TRANSPARENT (same)
  • Write all pixels of each color in a single pass to avoid most of the setColor calls (the overhead of managing bit vectors, etc. is more than the savings)
  • Batch runs of similar values and draw using drawLine (doesn't really apply to my dithered pixels)

For the time being, I'm splitting the rendering across multiple update cycles, which is pretty tricky to manage, and isn't always responsive enough when the device decides not to call onUpdate when I have drawing to finish.
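
A minimal sketch of that kind of splitting, assuming the image is drawn row by row into an offscreen Dc. The class name `ChunkedRenderer`, the `renderChunk` method, and the `pixelColorAt` stub are hypothetical names invented for illustration; the stub stands in for the real scale/rotate/dither code.

    import Toybox.Graphics;
    import Toybox.Lang;

    // Hypothetical stand-in for the real transform + dither code:
    // returns a checkerboard so the sketch is self-contained.
    function pixelColorAt(x as Lang.Number, y as Lang.Number) as Graphics.ColorType {
        return ((x + y) % 2 == 0) ? Graphics.COLOR_WHITE : Graphics.COLOR_BLACK;
    }

    class ChunkedRenderer {
        private var _nextRow as Lang.Number = 0;   // first row not yet drawn
        private var _width as Lang.Number;
        private var _height as Lang.Number;

        function initialize(width as Lang.Number, height as Lang.Number) {
            _width = width;
            _height = height;
        }

        // Draws up to rowsPerUpdate rows into the given Dc and remembers
        // where it stopped. Returns true once every row has been drawn.
        function renderChunk(dc as Graphics.Dc, rowsPerUpdate as Lang.Number) as Lang.Boolean {
            var endRow = _nextRow + rowsPerUpdate;   // exclusive
            if (endRow > _height) {
                endRow = _height;
            }
            for (var y = _nextRow; y < endRow; y++) {
                for (var x = 0; x < _width; x++) {
                    dc.setColor(pixelColorAt(x, y), Graphics.COLOR_TRANSPARENT);
                    dc.drawPoint(x, y);
                }
            }
            _nextRow = endRow;
            return _nextRow >= _height;
        }
    }

Each onUpdate would call `renderChunk` once with a row budget small enough to stay under the limit, and only show the offscreen bitmap after it returns true.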

I realize I may be pushing the platform in a way it wasn't really intended to be pushed, but I really like the results I'm getting, and if such a technique existed, it would unlock lots of interesting possibilities (Doom, anyone?)

  • Am I missing something?

    I guess it depends... from your description I had assumed you were transforming a fixed greyscale image, in which case you could dither the fixed image and then transform (this might give slightly worse results, of course). But reading again, I see no reason to think that it's fixed, or even a Bitmap, in which case, yes, drawBitmap2 would be useless...

  • That's a bummer, but definitely interesting.

    Is your pixel data stored in an Array or a ByteArray?  I had hoped ByteArray would allow faster conversion to string, but I didn't test this.

    Also, is any of your code available for tinkering, or is it private / proprietary?  It would be fun to experiment myself, but I understand if you don't want to share it.

  • Someone asked some questions and made a suggestion here. My tools don't currently provide an option to do what he suggested, but it's quite easy to estimate what the savings would be.

    I started with:

                for (var x = 0; x < 40; x++) {
                    for (var y = 0; y < 40; y++) {
                        var u = x + 140;
                        var v = y + 140;
                        dc.drawPoint(u, v);
                    }
                }
    

    With some code around it to time it, and then draw the resulting time (in ms) to the screen.

    Running that on a fenix5xplus averaged about 245ms.
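
    The timing code isn't shown, so here is a rough sketch of what such a wrapper might look like. It assumes `System.getTimer()` (a millisecond timer) is precise enough for this, and the function name `timeDrawLoop` is made up for illustration. It takes a single measurement, whereas the numbers quoted here are averages over several runs.

        import Toybox.Graphics;
        import Toybox.System;

        // Hypothetical helper: times the 40x40 drawPoint loop and draws
        // the elapsed time (in ms) near the top of the screen.
        function timeDrawLoop(dc as Graphics.Dc) as Void {
            var start = System.getTimer();   // milliseconds

            for (var x = 0; x < 40; x++) {
                for (var y = 0; y < 40; y++) {
                    var u = x + 140;
                    var v = y + 140;
                    dc.drawPoint(u, v);
                }
            }

            var elapsed = System.getTimer() - start;
            dc.drawText(dc.getWidth() / 2, 40, Graphics.FONT_SMALL,
                        elapsed.toString() + " ms", Graphics.TEXT_JUSTIFY_CENTER);
        }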

    So then I tried:

                for (var x = 0; x < 40; x++) {
                    for (var y = 0; y < 40; y++) {
                        var u = x + 140;
                        var v = y + 140;
                        var z = dc.drawPoint;
                    }
                }

    I had to turn off the type checker to get that to compile (Garmin's compiler doesn't like assigning a function to a local variable, but there's no problem actually doing it at runtime).

    This version averaged about 125ms. So a good chunk of the time is spent outside of the actual call to drawPoint. But some of that is loop overhead. To figure out how much I did:

                for (var x = 0; x < 40; x++) {
                    for (var y = 0; y < 40; y++) {
                        var u = x + 140;
                        var v = y + 140;
                        var z = u + v;
                    }
                }

    And now it took about 75ms. (I'll just note that I verified via "-g" that the binary actually computes u, v, and z in all cases, even though the compiler warns that the variables are unused. I'll also note that my optimizer would have stripped them out in both the second and third cases).

    So it looks like the lookup of drawPoint is costing about 50ms out of the 245 the original loop took. So if I did add an optimization along the lines suggested, it would get about a 20% speed up.

    Of course, your real code has more to do inside the loop, so we're probably looking at more like 5-10% at best.
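
    Spelled out, the breakdown from the three measurements (1600 iterations each) works out roughly as:

        loop overhead (u, v, z = u + v)    about 75 ms
        lookup of dc.drawPoint             125 - 75  = about 50 ms
        the drawPoint call itself          245 - 125 = about 120 ms
        lookup share of the total          50 / 245  = about 20%

    The lookup figure is slightly understated, since the third loop still does an addition in place of the lookup.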

  • The way I've always understood the watchdog timer is that it's based not on time but on the number of bytecodes executed. That's why it might trip faster in the sim, as the PC/Mac is faster than the device.

    Ok so consider this:

        for (var x = 0; x < 100; x++) {
            for (var y = 0; y < 100; y++) {
                var u = x + 140;
                var v = y + 140;
                dc.drawPoint(u, v);
            }
        }

    The watchdog kills it, and 100x100 pixels isn't that big on a 454x454 display. Now what if there is no drawing?

        for (var x = 0; x < 150; x++) {
            for (var y = 0; y < 150; y++) {
                var u = x + 140;
                var v = y + 140;
                //dc.drawPoint(u, v);
            }
        }

    150x150 triggers the watchdog.

  • It would be more interesting to know the biggest size that doesn't trigger the watchdog in each case. Or are you implicitly asserting that 99 and 149 respectively don't trigger the watchdog? Also was your testing in the simulator, or on an actual device?

    But if it's entirely based on bytecode count, the earlier suggestion still saves 2 bytecodes per call, although that is going to be a much smaller percentage.

  • As far as the exact numbers, try it yourself...

    90x100 didn't trigger it with the drawPoint call.

    But now more to consider:

        for (var x = 140; x < 240; x++) {
            for (var y = 140; y < 240; y++) {
                dc.drawPoint(x, y);
            }
        }

    does 100x100 with no watchdog.  The first level optimizer is located between a chair and keyboard!

  • I just tried it myself, and with the call 97x97 doesn't trigger the watchdog, while 98x98 does. And without the call, 117 doesn't but 118 does. So the difference is much smaller than indicated in jim's original post.

  • The first level optimizer is located between a chair and keyboard!

    Sure... but if you were actually trying to optimize *that* loop nest you'd just use dc.fillRectangle...

  • The whole topic of this thread is drawing pixels!

    And I started with the code you posted..

  • If we have memory to spare (hah!) we can also partially unroll the loops so that the jumps and conditional checks only happen every 5 iterations or so.  Something like:

    // relies on assumption that we're iterating a multiple of 5 times;
    // x comes from an enclosing loop over columns (not shown)
    for (var y = 140; y < 240; y += 5) {
        dc.drawPoint(x, y);
        dc.drawPoint(x, y + 1);
        dc.drawPoint(x, y + 2);
        dc.drawPoint(x, y + 3);
        dc.drawPoint(x, y + 4);
    }

    More of a general optimization trick than anything Garmin- or graphics-specific.
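
    If the count isn't guaranteed to be a multiple of 5, a remainder loop can pick up the leftover pixels. A rough sketch, untested on a device; the function name `drawColumnUnrolled` and its parameters are hypothetical:

        import Toybox.Graphics;
        import Toybox.Lang;

        // Hypothetical helper: draws the points (x, y0) .. (x, y1 - 1),
        // five per loop iteration, then finishes any leftover points.
        function drawColumnUnrolled(dc as Graphics.Dc, x as Lang.Number,
                                    y0 as Lang.Number, y1 as Lang.Number) as Void {
            var y = y0;
            var limit = y1 - 4;   // a full group of 5 fits only while y < limit

            while (y < limit) {
                dc.drawPoint(x, y);
                dc.drawPoint(x, y + 1);
                dc.drawPoint(x, y + 2);
                dc.drawPoint(x, y + 3);
                dc.drawPoint(x, y + 4);
                y += 5;
            }

            // 0 to 4 leftover points when (y1 - y0) is not a multiple of 5
            while (y < y1) {
                dc.drawPoint(x, y);
                y++;
            }
        }

    Whether the extra code size pays for itself would need measuring on the device, as with the timings earlier in the thread.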