Tips for programming good performance

Hi,

I designed a datafield which shows a lot of information. Some of that information needs to be calculated (like lap pace, 60 sec pace, 10 sec pace, etc.). Apart from this I have some string format calculations and some (maybe a lot of) drawing to do. During my last run I realized that:
a) the datafield's graphics update only every other second
b) the last lap time (I show the current lap time and the last lap time in the datafield) is sometimes 3 seconds behind the lap time the watch calculates on its own (and sometimes not, i.e. I lose some +1 events)
c) the lap pace jumps around by up to plus/minus 10-15 seconds during the lap (which does not happen if you replay the run afterwards in the simulator)

I checked my code and realized, e.g., that I converted the elapsed time from ms to seconds in every pace calculation; in fact I did that four times. OK, no problem, I refactored the code here. Furthermore, I handled the calculation of the lap time separately in two different places. I have to refactor the code to calculate the lap time just once and reuse it.

After these obvious optimizations, several more are left which I want to discuss here:
1. The algorithms themselves to calculate the paces
2. The arithmetic, to ease the microprocessor's life
3. The drawings

To 2.:
For the 60 seconds pace I have written this:
sixtySecondsSpeed = ((sixtySecondsSpeed * 59) + info.currentSpeed ) / 60;


Maybe it is more efficient for the microprocessor to code something like this:
sixtySecondsSpeed = (sixtySecondsSpeed * 64) - sixtySecondsSpeed;
sixtySecondsSpeed = (sixtySecondsSpeed + info.currentSpeed) /64;

IMO the second implementation could be much more efficient, since multiplication and division by 64 are just bit shifts for the microprocessor. The disadvantage is that the average is now a 64-second average, but this is negligible.
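If shifts are the goal, the two lines can also be folded into one. Note that Monkey C's shift operators only work on integers, so this sketch assumes the speed is kept as a scaled integer (e.g. mm/s, here called currentSpeedScaled); info.currentSpeed itself is a Float and cannot be shifted:

```monkeyc
// 64-sample smoothing with shifts only; speedScaled must be an Integer
// (s*64 - s + current) / 64  ==  (63*s + current) / 64
speedScaled = ((speedScaled << 6) - speedScaled + currentSpeedScaled) >> 6;
```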

Maybe the same is valid for comparisons of counter values. I have, e.g., this code:
if (LapTime < 20) { do something } else { do other stuff }

Maybe it is more efficient to code this:
if (LapTime == 0) { LapTimeSmallerThanTwenty = true; }
if (LapTime == 20) { LapTimeSmallerThanTwenty = false; }
if (LapTimeSmallerThanTwenty) { do something } else { do other stuff }


What do you think?



To 3.:
I have to do all of the drawing again and again in the onUpdate() method. IMO this is not efficient, since a lot of parts of the datafield are static (like lines and labels). Therefore I tried to use a layout, but the datafield runs out of memory very quickly if I define all of the datafield's lines and labels in the XML file. So I am looking for an option that is more efficient than drawing the same things every second. Any hints on that?
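One option worth mentioning: on devices/CIQ versions that support it (and if the memory budget allows), the static parts can be rendered once into an offscreen Graphics.BufferedBitmap and blitted every update. A sketch, untested; drawStaticParts and drawDynamicParts are hypothetical helpers:

```monkeyc
// Sketch: cache the static lines/labels in an offscreen bitmap
var cache = null;

function onUpdate(dc) {
    if (cache == null) {
        cache = new Graphics.BufferedBitmap({
            :width => dc.getWidth(),
            :height => dc.getHeight()
        });
        drawStaticParts(cache.getDc()); // lines, labels, ... drawn only once
    }
    dc.drawBitmap(0, 0, cache);
    drawDynamicParts(dc);               // paces, times, ... drawn every call
}
```

Whether this actually helps depends on the device; as noted further down in the thread, some firmwares clear the screen before handing you the dc, but redrawing from a cached bitmap is still cheaper than recomputing every primitive.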


To 1.:
Algorithm to get the mean values. I showed in (2.) above how I calculate the mean values of the 10-second and 60-second pace. Nevertheless, maybe there is a better and more efficient way to calculate the 10-second pace. Apart from the algorithm itself, I could use an 8-second or a 16-second pace and would ease the processor's load, as I showed in (2.). But I am considering updating the entire algorithm:
1. I define an array with 10 elements and use it as a FIFO
2. I write the following (pseudo) code (note that tenSecondsSpeedSum shall be the sum of all array values):
tenSecondsSpeedSum = tenSecondsSpeedSum - tenSecondsSpeedArray[0] + info.currentSpeed;
tenSecondsSpeedArray[0:8] = tenSecondsSpeedArray[1:9];
tenSecondsSpeedArray[9] = info.currentSpeed;
tenSecondsSpeed = tenSecondsSpeedSum / 10;
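Since Monkey C has no slice assignment, step 2 would have to be an explicit loop; the pseudo code above could look roughly like this (same variable names as above):

```monkeyc
// running 10-second average; tenSecondsSpeedArray acts as the FIFO
tenSecondsSpeedSum = tenSecondsSpeedSum - tenSecondsSpeedArray[0] + info.currentSpeed;
for (var i = 0; i < 9; i += 1) {
    tenSecondsSpeedArray[i] = tenSecondsSpeedArray[i + 1];
}
tenSecondsSpeedArray[9] = info.currentSpeed;
tenSecondsSpeed = tenSecondsSpeedSum / 10;
```

The copy loop can be avoided entirely with a circular index, which is discussed further down in the thread.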


By doing this I have the following advantages:
  • I get rid of the multiplication
  • I get a true 10-second average

but the following disadvantages:
  • an additional subtraction
  • an array with 10 elements, which needs memory


What do you think?
  • Former Member
    Which device are you using?
  • I don't think your calculation for sixtySecondSpeed is right. I don't see how you exclude any previous speed value from the calculation...

    sixtySecondsSpeed = ((sixtySecondsSpeed * 59) + info.currentSpeed ) / 60;


    Am I missing something?

    If you use layouts, you aren't avoiding any drawing. The system will call dc.drawText() to draw the labels you put into the layout, just as you would need to do so manually. The only thing layouts get you is making it easier to support various device form factors. You'd have to do something smarter to avoid redraws, and it is my understanding that tricks to do this won't work on some devices as they clear the screen before passing the dc to you for redrawing.

As for trying to use cheap power-of-two divides, I think you're going too far. If you really want to do it, you should do some performance testing to determine how much benefit there is (I don't believe anyone has profiled the difference between (x / 60) vs (x / 64) vs (x >> 6)). I have doubts that the Monkey C compiler does this sort of optimization itself, so chances are good you'd need to replace the divides with shifts anyway. Not to mention the fact that those optimizations depend on x being an integer, which means you'd have to scale the value to get reasonably accurate results.

    Finally, your algorithm for getting the tenSecondSpeedSum can easily be optimized by using a circular buffer of values. This will allow you to avoid copying values down as they are removed and instead just replace the oldest value. Note also that it isn't a good idea to do this kind of thing with floating point values as error can creep in. You may want to consider re-calculating the sum from scratch every frame to avoid precision issues.

    Travis
  • You'd have to do something smarter to avoid redraws, and it is my understanding that tricks to do this won't work on some devices as they clear the screen before passing the dc to you for redrawing.
    Travis


Yes, I've seen this with data fields on watches (I wanted to skip every other onUpdate() in a DF that just used dc calls), and the way I understand it, the whole screen is actually cleared with the DF background color, then the lines separating the data fields are drawn (by the FW), then each CIQ DF does its own thing in its own cleared space on the screen when called in onUpdate().

    Skipping every other one made the DF blink.
  • So this might not be a problem for a watch-app.
  • So this might not be a problem for a watch-app.


Except for a watch app on an Edge 520 (and maybe the other Edges). It seems that in the FW there, a "clear" is done before the onUpdate() in the view for watch apps.

With Hike2 (I think you know that one! :) ), on the screens that draw graphs and the chart page, I ignore every other onUpdate() (I don't need to redraw the same things that often!). On a 520, the same screens "blink"!

On the other devices, yes, partial redraws or ignoring onUpdate() calls shouldn't be a problem at all for watch apps if you just use dc calls (you know me, I never use layouts! :) )
  • I don't think your calculation for sixtySecondSpeed is right. I don't see how you exclude any previous speed value from the calculation...

    sixtySecondsSpeed = ((sixtySecondsSpeed * 59) + info.currentSpeed ) / 60;


    Am I missing something?


No, you are missing nothing. IMO this is an easy way to save valuable memory (by not defining an array of 60 values). In German it is called a PT1-Filter (in English: a first-order low-pass, i.e. an exponential moving average): Accu = Value * Weight + Accu * (1.0 - Weight);

    ... you should do some performance testing to determine how much benefit there is (I don't believe anyone has profiled the difference between (x / 60) vs (x / 64) vs (x << 6). I have doubts that the MonkeyC compiler does this sort of optimization itself, so chances are good you'd need to replace the divides with shifts anyway.


This is a good hint. Are there any performance analysis tools and experiences for Monkey C?

    Not to mention the fact that those optimizations depend on x being an integral, which means you'd have to scale the value to get reasonably accurate results.


To get accurate results, it's important to do the multiplication first and the other arithmetic before doing the division.
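    A small example of why the order matters with integer arithmetic (hypothetical variables: elapsedTime in ms, elapsedDistance in m; conveniently, ms per meter is numerically the same as seconds per kilometer):

    ```monkeyc
    // pace in seconds per kilometer
    var paceLate  = elapsedTime / elapsedDistance;                 // divide last: keeps the ms precision
    var paceEarly = (elapsedTime / 1000) * 1000 / elapsedDistance; // divide first: sub-second part is truncated away
    ```

    With, say, 299500 ms over 950 m, dividing last keeps the half second; dividing by 1000 first rounds the time down to whole seconds before the pace is computed.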

    Finally, your algorithm for getting the tenSecondSpeedSum can easily be optimized by using a circular buffer of values. This will allow you to avoid copying values down as they are removed and instead just replace the oldest value. Note also that it isn't a good idea to do this kind of thing with floating point values as error can creep in. You may want to consider re-calculating the sum from scratch every frame to avoid precision issues.
    Travis

    Do you have a code snippet for a good algorithm?
  • Is this a good algorithm?

    var speedHistory = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0];
    var tenSecondSpeedSum = 0.0;
    var tenSecondSpeed = 0.0;


    And in the compute(info) method:

    tenSecondSpeedSum = tenSecondSpeedSum + info.currentSpeed - speedHistory[0];
    speedHistory.remove(speedHistory[0]); // Array.remove() takes an object, not an index
    speedHistory.add(info.currentSpeed);
    tenSecondSpeed = tenSecondSpeedSum / 10;


    Or doing it with 8 values:
    var speedHistory = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0];
    var eightSecondSpeedSum = 0.0;
    var eightSecondSpeed = 0.0;


    And in the compute(info) method:

    eightSecondSpeedSum = eightSecondSpeedSum + info.currentSpeed - speedHistory[0];
    speedHistory.remove(speedHistory[0]); // Array.remove() takes an object, not an index
    speedHistory.add(info.currentSpeed);
    eightSecondSpeed = eightSecondSpeedSum / 8; // a shift (>> 3) only works on an Integer sum, not this Float
You just need to remember that all items you use are objects and the VM handles them as objects.
    If you call delete/add on an array it causes memory fragmentation; the simplest variant is to use an index into the array that points to the current value, and to average the array when you need it.
    const totalX = 50;

    var timerX;
    var countX;  // current value index
    var valuesX; // array with values

    function initialize() {
        View.initialize();
        valuesX = new [totalX];
        for (countX = 0; countX < totalX; countX += 1) {
            valuesX[countX] = null;
        }
        countX = 0;
        Math.srand(Sys.getTimer()); // initialize random generator
    }

    function onLayout(dc) {
        timerX = new Timer.Timer();
    }

    function onShow() {
        timerX.start(method(:onTimerX), 2000, true);
    }

    function onHide() {
        timerX.stop(); // save energy, I believe it works
    }

    function getValue() {
        return Math.rand() % 256; // do some stuff here for your values
    }

    function onTimerX() {
        var cnt;
        var averageX = 0;
        var averageXcnt = 0;
        var tmpSt; // resulting string

        valuesX[countX] = getValue();
        Sys.println(valuesX[countX]);
        for (cnt = 0; cnt < totalX; cnt += 1) {
            if (valuesX[cnt] != null) { // value should be valid
                averageX += valuesX[cnt];
                averageXcnt += 1;
            }
        }
        Sys.println(averageX);
        Sys.println(averageXcnt);

        countX += 1; // increment index, cycle if high index reached
        if (countX == totalX) {
            countX = 0;
        }

        if (averageXcnt == 0) { // no items available to average yet
            tmpSt = "*";
        } else {
            averageX /= averageXcnt;
            tmpSt = averageX.format("%5.1f");
        }
        Sys.println(tmpSt);
    }
  • tenSecondSpeedSum = tenSecondSpeedSum + info.currentSpeed - speedHistory[0];
    speedHistory.add(info.currentSpeed);
    speedHistory.remove(0);
    tenSecondSpeed = tenSecondSpeedSum / 10;



    I'm not sure how the revised Lang.Array is implemented. It seems that it should have a capacity and only reallocate when necessary (like a C++ std::vector), but it is possible that it always reallocates when being resized. If it reallocates for every add/remove, then that code is potentially very expensive (allocate new, copy old-to-new, deallocate old, allocate new, copy old-to-new, deallocate old). If Lang.Array resizes as necessary, then in most cases the add/remove will only involve incrementing the size, copying each element down one position, then decrementing the size. This is still pretty expensive.

    If you call delete/add on array it causes memory fragmentation
    I've seen nothing to indicate that using add/remove on an array causes fragmentation (or reallocation at all). Do you have any evidence to indicate otherwise?

    All that aside, the code a1234 has provided is similar to what I'd write. Once you've determined how many samples you need, no memory should be allocated. If you are worried about error creep when using floating point values, you can handle that as well. This is an approximation of what I've used in the past. I just wrote it for this post and haven't tested it, so there could be bugs.

    class averager
    {
        var samples;
        var sum;
        var index;
        var count;

        function initialize(size) {
            resize(size);
        }

        function put(new_sample) {

            var old_sample = samples[index];
            if (old_sample != null) {
                sum -= old_sample;
                count -= 1;
            }

            samples[index] = new_sample;

            if (new_sample != null) {
                sum += new_sample;
                count += 1;
            }

            // the mod could be expensive, could use the if below
            index = (index + 1) % samples.size();

            //index += 1;
            //if (samples.size() == index) {
            //    index = 0;
            //}
        }

        function average() {
            return 1.0 * sum / count;
        }

        // change the number of samples, or reset everything back to
        // the initial state
        function resize(size) {
            samples = new [size];
            for (var i = 0; i < size; i += 1) {
                samples[i] = null;
            }

            sum = 0;
            count = 0;
            index = 0;
        }

        // recalculate the components for the average. this is useful if we want to
        // eliminate any error that has crept into `sum' by many calls to `put'
        function recalculate() {
            sum = 0;
            count = 0;

            for (var i = 0; i < samples.size(); i += 1) {
                if (samples[i] != null) {
                    sum += samples[i];
                    count += 1;
                }
            }
        }
    }
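    A hypothetical usage sketch for the class above (the names avg and sampleCounter are assumptions; recalculate() only needs to run occasionally, to flush accumulated floating point drift):

    ```monkeyc
    var avg = new averager(10);          // 10-sample window

    // in compute(info), once per second:
    avg.put(info.currentSpeed);
    var tenSecondSpeed = avg.average();  // count > 0 after the first non-null put

    sampleCounter += 1;
    if (sampleCounter % 300 == 0) {      // e.g. every 5 minutes
        avg.recalculate();
    }
    ```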