Big update to prettier-extension-monkeyc

I've posted about prettier-extension-monkeyc before, but I've added a bunch of new features that developers will probably like (well, I've been missing them, so maybe you have too).

The new features it implements for VSCode include:

  • Goto Definition. Point at a symbol and Ctrl/Cmd-click (or press F12), and it will take you to the definition.
  • Goto References. Right-click a symbol and select "Goto References" (or press Shift-F12). It will show you all the references.
  • Peek Definition/Peek References. Same as above, but in a popup window so you don't lose your place in the original document.
  • Rename Symbol. Right click on a local, function, class or module name, and select "Rename Symbol". It will rename all the references. It doesn't yet work for class members/methods.
  • Goto Symbol. Type Ctrl/Cmd-Shift-O and pick a symbol from the drop down (which has a hierarchical view of all symbols in the current file). This also appears as an outline across the top of the file.
  • Open Symbol By Name. Type Ctrl/Cmd-T, then start typing letters from a symbol name. A drop down will be populated with all matching symbols from anywhere in your project.

Older features include a prettier based formatter for monkeyc, and a monkeyc optimizer that will build/run/export an optimized version of your project.

[edit: My last couple of replies seem to have just disappeared, and the whole conversation seems to be in a jumbled order, so tldr: there's a new test-release at https://github.com/markw65/prettier-extension-monkeyc/releases/tag/v2.0.9 which seems to work for me on linux. I'll do more verification tomorrow, and push a proper update to the vscode store once I'm sure everything is working]

  • 0. Cool, 2.0.40 now generates code whose code+data size is smaller than it was with 2.0.38 :)

    1. When I have warnings it displays only the relative path:
    WARNING: fenix3: source/Config.mc:1360,24: This comparison seems redundant because query should never be null

    So when I click on the file to be opened it doesn't know which one it is; I have to choose from a list. I think previously I could just click (probably the path was absolute). My guess is that you don't like long logs (nor do I :) and made it shorter, but now Ctrl+click doesn't work.

    2. But then: Config.mc has only 140 lines, so how did it become line 1360??? Something else must be the problem.

    3. Then there's the number normalization thing: I already noticed that I might use 7 or 0x07 and you'll probably use 7, and when I use 0xFF it becomes 0xff. Now I understand (and agree) that in some cases there's no way to keep it as it was (for example, in the same function I used 7 and 0x7 and you change it to pre_7), but it would be nice to keep the format when there's no reason not to. Why? Because when I search for 0xFF I hope I'll find it, and I won't remember to also search for 0xff or 255. So why do I mention it now? Because of this:
    The type Number<4294967295> cannot be converted to Float because they have nothing in common
    Which turns out to be 0xFFFFFFFF, if anyone doesn't recognize it :)

    4. Which brings me to this: what should the code be? The warning seems to be right. I have this in the code:
    const INVALID_FRACTIONAL_ZONE = 0xFFFFFFFF as Float;
    So what's the right way to represent the float that has all the bits as 1?

    5. Looks like the redundant comparison warnings (or I guess any warning) are a bit hard to find because of the inlining:
    I have this warning:
    source/ExtHRMField.mc:1059,13: This comparison seems redundant because info should never be null
    and this line indeed is underlined with yellow in VSC:
    extHr = strfHrZone(mConfFormat1, extHr, "", "", computeIntHR(info));

    but there's no null check in that line... unless we look at the inlined function:

    (:inline_memory16K)
    hidden function strfHrZone(format as String, extHr as AlphaNumeric, extZone as AlphaNumeric, fractZone as String, intHr as AlphaNumeric) as String {
      var name = mName;
      if (name == null) {
        name = "";
      }
      return Lang.format(format, [extHr, intHr, name, extZone, fractZone]);
    }
    so there are 3 problems here:
    5.a the warning's statement is false. mName is String?, and it does get null assigned to it in some circumstances; even the default is null (not assigned: var mName as String?;)
    5.b it's confusing / not very useful to see the warning at the place it's inlined into. (I can just imagine what happens when a() calls b(), b() calls c(), both b and c are inlined, and there's a warning in c.) I know this is a big wish, but I guess I don't need to elaborate on how useful it would be to see the warning on the line it really belongs to.
    5.c this may be the explanation for #2 above: I guess the code that prints the warning noticed that the source is in Config.mc, but somehow got the line number of the place it's inlined into, in another file
    6. Oh, great, thanks to the warnings I could fix some bugs that were hidden from me until now :)
  • I tested this, and I was right:

        public function initialize() {
            var heart_rate = "ANT_HR";
            f(heart_rate);
    
            var name = "avg_" + heart_rate;
            f(name);
            f(name);
    
            name = "max_" + heart_rate;
            f(name);
            f(name);
    
            name = "min_" + heart_rate;
            f(name);
            f(name);
        }
    

    even 2.0.40 changes this to:

        public function initialize() {
            var heart_rate = "ANT_HR";
            f(heart_rate);
    
            f("avg_ANT_HR");
            f("avg_ANT_HR");
    
            f("max_ANT_HR");
            f("max_ANT_HR");
    
            f("min_ANT_HR");
            f("min_ANT_HR");
        }

    and even though this decreased code by 16 bytes it increased data by 22 bytes, so this was a bad decision of the optimizer.

  • My guess is that you don't like long logs (as I don't :) and made it shorter, but now the Ctrl+click doesn't work.

    They've always been relative paths in the output - but that has always worked for me. When I Cmd-click on the error in the output pane, it takes me to the code. Also, the "problems" pane gets the full path anyway, so clicking on the error there should always work.

    Do you maybe have a strange project layout, and the jungle file isn't at the root?

    But then: Config.mc has only 140 lines

    I think your guess lower down is probably correct; but it means that somewhere I lost track of the location info. When a function gets inlined, I try to consistently adjust the location info to match the caller (and yes, I accept that that is less than ideal; but preserving the info can also cause problems).
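    A toy sketch of that policy (my own illustration in Python, not the extension's actual code): if every statement of an inlined body gets stamped with the call site's location, a warning from a 140-line callee can end up reporting a line number deep inside the caller's file.

```python
# Hypothetical sketch: remap an inlined callee's statements to the call site.
# Illustrates "adjust the location info to match the caller" - NOT real code
# from the extension.
def inline_with_caller_location(call_site_line, callee_body):
    # callee_body: list of (original_line, statement_text) pairs
    return [(call_site_line, stmt) for (_line, stmt) in callee_body]

# A callee from a short file, inlined at line 1360 of another file, now
# reports line 1360 for every statement - matching the Config.mc:1360 warning.
body = [(46, "if (name == null)"), (47, 'name = ""')]
print(inline_with_caller_location(1360, body))
```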

    So what's the right way to represent the float that has all the bits as 1?

    What exactly are you going for? That's not a valid float; so perhaps you just want NaN? Whatever you're doing, "0xffffffff as Float" is just the Number -1 at runtime. The "as Float" only tells the type checker what to believe about it; it does nothing to its value.
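    For what it's worth, the distinction between converting a value and reinterpreting its bits can be shown outside Monkey C (Python here, purely to illustrate the IEEE-754 point; the Monkey C "as" cast does neither - it only informs the type checker):

```python
import math
import struct

# Converting the VALUE 0xFFFFFFFF to floating point gives 4294967295.0 ...
print(float(0xFFFFFFFF))

# ... while reinterpreting the BIT PATTERN 0xFFFFFFFF as an IEEE-754 single
# (sign=1, exponent all ones, non-zero mantissa) yields NaN.
reinterpreted = struct.unpack("<f", struct.pack("<I", 0xFFFFFFFF))[0]
print(math.isnan(reinterpreted))
```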

    5.a the warning's statement is false

    The warning refers to "info" though, so I don't think it's talking about anything in strfHrZone, but about the expansion of computeIntHR(info).

    5.b it's confusing / not very useful to see the warning

    Agreed. I'm planning to pull out the type checking pass as part of the "live" analysis that the extension does (you may have noticed that, unlike all the other warnings the extension produces, these currently only happen when you actually build). That will run on the unoptimized code, so it won't be a problem there. But it will miss some of the problems, because some issues don't show up in an inline function until you actually inline it; i.e. after inlining, some inlined instances of a function may warn while others do not, so the analysis-only pass won't see all issues. So I'll try to come up with a better way of recording inlined locations, so that it can show a "stack":

    WARNING: foo.mc:100 "Something went wrong" in function bar inlined from bar.mc:150 in function baz inlined from baz.mc:22

    I'm not sure how well that will integrate with VSCode's diagnostics, but at least it should be able to show you where the problem came from.
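    VSCode integration aside, assembling such a message is straightforward; a rough sketch (names and data layout are my own, not the planned implementation):

```python
# Hypothetical formatter for a warning with an inline "stack".
# frames: innermost location first, as (file, line, enclosing_function) tuples.
def format_warning(message, frames):
    file, line, func = frames[0]
    out = 'WARNING: %s:%d "%s"' % (file, line, message)
    for parent_file, parent_line, parent_func in frames[1:]:
        out += " in function %s inlined from %s:%d" % (func, parent_file, parent_line)
        func = parent_func
    return out

print(format_warning("Something went wrong",
                     [("foo.mc", 100, "bar"),
                      ("bar.mc", 150, "baz"),
                      ("baz.mc", 22, "main")]))
```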

    Oh, great, thanks to the warnings

    Yes, I found the same. Mostly harmless, but I had no idea they were there. And I've got a lot more checking to add...

  • and even though this decreased code by 16 bytes it increased data by 22 bytes, so this was a bad decision of the optimizer

    I guess there's always a danger of something like that coming up, but I think it's generally a good optimization. The best solution here is to give you a way to prevent it, and the easy way to do that is to provide a wrapper identity function, so you can say:

    (:identity) function id(x) { return x; }

    function foo() {
      var name = id("avg_") + heart_rate;
      ...
    }
    The optimizer then treats id as an opaque wrapper, and removes it from the code right at the end. I've been thinking of adding something like this anyway, because a lot of the tests I wrote a long time ago are no longer testing what I originally intended, because the optimizer changes the code (or sometimes gets rid of it entirely).

    Also, I'll just note that the problem *isn't* that there are two instances of each string. Garmin's compiler de-dupes strings, so the real problem in your test case is that it started with one 6 character string, and ended with three 10 character strings.
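    The character counts bear that out. Here's the arithmetic, assuming (only for illustration) that each string's data cost is roughly proportional to its length and that identical literals are stored once:

```python
# Literal pool before the transformation: one shared suffix plus three prefixes.
before = {"ANT_HR", "avg_", "max_", "min_"}
# After: three fully expanded strings ("ANT_HR" itself is still referenced
# once, by f(heart_rate)).
after = {"ANT_HR", "avg_ANT_HR", "max_ANT_HR", "min_ANT_HR"}

print(sum(len(s) for s in before))  # 18 characters of string data
print(sum(len(s) for s in after))   # 36 characters of string data
```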

  • Do you maybe have a strange project layout, and the jungle file isn't at the root?

    No, it's in the project root. Maybe the difference is that I have some backed-up files under saved/source/Foo.mc, so it doesn't automatically know which source/Foo.mc it should choose?

    The warning refers to "info" though, so I don't think it's talking about anything in strfHrZone, but about the expansion of computeIntHR(info).

    Ah, yes, that was it. But it's very confusing that I see it at the uppermost call (the one in a function that is not inlined) when the problem can be in a very different place.

    WARNING: foo.mc:100 "Something went wrong" in function bar inlined from bar.mc:150 in function baz inlined from baz.mc:22

    So you're saying that baz.mc:22 wouldn't be enough because the same function inlined in different places would or would not produce the warning? Like:

    function a(str as String) {
      s(str);
    }
    function b(str as String?) {
      s(str);
    }
    (:inline)
    function s(str as String?) {
      if (str == null) { /* code that produces the warning */ } else { /* code that doesn't */ }
    }

    So on the if line I might get a warning when called from b but not when called from a?

  • I'm not sure... this (:identity) has some problems:

    1. it increases the non-optimized code size

    2. How would a regular user know where to use it? The only reason I saw this problem was because I diffed 2.0.38 and 2.0.39. There might be numerous other cases where the optimizer creates a suboptimal change. And I'm not talking about sub-optimal optimization! The difference is this:

    sub-optimal optimization:

    original code: 100, data: 50
    optimized code: 95, data: 50
    but in theory the optimizer could decide to do better:
    code: 99, data: 40

    the above is fine. Next version of the optimizer will improve, and all is fine.

    What I don't like is sub-optimal change:

    original code: 100, data: 50
    optimized code: 95, data: 58

    Just as the size-based PRE compares different options and chooses the best one, the original unchanged code should also be considered as an option, and kept in case a better version isn't found.

    I'll ask this: because at the end of the optimized compilation I see the code and data size printed, I assumed that the size-based decisions are based on the sum of the two. Is it possible that the optimizer only considers code size, and that's why it preferred inlining the strings over the manually written and tested code, where the shorter strings - even with the code needed to concatenate them and store them in a variable - were still better?

  • 1. it increases the non-optimized code size

    Yes. But if you're not going to use the optimizer, you don't need to worry about non-optimized code size.

    But actually, there is another option for your specific issue, that will already work.

    I noticed that constant propagation was breaking some of my PRE tests (in the sense that it changed the code that PRE was supposed to optimize enough that it was no longer really testing it). So I added (:noConstProp), which can be applied to a function to disable it. That should have no overhead in non-optimized builds, but it's a function-at-a-time thing.

    2. How would a regular user know where to use it?

    They wouldn't. It's for users who are working hard for every byte.

    What I don't like is sub-optimal change

    Sadly, it's unavoidable. Almost every change I make will make *some* things worse. And the more carefully you've tuned your application to the *current* version, the more likely it is that something will get worse in the next version.

    If you've not noticed this before, it's because the changes I make are intended to be good "on average", meaning that across a broad range of code, most things will get smaller - which I determine by compiling hundreds of open source projects, and a few of my own. I then look at whether the overall size has gone up or down, and also look at the before/after code diffs to see where things might have gotten worse. I make every effort to ensure that things only get better, but it's unrealistic to think that will always be possible.

    I'll ask this: because at the end of the optimized compilation I see the code and data size printed I assumed that the size based decisions are based on the sum of the two

    No. For the most part, I can't even predict the sizes. It's very much dependent on what Garmin's compiler does. The numbers I print out after the build are obtained by decoding the binary, and literally extracting the size of the code and data sections. I also can't predict how various optimizations will interact.

    Take your suggestion, for example. Suppose I could predict the code/data size for a function (and to be honest, I could come up with a good guess if I thought it would help; although the actual sizes would vary as Garmin releases new SDKs, of course). So your suggestion is to apply all my optimizations, and then go back to the unoptimized version if it would result in smaller code/data size.

    Ok, so I do that, and find that one function is worse after optimization; so I go back to the original. But now that version didn't inline one of your (:inline) functions (obviously, because it's the original code), so I can't delete the callee, and now things are worse than if I'd kept the optimized version that was two bytes longer. So it's not enough to compare, function by function, whether things are better or worse; I have to compare every combination of optimized/unoptimized functions, and then for each combination strip all unused symbols and recompute the overall size. Presumably you can see why that's not realistic.

    But back to how I make decisions. The one thing I do mostly know is how much code a variable reference takes (eg X.Y.foo, or $.A.B.x), how much the various kinds of literals take, and how much a local variable takes. This lets me make fairly accurate decisions in PRE, and I can pretty much guarantee that barring bugs, every change made by PRE will reduce code size. But the main reason for that is that it's the very last thing I do; so nothing PRE does interacts with any other optimizations.

    And even then, note that it often makes very suboptimal decisions; and although I've made several improvements to PRE, pretty much every "improvement" has resulted in worse results for certain code. I.e., although it's guaranteed to be better than not running PRE, and each new version is better than the previous version *on average*, on any particular piece of code it *may* produce worse results. And as I said, PRE is pretty much the easiest thing to assess with respect to code size.

    Otoh, constant propagation, for example, will often make things worse; but the expectation is that PRE will fix those cases, and the combination of constant propagation and PRE will *usually* be better than PRE alone - but not always. And only doing constant propagation when it's profitable isn't realistic, because it often enables optimizations that wouldn't be available otherwise, and it's almost impossible to predict when it will and when it won't. E.g. I could decide not to change "var x=0; return foo(x,x)" into "var x=0; return foo(0,0)" because that makes the code bigger; but now x looks dead, and dropping it will make the code smaller. And if foo is "function foo(x,y) {return (x+1)*(y+2);}", the original code is going to end up as "return 2;", which is much better...
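    The arithmetic in that last example checks out; a quick sanity check of the folded result (Python standing in for Monkey C):

```python
# The callee from the constant-propagation example above.
def foo(x, y):
    return (x + 1) * (y + 2)

# With x = 0, inlining plus constant folding reduce foo(x, x) to the constant:
# (0 + 1) * (0 + 2) = 2.
x = 0
print(foo(x, x))  # 2
```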

    So I'm sorry to say that your expectations would be unrealistic for a state of the art optimizer written and maintained by a team of full time programmers, and certainly unrealistic for a hobby project.

    That said, there is always room for improvement - and I'm always looking to make improvements. If you have specific cases that you think are wrong, I definitely want to hear about them; but sometimes the answer is going to be that the optimization in question is usually good, and it's hard to determine that it's bad in your specific case until it's too late. I may be able to give you a way to disable it selectively (and something like id is typical in optimizing compilers).

  • 1.

    it increases the non-optimized code size

    Well, I still like to run, and especially debug, the code without the optimizer, and keep the gained free memory for production. This really helps: for example, I can test with my own settings, while the optimizer saves enough memory that even if a user puts the longest possible string in every place, they still don't hit an out-of-memory error. (I know what you'll say: that in that case it's an advantage that the non-optimized code is bigger ;) but I'd also like to be able to run the code without the optimizer and without out-of-memory problems.)

    2. Regarding (:noConstProp) - it's the same: how would I know where to put it? What if I put it somewhere, but later the code changes and it's no longer optimal? The manual optimizations I make (like replacing a number literal with a variable once it appears 3 or more times) can also be broken later (I remove one usage, and the variable is no longer beneficial), but those are things I know about, and unless the compiler changes somehow, even if I miss it immediately I can recognize it later. The optimizer, though, is a black box. How should I know when/where it's better to have the annotation and where not? How should I know whether the same will be true for 2.0.41? Because of this, it would only make sense to do it automatically IMHO. But isn't size-based PRE for this? So it can make decisions based on multiple variations?

  • I tested this, and I was right:

    One more data point. I verified your results - the optimization is a 6 byte overall regression. But then I added

    public function foo() { return "min_ANT_HR"; }

    And now my optimization is a 6 byte win. If foo returns ["min_ANT_HR","max_ANT_HR", "avg_ANT_HR"] the optimization is a big win.

    So now you say - well, it's obvious what to do: just look for the new strings elsewhere, and if they're present do the optimization, otherwise don't.

    But what if foo was 

    public function foo() { var hr = "ANT_HR"; return "min_" + hr; }

    Now the optimization is a regression for *both* functions when looked at individually, but it's a win when they're done together.

    So again, we're looking at an O(2^N) process to determine whether the optimization is good, before we even start thinking about how it interacts with *other* optimizations.
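    A toy model of that point, with made-up byte counts chosen only to mirror the example above (each function can keep its original or optimized body, and only an exhaustive search over all 2^N builds discovers that optimizing both is the win):

```python
from itertools import product

# Hypothetical whole-program sizes (bytes) for each combination of
# (initialize optimized?, foo optimized?). Optimizing either function alone
# regresses because of the longer strings; optimizing both wins because the
# expanded strings de-dupe across the two functions.
SIZES = {
    (False, False): 100,
    (True, False): 106,
    (False, True): 104,
    (True, True): 94,
}

best = min(product([False, True], repeat=2), key=lambda combo: SIZES[combo])
print(best)  # (True, True) - a per-function greedy pass would pick (False, False)
```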

  • Well, I still like to run and especially debug the code without the optimizer, and keep the gained free memory for production

    So just edit Devices/<device>/compiler.json, and increase appTypes[].memoryLimit, and you can debug as much as you want.

    If you don't like that, copy an existing device, rename it (you'll have to rename it in the various json files too), and increase the memory limit there.
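    For reference, the relevant fragment looks roughly like this (field names per the description above; the exact schema varies by SDK version, and the limit value here is just an illustrative number):

```json
{
  "appTypes": [
    {
      "type": "watchApp",
      "memoryLimit": 262144
    }
  ]
}
```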

    2. Regarding (:noConstProp) - it's the same: how would I know where to put it?

    Well, that's up to you. When I update the optimizer you can

    1. check to see if the new code is smaller than the old, and be happy if it's smaller; or
    2. check to see if the new code is smaller than the old, and then, even if it's smaller, investigate every single change to see whether it made things better or worse, and why.

    In case 1, you don't use or care about the feature. In case 2, you use it when you've determined that it's worthwhile.

    [ this is not meant to be snarky; I appreciate you reporting where things got worse, and when I can I'll fix them ]

    But isn't size based PRE for this? So it can make decisions based on multiple variations?

    Well, yes, I suppose so, to an extent.

    I mean yes, I could have it look for strings with common substrings, and then break out the substrings and reconstructed full strings as PRE variables. But that's definitely going to come long after it handles "i + 2"!

    And it *still* suffers from the problem that doing it one function at a time can't see the big picture (i.e. you don't want to convert "min_ANT_HR" to "min_" + "ANT_HR" if there's another function that references "min_ANT_HR", unless you can also profitably do the transformation in that other function as well).
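    As a sketch of what such an analysis might look for (my own toy code, not the optimizer's): find the longest common suffix of the literals, then weigh the split against keeping the full strings:

```python
from os.path import commonprefix

def common_suffix(strings):
    # Longest common suffix = reversed longest common prefix of the
    # reversed strings.
    return commonprefix([s[::-1] for s in strings])[::-1]

literals = ["avg_ANT_HR", "max_ANT_HR", "min_ANT_HR"]
suffix = common_suffix(literals)
print(suffix)  # "_ANT_HR"

# Splitting is only profitable if no *other* function still needs the full
# strings: one shared suffix plus three short prefixes vs three full strings.
split_cost = len(suffix) + sum(len(s) - len(suffix) for s in literals)
full_cost = sum(len(s) for s in literals)
print(split_cost, full_cost)  # 16 vs 30
```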

    I can't believe I'm actually thinking about all the moving parts that need to be tracked to make this work :-)