String concatenation feature or bug?

Found this the usual way (meaning the hard way):

Background: 2024-10-13,06:25:21 beginLP _nextPos 0

		l_pos = '_' + _nextPos;
		_util.logTime("requestLogPos " + _nextPos + ", l_pos " + l_pos);
		
Background: 2024-10-13,06:25:21 requestLogPos 0, l_pos _

		l_pos = "_" + _nextPos;
		_util.logTime("requestLogPos " + _nextPos + ", l_pos " + l_pos);

Background: 2024-10-13,06:25:21 requestLogPos 0, l_pos _0

Characters can work, as in this:

	public function httpURL(php, rest) {
		return _host + php + '/' + G_uID + '_' + System.getTimer() + '_' + _httpRetrys + '/' + rest;
	}

		_util.logTime("requestLogPos l_pos " + l_pos + ", l_url " + l_url);
	
Background: 2024-10-13,06:25:21 requestLogPos l_pos logPos_0, l_url https://my.example.com/g1/lp/ffa0f28f1a9e91d2133a05fc4ff987c817fb2ab4_361254406_0/logPos_0

Multiple characters are OK; a single character is not, and I have to use a string as a workaround. Normally it wouldn't matter, except that strings are objects, which add up after a while.
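
For reference, a minimal sketch of the two forms that do produce a string when the left operand is a single character. The function name `demoCharVsString` is hypothetical. The first two results are the ones shown in the log lines above; the third assumes `Char.toString()` returns a one-character String, after which the usual string + number concatenation applies.

```monkeyc
import Toybox.Lang;
import Toybox.System;

// Hypothetical demo function, for illustration only.
function demoCharVsString() as Void {
    var nextPos = 0;

    // Char + Number: the result stays a Char ('_' when nextPos is 0, per the log above).
    var asChar = '_' + nextPos;

    // String + Number: the number is converted and concatenated ("_0", per the log above).
    var asString = "_" + nextPos;

    // Convert the Char to a String first, then concatenate as in the previous line.
    var converted = '_'.toString() + nextPos;

    System.println("asChar " + asChar + ", asString " + asString + ", converted " + converted);
}
```

Whether `'_'.toString()` is cheaper than the `"_"` literal in code + data size is not something this sketch can answer; that would need to be measured per project.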

  • There are other differences that you'll need to take into account if you're trying to optimize for memory size. I did the same thing you tried here, and in some places I changed, for example, char.toString() to "" + char, because in the end it takes up less code + data (at least in my case, on that specific occasion).

  • That's really the prize here, getting the "best" code. Best being defined by the situation, so it's mostly subjective. It's hard to know for sure which reductions come from compiler optimizations vs. source code tricks to reduce runtime operations into fewer instructions.

    Since a "string" of multiple characters is implemented as a packed array of characters in every language I've ever used, I wouldn't expect Monkey C to be any different. Thus, concatenating two strings requires that the strings be unpacked into arrays of characters and manipulated from there. If you only need to add one character to the string, in theory it would be less expensive if that one character was already in the format needed, and didn't have to be unpacked from a string holding only one character. In theory. How Monkey C does it is anybody's guess, but I'm using single characters when that's all I need. Unless someone can demonstrate otherwise.

    I did see once that passing a null character, two single quotes with nothing in between, resulted in an invalid number of arguments to a function call, so I had to use two double quotes instead. It seemed that the compiler optimized out the nullish argument if it was the last one. Or maybe it's a feature.

  • I don't know how Monkey C concatenates 2 strings, but if I was to guess I don't think it's the way you described it. I think the String object knows the length of the string, and concatenating 2 strings IMHO is more like:

    1. create a new string with the length of str1.length + str2.length

    2. copy str1.bytes to dst.bytes, starting from 0

    3. copy str2.bytes to dst.bytes, starting from str1.length
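
    The three steps above can be sketched in Monkey C itself. This only illustrates the copy-based model; it says nothing about how the runtime actually implements "+". The function name `concatSketch` is hypothetical, and the sketch assumes `String.toUtf8Array()` and `StringUtil.utf8ArrayToString()` behave as described in the Toybox API docs.

    ```monkeyc
    import Toybox.Lang;
    import Toybox.StringUtil;

    // Hypothetical illustration of length-based concatenation; not the runtime's code.
    function concatSketch(str1 as String, str2 as String) as String {
        var src1 = str1.toUtf8Array();   // bytes of str1 as an Array of Numbers
        var src2 = str2.toUtf8Array();   // bytes of str2 as an Array of Numbers
        var len1 = src1.size();
        var len2 = src2.size();

        // 1. create the destination with room for both inputs
        var dst = new [len1 + len2];

        // 2. copy str1's bytes, starting from 0
        for (var i = 0; i < len1; i++) {
            dst[i] = src1[i];
        }

        // 3. copy str2's bytes, starting from len1
        for (var j = 0; j < len2; j++) {
            dst[len1 + j] = src2[j];
        }

        return StringUtil.utf8ArrayToString(dst);
    }
    ```

    Note that nothing here needs a terminator byte: both lengths are known before any copying starts.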

    '' is not a null character IMHO. Though I don't know what it is... I'm not sure if an "empty character" is a thing. A null character would be something like '\0'. Neither is "". It's an empty string. Depending on the language's implementation and how it stores where the string ends it could be that the byte array for such an empty string is: ['\0'], but it's also possible it's: {len:0, bytes:[]}

    Anyway start using strict type checking, it helps in some of these things and some more.

    • Deal with the language as it is (it won't be changed).
    • Adding integers to characters is fairly common outside of Monkey C.
    • Characters are often treated as a type of integer. This is why adding an int to a char changes the value of the character.
    • + is understood as the concatenation operator when one of the operands is a string (making the first operand a string is the surest way to get concatenation).
  • '' is not a null character IMHO. Though I don't know what it is... I'm not sure if an "empty character" is a thing. A null character would be something like '\0'. Neither is "". It's an empty string. Depending on the language's implementation and how it stores where the string ends it could be that the byte array for such an empty string is: ['\0'], but it's also possible it's: {len:0, bytes:[]}

    "\0" is mostly a C language quirk.

    The "proper" way to deal with strings is to store the chars and a length. Thus, an empty string has a length of zero.
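
    A quick check from the Monkey C side, as a sketch: `String.length()` reports zero for "", which is consistent with (but does not prove) a stored length rather than a terminator. The function name `demoEmptyString` is hypothetical.

    ```monkeyc
    import Toybox.Lang;
    import Toybox.System;

    // Hypothetical demo function, for illustration only.
    function demoEmptyString() as Void {
        var empty = "";
        System.println(empty.length());           // prints 0
        System.println((empty + "abc").length()); // prints 3
    }
    ```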

  • I did see once that passing a null character, two single quotes with nothing in between, resulted in an invalid number of arguments to a function call, so I had to use two double quotes instead.

    You are really determined to think of the Char type in Monkey C as a special kind of string, despite all evidence to the contrary, huh?

    Like flocsy said, an "empty" (zero-length) character doesn't make any sense. (inb4 zero-width space char exists - clearly I'm talking about length in bytes). '' is not a null character, it's a nonsensical syntactical construct which is invalid in C (for example).

    Actually, in every case, the Monkey C parser simply treats any instance of '' (two adjacent single quote chars) as if it did not exist in the code in the first place.

    var x = 'a'; // valid code
    var y = 'a'.toNumber(); // valid code
    var x2 = ''; // <== syntax error at ";"
    var y2 = ''.toNumber(); // <== syntax error at ".";

    var z = 42 + '' 50 '' + 50; // valid code
    System.println(z); // prints 142

    System.println(''"abcd"''); // valid code, prints "abcd"
    System.println(''); // not enough args error, as you said

    It seemed that the compiler optimized out the nullish argument if it was the last one. Or maybe it's a feature.

    As usual, your intuition about Monkey C's character syntax/implementation is just plain wrong.

    As you can see from the examples above, the parser (most likely ANTLR based on the error messages it prints) simply treats '' as literally nothing. Not a null character (unicode/ascii 0), not an empty character (whatever that would mean), and not a null string.

    It would be better if '' was always a syntax error tho. That's definitely something I would change if I were Garmin. I don't think it's an intentional language feature, I think it's something that was overlooked when the syntax was defined.

    + is understood as the concatenation operator when one of the operands is a string (making the first operand a string is the surest way to get concatenation).

    In Monkey C, there's also the special case where "+" will concatenate characters (after conversion to string) if both operands are characters, which is why OP referred to "+" as the "character/string concatenation operator" and now expects c + x to result in string concatenation in every case where c is a char and x is any other type.

    If Garmin had not implemented that special case, this thread would not exist.

    Since they have, we'll never hear the end of it.

  • If you were still of the mindset that "+" is the "character/string concatenation operator" (despite the fact nobody calls it that),

    Nobody except Garmin.

    Is that supposed to be an epic own when I quoted that documentation before you did?

    The mistake here is really in assuming that characters should behave just like strings when you use "+" operator.

    Indeed, the documentation says:

    The + operator is also used to concatenate String values.

    Yes, it fails to explain all the cases where values are implicitly converted to strings when the + operator is used, which is a shame.

    They say that it concatenates strings, not characters, a point I tried to make over and over and over again. In all cases, it's concatenating 2 strings, it's just that in some cases, one or both of operands is implicitly converted to a string first.

    This should really hit home when you consider something like System.println("A " + false); // prints "A false"

    It doesn't directly concatenate a string and a boolean, it concatenates a string and the *string representation of a boolean*.
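
    To make the implicit conversion visible, here is a small sketch that writes it out by hand. The function name `demoImplicitConversion` is hypothetical; it assumes `Boolean.toString()` and `Number.toString()` return the same text that "+" produces implicitly.

    ```monkeyc
    import Toybox.Lang;
    import Toybox.System;

    // Hypothetical demo function, for illustration only.
    function demoImplicitConversion() as Void {
        var flag = false;
        var count = 42;

        System.println("A " + flag);               // prints "A false" (implicit conversion)
        System.println("A " + flag.toString());    // same concatenation, conversion written out

        System.println("n = " + count);            // prints "n = 42" (implicit conversion)
        System.println("n = " + count.toString()); // same concatenation, conversion written out
    }
    ```

    In both pairs the operator only ever joins two strings; the difference is just who performs the conversion.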

    In JavaScript and Monkey C, "+" is used to concatenate strings and numbers/booleans/null/etc. (after implicit conversion to string), but JS doesn't call "+" the "string/character/number/boolean/null concatenation operator".

    Likewise, you don't call "+" the "string/number" concatenation operator, despite the fact that it can concatenate a string with a number (after implicit conversion to string) in both JS and Monkey C.

    If you would stop thinking of "+" as the "character/string concatenation operator" and start thinking of it as the "string concatenation operator (* with one special case where both operands are characters)", maybe you would finally be able to accept Monkey C's "+" operator behavior for what it is.

    CPUs are deterministic, otherwise they would be useless. Computer programming languages should be as well. Inconsistent rules are not ok. We will just have to agree to disagree.

    The behavior of Monkey C is certainly not non-deterministic in this respect. Just because you refuse to accept Monkey C's rules for string concatenation (especially the situations in which one or both of the operands will be implicitly converted to string), does not mean those rules are non-deterministic or inconsistent.

    There are two rules for concatenation, one of which is fairly simple and intuitive, and the other of which is a special case which arguably should never have been implemented (to avoid this entire discussion).

    1) If one or both of the operands is a string, any non-string operand is implicitly converted to string, and the two resulting strings are concatenated.

    2) (here's the contentious one) if both operands are characters, they are implicitly converted to strings, and the two resulting strings are concatenated. Here's the part you refuse to understand: just because c1 + c2 (where c1 and c2 are characters) results in implicit conversion to String and concatenation, does not mean that c + x (where c is a character and x is any other type) will also result in implicit conversion to String and concatenation. You are literally the only person in this thread who thinks this should be the case.

    If not for the 2nd rule, I argue that you would have no problem with the behavior of "+" (bc there would be no expectation that it should concatenate characters in all cases), and as a bonus, you would stop implying that strings and characters should be more or less interchangeable in Monkey C.
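
    The two rules, plus the case that falls outside both, can be checked with `instanceof`. This is a sketch under the rules as stated in this thread; the function name `demoPlusRules` is hypothetical. If the rules hold, each line prints true.

    ```monkeyc
    import Toybox.Lang;
    import Toybox.System;

    // Hypothetical demo function, for illustration only.
    function demoPlusRules() as Void {
        var r1 = "_" + 0;    // rule 1: one operand is a String
        var r2 = 'a' + 'b';  // rule 2: both operands are Chars
        var r3 = '_' + 0;    // neither rule applies: Char + Number

        System.println(r1 instanceof Lang.String); // rule 1 yields a String
        System.println(r2 instanceof Lang.String); // rule 2 yields a String
        System.println(r3 instanceof Lang.Char);   // no concatenation; the result is still a Char
    }
    ```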

  • Anyway start using strict type checking, it helps in some of these things

    Once again, strict typing would not help with the misapprehension in the OP.

    A char is not a handy type of string which has the perk of saving memory but is otherwise completely interchangeable with a string in every other way. But nothing will stop OP from insisting that it should be.

    Again, I'd be really curious to hear of a popular language which has both string and char data types, and basically treats chars the same as strings. Again, what would be the point of having a distinct char data type in that case? Simply to save memory when you want to concatenate what would otherwise be strings of length 1 with other data types (after implicit conversion to string)? That's a really narrow use case to justify creating a whole other data type.

  • We will just have to agree to disagree.

    Clearly Garmin doesn't see it your way either, unfortunately.

  • "+" is the "character/string concatenation operator" (despite the fact nobody calls it that

    Maybe you're right, and Google and Microsoft are wrong. I've accused them of being wrong about other things as well myself.