How can I include a "binary" string as a resource?

I have some string resources that until now were included in my app as regular strings. However, they are huge, so I figured I'd compress them to decrease the prg size. I have a compress.py script that generates strings.xml with the compressed strings, and a Monkey C decompress function. The problem is that the compression essentially turns the original Unicode strings into binary data: any byte value (0-255) can occur in them.

First I tried to escape them in the XML file, but that seems to be impossible "by definition" of XML, because most of the control characters (0-31) cannot be escaped in XML.
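
For reference, here is a minimal sketch of which byte values are affected. It assumes XML 1.0 and assumes each byte is written as the code point with the same value. The range check follows the Char production of the XML 1.0 specification; the name is_xml10_char is made up for this illustration.

```python
def is_xml10_char(cp: int) -> bool:
    """Return True if the code point is a legal character in XML 1.0."""
    return (
        cp in (0x09, 0x0A, 0x0D)        # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Byte values that cannot appear in an XML 1.0 document at all.
illegal_bytes = [b for b in range(256) if not is_xml10_char(b)]
print([hex(b) for b in illegal_bytes])
```

In XML 1.0 a character reference such as `&#1;` must itself refer to a legal character, so escaping does not help for these values.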

Then I generated the strings as JsonData, each string in its own file, but I ran into similar problems.

Here's an example json file:

"_}\u00fa\u001b;gB\u00fc\u00d9\u00a77c\u00b67?_\u00casw\u00dfW\u00da\u00bf\\\u00db\u00b7o\u00b7\u009d\u00da1\u00b6\u00fd\u00af\u00da\u008b\u00e7\u001dY\u00c7W2\u00fc\u00f9u[m\u0017==Qm\u001a\u00e7\u00e7\u00db\u00f1\u0013=\u00dd\u00f1\u00b6\u00f3\u00d5\u00a6W\u00ca\u00b3o\u00c6\u00b1\u00ed\u00b3\u0018;\u00d11|;\u00de\u0087F\u0019o\u00db\u00bf}\u00ba\u00d3v;\u00d1\u00f3"
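
A short sketch of where escapes like these come from, assuming the file was written with Python's json module (an assumption; the actual compress.py is not shown here). By default json.dumps escapes every non-ASCII character as `\uXXXX`, so one packed byte becomes six characters in the source file. How the resource compiler stores this inside the prg is not something this sketch can show.

```python
import json

# First four characters of the example above: '_', '}', U+00FA, U+001B.
raw = bytes([0x5F, 0x7D, 0xFA, 0x1B])
text = raw.decode("latin-1")  # one code point per byte

escaped = json.dumps(text)                        # default: ensure_ascii=True
unescaped = json.dumps(text, ensure_ascii=False)  # keeps U+00FA as a literal character

print(escaped, len(escaped))                      # 16 characters for 4 payload bytes
print(len(unescaped.encode("utf-8")))             # 12 bytes; U+001B is still escaped
```

Control characters below 0x20 are escaped in both modes, because JSON does not allow them unescaped inside a string.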

  • I assume that, since you can run the uncompress command, you do not need the strings for your properties, so a way out may be to generate your strings as global vars or even a global array in a .mc file.

  • Yes, the input for uncompress() is the compressed string. But if I add the strings as constants, the app wouldn't even start on older devices (those that don't have System 7), because the constants would be kept in the code (in memory) and also in the data. At least that's my understanding. Did I misunderstand it? And even on newer devices, although the strings wouldn't occupy double space in RAM, ALL strings would still be in memory ALL the time, which would probably crash with out of memory on some devices. I started out with constants at the very beginning, and then moved them to resources so that I only load and keep in memory the 2 strings I need at a time.

  • It was a solution for the problem you stated, I'm not saying it's ideal.

    A variation could be to inline your uncompress statements where you actually need them, which would reduce the memory pressure a little bit.

    However, a big issue with all paths (even your string resource path, if storing a binary string there actually worked) is that doing the uncompress over and over is going to be a killer for your battery...

  • That's true. That's the reason I wanted to use WeakReferences to build a cache that uses as much memory as is available and frees it when needed. But it turned out that WeakReference wasn't created with this usage in mind: memory that is only referenced by a WeakReference is freed immediately, so it can't be used for this. So the only way I could do some kind of cache is to use System.getSystemStats().freeMemory. That is doable (especially since I know each string's size before I load it), but I don't think I'll do it for now.

    BTW it's an interesting question what uses more battery: I/O or computation. My compression code is pretty straightforward; it's not really compression but rather packing. Since there are only 26 (capital) Latin letters, 5 bits are enough for one letter, so I was packing them one after the other. Ideally this would use 5 bytes to store 8 letters, if I could just store the bytes as they are. The problem is that exactly 0-31 are the control characters, and only a few of them are allowed in XML. With JSON IMHO it's even worse. I THINK that XML strings are stored as binary in the prg, so if I have "&amp;" in my XML file it would be translated to "&" and use only 1 byte in the app. But with JSON I don't think this is the case. And unfortunately (probably for the same reason, the control characters) most of the compressed data was written out by Python as "\u1234", which IMHO is stored exactly as it looks and is only parsed into 2 bytes in loadResource. I might be wrong here, but unfortunately it failed before I got to the point where I would have needed to check this theory.

    So for now my option would be to use only 7 bits and then shift the blocks so that nothing falls into 0x00-0x1F or 0x80-0x9F, because both ranges are reserved (though it turns out this also depends on whether I use XML version 1.0 or 1.1, which have different reserved characters, and no, 1.1 is not "better" than 1.0, at least not for my use case...). But compressing this way would mean that I can only fit 7 letters in 5 bytes and the decompression is more complicated, so it probably isn't worth it. So I commented out the compression/decompression for now. If there is ever a way to store and retrieve binary data, I'll revisit the idea.
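
The packing described in this reply could look roughly like the sketch below. It is an illustration under assumptions, not the actual compress.py: the function names are made up, and the mapping of 7-bit values onto 0x20-0x7E and 0xA0-0xC0 is just one possible way to stay out of 0x00-0x1F and 0x80-0x9F (it also skips 0x7F).

```python
def pack(text: str, out_bits: int = 8) -> list[int]:
    """Pack A-Z letters (5 bits each) into units of out_bits bits."""
    acc = 0    # bit accumulator
    nbits = 0  # number of valid bits currently held in acc
    out = []
    for ch in text:
        code = ord(ch) - ord("A")
        if not 0 <= code < 26:
            raise ValueError(f"only A-Z is supported, got {ch!r}")
        acc = (acc << 5) | code
        nbits += 5
        while nbits >= out_bits:
            nbits -= out_bits
            out.append((acc >> nbits) & ((1 << out_bits) - 1))
        acc &= (1 << nbits) - 1  # drop the bits that were already emitted
    if nbits:  # pad the last unit with zero bits
        out.append((acc << (out_bits - nbits)) & ((1 << out_bits) - 1))
    return out


def unpack(units: list[int], count: int, in_bits: int = 8) -> str:
    """Inverse of pack(); count is needed because padding looks like 'A'."""
    acc = 0
    nbits = 0
    letters = []
    for unit in units:
        acc = (acc << in_bits) | unit
        nbits += in_bits
        while nbits >= 5 and len(letters) < count:
            nbits -= 5
            letters.append(chr(((acc >> nbits) & 0x1F) + ord("A")))
        acc &= (1 << nbits) - 1
    return "".join(letters)


def to_safe_codepoint(value: int) -> int:
    """Map a 7-bit value to 0x20-0x7E or 0xA0-0xC0 (assumed mapping)."""
    return value + 0x20 if value < 95 else value - 95 + 0xA0


def from_safe_codepoint(cp: int) -> int:
    """Inverse of to_safe_codepoint()."""
    return cp - 0x20 if cp < 0x7F else cp - 0xA0 + 95


raw_units = pack("ABCDEFGH")          # 8 letters -> 5 raw bytes
safe_units = pack("ABCDEFG", 7)       # 7 letters -> 5 seven-bit units
safe_text = "".join(chr(to_safe_codepoint(u)) for u in safe_units)
restored = unpack([from_safe_codepoint(ord(c)) for c in safe_text], 7, 7)
```

The safe range still contains characters such as `<`, `&` and `"`, so the writer of strings.xml would still need ordinary entity escaping for those.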

  • But if I add the strings as constants, the app wouldn't even start on older devices (those that don't have System 7), because the constants would be kept in the code (in memory) and also in the data. At least that's my understanding. Did I misunderstand it? And even on newer devices, although the strings wouldn't occupy double space in RAM, ALL strings would still be in memory ALL the time, which would probably crash with out of memory on some devices.

    I've had this problem in the past - the need to have large-ish amounts of constant binary data in my app for old devices which didn't support json resources.

    My solution was to wrap the data in function calls - instead of having global constants, I'd have functions which return the constant data.

    Of course, this still incurred a memory hit for the code to initialize the data, but it avoided the problem where the data would be in memory at all times. I only had to worry about memory spikes due to accessing the data at the times I needed it.

  • Why, where is the code???? Is it loaded dynamically only when you call a function?

    I don't think so. According to what I saw in the memory viewer, code is in memory all the time. This would work if you had a small algorithm that generates big data, but my use case isn't like that. I would generate the data before the build.

    As for the constants: according to the SDK 7 changelog (for SDK 7.0.0-beta1), devices now "Place constant array and dictionary definitions into the data space to reduce code space". If I understand this correctly, old devices will have the constants in the code and also in the data (so you can reach them), but new devices will only have them in the data (so IMHO half the amount, the half that until now was in the code space, is now saved in memory).

  • Ah, now I understand :) So you do have the whole data in the code, but only the small parts are copied to the data, on demand. So this is a middle solution: it's not as good as having them in resources would be, and not as good as what we can have now with System 7 devices, where we only have to have them once in the data and they can then be used without additional memory.

  • Yeah exactly. In my case, it was binary data representing custom layouts in my bespoke run-time layout engine (which I crafted to save memory for old devices). I wrapped each custom layout in a function which would be called on demand.

    I only needed the layout data during onUpdate(), so I had to take the memory spike at that point. At other points of execution, I could use the memory that I saved to implement other features. Ofc this doesn't help if you need that extra memory you saved for code, since code is always present in memory. It only helps if you use the extra memory for something else that causes a transient memory spike.

  • Yeah, so it's definitely a good strategy for a DF that supports old devices, if it has some kind of memory-intensive calculation.

    In my app I am more concerned about the prg size, because moving the data to (uncompressed) strings and loading them in chunks on demand was good enough to make it run even on devices with only 64 KB of app memory. But the 200 KB prg size is a bit heavy.