Basic data compression to reduce watch memory use.

Good morning everyone, 

In one of my applications I have a very large array of float values (up to around 150 of them). I would like to convert each float into a 4-digit value (always two decimal places, always less than 99, and stored as an integer once scaled), and then combine several of these values into a single number.

For example, [1.2344667, 24.565544, 32.342354, 77.3424234] would become 0123245732347734.

My array of 150 float values would then become an array of 33 numbers, and I could have lots of these arrays.

I need to retain accuracy, so my question is: how many of these 4-digit, unsigned, whole-number values should I be able to group together and store as a single unsigned integer?

I might have, for example, 200 arrays each with 150 values, and the watch memory cannot cope with that number of values.
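
For reference, the per-value scaling described above (two implied decimal places, with rounding) might look like this in Monkey C; the function name is purely illustrative:

    import Toybox.Lang;

    // Scale a float (0 <= f < 99) to a rounded 4-digit integer with two implied
    // decimal places, e.g. 1.2344667 -> 123, 24.565544 -> 2457, 77.3424234 -> 7734.
    function toFourDigits(f as Float) as Number {
        return (f * 100 + 0.5).toNumber();   // truncation after adding 0.5 rounds to nearest
    }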

  • Since the numbers are < 40, he decided he can represent them multiplied by 100 (so he has 2 decimal digits of precision after the dot), so each number is between 0 and 4000 and can be represented in 12 bits; 12 x 5 = 60 < 64, so five of them fit in a 64-bit long.
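
    A minimal sketch of that 12-bit packing in Monkey C (function names are illustrative, and the values are assumed to be pre-scaled by 100):

        import Toybox.Lang;

        // Pack five pre-scaled values (each 0..4000) into one 64-bit Long:
        // 5 x 12 bits = 60 bits, which fits in a Long.
        function packFive(scaled as Array<Number>) as Long {
            var packed = 0l;
            for (var i = 0; i < 5; i++) {
                packed = (packed << 12) | (scaled[i] & 0xFFF);
            }
            return packed;
        }

        // Unpack in reverse order; divide each result by 100.0 to recover the float.
        function unpackFive(packed as Long) as Array<Number> {
            var out = new Array<Number>[5];
            for (var i = 4; i >= 0; i--) {
                out[i] = (packed & 0xFFF).toNumber();
                packed = packed >> 12;
            }
            return out;
        }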

  • Not in BCD you can't: 3999 would occupy 16 bits, and so would 1000, etc., so only four such values would fit in 64 bits.

  • Yes, but he wrote about 12-bit "fixed point".

  • Ok, thanks for the advice. I did filter the original array to remove the float values and then converted all the values into positive 4-digit integers. Are you suggesting there is not much memory saving in converting that into an array a quarter of the size containing longs, compared with leaving the array as floats at four times the size?

    There will be up to 300 arrays, each with up to 150 floats. I think that is too many objects and too much memory to store on the watch. The data is stored purely to be sent via HTTP POST to a web server database, and is not required on the watch afterwards. The database can handle the parsing.

    Based on the use case I described, any further recommendations? And from experience, roughly how much memory saving is there, percentage-wise, going from an 8-digit float to a 4-digit int?

    Nick

  • If you insist on keeping your data in memory, then keeping 300 compressed arrays of 150 items + the decompressor code + 1 uncompressed array of 150 floats is probably better than keeping 300 arrays of 150 floats.

    If instead you keep your data on the disk (aka resources) and only load the one array on demand, then it's even better for memory, because you won't need to keep 299 unused arrays in memory (even compressed, that's a lot). Plus, loading a JSON array is one line of simple code (see the sketch below) versus longer, more complex decompression code.

    Re: "saving % wise going from an 8-digit float to a 4-digit int": IMHO you're still confusing digits with bits.
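
    For what it's worth, loading a JSON data resource can be as short as the line inside this sketch; the resource id is made up for illustration:

        import Toybox.Application;
        import Toybox.Lang;

        // Assumes a resource such as:
        //   <jsonData id="sampleArray">[123, 2457, 3234, 7734]</jsonData>
        // where "sampleArray" is a hypothetical id.
        function loadSampleArray() as Array<Number> {
            return Application.loadResource(Rez.JsonData.sampleArray) as Array<Number>;
        }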

  • Thanks - from what you are saying, the best option for simplicity is to format the data for insertion into a JSON array, which I am doing anyway by converting the floats to 4-digit integers and filtering out sub-zero values, and then every time an event occurs I store the data in resources.

    I have another consideration for that: I don't actually know how many fields the user needs within the 3D array, and I am unfamiliar with the JSON array type. Does it need to be pre-initialized, or can I add to it like with the Monkey C array objects?

    Also, given I need the array data to be sent via HTTP: if I write to resources, how long does that persist after my application is closed, i.e. how do I reset or remove the array the next time the application is loaded? Manually, or is it automatic?

    Thanks, 

  • Maybe I misunderstood the use case. I thought the data was static and compiled into the app.
    If it's generated during usage, then you'll need to add the code size for "compression" (and serialization) as well. And then you will probably be able to store the data in Storage, not resources.

    You don't need to initialize the JSON. It's basically a string that is loaded into an array (deserialization), or an array that is saved as a string (serialization).

    You'll need some management in Storage. For example, you can keep a "numberOfArrays" value in Storage and store each serialized array under "arr_<i>". To read them all you read "arr_0" ... "arr_<n-1>" (where n is numberOfArrays); to add one you write it to "arr_<numberOfArrays>" and then save numberOfArrays + 1 back to "numberOfArrays"; to delete, you delete from the end. Of course you can fit this to your needs.
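
    A minimal sketch of that scheme, storing the arrays directly (Storage accepts arrays and dictionaries as values); the key names are just the ones suggested above:

        import Toybox.Lang;
        import Toybox.Application.Storage;

        // Append one data array under "arr_<numberOfArrays>", then bump the counter.
        function appendArray(arr as Array) as Void {
            var n = Storage.getValue("numberOfArrays");
            if (n == null) { n = 0; }
            Storage.setValue("arr_" + n, arr);          // write the new entry first...
            Storage.setValue("numberOfArrays", n + 1);  // ...then update the counter
        }

        // Read the i-th stored array (0 <= i < numberOfArrays).
        function readArray(i as Number) {
            return Storage.getValue("arr_" + i);
        }

        // Delete from the end, as suggested above.
        function deleteLastArray() as Void {
            var n = Storage.getValue("numberOfArrays");
            if (n != null && n > 0) {
                Storage.deleteValue("arr_" + (n - 1));
                Storage.setValue("numberOfArrays", n - 1);
            }
        }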

  • Sure he can, by extracting the binary representation and bit-packing it into the long. But if you're going to do that, you may as well use the full 64 bits, as the code complexity is about the same. Otherwise there are much simpler code constructs that will get you within 10-15% or so of that compression ratio. Of course, there may be a much simpler solution to the OP's problem, as we don't know why they need to keep all this data in memory instead of writing it to persistent storage as necessary until it can be sent to their server.
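
    One such simpler construct is a decimal version of the packing from the original post: with values already scaled by 100 (so 0..9899), four 4-digit groups, i.e. 16 decimal digits, fit comfortably in a signed 64-bit Long. A sketch, with an illustrative function name:

        import Toybox.Lang;

        // Pack four scaled values into one Long, e.g.
        // [123, 2457, 3234, 7734] -> 123245732347734 (the OP's example, minus the leading zero).
        function packFour(scaled as Array<Number>) as Long {
            var packed = 0l;
            for (var i = 0; i < 4; i++) {
                packed = packed * 10000 + scaled[i];
            }
            return packed;
        }

    Unpacking is just the reverse: repeatedly take packed % 10000 and then divide packed by 10000.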

  • Thanks. OK, I have done some extensive testing this afternoon and have some amazing results, which can probably help any coder who is facing array memory issues.

    My testing methodology was to use a base array of size 300, which contained 200 nulls and almost 100 floats (6-7 decimal places), to see if I could cram more information into the watch live (data recording). I was quite astonished by the results.

    1. Replicating the original, untouched array every 12 seconds, I was able to get only 12 copies (1200 values plus 2400 nulls) before running out of memory.

    • Peak memory use was just over 500 kB (using a fenix 7 Pro, which has over 700 kB available); base memory 108 kB.

    2. I then converted all the floats to 4-digit Numbers and tried replicating the array again.

    • Changing the data in the array from floats to 4-digit Numbers made no change to memory use at all.

    3. I then removed all the nulls, so the array size decreased from 300 to about 100, and replicated this (2400 values).

    • I got to 24 replications (so about double) for removing 66% of the array fields (all the nulls).

    4. I then filtered the array values to shrink the dataset further by taking only every 2nd or 3rd value. This left me with 34 values from each original 300-element array, but the number of arrays is now increasing (more arrays, with smaller sizes).

    • Curiously, I only got to 30 arrays before the out-of-memory crash. At this point I started to suspect that the mere fact of creating and having an array, regardless of size, was causing a massive spike in memory. The base memory usage when out of memory was still only around 100 kB, meaning a peak memory demand was causing the crash.

    5. I then combined those values into longs, massively decreasing the size of each dataset, right down to 9 long entries per dataset, and I still only got to 37 arrays before running out of memory.

    • This confirmed my suspicion that the number of arrays, rather than their contents, was actually using up all the memory. Base memory use was only 102 kB, but it spiked over 500 kB just prior to the out-of-memory crash. (Coincidentally, this still only represented about 1200 values, like way back in point 1, since I effectively got more capture events recorded by skipping every 2nd or 3rd dataset.)
    • I suspect Monkey C is allocating extra memory for arrays beyond the amount needed for their initialized size (to allow the .add command).

    6. Bearing all this information in mind, I realised I needed as few arrays as possible, so I decided to test the maximum size of an array containing longs. I simply kept adding longs onto the back of the same array and periodically checked the size, with no clue how many I could stuff in there.

    • Peak memory usage was massively smaller using only 1 array, and appeared to be only 20 kB above the base load, whereas when adding arrays it was eventually 500 kB above the base load with far, far LESS information contained in it.
    • At 100 replication events stuffed into 1 array (1000 compressed values), with 1160 of the 65,000 available objects used, there was now an 8-second watch face freeze as the computations occurred, which is normally not noticeable. Base memory 126 kB, peak 160 kB at this point.
    • At 125 events: 10-second pause.
    • At 160 events stuffed into 1 array (1600 values): 14-second pause for computations, 1611 of 65,000 objects used, base memory 143 kB, peak 191 kB.
    • At 200 events (replications, 2000 long values entered): 16-second pause, 150 kB base, 214 kB peak, 2000 of 65,000 objects used.
    • At 250 events: 19-second pause (170 kB base, 245 kB peak), 2400 objects used.
    • At 300 events: 25-second pause (3000 long values entered, and my initial goal of storing 300 replications achieved), 184 kB base, 282 kB peak, 2800 objects. I stopped the test here.

    7. I then speculated about what is causing the time delays as I add more data into the array, and wondered whether it is due to adding the data at different times. (I believe arrays are re-sized, deleted and re-created in different memory areas, and if I am always adding to one, the storage might be getting fragmented, making access painfully slow.) Which brings me to the next thought: pre-initializing a massive array in one go, then modifying the data within it.

    • I tried the command from the Monkey C guide to initialize a typed array, and it fails every time. What am I doing wrong? I even copied and pasted the exact code:
    • "var typedArray = new Array<Number>[3000];"

    8. I then created a pre-initialized array as per below. However, because I could not make it a typed array (the commands are not working), I don't know if that is also having a bad effect. I noted straight away that there was about 180 kB base load and a 300 kB peak, and this time only a 1-2 second computational pause; when my app runs its AI coding it is very computationally intensive, so this is expected over the two seconds it does this.
              // Fill the array immediately after creating it, so it only gets sized once.
              var massivearray = new [3000];
              for (var i = 0; i < 3000; i += 1) {
                  massivearray[i] = 99999999999999999l;
              }
      Peak and base memory were stable (non-changing at 188 kB and 300 kB) with this pre-initialization, and the pause was always consistent.
      I then tried the initialization with just new [3000]: base use was 107 kB, peak 140 kB, with a half-second pause; by 25 replications the pause was 5 seconds, confirming my fears that new[size] and .add are basically doing exactly the same thing (not allocating enough room, so the array has to re-size), causing huge peak memory use.
    9. In summary, what I learnt regarding arrays and data storage in arrays: new[size] and .add carry the same computational burden. When data is added or actually inserted, the array is re-sized and moved to a different part of memory and the old copy is deleted; this causes the memory-use spike and, in my case, the watch face pause. To avoid it, fill the array with the intended type of data immediately after creating it; it is then re-sized once, and later changes to the data do not re-size and move it again (lower base overhead and no big spikes in memory). Fewer entries in an array are better than more, BUT a smaller number of arrays matters far more than the number of entries: I can have a maximum of 36 arrays with 10 entries each, or 1 array with 3,000+ entries. If someone can figure out whether there even is a command to pre-initialize a typed array, let me know, as I cannot get it to work.
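
    On the typed-array question at the end: one thing that may be worth checking is how Toybox.Lang is pulled in, since the new Array<Type>[size] syntax apparently needs import rather than using. A sketch under that assumption, combined with the pre-fill approach above (function name illustrative):

        import Toybox.Lang;   // "using Toybox.Lang;" is reportedly not enough for this syntax

        function makeTypedArray() as Array<Number> {
            // Allocate 3000 Number slots in one go, then fill immediately so the
            // array is only sized once.
            var typedArray = new Array<Number>[3000];
            for (var i = 0; i < 3000; i++) {
                typedArray[i] = 0;
            }
            return typedArray;
        }
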
  • I then tried an array of size 6,000, and the computational pause was about double (a 4-second watch face freeze), with 280 kB base and 500 kB peak, so approaching the limit for the watch.