First list of FIT decoding libraries and basic benchmarking

Hi,

Just wanted to share: I started putting together a list of libraries that decode FIT files, along with a simple benchmark of each, since performance is an issue for my own application.

Hopefully it can be useful to others as a reference. Feel free to add other libraries or suggest others to look at.

https://github.com/roznet/fit-benchmarks

  • Interesting, thanks for sharing!

    Would you happen to know what device generated the FIT files in your repo? I ended up building a somewhat naive parser from scratch in Rust for a command-line tool I'm developing, since I couldn't find what I needed when starting out. So far it works for most FIT files I've encountered, including all test files, many others, and my main data: large VIRB action camera FIT files for work (the largest is roughly 19MB, parsed in <1s on an old dual-core 13" MBP). However, both sample.fit and large.fit return UTF-8 errors. It could possibly be in a "device_settings" message (global id 2). The string begins with "Dark" at byte offset 1590 in large.fit. FitCSVTool also returns an invalid string for this position.

    If I return a valid dummy string for these cases, I only get to somewhere after offset 4799, where I get a byte parse error for a numerical value - this one I'll have to look into a bit more, though. I don't yet support compressed timestamp headers, for example, in case those are present.

    The FIT-parsing bit of my tool is unfortunately not available as a separate library at this point. It's isolated enough to become one, but I need to sharpen my skills a bit more before I can confidently release a library. Naively coded, no tests, not documented etc etc. :)

  • Hi,

    Interesting and a bit surprising.

    The FIT file was generated by my watch, a Fenix 6X. It was downloaded from Garmin Connect and had no issue there; it was also sync'd to Strava without issue, and I could open the file with multiple tools and the different libraries included in the benchmarks repo, which were all, afaik, independently developed...

    Is it possible it had some issue when you downloaded it? Maybe some kind of git hook? Do you have the same issue if you download the file independently, directly from GitHub: https://github.com/roznet/fit-benchmarks/blob/main/sample.fit

  • Thanks.

    I'm sure those libraries/tools you use have much better error mitigation than I have currently implemented. The string in question seems a bit odd, though, and from what I can tell my code is correct in reporting it as invalid UTF-8, but I'll inspect the string further. For large.fit, check line 27 in the csv generated by FitCSVTool. Look for a string starting with "Dark".

    No difference between the direct download link and downloading the whole repo via github's zipped link.

    For the other error I get, it's probably something I have yet to implement. For large.fit, it seems to occur just *after* a definition that corresponds to line 61 in large.csv. Anyway, I'll take a look.

    While the string still looks odd to me, since your files are from a Garmin product they are indeed a bit of a golden sample in terms of what kind of input data we can expect. :)

  • This string "Dark" seem to correspond to an undocumented field_num=167 reported of base_type 0x07=FIT_STRING (null separated string) and of field size 64. Actually just after the "Dark" it's a null, so you should ignore everything after that (definitely not valid utf). The libraries I use are skipping these 64 bytes all together because the field_num is not recognised/unknown as a field, but it's really just the string "Dark" I think.

  • Aha, that's good info, thanks! My parser probably tried to use the entire reported field length of 64 and THEN trim nulls, when the string should really have been truncated at the first encountered null. I seem to have misinterpreted the SDK docs. Yes, seems like it should just say "Dark" (whatever it means :)) - I see the null you mention.

    EDIT: Yes, that worked for now! Now for the other error.

  • I found the issue and your files now [seem] to parse fine, in full.

    I accidentally stored and overwrote the developer definitions in the same way one would store and overwrite the corresponding definition for a "local message type" for normal definitions. So I only ever had one "active" definition for developer data (a rough sketch of keyed storage that avoids this is below the table). This is a summary I get for large.fit:

     Global ID | Message type                 | Count
    ...................................................
            20 | record                       |  42709
            23 | device_info                  |     26
            21 | event                        |      7
            22 | UNDEFINED_MESSAGE_TYPE_22    |    149
           325 | UNDEFINED_MESSAGE_TYPE_325   |   1254
           327 | UNDEFINED_MESSAGE_TYPE_327   |      1
           104 | UNDEFINED_MESSAGE_TYPE_104   |    144
           206 | field_description            |      9
            19 | lap                          |     53
           216 | UNDEFINED_MESSAGE_TYPE_216   |     54
           140 | UNDEFINED_MESSAGE_TYPE_140   |      1
            49 | file_creator                 |      1
            34 | activity                     |      1
            12 | sport                        |      1
           141 | UNDEFINED_MESSAGE_TYPE_141   |      1
           113 | UNDEFINED_MESSAGE_TYPE_113   |      7
           160 | gps_metadata                 |  43247 *
           326 | UNDEFINED_MESSAGE_TYPE_326   |     74
            18 | session                      |      1
             3 | user_profile                 |      1
             7 | zones_target                 |      1
             2 | device_settings              |      1
           288 | UNDEFINED_MESSAGE_TYPE_288   |     22
            13 | UNDEFINED_MESSAGE_TYPE_13    |      1
           207 | developer_data_id            |      2
            79 | UNDEFINED_MESSAGE_TYPE_79    |      1
             0 | file_id                      |      1
           147 | UNDEFINED_MESSAGE_TYPE_147   |      8
    ...................................................
                                        Total:   87778 
    ---------------------------------------------------

    The "UNDEFINED_MESSAGE_TYPE..." are simply ones that do not have a corresponding definition in Profile.xlsx (or one I did not import, or it did not exist when I imported the descriptions into my code - done manually right now :D)

    Thank you for checking your data for me. Managed to [at least seemingly] solve two critical issues due to this thread and your data.
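
    For anyone hitting the same thing, the shape of the fix is basically to key the field_description messages (global 206) by developer data index + field definition number, rather than overwriting a single "current" developer definition. A rough sketch (in PHP for brevity; the variable names are mine, not from any SDK):

    // keep every developer field description, keyed by
    // developer_data_index + field_definition_number
    $devFieldDescriptions = array();

    // when a field_description message (global 206) is decoded:
    $key = $devDataIndex . ':' . $fieldDefNum;
    $devFieldDescriptions[$key] = $description;   // name, base type, units, ...

    // when a definition message later references a developer field,
    // look it up instead of relying on one "active" definition:
    $description = isset($devFieldDescriptions[$key]) ? $devFieldDescriptions[$key] : null;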

  • Performance tip for PHP devs (and maybe some other languages): the 25+ lines of code you just wrote to handle the signed base types 131 (sint16), 133 (sint32) and 142 (sint64), since they appear not to be handled natively, are completely unnecessary.

    First unpack them as unsigned with the appropriate little/big-endian native formats ('v'/'V'/'P' or 'n'/'N'/'J').

    Then the trick is to pack and unpack them again; tada, they are properly signed:

    $signed=array(131=>'sS',133=>'lL',142=>'qQ');  // define this BEFORE any loops
    if (isset($signed[$type])) {  // inside the loop
        // pack as unsigned (machine order), then unpack as signed: reinterprets the same bits
        $result=unpack($signed[$type][0],pack($signed[$type][1],$result))[1];
    }
    

    Tada, it's now properly signed, and I would be shocked if this isn't radically faster than oodles of long if/then code blocks and bit/byte shifting.

    Note that your local CPU/OS byte order doesn't matter for the second pack/unpack step; it will just work, since it's a matched pair.
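
    To make the two steps concrete, here's a tiny worked example for a sint16 stored little-endian (raw bytes 0xFE 0xFF, i.e. -2); the variable names are just for illustration:

    $raw = "\xFE\xFF";                        // two little-endian bytes from the file
    $u   = unpack('v', $raw)[1];              // step 1: unsigned little-endian -> 65534
    $s   = unpack('s', pack('S', $u))[1];     // step 2: matched pack/unpack pair -> -2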

  • Side-challenge for your benchmark: scanning all the records recursively for fastest known distances

    This is what your large.fit looks like in my current dev tool.

    It takes less than a second for my code to parse the FIT even on my "ancient" 3GHz machine in single-threaded CLI PHP, but another seven seconds to scan all those distances (rough sketch of the scan idea at the end of this post). Unfortunately your ultra is "boring" (j/k) in that all your fastest distances start at zero, since you were reserving energy for the TWELVE hour run? (wow)

    There are like 42,000 records in there so you had 1-second recording turned on.

    If you ever want to test your code/libraries on some interesting FIT files, I highly recommend the files that DCRAINMAKER publishes on his website when he tests various new watches; the ones in the mountains with the Fenix 6X Pro Solar, etc. are fascinating.

    I still have to finish the code to populate strings and completely handle dev global/local types. Rainmaker has some interesting FIT files with "enhanced" altitude, HRV and various other devices that inject data into the FIT, but those don't get attached directly to each record, so they have to be further parsed, which will slow things down.

    Eventually I want to take a shot at creating Strava-like segments. Scanning thousands of lat/lon points for similar matches seems like it would be tricky to do quickly; I suspect they use advanced math to see where routes would cross.

    sample.fit is a little more interesting, with fastest distances starting later in the run, but the entire parse and processing happens in less than a second on my machine, even with other processes going on (interestingly, there is no metmax in this file).
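
    For reference, the distance scan itself boils down to a sliding window over cumulative distance and elapsed time. Something in this spirit, just a sketch, with the records assumed pre-decoded into parallel arrays:

    // fastest elapsed time (s) over any stretch of at least $target metres;
    // $dist = cumulative distance per record, $time = timestamp per record
    function fastest_time_for_distance(array $dist, array $time, $target) {
        $best = null;
        $i = 0;
        $n = count($dist);
        for ($j = 0; $j < $n; $j++) {
            while ($dist[$j] - $dist[$i] >= $target) {   // window [i..j] covers the target
                $elapsed = $time[$j] - $time[$i];
                if ($best === null || $elapsed < $best) { $best = $elapsed; }
                $i++;                                    // try a tighter window
            }
        }
        return $best;   // null if the activity never covered $target metres
    }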

  • The altitude field is a uint16 and has enough range to go from Death Valley to the top of Everest. The enhanced_altitude field, on the other hand, is a uint32. The data size is the only difference; there is no enhancement of the values. Most devices write either altitude or enhanced_altitude to the file. The FIT SDK uses component expansion to copy the altitude field into the enhanced_altitude field, so you only need to look at the one field. If you use the FIT CSV Tool to convert a file that only contains the altitude field, the output csv file will show both the altitude field and the enhanced_altitude field, and the two values will be identical. The same goes for speed and enhanced_speed, and any other field prefixed with "enhanced_".
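
    So in practice a decoder can just prefer enhanced_altitude and fall back to altitude. A minimal sketch, assuming the record message's usual scale of 5 and offset of 500 from the profile, with $record holding the raw (unscaled) values already decoded from one record message:

    $raw = $record['enhanced_altitude'] ?? $record['altitude'] ?? null;    // whichever the device wrote
    $altitudeMetres = ($raw === null) ? null : $raw / 5 - 500;             // same scale/offset for both fields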