Corruption on 2nd Decode after an encode, C#

2015472 9 months ago

using System;
using System.Collections.Generic;
using System.IO;
using Dynastream.Fit;
namespace BugTest {
    class Program {
        static List<Mesg>? _allMessages = null;
        static void Main(string[] args) {
            for (int i = 0; i < 2; i++) {
                _allMessages = new List<Mesg> { };
                FileStream fitSource = new FileStream("test.fit", FileMode.Open);
                Decode decoder = new Decode();
                decoder.MesgEvent += OnMesgCustom;
                decoder.Read(fitSource);
                fitSource.Close();
                Console.WriteLine("Decoded pass " + (i + 1));

                Encode encoder = new Encode(ProtocolVersion.V20);
                FileStream fitDest = new FileStream("test-out.fit", FileMode.Create, FileAccess.ReadWrite, FileShare.Read);
                encoder.Open(fitDest);
                encoder.Write(_allMessages);
                encoder.Close();
                fitDest.Close();
                Console.WriteLine("Encoded pass " + (i + 1));
            }
        }

        private static void OnMesgCustom(object sender, MesgEventArgs e) {
            if (e.mesg != null) {
                _allMessages.Add(new Mesg(e.mesg));
            }
        }
    }
}

Limiting it to one pass produces a good test-out.fit. On the second pass, I get corrupt fields during decode (observed in the debugger right after the decode in this minimal code, or by a printing routine in an expanded version). The file being decoded has not been touched, and a new decoder is constructed for the second pass.

Specifically, the corrupt fields occur mostly, if not only, in Record messages. They appear to be fields that are unused in the first several Record messages (like latitude and longitude before GPS is working) but are used in later ones. On the second pass, however, these fields are produced even for the earlier messages. A couple are even produced twice (same Field.Num, with the second copy seeming to have two subfields, or maybe just multiple values; I forget which). These extraneous fields have maximum integer values such as 0x7FFFFFFF, 0xFFFFFFFF and 0xFFFF. The final encoded files cause errors in most reading tools, such as gpxsee ("undefined data message"). The bug does not occur when decoding twice and then encoding once.

I was able to work around this by adding a simple cleanup function, run after the decode, that searches _allMessages for fields with these bad values and removes them with Mesg.RemoveField(). The resulting messages all look like they did after the first-pass decode, and the encoded file is then fine. It's clear that these fields should not be there on a second decode pass. It's not clear why they produce "undefined data message" when encoded and read with gpxsee, or why removing them fixes that; they may be malformed in other ways.

This is with C# 12 and the .NET 8 SDK (8.0.something, the latest). This could be a factor, since the code was targeted at, and I guess originally tested with, a much older build system.


2015472 9 months ago

Using Fit.BaseType[].size and Fit.BaseType[].isSigned to construct the maximum values to compare against works perfectly for finding and removing all the incorrectly read fields. Of course, it also removes these values where they might be valid data, but I can live with that.
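A minimal sketch of that comparison, written as plain C# so it runs without the FIT SDK. It assumes the value to match is the largest one representable in the field's base type: all bits set for unsigned types (0xFF, 0xFFFF, 0xFFFFFFFF) and all bits except the sign bit for signed types (0x7F, 0x7FFF, 0x7FFFFFFF), which matches the values observed above. It ignores the FIT "z" base types (uint8z and friends), whose invalid value is 0. `RawField`, `MaxValueFor`, `IsAllMax` and `DropSentinelFields` are hypothetical names; in real code the size and signedness would come from `Fit.BaseType[...].size` and `.isSigned`, and removal would go through `Mesg.RemoveField()`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a decoded FIT field: field number, base-type
// size in bytes, signedness, and each value's raw bit pattern (zero-extended).
public record RawField(byte Num, int Size, bool IsSigned, ulong[] RawValues);

public static class SentinelFilter
{
    // Largest value for a base type of the given size: all bits set when
    // unsigned (0xFFFF for 2 bytes), all bits but the sign bit when signed
    // (0x7FFFFFFF for 4 bytes).
    public static ulong MaxValueFor(int size, bool isSigned)
    {
        if (size < 1 || size > 8)
            throw new ArgumentOutOfRangeException(nameof(size));
        int bits = size * 8;
        ulong allOnes = bits == 64 ? ulong.MaxValue : (1UL << bits) - 1;
        return isSigned ? allOnes >> 1 : allOnes;
    }

    // True when the field has values and every one equals the type's maximum.
    public static bool IsAllMax(RawField field)
    {
        ulong max = MaxValueFor(field.Size, field.IsSigned);
        return field.RawValues.Length > 0 && field.RawValues.All(v => v == max);
    }

    // Keeps only fields that are not all-max. Like the workaround described
    // above, this also drops fields that legitimately hold the maximum value.
    public static List<RawField> DropSentinelFields(IEnumerable<RawField> fields) =>
        fields.Where(f => !IsAllMax(f)).ToList();
}

public static class Program
{
    public static void Main()
    {
        var fields = new List<RawField>
        {
            new RawField(0, 4, true, new ulong[] { 0x7FFFFFFF }), // sint32 at max, like an unset latitude
            new RawField(3, 1, false, new ulong[] { 72 }),        // ordinary uint8 value
            new RawField(2, 2, false, new ulong[] { 0xFFFF }),    // uint16 at max
        };
        foreach (var f in SentinelFilter.DropSentinelFields(fields))
            Console.WriteLine($"kept field {f.Num}"); // prints "kept field 3"
    }
}
```

Matching on "every value is at max" rather than "any value" keeps multi-value fields that are only partly unset; whether that is the right call for the duplicated fields mentioned above is a judgment call.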

To reiterate, the corruption here is not in a file that I wrote. The file is only being read twice, and the corruption occurs on the second read. The expected behavior is that reading a file twice produces identical data. The bug also occurs when reading many different files (when encoding to other files in between); reading the same file twice is just more convincing evidence of the problem.
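One way to make the "identical data" expectation checkable is to dump each decode pass to text (for example one line per field, written from the `OnMesgCustom` handler) and compare the two dumps. The comparison step is sketched below in plain C#; `PassComparer`, `FirstDifference` and the "mesg#|field#|raw value" line format are hypothetical, and building the dump from real `Mesg` objects is left out.

```csharp
using System;
using System.Collections.Generic;

public static class PassComparer
{
    // Returns the index of the first line where the two dumps differ,
    // or -1 when they have the same length and identical lines.
    public static int FirstDifference(IReadOnlyList<string> first, IReadOnlyList<string> second)
    {
        int common = Math.Min(first.Count, second.Count);
        for (int i = 0; i < common; i++)
        {
            if (!string.Equals(first[i], second[i], StringComparison.Ordinal))
                return i;
        }
        // One dump is a prefix of the other: they differ where the shorter ends.
        return first.Count == second.Count ? -1 : common;
    }

    public static void Main()
    {
        // Hypothetical dumps of two passes over the same Record messages (global #20).
        var pass1 = new List<string> { "20|253|1000", "20|3|72" };
        var pass2 = new List<string> { "20|253|1000", "20|0|2147483647", "20|3|72" };
        int diff = FirstDifference(pass1, pass2);
        Console.WriteLine(diff < 0 ? "identical" : $"first difference at line {diff}"); // prints "first difference at line 1"
    }
}
```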

2015472 9 months ago

No thoughts on this? Did I do something dumb here? It's weird, because this would seem to require some static state in Decode(). A little rummaging through the decompiled output didn't show any obvious sign of it, but I didn't follow every level of inheritance and composition.