I have found that a hundred or so of my activities have ended up with duplicates (named "Untitled", with an activity type of "Uncategorized") since the start of the year. I've read the posts where activities with map data are mentioned, but these aren't like that; they differ only in minor details such as a few cadence or speed values, plus the device information, e.g.
303c303
< <Cadence>75</Cadence>
---
> <Cadence>76</Cadence>
321c321
< <Cadence>75</Cadence>
---
> <Cadence>76</Cadence>
1436,1438c1436,1438
< <Name>EDGE705 JHL SYSBLD Jul 12 2007 16:11:30 1.23</Name>
< <UnitId>0</UnitId>
< <ProductID>1</ProductID>
---
> <Name>Garmin Edge 705</Name>
> <UnitId>3415200069</UnitId>
> <ProductID>625</ProductID>
I don't really care at present how the duplication occurred. My problem is that detecting and deleting the duplicates is a pain, as I can't search for Untitled or Uncategorized in my activities list. Instead I've had to resort to screen-scraping and building some scripts (along with some manual massaging of data) to download all the activities and grep for activities listing the same time.
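For anyone in the same boat, the "grep for the same time" step can be sketched in Python along these lines. It is an illustration only, not the actual scripts: it assumes the activities have already been downloaded as TCX files into one local directory, and that each file records its start time in the <Id> element, as TCX exports normally do. The names start_time and find_duplicates are made up for the example.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: each TCX file holds the activity start time in an <Id> element.
ID_PATTERN = re.compile(r"<Id>\s*([^<]+?)\s*</Id>")


def start_time(tcx_text):
    """Return the text of the first <Id> element, or None if there is none."""
    match = ID_PATTERN.search(tcx_text)
    return match.group(1) if match else None


def find_duplicates(directory):
    """Map each start time to the files sharing it, keeping groups of 2 or more."""
    groups = defaultdict(list)
    for path in sorted(Path(directory).glob("*.tcx")):
        text = path.read_text(encoding="utf-8", errors="replace")
        stamp = start_time(text)
        if stamp is not None:
            groups[stamp].append(path.name)
    return {stamp: names for stamp, names in groups.items() if len(names) > 1}
```

Calling find_duplicates on the download directory returns a dictionary keyed by start time, so every entry lists a set of files that claim to be the same ride. Matching on start time alone sidesteps the small cadence and device differences shown in the diff above, but the duplicates still have to be deleted by hand on the site.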
GC staff appear to feel that allowing us to search for any activities (including Untitled ones) violates some special understanding they have of the way searching should work (and good riddance to decades of search tools that can show all data). In that case, perhaps they could expose some functionality to show which activities duplicate other ones, since they obviously can't (or don't care to?) prevent duplicates from being created. That way users could more easily correct problems caused by faulty GC software.
Even better, perhaps they would consider following the lead of Google's Data Liberation effort and allow a mass export of a user's data, so that we have a backup in case GC has a catastrophic failure, which is starting to feel more and more likely to me.