KeyError when trying to run the script #1

Closed
finishthepint opened this issue Jan 9, 2025 · 10 comments

Comments

@finishthepint

I'm trying to run the script on my Strava data export, but I'm running into the following error:
Traceback (most recent call last):
  File "/home/george/Downloads/strava-local-heatmap-tool-main/strava-local-heatmap-tool.py", line 760, in <module>
    activities_df = activities_import(
                    ^^^^^^^^^^^^^^^^^^
  File "/home/george/Downloads/strava-local-heatmap-tool-main/strava-local-heatmap-tool.py", line 336, in activities_import
    activities_coordinates_df = activities_coordinates_import(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/george/Downloads/strava-local-heatmap-tool-main/strava-local-heatmap-tool.py", line 198, in activities_coordinates_import
    activities_coordinates_df = activities_coordinates_df[activities_coordinates_df['latitude'].notna()]
                                                          ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/home/george/.local/lib/python3.11/site-packages/pandas/core/frame.py", line 4102, in __getitem__
    indexer = self.columns.get_loc(key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/george/.local/lib/python3.11/site-packages/pandas/core/indexes/range.py", line 417, in get_loc
    raise KeyError(key)
KeyError: 'latitude'

Owner

roboes commented Jan 9, 2025

Hi @finishthepint, thank you for reporting this issue.

It appears that one of the underlying packages used in this project, sweatpy, is no longer maintained and is no longer functioning as expected.

To resolve this, I’ve just released a major update to the project. I’ve replaced the sweatpy package with three custom functions: fit_file_parse(), gpx_file_parse() and tcx_file_parse(). These functions now handle the import and parsing of .fit/.gpx/.tcx files into a DataFrame.

Additionally, I've updated the documentation and added a new "Code Workflow Example" section, including example code to demonstrate the process.

I've tested the updated code with a recent Strava export using Python 3.13.1, and everything is working as expected.

roboes closed this as completed Jan 9, 2025

anisart commented Jan 24, 2025

It's reproducible on the latest commit too (Python 3.13.1, Windows):
Traceback (most recent call last):
  File "C:\Users\anisart\gh\strava-local-heatmap-tool\strava-local-heatmap-tool.py", line 610, in <module>
    activities_df, activities_coordinates_df = activities_import(
                                               ~~~~~~~~~~~~~~~~~^
        activities_directory=os.path.join(os.path.expanduser('~'), 'Downloads', 'strava_export_4753179', 'activities'),
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        activities_file=os.path.join(os.path.expanduser('~'), 'Downloads', 'strava_export_4753179', 'activities.csv'),
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        skip_geolocation=True,
        ^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "C:\Users\anisart\gh\strava-local-heatmap-tool\strava-local-heatmap-tool.py", line 314, in activities_import
    activities_coordinates_df = activities_coordinates_import(activities_directory=activities_directory)
  File "C:\Users\anisart\gh\strava-local-heatmap-tool\strava-local-heatmap-tool.py", line 196, in activities_coordinates_import
    activities_coordinates_df = activities_coordinates_df[activities_coordinates_df['latitude'].notna()]
                                                          ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "C:\Users\anisart\python-strava-env\Lib\site-packages\pandas\core\frame.py", line 4102, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\anisart\python-strava-env\Lib\site-packages\pandas\core\indexes\range.py", line 417, in get_loc
    raise KeyError(key)
KeyError: 'latitude'

PS: It seems there is a typo in the year on line 2 of the script: # Last update: 2024-01-10

Owner

roboes commented Jan 24, 2025

Hi @anisart,

Thank you for reporting this! It seems the error is due to the activities_coordinates_df DataFrame missing the latitude column. This typically happens if none of your activities include GPS data (latitude/longitude), or if the parsing functions - fit_file_parse(), gpx_file_parse(), or tcx_file_parse() - did not extract the GPS data correctly.

To troubleshoot, I suggest running these parsing functions independently on your activity files (depending on their format) to verify if they are correctly extracting latitude and longitude.

I've also made some adjustments to the code to better handle the scenario where no latitude/longitude is available. Additionally, thanks for pointing out the typo in the year!
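
As an illustration of the kind of guard described above, here is a minimal sketch. It is not the project's actual code: the function name coordinates_filter and the sample data are hypothetical, and it assumes the coordinate columns are called latitude and longitude as in the traceback.

```python
import pandas as pd


def coordinates_filter(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows that have both latitude and longitude (hypothetical helper)."""
    required_columns = ['latitude', 'longitude']

    # If no activity file produced GPS columns, return an empty frame
    # instead of letting df['latitude'] raise KeyError
    if not set(required_columns).issubset(df.columns):
        return pd.DataFrame(columns=required_columns)

    return df[df['latitude'].notna() & df['longitude'].notna()]


# Indoor activities only: the GPS columns are missing entirely
indoor_df = pd.DataFrame(data={'datetime': ['2025-01-24 10:00:00'], 'heart_rate': [120]})
print(coordinates_filter(indoor_df).empty)

# Mixed data: one row with coordinates, one row without
mixed_df = pd.DataFrame(data={'latitude': [52.52, None], 'longitude': [13.40, None]})
print(len(coordinates_filter(mixed_df)))
```

With a guard like this, the caller can check for an empty result and report that no GPS data was found, rather than failing with KeyError: 'latitude'.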


anisart commented Jan 24, 2025

Thank you for the quick reply! The latest change didn't help either. The exception just turned into an error about no activities having GPS data.

After some debugging, I found that the code stops at fit_file_parse():

KeyError                                  Traceback (most recent call last)
Cell In[34], line 1
----> 1 df = fit_file_parse(file_path=activities_files[0])

File ~\gh\strava-local-heatmap-tool\strava-local-heatmap-tool.py:102, in fit_file_parse(file_path)
     98     pass
    100 df = pd.DataFrame(data=parsed_data, index=None, dtype=None)
--> 102 print(f'fit_file_parse datetime type: {df["datetime"].dtype}')
    103 return df

KeyError: '__builtins__' 

Fixed by removing line 13 - globals().clear()
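
A small standard-library sketch of what clearing a globals dictionary does. It uses a separate dictionary (namespace, a hypothetical stand-in for the script's module globals) so that it is safe to run, and it only shows that the '__builtins__' entry disappears; it does not claim to reproduce the exact code path behind the KeyError above.

```python
# A fresh namespace, standing in for a script's module globals
namespace = {}

# exec() inserts a '__builtins__' entry into the globals dict it is given
exec("answer = 42", namespace)
had_builtins_before = '__builtins__' in namespace

# Same effect as calling globals().clear() inside that script
namespace.clear()
has_builtins_after = '__builtins__' in namespace

print(had_builtins_before, has_builtins_after)
```

Any code that later looks up '__builtins__' in that dictionary will no longer find it, which is consistent with the error going away once the globals().clear() line is removed.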

gpx_file_parse() works fine, and now the code stops at tcx_file_parse():

KeyError                                  Traceback (most recent call last)
File ~\python-strava-env\Lib\site-packages\pandas\core\indexes\base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:

File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas\\_libs\\hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas\\_libs\\hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'filename'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[56], line 1
----> 1 activities_df, activities_coordinates_df = activities_import(activities_directory=activities_directory, activities_file=activities_file, skip_geolocation=True)

# skipped many lines

File ~\python-strava-env\Lib\site-packages\pandas\core\indexes\base.py:3812, in Index.get_loc(self, key)
   3807     if isinstance(casted_key, slice) or (
   3808         isinstance(casted_key, abc.Iterable)
   3809         and any(isinstance(x, slice) for x in casted_key)
   3810     ):
   3811         raise InvalidIndexError(key)
-> 3812     raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     #  InvalidIndexError. Otherwise we fall through and re-raise
   3816     #  the TypeError.
   3817     self._check_indexing_error(key)

KeyError: 'filename'

Debugging to be continued...


anisart commented Jan 24, 2025

tcx_file_parse() needs a check for coordinates. It breaks on trainer activities, which have no coordinates.
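
A rough sketch of such a check, using only the standard library. This is not the project's tcx_file_parse(): the function name tcx_trackpoints_parse and the sample data are hypothetical, and it assumes the standard Garmin TrainingCenterDatabase v2 namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by TCX files (Garmin TrainingCenterDatabase v2)
TCX_NS = {'tcx': 'http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2'}


def tcx_trackpoints_parse(tcx_text):
    """Return one dict per trackpoint that has coordinates (hypothetical helper)."""
    root = ET.fromstring(tcx_text.strip())
    rows = []
    for trackpoint in root.iterfind('.//tcx:Trackpoint', TCX_NS):
        latitude = trackpoint.findtext('tcx:Position/tcx:LatitudeDegrees', namespaces=TCX_NS)
        longitude = trackpoint.findtext('tcx:Position/tcx:LongitudeDegrees', namespaces=TCX_NS)

        # Trainer (indoor) trackpoints have no <Position> element: skip them
        if latitude is None or longitude is None:
            continue

        rows.append(
            {
                'datetime': trackpoint.findtext('tcx:Time', namespaces=TCX_NS),
                'latitude': float(latitude),
                'longitude': float(longitude),
            }
        )
    return rows


SAMPLE_TCX = """
<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
  <Activities><Activity Sport="Biking"><Lap><Track>
    <Trackpoint>
      <Time>2025-01-24T10:00:00Z</Time>
      <Position><LatitudeDegrees>52.52</LatitudeDegrees><LongitudeDegrees>13.40</LongitudeDegrees></Position>
    </Trackpoint>
    <Trackpoint>
      <Time>2025-01-24T10:00:01Z</Time>
      <HeartRateBpm><Value>120</Value></HeartRateBpm>
    </Trackpoint>
  </Track></Lap></Activity></Activities>
</TrainingCenterDatabase>
"""

rows = tcx_trackpoints_parse(SAMPLE_TCX)
print(rows)
```

A file recorded entirely on a trainer simply yields an empty list, which the caller can handle without raising.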

Owner

roboes commented Jan 25, 2025

Hi @anisart,

Thank you for the detailed feedback!

I have updated the code once again, merging the previously created fit_file_parse(), gpx_file_parse(), and tcx_file_parse() functions into a single function: activity_file_parse(). Some checks are now performed before parsing these files.

Please update the code on your side accordingly. Additionally, please note that there is now a new Python package dependency: pyjanitor.

While importing .fit files, you might encounter a deprecation warning related to datetime.datetime.utcfromtimestamp(). I have submitted a pull request to the fitparse repository to address this by switching to datetime.datetime.fromtimestamp().
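
For reference, a minimal sketch of the timezone-aware replacement that Python's deprecation notice recommends. The helper name utc_from_timestamp is hypothetical, and this is not necessarily what the fitparse pull request does.

```python
from datetime import datetime, timezone


def utc_from_timestamp(timestamp):
    """Timezone-aware replacement for datetime.utcfromtimestamp() (hypothetical helper)."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


timestamp = 1737712800  # seconds since the Unix epoch

aware_utc = utc_from_timestamp(timestamp)

# Drop tzinfo only if downstream code expects naive UTC datetimes,
# which is what utcfromtimestamp() used to return
naive_utc = aware_utc.replace(tzinfo=None)

print(aware_utc.isoformat())
print(naive_utc.isoformat())
```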

Let me know if you need any further changes or clarifications.


anisart commented Jan 28, 2025

I've finally created a heatmap. It took quite a long time for 2020 activities and consumed 2 GB of RAM at peak.

You forgot to add 'import janitor' to strava-local-heatmap-tool.py. Also, requirements.txt needs to be updated with the latest changes.

One of the difficulties was that I wanted to process the old archive (2019) first, but Strava keeps changing the structure of activities.csv. Newer archives use localized values for fields, types and dates (Russian in my case). I copied the English header from another activities.csv (with English locale) and used the dateparser module to parse the dates.

It might be useful to include this in the readme - switch Strava interface language to English before requesting an archive.

Owner

roboes commented Jan 28, 2025

You forgot to add 'import janitor' to strava-local-heatmap-tool.py. Also, requirements.txt needs to be updated with the latest changes.

It seems that one of my .pre-commit-config.yaml routines automatically removed janitor from both strava-local-heatmap-tool.py and requirements.txt, which is why it was missing. Thank you for pointing that out. I've made a small update to the code and now import the clean_names() function using from janitor import clean_names, which fixes the issue.

It might be useful to include this in the readme - switch Strava interface language to English before requesting an archive.

The language requirement (English (US)) was already stated in the readme file.

You can avoid geolocation retrieval for activities in the activities_import() function by setting skip_geolocation to True to make the import faster, if you don't require any special location filter in the activities_filter() function. I have changed the default value to True in the latest code update.


anisart commented Jan 31, 2025

The language requirement (English (US)) was already stated in the readme file.

Sorry, I missed it.

You can avoid geolocation retrieval for activities in the activities_import() function by setting skip_geolocation to True to make the import faster, if you don't require any special location filter in the activities_filter() function.

I set it to True manually. Maybe it can be processed in multi-thread...

Anyway thank you for this tool!

Owner

roboes commented Jan 31, 2025

Hi @anisart,

Sorry, I missed it.

No worries.

I set it to True manually. Maybe it can be processed in multi-thread...

In the readme file I explained why geolocation retrieval (that is, running with skip_geolocation set to False) is slow:

Note that geolocation retrieval relies on the public Nominatim instance (nominatim.openstreetmap.org), which may slow down the import process for exports containing a large number of activities (with "an absolute maximum of 1 request per second").

The code that enforces this limit is: reverse = RateLimiter(func=geolocator.reverse, min_delay_seconds=1). This means that multi-threading won't speed up the process. However, you could experiment with removing the rate limiter or switching to another geopy.geocoders provider. Most alternatives, such as the Google Maps Geocoding API, require payment.
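
To illustrate why extra threads do not help, the sketch below uses a small stand-in limiter written with the standard library only. MinDelayLimiter and fake_reverse are hypothetical names and this is not geopy's implementation; it merely mimics the min_delay_seconds behaviour, with a shortened delay so the demo finishes quickly.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class MinDelayLimiter:
    """Stand-in limiter: calls may start at most once per min_delay_seconds."""

    def __init__(self, func, min_delay_seconds):
        self.func = func
        self.min_delay_seconds = min_delay_seconds
        self._lock = threading.Lock()
        self._last_call = None

    def __call__(self, *args, **kwargs):
        # The lock is shared by all threads, so calls are released one at a time
        with self._lock:
            if self._last_call is not None:
                wait = self.min_delay_seconds - (time.monotonic() - self._last_call)
                if wait > 0:
                    time.sleep(wait)
            self._last_call = time.monotonic()
        return self.func(*args, **kwargs)


def fake_reverse(query):
    # Placeholder for a geocoder call; no network access involved
    return f'address for {query}'


delay = 0.05  # stands in for the 1 second required by the public Nominatim instance
limited_reverse = MinDelayLimiter(func=fake_reverse, min_delay_seconds=delay)

start = time.monotonic()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(limited_reverse, ['a', 'b', 'c', 'd']))
elapsed = time.monotonic() - start

print(results)
print(f'{elapsed:.2f}s for {len(results)} calls on 4 threads')
```

Even with four worker threads, the four calls take at least three delay intervals in total, because each call has to wait for the shared limiter.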

Anyway thank you for this tool!

Thanks for your feedback - I'm glad you found it useful!
