Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: 810 feature add huggingface hubutilstelemetry #916

Conversation

davidberenstein1957
Copy link
Member

@davidberenstein1957 davidberenstein1957 commented Aug 20, 2024

Closes #810

  • add documentation
  • capture additional exceptions ( during run method of pipeline for example? @plaguss )

Copy link

Documentation for this PR has been built. You can view it at: https://distilabel.argilla.io/pr-916/

Copy link

codspeed-hq bot commented Aug 20, 2024

CodSpeed Performance Report

Merging #916 will not alter performance

Comparing feat/810-feature-add-huggingface_hubutilstelemetry (79bdd4c) with develop (46d55ed)

Summary

✅ 1 untouched benchmarks

@davidberenstein1957 davidberenstein1957 marked this pull request as ready for review August 20, 2024 12:30
Copy link
Contributor

@plaguss plaguss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding the calls to add_step and add_edge, we could send the information of pipeline.dump() on BasePipeline.run so we have all the relevant information we already use for serialization/caching? It would be a single call containing all the information from the pipeline.

I think it could be useful tracking the exceptions raised from the pipeline as you mention, maybe in the _StepWrapper would be easier, as that's the container in charge of running the steps on each process.

src/distilabel/utils/telemetry.py Outdated Show resolved Hide resolved
tests/integration/test_telemetry.py Outdated Show resolved Hide resolved
tests/integration/test_telemetry.py Outdated Show resolved Hide resolved
@plaguss plaguss requested a review from gabrielmbmb August 20, 2024 13:32
@davidberenstein1957
Copy link
Member Author

@plaguss

Logging on dump: The pipeline dump would be an option but the underlying way to write to HF telemetry would be less convenient I think, also because we would eventually still need to write custom code to transform and get the info in the right way.

Logging exceptions: What do you see as interesting usage exceptions? I assumed it would be more high-level usage, like DAG-validation, param passing errors etc. instead of the low-level exceptions within the StepsWrapper.

@plaguss
Copy link
Contributor

plaguss commented Aug 20, 2024

Not sure how are we going to access the data afterwards to generate the data, I guess it depends, but accessing the dump of the pipeline and cleaning the content we don't want from there I think it's easier, at least from the point of view of distilabel, not sure if this is the best way of accessing tracking the telemetry data.

For the exceptions I'm really not sure of what should we track, I guess everything that it's a user error, so you are right not tracking at the _StepWrapper level.

@davidberenstein1957
Copy link
Member Author

@plaguss, I will do the pipeline thingies based on the dump and make individual calls from the telemetry clients instead.

@gabrielmbmb any specific exceptions you would like to see captured? I capture the RuntimeErrors but perhaps some users errors would be interesting, not sure what we can capture cleanly without filling the code up with try-excepts.

Also, we can start with this and consider adding more at a later stage, in case we feel miss anything.

@davidberenstein1957
Copy link
Member Author

davidberenstein1957 commented Aug 22, 2024

@plaguss, I will do the pipeline thingies based on the dump and make individual calls from the telemetry clients instead.

@gabrielmbmb any specific exceptions you would like to see captured? I capture the RuntimeErrors but perhaps some users errors would be interesting, not sure what we can capture cleanly without filling the code up with try-excepts.

Also, we can start with this and consider adding more at a later stage, in case we feel miss anything.

@plaguss, I updated the code. Decided to leave the batching within the run because I think it is good to know how much data is being generated too and we lost some tracking differentiation for generator, normal and global steps but if we need it wee would be able to back-track.

@davidberenstein1957 davidberenstein1957 changed the title Feat/810 feature add huggingface hubutilstelemetry feat: 810 feature add huggingface hubutilstelemetry Aug 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants