feat: 810 feature add huggingface hubutilstelemetry #916
Conversation
Documentation for this PR has been built. You can view it at: https://distilabel.argilla.io/pr-916/
CodSpeed Performance Report: Merging #916 will not alter performance.
Instead of adding the calls to add_step and add_edge, could we send the information from pipeline.dump() on BasePipeline.run? That way we would rely on the same relevant information we already use for serialization/caching, and it would be a single call containing all the information from the pipeline.
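A minimal sketch of that idea, assuming the telemetry is sent through huggingface_hub's send_telemetry helper; the topic name and the keys picked from the dump are illustrative assumptions, not the PR's actual code:

```python
# Sketch only: report the serialized pipeline once from BasePipeline.run,
# instead of calling the telemetry client on every add_step / add_edge.
from huggingface_hub.utils import send_telemetry


def track_pipeline_run(pipeline) -> None:
    dump = pipeline.dump()  # the same payload already used for serialization/caching
    # The key path below is an assumption about the dump layout; in practice the
    # dump would be trimmed to the fields we actually want to report.
    payload = {"num_steps": len(dump.get("pipeline", {}).get("steps", []))}
    # send_telemetry fails silently and respects the user's telemetry opt-out.
    send_telemetry(topic="distilabel/pipeline", user_agent=payload)
```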
I think it could be useful to track the exceptions raised from the pipeline, as you mention. Maybe it would be easier in the _StepWrapper, as that's the container in charge of running the steps in each process.
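If exceptions were tracked there, it could look roughly like the following; the wrapper class and the payload fields are hypothetical stand-ins for the real _StepWrapper:

```python
# Illustrative only: wrap the step execution and report the exception type.
from huggingface_hub.utils import send_telemetry


class StepWrapperSketch:
    """Stand-in for _StepWrapper, which runs a step inside its own process."""

    def __init__(self, step):
        self.step = step

    def run(self):
        try:
            return self.step.process()  # placeholder for the real execution loop
        except Exception as exc:
            # Only the step class and the exception type are reported, never user data.
            send_telemetry(
                topic="distilabel/exception",
                user_agent={
                    "step": type(self.step).__name__,
                    "exception": type(exc).__name__,
                },
            )
            raise
```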
Co-authored-by: Agus <[email protected]>
Logging on dump: the pipeline dump would be an option, but I think the underlying way to write to HF telemetry would be less convenient, also because we would eventually still need custom code to transform the dump and extract the info in the right shape.
Logging exceptions: what do you see as interesting usage exceptions? I assumed it would be more high-level usage, like DAG validation or parameter-passing errors, instead of the low-level exceptions within the _StepWrapper.
I'm not sure how we are going to access the data afterwards, I guess it depends, but accessing the dump of the pipeline and cleaning out the content we don't want seems easier, at least from that point of view. For the exceptions I'm really not sure what we should track; I guess everything that is a user error, so you are right not to track them at the _StepWrapper level.
@plaguss, I will do the pipeline tracking based on the dump and make individual calls from the telemetry clients instead. @gabrielmbmb, are there any specific exceptions you would like to see captured? I capture the RuntimeErrors, but perhaps some user errors would be interesting too; I'm not sure what we can capture cleanly without filling the code up with try-excepts. We can also start with this and consider adding more at a later stage, in case we feel we miss anything.
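One way to capture those RuntimeErrors in a single place, rather than sprinkling try-excepts, would be a thin wrapper around the run call; the helper below is a hedged sketch, not the code in this PR:

```python
# Sketch: a single try/except around the whole run instead of many scattered ones.
from huggingface_hub.utils import send_telemetry


def run_with_error_tracking(pipeline, **kwargs):
    try:
        return pipeline.run(**kwargs)
    except RuntimeError as exc:
        # Report only the exception type before re-raising.
        send_telemetry(
            topic="distilabel/exception",
            user_agent={"exception": type(exc).__name__},
        )
        raise
```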
@plaguss, I updated the code. I decided to leave the batching within the run because I think it is also good to know how much data is being generated. We lost some tracking differentiation between generator, normal and global steps, but if we need it we would be able to back-track.
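The batch-level tracking could be as simple as aggregating counts and reporting them in one telemetry call at the end of the run; the function and its input are assumptions about how batches are exposed, not the PR's implementation:

```python
# Sketch: aggregate how much data a run produced and report it in one call.
from huggingface_hub.utils import send_telemetry


def track_generated_data(batches) -> None:
    """`batches` is assumed to be an iterable of lists of rows."""
    num_batches = 0
    num_rows = 0
    for batch in batches:
        num_batches += 1
        num_rows += len(batch)
    send_telemetry(
        topic="distilabel/run",
        user_agent={"batches": num_batches, "rows": num_rows},
    )
```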
Closes #810
run method of pipeline, for example? @plaguss)