[FEATURE] Include new GeneratorStep classes to load datasets from different formats (#691)

* Add a GeneratorStep to read files from disk as datasets

* Add tests for the new LoadFromDisk loader

* Refactor generator step classes to new naming

* Add deprecation warnings for previous loaders

* Add assertion to remind removing the deprecated classes

* Add docstrings for the new steps

* Apply comments from code review and update dataset info read using exposed function from datasets

* Fix dataloader tests with new class names

* Fix import tests
plaguss authored Jun 7, 2024
1 parent 20aa24e commit 34ac772
Showing 6 changed files with 446 additions and 62 deletions.
10 changes: 9 additions & 1 deletion src/distilabel/steps/__init__.py
@@ -29,7 +29,12 @@
     FormatTextGenerationSFT,
 )
 from distilabel.steps.generators.data import LoadDataFromDicts
-from distilabel.steps.generators.huggingface import LoadHubDataset
+from distilabel.steps.generators.huggingface import (
+    LoadDataFromDisk,
+    LoadDataFromFileSystem,
+    LoadDataFromHub,
+    LoadHubDataset,
+)
 from distilabel.steps.globals.huggingface import PushToHub
 from distilabel.steps.keep import KeepColumns
 from distilabel.steps.typing import GeneratorStepOutput, StepOutput
@@ -49,6 +54,9 @@
     "GlobalStep",
     "KeepColumns",
     "LoadDataFromDicts",
+    "LoadDataFromDisk",
+    "LoadDataFromFileSystem",
+    "LoadDataFromHub",
     "LoadHubDataset",
     "PushToHub",
     "Step",
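
The diff keeps `LoadHubDataset` exported next to the new `LoadDataFromHub`, which matches the commit message's note about deprecation warnings for the previous loaders. How the warning is actually wired up is not visible in this file, so the following is only a minimal sketch of one common pattern: the old name becomes a thin subclass that warns on instantiation. The two classes are stand-ins defined here, not the real distilabel classes, and the `repo_id` parameter and warning text are assumptions made for illustration.

```python
import warnings


class LoadDataFromHub:
    """Stand-in for the renamed step (not the real distilabel class)."""

    def __init__(self, repo_id: str) -> None:
        # `repo_id` is a hypothetical parameter used only for illustration.
        self.repo_id = repo_id


class LoadHubDataset(LoadDataFromHub):
    """Old name kept as a thin subclass that warns when instantiated."""

    def __init__(self, repo_id: str) -> None:
        warnings.warn(
            "`LoadHubDataset` is deprecated, use `LoadDataFromHub` instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this method
        )
        super().__init__(repo_id)


# Capture the warning explicitly so the behaviour is visible.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    step = LoadHubDataset(repo_id="user/dataset")

print(type(step).__name__, isinstance(step, LoadDataFromHub))
print(caught[0].category.__name__)
```

With this pattern, code that still uses the old name keeps working and `isinstance` checks against the new class still pass, which is one reason to prefer subclassing over a plain alias.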