1.3.0
What's Changed
- Add new step `CombineKeys` by @plaguss in #747
- Refactor naming columns steps combinecolumns combinekeys expandcolumns by @davidberenstein1957 in #758
- Drop remove deprecated `LoadHubDataset` by @davidberenstein1957 in #759
- Add `requirements` list for `Pipeline` by @plaguss in #720
- Add `StepResources` and step replicas in `Pipeline` by @gabrielmbmb in #750
- Add load stages by @gabrielmbmb in #760
- Update min required version to `python==3.9` by @gabrielmbmb in #770
- Optionally include the pipeline script in the hub when pushing your distiset by @plaguss in #762
- Add `docs-pr.yml` and `docs-pr-close.yml` workflows by @gabrielmbmb in #774
- Add `RayPipeline` class by @gabrielmbmb in #769
- Fixed closed PR workflow by @gabrielmbmb in #776
- Add `Magpie` and `MagpieGenerator` tasks by @gabrielmbmb in #778
- Fix some issues related to `Magpie` task by @gabrielmbmb in #783
- Add `end_with_user` and `include_system_prompt` flags to `Magpie` tasks and handle `None`s. by @gabrielmbmb in #784
- Add workflow concurrency group for publishing docs by @gabrielmbmb in #796
- Add `_desired_num_gpus` attribute to `CudaDevicePlacementMixin` by @gabrielmbmb in #795
- Compatibility with `vLLM` with `tensor_parallel_size` argument by @gabrielmbmb in #805
- Update default names in `GroupColumns` by @plaguss in #808
- Request batches to `GeneratorStep` if only step in pipeline by @gabrielmbmb in #828
- Add default name for a pipeline by @plaguss in #809
- Update distilabel phrasing based on PR hugging face hub by @davidberenstein1957 in #821
- Some more `Magpie` improvements by @gabrielmbmb in #833
- Add `Embeddings` base class, `SentenceTransformerEmbeddings` class, `EmbeddingGeneration` and `FaissNearestNeighbour` steps by @gabrielmbmb in #830
- Create file per hostname in `CudaDevicePlacementMixin` by @gabrielmbmb in #814
- Create a `GeneratorStep` from a dataset using a helper function by @plaguss in #812
- Do not take into account `disable_cuda_device_placement` for pipeline signature by @gabrielmbmb in #838
- Add `RewardModelScore` step by @gabrielmbmb in #840
- Fix `LoadDataFromHub` attribute `_dataset` had `ellipsis` by default instead of `None` by @gabrielmbmb in #841
- Create `PlacementGroup` for steps using `vLLM` by @gabrielmbmb in #842
- Update `argilla` integration to use `argilla_sdk` v2 by @alvarobartt in #705
- Make `overall-rating` the default aspect for `UltraFeedback` task by @gabrielmbmb in #843
- fix typo index.md by @franperic in #844
- Use `CudaDevicePlacementMixin` in `RewardModelScore` step by @gabrielmbmb in #845
- Gather GPUs per Ray node to create placement groups by @gabrielmbmb in #848
- Fix typo in docs by @plaguss in #850
- Add `xfail` routing batch function tests by @gabrielmbmb in #852
- Fix creating placement group when `pipeline_parallel_size>1` by @gabrielmbmb in #851
- docs: 846 docs include google analytics by @davidberenstein1957 in #847
- Add `ClientvLLM` class by @gabrielmbmb in #854
- Add hard-negative flag to include similar challenging negatives on triplets by @plaguss in #856
- Add bibtex references in the docstrings to be shown in the README by @plaguss in #855
- distilabel `1.3.0` by @gabrielmbmb in #857
New Contributors
- @franperic made their first contribution in #844
Full Changelog: 1.2.4...1.3.0