-
Notifications
You must be signed in to change notification settings - Fork 145
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Update docs document phrasing and funnel (#718)
* Update mkdocs version * Update align documentation with argilla SDK 2.0 * Updated naming of basices Moved CLI to advanced * Delete unneeded index pages * Update naming * Update navigation and content edit * Update naming of How to guides * Add popular issue and community page * Update GITHUB_ACCESS_TOKEN to GH_ACCESS_TOKEN due to protected naming * Update scoped reqs for token * Add GH_ACCESS_TOKEN to workflow * Delete literate nav * Update jinja templates to hide unrendered navigation * Update navigation orderin for API reference * Update docs/sections/how_to_guides/advanced/structured_generation.md Co-authored-by: Alvaro Bartolome <[email protected]> * docs: prose in guides (#721) * docs: make argilla prose talk about argilla * docs: simplify prose in generator and global steps * Update LLM page * Update LLM docs * Update Pipeline docs * Avoid using "function" * Update Step documentation * Update docs/sections/how_to_guides/basic/step/index.md Co-authored-by: Alvaro Bartolome <[email protected]> * Update docs/sections/how_to_guides/basic/step/index.md Co-authored-by: Alvaro Bartolome <[email protected]> * Update `Task` page * Update definiton of `GeneratorTask` * Update Step documentation * Update advanced documentation --------- Co-authored-by: davidberenstein1957 <[email protected]> Co-authored-by: Agus <[email protected]> Co-authored-by: Alvaro Bartolome <[email protected]> * Make `GH_ACCESS_TOKEN` optional * Add `pandas>=2.0` to `docs` * Fix typo * Update default signature GeneratorStep * Update missing `mkdocs_autorefs` within API reference * Update API page * Update CHATML_TEMPLATE formatting to avoid autodoc issues * Add reference to token scopes required --------- Co-authored-by: Alvaro Bartolome <[email protected]> Co-authored-by: burtenshaw <[email protected]> Co-authored-by: Agus <[email protected]> Co-authored-by: Gabriel Martín Blázquez <[email protected]>
- Loading branch information
1 parent
2f245c6
commit ee573fb
Showing
62 changed files
with
1,063 additions
and
859 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# Distiset | ||
|
||
This section contains the API reference for the Distiset. For more information on how to use the CLI, see [Tutorial - CLI](../sections/how_to_guides/advanced/distiset.md). | ||
|
||
:::distilabel.distiset.Distiset | ||
:::distilabel.distiset.create_distiset |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# CohereLLM | ||
|
||
::: distilabel.llms.cohere |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
# Pipeline | ||
|
||
This section contains the API reference for the `distilabel` pipelines. For an example on how to use the pipelines, see the [Tutorial - Pipeline](../../sections/learn/tutorial/pipeline/index.md). | ||
This section contains the API reference for the `distilabel` pipelines. For an example on how to use the pipelines, see the [Tutorial - Pipeline](../../sections/how_to_guides/basic/pipeline/index.md). | ||
|
||
::: distilabel.pipeline.base | ||
::: distilabel.pipeline.local |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
# Extra | ||
|
||
::: distilabel.steps.generators.data | ||
::: distilabel.steps.deita | ||
::: distilabel.steps.formatting | ||
::: distilabel.steps.typing |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Hugging Face | ||
|
||
This section contains the existing steps integrated with `Hugging Face` so as to easily push the generated datasets to Hugging Face. | ||
|
||
::: distilabel.steps.LoadDataFromDisk | ||
::: distilabel.steps.LoadDataFromFileSystem | ||
::: distilabel.steps.LoadDataFromHub |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
# Task Typing | ||
|
||
This section contains typing classes implemented in distilabel. | ||
|
||
::: distilabel.steps.tasks.typing.ChatType | ||
options: | ||
members: | ||
- _ChatType | ||
- ChatType | ||
::: distilabel.steps.tasks.structured_outputs.outlines.StructuredOutputType | ||
::: distilabel.steps.tasks.structured_outputs.instructor.InstructorStructuredOutputType |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File renamed without changes
File renamed without changes
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,180 @@ | ||
# Copyright 2023-present, Argilla, Inc. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
import os | ||
from datetime import datetime | ||
from typing import List, Union | ||
|
||
import pandas as pd | ||
import requests | ||
import mkdocs_gen_files | ||
|
||
|
||
REPOSITORY = "argilla-io/distilabel" | ||
DATA_PATH = "sections/community/popular_issues.md" | ||
|
||
GITHUB_ACCESS_TOKEN = os.getenv( | ||
"GH_ACCESS_TOKEN" | ||
) # public_repo and read:org scopes are required | ||
|
||
|
||
def fetch_issues_from_github_repository( | ||
repository: str, auth_token: Union[str, None] = None | ||
) -> pd.DataFrame: | ||
if auth_token is None: | ||
return pd.DataFrame( | ||
{ | ||
"Issue": [], | ||
"State": [], | ||
"Created at": [], | ||
"Closed at": [], | ||
"Last update": [], | ||
"Labels": [], | ||
"Milestone": [], | ||
"Reactions": [], | ||
"Comments": [], | ||
"URL": [], | ||
"Repository": [], | ||
"Author": [], | ||
} | ||
) | ||
|
||
headers = { | ||
"Authorization": f"token {auth_token}", | ||
"Accept": "application/vnd.github.v3+json", | ||
} | ||
issues_data = [] | ||
|
||
print(f"Fetching issues from '{repository}'...") | ||
with requests.Session() as session: | ||
session.headers.update(headers) | ||
|
||
owner, repo_name = repository.split("/") | ||
issues_url = ( | ||
f"https://api.github.com/repos/{owner}/{repo_name}/issues?state=all" | ||
) | ||
|
||
while issues_url: | ||
response = session.get(issues_url) | ||
issues = response.json() | ||
|
||
for issue in issues: | ||
issues_data.append( | ||
{ | ||
"Issue": f"{issue['number']} - {issue['title']}", | ||
"State": issue["state"], | ||
"Created at": issue["created_at"], | ||
"Closed at": issue.get("closed_at", None), | ||
"Last update": issue["updated_at"], | ||
"Labels": [label["name"] for label in issue["labels"]], | ||
"Milestone": (issue.get("milestone") or {}).get("title"), | ||
"Reactions": issue["reactions"]["total_count"], | ||
"Comments": issue["comments"], | ||
"URL": issue["html_url"], | ||
"Repository": repo_name, | ||
"Author": issue["user"]["login"], | ||
} | ||
) | ||
|
||
issues_url = response.links.get("next", {}).get("url", None) | ||
|
||
return pd.DataFrame(issues_data) | ||
|
||
|
||
def get_org_members(auth_token: Union[str, None] = None) -> List[str]: | ||
if auth_token is None: | ||
return [] | ||
|
||
headers = { | ||
"Authorization": f"token {auth_token}", | ||
"Accept": "application/vnd.github.v3+json", | ||
} | ||
members_list = [] | ||
|
||
members_url = "https://api.github.com/orgs/argilla-io/members" | ||
|
||
while members_url: | ||
response = requests.get(members_url, headers=headers) | ||
members = response.json() | ||
|
||
for member in members: | ||
members_list.append(member["login"]) | ||
|
||
members_list.extend(["pre-commit-ci[bot]"]) | ||
|
||
members_url = response.links.get("next", {}).get("url", None) | ||
|
||
return members_list | ||
|
||
|
||
with mkdocs_gen_files.open(DATA_PATH, "w") as f: | ||
df = fetch_issues_from_github_repository(REPOSITORY, GITHUB_ACCESS_TOKEN) | ||
|
||
open_issues = df.loc[df["State"] == "open"] | ||
engagement_df = ( | ||
open_issues[["URL", "Issue", "Repository", "Reactions", "Comments"]] | ||
.sort_values(by=["Reactions", "Comments"], ascending=False) | ||
.head(10) | ||
.reset_index() | ||
) | ||
|
||
members = get_org_members(GITHUB_ACCESS_TOKEN) | ||
community_issues = df.loc[~df["Author"].isin(members)] | ||
community_issues_df = ( | ||
community_issues[ | ||
["URL", "Issue", "Repository", "Created at", "Author", "State"] | ||
] | ||
.sort_values(by=["Created at"], ascending=False) | ||
.head(10) | ||
.reset_index() | ||
) | ||
|
||
planned_issues = df.loc[df["Milestone"].notna()] | ||
planned_issues_df = ( | ||
planned_issues[ | ||
["URL", "Issue", "Repository", "Created at", "Milestone", "State"] | ||
] | ||
.sort_values(by=["Milestone"], ascending=False) | ||
.head(10) | ||
.reset_index() | ||
) | ||
|
||
f.write('=== "Most engaging open issues"\n\n') | ||
f.write(" | Rank | Issue | Reactions | Comments |\n") | ||
f.write(" |------|-------|:---------:|:--------:|\n") | ||
for ix, row in engagement_df.iterrows(): | ||
f.write( | ||
f" | {ix+1} | [{row['Issue']}]({row['URL']}) | 👍 {row['Reactions']} | 💬 {row['Comments']} |\n" | ||
) | ||
|
||
f.write('\n=== "Latest issues open by the community"\n\n') | ||
f.write(" | Rank | Issue | Author |\n") | ||
f.write(" |------|-------|:------:|\n") | ||
for ix, row in community_issues_df.iterrows(): | ||
state = "🟢" if row["State"] == "open" else "🟣" | ||
f.write( | ||
f" | {ix+1} | {state} [{row['Issue']}]({row['URL']}) | by **{row['Author']}** |\n" | ||
) | ||
|
||
f.write('\n=== "Planned issues for upcoming releases"\n\n') | ||
f.write(" | Rank | Issue | Milestone |\n") | ||
f.write(" |------|-------|:------:|\n") | ||
for ix, row in planned_issues_df.iterrows(): | ||
state = "🟢" if row["State"] == "open" else "🟣" | ||
f.write( | ||
f" | {ix+1} | {state} [{row['Issue']}]({row['URL']}) | **{row['Milestone']}** |\n" | ||
) | ||
|
||
today = datetime.today().date() | ||
f.write(f"\nLast update: {today}\n") |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
--- | ||
hide: | ||
- toc | ||
- footer | ||
--- | ||
|
||
We are an open-source community-driven project not only focused on building a great product but also on building a great community, where you can get support, share your experiences, and contribute to the project! We would love to hear from you and help you get started with distilabel. | ||
|
||
<div class="grid cards" markdown> | ||
|
||
- __Slack__ | ||
|
||
--- | ||
|
||
In our Slack you can get direct support from the community. | ||
|
||
|
||
[:octicons-arrow-right-24: Slack ↗](https://join.slack.com/t/rubrixworkspace/shared_invite/zt-whigkyjn-a3IUJLD7gDbTZ0rKlvcJ5g) | ||
|
||
- __Community Meetup__ | ||
|
||
--- | ||
|
||
We host bi-weekly community meetups where you can listen in or present your work. | ||
|
||
[:octicons-arrow-right-24: Community Meetup ↗](https://lu.ma/argilla-event-calendar) | ||
|
||
- __Changelog__ | ||
|
||
--- | ||
|
||
The changelog is where you can find the latest updates and changes to the distilabel project. | ||
|
||
[:octicons-arrow-right-24: Changelog ↗](https://github.com/argilla-io/distilabel/releases) | ||
|
||
- __Roadmap__ | ||
|
||
--- | ||
|
||
We love to discuss our plans with the community. Feel encouraged to participate in our roadmap discussions. | ||
|
||
[:octicons-arrow-right-24: Roadmap ↗](https://github.com/orgs/argilla-io/projects/15) | ||
|
||
</div> |
Oops, something went wrong.