Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure knowledge docs are cloned into unique dirs #416

Merged
merged 1 commit into from
Nov 27, 2024

Conversation

bbrowning
Copy link
Contributor

Previously, we were attempting to clone multiple knowledge documents into the same destination directory, leading to failures generating data for any run that contained 2+ knowledge leaf nodes.

Now, we clone docs into a guaranteed unique (via tempfile.mkdtemp) subdirectory per knowledge leaf node. Just using a subdirectory per leaf node could still have led to collisions if the user ran data generation twice within one minute, which is why this goes the extra step of using mkdtemp for guaranteed uniqueness.

Fixes #404

Previously, we were attempting to clone multiple knowledge documents
into the same destination directory, leading to failures generating
data for any run that contained 2+ knowledge leaf nodes.

Now, we clone docs into a guaranteed unique (via `tempfile.mkdtemp`)
subdirectory per knowledge leaf node. Just using a subdirectory per
leaf node could still have led to collisions if the user ran data
generation twice within one minute, which is why this goes the extra
step of using `mkdtemp` for guaranteed uniqueness.

Fixes instructlab#404

Signed-off-by: Ben Browning <[email protected]>
@mergify mergify bot added the testing Relates to testing label Nov 27, 2024
Copy link
Member

@aakankshaduggal aakankshaduggal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bbrowning for the quick fix.

@mergify mergify bot added the one-approval label Nov 27, 2024
@bbrowning
Copy link
Contributor Author

@Mergifyio backport release-v0.6

Copy link
Contributor

mergify bot commented Nov 27, 2024

backport release-v0.6

✅ Backports have been created

@bbrowning
Copy link
Contributor Author

Still waiting on e2e-medium that is taking its sweet time working through some compositional skill testing, unrelated to this change. For the sake of expediency, I'm going to manually push the merge button on this one and get CI starting to run on the backport PR so they can happen simultaneously.

@bbrowning bbrowning merged commit eef8bae into instructlab:main Nov 27, 2024
20 checks passed
mergify bot added a commit that referenced this pull request Nov 27, 2024
Ensure knowledge docs are cloned into unique dirs (backport #416)
@bbrowning bbrowning deleted the unique-document-dir branch November 28, 2024 00:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
one-approval testing Relates to testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sdg v0.6.0+ multiple knowledge sources fails to clone
2 participants