-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
framework to be used by RAG for pipeline #164
Conversation
2f9c442
to
b617893
Compare
|
||
## Risks | ||
|
||
N/A |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bringing in a new dependency always introduces risks related to building that package (and its dependencies) downstream, as well as license compliance, long term supportability, etc. Those aren't unique to this package, but it would be good to see them addressed here anyway to show that a thorough vetting was done of the package, and its dependencies.
For example, is any part of the haystack-ai dependency list GPL or AGPL? Is its license compatible with the Instructlab license?
Does it, and all of its dependencies, appear to be actively maintained by more than one maintainer?
Are those maintainers responsive to requests to review PRs, especially for CVEs?
Is the project following good practices for releasing, including tagging releases, publishing source packages, and using trusted publishing automation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cooktheryan this type of analysis is fairly routine for OSPO. I'd recommend reaching out to @lhawthorn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comments.
None of Haystack's dependencies (jinja2, lazy-imports, more-itertools, networkx, numpy, openai, pandas, posthog, python-dateutil, pyyaml, requests, tenacity, tqdm, typing-extensions) are GPL/AGPL. All of them are Apache, MIT, BSD or PSFL.
Haystack is licensed under Apache 2.0. What else makes it compatible or non-compatible with InstructLab?
Haystack is very actively maintained and supported. As for active maintenance of the dependencies, I didn't check them all but most are very well-known and widely used libraries. Moreover, many of them are also on the dependency list of the alternatives to Haystack.
Haystack releases tagged versions. They do not publish to Github Packages.
Could you please elaborate what is considered "trusted publishing automation" and whether there are any additional releasing practices we need to verify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be good information to include in the body of the document.
For trusted publisher, I mean https://docs.pypi.org/trusted-publishers/. We've recently seen an instance of a package in the AI ecosystem where that trusted publishing was set up incorrectly, which led to a third-party having the ability to not only trigger releases but to inject malicious code into multiple versions of source packages. ultralytics/ultralytics#18027 for reference.
Downstream, we're building from source packages, but if we can't trust those source packages we would not want to consume the dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks for the clarification. It looks like Haystack indeed satisfies the requirements for trusted publishing with this Github Actions workflow.
We'll make sure to include the details from this thread in the document.
Signed-off-by: Ryan Cook <[email protected]>
b617893
to
fdf2852
Compare
There is a lot of emphasis given to haystack being focused on RAG. Does it also have a good story for supporting more generalized agentic patterns? |
@erikerlandson, well, since Haystack supports building arbitrary graphs of operators (similarly to Langgraph), it can potentially implement many of the agentic patterns. However, as you say, the main focus is on RAG. Since the decision here solely considers our RAG implementation, features and capabilities not directly related to RAG should mostly be out of scope. |
I believe that agentic should not be totally out of scope; mainly because soon enough the focus is going to shift to agentic, and it would be undesirable (IMO) to be in a position of wanting to add a separate framework for agentic, or having to switch to a new framework altogether. |
@erikerlandson, I agree. Since Haystack provides the functionality to build arbitrarily complex pipelines, it is possible to extend our RAG architecture to introduce nodes for tools, reasoning and agentic actions. |
Added details regarding Haystack licensing and maintenance. Signed-off-by: Ilya Kolchinsky <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have requested one more alternative to be added to the list that we considered -- I don't think that will change the outcome but I want it included for completeness.
I understand that the discussion of the dependency risk will be added to the risk section, which sounds good to me.
I am not seeing any other edits to this document that I would recommend. So I think if these changes are made, I will approve (unless something else arises before then).
@jwm4, sure, I will edit the document to include the alternative you proposed. |
Oh, I missed it under "Why". Sorry! I am fine with it there. I do think there is a tail risk that new dependencies get added that are more problematic. If that happens we have the following options:
Any of these could be viable depending on the circumstances. Furthermore, since Haystack hasn't added any dependencies that we'd consider very problematic so far, the likelihood that they will in the future seems minimal. I'd still like this noted under Risks though for reference if you don't mind. |
Expanded risks and alternatives. Signed-off-by: Ilya Kolchinsky <[email protected]>
Done - the document is now updated as we discussed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concerns I have about this document are all addressed. FWIW, the conclusion that we will use Haystack is not the choice I would have made, and I think in the long run it is likely we will undo this choice and eliminate the dependency on Haystack. However, I don't think that will be particularly hard to do if/when there is a strong case for doing so, and I can't deny that for now having a framework to build on is making it quicker and easier to deliver a minimal viable solution. On the whole, I would say that the argument made in this document is strong and well-reasoned, so I am approving it despite my reservations about the decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@instructlab/oversight-committee -- Can someone take a look at this and see if it is ready to approve and merge? If it is not, can we get some guidance about what would be needed to get it approved? |
As an oversight committee member I approve this. Please fix the markdown/spellcheck errors and then we can merge this as a the way forward. |
Signed-off-by: Ryan Cook <[email protected]>
Signed-off-by: Ryan Cook <[email protected]>
Signed-off-by: Ryan Cook <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting some changes, some minor spelling / grammar, some just pulling some items from the PR conversation into the doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Signed-off-by: Ryan Cook <[email protected]>
No description provided.