docs: initial post on llms + data #7217
Content is good! Flagged a few missing words and reordered a few sentences.
@aaazzam hope you don't mind the random ping, we're close to merging this if you do want to give a review and make any suggestions for how we're talking about Marvin
Hey Cody!
Thanks for the ping - don't want to block you - LGTM (tl;dr: gave it a quick
read and think you hit the nail on the head).
We should talk more though. Feel free to email me at ***@***.*** and we
can grab a Zoom sometime.
gil's feedback
Birdbrain](https://github.com/ibis-project/ibis-birdbrain), our open-source data
& AI project for building next-generation natural language interfaces to data.

## Discussions
Is this empty `## Discussions` section intentional?
yeah I'm going to remove it though -- it's before the Giscus discussions at the bottom, but would look weird via RSS and it's obvious w/o a header
When discussed internally at Voltron Data, we identified three distinct
approaches to applying LLMs to data analytics that can be implemented today:

1. LLM writes SQL
Suggested change:
- 1. LLM writes SQL
+ 1. LLM writes an analytic routine (SQL query or dataframe code)
Probably don't need the parenthetical. How about just `LLM writes a SQL query or dataframe code`?
Or even just analytics code
Yeah that works! I just want to generalize it to be about more than just SQL
yep lemme rework per this suggestion
data analytics that supports 18+ backends. But first, let's demonstrate the
three approaches in code.

### Approach 1: LLM writes SQL
Suggested change:
- ### Approach 1: LLM writes SQL
+ ### Approach 1: LLM writes an analytic routine
ones. In many scenarios, it may be easier to express a query in English or
another language than to write it in SQL, especially if working across multiple
SQL dialects.

What about Ibis? Can we show Birdbrain writing Ibis code here? Or can we briefly explain that it could also write Ibis code or code for other dataframe libraries?
t = con.table("penguins") # <3>
```

1. Import Ibis and Marvin.
What role is Ibis actually playing here, besides submitting the query and pretty-printing the results? Couldn't this just use DBAPI and achieve the same thing? If not can you explain why not?
- easily hand off to LLMs (notice Ibis is used to get the schema, the table name, nice error messages to feed back, etc.)
- translation across SQL dialects (you could do this with SQLGlot alone)
- access to Ibis for mixed Python/natural language analysis (use language OR Ibis code, mix and match)
- 18+ data platforms, so this can be a universal standard for AI + data
- once working reliably, write Ibis code that (somewhat) avoids the SQL dialect issues and works on 18+ backends
- then, way down the line, Substrait via Ibis to really avoid those issues

will add some more explanation here
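
To make the first bullet concrete, here is a minimal sketch of the handoff step: turning a table name and schema into text an LLM can work from. The function name `build_sql_prompt`, the prompt wording, and the hard-coded column subset are assumptions for illustration, not code from the post. In the post, the table name and schema would come from the Ibis table rather than a literal dict.

```python
def build_sql_prompt(table_name: str, schema: dict[str, str], question: str) -> str:
    """Format a table name, its schema, and a question into one prompt string."""
    columns = "\n".join(f"- {name}: {dtype}" for name, dtype in schema.items())
    return (
        f"Table: {table_name}\n"
        f"Columns:\n{columns}\n"
        f"Question: {question}\n"
        "Write one SQL SELECT statement that answers the question."
    )


# Illustrative subset of the penguins columns (hypothetical type names).
penguins_schema = {"species": "string", "island": "string", "body_mass_g": "int64"}

prompt = build_sql_prompt("penguins", penguins_schema, "how many penguins per island?")
print(prompt)
```

The same string-building could sit on top of DBAPI; the point of the bullet is that Ibis exposes the schema and table name uniformly across backends, so this step does not change per database.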
"a function named count_vowels that given an input string, returns an int w/ the number of vowels (y_included as a boolean option defaulted to False)"
)
print(udf)
exec(udf)
Put the `exec` inside the `create_udf_from_text` function. Also, for due diligence, I would mention that this is a total security nightmare if you are passing in untrusted user input.
why move it inside the function? with it outside, you (directly or via the bot this will be called from) can inspect the function before you execute/register it
fyi this post has moved over here: https://ibis-project.github.io/ibis-birdbrain/posts/llms-and-data-pt1/
I'm going to have a series of posts to start off that new project, and will make one here in Ibis to announce. don't want to overwhelm the Ibis blog or start making it all about AI/LLMs
Eh, just a bit more encapsulated, but yes that is a nice benefit that is probably worth it. Sure keep it outside. Sounds good
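
For reference, a minimal sketch of the inspect-before-`exec` flow agreed on above. The generated source is hard-coded as a hypothetical LLM response to the `count_vowels` prompt, and `check_generated_udf` and `load_generated_udf` are illustrative helper names, not part of the post, Ibis, or Marvin. The structural check only rejects obviously wrong output; it is not a sandbox, so the untrusted-input warning still applies.

```python
import ast

# Hypothetical LLM output for the `count_vowels` prompt, hard-coded so the
# sketch runs without calling a model.
generated_source = '''
def count_vowels(s: str, y_included: bool = False) -> int:
    vowels = "aeiouy" if y_included else "aeiou"
    return sum(1 for ch in s.lower() if ch in vowels)
'''


def check_generated_udf(source: str, expected_name: str) -> None:
    """Raise ValueError unless `source` is exactly one function with the expected name.

    This is a structural sanity check, not a security boundary.
    """
    tree = ast.parse(source)
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.FunctionDef):
        raise ValueError("expected exactly one top-level function definition")
    func = tree.body[0]
    if func.name != expected_name:
        raise ValueError(f"expected function {expected_name!r}, got {func.name!r}")
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are not allowed in generated code")


def load_generated_udf(source: str, expected_name: str):
    """Check the source, exec it in a fresh namespace, and return the function."""
    check_generated_udf(source, expected_name)
    namespace: dict = {}
    exec(source, namespace)  # still arbitrary code: review the printed source first
    return namespace[expected_name]


print(generated_source)  # inspect before executing, as discussed in the thread
count_vowels = load_generated_udf(generated_source, "count_vowels")
print(count_vowels("pythonically"))
print(count_vowels("pythonically", y_included=True))
```

Keeping the `exec` in its own step, outside the function that asks the model for code, preserves the chance for a person or the calling bot to read the source first.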
closing in favor of a series of posts on the Ibis Birdbrain site, work in progress here: https://ibis-project.github.io/ibis-birdbrain/posts
appreciate all the feedback! I've taken it into account; I'll be updating the posts there over the next week or two and will ask for another round of feedback before soft launching
needs some finishing up, but I like that the current rendered version had no suggested improvements at the end...