docs: initial post on llms + data #7217

Closed
wants to merge 1 commit into from

lostmygithubaccount
Member

needs some finishing up, but I like that the current rendered version had no suggested improvements at the end...

@cpcloud cpcloud added the docs Documentation related issues or PRs label Sep 26, 2023
@lostmygithubaccount lostmygithubaccount marked this pull request as ready for review September 26, 2023 16:53
@gforsyth (Member) left a comment

Content is good! Flagged a few missing words and reordered a few sentences.

@lostmygithubaccount
Member Author

@aaazzam hope you don't mind the random ping, we're close to merging this if you do want to give a review and make any suggestions for how we're talking about Marvin

@aaazzam commented Sep 26, 2023 via email

Birdbrain](https://github.com/ibis-project/ibis-birdbrain), our open-source data
& AI project for building next-generation natural language interfaces to data.

## Discussions
Member

Is this empty ## Discussions section intentional?

Member Author

yeah I'm going to remove it though -- it's before the Giscus discussions at the bottom, but would look weird via RSS and it's obvious w/o a header

When discussed internally at Voltron Data, we identified three distinct
approaches to applying LLMs to data analytics that can be implemented today:

1. LLM writes SQL
Contributor

Suggested change:
- 1. LLM writes SQL
+ 1. LLM writes an analytic routine (SQL query or dataframe code)

Member

Probably don't need the parenthetical. How about just "LLM writes a SQL query or dataframe code"?

Member

Or even just "analytics code"

Contributor

Yeah that works! I just want to generalize it to be about more than just SQL

Member Author

yep lemme rework per this suggestion

data analytics that supports 18+ backends. But first, let's demonstrate the
three approaches in code.

### Approach 1: LLM writes SQL
Contributor

Suggested change:
- ### Approach 1: LLM writes SQL
+ ### Approach 1: LLM writes an analytic routine

ones. In many scenarios, it may be easier to express a query in English or
another language than to write it in SQL, especially if working across multiple
SQL dialects.

Contributor

What about Ibis? Can we show Birdbrain writing Ibis code here? Or can we briefly explain that it could also write Ibis code or code for other dataframe libraries?

```
t = con.table("penguins") # <3>
```

1. Import Ibis and Marvin.
Contributor

What role is Ibis actually playing here, besides submitting the query and pretty-printing the results? Couldn't this just use DBAPI and achieve the same thing? If not can you explain why not?

Member Author

  • easy handoff to LLMs (note the use of Ibis to get the schema, the table name, and nice error messages to feed back)
  • translation across SQL dialects (though you could do this with SQLGlot alone)
  • access to Ibis for mixed Python/natural language analysis (use natural language OR Ibis code, mix and match)
  • 18+ data platforms, so this can be a universal standard for AI + data
  • once working reliably, write Ibis code that (somewhat) avoids the SQL dialect issues and works on 18+ backends
  • then, way down the line, Substrait via Ibis to really avoid those issues

will add some more explanation here
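
For comparison, here is a minimal sketch of what the DBAPI-only version of the schema handoff might look like, using the standard-library `sqlite3` module. It is not code from the post: the helper names `describe_table` and `build_prompt`, the prompt wording, and the three-column `penguins` table are all hypothetical, and no LLM is called. The schema lookup uses a SQLite-specific `PRAGMA`, which is the kind of per-backend detail the list above argues Ibis would hide.

```python
import sqlite3


def describe_table(con: sqlite3.Connection, table: str) -> list[tuple[str, str]]:
    """Return (column name, declared type) pairs for a SQLite table.

    Hypothetical helper. PRAGMA table_info is SQLite-specific, so each
    backend would need its own version. The table name is interpolated
    directly, so only pass trusted identifiers.
    """
    rows = con.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in rows]


def build_prompt(table: str, columns: list[tuple[str, str]], question: str) -> str:
    """Assemble the text that would be handed to an LLM (assumed wording)."""
    schema = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"Table {table} has columns: {schema}.\n"
        f"Write a SQL query that answers: {question}"
    )


# Stand-in table; the real penguins dataset has more columns.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE penguins (species TEXT, island TEXT, body_mass_g INTEGER)"
)

columns = describe_table(con, "penguins")
prompt = build_prompt(
    "penguins", columns, "how many penguins are there per species?"
)
print(prompt)
```

This works for one backend, but the schema query, type names, and error messages would all differ on another engine.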

@lostmygithubaccount lostmygithubaccount marked this pull request as draft September 27, 2023 15:39
"a function named count_vowels that given an input string, returns an int w/ the number of vowels (y_included as a boolean option defaulted to False)"
)
print(udf)
exec(udf)
Contributor

put the exec inside the create_udf_from_text function. Also for due diligence I would mention how this is a total security nightmare if you are passing in untrusted user input.

Member Author

why move it inside the function? with it outside, you (directly or via the bot this will be called from) can inspect the function before you execute/register it

fyi this post has moved over here: https://ibis-project.github.io/ibis-birdbrain/posts/llms-and-data-pt1/

I'm going to have a series of posts to start off that new project, and will make one here in Ibis to announce. don't want to overwhelm the Ibis blog or start making it all about AI/LLMs

Contributor

Eh, just a bit more encapsulated, but yes that is a nice benefit that is probably worth it. Sure keep it outside. Sounds good
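
One way to act on the "inspect before you execute" point from this thread is sketched below. It is an illustration only, not the post's code: `inspect_generated_udf` and `load_udf` are hypothetical helpers, and the `generated` string stands in for LLM output. The check uses the standard-library `ast` module to confirm the text is exactly one function definition with the expected name before `exec` runs. This is not a sandbox. Decorators and default arguments still execute at definition time, and the function body can do anything once called, so the security concern about untrusted input still applies.

```python
import ast


def inspect_generated_udf(source: str, expected_name: str) -> ast.FunctionDef:
    """Parse generated source and check it is a single, expected function.

    Hypothetical helper: a structural check only, not a security boundary.
    """
    tree = ast.parse(source)
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.FunctionDef):
        raise ValueError("expected exactly one top-level function definition")
    func = tree.body[0]
    if func.name != expected_name:
        raise ValueError(f"expected function {expected_name!r}, got {func.name!r}")
    return func


def load_udf(source: str, expected_name: str):
    """Validate the source, then exec it in a fresh namespace."""
    inspect_generated_udf(source, expected_name)
    namespace: dict = {}
    # exec still runs arbitrary code; only do this after a human (or bot)
    # has reviewed the printed source.
    exec(source, namespace)
    return namespace[expected_name]


# Stand-in for text returned by an LLM.
generated = '''
def count_vowels(s: str, y_included: bool = False) -> int:
    vowels = "aeiouy" if y_included else "aeiou"
    return sum(1 for ch in s.lower() if ch in vowels)
'''

print(generated)  # inspect first
count_vowels = load_udf(generated, "count_vowels")
print(count_vowels("penguin"))
```

Keeping `exec` in its own step, outside the generation function, preserves the chance to review the source first, which is the benefit agreed on above.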

@lostmygithubaccount
Member Author

closing in favor of a series of posts on the Ibis Birdbrain site, work in progress here: https://ibis-project.github.io/ibis-birdbrain/posts

appreciate all the feedback! have taken it into account, I'll be updating the posts there over the next week or two and will ask for another round of feedback before soft launching

Labels
docs Documentation related issues or PRs
6 participants