Adds verify to mall #26

Merged
merged 17 commits into from
Oct 14, 2024
4 changes: 2 additions & 2 deletions _freeze/index/execute-results/html.json

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions _freeze/reference/MallFrame/execute-results/html.json

Large diffs are not rendered by default.

15 changes: 15 additions & 0 deletions _freeze/reference/llm_verify/execute-results/html.json
@@ -0,0 +1,15 @@
{
"hash": "07283324dfba84486e7dfa2870efbcb4",
"result": {
"engine": "knitr",
"markdown": "---\ntitle: \"Verify if a statement about the text is true or not\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n[R/llm-verify.R](https://github.com/mlverse/mall/blob/main/R/llm-verify.R)\n\n## llm_verify\n\n## Description\n Use a Large Language Model (LLM) to see if something is true or not based the provided text \n\n\n## Usage\n```r\n \nllm_verify( \n .data, \n col, \n what, \n yes_no = factor(c(1, 0)), \n pred_name = \".verify\", \n additional_prompt = \"\" \n) \n \nllm_vec_verify( \n x, \n what, \n yes_no = factor(c(1, 0)), \n additional_prompt = \"\", \n preview = FALSE \n) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| what | The statement or question that needs to be verified against the provided text |\n| yes_no | A size 2 vector that specifies the expected output. It is positional. The first item is expected to be value to return if the statement about the provided text is true, and the second if it is not. Defaults to: `factor(c(1, 0))` |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n `llm_verify` returns a `data.frame` or `tbl` object. `llm_vec_verify` returns a vector that is the same length as `x`. 
\n\n\n## Examples\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \ndata(\"reviews\") \n \nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE) \n \n# By default it will return 1 for 'true', and 0 for 'false', \n# the new column will be a factor type \nllm_verify(reviews, review, \"is the customer happy\") \n#> # A tibble: 3 × 2\n#> review .verify\n#> <chr> <fct> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. 1 \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too n… 0 \n#> 3 Not sure how to feel about my new washing machine. Great color, but h… 0\n \n# The yes_no argument can be modified to return a different response \n# than 1 or 0. First position will be 'true' and second, 'false' \nllm_verify(reviews, review, \"is the customer happy\", c(\"y\", \"n\")) \n#> # A tibble: 3 × 2\n#> review .verify\n#> <chr> <chr> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. y \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too n… n \n#> 3 Not sure how to feel about my new washing machine. Great color, but h… n\n \n# Number can also be used, this would be in the case that you wish to match \n# the output values of existing predictions \nllm_verify(reviews, review, \"is the customer happy\", c(2, 1)) \n#> # A tibble: 3 × 2\n#> review .verify\n#> <chr> <dbl>\n#> 1 This has been the best TV I've ever used. Great screen, and sound. 2\n#> 2 I regret buying this laptop. It is too slow and the keyboard is too n… 1\n#> 3 Not sure how to feel about my new washing machine. Great color, but h… 1\n```\n:::\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}
58 changes: 58 additions & 0 deletions index.qmd
@@ -44,6 +44,7 @@ Currently, the included prompts perform the following:
- [Classify text](#classify)
- [Extract one, or several](#extract), specific pieces of information from the text
- [Translate text](#translate)
- [Verify that something is true](#verify) about the text (binary)
- [Custom prompt](#custom-prompt)

This package is inspired by the SQL AI functions now offered by vendors such as
@@ -298,6 +299,63 @@ For more information and examples visit this function's

:::

### Classify

Use the LLM to categorize the text into one of the options you provide:


::: {.panel-tabset group="language"}
## R

```{r}
reviews |>
llm_classify(review, c("appliance", "computer"))
```

For more information and examples visit this function's
[R reference page](reference/llm_classify.qmd)

## Python

```{python}
reviews.llm.classify("review", ["computer", "appliance"])
```

For more information and examples visit this function's
[Python reference page](reference/MallFrame.qmd#mall.MallFrame.classify)

:::

### Verify

This function allows you to check whether a statement is true, based
on the provided text. By default, it returns 1 for "yes" and 0 for
"no"; this can be customized.


::: {.panel-tabset group="language"}
## R

```{r}
reviews |>
llm_verify(review, "is the customer happy with the purchase")
```

For more information and examples visit this function's
[R reference page](reference/llm_verify.qmd)

## Python

```{python}
reviews.llm.verify("review", "is the customer happy with the purchase")
```

For more information and examples visit this function's
[Python reference page](reference/MallFrame.qmd#mall.MallFrame.verify)

:::



### Translate

2 changes: 1 addition & 1 deletion objects.json
@@ -1 +1 @@
{"project": "mall", "version": "0.0.9999", "count": 16, "items": [{"name": "mall.MallFrame.classify", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.classify", "dispname": "-"}, {"name": "mall.polars.MallFrame.classify", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.classify", "dispname": "mall.MallFrame.classify"}, {"name": "mall.MallFrame.custom", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.custom", "dispname": "-"}, {"name": "mall.polars.MallFrame.custom", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.custom", "dispname": "mall.MallFrame.custom"}, {"name": "mall.MallFrame.extract", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.extract", "dispname": "-"}, {"name": "mall.polars.MallFrame.extract", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.extract", "dispname": "mall.MallFrame.extract"}, {"name": "mall.MallFrame.sentiment", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.sentiment", "dispname": "-"}, {"name": "mall.polars.MallFrame.sentiment", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.sentiment", "dispname": "mall.MallFrame.sentiment"}, {"name": "mall.MallFrame.summarize", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.summarize", "dispname": "-"}, {"name": "mall.polars.MallFrame.summarize", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.summarize", "dispname": "mall.MallFrame.summarize"}, {"name": "mall.MallFrame.translate", "domain": "py", "role": "function", "priority": "1", "uri": 
"reference/MallFrame.html#mall.MallFrame.translate", "dispname": "-"}, {"name": "mall.polars.MallFrame.translate", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.translate", "dispname": "mall.MallFrame.translate"}, {"name": "mall.MallFrame.use", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.use", "dispname": "-"}, {"name": "mall.polars.MallFrame.use", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.use", "dispname": "mall.MallFrame.use"}, {"name": "mall.MallFrame", "domain": "py", "role": "class", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame", "dispname": "-"}, {"name": "mall.polars.MallFrame", "domain": "py", "role": "class", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame", "dispname": "mall.MallFrame"}]}
{"project": "mall", "version": "0.0.9999", "count": 18, "items": [{"name": "mall.MallFrame.classify", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.classify", "dispname": "-"}, {"name": "mall.polars.MallFrame.classify", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.classify", "dispname": "mall.MallFrame.classify"}, {"name": "mall.MallFrame.custom", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.custom", "dispname": "-"}, {"name": "mall.polars.MallFrame.custom", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.custom", "dispname": "mall.MallFrame.custom"}, {"name": "mall.MallFrame.extract", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.extract", "dispname": "-"}, {"name": "mall.polars.MallFrame.extract", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.extract", "dispname": "mall.MallFrame.extract"}, {"name": "mall.MallFrame.sentiment", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.sentiment", "dispname": "-"}, {"name": "mall.polars.MallFrame.sentiment", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.sentiment", "dispname": "mall.MallFrame.sentiment"}, {"name": "mall.MallFrame.summarize", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.summarize", "dispname": "-"}, {"name": "mall.polars.MallFrame.summarize", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.summarize", "dispname": "mall.MallFrame.summarize"}, {"name": "mall.MallFrame.translate", "domain": "py", "role": "function", "priority": "1", "uri": 
"reference/MallFrame.html#mall.MallFrame.translate", "dispname": "-"}, {"name": "mall.polars.MallFrame.translate", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.translate", "dispname": "mall.MallFrame.translate"}, {"name": "mall.MallFrame.use", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.use", "dispname": "-"}, {"name": "mall.polars.MallFrame.use", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.use", "dispname": "mall.MallFrame.use"}, {"name": "mall.MallFrame.verify", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.verify", "dispname": "-"}, {"name": "mall.polars.MallFrame.verify", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.verify", "dispname": "mall.MallFrame.verify"}, {"name": "mall.MallFrame", "domain": "py", "role": "class", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame", "dispname": "-"}, {"name": "mall.polars.MallFrame", "domain": "py", "role": "class", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame", "dispname": "mall.MallFrame"}]}
100 changes: 100 additions & 0 deletions python/README.md
@@ -0,0 +1,100 @@
# mall

## Intro

Run multiple LLM predictions against a data frame. The predictions are
processed row-wise over a specified column. It works by combining a
pre-determined one-shot prompt with the current row's content.
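As a rough sketch of that idea (the template and helper names here are illustrative, not mall's actual internals), the instruction is fixed and each row's text is substituted in before the model is called:

``` python
# Hypothetical sketch of row-wise one-shot prompting; names are illustrative.
TEMPLATE = "Classify the sentiment of the following text as positive or negative: {}"

def build_prompts(rows, template):
    # One prompt per row: the fixed instruction plus that row's content.
    return [template.format(text) for text in rows]

reviews = [
    "This has been the best TV I've ever used. Great screen, and sound.",
    "I regret buying this laptop. It is too slow and the keyboard is too noisy",
]

prompts = build_prompts(reviews, TEMPLATE)
print(prompts[0])
```

Each prompt is then sent to the LLM independently, and the responses become the new column.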

## Install

To install from GitHub, use:

``` python
pip install "mall @ git+https://[email protected]/edgararuiz/mall.git@python#subdirectory=python"
```

## Examples

``` python
import mall
import polars as pl

reviews = pl.DataFrame(
data=[
"This has been the best TV I've ever used. Great screen, and sound.",
"I regret buying this laptop. It is too slow and the keyboard is too noisy",
"Not sure how to feel about my new washing machine. Great color, but hard to figure"
],
schema=[("review", pl.String)],
)
```

## Sentiment


``` python
reviews.llm.sentiment("review")
```

<small>shape: (3, 2)</small>

| review | sentiment |
|----------------------------------|------------|
| str | str |
| "This has been the best TV I've… | "positive" |
| "I regret buying this laptop. I… | "negative" |
| "Not sure how to feel about my … | "neutral" |

## Summarize

``` python
reviews.llm.summarize("review", 5)
```

<small>shape: (3, 2)</small>

| review | summary |
|----------------------------------|----------------------------------|
| str | str |
| "This has been the best TV I've… | "it's a great tv" |
| "I regret buying this laptop. I… | "laptop not worth the money" |
| "Not sure how to feel about my … | "feeling uncertain about new pu… |

## Translate (as in ‘English to French’)

``` python
reviews.llm.translate("review", "spanish")
```

<small>shape: (3, 2)</small>

| review | translation |
|----------------------------------|----------------------------------|
| str | str |
| "This has been the best TV I've… | "Esta ha sido la mejor TV que h… |
| "I regret buying this laptop. I… | "Lo lamento comprar este portát… |
| "Not sure how to feel about my … | "No estoy seguro de cómo sentir… |

## Classify

``` python
reviews.llm.classify("review", ["computer", "appliance"])
```

<small>shape: (3, 2)</small>

| review | classify |
|----------------------------------|-------------|
| str | str |
| "This has been the best TV I've… | "appliance" |
| "I regret buying this laptop. I… | "appliance" |
| "Not sure how to feel about my … | "appliance" |

## LLM session setup

``` python
reviews.llm.use(options = dict(seed = 100))
```

{'backend': 'ollama', 'model': 'llama3.2', 'options': {'seed': 100}}
71 changes: 71 additions & 0 deletions python/README.qmd
@@ -0,0 +1,71 @@
---
format: gfm
---

# mall

## Intro

Run multiple LLM predictions against a data frame. The predictions are processed row-wise over a specified column. It works by combining a pre-determined one-shot prompt with the current row's content.

## Install

To install from GitHub, use:

```python
pip install "mall @ git+https://[email protected]/edgararuiz/mall.git@python#subdirectory=python"
```

## Examples

```{python}
#| include: false
import polars as pl
from polars.dataframe._html import HTMLFormatter
html_formatter = get_ipython().display_formatter.formatters['text/html']
html_formatter.for_type(pl.DataFrame, lambda df: "\n".join(HTMLFormatter(df).render()))
```


```{python}
import mall
import polars as pl
data = mall.MallData
reviews = data.reviews
```

```{python}
#| include: false
reviews.llm.use(options = dict(seed = 100))
```


## Sentiment

```{python}
reviews.llm.sentiment("review")
```

## Summarize

```{python}
reviews.llm.summarize("review", 5)
```

## Translate (as in 'English to French')

```{python}
reviews.llm.translate("review", "spanish")
```

## Classify

```{python}
reviews.llm.classify("review", ["computer", "appliance"])
```

## LLM session setup

```{python}
reviews.llm.use(options = dict(seed = 100))
```
70 changes: 61 additions & 9 deletions python/mall/llm.py
@@ -1,17 +1,45 @@
import polars as pl
import ollama
import json
import hashlib
import os


def build_msg(x, msg):
out = []
for msgs in msg:
out.append({"role": msgs["role"], "content": msgs["content"].format(x)})
return out
def map_call(df, col, msg, pred_name, use, valid_resps="", convert=None):
if valid_resps == "":
valid_resps = []
valid_resps = valid_output(valid_resps)
ints = 0
for resp in valid_resps:
ints = ints + isinstance(resp, int)

pl_type = pl.String
data_type = str

if len(valid_resps) == ints and ints != 0:
pl_type = pl.Int8
data_type = int

df = df.with_columns(
pl.col(col)
.map_elements(
lambda x: llm_call(
x=x,
msg=msg,
use=use,
preview=False,
valid_resps=valid_resps,
convert=convert,
data_type=data_type,
),
return_dtype=pl_type,
)
.alias(pred_name)
)
return df


def llm_call(x, msg, use, preview=False, valid_resps=""):
def llm_call(x, msg, use, preview=False, valid_resps="", convert=None, data_type=None):

call = dict(
model=use.get("model"),
@@ -41,9 +69,33 @@ def llm_call(x, msg, use, preview=False, valid_resps=""):
if cache == "":
cache_record(hash_call, use, call, out)

if isinstance(valid_resps, list):
if out not in valid_resps:
out = None
if isinstance(convert, dict):
for label in convert:
if out == label:
out = convert.get(label)

# out = data_type(out)

# if out not in valid_resps:
# out = None

return out


def valid_output(x):
out = []
if isinstance(x, list):
out = x
if isinstance(x, dict):
for i in x:
out.append(x.get(i))
return out


def build_msg(x, msg):
out = []
for msgs in msg:
out.append({"role": msgs["role"], "content": msgs["content"].format(x)})
return out
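The two pure-Python helpers above can be exercised on their own, without Ollama or Polars. A quick self-contained check of their behavior: `valid_output` normalizes the accepted responses (a list passes through, a dict contributes its values), and `build_msg` substitutes the row's text into each message template.

``` python
def valid_output(x):
    # Normalize accepted responses: lists pass through, dict values are
    # collected into a list, and any other type yields an empty list.
    out = []
    if isinstance(x, list):
        out = x
    if isinstance(x, dict):
        for i in x:
            out.append(x.get(i))
    return out


def build_msg(x, msg):
    # Substitute the current row's content (x) into each message template.
    out = []
    for msgs in msg:
        out.append({"role": msgs["role"], "content": msgs["content"].format(x)})
    return out


print(valid_output({"yes": 1, "no": 0}))  # dict values -> [1, 0]

msg = [{"role": "user", "content": "Is the customer happy? Text: {}"}]
print(build_msg("Great TV!", msg)[0]["content"])
```

This also shows why `map_call` counts integers in the normalized list: when `yes_no` maps to `[1, 0]`, every valid response is an `int`, so the prediction column can be typed as `pl.Int8` instead of `pl.String`.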

