diff --git a/README.md b/README.md index a97e35e..6f14838 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,7 @@ coverage](https://codecov.io/gh/mlverse/mall/branch/main/graph/badge.svg)](https experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental) - + Run multiple LLM predictions against a data frame. The predictions are diff --git a/_freeze/articles/performance/execute-results/html.json b/_freeze/articles/performance/execute-results/html.json index 35973a3..289e5a2 100644 --- a/_freeze/articles/performance/execute-results/html.json +++ b/_freeze/articles/performance/execute-results/html.json @@ -1,5 +1,5 @@ { - "hash": "725f7d35a501e3cb52a4e8704cdb7a78", + "hash": "6c0a29bea5ed5d0d8dd8c1a3992c37d8", "result": { "engine": "knitr", "markdown": "---\ntitle: \"Performance\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n## Performance\n\nWe will briefly cover this method's performance from two perspectives: \n\n- How long the analysis takes to run locally \n\n- How well it predicts \n\nTo do so, we will use the `data_bookReviews` data set, provided by the `classmap`\npackage. For this exercise, only the first 100 of the total 1,000 reviews are going\nto be part of this analysis.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(mall)\nlibrary(classmap)\nlibrary(dplyr)\n\ndata(data_bookReviews)\n\ndata_bookReviews |>\n glimpse()\n#> Rows: 1,000\n#> Columns: 2\n#> $ review \"i got this as both a book and an audio file. 
i had waited t…\n#> $ sentiment 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 1, …\n```\n:::\n\n\n\nAs per the docs, `sentiment` is a factor indicating the sentiment of the review:\nnegative (1) or positive (2).\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlength(strsplit(paste(head(data_bookReviews$review, 100), collapse = \" \"), \" \")[[1]])\n#> [1] 20470\n```\n:::\n\n\n\n\nJust to get an idea of how much data we're processing, I'm using a very, very \nsimple word count. So we're analyzing a bit over 20 thousand words.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews_llm <- data_bookReviews |>\n head(100) |> \n llm_sentiment(\n col = review,\n options = c(\"positive\" ~ 2, \"negative\" ~ 1),\n pred_name = \"predicted\"\n )\n#> ! There were 2 predictions with invalid output, they were coerced to NA\n```\n:::\n\n\n\n\nAs far as **time**, on my Apple M3 machine, it took about 1.5 minutes to process\n100 rows containing 20 thousand words. Setting `temp` to 0 in `llm_use()` \nmade the model run faster.\n\nThe package uses `purrr` to send each prompt individually to the LLM. But I did\ntry a few different ways to speed up the process, unsuccessfully:\n\n- I used `furrr` to send multiple requests at a time. This did not work because \neither the LLM or Ollama processed all my requests serially, so there was\nno improvement.\n\n- I also tried sending more than one row's text at a time. This caused instability\nin the number of results. For example, sending 5 at a time sometimes returned 7\nor 8 results. Even sending 2 was not stable. \n\nThis is what the new table looks like:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews_llm\n#> # A tibble: 100 × 3\n#> review sentiment predicted\n#> \n#> 1 \"i got this as both a book and an audio file… 1 1\n#> 2 \"this book places too much emphasis on spend… 1 1\n#> 3 \"remember the hollywood blacklist? 
the holly… 2 2\n#> 4 \"while i appreciate what tipler was attempti… 1 1\n#> 5 \"the others in the series were great, and i … 1 1\n#> 6 \"a few good things, but she's lost her edge … 1 1\n#> 7 \"words cannot describe how ripped off and di… 1 1\n#> 8 \"1. the persective of most writers is shaped… 1 NA\n#> 9 \"i have been a huge fan of michael crichton … 1 1\n#> 10 \"i saw dr. polk on c-span a month or two ago… 2 2\n#> # ℹ 90 more rows\n```\n:::\n\n\n\n\nI used `yardstick` to see how well the model performed. Of course, the accuracy\nwill not be of the \"truth\", but rather the package's results recorded in \n`sentiment`.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(forcats)\n\nreviews_llm |>\n mutate(predicted = as.factor(predicted)) |>\n yardstick::accuracy(sentiment, predicted)\n#> # A tibble: 1 × 3\n#> .metric .estimator .estimate\n#> \n#> 1 accuracy binary 0.980\n```\n:::\n", diff --git a/_freeze/index/execute-results/html.json b/_freeze/index/execute-results/html.json index 31569e9..fa563c6 100644 --- a/_freeze/index/execute-results/html.json +++ b/_freeze/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "9dee4e73aa0ad5aaff40e3dfda26d5c9", + "hash": "47342fb39f04134910196a040cf3c8a1", "result": { "engine": "knitr", - "markdown": "---\nformat:\n html:\n toc: true\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n\n\n\n\n[![R package check](https://github.com/mlverse/mall/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/mlverse/mall/actions/workflows/R-CMD-check.yaml)\n[![R package coverage](https://codecov.io/gh/mlverse/mall/branch/main/graph/badge.svg)](https://app.codecov.io/gh/mlverse/mall?branch=main)\n[![Lifecycle:\nexperimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)\n\n\n\nRun multiple LLM predictions against a data frame. The predictions are processed \nrow-wise over a specified column. 
It works using a pre-determined one-shot prompt,\nalong with the current row's content. `mall` has been implemented for both R\nand Python. The prompt that is use will depend of the type of analysis needed. \n\nCurrently, the included prompts perform the following: \n\n- [Sentiment analysis](#sentiment)\n- [Text summarizing](#summarize)\n- [Classify text](#classify)\n- [Extract one, or several](#extract), specific pieces information from the text\n- [Translate text](#translate)\n- [Custom prompt](#custom-prompt)\n\nThis package is inspired by the SQL AI functions now offered by vendors such as\n[Databricks](https://docs.databricks.com/en/large-language-models/ai-functions.html) \nand Snowflake. `mall` uses [Ollama](https://ollama.com/) to interact with LLMs \ninstalled locally. \n\n\n\nFor **R**, that interaction takes place via the \n[`ollamar`](https://hauselin.github.io/ollama-r/) package. The functions are \ndesigned to easily work with piped commands, such as `dplyr`. \n\n```r\nreviews |>\n llm_sentiment(review)\n```\n\n\n\nFor **Python**, `mall` is a library extension to [Polars](https://pola.rs/). To\ninteract with Ollama, it uses the official\n[Python library](https://github.com/ollama/ollama-python).\n\n```python\nreviews.llm.sentiment(\"review\")\n```\n\n## Motivation\n\nWe want to new find ways to help data scientists use LLMs in their daily work. \nUnlike the familiar interfaces, such as chatting and code completion, this interface\nruns your text data directly against the LLM. \n\nThe LLM's flexibility, allows for it to adapt to the subject of your data, and \nprovide surprisingly accurate predictions. This saves the data scientist the\nneed to write and tune an NLP model. \n\nIn recent times, the capabilities of LLMs that can run locally in your computer \nhave increased dramatically. This means that these sort of analysis can run\nin your machine with good accuracy. 
Additionally, it makes it possible to take\nadvantage of LLM's at your institution, since the data will not leave the\ncorporate network. \n\n## Get started\n\n- Install `mall` from Github\n\n \n::: {.panel-tabset group=\"language\"}\n## R\n```r\npak::pak(\"mlverse/mall/r\")\n```\n\n## Python\n```python\npip install \"mall @ git+https://git@github.com/mlverse/mall.git#subdirectory=python\"\n```\n:::\n\n- [Download Ollama from the official website](https://ollama.com/download)\n\n- Install and start Ollama in your computer\n\n\n::: {.panel-tabset group=\"language\"}\n## R\n- Install Ollama in your machine. The `ollamar` package's website provides this\n[Installation guide](https://hauselin.github.io/ollama-r/#installation)\n\n- Download an LLM model. For example, I have been developing this package using\nLlama 3.2 to test. To get that model you can run: \n ```r\n ollamar::pull(\"llama3.2\")\n ```\n \n## Python\n\n- Install the official Ollama library\n ```python\n pip install ollama\n ```\n\n- Download an LLM model. For example, I have been developing this package using\nLlama 3.2 to test. To get that model you can run: \n ```python\n import ollama\n ollama.pull('llama3.2')\n ```\n:::\n \n#### With Databricks (R only)\n\nIf you pass a table connected to **Databricks** via `odbc`, `mall` will \nautomatically use Databricks' LLM instead of Ollama. *You won't need Ollama \ninstalled if you are using Databricks only.*\n\n`mall` will call the appropriate SQL AI function. For more information see our \n[Databricks article.](https://mlverse.github.io/mall/articles/databricks.html) \n\n## LLM functions\n\nWe will start with loading a very small data set contained in `mall`. It has\n3 product reviews that we will use as the source of our examples.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(mall)\ndata(\"reviews\")\n\nreviews\n#> # A tibble: 3 × 1\n#> review \n#> \n#> 1 This has been the best TV I've ever used. 
Great screen, and sound. \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too noisy \n#> 3 Not sure how to feel about my new washing machine. Great color, but hard to f…\n```\n:::\n\n\n\n## Python\n\n\n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nimport mall \ndata = mall.MallData\nreviews = data.reviews\n\nreviews \n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
review
"This has been the best TV I've ever used. Great screen, and sound."
"I regret buying this laptop. It is too slow and the keyboard is too noisy"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"
\n```\n\n:::\n:::\n\n\n:::\n\n\n\n\n\n\n\n### Sentiment\n\nAutomatically returns \"positive\", \"negative\", or \"neutral\" based on the text.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_sentiment(review)\n#> # A tibble: 3 × 2\n#> review .sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. positive \n#> 2 I regret buying this laptop. It is too slow and the keyboard is to… negative \n#> 3 Not sure how to feel about my new washing machine. Great color, bu… neutral\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_sentiment.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.sentiment(\"review\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound.""positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""neutral"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.sentiment) \n\n:::\n\n### Summarize\n\nThere may be a need to reduce the number of words in a given text. Typically to \nmake it easier to understand its intent. The function has an argument to \ncontrol the maximum number of words to output \n(`max_words`):\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_summarize(review, max_words = 5)\n#> # A tibble: 3 × 2\n#> review .summary \n#> \n#> 1 This has been the best TV I've ever used. Gr… it's a great tv \n#> 2 I regret buying this laptop. It is too slow … laptop purchase was a mistake \n#> 3 Not sure how to feel about my new washing ma… having mixed feelings about it\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_summarize.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.summarize(\"review\", 5)\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewsummary
"This has been the best TV I've ever used. Great screen, and sound.""great tv with good features"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop purchase was a mistake"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""feeling uncertain about new purchase"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.summarize) \n\n:::\n\n### Classify\n\nUse the LLM to categorize the text into one of the options you provide: \n\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_classify(review, c(\"appliance\", \"computer\"))\n#> # A tibble: 3 × 2\n#> review .classify\n#> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_classify.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.classify(\"review\", [\"computer\", \"appliance\"])\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewclassify
"This has been the best TV I've ever used. Great screen, and sound.""appliance"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""computer"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""appliance"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.classify) \n\n:::\n\n### Extract \n\nOne of the most interesting use cases Using natural language, we can tell the \nLLM to return a specific part of the text. In the following example, we request\nthat the LLM return the product being referred to. We do this by simply saying \n\"product\". The LLM understands what we *mean* by that word, and looks for that\nin the text.\n\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_extract(review, \"product\")\n#> # A tibble: 3 × 2\n#> review .extract \n#> \n#> 1 This has been the best TV I've ever used. Gr… tv \n#> 2 I regret buying this laptop. It is too slow … laptop \n#> 3 Not sure how to feel about my new washing ma… washing machine\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_extract.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.extract(\"review\", \"product\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewextract
"This has been the best TV I've ever used. Great screen, and sound.""tv"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.extract) \n\n:::\n\n\n### Translate\n\nAs the title implies, this function will translate the text into a specified \nlanguage. What is really nice, it is that you don't need to specify the language\nof the source text. Only the target language needs to be defined. The translation\naccuracy will depend on the LLM\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_translate(review, \"spanish\")\n#> # A tibble: 3 × 2\n#> review .translation \n#> \n#> 1 This has been the best TV I've ever used. Gr… Esta ha sido la mejor televisió…\n#> 2 I regret buying this laptop. It is too slow … Me arrepiento de comprar este p…\n#> 3 Not sure how to feel about my new washing ma… No estoy seguro de cómo me sien…\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_translate.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.translate(\"review\", \"spanish\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewtranslation
"This has been the best TV I've ever used. Great screen, and sound.""Esta ha sido la mejor televisión que he utilizado hasta ahora. Gran pantalla y sonido."
"I regret buying this laptop. It is too slow and the keyboard is too noisy""Me arrepiento de comprar este portátil. Es demasiado lento y la tecla es demasiado ruidosa."
"Not sure how to feel about my new washing machine. Great color, but hard to figure""No estoy seguro de cómo sentirme con mi nueva lavadora. Un color maravilloso, pero muy difícil de en…
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.translate) \n\n:::\n\n### Custom prompt\n\nIt is possible to pass your own prompt to the LLM, and have `mall` run it \nagainst each text entry:\n\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_prompt <- paste(\n \"Answer a question.\",\n \"Return only the answer, no explanation\",\n \"Acceptable answers are 'yes', 'no'\",\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews |>\n llm_custom(review, my_prompt)\n#> # A tibble: 3 × 2\n#> review .pred\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. Yes \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too noi… No \n#> 3 Not sure how to feel about my new washing machine. Great color, but har… No\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_custom.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nmy_prompt = (\n \"Answer a question.\"\n \"Return only the answer, no explanation\"\n \"Acceptable answers are 'yes', 'no'\"\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews.llm.custom(\"review\", prompt = my_prompt)\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewcustom
"This has been the best TV I've ever used. Great screen, and sound.""Yes"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""No"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""No"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.custom) \n\n:::\n\n## Model selection and settings\n\nYou can set the model and its options to use when calling the LLM. In this case,\nwe refer to options as model specific things that can be set, such as seed or\ntemperature. \n\n::: {.panel-tabset group=\"language\"}\n## R\n\nInvoking an `llm` function will automatically initialize a model selection\nif you don't have one selected yet. If there is only one option, it will \npre-select it for you. If there are more than one available models, then `mall`\nwill present you as menu selection so you can select which model you wish to \nuse.\n\nCalling `llm_use()` directly will let you specify the model and backend to use.\nYou can also setup additional arguments that will be passed down to the \nfunction that actually runs the prediction. In the case of Ollama, that function\nis [`chat()`](https://hauselin.github.io/ollama-r/reference/chat.html). \n\nThe model to use, and other options can be set for the current R session\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_use(\"ollama\", \"llama3.2\", seed = 100, temperature = 0)\n```\n:::\n\n\n\n\n## Python \n\nThe model and options to be used will be defined at the Polars data frame \nobject level. If not passed, the default model will be **llama3.2**.\n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.use(\"ollama\", \"llama3.2\", options = dict(seed = 100))\n```\n:::\n\n\n\n:::\n\n#### Results caching \n\nBy default `mall` caches the requests and corresponding results from a given\nLLM run. Each response is saved as individual JSON files. By default, the folder\nname is `_mall_cache`. The folder name can be customized, if needed. 
Also, the\ncaching can be turned off by setting the argument to empty (`\"\"`).\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_use(.cache = \"_my_cache\")\n```\n:::\n\n\n\nTo turn off:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_use(.cache = \"\")\n```\n:::\n\n\n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.use(_cache = \"my_cache\")\n```\n:::\n\n\n\nTo turn off:\n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.use(_cache = \"\")\n```\n:::\n\n\n\n:::\n\nFor more information see the [Caching Results](articles/caching.qmd) article. \n\n## Key considerations\n\nThe main consideration is **cost**. Either, time cost, or money cost.\n\nIf using this method with an LLM locally available, the cost will be a long \nrunning time. Unless using a very specialized LLM, a given LLM is a general model. \nIt was fitted using a vast amount of data. So determining a response for each \nrow, takes longer than if using a manually created NLP model. The default model\nused in Ollama is [Llama 3.2](https://ollama.com/library/llama3.2), \nwhich was fitted using 3B parameters. \n\nIf using an external LLM service, the consideration will need to be for the \nbilling costs of using such service. Keep in mind that you will be sending a lot\nof data to be evaluated. \n\nAnother consideration is the novelty of this approach. Early tests are \nproviding encouraging results. But you, as an user, will still need to keep\nin mind that the predictions will not be infallible, so always check the output.\nAt this time, I think the best use for this method, is for a quick analysis.\n\n\n## Vector functions (R only)\n\n`mall` includes functions that expect a vector, instead of a table, to run the\npredictions. This should make it easier to test things, such as custom prompts\nor results of specific text. 
Each `llm_` function has a corresponding `llm_vec_`\nfunction:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_vec_sentiment(\"I am happy\")\n#> [1] \"positive\"\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_vec_translate(\"Este es el mejor dia!\", \"english\")\n#> [1] \"It's the best day!\"\n```\n:::\n", + "markdown": "---\nformat:\n html:\n toc: true\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n\n\n\n\n[![R package check](https://github.com/mlverse/mall/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/mlverse/mall/actions/workflows/R-CMD-check.yaml)\n[![R package coverage](https://codecov.io/gh/mlverse/mall/branch/main/graph/badge.svg)](https://app.codecov.io/gh/mlverse/mall?branch=main)\n[![Lifecycle:\nexperimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)\n\n\n\nRun multiple LLM predictions against a data frame. The predictions are processed \nrow-wise over a specified column. It works using a pre-determined one-shot prompt,\nalong with the current row's content. `mall` has been implemented for both R\nand Python. The prompt that is used will depend on the type of analysis needed. \n\nCurrently, the included prompts perform the following: \n\n- [Sentiment analysis](#sentiment)\n- [Text summarizing](#summarize)\n- [Classify text](#classify)\n- [Extract one, or several](#extract), specific pieces of information from the text\n- [Translate text](#translate)\n- [Custom prompt](#custom-prompt)\n\nThis package is inspired by the SQL AI functions now offered by vendors such as\n[Databricks](https://docs.databricks.com/en/large-language-models/ai-functions.html) \nand Snowflake. `mall` uses [Ollama](https://ollama.com/) to interact with LLMs \ninstalled locally. \n\n\n\nFor **R**, that interaction takes place via the \n[`ollamar`](https://hauselin.github.io/ollama-r/) package. 
The functions are \ndesigned to work easily with piped commands, such as `dplyr`. \n\n```r\nreviews |>\n llm_sentiment(review)\n```\n\n\n\nFor **Python**, `mall` is a library extension to [Polars](https://pola.rs/). To\ninteract with Ollama, it uses the official\n[Python library](https://github.com/ollama/ollama-python).\n\n```python\nreviews.llm.sentiment(\"review\")\n```\n\n## Motivation\n\nWe want to find new ways to help data scientists use LLMs in their daily work. \nUnlike the familiar interfaces, such as chatting and code completion, this interface\nruns your text data directly against the LLM. \n\nThe LLM's flexibility allows it to adapt to the subject of your data and \nprovide surprisingly accurate predictions. This saves the data scientist the\nneed to write and tune an NLP model. \n\nIn recent times, the capabilities of LLMs that can run locally on your computer \nhave increased dramatically. This means that this sort of analysis can run\non your machine with good accuracy. Additionally, it makes it possible to take\nadvantage of LLMs at your institution, since the data will not leave the\ncorporate network. \n\n## Get started\n\n- Install `mall` from GitHub\n\n \n::: {.panel-tabset group=\"language\"}\n## R\n```r\npak::pak(\"mlverse/mall/r\")\n```\n\n## Python\n```python\npip install \"mall @ git+https://git@github.com/mlverse/mall.git#subdirectory=python\"\n```\n:::\n\n- [Download Ollama from the official website](https://ollama.com/download)\n\n- Install and start Ollama on your computer\n\n\n::: {.panel-tabset group=\"language\"}\n## R\n- Install Ollama on your machine. The `ollamar` package's website provides this\n[Installation guide](https://hauselin.github.io/ollama-r/#installation)\n\n- Download an LLM model. For example, I have been developing this package using\nLlama 3.2 to test. 
To get that model you can run: \n ```r\n ollamar::pull(\"llama3.2\")\n ```\n \n## Python\n\n- Install the official Ollama library\n ```python\n pip install ollama\n ```\n\n- Download an LLM model. For example, I have been developing this package using\nLlama 3.2 to test. To get that model you can run: \n ```python\n import ollama\n ollama.pull('llama3.2')\n ```\n:::\n \n#### With Databricks (R only)\n\nIf you pass a table connected to **Databricks** via `odbc`, `mall` will \nautomatically use Databricks' LLM instead of Ollama. *You won't need Ollama \ninstalled if you are using Databricks only.*\n\n`mall` will call the appropriate SQL AI function. For more information see our \n[Databricks article.](https://mlverse.github.io/mall/articles/databricks.html) \n\n## LLM functions\n\nWe will start with loading a very small data set contained in `mall`. It has\n3 product reviews that we will use as the source of our examples.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(mall)\ndata(\"reviews\")\n\nreviews\n#> # A tibble: 3 × 1\n#> review \n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too noisy \n#> 3 Not sure how to feel about my new washing machine. Great color, but hard to f…\n```\n:::\n\n\n\n## Python\n\n\n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nimport mall \ndata = mall.MallData\nreviews = data.reviews\n\nreviews \n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
review
"This has been the best TV I've ever used. Great screen, and sound."
"I regret buying this laptop. It is too slow and the keyboard is too noisy"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"
\n```\n\n:::\n:::\n\n\n:::\n\n\n\n\n\n\n\n### Sentiment\n\nAutomatically returns \"positive\", \"negative\", or \"neutral\" based on the text.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_sentiment(review)\n#> # A tibble: 3 × 2\n#> review .sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. positive \n#> 2 I regret buying this laptop. It is too slow and the keyboard is to… negative \n#> 3 Not sure how to feel about my new washing machine. Great color, bu… neutral\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_sentiment.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.sentiment(\"review\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound.""positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""neutral"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.sentiment) \n\n:::\n\n### Summarize\n\nThere may be a need to reduce the number of words in a given text, typically to \nmake it easier to understand its intent. The function has an argument to \ncontrol the maximum number of words to output \n(`max_words`):\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_summarize(review, max_words = 5)\n#> # A tibble: 3 × 2\n#> review .summary \n#> \n#> 1 This has been the best TV I've ever used. Gr… it's a great tv \n#> 2 I regret buying this laptop. It is too slow … laptop purchase was a mistake \n#> 3 Not sure how to feel about my new washing ma… having mixed feelings about it\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_summarize.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.summarize(\"review\", 5)\n```\n\n::: {.cell-output-display}\n\n```{=html}\n<div><style>
\n
reviewsummary
"This has been the best TV I've ever used. Great screen, and sound.""great tv with good features"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop purchase was a mistake"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""feeling uncertain about new purchase"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.summarize) \n\n:::\n\n### Classify\n\nUse the LLM to categorize the text into one of the options you provide: \n\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_classify(review, c(\"appliance\", \"computer\"))\n#> # A tibble: 3 × 2\n#> review .classify\n#> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_classify.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.classify(\"review\", [\"computer\", \"appliance\"])\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewclassify
"This has been the best TV I've ever used. Great screen, and sound.""appliance"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""computer"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""appliance"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.classify) \n\n:::\n\n### Extract \n\nThis is one of the most interesting use cases. Using natural language, we can tell the \nLLM to return a specific part of the text. In the following example, we request\nthat the LLM return the product being referred to. We do this by simply saying \n\"product\". The LLM understands what we *mean* by that word, and looks for that\nin the text.\n\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_extract(review, \"product\")\n#> # A tibble: 3 × 2\n#> review .extract \n#> \n#> 1 This has been the best TV I've ever used. Gr… tv \n#> 2 I regret buying this laptop. It is too slow … laptop \n#> 3 Not sure how to feel about my new washing ma… washing machine\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_extract.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.extract(\"review\", \"product\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n<div><style>
\n
reviewextract
"This has been the best TV I've ever used. Great screen, and sound.""tv"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.extract) \n\n:::\n\n\n### Translate\n\nAs the title implies, this function will translate the text into a specified \nlanguage. What is really nice, it is that you don't need to specify the language\nof the source text. Only the target language needs to be defined. The translation\naccuracy will depend on the LLM\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_translate(review, \"spanish\")\n#> # A tibble: 3 × 2\n#> review .translation \n#> \n#> 1 This has been the best TV I've ever used. Gr… Esta ha sido la mejor televisió…\n#> 2 I regret buying this laptop. It is too slow … Me arrepiento de comprar este p…\n#> 3 Not sure how to feel about my new washing ma… No estoy seguro de cómo me sien…\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_translate.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.translate(\"review\", \"spanish\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewtranslation
"This has been the best TV I've ever used. Great screen, and sound.""Esta ha sido la mejor televisión que he utilizado hasta ahora. Gran pantalla y sonido."
"I regret buying this laptop. It is too slow and the keyboard is too noisy""Me arrepiento de comprar este portátil. Es demasiado lento y la tecla es demasiado ruidosa."
"Not sure how to feel about my new washing machine. Great color, but hard to figure""No estoy seguro de cómo sentirme con mi nueva lavadora. Un color maravilloso, pero muy difícil de en…
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.translate) \n\n:::\n\n### Custom prompt\n\nIt is possible to pass your own prompt to the LLM, and have `mall` run it \nagainst each text entry:\n\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_prompt <- paste(\n \"Answer a question.\",\n \"Return only the answer, no explanation\",\n \"Acceptable answers are 'yes', 'no'\",\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews |>\n llm_custom(review, my_prompt)\n#> # A tibble: 3 × 2\n#> review .pred\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. Yes \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too noi… No \n#> 3 Not sure how to feel about my new washing machine. Great color, but har… No\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_custom.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nmy_prompt = (\n \"Answer a question.\"\n \"Return only the answer, no explanation\"\n \"Acceptable answers are 'yes', 'no'\"\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews.llm.custom(\"review\", prompt = my_prompt)\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewcustom
"This has been the best TV I've ever used. Great screen, and sound.""Yes"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""No"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""No"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.custom) \n\n:::\n\n## Model selection and settings\n\nYou can set the model and its options to use when calling the LLM. In this case,\nwe refer to options as model-specific settings, such as seed or\ntemperature. \n\n::: {.panel-tabset group=\"language\"}\n## R\n\nInvoking an `llm` function will automatically initialize a model selection\nif you don't have one selected yet. If there is only one option, it will \npre-select it for you. If there is more than one available model, `mall`\nwill present you with a menu so you can select which model you wish to \nuse.\n\nCalling `llm_use()` directly will let you specify the model and backend to use.\nYou can also set up additional arguments that will be passed down to the \nfunction that actually runs the prediction. In the case of Ollama, that function\nis [`chat()`](https://hauselin.github.io/ollama-r/reference/chat.html). \n\nThe model to use, and other options, can be set for the current R session:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_use("ollama", "llama3.2", seed = 100, temperature = 0)\n```\n:::\n\n\n\n\n## Python \n\nThe model and options to be used will be defined at the Polars data frame \nobject level. If not passed, the default model will be **llama3.2**.\n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.use("ollama", "llama3.2", options = dict(seed = 100))\n```\n:::\n\n\n\n:::\n\n#### Results caching \n\nBy default `mall` caches the requests and corresponding results from a given\nLLM run. Each response is saved as an individual JSON file. By default, the folder\nname is `_mall_cache`. The folder name can be customized, if needed. 
Also, the\ncaching can be turned off by setting the argument to an empty string (`\"\"`).\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_use(.cache = \"_my_cache\")\n```\n:::\n\n\n\nTo turn off:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_use(.cache = \"\")\n```\n:::\n\n\n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.use(_cache = \"my_cache\")\n```\n:::\n\n\n\nTo turn off:\n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.use(_cache = \"\")\n```\n:::\n\n\n\n:::\n\nFor more information see the [Caching Results](articles/caching.qmd) article. \n\n## Key considerations\n\nThe main consideration is **cost**: either time, or money.\n\nIf using this method with a locally available LLM, the cost will be a long \nrunning time. Unless using a very specialized LLM, a given LLM is a general model, \nfitted using a vast amount of data, so determining a response for each \nrow takes longer than if using a manually created NLP model. The default model\nused in Ollama is [Llama 3.2](https://ollama.com/library/llama3.2), \nwhich has 3B parameters. \n\nIf using an external LLM service, the main consideration will be the \nbilling costs of using such a service. Keep in mind that you will be sending a lot\nof data to be evaluated. \n\nAnother consideration is the novelty of this approach. Early tests are \nproviding encouraging results. But you, as a user, will still need to keep\nin mind that the predictions will not be infallible, so always check the output.\nAt this time, I think the best use for this method is for quick analysis.\n\n\n## Vector functions (R only)\n\n`mall` includes functions that expect a vector, instead of a table, to run the\npredictions. This should make it easier to test things, such as custom prompts\nor results for specific text. Each `llm_` function has a corresponding `llm_vec_`\nfunction:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_vec_sentiment(\"I am happy\")\n#> [1] \"positive\"\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_vec_translate(\"Este es el mejor dia!\", \"english\")\n#> [1] \"It's the best day!\"\n```\n:::\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua"
|\n| [use](#mall.MallFrame.use) | Define the model, backend, and other options to use to |\n\n### classify { #mall.MallFrame.classify }\n\n`MallFrame.classify(col, labels='', additional='', pred_name='classify')`\n\nClassify text into specific categories.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|-------------------------------------------------------------------------------------------------------------------------|--------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `labels` | list | A list or a DICT object that defines the categories to classify the text as. It will return one of the provided labels. | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'classify'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#f576c894 .cell execution_count=2}\n``` {.python .cell-code}\nreviews.llm.classify(\"review\", [\"appliance\", \"computer\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
\n
reviewclassify
"This has been the best TV I've ever used. Great screen, and sound.""computer"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""computer"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""appliance"
\n```\n:::\n:::\n\n\n::: {#fdb7b590 .cell execution_count=3}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.classify(\"review\", [\"appliance\", \"computer\"], pred_name=\"prod_type\")\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
\n
reviewprod_type
"This has been the best TV I've ever used. Great screen, and sound.""computer"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""computer"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""appliance"
\n```\n:::\n:::\n\n\n::: {#ec96d2f3 .cell execution_count=4}\n``` {.python .cell-code}\n#Pass a DICT to set custom values for each classification\nreviews.llm.classify(\"review\", {\"appliance\" : \"1\", \"computer\" : \"2\"})\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
\n
reviewclassify
"This has been the best TV I've ever used. Great screen, and sound.""1"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""2"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""1"
\n```\n:::\n:::\n\n\n### custom { #mall.MallFrame.custom }\n\n`MallFrame.custom(col, prompt='', valid_resps='', pred_name='custom')`\n\nProvide the full prompt that the LLM will process.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|-------------|--------|----------------------------------------------------------------------------------------|------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `prompt` | str | The prompt to send to the LLM along with the `col` | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'custom'` |\n\n#### Examples\n\n::: {#f144e1a5 .cell execution_count=5}\n``` {.python .cell-code}\nmy_prompt = (\n \"Answer a question.\"\n \"Return only the answer, no explanation\"\n \"Acceptable answers are 'yes', 'no'\"\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews.llm.custom(\"review\", prompt = my_prompt)\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
\n
reviewcustom
"This has been the best TV I've ever used. Great screen, and sound.""Yes"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""No"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""No"
\n```\n:::\n:::\n\n\n### extract { #mall.MallFrame.extract }\n\n`MallFrame.extract(col, labels='', expand_cols=False, additional='', pred_name='extract')`\n\nPull a specific label from the text.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|----------------------------------------------------------------------------------------|-------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `labels` | list | A list or a DICT object that defines tells the LLM what to look for and return | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'extract'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#1ec54c1d .cell execution_count=6}\n``` {.python .cell-code}\n# Use 'labels' to let the function know what to extract\nreviews.llm.extract(\"review\", labels = \"product\")\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
\n
reviewextract
"This has been the best TV I've ever used. Great screen, and sound.""tv"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine"
\n```\n:::\n:::\n\n\n::: {#00b7b26e .cell execution_count=7}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.extract(\"review\", \"product\", pred_name = \"prod\")\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```{=html}\n
\n
reviewprod
"This has been the best TV I've ever used. Great screen, and sound.""tv"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine"
\n```\n:::\n:::\n\n\n::: {#7f5558a8 .cell execution_count=8}\n``` {.python .cell-code}\n# Pass a vector to request multiple things, the results will be pipe delimeted\n# in a single column\nreviews.llm.extract(\"review\", [\"product\", \"feelings\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n
\n
reviewextract
"This has been the best TV I've ever used. Great screen, and sound.""tv | great"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop|frustration"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine | confusion"
\n```\n:::\n:::\n\n\n::: {#210ab617 .cell execution_count=9}\n``` {.python .cell-code}\n# Set 'expand_cols' to True to split multiple lables\n# into individual columns\nreviews.llm.extract(\n col=\"review\",\n labels=[\"product\", \"feelings\"],\n expand_cols=True\n )\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```{=html}\n
\n
reviewproductfeelings
"This has been the best TV I've ever used. Great screen, and sound.""tv "" great"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop""frustration"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine "" confusion"
\n```\n:::\n:::\n\n\n::: {#aad6f14f .cell execution_count=10}\n``` {.python .cell-code}\n# Set custom names to the resulting columns\nreviews.llm.extract(\n col=\"review\",\n labels={\"prod\": \"product\", \"feels\": \"feelings\"},\n expand_cols=True\n )\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n```{=html}\n
\n
reviewprodfeels
"This has been the best TV I've ever used. Great screen, and sound.""tv "" great"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop""frustration"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine "" confusion"
\n```\n:::\n:::\n\n\n### sentiment { #mall.MallFrame.sentiment }\n\n`MallFrame.sentiment(col, options=['positive', 'negative', 'neutral'], additional='', pred_name='sentiment')`\n\nUse an LLM to run a sentiment analysis\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------------|----------------------------------------------------------------------------------------|---------------------------------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `options` | list or dict | A list of the sentiment options to use, or a named DICT object | `['positive', 'negative', 'neutral']` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'sentiment'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#61fc40ee .cell execution_count=11}\n``` {.python .cell-code}\nreviews.llm.sentiment(\"review\")\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound.""positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""neutral"
\n```\n:::\n:::\n\n\n::: {#8b2fffe2 .cell execution_count=12}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.sentiment(\"review\", pred_name=\"review_sentiment\")\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n```{=html}\n
\n
reviewreview_sentiment
"This has been the best TV I've ever used. Great screen, and sound.""positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""neutral"
\n```\n:::\n:::\n\n\n::: {#83e3261c .cell execution_count=13}\n``` {.python .cell-code}\n# Pass custom sentiment options\nreviews.llm.sentiment(\"review\", [\"positive\", \"negative\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound.""positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""negative"
\n```\n:::\n:::\n\n\n::: {#f1868796 .cell execution_count=14}\n``` {.python .cell-code}\n# Use a DICT object to specify values to return per sentiment\nreviews.llm.sentiment(\"review\", {\"positive\" : \"1\", \"negative\" : \"0\"})\n```\n\n::: {.cell-output .cell-output-display execution_count=14}\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound.""1"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""0"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""0"
\n```\n:::\n:::\n\n\n### summarize { #mall.MallFrame.summarize }\n\n`MallFrame.summarize(col, max_words=10, additional='', pred_name='summary')`\n\nSummarize the text down to a specific number of words.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|----------------------------------------------------------------------------------------|-------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `max_words` | int | Maximum number of words to use for the summary | `10` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'summary'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#1d6b9023 .cell execution_count=15}\n``` {.python .cell-code}\n# Use max_words to set the maximum number of words to use for the summary\nreviews.llm.summarize(\"review\", max_words = 5)\n```\n\n::: {.cell-output .cell-output-display execution_count=15}\n```{=html}\n
\n
reviewsummary
"This has been the best TV I've ever used. Great screen, and sound.""great tv with good features"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop purchase was a mistake"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""feeling uncertain about new purchase"
\n```\n:::\n:::\n\n\n::: {#0fa6d346 .cell execution_count=16}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.summarize(\"review\", 5, pred_name = \"review_summary\")\n```\n\n::: {.cell-output .cell-output-display execution_count=16}\n```{=html}\n
\n
reviewreview_summary
"This has been the best TV I've ever used. Great screen, and sound.""great tv with good features"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop purchase was a mistake"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""feeling uncertain about new purchase"
\n```\n:::\n:::\n\n\n### translate { #mall.MallFrame.translate }\n\n`MallFrame.translate(col, language='', additional='', pred_name='translation')`\n\nTranslate text into another language.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|----------------------------------------------------------------------------------------|-----------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `language` | str | The target language to translate to. For example 'French'. | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'translation'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#7e3fb531 .cell execution_count=17}\n``` {.python .cell-code}\nreviews.llm.translate(\"review\", \"spanish\")\n```\n\n::: {.cell-output .cell-output-display execution_count=17}\n```{=html}\n
\n
reviewtranslation
"This has been the best TV I've ever used. Great screen, and sound.""Esta ha sido la mejor televisión que he utilizado hasta ahora. Gran pantalla y sonido."
"I regret buying this laptop. It is too slow and the keyboard is too noisy""Me arrepiento de comprar este portátil. Es demasiado lento y la tecla es demasiado ruidosa."
"Not sure how to feel about my new washing machine. Great color, but hard to figure""No estoy seguro de cómo sentirme con mi nueva lavadora. Un color maravilloso, pero muy difícil de en…
\n```\n:::\n:::\n\n\n::: {#cbca2d84 .cell execution_count=18}\n``` {.python .cell-code}\nreviews.llm.translate(\"review\", \"french\")\n```\n\n::: {.cell-output .cell-output-display execution_count=18}\n```{=html}\n
\n
reviewtranslation
"This has been the best TV I've ever used. Great screen, and sound.""Ceci était la meilleure télévision que j'ai jamais utilisée. Écran et son excellent."
"I regret buying this laptop. It is too slow and the keyboard is too noisy""Je me regrette d'avoir acheté ce portable. Il est trop lent et le clavier fait trop de bruit."
"Not sure how to feel about my new washing machine. Great color, but hard to figure""Je ne sais pas comment réagir à mon nouveau lave-linge. Couleur superbe, mais difficile à comprendre…
\n```\n:::\n:::\n\n\n### use { #mall.MallFrame.use }\n\n`MallFrame.use(backend='', model='', _cache='_mall_cache', **kwargs)`\n\nDefine the model, backend, and other options to use to\ninteract with the LLM.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|------------|--------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|\n| `backend` | str | The name of the backend to use. At the beginning of the session it defaults to \"ollama\". If passing `\"\"`, it will remain unchanged | `''` |\n| `model` | str | The name of the model tha the backend should use. At the beginning of the session it defaults to \"llama3.2\". If passing `\"\"`, it will remain unchanged | `''` |\n| `_cache` | str | The path of where to save the cached results. Passing `\"\"` disables the cache | `'_mall_cache'` |\n| `**kwargs` | | Arguments to pass to the downstream Python call. In this case, the `chat` function in `ollama` | `{}` |\n\n#### Examples\n\n::: {#fc5314e1 .cell execution_count=19}\n``` {.python .cell-code}\n# Additional arguments will be passed 'as-is' to the\n# downstream R function in this example, to ollama::chat()\nreviews.llm.use(\"ollama\", \"llama3.2\", seed = 100, temp = 0.1)\n```\n\n::: {.cell-output .cell-output-display execution_count=19}\n```\n{'backend': 'ollama',\n 'model': 'llama3.2',\n '_cache': '_mall_cache',\n 'options': {'seed': 100},\n 'seed': 100,\n 'temp': 0.1}\n```\n:::\n:::\n\n\n::: {#56e28bc2 .cell execution_count=20}\n``` {.python .cell-code}\n# During the Python session, you can change any argument\n# individually and it will retain all of previous\n# arguments used\nreviews.llm.use(temp = 0.3)\n```\n\n::: {.cell-output .cell-output-display execution_count=20}\n```\n{'backend': 'ollama',\n 'model': 'llama3.2',\n '_cache': '_mall_cache',\n 'options': {'seed': 100},\n 'seed': 100,\n 'temp': 0.3}\n```\n:::\n:::\n\n\n::: 
{#e58e2f08 .cell execution_count=21}\n``` {.python .cell-code}\n# Use _cache to modify the target folder for caching\nreviews.llm.use(_cache = \"_my_cache\")\n```\n\n::: {.cell-output .cell-output-display execution_count=21}\n```\n{'backend': 'ollama',\n 'model': 'llama3.2',\n '_cache': '_my_cache',\n 'options': {'seed': 100},\n 'seed': 100,\n 'temp': 0.3}\n```\n:::\n:::\n\n\n::: {#0b242c6d .cell execution_count=22}\n``` {.python .cell-code}\n# Leave _cache empty to turn off this functionality\nreviews.llm.use(_cache = \"\")\n```\n\n::: {.cell-output .cell-output-display execution_count=22}\n```\n{'backend': 'ollama',\n 'model': 'llama3.2',\n '_cache': '',\n 'options': {'seed': 100},\n 'seed': 100,\n 'temp': 0.3}\n```\n:::\n:::\n\n\n", + "markdown": "---\ntitle: MallFrame\n---\n\n\n\n`MallFrame(self, df)`\n\nExtension to Polars that add ability to use\nan LLM to run batch predictions over a data frame\n\nWe will start by loading the needed libraries, and \nset up the data frame that will be used in the \nexamples:\n\n\n::: {#7da4c2de .cell execution_count=1}\n``` {.python .cell-code}\nimport mall\nimport polars as pl\npl.Config(fmt_str_lengths=100)\npl.Config.set_tbl_hide_dataframe_shape(True)\npl.Config.set_tbl_hide_column_data_types(True)\ndata = mall.MallData\nreviews = data.reviews\nreviews.llm.use(options = dict(seed = 100))\n```\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [classify](#mall.MallFrame.classify) | Classify text into specific categories. |\n| [custom](#mall.MallFrame.custom) | Provide the full prompt that the LLM will process. |\n| [extract](#mall.MallFrame.extract) | Pull a specific label from the text. |\n| [sentiment](#mall.MallFrame.sentiment) | Use an LLM to run a sentiment analysis |\n| [summarize](#mall.MallFrame.summarize) | Summarize the text down to a specific number of words. |\n| [translate](#mall.MallFrame.translate) | Translate text into another language. 
|\n| [use](#mall.MallFrame.use) | Define the model, backend, and other options to use to |\n\n### classify { #mall.MallFrame.classify }\n\n`MallFrame.classify(col, labels='', additional='', pred_name='classify')`\n\nClassify text into specific categories.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|-------------------------------------------------------------------------------------------------------------------------|--------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `labels` | list | A list or a DICT object that defines the categories to classify the text as. It will return one of the provided labels. | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'classify'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#50356a12 .cell execution_count=2}\n``` {.python .cell-code}\nreviews.llm.classify(\"review\", [\"appliance\", \"computer\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
\n
reviewclassify
"This has been the best TV I've ever used. Great screen, and sound.""computer"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""computer"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""appliance"
\n```\n:::\n:::\n\n\n::: {#929299a7 .cell execution_count=3}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.classify(\"review\", [\"appliance\", \"computer\"], pred_name=\"prod_type\")\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
\n
reviewprod_type
"This has been the best TV I've ever used. Great screen, and sound.""computer"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""computer"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""appliance"
\n```\n:::\n:::\n\n\n::: {#715eb674 .cell execution_count=4}\n``` {.python .cell-code}\n#Pass a DICT to set custom values for each classification\nreviews.llm.classify(\"review\", {\"appliance\" : \"1\", \"computer\" : \"2\"})\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
\n
reviewclassify
"This has been the best TV I've ever used. Great screen, and sound.""1"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""2"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""1"
\n```\n:::\n:::\n\n\n### custom { #mall.MallFrame.custom }\n\n`MallFrame.custom(col, prompt='', valid_resps='', pred_name='custom')`\n\nProvide the full prompt that the LLM will process.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|-------------|--------|----------------------------------------------------------------------------------------|------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `prompt` | str | The prompt to send to the LLM along with the `col` | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'custom'` |\n\n#### Examples\n\n::: {#75e71ed7 .cell execution_count=5}\n``` {.python .cell-code}\nmy_prompt = (\n \"Answer a question.\"\n \"Return only the answer, no explanation\"\n \"Acceptable answers are 'yes', 'no'\"\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews.llm.custom(\"review\", prompt = my_prompt)\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
\n
reviewcustom
"This has been the best TV I've ever used. Great screen, and sound.""Yes"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""No"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""No"
\n```\n:::\n:::\n\n\n### extract { #mall.MallFrame.extract }\n\n`MallFrame.extract(col, labels='', expand_cols=False, additional='', pred_name='extract')`\n\nPull a specific label from the text.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|----------------------------------------------------------------------------------------|-------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `labels` | list | A list or a DICT object that tells the LLM what to look for and return | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'extract'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#ae6b4d4c .cell execution_count=6}\n``` {.python .cell-code}\n# Use 'labels' to let the function know what to extract\nreviews.llm.extract(\"review\", labels = \"product\")\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
\n
reviewextract
"This has been the best TV I've ever used. Great screen, and sound.""tv"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine"
\n```\n:::\n:::\n\n\n::: {#1602b38d .cell execution_count=7}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.extract(\"review\", \"product\", pred_name = \"prod\")\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```{=html}\n
\n
reviewprod
"This has been the best TV I've ever used. Great screen, and sound.""tv"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine"
\n```\n:::\n:::\n\n\n::: {#f257bda7 .cell execution_count=8}\n``` {.python .cell-code}\n# Pass a list to request multiple things; the results will be pipe delimited\n# in a single column\nreviews.llm.extract(\"review\", [\"product\", \"feelings\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n
\n
reviewextract
"This has been the best TV I've ever used. Great screen, and sound.""tv | great"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop|frustration"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine | confusion"
\n```\n:::\n:::\n\n\n::: {#d90b2af3 .cell execution_count=9}\n``` {.python .cell-code}\n# Set 'expand_cols' to True to split multiple labels\n# into individual columns\nreviews.llm.extract(\n    col=\"review\",\n    labels=[\"product\", \"feelings\"],\n    expand_cols=True\n    )\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```{=html}\n
\n
reviewproductfeelings
"This has been the best TV I've ever used. Great screen, and sound.""tv "" great"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop""frustration"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine "" confusion"
\n```\n:::\n:::\n\n\n::: {#370ba370 .cell execution_count=10}\n``` {.python .cell-code}\n# Set custom names to the resulting columns\nreviews.llm.extract(\n col=\"review\",\n labels={\"prod\": \"product\", \"feels\": \"feelings\"},\n expand_cols=True\n )\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n```{=html}\n
\n
reviewprodfeels
"This has been the best TV I've ever used. Great screen, and sound.""tv "" great"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop""frustration"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine "" confusion"
\n```\n:::\n:::\n\n\n### sentiment { #mall.MallFrame.sentiment }\n\n`MallFrame.sentiment(col, options=['positive', 'negative', 'neutral'], additional='', pred_name='sentiment')`\n\nUse an LLM to run a sentiment analysis\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------------|----------------------------------------------------------------------------------------|---------------------------------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `options` | list or dict | A list of the sentiment options to use, or a named DICT object | `['positive', 'negative', 'neutral']` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'sentiment'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#10220325 .cell execution_count=11}\n``` {.python .cell-code}\nreviews.llm.sentiment(\"review\")\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound.""positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""neutral"
\n```\n:::\n:::\n\n\n::: {#d6195d1c .cell execution_count=12}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.sentiment(\"review\", pred_name=\"review_sentiment\")\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n```{=html}\n
\n
reviewreview_sentiment
"This has been the best TV I've ever used. Great screen, and sound.""positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""neutral"
\n```\n:::\n:::\n\n\n::: {#8d3b25b4 .cell execution_count=13}\n``` {.python .cell-code}\n# Pass custom sentiment options\nreviews.llm.sentiment(\"review\", [\"positive\", \"negative\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound.""positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""negative"
\n```\n:::\n:::\n\n\n::: {#2b51b9ef .cell execution_count=14}\n``` {.python .cell-code}\n# Use a DICT object to specify values to return per sentiment\nreviews.llm.sentiment(\"review\", {\"positive\" : \"1\", \"negative\" : \"0\"})\n```\n\n::: {.cell-output .cell-output-display execution_count=14}\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound.""1"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""0"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""0"
\n```\n:::\n:::\n\n\n### summarize { #mall.MallFrame.summarize }\n\n`MallFrame.summarize(col, max_words=10, additional='', pred_name='summary')`\n\nSummarize the text down to a specific number of words.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|----------------------------------------------------------------------------------------|-------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `max_words` | int | Maximum number of words to use for the summary | `10` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'summary'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#97fb9a3a .cell execution_count=15}\n``` {.python .cell-code}\n# Use max_words to set the maximum number of words to use for the summary\nreviews.llm.summarize(\"review\", max_words = 5)\n```\n\n::: {.cell-output .cell-output-display execution_count=15}\n```{=html}\n
\n
reviewsummary
"This has been the best TV I've ever used. Great screen, and sound.""great tv with good features"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop purchase was a mistake"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""feeling uncertain about new purchase"
\n```\n:::\n:::\n\n\n::: {#f89461d7 .cell execution_count=16}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.summarize(\"review\", 5, pred_name = \"review_summary\")\n```\n\n::: {.cell-output .cell-output-display execution_count=16}\n```{=html}\n
\n
reviewreview_summary
"This has been the best TV I've ever used. Great screen, and sound.""great tv with good features"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop purchase was a mistake"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""feeling uncertain about new purchase"
\n```\n:::\n:::\n\n\n### translate { #mall.MallFrame.translate }\n\n`MallFrame.translate(col, language='', additional='', pred_name='translation')`\n\nTranslate text into another language.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|----------------------------------------------------------------------------------------|-----------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `language` | str | The target language to translate to. For example 'French'. | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'translation'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#bf769707 .cell execution_count=17}\n``` {.python .cell-code}\nreviews.llm.translate(\"review\", \"spanish\")\n```\n\n::: {.cell-output .cell-output-display execution_count=17}\n```{=html}\n
\n
reviewtranslation
"This has been the best TV I've ever used. Great screen, and sound.""Esta ha sido la mejor televisión que he utilizado hasta ahora. Gran pantalla y sonido."
"I regret buying this laptop. It is too slow and the keyboard is too noisy""Me arrepiento de comprar este portátil. Es demasiado lento y la tecla es demasiado ruidosa."
"Not sure how to feel about my new washing machine. Great color, but hard to figure""No estoy seguro de cómo sentirme con mi nueva lavadora. Un color maravilloso, pero muy difícil de en…
\n```\n:::\n:::\n\n\n::: {#24cae7a4 .cell execution_count=18}\n``` {.python .cell-code}\nreviews.llm.translate(\"review\", \"french\")\n```\n\n::: {.cell-output .cell-output-display execution_count=18}\n```{=html}\n
\n
reviewtranslation
"This has been the best TV I've ever used. Great screen, and sound.""Ceci était la meilleure télévision que j'ai jamais utilisée. Écran et son excellent."
"I regret buying this laptop. It is too slow and the keyboard is too noisy""Je me regrette d'avoir acheté ce portable. Il est trop lent et le clavier fait trop de bruit."
"Not sure how to feel about my new washing machine. Great color, but hard to figure""Je ne sais pas comment réagir à mon nouveau lave-linge. Couleur superbe, mais difficile à comprendre…
\n```\n:::\n:::\n\n\n### use { #mall.MallFrame.use }\n\n`MallFrame.use(backend='', model='', _cache='_mall_cache', **kwargs)`\n\nDefine the model, backend, and other options to use to\ninteract with the LLM.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|------------|--------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|\n| `backend` | str | The name of the backend to use. At the beginning of the session it defaults to \"ollama\". If passing `\"\"`, it will remain unchanged | `''` |\n| `model` | str | The name of the model that the backend should use. At the beginning of the session it defaults to \"llama3.2\". If passing `\"\"`, it will remain unchanged | `''` |\n| `_cache` | str | The path of where to save the cached results. Passing `\"\"` disables the cache | `'_mall_cache'` |\n| `**kwargs` | | Arguments to pass to the downstream Python call. In this case, the `chat` function in `ollama` | `{}` |\n\n#### Examples\n\n::: {#d1773d8b .cell execution_count=19}\n``` {.python .cell-code}\n# Additional arguments will be passed 'as-is' to the\n# downstream Python function; in this example, to ollama.chat()\nreviews.llm.use(\"ollama\", \"llama3.2\", seed = 100, temp = 0.1)\n```\n\n::: {.cell-output .cell-output-display execution_count=19}\n```\n{'backend': 'ollama',\n 'model': 'llama3.2',\n '_cache': '_mall_cache',\n 'options': {'seed': 100},\n 'seed': 100,\n 'temp': 0.1}\n```\n:::\n:::\n\n\n::: {#0a90e5ca .cell execution_count=20}\n``` {.python .cell-code}\n# During the Python session, you can change any argument\n# individually and it will retain all of the previous\n# arguments used\nreviews.llm.use(temp = 0.3)\n```\n\n::: {.cell-output .cell-output-display execution_count=20}\n```\n{'backend': 'ollama',\n 'model': 'llama3.2',\n '_cache': '_mall_cache',\n 'options': {'seed': 100},\n 'seed': 100,\n 'temp': 0.3}\n```\n:::\n:::\n\n\n::: 
{#7009a892 .cell execution_count=21}\n``` {.python .cell-code}\n# Use _cache to modify the target folder for caching\nreviews.llm.use(_cache = \"_my_cache\")\n```\n\n::: {.cell-output .cell-output-display execution_count=21}\n```\n{'backend': 'ollama',\n 'model': 'llama3.2',\n '_cache': '_my_cache',\n 'options': {'seed': 100},\n 'seed': 100,\n 'temp': 0.3}\n```\n:::\n:::\n\n\n::: {#87460892 .cell execution_count=22}\n``` {.python .cell-code}\n# Leave _cache empty to turn off this functionality\nreviews.llm.use(_cache = \"\")\n```\n\n::: {.cell-output .cell-output-display execution_count=22}\n```\n{'backend': 'ollama',\n 'model': 'llama3.2',\n '_cache': '',\n 'options': {'seed': 100},\n 'seed': 100,\n 'temp': 0.3}\n```\n:::\n:::\n\n\n", "supporting": [ "MallFrame_files" ], diff --git a/_freeze/reference/llm_classify/execute-results/html.json b/_freeze/reference/llm_classify/execute-results/html.json index 52a79ba..7dc485f 100644 --- a/_freeze/reference/llm_classify/execute-results/html.json +++ b/_freeze/reference/llm_classify/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "ef10107b27d491e4dde2502aa2fa47b4", + "hash": "92d7806ddb22fec4dfd7178631b229f8", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Categorize data as one of options given\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n\n[R/llm-classify.R](https://github.com/edgararuiz/mall/blob/main/R/llm-classify.R)\n\n## llm_classify\n\n## Description\n Use a Large Language Model (LLM) to classify the provided text as one of the options provided via the `labels` argument. 
\n\n\n## Usage\n```r\n \nllm_classify( \n .data, \n col, \n labels, \n pred_name = \".classify\", \n additional_prompt = \"\" \n) \n \nllm_vec_classify(x, labels, additional_prompt = \"\", preview = FALSE) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| labels | A character vector with at least 2 labels to classify the text as |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n `llm_classify` returns a `data.frame` or `tbl` object. `llm_vec_classify` returns a vector that is the same length as `x`. \n\n\n## Examples\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \ndata(\"reviews\") \n \nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE) \n \nllm_classify(reviews, review, c(\"appliance\", \"computer\")) \n#> # A tibble: 3 × 2\n#> review .classify\n#> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n \n# Use 'pred_name' to customize the new column's name \nllm_classify( \n reviews, \n review, \n c(\"appliance\", \"computer\"), \n pred_name = \"prod_type\" \n) \n#> # A tibble: 3 × 2\n#> review prod_type\n#> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. 
It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n \n# Pass custom values for each classification \nllm_classify(reviews, review, c(\"appliance\" ~ 1, \"computer\" ~ 2)) \n#> # A tibble: 3 × 2\n#> review .classify\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. 1\n#> 2 I regret buying this laptop. It is too slow and the keyboard is too… 2\n#> 3 Not sure how to feel about my new washing machine. Great color, but… 1\n \n# For character vectors, instead of a data frame, use this function \nllm_vec_classify( \n c(\"this is important!\", \"just whenever\"), \n c(\"urgent\", \"not urgent\") \n) \n#> [1] \"urgent\" \"urgent\"\n \n# To preview the first call that will be made to the downstream R function \nllm_vec_classify( \n c(\"this is important!\", \"just whenever\"), \n c(\"urgent\", \"not urgent\"), \n preview = TRUE \n) \n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful classification engine. Determine if the text refers to one of the following: urgent, not urgent. No capitalization. No explanations. The answer is based on the following text:\\nthis is important!\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n", + "markdown": "---\ntitle: \"Categorize data as one of options given\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n\n\n\n\n[R/llm-classify.R](https://github.com/mlverse/mall/blob/main/R/llm-classify.R)\n\n## llm_classify\n\n## Description\n Use a Large Language Model (LLM) to classify the provided text as one of the options provided via the `labels` argument. 
\n\n\n## Usage\n```r\n \nllm_classify( \n .data, \n col, \n labels, \n pred_name = \".classify\", \n additional_prompt = \"\" \n) \n \nllm_vec_classify(x, labels, additional_prompt = \"\", preview = FALSE) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| labels | A character vector with at least 2 labels to classify the text as |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n `llm_classify` returns a `data.frame` or `tbl` object. `llm_vec_classify` returns a vector that is the same length as `x`. \n\n\n## Examples\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \ndata(\"reviews\") \n \nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE) \n \nllm_classify(reviews, review, c(\"appliance\", \"computer\")) \n#> # A tibble: 3 × 2\n#> review .classify\n#> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n \n# Use 'pred_name' to customize the new column's name \nllm_classify( \n reviews, \n review, \n c(\"appliance\", \"computer\"), \n pred_name = \"prod_type\" \n) \n#> # A tibble: 3 × 2\n#> review prod_type\n#> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. 
It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n \n# Pass custom values for each classification \nllm_classify(reviews, review, c(\"appliance\" ~ 1, \"computer\" ~ 2)) \n#> # A tibble: 3 × 2\n#> review .classify\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. 1\n#> 2 I regret buying this laptop. It is too slow and the keyboard is too… 2\n#> 3 Not sure how to feel about my new washing machine. Great color, but… 1\n \n# For character vectors, instead of a data frame, use this function \nllm_vec_classify( \n c(\"this is important!\", \"just whenever\"), \n c(\"urgent\", \"not urgent\") \n) \n#> [1] \"urgent\" \"urgent\"\n \n# To preview the first call that will be made to the downstream R function \nllm_vec_classify( \n c(\"this is important!\", \"just whenever\"), \n c(\"urgent\", \"not urgent\"), \n preview = TRUE \n) \n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful classification engine. Determine if the text refers to one of the following: urgent, not urgent. No capitalization. No explanations. 
The answer is based on the following text:\\nthis is important!\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/reference/llm_custom/execute-results/html.json b/_freeze/reference/llm_custom/execute-results/html.json index 440cc97..0a28b12 100644 --- a/_freeze/reference/llm_custom/execute-results/html.json +++ b/_freeze/reference/llm_custom/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "52c1b9006a451ed36ef2bd16d7a4a719", + "hash": "ff1685bb77def64dda90659e8896d95a", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Send a custom prompt to the LLM\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n\n[R/llm-custom.R](https://github.com/edgararuiz/mall/blob/main/R/llm-custom.R)\n\n## llm_custom\n\n## Description\n Use a Large Language Model (LLM) to process the provided text using the instructions from `prompt` \n\n\n## Usage\n```r\n \nllm_custom(.data, col, prompt = \"\", pred_name = \".pred\", valid_resps = \"\") \n \nllm_vec_custom(x, prompt = \"\", valid_resps = NULL) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| prompt | The prompt to append to each record sent to the LLM |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| valid_resps | If the response from the LLM is not open, but deterministic, provide the options in a vector. This function will set to `NA` any response not in the options |\n| x | A vector that contains the text to be analyzed |\n\n\n\n## Value\n `llm_custom` returns a `data.frame` or `tbl` object. `llm_vec_custom` returns a vector that is the same length as `x`. 
\n\n\n## Examples\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \ndata(\"reviews\") \n \nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE) \n \nmy_prompt <- paste( \n \"Answer a question.\", \n \"Return only the answer, no explanation\", \n \"Acceptable answers are 'yes', 'no'\", \n \"Answer this about the following text, is this a happy customer?:\" \n) \n \nreviews |> \n llm_custom(review, my_prompt) \n#> # A tibble: 3 × 2\n#> review .pred\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. Yes \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too noi… No \n#> 3 Not sure how to feel about my new washing machine. Great color, but har… No\n```\n:::\n", + "markdown": "---\ntitle: \"Send a custom prompt to the LLM\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n\n\n\n\n[R/llm-custom.R](https://github.com/mlverse/mall/blob/main/R/llm-custom.R)\n\n## llm_custom\n\n## Description\n Use a Large Language Model (LLM) to process the provided text using the instructions from `prompt` \n\n\n## Usage\n```r\n \nllm_custom(.data, col, prompt = \"\", pred_name = \".pred\", valid_resps = \"\") \n \nllm_vec_custom(x, prompt = \"\", valid_resps = NULL) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| prompt | The prompt to append to each record sent to the LLM |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| valid_resps | If the response from the LLM is not open, but deterministic, provide the options in a vector. This function will set to `NA` any response not in the options |\n| x | A vector that contains the text to be analyzed |\n\n\n\n## Value\n `llm_custom` returns a `data.frame` or `tbl` object. `llm_vec_custom` returns a vector that is the same length as `x`. 
\n\n\n## Examples\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \ndata(\"reviews\") \n \nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE) \n \nmy_prompt <- paste( \n \"Answer a question.\", \n \"Return only the answer, no explanation\", \n \"Acceptable answers are 'yes', 'no'\", \n \"Answer this about the following text, is this a happy customer?:\" \n) \n \nreviews |> \n llm_custom(review, my_prompt) \n#> # A tibble: 3 × 2\n#> review .pred\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. Yes \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too noi… No \n#> 3 Not sure how to feel about my new washing machine. Great color, but har… No\n```\n:::\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/reference/llm_extract/execute-results/html.json b/_freeze/reference/llm_extract/execute-results/html.json index 5ed1b7a..69e5409 100644 --- a/_freeze/reference/llm_extract/execute-results/html.json +++ b/_freeze/reference/llm_extract/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "5513f09e427f8bbd1d063df200ba192c", + "hash": "b1565de7b50caf7ce282167581a7d0c8", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Extract entities from text\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n\n[R/llm-extract.R](https://github.com/edgararuiz/mall/blob/main/R/llm-extract.R)\n\n## llm_extract\n\n## Description\n Use a Large Language Model (LLM) to extract specific entity, or entities, from the provided text \n\n\n## Usage\n```r\n \nllm_extract( \n .data, \n col, \n labels, \n expand_cols = FALSE, \n additional_prompt = \"\", \n pred_name = \".extract\" \n) \n \nllm_vec_extract(x, labels = c(), additional_prompt = \"\", preview = FALSE) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| 
labels | A vector with the entities to extract from the text |\n| expand_cols | If multiple `labels` are passed, this is a flag that tells the function to create a new column per item in `labels`. If `labels` is a named vector, this function will use those names as the new column names, if not, the function will use a sanitized version of the content as the name. |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n `llm_extract` returns a `data.frame` or `tbl` object. `llm_vec_extract` returns a vector that is the same length as `x`. \n\n\n## Examples\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \ndata(\"reviews\") \n \nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE) \n \n# Use 'labels' to let the function know what to extract \nllm_extract(reviews, review, labels = \"product\") \n#> # A tibble: 3 × 2\n#> review .extract \n#> \n#> 1 This has been the best TV I've ever used. Gr… tv \n#> 2 I regret buying this laptop. It is too slow … laptop \n#> 3 Not sure how to feel about my new washing ma… washing machine\n \n# Use 'pred_name' to customize the new column's name \nllm_extract(reviews, review, \"product\", pred_name = \"prod\") \n#> # A tibble: 3 × 2\n#> review prod \n#> \n#> 1 This has been the best TV I've ever used. Gr… tv \n#> 2 I regret buying this laptop. 
It is too slow … laptop \n#> 3 Not sure how to feel about my new washing ma… washing machine\n \n# Pass a vector to request multiple things, the results will be pipe delimeted \n# in a single column \nllm_extract(reviews, review, c(\"product\", \"feelings\")) \n#> # A tibble: 3 × 2\n#> review .extract \n#> \n#> 1 This has been the best TV I've ever used. Gr… tv | great \n#> 2 I regret buying this laptop. It is too slow … laptop|frustration \n#> 3 Not sure how to feel about my new washing ma… washing machine | confusion\n \n# To get multiple columns, use 'expand_cols' \nllm_extract(reviews, review, c(\"product\", \"feelings\"), expand_cols = TRUE) \n#> # A tibble: 3 × 3\n#> review product feelings \n#> \n#> 1 This has been the best TV I've ever used. Gr… \"tv \" \" great\" \n#> 2 I regret buying this laptop. It is too slow … \"laptop\" \"frustration\"\n#> 3 Not sure how to feel about my new washing ma… \"washing machine \" \" confusion\"\n \n# Pass a named vector to set the resulting column names \nllm_extract( \n .data = reviews, \n col = review, \n labels = c(prod = \"product\", feels = \"feelings\"), \n expand_cols = TRUE \n) \n#> # A tibble: 3 × 3\n#> review prod feels \n#> \n#> 1 This has been the best TV I've ever used. Gr… \"tv \" \" great\" \n#> 2 I regret buying this laptop. It is too slow … \"laptop\" \"frustration\"\n#> 3 Not sure how to feel about my new washing ma… \"washing machine \" \" confusion\"\n \n# For character vectors, instead of a data frame, use this function \nllm_vec_extract(\"bob smith, 123 3rd street\", c(\"name\", \"address\")) \n#> [1] \"bob smith | 123 3rd street\"\n \n# To preview the first call that will be made to the downstream R function \nllm_vec_extract( \n \"bob smith, 123 3rd street\", \n c(\"name\", \"address\"), \n preview = TRUE \n) \n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful text extraction engine. Extract the name, address being referred to on the text. I expect 2 items exactly. 
No capitalization. No explanations. Return the response exclusively in a pipe separated list, and no headers. The answer is based on the following text:\\nbob smith, 123 3rd street\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n", + "markdown": "---\ntitle: \"Extract entities from text\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n\n\n\n\n[R/llm-extract.R](https://github.com/mlverse/mall/blob/main/R/llm-extract.R)\n\n## llm_extract\n\n## Description\n Use a Large Language Model (LLM) to extract specific entity, or entities, from the provided text \n\n\n## Usage\n```r\n \nllm_extract( \n .data, \n col, \n labels, \n expand_cols = FALSE, \n additional_prompt = \"\", \n pred_name = \".extract\" \n) \n \nllm_vec_extract(x, labels = c(), additional_prompt = \"\", preview = FALSE) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| labels | A vector with the entities to extract from the text |\n| expand_cols | If multiple `labels` are passed, this is a flag that tells the function to create a new column per item in `labels`. If `labels` is a named vector, this function will use those names as the new column names, if not, the function will use a sanitized version of the content as the name. |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n `llm_extract` returns a `data.frame` or `tbl` object. `llm_vec_extract` returns a vector that is the same length as `x`. 
\n\n\n## Examples\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \ndata(\"reviews\") \n \nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE) \n \n# Use 'labels' to let the function know what to extract \nllm_extract(reviews, review, labels = \"product\") \n#> # A tibble: 3 × 2\n#> review .extract \n#> \n#> 1 This has been the best TV I've ever used. Gr… tv \n#> 2 I regret buying this laptop. It is too slow … laptop \n#> 3 Not sure how to feel about my new washing ma… washing machine\n \n# Use 'pred_name' to customize the new column's name \nllm_extract(reviews, review, \"product\", pred_name = \"prod\") \n#> # A tibble: 3 × 2\n#> review prod \n#> \n#> 1 This has been the best TV I've ever used. Gr… tv \n#> 2 I regret buying this laptop. It is too slow … laptop \n#> 3 Not sure how to feel about my new washing ma… washing machine\n \n# Pass a vector to request multiple things, the results will be pipe delimited \n# in a single column \nllm_extract(reviews, review, c(\"product\", \"feelings\")) \n#> # A tibble: 3 × 2\n#> review .extract \n#> \n#> 1 This has been the best TV I've ever used. Gr… tv | great \n#> 2 I regret buying this laptop. It is too slow … laptop|frustration \n#> 3 Not sure how to feel about my new washing ma… washing machine | confusion\n \n# To get multiple columns, use 'expand_cols' \nllm_extract(reviews, review, c(\"product\", \"feelings\"), expand_cols = TRUE) \n#> # A tibble: 3 × 3\n#> review product feelings \n#> \n#> 1 This has been the best TV I've ever used. Gr… \"tv \" \" great\" \n#> 2 I regret buying this laptop. 
It is too slow … \"laptop\" \"frustration\"\n#> 3 Not sure how to feel about my new washing ma… \"washing machine \" \" confusion\"\n \n# Pass a named vector to set the resulting column names \nllm_extract( \n .data = reviews, \n col = review, \n labels = c(prod = \"product\", feels = \"feelings\"), \n expand_cols = TRUE \n) \n#> # A tibble: 3 × 3\n#> review prod feels \n#> \n#> 1 This has been the best TV I've ever used. Gr… \"tv \" \" great\" \n#> 2 I regret buying this laptop. It is too slow … \"laptop\" \"frustration\"\n#> 3 Not sure how to feel about my new washing ma… \"washing machine \" \" confusion\"\n \n# For character vectors, instead of a data frame, use this function \nllm_vec_extract(\"bob smith, 123 3rd street\", c(\"name\", \"address\")) \n#> [1] \"bob smith | 123 3rd street\"\n \n# To preview the first call that will be made to the downstream R function \nllm_vec_extract( \n \"bob smith, 123 3rd street\", \n c(\"name\", \"address\"), \n preview = TRUE \n) \n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful text extraction engine. Extract the name, address being referred to on the text. I expect 2 items exactly. No capitalization. No explanations. Return the response exclusively in a pipe separated list, and no headers. 
The answer is based on the following text:\\nbob smith, 123 3rd street\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/reference/llm_sentiment/execute-results/html.json b/_freeze/reference/llm_sentiment/execute-results/html.json index 84e4536..da89d2b 100644 --- a/_freeze/reference/llm_sentiment/execute-results/html.json +++ b/_freeze/reference/llm_sentiment/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "af2e15fcaf1c63a992431b417f15191b", + "hash": "b83a7436e93470bb4eb9b7bb7cdbabca", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Sentiment analysis\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n\n[R/llm-sentiment.R](https://github.com/edgararuiz/mall/blob/main/R/llm-sentiment.R)\n\n## llm_sentiment\n\n## Description\n Use a Large Language Model (LLM) to perform sentiment analysis from the provided text \n\n\n## Usage\n```r\n \nllm_sentiment( \n .data, \n col, \n options = c(\"positive\", \"negative\", \"neutral\"), \n pred_name = \".sentiment\", \n additional_prompt = \"\" \n) \n \nllm_vec_sentiment( \n x, \n options = c(\"positive\", \"negative\", \"neutral\"), \n additional_prompt = \"\", \n preview = FALSE \n) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| options | A vector with the options that the LLM should use to assign a sentiment to the text. Defaults to: 'positive', 'negative', 'neutral' |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. 
Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n `llm_sentiment` returns a `data.frame` or `tbl` object. `llm_vec_sentiment` returns a vector that is the same length as `x`. \n\n\n## Examples\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \ndata(\"reviews\") \n \nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE) \n \nllm_sentiment(reviews, review) \n#> # A tibble: 3 × 2\n#> review .sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. positive \n#> 2 I regret buying this laptop. It is too slow and the keyboard is to… negative \n#> 3 Not sure how to feel about my new washing machine. Great color, bu… neutral\n \n# Use 'pred_name' to customize the new column's name \nllm_sentiment(reviews, review, pred_name = \"review_sentiment\") \n#> # A tibble: 3 × 2\n#> review review_sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and … positive \n#> 2 I regret buying this laptop. It is too slow and the keyboard… negative \n#> 3 Not sure how to feel about my new washing machine. Great col… neutral\n \n# Pass custom sentiment options \nllm_sentiment(reviews, review, c(\"positive\", \"negative\")) \n#> # A tibble: 3 × 2\n#> review .sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. positive \n#> 2 I regret buying this laptop. It is too slow and the keyboard is to… negative \n#> 3 Not sure how to feel about my new washing machine. Great color, bu… negative\n \n# Specify values to return per sentiment \nllm_sentiment(reviews, review, c(\"positive\" ~ 1, \"negative\" ~ 0)) \n#> # A tibble: 3 × 2\n#> review .sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. 1\n#> 2 I regret buying this laptop. It is too slow and the keyboard is to… 0\n#> 3 Not sure how to feel about my new washing machine. 
Great color, bu… 0\n \n# For character vectors, instead of a data frame, use this function \nllm_vec_sentiment(c(\"I am happy\", \"I am sad\")) \n#> [1] \"positive\" \"negative\"\n \n# To preview the first call that will be made to the downstream R function \nllm_vec_sentiment(c(\"I am happy\", \"I am sad\"), preview = TRUE) \n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful sentiment engine. Return only one of the following answers: positive, negative, neutral. No capitalization. No explanations. The answer is based on the following text:\\nI am happy\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n", + "markdown": "---\ntitle: \"Sentiment analysis\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n\n\n\n\n[R/llm-sentiment.R](https://github.com/mlverse/mall/blob/main/R/llm-sentiment.R)\n\n## llm_sentiment\n\n## Description\n Use a Large Language Model (LLM) to perform sentiment analysis from the provided text \n\n\n## Usage\n```r\n \nllm_sentiment( \n .data, \n col, \n options = c(\"positive\", \"negative\", \"neutral\"), \n pred_name = \".sentiment\", \n additional_prompt = \"\" \n) \n \nllm_vec_sentiment( \n x, \n options = c(\"positive\", \"negative\", \"neutral\"), \n additional_prompt = \"\", \n preview = FALSE \n) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| options | A vector with the options that the LLM should use to assign a sentiment to the text. Defaults to: 'positive', 'negative', 'neutral' |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. 
It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n `llm_sentiment` returns a `data.frame` or `tbl` object. `llm_vec_sentiment` returns a vector that is the same length as `x`. \n\n\n## Examples\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \ndata(\"reviews\") \n \nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE) \n \nllm_sentiment(reviews, review) \n#> # A tibble: 3 × 2\n#> review .sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. positive \n#> 2 I regret buying this laptop. It is too slow and the keyboard is to… negative \n#> 3 Not sure how to feel about my new washing machine. Great color, bu… neutral\n \n# Use 'pred_name' to customize the new column's name \nllm_sentiment(reviews, review, pred_name = \"review_sentiment\") \n#> # A tibble: 3 × 2\n#> review review_sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and … positive \n#> 2 I regret buying this laptop. It is too slow and the keyboard… negative \n#> 3 Not sure how to feel about my new washing machine. Great col… neutral\n \n# Pass custom sentiment options \nllm_sentiment(reviews, review, c(\"positive\", \"negative\")) \n#> # A tibble: 3 × 2\n#> review .sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. positive \n#> 2 I regret buying this laptop. It is too slow and the keyboard is to… negative \n#> 3 Not sure how to feel about my new washing machine. Great color, bu… negative\n \n# Specify values to return per sentiment \nllm_sentiment(reviews, review, c(\"positive\" ~ 1, \"negative\" ~ 0)) \n#> # A tibble: 3 × 2\n#> review .sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. 1\n#> 2 I regret buying this laptop. It is too slow and the keyboard is to… 0\n#> 3 Not sure how to feel about my new washing machine. 
Great color, bu… 0\n \n# For character vectors, instead of a data frame, use this function \nllm_vec_sentiment(c(\"I am happy\", \"I am sad\")) \n#> [1] \"positive\" \"negative\"\n \n# To preview the first call that will be made to the downstream R function \nllm_vec_sentiment(c(\"I am happy\", \"I am sad\"), preview = TRUE) \n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful sentiment engine. Return only one of the following answers: positive, negative, neutral. No capitalization. No explanations. The answer is based on the following text:\\nI am happy\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/reference/llm_summarize/execute-results/html.json b/_freeze/reference/llm_summarize/execute-results/html.json index 6fbe1bd..a99d908 100644 --- a/_freeze/reference/llm_summarize/execute-results/html.json +++ b/_freeze/reference/llm_summarize/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "792d98073306022d560b5d6efc450dac", + "hash": "7803afebce0bbedd410084c6c24a196f", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Summarize text\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n\n[R/llm-summarize.R](https://github.com/edgararuiz/mall/blob/main/R/llm-summarize.R)\n\n## llm_summarize\n\n## Description\n Use a Large Language Model (LLM) to summarize text \n\n\n## Usage\n```r\n \nllm_summarize( \n .data, \n col, \n max_words = 10, \n pred_name = \".summary\", \n additional_prompt = \"\" \n) \n \nllm_vec_summarize(x, max_words = 10, additional_prompt = \"\", preview = FALSE) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| max_words | The maximum number of words that the LLM should use in the summary. Defaults to 10. 
|\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n `llm_summarize` returns a `data.frame` or `tbl` object. `llm_vec_summarize` returns a vector that is the same length as `x`. \n\n\n## Examples\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \ndata(\"reviews\") \n \nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE) \n \n# Use max_words to set the maximum number of words to use for the summary \nllm_summarize(reviews, review, max_words = 5) \n#> # A tibble: 3 × 2\n#> review .summary \n#> \n#> 1 This has been the best TV I've ever used. Gr… it's a great tv \n#> 2 I regret buying this laptop. It is too slow … laptop purchase was a mistake \n#> 3 Not sure how to feel about my new washing ma… having mixed feelings about it\n \n# Use 'pred_name' to customize the new column's name \nllm_summarize(reviews, review, 5, pred_name = \"review_summary\") \n#> # A tibble: 3 × 2\n#> review review_summary \n#> \n#> 1 This has been the best TV I've ever used. Gr… it's a great tv \n#> 2 I regret buying this laptop. It is too slow … laptop purchase was a mistake \n#> 3 Not sure how to feel about my new washing ma… having mixed feelings about it\n \n# For character vectors, instead of a data frame, use this function \nllm_vec_summarize( \n \"This has been the best TV I've ever used. Great screen, and sound.\", \n max_words = 5 \n) \n#> [1] \"it's a great tv\"\n \n# To preview the first call that will be made to the downstream R function \nllm_vec_summarize( \n \"This has been the best TV I've ever used. 
Great screen, and sound.\", \n max_words = 5, \n preview = TRUE \n) \n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful summarization engine. Your answer will contain no no capitalization and no explanations. Return no more than 5 words. The answer is the summary of the following text:\\nThis has been the best TV I've ever used. Great screen, and sound.\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n", + "markdown": "---\ntitle: \"Summarize text\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n\n\n\n\n[R/llm-summarize.R](https://github.com/mlverse/mall/blob/main/R/llm-summarize.R)\n\n## llm_summarize\n\n## Description\n Use a Large Language Model (LLM) to summarize text \n\n\n## Usage\n```r\n \nllm_summarize( \n .data, \n col, \n max_words = 10, \n pred_name = \".summary\", \n additional_prompt = \"\" \n) \n \nllm_vec_summarize(x, max_words = 10, additional_prompt = \"\", preview = FALSE) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| max_words | The maximum number of words that the LLM should use in the summary. Defaults to 10. |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n `llm_summarize` returns a `data.frame` or `tbl` object. `llm_vec_summarize` returns a vector that is the same length as `x`. 
\n\n\n## Examples\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \ndata(\"reviews\") \n \nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE) \n \n# Use max_words to set the maximum number of words to use for the summary \nllm_summarize(reviews, review, max_words = 5) \n#> # A tibble: 3 × 2\n#> review .summary \n#> \n#> 1 This has been the best TV I've ever used. Gr… it's a great tv \n#> 2 I regret buying this laptop. It is too slow … laptop purchase was a mistake \n#> 3 Not sure how to feel about my new washing ma… having mixed feelings about it\n \n# Use 'pred_name' to customize the new column's name \nllm_summarize(reviews, review, 5, pred_name = \"review_summary\") \n#> # A tibble: 3 × 2\n#> review review_summary \n#> \n#> 1 This has been the best TV I've ever used. Gr… it's a great tv \n#> 2 I regret buying this laptop. It is too slow … laptop purchase was a mistake \n#> 3 Not sure how to feel about my new washing ma… having mixed feelings about it\n \n# For character vectors, instead of a data frame, use this function \nllm_vec_summarize( \n \"This has been the best TV I've ever used. Great screen, and sound.\", \n max_words = 5 \n) \n#> [1] \"it's a great tv\"\n \n# To preview the first call that will be made to the downstream R function \nllm_vec_summarize( \n \"This has been the best TV I've ever used. Great screen, and sound.\", \n max_words = 5, \n preview = TRUE \n) \n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful summarization engine. Your answer will contain no no capitalization and no explanations. Return no more than 5 words. The answer is the summary of the following text:\\nThis has been the best TV I've ever used. 
Great screen, and sound.\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/reference/llm_translate/execute-results/html.json b/_freeze/reference/llm_translate/execute-results/html.json index fbd43d6..7896904 100644 --- a/_freeze/reference/llm_translate/execute-results/html.json +++ b/_freeze/reference/llm_translate/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "8bd9baf4ee7f07dbca7be8a2b2ebae70", + "hash": "12a96e1f40dd8886f608b5e94717a542", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Translates text to a specific language\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n\n[R/llm-translate.R](https://github.com/edgararuiz/mall/blob/main/R/llm-translate.R)\n\n## llm_translate\n\n## Description\n Use a Large Language Model (LLM) to translate a text to a specific language \n\n\n## Usage\n```r\n \nllm_translate( \n .data, \n col, \n language, \n pred_name = \".translation\", \n additional_prompt = \"\" \n) \n \nllm_vec_translate(x, language, additional_prompt = \"\", preview = FALSE) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| language | Target language to translate the text to |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n `llm_translate` returns a `data.frame` or `tbl` object. `llm_vec_translate` returns a vector that is the same length as `x`. 
\n\n\n## Examples\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \ndata(\"reviews\") \n \nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE) \n \n# Pass the desired language to translate to \nllm_translate(reviews, review, \"spanish\") \n#> # A tibble: 3 × 2\n#> review .translation \n#> \n#> 1 This has been the best TV I've ever used. Gr… Esta ha sido la mejor televisió…\n#> 2 I regret buying this laptop. It is too slow … Me arrepiento de comprar este p…\n#> 3 Not sure how to feel about my new washing ma… No estoy seguro de cómo me sien…\n```\n:::\n", + "markdown": "---\ntitle: \"Translates text to a specific language\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n\n\n\n\n[R/llm-translate.R](https://github.com/mlverse/mall/blob/main/R/llm-translate.R)\n\n## llm_translate\n\n## Description\n Use a Large Language Model (LLM) to translate a text to a specific language \n\n\n## Usage\n```r\n \nllm_translate( \n .data, \n col, \n language, \n pred_name = \".translation\", \n additional_prompt = \"\" \n) \n \nllm_vec_translate(x, language, additional_prompt = \"\", preview = FALSE) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| language | Target language to translate the text to |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n `llm_translate` returns a `data.frame` or `tbl` object. `llm_vec_translate` returns a vector that is the same length as `x`. 
\n\n\n## Examples\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \ndata(\"reviews\") \n \nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE) \n \n# Pass the desired language to translate to \nllm_translate(reviews, review, \"spanish\") \n#> # A tibble: 3 × 2\n#> review .translation \n#> \n#> 1 This has been the best TV I've ever used. Gr… Esta ha sido la mejor televisió…\n#> 2 I regret buying this laptop. It is too slow … Me arrepiento de comprar este p…\n#> 3 Not sure how to feel about my new washing ma… No estoy seguro de cómo me sien…\n```\n:::\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/reference/llm_use/execute-results/html.json b/_freeze/reference/llm_use/execute-results/html.json index 06b3e0a..e2ef898 100644 --- a/_freeze/reference/llm_use/execute-results/html.json +++ b/_freeze/reference/llm_use/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "dfab9f3b795c4388144decd81903c2b8", + "hash": "dd52ad3fbf8db0a52f356ba51ff8d26b", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Specify the model to use\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n[R/llm-use.R](https://github.com/edgararuiz/mall/blob/main/R/llm-use.R)\n\n## llm_use\n\n## Description\n Allows us to specify the back-end provider, model to use during the current R session \n\n\n## Usage\n```r\n \nllm_use( \n backend = NULL, \n model = NULL, \n ..., \n .silent = FALSE, \n .cache = NULL, \n .force = FALSE \n) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| backend | The name of an supported back-end provider. Currently only 'ollama' is supported. |\n| model | The name of model supported by the back-end provider |\n| ... | Additional arguments that this function will pass down to the integrating function. In the case of Ollama, it will pass those arguments to `ollamar::chat()`. 
|\n| .silent | Avoids console output |\n| .cache | The path to save model results, so they can be re-used if the same operation is ran again. To turn off, set this argument to an empty character: `\"\"`. 'It defaults to '_mall_cache'. If this argument is left `NULL` when calling this function, no changes to the path will be made. |\n| .force | Flag that tell the function to reset all of the settings in the R session |\n\n\n\n## Value\n A `mall_session` object \n\n\n## Examples\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \nllm_use(\"ollama\", \"llama3.2\") \n#> \n#> ── mall session object\n#> Backend: ollama\n#> LLM session: model:llama3.2\n#> R session: cache_folder:_mall_cache\n \n# Additional arguments will be passed 'as-is' to the \n# downstream R function in this example, to ollama::chat() \nllm_use(\"ollama\", \"llama3.2\", seed = 100, temp = 0.1) \n#> \n#> ── mall session object \n#> Backend: ollamaLLM session: model:llama3.2\n#> seed:100\n#> temp:0.1\n#> R session: cache_folder:_mall_cache\n \n# During the R session, you can change any argument \n# individually and it will retain all of previous \n# arguments used \nllm_use(temp = 0.3) \n#> \n#> ── mall session object \n#> Backend: ollamaLLM session: model:llama3.2\n#> seed:100\n#> temp:0.3\n#> R session: cache_folder:_mall_cache\n \n# Use .cache to modify the target folder for caching \nllm_use(.cache = \"_my_cache\") \n#> \n#> ── mall session object \n#> Backend: ollamaLLM session: model:llama3.2\n#> seed:100\n#> temp:0.3\n#> R session: cache_folder:_my_cache\n \n# Leave .cache empty to turn off this functionality \nllm_use(.cache = \"\") \n#> \n#> ── mall session object \n#> Backend: ollamaLLM session: model:llama3.2\n#> seed:100\n#> temp:0.3\n \n# Use .silent to avoid the print out \nllm_use(.silent = TRUE) \n \n```\n:::\n", + "markdown": "---\ntitle: \"Specify the model to use\"\nexecute:\n eval: true\n freeze: 
true\n---\n\n\n\n\n\n\n\n\n\n[R/llm-use.R](https://github.com/mlverse/mall/blob/main/R/llm-use.R)\n\n## llm_use\n\n## Description\n Allows us to specify the back-end provider and model to use during the current R session \n\n\n## Usage\n```r\n \nllm_use( \n backend = NULL, \n model = NULL, \n ..., \n .silent = FALSE, \n .cache = NULL, \n .force = FALSE \n) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| backend | The name of a supported back-end provider. Currently only 'ollama' is supported. |\n| model | The name of the model supported by the back-end provider |\n| ... | Additional arguments that this function will pass down to the integrating function. In the case of Ollama, it will pass those arguments to `ollamar::chat()`. |\n| .silent | Avoids console output |\n| .cache | The path to save model results, so they can be re-used if the same operation is run again. To turn off, set this argument to an empty character: `\"\"`. It defaults to '_mall_cache'. If this argument is left `NULL` when calling this function, no changes to the path will be made. 
|\n| .force | Flag that tells the function to reset all of the settings in the R session |\n\n\n\n## Value\n A `mall_session` object \n\n\n## Examples\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \nllm_use(\"ollama\", \"llama3.2\") \n#> \n#> ── mall session object\n#> Backend: ollama\n#> LLM session: model:llama3.2\n#> R session: cache_folder:_mall_cache\n \n# Additional arguments will be passed 'as-is' to the \n# downstream R function in this example, to ollama::chat() \nllm_use(\"ollama\", \"llama3.2\", seed = 100, temp = 0.1) \n#> \n#> ── mall session object \n#> Backend: ollamaLLM session: model:llama3.2\n#> seed:100\n#> temp:0.1\n#> R session: cache_folder:_mall_cache\n \n# During the R session, you can change any argument \n# individually and it will retain all of the previous \n# arguments used \nllm_use(temp = 0.3) \n#> \n#> ── mall session object \n#> Backend: ollamaLLM session: model:llama3.2\n#> seed:100\n#> temp:0.3\n#> R session: cache_folder:_mall_cache\n \n# Use .cache to modify the target folder for caching \nllm_use(.cache = \"_my_cache\") \n#> \n#> ── mall session object \n#> Backend: ollamaLLM session: model:llama3.2\n#> seed:100\n#> temp:0.3\n#> R session: cache_folder:_my_cache\n \n# Leave .cache empty to turn off this functionality \nllm_use(.cache = \"\") \n#> \n#> ── mall session object \n#> Backend: ollamaLLM session: model:llama3.2\n#> seed:100\n#> temp:0.3\n \n# Use .silent to avoid the print out \nllm_use(.silent = TRUE) \n \n```\n:::\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/reference/m_backend_submit/execute-results/html.json b/_freeze/reference/m_backend_submit/execute-results/html.json index 9c5b867..c426f0b 100644 --- a/_freeze/reference/m_backend_submit/execute-results/html.json +++ b/_freeze/reference/m_backend_submit/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "924d807cc1d6899b2e07a70ae6291338", + "hash": "0b4215060e3219e73d990b8b07903e66", "result": {
"engine": "knitr", - "markdown": "---\ntitle: \"Functions to integrate different back-ends\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n[R/m-backend-prompt.R, R/m-backend-submit.R](https://github.com/edgararuiz/mall/blob/main/R/m-backend-prompt.R, R/m-backend-submit.R)\n\n## m_backend_prompt\n\n## Description\n Functions to integrate different back-ends \n\n\n## Usage\n```r\n \nm_backend_prompt(backend, additional) \n \nm_backend_submit(backend, x, prompt, preview = FALSE) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| backend | An `mall_session` object |\n| additional | Additional text to insert to the `base_prompt` |\n| x | The body of the text to be submitted to the LLM |\n| prompt | The additional information to add to the submission |\n| preview | If `TRUE`, it will display the resulting R call of the first text in `x` |\n\n\n\n## Value\n `m_backend_submit` does not return an object. `m_backend_prompt` returns a list of functions that contain the base prompts. \n\n\n\n\n", + "markdown": "---\ntitle: \"Functions to integrate different back-ends\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n\n\n\n\n[R/m-backend-prompt.R, R/m-backend-submit.R](https://github.com/mlverse/mall/blob/main/R/m-backend-prompt.R, R/m-backend-submit.R)\n\n## m_backend_prompt\n\n## Description\n Functions to integrate different back-ends \n\n\n## Usage\n```r\n \nm_backend_prompt(backend, additional) \n \nm_backend_submit(backend, x, prompt, preview = FALSE) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| backend | An `mall_session` object |\n| additional | Additional text to insert to the `base_prompt` |\n| x | The body of the text to be submitted to the LLM |\n| prompt | The additional information to add to the submission |\n| preview | If `TRUE`, it will display the resulting R call of the first text in `x` |\n\n\n\n## Value\n `m_backend_submit` does not return an object. 
`m_backend_prompt` returns a list of functions that contain the base prompts. \n\n\n\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/reference/reviews/execute-results/html.json b/_freeze/reference/reviews/execute-results/html.json index bddf656..3d0e7fe 100644 --- a/_freeze/reference/reviews/execute-results/html.json +++ b/_freeze/reference/reviews/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "450a1111d48e83138306c47f538b455a", + "hash": "a51623e19e8fff2aa4b505aad2197f81", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Mini reviews data set\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n[R/data-reviews.R](https://github.com/edgararuiz/mall/blob/main/R/data-reviews.R)\n\n## reviews\n\n## Description\n Mini reviews data set \n\n## Format\n A data frame that contains 3 records. The records are of fictitious product reviews. \n\n## Usage\n```r\n \nreviews \n```\n\n\n\n\n\n\n## Examples\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \ndata(reviews) \nreviews \n#> # A tibble: 3 × 1\n#> review \n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too noisy \n#> 3 Not sure how to feel about my new washing machine. Great color, but hard to f…\n```\n:::\n", + "markdown": "---\ntitle: \"Mini reviews data set\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n[R/data-reviews.R](https://github.com/mlverse/mall/blob/main/R/data-reviews.R)\n\n## reviews\n\n## Description\n Mini reviews data set \n\n## Format\n A data frame that contains 3 records. The records are of fictitious product reviews. \n\n## Usage\n```r\n \nreviews \n```\n\n\n\n\n\n\n## Examples\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \ndata(reviews) \nreviews \n#> # A tibble: 3 × 1\n#> review \n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. \n#> 2 I regret buying this laptop. 
It is too slow and the keyboard is too noisy \n#> 3 Not sure how to feel about my new washing machine. Great color, but hard to f…\n```\n:::\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_quarto.yml b/_quarto.yml index 8d881d6..f34e991 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -8,9 +8,9 @@ execute: website: title: mall - favicon: "figures/favicon/apple-touch-icon.png" + favicon: "site/images/favicon/apple-touch-icon.png" navbar: - logo: "figures/favicon/apple-touch-icon.png" + logo: "site/images/favicon/apple-touch-icon.png" left: - sidebar:articles - href: reference/index.qmd @@ -41,10 +41,10 @@ format: code-tools: true eval: true freeze: true - css: styles.css + css: site/styles.css theme: - light: [cosmo, theme.scss] - dark: [cosmo, theme-dark.scss] + light: [cosmo, site/theme.scss] + dark: [cosmo, site/theme-dark.scss] knitr: opts_chunk: diff --git a/articles/caching.qmd b/articles/caching.qmd index 5b6653c..56ab5c2 100644 --- a/articles/caching.qmd +++ b/articles/caching.qmd @@ -11,7 +11,7 @@ library(fs) library(jsonlite) dir_delete("_mall_cache") dir_delete("_performance_cache") -source("../utils/knitr-print.R") +source("../site/knitr-print.R") ``` Data preparation, and model preparation, is usually an iterative process. 
Because diff --git a/articles/databricks.qmd b/articles/databricks.qmd index d56c810..d3575b6 100644 --- a/articles/databricks.qmd +++ b/articles/databricks.qmd @@ -7,7 +7,7 @@ execute: ```{r} #| include: false -source("../utils/knitr-print.R") +source("../site/knitr-print.R") ``` ```{r, include = FALSE} diff --git a/articles/performance.qmd b/articles/performance.qmd index 1ca863a..b57ee33 100644 --- a/articles/performance.qmd +++ b/articles/performance.qmd @@ -9,7 +9,7 @@ execute: #| include: false library(dplyr) mall::llm_use("ollama", "llama3.2", seed = 100, .cache = "_readme_cache") -source("../utils/knitr-print.R") +source("../site/knitr-print.R") ``` ## Performance diff --git a/index.qmd b/index.qmd index 7e165ba..65ce486 100644 --- a/index.qmd +++ b/index.qmd @@ -16,12 +16,12 @@ library(dplyr) library(dbplyr) library(tictoc) library(DBI) -source("utils/knitr-print.R") +source("site/knitr-print.R") mall::llm_use("ollama", "llama3.2", seed = 100, .cache = "_readme_cache") ``` - + @@ -51,7 +51,7 @@ This package is inspired by the SQL AI functions now offered by vendors such as and Snowflake. `mall` uses [Ollama](https://ollama.com/) to interact with LLMs installed locally. - + For **R**, that interaction takes place via the [`ollamar`](https://hauselin.github.io/ollama-r/) package. The functions are @@ -62,7 +62,7 @@ reviews |> llm_sentiment(review) ``` - + For **Python**, `mall` is a library extension to [Polars](https://pola.rs/). 
To interact with Ollama, it uses the official diff --git a/reference/index.qmd b/reference/index.qmd index b50dfe0..1f1f689 100644 --- a/reference/index.qmd +++ b/reference/index.qmd @@ -4,7 +4,7 @@ ## Python - + [MallFrame](MallFrame.qmd#mall.MallFrame) @@ -22,4 +22,4 @@ an LLM to run batch predictions over a data frame | [use](MallFrame.qmd#mall.MallFrame.use) | Define the model, backend, and other options to use to | ::: - \ No newline at end of file + diff --git a/reference/llm_classify.qmd b/reference/llm_classify.qmd index 5fdb7f8..758df65 100644 --- a/reference/llm_classify.qmd +++ b/reference/llm_classify.qmd @@ -7,7 +7,7 @@ execute: ```{r} #| include: false -source("../utils/knitr-print.R") +source("../site/knitr-print.R") ``` [R/llm-classify.R](https://github.com/mlverse/mall/blob/main/R/llm-classify.R) diff --git a/reference/llm_custom.qmd b/reference/llm_custom.qmd index fc938db..dd2a850 100644 --- a/reference/llm_custom.qmd +++ b/reference/llm_custom.qmd @@ -7,7 +7,7 @@ execute: ```{r} #| include: false -source("../utils/knitr-print.R") +source("../site/knitr-print.R") ``` [R/llm-custom.R](https://github.com/mlverse/mall/blob/main/R/llm-custom.R) diff --git a/reference/llm_extract.qmd b/reference/llm_extract.qmd index d2df9f5..1f4b863 100644 --- a/reference/llm_extract.qmd +++ b/reference/llm_extract.qmd @@ -7,7 +7,7 @@ execute: ```{r} #| include: false -source("../utils/knitr-print.R") +source("../site/knitr-print.R") ``` [R/llm-extract.R](https://github.com/mlverse/mall/blob/main/R/llm-extract.R) diff --git a/reference/llm_sentiment.qmd b/reference/llm_sentiment.qmd index 6a3042d..0aca9c7 100644 --- a/reference/llm_sentiment.qmd +++ b/reference/llm_sentiment.qmd @@ -7,7 +7,7 @@ execute: ```{r} #| include: false -source("../utils/knitr-print.R") +source("../site/knitr-print.R") ``` [R/llm-sentiment.R](https://github.com/mlverse/mall/blob/main/R/llm-sentiment.R) diff --git a/reference/llm_summarize.qmd b/reference/llm_summarize.qmd index 
0f93bda..ccb07be 100644 --- a/reference/llm_summarize.qmd +++ b/reference/llm_summarize.qmd @@ -7,7 +7,7 @@ execute: ```{r} #| include: false -source("../utils/knitr-print.R") +source("../site/knitr-print.R") ``` [R/llm-summarize.R](https://github.com/mlverse/mall/blob/main/R/llm-summarize.R) diff --git a/reference/llm_translate.qmd b/reference/llm_translate.qmd index 32e480e..a154d35 100644 --- a/reference/llm_translate.qmd +++ b/reference/llm_translate.qmd @@ -7,7 +7,7 @@ execute: ```{r} #| include: false -source("../utils/knitr-print.R") +source("../site/knitr-print.R") ``` [R/llm-translate.R](https://github.com/mlverse/mall/blob/main/R/llm-translate.R) diff --git a/reference/llm_use.qmd b/reference/llm_use.qmd index 405de32..7c4e6b9 100644 --- a/reference/llm_use.qmd +++ b/reference/llm_use.qmd @@ -7,7 +7,7 @@ execute: ```{r} #| include: false -source("../utils/knitr-print.R") +source("../site/knitr-print.R") ``` [R/llm-use.R](https://github.com/mlverse/mall/blob/main/R/llm-use.R) diff --git a/reference/m_backend_submit.qmd b/reference/m_backend_submit.qmd index 80fd0f2..d0c588e 100644 --- a/reference/m_backend_submit.qmd +++ b/reference/m_backend_submit.qmd @@ -7,7 +7,7 @@ execute: ```{r} #| include: false -source("../utils/knitr-print.R") +source("../site/knitr-print.R") ``` [R/m-backend-prompt.R, R/m-backend-submit.R](https://github.com/mlverse/mall/blob/main/R/m-backend-prompt.R, R/m-backend-submit.R) diff --git a/reference/r_index.qmd b/reference/r_index.qmd index 761cf58..6ef8e11 100644 --- a/reference/r_index.qmd +++ b/reference/r_index.qmd @@ -4,42 +4,18 @@ toc: false [llm_classify()](llm_classify.html) [llm_vec_classify()](llm_classify.html) -       Categorize data as one of options given - - [llm_custom()](llm_custom.html) [llm_vec_custom()](llm_custom.html) -       Send a custom prompt to the LLM - - [llm_extract()](llm_extract.html) [llm_vec_extract()](llm_extract.html) -       Extract entities from text - - [llm_sentiment()](llm_sentiment.html) 
[llm_vec_sentiment()](llm_sentiment.html) -       Sentiment analysis - - [llm_summarize()](llm_summarize.html) [llm_vec_summarize()](llm_summarize.html) -       Summarize text - - [llm_translate()](llm_translate.html) [llm_vec_translate()](llm_translate.html) -       Translates text to a specific language - - [llm_use()](llm_use.html) -       Specify the model to use - - [reviews](reviews.html) -       Mini reviews data set - - diff --git a/reference/reviews.qmd b/reference/reviews.qmd index d0b7e86..d74ecc9 100644 --- a/reference/reviews.qmd +++ b/reference/reviews.qmd @@ -7,7 +7,7 @@ execute: ```{r} #| include: false -source("../utils/knitr-print.R") +source("../site/knitr-print.R") ``` [R/data-reviews.R](https://github.com/mlverse/mall/blob/main/R/data-reviews.R) diff --git a/site/README.md b/site/README.md new file mode 100644 index 0000000..28258f3 --- /dev/null +++ b/site/README.md @@ -0,0 +1,10 @@ +```r +pkgsite::write_reference( + pkg = "r", + folder = "reference", + not_run_examples = TRUE, + template = "site/_template_reference.qmd", + index_file = "r_index.qmd", + index_template = "site/_template_index.qmd" + ) +``` diff --git a/site/_template_index.qmd b/site/_template_index.qmd new file mode 100644 index 0000000..bbd4f22 --- /dev/null +++ b/site/_template_index.qmd @@ -0,0 +1,6 @@ +--- +toc: false +--- + + +{{{{notitle.reference}}}} diff --git a/utils/website/_reference.qmd b/site/_template_reference.qmd similarity index 93% rename from utils/website/_reference.qmd rename to site/_template_reference.qmd index 4180c6c..99774f9 100644 --- a/utils/website/_reference.qmd +++ b/site/_template_reference.qmd @@ -7,7 +7,7 @@ execute: ```{r} #| include: false -source("../utils/knitr-print.R") +source("../site/knitr-print.R") ``` [{{{{notitle.source}}}}](https://github.com/mlverse/mall/blob/main/{{{{notitle.source}}}}) diff --git a/figures/dplyr.png b/site/images/dplyr.png similarity index 100% rename from figures/dplyr.png rename to site/images/dplyr.png diff 
--git a/figures/favicon/apple-touch-icon-120x120.png b/site/images/favicon/apple-touch-icon-120x120.png similarity index 100% rename from figures/favicon/apple-touch-icon-120x120.png rename to site/images/favicon/apple-touch-icon-120x120.png diff --git a/figures/favicon/apple-touch-icon-152x152.png b/site/images/favicon/apple-touch-icon-152x152.png similarity index 100% rename from figures/favicon/apple-touch-icon-152x152.png rename to site/images/favicon/apple-touch-icon-152x152.png diff --git a/figures/favicon/apple-touch-icon-180x180.png b/site/images/favicon/apple-touch-icon-180x180.png similarity index 100% rename from figures/favicon/apple-touch-icon-180x180.png rename to site/images/favicon/apple-touch-icon-180x180.png diff --git a/figures/favicon/apple-touch-icon-60x60.png b/site/images/favicon/apple-touch-icon-60x60.png similarity index 100% rename from figures/favicon/apple-touch-icon-60x60.png rename to site/images/favicon/apple-touch-icon-60x60.png diff --git a/figures/favicon/apple-touch-icon-76x76.png b/site/images/favicon/apple-touch-icon-76x76.png similarity index 100% rename from figures/favicon/apple-touch-icon-76x76.png rename to site/images/favicon/apple-touch-icon-76x76.png diff --git a/figures/favicon/apple-touch-icon.png b/site/images/favicon/apple-touch-icon.png similarity index 100% rename from figures/favicon/apple-touch-icon.png rename to site/images/favicon/apple-touch-icon.png diff --git a/figures/favicon/favicon-16x16.png b/site/images/favicon/favicon-16x16.png similarity index 100% rename from figures/favicon/favicon-16x16.png rename to site/images/favicon/favicon-16x16.png diff --git a/figures/favicon/favicon-32x32.png b/site/images/favicon/favicon-32x32.png similarity index 100% rename from figures/favicon/favicon-32x32.png rename to site/images/favicon/favicon-32x32.png diff --git a/figures/favicon/favicon.ico b/site/images/favicon/favicon.ico similarity index 100% rename from figures/favicon/favicon.ico rename to 
site/images/favicon/favicon.ico diff --git a/figures/logo.png b/site/images/logo.png similarity index 100% rename from figures/logo.png rename to site/images/logo.png diff --git a/figures/mall.png b/site/images/mall.png similarity index 100% rename from figures/mall.png rename to site/images/mall.png diff --git a/figures/polars.png b/site/images/polars.png similarity index 100% rename from figures/polars.png rename to site/images/polars.png diff --git a/utils/knitr-print.R b/site/knitr-print.R similarity index 100% rename from utils/knitr-print.R rename to site/knitr-print.R diff --git a/styles.css b/site/styles.css similarity index 100% rename from styles.css rename to site/styles.css diff --git a/theme-dark.scss b/site/theme-dark.scss similarity index 100% rename from theme-dark.scss rename to site/theme-dark.scss diff --git a/theme.scss b/site/theme.scss similarity index 100% rename from theme.scss rename to site/theme.scss diff --git a/utils/website/README.md b/utils/website/README.md deleted file mode 100644 index 6b07432..0000000 --- a/utils/website/README.md +++ /dev/null @@ -1,10 +0,0 @@ -## Build reference, and render site - -```r -devtools::install(upgrade = "never") -#try(fs::dir_delete("_freeze/reference/")) -source(here::here("utils/website/build_reference.R")) -quarto::quarto_render(as_job = FALSE) -quarto::quarto_preview() -``` - diff --git a/utils/website/build_reference.R b/utils/website/build_reference.R deleted file mode 100644 index e2adf1b..0000000 --- a/utils/website/build_reference.R +++ /dev/null @@ -1,41 +0,0 @@ -local_folder <- here::here() -pkg <- "r" -if(path_file(local_folder) == "r") { - pkg <- "." 
- local_folder = path_dir(local_folder) -} - -library(fs) -source(path(local_folder, "utils/website/index-page.R")) -source(path(local_folder, "utils/website/list-to-qmd.R")) -source(path(local_folder, "utils/website/rd-to-list.R")) -suppressPackageStartupMessages(library(purrr)) -library(fs) -library(cli) - -build_reference_index <- function(pkg = ".", folder = "reference") { - if (is.character(pkg)) pkg <- pkgdown::as_pkgdown(pkg) - try(dir_create(folder)) - ref_path <- path(folder, "r_index", ext = "qmd") - try(file_delete(ref_path)) - writeLines(reference_index(pkg), ref_path) - cli_inform(col_green(ref_path)) -} - -build_reference <- function(pkg = ".", folder = "reference") { - if (is.character(pkg)) pkg <- pkgdown::as_pkgdown(pkg) - walk( - pkg$topics$file_in, - \(x) { - p <- path_ext_remove(x) - p <- paste0(folder, "/", p, ".qmd") - qmd <- reference_to_qmd(x, pkg) - try(file_delete(p)) - writeLines(qmd, p) - cli_inform(col_green(p)) - } - ) -} - -build_reference_index(pkg, path(local_folder, "reference")) -build_reference(pkg, path(local_folder, "reference")) diff --git a/utils/website/index-page.R b/utils/website/index-page.R deleted file mode 100644 index 7af798b..0000000 --- a/utils/website/index-page.R +++ /dev/null @@ -1,100 +0,0 @@ -reference_index <- function(pkg = ".") { - if (is.character(pkg)) pkg <- pkgdown::as_pkgdown(pkg) - ref_list <- reference_to_list_index(pkg) - ref_convert <- reference_index_convert(ref_list) - res <- imap( - ref_convert, - \(.x, .y) { - if (.y == 1) { - .x - } else { - c(" ", paste("##", .y), " ", .x) - } - } - ) - res <- reduce(res, c) - c( - "---", - "toc: false", - "---", - "", - "", - res - ) -} - -reference_index_convert <- function(index_list) { - out <- map(index_list, \(.x) map(.x, reference_links)) - map( - out, - \(.x) { - c( - map_chr( - .x, - \(.x) paste0( - .x$links, - "\n\n      ", - .x$desc, - "\n\n" - ) - ) - ) - } - ) -} - -reference_links <- function(x) { - # Manual fixes of special characters in funs 
variable - funcs <- x$funs - file_out <- path(x$file_out) - if (length(funcs) == 0) funcs <- x$alias - funcs <- gsub("<", "<", funcs) - funcs <- gsub(">", ">", funcs) - funcs <- paste0("[", funcs, "](", file_out, ")") - funcs <- paste0(funcs, collapse = " ") - desc <- x$title - list( - links = funcs, - desc = desc - ) -} - -reference_to_list_index <- function(pkg) { - if (is.character(pkg)) pkg <- as_pkgdown(pkg) - pkg_ref <- pkg$meta$reference - pkg_topics <- pkg$topics - pkg_topics <- pkg_topics[!pkg_topics$internal, ] - topics_env <- pkgdown:::match_env(pkg_topics) - if (is.null(pkg_ref)) { - x <- list(data.frame(contents = pkg_topics$name)) - } else { - x <- pkg_ref - } - sections_list <- map( - seq_along(x), - \(.x) { - ref <- x[[.x]] - topic_list <- map( - ref$contents, - ~ { - item_numbers <- NULL - try( - item_numbers <- eval(parse(text = paste0("`", .x, "`")), topics_env), - silent = TRUE - ) - if (is.null(item_numbers)) { - item_numbers <- eval(parse(text = .x), topics_env) - } - item_numbers - } - ) - topic_ids <- as.numeric(list_c(topic_list)) - transpose(pkg_topics[topic_ids, ]) - } - ) - if (!is.null(pkg_ref)) { - sections_title <- map_chr(pkg_ref, \(.x) .x$title) - names(sections_list) <- sections_title - } - sections_list -} diff --git a/utils/website/list-to-qmd.R b/utils/website/list-to-qmd.R deleted file mode 100644 index f1983df..0000000 --- a/utils/website/list-to-qmd.R +++ /dev/null @@ -1,171 +0,0 @@ -reference_to_qmd <- function(file_in, pkg = ".", template = NULL) { - if (is.character(pkg)) pkg <- pkgdown::as_pkgdown(pkg) - parsed <- reference_to_list_page(file_in, pkg) - con <- reference_convert(parsed) - - if (is.null(template)) { - template_path <- "../utils/website/_reference.qmd" - } else { - template_path <- template - } - - - template <- readLines(template_path) - - out <- map(template, parse_line_tag, con) - out <- discard(out, is.null) - out <- list_flatten(out) - out <- list_c(out) - as.character(out) -} - -parse_line_tag <- 
function(line, con) { - start_tag <- "\\{\\{\\{\\{" - end_tag <- "\\}\\}\\}\\}" - - tr <- c( - "seealso" = "See Also", - "author" = "Author(s)" - ) - - if (grepl(start_tag, line)) { - start_half <- strsplit(line, start_tag)[[1]] - - parsed <- list_c(strsplit(start_half, end_tag)) - - pm <- map(parsed, ~ { - yes_title <- substr(.x, 1, 6) == "title." - yes_notitle <- substr(.x, 1, 8) == "notitle." - if (yes_title | yes_notitle) { - start_sub <- ifelse(yes_title, 7, 9) - tag <- substr(.x, start_sub, nchar(.x)) - tag <- trimws(tag) - x <- con[[tag]] - } else { - x <- .x - tag <- NULL - } - list( - content = x, - title = yes_title, - no_title = yes_notitle, - no_lines = length(x), - tag = tag - ) - }) - pm <- transpose(pm) - - if (all(map_lgl(pm$content, is.null))) { - tag_content <- NULL - } else { - if (any(pm$no_lines > 1)) { - tag_content <- reduce(pm$content, c) - } else { - tag_content <- paste0(pm$content, collapse = "") - } - - yes_notitle <- any(as.logical(pm$no_title)) - yes_title <- any(as.logical(pm$title)) - - if (yes_title && !yes_notitle) { - tag_names <- as.character(pm$tag) - tag_names <- tag_names[!is.null(tag_names)] - if (length(tag_names > 0)) { - tag_name <- tag_names[[1]] - tag_name_label <- paste0( - toupper(substr(tag_name, 1, 1)), - substr(tag_name, 2, nchar(tag_name)) - ) - - tag_match <- names(tr) == tag_name - - if (any(tag_match)) { - tag_name_label <- tr[tag_match] - } - - if (!is.null(tag_content)) { - tag_content <- c(paste0("## ", tag_name_label), tag_content) - } - } - } - } - - - tag_content - } else { - line - } -} - -reference_entry <- function(x, title = NULL) { - out <- NULL - if (!is.null(title)) title <- paste("##", title) - if (!is.null(x)) out <- c(title, "", x, "") - out -} - -reference_convert <- function(x, output = "qmd") { - res <- list() - for (i in seq_along(x)) { - curr <- x[[i]] - curr_name <- names(x[i]) - out <- NULL - - if (curr_name == "examples") { - run_examples <- FALSE - if (output == "md") { - out <- 
map(curr, reference_qmd_example, FALSE) - out <- list_c(out) - } else { - out <- list() - if (!is.null(curr$code_run)) { - out <- c(out, "```{r}", curr$code_run, "```") - } - if (!is.null(curr$code_dont_run)) { - out <- c(out, "```{r}", curr$code_dont_run, "```") - } - } - } - - if (curr_name == "usage") { - out <- reference_qmd_example(curr, FALSE) - } - - if (curr_name == "arguments") out <- reference_arguments(curr) - - if (curr_name == "section") { - out <- map(curr, ~ c(paste("##", .x$title), .x$contents)) - out <- list_c(out) - out <- reduce(out, function(x, y) c(x, "", y), .init = NULL) - } - - if (is.null(out)) { - out <- curr - if (is.list(out)) out <- list_c(out) - if (length(out) > 1) out <- reduce(out, function(x, y) c(x, "", y), .init = NULL) - } - - out <- list(out) - names(out) <- curr_name - - res <- c(res, out) - } - - res -} - -reference_arguments <- function(x) { - lines <- map_chr(x, ~ paste0(.x[[1]], " | ", paste0(.x[[2]], collapse = "
"))) - rows <- paste0("| ", lines, " |") - c("|Arguments|Description|", "|---|---|", rows) -} - - -reference_qmd_example <- function(x, run = FALSE) { - # x <- x[x != "\n"] - if (run) { - out <- c("```{r}", x, "```") - } else { - out <- c("```r", x, "```") - } -} diff --git a/utils/website/rd-to-list.R b/utils/website/rd-to-list.R deleted file mode 100644 index b80be5d..0000000 --- a/utils/website/rd-to-list.R +++ /dev/null @@ -1,315 +0,0 @@ -reference_to_list_page <- function(file_in, pkg) { - if (is.character(pkg)) pkg <- as_pkgdown(pkg) - out <- tags_process(tags_get(file_in, pkg)) - out$repo <- pkg$repo$url$home - out -} - -tags_get <- function(file_in, pkg) { - pkg_topics <- pkg$topics - topic_row <- pkg_topics[pkg_topics$file_in == file_in, ] - topic <- transpose(topic_row) - topic_rd <- topic[[1]]$rd - tag_names <- map_chr(topic_rd, ~ class(.)[[1]]) - tag_split <- split(topic_rd, tag_names) - tag_split <- tag_split[names(tag_split) != "TEXT"] - imap( - tag_split, - ~ { - nm <- .y - class(.x) <- c(nm, class(.x)) - .x - } - ) -} - -tags_process <- function(x) { - out <- map(x, ~ tag_convert(.x)) - comment <- NULL - comment <- names(out) == "COMMENT" - if (length(comment) > 0) { - comment_list <- list_flatten(out$COMMENT) - reg_list <- out[!comment] - out <- c(comment_list, reg_list) - } - new_names <- substr(names(out), 5, nchar(names(out))) - names(out) <- new_names - out -} - -## ------------------ Conversion methods for the Major Tags ------------------- - -tag_convert <- function(x) { - UseMethod("tag_convert", x) -} - -tag_convert_default <- function(x) { - x <- x[[1]] - rf <- tag_flatten(x) - rf_cr <- NULL - cr <- NULL - for (i in seq_along(rf)) { - cl <- rf[[i]] - if (cl == new_paragraph_symbol) { - if (!is.null(cr)) rf_cr <- c(rf_cr, cr) - cr <- NULL - } else { - cr <- paste0(cr, cl, collapse = "") - } - } - rf_cr -} - -tag_convert.COMMENT <- function(x) { - edit_text <- "% Please edit documentation in " - find_edit <- grepl(edit_text, x) - out <- 
list(gsub(edit_text, "", x[find_edit])) - names(out) <- "tag_source" - out -} - -tag_convert.default <- tag_convert_default - -tag_convert.tag_section <- function(x) { - map(x, ~ { - list( - title = tag_convert_default(.x[1]), - contents = tag_convert_default(.x[2]) - ) - }) -} - -tag_convert.tag_examples <- function(x) { - x <- x[[1]] - rf <- tag_flatten(x) - on_examples <- TRUE - examples_run <- NULL - examples_dont_run <- NULL - cr <- NULL - for (i in seq_along(rf)) { - cl <- rf[[i]] - if (cl == new_paragraph_symbol | cl == do_not_run_symbol) { - if (!is.null(cr)) { - if (on_examples) { - examples_run <- c(examples_run, cr) - } else { - examples_dont_run <- c(examples_dont_run, cr) - } - } - cr <- NULL - } else { - cr <- c(cr, cl) - } - if (cl == do_not_run_symbol) { - on_examples <- FALSE - } - } - if (all(examples_run == " ")) examples_run <- NULL - if (all(examples_dont_run == " ")) examples_dont_run <- NULL - list( - code_run = examples_run, - code_dont_run = examples_dont_run - ) -} - -s3_label <- "## S3 method for class '" - -tag_convert.tag_usage <- function(x) { - x <- x[[1]] - out <- NULL - not_s3 <- TRUE - for (i in seq_along(x)) { - if (not_s3) { - cs <- x[[i]] - ts <- tag_single(cs) - curr_s3 <- substr(ts[[1]], 1, nchar(s3_label)) == s3_label - if (curr_s3) { - ts1 <- tag_single(x[[i + 1]]) - ts1[[1]] <- paste0(ts[[2]], ts1[[1]], collapse = "") - ts[[2]] <- ts1 - not_s3 <- FALSE - } - out <- c(out, ts) - } else { - not_s3 <- TRUE - } - } - out[out == new_paragraph_symbol] <- "" - if (out[[1]] == "" && length(out) > 1) out <- out[2:length(out)] - out -} - -tag_convert.tag_arguments <- function(x) { - x <- x[[1]] - res <- list() - for (i in seq_along(x)) { - item <- x[[i]] - if ("tag_item" %in% class(item)) { - res <- c( - res, - list(list( - argument = tag_convert(item[1]), - description = tag_convert(item[2]) - )) - ) - } - } - res -} - -## -------------------------------- Tag translation ---------------------------- - -tag_translate <- function(x) { 
- UseMethod("tag_translate") -} - -tag_translate.TEXT <- function(x) x -tag_translate.RCODE <- function(x) x -tag_translate.VERB <- function(x) x -tag_translate.character <- function(x) x - -tag_translate.tag_code <- function(x) tag_code(x) -tag_translate.tag_samp <- function(x) tag_code(x) -tag_translate.tag_verb <- function(x) tag_code(x) -tag_translate.tag_pkg <- function(x) tag_code(x) -tag_translate.tag_usage <- function(x) tag_code(x) - -tag_translate.tag_if <- function(x) "" -tag_translate.tag_cr <- function(x) "" -tag_translate.tag_tabular <- function(x) "" - - -tag_translate.tag_R <- function(x) "`R`" -tag_translate.tag_link <- function(x) as.character(x[[1]]) -tag_translate.tag_emph <- function(x) paste0("**", x, "**") -tag_translate.tag_strong <- function(x) paste0("**", x, "**") -tag_translate.tag_cite <- function(x) paste0("*", x, "*") -tag_translate.tag_eqn <- function(x) paste0("$", x, "$") -tag_translate.tag_deqn <- function(x) tag_deqn(x) -tag_translate.tag_item <- function(x) list(new_paragraph_symbol, "-") - -tag_translate.tag_preformatted <- function(x) tag_preformatted(x) -tag_translate.tag_email <- function(x) tag_url(x) -tag_translate.tag_url <- function(x) tag_url(x) -tag_translate.tag_itemize <- function(x) tag_itemize1(x) -tag_translate.tag_dontrun <- function(x) tag_dontrun(x) -tag_translate.tag_enumerate <- function(x) tag_itemize1(x) -tag_translate.tag_describe <- function(x) tag_describe(x) -tag_translate.tag_subsection <- function(x) tag_sub_section(x) -tag_translate.tag_method <- function(x) tag_method(x) -tag_translate.tag_href <- function(x) tag_href(x) -tag_translate.LIST <- function(x) tag_LIST(x) - -tag_translate.default <- function(x) { - stop(paste0("Class '", class(x)[[1]], "' not recognized. 
Value: ", x)) -} - -## Revisit when able --------- -tag_translate.USERMACRO <- function(x) x[[2]] -tag_translate.tag_Sexpr <- function(x) "" -## ---------------------------- - -tag_single <- function(x, rm_return = TRUE) { - out <- tag_translate(x) - if (rm_return) out <- map_chr(out, remove_return) - out -} - -tag_flatten <- function(x) { - x <- list_c(map(x, tag_single)) - c(x, new_paragraph_symbol) -} - -new_paragraph_symbol <- "<<<<<<<<<<<<<<<<<<<<<<<<<" -do_not_run_symbol <- ";;;;;;;;;;;;;;;;;;;;;;;;;" - -remove_return <- function(x) { - if (trimws(x) == "\n") { - x <- new_paragraph_symbol - } - remove_generic(x) -} - -remove_return_code <- function(x) { - if (x == "\n") x <- "```r" - remove_generic(x) -} - -remove_generic <- function(x) { - if (substr(x, nchar(x), nchar(x)) == "\n") { - x <- substr(x, 1, nchar(x) - 1) - x <- paste0(x, " ") - } - x -} - -## -------------------------- Atomic RD tag functions --------------------------- - -tag_LIST <- function(x) { - paste0(paste(map(x, tag_single), collapse = ""), "\n") -} - -tag_describe <- function(x) { - out <- map(list_c(x), \(.x) .x[[1]]) - out <- map(out, tag_single) - out_nulls <- !map_lgl(out, is.null) - out <- out[out_nulls] - reduce(out, function(x, y) c(x, new_paragraph_symbol, y)) -} - -tag_dontrun <- function(x) { - out <- list_c(map(x, tag_single)) - c(do_not_run_symbol, out) -} - -tag_sub_section <- function(x) { - x_class <- map_chr(x, class) - if (any(x_class == "tag")) { - out <- map(x, function(x) list_c(map(x, tag_single))) - } else { - out <- map(x, tag_single) - } - out <- map(list_c(out), remove_return) - c(out, new_paragraph_symbol) -} - -tag_itemize1 <- function(x) { - x <- map(x, tag_single, FALSE) - x <- map(list_c(x), remove_return) - c(x, new_paragraph_symbol) -} - -tag_code <- function(x) { - x <- reduce(map(x, tag_single), paste0) - paste0("`", x, "`") -} - -tag_preformatted <- function(x) { - x <- reduce(as.character(x), function(x, y) c(x, new_paragraph_symbol, y)) - c("```", 
new_paragraph_symbol, x, new_paragraph_symbol, "```") -} - -tag_url <- function(x) { - label_list <- map_chr(x, ~.x) - label <- paste0(label_list, collapse = " ") - res <- paste0("[", label, "](", label, ")") -} - -tag_href <- function(x) { - address <- as.character(x[[1]]) - label_list <- map_chr(x[[2]], tag_single) - label <- paste0(label_list, collapse = " ") - res <- paste0("[", label, "](", address, ")") -} - -tag_method <- function(x) { - c( - paste0(s3_label, x[[2]], "'"), - as.character(x[[1]]) - ) -} - -tag_deqn <- function(x) { - x_list <- map_chr(x, as.character) - c("$", x_list, "$") -}