Merge pull request #24 from lamalab-org/beyond_images

kjappelbaum authored Jun 4, 2024
2 parents a16e4c6 + f865078 commit 21dcac4

Showing 12 changed files with 303 additions and 96 deletions.
261 changes: 168 additions & 93 deletions beyond_text/beyond_images.ipynb
@@ -27,16 +27,29 @@
"source": [
"Text-only LLMs tend to have problems with analysing and understanding complex structures such as tables, plots and images included in scientific articles. Since, especially in chemistry and materials science, important information about chemical compounds is contained in such structures, one needs a different approach for them. Vision language models (VLMs) are a natural fit since they can analyse images alongside text. There are several open- and closed-source VLMs available, e.g. [Vision models from OpenAI](https://platform.openai.com/docs/guides/vision), [Claude models](https://docs.anthropic.com/en/docs/vision) and [DeepSeek-VL](https://github.com/deepseek-ai/DeepSeek-VL). As an example, the extraction of information from images with [GPT-4o](https://platform.openai.com/docs/models/gpt-4o) is shown:\n",
"\n",
"First one has to convert the file into images."
],
"metadata": {
"collapsed": false
},
"id": "f3a5e72798406fa1"
},
{
"cell_type": "markdown",
"source": [
"::: {.callout-note}\n",
"\n",
"The PDF file used here was obtained in [Section 1](../obtaining_data/data_mining.ipynb).\n",
":::"
],
"metadata": {
"collapsed": false
},
"id": "4d590539057c1dc3"
},
{
"cell_type": "code",
"execution_count": 3,
"outputs": [],
"source": [
"from pdf2image import convert_from_path\n",
@@ -49,8 +62,8 @@
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2024-06-03T10:42:11.479846Z",
"start_time": "2024-06-03T10:42:10.275394Z"
}
},
"id": "b886d18ec7e86797"
@@ -67,7 +80,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"outputs": [
{
"name": "stderr",
@@ -177,8 +190,8 @@
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2024-06-03T10:42:19.110545Z",
"start_time": "2024-06-03T10:42:11.483622Z"
}
},
"id": "19b39a0896010c5f"
@@ -193,9 +206,23 @@
},
"id": "fd238c06473113b7"
},
{
"cell_type": "markdown",
"source": [
"::: {.callout-tip}\n",
"## Prompt\n",
"\n",
"This is a very simple example prompt. One should optimize and engineer the prompt before using it. For that, one could use a tool like [DSPy](https://github.com/stanfordnlp/dspy).\n",
":::"
],
"metadata": {
"collapsed": false
},
"id": "dc0b4943b2bf26a4"
},
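As a sketch of what a slightly more engineered prompt could look like (this exact wording and the key list are illustrative choices, not taken from the notebook):

```python
# Hypothetical refined extraction prompt; the key list is an illustrative choice,
# not the one used in the notebook's hidden prompt cell.
prompt_text = (
    "You will receive page images from a chemistry article. "
    "Report every Buchwald-Hartwig coupling you find as a JSON object with the keys "
    "'Catalyst', 'Ligand', 'Base', 'Solvent', 'Temperature', 'Time' and 'Yield'. "
    "Use null for values that are not stated and copy all numbers verbatim."
)
print(prompt_text)
```

Spelling out the expected keys and the handling of missing values tends to make the JSON output easier to parse downstream.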
{
"cell_type": "code",
"execution_count": 5,
"outputs": [],
"source": [
"# the text prompt for the model call gets defined\n",
@@ -225,134 +252,182 @@
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2024-06-03T10:42:19.111092Z",
"start_time": "2024-06-03T10:42:19.104520Z"
}
},
"id": "f95a25d2099f080d"
},
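The notebook's code for attaching the page images to the prompt is collapsed in this diff view. For reference, a user message for a vision model is typically assembled as below; this is a sketch, and the function name `build_vision_messages` and its raw-bytes interface are our own illustration, not the notebook's code:

```python
import base64

def build_vision_messages(text, page_images):
    """Pack a text instruction and PNG page images (raw bytes) into one
    user message in the OpenAI vision chat format."""
    content = [{"type": "text", "text": text}]
    for png_bytes in page_images:
        # images are sent inline as base64-encoded data URLs
        b64 = base64.b64encode(png_bytes).decode("utf-8")
        content.append(
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}
        )
    return [{"role": "user", "content": content}]

messages = build_vision_messages("Extract the reaction conditions.", [b"\x89PNG..."])
```

Each page image becomes one `image_url` content part, so a multi-page PDF turns into one message with several images.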
{
"cell_type": "markdown",
"source": [
"To call the actual model, one could use [LiteLLM](https://github.com/BerriAI/litellm) instead of directly using a single API like the OpenAI API. This makes it easy to switch between models from different providers."
],
"metadata": {
"collapsed": false
},
"id": "21daa806c07c25b5"
},
{
"cell_type": "markdown",
"source": [
"::: {.callout-important}\n",
"## API-Key\n",
"\n",
"One has to provide their own API key in the `.env` file.\n",
":::"
],
"metadata": {
"collapsed": false
},
"id": "9c11896b70b6aa6a"
},
{
"cell_type": "code",
"execution_count": 7,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Output: Here is the extracted information about Buchwald-Hartwig reactions from the provided images:\n",
"\n",
"```json\n",
"{\n",
" \"Buchwald-Hartwig Reactions\": {\n",
" \"Key Step\": \"Cross-coupling reaction of an α-amino-BODIPY and the respective halide.\",\n",
" \"Conditions\": [\n",
" {\n",
" \"Reagents\": [\n",
" \"Pd(OAc)2\",\n",
" \"(±)-BINAP\",\n",
" \"Cs2CO3\",\n",
" \"PhMe\"\n",
" ],\n",
" \"Temperature\": \"80 °C\",\n",
" \"Time\": \"1-5 h\",\n",
" \"Yield\": \"up to 68%\"\n",
" }\n",
" ],\n",
" \"Monomers\": [\n",
" {\n",
" \"Type\": \"α-chlorinated or α-amino-BODIPYs\",\n",
" \"Reagents\": [\n",
" \"Pd(OAc)2\",\n",
" \"(±)-BINAP\",\n",
" \"Cs2CO3\",\n",
" \"PhMe\"\n",
" ],\n",
" \"Temperature\": \"80 °C\",\n",
" \"Time\": \"1-5 h\",\n",
" \"Yield\": \"up to 68%\"\n",
" }\n",
" ],\n",
" \"Functionalized Monomers\": [\n",
" {\n",
" \"Type\": \"Br-Ar-mono-Br or Br-Ar-di\",\n",
" \"Reagents\": [\n",
" \"Pd(OAc)2\",\n",
" \"(±)-BINAP\",\n",
" \"Cs2CO3\",\n",
" \"PhMe\"\n",
" ],\n",
" \"Temperature\": \"80 °C\",\n",
" \"Time\": \"1-5 h\",\n",
" \"Yield\": \"44% for Br-Ar-mono-Br, 45% of starting material recovered\"\n",
" }\n",
" ],\n",
" \"Procedure\": \"Stirring slow addition of Br-Ar-mono-NH2 to a heated solution of the remaining reagents.\",\n",
" \"Selectivity\": \"Maintained excess of Br-Ar-mono-Br to avoid further oligomerization.\"\n",
" }\n",
"}\n",
"```\n",
"Input tokens used: 6704 Output tokens used: 387\n"
]
}
],
"source": [
"import os\n",
"from dotenv import load_dotenv\n",
"from litellm import completion\n",
"\n",
"# Define the function that calls the model through LiteLLM;\n",
"# the temperature is set to 0 since the output should be as reproducible as possible,\n",
"# and gpt-4o is used since it is the cheapest and fastest OpenAI vision model\n",
"def call_litellm(prompt, model=\"gpt-4o\", temperature: float = 0.0, **kwargs):\n",
"    \"\"\"Call a model through LiteLLM.\n",
"\n",
"    Args:\n",
"        prompt: Prompt to send to the model\n",
"        model (str, optional): Name of the model. Defaults to \"gpt-4o\".\n",
"        temperature (float, optional): Inference temperature. Defaults to 0.\n",
"\n",
"    Returns:\n",
"        tuple: message content, number of input tokens, number of output tokens\n",
"    \"\"\"\n",
"    messages = [\n",
"        {\n",
"            \"role\": \"system\",\n",
"            \"content\": (\n",
"                \"You are a scientific assistant, extracting important information about reaction conditions \"\n",
"                \"out of PDFs in valid JSON format. Extract just data which you are 100% confident about the \"\n",
"                \"accuracy. Keep the entries short without details. Be careful with numbers.\"\n",
"            ),\n",
"        },\n",
"        {\"role\": \"user\", \"content\": prompt},\n",
"    ]\n",
"\n",
"    response = completion(\n",
"        model=model,\n",
"        messages=messages,\n",
"        temperature=temperature,\n",
"        **kwargs,\n",
"    )\n",
"\n",
"    # The input and output tokens are reported in order to track the cost of the API calls\n",
"    message_content = response['choices'][0]['message']['content']\n",
"    input_tokens = response['usage']['prompt_tokens']\n",
"    output_tokens = response['usage']['completion_tokens']\n",
"    return message_content, input_tokens, output_tokens\n",
"\n",
"# Load the OpenAI API key from the environment file\n",
"dotenv_path = '../.env'\n",
"load_dotenv(dotenv_path)\n",
"api_key = os.getenv(\"OPENAI_API_KEY\")\n",
"\n",
"# Set the API key for LiteLLM\n",
"os.environ[\"OPENAI_API_KEY\"] = api_key\n",
"\n",
"# Call the model and print the output and the token usage\n",
"output, input_tokens, output_tokens = call_litellm(prompt=prompt)\n",
"print('Output: ', output)\n",
"print('Input tokens used:', input_tokens, 'Output tokens used:', output_tokens)"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2024-06-03T10:56:02.715950Z",
"start_time": "2024-06-03T10:55:46.810446Z"
}
},
"id": "3d37e685fef815ab"
},
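Since the input and output tokens are reported to track costs, one can convert them into an approximate price per call. The rates below are placeholders, not current OpenAI pricing; check the provider's pricing page before relying on such numbers:

```python
# Placeholder per-token rates in USD per 1M tokens; these are ASSUMED values,
# NOT current OpenAI pricing — look up the real rates before using them.
PRICE_PER_1M_INPUT = 5.00
PRICE_PER_1M_OUTPUT = 15.00

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Approximate USD cost of one API call from its token counts."""
    return (input_tokens * PRICE_PER_1M_INPUT + output_tokens * PRICE_PER_1M_OUTPUT) / 1e6

# Token counts from the call above
print(round(call_cost(6704, 387), 4))  # → 0.0393
```

Logging this per call makes it easy to estimate the total cost of processing a whole corpus of PDFs before scaling up.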
{
"cell_type": "markdown",
"source": [
"::: {.callout-tip}\n",
"\n",
"To get only the JSON part of the output, one could use a regular expression to extract this content.\n",
":::"
],
"metadata": {
"collapsed": false
},
"id": "b4ca2e31fcdbed75"
},
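A minimal sketch of that extraction; the function name `extract_json_block` is our own, and it handles both fenced and bare JSON replies:

```python
import json
import re

def extract_json_block(text):
    """Return the first JSON object in a model reply, whether it sits inside
    a markdown json fence or stands bare in the text."""
    fenced = re.search(r"```json\s*(.*?)\s*```", text, re.DOTALL)
    if fenced is not None:
        return json.loads(fenced.group(1))
    bare = re.search(r"\{.*\}", text, re.DOTALL)
    if bare is None:
        raise ValueError("no JSON object found in reply")
    return json.loads(bare.group(0))

reply = 'Here is the data:\n```json\n{"Yield": "up to 68%"}\n```'
print(extract_json_block(reply))  # → {'Yield': 'up to 68%'}
```

Parsing the extracted string with `json.loads` also acts as a validity check: a malformed reply raises an exception instead of silently corrupting the database.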
{
"cell_type": "markdown",
"source": [
"Since no experimental section is provided in the article, the model just extracted general information about the reactions. It failed to extract the data provided in the reaction schemes. To extract this information, one should use the tools presented in the [agentic section](link to agentic section).\n",
"\n",
"Now one could use this structured output to build up a database of Buchwald-Hartwig coupling reactions."
],
"metadata": {
