diff --git a/notebooks/community/model_garden/model_garden_translation_llm_translation_and_evaluation.ipynb b/notebooks/community/model_garden/model_garden_translation_llm_translation_and_evaluation.ipynb
new file mode 100644
index 000000000..885ae47d5
--- /dev/null
+++ b/notebooks/community/model_garden/model_garden_translation_llm_translation_and_evaluation.ipynb
@@ -0,0 +1,497 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "cellView": "form",
+    "id": "OsXAs2gcIpbC"
+   },
+   "outputs": [],
+   "source": [
+    "# Copyright 2024 Google LLC\n",
+    "#\n",
+    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+    "# you may not use this file except in compliance with the License.\n",
+    "# You may obtain a copy of the License at\n",
+    "#\n",
+    "#     https://www.apache.org/licenses/LICENSE-2.0\n",
+    "#\n",
+    "# Unless required by applicable law or agreed to in writing, software\n",
+    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+    "# See the License for the specific language governing permissions and\n",
+    "# limitations under the License."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "99c1c3fc2ca5"
+   },
+   "source": [
+    "# Vertex AI Model Garden - Translation LLM Translation and Evaluation (Demo)\n",
+    "\n",
+    "<table><tbody><tr>\n",
\n", + " \n", + " \"Google
Run in Colab Enterprise\n", + "
\n", + "
\n", + " \n", + " \"GitHub
View on GitHub\n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AtUbwvxier8E" + }, + "source": [ + "## Overview\n", + "\n", + "In this tutorial, you will learn how to use the *Vertex AI Python SDK* to generate translation responses and then use the *Gen AI Evaluation Service* to measure the translation quality of your LLM responses using [BLEU](https://en.wikipedia.org/wiki/BLEU), [MetricX](https://github.com/google-research/metricx) and [COMET](https://unbabel.github.io/COMET/html/index.html).\n", + "\n", + "### Costs\n", + "\n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing), [Cloud Storage pricing](https://cloud.google.com/storage/pricing), and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CCkzFPrEer8F" + }, + "source": [ + "## Getting Started" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N0Wchgzqer8F" + }, + "source": [ + "### Install Vertex AI Python SDK for Gen AI Evaluation Service and Cloud translation Python client" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "T-Cgoq37er8F" + }, + "outputs": [], + "source": [ + "%pip install --upgrade --user --quiet google-cloud-aiplatform[evaluation] google-cloud-translate" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "qP4ihOCkEBje" + }, + "outputs": [], + "source": [ + "# @title Import libraries\n", + "import os\n", + "\n", + "import pandas as pd\n", + "import vertexai\n", + "from google.cloud import aiplatform, translate_v3\n", + "from IPython.display import Markdown, display\n", + "from vertexai import evaluation\n", + "from vertexai.evaluation.metrics import pointwise_metric" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6UbmJ_MyTpw_" + }, + "source": [ + "### Define Google Cloud Project Information" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "N-i6OtB9er8G" + }, + "outputs": [], + "source": [ + "# Get the default project id and region.\n", + "PROJECT_ID = os.environ[\"GOOGLE_CLOUD_PROJECT\"]\n", + "# @markdown If you want to use a different region, please make sure the region is supported by Vertex AI Evaluation.\n", + "# @markdown Please refer to https://cloud.google.com/vertex-ai/generative-ai/docs/learn/locations#eval-locations.\n", + "REGION = os.environ[\"GOOGLE_CLOUD_REGION\"]\n", + "\n", + "# @markdown **[Optional]** Set the experiment name for your experiment.\n", + "EXPERIMENT_NAME = \"my-eval-task-experiment\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ka4wP_ljer8G" + }, + "source": [ + "### Initialize Vertex AI SDK and Google Cloud Translation client." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "kfTC7tqka78-" + }, + "outputs": [], + "source": [ + "client = translate_v3.TranslationServiceClient()\n", + "vertexai.init(project=PROJECT_ID, location=REGION)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JyCI4QfPgm0R" + }, + "source": [ + "## Helper Functions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "gT_OJBHfCg4Q" + }, + "outputs": [], + "source": [ + "# @title Display evaluation result.\n", + "def display_eval_result(eval_result, metrics=None, model_name=None, rows=0):\n", + " \"\"\"Display the evaluation results.\"\"\"\n", + " if model_name is not None:\n", + " display(Markdown(\"## Eval Result for %s\" % model_name))\n", + "\n", + " summary_metrics, metrics_table = (\n", + " eval_result.summary_metrics,\n", + " eval_result.metrics_table,\n", + " )\n", + "\n", + " metrics_df = pd.DataFrame.from_dict(summary_metrics, orient=\"index\").T\n", + " if metrics:\n", + " metrics_df = metrics_df.filter(\n", + " [\n", + " metric\n", + " for metric in metrics_df.columns\n", + " if any(selected_metric in metric for selected_metric in metrics)\n", + " ]\n", + " )\n", + " metrics_table = metrics_table.filter(\n", + " [\n", + " metric\n", + " for metric in metrics_table.columns\n", + " if any(selected_metric in metric for selected_metric in metrics)\n", + " ]\n", + " )\n", + "\n", + " # Display the summary metrics\n", + " display(Markdown(\"### Summary Metrics\"))\n", + " display(metrics_df)\n", + " if rows > 0:\n", + " # Display samples from the metrics table\n", + " display(Markdown(\"### Row-based Metrics\"))\n", + " display(metrics_table.head(rows))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "TPvnBvTdgm0R" + }, + "outputs": [], + "source": [ + "# @title Translate text.\n", + "def translate_text(\n", + " text: str,\n", + " source_language_code: str,\n", + " target_language_code: str,\n", + ") -> translate_v3.TranslationServiceClient:\n", + " \"\"\"Translating Text from English.\n", + "\n", + " Args:\n", + " text: The content to translate.\n", + " source_language_code: The language code for the text.\n", + " target_language_code: The language code for the translation. E.g. \"fr\" for\n", + " French, \"es\" for Spanish, etc. Available languages:\n", + " https://cloud.google.com/translate/docs/languages#neural_machine_translation_model\n", + " \"\"\"\n", + " parent = f\"projects/{PROJECT_ID}/locations/us-central1\"\n", + " # @markdown Translate text from English to `target_language_code` (your chosen language) using the Translate LLM model.\n", + " # @markdown 1. Translate LLM is available in us-central1.\n", + " # @markdown 2. 
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "fQP9YjD_g0Fw"
+   },
+   "source": [
+    "## Getting Translations"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "cellView": "form",
+    "id": "HKDL4oCFbeEc"
+   },
+   "outputs": [],
+   "source": [
+    "# @title Try out a translation.\n",
+    "translations = translate_text(\n",
+    "    text=\"Dem Feuer konnte Einhalt geboten werden\",\n",
+    "    source_language_code=\"de\",\n",
+    "    target_language_code=\"en\",\n",
+    ")\n",
+    "translations"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "cellView": "form",
+    "id": "YIB0wnHfgbVN"
+   },
+   "outputs": [],
+   "source": [
+    "# @title Generate translations.\n",
+    "\n",
+    "# Define the original text.\n",
+    "sources = [\n",
+    "    \"Dem Feuer konnte Einhalt geboten werden\",\n",
+    "    \"Schulen und Kindergärten wurden eröffnet.\",\n",
+    "]\n",
+    "\n",
+    "# Generate responses.\n",
+    "translations = []\n",
+    "for source in sources:\n",
+    "    translation = (\n",
+    "        translate_text(\n",
+    "            text=source, target_language_code=\"en\", source_language_code=\"de\"\n",
+    "        )\n",
+    "        .translations[0]\n",
+    "        .translated_text\n",
+    "    )\n",
+    "    translations.append(translation)\n",
+    "\n",
+    "translations"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "KwgQlLkSg5XP"
+   },
+   "source": [
+    "## Evaluating Your Translations"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "cellView": "form",
+    "id": "XGY40wjrQWOc"
+   },
+   "outputs": [],
+   "source": [
+    "# @title Prepare evaluation dataset.\n",
+    "\n",
+    "# These are the references we will send for evaluation.\n",
+    "references = [\n",
+    "    \"They were able to control the fire.\",\n",
+    "    \"Schools and kindergartens opened\",\n",
+    "]\n",
+    "\n",
+    "# Define the evaluation dataset using the responses generated above.\n",
+    "eval_dataset = pd.DataFrame(\n",
+    "    {\n",
+    "        \"source\": sources,\n",
+    "        \"response\": translations,\n",
+    "        \"reference\": references,\n",
+    "    }\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "jo7ahGbnnYfp"
+   },
+   "source": [
+    "### Set up eval metrics for your data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "aG3kUfTmoAwb"
+   },
+   "source": [
+    "You can evaluate the translation quality of your data generated from an LLM using any of the metrics below.\n",
+    "\n",
+    "- [BLEU](https://en.wikipedia.org/wiki/BLEU):\\\n",
+    "BLEU calculates a score from 0 to 1 based on how many matching words and phrases appear in a machine translation compared to a human reference, with higher scores indicating better quality.\n",
+    "\n",
+    "- [COMET](https://unbabel.github.io/COMET/html/index.html):\\\n",
+    "COMET uses a neural network to produce a score, typically between 0 and 1, reflecting the similarity between a machine translation and a human reference, where higher scores mean better quality.\n",
+    "\n",
+    "- [MetricX](https://github.com/google-research/metricx):\\\n",
+    "MetricX is an LLM-based evaluation metric for translation quality measurement that aims at matching the Multidimensional Quality Metrics (MQM) score range of 0 (best) to 25 (worst). It is a newer, improved successor to BLEURT that Google published publicly.\n",
+    "\n",
+    "See the [documentation](https://github.com/googleapis/python-aiplatform/blob/main/vertexai/evaluation/metrics/pointwise_metric.py) for more information about supported COMET and MetricX versions; an optional configuration sketch follows the next cell."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "cellView": "form",
+    "id": "EGe_vlUvPVOM"
+   },
+   "outputs": [],
+   "source": [
+    "metrics = [\n",
+    "    \"bleu\",\n",
+    "    pointwise_metric.Comet(),\n",
+    "    pointwise_metric.MetricX(),\n",
+    "]"
+   ]
+  },
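+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "By default, `Comet()` and `MetricX()` use the SDK's default model versions. The optional cell below is a minimal sketch of pinning the versions and declaring the language pair explicitly; the exact version strings and keyword arguments are assumptions, so verify them against `pointwise_metric.py` in your installed SDK release."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "cellView": "form"
+   },
+   "outputs": [],
+   "source": [
+    "# @title [Optional] Configure metric versions explicitly.\n",
+    "# Minimal sketch (not part of the original walkthrough): pin the metric\n",
+    "# versions and declare the language pair. The version strings below are\n",
+    "# assumptions; check pointwise_metric.py in your SDK for supported values.\n",
+    "configured_metrics = [\n",
+    "    \"bleu\",\n",
+    "    pointwise_metric.Comet(\n",
+    "        version=\"COMET_22_SRC_REF\",  # assumed version identifier\n",
+    "        source_language=\"de\",\n",
+    "        target_language=\"en\",\n",
+    "    ),\n",
+    "    pointwise_metric.MetricX(\n",
+    "        version=\"METRICX_24_SRC_REF\",  # assumed version identifier\n",
+    "        source_language=\"de\",\n",
+    "        target_language=\"en\",\n",
+    "    ),\n",
+    "]"
+   ]
+  },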
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aG3kUfTmoAwb" + }, + "source": [ + "You can evaluate the translation quality of your data generated from an LLM using any of the metrics below.\n", + "\n", + "- [BLEU](https://en.wikipedia.org/wiki/BLEU):\\\n", + "BLEU calculates a score from 0 to 1 based on how many matching words and phrases appear in a machine translation compared to a human reference, with higher scores indicating better quality.\n", + "\n", + "- [COMET](https://unbabel.github.io/COMET/html/index.html):\\\n", + "COMET uses a neural network to produce a score typically between 0 and 1, reflecting the similarity between a machine translation and a human reference, where higher scores mean better quality.\n", + "\n", + "- [MetricX](https://github.com/google-research/metricx):\\\n", + "Metric-X is a LLM-based evaluation metric for translation quality measurement that aims at maching the Multidimensional Quality Metrics (MQM) score range of 0 (best) to 25 (worst). It is a newer and improved version of Bluert-X that was published publicly by Google.\n", + "\n", + "See [documentations](https://github.com/googleapis/python-aiplatform/blob/main/vertexai/evaluation/metrics/pointwise_metric.py) for more information about supported COMET and MetricX versions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "EGe_vlUvPVOM" + }, + "outputs": [], + "source": [ + "metrics = [\n", + " \"bleu\",\n", + " pointwise_metric.Comet(),\n", + " pointwise_metric.MetricX(),\n", + "]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZJg5FdWfnhjz" + }, + "source": [ + "### Run evaluation\n", + "\n", + "With the evaluation dataset and metrics defined, you can run evaluation for an `EvalTask` on different models and applications, and many other use cases." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "rOvo-LpsQTIj" + }, + "outputs": [], + "source": [ + "eval_task = evaluation.EvalTask(\n", + " dataset=eval_dataset, metrics=metrics, experiment=EXPERIMENT_NAME\n", + ")\n", + "eval_result = eval_task.evaluate()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GwnZMsXBnjS7" + }, + "source": [ + "You can view the summary metrics and row-based metrics for each response in the `EvalResult`.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "2QOFq9YZROPr" + }, + "outputs": [], + "source": [ + "display_eval_result(eval_result, rows=2)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BLMKgL2OnCyt" + }, + "source": [ + "## Clean up" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "Dv8drshKnEf2" + }, + "outputs": [], + "source": [ + "# @title Delete ExperimentRun\n", + "delete_experiment = False\n", + "if delete_experiment:\n", + " aiplatform.ExperimentRun(\n", + " run_name=eval_result.metadata[\"experiment_run\"],\n", + " experiment=eval_result.metadata[\"experiment\"],\n", + " ).delete()" + ] + } + ], + "metadata": { + "colab": { + "name": "model_garden_translation_llm_translation_and_evaluation.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +}