diff --git a/learn/fine-tuning/FineTuningAPI.ipynb b/learn/fine-tuning/FineTuningAPI.ipynb new file mode 100644 index 0000000..7b2cd1a --- /dev/null +++ b/learn/fine-tuning/FineTuningAPI.ipynb @@ -0,0 +1,741 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 347, + "metadata": {}, + "outputs": [], + "source": [ + "account_id = \"ioannidu-0dd70b\"\n", + "api_key=\"fw_3ZcAXU8JWqvW1WNhSPEHJBa7\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Fine-Tuning Models Using APIs\n", + "\n", + "We present an example of fine-tuning a model using the [APIs](https://docs.fireworks.ai/api-reference/introduction). The API calls that follow can be used to set up automated fine-tuning and inference. For a more interactive approach, a similar example using `firectl` is presented in the [Fine-tuning models documentation](https://docs.fireworks.ai/fine-tuning/fine-tuning-models). \n", + "\n", + "In this notebook we will show how to:\n", + "\n", + "- Prepare a dataset.\n", + "- Create and run a fine-tuning job on the prepared Dataset. We present a text completion example.\n", + "- Deploy the fine-tuned model.\n", + "- Use the fine-tuned model for inference.\n", + "- Troubleshoot errors.\n", + "- Clean up all resources produced." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## General Setup\n", + "\n", + "Please instantiate variables `account_id` and `api_key` that match your credentials.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "account_id = \"\"\n", + "api_key = \"\"" + ] + }, + { + "cell_type": "code", + "execution_count": 348, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "import json\n", + "import os\n", + "import time" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prepare Dataset\n", + "\n", + "In this Notebook we will use a sample Dataset consisting of the following information:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 349, + "metadata": {}, + "outputs": [], + "source": [ + "\"\"\"\n", + "{\"instruction\": \"When did Virgin Australia start operating?\", \"context\": \"Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia's domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney.\", \"response\": \"Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route.\", \"category\": \"closed_qa\"}\n", + "{\"instruction\": \"Which is a species of fish? 
Tope or Rope\", \"context\": \"\", \"response\": \"Tope\", \"category\": \"classification\"}\n", + "{\"instruction\": \"Why can camels survive for long without water?\", \"context\": \"\", \"response\": \"Camels use the fat in their humps to keep them filled with energy and hydration for long periods of time.\", \"category\": \"open_qa\"}\n", + "{\"instruction\": \"Alice's parents have three daughters: Amy, Jessy, and what\\u2019s the name of the third daughter?\", \"context\": \"\", \"response\": \"The name of the third daughter is Alice\", \"category\": \"open_qa\"}\n", + "{\"instruction\": \"When was Tomoaki Komorida born?\", \"context\": \"Komorida was born in Kumamoto Prefecture on July 10, 1981. After graduating from high school, he joined the J1 League club Avispa Fukuoka in 2000. Although he debuted as a midfielder in 2001, he did not play much and the club was relegated to the J2 League at the end of the 2001 season. In 2002, he moved to the J2 club Oita Trinita. He became a regular player as a defensive midfielder and the club won the championship in 2002 and was promoted in 2003. He played many matches until 2005. In September 2005, he moved to the J2 club Montedio Yamagata. In 2006, he moved to the J2 club Vissel Kobe. Although he became a regular player as a defensive midfielder, his gradually was played less during the summer. In 2007, he moved to the Japan Football League club Rosso Kumamoto (later Roasso Kumamoto) based in his local region. He played as a regular player and the club was promoted to J2 in 2008. Although he did not play as much, he still played in many matches. In 2010, he moved to Indonesia and joined Persela Lamongan. In July 2010, he returned to Japan and joined the J2 club Giravanz Kitakyushu. 
He played often as a defensive midfielder and center back until 2012 when he retired.\", \"response\": \"Tomoaki Komorida was born on July 10,1981.\", \"category\": \"closed_qa\"}\n", + "\"\"\";" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The Dataset above is stored in a file called `sampleDataset.jsonl`. \n" + ] + }, + { + "cell_type": "code", + "execution_count": 350, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Size of file in ./sampleDataset.jsonl = 2633 bytes\n" + ] + } + ], + "source": [ + "file_path = \"./sampleDataset.jsonl\" \n", + "file_name = \"sampleDataset.jsonl\"\n", + "file_size_in_bytes = os.stat(file_path).st_size\n", + "print('Size of file in', file_path, '=', file_size_in_bytes, 'bytes')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To prepare our Dataset for fine-tuning:\n", + "\n", + "- Create a Dataset record.\n", + "- Upload the data from the file to the Dataset record.\n", + "- Validate the upload. Note that this is a necessary step to complete the process.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Dataset Record\n", + "\n", + "If `datasetId` is not provided, the system assigns a random id. For convenience, it is best to provide one explicitly." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 351, + "metadata": {}, + "outputs": [], + "source": [ + "url = f\"https://api.fireworks.ai/v1/accounts/{account_id}/datasets\"\n", + "\n", + "dataset_id = \"my-sample-dataset\"\n", + "\n", + "headers = {\n", + " \"Authorization\": \"Bearer \" + api_key,\n", + " \"Content-Type\": \"application/json\"\n", + "}\n", + "\n", + "payload = {\n", + " \"dataset\": {\n", + " \"displayName\": \"mySampleDataset\",\n", + " \"format\": \"COMPLETION\",\n", + " \"exampleCount\": \"5\"\n", + " },\n", + " \"datasetId\": dataset_id\n", + "}\n", + "\n", + "response = requests.request(\"POST\", url, json=payload, headers=headers)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 354, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Current state of Dataset Create = UPLOADING\n", + "{\n", + " \"createTime\": \"2024-10-28T04:51:26.185571Z\",\n", + " \"displayName\": \"mySampleDataset\",\n", + " \"exampleCount\": \"5\",\n", + " \"format\": \"COMPLETION\",\n", + " \"name\": \"accounts/ioannidu-0dd70b/datasets/my-sample-dataset\",\n", + " \"state\": \"UPLOADING\",\n", + " \"status\": {\n", + " \"code\": \"OK\",\n", + " \"message\": \"\"\n", + " },\n", + " \"userUploaded\": {}\n", + "}\n", + "Dataset creation terminated with final state: UPLOADING\n" + ] + } + ], + "source": [ + "# The request to create a Dataset should change the state of the Dataset to \"UPLOADING\".\n", + "\n", + "# Get state\n", + "dataset_create_dict = json.loads(response.text)\n", + "state = dataset_create_dict[\"state\"]\n", + "print(\"Current state of Dataset Create = \", state)\n", + "print(json.dumps(dataset_create_dict, indent=4))\n", + "\n", + "# Wait until state is \"READY\" \n", + "headers = {\"Authorization\": \"Bearer \" + api_key}\n", + "url = f\"https://api.fireworks.ai/v1/accounts/{account_id}/datasets/{dataset_id}\"\n", + "response = requests.request(\"GET\", url, headers=headers)\n", + 
"dataset = json.loads(response.text)\n", + "state = dataset[\"state\"]\n", + "# In the following loop we will wait for dataset create to be \"READY\".\n", + "# We could optinally add a time out for the case of the state being stuck at \"UPLOADING\" state.\n", + "#while state != \"READY\":\n", + "# # Update state of the dataset\n", + "# time.sleep(0.1)\n", + "# response = requests.request(\"GET\", url, headers=headers)\n", + "# dataset = json.loads(response.text)\n", + "# state = dataset[\"state\"]\n", + "\n", + "#print(\"Dataset creation terminated with final state:\", state)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Upload Dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, upload the Dataset created. This step will create a signed ulr to later upload the data fromm local file." + ] + }, + { + "cell_type": "code", + "execution_count": 355, + "metadata": {}, + "outputs": [], + "source": [ + "url = f\"https://api.fireworks.ai/v1/accounts/{account_id}/datasets/{dataset_id}:getUploadEndpoint\"\n", + "\n", + "headers = {\n", + " \"Authorization\": \"Bearer \" + api_key,\n", + " \"Content-Type\": \"application/json\"\n", + "}\n", + "payload = {\"filenameToSize\": {\"sampleDataset.jsonl\": file_size_in_bytes}}\n", + "\n", + "response_dataset_create = requests.request(\"POST\", url, json=payload, headers=headers)\n", + "dataset_ulopad_dict = json.loads(response_dataset_create.text)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 356, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Signed url: 
https://storage.googleapis.com/fireworks-artifacts-ioannidu-0dd70b-44c3f6/dataset/my-sample-dataset-be5f28/sampleDataset.jsonl?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=fireworks-control-plane%40fw-ai-cp-prod.iam.gserviceaccount.com%2F20241028%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20241028T051225Z&X-Goog-Expires=21599&X-Goog-Signature=632c2367b62bd3e07d44a0e624b49500839f28b53237877341cab0aa18095197110830c7c9388d373879496117ff320188d37f259bd7358f9863bf7c0c995079915973e88d51139415c6011d6ad32209866675b3c5350ab336383bb7e0ebe5adc928efc600418e7a2beed8ae5a5162bfa1924a79e9eb414710cf45ac0f8c826d7527d3e21dd8408f7efb3326d14e2230cb3c9dbfa5c0ccfd9db89773ede5ba9cdff2a5a2d25af433b4af644f6fc6280dffda071059d0ea67e82704629c899877a241a2cdf4189ec00bcd40135114bdb00f7d8d758dddf59af6b453a816275e446b706b91313d8ece9782e96f21ca6c5a6bd3c8239a52a5d791d7a14c68c56794&X-Goog-SignedHeaders=content-type%3Bhost%3Bx-goog-content-length-range\n" + ] + } + ], + "source": [ + "signed_url = dataset_ulopad_dict[\"filenameToSignedUrls\"][file_name]\n", + "print(\"Signed url:\", signed_url)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, upload data from local file into the signed url provided." + ] + }, + { + "cell_type": "code", + "execution_count": 373, + "metadata": {}, + "outputs": [], + "source": [ + "headers = {\n", + " \"x-goog-content-length-range\": f\"{file_size_in_bytes}, {file_size_in_bytes}\",\n", + " \"Content-Type\": \"application/octet-stream\"\n", + "}\n", + "\n", + "with open(file_path, 'rb') as file:\n", + " data = file.read()\n", + "\n", + "response_file = requests.request(\"PUT\", signed_url, data=data, headers=headers)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 374, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "File upload failed: 403 SignatureDoesNotMatchAccess denied.
The request signature we calculated does not match the signature you provided. Check your Google secret key and signing method.
GOOG4-RSA-SHA256\n", + "20241028T051225Z\n", + "20241028/auto/storage/goog4_request\n", + "756ae1df7812b89c7d4c850892500019da1c619f38ebde7e5062e55bfed1ef79PUT\n", + "/fireworks-artifacts-ioannidu-0dd70b-44c3f6/dataset/my-sample-dataset-be5f28/sampleDataset.jsonl\n", + "X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=fireworks-control-plane%40fw-ai-cp-prod.iam.gserviceaccount.com%2F20241028%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20241028T051225Z&X-Goog-Expires=21599&X-Goog-SignedHeaders=content-type%3Bhost%3Bx-goog-content-length-range\n", + "content-type:application/octet-stream\n", + "host:storage.googleapis.com\n", + "x-goog-content-length-range:2633, 2633\n", + "\n", + "content-type;host;x-goog-content-length-range\n", + "UNSIGNED-PAYLOAD
\n" + ] + } + ], + "source": [ + "# Check file upload\n", + "if response_file.status_code == 200:\n", + " print(\"File upload was successful!\")\n", + "else:\n", + " print(\"File upload failed:\", response_file.status_code, response_file.text)\n", + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Validate Dataset Upload\n", + "\n", + "This is a necessary step to complete the process." + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [], + "source": [ + "headers = {\n", + " \"Authorization\": \"Bearer \"+api_key,\n", + " \"Content-Type\": \"application/json\"\n", + "}\n", + "\n", + "url = f\"https://api.fireworks.ai/v1/accounts/{account_id}/datasets/{dataset_id}:validateUpload\"\n", + "\n", + "payload = {}\n", + "\n", + "response_dataset_validate_upload = requests.request(\"POST\", url, json=payload, headers=headers)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check that response is {}\n", + "dataset_validate_upload_dict = json.loads(response_dataset_validate_upload.text)\n", + "print(\"Response should be {}. Response =\", dataset_validate_upload_dict)\n", + "\n", + "# Check state of the dataset and ensure that its state is \"READY\"\n", + "# Get Dataset\n", + "url = f\"https://api.fireworks.ai/v1/accounts/{account_id}/datasets/{dataset_id}\"\n", + "headers = {\"Authorization\": \"Bearer \"+api_key}\n", + "response = requests.request(\"GET\", url, headers=headers)\n", + "# Get state\n", + "dataset_dict = json.loads(response.text)\n", + "state = dataset_dict[\"state\"]\n", + "print(\"Current state of Dataset Upload = \", state)\n", + "\n", + "print(json.dumps(dataset_dict, indent=4))\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prepare Fine-tuning Job\n", + "\n", + "Using the Dataset created and uploaded above, we will next create a fine-tuning job to train a model. 
In this example we will use `llama-v3p1-8b-instruct` as the base model, which the fine-tuning job will build upon using our Dataset." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a Fine-tuning Job\n", + "\n", + "When a fine-tuning job is created successfully, a fine-tuned model is created automatically. It is recommended to provide a `modelId` when creating the fine-tuning job; otherwise a random one is assigned to the model. This example is a text completion case, and we will train on the `context` and `instruction` fields provided by the Dataset. " + ] + }, + { + "cell_type": "code", + "execution_count": 195, + "metadata": {}, + "outputs": [], + "source": [ + "url = f\"https://api.fireworks.ai/v1/accounts/{account_id}/fineTuningJobs\"\n", + "\n", + "model_id = \"my-model-id\"\n", + "\n", + "payload = {\n", + " \"displayName\": \"mySampleDatasetFinetuningJob\",\n", + " \"dataset\": f\"{dataset_id}\",\n", + " \"modelId\": model_id,\n", + " \"textCompletion\": {\n", + " # How the fields of the JSON dataset should be formatted into the input text.\n", + " \"inputTemplate\": \"### GIVEN THE CONTEXT: {context} ### INSTRUCTION: {instruction} ### RESPONSE IS: \",\n", + " # How the fields of the JSON dataset should be formatted into the output text.\n", + " \"outputTemplate\": \"ANSWER: {response}\"\n", + " },\n", + " \"baseModel\": \"accounts/fireworks/models/llama-v3p1-8b-instruct\",\n", + "}\n", + "\n", + "response_finetuning_job = requests.request(\"POST\", url, json=payload, headers=headers)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Upon creation, a fine-tuning job's state is set to \"CREATED\". A successful run moves the state through \"PENDING\" (waiting for resource allocation), \"RUNNING\", and finally \"COMPLETED\". 
If the job fails to run, its state will change to \"FAILED\".\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# To wait for the job to complete, first extract the job's id from its name\n", + "headers = {\"Authorization\": \"Bearer \"+api_key}\n", + "\n", + "url = f\"https://api.fireworks.ai/v1/accounts/{account_id}/fineTuningJobs\"\n", + "response = requests.request(\"GET\", url, headers=headers)\n", + "# Get all fine-tuning jobs\n", + "finetuning_jobs = json.loads(response.text)\n", + "\n", + "# In this particular case we take the first job in the list, as we only created a single job.\n", + "# In an automated system we could look up the job in the list of jobs by name or by a given id.\n", + "# Note: the APIs currently don't support providing an id for a fine-tuning job, hence the workaround to find the randomly assigned id.\n", + "\n", + "# The last part of the job's name is its randomly assigned id.\n", + "print(\"Fine tuning job's full name =\", finetuning_jobs['fineTuningJobs'][0]['name'])\n", + "fine_tuning_job_id = finetuning_jobs['fineTuningJobs'][0]['name'].split(\"/\")[-1]\n", + "print(\"Fine tuning job's id =\", fine_tuning_job_id)\n", + "\n", + "# Given the job's id, we can now wait for the job to complete or fail\n", + "url = f\"https://api.fireworks.ai/v1/accounts/{account_id}/fineTuningJobs/{fine_tuning_job_id}\"\n", + "response = requests.request(\"GET\", url, headers=headers)\n", + "fine_tuning_job = json.loads(response.text)\n", + "state = fine_tuning_job[\"state\"]\n", + "# In the following loop we wait for the job to either complete or fail.\n", + "# We could optionally add a timeout in case the state gets stuck at PENDING or RUNNING.\n", + "while state != \"COMPLETED\" and state != \"FAILED\":\n", + " # Update state of the fine-tuning job\n", + " time.sleep(0.1)\n", + " response = requests.request(\"GET\", url, 
headers=headers)\n", + " fine_tuning_job = json.loads(response.text)\n", + " state = fine_tuning_job[\"state\"]\n", + "\n", + "print(\"Job finished with final state:\", state)\n", + "#print(json.dumps(fine_tuning_job, indent=4))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploy and Use Fine-tuned Model for Inference" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Deploy Model\n", + "\n", + "Deploying the model is a necessary step before it can be used for inference. \n", + "\n", + "For deploying an on-demand model, please refer to the [Deployment APIs](https://docs.fireworks.ai/api-reference/create-deployment). In our case, the base model used is serverless, and to deploy it we need to call:\n", + " \n", + "```firectl deploy my-model-id``` \n", + " \n", + " Deploying a serverless model (with addons) does not create a `Deployment` but a `deployed_model`, which can be verified by calling: \n", + " \n", + " ```firectl list deployed-models```\n", + "\n", + "Note: APIs for deploying serverless fine-tuned models, which are listed as `deployed-models`, will be added shortly. For now please use the `firectl` command above." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Inference\n", + "\n", + "Setting the prompt to an instruction from the Dataset we trained on should produce the expected response.\n", + "\n", + "The trained model can be identified in a number of ways, as explained in the [document on model identifiers](https://docs.fireworks.ai/models/deploying#model-identifier).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To get the deployed model id needed to identify the newly trained model, use the following command:\n", + "\n", + "`firectl list deployed-models`\n", + "\n", + "Wait until the state of the deployed model is \"DEPLOYED\". 
The name of the deployed model will be used for inference.\n", + "\n", + "Note: An API will be added shortly for this functionality.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 198, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "deployed_model_id = \"my-model-id-d3276e03\" # Updated manually for now, will be updated after API is added.\n", + "\n", + "#kleoniioannidou@Kleonis-MacBook-Pro ~ % firectl get deployed-model my-model-id-d3276e03 \n", + "#Name: accounts/ioannidu-0dd70b/deployedModels/my-model-id-d3276e03\n", + "#Create Time: 2024-10-27 18:54:15\n", + "#Created By: ioannidu@fireworks.ai\n", + "#Model: accounts/ioannidu-0dd70b/models/my-model-id\n", + "#Deployment: accounts/fireworks/deployments/ee744c5f\n", + "#Default: true\n", + "#State: DEPLOYED\n", + "#Serverless: true\n", + "#Status: OK\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 199, + "metadata": {}, + "outputs": [], + "source": [ + "model = f'accounts/{account_id}/deployedModels/{deployed_model_id}'\n", + "\n", + "url = \"https://api.fireworks.ai/inference/v1/completions\"\n", + "\n", + "headers = {\n", + " \"Authorization\": \"Bearer \" + api_key,\n", + " \"Content-Type\": \"application/json\"\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Examples of text completion using our trained model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "instruction = \"Which is a species of fish? 
Tope or Rope\"\n", + "context = \"\"\n", + "\n", + "payload = {\n", + " \"model\": model,\n", + " \"prompt\": f\"### GIVEN THE CONTEXT: {context} ### INSTRUCTION: {instruction} ### RESPONSE IS: \",\n", + " \"max_tokens\": 300,\n", + " \"temperature\": 0,\n", + "}\n", + "\n", + "response = requests.request(\"POST\", url, json=payload, headers=headers)\n", + "\n", + "output = json.loads(response.text)\n", + "\n", + "#print(response.choices[0].message.content)\n", + "print(output[\"choices\"][0][\"text\"])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "instruction = \"Why can camels survive for long without water?\"\n", + "context = \"\"\n", + "\n", + "\n", + "payload = {\n", + " \"model\": model,\n", + " \"prompt\": f\"### GIVEN THE CONTEXT: {context} ### INSTRUCTION: {instruction} ### RESPONSE IS: \",\n", + " \"max_tokens\": 300,\n", + " \"temperature\": 0,\n", + " #\"context_length_exceeded_behavior\": \"truncate\",\n", + "}\n", + "\n", + "response = requests.request(\"POST\", url, json=payload, headers=headers)\n", + "\n", + "output = json.loads(response.text)\n", + "\n", + "print(output['choices'][0]['text'])\n", + "#print(response.choices[0].message.content)\n", + "#print(output['choices'][0]['text'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Troubleshooting\n", + "\n", + "Any of the resources created above can be listed with the APIs shown below. For example, you could list all uploaded datasets and pick a particular dataset from the list to check its state." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "headers = {\"Authorization\": \"Bearer \" + api_key}\n", + "\n", + "# List Datasets\n", + "url = f\"https://api.fireworks.ai/v1/accounts/{account_id}/datasets\"\n", + "response = requests.request(\"GET\", url, headers=headers)\n", + "datasets = json.loads(response.text)\n", + "print(json.dumps(datasets, indent=4))\n", + "\n", + "# List fine-tuning jobs\n", + "url = f\"https://api.fireworks.ai/v1/accounts/{account_id}/fineTuningJobs\"\n", + "response = requests.request(\"GET\", url, headers=headers)\n", + "finetuning_jobs = json.loads(response.text)\n", + "print(json.dumps(finetuning_jobs, indent=4))\n", + "\n", + "# List Models\n", + "url = f\"https://api.fireworks.ai/v1/accounts/{account_id}/models\"\n", + "response = requests.request(\"GET\", url, headers=headers)\n", + "models = json.loads(response.text)\n", + "print(json.dumps(models, indent=4))\n", + "\n", + "# List Deployed Models\n", + "# API will be added shortly\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For each resource in the lists above, you can inspect its `state` and `status` fields to check for potential errors. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Print state of the first model in the list\n", + "print(\"State = \", json.dumps(models[\"models\"][0][\"state\"], indent=4))\n", + "\n", + "# Print status of the first model in the list\n", + "print(\"Status = \", json.dumps(models[\"models\"][0][\"status\"], indent=4))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Clean up\n", + "\n", + "You can delete all resources you have created. The order of deletion does not matter, except for deployed models, which must be undeployed before they can be deleted." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "headers = {\"Authorization\": \"Bearer \" + api_key}\n", + "\n", + "# Delete Dataset\n", + "url = f\"https://api.fireworks.ai/v1/accounts/{account_id}/datasets/{dataset_id}\"\n", + "response = requests.request(\"DELETE\", url, headers=headers)\n", + "print(response.text)\n", + "\n", + "# Delete Fine-Tuning Job\n", + "url = f\"https://api.fireworks.ai/v1/accounts/{account_id}/fineTuningJobs/{fine_tuning_job_id}\"\n", + "response = requests.request(\"DELETE\", url, headers=headers)\n", + "print(response.text)\n", + "\n", + "\n", + "# Delete Model\n", + "# First, undeploy the model\n", + "# Currently we don't have an API for this, so please use the following command: `firectl undeploy my-model-id`\n", + "\n", + "# The state of the deployed model will be \"UNDEPLOYING\" until the process completes.\n", + "\n", + "# Second, delete the undeployed model\n", + "url = f\"https://api.fireworks.ai/v1/accounts/{account_id}/models/{model_id}\"\n", + "response = requests.request(\"DELETE\", url, headers=headers)\n", + "print(response.text)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "signed_url" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.0" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/learn/fine-tuning/sampleDataset.jsonl b/learn/fine-tuning/sampleDataset.jsonl new file mode 100644 index 0000000..ec4e82b --- /dev/null +++ b/learn/fine-tuning/sampleDataset.jsonl @@ -0,0 +1,5 @@ +{"instruction": "When did Virgin Australia start operating?", 
"context": "Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia's domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney.", "response": "Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route.", "category": "closed_qa"} +{"instruction": "Which is a species of fish? Tope or Rope", "context": "", "response": "Tope", "category": "classification"} +{"instruction": "Why can camels survive for long without water?", "context": "", "response": "Camels use the fat in their humps to keep them filled with energy and hydration for long periods of time.", "category": "open_qa"} +{"instruction": "Alice's parents have three daughters: Amy, Jessy, and what\u2019s the name of the third daughter?", "context": "", "response": "The name of the third daughter is Alice", "category": "open_qa"} +{"instruction": "When was Tomoaki Komorida born?", "context": "Komorida was born in Kumamoto Prefecture on July 10, 1981. After graduating from high school, he joined the J1 League club Avispa Fukuoka in 2000. Although he debuted as a midfielder in 2001, he did not play much and the club was relegated to the J2 League at the end of the 2001 season. In 2002, he moved to the J2 club Oita Trinita. He became a regular player as a defensive midfielder and the club won the championship in 2002 and was promoted in 2003. He played many matches until 2005. In September 2005, he moved to the J2 club Montedio Yamagata. In 2006, he moved to the J2 club Vissel Kobe. 
Although he became a regular player as a defensive midfielder, his gradually was played less during the summer. In 2007, he moved to the Japan Football League club Rosso Kumamoto (later Roasso Kumamoto) based in his local region. He played as a regular player and the club was promoted to J2 in 2008. Although he did not play as much, he still played in many matches. In 2010, he moved to Indonesia and joined Persela Lamongan. In July 2010, he returned to Japan and joined the J2 club Giravanz Kitakyushu. He played often as a defensive midfielder and center back until 2012 when he retired.", "response": "Tomoaki Komorida was born on July 10,1981.", "category": "closed_qa"} \ No newline at end of file
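Note on `sampleDataset.jsonl`: the notebook hard-codes `"exampleCount": "5"` and reads the file size with `os.stat`, and its templates reference the `instruction`, `context`, and `response` fields. A local pre-upload check could derive both numbers and catch malformed records before any API call is made. The following is a minimal sketch under those assumptions; `check_jsonl_dataset` and `REQUIRED_FIELDS` are hypothetical names introduced here and are not part of the Fireworks API or the notebook.

```python
import json
import os

# Assumption: these are the fields the notebook's input/output templates use.
REQUIRED_FIELDS = ("instruction", "context", "response")


def check_jsonl_dataset(path, required_fields=REQUIRED_FIELDS):
    """Return (example_count, size_in_bytes) for a JSONL file.

    Raises ValueError if a non-blank line is not a JSON object or lacks
    one of the required fields.
    """
    example_count = 0
    with open(path, "r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"line {line_number}: invalid JSON ({exc.msg})"
                ) from exc
            if not isinstance(record, dict):
                raise ValueError(f"line {line_number}: expected a JSON object")
            missing = [name for name in required_fields if name not in record]
            if missing:
                raise ValueError(f"line {line_number}: missing fields {missing}")
            example_count += 1
    # Same size measurement the notebook uses for its upload request.
    return example_count, os.stat(path).st_size
```

In the notebook, `count, size = check_jsonl_dataset("./sampleDataset.jsonl")` could then supply `str(count)` for `exampleCount` and `size` for the `filenameToSize` entry, instead of the hard-coded values.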