diff --git a/articles/ai-services/openai/api-version-deprecation.md b/articles/ai-services/openai/api-version-deprecation.md
index 797cf83093..9b62f584c3 100644
--- a/articles/ai-services/openai/api-version-deprecation.md
+++ b/articles/ai-services/openai/api-version-deprecation.md
@@ -5,7 +5,7 @@ services: cognitive-services
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: conceptual
-ms.date: 01/08/2024
+ms.date: 01/10/2025
 author: mrbullwinkle
 ms.author: mbullwin
 recommendations: false
@@ -25,7 +25,10 @@ This article is to help you understand the support lifecycle for the Azure OpenA
 Azure OpenAI API latest release:
 
 - Inference: [2024-12-01-preview](reference-preview.md)
-- Authoring: [2024-12-01-preview](https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/authoring/preview/2024-10-01-preview/azureopenai.json)
+- Authoring: [2024-10-01-preview](https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/authoring/preview/2024-10-01-preview/azureopenai.json)
+
+> [!IMPORTANT]
+> For features that are part of the data plane authoring API, such as batch, fine-tuning, and assistants files, continue to use API version `2024-10-01-preview` to take advantage of the latest preview features.
 
 This version contains support for the latest Azure OpenAI features including:
diff --git a/articles/ai-services/openai/includes/api-surface.md b/articles/ai-services/openai/includes/api-surface.md
index 80acca3393..ea8fe27a03 100644
--- a/articles/ai-services/openai/includes/api-surface.md
+++ b/articles/ai-services/openai/includes/api-surface.md
@@ -22,7 +22,7 @@ Each API surface/specification encapsulates a different set of Azure OpenAI capa
 | API | Latest preview release | Latest GA release | Specifications | Description |
 |:---|:----|:----|:----|:---|
 | **Control plane** | [`2024-06-01-preview`](/rest/api/aiservices/accountmanagement/operation-groups?view=rest-aiservices-accountmanagement-2024-06-01-preview&preserve-view=true) | [`2024-10-01`](/rest/api/aiservices/accountmanagement/deployments/create-or-update?view=rest-aiservices-accountmanagement-2024-10-01&tabs=HTTP&preserve-view=true) | [Spec files](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/resource-manager/Microsoft.CognitiveServices) | Azure OpenAI shares a common control plane with all other Azure AI Services. The control plane API is used for things like [creating Azure OpenAI resources](/rest/api/aiservices/accountmanagement/accounts/create?view=rest-aiservices-accountmanagement-2023-05-01&tabs=HTTP&preserve-view=true), [model deployment](/rest/api/aiservices/accountmanagement/deployments/create-or-update?view=rest-aiservices-accountmanagement-2023-05-01&tabs=HTTP&preserve-view=true), and other higher level resource management tasks. The control plane also governs what is possible to do with capabilities like Azure Resource Manager, Bicep, Terraform, and Azure CLI.|
-| **Data plane - authoring** | `2024-12-01-preview` | `2024-10-21` | [Spec files](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/authoring) | The data plane authoring API controls [fine-tuning](/rest/api/azureopenai/fine-tuning?view=rest-azureopenai-2024-08-01-preview&preserve-view=true), [file-upload](/rest/api/azureopenai/files/upload?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true), [ingestion jobs](/rest/api/azureopenai/ingestion-jobs/create?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true), [batch](/rest/api/azureopenai/batch?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true) and certain [model level queries](/rest/api/azureopenai/models/get?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true)
+| **Data plane - authoring** | `2024-10-01-preview` | `2024-10-21` | [Spec files](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/authoring) | The data plane authoring API controls [fine-tuning](/rest/api/azureopenai/fine-tuning?view=rest-azureopenai-2024-08-01-preview&preserve-view=true), [file-upload](/rest/api/azureopenai/files/upload?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true), [ingestion jobs](/rest/api/azureopenai/ingestion-jobs/create?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true), [batch](/rest/api/azureopenai/batch?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true), and certain [model-level queries](/rest/api/azureopenai/models/get?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true). |
 | **Data plane - inference** | [`2024-12-01-preview`](/azure/ai-services/openai/reference-preview#data-plane-inference) | [`2024-10-21`](/azure/ai-services/openai/reference#data-plane-inference) | [Spec files](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference) | The data plane inference API provides the inference capabilities/endpoints for features like completions, chat completions, embeddings, speech/whisper, on your data, Dall-e, assistants, etc. |
 
 ## Authentication
diff --git a/articles/ai-services/openai/includes/use-your-data-rest.md b/articles/ai-services/openai/includes/use-your-data-rest.md
index 2827c5d4bc..1098b17a22 100644
--- a/articles/ai-services/openai/includes/use-your-data-rest.md
+++ b/articles/ai-services/openai/includes/use-your-data-rest.md
@@ -5,7 +5,7 @@ author: aahill
 ms.author: aahi
 ms.service: azure-ai-openai
 ms.topic: include
-ms.date: 03/07/2024
+ms.date: 01/10/2025
 ---
 
 [!INCLUDE [Set up required variables](./use-your-data-common-variables.md)]
@@ -20,7 +20,7 @@ To trigger a response from the model, you should end with a user message indicat
 > There are several parameters you can use to change the model's response, such as `temperature` or `top_p`. See the [reference documentation](../reference.md#completions-extensions) for more information.
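The curl command that follows relies on environment variables defined by the setup include referenced earlier. If you're running the request standalone, here's a minimal sketch of the required exports (every value shown is a hypothetical placeholder):

```bash
# Hypothetical placeholder values; substitute your own resource details.
export AZURE_OPENAI_ENDPOINT="https://my-aoai-resource.openai.azure.com"
export AZURE_OPENAI_DEPLOYMENT_ID="my-gpt-4o-deployment"
export AZURE_OPENAI_API_KEY="<azure-openai-api-key>"
export AZURE_AI_SEARCH_ENDPOINT="https://my-search-service.search.windows.net"
export AZURE_AI_SEARCH_INDEX="my-search-index"
export AZURE_AI_SEARCH_API_KEY="<azure-ai-search-admin-key>"
```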
 ```bash
-curl -i -X POST $AZURE_OPENAI_ENDPOINT/openai/deployments/$AZURE_OPENAI_DEPLOYMENT_ID/chat/completions?api-version=2024-02-15-preview \
+curl -i -X POST $AZURE_OPENAI_ENDPOINT/openai/deployments/$AZURE_OPENAI_DEPLOYMENT_ID/chat/completions?api-version=2024-10-21 \
 -H "Content-Type: application/json" \
 -H "api-key: $AZURE_OPENAI_API_KEY" \
 -d \
@@ -31,8 +31,11 @@ curl -i -X POST $AZURE_OPENAI_ENDPOINT/openai/deployments/$AZURE_OPENAI_DEPLOYME
       "type": "azure_search",
       "parameters": {
         "endpoint": "'$AZURE_AI_SEARCH_ENDPOINT'",
-        "key": "'$AZURE_AI_SEARCH_API_KEY'",
-        "index_name": "'$AZURE_AI_SEARCH_INDEX'"
+        "index_name": "'$AZURE_AI_SEARCH_INDEX'",
+        "authentication": {
+          "type": "api_key",
+          "key": "'$AZURE_AI_SEARCH_API_KEY'"
+        }
       }
     }
   ],
@@ -81,7 +84,8 @@ curl -i -X POST $AZURE_OPENAI_ENDPOINT/openai/deployments/$AZURE_OPENAI_DEPLOYME
     "prompt_tokens": 3779,
     "completion_tokens": 105,
     "total_tokens": 3884
-  }
+  },
+  "system_fingerprint": "fp_65792305e4"
 }
 ```
diff --git a/articles/ai-studio/how-to/configure-managed-network.md b/articles/ai-studio/how-to/configure-managed-network.md
index 18525a9bb6..1572d583cc 100644
--- a/articles/ai-studio/how-to/configure-managed-network.md
+++ b/articles/ai-studio/how-to/configure-managed-network.md
@@ -634,7 +634,7 @@ During hub creation, select __Provision managed network proactively at creation_
 The following example shows how to provision a managed virtual network during hub creation. The `--provision-network-now` flag is in preview.
 
 ```azurecli
-az ml workspace create -n myworkspace -g my_resource_group --kind hub --managed-network AllowInternetOutbound --provision-network-now
+az ml workspace create -n myworkspace -g my_resource_group --kind hub --managed-network AllowInternetOutbound --provision-network-now true
 ```
 
 The following example shows how to provision a managed virtual network.
@@ -654,7 +654,7 @@ az ml workspace show -n my_ai_hub_name -g my_resource_group --query managed_netw
 The following example shows how to provision a managed virtual network during hub creation. The `--provision-network-now` flag is in preview.
 
 ```azurecli
-az ml workspace create -n myworkspace -g my_resource_group --managed-network AllowInternetOutbound --provision-network-now
+az ml workspace create -n myworkspace -g my_resource_group --managed-network AllowInternetOutbound --provision-network-now true
 ```
 
 The following example shows how to provision a managed virtual network:
diff --git a/articles/search/media/vector-search-index-size/vector-index-size-by-algorithm.png b/articles/search/media/vector-search-index-size/vector-index-size-by-algorithm.png
new file mode 100644
index 0000000000..e49ddcfba6
Binary files /dev/null and b/articles/search/media/vector-search-index-size/vector-index-size-by-algorithm.png differ
diff --git a/articles/search/vector-search-how-to-create-index.md b/articles/search/vector-search-how-to-create-index.md
index 573c19a496..d66d9a3b0d 100644
--- a/articles/search/vector-search-how-to-create-index.md
+++ b/articles/search/vector-search-how-to-create-index.md
@@ -30,13 +30,13 @@ This article explains the workflow and uses REST for illustration. Once you unde
 ## Prerequisites
 
-+ Azure AI Search, in any region and on any tier. Most existing services support vector search. For services created before January 2019, there's a small subset that can't create a vector index. In this situation, a new service must be created. If you're using integrated vectorization (skillsets that call Azure AI), Azure AI Search must be in the same region as Azure OpenAI or Azure AI services.
++ Azure AI Search, in any region and on any tier. On services created before January 2019, there's a small subset that can't create a vector index. If this applies to you, create a new service to use vectors. For indexing workloads that include integrated vectorization (skillsets that call Azure AI), Azure AI Search must be in the same region as Azure OpenAI or Azure AI services.
 
-+ [Pre-existing vector embeddings](vector-search-how-to-generate-embeddings.md) or use [integrated vectorization](vector-search-integrated-vectorization.md), where embedding models are called from the indexing pipeline.
++ You must have [pre-existing vector embeddings](vector-search-how-to-generate-embeddings.md) to upload to the index, or you can use [integrated vectorization](vector-search-integrated-vectorization.md), where embedding models are called from a skillset in an indexer pipeline.
 
-+ You should know the dimensions limit of the model used to create the embeddings. Valid values are 2 through 3072 dimensions. In Azure OpenAI, for **text-embedding-ada-002**, the length of the numerical vector is 1536. For **text-embedding-3-small** or **text-embedding-3-large**, the vector length is 3072.
++ You should know the dimensions limit of the model used to create the embeddings so that you can assign that limit to the vector field. Integrated vectorization supports a finite number of embedding models. For **text-embedding-ada-002**, dimensions are fixed at 1536. For **text-embedding-3-small** or **text-embedding-3-large**, the vector length ranges from 1 to 1536 and from 1 to 3072, respectively.
 
-+ You should also know what the supported similarity metrics are. For Azure OpenAI, similarity is [computed using `cosine`](/azure/ai-services/openai/concepts/understand-embeddings#cosine-similarity).
++ You should also know what similarity metric to use. For embedding models on Azure OpenAI, similarity is [computed using `cosine`](/azure/ai-services/openai/concepts/understand-embeddings#cosine-similarity).
 
 + You should be familiar with [creating an index](search-how-to-create-search-index.md). The schema must include a field for the document key, other fields you want to search or filter, and other configurations for behaviors needed during indexing and queries.
 
@@ -50,9 +50,9 @@ Make sure your documents:
 
 1. Provide vector data (an array of single-precision floating point numbers) in source fields.
 
-   Vector fields contain an array generated by embedding models, one embedding per field, where the field is a top-level field (not part of a nested or complex type). For the simplest integration, we recommend the embedding models in [Azure OpenAI](https://aka.ms/oai/access), such as **text-embedding-ada-002** for text documents or the [Image Retrieval REST API](/rest/api/computervision/2023-02-01-preview/image-retrieval/vectorize-image) for images.
+   Vector fields contain an array generated by embedding models, one embedding per field, where the field is a top-level field (not part of a nested or complex type). For the simplest integration, we recommend the embedding models in [Azure OpenAI](https://aka.ms/oai/access), such as a **text-embedding-3** model for text documents or the [Image Retrieval REST API](/rest/api/computervision/2023-02-01-preview/image-retrieval/vectorize-image) for images.
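To make the earlier prerequisites about dimensions and similarity metric concrete, here's a minimal, hypothetical sketch of an index with one vector field, sent through the REST API. The service endpoint, index, field, and profile names, the key variable, and the API version are all illustrative placeholders; the `dimensions` value matches **text-embedding-ada-002**, and `cosine` is the metric recommended for Azure OpenAI embeddings:

```bash
# Hypothetical sketch: create an index with one key field and one vector field.
# All names, the endpoint, and $SEARCH_ADMIN_API_KEY are placeholders.
curl -X PUT "https://my-search-service.search.windows.net/indexes/my-index?api-version=2024-07-01" \
  -H "Content-Type: application/json" \
  -H "api-key: $SEARCH_ADMIN_API_KEY" \
  -d '{
    "name": "my-index",
    "fields": [
      { "name": "id", "type": "Edm.String", "key": true },
      { "name": "contentVector", "type": "Collection(Edm.Single)", "searchable": true,
        "dimensions": 1536, "vectorSearchProfile": "my-profile" }
    ],
    "vectorSearch": {
      "algorithms": [ { "name": "my-hnsw", "kind": "hnsw", "hnswParameters": { "metric": "cosine" } } ],
      "profiles": [ { "name": "my-profile", "algorithm": "my-hnsw" } ]
    }
  }'
```

The profile ties the field to the HNSW configuration; as noted later in this article, a field indexed with HNSW can still opt into exhaustive KNN at query time, but not the reverse.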
-   If you can take a dependency on indexers and skillsets, consider using [integrated vectorization](vector-search-integrated-vectorization.md) that encodes images and textual content during indexing. Your field definitions are for vector fields, but incoming source data can be text or images, represented as vector arrays created during indexing.
+   If you can take a dependency on indexers and skillsets, consider using [integrated vectorization](vector-search-integrated-vectorization.md) that encodes images and textual content during indexing. Your field definitions are for vector fields, but incoming source data can be text or images, which are converted to vector arrays during indexing.
 
 1. Provide other fields with human-readable content for the query response, and for hybrid query scenarios that include full text search or semantic ranking in the same request.
 
@@ -69,7 +69,7 @@ A vector configuration specifies the parameters used during indexing to create "
 If you choose HNSW on a field, you can opt in for exhaustive KNN at query time. But the other direction doesn’t work: if you choose exhaustive, you can’t later request HNSW search because the extra data structures that enable approximate search don’t exist.
 
-A vector configuration also specifies quantization methods for reducing vector size:
+Optionally, a vector configuration also specifies quantization methods for reducing vector size:
 
 + Scalar
 + Binary (available in 2024-07-01 only and in newer Azure SDK packages)
diff --git a/articles/search/vector-search-index-size.md b/articles/search/vector-search-index-size.md
index 969c69366e..1a4a6789d7 100644
--- a/articles/search/vector-search-index-size.md
+++ b/articles/search/vector-search-index-size.md
@@ -10,7 +10,7 @@ ms.custom:
   - build-2024
   - ignite-2024
 ms.topic: conceptual
-ms.date: 09/19/2024
+ms.date: 01/09/2025
 ---
 
 # Vector index size and staying under limits
@@ -27,7 +27,7 @@ For each vector field, Azure AI Search constructs an internal vector index using
 
 + Vector index size is measured in bytes.
 
-+ Vector quotas are based on memory constraints. All searchable vector indexes must be loaded into memory. At the same time, there must also be sufficient memory for other runtime operations. Vector quotas exist to ensure that the overall system remains stable and balanced for all workloads.
++ Vector quotas are based on memory constraints. For vector indexes created using the Hierarchical Navigable Small World (HNSW) algorithm, searchable vector indexes reside in memory. At the same time, there must also be sufficient memory for other runtime operations. Vector quotas exist to ensure that the overall system remains stable and balanced for all workloads. If you use the exhaustive KNN algorithm, indexes are loaded into memory only at query time.
 
 + Vector indexes are also subject to disk quota, in the sense that all indexes are subject to disk quota. There's no separate disk quota for vector indexes.
 
@@ -67,11 +67,24 @@ A request for vector metrics is a data plane operation. You can use the Azure po
 
 ### [**Portal**](#tab/portal-vector-quota)
 
-Usage information can be found on the **Overview** page's **Usage** tab. Portal pages refresh every few minutes so if you recently updated an index, wait a bit before checking results.
+#### Vector size per index
+
+To get vector index size per index, select **Search management** > **Indexes** to view a list of indexes and the document count, the size of in-memory vector indexes, and total index size as stored on disk.
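The per-index numbers the portal shows are also available programmatically from the Get Index Statistics REST API. A minimal sketch, assuming a hypothetical service name, index name, and an admin key in `$SEARCH_ADMIN_API_KEY`:

```bash
# Returns documentCount, storageSize (on disk), and vectorIndexSize (in memory) for one index.
curl -X GET "https://my-search-service.search.windows.net/indexes/my-index/stats?api-version=2024-07-01" \
  -H "api-key: $SEARCH_ADMIN_API_KEY"
```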
+
+Recall that vector quota is based on memory constraints. For vector indexes created using the HNSW algorithm, all searchable vector indexes are permanently loaded into memory. For indexes created using the exhaustive KNN algorithm, vector indexes are loaded in chunks, sequentially, during query time. There's no memory residency requirement for exhaustive KNN indexes. The lifetime of the loaded pages in memory is similar to text search, and there are no metrics applicable to exhaustive KNN indexes other than total storage.
+
+The following screenshot shows two versions of the same vector index. One version is created using the HNSW algorithm, where the vector graph is memory resident. The other version is created using the exhaustive KNN algorithm. With exhaustive KNN, there's no specialized in-memory vector index, so the portal shows 0 MB for vector index size. Those vectors still exist and are counted in overall storage size, but they don’t occupy the in-memory resource that the vector index size metric is tracking.
+
+:::image type="content" source="media/vector-search-index-size/vector-index-size-by-algorithm.png" lightbox="media/vector-search-index-size/vector-index-size-by-algorithm.png" alt-text="Screenshot of the index portal page showing vector index size based on different algorithms.":::
+
+#### Vector size per service
+
+To get vector index size for the search service as a whole, select the **Overview** page's **Usage** tab. Portal pages refresh every few minutes, so if you recently updated an index, wait a bit before checking results. The following screenshot is for an older Standard 1 (S1) search service, configured for one partition and one replica.
 
 + Storage quota is a disk constraint, and it's inclusive of all indexes (vector and nonvector) on a search service.
 
 + Vector index size quota is a memory constraint. It's the amount of memory required to load all internal vector indexes created for each vector field on a search service.
 
 The screenshot indicates that indexes (vector and nonvector) consume almost 460 megabytes of available disk storage. Vector indexes consume almost 93 megabytes of memory at the service level.
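These service-level numbers, including vector index usage against quota, are also returned by the Get Service Statistics REST API. A minimal sketch, assuming the same hypothetical service name and key variable as earlier:

```bash
# Returns service-wide counters, including vectorIndexSize usage and quota in bytes.
curl -X GET "https://my-search-service.search.windows.net/servicestats?api-version=2024-07-01" \
  -H "api-key: $SEARCH_ADMIN_API_KEY"
```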