diff --git a/docs/en/stack/ml/nlp/ml-nlp-limitations.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-limitations.asciidoc index 8673cdb19..e505bb63b 100644 --- a/docs/en/stack/ml/nlp/ml-nlp-limitations.asciidoc +++ b/docs/en/stack/ml/nlp/ml-nlp-limitations.asciidoc @@ -9,6 +9,12 @@ The following limitations and known problems apply to the {version} release of the Elastic {nlp} trained models feature. +[discrete] +[[ml-nlp-large-documents-limit-10k-10mb]] +== Document size limitations when using `semantic_text` fields + +When using semantic text to ingest documents, chunking takes place automatically. The number of chunks is limited by the {ref}/mapping-settings-limit.html[`index.mapping.nested_objects.limit`] cluster setting, which defaults to 10k. Documents that are too large will cause errors during ingestion. To avoid this issue, please split your documents into roughly 1MB parts before ingestion. + [discrete] [[ml-nlp-elser-v1-limit-512]] == ELSER semantic search is limited to 512 tokens per field that inference is applied to @@ -17,4 +23,4 @@ When you use ELSER for semantic search, only the first 512 extracted tokens from each field of the ingested documents that ELSER is applied to are taken into account for the search process. If your data set contains long documents, divide them into smaller segments before ingestion if you need the full text to be -searchable. \ No newline at end of file +searchable.