diff --git a/faq/faq.mdx b/faq/faq.mdx
index d4af5516..f41f7006 100644
--- a/faq/faq.mdx
+++ b/faq/faq.mdx
@@ -52,7 +52,7 @@ curl -X 'POST' \
 -F 'languages=kor' \
 | jq -C . | less -R
 ```
-For comprehensive language support, refer to the [Tesseract documentation]("https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html"), which provides a detailed list of supported languages and installation guidelines.
+For comprehensive language support, refer to the [Tesseract documentation](https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html), which provides a detailed list of supported languages and installation guidelines.
 You can still use ``ocr_languages`` kwarg, but this parameter is being deprecated in favor of ``languages`` kwarg.
diff --git a/open-source/core-functionality/staging.mdx b/open-source/core-functionality/staging.mdx
index 11d1e446..70df7b31 100644
--- a/open-source/core-functionality/staging.mdx
+++ b/open-source/core-functionality/staging.mdx
@@ -4,7 +4,7 @@ title: Staging
-The `Staging` brick is being deprecated in favor of the new and more comprehensive `Destination Connectors`. To explore the complete list and usage, please refer to [Destination Connectors documentation](/ingest/destination-connectors/overview).
+The `Staging` brick is being deprecated in favor of the new and more comprehensive `Destination Connectors`. To explore the complete list and usage, please refer to [Destination Connectors documentation](../ingest/destination-connectors/overview).
 Note: We are constantly expanding our collection of destination connectors. If you wish to request a specific Destination Connector, you’re encouraged to submit a Feature Request on the [Unstructured GitHub repository](https://github.com/Unstructured-IO/unstructured/issues/new/choose).
diff --git a/open-source/introduction/overview.mdx b/open-source/introduction/overview.mdx
index 36a80a9c..1dda1d4e 100644
--- a/open-source/introduction/overview.mdx
+++ b/open-source/introduction/overview.mdx
@@ -19,7 +19,7 @@ sidebarTitle: Overview
 * and more!
-The [`unstructured` library]((https://github.com/Unstructured-IO/unstructured)) offers an open-source toolkit
+The [`unstructured` library](https://github.com/Unstructured-IO/unstructured) offers an open-source toolkit
 designed to simplify the ingestion and pre-processing of diverse data formats, including images and text-based documents such as PDFs, HTML files, Word documents, and more. With a focus on optimizing data workflows for Large Language Models (LLMs), `unstructured` provides modular functions and connectors that work seamlessly together. This cohesive system ensures
@@ -28,7 +28,7 @@ and use cases.
 ## Key functionality
-* **Precise Document Extraction**: Unstructured offers advanced capabilities in extracting elements and metadata from documents. This includes a variety of document element types and metadata. Learn more about [Document elements and metadata](/open-source/introduction/document-elements).
+* **Precise Document Extraction**: Unstructured offers advanced capabilities in extracting elements and metadata from documents. This includes a variety of document element types and metadata. Learn more about [Document elements and metadata](../concepts/document-elements).
 * **Extensive File Support**: The platform supports a wide array of file types, ensuring versatility in handling different document formats from PDF, Images, HTML, and many more. Detailed information on supported file types can be found [here](/api-reference/api-services/overview#supported-file-types).
@@ -46,7 +46,7 @@ and use cases.
 * [Embedding](/open-source/core-functionality/embedding): The embedding encoder classes in Unstructured leverage document elements detected through partitioning or grouped via chunking to obtain embeddings for each element. This is particularly useful for applications like Retrieval Augmented Generation (RAG), where precise and contextually relevant embeddings are crucial.
-* **High-performant Connectors**: The platform includes optimized connectors for efficient data ingestion and output. These comprise [Source Connectors](/ingest/destination-connectors/overview) for data input and [Destination Connectors](/ingest/destination-connectors/overview) for data export.
+* **High-performant Connectors**: The platform includes optimized connectors for efficient data ingestion and output. These comprise [Source Connectors](../ingest/source-connectors/overview) for data input and [Destination Connectors](../ingest/destination-connectors/overview) for data export.
 ## Common Use Cases
diff --git a/open-source/introduction/quick-start.mdx b/open-source/introduction/quick-start.mdx
index 3ef75f22..40acc97a 100644
--- a/open-source/introduction/quick-start.mdx
+++ b/open-source/introduction/quick-start.mdx
@@ -57,7 +57,7 @@ The following section will cover basic concepts and usage patterns in `unstructu
 The example documents in this section come from the [example-docs](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs) directory in the `unstructured` repo.
-Before running the code in this make sure you’ve installed the `unstructured` library and all dependencies using the instructions in the [Quick Start](/installation/overview#quick-start) section.
+Before running the code in this make sure you’ve installed the `unstructured` library and all dependencies using the instructions in the [Quick Start](../installation/overview#quick-start) section.
 ## Partitioning a document
diff --git a/platform/overview.mdx b/platform/overview.mdx
index a4bdb95b..d294224e 100644
--- a/platform/overview.mdx
+++ b/platform/overview.mdx
@@ -20,7 +20,7 @@ To **get your data RAG-ready** our platform moves it through the following proce
 ```
- We offer multiple [Source Connectors](/content/text). We can connect to your data in its existing location.
+ We offer multiple [Source Connectors](../platform/platform-source-connectors/overview). We can connect to your data in its existing location.
 **Routing** determines which strategy we will employ in **transforming your document to our canonical JSON schema**. There are three [Partioning Strategies](/api-reference/api-services/partitioning "partioning strategies") for document transformation, ```fast```, ```hires```, or ```ocr_only```. ```fast``` is great for when there is extractable text available, like in HTML files or in the Microsoft Office Document format. ```hires``` is best for PDFs and tables and where accurate classification of document elements is critical. ```ocr_only``` is useful when dealing with image-based files or PDFs that do not have extractable text. **If you're unsure, select ```auto``` and we'll handle the decision for you**.
@@ -35,7 +35,7 @@ To **get your data RAG-ready** our platform moves it through the following proce
 Call out to third party embedding providers, ```Open AI```, ```AWS Bedrock```, and ```Octo ML```.
- We have multiple [Destination Connectors](/content/text). Including **all major vector databases**.
+ We have multiple [Destination Connectors](../platform/platform-destination-connectors/overview). Including **all major vector databases**.
@@ -53,10 +53,10 @@ To simplify this process and provide it as a no-code solution, platform consist
 Workflow[3. Workflow] --> Jobs[4. Jobs]
 ```
-1. [Source Connectors](platform-source-connectors/overview) to ingest your data.
-2. [Destination Connectors](platform-destination-connectors/overview) tell our system where to write your transformed data too..
-3. [Workflows](workflows-automation) connect sources to destinations and provide chunking, embedding, and scheduling options.
-4. [Jobs](jobs-scheduling) allow you to monitor data transformation progress.
+1. [Source Connectors](../platform/platform-source-connectors/overview) to ingest your data.
+2. [Destination Connectors](../platform/platform-destination-connectors/overview) tell our system where to write your transformed data too..
+3. [Workflows](../platform/workflows-automation) connect sources to destinations and provide chunking, embedding, and scheduling options.
+4. [Jobs](../platform/jobs-scheduling) allow you to monitor data transformation progress.
 ### Compliance
diff --git a/platform/saas-platform-guide.mdx b/platform/saas-platform-guide.mdx
index 7a75eca6..f7923478 100644
--- a/platform/saas-platform-guide.mdx
+++ b/platform/saas-platform-guide.mdx
@@ -10,16 +10,16 @@ This page describes how to get started with the SaaS Unstructured Platform. Lear
 You can [sign-up here](https://unstructured.io/platform) to get started.
- Configure a ```Source Connector``` to [Amazon S3](platform-source-connectors/s3) (or a [source of your choice](platform-source-connectors/overview)).
+ Configure a ```Source Connector``` to [Amazon S3](../platform/platform-source-connectors/s3) (or a [source of your choice](../platform/platform-source-connectors/overview)).
- Configure a ```Destination Connector``` to [Mongo DB Atlas Vector DB](platform-destination-connectors/mongodb) (or a [destination of your choice](platform-destination-connectors/overview)).
+ Configure a ```Destination Connector``` to [Mongo DB Atlas Vector DB](../platform/platform-destination-connectors/mongodb) (or a [destination of your choice](../platform/platform-destination-connectors/overview)).
- Connect your ```Source Connector``` and ```Destination Connector``` together, and set options for ```chunking```, ```embedding```, and ```scheduling``` in [Workflows](workflows-automation).
+ Connect your ```Source Connector``` and ```Destination Connector``` together, and set options for ```chunking```, ```embedding```, and ```scheduling``` in [Workflows](../platform/workflows-automation).
- ```Jobs``` provide a location for determining [how your workflow is performing](jobs-scheduling).
+ ```Jobs``` provide a location for determining [how your workflow is performing](../platform/jobs-scheduling).
 Your unstructured **data will continuously flow** into your destination as and when new files/updates become available.