diff --git a/docs/active-basics/active-oss-filtering.md b/docs/active-basics/active-oss-filtering.md new file mode 100644 index 000000000..830b704c0 --- /dev/null +++ b/docs/active-basics/active-oss-filtering.md @@ -0,0 +1,94 @@ +--- +title: "Filtering" +slug: "active-oss-filtering" +hidden: false +metadata: + title: "Filtering" + description: "Enhance insights with data filtering in Encord Active: Identify patterns, remove duplicates, improve model behavior. Use standard filters, embedding plots, or natural language search" + image: + 0: "https://files.readme.io/9123073-image_16.png" +createdAt: "2023-07-14T16:16:03.504Z" +updatedAt: "2023-08-09T16:19:33.172Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +**Learn how to filter data in Encord Active** + +Filtering data in Encord Active is crucial for various reasons. It enables insights and actionable results on the following key aspects and more: +1. Identification of patterns, trends, or anomalies within a subset of the data. +2. Recognition of duplicates, outliers and inconsistencies. +3. Removal of irrelevant, noisy and erroneous data. +4. Understanding model's behaviour and potential skewness when facing different subsets of the data. +5. **[Encord project only]** Update tasks status to prioritize some unannotated images in the labeling stage and send labels to be reviewed/fixed, all along with descriptive comments for the project users (e.g. annotators and reviewers). + +Encord Active provides three data filtering methods: +1. **Standard filter feature**: This option allows users to refine their search using metadata filters, user-defined tag filters, and metric filters. +2. **Embedding plot**: A two-dimensional visualization technique used by Encord to represent high-dimensional data in a more interpretable form. Can be used to select points within a specific rectangular area, thereby focusing on a particular subset of data points for in-depth analysis. +3. **Natural language search**: Enables users to enter descriptive queries in everyday language, making it easier to find relevant images without the need for specific keywords or complex search parameters. + + +# Standard filter feature + +The standard filter feature offers the following filtering options: +1. **Data points metadata filters**: Filter data points based on metadata attributes such as `Object Class` and `Annotator`, allowing to focus on specific classes or annotations created by particular annotators. +2. **User-defined tag filters**: Apply filters based on user-defined tags, enabling categorization and filtering of data points according to custom tags. +3. 
**Metric filters**: Utilize metrics, including built-in ones like `Image Diversity` and `Label Duplicates`, as well as user-defined metrics, to filter data points based on potentially complex properties. + +[//]: # (Don't show this section in the ToC. Use H3 heading to make that happen.) +### Steps to use the standard filter feature + +1. Go to the Explorer page and locate the filter feature. + ![Explorer page featuring the highlighted filter feature, allowing users to refine data visualization](https://storage.googleapis.com/docs-media.encord.com/static/img/active/user-guide/explorer-highlight-filter.png) +2. Choose one or more filters from the available options. +3. For numerical filters, specify the threshold range. For categorical filters, select the groups of interest. + ![Filtering data example with the `Red value` filter applied, narrowing down data points based on a specified threshold](https://storage.googleapis.com/docs-media.encord.com/static/img/active/user-guide/explorer-filter-by-red-values.png) + +> 👍 Tip +> Use the _Order by_ component to sort the filtered data by a specific metric, in either ascending or descending order. +> +> ![Customize visualization order option in the UI](https://storage.googleapis.com/docs-media.encord.com/static/img/active/user-guide/explorer-component-order-by.png) + + +# Embedding plot + +The embedding plot in Encord Active is a two-dimensional visualization technique used to represent high-dimensional data in a more interpretable and visually accessible form. By reducing the dimensionality of the data, the embedding plot helps preserve the underlying structure and patterns of the original data. + +In the plot, each data point is represented as a single point in the two-dimensional space, with proximity indicating similarity and shared characteristics among corresponding high-dimensional data points. 
This allows selecting points within a specific rectangular area, enabling a focused analysis of a particular subset of data points. + +By drawing a rectangular selection on the plot, users can quickly isolate and examine the data points that fall within that region, whether the selection is guided by specific criteria or by visual observation. This interactive functionality provides a flexible and intuitive way to explore attributes, run additional analyses, and uncover patterns and relationships within the chosen subset. + +Based on the selected option in the _Order by_ drop-down, users can choose to visualize the embedding plot for either the data or the labels. + +![Vibrant 2D embedding plot with distinct data points highlighting patterns and clusters](https://storage.googleapis.com/docs-media.encord.com/static/img/active/user-guide/explorer-embedding-plot.png) + +> 👍 Tip +> In addition to selecting points within a rectangular area, the label embedding plot can filter data points based on their label classes. + + +# Natural language search + +
+ Video Tutorial - Natural language search in Encord Active + +[block:html] +{ + "html": "
" +} +[/block] + +
+ +The natural language search feature enables users to enter descriptive queries in everyday language, such as "images that contain baseball items". The system intelligently processes the query and retrieves images that match the description. This feature simplifies and greatly enhances the search experience within Encord Active, allowing finding relevant images without the need for specific keywords or complex search parameters. + +![Encord Active's natural language search in action, retrieving relevant images based on descriptive queries](https://storage.cloud.google.com/docs-media.encord.com/static/img/active/user-guide/explorer-natural-language-search.png) + +> ℹ️ Note +> The natural language search feature is exclusively available in the hosted version of Encord Active. \ No newline at end of file diff --git a/docs/active-basics/active-relabeling.md b/docs/active-basics/active-relabeling.md new file mode 100644 index 000000000..11eaf6054 --- /dev/null +++ b/docs/active-basics/active-relabeling.md @@ -0,0 +1,99 @@ +--- +title: "Relabeling" +slug: "active-relabeling" +hidden: false +metadata: + title: "Relabeling" + description: "Streamline data relabeling with Encord Active: Transfer data to Encord Annotate for enhanced annotation accuracy. Optimize labeling process." + image: + 0: "https://files.readme.io/77a0cfc-image_16.png" +createdAt: "2023-07-12T09:21:52.521Z" +updatedAt: "2023-08-11T12:43:00.519Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +**Learn how to submit your data for labeling in Encord Annotate** + +> Send and prioritize data for labeling from Encord Active is currently a closed-beta feature. To learn more about this feature, please reach out to us on [Slack][slack-join] or via [email](mailto:active@encord.com). +> Full documentation coming soon! +> Reach out to us if you have urgent requests or catch us in Slack. + + +[//]: # (When examining your project data, you might come across labels that appear incorrect or are missing altogether. With Encord Active, you have the ability to highlight such data and seamlessly transfer it to Encord Annotate, at the dedicated labeling stage for your project. This empowers annotators to address any missing elements and enhance the overall quality of your labels, ensuring a more accurate and comprehensive data annotation process.) + +[//]: # () +[//]: # (## Steps to send data to the labeling stage) + +[//]: # () +[//]: # (1. Go to the toolbox in the explorer pages and use the [filter feature](https://docs.encord.com/docs/active-filtering) to choose the desired data.) + +[//]: # ( ![data-quality-explorer-filter-by-tag](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/user-guide/data-quality-explorer-filter-by-tag.png)) + +[//]: # (2. Locate and access the _Action_ tab in the same toolbox.) + +[//]: # ( ![toolbox-action-tab](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/user-guide/toolbox-action-tab.png)) + +[//]: # (3. Click the 🖋 Relabel button.) + +[//]: # ( ![toolbox-action-tab-relabel-button](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/user-guide/toolbox-action-tab-relabel-button.png)) + +[//]: # (4. Verify that the number of tasks ready for submission to the labeling stage is correct and press the Confirm button. 
) + +[//]: # ( ![toolbox-action-tab-relabel-button-confirmed](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/user-guide/toolbox-action-tab-relabel-button-confirmed.png)) + +[//]: # (5. **[Optional]** Navigate to the project in Encord Annotate and explore the task modifications in the _Summary_ tab.) + +[//]: # (
) + +[//]: # (
) + +[//]: # ( ) + +[//]: # (

Before sending the tasks to labeling

) + +[//]: # (
) + +[//]: # (
) + +[//]: # ( ) + +[//]: # (

After sending the tasks to labeling

) + +[//]: # (
) + +[//]: # (
) + +[//]: # () +[//]: # () +[//]: # (> ℹ️ Note) + +[//]: # (> The **relabel feature** is currently limited to workflow projects.) + +[//]: # (> ) + +[//]: # (> To upgrade your project to a workflow project, please reach out to us via [Slack][slack-join] or [email](mailto:active@encord.com). Our team will be happy to assist you with the necessary steps and provide further guidance.) + + +[slack-join]: https://join.slack.com/t/encordactive/shared_invite/zt-1hc2vqur9-Fzj1EEAHoqu91sZ0CX0A7Q diff --git a/docs/active-basics/active-tagging.md b/docs/active-basics/active-tagging.md new file mode 100644 index 000000000..a2cdd72ca --- /dev/null +++ b/docs/active-basics/active-tagging.md @@ -0,0 +1,58 @@ +--- +title: "Tagging" +slug: "active-tagging" +hidden: false +metadata: + title: "Tagging" + description: "Learn how tagging in the LOCAL version of Encord Active enhances data organization, search, and collaboration. Efficient workflow tips." + image: + 0: "https://files.readme.io/e8db7ee-image_16.png" +createdAt: "2023-07-14T15:24:59.405Z" +updatedAt: "2023-08-11T15:33:37.736Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + + +**Learn how to create tags in Encord Active** + +Tagging is a versatile feature used in almost all Encord Active workflows, whether you are relabeling, augmenting, exporting, or deleting data. You can tag images [individually](#individual-tagging) or in [bulk](#bulk-tagging). + +In Encord Active, tagging data provides several advantages: +1. Organization: Tagging allows you to organize your data effectively within the platform. By assigning tags to your data points, you can group and categorize them based on common characteristics, making it easier to manage and navigate large subsets of the dataset. +2. Enhanced search and filtering: Tags in Encord Active enable powerful search and filtering capabilities. You can search for specific data points or filter data based on tags, narrowing down your focus to the relevant information you need. +3. Customizable metadata: Tags serve as customizable metadata that can provide additional context and information about your data. You can define and assign tags that align with your specific project requirements, providing meaningful insights and annotations for efficient data analysis. +4. Collaboration and knowledge sharing: Tagging promotes collaboration and knowledge sharing among team members in Encord Active. With consistent tagging conventions, team members can easily understand and access tagged data, facilitating efficient collaboration and ensuring everyone is on the same page. + +These are just a few of the advantages of tagging in Encord Active, and there may be more benefits specific to your project and workflow. + +# Individual tagging + +**Steps to tag individual images or labels:** +1. Access the Explorer page and locate the specific images or labels you want to tag. +2. Select them by clicking on the checkbox in the top-left corner of each item. +3. Look for the TAG button, positioned above the natural language search bar, and click on it to initiate the tagging process. +4. 
Within the tagging interface, select the type of tag you want to apply. For images, choose the `data` type. For labels, select the `label` type. +5. Provide a name for the tag you want to apply to the selected items. This name should reflect the relevant characteristic or information you want to associate with them. +6. Press Enter to confirm the tag selection and apply it to the chosen items. +7. **[Optional]** Validate the successful application of the tag by visually confirming its presence on the tagged items or using the filter options to isolate the tagged items based on the newly created tag. + +![tagging-individual-images](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/user-guide/tagging-individual-images.png) + +# Bulk tagging + +**Steps to tag images or labels in bulk:** +1. Access the Explorer page and utilize the [standard filter feature](https://docs.encord.com/docs/active-filtering#standard-filter-feature) to choose the desired data. +2. Locate the SELECT ALL button, situated near the natural language search bar, and click on it to select the filtered items. +3. Look for the TAG button, positioned above the natural language search bar, and click on it to initiate the tagging process. +4. Within the tagging interface, select the type of tag you want to apply. For images, choose the `data` type. For labels, select the `label` type. +5. Provide a name for the tag you want to apply to the selected items. This name should reflect the relevant characteristic or information you want to associate with them. +6. Press Enter to confirm the tag selection and apply it to the chosen items. +7. **[Optional]** Validate the successful application of the tag by visually confirming its presence on the tagged items or using the filter options to isolate the tagged items based on the newly created tag. 
+ +![bulk-tagging](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/user-guide/tagging-in-bulk.png) \ No newline at end of file diff --git a/docs/active-basics/active-versioning.md b/docs/active-basics/active-versioning.md new file mode 100644 index 000000000..4e538e3e9 --- /dev/null +++ b/docs/active-basics/active-versioning.md @@ -0,0 +1,47 @@ +--- +title: "Versioning" +slug: "active-versioning" +hidden: true +metadata: + title: "Versioning" + description: "Learn data versioning in Encord Active: Track experiments, models, and project states. Global checkpoints for effective comparisons" + image: + 0: "https://files.readme.io/e269354-image_16.png" +createdAt: "2023-07-12T09:21:53.085Z" +updatedAt: "2023-08-11T13:54:16.572Z" +category: "6480a3981ed49107a7c6be36" +--- + +**Learn how to version your data, labels, and models in Encord Active** + +### Why do you version your data? +When you do experiments and test hypotheses, you typically want to jump back and forth between different versions of your data, labels, and models. For example, when you train a model on a specific subset of your data, you will typically find that there is an edge case for which your model performs poorly. Hence, you expand your dataset with more data to better cover the edge case and train a new model. + +In order to track your experiments and compare not only the model performance but also the underlying data shown to the model, you can use the project versioning feature of Encord Active. + +### What types of versioning are supported? + +The versioning is global for the project, covering everything from the available data and labels at a specific moment to the corresponding model predictions. This ensures that all relevant information is versioned and accessible. + +Currently, versioning operates through checkpoints, allowing you to create checkpoints and navigate between them to review previous states of the project. + + +### How do I version my data? 
+ +#### Creating a new version + + +In order to create a new version, navigate to the toolbox in the explorer pages, access the _Version_ tab, provide a version name and click the Create button. + +![Version creation form](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/version-creation-form.png) + +> 👍 Tip +> You also have the ability to discard any outstanding changes, i.e. everything after the last version. + + +#### Viewing a previous version + +On the left sidebar, there is a drop-down which allows version selection. Selecting an old version will temporarily save any outstanding changes until the latest version is selected again. + +> 🚧 Caution +> While on a previous version the app will be in read-only mode. Any changes made will be discarded. \ No newline at end of file diff --git a/docs/active-cli.md b/docs/active-cli.md new file mode 100644 index 000000000..69b57b0cf --- /dev/null +++ b/docs/active-cli.md @@ -0,0 +1,424 @@ +--- +title: "Command line interface" +slug: "active-cli" +hidden: false +metadata: + title: "Command line interface" + description: "Simplify Encord Active interaction using CLI: Initialize projects, manage metrics, launch app seamlessly. User-friendly command line interface for efficiency." + image: + 0: "https://files.readme.io/2a55a47-image_16.png" +createdAt: "2023-07-12T12:23:17.233Z" +updatedAt: "2023-08-09T11:47:09.141Z" +category: "65a71bbfea7a3f005192d1a7" +--- +Encord Active is equipped with a command line interface (CLI) that simplifies your interaction with the platform. +With the CLI, you can easily initialize projects, import projects and labels, manage and run metrics, and launch the application. + +We strive to ensure that our CLI is self-explanatory, eliminating the need for frequent switching between the terminal and documentation. 
Simply run `encord-active --help` to get details about all the top-level commands and `encord-active COMMAND --help` to get details about a specific command. + +Here is a list of all the top-level commands: + +``` +quickstart Start Encord Active straight away 🏃💨 +download Download a sandbox dataset to get started 📁 +init Initialize a project from your local file system 🌱 +import Import projects or predictions ⬇️ +refresh Sync data and labels from a remote Encord project 🔄 +start Launch the application with the provided project ✨ +project Manage project settings ⚙️ +metric Manage project metrics 📋 +print Print useful information 🖨️ +config Configure global settings 🔧 +``` + +## `quickstart` + +The command will download a small example project to a subdirectory named `quickstart` in the current working directory and automatically launch the application. + +``` +Usage: encord-active quickstart [OPTIONS] + +Options: + --target -t DIRECTORY Directory where the project would be saved. +``` + +## `download` + +In addition to the `quickstart` example, there are several other open-source datasets available for download that you can use to explore the capabilities of Encord Active. + +The command will display a list of available sandbox projects, allowing you to select one from the menu interactively. +If you prefer to skip the interactive selection, you can directly specify the sandbox project name using the `--project-name` optional argument. + +``` +Usage: encord-active download [OPTIONS] + +Options: + --project-name TEXT Name of the chosen project. + --target -t DIRECTORY Directory where the project would be saved. +``` + +
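For example, to skip the interactive menu and fetch a specific sandbox project into a chosen directory, a call might look like the following sketch, where `<sandbox-project-name>` and the target path are placeholders; use one of the names offered by the interactive menu:

```
encord-active download --project-name "<sandbox-project-name>" --target ./sandbox-projects
```

The downloaded project can then be launched with `encord-active start`, pointing its `--target` option at the directory the project was saved to.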
+ List of downloadable sandbox projects + +#### Berkeley Deep Drive + +- **Research Paper:** BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning +- **Authors:** Fisher Yu, Haofeng Chen, Xin Wang, Wenqi Xian, Yingying Chen, Fangchen Liu, Vashisht Madhavan, Trevor Darrell +- **Dataset Size:** 1000 images & 12973 annotations +- **Categories:** 8 classes +- **License:** BSD 3-Clause License +- **Release:** 21st September, 2020 +- **Read more:** [Webpage](https://bdd-data.berkeley.edu/) & [GitHub](https://github.com/bdd100k/bdd100k) + +Sample pictures: +![BDD dataset](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/BDD.png) + +#### COCO Validation 2017 Dataset + +- **Research Paper:** [Microsoft COCO: Common Objects in Context](https://arxiv.org/abs/1405.0312) +- **Author:** [Tsung-Yi Lin](https://arxiv.org/search/cs?searchtype=author&query=Lin%2C+T) , [Michael Maire](https://arxiv.org/search/cs?searchtype=author&query=Maire%2C+M), [Serge Belongie](https://arxiv.org/search/cs?searchtype=author&query=Belongie%2C+S), [Lubomir Bourdev](https://arxiv.org/search/cs?searchtype=author&query=Bourdev%2C+L) , [Ross Girshick](https://arxiv.org/search/cs?searchtype=author&query=Girshick%2C+R), [James Hays](https://arxiv.org/search/cs?searchtype=author&query=Hays%2C+J), [Pietro Perona](https://arxiv.org/search/cs?searchtype=author&query=Perona%2C+P), [Deva Ramanan](https://arxiv.org/search/cs?searchtype=author&query=Ramanan%2C+D), [C. 
Lawrence Zitnick](https://arxiv.org/search/cs?searchtype=author&query=Zitnick%2C+C+L), [Piotr Dollár](https://arxiv.org/search/cs?searchtype=author&query=Doll%C3%A1r%2C+P) +- **Dataset Size:** 5000 images, 4784 annotations +- **Categories:** 81 classes +- **License:** CC BY 4.0 +- **Release:** 1st May, 2014 +- **Read More:** [GitHub](https://github.com/cocodataset/cocodataset.github.io) & [Webpage](https://cocodataset.org/#home) + +Sample pictures: +![COCO dataset](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/COCO.png) + +#### Covid 19 Segmentation Dataset + +- **Research Paper:** Unknown +- **Author:** Unknown +- **Dataset Size:** 100 images & 602 annotations +- **Categories:** 13 classes +- **License:** CC BY 4.0 +- **Release:** Unknown +- **Read more:** [GitHub](https://github.com/GeneralBlockchain/covid-19-chest-xray-segmentations-dataset) + +Sample pictures: +![Covid dataset](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/Covid.png) + +#### Rareplanes + +- **Research Paper:** [RarePlanes: Synthetic Data Takes Flight](https://arxiv.org/abs/2006.02963) +- **Author:** [Jacob Shermeyer](https://arxiv.org/search/cs?searchtype=author&query=Shermeyer%2C+J), [Thomas Hossler](https://arxiv.org/search/cs?searchtype=author&query=Hossler%2C+T), [Adam Van Etten](https://arxiv.org/search/cs?searchtype=author&query=Van+Etten%2C+A), [Daniel Hogan](https://arxiv.org/search/cs?searchtype=author&query=Hogan%2C+D), [Ryan Lewis](https://arxiv.org/search/cs?searchtype=author&query=Lewis%2C+R), [Daeil Kim](https://arxiv.org/search/cs?searchtype=author&query=Kim%2C+D) +- **Dataset Size:** 2710 images & 6812 annotations +- **Categories:** 7 plane categories +- **License:** CC BY-SA 4.0 +- **Release:** 4th June, 2020 +- **Read More:** [Webpage](https://www.cosmiqworks.org/rareplanes/) + +Sample pictures: +![Rareplanes dataset](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/Rareplanes.png) + +#### TACO 
Dataset + +- **Research Paper:** [TACO: Trash Annotations in Context for Litter Detection](https://arxiv.org/abs/2003.06975) +- **Author:** Pedro F Proença, Pedro Simões +- **Dataset Size:** Official: 1500 images, 4784 annotations & Unofficial: 3736 images, 8419 annotations +- **Categories:** 60 litter categories +- **License:** CC BY 4.0 +- **Release:** 17th March, 2020 +- **Read More:** [GitHub](https://github.com/pedropro/TACO) & [Webpage](http://tacodataset.org/) + +Sample pictures: +![TACO dataset](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/TACO.png) + +#### Limuc Ulcerative Colitis Classification + +- **Research Paper:** Improving the Computer-Aided Estimation of Ulcerative Colitis Severity According to Mayo Endoscopic Score by Using Regression-Based Deep Learning +- **Authors:** Gorkem Polat, MSc, Haluk Tarik Kani, MD, Ilkay Ergenc, MD, Yesim Ozen Alahdab, MD, Alptekin Temizel, PhD, Ozlen Atug, MD +- **Dataset Size:** 11276 images +- **Categories:** Medical (Endoscopy/Colonoscopy) +- **License:** Creative Commons Attribution 4.0 International +- **Release:** 14th March 2022 +- **Read more:** [Webpage](https://zenodo.org/record/5827695) & [GitHub](https://github.com/GorkemP/labeled-images-for-ulcerative-colitis) + +
+ +## `start` + +Launch the application with the provided project ✨ + +``` +Usage: encord-active start [OPTIONS] + +Options: + --target -t DIRECTORY Path of the projects you would like to start +``` + +## `init` + +The command initializes new project from locally stored images and labels. It will search for images based on the `data-glob` arguments. By default, all jpeg, jpg, png, and tiff files will be matched. + +It will also search for labels if the `label-glob` and `transformer` options are provided. +Both glob results will be passed to your implementation of the `LabelTransformer` interface if you specify the `transformer` argument. + +``` +Usage: encord-active init [OPTIONS] ROOT + +Arguments: + * root DIRECTORY The root directory of the dataset you are trying to import + +Options: + --data-glob -dg TEXT Glob pattern to choose files. Repeat the `--data-glob` argument to + match multiple globs. + --label-glob -lg TEXT Glob pattern to choose label files. Repeat the `--label-glob` + argument to match multiple globs. This argument is only used if you + also provide the `transformer` argument. + --target -t DIRECTORY Directory where the project would be saved. + --name -n TEXT Name to give the new project. If no name is provided, the root + directory will be used with '[EA] ' prepended. + --symlinks Use symlinks instead of copying images to the target directory. + --dryrun Print the files that will be imported WITHOUT importing them. + --no-metrics Skip metrics execution on the initiated project. + --transformer PATH Path to python module with one or more implementations of the + `encord_active.lib.labels.label_transformer.LabelTransformer` + interface. +``` + +The [Quick import data & labels](https://docs.encord.com/docs/active-quick-import) workflow is a great starting point for utilizing this command. + +## `import` + +This command is used to import projects and predictions from different sources. 
+ +Refer to the [import section](https://docs.encord.com/docs/active-import) for examples of specific use-cases. + +``` +Usage: encord-active import [OPTIONS] COMMAND [ARGS]... + +Import Projects or Predictions ⬇️ + +Commands: + predictions Imports a predictions file. The predictions should be using the `Prediction` model + and be stored in a pkl file. + If the `--coco` option is specified then the file should be a json following the COCO results format. 🧠 + project Imports a new project from Encord or a local COCO project. 📦 +``` + +### `project` + +Imports a new project from Encord or a local COCO project. + +``` +Usage: encord-active import project [OPTIONS] + +Encord Project Arguments: + --project-hash TEXT Encord project hash of the project you wish to import. + Leaving it blank will allow you to choose one interactively. + --store-data-locally Store project data locally to avoid the need for on-demand download when visualizing and analyzing it. + +COCO Project Arguments: + --coco Import a project from the COCO format. + --images -i DIRECTORY Path to the directory containing the dataset images. + --annotations -a FILE Path to the file containing the dataset annotations. + --symlinks Use symlinks instead of copying COCO images to the target directory. + +Options: + --target -t DIRECTORY Directory where the project would be saved. +``` + +### `predictions` + +Imports a predictions file. The predictions should be using the `Prediction` model and be stored in a pkl file. +If the `--coco` option is specified then the file should be a json following the COCO results format. + +Refer to the [Import model predictions](https://docs.encord.com/docs/active-import-model-predictions) section for specific usage examples. + + +``` +Usage: encord-active import predictions [OPTIONS] PREDICTIONS_PATH + +Arguments: + * predictions_path FILE Path to a predictions file. + +Options: + --target -t DIRECTORY Path to the target project. + --coco Import a COCO results format file. 
+``` + +## `refresh` + +Sync data and labels from a remote Encord project. + +``` +Usage: encord-active refresh [OPTIONS] + +Options: + --target -t DIRECTORY Path to the target project. + --include-unlabeled -i Include unlabeled data. Note: This will affect the results of 'encord.Project.list_label_rows()' as every label row will now have a label_hash. +``` + +The local project should have a reference to the remote Encord project in its config file (`project_meta.yaml`). +The required attributes are: +1. The remote flag set to `true`. +2. The hash of the remote Encord project. +3. The path to the private Encord user SSH key. + +This command works in local projects created via `encord-active import project` and those successfully exported to Encord from the "Actions" tab in the UI's toolbox. + +## `project` + +Manage project settings ⚙️ + +``` +Usage: encord-active project [OPTIONS] COMMAND [ARGS]... + +Commands: + download-data Download all data locally for improved responsiveness. +``` + +### `download-data` + +Store project data locally to avoid the need for on-demand download when visualizing and analyzing it. + +``` +Usage: encord-active project download-data [OPTIONS] + +Options: + --target -t DIRECTORY Path to the target project. +``` + +## `metric` + +Manage project metrics. + +``` +Usage: encord-active metric [OPTIONS] COMMAND [ARGS]... + +Commands: + add Add metrics. + list List metrics. + remove Remove metrics. + run Run metrics. + show Show information about available metrics. +``` + +> ℹ️ Note +> Make sure your shell's current working directory is that of an Encord Active project, or your command points to one with the `--target` global option. + + +### `add` + +Add metrics to the project by specifying the path to a metrics module and the titles of the desired metrics within the module. +If no metric titles are provided, all metrics found in the Python module will be automatically added to the project. 
+ +``` +Usage: encord-active metric add [OPTIONS] MODULE_PATH [METRIC_TITLE]... + +Arguments: + * module_path FILE Path to the python module where the metric resides. + metric_title [METRIC_TITLE]... Title of the metric. Can be used multiple times. + +Options: + --target -t DIRECTORY Path to the target project. +``` + +If you attempt to add a metric that already exists, it will be skipped, and you will be notified accordingly. +However, if a metric title is not found in the Python module, an error will occur. +Please ensure that the metric titles are accurate and correspond to the metrics available in the module. + +> 🚧 Caution +> Some terminals may treat square braces (`[` and `]`) as special characters. It is advisable to always quote arguments containing these characters to prevent unexpected shell expansion. + + +### `remove` + +Removes metrics from a project. + +``` +Usage: encord-active metric remove [OPTIONS] METRIC_TITLE... + +Arguments: + * metric_title METRIC_TITLE... Title of the metric. Can be used multiple times. + +Options: + --target -t DIRECTORY Path to the target project. +``` + +### `list` + +List metrics in the project, including editables. Metrics are listed in a case-insensitive sorted order. + +``` +Usage: encord-active metric list [OPTIONS] + +Options: + --target -t DIRECTORY Path to the target project. +``` + +### `run` + +Run metrics on project data and labels. + +``` +Usage: encord-active metric run [OPTIONS] [METRIC_TITLE]... + +Arguments: + metric_title [METRIC_TITLE]... Title of the metric. Can be used multiple times. + +Options: + --target -t DIRECTORY Path to the target project. + --all Run all metrics. + --fuzzy Enable fuzzy search in the selection. (press [TAB] or [SPACE] to select more than one) 🪄 +``` + +### `show` + +Show information about one or more available metrics in the project. + +``` +Usage: encord-active metric show [OPTIONS] METRIC_TITLE... + +Arguments: + * metric_title METRIC_TITLE... Title of the metric. 
Can be used multiple times.
+
+Options:
+  --target  -t  DIRECTORY  Path to the target project.
+```
+
+## `config`
+
+Encord Active stores a few configurable properties so that you are not repeatedly prompted for the same input.
+
+The config file is stored at:
+
+- Linux: `~/.config/encord-active/config.toml`
+- MacOS: `~/Library/Application Support/encord-active/config.toml`
+- Windows: `%APPDATA%/encord-active/config.toml`
+
+```toml
+ssh_key_path = "/absolute/path/to/ssh-key" # Path to the private SSH key used when accessing remote Encord projects
+```
+
+```
+Usage: encord-active config [OPTIONS] COMMAND [ARGS]...
+
+Commands:
+  get    Print the value of an Encord Active configuration property.
+  list   List Encord Active configuration properties.
+  set    Set an Encord Active configuration property.
+  unset  Unset the value of an Encord Active configuration property.
+```
+
+## `print`
+
+Print useful information.
+
+```
+Usage: encord-active print [OPTIONS] COMMAND [ARGS]...
+
+Commands:
+  data-mapping     Prints a mapping between `data_hashes` and their corresponding `filename`.
+  encord-projects  Prints the mapping between the `project_hash`es of your Encord projects and their titles.
+  ontology         Prints the ontology mapping between class names and their `featureNodeHash` in JSON format.
+  system-info      Prints system information for the purpose of bug reporting.
+
+Options:
+  --json  Save output to a JSON file.
+```
\ No newline at end of file
diff --git a/docs/active-contributing.md b/docs/active-contributing.md
new file mode 100644
index 000000000..4f9e246d4
--- /dev/null
+++ b/docs/active-contributing.md
@@ -0,0 +1,210 @@
+---
+title: "Contributing"
+slug: "active-contributing"
+hidden: false
+metadata:
+ title: "Contributing"
+ description: "Join our community: Contribute to Encord Active's growth. Improve docs, give feedback, share – stars on GitHub appreciated!"
+ image:
+ 0: "https://files.readme.io/c1f7df0-image_16.png"
+createdAt: "2023-07-19T08:51:45.516Z"
+updatedAt: "2023-08-09T11:49:15.064Z"
+category: "65a71bbfea7a3f005192d1a7"
+---
+We follow a [code of conduct](https://github.com/encord-team/encord-active/blob/main/CODE_OF_CONDUCT.md) when participating in the community. Please read it before you make any contributions.
+
+- If you plan to work on an issue, say so on the issue page before you start working on it.
+- If you plan to work on a new feature, create an issue and discuss it with other community members/maintainers.
+- Ask for help in our [Slack community][slack-join].
+
+## Ways to contribute
+
+- **Stars on GitHub**: If you are an Encord Active user and enjoy using our platform, don't forget to star it on [GitHub](https://github.com/encord-team/encord-active)! 🌟
+- **Improve documentation**: Good documentation is imperative to the success of any project. You can help by improving the quality of our documents or by adding new ones.
+- **Give feedback**: We are always looking for ways to make Encord Active better. Please share how you use Encord Active, what features are missing, and what works well via [Slack][slack-join].
+- **Share Encord Active**: Help us reach more people. Share the [Encord Active repository](https://github.com/encord-team/encord-active) with everyone who might be interested.
+- **Contribute to codebase**: Your help is needed to make this project the best it can be! You could develop new features or fix [existing issues](https://github.com/encord-team/encord-active/issues) - all contributions are welcome!
+
+## Environment setup
+
+Make sure you have `python3.9` installed on your system.
+
+To install the correct version of Python you can use [pyenv](https://github.com/pyenv/pyenv), [brew (Mac only)](https://formulae.brew.sh/formula/python@3.9) or simply [download](https://www.python.org/downloads) it.
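If you want to double-check which interpreter is active before installing anything, a quick plain-Python check like the one below can help. This snippet is just a convenience sketch, not part of the project:

```python
import sys

# Encord Active development targets Python 3.9; print a warning if the
# interpreter running this snippet is a different version.
major, minor = sys.version_info[:2]
if (major, minor) == (3, 9):
    print("OK: running Python 3.9")
else:
    print(f"Warning: running Python {major}.{minor}, but 3.9 is expected")
```

Poetry also checks the `python` constraint in `pyproject.toml`, so a mismatch would surface during `poetry install` as well; checking up front just saves a round trip.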
+
+You'll also need to have [poetry installed](https://python-poetry.org/docs/#installation).
+
+After forking and cloning the repository, run:
+
+```shell
+poetry install
+
+# If you intend to work on coco related things, run this instead:
+poetry install --extras "coco"
+```
+
+> ℹ️ Note
+> You might need to install `xcode-select` if you are on Mac or `C++ build tools` if you are on Windows.
+
+After the installation is done, you can activate the created virtual environment with:
+
+```shell
+poetry shell
+```
+
+Now you should be able to run your locally installed `encord-active`.
+
+> ℹ️ Note
+> Make sure you are always running `encord-active` from the activated virtual environment to avoid conflicts with a globally installed version.
+
+
+### Running the frontend
+
+> ℹ️ Note
+> Running the frontend locally is only required if you intend to work on our React frontend components.
+
+Our frontend is built with [React](https://reactjs.org/). To start it in development mode, run:
+
+```shell
+cd "frontend_components/encord_active_components/frontend" && npm i && npm start
+```
+
+In order to point `encord-active` to your locally running frontend, you'll need to change the `FRONTEND` environment variable in the `.env` file. Make sure you point it to the correct port; by default the frontend runs on `http://localhost:5173/`.
+
+## Commit convention
+
+Commit messages are essential to make changes clear and concise. We use [conventional commits](https://www.conventionalcommits.org/en/v1.0.0/) to keep our commit messages consistent and easy to understand.
+
+```
+type(optional scope): description
+```
+
+Examples:
+
+- `feat: allow provided config object to extend other configs`
+- `fix: array parsing issue when multiple spaces were contained in string`
+- `docs: correct spelling of CHANGELOG`
+
+## Contribution guide
+
+Follow the steps below to contribute to the main repository via a pull request. 
You can learn about the details of pull requests [here](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests).
+
+### 1. Fork the official repository
+
+If you are using Git, you can visit the [Encord Active repository](https://github.com/encord-team/encord-active) and find the Fork button at the top right corner of the web page, along with other buttons such as Watch and Star (highly appreciated if you click this one as well 🌟). Simply click the Fork button to create a copy of the repository under your own account.
+
+Now, you can clone your own forked repository into your local environment.
+
+```shell
+git clone https://github.com/<username>/encord-active.git
+```
+
+Otherwise, if you have the GitHub CLI installed, the following command will create a fork. If you haven't, consider [installing it](https://github.com/cli/cli#installation).
+
+```shell
+gh repo fork encord-team/encord-active
+```
+
+### 2. Configure Git
+
+You need to set the official repository as your upstream so that you can synchronize with the latest updates in the official repository. You can learn about syncing forks [here](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/configuring-a-remote-repository-for-a-fork).
+
+With Git, it's as simple as running the following commands:
+
+```shell
+cd encord-active
+git remote add upstream https://github.com/encord-team/encord-active.git
+```
+
+> If you use the GitHub CLI, this step is done automatically 🪄
+
+You can use the following command to verify that the remote is set. You should see both `origin` and `upstream` in the output.
+
+```shell
+git remote -v
+> origin https://github.com/<username>/encord-active.git (fetch)
+> origin https://github.com/<username>/encord-active.git (push)
+> upstream https://github.com/encord-team/encord-active.git (fetch)
+> upstream https://github.com/encord-team/encord-active.git (push)
+```
+
+### 3. 
Synchronize
+
+Before you make changes to the codebase, it is always good to fetch the latest updates from the official repository. To do so, use the commands below.
+
+#### Git
+
+```shell
+git fetch upstream
+git checkout main
+git merge upstream/main
+git push origin main
+```
+
+Otherwise, you can click the `fetch upstream` button on the GitHub webpage of the main branch of your forked repository. Then, use these commands to sync your local clone:
+
+```shell
+git checkout main
+git pull origin main
+```
+
+#### GitHub CLI
+
+To sync your remote fork:
+
+```shell
+gh repo sync <username>/encord-active
+```
+
+And then to sync your local clone:
+
+```shell
+gh repo sync
+```
+
+
+### 4. Pull request issue
+
+In order to not waste your time implementing a change that has already been declined, or is generally not needed, start by opening an [issue](https://github.com/encord-team/encord-active/issues) describing the problem you would like to solve. Make sure you use an appropriate title and description, and be as descriptive as possible.
+
+Generally, your code change should target only one problem, to keep the review process as simple as possible.
+
+### 5. Make changes
+
+You should not make changes to the `main` branch of your forked repository, as this might make upstream synchronization difficult. Instead, create a new branch with an appropriate name. Generally, branch names should start with a conventional commit type, e.g. `fix/`, `docs/`, or `feat/`, followed by the scope.
+
+```shell
+git checkout -b <branch-name>
+```
+
+It is finally time to implement your change!
+
+You can commit and push the changes to your local repository. The changes should be kept logical, modular and atomic.
+
+```shell
+git add -A
+git commit -m "<type>: <subject>"
+git push -u origin <branch-name>
+```
+
+> 👍 Tip
+> If you are making changes to the frontend, you can run `encord-active config set dev true` to enable file watchers. This will make the UI detect code changes and offer to (auto) refresh the page.
+
+
+### 6. 
Open a pull request
+
+You can now create a pull request on the GitHub webpage of your repository. The source branch is `<branch-name>` of your repository and the target branch should be `main` of `encord-team/encord-active`. After creating this pull request, you should be able to see it [here](https://github.com/encord-team/encord-active/pulls).
+
+If you are using the GitHub CLI you can run:
+
+```shell
+gh pr create --web
+```
+
+Fill out the title and body, aiming to be as clear as possible. And again, make sure your title follows the conventional commit guidelines.
+
+Do write a clear description of your pull request and [link the pull request to your target issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue). This will automatically close the issue when the pull request is merged.
+
+In case of merge conflicts, you should rebase your branch and resolve the conflicts manually.
+
+
+[slack-join]: https://join.slack.com/t/encordactive/shared_invite/zt-1hc2vqur9-Fzj1EEAHoqu91sZ0CX0A7Q
diff --git a/docs/active-get-started/active-oss-launch.md b/docs/active-get-started/active-oss-launch.md
new file mode 100644
index 000000000..6ea78783d
--- /dev/null
+++ b/docs/active-get-started/active-oss-launch.md
@@ -0,0 +1,30 @@
+---
+title: "Launch Active OS"
+slug: "active-oss-launch"
+hidden: false
+metadata:
+ title: "Launch Active OS"
+ description: "Launch the Encord Active Open Source app."
+category: "65a71bbfea7a3f005192d1a7"
+---
+
+[block:html]
+{
+ "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + + +## Launch the Active OS + +To launch the Encord Active OS app, run the following command: + +```shell +cd /path/to/project +encord-active start +``` + +Now, your browser should open a new window with Encord Active OS. + +> 🚧 Caution +> If the terminal just seems to get stuck and nothing happens in your browser, try visiting http://localhost:8000. \ No newline at end of file diff --git a/docs/active-get-started/active-oss-quickstart-import.md b/docs/active-get-started/active-oss-quickstart-import.md new file mode 100644 index 000000000..0a9caa337 --- /dev/null +++ b/docs/active-get-started/active-oss-quickstart-import.md @@ -0,0 +1,73 @@ +--- +title: "Import to Active OS" +slug: "active-oss-quickstart-import" +hidden: false +metadata: + title: "Import data and projects to Active OS" + description: "A quick overview of importing data and projects to Active OS." +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n"
}
[/block]
+
+## Import Your Own Data
+
+To import your own data, save it in a directory and run the following command:
+
+```shell
+encord-active init /path/to/data/directory
+```
+
+A project will be created using the data (without labels) in the current working directory (unless you use the `--target` option).
+
+To launch the project in the Encord Active app, run the following command:
+
+```shell
+cd /path/to/project
+encord-active start
+```
+
+You can find more details on the `init` command in the [CLI section](https://docs.encord.com/docs/active-cli#init).
+
+## Import an Encord Project
+
+If you are an Encord user, you can easily [import](https://docs.encord.com/docs/active-cli#project) your own projects directly into Encord Active.
+
+```shell
+encord-active import project
+```
+
+This will import your Encord project into a new directory in your current working directory. If you don't have an Encord project ready, you can:
+
+1. [Initialise a project from a local data directory](https://docs.encord.com/docs/active-cli#init)
+2. [Import a project from COCO](https://docs.encord.com/docs/active-import-coco-project)
+3. [Download one of our sandbox datasets](https://docs.encord.com/docs/active-cli#download)
+
+> ℹ️ Note
+> If you are new to the Encord platform, you can easily create an Encord account by [signing up](https://app.encord.com/register).
+
+
+To import an Encord project, you will need the path to the private SSH key associated with your Encord user. See our documentation [here](https://docs.encord.com/docs/annotate-public-keys).
+
+The command will ask you:
+
+1. `Where is your private ssh key stored?`: type the path to your private ssh key
+2. `What project would you like to import?`: here, you can (fuzzy) search for the project title that you would like to import. Hit Enter when your desired project is highlighted. 
+
+Next, `encord-active` will fetch your data and labels before computing all the [metrics](https://docs.encord.com/docs/active-quality-metrics) available in `encord-active`. Downloading the data and computing the metrics may take a while. Bear with us - it'll be worth the wait.
+
+When the process is done, follow the printed instructions to launch the app with the [start][ea-cli-start] CLI command.
+
+## Import Examples
+
+[block:html]
+{
+ "html": "\n\n\n \n \n Clickable Div\n \n\n\n Quick import data & labels Import model predictions Encord project COCO project\n\n"
}
[/block]
+
+[ea-cli-start]: https://docs.encord.com/docs/active-cli#start
diff --git a/docs/active-get-started/active-oss-quickstart.md b/docs/active-get-started/active-oss-quickstart.md
new file mode 100644
index 000000000..b8e789bbb
--- /dev/null
+++ b/docs/active-get-started/active-oss-quickstart.md
@@ -0,0 +1,69 @@
+---
+title: "Quickstart with Active OS"
+slug: "active-oss-quickstart"
+hidden: false
+metadata:
+ title: "Getting started with Encord Active"
+ description: "Get started with Encord Active. Explore via example project. Effortless learning."
+category: "65a71bbfea7a3f005192d1a7"
+---
+
+[block:html]
+{
+ "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +Understand Encord Active Open Source in **5 minutes** by playing! + +The following command downloads a small example project to your current working directory and opens the application straight away. This is the fastest way to explore Encord Active OS. + +```shell +encord-active quickstart +``` + +This command must be run in the same virtual environment where you installed the package. + +The next section explains how to download larger and more interesting datasets for exploration. + +## Sandbox Dataset + +If you have more time, we have a few pre-built sandbox datasets with data, labels, and model predictions for you to start exploring Encord Active. + +To get started quickly with a sandbox dataset, you can run the following command: + +```shell +encord-active download +``` + +This will allow you to choose a dataset to download. When the download process is complete, you can visualize the results by following the printed instructions. + +> 👍 Tip +> You can follow the [COCO sandbox dataset tutorial](https://docs.encord.com/docs/active-touring-coco-dataset) to learn more about the features of Encord Active. 
+ + +### Run Encord Active on Google Colab + +If you want to explore Encord Active without installing anything on your local machine, you can use the following Google Colab notebooks: + +[block:embed] +{ + "url": "https://colab.research.google.com/drive/1RujTUxcxpB9bGJHp_UtCdSSQn7oef2ci?usp=sharing", + "title": "Explore Encord Active sandbox dataset", + "favicon": "https://cdn.simpleicons.org/googlecolab/#F9AB00", + "image": "https://cdn.simpleicons.org/googlecolab/#F9AB00", + "provider": "colab.research.google.com", + "href": "https://colab.research.google.com/drive/1RujTUxcxpB9bGJHp_UtCdSSQn7oef2ci?usp=sharing" +} +[/block] + +[block:embed] +{ + "url": "https://colab.research.google.com/drive/1P4C-JAml8yh8aUa_rvNPI_hPBvBgZ1FD?usp=sharing", + "title": "Explore Encord Active through your own Encord projects", + "favicon": "https://cdn.simpleicons.org/googlecolab/#F9AB00", + "image": "https://cdn.simpleicons.org/googlecolab/#F9AB00", + "provider": "colab.research.google.com", + "href": "https://colab.research.google.com/drive/1P4C-JAml8yh8aUa_rvNPI_hPBvBgZ1FD?usp=sharing" +} +[/block] \ No newline at end of file diff --git a/docs/active-get-started/active-oss-whats-next.md b/docs/active-get-started/active-oss-whats-next.md new file mode 100644 index 000000000..f10ef4288 --- /dev/null +++ b/docs/active-get-started/active-oss-whats-next.md @@ -0,0 +1,33 @@ +--- +title: "What's Next?" +slug: "active-oss-whats-next" +hidden: false +metadata: + title: "Active OS after getting started" + description: "What to do after you have Active OS up and running." +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +We recommend taking a look at any of the [tutorials](https://docs.encord.com/docs/video-tutorials) that demonstrate Encord Active's capabilities and the [workflows](https://docs.encord.com/docs/annotate-workflows-and-templates) section to learn about improving your model performance. A couple of example references are: + +1. [Import your model predictions](https://docs.encord.com/docs/active-import-model-predictions) +2. Find outliers in your [data](https://docs.encord.com/docs/active-identify-outliers#data-outliers) or your [labels](https://docs.encord.com/docs/active-identify-outliers#label-outliers) +3. [Identify metrics](https://docs.encord.com/docs/active-evaluate-detection-models) that are important for your model performance + +You can also have a look at how to [write custom metrics](https://docs.encord.com/docs/active-write-custom-quality-metrics) and how to use the [command line interface](https://docs.encord.com/docs/active-cli). + +### Need Support? + +Please don't hesitate to contact us if you have any questions via our dedicated [Slack workspace][slack-join] or email at [active@encord.com](mailto:active@encord.com). + +If you encounter any errors, we would love to hear from you so we can address them promptly. We receive immediate notifications when issues are submitted through Encord Active's [GitHub](https://github.com/encord-team/encord-active/issues) repository. Also, feel free to reach out to us via [Slack][slack-join] or email at [active@encord.com](mailto:active@encord.com) for further assistance. We appreciate your feedback and assistance in improving Encord Active. 
+
+
+[ea-cli-start]: https://docs.encord.com/docs/active-cli#start
+[slack-join]: https://join.slack.com/t/encordactive/shared_invite/zt-1hc2vqur9-Fzj1EEAHoqu91sZ0CX0A7Q
diff --git a/docs/active-how-to/active-compare-model-performance.md b/docs/active-how-to/active-compare-model-performance.md
new file mode 100644
index 000000000..cf51db8e0
--- /dev/null
+++ b/docs/active-how-to/active-compare-model-performance.md
@@ -0,0 +1,109 @@
+---
+title: "Compare Model Performance"
+slug: "active-compare-model-performance"
+hidden: false
+metadata:
+ title: "Compare Model Prediction Performance"
+ description: "Learn how to compare the predictive performance of your model."
+category: "6480a3981ed49107a7c6be36"
+---
+
+You have trained your model and now you are ready to see how it performs. It is time to perform a cycle of the Active model optimization workflow.
+
+![Encord Active workflow](https://storage.googleapis.com/docs-media.encord.com/static/img/active/active-workflow-model-optimization.png)
+
+Now you want to compare your model's performance before using Encord (or maybe after running a number of data curation and label validation cycles). Active supports direct comparison of model prediction performance from within your Active Project.
+
+
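Conceptually, the comparison view contrasts the summary metrics of the selected Prediction Set with those of the set chosen under **Compare against**. The sketch below illustrates the idea with made-up numbers; none of this is Encord API code, and the metric values are purely hypothetical:

```python
# Two hypothetical prediction sets with illustrative summary scores.
baseline = {"mAP": 0.41, "mAR": 0.48}
candidate = {"mAP": 0.47, "mAR": 0.52}

# The per-metric delta is what you are effectively reading off when you
# click through the comparison on the Model Evaluation page.
delta = {name: round(candidate[name] - baseline[name], 3) for name in baseline}
print(delta)
```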
+ +To compare your model's performance: + +This process assumes you have already imported your model's predictions in to Active at least twice. + +1. Log in to Encord. + The Encord Homepage appears. + +2. Create a **Workflow** Project in Annotate. + +3. Add your Active Admin as an `Admin` on your **Project and all Datasets used in the Project**. + +4. Click **Active**. + The Active landing page appears. + +5. Import your Annotate Project. + +6. Click an Active Project. + The Project opens on the _Explorer_. + +7. Click **Model Evaluation**. + The _Model Evaluation_ page appears with _Summary_ displaying. + +8. Select an entry from the dropdown under **Prediction Set** under _Overview_. + +9. Select an entry from the dropdown under **Compare against** under _Overview_. + +10. Click through the various entries on the left side of the Model Evaluation page to view the comparison. + +11. Add more data and start the data curation, label validation, and model optimization cycles until the model reaches a performance level that you require. + +
+ +
+
To compare your model's performance from scratch:
+
This process assumes you are just getting started with Encord. You have not trained your model yet. You are using Encord to prepare your data for annotation, annotating your data, labeling your data, validating your labels, fixing any label issues, then training your model.
+
+1. Log in to Encord.
+ The Encord Homepage appears.
+
+2. Create a **Workflow** Project in Annotate.
+
+3. Add your Active Admin as an `Admin` on your **Project and all Datasets used in the Project**.
+
+4. Click **Active**.
+ The Active landing page appears.
+
+5. Import your Annotate Project.
+
+6. Click an Active Project.
+ The Project opens on the _Explorer_.
+
+7. Click **Model Evaluation**.
+ The _Model Evaluation_ page appears.
+
+8. [Import a Prediction Set](https://docs.encord.com/docs/active-import-model-predictions-cloud).
+
+9. Perform data curation on your Project in Active.
+
+10. Send the Project to Annotate.
+
+11. Label and review your data in Annotate.
+
+12. Sync the Active Project with the updated Annotate Project.
+
+13. Perform label validation on your updated and synced Project.
+
+14. Send the Project to Annotate.
+
+15. Label and review your data in Annotate.
+
+16. Retrain your model using the curated and validated data/labels.
+
+17. Click the Active Project.
+ The Project opens on the _Explorer_.
+
+18. Click **Model Evaluation**.
+ The _Model Evaluation_ page appears.
+
+19. [Import the updated Prediction Set](https://docs.encord.com/docs/active-import-model-predictions-cloud).
+
+20. Select an entry from the dropdown under **Prediction Set** under _Overview_.
+
+21. Select an entry from the dropdown under **Compare against** under _Overview_.
+
+22. Click through the various entries on the left side of the Model Evaluation page to view the comparison.
+
+23. Add more data and start the data curation, label validation, and model optimization cycles until the model reaches a performance level that you require. 
+ +
diff --git a/docs/active-how-to/active-create-collections.md b/docs/active-how-to/active-create-collections.md new file mode 100644 index 000000000..57267f174 --- /dev/null +++ b/docs/active-how-to/active-create-collections.md @@ -0,0 +1,141 @@ +--- +title: "Create a Collection" +slug: "active-create-collections" +hidden: false +metadata: + title: "Create a Collection" + description: "Learn how to create Collections in Encord Active Cloud to enhance data organization, search, and collaboration." +category: "6480a3981ed49107a7c6be36" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +[block:html] +{ + "html": "
" +} +[/block] + +Collections are created by tagging/labeling images and then building groups (Collections) based on the tagged images. Tagging is a versatile feature used in almost all Encord Active workflows, whether you are relabeling, augmenting, exporting, or deleting data. + +In Encord Active, creating Collections provides several advantages: + +- Organization: Allows you to organize your data effectively within the platform. By assigning Collection tags to your data points, you can group and categorize data based on common characteristics, making it easier to manage and navigate large subsets of the dataset. + +- Enhanced search and filtering: Collections in Encord Active enables powerful search and filtering capabilities. You can search for specific data points or filter data based on tags, narrowing down your focus to the relevant information you need. + +- Customizable metadata: Collection tags serve as customizable metadata that can provide additional context and information about your data. You can define and assign tags that align with your specific project requirements, providing meaningful insights and annotations for efficient data analysis. + +- Collaboration and knowledge sharing: Tagging promotes collaboration and knowledge sharing among team members in Encord Active. With consistent tagging conventions, team members can easily understand and access tagged data, facilitating efficient collaboration and ensuring everyone is on the same page. + +These are just a few of the advantages of tagging in Encord Active, and there may be more benefits specific to your project and workflow. + +To give you a better idea about how Active and Annotate work together, here are a couple of use cases. + +
+
To create a Collection from an Annotate Project:
+
+1. Log in to the Encord platform.
+ The landing page for the Encord platform appears.
+
+2. Create a Project ([Annotation Project](https://docs.encord.com/docs/annotate-annotation-projects) or [Training Project](https://docs.encord.com/docs/annotate-training-projects)) in Encord Annotate.
+
+3. Click **Active** from the main menu.
+ The landing page for Active appears.
+
+4. Click the **Import Annotate Project** button.
+ The _Select an Annotation Project_ dialog appears.
+
+5. Click the **Import** button for the Annotate project you want to import.
+ The _Confirm Project Import_ dialog appears. The dialog provides an estimate of how long the import may take.
+
+6. Specify the **Sample rate** (in FPS) for the import of videos.
+
+ > ℹ️ Note
+ > Selecting **None** imports the entire video, without modifying the video's FPS.
+
+7. Click **Confirm**.
+
+8. Close the _Select an Annotation Project_ dialog.
+ The landing page for Active appears with the progress of the project import.
+
+9. Click the Project.
+ The landing page for the Project appears with the _Explorer_ tab selected.
+
+10. Search, sort, and filter your data/labels/predictions until you have the subset of the data you need.
+
+11. Select one or more of the images in the Explorer workspace.
+ A ribbon appears at the top of the Explorer workspace.
+
+12. Click **Select all** to select all the images in the subset.
+
+13. Click **Add to a Collection**.
+
+14. Click **New Collection**.
+
+15. Specify a meaningful title and description for the Collection.
+
+ > ℹ️ Note
+ > The title specified here is applied as a tag/label to every selected image.
+
+16. Click **Collections** to verify the Collection appears in the Collections list.
+
+
+ +
+ +To create a Collection from data uploaded to Active: + +1. Contact Encord to get started with Encord Active. + +2. Log in to the Encord platform. + The landing page for the Encord platform appears. + +3. Click **Active** in the main menu. + The landing page for Active appears. + +4. Click the Project. + The landing page for the Project appears with the _Explorer_ tab selected. + +5. Search, sort, and filter your data/labels/predictions until you have the subset of the data you need. + +6. Select one or more of the images. + A ribbon appears at the top of the Explorer workspace. + +7. Click **Select all** to select all the images in the subset. + +8. Click **Add to a Collection**. + +9. Click **New Collection**. + +10. Specify a meaningful title and description for the Collection. + + > ℹ️ Note + > The title specified here is applied as a tag/label to every selected image. + +11. Click **Collections** to verify the Collection appears in the Collections list. + +
+
+## Next Steps
+
+### Data Cleansing/Curation and Label Correction/Validation
+
+[block:html]
+{
+ "html": "\n\n\n \n \n Clickable Div\n \n\n\n 1. Import from Annotate 3. Send to Annotate 4. Sync with Annotate 5. Update Collection\n\n"
}
[/block]
+
+### Model and Prediction Validation
+
+[block:html]
+{
+ "html": "\n\n\n \n \n Clickable Div\n \n\n\n 1. Import from Annotate 2. Import Predictions 3. Review Prediction Metrics 5. Send to Annotate 6. Sync with Annotate 7. Update Collection\n\n"
}
[/block]
\ No newline at end of file
diff --git a/docs/active-how-to/active-evaluate-classification-models.md b/docs/active-how-to/active-evaluate-classification-models.md
new file mode 100644
index 000000000..5dc5f48f6
--- /dev/null
+++ b/docs/active-how-to/active-evaluate-classification-models.md
@@ -0,0 +1,58 @@
+---
+title: "Evaluating classification models"
+slug: "active-evaluate-classification-models"
+hidden: true
+metadata:
+ title: "Evaluating classification models"
+ description: "Enhance model assessment: Encord Active for classification metrics. Accuracy, Precision, Recall, F1 scores & more. Optimize with insights."
+ image:
+ 0: "https://files.readme.io/ba490b4-image_16.png"
+createdAt: "2023-07-11T16:27:42.137Z"
+updatedAt: "2023-08-09T12:38:12.905Z"
+category: "6480a3981ed49107a7c6be36"
+---
+
+Encord Active provides the capability to examine classification performance metrics including Accuracy, Precision, Recall, and F1 scores, along with a confusion matrix. Furthermore, these performance metrics can be assessed based on various class combinations.
+
+To follow this workflow, importing model predictions into an Encord Active project is a prerequisite. You can refer to the instructions on [importing model predictions](https://docs.encord.com/docs/active-import-model-predictions) in the documentation.
+
+## Steps
+
+1. Navigate to the _Model Quality_ > _Metrics_ tab on the left sidebar.
+2. 
Under the **Classifications** tab, you will see the main performance metrics (accuracy, mean precision, mean recall, and mean F1 scores), metric importance graphs, the confusion matrix, and a class-based precision and recall plot.
+3. You can filter by classes in the upper bar to see plots for your classes of interest.
+4. Via the confusion matrix, you can detect which classes are confused with each other (uni-directionally or bi-directionally).
+5. On the **Precision-Recall** plot, you can observe which classes the model struggles with and which classes it does well on.
+6. Based on the insights you get here, you can, for example, prioritize the classes for which you need to collect more data.
+
+## Example
+
+The following is the model performance result for a Pokemon dataset (classes: _Charmander, Mewtwo, Pikachu, Squirtle_).
+
+![metric](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/workflows/evaluate-classification-model/img_1.png)
+
+## Finding Important Metrics
+
+The three important metrics for this dataset are _Image-level Annotation Quality, Red Values_, and _Uniqueness_.
+When we look at their correlation, we see that as the Image-level Annotation Quality increases, model performance increases, too. On the other hand, Red Values and Uniqueness have a negative correlation with model performance.
+
+When we look at the confusion matrix, we find that most of the predictions are correct. Meanwhile, we can easily observe that a significant part of the _Charmander_ images was predicted as _Pikachu_, resulting in low recall for the _Charmander_ class and low precision for the _Pikachu_ class. So there might be value in investigating these wrongly labeled Charmander samples.
+
+## Performance by Metric
+
+Now, choose _Performance By Metric_ on the left sidebar. On this page, you can observe the **True-Positive Rate** as a function of the chosen metric. 
You can detect in which regions the model performs poorly or well, so that you can prioritize your next data collection and labeling work accordingly. Classes can be filtered via the global top bar for class-specific visualization. From the image below, it can be seen that performance decreases when the image's redness property increases. So, we can find images similar to the ones where the model is failing, and annotate more of them to boost the performance in this region.
+
+![performance_by_metric](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/workflows/evaluate-classification-model/img_2.png)
+
+## Exploring the Individual Samples
+
+Using the explorer page, you can visualize the ranked images for specific outcomes (True Positives, False Positives).
+
+![explorer](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/workflows/evaluate-classification-model/img_3.png)
+
+This page is very similar to the other explorer pages under the _Data Quality_ and _Label Quality_ tabs; however, since you have the prediction results now, the images can be filtered according to their outcome type. When the **True Positive** outcome is selected, only the images that are predicted correctly will be shown; likewise, when the **False Positive** outcome is selected, only the wrongly predicted images will be shown.
+
+By inspecting False-Positive images, you can detect:
+
+- Where your model is failing.
+- Possible duplicate errors. 
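The headline numbers on the _Metrics_ tab can be reproduced from raw prediction lists, which is handy for sanity-checking an import. A minimal sketch in plain Python (the class names reuse the Pokemon example above; this is illustrative, not Encord's implementation):

```python
from collections import Counter

def confusion_and_scores(y_true, y_pred):
    """Confusion counts plus per-class (precision, recall) from label lists."""
    classes = sorted(set(y_true) | set(y_pred))
    # matrix[(t, p)] counts samples whose true class is t and predicted class is p
    matrix = Counter(zip(y_true, y_pred))
    scores = {}
    for c in classes:
        tp = matrix[(c, c)]
        fp = sum(matrix[(t, c)] for t in classes if t != c)
        fn = sum(matrix[(c, p)] for p in classes if p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (precision, recall)
    return matrix, scores

# Toy data mirroring the Charmander-vs-Pikachu confusion discussed above
y_true = ["Charmander", "Charmander", "Pikachu", "Squirtle"]
y_pred = ["Pikachu", "Charmander", "Pikachu", "Squirtle"]
matrix, scores = confusion_and_scores(y_true, y_pred)
print(scores["Charmander"])  # (1.0, 0.5): perfect precision, half the Charmanders missed
```

A low recall for one class paired with a low precision for another is exactly the uni-directional confusion pattern the confusion matrix surfaces.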
diff --git a/docs/active-how-to/active-evaluate-detection-models.md b/docs/active-how-to/active-evaluate-detection-models.md
new file mode 100644
index 000000000..f818f75dd
--- /dev/null
+++ b/docs/active-how-to/active-evaluate-detection-models.md
@@ -0,0 +1,107 @@
+---
+title: "Evaluating detection models"
+slug: "active-evaluate-detection-models"
+hidden: true
+metadata: 
+  title: "Evaluating detection models"
+  description: "Visualize model performance in Encord Active: mAP, IoU thresholds, object detection, and segmentation. Optimize with insights."
+  image: 
+    0: "https://files.readme.io/f0af951-image_16.png"
+createdAt: "2023-07-11T16:27:42.175Z"
+updatedAt: "2023-08-09T12:39:52.017Z"
+category: "6480a3981ed49107a7c6be36"
+---
+
+Encord Active enables you to visualize the important performance metrics, such as mean Average Precision (mAP), for your model. Performance metrics can be visualized based on different classes and Intersection-over-Union (IoU) thresholds. Performance metrics are supported for bounding boxes (object detection) and polygons (segmentation). For this workflow, you need to [import your model predictions](https://docs.encord.com/docs/active-import-model-predictions) into Encord Active.
+
+`Prerequisites:` Dataset, Labels, Object (bounding-box or polygon) Predictions.
+
+#### Steps
+
+1. Navigate to the _Model Quality_ > _Metrics_ tab on the left sidebar.
+2. Under the **Subset selection scores**, you will see the average precision (AP) and average recall (AR) for each class in the graph to the left and Precision-Recall curves for each class on the graph to the right.
+3. You can select classes of interest and change the IoU threshold in the upper sidebar to customize plots.
+4. On the **Mean scores** plot, you can observe in which classes the model has difficulty and in which classes it does well.
+5. According to insights you get here, you can, e.g., prioritize from which classes you need to collect more data.
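For intuition, the IoU score referenced by these thresholds can be sketched in a few lines for axis-aligned boxes given as `(x1, y1, x2, y2)` corners (an illustrative implementation, not the one Encord uses internally):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width/height of the overlap rectangle; clamp at zero when boxes are disjoint
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7: small overlap, low IoU
```

Raising the IoU threshold in the top bar makes matching stricter, so borderline predictions flip from true positives to false positives.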
+
+#### Example
+
+Comparing **person** and **clock** objects.
+
+![clock_vs_person_performance](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/clock_vs_person_performance.png)
+
+The above figure shows that the **clock** class degrades overall performance considerably. So, when collecting and labeling more data, prioritizing it over the **person** class will make more sense for overall performance.
+
+## Finding Important Metrics
+
+**Visualize the relationship between your model performance and metrics**
+
+With this workflow, you will be able to identify the most important [Quality Metrics](https://docs.encord.com/docs/active-quality-metrics) for your model performance and prioritize further data exploration and actions.
+
+`Prerequisites:` Dataset, Labels, Predictions
+
+#### Steps
+
+1. Navigate to the _Model Quality_ > _Metrics_ tab.
+2. Select label classes to include in the top left drop-down menu.
+3. Determine the IoU threshold using the slider in the top bar. By default, the IoU threshold is set to 0.50.
+4. Next, Encord Active automatically computes mAP, mAR, Metric Importance, and Metric Correlation.
+
+ **Metric importance**: Measures the _strength_ of the dependency between a metric and model performance. A high value means that the model performance would be strongly affected by a change in the metric. For example, high importance in 'Brightness' implies that a change in that quantity would strongly affect model performance. Values range from 0 (no dependency) to 1 (perfect dependency, one can completely predict model performance simply by looking at this metric).
+
+ **Metric [correlation](https://en.wikipedia.org/wiki/Correlation)**: Measures the _linearity and direction_ of the dependency between a metric and model performance. Crucially, this metric tells us whether a positive change in a metric will lead to a positive change (positive correlation) or a negative change (negative correlation) in model performance. 
Values range from -1 to 1.
+
+5. Metrics denoted with (P) are _Prediction-level metrics_ and metrics with (F) are _Frame-level metrics_.
+6. Once an important metric is identified, navigate to _Performance By Metric_ in the _Model Quality_ tab.
+7. Select the important metric you want to understand using the drop-down menu on the top bar.
+8. By default, the performance chart is shown in aggregate for all classes; optionally, you can choose to decompose performance by class or select individual classes to be shown in the top left drop-down menu.
+9. The plot shows the _Precision_ and the _False Negative Rate_ (FNR) by metric to help you identify which metric characteristics your model has a hard time predicting.
+
+## Performance by Metric
+
+![metric_importance](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/index_importance.png)
+Metric importance plots indicate that _Object Area - Relative (P)_ is a metric with a strong relationship to model performance.
+
+In this case, go to the **Performance By Metric** page and choose "_Object Area - Relative (P)_" in the **Select metric for your predictions** drop-down menu. Here, you can understand why _Object Area - Relative (P)_ has a relationship with the model performance, and you can act on the insights you gain here. Let's examine the _Object Area - Relative (P)_ metric:
+
+![metric_importance](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/object_area_relative_performance.png)
+
+As indicated in the details, this metric refers to the object area as a percentage of the total image area. The blue dashed horizontal lines (around 0.17 Precision and 0.77 FNR) mark the average precision and false negative rate of the selected classes, respectively. So, what we get from the above graph is that objects whose area is less than 0.24% of the image have very low performance. In other words, the model predictions that are small are very often incorrect. 
Similarly, labeled objects with a small area have a high false negative rate.
+
+Based on this insight, you may improve your model with several actions, such as:
+
+- Filtering model predictions to not include the smallest objects.
+- Increasing your model's input resolution.
+- Increasing the confidence threshold for small objects.
+
+## Exploring the Individual Samples
+
+Using the explorer page, you can visualize the ranked images for specific outcomes (True Positives, False Positives, False Negatives).
+
+![metric_importance](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/workflows/evaluate-detection-model/img_1.png)
+
+#### Identifying False Positives
+
+By selecting the false positive outcome in the top bar, you can quickly identify in which areas your model fails. With this functionality, you can:
+
+- Detect missing ground-truth labels.
+- Diagnose annotation errors.
+- Learn which classes your model confuses.
+
+and more, depending on your use cases.
+
+1. Navigate to the _Model Quality_ > _Explorer_ tab on the left sidebar and choose **False Positive** as the Outcome in
+ the top bar.
+2. Visualize predictions and try to get insights on where the model fails.
+3. Under each image, an explanation is given for why the prediction is a false positive. The three reasons are:
+ - No overlap with the ground-truth object. This means that there is no label with the same class as the predicted class which overlaps with the prediction.
+ - IoU is too low. This means that the prediction does overlap with a label of the same class. However, the IoU between the prediction and the label is lower than the IoU threshold which is selected in the top bar.
+ - A prediction with higher confidence is already matched with the ground-truth object. 
Since the mAP score chooses the prediction with the highest model confidence that has an IoU larger than the set threshold, other predictions that matched the label with a sufficiently high IoU will be considered false positives.
+4. Note that the boxed magenta object is the prediction, while the remaining objects are labels for the same image/frame.
+
+#### Identifying False Negatives
+
+Using the false negatives tab in Encord Active, you can quickly find out which objects the model misses.
+
+1. Choose **False Negative** as the Outcome in the top bar.
+2. Observe the ground-truth objects (purple boxed objects) that are missed by the model and get insights on where the model fails. The remaining objects in the image are the model predictions for that image.
diff --git a/docs/active-how-to/active-exploring-correlations.md b/docs/active-how-to/active-exploring-correlations.md
new file mode 100644
index 000000000..96eb1d785
--- /dev/null
+++ b/docs/active-how-to/active-exploring-correlations.md
@@ -0,0 +1,15 @@
+---
+title: "Exploring correlations"
+slug: "active-exploring-correlations"
+hidden: true
+metadata: 
+  title: "Exploring correlations"
+  description: "Discovering Data Correlations: Explore correlations for insights. Uncover relationships in your data."
+  image: 
+    0: "https://files.readme.io/6c9b7bb-image_16.png"
+createdAt: "2023-07-11T16:27:42.582Z"
+updatedAt: "2023-08-09T12:41:22.811Z"
+category: "6480a3981ed49107a7c6be36"
+---
+
+**Feature in Beta. 
Workflow description coming soon.** diff --git a/docs/active-how-to/active-exploring-data-and-label-distributions.md b/docs/active-how-to/active-exploring-data-and-label-distributions.md new file mode 100644 index 000000000..2964cdd7b --- /dev/null +++ b/docs/active-how-to/active-exploring-data-and-label-distributions.md @@ -0,0 +1,71 @@ +--- +title: "Explore data and label distributions" +slug: "active-exploring-data-and-label-distributions" +hidden: false +metadata: + title: "Explore data and label distributions" + description: "Visualize & understand distributions with Encord Active. Optimize models by uncovering missing data and label insights." + image: + 0: "https://files.readme.io/8aedbf8-image_16.png" +createdAt: "2023-07-11T16:27:42.230Z" +updatedAt: "2023-08-09T12:43:09.288Z" +category: "6480a3981ed49107a7c6be36" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n"
+}
+[/block]
+
+Encord Active provides the capability to visually explore data and label distributions using pre-defined metrics and custom metrics.
+
+Gaining insights into data distribution across diverse quality metrics allows for the identification of potential data gaps that could influence model performance on outliers or edge cases.
+
+In a Project, access the _Analytics_ page to view outlier summaries and metric distribution charts, and to explore the _2D Metrics View_ for data and labels.
+
+## Outliers
+
+In the Outliers section, Active provides a quick summary of data or label outliers that you might want to investigate.
+
+**Data**
+
+
+
+**Annotations**
+
+
+
+## 2D metrics
+
+
+In the _2D Metrics View_, one metric's values are plotted on the x-axis, while the values of the other metric are represented on the y-axis. This visualization allows for an examination of the relationship between these two metrics and their potential interactions within the data and labels.
+
+**Data**
+
+
+
+**Annotations**
+
+
+
+## Metric Distribution
+
+Data and label metric or property distributions can be visualized by using the _Metric or Property_ drop-down menu within the _Metric Distribution_ section of the _Analytics_ page.
+
+**Data**
+
+
+
+**Annotations**
+
+
+
+
diff --git a/docs/active-how-to/active-exploring-embeddings.md b/docs/active-how-to/active-exploring-embeddings.md
new file mode 100644
index 000000000..e60c31b53
--- /dev/null
+++ b/docs/active-how-to/active-exploring-embeddings.md
@@ -0,0 +1,43 @@
+---
+title: "Explore embedding plots"
+slug: "active-exploring-embeddings"
+hidden: false
+metadata: 
+  title: "Exploring embedding plots"
+  description: "Improve your data curation and active learning workflows with the Embeddings view. Visualize clusters and gain deeper data understanding. Optimize workflows." 
+ image: + 0: "https://files.readme.io/566b786-image_16.png" +createdAt: "2023-07-11T16:27:42.145Z" +updatedAt: "2023-08-09T12:45:07.861Z" +category: "6480a3981ed49107a7c6be36" +--- +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +Encord Active incorporates embedding plots — a two-dimensional visualization technique employed to represent intricate, high-dimensional data in a more comprehensible and visually coherent manner. This technique reduces data dimensionality while preserving the inherent structure and patterns within the original data. + +The embedding plot aids in identifying interesting/noteworthy clusters, inspecting outliers, and excluding unwanted samples. Accessible on the **Project > Explorer** page, the embedding plot is adaptable to data or labels. + +![Vibrant 2D data embedding plot highlighting data patterns and clusters](https://storage.googleapis.com/docs-media.encord.com/static/img/active/workflows/explorer-data-embedding-plot_01.png) + +Notice how images are clustered around certain regions. By defining a rectangular area on the plot, users can quickly isolate and analyze data points within that defined region. This approach facilitates the exploration of commonalities among these samples. + +Upon selecting a region, the content within the _Explorer_ page will be adjusted accordingly. Various actions can be executed with the chosen group: +- Use [Collections](https://docs.encord.com/docs/active-collections) to tag and group images. +- Investigate the performance of the selected samples within the _Predictions_ page. +- Establish subsets similar to these and then conduct comparisons. + +Samples within the data embedding plot lack label information, resulting in uniform coloration across all points. Data points in the label embedding plot are color-coded based on their label classes. + +![Vibrant 2D label embedding plot highlighting label patterns and clusters](https://storage.googleapis.com/docs-media.encord.com/static/img/active/workflows/explorer-label-embedding-plot_01.png) + +> 👍 Tip +> The embedding plot is adaptable to data or labels. 
In addition to selecting points within a rectangular area, the label embedding plot offers the functionality to filter data points based on the label classes. + +With the label embedding plot, users can: +- Identify classes that are often confused with each other. +- Detect samples with incorrect labeling, such as instances of a different class embedded within a larger cluster of another class. +- Spot outliers and subsequently eliminate them from the dataset. diff --git a/docs/active-how-to/active-exploring-image-similarity.md b/docs/active-how-to/active-exploring-image-similarity.md new file mode 100644 index 000000000..a7efac7dd --- /dev/null +++ b/docs/active-how-to/active-exploring-image-similarity.md @@ -0,0 +1,76 @@ +--- +title: "Explore image similarity" +slug: "active-exploring-image-similarity" +hidden: false +metadata: + title: "Explore image similarity" + description: "Enhance data quality with visual similarity search in Encord Active. Detect edge cases, duplicates, and label quality. Streamline dataset management." + image: + 0: "https://files.readme.io/7d31a4f-image_16.png" +createdAt: "2023-07-11T16:27:42.192Z" +updatedAt: "2023-08-09T12:46:40.225Z" +category: "6480a3981ed49107a7c6be36" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n"
+}
+[/block]
+
+Frequently, when distinctive characteristics arise within a dataset, identifying similar images becomes crucial (e.g., for relabeling or removal). Detecting these instances assists in assessing the thoroughness of data representation and the accuracy of labels, particularly in situations where certain classes may be underrepresented or labels could be inaccurately assigned. As datasets expand, manual identification of such cases becomes progressively challenging.
+
+Leverage Encord Active's **similarity search** feature to effortlessly locate semantically similar images in your dataset. Upon identifying an edge case or duplicate, you can apply tags and take actions such as relabeling or deletion.
+
+## Quick Tour
+
+All of the sections in the Quick Tour assume that you are already in a Project.
+
+> 👍 Tip
+> Choose any image in the Explorer workspace and click its _Similar items_ ![Similarity button](https://storage.googleapis.com/docs-media.encord.com/static/img/active/workflows/similarity-button.png) button. This displays images similar to the selected one, including any duplicates if they exist.
+
+### Explorer
+
+The _Explorer_ page has three areas that can help you find duplicate images in your Project.
+
+
+
+1: Duplicates Shortcut
+
+Found in the _Overview_ tab, this shortcut highlights any images with a `Uniqueness` value between 0 and 0.0001 as duplicates. You can adjust this value from the _Filter_ tab.
+
+![Duplicates shortcut](https://storage.googleapis.com/docs-media.encord.com/static/img/active/workflows/image-duplicates-qt-01.png)
+
+
+ +
+
+2: Sorting by `Uniqueness`
+
+The entire Project can be sorted by `Uniqueness`. Sort in ascending order to display duplicates first.
+
+![Sorting by `Uniqueness`](https://storage.googleapis.com/docs-media.encord.com/static/img/active/workflows/image-duplicates-qt-02.png)
+
+
+ +
+
+3: Filtering by `Uniqueness`
+
+Filter the entire Project using `Uniqueness`.
+
+Go to the **Filter** tab > **Add Filter** > **Data Quality Metrics** > **Uniqueness**. A small histogram diagram appears above the filter.
+
+You can then change the filter settings to specify a range closer to 0.
+
+![Filtering by `Uniqueness`](https://storage.googleapis.com/docs-media.encord.com/static/img/active/workflows/image-duplicates-qt-03.png)
+
+
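Conceptually, a duplicate check like this boils down to finding pairs of items whose embeddings are nearly identical. The sketch below uses plain Python and made-up two-dimensional embeddings; it is not how Encord computes the `Uniqueness` metric, just an illustration of the idea:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def near_duplicates(embeddings, threshold=0.999):
    """Return index pairs whose embeddings are almost identical."""
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                pairs.append((i, j))
    return pairs

# Toy embeddings: items 0 and 2 point in the same direction, i.e. near-duplicates
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(near_duplicates(emb))  # [(0, 2)]
```

Lowering the threshold widens the net, analogous to widening the `Uniqueness` filter range above.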
+ +### Analytics + +In a Project, go to the _Analytics_ page and pick the `Uniqueness` quality metric for the _Metric Distribution_ section. + +![Distribution of data based on Uniqueness scores](https://storage.googleapis.com/docs-media.encord.com/static/img/active/workflows/image-duplicates-qt-anal-01.png) + +The chart displays the distribution of data based on the `Uniqueness` scores. \ No newline at end of file diff --git a/docs/active-how-to/active-export-collection-to-csv.md b/docs/active-how-to/active-export-collection-to-csv.md new file mode 100644 index 000000000..50700e286 --- /dev/null +++ b/docs/active-how-to/active-export-collection-to-csv.md @@ -0,0 +1,73 @@ +--- +title: "Export Collections to CSV" +slug: "active-export-collection-to-csv" +hidden: false +metadata: + title: "Export Collections to CSV" + description: "Learn how to export Collections in Encord Active Cloud CSV." +category: "6480a3981ed49107a7c6be36" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n"
+}
+[/block]
+
+Any Collection can be exported to CSV. The CSV file contains the following:
+
+- DataTitle: Specifies the assigned name of the image, image group, image sequence, or video when the data was uploaded to Encord.
+
+- DataHash: Specifies the unique data ID for each image, image group, image sequence, or video in the Collection.
+
+- ImageTitle: Specifies the file name for the image or video in the Collection.
+
+- ImageHash: Specifies the unique ID for each image or video contained in the Collection.
+
+- VideoFrame: Specifies the frame number in the Collection. This value is `0` if the Collection does not contain video.
+
+- EditorURL: Specifies the Annotate Label Editor URL for each image or video frame in the Collection.
+
+**Example:**
+
+```
+DataTitle DataHash ImageTitle ImageHash VideoFrame EditorURL
+image-sequence-8c45bcea 2171a8de-d8ab-4b5d-aec0-17f49239f98b blueberries-17.JPG 05e23742-daf5-44cc-8602-a8925d381d89 9 https://dev.encord.com/label_editor/2171a8de-d8ab-4b5d-aec0-17f49239f98b&dc1e94d0-879c-4374-b523-895a40ef87a9/9
+image-sequence-8c45bcea 2171a8de-d8ab-4b5d-aec0-17f49239f98b blueberries-01.JPG 1932938e-ec83-443c-8ecd-a9f229bbfa64 0 https://dev.encord.com/label_editor/2171a8de-d8ab-4b5d-aec0-17f49239f98b&dc1e94d0-879c-4374-b523-895a40ef87a9/0
+image-group-6274340e 24104c92-00fe-416c-9620-59df4b0295dd cherries_12.jpg 32ad8717-81b3-40f8-98d3-d279a3955b80 3 https://dev.encord.com/label_editor/24104c92-00fe-416c-9620-59df4b0295dd&dc1e94d0-879c-4374-b523-895a40ef87a9/3
+image-group-6274340e 24104c92-00fe-416c-9620-59df4b0295dd cherries_14.jpg 394bca42-8f9a-4fa5-b501-f0c4731d9fbe 1 https://dev.encord.com/label_editor/24104c92-00fe-416c-9620-59df4b0295dd&dc1e94d0-879c-4374-b523-895a40ef87a9/1
+Blueberries.mp4 6eabace8-2d84-4a77-b266-f86f4ef74fee Blueberries.mp4 6eabace8-2d84-4a77-b266-f86f4ef74fee 1 https://dev.encord.com/label_editor/6eabace8-2d84-4a77-b266-f86f4ef74fee&dc1e94d0-879c-4374-b523-895a40ef87a9/1
+Blueberries.mp4 
6eabace8-2d84-4a77-b266-f86f4ef74fee Blueberries.mp4 6eabace8-2d84-4a77-b266-f86f4ef74fee 138 https://dev.encord.com/label_editor/6eabace8-2d84-4a77-b266-f86f4ef74fee&dc1e94d0-879c-4374-b523-895a40ef87a9/138 +Blueberries.mp4 6eabace8-2d84-4a77-b266-f86f4ef74fee Blueberries.mp4 6eabace8-2d84-4a77-b266-f86f4ef74fee 193 https://dev.encord.com/label_editor/6eabace8-2d84-4a77-b266-f86f4ef74fee&dc1e94d0-879c-4374-b523-895a40ef87a9/193 +image-sequence-8c45bcea 2171a8de-d8ab-4b5d-aec0-17f49239f98b blueberries-25.JPG cea53be3-f0f2-41de-8b54-79c471efb2fc 1 https://dev.encord.com/label_editor/2171a8de-d8ab-4b5d-aec0-17f49239f98b&dc1e94d0-879c-4374-b523-895a40ef87a9/1 +image-group-6274340e 24104c92-00fe-416c-9620-59df4b0295dd cherries_09.jpg f225843f-c8de-4368-80c8-e3a13b24ea23 6 https://dev.encord.com/label_editor/24104c92-00fe-416c-9620-59df4b0295dd&dc1e94d0-879c-4374-b523-895a40ef87a9/6 +``` + +
+ +To export a Collection to CSV: + +1. Log in to the Encord platform. + The landing page for the Encord platform appears. + +2. Click **Active** in the main menu. + The landing page for Active appears. + +3. Click the Project. + The landing page for the Project appears with the _Explorer_ tab selected. + +4. Click **Collections**. + The _Collections_ page appears. + +5. Select the checkbox for the Collection for export. + +6. Click the more icon. + A small menu appears. + +7. Click **Generate CSV** from the menu. + The _Download Generated CSV_ dialog appears when Active finishes generating a CSV file for the Collection. + +8. Click **Download CSV**. + The CSV file downloads to your local computer. + +
\ No newline at end of file diff --git a/docs/active-how-to/active-identify-outliers.md b/docs/active-how-to/active-identify-outliers.md new file mode 100644 index 000000000..c28bffd6b --- /dev/null +++ b/docs/active-how-to/active-identify-outliers.md @@ -0,0 +1,71 @@ +--- +title: "Finding data outliers" +slug: "active-identify-outliers" +hidden: true +metadata: + title: "Finding data outliers" + description: "Discover data outliers with Encord Active: Identify outliers using Interquartile ranges. Streamline data analysis." + image: + 0: "https://files.readme.io/7ecdcf8-image_16.png" +createdAt: "2023-07-11T16:27:42.224Z" +updatedAt: "2023-08-09T12:56:55.246Z" +category: "6480a3981ed49107a7c6be36" +--- + +With Encord Active, you can quickly find data and label outliers for pre-defined metrics, custom metrics, and label classes. Encord Active finds outliers using precomputed Interquartile ranges. + +## Setup + +If you haven't installed Encord Active, visit [installation](https://docs.encord.com/docs/active-oss-install). In this workflow we will be using the BDD validation dataset. + +## Data outliers + +### 1. Find outliers + +Navigate to the _Data Quality_ > _Summary_ tab. Here, the [Quality Metrics](https://docs.encord.com/docs/active-quality-metrics) will be presented as expandable panes. + +Click on a metric to get deeper insight into _moderate outliers_ and _severe outliers_. The most severe outliers are presented first in the pane. + +Use the slider to navigate your data from most severe outlier to least severe. + +![data-quality-outliers.png](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/data-quality-outliers.png) + +### 2. Tag outliers + +When you have identified outliers of interest, use the [tags](https://docs.encord.com/docs/active-tagging) or [bulk tagging](https://docs.encord.com/docs/active-tagging#bulk-tagging) feature to save a group of images. 
![data-quality-outliers-tagging.png](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/data-quality-outliers-tagging.png)
+
+After creating a tagged group, you can access it at the bottom of the left sidebar in the _Actions_ tab.
+
+### 3. Act on outliers
+
+Within the _Actions_ tab, click _Filter dataframe on_ and select _tags_. Next, choose the tags you would like to export, relabel, augment, review, or delete from your dataset.
+
+![data-quality-outliers-action.png](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/data-quality-outliers-action.png)
+
+## Label outliers
+
+### Steps
+
+### 1. Find outliers
+
+Navigate to the _Label Quality_ > _Summary_ tab. Here, each [Quality Metric](https://docs.encord.com/docs/active-quality-metrics) will be presented as an expandable pane.
+
+![label-quality-outliers.png](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/label-quality-outliers.png)
+
+You can click on a metric to get a deeper insight into _moderate outliers_ and _severe outliers_. Severe outliers are presented first in the pane.
+
+### 2. Tag outliers
+
+Next, you can use the slider to navigate your data from most severe outlier to least severe.
+
+![label-quality-outliers-slider.png](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/label-quality-outliers-slider.png)
+
+When you have identified outliers of interest, use the [tags](https://docs.encord.com/docs/active-tagging) or [bulk tagging](https://docs.encord.com/docs/active-tagging#bulk-tagging) feature to select a group of images. After creating a tagged group, you can access it at the bottom of the left sidebar in the _Actions_ tab.
+
+![label-quality-outliers-tagging.png](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/label-quality-outliers-tagging.png)
+
+Within the _Actions_ tab, click _Filter dataframe on_ and select _tags_. 
Next, choose the tags you would like to export, relabel, augment, review, or delete from your dataset. + +![label-quality-outliers-action.png](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/label-quality-outliers-action.png) diff --git a/docs/active-how-to/active-import-model-predictions-cloud.md b/docs/active-how-to/active-import-model-predictions-cloud.md new file mode 100644 index 000000000..cddb4575e --- /dev/null +++ b/docs/active-how-to/active-import-model-predictions-cloud.md @@ -0,0 +1,168 @@ +--- +title: "Import Predictions" +slug: "active-import-model-predictions-cloud" +hidden: false +metadata: + title: "Import Model Predictions to Active Cloud" + description: "Assess model quality with Encord Active Cloud analytics and metrics. Optimize model evaluation." +category: "6480a3981ed49107a7c6be36" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n"
+}
+[/block]
+
+---
+
+Encord Active not only provides a streamlined method to curate your image data; it also provides metrics and analytics to optimize your model's performance. Simply upload your model's predictions into Active Cloud to start.
+
+Your predictions must be imported into Active before you can use the Predictions feature on the _Explorer_ page and the _Model Evaluation_ page.
+
+## STEP 1: Prepare Your Predictions for Import
+
+> :warning: **Disclaimer**: We strongly recommend that you are knowledgeable about the `Encord` SDK. If you are unfamiliar with the SDK or if you do not understand the following boilerplate code, refer to this topic in the [SDK documentation](https://docs.encord.com/reference/sdk-import-labels-annotations).
+
+Within Encord Active, predictions use the same format as labels. For the most part, creating predictions programmatically is the same as creating labels.
+
+We'll start from the "Label Creation Boilerplate", which shows you how to upload labels into Encord. Then we'll show you how to modify the boilerplate to store predictions.
+
+You can create labels programmatically in Encord following this structure:
+
+```python Label Creation Boilerplate
+
+# Import dependencies
+import os
+
+from encord import EncordUserClient, Project
+from encord.objects import LabelRowV2
+
+# Authenticate client and identify project
+ssh_private_key_path = os.getenv("ENCORD_CLIENT_SSH_PATH")
+project_hash = os.getenv("ENCORD_PROJECT_HASH")
+
+assert ssh_private_key_path is not None
+assert project_hash is not None
+
+client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path)
+project: Project = client.get_project(project_hash)
+
+# Add labels
+def add_information_to_lr(lr: LabelRowV2):
+    ... 
# Logic for adding labels/predictions
+
+# Save labels
+label_rows: list[LabelRowV2] = project.list_label_rows_v2()
+for lr in label_rows:
+    lr.initialise_labels()
+    add_information_to_lr(lr)
+    lr.save()
+```
+
+> ℹ️ Note
+>
+> The documentation to add labels to the label row is available [here](https://docs.encord.com/reference/sdk-working-with-labels).
+
+To store the labels as predictions instead, you need to change the following things in the Label Creation Boilerplate:
+
+- Initialize the label rows without the existing labels [see the matching comment in the Store Predictions Boilerplate]
+- Store the predictions as serialized json [see the matching comment in the Store Predictions Boilerplate]
+- Make `add_information_to_lr` use your model to "create labels" (remember that labels and predictions are equivalent in terms of structure) [see the matching comment in the Store Predictions Boilerplate]
+
+```python Store Predictions Boilerplate
+# Import dependencies
+import os
+
+from encord import EncordUserClient, Project
+from encord.objects import LabelRowV2
+
+# Authenticate client and identify project
+ssh_private_key_path = os.getenv("ENCORD_CLIENT_SSH_PATH")
+project_hash = os.getenv("ENCORD_PROJECT_HASH")
+
+assert ssh_private_key_path is not None
+assert project_hash is not None
+
+client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path)
+project: Project = client.get_project(project_hash)
+
+# Make `add_information_to_lr` use your model to "create labels"
+def add_information_to_lr(lr: LabelRowV2):
+    ... 
# Logic for adding labels/predictions + + +label_rows: list[LabelRowV2] = project.list_label_rows_v2() +serialized_output: list[dict] = [] +for lr in label_rows: + +    # Initialize the label rows without the existing labels +    lr.initialise_labels(  # ignore existing labels +        include_object_feature_hashes=set(), +        include_classification_feature_hashes=set(), +    ) +    add_information_to_lr(lr)  # create the predictions with your model +    # Store the predictions as serialized json +    serialized_output.append(lr.to_encord_dict())  # Serialize + +import json +with open("predictions.json", "w") as f: +    json.dump(serialized_output, f) + +``` + +Now that you have the `predictions.json` file, you can move to STEP 2 and import the JSON file into the Active UI. + +## STEP 2: Import Predictions Set + +Once you have the `predictions.json` file from STEP 1, Prediction Sets can be imported from both the _Model Evaluation_ page and the **Upload predictions** button ( **+** ) on the _Overview_ tab of the _Predictions_ page on the _Explorer_ page.
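Before uploading, it can help to sanity-check the serialized file. The sketch below is illustrative only (the `check_predictions_file` helper is not part of the Encord SDK); it assumes the layout written in STEP 1, a JSON list of serialized label-row dictionaries:

```python
import json

def check_predictions_file(path: str) -> int:
    """Sanity-check the predictions file produced in STEP 1 and
    return how many serialized label rows it contains."""
    with open(path) as f:
        payload = json.load(f)
    # STEP 1 writes a JSON list with one serialized dict per label row.
    assert isinstance(payload, list), "expected a JSON list of label rows"
    assert all(isinstance(row, dict) for row in payload), "entries must be dicts"
    return len(payload)
```

If the check fails, re-run STEP 1 rather than editing the file by hand.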
+ +To import Prediction Sets into Active from the Model Evaluation page: + +1. Contact Encord to get started with Encord Active. + +2. Log in to the Encord platform. + The landing page for the Encord platform appears. + +3. Click **Active** in the main menu. + The landing page for Active appears. + +4. Click the Project. + The landing page for the Project appears with the _Explorer_ tab selected. + +5. Click the **Model Evaluation** tab. + The _Model Evaluation_ page appears. + + ![Import Model Predictions](https://storage.googleapis.com/docs-media.encord.com/static/img/active/user-guide/import-predictions.png) + +6. Click the **Import prediction** button. + The _Upload predictions_ dialog appears. + +7. Type a meaningful name for the prediction set. + +8. Click the **Select Predictions File** button. + A dialog box appears. + +9. Select the JSON file to upload. + +10. Click **Open**. + +11. Click **Start Upload**. + Once the upload completes, the _Model Evaluation_ page and the _Predictions_ page on the _Explorer_ page are available for use. + +
+ +> ℹ️ Note +> +> If you have any issues importing your predictions contact your CSM or contact us at support@encord.com. + +## Next Steps + +### Model and Prediction Validation + +[block:html] +{ + "html": "\n\n\n \n \n Clickable Div\n \n\n\n 1. Import from Annotate 3. Review Prediction Metrics 4. Create Collection 5. Send to Annotate 6. Sync with Annotate 7. Update Collection\n\n" +} +[/block] \ No newline at end of file diff --git a/docs/active-how-to/active-open-all-tasks.md b/docs/active-how-to/active-open-all-tasks.md new file mode 100644 index 000000000..fe55947f0 --- /dev/null +++ b/docs/active-how-to/active-open-all-tasks.md @@ -0,0 +1,89 @@ +--- +title: "Open Tasks from Active to Annotate" +slug: "active-open-all-tasks" +hidden: false +metadata: + title: "Open All Tasks from Active to Annotate" + description: "Learn how to open Annotate tasks using Collections in Encord Active Cloud." +category: "6480a3981ed49107a7c6be36" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +After creating a Collection, you can open annotation tasks of images/frames in the Collection, when the Collection is sent to Annotate. That means regardless of the image's/frame's current status in the Workflow (Annotate, Review, Complete) the task is opened/reopend in Annotate. + +### Important Information + +- There is no insight on the image's/frame's current status in the Workflow (Annotate, Review, Complete) in Annotate. + +- Reopening a closed task maintains the priority specified in the Collection. + +> ❗️ CRITICAL INFORMATION +> +> This action cannot be undone. This is a bulk operation on all images/frames in the Collection, and once the action is submitted to Annotate, all images/frames in the Collection are opened/reopened in their Annotate Project. That means regardless of the image's/frame's current status in the Workflow (Annotate, Review, Complete) the task is opened/reopend in Annotate. + +## Open tasks from Active to Annotate + +In addition to opening tasks, all priorities, comments on individual frames/images, and the comment added for the Collection, are sent to the Annotate Project. + +
+ +To send a Collection to Annotate: + +1. Log in to the Encord platform. + The landing page for the Encord platform appears. + +2. Click **Active** in the main menu. + The landing page for Active appears. + +3. Click the Project. + The landing page for the Project appears with the _Explorer_ tab selected. + +4. Click **Collections**. + The _Collections_ page appears. + +5. Select the checkbox for the Collection to send to Annotate. + +6. Click **Send to Annotate**. + The _Send to Annotate_ dialog appears. + +7. Specify the following: + + - **Adjust priority of tasks:** Specifies the priority of all the images/frames in the Collection, when the images/frames are sent to Annotate. + + - **Include model predictions as pre-labels:** Includes model predictions as pre-labels in all frames/images in the Collection. + + - **Reopen tasks:** Reopens all annotation tasks on images/frames in the Collection. + + > ❗️ CRITICAL INFORMATION + > + > This action cannot be undone. + + - **Delete all existing labels:** Deletes all existing labels on all frames/images in the Collection. + + > ❗️ CRITICAL INFORMATION + > + > This action cannot be undone. + + - **Leave a comment:** Applies the comment to all frames/images in the Collection. + + > 🚧 WARNING + > + > Comments applied on a Collection cannot be deleted in bulk. We recommend using comments created on a Collection in image sequence and video Datasets. + +8. Click **Submit**. + The _Subset created successfully_ dialog appears once creation completes. + +9. Click the link in the dialog to go to the Project in Annotate. + +10. Users in Annotate can then view opened/reopened tasks, priorities on tasks, and [can access comments](https://docs.encord.com/docs/annotate-label-editor#comments) made in the Collection. + +> ℹ️ Note +> After annotating the data, sync the Project data between Annotate and Active. To sync the Project data go to **Active > [select the project] > click More > Sync Project Data**. + +
\ No newline at end of file diff --git a/docs/active-how-to/active-oss-exploring-data-and-label-distributions.md b/docs/active-how-to/active-oss-exploring-data-and-label-distributions.md new file mode 100644 index 000000000..89c678635 --- /dev/null +++ b/docs/active-how-to/active-oss-exploring-data-and-label-distributions.md @@ -0,0 +1,39 @@ +--- +title: "Exploring data and label distributions" +slug: "active-oss-exploring-data-and-label-distributions" +hidden: false +metadata: + title: "Exploring data and label distributions" + description: "Visualize & understand distributions with Encord Active. Optimize models by uncovering missing data and label insights." + image: + 0: "https://files.readme.io/8aedbf8-image_16.png" +createdAt: "2023-07-11T16:27:42.230Z" +updatedAt: "2023-08-09T12:43:09.288Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +Encord Active provides the capability to visually explore data and label distributions via pre-defined metrics, custom metrics, and label classes. + +Gaining insights into data distribution across diverse quality metrics allows for the identification of potential data gaps that could influence model performance on outliers or edge cases. + +[//]: # (In this workflow, the [COCO Validation 2017 Dataset](https://docs.encord.com/docs/active-cli#coco-validation-2017-dataset) is used as an example.) + +## Static charts + +Access the _Summary_ page to utilize the metric distribution charts and explore the _2D Metrics View_ for data and labels. + +In the _2D Metrics View_, one metric's values are plotted on the x-axis, while the values of the other metric are represented on the y-axis. This visualization allows for an examination of the relationship between these two metrics and their potential interactions within the data and labels. + +The label class distribution can be visualized by selecting the `Label Class` property from the drop-down menu within the Metric Distribution section on the _Label_ tab. + +## Interactive exploration + +Access the Explorer page and select a quality metric (such as `Brightness` or `Object Annotation Quality`) from the _Order by_ drop-down. This menu is located above the natural language search bar and enables data to be organized according to the chosen criteria. + +The dashboard displays data distribution based on the selected metric. Navigating through the ordered dataset is possible by progressing through visualized data items. It is also feasible to perform range selections on the distribution chart or apply the chosen metric as a filter and utilize the slider to define a specific range. 
diff --git a/docs/active-how-to/active-oss-exploring-embeddings.md b/docs/active-how-to/active-oss-exploring-embeddings.md new file mode 100644 index 000000000..a3acc6bb4 --- /dev/null +++ b/docs/active-how-to/active-oss-exploring-embeddings.md @@ -0,0 +1,48 @@ +--- +title: "Explore embedding plots" +slug: "active-oss-exploring-embeddings" +hidden: false +metadata: + title: "Explore embedding plots" + description: "Enhance active learning with 2D embeddings in Encord Active. Visualize clusters and gain deeper data understanding. Optimize workflows." + image: + 0: "https://files.readme.io/566b786-image_16.png" +createdAt: "2023-07-11T16:27:42.145Z" +updatedAt: "2023-08-09T12:45:07.861Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +**Enhance the active learning cycle with embedding plots** + +Encord Active incorporates embedding plots — a two-dimensional visualization technique employed to represent intricate, high-dimensional data in a more comprehensible and visually coherent manner. This technique reduces data dimensionality while preserving the inherent structure and patterns within the original data. + +The embedding plot aids in identifying noteworthy clusters, gaining a deeper understanding of the data, performing weak labeling on images, and excluding undesirable images. Accessible on the **Explorer** page, the embedding plot is adaptable to data or labels based on the chosen option in the _Order by_ drop-down. + +[//]: # (In this workflow, the [COCO Validation 2017 Dataset](https://docs.encord.com/docs/active-cli#coco-validation-2017-dataset) is used as an example.) + +![Vibrant 2D data embedding plot highlighting data patterns and clusters](https://storage.googleapis.com/docs-media.encord.com/static/img/active/workflows/explorer-data-embedding-plot.png) + +Notice how images are clustered around certain regions. By defining a rectangular area on the plot, users can quickly isolate and analyze data points within that defined region. This approach facilitates the exploration of commonalities among these samples. + +Upon selecting a region, the content within the Explorer page will be adjusted accordingly. Various actions can be executed with the chosen group: +- Utilize the [tagging feature](https://docs.encord.com/docs/active-tagging) to mark them and posteriorly [forward them for labeling](https://docs.encord.com/docs/active-relabeling). +- Investigate the performance of the selected samples within the _Predictions_ page. +- Establish subsets similar to these and then conduct comparisons. + +Samples within the data embedding plot lack label information, resulting in uniform coloration across all points. 
Conversely, data points in the label embedding plot are color-coded based on their respective label classes. + +![Vibrant 2D label embedding plot highlighting label patterns and clusters](https://storage.googleapis.com/docs-media.encord.com/static/img/active/workflows/explorer-label-embedding-plot.png) + +> 👍 Tip +> The embedding plot is adaptable to data or labels based on the chosen option in the _Order by_ drop-down. In addition to selecting points within a rectangular area, the label embedding plot offers the functionality to filter data points based on the label classes. + +With the label embedding plot, users can: +- Identify classes that are often confused with each other. +- Detect samples with incorrect labeling, such as instances of a different class embedded within a larger cluster of another class. +- Spot outliers and subsequently eliminate them from the dataset. diff --git a/docs/active-how-to/active-oss-exploring-image-similarity.md b/docs/active-how-to/active-oss-exploring-image-similarity.md new file mode 100644 index 000000000..62ed06493 --- /dev/null +++ b/docs/active-how-to/active-oss-exploring-image-similarity.md @@ -0,0 +1,36 @@ +--- +title: "Exploring image similarity" +slug: "active-oss-exploring-image-similarity" +hidden: false +metadata: + title: "Exploring image similarity" + description: "Enhance data quality with visual similarity search in Encord Active. Detect edge cases, duplicates, and label quality. Streamline dataset management." + image: + 0: "https://files.readme.io/7d31a4f-image_16.png" +createdAt: "2023-07-11T16:27:42.192Z" +updatedAt: "2023-08-09T12:46:40.225Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +**Mine edge cases, duplicates, and check the quality of your labels with visual similarity search** + +Frequently, when distinctive characteristics arise within a dataset, identifying similar images becomes crucial (e.g., for relabeling or removal). Detecting these instances assists in assessing the thoroughness of data representation and the accuracy of labels, particularly in situations where certain classes may be underrepresented or labels could be inaccurately assigned. As datasets expand, manual identification of such cases becomes progressively challenging. + +Leverage Encord Active's **similarity search** feature to effortlessly locate semantically akin images in your dataset. Upon identifying an edge case or duplicate, applying tags and executing actions such as relabeling or deletion can be performed. + +[//]: # (In this workflow, the [COCO Validation 2017 Dataset](https://docs.encord.com/docs/active-cli#coco-validation-2017-dataset) is used as an example.) + +## Steps + +1. Access the _Explorer_ page and locate the image or label of interest. +2. Click the **Similar items** button associated with the selected sample. Encord Active will arrange the samples on the _Explorer_ page, showcasing the most semantically similar images first. + ![Displaying similar images based on the similarity search query](https://storage.googleapis.com/docs-media.encord.com/static/img/active/workflows/explorer-image-similarity-search.png) + +> 👍 Tip +> To cancel the similarity search, you can click the X button located in the top right corner of the chosen sample or the RESET FILTERS button positioned near the natural language search bar. 
diff --git a/docs/active-how-to/active-oss-install.md b/docs/active-how-to/active-oss-install.md new file mode 100644 index 000000000..2b72d15dc --- /dev/null +++ b/docs/active-how-to/active-oss-install.md @@ -0,0 +1,82 @@ +--- +title: "Install Active OS" +slug: "active-oss-install" +hidden: false +metadata: + title: "Install Active OS" + description: "Installation guide for Encord Active OS. Streamline setup. Get started with Encord installation." +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +## Prerequisites + +> ℹ️ Note +Make sure you have [python 3.9](https://www.python.org/downloads/release/python-3917/) and [Git VCS](https://git-scm.com/download) installed on your system. + +To install the correct version of python you can use [pyenv](https://github.com/pyenv/pyenv), [brew (mac only)](https://formulae.brew.sh/formula/python@3.9) or simply [download](https://www.python.org/downloads/release/python-3917/) it. + + +## From PyPi + +Install `encord-active` in your favorite Python environment using the following commands: + +```shell Linux/macOS +python3.9 -m venv ea-venv +source ea-venv/bin/activate +pip install encord-active +``` +```shell Windows +python -m venv ea-venv +ea-venv\Scripts\activate +pip install encord-active +``` + +### COCO extras + +If you intend to work with files using COCO format you'll have to install Encord Active with an extra dependency: + +```shell +pip install encord-active[coco] +``` + +> ℹ️ Note +You might need to install `xcode-select` if you are on Mac or `C++ build tools` if you are on Windows. + + +## Check the Installation + +To check what version of Encord Active is installed, run: + +```shell +$ encord-active --version +``` + +This command must be run in the same virtual environment where you installed the package. + +The `--help` option provides some context to what you can do with `encord-active`. If you'd like to explore the available commands in the Command Line Interface (CLI), you can refer to the [CLI section](https://docs.encord.com/docs/active-cli) for detailed information. + +## Docker + +We also provide a docker image that works exactly as the CLI. + +```shell +docker run -it --rm -p 8000:8000 -v ${PWD}:/data encord/encord-active +``` + +Running the previous command will mount your current working directory, so everything that happens inside the docker container will persist after it is done. 
+ +### SSH key + +If you intend to use Encord Active OS with an Encord Annotate project you'll need to mount a volume with your SSH key as well. + +```shell +docker run -it --rm -p 8000:8000 -v ${PWD}:/data -v ${HOME}/.ssh:/root/.ssh encord/encord-active +``` + +When asked for your SSH key, you can point to `~/.ssh/`. \ No newline at end of file diff --git a/docs/active-how-to/active-oss-remove-duplicate-images.md b/docs/active-how-to/active-oss-remove-duplicate-images.md new file mode 100644 index 000000000..9c53131e2 --- /dev/null +++ b/docs/active-how-to/active-oss-remove-duplicate-images.md @@ -0,0 +1,94 @@ +--- +title: "Remove duplicate images" +slug: "active-oss-remove-duplicate-images" +hidden: true +metadata: + title: "Remove duplicate images" + description: "Enhance dataset quality: Detect & remove duplicate images with Encord Active. Mitigate bias, optimize data for models" + image: + 0: "https://files.readme.io/05b71a8-image_16.png" +createdAt: "2023-07-11T16:27:42.223Z" +updatedAt: "2023-08-09T16:11:47.793Z" +category: "6480a3981ed49107a7c6be36" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +**Enhance datasets by detecting and eliminating duplicate and near-duplicate images** + +The presence of duplicate or closely similar images can introduce bias in deep learning models. Encord Active provides the capability to identify and eliminate these duplicate or near-duplicate images from datasets. This process contributes to enhancing data quality by removing redundant instances, ultimately leading to improved model performance. + +In this workflow, the [`Image Singularity` quality metric](https://docs.encord.com/docs/active-data-quality-metrics#uniqueness) is employed to identify duplicate and near-duplicate images. + +## Image singularity (AKA Uniqueness) + +The `Image singularity` metric evaluates all images within the dataset and assigns a uniqueness score to each, indicating their distinctiveness. +- The uniqueness score falls within the [0,1] range. A higher score indicates a greater level of image uniqueness. +- A score of zero signifies the presence of at least one identical image within the dataset. For instances with _N_ duplicate images, _N-1_ of them are assigned a score of zero (with only one holding a non-zero score) to facilitate their exclusion from the dataset. +- Near-duplicate images are labeled as `Near-duplicate image` and are presented side by side in the Explorer's grid view. This setup simplifies the decision-making process when selecting which image to keep and which one to remove. + +[//]: # (TODO add this option when Description fields made by metrics are accessible in the UI) +[//]: # (- In the context of _N_ duplicate images, _N-1_ images are associated with a single representative image, visible through the `Description` field in the details of each image.) + +## Walkthrough + +Go to the _Data_ tab within the _Summary_ page and pick the `Image Singularity` quality metric from the drop-down menu in the _Metric Distribution_ section. 
+ +![Distribution of data based on Image Singularity scores](https://storage.googleapis.com/docs-media.encord.com/static/img/active/workflows/summary-data-metric-distribution-image-singularity.png) + +The chart displays the distribution of data based on the `Image Singularity` scores. The example image illustrates a project containing around 200 duplicate images. + +Proceed to the _Explorer_ page and choose the `Image Singularity` quality metric from the _Order by_ drop-down. This menu is positioned above the natural language search bar and enables data to be organized according to the chosen criteria. + +![Ordering data by Image Singularity](https://storage.googleapis.com/docs-media.encord.com/static/img/active/workflows/explorer-order-by-image-singularity.png) + +Choose any sample and click its corresponding **Similar items** button. This action will display images similar to the selected one, including any duplicates if they exist. + +![Displaying similar images based on the similarity search query](https://storage.googleapis.com/docs-media.encord.com/static/img/active/workflows/explorer-image-similarity-search.png) + +### Removing duplicate images + +In situations where users aim to eliminate duplicate images from a dataset, they can flag these images and create a subset of the dataset devoid of duplicates. + +1. Access the _Explorer_ page and ensure that a data metric is chosen in the _Group by_ dropdown. This step ensures that the _Explorer_'s grid view shows data items. +2. Tag all images with a data tag, such as `non-duplicate images`, by utilizing the SELECT ALL button followed by the TAG button. This operation is known as [bulk tagging](https://docs.encord.com/docs/active-tagging#bulk-tagging). Afterwards, click the CLEAR SELECTION button to reset the selection. +3. Opt for the `Image Singularity` quality metric within the FILTERS button. Adjust the range slider for this metric to cover the entire range available.
This step involves the [standard filter](https://docs.encord.com/docs/active-filtering#standard-filter-feature). +4. Click the SELECT ALL button to choose all image duplicates. Then, utilize the TAG button to remove the `non-duplicate images` tag from this subset. Upon completion, click both the RESET FILTERS and CLEAR SELECTION buttons to reset the selections. As a result, the subset labeled with the `non-duplicate images` tag will now exclusively consist of images that are not duplicated. +5. Choose the `Data Tags` option within the FILTERS button. Ensure that only the `non-duplicate images` tag is selected. +6. Click the CREATE PROJECT SUBSET button and follow the provided instructions to generate a project containing exclusively non-duplicate images. + +Incorporating this workflow into dataset management strategies can significantly enhance data quality, eliminate redundancies, and contribute to more accurate model training and evaluation. + +### Removing near-duplicate images + +[block:image] +{ + "images": [ + { + "image": ["https://storage.googleapis.com/docs-media.encord.com/static/img/images/workflows/improve-your-data-and-labels/remove-duplicate-images/remove-duplicate-images-11.png", + "", + "An example of near-duplicate image pairs detected with Encord Active" + ], + "align": "center", + "caption": "An example of near-duplicate image pairs detected with Encord Active" + } + ] +} +[/block] + +Similar to duplicates, near-duplicate images are those where one image slightly differs from another due to shifts, blurriness, or distortion. Consequently, they should also be eliminated from the dataset. However, in this scenario, a decision is required to determine which sample remains and which is discarded. These images possess scores marginally greater than 0 and are displayed alongside one another in the grid view, facilitating easy comparison. + +To proceed: +1. Tag all images with a data tag, by utilizing the SELECT ALL button followed by the TAG button. 
Afterwards, click the CLEAR SELECTION button to reset the selection. +2. To focus on images with remarkably low uniqueness scores, opt for the `Image Singularity` quality metric within the FILTERS button and adjust the range slider for this metric to cover the range [0,0.05]. +3. Examine the images and proceed to remove the tag from images intended for exclusion from the project. +4. Follow the same export steps as outlined in the _Removing duplicate images_ section. + +[//]: # (TODO add this option when Description fields made by metrics are accessible in the UI) +[//]: # (2. Identify images with a description indicating "_Near-duplicate image_." ) + +With these actions, users can efficiently manage near-duplicate images and improve dataset quality. diff --git a/docs/active-how-to/active-remove-duplicate-images.md b/docs/active-how-to/active-remove-duplicate-images.md new file mode 100644 index 000000000..1870f6e4f --- /dev/null +++ b/docs/active-how-to/active-remove-duplicate-images.md @@ -0,0 +1,224 @@ +--- +title: "Remove duplicate images" +slug: "active-remove-duplicate-images" +hidden: false +metadata: + title: "Removing duplicate images" + description: "Enhance dataset quality: Detect and remove duplicate images with Encord Active. Mitigate bias and optimize data for models." + image: + 0: "https://files.readme.io/05b71a8-image_16.png" +createdAt: "2023-07-11T16:27:42.223Z" +updatedAt: "2023-08-09T16:11:47.793Z" +category: "6480a3981ed49107a7c6be36" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +The presence of duplicate or closely similar images can introduce bias in deep learning models. Encord Active provides the capability to identify and eliminate duplicate or near-duplicate images from datasets. This process contributes to enhancing data quality by removing redundant instances, ultimately leading to improved model performance. + +In this workflow, the [`Uniqueness` quality metric](https://docs.encord.com/docs/active-data-quality-metrics#uniqueness) is used to identify duplicate and near-duplicate images. + +## Uniqueness metric + +The `Uniqueness` metric evaluates all images within the dataset and assigns a uniqueness score to each, indicating their distinctiveness. + +- The uniqueness score falls within the [0,1] range. A higher score indicates a greater level of image uniqueness. The **Duplicates** summary on the **Data > Overview** tab uses a range between 0 and 0.0001. + +- A score of zero signifies the presence of at least one identical image within the dataset. For instances with _N_ duplicate images, _N-1_ of them are assigned a score of zero (with only one holding a non-zero score) to facilitate their exclusion from the dataset. + +- Near-duplicate images are labeled as `Near-duplicate image` and are presented side by side in the Explorer's grid view. This setup simplifies the decision-making process when selecting which image to keep and which one to remove. + + + +## Quick Tour + +All the sections in the Quick Tour assume that you are already in a Project. + +> 👍 Tip +> Choose any image in the Explorer workspace and click its _Similar items_ [!Similarity button](https://storage.googleapis.com/docs-media.encord.com/static/img/active/workflows/similarity-button.png) button. This displays images similar to the selected one, including any duplicates if they exist. + +### Explorer + +The _Explorer_ page has three areas that can help you find duplicate images in your Project. + +
+ +1: Duplicates Shortcut + +In the _Overview_ tab, any images that have a `Uniqueness` value between 0 and 0.0001 are highlighted as duplicates. You can adjust this range from the _Filter_ tab. + +![Duplicates shortcut](https://storage.googleapis.com/docs-media.encord.com/static/img/active/workflows/image-duplicates-qt-01.png) + +
+ +
+ +2: Sorting by `Uniqueness` + +The entire Project can be sorted by `Uniqueness`. Sort by ascending order to display duplicates first. + +![Sorting by `Uniqueness`](https://storage.googleapis.com/docs-media.encord.com/static/img/active/workflows/image-duplicates-qt-02.png) + +
+ +
+ +3: Filtering by `Uniqueness` + +Filter the entire project using `Uniqueness`. + +Go to **Filter** tab > **Add Filter** > **Data Quality Metrics** > **Uniqueness**. A small histogram appears above the filter. + +You can then change the filter settings to specify a range closer to 0. + +![Filtering by `Uniqueness`](https://storage.googleapis.com/docs-media.encord.com/static/img/active/workflows/image-duplicates-qt-03.png) + +
+ +### Analytics + +In a Project, go to the _Analytics_ page and pick the `Uniqueness` quality metric for the _Metric Distribution_ section. + +![Distribution of data based on Uniqueness scores](https://storage.googleapis.com/docs-media.encord.com/static/img/active/workflows/image-duplicates-qt-anal-01.png) + +The chart displays the distribution of data based on the `Uniqueness` scores. + +## Remove duplicate images + +When you want to remove/exclude duplicate images from a dataset, tag the duplicate images and create a Collection devoid of duplicates. + +
+ +To remove duplicate images from your Project: + +1. Log in to the Encord platform. + The landing page for the Encord platform appears. + +2. Click **Active** in the main menu. + The landing page for Active appears. + +3. Click the Project. + The landing page for the Project appears with the _Explorer_ tab selected with _Data_ selected. + +4. Click the _Duplicates_ shortcut under the _Overview_ tab. + The _Duplicates_ shortcut applies the `Uniqueness` filter to all images in the Project. The `Uniqueness` filter returns images with a `Uniqueness` value between 0 and 0.0001. + +5. Sort the filtered data in ascending order by `Uniqueness`. + +6. Adjust the `Uniqueness` filter from the default value to find all the duplicate images in the Project. + As you adjust the filter, the images that appear in the Explorer workspace change. + +7. Select one and then all images. + +8. Click the **Add to a Collection** button to create a Collection. + +9. Click **New Collection**. + +10. Name the Collection `Duplicates`. + All selected images have the tag `Duplicates` applied to them. + +11. Reset all Filters. + +12. Add a Collections filter that excludes `Duplicates`. + +13. Select one and then all images. + +14. Click the **Add to a Collection** button to create a Collection. + +15. Click **New Collection**. + +16. Specify a meaningful name for the Collection. + +17. Go to the _Collections_ page. + +18. Select the Collection that excludes `Duplicates`. + +19. Click **Create Dataset**. + +20. Specify a meaningful name and description for the Dataset and Project. + +21. Click **Submit**. + The Dataset and Project appear in Annotate. + +Incorporating this workflow into dataset management strategies can significantly enhance data quality, eliminate redundancies, and contribute to more accurate model training and evaluation. + +
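The filtering logic in the steps above amounts to a simple partition of the Project by score. A minimal sketch (illustrative only; `split_out_duplicates` is a hypothetical helper, and the default threshold mirrors the 0 to 0.0001 range used by the _Duplicates_ shortcut):

```python
def split_out_duplicates(scores, threshold=0.0001):
    """Partition items into duplicates (Uniqueness <= threshold) and keepers."""
    duplicates = {name for name, score in scores.items() if score <= threshold}
    keep = set(scores) - duplicates
    return duplicates, keep
```

Tagging the first set as `Duplicates` and then filtering the Project to exclude that Collection leaves you with the second set.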
+ +## Remove near-duplicate images + +[block:image] +{ + "images": [ + { + "image": ["https://storage.googleapis.com/docs-media.encord.com/static/img/images/workflows/improve-your-data-and-labels/remove-duplicate-images/remove-duplicate-images-11.png", + "", + "An example of near-duplicate image pairs detected with Encord Active" + ], + "align": "center", + "caption": "An example of near-duplicate image pairs detected with Encord Active" + } + ] +} +[/block] + +Similar to duplicates, near-duplicate images are images where one image slightly differs from another due to shifts, blurriness, or distortion. Consequently, they should also be eliminated from the dataset. However, in this scenario, a decision is required to determine which sample remains and which is discarded. These images possess scores marginally greater than 0 and are displayed alongside one another in the Explorer grid view workspace, facilitating easy comparison. + +1. Log in to the Encord platform. + The landing page for the Encord platform appears. + +2. Click **Active** in the main menu. + The landing page for Active appears. + +3. Click the Project. + The landing page for the Project appears with the _Explorer_ tab selected with _Data_ selected. + +4. Click the _Duplicates_ shortcut under the _Overview_ tab. + The _Duplicates_ shortcut applies the `Uniqueness` filter to all images in the Project. The `Uniqueness` filter returns images with a `Uniqueness` value between 0 and 0.0001. + +5. Sort the filtered data in ascending order by `Uniqueness`. + +6. Adjust the `Uniqueness` filter from the default value to **0 to 0.05**. + +7. Examine the images in the Explorer workspace and select the images you want removed from the Project. + +8. Click the **Add to a Collection** button to create a Collection. + +9. Click **New Collection**. + + > ℹ️ Note + > If you already have a Collection called `Duplicates`, add the images to the existing Collection and go to _step 11_. + +10. 
Name the Collection `Duplicates`. + All selected images have the tag `Duplicates` applied to them. + +11. Reset all Filters. + +12. Add a Collections filter that excludes `Duplicates`. + +13. Select one and then all images. + +14. Click the **Add to a Collection** button to create a Collection. + +15. Click **New Collection**. + +16. Specify a meaningful name for the Collection. + +17. Go to the _Collections_ page. + +18. Select the Collection that excludes `Duplicates`. + +19. Click **Create Dataset**. + +20. Specify a meaningful name and description for the Dataset and Project. + +21. Click **Submit**. + The Dataset and Project appear in Annotate. + +With these actions, users can efficiently manage near-duplicate images and improve dataset quality. diff --git a/docs/active-import/active-import-coco-project.md b/docs/active-import/active-import-coco-project.md new file mode 100644 index 000000000..7bfd9a059 --- /dev/null +++ b/docs/active-import/active-import-coco-project.md @@ -0,0 +1,71 @@ +--- +title: "Import a COCO project" +slug: "active-import-coco-project" +hidden: false +metadata: + title: "Import a COCO project" + description: "Start a project in Encord Active: Utilize local COCO format datasets and annotations. Streamlined project creation." + image: + 0: "https://files.readme.io/675f89d-image_16.png" +createdAt: "2023-07-11T16:27:41.928Z" +updatedAt: "2023-08-11T12:43:00.436Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n"
+}
+[/block]
+
+**Create a project using a dataset and annotations stored in COCO format from your local file system**
+
+> ℹ️ Note
+> Make sure you have installed Encord Active with the `coco` [extras](https://docs.encord.com/docs/active-oss-install#coco-extras).
+
+
+If you have an existing project on your local machine in the COCO data format, you can import it using the following command:
+
+```shell
+encord-active import project --coco -i ./images -a ./annotations.json
+```
+
+This command creates a new Encord Active project within a fresh folder in the current working directory. The project includes the specified images and annotations.
+
+> ℹ️ Note
+> The input of the command above assumes the following structure, but it is not limited to it:
+>
+> ```
+> coco-project-foo
+> ├── images
+> │   ├── 00000001.jpeg
+> │   ├── 00000002.jpeg
+> │   ├── ...
+> └── annotations.json
+> ```
+>
+> You have the flexibility to provide any path for each of the arguments, as long as the first argument represents a directory containing images, and the second argument is an annotations file that adheres to the [COCO data format](https://cocodataset.org/#format-data).
+
+Running the importer does the following:
+
+1. Creates a local Encord Active project.
+2. Computes all the [metrics](https://docs.encord.com/docs/active-quality-metrics).
+
+> ℹ️ Note
+> By default, step 1 makes a hard copy of the images used in your dataset.
+> **Optionally**, you can add the `--symlinks` argument
+>
+> ```shell
+> encord-active import project --coco -i ./images -a ./annotations.json --symlinks
+> ```
+>
+> to tell Encord Active to use symlinks instead of copying files. But be aware that **if you later move or delete your original image files, Encord Active will stop working!**
+
+The whole flow might take a while depending on the size of the original dataset. 
When the process is done, follow the printed instructions to launch the app with the [start][ea-cli-start] CLI command. + +[//]: # (TODO show the note when the export section shows how to export data and labels to encord) +[//]: # (> ℹ️ Note) +[//]: # (> If you wish to make the project available on the Encord platform, please consult the [Export section](https://docs.encord.com/docs/active-exporting#export-to-the-encord-platform) for instructions on how to accomplish this.) + + +[ea-cli-start]: https://docs.encord.com/docs/active-cli#start \ No newline at end of file diff --git a/docs/active-import/active-import-encord-project.md b/docs/active-import/active-import-encord-project.md new file mode 100644 index 000000000..b51b7e2ca --- /dev/null +++ b/docs/active-import/active-import-encord-project.md @@ -0,0 +1,55 @@ +--- +title: "Import Encord annotation project" +slug: "active-import-encord-project" +hidden: false +metadata: + title: "Import Encord annotation project" + description: "Seamlessly import Encord Annotate projects to Encord Active. Quick transition, all-inclusive data transfer." + image: + 0: "https://files.readme.io/1fe9089-image_16.png" +createdAt: "2023-07-11T16:27:41.855Z" +updatedAt: "2023-08-09T13:03:43.944Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n"
+}
+[/block]
+
+**Pull projects from Encord Annotate**
+
+If you already have a project in Encord Annotate, you can start with its respective Encord Active project right away.
+
+> ℹ️ Note
+> If you are new to the Encord platform, [signing up](https://app.encord.com/register) for an Encord account is quick and easy.
+
+
+To interactively select a project from the list of available projects in Encord Annotate, use the following command:
+
+```shell
+encord-active import project
+```
+
+To narrow down the search for the project you wish to import, enter some text that matches the project title. Use the keyboard arrows to navigate and select the desired project, then press Enter to confirm your choice.
+
+Alternatively, if you prefer to skip the selection process, you can use the `--project-hash` option when executing the command.
+
+You will get a directory containing all the data, labels, and [metrics](https://docs.encord.com/docs/active-quality-metrics) of the project. You can opt in or opt out of storing the data in the local file system.
+
+When the process is done, follow the printed instructions to launch the app with the [start][ea-cli-start] CLI command.
+
+If you are importing an Encord Annotate project for the first time, the Command Line Interface (CLI) prompts you to provide the local path of a private SSH key associated with Encord. To associate an SSH key with Encord, refer to the documentation available [here](https://docs.encord.com/docs/annotate-public-keys#set-up-public-key-authentication). The provided SSH key path will be stored for future use.
+
+> ℹ️ Note
+> The previous command imports a project into a new folder within the current working directory. 
However, if a different directory needs to be specified, the `--target` option can be included as follows: +> +> ```shell +> encord-active import project --target /path/to/store/project +> ``` +> +> This will import the project in a subdirectory of `/path/to/store/project`. + + +[ea-cli-start]: https://docs.encord.com/docs/active-cli#start \ No newline at end of file diff --git a/docs/active-import/active-import-from-annotate.md b/docs/active-import/active-import-from-annotate.md new file mode 100644 index 000000000..3f92141da --- /dev/null +++ b/docs/active-import/active-import-from-annotate.md @@ -0,0 +1,229 @@ +--- +title: "Import Project from Annotate" +slug: "active-import-from-annotate" +hidden: false +metadata: + title: "Import Project from Annotate" + description: "Import Annotate Projects to Active to improve your workflows." +category: "6480a3981ed49107a7c6be36" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +[block:html] +{ + "html": "
"
+}
+[/block]
+
+Creating a Project in Annotate is a good place to start using Active. From Annotate you can configure the Dataset, Ontology, and workflow for your Project. Once that is done, move to Active to provide a streamlined dataset for your annotators.
+
+> ❗️ CRITICAL INFORMATION
+>
+> Your organization's Encord Admin (the person whose authentication key is used when setting up your Active deployment on Encord) must be added as an **Admin** to your Annotate Projects, for Annotate Projects to be imported into Active. If the Encord Admin is not added as an **Admin** to your Annotate Project, the Annotate Project does not appear in the Project list when you click the **Import Annotate Project** button in Active.
+
+> ❗️ CRITICAL INFORMATION
+>
+> We strongly recommend only importing Annotate Projects that use Workflows. The current feature set in Active is optimized to work with Annotate Projects that use Workflows. While Annotate Projects that use Manual QA can be imported into Active, there are a number of features that Manual QA projects do not support.
+
+
+
+## Data Import Behavior
+
+Active imports your data in stages, so you can start working with your data without delay. Active imports data in the following stages:
+
+| Stage | Import | Description |
+| :------ | :-------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------ |
+| 1 | Data import | All images/video frames and all Metadata. This takes just a few minutes. |
+| 2 | Label import | All labels/annotations (instance and frame level) on the images/video frames. Label import time depends on the number of labels for import. |
+| 3 | Metric calculation | Calculations to unlock filtering and sorting by all quality metrics, and Overview shortcuts. |
+| 4 | Embedding calculation | Calculations to unlock Embeddings and similarity search. 
|
+| 5 | Advanced metric calculation | Calculations to unlock embedding reductions (view and filter) and the more complex metrics. |
+
+
+### Stage 1: Data Import
+
+Import images/video frames and metadata from your Annotate Project into Active in just a few minutes!
+
+Once Stage 1 completes you can:
+
+- View all of the images/video frames in your Project on the _Explorer_ page
+
+- Filter your images/video frames by metadata
+
+- Create Collections based on your data
+
+- Send to Annotate:
+
+  - Send priorities and add and send comments
+
+  - Create Datasets and Projects
+
+You must WAIT for a later stage to:
+
+- View labels/annotations
+
+- Filter by anything except by metadata
+
+- Use [Overview shortcuts](https://docs.encord.com/docs/active-issue-shortcuts-prediction-types)
+
+- Sort your data using any criteria except Random
+
+- Search your data using natural language and image search
+
+- Use Embeddings
+
+- Access the _Analytics_ page
+
+- Access the _Model Evaluation_ page
+
+
+### Stage 2: Label Import
+
+During Stage 2, all image and video frame labels/annotations (instance and frame-level) are imported into your Project. More labels take longer to import. 
+
+Once Stage 2 completes you can:
+
+- Perform all tasks from the previous stages
+
+- View all labels/annotations on your data in the _Explorer_ page
+
+- Filter all data by Class
+
+- Access the Label view on the _Explorer_ page
+
+- Send to Annotate:
+
+  - Reopen tasks
+
+  - Perform bulk Classification
+
+You must WAIT for a later stage to:
+
+- Filter by anything except by metadata and class
+
+- Use [Overview shortcuts](https://docs.encord.com/docs/active-issue-shortcuts-prediction-types)
+
+- Sort your data using any criteria except Random
+
+- Search your data using natural language and image search
+
+- Use Embeddings
+
+- Access the _Analytics_ page
+
+- Access the _Model Evaluation_ page
+
+### Stage 3: Metric Calculation
+
+During Stage 3, Active performs metric calculations for filtering and Overview shortcuts.
+
+Once Stage 3 completes you can:
+
+- Perform all tasks from the previous stages
+
+- Filter your data using any criteria
+
+- Use [Overview shortcuts](https://docs.encord.com/docs/active-issue-shortcuts-prediction-types)
+
+You must WAIT for a later stage to:
+
+- Sort your data using any criteria except Random
+
+- Search your data using natural language and image search
+
+- Use Embeddings
+
+- Access the _Analytics_ page
+
+- Access the _Model Evaluation_ page
+
+
+### Stage 4: Embedding calculation
+
+During Stage 4, Active performs calculations for embeddings and search capabilities.
+
+
+Once Stage 4 completes you can:
+
+- Perform all tasks from the previous stages
+
+- Perform similarity searches for images and frames
+
+- Perform natural language and image searches
+
+- Use Embeddings
+
+- Sort your data using any criteria
+
+
+### Stage 5: Advanced Metric Calculation
+
+Stage 5 completes the import and unlocks all remaining Active features, for example embedding reductions, metrics that depend on embeddings, and filtering on embedding reductions.
+
+
+## Import an Annotate Project
+
+
+To import an Annotate Project:
+
+1. Log in to the Encord platform.
+   The landing page for the Encord platform appears.
+
+2. Create a Project ([Annotation Project](https://docs.encord.com/docs/annotate-annotation-projects) or [Training Project](https://docs.encord.com/docs/annotate-training-projects)) in Encord Annotate.
+
+3. Click **Active** from the main menu.
+   The landing page for Active appears.
+
+4. Click the **Import Annotate Project** button.
+   The _Select an Annotation Project_ dialog appears.
+   ![Start import](https://storage.googleapis.com/docs-media.encord.com/static/img/active/user-guide/import-project_01.png)
+
+5. Click the **Import** button for the Annotate project you want to import.
+   The _Confirm Project Import_ dialog appears. The dialog provides an estimate of how long the import may take.
+   ![Import info](https://storage.googleapis.com/docs-media.encord.com/static/img/active/user-guide/import-project_02.png)
+
+6. Specify the **Sample rate** (in FPS) for the import of videos.
+
+   > ℹ️ Note
+   > Selecting **None** imports the entire video, without modification to the FPS of the video.
+
+7. Click **Confirm**.
+   ![Confirm import](https://storage.googleapis.com/docs-media.encord.com/static/img/active/user-guide/import-project_03.png)
+
+8. Close the _Select an Annotation Project_ dialog.
+   The landing page for Active appears with the progress of the import for the Project. Stage 1 (Data import) of the Project import completes in a few minutes.
+
+9. Click the Project to monitor the status of the import.
+   The landing page for the Project appears with the _Explorer_ tab selected and displaying the stage of the import. 
+ ![Import stages](https://storage.googleapis.com/docs-media.encord.com/static/img/active/user-guide/import-project_04.png) + + ![Import stages progress](https://storage.googleapis.com/docs-media.encord.com/static/img/active/user-guide/import-project_05.png) + +Once all five stages of the import complete, you are ready to [filter, sort, and search data/labels](https://docs.encord.com/docs/active-filtering), [create collections](https://docs.encord.com/docs/active-collections) and optimize your model performance. + +
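For intuition on the **Sample rate** setting in step 6 above, the number of frames a video contributes can be estimated with a simple uniform-sampling model (an illustration of the idea, not Active's exact behaviour):

```python
from typing import Optional

def estimated_frames(duration_s: float, native_fps: float, sample_rate_fps: Optional[float]) -> int:
    """Estimate how many frames a video contributes at a given sample rate.

    A sample rate of None keeps every native frame; otherwise frames are
    sampled at the requested FPS (capped at the native FPS).
    Illustrative model only.
    """
    fps = native_fps if sample_rate_fps is None else min(sample_rate_fps, native_fps)
    return int(duration_s * fps)

# A 60-second video recorded at 30 FPS:
full = estimated_frames(60, 30, None)  # 1800 frames (entire video)
sampled = estimated_frames(60, 30, 1)  # 60 frames at a 1 FPS sample rate
```

Lower sample rates make imports faster and Projects smaller, at the cost of skipping intermediate frames.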
+ + +## Next Steps + +### Data Cleansing/Curation and Label Correction/Validation + +[block:html] +{ + "html": "\n\n\n \n \n Clickable Div\n \n\n\n 2. Create Collection 3. Send to Annotate 4. Sync with Annotate 5. Update Collection\n\n" +} +[/block] + +### Model and Prediction Validation + +[block:html] +{ + "html": "\n\n\n \n \n Clickable Div\n \n\n\n 2. Import Predictions 3. Review Prediction Metrics 4. Create Collection 5. Send to Annotate 6. Sync with Annotate 7. Update Collection\n\n" +} +[/block] \ No newline at end of file diff --git a/docs/active-import/active-import-model-predictions.md b/docs/active-import/active-import-model-predictions.md new file mode 100644 index 000000000..145ad9f3d --- /dev/null +++ b/docs/active-import/active-import-model-predictions.md @@ -0,0 +1,340 @@ +--- +title: "Import model predictions" +slug: "active-import-model-predictions" +hidden: true +metadata: + title: "Import model predictions" + description: "Enhance Encord Active with model predictions. Integrate for visualizations, evaluation, labeling insights. Boost system performance." + image: + 0: "https://files.readme.io/091496f-image_16.png" +createdAt: "2023-07-14T16:16:03.504Z" +updatedAt: "2023-08-11T13:41:50.171Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +**Incorporate model predictions into Encord Active** + +By incorporating machine learning model predictions, Encord Active expands its capabilities to provide visualizations, model evaluation, identification of failure modes, error detection in labeling, prioritization of high-value data for relabeling, and other valuable insights, enhancing the overall performance of the system. + +> ℹ️ Note +> If you possess predictions in the [COCO Results format](https://cocodataset.org/#format-results), you can conveniently navigate to the [Import COCO predictions](#coco-predictions) subsection. + + +To import your model predictions into Encord Active you need to: + +1. [Cover the basics](#covering-the-basics). +2. [Prepare a `.pkl` file to be imported](#prepare-a-pkl-file-to-be-imported). +3. [Import the predictions via the CLI](#import-the-predictions-via-the-cli). + +If you are familiar with [`data_hash`](#uniquely-identifying-data-units) and [`featureNodeHash`](#uniquely-identifying-predicted-classes), you can safely skip to [2. Prepare a `.pkl` File to be Imported](#prepare-a-pkl-file-to-be-imported). Please note that when specifying the `class_id` for a prediction, Encord Active expects the associated `featureNodeHash` from the Encord <> as id. + +> 👍 Tip +> In the [SDK section](https://docs.encord.com/docs/active-sdk-import-predictions), you will also find ways to import predictions [KITTI files](https://docs.encord.com/docs/active-sdk-import-predictions#kitti) and [directories containing mask files](https://docs.encord.com/docs/active-sdk-import-predictions#instance-segmentation-masks). + + +## Covering the basics + +Before diving into the details, there are a couple of things that need to be covered. + +> ℹ️ Note +> All commands used from this point onward assume that the current working directory is the project folder. If it isn't, please either navigate to it or utilize the `--target` option available in each command. 
+ + +### Uniquely identifying data units + +At Encord, every <> has a `data_hash` which uniquely defines it. To view the mapping between the `data_hash` values in your Encord project and the corresponding filenames that were uploaded, execute the following CLI command: + +```shell +encord-active print data-mapping +``` + +Once you have selected the project for which you want to generate the mapping, it will display a JSON object resembling the following structure, consisting of key-value pairs (`data_hash`, `data_file_name`): + +``` +{ + "c115344f-6869-4608-a4b8-644241fea10c": "image_1.jpg", + "5973f5b6-d284-4a71-9e7e-6576aa3e56cb": "image_2.jpg", + "9f4dae86-cad4-42f8-bb47-b179eb2e4886": "video_1.mp4" + ... +} + +``` + +> 👍 Tip +> To store the data mapping as `data_mapping.json` in the current working directory, run: +> +> ```shell +> encord-active print --json data-mapping +> ``` + +Please note that in the case of image groups, each individual image within the group has its own unique `data_hash`, whereas videos have a single `data_hash` representing the entire video. As a consequence, predictions for videos will also need a `frame` to uniquely define where the prediction belongs. + +> 🚧 Caution +> When you are preparing predictions for import, you need to have the `data_hash` and potentially the `frame` available. + + +### Uniquely identifying predicted classes + +The second thing you will need during the preparation of predictions for import, is the `class_id` for each prediction. The `class_id` tells Encord Active which class the prediction is associated with. + +The `class_id` values in an Encord project are determined by the `featureNodeHash` attribute associated with labels in the Encord <>. 
You can conveniently print the class names and corresponding `class_id` values of your project ontology via the CLI: + +```shell +encord-active print ontology +``` + +Once you have selected the project for which you want to generate the mapping, it will display a JSON object resembling the following structure, consisting of key-value pairs (`label_name`, `class_id`): + +```json +{ + "objects": { + "cat": "OTK8MrM3", + "dog": "Nr52O8Ex", + "horse": "MjkXn2Mx" + }, + "classifications": {...} +} +``` + +As classifications with nested and/or checklist attributes (e.g. `has a dog? yes/no -> explain why?`) are represented once for each attribute answer, it's necessary to uniquely identify each <> and corresponding answer. This requires utilizing the respective <>, attribute and option hashes from the <>. + +```json +{ + "objects": {...}, + "classifications": { + "horses": { + "feature_hash": "55eab8b3", + "attribute_hash": "d446851e", + "option_hash": "376b9761" + }, + "cats": { + "feature_hash": "55eab8b3", + "attribute_hash": "d446851e", + "option_hash": "d8e85460" + }, + "dogs": { + "feature_hash": "55eab8b3", + "attribute_hash": "d446851e", + "option_hash": "e5264a59" + } + } +} +``` + +> 👍 Tip +> To store the ontology as `ontology_output.json` in the current working directory, run: +> +> ```shell +> encord-active print --json ontology +> ``` + + +## Prepare a `.pkl` file to be imported + +Now, you can prepare a pickle file (`.pkl`) to be imported by Encord Active. You can do this by building a list of `Prediction` objects. A prediction object holds a unique identifier of the <> (the `data_hash` and potentially `frame`), the `class_id`, a model `confidence` score, the actual prediction `data`, and the `format` of that data. + +### Creating a `Prediction` label + +Below are examples illustrating how to create a label. Click a section to expose the details for each of the four supported types. + +
+Classification
+
+```python
+from encord_active.lib.db.predictions import FrameClassification, Prediction
+
+prediction = Prediction(
+    data_hash="",
+    frame = 3, # optional frame for videos
+    confidence = 0.8,
+    classification=FrameClassification(
+        feature_hash="",
+        # highlight-start
+        attribute_hash="",
+        option_hash="",
+        # highlight-end
+    ),
+)
+```
+
+> 👍 Tip
+> To find the three hashes, we can inspect the ontology by running:
+>
+> ```shell
+> encord-active print ontology
+> ```
+ + +
+Bounding Box + +You should specify your `BoundingBox` with relative coordinates and dimensions. +That is: + +- `x`: x-coordinate of the top-left corner of the box divided by the image width +- `y`: y-coordinate of the top-left corner of the box divided by the image height +- `w`: box pixel width / image width +- `h`: box pixel height / image height + +```python +from encord_active.lib.db.predictions import BoundingBox, Prediction, Format, ObjectDetection + +prediction = Prediction( + data_hash="", + frame = 3, # optional frame for videos + confidence = 0.8, + object=ObjectDetection( + feature_hash="", + # highlight-start + format = Format.BOUNDING_BOX, + data = BoundingBox(x=0.2, y=0.3, w=0.1, h=0.4), + # highlight-end + ), +) +``` + +> 👍 Tip +> If you don't have your <> represented in relative terms, you can convert it from pixel values like this: +> +> ```python +> img_h, img_w = 720, 1280 # the image size in pixels +> BoundingBox(x=10/img_w, y=25/img_h, w=200/img_w, h=150/img_h) +> ``` + +
+ + +
+Segmentation mask
+
+You should specify masks as binary `numpy` arrays of size $height \times width$ and `dtype=np.uint8`.
+
+```python
+from encord_active.lib.db.predictions import Prediction, Format, ObjectDetection
+
+prediction = Prediction(
+    data_hash = "",
+    frame = 3, # optional frame for videos
+    confidence = 0.8,
+    object=ObjectDetection(
+        feature_hash="",
+        # highlight-start
+        format = Format.MASK,
+        data = mask # binary numpy array of shape (height, width), dtype=np.uint8
+        # highlight-end
+    ),
+)
+```
+
+ +
+Polygon
+
+You should specify your `Polygon` with relative coordinates as a `numpy` array of shape $num\_points \times 2$. That is, an array of relative (`x`, `y`) coordinates:
+
+- `x`: relative x-coordinate of each point of the polygon (pixel coordinate / image width).
+- `y`: relative y-coordinate of each point of the polygon (pixel coordinate / image height).
+
+```python
+from encord_active.lib.db.predictions import Prediction, Format, ObjectDetection
+import numpy as np
+
+polygon = np.array([
+    #  x    y
+    [0.2, 0.1],
+    [0.2, 0.4],
+    [0.3, 0.4],
+    [0.3, 0.1],
+])
+
+prediction = Prediction(
+    data_hash = "",
+    frame = 3, # optional frame for videos
+    confidence = 0.8,
+    object=ObjectDetection(
+        feature_hash="",
+        # highlight-start
+        format = Format.POLYGON,
+        data = polygon
+        # highlight-end
+    ),
+)
+```
+
+> 👍 Tip
+> If you have your <> represented in absolute terms of pixel locations, you can convert it to relative terms like this:
+>
+> ```python
+> img_h, img_w = 720, 1280 # the image size in pixels
+> polygon = polygon / np.array([[img_w, img_h]])
+> ```
+
+Notice the double braces `[[img_w, img_h]]` to get an array of shape `[1, 2]`.
+
+ +### Creating the pickle file + +Now you are ready to create the pickle file. You can select the appropriate snippet based on your prediction format from above and paste it in the code below. + +Pay attention to the highlighted line, as it specifies the location where the `.pkl` file will be stored. + +```python showLineNumbers +import pickle +from encord_active.lib.db.predictions import Prediction, Format + +predictions_to_store = [] + +for prediction in my_predictions: # Iterate over your predictions + predictions_to_store.append( + # PASTE appropriate prediction snippet from above + ) + +# highlight-next-line +with open("/path/to/predictions.pkl", "wb") as f: + pickle.dump(predictions_to_store, f) +``` + +In the above code snippet, you will have to fetch `data_hash`, `class_id`, etc., from the for loop in line 5. + +## Import the predictions via the CLI + +To import the predictions into Encord Active, execute the following command in the CLI: + +```shell +encord-active import predictions /path/to/predictions.pkl +``` + +This will import your predictions into Encord Active and run all the [metrics](https://docs.encord.com/docs/active-quality-metrics) on your predictions. + +## Easy imports + +Encord Active streamlines the import of well-known model prediction formats allowing for easy integration of diverse model types into the system. + +The following subsections outline simplified methods to import popular formats, bypassing the previous 3-step process. + +### COCO predictions + +> ℹ️ Note +> Make sure you have installed Encord Active with the `coco` [extras](https://docs.encord.com/docs/active-oss-install#coco-extras). +> +> This command assumes that you have imported your project using the [COCO importer](https://docs.encord.com/docs/active-cli#project) and that the current working directory is the project folder. + +Importing COCO predictions is currently the easiest way to import predictions into Encord Active. 
+ +You need to have a results JSON file following the [COCO results format](https://cocodataset.org/#format-results) and run the following command on it: + +```shell +encord-active import predictions --coco results.json +``` + +> ℹ️ Note +> Make sure that the annotation coordinates in the COCO results file are not normalized (not scaled into [0-1]). + +After the execution is done, you are ready to [evaluate your model performance](https://docs.encord.com/docs/active-evaluate-detection-models). \ No newline at end of file diff --git a/docs/active-import/active-quick-import.md b/docs/active-import/active-quick-import.md new file mode 100644 index 000000000..bbe378781 --- /dev/null +++ b/docs/active-import/active-quick-import.md @@ -0,0 +1,105 @@ +--- +title: "Quick import data & labels" +slug: "active-quick-import" +hidden: false +createdAt: "2023-07-21T14:06:06.472Z" +updatedAt: "2023-07-31T11:23:32.263Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +--- + +**Create a project using images from a dataset directory of your choice** + + +If you already have an image dataset stored locally, you can initialize a project from that dataset using the `init` command. This command will automatically execute all the built-in metrics on your data, setting up the project accordingly. + +The main argument is the path to the local dataset directory. + +```shell +encord-active init /path/to/dataset +``` + +> 👍 Tip +> To simulate the creation of a project without actually performing any action, use the `--dryrun` option. +> +> ```shell +> encord-active init --dryrun /path/to/dataset +> ``` +> +> This option provides a detailed list of all the files that would be included in the project, along with a summary. It allows you to verify the project content and ensure that everything is set up correctly before proceeding. + +There are various options available to customize the initialization of your project according to your specific requirements. For a comprehensive list of these options, please refer to the [Command Line Interface](https://docs.encord.com/docs/active-cli#init) (CLI) documentation. + +## Including labels + +If you want to include labels as well, this is also an option. To do so, you'll have to define how to parse your labels by implementing the [`LabelTransformer`](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/labels/label_transformer.py#L61-L79) interface. + +```python +from pathlib import Path +from typing import List + +from encord_active.lib.labels.label_transformer import ( + BoundingBoxLabel, + ClassificationLabel, + DataLabel, + LabelTransformer, + PolygonLabel +) + + +class MyTransformer(LabelTransformer): + def from_custom_labels(self, label_files: List[Path], data_files: List[Path]) -> List[DataLabel]: + # your implementation goes here + ... +``` + +Here is an example of inferring classifications from the file structure of the images. 
+Let's say you have your images stored in the following structure: + +``` +/path/to/data_root +├── cat +│   ├── 0.jpg +│   └── ... +├── dog +│   ├── 0.jpg +│   └── ... +└── horse +    ├── 0.jpg +    └── ... +``` + +Your implementation would look similar to: + +```python +# classification_transformer.py +from pathlib import Path +from typing import List + +from encord_active.lib.labels.label_transformer import ( + ClassificationLabel, + DataLabel, + LabelTransformer, +) + + +class ClassificationTransformer(LabelTransformer): + def from_custom_labels(self, _, data_files: List[Path]) -> List[DataLabel]: + return [DataLabel(f, ClassificationLabel(class_=f.parent.name)) for f in data_files] +``` + +And the CLI command: + +```shell +encord-active init --transformer classification_transformer.py /path/to/data_root +``` + +> 👍 Tip +> More concrete examples for bounding boxes, polygons and other label types are included in our [example directory](https://github.com/encord-team/encord-active/blob/main/examples/label-transformers) on GitHub. \ No newline at end of file diff --git a/docs/active-learning.md b/docs/active-learning.md new file mode 100644 index 000000000..031ea6c22 --- /dev/null +++ b/docs/active-learning.md @@ -0,0 +1,33 @@ +--- +title: "Active learning" +slug: "active-learning" +hidden: true +metadata: + title: "Active Learning" + description: "Optimize model training with Encord Active's active learning tools. Efficient labeling, improved accuracy. Learn more about active learning." + image: + 0: "https://files.readme.io/ae0744a-image_16.png" +createdAt: "2023-07-11T17:05:59.696Z" +updatedAt: "2023-08-09T14:53:21.123Z" +category: "6480a3981ed49107a7c6be36" +--- +The annotation process can sometimes be extensively time-consuming and expensive. While images and videos can often be scraped, or even taken automatically, labeling for tasks like segmentation and motion detection is laborious. 
Some domains, such as medical imaging, require domain knowledge from experts with limited accessibility. + +When the unlabeled data is abundant, wouldn't it be nice if you could pick out the 5% of samples most useful to your model, rather than labeling large swathes of redundant data points? This is the idea behind active learning. + +**Encord Active** provides you with the tools to take advantage of the active learning method - and is integrated with **Encord Annotate** to deliver the best annotation experience. + +If you are already familiar with the active learning foundation, continue your read with an exploration of [Encord Active's acquisition functions](https://docs.encord.com/docs/active-model-quality-metrics#acquisition-functions) and our end-to-end tutorial about [easy active learning on MNIST](https://docs.encord.com/docs/active-easy-active-learning-mnist). + +## What is active learning? + +Active learning is an iterative process where a [machine learning model](https://encord.com/blog/introduction-to-building-your-first-machine-learning) is used to select the best examples to be labeled next. After annotation, the model is retrained on the new, larger dataset, then selects more data to be labeled until reaching a stopping criterion. This process is illustrated in the figure below. + +
+ +![active-learning-cycle.svg](https://storage.cloud.google.com/docs-media.encord.com/static/img/images/active-learning/active-learning-cycle.svg) + +
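In code, the loop in the figure can be sketched as follows. This is a deliberately tiny, self-contained toy (a 1-D threshold classifier with a scripted oracle standing in for the human annotator); none of it is Encord Active API:

```python
import random

# Toy sketch of the active learning loop: retrain, acquire the most
# uncertain samples, "annotate" them, repeat. Purely illustrative.

def train(labeled):
    # Fit a decision threshold halfway between the highest x labeled 0
    # and the lowest x labeled 1.
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    return (max(zeros) + min(ones)) / 2

def acquire(threshold, unlabeled, k):
    # Uncertainty sampling: the samples closest to the decision
    # boundary are the ones the model is least sure about.
    return sorted(unlabeled, key=lambda x: abs(x - threshold))[:k]

oracle = lambda x: int(x >= 0.6)   # stands in for the human annotator
random.seed(0)
unlabeled = [random.random() for _ in range(100)]
labeled = [(0.1, 0), (0.9, 1)]     # small initial labeled set

for _ in range(5):                 # five active learning rounds
    threshold = train(labeled)
    batch = acquire(threshold, unlabeled, k=5)
    labeled += [(x, oracle(x)) for x in batch]     # annotation step
    unlabeled = [x for x in unlabeled if x not in batch]

print(round(train(labeled), 2))    # converges toward the true boundary, 0.6
```

With only 27 labels the toy model pins down the decision boundary, because every labeling round was spent near it; random sampling would have wasted most of the budget on easy, redundant points.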
+ + +Check out our [practical guide to active learning for computer vision](https://encord.com/blog/a-practical-guide-to-active-learning-for-computer-vision/) to learn more about active learning, its tradeoffs, alternatives and a comprehensive explanation on active learning pipelines. \ No newline at end of file diff --git a/docs/active-learning/active-acquisition-functions.md b/docs/active-learning/active-acquisition-functions.md new file mode 100644 index 000000000..2c52fa667 --- /dev/null +++ b/docs/active-learning/active-acquisition-functions.md @@ -0,0 +1,107 @@ +--- +title: "Acquisition functions" +slug: "active-acquisition-functions" +hidden: true +metadata: + title: "Acquisition functions" + description: "Enhance model training with Encord Active's diverse acquisition functions: uncertainty-based & diversity-based strategies. Optimize active learning." + image: + 0: "https://files.readme.io/b6ed5cd-image_16.png" +createdAt: "2023-08-01T15:17:23.578Z" +updatedAt: "2023-08-09T11:44:29.536Z" +category: "6480a3981ed49107a7c6be36" +--- + +We want you to select the data samples that will be the most informative to your model, so a natural approach would be to score each sample based on its predicted usefulness for training. Since labeling samples is usually done in batches, you could take the top _k_ scoring samples for annotation. This type of function, that takes an unlabeled data sample and outputs its score, is called _acquisition function_. + +## Uncertainty-based acquisition functions + +In **Encord Active**, we employ the _uncertainty sampling_ strategy where we score data samples based on the uncertainty of the model predictions. The assumption is that samples the model is unconfident about are likely to be more informative than samples for which the model is very confident about the label. 
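To make the idea concrete, here is a minimal sketch (plain NumPy, not Encord Active code) of how such uncertainty scores can be computed from a model's softmax outputs:

```python
import numpy as np

# Minimal sketch: uncertainty scores computed from an
# (n_samples, n_classes) array of softmax probabilities.

def least_confidence(probs):
    # 1 minus the probability of the most likely class.
    return 1.0 - probs.max(axis=1)

def margin(probs):
    # Gap between the two highest class probabilities; a small
    # margin means the model is torn between two labels.
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

def entropy(probs):
    # Shannon entropy of the predicted distribution.
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

probs = np.array([[0.90, 0.05, 0.05],   # a confident prediction
                  [0.40, 0.35, 0.25]])  # an uncertain prediction
print(least_confidence(probs))  # the second sample scores as more uncertain
```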
+
+We include the following uncertainty-based acquisition functions:
+
+- [Least confidence](https://docs.encord.com/docs/active-model-quality-metrics#least-confidence)
+
+  $U(x) = 1 - \max_y P(y \mid x)$
+
+- [Margin](https://docs.encord.com/docs/active-model-quality-metrics#margin)
+
+  $M(x) = P(\hat{y}_1 \mid x) - P(\hat{y}_2 \mid x)$, where $\hat{y}_1$ and $\hat{y}_2$ are the first and second-highest predicted labels.
+
+- [Variance](https://docs.encord.com/docs/active-model-quality-metrics#variance)
+
+  $V(x) = \mathrm{Var}\left(P(y \mid x)\right)$, the variance of the predicted class probabilities (a low variance indicates a near-uniform, i.e. uncertain, prediction).
+
+- [Entropy](https://docs.encord.com/docs/active-model-quality-metrics#entropy)
+
+  $H(x) = -\sum_y P(y \mid x) \log P(y \mid x)$
+
+- [Mean object confidence](https://docs.encord.com/docs/active-model-quality-metrics#mean-object-confidence)
+
+  The average confidence score over all objects predicted in the image.
+
+> 👍 Tip
+> Follow the links provided for each acquisition function for a detailed explanation, including alternative formulas, guidance on interpreting the output scores, and the implementation on GitHub.
+
+
+> 🚧 Caution
+> In the following scenarios, uncertainty-based acquisition functions must be used with extra care:
+> - Softmax outputs from deep networks are often not calibrated and tend to be quite overconfident.
+> - For convolutional neural networks, small, seemingly meaningless perturbations in the input space can completely change predictions.
+
+
+## Diversity-based acquisition functions
+
+Diversity sampling is an active learning strategy that aims to ensure that the labeled training set includes a broad range of examples from across the input space. The underlying assumption is that a diverse set of training examples allows the model to learn a more generalized representation, improving its performance on unseen data.
+
+In contrast to uncertainty-based methods, which prioritize examples that the model finds difficult to classify, diversity-based methods prioritize examples based on their novelty or dissimilarity to examples that are already in the training set. This can be particularly useful when the input space is large and the distribution of examples is uneven.
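As a concrete illustration of the idea, here is a minimal sketch (not Encord Active code) of greedy farthest-point sampling, one classic way to select a diverse subset from image embeddings:

```python
import numpy as np

# Illustrative sketch only: greedy farthest-point sampling. Each step
# adds the point with the largest distance to the already selected set,
# so the selection spreads out across the embedding space.

def farthest_point_sample(embeddings, k):
    chosen = [0]  # start from an arbitrary point
    dists = np.linalg.norm(embeddings - embeddings[0], axis=1)
    while len(chosen) < k:
        nxt = int(dists.argmax())  # most novel remaining point
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return chosen

# Two tight clusters plus one outlier: the sample spreads across all three.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 9.0]])
print(farthest_point_sample(emb, 3))  # [0, 4, 3]
```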
+
+We include the following diversity-based acquisition function:
+
+- [Image Diversity](https://docs.encord.com/docs/active-data-quality-metrics#image-diversity)
+
+This metric clusters the images according to the number of classes in the <> file. Then, it chooses samples from each cluster one-by-one to form an equal number of samples from each cluster. Samples are chosen according to their proximity to the cluster centroids (closer samples are chosen first).
+
+> 👍 Tip
+> Diversity-based acquisition functions are generally easier to use than uncertainty-based functions because they may not require an ML model. See the [Running Diversity Based Acquisition Function on Unlabeled Data](https://docs.encord.com/docs/active-diversity-sampling-on-unlabeled-data) tutorial to learn how to use them in your project.
+
+
+## Which acquisition function should I use?
+
+_“Ok, I have this list of acquisition functions now, but which one is the best? How do I choose?”_
+
+This isn’t a question with an easy answer - it heavily depends on your problem, your data, your model, your labeling budget, your goals, etc. This choice can be crucial to your results, and comparing multiple acquisition functions during the active learning process is not always feasible.
+
+Simple uncertainty measures like the least confidence score, margin score, and entropy make good first considerations.
+
+> 👍 Tip
+> If you’d like to talk to an expert on the topic, the Encord ML team can be found in the #general channel in our Encord Active [Slack community](https://join.slack.com/t/encordactive/shared_invite/zt-1hc2vqur9-Fzj1EEAHoqu91sZ0CX0A7Q).
+
+## How can I utilize the chosen acquisition function?
+
+Explore the [Easy Active Learning on MNIST](https://docs.encord.com/docs/active-easy-active-learning-mnist) tutorial for a quick overview using a well-known example dataset.
+
+### Tutorials
+
+- [Easy Active Learning on MNIST](https://docs.encord.com/docs/active-easy-active-learning-mnist): A quick overview of the acquisition functions using a well-known example dataset.
+- [Diversity sampling without an ML model](https://docs.encord.com/docs/active-diversity-sampling-on-unlabeled-data): Using diversity sampling to rank images without training any model.
+- [Selecting hard samples for object detection](https://github.com/encord-team/encord-active/blob/main/examples/active%20learning/object-detection/select-hard-samples-to-annotate.ipynb): A Jupyter notebook that runs acquisition functions using an object detector.
diff --git a/docs/active-learning/active-in-encord.md b/docs/active-learning/active-in-encord.md
new file mode 100644
index 000000000..c7ef4072b
--- /dev/null
+++ b/docs/active-learning/active-in-encord.md
@@ -0,0 +1,92 @@
+---
+title: "In Encord"
+slug: "active-in-encord"
+hidden: true
+metadata:
+  title: "In Encord"
+  description: "Set up active learning process in Encord. Streamline workflow stages: Initialization, high-value data prioritization, model training. Optimize learning."
+  image:
+    0: "https://files.readme.io/657055d-image_16.png"
+createdAt: "2023-07-11T16:27:41.851Z"
+updatedAt: "2023-08-09T13:52:08.839Z"
+category: "6480a3981ed49107a7c6be36"
+---
+**Learn how to set up the components of your active learning process in Encord**
+
+> ℹ️ Note
+> Active learning workflows in the Encord platform are specifically designed for [workflow projects](https://docs.encord.com/docs/annotate-annotation-projects). This requirement allows for seamless task movement between essential stages such as `label`, `review` and `complete` when utilizing the SDK.
+
+
+Active learning workflows in Encord Active share the following key stages:
+
+1. [Initialization](#initialization).
+2. [Prioritizing high-value data to label](#prioritize-high-value-data-to-label).
+3. [Model training and update](#model-training-and-update).
+ +If you prefer to witness an active learning workflow in action, please take a look at the end-to-end tutorial for [MNIST](https://docs.encord.com/docs/active-easy-active-learning-mnist). + +## Initialization +To start an active learning workflow, you need an initial labeled dataset for training the machine learning model. In the Encord platform, this corresponds to having a project with annotations. + +If you don't have any projects yet, you can watch the tutorial video on setting up a [workflow project](https://docs.encord.com/docs/annotate-workflows-and-templates) to get started quickly. + +### Choose an Encord project + +To proceed, you should pull the project into Encord Active. Execute the following CLI command and remember to acknowledge that you would like to include uninitialized label rows, as they represent unannotated data. + +```bash +encord-active import project +``` + +If you require detailed information on the options available during the import process, you can refer to the [Import from Encord platform](https://docs.encord.com/docs/active-import-encord-project) guide. + +If your workflow project already contains annotations, you can proceed directly to [Model training and update](#model-training-and-update). + +[//]: # (Alternatively, if you have a trained model that you intend to use in the active learning workflow, you can jump to the [Querying](#querying) step.) + +## Prioritize high value data to label + +If your project does not have any annotations or you are seeking the most appropriate data for labeling, it's essential to score and rank your data. +While random selection is a possibility, Encord Active provides metrics such as [`Image Diversity`](https://docs.encord.com/docs/active-data-quality-metrics#image-diversity) to enhance and optimize annotation impact. +This metric ranks images based on their ease of annotation, enabling prioritization of suitable and manageable data. 
+
+
+> 👍 Tip
+> Check out the [quality metrics page](https://docs.encord.com/docs/active-quality-metrics) for a comprehensive overview of the available metrics in Encord Active, including the [acquisition functions](https://docs.encord.com/docs/active-acquisition-functions) used for sample selection.
+
+
+For example, you can follow these steps to prioritize labeling for data with the lowest `Image Diversity` score using the UI:
+1. In the _Data Quality_ explorer page, navigate to the toolbox and click on the _Filter_ tab.
+2. Select the option that corresponds to the first labeling stage (usually named `Annotate 1`) under the `Workflow Stage` metadata filter to pick the unannotated data.
+3. Add the `Image Diversity` filter and adjust the slider to select a subset of data with the lowest score.
+4. Access the _Action_ tab in the toolbox.
+5. Click on the 🖋 Relabel button and follow the instructions to [prioritize labeling](https://docs.encord.com/docs/active-relabeling) for the selected data.
+
+
+[//]: # (todo remove this info section once Encord Annotate provides task prioritization to the public)
+> ℹ️ Note
+> Task prioritization for labeling is currently a closed-beta feature in the Encord platform. To learn more about this feature, please reach out to us on [Slack][slack-join] or via [email](mailto:active@encord.com).
+>
+> Nevertheless, to mimic the behavior of task prioritization in projects with only single images, you can follow these steps:
+> 1. In the _Filter_ tab, select the option that corresponds to the first labeling stage (usually named `Annotate 1`) under the `Workflow Stage` metadata filter to pick the data ready to be labeled.
+> 2. Use the [bulk tagging](https://docs.encord.com/docs/active-tagging#bulk-tagging) feature to mark them with a data tag, such as `unlabeled`.
+> 3. Add the `Image Diversity` filter and adjust the slider to select a subset of the filtered data with the lowest score.
+> 4. Use the bulk tagging feature to mark this further selection with a data tag, such as `to label next`.
+> 5. Reset the filters and choose the `unlabeled` tag option under `Tags`.
+> 6. Access the _Action_ tab in the toolbox, click on the ✅ Mark as Complete button, and follow the instructions to temporarily move all the selected data to the workflow's `Complete` stage.
+> 7. Return to the _Filter_ tab, reset the filters and choose the `to label next` tag option under `Tags`.
+> 8. Access the _Action_ tab in the toolbox again, click on the 🖋 Relabel button and follow the instructions to move the selected data to the workflow's first annotation stage.
+> 9. Once the selected data has been labeled, use the following filter combination to bring back the remaining data from the `Complete` stage to the first labeling stage as in step (8):
+> select the `No class` option under `Object Class` and choose the proper tag name (e.g. `unlabeled`) option under `Tags`.
+>
+> By following these steps, you can ensure that the first labeling stage contains only the prioritized data for labeling, and that the task states at the end align with the flow that utilizes the task prioritization feature.
+
+
+## Model training and update
+
+In the active learning workflow, model training plays a crucial role. It involves training a machine learning model using the initial labeled dataset and iteratively updating it with newly labeled data. Encord Active provides support for a wide range of models by allowing you to plug in your own model and interface with it using convenient wrappers.
+
+More information can be found in the [MNIST tutorial](https://docs.encord.com/docs/active-easy-active-learning-mnist#model-training).
+
+
+[slack-join]: https://join.slack.com/t/encordactive/shared_invite/zt-1hc2vqur9-Fzj1EEAHoqu91sZ0CX0A7Q
diff --git a/docs/active-os-faq.md b/docs/active-os-faq.md
new file mode 100644
index 000000000..b15bf9db0
--- /dev/null
+++ b/docs/active-os-faq.md
@@ -0,0 +1,232 @@
+---
+title: "Frequently asked questions"
+slug: "active-os-faq"
+hidden: false
+metadata:
+  title: "Frequently asked questions"
+  description: "FAQs: Get answers to common queries on Encord Active OS. Streamline information access."
+category: "65a71bbfea7a3f005192d1a7"
+---
+
+## What is the difference between Encord Active Cloud and Encord Active OS?
+
+Active Cloud is tightly integrated with Encord Annotate, with Active Cloud and Annotate being hosted by Encord. Encord Active OS is an open source toolkit that can be installed on a local computer/server.
+
+Encord Active Cloud and Encord Active OS (open source) are active learning solutions that help you find failure modes in your models, and improve your data quality and model performance.
+
+Use Active Cloud and Active OS to visualize your data, evaluate your models, surface model failure modes, find labeling mistakes, prioritize high-value data for relabeling and more!
+
+---
+
+## Are Annotate and Active Projects the same?
+
+The short answer is no. Here are the differences:
+
+- Annotate Projects are made of Datasets (data), Ontologies, and Workflows. Once Annotation Projects get underway, labels and comments also become part of the Annotate Projects.
+
+- Active Cloud is integrated with the Encord Platform, which by extension means Active Cloud is also integrated with Annotate. Active Cloud Projects import portions of Annotate Projects.
Active Cloud projects import Annotate Project data, Ontologies, labels, and comments but not Workflows.
+
+- Active OS is not integrated with the Encord Platform, but still contains some portions of Annotate Projects. Active OS Projects consist of Annotate data and labels.
+
+---
+
+## What is a quality metric?
+
+A quality metric is a function that can be applied to your data, labels, and model predictions to assess their quality and rank them accordingly.
+
+Encord Active uses these metrics to analyze and decompose your data, labels, and predictions.
+
+Here is a [blog](https://encord.com/blog/ai-production-gap/) post on how we like to think about quality metrics.
+
+To learn more about [quality metrics, go here](https://docs.encord.com/docs/active-quality-metrics).
+
+Quality metrics are not limited to those that ship with Encord Active. In fact, the power lies in defining your own quality metrics for indexing your data just right. [Here](https://docs.encord.com/docs/active-write-custom-quality-metrics) is the documentation for writing your own metrics.
+
+---
+
+## How do I write my own quality metrics?
+
+[See our documentation](https://docs.encord.com/docs/active-write-custom-quality-metrics) on writing your own metrics.
+
+---
+
+## How do I use Encord Active for active learning?
+
+Encord Active supports the active learning process by allowing you to
+
+1. Explore your data to select what to label next
+2. Employ acquisition functions to automatically select what to label next
+3. Find label errors that potentially harm your model performance
+4. Send data to Encord's Annotation module for labeling
+5. Automatically decompose your model performance to help you determine where to put your focus for the next model iteration
+6. Tag subsets of data to set aside test sets for specific edge cases where you want to maintain your model performance from one production model to the next
+
+For detailed information on active learning and the role of Encord Active, you can refer to our documentation on [Active Learning within Encord Active](https://docs.encord.com/docs/active-learning).
+
+---
+
+[//]: # (TODO open this question with an answer to a feature that exists)
+[//]: # (## How do I upload my data and labels to Encord Annotate?)
+
+[//]: # ()
+[//]: # (Uploading your data to Encord Annotate is as simple as clicking the _Export to Encord_ button on the [Filter and Export](https://docs.encord.com/docs/active-exporting) page. This will create an ontology, a dataset, and a project on Encord Annotate as well as provide you with links to all three.)
+
+[//]: # ()
+[//]: # ()
+[//]: # (This action requires an ssh-key associated with Encord Active:)
+
+[//]: # ()
+[//]: # (1. [Add your public ssh key](https://docs.encord.com/docs/annotate-public-keys#set-up-public-key-authentication) to Encord Annotate)
+
+[//]: # ()
+[//]: # (2. Associate the private key with Encord Active.)
+
+[//]: # ()
+[//]: # ( ```shell)
+
+[//]: # ( encord-active config set ssh-key-path /path/to/private/key)
+
+[//]: # ( ```)
+
+[//]: # ()
+[//]: # (---)
+
+---
+
+## How do I import my model predictions?
+
+To import your model predictions into Encord Active OS, perform the following steps:
+
+1. Build a list of `encord_active.lib.db.predictions.Prediction` objects that represent your model predictions.
+2. Store the list of predictions in a pickle file.
+3. Run the command `encord-active import predictions /path/to/your/file.pkl`, where `/path/to/your/file.pkl` is the path to the pickle file containing your predictions.
+
+By executing this command, Encord Active imports and incorporates your model predictions into the project.
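A minimal sketch of steps 2 and 3 (building the `Prediction` objects in step 1 depends on your label type, so it is only indicated by a placeholder comment here):

```python
import pickle

# Step 1: build the list of predictions. Each entry should be an
# encord_active.lib.db.predictions.Prediction object; its constructor
# arguments depend on your label type, so they are not shown here.
predictions = []

# Step 2: store the list in a pickle file.
with open("predictions.pkl", "wb") as f:
    pickle.dump(predictions, f)

# Step 3, from the shell:
#   encord-active import predictions predictions.pkl
```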
You can refer to the [workflow description](https://docs.encord.com/docs/active-import-model-predictions) for more detailed instructions on importing model predictions.
+
+---
+
+## How can I do dataset management with Encord Active?
+
+Dataset management can be done in Active OS in the following ways:
+
+- You can [tag](https://docs.encord.com/docs/active-tagging) your data to keep track of subsets (or versions) of your dataset.
+
+- If you are planning to make more involved changes to your dataset and you want the ability to go back, your best option is to use the Clone button in the _Action_ tab of the toolbox in the application's explorer pages.
+
+---
+
+## How do I version my data and labels through Encord Active?
+
+While you can version your data and labels using Active, Annotate supports [data versioning](https://docs.encord.com/docs/annotate-annotation-projects#save-label-version) and [label versioning](https://docs.encord.com/docs/annotate-annotation-projects#saved-versions).
+
+The best way to version your project in Active OS is to tag your data with the [tags](https://docs.encord.com/docs/active-tagging) feature as you go.
+
+Alternatively, you can use `git`. To do that, we suggest adding a `.gitignore` file with the following content:
+
+```gitignore
+data/**/*.jpg
+data/**/*.jpeg
+data/**/*.png
+data/**/*.tiff
+data/**/*.mp4
+```
+
+After that, run the following: `git add .; git commit -am "Initial commit"`.
+
+---
+
+## How do I use Encord Active to find label errors?
+
+[See this blog post](https://encord.com/blog/find-and-fix-label-errors-tutorial/) on finding and fixing label errors using Encord Active OS.
+
+---
+
+## Can I use Encord Active without an Encord account?
+
+Yes, you can use Encord Active OS without an Encord account. Encord Active OS is an open source project aimed at supporting all computer vision-based active learning projects.
For example:
+- Use the [`init`](https://docs.encord.com/docs/active-cli#init) command to initialize a project from an image directory.
+- [Import a COCO project](https://docs.encord.com/docs/active-import-coco-project).
+
+Please see our [import documentation](https://docs.encord.com/docs/active-import-coco-project) for more details and the options available to you.
+
+---
+
+## Does data stay within my local environment?
+
+**Yes!**
+
+When using Active OS, everything you do with the library stays within your local machine. No statistics, data, or other information is collected or sent elsewhere.
+
+The only communication that occurs with the outside world is with Encord's main platform - if you have a project linked to Encord.
+
+---
+
+## What do I do if I have issues with the installation of Active OS?
+
+If you encounter any issues during the Active OS installation process, we recommend checking that you have followed the steps outlined in the [installation guide](https://docs.encord.com/docs/active-oss-install) carefully.
+
+If the problem persists or if you have any further questions, please don't hesitate to get in touch with us via [Slack][slack-join] or [email](mailto:active@encord.com). We'll be happy to assist you with any installation-related issues you may have.
+
+---
+
+## How do I add my own embeddings?
+
+Please see this [notebook](https://colab.research.google.com/github/encord-team/encord-active/blob/main/examples/adding-own-custom-embeddings.ipynb) to learn how to add your own custom embeddings using Active OS.
+
+---
+
+## What is the tagging feature in Active OS?
+
+In the Active OS UI, throughout the Data Quality, <> Quality, and Model Quality pages, you can tag your data.
+There are two different levels at which you can tag data: the data level, which applies to the raw images/video frames, and the label level, which applies to the <>s and objects associated with each image.
+ +You can, for example, use tags to [filter](https://docs.encord.com/docs/active-filtering) your data for further processing like relabeling, training models, or inspecting model performance based on a specific subset of your data. + +[Here](https://docs.encord.com/docs/active-tagging) is some more documentation on using the tagging feature. + +--- + +## What should I do if I encounter an error? + +If you come across an error, don't worry! We're here to assist you. +Reach out to us on [Slack][slack-join] or shoot us an [email](mailto:active@encord.com), and we'll promptly address your concern. + +Additionally, we greatly appreciate it if you could [report the issue](https://github.com/encord-team/encord-active/issues/new/choose) on GitHub. Your feedback and bug reports help us improve Encord Active for everyone. + + +[ea-lib]: https://github.com/encord-team/encord-active/tree/main/src/encord_active/lib +[ea-server]: https://github.com/encord-team/encord-active/tree/main/src/encord_active/server +[slack-join]: https://join.slack.com/t/encordactive/shared_invite/zt-1hc2vqur9-Fzj1EEAHoqu91sZ0CX0A7Q + +--- + +## How does Encord Active OS integrate without the Encord Platform? + +There are multiple ways in which you can integrate your data with Encord Active. We have described how to import data [here](https://docs.encord.com/docs/active-import). To integrate model predictions, you can read more [here](https://docs.encord.com/docs/active-import-model-predictions). + +Exporting data back into the rest of your pipeline can be done using the toolbox in the application's explorer pages. + +--- + +## Initializing Encord Active OS is taking a long time, what should I do? + +For larger projects, initialization of Active OS can take a while. +While we're working on improving the efficiency, there are a couple of tricks that you can do. + +1. 
As soon as the metric computations have started (indicated by Encord Active printing a line containing `Running metric`) you can open a new terminal and run `encord-active start`. This will allow you to continuously see what have been computed so far. Refresh the browser once in a while when new metrics are done computing in your first terminal. + +2. You can also kill the import process as soon as the metrics have started to compute. This will leave you with a project containing fewer quality metrics. As a consequence, you will not be able to see as many insights as if the process is allowed to finish. However, you can always use the [`encord-active metric run`](https://docs.encord.com/docs/active-cli#run) command to run metrics that are missing. + +--- + +## Can I use Encord Active OS without a UI? + +Yes, you can! + +The code base is structured such that all data operations live in [`encord_active.lib`][ea-lib] and [`encord_active.server`][ea-server] which serve as the "backend" for the UI. As such, everything you can do with the UI can also be done by code. + +Other good resources can be found in our [example notebooks](https://github.com/encord-team/encord-active/tree/main/examples). + +--- \ No newline at end of file diff --git a/docs/active-oss-basics.md b/docs/active-oss-basics.md new file mode 100644 index 000000000..5cf83e143 --- /dev/null +++ b/docs/active-oss-basics.md @@ -0,0 +1,29 @@ +--- +title: "Basics" +slug: "active-oss-basics" +hidden: false +metadata: + title: "Encord Active Basics" + description: "Explore the basics of Encord Active: Learn filtering, sorting, issue and prediction shortcuts, and Collections - essential platform features explained." +category: "65a71bbfea7a3f005192d1a7" +--- + +Welcome to Encord Active 101! Learn about the platform's key building blocks. + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +[block:html] +{ + "html": "\n\n\n \n \n Clickable Div\n \n\n\n Filtering Tagging Relabeling\n\n" +} +[/block] + + \ No newline at end of file diff --git a/docs/active-oss-getting-started.md b/docs/active-oss-getting-started.md new file mode 100644 index 000000000..145b20293 --- /dev/null +++ b/docs/active-oss-getting-started.md @@ -0,0 +1,21 @@ +--- +title: "Getting started with Encord Active OS" +slug: "active-oss-getting-started" +hidden: false +metadata: + title: "Getting started with Encord Active OS" + description: "Get started with Encord Active. Explore using the example project. Effortless learning." +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +[block:html] +{ + "html": "\n\n\n \n \n Clickable Div\n \n\n\n 1. Install Active OS 2. Active OS Quickstart 3. Import to Active OS 4. Launch Active OS 5. Active OS next steps\n\n" +} +[/block] \ No newline at end of file diff --git a/docs/active-oss-how-to.md b/docs/active-oss-how-to.md new file mode 100644 index 000000000..37cccd22e --- /dev/null +++ b/docs/active-oss-how-to.md @@ -0,0 +1,40 @@ +--- +title: "How to" +slug: "active-oss-how-to" +hidden: false +metadata: + title: "How to" + description: "Enhance data, labels & models with effective workflows: Explore distribution, outliers, similarity, embeddings, evaluation & more." + image: + 0: "https://files.readme.io/9101bc1-image_16.png" +category: "65a71bbfea7a3f005192d1a7" +--- + +Use the links below learn about workflows for improving your data, labels, and models. + +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +[block:html] +{ + "html": "\n\n\n \n \n Clickable Div\n \n\n\n Quick import data & labels Explore image similarity Encord project Remove duplicate images\n\n" +} +[/block] + + + diff --git a/docs/active-oss-import.md b/docs/active-oss-import.md new file mode 100644 index 000000000..34de5eec2 --- /dev/null +++ b/docs/active-oss-import.md @@ -0,0 +1,125 @@ +--- +title: "Import" +slug: "active-oss-import" +hidden: false +metadata: + title: "Import" + description: "Import data into Encord Active: Easily bring in images, videos (MP4), and soon DICOM (DCM) formats. Streamline data integration." +category: "65a71bbfea7a3f005192d1a7" +--- + +To use Encord Active, you'll need data. This page shows you ways in which you can import your data into Encord Active. + +Encord Active supports the following formats for images (jpg, png), videos (MP4) (DICOM (DCM) support is coming soon). Select the format that best fits your current data storage location. + + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +The import workflows currently support the following data and label integration: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ Import Type + + Data Type + + Label Type +
+ Images + + Videos + + Classification + + Bounding Boxes + + Polygons + + Polyline + + Bitmask + + Key-point +
+ Quick import data & labels + -
+ Import model predictions +
+ Encord project +
+ COCO project + --
+ +[block:html] +{ + "html": "\n\n\n \n \n Clickable Div\n \n\n\n Quick import data & labels Import model predictions Encord project COCO project\n\n" +} +[/block] \ No newline at end of file diff --git a/docs/active-oss-tutorials.md b/docs/active-oss-tutorials.md new file mode 100644 index 000000000..4674bb02f --- /dev/null +++ b/docs/active-oss-tutorials.md @@ -0,0 +1,29 @@ +--- +title: "Tutorials" +slug: "active-oss-tutorials" +hidden: false +category: "65a71bbfea7a3f005192d1a7" +--- + +This page contains case studies and tutorials that show how to use Encord Active end-to-end. Click the links below to navigate to the tutorial you need. + +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +[block:html] +{ + "html": "\n\n\n \n \n Clickable Div\n \n\n\n Easy Active Learning COCO Dataset Quickstart Dataset Diversity Sampling\n\n" +} +[/block] + + diff --git a/docs/active-overview.md b/docs/active-overview.md new file mode 100644 index 000000000..679131cc8 --- /dev/null +++ b/docs/active-overview.md @@ -0,0 +1,155 @@ +--- +title: "Overview of Encord Active" +slug: "active-overview" +hidden: false +metadata: + title: "Overview of Encord Active" + description: "Optimize models with Encord Active: A cloud-based and open source toolkit for data quality, model enhancement, and failure mode detection. Boost performance." + image: + 0: "https://files.readme.io/294aa28-image_16.png" +createdAt: "2023-07-11T16:27:41.844Z" +updatedAt: "2023-08-11T12:43:00.437Z" +category: "6480a3981ed49107a7c6be36" +--- + + + +
+ Encord Active logo +
+ + +Encord Active is available in two versions: **Encord Active Cloud** and **Encord Active OS**. Active Cloud is tightly integrated with Encord Annotate, with Active Cloud and Annotate being hosted by Encord. Encord Active OS is an open source toolkit that can be installed on a local computer/server. + +[Encord Active Cloud](https://encord.com/encord-active/) and Encord Active OS (open source) are active learning solutions that help you find failure modes in your models, and improve your data quality and model performance. + +Use Active Cloud and Active OS to visualize your data, evaluate your models, surface model failure modes, find labeling mistakes, prioritize high-value data for relabeling and more! + +[//]: # (![video](https://storage.googleapis.com/docs-media.encord.com/static/img/gifs/ea-demo.gif)) + +## When to use Encord Active? + +Encord Active helps you understand and improve your data, labels, and models at all stages of your computer vision journey. + +Whether you've just started collecting data, labeled your first batch of samples, or have multiple models in production, Encord Active can help you. + +![encord active diagram](https://storage.googleapis.com/docs-media.encord.com/static/img/process-chart-ea.webp) + +### Example use cases + +To give you a better idea about how Active Cloud and Annotate work together, here are some use cases. + +**Data Curation and Label Error Correction** + +![Encord Active workflow](https://storage.googleapis.com/docs-media.encord.com/static/img/active/active-workflow-data-curation-label-validation.png) + +**Optimize Model Performance** + +![Encord Active workflow](https://storage.googleapis.com/docs-media.encord.com/static/img/active/active-workflow-model-optimization.png) + +> ℹ️ Note +> Before going any further, you should know what a Collection is in Encord Active Cloud. Collections provide a way to save interesting groups of data units and labels, to support and guide your downstream workflow. 
For more information, see [Collections](https://docs.encord.com/docs/active-collections). + +[block:html] +{ + "html": "\n\n\n \n \n Clickable Div\n \n\n\n Data Cleansing/Curation Label Correction/Validation Model/Prediction Evaluation\n\n" +} +[/block] + +## What data does Encord Active support? + + + + + + +| Features | Active Open Source | Active Cloud | +| :------------------------------- | :-----------------------------------------: | :-----------------------------------------------------------------------------: | +| Data types | `jpg`, `png` | `jpg`, `png`, `tiff`, `mp4` | +| Labels**1** | `classification`, `bounding box`, `polygon` | `classification`, `bounding box`, `polygon`, `polyline`, `bitmask`, `key-point` | +| Number of images | 25,000 per project (unlimited projects) | 500,000 per project (unlimited projects) | +| Videos | - | 2 hours @ 30fps | +| Data exploration | ✅ | ✅ | +| Label exploration | ✅ | ✅ | +| Similarity search | ✅ | ✅ | +| Off-the-shelf quality metrics | ✅ | ✅ | +| Custom quality metrics | ✅ | ✅ | +| Data and label tagging | ✅ | ✅ | +| Image duplication detection | ✅ | ✅ | +| Label error detection | ✅ | ✅ | +| Outlier detection | ✅ | ✅ | +| Collections | - | ✅ | +| Model evaluation | - | ✅ | +| Label synchronization | - | ✅ | +| Natural language search | - | ✅ | +| Search by image | - | ✅ | +| Integration with Encord Annotate | - | ✅ | +| Nested Attributes | - | ✅ | +| Custom metadata | - | ✅ | + +**1**: Objects and classifications are both supported: + +Filtering: + +- Objects + all attributes +- Classification + all attributes + +Model evaluation: + +- Objects and Classifications cannot be mixed +- Classification support includes top-level radio buttons +- Object support includes top-level objects \ No newline at end of file diff --git a/docs/active-python-sdk.md b/docs/active-python-sdk.md new file mode 100644 index 000000000..4a71e83de --- /dev/null +++ b/docs/active-python-sdk.md @@ -0,0 +1,27 @@ +--- +title: "Python SDK" +slug:
"active-python-sdk" +hidden: false +metadata: + title: "Python SDK" + description: "Integrate Encord Active with Python SDK: Project setup, metrics, predictions, tagging, embeddings, filtering. Streamline workflows." + image: + 0: "https://files.readme.io/64d22ab-image_16.png" +createdAt: "2023-07-11T17:05:59.684Z" +updatedAt: "2023-08-09T14:58:41.583Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +These pages provide guidance on integrating Encord Active into your existing workflows through code. + +[block:html] +{ + "html": "\n\n\n \n \n Clickable Div\n \n\n\n Project initialization Quality metric execution Custom embeddings Filtering\n\n" +} +[/block] \ No newline at end of file diff --git a/docs/active-python-sdk/active-sdk-custom-embeddings.md b/docs/active-python-sdk/active-sdk-custom-embeddings.md new file mode 100644 index 000000000..02327751c --- /dev/null +++ b/docs/active-python-sdk/active-sdk-custom-embeddings.md @@ -0,0 +1,19 @@ +--- +title: "Custom embeddings" +slug: "active-sdk-custom-embeddings" +hidden: false +metadata: +createdAt: "2023-07-21T09:09:02.242Z" +updatedAt: "2023-08-09T16:15:32.198Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +The easiest way to add custom embeddings to your project is to use [this notebook][custom-embeddings-notebook] as a starting point. + +[custom-embeddings-notebook]: https://colab.research.google.com/github/encord-team/encord-notebooks/blob/main/colab-notebooks/Encord_Active_Add_Custom_Embeddings.ipynb \ No newline at end of file diff --git a/docs/active-python-sdk/active-sdk-filtering.md b/docs/active-python-sdk/active-sdk-filtering.md new file mode 100644 index 000000000..6c7a9c8c7 --- /dev/null +++ b/docs/active-python-sdk/active-sdk-filtering.md @@ -0,0 +1,71 @@ +--- +title: "Filtering" +slug: "active-sdk-filtering" +hidden: false +metadata: + title: "Filtering" + description: "Master data filtering in Encord Active: Optimize insights, remove noise, prioritize tasks. Use filters for focused analysis." + image: + 0: "https://files.readme.io/1556a4c-image_16.png" +createdAt: "2023-07-14T16:05:36.402Z" +updatedAt: "2023-08-09T16:17:22.090Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +**Learn how to apply filters to your data and labels** + +To filter your data and labels based on metrics, you can utilize the `MergedMetrics` dataframe. + +```python +from pathlib import Path + +import pandas as pd +from encord_active.lib.db.connection import DBConnection +from encord_active.lib.db.merged_metrics import MergedMetrics +from encord_active.lib.project.project_file_structure import ProjectFileStructure + +project_path = Path("/path/to/your/project/root") +pfs = ProjectFileStructure(project_path) +with DBConnection(pfs) as conn: + metrics: pd.DataFrame = MergedMetrics(conn).all().reset_index() +``` + +The resulting dataframe will contain all your data and labels, along with the associated metrics computed on them. + +Here's an example of the column names that you might see in the dataframe: + +``` +Index(['identifier', 'url', 'Green Value', 'Sharpness', 'Uniqueness', + 'Randomize Images', 'Red Value', 'Area', 'Aspect Ratio', + 'Brightness', 'Blue Value', 'Contrast', 'Classification Quality', + 'description', 'object_class', 'annotator', 'frame', 'tags'], dtype='object') +``` + +Based on this dataframe, you can apply various filtering operations using pandas to select the data and labels that meet your criteria. + +Here's an example code that builds upon the previous code and filters the dataframe to find images with very low brightness: + +```python +filtered_metrics = metrics[metrics["Brightness"] < 0.2] +``` + +In this code, the dataframe `metrics` is filtered to include only the rows where the value in the `Brightness` column is lower than 0.2. Customize the filter condition to suit your requirements and perform additional analysis or processing on the filtered dataframe as desired. 
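Any pandas expression works here, so multiple metric conditions can be combined with standard boolean indexing and the result sorted for review. Below is a minimal, self-contained sketch that mimics a few columns of the merged-metrics dataframe (the identifiers and metric values are made up) and keeps only dark images of a single class:

```python
import pandas as pd

# Toy dataframe mimicking a few columns of the MergedMetrics output.
# Identifiers and metric values are illustrative only.
metrics = pd.DataFrame(
    {
        "identifier": ["a_1_00000", "b_2_00000", "c_3_00000", "d_4_00000"],
        "Brightness": [0.10, 0.50, 0.15, 0.90],
        "object_class": ["cat", "dog", "cat", "dog"],
    }
)

# Combine conditions with boolean indexing: dark images of the "cat" class.
dark_cats = metrics[(metrics["Brightness"] < 0.2) & (metrics["object_class"] == "cat")]

# Sort so the darkest images come first.
dark_cats = dark_cats.sort_values("Brightness")

print(dark_cats["identifier"].tolist())  # ['a_1_00000', 'c_3_00000']
```

The same pattern applies directly to the real `metrics` dataframe loaded above.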
+ +If you want to obtain the url to the data item (image) corresponding to a specific row, you can use the following utility function: + +```python +from encord_active.lib.common.data_utils import url_to_file_path +from encord_active.lib.common.image_utils import key_to_data_unit + +metric_row = metrics.iloc[0] +image_url = url_to_file_path( + key_to_data_unit(metric_row["identifier"], pfs)[0].signed_url, + project_path +) +``` \ No newline at end of file diff --git a/docs/active-python-sdk/active-sdk-import-predictions.md b/docs/active-python-sdk/active-sdk-import-predictions.md new file mode 100644 index 000000000..43259563a --- /dev/null +++ b/docs/active-python-sdk/active-sdk-import-predictions.md @@ -0,0 +1,257 @@ +--- +title: "Import predictions" +slug: "active-sdk-import-predictions" +hidden: false +createdAt: "2023-08-07T15:36:31.990Z" +updatedAt: "2023-08-11T13:41:50.319Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +**Learn how to import predictions into Encord Active** + +> 🚧 Caution +> When running an importer, any previously imported predictions will be overwritten! + +[//]: # (> When running an importer, any previously imported predictions will be overwritten! To ensure the ability to revert to previous iterations, it is important to [version your projects](https://docs.encord.com/docs/active-versioning). ) + +# General import + +To import predictions into Encord Active, make sure you have a list of `Prediction` objects ready and follow these steps: +1. Verify that all `Prediction` objects are constructed correctly. If you are unsure about building `Prediction` objects, refer to the [Import model predictions](https://docs.encord.com/docs/active-import-model-predictions) guide, which provides details on how to create predictions for bounding boxes, polygons, masks, and classifications. +2. Use the following code snippet to import the predictions into Encord Active: + +```python +from pathlib import Path + +from encord_active.lib.db.predictions import Prediction +from encord_active.lib.model_predictions.importers import import_predictions +from encord_active.lib.project import Project + +project_dir = Path("/path/to/your/project/directory/") +predictions: list[Prediction] = ...  # Your list of predictions + +import_predictions(Project(project_dir), predictions) +``` + +# Custom imports + +## COCO + +Simplify importing COCO predictions into Encord Active using `import_coco_predictions()` from `encord_active.lib.model_predictions.importers`, which migrates predictions from the COCO results format to an Encord Active project. + +For smooth migration, the `import_coco_predictions` function requires two mappings: one between the object classes of both formats, and one between the IDs of the images.
+ +The ontology mapping should follow a specific format, where the keys correspond to the `featureNodeHash` of the objects in the project ontology, and the values correspond to their respective COCO class names. + +``` +{ + # featureNodeHash: class_name + "OTk2MzM3": "pedestrian", + "NzYyMjcx": "cyclist", + "Nzg2ODEx": "car" +} +``` + +By default, the `import_coco_predictions` function will attempt to read the ontology mapping from a JSON file named `ontology_mapping.json` located in the same directory as the predictions' file. If this file is not present, the user should provide the ontology mapping explicitly. + +The image mapping should also follow a specific format, with the keys corresponding to the IDs of the images in the COCO file, and the values corresponding to their respective data unit hashes. + +``` +{ + # image_id: data_unit_hash + 1: "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", + 2: "ffffffff-gggg-hhhh-iiii-jjjjjjjjjjjj" +} +``` + +Similarly, the `import_coco_predictions` function will attempt to read the image mapping from a JSON file named `image_mapping.json` located in the same directory as the predictions' file. If this file is not present, the user should provide the image mapping explicitly. + +When importing COCO predictions, ensure that both the ontology mapping and the image mapping are available to accurately associate the predictions with the appropriate ontology objects and data units. + +```python +from pathlib import Path + +from encord_active.lib.model_predictions.importers import import_coco_predictions +from encord_active.lib.project import Project + +project_dir = Path("/path/to/your/project/directory/") +predictions_json = Path("/path/to/your/predictions.json") + +import_coco_predictions(Project(project_dir), predictions_json) +``` + +> ℹ️ Note +> Remember to provide the ontology mapping and image mapping if they are not available in the same directory as the predictions' file.
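If the mapping files are not already in place, one straightforward option is to write them next to the predictions file before calling the importer. A sketch (the file location, hashes, and class names are illustrative; note that JSON always stores object keys as strings):

```python
import json
from pathlib import Path

# Illustrative example: the predictions file is assumed to sit in the current
# directory, and the hashes and class names below are made up.
predictions_json = Path("predictions.json")

ontology_mapping = {"OTk2MzM3": "pedestrian", "NzYyMjcx": "cyclist", "Nzg2ODEx": "car"}
image_mapping = {1: "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"}

# Write both files next to the predictions file so the importer can discover them.
(predictions_json.parent / "ontology_mapping.json").write_text(json.dumps(ontology_mapping))
(predictions_json.parent / "image_mapping.json").write_text(json.dumps(image_mapping))
```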
+ +Refer to the method's documentation for `import_coco_predictions` to learn more about the optional parameters. + +## KITTI + +The KITTI dataset is a widely-used computer vision dataset for benchmarking and evaluating autonomous driving systems. Due to the popularity of this dataset, many researchers and developers have adopted its label format for their own datasets and applications. Encord Active enhances your experience by migrating such labels to the Encord format. + +> 🚧 Caution +> The following approach **only works for bounding boxes**. + +Simplify importing KITTI predictions into Encord Active using `import_kitti_predictions()` from `encord_active.lib.model_predictions.importers`, which migrates predictions from the KITTI label file format, stored in TXT or CSV files, to an Encord Active project. + +For smooth migration, the `import_kitti_predictions` function requires a specific file structure and a mapping between the object classes in the project's ontology and the corresponding KITTI class names. + +Ensure the predictions' directory complies with the following file structure: + +``` +predictions +├── example_image.txt +├── example_video__0.csv +├── example_video__1.csv +├── ... +├── ontology_mapping.json (optional) +└── other_files_and_folders (optional) +``` + +Each prediction file is named with the format `[__]`, where `` is the title of the data unit, and `` (optional) represents the frame number in the video / <> / <>. This naming convention allows the `import_kitti_predictions` function to interpret and associate predictions with the correct data units in the Encord Active project. + +The ontology mapping should follow a specific format, where the keys correspond to the `featureNodeHash` of the bounding box objects in the project's ontology, and the values correspond to their respective KITTI class names.
+ +``` +{ + # featureNodeHash: class_name + "OTk2MzM3": "pedestrian", + "NzYyMjcx": "cyclist", + "Nzg2ODEx": "car" +} +``` + +By default, the `import_kitti_predictions` function will attempt to read the <> mapping from a JSON file named `ontology_mapping.json` located in the predictions' directory. If this file is not present, the user should provide the ontology mapping explicitly. + +When importing KITTI predictions, ensure that both the file structure and ontology mapping are valid and available to accurately associate the predictions with the appropriate ontology objects and data units. + +```python +from pathlib import Path + +from encord_active.lib.model_predictions.importers import import_kitti_predictions +from encord_active.lib.project import Project + +project_dir = Path("/path/to/your/project/directory/") +predictions_dir = Path("/path/to/your/predictions/directory/") + +import_kitti_predictions(Project(project_dir), predictions_dir) +``` + +> ℹ️ Note +> Remember to provide the <> mapping if it is not available in the predictions' directory. + +You can further filter the predictions files using the `file_name_regex` optional parameter, allowing you to select only specific prediction files based on their names. Additionally, if needed, you can provide a custom function `file_path_to_data_unit_func` that matches file names with the corresponding data units in Encord, allowing you to precisely identify the data units for each prediction. Refer to the method's documentation for `import_kitti_predictions` to learn more about the optional parameters. + +### Label Files Format + +The KITTI importer supports the label format described here with the addition of a column corresponding to the model confidence. 
+ +An example: + +``` +car 0.00 0 0.00 587.01 173.33 614.12 200.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 97.85 +cyclist 0.00 0 0.00 665.45 160.00 717.93 217.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00 32.65 +pedestrian 0.00 0 0.00 423.17 173.67 433.17 224.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 3.183 +``` + +Columns are: + +- `class_name`: str +- ~~`truncation`: float~~ ignored +- ~~`occlusion`: int~~ ignored +- ~~`alpha`: float~~ ignored +- `xmin`: float +- `ymin`: float +- `xmax`: float +- `ymax`: float +- ~~`height`: float~~ ignored +- ~~`width`: float~~ ignored +- ~~`length`: float~~ ignored +- ~~`location_x`: float~~ ignored +- ~~`location_y`: float~~ ignored +- ~~`location_z`: float~~ ignored +- ~~`rotation_y`: float~~ ignored +- `confidence`: float + +> ℹ️ Note +> The columns flagged as `ignored` must still appear in the label files, but their values are ignored. + +## Instance segmentation masks + +> 🚧 Caution +> The following approach will **transform segmentations into simple polygons**. + +Simplify importing predictions stored as `.png` masks of shape `[height, width]`, where each pixel value corresponds to a class, into Encord Active using `import_mask_predictions()` from `encord_active.lib.model_predictions.importers`, which migrates predictions from segmentation masks to an Encord Active project. + +For smooth migration, the `import_mask_predictions` function requires a specific file structure and a mapping between the object classes in the project's ontology and the corresponding IDs in the masks. + +Ensure the predictions' directory complies with the following file structure: + +``` +predictions +├── example_image.png +├── example_video__0.png +├── example_video__1.png +├── ... +├── ontology_mapping.json (optional) +└── other_files_and_folders (optional) +``` + +Each prediction file is named with the format `[__]`, where `` is the title of the data unit, and `` (optional) represents the frame number in the video / <> / <>.
This naming convention allows the `import_mask_predictions` function to interpret and associate predictions with the correct data units in the Encord Active project. + +The ontology mapping should adhere to a specific format, with the keys corresponding to the `featureNodeHash` of the polygon objects in the project's ontology, and the values corresponding to their respective IDs in the masks. + +``` +{ + # featureNodeHash: pixel_value + "OTk2MzM3": 1, # "pedestrian" + "NzYyMjcx": 2, # "cyclist", + "Nzg2ODEx": 3, # "car" + # Note: value: 0 is reserved for "background" +} +``` + +By default, the `import_mask_predictions` function will attempt to read the ontology mapping from a JSON file named `ontology_mapping.json` located in the predictions' directory. If this file is not present, the user should provide the ontology mapping explicitly. + +When importing mask predictions, ensure that both the file structure and ontology mapping are valid and available to accurately associate the predictions with the appropriate ontology objects and data units. + +```python +from pathlib import Path + +from encord_active.lib.model_predictions.importers import import_mask_predictions +from encord_active.lib.project import Project + +project_dir = Path("/path/to/your/project/directory/") +predictions_dir = Path("/path/to/your/predictions/directory/") + +import_mask_predictions(Project(project_dir), predictions_dir) +``` + +> ℹ️ Note +> Remember to provide the ontology mapping if it is not available in the predictions' directory. + +You can further filter the predictions files using the `file_name_regex` optional parameter, allowing you to select only specific prediction files based on their names. Additionally, if needed, you can provide a custom function `file_path_to_data_unit_func` that matches file names with the corresponding data units in Encord, allowing you to precisely identify the data units for each prediction. 
Refer to the method's documentation for `import_mask_predictions` to learn more about the optional parameters. + +> 🚧 Caution +> For each imported file, every "self-contained" contour will be interpreted as an individual prediction. For example, the following mask will be interpreted as three objects: two from class 1 and one from class 2. +> +> ``` +> ┌───────────────────┐ +> │0000000000000000000│ +> │0011100000000000000│ +> │0011110000002222000│ +> │0000000000002222000│ +> │0000111000002200000│ +> │0000111000002200000│ +> │0000111000000000000│ +> │0000000000000000000│ +> └───────────────────┘ +> ``` +> +> Also, the confidence of the predictions will be set to 1. \ No newline at end of file diff --git a/docs/active-python-sdk/active-sdk-project-initialization.md b/docs/active-python-sdk/active-sdk-project-initialization.md new file mode 100644 index 000000000..ff1c4599b --- /dev/null +++ b/docs/active-python-sdk/active-sdk-project-initialization.md @@ -0,0 +1,200 @@ +--- +title: "Project initialization" +slug: "active-sdk-project-initialization" +hidden: false +metadata: + title: "Project Initialization" + description: "Start your project right: Initialize with or without labels using Python code. Quick setup for optimal results." + image: + 0: "https://files.readme.io/422fb92-image_16.png" +createdAt: "2023-07-14T16:05:36.364Z" +updatedAt: "2023-08-09T16:22:25.711Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +**Learn how to initialize a project with and without labels** + +Project initialization using Python code provides the same functionality as running the [`init`](https://docs.encord.com/docs/active-cli#init) command in the CLI. If you prefer using the CLI, you can refer to the [Quick import data & labels](https://docs.encord.com/docs/active-quick-import) guide. + +## Steps to initialize a project without labels + +[block:embed] +{ + "url": "https://colab.research.google.com/github/encord-team/encord-active/blob/main/examples/initialization-project-without-labels-using-python.ipynb", + "title": "Initialization of a project without labels in Encord Active using Python", + "favicon": "https://cdn.simpleicons.org/googlecolab/#F9AB00", + "image": "https://cdn.simpleicons.org/googlecolab/#F9AB00", + "provider": "colab.research.google.com", + "href": "https://colab.research.google.com/github/encord-team/encord-active/blob/main/examples/initialization-project-without-labels-using-python.ipynb" +} +[/block] + +1. Choose the images you want to import. + For example, you can use the `glob` function to find all the files with the ".jpg" extension in your current working directory, including subdirectories. Customize the logic as needed to include specific files. + + ```python + from pathlib import Path + + image_files = list(Path.cwd().glob("**/*.jpg")) + ``` + +2. Specify the directory where you want to store the Encord Active project. + It is recommended to keep multiple projects within the same directory for easy navigation within the UI. + + ```python + projects_dir = Path("path/to/where/you/store/projects") + ``` + +3. Create the project. + Note that the `symlinks` option determines whether files are copied (`symlinks=False`) or referenced using symlinks to save disk space (`symlinks=True`). 
+ + ```python + from encord_active.lib.project.local import init_local_project + + project_path = init_local_project( + files=image_files, + target=projects_dir, + project_name="", + symlinks=True, + ) + ``` + +4. Run a metric to ensure proper functioning of the UI. + + ```python + from encord_active.lib.metrics.execute import execute_metrics + from encord_active.lib.metrics.heuristic.img_features import AreaMetric + + execute_metrics( + selected_metrics=[AreaMetric()], + data_dir=project_path, + use_cache_only=True, + ) + ``` + + Alternatively, you can run all metrics that do not depend on any labels using the following code snippet: + + ```python + from encord_active.lib.metrics.execute import run_metrics_by_embedding_type + from encord_active.lib.metrics.types import EmbeddingType + + run_metrics_by_embedding_type( + EmbeddingType.IMAGE, + data_dir=project_path, + use_cache_only=True, + ) + ``` + +After completing these steps, you can launch the application with the [start][ea-cli-start] CLI command and access the project: + +```shell +encord-active start -t "path/to/where/you/store/projects" +``` + +## Steps to initialize a project with labels + +[block:embed] +{ + "url": "https://colab.research.google.com/github/encord-team/encord-active/blob/main/examples/initialization-project-with-labels-using-python.ipynb", + "title": "Initialization of a project with labels in Encord Active using Python", + "favicon": "https://cdn.simpleicons.org/googlecolab/#F9AB00", + "image": "https://cdn.simpleicons.org/googlecolab/#F9AB00", + "provider": "colab.research.google.com", + "href": "https://colab.research.google.com/github/encord-team/encord-active/blob/main/examples/initialization-project-with-labels-using-python.ipynb" +} +[/block] + +If you have previously defined a `LabelTransformer` as explained in the [Quick import data & labels](https://docs.encord.com/docs/active-quick-import#including-labels) guide, you can utilize it in the project initialization process. 
To include labels, you need to provide the transformer object and the corresponding label files to the `init_local_project` function. + +1. Choose the images and label files you want to import. + For example, you can use the `glob` function to find all the files with the ".jpg" extension and label files with the ".json" extension in your current working directory, including subdirectories. Customize the logic as needed to include specific files. + + ```python + from pathlib import Path + + image_files = list(Path.cwd().glob("**/*.jpg")) + label_files = list(Path.cwd().glob("**/*.json")) + ``` + +2. Define a class that implements the [`LabelTransformer`][gh-label-transformer-interface] interface and handles the parsing of labels. + For instance, you can refer to the implementation of the [`BBoxTransformer`][gh-bbox-transformer] class. Instantiate this class to utilize it for including labels in your project. + + ```python + label_transformer = BBoxTransformer() + ``` + + > 👍 Tip + > Check out more label transformer examples in the [examples section][gh-transformer-examples] of Encord Active's GitHub repository. + + +3. Specify the directory where you want to store the Encord Active project. + It is recommended to keep multiple projects within the same directory for easy navigation within the UI. + + ```python + projects_dir = Path("path/to/where/you/store/projects") + ``` + +4. Create the project. + Note that the `symlinks` option determines whether files are copied (`symlinks=False`) or referenced using symlinks to save disk space (`symlinks=True`). + + ```python + from encord_active.lib.project.local import init_local_project + + project_path = init_local_project( + files = image_files, + target = projects_dir, + project_name = "", + symlinks = True, + label_transformer=label_transformer, + label_paths=label_files, + ) + ``` + +5. Run a metric to ensure proper functioning of the UI. 
+ + ```python + from encord_active.lib.metrics.execute import execute_metrics + from encord_active.lib.metrics.heuristic.img_features import AreaMetric + + execute_metrics( + selected_metrics=[AreaMetric()], + data_dir=project_path, + use_cache_only=True, + ) + ``` + + Alternatively, you can run all metrics related to labels using the following code snippet: + + ```python + from encord_active.lib.metrics.execute import run_metrics_by_embedding_type + from encord_active.lib.metrics.types import EmbeddingType + + run_metrics_by_embedding_type( + EmbeddingType.OBJECT, + data_dir=project_path, + use_cache_only=True, + ) + + run_metrics_by_embedding_type( + EmbeddingType.CLASSIFICATION, + data_dir=project_path, + use_cache_only=True, + ) + ``` + +After completing these steps, you can launch the application by using the following CLI command and access the project: + +```shell +encord-active start -t "path/to/where/you/store/projects" +``` + +[ea-cli-start]: https://docs.encord.com/docs/active-cli#start +[gh-label-transformer-interface]: https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/labels/label_transformer.py#L61-L79 +[gh-bbox-transformer]: https://github.com/encord-team/encord-active/blob/main/examples/label-transformers/bounding-boxes/bbox_transformer.py +[gh-transformer-examples]: https://github.com/encord-team/encord-active/blob/main/examples/label-transformers \ No newline at end of file diff --git a/docs/active-python-sdk/active-sdk-quality-metric-execution.md b/docs/active-python-sdk/active-sdk-quality-metric-execution.md new file mode 100644 index 000000000..cfa81da6d --- /dev/null +++ b/docs/active-python-sdk/active-sdk-quality-metric-execution.md @@ -0,0 +1,73 @@ +--- +title: "Quality metric execution" +slug: "active-sdk-quality-metric-execution" +hidden: false +metadata: + title: "Quality metric execution" + description: "Compute quality metrics using Python code for CLI-equivalent functionality. 
Learn built-in & custom metric execution. | Encord" + image: + 0: "https://files.readme.io/d548c25-image_16.png" +createdAt: "2023-07-14T16:05:36.319Z" +updatedAt: "2023-08-11T13:43:54.172Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +**Learn how to execute built-in and custom quality metrics** + +There are a couple of ways to compute quality metrics via code. + +Quality metrics calculation using Python code provides the same functionality as executing the [`metric run`](https://docs.encord.com/docs/active-cli#run) command in the CLI. + +## Running built-in metrics + +To run all metrics available by default in Encord Active, you can use the following code snippet: + +```python +from pathlib import Path +from encord_active.lib.metrics.execute import run_metrics + +project_path = Path("/path/to/your/project/root") +run_metrics(data_dir=project_path, use_cache_only=True) +``` + +The `run_metrics` function also allows you to filter which metrics to run by providing a filter function: + +```python +options = dict(data_dir=project_path, use_cache_only=True) +run_metrics(filter_func=lambda m: m().metadata.title == "", **options) +``` + +### Compute only data or label metrics + +The `run_metrics_by_embedding_type` utility function allows you to run predefined subsets of metrics: + +```python +from encord_active.lib.metrics.execute import run_metrics_by_embedding_type +from encord_active.lib.metrics.types import EmbeddingType + +run_metrics_by_embedding_type(EmbeddingType.IMAGE, **options) +run_metrics_by_embedding_type(EmbeddingType.OBJECT, **options) +run_metrics_by_embedding_type(EmbeddingType.CLASSIFICATION, **options) +``` + +## Running custom metrics + +If you have already [written a custom metric](https://docs.encord.com/docs/active-write-custom-quality-metrics), let's call it `SuperMetric`, then you can execute it on a project of your choosing with the following code: + +```python +from pathlib import Path +from encord_active.lib.metrics.execute import execute_metrics +from super_metric import SuperMetric + +project_path = Path("/path/to/your/project") +execute_metrics([SuperMetric()], data_dir=project_path) +``` + +> 👍 Tip +> The CLI allows you to [register 
metrics](https://docs.encord.com/docs/active-cli#add) to the project to easily execute them afterwards. \ No newline at end of file diff --git a/docs/active-python-sdk/active-sdk-tagging.md b/docs/active-python-sdk/active-sdk-tagging.md new file mode 100644 index 000000000..3ed4c419c --- /dev/null +++ b/docs/active-python-sdk/active-sdk-tagging.md @@ -0,0 +1,107 @@ +--- +title: "Tagging" +slug: "active-sdk-tagging" +hidden: false +metadata: + title: "Tagging" + description: "Understand data tagging scopes: TagScope.DATA for whole images & TagScope.LABEL for individual labels. Precise AI data tagging with Encord Active." + image: + 0: "https://files.readme.io/d3b306e-image_16.png" +createdAt: "2023-07-11T16:27:42.454Z" +updatedAt: "2023-08-11T13:46:52.117Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +**Learn the meaning of scopes for tags and how to tag your data** + +Tagging data using Python code provides the same functionality as using the [_Tag_ feature](https://docs.encord.com/docs/active-tagging) in the application. + +## Scopes + +When applying tags in your project, it's important to understand the two different scopes to which tags can be applied: `TagScope.DATA` and `TagScope.LABEL`. + +* The `TagScope.DATA` scope allows you to apply tags to the images themselves. + Tags applied in this scope describe characteristics or attributes that apply to the entire image as a whole. For example, you can tag an image with labels such as "outdoor," "sunset," or "high-resolution" to describe its overall properties. + +* The `TagScope.LABEL` scope allows you to apply tags to individual labels within an image. + This means that you can assign specific tags to each label, capturing unique attributes or properties associated with that particular label. For instance, you can tag a label as "cat" or "dog" to identify the specific objects present in an image. + +Note that a tag with `TagScope.LABEL` can appear multiple times in the same image when several labels share it. This provides flexibility in tagging multiple labels that share a common characteristic. + +By utilizing these scopes, you can apply tags at different levels of granularity, ensuring precise and specific tagging of data in your AI project. + +## Tagging images + +Each image is assigned a unique identifier based on its `label_hash`, `data_hash`, and `frame`. This enables you to easily reference and tag specific images as needed.
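Concretely, the identifier joins the three parts with underscores and zero-pads the frame index to five digits, as used in the snippet below. A small sketch with made-up hashes:

```python
def image_identifier(label_hash: str, data_hash: str, frame: int) -> str:
    # Identifier format for an image row: the three parts joined by
    # underscores, with the frame index zero-padded to five digits.
    return f"{label_hash}_{data_hash}_{frame:05d}"

print(image_identifier("11111111-aaaa", "22222222-bbbb", 7))
# 11111111-aaaa_22222222-bbbb_00007
```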
+ +To effectively tag your images, follow the structure below: + +```python +from pathlib import Path + +from encord_active.lib.common.iterator import DatasetIterator +from encord_active.lib.db.connection import DBConnection +from encord_active.lib.db.merged_metrics import MergedMetrics +from encord_active.lib.db.tags import Tag, TagScope +from encord_active.lib.project.project_file_structure import ProjectFileStructure + +project_path = Path("/path/to/your/project/root") +with DBConnection(ProjectFileStructure(project_path)) as conn: + metrics = MergedMetrics(conn) + + new_tag = Tag(name="custom tag", scope=TagScope.DATA) + iterator = DatasetIterator(project_path) + for data_unit, image in iterator.iterate(): + identifier = f"{iterator.label_hash}_{iterator.du_hash}_{iterator.frame:05d}" + tags = metrics.get_row(identifier).tags[0] # Indexing to get the pd.Series content + tags.append(new_tag) # Only add TagScope.DATA tags here. + metrics.update_tags(identifier, tags) +``` + +## Tagging labels + +Tagging labels follows a similar process to tagging images. In addition to the `label_hash`, `data_hash`, and `frame` of the corresponding image, you will need the hash of the label you are tagging. Specifically, for classifications, you will need the `classificationHash`, and for objects, you will need the `objectHash`. 
+ +To tag labels effectively, use the following approach: + +```python +from pathlib import Path + +from encord_active.lib.common.iterator import DatasetIterator +from encord_active.lib.db.connection import DBConnection +from encord_active.lib.db.merged_metrics import MergedMetrics +from encord_active.lib.db.tags import Tag, TagScope +from encord_active.lib.project.project_file_structure import ProjectFileStructure + +project_path = Path("/path/to/your/project/root") +with DBConnection(ProjectFileStructure(project_path)) as conn: + metrics = MergedMetrics(conn) + + iterator = DatasetIterator(project_path) + new_object_tag = Tag(name="an object tag", scope=TagScope.LABEL) + new_classification_tag = Tag(name="a classification tag", scope=TagScope.LABEL) + for data_unit, image in iterator.iterate(): + # For tagging objects + for obj in data_unit.get("labels", {}).get("objects", []): + obj_hash = obj["objectHash"] + identifier = f"{iterator.label_hash}_{iterator.du_hash}_{iterator.frame:05d}_{obj_hash}" + + tags = metrics.get_row(identifier).tags[0] # Indexing to get the pd.Series content + tags.append(new_object_tag) # Only add TagScope.LABEL tags here. + metrics.update_tags(identifier, tags) + + # For tagging frame-level classifications + for obj in data_unit.get("labels", {}).get("classifications", []): + clf_hash = obj["classificationHash"] + identifier = f"{iterator.label_hash}_{iterator.du_hash}_{iterator.frame:05d}_{clf_hash}" + + tags = metrics.get_row(identifier).tags[0] # Indexing to get the pd.Series content + tags.append(new_classification_tag) # Only add TagScope.LABEL tags here. 
+            metrics.update_tags(identifier, tags)
+```
\ No newline at end of file
diff --git a/docs/active-quality-metrics.md b/docs/active-quality-metrics.md
new file mode 100644
index 000000000..7a662f888
--- /dev/null
+++ b/docs/active-quality-metrics.md
@@ -0,0 +1,29 @@
+---
+title: "Quality metrics"
+slug: "active-quality-metrics"
+hidden: true
+metadata:
+  title: "Quality Metrics"
+  description: "Evaluate quality with Encord Active metrics. Data, label, model quality analysis. Customize metrics. Enhance computer vision infrastructure."
+category: "6480a3981ed49107a7c6be36"
+---
+
+Quality metrics evaluate the quality of various components in your computer vision infrastructure, and therefore constitute the foundation of Encord Active. They are additional parameterization options added onto your data, labels, and models: ways of indexing all three in semantically interesting and relevant ways.
+
+Encord Active (EA) is designed to compute, store, inspect, manipulate, and use quality metrics for a wide array of functionality. It ships with a library of quality metrics and, importantly, lets you write your own custom quality metrics to compute across your dataset.
+
+We have split the metrics into the following categories:
+
+- **Data Quality Metrics:** For analyzing and working with your image, sequence or video data. These metrics operate on images or individual video frames and are heuristic in the sense that they depend on the image content without labels.
+  - Example metrics: Area, Brightness, Green Value, Sharpness.
+
+- **Label Quality Metrics:** For analyzing and working with your labels. These metrics operate on the geometries of objects, like <>es, <>s, segmentations and <>s, and the heuristics of classifications.
+  - Example metrics: Aspect Ratio, Classification Quality, Occlusion Risk.
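Conceptually, a quality metric is just a function that maps a data unit (or a label) to a score, which is then used for sorting, filtering, and analysis. The toy sketch below illustrates the idea only; the function and dataset names are hypothetical and this is not Encord Active's actual `Metric` interface:

```python
import numpy as np

def brightness_score(image: np.ndarray) -> float:
    """Toy data-quality metric: mean normalized pixel value in [0, 1]."""
    return float(image.mean()) / 255.0

def rank_by_metric(images: dict[str, np.ndarray]) -> list[str]:
    """Index a dataset by a metric: return image ids sorted by score."""
    return sorted(images, key=lambda key: brightness_score(images[key]))

dark = np.zeros((4, 4, 3), dtype=np.uint8)
bright = np.full((4, 4, 3), 200, dtype=np.uint8)
print(rank_by_metric({"dark": dark, "bright": bright}))  # ['dark', 'bright']
```

Once every data unit has a score, the same machinery supports ranking, range filters, and distribution plots over the dataset.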
+ +--- + +[block:html] +{ + "html": "\n\n\n \n \n Clickable Div\n \n\n\n Data Quality Metrics Label Quality Metrics Custom Quality Metrics\n\n" +} +[/block] \ No newline at end of file diff --git a/docs/active-quality-metrics/active-data-quality-metrics.md b/docs/active-quality-metrics/active-data-quality-metrics.md new file mode 100644 index 000000000..88d57db93 --- /dev/null +++ b/docs/active-quality-metrics/active-data-quality-metrics.md @@ -0,0 +1,167 @@ +--- +title: "Data quality metrics" +slug: "active-data-quality-metrics" +hidden: false +metadata: +createdAt: "2023-07-21T09:09:02.294Z" +updatedAt: "2023-08-09T12:32:43.094Z" +category: "6480a3981ed49107a7c6be36" +--- +Data quality metrics work on images or individual video frames. + +## Access Data Quality Metrics + +Data Quality Metrics are used for sorting data, filtering data, and data analytics. + +| Title | Metric Type | Ontology Type | +|------------------------------------------------------------------------------------------------------------------------------------------------|-------------|---------------| +| [Area](#area) - Ranks images by their area (width/height). | `image` | | +| [Aspect Ratio](#aspect-ratio) - Ranks images by their aspect ratio (width/height). | `image` | | +| [Blue Value](#blue-value) - Ranks images by how blue the average value of the image is. | `image` | | +| [Brightness](#brightness) - Ranks images by their brightness. | `image` | | +| [Contrast](#contrast) - Ranks images by their contrast. | `image` | | +| [Diversity](#diversity) - Forms clusters based on the ontology and ranks images from easy samples to annotate to hard samples to annotate. | `image` | | +| [Frame Number](#frame-number) - Selects images based on a specified range. | `image` | | +| [Green Value](#green-value) - Ranks images by how green the average value of the image is. | `image` | | +| [Height](#height) - Ranks images by the height of the image. 
| `image` | | +| [Object Count](#object-count) - Counts number of objects in the image. | `image` | `bounding box`, `checklist`, `point`, `polygon`, `polyline`, `radio`, `rotatable bounding box`, `skeleton`, `text` | +| [Object Density](#object-density) - Computes the percentage of image area that is occupied by objects. | `image` | `bounding box`, `polygon`, `rotatable bounding box` | +| [Randomize Images](#randomize-images) - Assigns a random value between 0 and 1 to images. | `image` | | +| [Red Value](#red-value) - Ranks images by how red the average value of the image is. | `image` | | +| [Sharpness](#sharpness) - Ranks images by their sharpness. | `image` | | +| [Uniqueness](#uniqueness) - Finds duplicate and near-duplicate images. | `image` | | +| [Width](#width) - Ranks images by the width of the image. | `image` | | + +**To access Data Quality Metrics for Explorer:** + +1. Click a Project from the _Active_ home page. + +2. Click **Explorer**. + +3. Click **Data**. + +4. Sort and filter the tabular data. + +5. Click the plot diagram icon. + +6. Sort and filter the embedding plot data. + +**To access Data Quality Metrics for analytics:** + +1. Click a Project from the _Active_ home page. + +2. Click **Analytics**. + +3. Click **Data**. + +4. Select the quality metric you want to view from the **2D Metrics view** or **Metrics Distribution** graphs. + +## Area + +Ranks images by their area. Area is computed as the product of image width and image height (_width x height_). + +Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/img_features.py). + +## Aspect Ratio + +Ranks images by their aspect ratio. Aspect ratio is computed as the ratio of image width to image height (_width / height_). + +Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/img_features.py). 
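Both scores above follow directly from the image dimensions. A minimal sketch, assuming images are loaded as NumPy arrays of shape `(height, width, channels)` (this mirrors the descriptions above, not the library's exact code):

```python
import numpy as np

def area_score(image: np.ndarray) -> int:
    """Area = width x height, in pixels."""
    height, width = image.shape[:2]
    return width * height

def aspect_ratio_score(image: np.ndarray) -> float:
    """Aspect ratio = width / height."""
    height, width = image.shape[:2]
    return width / height

image = np.zeros((480, 640, 3), dtype=np.uint8)  # a 640x480 image
print(area_score(image))          # 307200
print(aspect_ratio_score(image))  # 1.3333333333333333
```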
+
+## Blue Value
+
+Ranks images by how blue the average value of the image is.
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/img_features.py).
+
+## Brightness
+Ranks images by their brightness. Brightness is computed as the average (normalized) pixel value across each image.
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/img_features.py).
+
+## Contrast
+Ranks images by their contrast. Contrast is computed as the standard deviation of the pixel values.
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/img_features.py).
+
+## Diversity
+
+This metric is for selecting the first samples to annotate when there are no labels in the project. Choosing simple samples that represent their classes well gives better results, so this metric ranks images from easy to annotate to hard to annotate. Easy samples have lower scores, while hard samples have higher scores.
+
+### Algorithm
+
+1. [K-means clustering](https://en.wikipedia.org/wiki/K-means_clustering) is applied to image embeddings. The total number of clusters is obtained from the <> file (if there is both object and image-level information, the number of object classes is used as the cluster count). If no ontology information exists, _K_ defaults to 10.
+
+2. Samples in each cluster are ranked by their proximity to the cluster center; samples closer to the center are the easy samples.
+
+3. The clusters are then combined so that the result is ordered from easy to hard and the number of samples per class is balanced within the first _N_ samples.
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/semantic/image_diversity.py).
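The three steps above can be sketched with plain NumPy. This is a simplification: it uses Lloyd's k-means with a deterministic farthest-point initialization rather than the clustering setup of the linked implementation, and assumes embeddings are already computed as an `(n, d)` array:

```python
import numpy as np

def _farthest_point_centers(X: np.ndarray, k: int) -> np.ndarray:
    """Deterministic center init: greedily pick mutually distant points."""
    centers = [X[0]]
    for _ in range(k - 1):
        dists = np.min(
            np.linalg.norm(X[:, None] - np.asarray(centers)[None], axis=2), axis=1
        )
        centers.append(X[np.argmax(dists)])
    return np.asarray(centers, dtype=float)

def diversity_rank(embeddings: np.ndarray, k: int, iters: int = 20) -> list[int]:
    """Order sample indices from 'easy' (near a cluster center) to 'hard'."""
    X = np.asarray(embeddings, dtype=float)
    centers = _farthest_point_centers(X, k)
    for _ in range(iters):  # step 1: plain k-means (Lloyd's algorithm)
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    # Step 2: rank samples inside each cluster by distance to their center
    dists = np.linalg.norm(X - centers[labels], axis=1)
    per_cluster = [
        sorted(np.flatnonzero(labels == c), key=dists.__getitem__) for c in range(k)
    ]
    # Step 3: interleave clusters so the first N picks stay class-balanced
    order = []
    for rank in range(max(map(len, per_cluster))):
        for members in per_cluster:
            if rank < len(members):
                order.append(int(members[rank]))
    return order
```

The returned list is a permutation of all sample indices, so taking its first _N_ entries yields an easy, class-balanced annotation batch.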
+ +## Frame Number + +Select a range of images in a video or a sequential group of images. + + + + +## Green Value +Ranks images by how green the average value of the image is. + +Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/img_features.py). + +## Height +Ranks images by the height of the image. + +Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/img_features.py). + +## Object Count + +Counts number of objects in the image. + +Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/object_counting.py). + +## Object Density + +Computes the percentage of image area that is occupied by objects. + +Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/geometric/object_size.py). + +## Randomize Images +Uses a uniform distribution to generate a value between 0 and 1 for each image. + +Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/random.py). + +## Red Value +Ranks images by how red the average value of the image is. + +Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/img_features.py). + +## Sharpness +Ranks images by their sharpness. + +Sharpness is computed by applying a Laplacian filter to each image and computing the variance of the output. In short, the score computes "the amount of edges" in each image. + +```python +score = cv2.Laplacian(image, cv2.CV_64F).var() +``` + +Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/img_features.py). + +## Uniqueness + +This metric gives each image a score that shows each image's uniqueness. 
+- A score of zero means that the image has duplicates in the dataset; on the other hand, a score close to one means that the image is quite unique. Among duplicate images, we only give a non-zero score to a single image, and the rest have a score of zero (for example, if there are five identical images, four of them will have a score of zero). This way, duplicate samples can be easily tagged and removed from the project.
+- Images that are near duplicates of each other will be shown side by side.
+
+### Possible actions
+
+- **To delete duplicate images:** Set the quality filter to cover only zero values (which selects all the duplicate images), then use bulk tagging (for example, with a tag like `Duplicate`) to tag all of them.
+- **To mark duplicate images:** Near-duplicate images are shown side by side. Navigate through these images and mark whichever is of interest to you.
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/semantic/image_singularity.py).
+
+## Width
+Ranks images by the width of the image.
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/img_features.py).
diff --git a/docs/active-quality-metrics/active-label-quality-metrics.md b/docs/active-quality-metrics/active-label-quality-metrics.md
new file mode 100644
index 000000000..d5b77abe7
--- /dev/null
+++ b/docs/active-quality-metrics/active-label-quality-metrics.md
@@ -0,0 +1,294 @@
+---
+title: "Label quality metrics"
+slug: "active-label-quality-metrics"
+hidden: false
+metadata:
+  title: "Label quality metrics"
+  description: "Enhance label quality with geometries in Encord Active. Optimize annotations for bounding boxes, polygons, and polylines."
+  image:
+    0: "https://files.readme.io/9b2a03a-image_16.png"
+createdAt: "2023-07-21T09:09:02.139Z"
+updatedAt: "2023-08-11T13:41:50.217Z"
+category: "6480a3981ed49107a7c6be36"
+---
+
+Label quality metrics operate on the geometries of objects like <>es, <>s and <>s.
+
+## Access Label Quality Metrics
+
+Label Quality Metrics are used for sorting data, filtering data, and data analytics.
+
+| Title | Metric Type | Ontology Type |
+|-------|-------------|---------------|
+| [Absolute Area](#absolute-area) - Computes object size in amount of pixels. | `image` | `bounding box`, `polygon`, `rotatable bounding box` |
+| [Aspect Ratio](#aspect-ratio) - Computes aspect ratios of objects. | `image` | `bounding box`, `polygon`, `rotatable bounding box` |
+| [Blue Value](#blue-value) - Ranks annotated objects by how blue the average value of the object is. | `image` | `bounding box`, `polygon`, `rotatable bounding box` |
+| [Brightness](#brightness) - Ranks annotated objects by their brightness. | `image` | `bounding box`, `polygon`, `rotatable bounding box` |
+| [Border Proximity](#border-proximity) - Ranks annotations by how close they are to image borders. | `image` | `bounding box`, `point`, `polygon`, `polyline`, `rotatable bounding box`, `skeleton` |
+| [Broken Object Tracks](#broken-object-tracks) - Identifies broken object tracks based on object overlaps. | `sequence`, `video` | `bounding box`, `polygon`, `rotatable bounding box` |
+| [Confidence](#confidence) - The confidence that an object was annotated correctly.
| `image` | `bounding box`, `polygon`, `rotatable bounding box` |
+| [Contrast](#contrast) - Ranks annotated objects by their contrast. | `image` | `bounding box`, `polygon`, `rotatable bounding box` |
+| [Classification Quality](#classification-quality) - Compares image classifications against similar images. | `image` | `radio` |
+| [Green Value](#green-value) - Ranks annotated objects by how green the average value of the object is. | `image` | `bounding box`, `polygon`, `rotatable bounding box` |
+| [Height](#height) - Ranks annotated objects by the height of the object. | `image` | `bounding box`, `polygon`, `rotatable bounding box` |
+| [Inconsistent Object Class](#inconsistent-object-class) - Looks for overlapping objects with different classes (across frames). | `sequence`, `video` | `bounding box`, `polygon`, `rotatable bounding box` |
+| [Inconsistent Track ID](#inconsistent-track-id) - Looks for overlapping objects with different track-ids (across frames). | `sequence`, `video` | `bounding box`, `polygon`, `rotatable bounding box` |
+| [Label Duplicates](#label-duplicates) - Ranks labels by how likely they are to represent the same object. | `image` | `bounding box`, `polygon`, `rotatable bounding box` |
+| [Missing Objects](#missing-objects) - Identifies missing objects based on object overlaps. | `sequence`, `video` | `bounding box`, `polygon`, `rotatable bounding box` |
+| [Object Classification Quality](#object-classification-quality) - Compares object annotations against similar image crops. | `image` | `bounding box`, `polygon`, `rotatable bounding box` |
+| [Occlusion Risk](#occlusion-risk) - Tracks objects and detects outliers in videos. | `sequence`, `video` | `bounding box`, `rotatable bounding box` |
+| [Polygon Shape Anomaly](#polygon-shape-anomaly) - Calculates potential outliers by polygon shape. | `image` | `polygon` |
+| [Randomize Objects](#randomize-objects) - Assigns a random value between 0 and 1 to objects.
| `image` | `bounding box`, `polygon`, `rotatable bounding box` | +| [Red Value](#red-value) - Ranks annotated objects by how red the average value of the object is. | `image` | `bounding box`, `polygon`, `rotatable bounding box` | +| [Relative Area](#relative-area) - Computes object size as a percentage of total image size. | `image` | `bounding box`, `polygon`, `rotatable bounding box` | +| [Sharpness](#sharpness) - Ranks annotated objects by their sharpness. | `image` | `bounding box`, `polygon`, `rotatable bounding box` | +| [Width](#width) - Ranks annotated objects by the width of the object. | `image` | `bounding box`, `polygon`, `rotatable bounding box` | + +**To access Label Quality Metrics for Explorer:** + +1. Click a Project from the _Active_ home page. + +2. Click **Explorer**. + +3. Click **Labels**. + +4. Sort and filter the tabular data. + +5. Click the plot diagram icon. + +6. Sort and filter the embedding plot data. + +**To access Label Quality Metrics for analytics:** + +1. Click a Project from the _Active_ home page. + +2. Click **Analytics**. + +3. Click **Annotations**. + +4. Select the quality metric you want to view from the _2D Metrics view_ or _Metrics Distribution_ graphs. + +## Absolute Area + +Computes object size in amount of pixels. + +Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/geometric/object_size.py). + +## Aspect Ratio + +Computes aspect ratios (**width/height**) of objects. + +Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/geometric/object_size.py). + +## Blue Value + +Ranks annotated objects by how blue the average value of the object is. + +Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/img_features.py). + +## Brightness +Ranks annotated objects by their brightness. 
Brightness is computed as the average (normalized) pixel value across each object. + +Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/img_features.py). + +## Broken Object Tracks + +Identifies broken object tracks by comparing object overlaps based on a running window. + +**Example:** + +If objects of the same class overlap in three consecutive frames (_i-1_, _i_, and _i+1_) but do not share object hash, the frames are flagged as a potentially broken track. + +![Broken Object Tracks example](https://storage.googleapis.com/docs-media.encord.com/static/img/active/filters-labels-metrics/broken-object-tracks.PNG) + +`CAT:2` is marked as potentially having a wrong track id. + +Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/missing_objects_and_wrong_tracks.py). + +## Border Proximity + +This metric ranks annotations by how close they are to image borders. + +Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/geometric/image_border_closeness.py). + +## Confidence + +The confidence score (α) is a measure of a machine learning model's certainty that a given prediction is accurate. The higher the confidence score, the more certain a model is about its prediction. + +Manual labels are always assigned α = 100%, while label predictions created using models and automated methods such as interpolation have a confidence score below 100% (α < 100%). + +Values for this metric are calculated as labels are fetched from Annotate. + +> ℹ️ Note +> While arguably not making much sense when annotated by a human, this value is very important for objects that were automatically labeled. + +## Contrast + +Ranks annotated objects by their contrast. Contrast is computed as the standard deviation of the pixel values. 
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/img_features.py).
+
+## Classification Quality
+
+This metric creates embeddings from images, builds a nearest neighbor graph from those embeddings, and then compares the classifications of similar embeddings against each other.
+
+We calculate an embedding for each image (for example, turning 3xNxM dimensional images into 1xD dimensional vectors using a neural network architecture). Then, for each embedding (or image), we look at its **50** nearest neighbors and compare its annotation with the neighboring annotations.
+
+For example, suppose the current image is annotated as **A**, but only _20_ out of _50_ of its neighbors are also annotated as **A**; the rest are annotated differently. That gives a score of _20/50_ = _0.4_. A score of 1 means that the annotation is very reliable, because very similar images are annotated the same way. The closer the score gets to zero, the less reliable the annotation.
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/semantic/img_classification_quality.py).
+
+## Green Value
+Ranks annotated objects by how green the average value of the object is.
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/img_features.py).
+
+## Height
+Ranks annotated objects by the height of the object.
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/img_features.py).
+
+## Inconsistent Object Class
+
+This algorithm looks for overlapping objects in consecutive frames that have different classes.
+ +**Example:** + +![Inconsistent Object Class example](https://storage.googleapis.com/docs-media.encord.com/static/img/active/filters-labels-metrics/inconsistent-object-track.PNG) + +`Dog:1` is flagged as potentially the wrong class, because `Dog:1` overlaps with `CAT:1`. + +Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/high_iou_changing_classes.py). + +## Inconsistent Track ID + +This algorithm looks for overlapping objects with different track-ids. Overlapping objects with different track-ids are flagged as potential inconsistencies in tracks. + +**Example:** + +![Inconsistent Track ID example](https://storage.googleapis.com/docs-media.encord.com/static/img/active/filters-labels-metrics/inconsistent-track-id.PNG) + +`Cat:2` is flagged as potentially having a broken track, because track ids `1` and `2` do not match. + +Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/high_iou_changing_classes.py). + +## Label Duplicates + +Ranks labels by how likely they are to represent the same object. + +> [Jaccard similarity coefficient](https://en.wikipedia.org/wiki/Jaccard_index) is used to measure closeness of two annotations. + +**Example 1:** + +An annotator accidentally labels the same thing in a frame twice. + +An annotator labeled the same orange twice in a frame. Look carefully at both images and you can see that there are two slightly different labels around the orange. + +![Duplicate labels example 1](https://storage.googleapis.com/docs-media.encord.com/static/img/active/filters-labels-metrics/label_duplicates_01.png) + + +**Example 2:** + +Sometimes the same type of things in a frame are very close to each other and the annotator does not know if the things should be annotated separately or as a group so they do both. 
Or they label both the group and each individual thing in the group.
+
+An annotator labeled a group of oranges and then labeled individual oranges in the group.
+
+![Duplicate labels example 2](https://storage.googleapis.com/docs-media.encord.com/static/img/active/filters-labels-metrics/label_duplicates_02.png)
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/geometric/annotation_duplicates.py).
+
+## Missing Objects
+
+Identifies missing objects by comparing object overlaps based on a running window.
+
+**Example:**
+
+If an intermediate frame (frame _i_) does not include an object in the same region as the two surrounding frames (_i-1_ and _i+1_), the frame is flagged.
+
+![Missing Objects example](https://storage.googleapis.com/docs-media.encord.com/static/img/active/filters-labels-metrics/missing-objects.PNG)
+
+Frame _i_ is flagged as potentially missing an object.
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/missing_objects_and_wrong_tracks.py).
+
+## Object Classification Quality
+
+This metric transforms polygons into bounding boxes and extracts an embedding for each <>. Then, these embeddings are compared with their neighbors. If the neighbors are annotated/classified differently, a low score is given to the classification.
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/semantic/img_object_quality.py).
+
+## Occlusion Risk
+
+This metric collects information related to object size and aspect ratio for each video and finds outliers among them.
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/geometric/occlusion_detection_video.py).
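Several of the metrics above (Label Duplicates, Missing Objects, Broken Object Tracks) rely on measuring how much two annotations overlap. For axis-aligned bounding boxes, the Jaccard similarity coefficient (intersection over union) can be sketched as:

```python
def jaccard_box(a: tuple, b: tuple) -> float:
    """Jaccard index (IoU) of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    intersection = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (
        (a[2] - a[0]) * (a[3] - a[1])
        + (b[2] - b[0]) * (b[3] - b[1])
        - intersection
    )
    return intersection / union if union else 0.0

print(jaccard_box((0, 0, 10, 10), (0, 0, 10, 10)))   # 1.0 -> likely duplicates
print(jaccard_box((0, 0, 10, 10), (20, 20, 30, 30))) # 0.0 -> unrelated objects
```

A score near 1 indicates two labels that almost certainly describe the same object; polygon variants of the metrics compute the same ratio over polygon areas instead of boxes.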
+
+## Polygon Shape Anomaly
+
+Computes the Euclidean distance between the polygons' [Hu moments](https://en.wikipedia.org/wiki/Image_moment) for each class and the prototypical class moments.
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/geometric/_hu_static.py).
+
+## Red Value
+Ranks annotated objects by how red the average value of the object is.
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/img_features.py).
+
+## Relative Area
+
+Computes object size as a percentage of total image size.
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/geometric/object_size.py).
+
+## Randomize Objects
+
+Uses a uniform distribution to generate a value between 0 and 1 for each object.
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/random.py).
+
+## Sharpness
+Ranks annotated objects by their sharpness.
+
+Sharpness is computed by applying a Laplacian filter to each annotated object and computing the variance of the output. In short, the score computes "the amount of edges" in each annotated object.
+
+```python
+score = cv2.Laplacian(image, cv2.CV_64F).var()
+```
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/heuristic/img_features.py).
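Assuming each polygon in the Polygon Shape Anomaly metric above has already been reduced to a fixed-length shape descriptor (for example its seven Hu moments, via OpenCV's `cv2.HuMoments`), the distance-to-prototype step can be sketched as follows, with the class prototype taken here as the per-class mean descriptor:

```python
import numpy as np

def shape_anomaly_scores(descriptors: np.ndarray, classes: list) -> np.ndarray:
    """Euclidean distance of each shape descriptor to its class prototype
    (here the per-class mean descriptor). Higher = more anomalous."""
    classes = np.asarray(classes)
    scores = np.empty(len(descriptors), dtype=float)
    for cls in np.unique(classes):
        mask = classes == cls
        prototype = descriptors[mask].mean(axis=0)
        scores[mask] = np.linalg.norm(descriptors[mask] - prototype, axis=1)
    return scores
```

Polygons whose shape deviates strongly from the typical shape of their class receive high scores and surface first when sorting descending.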
\ No newline at end of file diff --git a/docs/active-quality-metrics/active-model-quality-metrics.md b/docs/active-quality-metrics/active-model-quality-metrics.md new file mode 100644 index 000000000..d23d2d121 --- /dev/null +++ b/docs/active-quality-metrics/active-model-quality-metrics.md @@ -0,0 +1,106 @@ +--- +title: "Model quality metrics" +slug: "active-model-quality-metrics" +hidden: false +metadata: + title: "Model quality metrics" + description: "Assess model quality with Encord Active's metrics. Acquire insights through acquisition functions: Entropy, Least Confidence, Margin, Variance, Mean Object Confidence. Optimize model evaluation." + image: + 0: "https://files.readme.io/4ee3d9b-image_16.png" +createdAt: "2023-07-21T09:09:02.307Z" +updatedAt: "2023-08-11T10:12:36.522Z" +category: "6480a3981ed49107a7c6be36" +--- +Model quality metrics help you evaluate your data and labels based on a trained model and imported model predictions. + +## Acquisition functions + +Acquisition functions are a special type of model quality metric, primarily used in active learning to score data samples according to how informative they are for the model and enable smart labeling of unannotated data. + +| Title | Metric Type | Data Type | +|----------------------------------------------------------------------------------------------------|-------------|-----------| +| [Entropy](#entropy) - Rank images by their entropy. | `image` | | +| [LeastConfidence](#least-confidence) - Rank images by their least <>. | `image` | | +| [Margin](#margin) - Rank images by their margin score. | `image` | | +| [Variance](#variance) - Rank images by their variance. | `image` | | +| [MeanObjectScore](#mean-object-confidence) - Rank images by their average object score | `image` | `object` | + + +### Entropy + +Rank images by their entropy. 
+ +It can be employed to define a heuristic that measures a model’s uncertainty about the classes in an image using the average of the entropies of the model-predicted class probabilities in the image. Like before, the higher the image's score, the more “confused” the model is. As a result, data samples with higher entropy score should be offered for annotation. + +##### Metric details + +The mathematical formula of entropy is: + +
+`H(p) = -Σ_c p_c log(p_c)`
+ +In information theory, the entropy of a random variable is the average level of “information”, “surprise”, or “uncertainty” inherent to the variable's possible outcomes. The higher the entropy, the more “uncertain” the variable outcome. + +Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/acquisition_metrics/acquisition_functions.py). + + +### Least Confidence + +Rank images by their least <>. Least confidence takes the difference between 1 (100% confidence) and the most confidently predicted label for each item. It's useful to convert the uncertainty scores to a 0–1 range, where 1 is the most uncertain score. + +It can be employed to define a heuristic that measures a model’s uncertainty about the classes in an image using the average of the **LC** score of the model-predicted class probabilities in the image. Like before, the higher the image's score, the more “confused” the model is. As a result, data samples with higher LC score should be offered for annotation. + +##### Metric details + +The mathematical formula of the LC score of a model's prediction is: + +
+ +
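+For a prediction with class probabilities $P(c \mid x)$, the least confidence score is:
+
+$$LC(x) = 1 - \max_{c} P(c \mid x)$$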
+
+The _Least confidence_ (LC) score of a model's prediction is the difference between 1 (100% confidence) and the probability of its most confidently predicted class. The higher the LC score, the more “uncertain” the prediction.
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/acquisition_metrics/acquisition_functions.py).
+
+
+### Margin
+
+Rank images by their margin score.
+
+It can be employed to define a heuristic that measures a model’s uncertainty about the classes in an image using the average of the margin score of the model-predicted class probabilities in the image. Like before, the lower the image's score, the more “confused” the model is. As a result, data samples with a lower margin score should be offered for annotation.
+
+##### Metric details
+
+The margin score of a model's prediction is the difference between the probabilities of the two most likely classes. The lower the margin score, the more “uncertain” the prediction.
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/acquisition_metrics/acquisition_functions.py).
+
+
+### Variance
+
+Rank images by their variance.
+
+It can be employed to define a heuristic that measures a model’s uncertainty about the classes in an image using the average of the variance of the model-predicted class probabilities in the image. Like before, the lower the image's score, the more “confused” the model is. As a result, data samples with a lower variance score should be offered for annotation.
+
+##### Metric details
+
+The mathematical formula for the variance of a data set is:
+
+
+ +
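+For a data set $x_1, \dots, x_n$ with mean $\bar{x}$, the variance is:
+
+$$\mathrm{Var}(x) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$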
+
+Variance is a measure of dispersion that takes into account the spread of all data points in a data set. The variance is the mean squared difference between each data point and the centre of the distribution measured by the mean. The lower the variance, the more “clustered” the data points.
+
+Implementation on [GitHub](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/acquisition_metrics/acquisition_functions.py).
+
+### Mean object confidence
+
+This method ranks images based on the mean score of their predicted objects, and applies specifically to object-level predictions such as bounding boxes or segmentations.
+
+A lower score indicates that the model's predictions for an image lack certainty. This measurement is particularly effective in scenarios where the presence of at least one object is anticipated in every image.
+
+##### Metric details
+
+The metric calculates the maximum confidence score for each class within every object-level prediction, averages these across all object predictions within an image, and assigns this average as the image's score. In the absence of any predictions for an image, a score of zero is assigned.
\ No newline at end of file
diff --git a/docs/active-quality-metrics/active-write-custom-quality-metrics.md b/docs/active-quality-metrics/active-write-custom-quality-metrics.md
new file mode 100644
index 000000000..4c6449d38
--- /dev/null
+++ b/docs/active-quality-metrics/active-write-custom-quality-metrics.md
@@ -0,0 +1,115 @@
+---
+title: "Writing custom quality metrics"
+slug: "active-write-custom-quality-metrics"
+hidden: false
+metadata:
+  title: "Writing custom quality metrics"
+  description: "Guide to creating custom quality metrics in Encord Active: Learn to write unique geometric, semantic, and heuristic metrics."
+  image:
+    0: "https://files.readme.io/aa7c5c7-image_16.png"
+createdAt: "2023-07-14T16:05:36.364Z"
+updatedAt: "2023-08-11T13:59:03.786Z"
+category: "6480a3981ed49107a7c6be36"
+---
+**Guide on how to write your own Custom Quality Metrics**
+
+
+
+Create a new Python file in the `libs/encord_active/metrics` directory and use the template provided in `libs/encord_active/metrics/example.py`. The subdirectory within `libs/encord_active/metrics` is dictated by what information the metric employs:
+
+- **Geometric:** Metrics related to the geometric properties of annotations.
+  This includes size, shape, location etc.
+- **Semantic:** Metrics based on the _contents_ of some image, video or annotation.
+  This includes embedding distances, image uncertainties etc.
+- **Heuristic:** Any other metrics. For example, brightness, sharpness, object counts, etc.
+
+You can use the following template to get started with writing your own metric. Your implementation should call `writer.write(<score>, <object>)` for every object in the iterator **OR** use `writer.write(<score>)` for every frame in the iterator.
+
+```python
+from loguru import logger
+
+from encord_active.lib.common.iterator import Iterator
+from encord_active.lib.metrics.metric import Metric
+from encord_active.lib.metrics.types import AnnotationType, DataType, MetricType
+from encord_active.lib.metrics.writer import CSVMetricWriter
+
+logger = logger.opt(colors=True)
+
+
+class ExampleMetric(Metric):
+    def __init__(self):
+        super().__init__(
+            title="Example Title",
+            short_description="Assigns same value and description to all objects.",
+            long_description=r"""For long descriptions, you can use Markdown to _format_ the text.
+
+For example, you can make a [hyperlink](https://memegenerator.net/instance/74454868/europe-its-the-final-markdown)
+to the awesome paper that proposed the method.
+
+Or use math to better explain the method:
+$$h_{\lambda}(x) = \frac{1}{x^\intercal x}$$
+""",
+            doc_url='link/to/documentation',  # This is optional; if a link is given, it can be accessed from the app
+            metric_type=MetricType.HEURISTIC,
+            data_type=DataType.IMAGE,
+            annotation_type=[AnnotationType.OBJECT.BOUNDING_BOX, AnnotationType.OBJECT.ROTATABLE_BOUNDING_BOX, AnnotationType.OBJECT.POLYGON],
+        )
+
+    def execute(self, iterator: Iterator, writer: CSVMetricWriter):
+        valid_annotation_types = {annotation_type.value for annotation_type in self.metadata.annotation_type}
+
+        logger.info("My custom logging")
+        # Preprocessing happens here.
+        # You can build/load databases of embeddings, compute statistics, etc.
+        for data_unit, image in iterator.iterate(desc="Progress bar description"):
+            # Frame level score (data quality)
+            writer.write(1337, description="Your description of the score [can be omitted]")
+            for obj in data_unit["labels"].get("objects", []):
+                # Label (object/classification) level score (label/model prediction quality)
+                if obj["shape"] not in valid_annotation_types:
+                    continue
+
+                # This is where you do the actual inference.
+                # Some convenient properties associated with the current data:
+                # ``iterator.label_hash`` the label hash of the current data unit
+                # ``iterator.du_hash`` the hash of the current data unit
+                # ``iterator.frame`` the frame of the current data unit
+                # ``iterator.num_frames`` the total number of frames in the label row.
+
+                # Do your thing (inference)
+                # Then
+                writer.write(42, labels=obj, description="Your description of the score [can be omitted]")
+
+```
+
+Run the following commands:
+
+```shell
+encord-active metric add  # for adding the custom metric to a project
+encord-active metric run  # to execute a metric against a project
+```
+
+Before running your own custom metric, make sure that you have a `project_meta.yaml` file in the project data folder.
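+As a rough sketch, a minimal `project_meta.yaml` looks something like the following (the exact field names here are assumptions and may differ between `encord-active` versions, so check an existing project folder first):
+
+```yaml
+# Hypothetical example: verify the field names against your own project_meta.yaml
+project_title: My custom metrics project
+project_hash: 00000000-0000-0000-0000-000000000000
+ssh_key_path: /path/to/your/private/ssh_key
+```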
+ +To run your metric from the root directory, use: + +```shell +# within venv +python your_metric_file.py /path/to/your/data/dir +``` + +You can check the generated metric file in your `/metrics`, its name should be `_example_title.csv` . When you've run your metric, you can visualize your results by running: + +```shell +# within venv +encord-active start +``` + +Now, you can improve your data/labels/model by choosing your own custom metric in the app. \ No newline at end of file diff --git a/docs/active-tutorials/active-diversity-sampling-on-unlabeled-data.md b/docs/active-tutorials/active-diversity-sampling-on-unlabeled-data.md new file mode 100644 index 000000000..4fbad55ac --- /dev/null +++ b/docs/active-tutorials/active-diversity-sampling-on-unlabeled-data.md @@ -0,0 +1,66 @@ +--- +title: "Running Diversity Based Acquisition Function on Unlabeled Data" +slug: "active-diversity-sampling-on-unlabeled-data" +hidden: false +metadata: + title: "Running Diversity Based Acquisition Function on Unlabeled Data" + description: "Tutorial: Run Clustering-based Diversity Sampling to Rank Images. Enhance acquisition function. Boost image ranking with diversity sampling." + image: + 0: "https://files.readme.io/3889a75-image_16.png" +createdAt: "2023-07-11T16:27:41.992Z" +updatedAt: "2023-08-09T12:34:28.153Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" } [/block]
+
+In this tutorial, you will see how to run a clustering-based diversity sampling acquisition function to rank the images.
+
+
+> ℹ️ Note
+> This tutorial assumes that you have [installed](https://docs.encord.com/docs/active-oss-install) `encord-active`.
+
+## 1. Inference on the unlabeled data
+
+First, terminate any running Encord Active app before running any code. Then, get the root directory path of the encord-active project and copy it to the `data_dir` variable below. You will use the [Image Diversity](https://docs.encord.com/docs/active-data-quality-metrics#image-diversity) metric to rank the images. The Image Diversity metric simply clusters the dataset according to the number of classes in the ontology and selects an equal number of samples from each cluster. Your project may consist of both labeled and unlabeled examples and you may want to run this acquisition function only on the unlabeled data; therefore, you will set `skip_labeled_data` to `True`.
+
+```python
+from pathlib import Path
+from encord_active.lib.metrics.semantic.image_diversity import ImageDiversity
+from encord_active.lib.metrics.execute import execute_metrics
+
+data_dir = Path("/path/to/encord-active/project")
+acquisition_func = ImageDiversity()
+execute_metrics([acquisition_func], data_dir=data_dir, use_cache_only=True, skip_labeled_data=True)
+```
+
+## 2. Refresh metric files
+
+After executing the acquisition function, it should output two new files in the metrics folder of the root project folder.
We need to update the metric information in the project to reflect the changes in the UI:
+
+```python
+from encord_active.lib.metrics.io import get_metric_metadata
+from encord_active.lib.metrics.metadata import fetch_metrics_meta, update_metrics_meta
+from encord_active.lib.project.project_file_structure import ProjectFileStructure
+
+project_fs = ProjectFileStructure(data_dir)
+metrics_meta = fetch_metrics_meta(project_fs)
+metrics_meta[acquisition_func.metadata.title] = get_metric_metadata(acquisition_func)
+update_metrics_meta(project_fs, metrics_meta)
+project_fs.db.unlink(missing_ok=True)
+```
+
+Now, open the encord-active app using the following CLI command in the project or its root folder:
+
+```shell
+encord-active start
+```
+
+Go to Data Quality -> Explorer, and choose Image Diversity from the metric drop-down menu. You will see the examples sorted according to the image diversity function. From now on, you can select the first N samples and:
+
+1. create a new project to label based on these samples.
+2. export the selected samples using the Actions tab and use them in your own label annotation pipeline.
\ No newline at end of file
diff --git a/docs/active-tutorials/active-easy-active-learning-mnist.md b/docs/active-tutorials/active-easy-active-learning-mnist.md
new file mode 100644
index 000000000..ad82938d4
--- /dev/null
+++ b/docs/active-tutorials/active-easy-active-learning-mnist.md
@@ -0,0 +1,136 @@
+---
+title: "Easy active learning on MNIST"
+slug: "active-easy-active-learning-mnist"
+hidden: false
+metadata:
+  title: "Easy active learning on MNIST"
+  description: "Tutorial: Integrate Random Forest in Encord Active for optimal data labeling. MNIST dataset guide. Train, rank, and sample efficiently."
+ image: + 0: "https://files.readme.io/bcab15c-image_16.png" +createdAt: "2023-07-11T16:27:42.048Z" +updatedAt: "2023-08-09T12:36:27.203Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +In this tutorial, you will see how to plug a Random Forest model in to Encord Active and use it to select the best data to label next on the MNIST dataset. You will go through the following steps: + +1. [Download the MNIST sandbox project](#1-download-the-mnist-sandbox-project). +2. [Train the model with labeled data from the project](#2-train-the-model-with-labeled-data-from-the-project). +3. [Run the acquisition function powered by the model to rank the project data](#3-run-the-acquisition-function-powered-by-the-model-to-rank-the-project-data). +4. [Rank and sample the data to label next](#4-rank-and-sample-the-data-to-label-next). + + +> ℹ️ Note +> This tutorial assumes that you have [installed](https://docs.encord.com/docs/active-oss-install) `encord-active`. + + +## 1. Download the MNIST sandbox project + +Download the data by running the following CLI command: + +```shell +encord-active download --project-name "[open-source][test]-mnist-dataset" +``` + +When the process is done, the MNIST test dataset is ready to be used. + +From now on, the tutorial is hands-on with python code, so we need a reference to the folder where the project was downloaded. + +```python +from pathlib import Path +from encord_active.lib.project.project_file_structure import ProjectFileStructure + +project_path = Path("/path/to/project/directory") +project_fs = ProjectFileStructure(project_path) +``` + +## 2. Train the model with labeled data from the project + +It's a common scenario to start spinning the active learning cycle using a model trained with some initial data. +Let's select [sklearn.ensemble.RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) as the base model. 
+
+```python
+from sklearn.ensemble import RandomForestClassifier
+
+forest = RandomForestClassifier(n_estimators=500)
+```
+
+We need to wrap the model with a `BaseModelWrapper` in order to interface the model's behaviour with the one expected by the acquisition functions.
+The two main functionalities wrapped around the model are:
+
+1. Prepare the input data to be ingested by the model (`prepare_data(..)`), and
+2. Be able to obtain predicted probabilities of data samples (`_predict_proba(..)`).
+
+Encord Active has a built-in wrapper for _scikit-learn_ classifiers (`SKLearnModelWrapper`), so let's use it.
+
+```python
+from typing import List
+import numpy as np
+from PIL import Image
+from encord_active.lib.common.active_learning import get_data, get_data_hashes_from_project
+from encord_active.lib.metrics.acquisition_functions import SKLearnModelWrapper
+
+def transform_image_data(images: List[Image.Image]) -> List[np.ndarray]:
+    return [np.asarray(image).flatten() / 255 for image in images]
+
+w_model = SKLearnModelWrapper(forest)
+
+data_hashes = get_data_hashes_from_project(project_fs, subset_size=5000)
+X, y = get_data(project_fs, data_hashes, class_name="digit")
+X = transform_image_data(X)
+
+w_model._model.fit(X, y)
+```
+
+## 3. Run the acquisition function powered by the model to rank the project data
+
+Encord Active provides multiple acquisition functions ready to be used with the wrapped model.
+
+We use an acquisition function called `Entropy` that measures the average level of “uncertainty” in the model's predicted probabilities.
+The higher the entropy, the more “uncertain” the model.
+ +```python +from encord_active.lib.common.active_learning import get_metric_results +from encord_active.lib.metrics.acquisition_functions import Entropy +from encord_active.lib.metrics.execute import execute_metrics + +acq_func = Entropy(w_model) + +execute_metrics([acq_func], data_dir=project_fs.project_dir, use_cache_only=True) + +acq_func_results = get_metric_results(project_fs, acq_func) +``` + +> ℹ️ Note +> We use `Entropy` in this tutorial but Encord Active has multiple acquisition functions that can be inspected [here](https://github.com/encord-team/encord-active/blob/main/src/encord_active/lib/metrics/acquisition_metrics/acquisition_functions.py). + + +## 4. Rank and sample the data to label next + +As soon as the acquisition function finishes its execution through all the data samples, we proceed to rank them. + +```python +from encord_active.lib.common.active_learning import get_n_best_ranked_data_samples + +batch_size_to_label = 100 # amount of data samples selected to label next +data_to_label_next, scores = get_n_best_ranked_data_samples( + acq_func_results, + batch_size_to_label, + rank_by="desc", + exclude_data_hashes=data_hashes) +``` + +The output variable `data_to_label_next` contains the hashes of the best ranked data samples. +Now you can proceed to label these samples and enable your own active learning pipeline. + +## Summary + +This section concludes the end-to-end example on easy active learning on the MNIST dataset using Random Forest. We covered training a Random Forest model, wrapping the model to match Encord Active requirements on models, selecting and running acquisition functions over the data, and choosing the best data to label next. + +Now, you should have a good idea about how Encord Active can be used to run your active learning pipeline while enabling smart selection of the data for labeling. 
\ No newline at end of file diff --git a/docs/active-tutorials/active-touring-coco-dataset.md b/docs/active-tutorials/active-touring-coco-dataset.md new file mode 100644 index 000000000..486433355 --- /dev/null +++ b/docs/active-tutorials/active-touring-coco-dataset.md @@ -0,0 +1,242 @@ +--- +title: "Touring the COCO Sandbox dataset" +slug: "active-touring-coco-dataset" +hidden: false +metadata: + title: "Touring the COCO Sandbox dataset" + description: "Explore Encord Active with COCO Sandbox dataset: Download, browse, flag errors, analyze metrics for model enhancement" + image: + 0: "https://files.readme.io/d3d8971-image_16.png" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +In this tutorial, you will see some cool features of Encord Active based on the Coco sandbox dataset. +You will go through the following steps: + +1. [Downloading the dataset](#1-downloading-the-dataset) +2. [Opening Encord Active in your browser](#2-opening-encord-active-in-your-browser) +3. [Finding and flagging label errors](#3-finding-and-flagging-label-errors) +4. [Figuring out what metrics influence model performance](#4-figuring-out-what-metrics-influence-model-performance) + +> ℹ️ Note +> This tutorial assumes that you have [installed](https://docs.encord.com/docs/active-oss-install) `encord-active`. + +## 1. Downloading the dataset + +Download the data by running this command + +```shell +encord-active download +``` + +The script asks you to choose a project, navigate the options with and and hit enter. + +Now `encord-active` will download your data. + +## 2. Opening Encord Active in your browser + +When the download process is done, follow the printed instructions to launch the app with the [start][ea-cli-start] CLI command: + +```shell +cd /path/to/downloaded/project +encord-active start +``` + +> ℹ️ Note +> If the terminal seems stuck and nothing happens, try visiting http://localhost:8000 in your browser. + + +## 3. Finding and flagging label errors + +You will carry out this process in two steps: + +1. [Identifying metrics with label errors](#identifying-metrics-with-label-errors) +2. [Tagging label errors](#tagging-label-errors) + + + +### Identifying metrics with label errors + +1. Open Encord Active (from the web-app or from a local installation): + + ![Encord Active Landing page](https://storage.googleapis.com/docs-media.encord.com/static/img/tutorials/active_landing-page-new.png) + + ![Encord Active Landing (Quickstart) page](https://storage.googleapis.com/docs-media.encord.com/static/img/tutorials/active_landing-page.png) + +2. Select a project. 
+ + > ℹ️ Note + > If a project does not exist in Encord Active, create one (in the web-app) or import one. + +Go to the _Summary_ > _Annotation_ page. +The page should look like this: + +![Annotation Quality Summary Page](https://storage.googleapis.com/docs-media.encord.com/static/img/tutorials/active-quickstart-annotations-quality-summary.png) + +On the Summary page, you will find all the outliers that Encord Active automatically found based on all the [metrics](https://docs.encord.com/docs/active-quality-metrics) that were computed for the labels. + +Go to the _Explorer_ page. + +Select "Annotation Duplicates" and scroll down the page. + +The page should look similar to this: + +![Annotation Duplicates](https://storage.googleapis.com/docs-media.encord.com/static/img/tutorials/active-annotation-duplicate.png) + +The page shows how this metric was computed, how many outliers were found and some of the most severe outliers. + +If you hover the mouse over the image with the orange, you can click the expand button as indicated here: + +![Expand the image](https://storage.googleapis.com/docs-media.encord.com/static/img/tutorials/active-expand-image-oranges.png) + +Clicking the button provides a larger view of the images and detailed information about the image. + +
+ Duplicated annotations +

Notice the duplicated annotations.

+
+ +Hit Esc to exit the full screen view. + +If you take a closer look at the annotations in the other displayed images, you will notice the same issue. + +
+
+ Duplicated annotations +
+
+ Duplicated annotations +
+
+ Duplicated annotations +
+
+ +> 👍 Tip +> You can find other sources of label errors by inspecting the other tabs. Good places to start could be the "Absolute Area" and "Aspect Ratio" label metrics. + + +### Tagging label errors + +To tag the images with the identified label errors, select the images and click the TAG button and provide the name for the new tag. + +![Add new tag](https://storage.googleapis.com/docs-media.encord.com/static/img/tutorials/active-label-duplicate-tag-add.png) + + + +## 4. Figuring out what metrics influence model performance + +Encord Active also allows you to figure out which metrics influence your model performance the most. +In this section, we'll go through a subset of those: + +- [The high level view of model performance](#the-high-level-view-of-model-performance). +- [Inspecting model performance for a specific metric](#inspecting-model-performance-for-a-specific-metric). + +### The high level view of model performance + +#### mAP and mAR scores + +The first section displays the _mean Average Precision (mAP)_, _mean Average Recall (mAR)_, _true positive (TP)_, _false positive (FP)_, and _false negative (FN)_ of your model based on the IOU threshold set in the top of the page. + +![mAP and mAR scored](https://storage.googleapis.com/docs-media.encord.com/static/img/tutorials/active-map-mar-fragment.png) + +Dragging the IOU slider changes the scores. +You can also choose to see the aggregate score for certain classes by selecting them in the drop-down to the left. + +#### Metric importance and correlation + +Scrolling down the _Summary_ page, the importance and correlations of your model performance display as functions of metrics. 
+ +![Metric Importance](https://storage.googleapis.com/docs-media.encord.com/static/img/tutorials/active-importance-and-correlations.png) +![Metric Correlation](https://storage.googleapis.com/docs-media.encord.com/static/img/tutorials/active-importance-and-correlations_02.png) + +From this overview, you can see that, for example "Confidence" has a high importance for the model performance. + + + +Next, we can jump to the _Metric Performance_ page and take a closer look at exactly how the model performance is affected by this metric. However, we want to show you the rest of this page prior to doing this. + +You can skip straight ahead to the [Inspecting Model Performance for a Specific Metric](#inspecting-model-performance-for-a-specific-metric) if you are too curious to wait. + +Before jumping into specific metrics, we want to show you the decomposition of the model performance based on individual classes. Scrolling down the _Summary_ page, the Per Class average precision, average recall, and precision recall curve scores for each individual class appears. + +### Inspecting model performance for a specific metric + +Using the _Metric Performance_ and _Explorer_ pages you can see how specific metrics affect the model performance: + +1. Go to _Predictions_ > _Metric Performance_. +2. Select the "Confidence" metric from the _Metric_ drop-down list. + +![Performance by Metric page](https://storage.googleapis.com/docs-media.encord.com/static/img/tutorials/active-performance-by-object-area-coco.png) + +The plot shows the precision and the false negative rate as a function of the selected metric; "Confidence" in this case. + +3. Go to _Predictions_ > _Explorer_. +4. Filter the data based on a data or prediction metric and the prediction outcome. + +> ℹ️ Note +> Queries are only available in the web-app version of Active. + +## Summary + +This concludes the tour around Encord Active with the COCO Sandbox dataset. 
By now, you should have a good idea about how you can improve both your data, labels, and models by the insights you get from Encord Active. + +## Next steps + +- We've only covered each page in the app briefly in this tutorial. +- To learn more about concrete actionable steps you can take to improve your model performance, we suggest that you have a look at the [Workflow section](https://docs.encord.com/docs/annotate-workflows-and-templates). +- If you want to learn more about the existing metrics or want to build your own metric function, the [Quality Metrics section](https://docs.encord.com/docs/active-quality-metrics) is where you should continue reading. +- Finally, we have also included some in-depth descriptions the [Command Line Interface](https://docs.encord.com/docs/active-cli). + + +[ea-cli-start]: https://docs.encord.com/docs/active-cli#start \ No newline at end of file diff --git a/docs/active-tutorials/active-touring-quickstart-dataset.md b/docs/active-tutorials/active-touring-quickstart-dataset.md new file mode 100644 index 000000000..b3f557e4a --- /dev/null +++ b/docs/active-tutorials/active-touring-quickstart-dataset.md @@ -0,0 +1,203 @@ +--- +title: "Touring the Quickstart dataset" +slug: "active-touring-quickstart-dataset" +hidden: false +metadata: + title: "Touring the Quickstart dataset" + description: "Comprehensive tutorials: Encord Active use cases - MNIST active learning, COCO & Quickstart dataset tours, diversity sampling." + image: + 0: "https://files.readme.io/af10c8f-image_16.png" +createdAt: "2023-07-11T16:27:42.034Z" +updatedAt: "2023-08-11T13:50:23.892Z" +category: "65a71bbfea7a3f005192d1a7" +--- + +[block:html] +{ + "html": "\n\n\n \n \n Aligned Image with Page Break\n \n\n\n \"Your\n
\n\n" +} +[/block] + +In this tutorial, we will dive into the quickstart dataset and show you some cool features of Encord Active. You will go through the following steps: + +1. [Opening the quickstart dataset](#1-opening-the-quickstart-dataset). +2. [Finding and tagging outliers](#2-finding-and-tagging-outliers). +3. [Figuring out what metrics influence model performance](#3-figuring-out-what-metrics-influence-model-performance). + + +> ℹ️ Note +> This tutorial assumes that you have [installed](https://docs.encord.com/docs/active-oss-install) `encord-active`. + + +## 1. Opening the quickstart dataset + +To open the quickstart dataset run: + +```shell +encord-active quickstart +``` + +Encord Active downloads the dataset and opens the UI in your browser. + +![Encord Active Landing page](https://storage.googleapis.com/docs-media.encord.com/static/img/tutorials/active_landing-page.png) + +> ℹ️ Note +> If the terminal just seems to get stuck and nothing happens, try visiting http://localhost:8000 in your browser. + + +### About the dataset + +The Quickstart dataset contains images and labels for 200 random samples from the [COCO 2017 validation set](https://cocodataset.org/#download) with a pre-trained [MASK R-CNN RESNET50 FPN V2](https://arxiv.org/abs/1703.06870) model. + +## 2. Finding and tagging outliers + +First, we will find and tag image outliers. + +### Identifying metrics with outliers + +When you open Encord Active, you will start on the landing page. + +Click the `quickstart` project. The "Summary" page for the project appears: + +![Data Quality Summary Page](https://storage.googleapis.com/docs-media.encord.com/static/img/tutorials/active-quickstart-data-quality-summary.png) + +On the Summary > Data page, you can see all the data outliers that Encord Active automatically found based on all the [Quality Metrics](https://docs.encord.com/docs/active-quality-metrics) that were computed for the images. 
+
+> 👍 Tip
+> You can check the distribution of outliers for each metric using the Metrics Distribution graph and selecting the metric from the drop-down. Good places to start could be the "Brightness" and "Sharpness" entries.
+
+On the Summary > Annotations page, you can see annotation outliers that Encord Active automatically found.
+
+![Annotation Quality Summary Page](https://storage.googleapis.com/docs-media.encord.com/static/img/tutorials/active-quickstart-annotations-quality-summary.png)
+
+### Tagging outliers
+
+To tag an image identified as an outlier, go to the Explorer page and select one or more images. The *TAG* button is enabled. Click the *TAG* button and specify a new tag.
+
+Once the tag is created, you can add the tag to the images by selecting images and clicking the *TAG* button and selecting a tag from the list of tags.
+
+![Tag an image](https://storage.googleapis.com/docs-media.encord.com/static/img/tutorials/active-quickstart-data-quality-tagging.png)
+
+> 👍 Tip
+> Use the Explorer scatter plot graph, the filter feature (click the *FILTERS* button repeatedly to add multiple filters including tags or labels) and queries (queries are only available from the web-app) to specify the images you want to tag.
+> ![Filters](https://storage.googleapis.com/docs-media.encord.com/static/img/tutorials/active-quickstart-data-quality-tagging_02.png)
+
+Once you are satisfied with your tagged subset of images, you can move on to exporting.
+
+> ℹ️ Note
+> Multiple subsets can be created in the web-app.
+
+
+
+## 3. Figuring out what metrics influence model performance
+
+Encord Active also allows you to figure out which metrics influence your model performance the most.
+In this section, we'll go through a subset of those:
+
+- [The high level view of model performance](#the-high-level-view-of-model-performance).
+- [Inspecting model performance for a specific metric](#inspecting-model-performance-for-a-specific-metric).
+
+### The high level view of model performance
+
+#### mAP and mAR scores
+
+First, navigate to the _Predictions_ > _Summary_ page where you find multiple insights into your model performance.
+
+The first section displays the _mean Average Precision (mAP)_, _mean Average Recall (mAR)_, _true positive (TP)_, _false positive (FP)_, and _false negative (FN)_ of your model based on the IOU threshold set at the top of the page.
+
+![mAP and mAR scores](https://storage.googleapis.com/docs-media.encord.com/static/img/tutorials/active-quickstart-metrics.png)
+
+Dragging the _IOU_ slider changes the scores.
+You can also choose to see the aggregate score for certain classes by selecting them in the drop-down to the left.
+
+#### Metric importance and correlation
+
+Scrolling down the _Summary_ page, the importance and correlation of each metric with your model's performance are displayed.
+
+![Metric Importances](https://storage.googleapis.com/docs-media.encord.com/static/img/tutorials/active-quickstart-metrics-importance.png)
+
+![Metric Correlation](https://storage.googleapis.com/docs-media.encord.com/static/img/tutorials/active-quickstart-metrics-importance.png)
+
+From this overview, you can see that, for example, "Confidence" has a high importance for the model performance.
+
+
+
+
+Next, we can jump to the _Metric Performance_ page and take a closer look at exactly how the model performance is affected by this metric. However, we want to show you the rest of this page prior to doing this.
+
+You can skip straight ahead to the [Inspecting Model Performance for a Specific Metric](#inspecting-model-performance-for-a-specific-metric) if you are too curious to wait.
+
+Before jumping into specific metrics, we want to show you the decomposition of the model performance based on individual classes. Scrolling down the _Summary_ page, the Per Class average precision, average recall, and precision recall curve scores for each individual class appear.

### Inspecting model performance for a specific metric

Using the _Metric Performance_ and _Explorer_ pages, you can see how specific metrics affect the model performance:

1. Go to _Predictions_ > _Metric Performance_.
2. Select the "Confidence" metric from the _Metric_ drop-down list.

![Performance by Metric page](https://storage.googleapis.com/docs-media.encord.com/static/img/tutorials/active_performance-metric-confidence.png)

The plot shows the precision and the false negative rate as a function of the selected metric ("Confidence" in this case).

3. Go to _Predictions_ > _Explorer_.
4. Filter the data based on a data or prediction metric and the prediction outcome.

> ℹ️ Note
> Queries are only available in the web-app version of Active.

## Summary

This concludes the tour of the quickstart dataset. In this tutorial we covered opening the quickstart dataset, finding image outliers, and analysing the performance of an off-the-shelf object detection model on the dataset. By now, you should have a good idea about how Encord Active can be used to understand your data, labels, and model.

### Next steps

- We have only briefly covered a few of the pages in the app in this tutorial.
- To learn more about concrete, actionable steps you can take to improve your model performance, we suggest that you have a look at the [Workflow section](https://docs.encord.com/docs/annotate-workflows-and-templates).
- If you want to learn more about the existing metrics or want to build your own metric function, the [Metrics section](https://docs.encord.com/docs/active-quality-metrics) is where you should continue reading.
- Finally, we have also included some in-depth descriptions of the [Command Line Interface](https://docs.encord.com/docs/active-cli).
\ No newline at end of file diff --git a/docs/active-tutorials/active-use-cases.md b/docs/active-tutorials/active-use-cases.md new file mode 100644 index 000000000..8873ee228 --- /dev/null +++ b/docs/active-tutorials/active-use-cases.md @@ -0,0 +1,61 @@ +--- +title: "Active Use Cases" +slug: "active-use-cases" +hidden: false +metadata: + title: "Active Use Cases" + description: "Use cases when using Active." +category: "6480a3981ed49107a7c6be36" +--- + +Active provides value to you in a number of use cases. This page lists a few. If you have other use cases you would like to explore with Active, contact us. + +## Data Cleansing/Curation + +[block:html] +{ + "html": "
" +} +[/block] + +Alex, a DataOps manager at **self-dr-AI-ving**, faces challenges in managing and curating data for self-driving cars. Alex's team struggles with scattered data, overwhelming amounts of data, unclear workflows, and an inefficient data curation processes. Alex is currently a big user of Encord Annotate, but would like to provide better datasets for annotation. + +1. **Initial setup**: Alex gathers a large number of images and gets them imported into Active. Alex then logs into Encord and navigates to Active (freemium). + +2. **First collection**: Alex opens the Project and after searching, sorting, and filtering the data, selects the images and clicks **Add to a Collection** and then clicks **New Collection**. Alex names the Collection **RoadSigns** as the Collection is designed for annotating road signs for the team. + +3. **Data curation**: Alex then further bulk-finds traffic sign images using the embeddings and similarity search. Alex then clicks **Add to a Collection** and then clicks **Existing Collection** and adds them images to the **RoadSigns** Collection in a matter of clicks. + +4. **Labeling workflow**: Thinking about different use-cases (for example, "Labeling" and "Data quality") Alex assigns good quality road signs for labeling, and bad quality road signs for "Data quality" and future "Deletion". In the future Alex might use "Active learning" to prioritize the data for labeling. + +5. **Sent to Annotate:** Alex goes to the _Collections_ page, selects the _Roadsigns_ Collection and clicks **Create Dataset**. Active creates the dataset and [a new project in Annotate](https://docs.encord.com/docs/annotate-annotation-projects#creating-annotation-projects). Alex then configures the workflow, annotators, and reviewers for the Project in Annotate. + +6. **Review and insights**: At the end of the week, Alex reviews the _RoadSigns_ Project in Annotate. The dataset has been annotated. 
Alex goes to Active, clicks the **More** button on the Project, and then clicks **Sync Project Data**. Alex then clicks the Project, clicks _Analytics_, and then _Model Predictions_, where Alex gains insights into:

   - Number of labels per image
   - Quality of annotations
   - Distribution of annotations across metrics

The process is seamless and fast, and Alex can focus on more strategic tasks while her team enjoys a much-improved, streamlined data curation workflow.

## Label Correction/Validation

Chris, a Machine Learning Engineer at a micro-mobility startup, has been working with Encord Annotate. His team is dealing with a large set of scooter images that need accurate labeling for obstacle detection. After an initial round of annotations, Chris notices that some labels are incorrect or ambiguous. This has a significant impact on the performance of the ML model.

1. **Access Collections:** Chris logs into Encord Active, opens the Scooters project that was imported from Annotate, and goes to the _Collections_ page for the project. Chris browses the existing Collections to decide whether to create a new Collection or add to an existing one.

2. **Data exploration:** Chris searches, sorts, and filters the previously annotated scooter images and identifies those that need re-labeling.

3. **Create re-labeling Collection:** Chris selects the images and clicks **Add to a Collection**. Chris then clicks **New Collection** and names the Collection **Re-label - Scooters**.

4. **Initiate re-labeling:** With the Collection ready, Chris returns to the _Collections_ page, selects the **Re-label - Scooters** Collection and clicks **Create Dataset**. The Collection is sent to Annotate.

5. **Assigning in Annotate:** In Annotate, Chris assigns re-labeling tasks to specific annotators. Annotators then complete their re-labeling tasks.

6. **Quality check in Active:** After the re-labeling tasks are completed, Chris clicks **Sync Project Data**.
The updated labels then sync back to the Project. Chris reviews the changes, confirms the label quality, and plans for model re-training.

The "Collections" feature has simplified the task of identifying and re-labeling inaccurate or ambiguous data, streamlining the entire data annotation and quality control process for Chris and his team.

## Model/Prediction Evaluation

_Coming soon..._