Follow the steps below to train a Form Recognizer composite machine learning (ML) model. Along the way, you also train the individual (non-composite) ML models that the composite model is built from.
After you finish the deployment process described in the Deployment Scripts Guide, your Form Recognizer resource is ready for you to create new Form Recognizer ML models. In your cloned repository, navigate from the project root directory to Deployment\Data\samples\train.
You will see two folders, contoso_set_1 and contoso_set_2, that contain sample training forms, as illustrated below.
- Go to the Azure Portal and select the Azure Data Lake Storage account that was created by the deployment script.
- On the left menu pane, select Data storage, then Containers, and open the samples container.
- Create a new folder named train.
- In the train folder, create two folders: one named contoso_set_1 and the other named contoso_set_2.
- Upload the sample labeling files in Data/samples/train/contoso_set_1 and Data/samples/train/contoso_set_2 into the corresponding folders. You now have two full sets of pre-labeled data for creating the machine learning models.
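If you prefer to script the upload instead of using the portal, the sketch below shows one way to do it. It assumes the azure-storage-blob package and a storage account connection string; the helper names `plan_uploads` and `upload_samples` are hypothetical, not part of this solution. Because blob names containing `/` act as folder paths, uploading `train/contoso_set_1/...` this way should make the train folders appear without creating them by hand.

```python
from pathlib import Path

# Folder names taken from this guide; adjust them if your clone differs.
SAMPLE_SETS = ("contoso_set_1", "contoso_set_2")


def plan_uploads(local_train_dir, sets=SAMPLE_SETS, blob_root="train"):
    """Map every local sample file to its blob name in the samples container.

    Returns a list of (local_path, blob_name) tuples, for example
    (.../contoso_set_1/form1.pdf, "train/contoso_set_1/form1.pdf").
    Every file in each set folder is included, labeling files too.
    """
    root = Path(local_train_dir)
    plan = []
    for set_name in sets:
        set_dir = root / set_name
        if not set_dir.is_dir():
            raise FileNotFoundError(f"missing sample folder: {set_dir}")
        for path in sorted(p for p in set_dir.rglob("*") if p.is_file()):
            relative = path.relative_to(set_dir).as_posix()
            plan.append((path, f"{blob_root}/{set_name}/{relative}"))
    return plan


def upload_samples(connection_string, local_train_dir, container="samples"):
    """Upload the planned files with azure-storage-blob (pip install azure-storage-blob)."""
    # Imported lazily so plan_uploads() works without the Azure SDK installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    container_client = service.get_container_client(container)
    for local_path, blob_name in plan_uploads(local_train_dir):
        with open(local_path, "rb") as data:
            container_client.upload_blob(name=blob_name, data=data, overwrite=True)
        print(f"uploaded {blob_name}")
```

For example, `upload_samples(os.environ["AZURE_STORAGE_CONNECTION_STRING"], "Deployment/Data/samples/train")`, where the environment variable name is only an illustration. `overwrite=True` lets you rerun the script after editing labels.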
In this step, you train custom Azure Form Recognizer ML models and merge them into a composite model. For more information, see the Azure documentation article Compose Custom Models v3.0.
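The Studio walkthrough that follows can also be scripted. The sketch below is one possible equivalent, assuming the azure-ai-formrecognizer package version 3.2.x (which targets API version 2022-08-31) and a SAS URL for the samples container; the model IDs mirror the names used in the Studio steps, and `build_and_compose` and `normalize_prefix` are hypothetical helpers.

```python
# Model IDs mirror the Studio steps; the folder paths match the samples container layout.
TRAINING_SETS = {
    "contoso-set-1": "train/contoso_set_1",
    "contoso-set-2": "train/contoso_set_2",
}
COMPOSED_MODEL_ID = "contoso-safety-forms"


def normalize_prefix(folder):
    """Return a blob prefix ending in '/', so a set folder cannot match a sibling like contoso_set_10."""
    return folder.strip("/") + "/"


def build_and_compose(endpoint, key, container_sas_url):
    """Train one template model per sample set, then compose them into one model."""
    # Imported lazily so the helpers above work without the Azure SDK installed.
    from azure.ai.formrecognizer import DocumentModelAdministrationClient, ModelBuildMode
    from azure.core.credentials import AzureKeyCredential

    client = DocumentModelAdministrationClient(endpoint, AzureKeyCredential(key))
    component_ids = []
    for model_id, folder in TRAINING_SETS.items():
        # Same choice as the Studio "Build Mode" dropdown: Template.
        poller = client.begin_build_document_model(
            ModelBuildMode.TEMPLATE,
            blob_container_url=container_sas_url,
            prefix=normalize_prefix(folder),
            model_id=model_id,
        )
        component_ids.append(poller.result().model_id)

    poller = client.begin_compose_document_model(
        component_ids,
        model_id=COMPOSED_MODEL_ID,
        description="Composite model for the contoso safety form samples",
    )
    return poller.result().model_id
```

Each `begin_*` call returns a long-running poller; calling `result()` blocks until training or composition finishes, which mirrors waiting for the Studio to report completion.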
- Go to Form Recognizer Studio, scroll down to Custom Model, and select Create new, as illustrated below.
- Select +Create a project to create a project.
- Enter a project name, for example SafetyFormProject-Set-1 or any other name of your choice.
- Enter a project description, for example Custom form recognizer model with samples contoso_set_1, and click Continue.
- Select your Subscription, Resource Group, and the Azure Form Recognizer resource created by the deployment script main-deploy.ps1.
- Select the latest API version. This solution was tested with API version 2022-08-31, as illustrated below. Click Continue.
- You are now prompted to enter the training data source, as illustrated below. Select your subscription, then the resource group and the Azure storage account created by the deployment script main-deploy.ps1. Enter samples in the Blob container field and train/contoso_set_1 in the Folder path field. Click Continue.
- Review the information and click Create Project. This step connects Form Recognizer Studio to the Azure Data Lake Storage container in your subscription so that it can access the training data.
- After the project is created, the forms appear with their OCR results and labeled field key-value pairs, as illustrated below. Click Train in the upper-right corner.
- Fill in the information as shown below (for example, a Model ID of contoso-set-1), set the Build Mode dropdown to Template, and then click Train.
- Once training on the contoso_set_1 samples is done, the model appears in the Models tab with the confidence score of each field, as illustrated below.
- Train a second model with the files stored in train/contoso_set_2 by repeating the steps above. Name your second model contoso-set-2 or a name of your choice.
- Click Models in your project to see the list of models already created. You can now merge the individual models into a composite model. Select contoso-set-1 and contoso-set-2, then click Compose.
- When prompted for a new model name and description, enter contoso-safety-forms or a name of your choice, add a description, and click Compose.
- Your composed model ID, contoso-safety-forms, now appears in the Model ID list, as illustrated below. Make a note of this composed model ID: you need it to configure the Azure Functions App to use this model, as described in the Solution Configuration Guide.
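To check the composed model outside the Studio, the sketch below analyzes one form with it and lists fields whose confidence falls below a threshold. It assumes azure-ai-formrecognizer 3.2.x; the setting name `CUSTOM_MODEL_ID`, the 0.8 threshold, and the helpers `get_model_id`, `low_confidence_fields`, and `analyze_form` are hypothetical. Use the setting name the Solution Configuration Guide actually specifies.

```python
import os

# Hypothetical setting name; the real one is defined in the Solution Configuration Guide.
MODEL_ID_SETTING = "CUSTOM_MODEL_ID"


def get_model_id(env=os.environ, setting=MODEL_ID_SETTING, default="contoso-safety-forms"):
    """Read the composed model ID from settings, falling back to the name used in this guide."""
    return env.get(setting, default)


def low_confidence_fields(fields, threshold=0.8):
    """Return sorted names of fields whose confidence is missing or below the threshold."""
    return sorted(
        name for name, field in fields.items()
        if field.confidence is None or field.confidence < threshold
    )


def analyze_form(endpoint, key, form_path, model_id=None):
    """Analyze one form with the composed model and print a short review summary."""
    # Imported lazily so the helpers above work without the Azure SDK installed.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))
    with open(form_path, "rb") as document:
        poller = client.begin_analyze_document(model_id or get_model_id(), document=document)
        result = poller.result()
    for doc in result.documents:
        # doc_type identifies the document type the service matched for this form.
        print(doc.doc_type, doc.confidence)
        print("review:", low_confidence_fields(doc.fields))
    return result
```

Keeping the model ID in a setting rather than in code means you can retrain and recompose under a new ID without redeploying.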