Creating an image classifier from a user dataset is a common task for an AI programmer, but I haven’t come across an automatic tool that solves it without a programmer's involvement.
CV Tools is a visual information analysis service that lets a non-programmer set up and test the necessary tasks; it also lets you use the trained system as a REST server for image analysis, with multi-user support.
-
Install dependencies.
Install PyTorch according to the instructions at https://pytorch.org/get-started/locally/
Install the required packages:
- Go to the CV Tools working directory.
- Execute the command:
pip install -U -r requirements.txt
-
Setting up the program.
All program settings are stored in the config.py file and must be set before starting the program.
dataset_dir - the full or relative path to the directory containing the user dataset.
neuronet - the type of neural network that is fine-tuned on the dataset. All neural networks from the image-classification section https://huggingface.co/models?pipeline_tag=image-classification&sort=trending are supported. It makes sense to keep the default value 'facebook/convnextv2-base-22k-224', which gave the highest accuracy in my tests, or to use 'facebook/convnextv2-tiny-1k-224' if you want maximum speed at the cost of 3-5% accuracy.
test_size - the share of each group held out for testing (for example, test_size = 0.2). The program automatically splits each group into two parts, training and test. The neural network is trained on the first part and checked on the second, which it does not see during training. Optimal values are usually 0.2-0.3: choose 0.2 when a group has few images (up to 40) and 0.25-0.3 when it has many.
batch_size - the number of images analyzed simultaneously when building the index. It depends on the amount of GPU memory and RAM. The minimum value of 16 allows training on weak hardware; 64 and 128 suit powerful computers with a GPU of 16 GB or more.
max_equal_distance - the maximum distance in the n-dimensional space of the neural network at which images are considered very similar. It is used to filter Errors results and is selected experimentally for each type of neural network. The default value is set for 'facebook/convnextv2-base-22k-224'.
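The per-group split controlled by test_size can be pictured with a short Python sketch. It is not CV Tools' code: it assumes the dataset is a dict of group name -> list of image file names, and the shuffling with a fixed seed and the rule of keeping at least one image in each part are illustrative assumptions.

```python
import random


def split_group(images, test_size=0.2, seed=0):
    """Split one group's images into (train, test) lists.

    Sketch only: the shuffle, the fixed seed, and keeping at least one
    image in each part are assumptions, not documented CV Tools behavior.
    """
    items = list(images)
    random.Random(seed).shuffle(items)
    if len(items) < 2:
        return items, []  # too few images to hold any out for testing
    n_test = round(len(items) * test_size)
    n_test = min(max(n_test, 1), len(items) - 1)
    return items[n_test:], items[:n_test]


def split_dataset(groups, test_size=0.2, seed=0):
    """Apply split_group to every group, so each group appears in both parts."""
    train, test = {}, {}
    for name, images in groups.items():
        train[name], test[name] = split_group(images, test_size, seed)
    return train, test


groups = {
    "bison": [f"bison_{i}.jpg" for i in range(40)],
    "cat": [f"cat_{i}.jpg" for i in range(10)],
}
train, test = split_dataset(groups, test_size=0.2)
print({name: (len(train[name]), len(test[name])) for name in groups})
# → {'bison': (32, 8), 'cat': (8, 2)}
```

Splitting each group separately, rather than the whole dataset at once, guarantees that every group is represented in both the training and test parts, which matters most for small groups.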
-
Starting the program.
Go to the working directory and run the command:
python run.py
Then open localhost:8000 in Chrome or, if the server runs on a remote host, server_address:8000.
-
View the dataset, remove visible errors (for example, irrelevant images in groups), and expand the dataset using Google Images if necessary.
-
Go to the Learning tab and click Start learning. Wait until training finishes, then set the trained network as the system network by clicking the Set trained as the main button.
-
Analyze errors by clicking Calculate.
Correct errors if possible as described below.
-
If the errors are corrected, go to step 4.
-
Switch to the Image analysis tab and test the system using images from the Internet.
At the top there is a system menu; clicking an element opens the corresponding screen.
The dataset editor lets you add groups and group images to the system dataset and delete them.
Group images fall into the following categories:
Group - images confirmed by the user and reliably reflecting membership in the group. Only images with this status are used when training the visual system.
New - images downloaded from Google Images at the user's request; they must be reviewed and their status changed to Group or Deleted.
Deleted - images defined by the user as not belonging to the group. They are stored in the system for possible revision and to avoid repeated downloading and analysis. The system will not offer to reconsider an image previously rated by the user if it appears in the search results again.
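The three statuses behave like a small state model. The sketch below is hypothetical: the Status and GroupImages names are made up for illustration, and identifying images by their source URL is an assumption (the system may use another key). It shows the two rules stated above: only Group images are used for training, and images already rated are not offered again when they reappear in search results.

```python
from enum import Enum


class Status(Enum):
    NEW = "New"
    GROUP = "Group"
    DELETED = "Deleted"


class GroupImages:
    """Hypothetical model of one group's images, keyed by source URL (an assumption)."""

    def __init__(self):
        self.status = {}  # image URL -> Status

    def add_downloaded(self, urls):
        """Add search results with New status, skipping every URL already known.

        Deleted images stay known, so they are not offered for review again.
        """
        added = []
        for url in urls:
            if url not in self.status:
                self.status[url] = Status.NEW
                added.append(url)
        return added

    def transfer(self, urls, target):
        """Move the selected images to the Group or Deleted section."""
        if target is Status.NEW:
            raise ValueError("images can only be moved to Group or Deleted")
        for url in urls:
            if url in self.status:
                self.status[url] = target

    def training_images(self):
        """Only images with Group status are used for training."""
        return [url for url, s in self.status.items() if s is Status.GROUP]
```

Keeping Deleted images in the same mapping, instead of discarding them, is what lets a repeated search skip them cheaply.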
Adding a group. A root or child group can be created.
You can add images to any group. To do this, open Google Images in your browser and find a query for which Google shows the most relevant images for the desired group.
Then, in the Vision dataset window, select the group you want to add images to and click the corresponding button; a dialog will appear.
In the dialog, enter the selected query. The system will download the images Google returns, add them to the group, and open them in the New section.
Then view the images and click each one to select it for transfer to the Group or Deleted section.
The corresponding buttons transfer the selected images from the New section to the Group or Deleted section.
A separate button inverts the selection status of all displayed images.
To take a closer look at an image, click the numbered square on it to open the image lens.
To switch between group sections, click the section switch.
On the Learning screen, the system is trained on the Dataset. The training parameters default to values close to optimal, but the user can change them to achieve maximum accuracy of the neural network. Detailed information about these parameters can be found at https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments (understanding them is not required to use the system).
The Start learning button starts the learning process. Its progress is displayed in the Epochs table.
The neural network is trained to achieve maximum accuracy. During training, the system is not available for use by other users and systems.
The accuracy of the resulting neural network after each training epoch is shown in the Accuracy column.
The resulting network with the maximum Accuracy can be saved and assigned in the system as the main (Production) network.
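Choosing the network to assign as Production amounts to picking the epoch with the highest Accuracy. A minimal sketch, assuming Accuracy is the share of test-part images whose most probable group is their true group (the usual definition; the document does not spell it out). The function names and input format are hypothetical.

```python
def accuracy(probabilities, labels):
    """Share of test images whose most probable group is the true group.

    probabilities: one list of per-group probabilities per image.
    labels: index of the true group for each image.
    """
    correct = 0
    for probs, label in zip(probabilities, labels):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        correct += predicted == label  # True counts as 1
    return correct / len(labels)


def best_epoch(epoch_accuracies):
    """Return (epoch, accuracy) for the highest-accuracy epoch; epochs count from 1.

    On ties, the earliest such epoch is returned.
    """
    return max(enumerate(epoch_accuracies, start=1), key=lambda item: item[1])
```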
After assigning the neural network, you can try to improve its accuracy:
-
Change the training parameters. This requires at least a basic understanding of the parameters and of the neural network's learning process.
-
Improve the quality of the training data (the Dataset).
To do this, find and correct anomalies in the data using the Anomalies section.
There are 2 types of anomalies:
-
Group detection errors, where the neural network assigns an image to the wrong group. To find them, select the Errors type in Anomalies and click Calculate. A list of incorrectly identified images will appear.
In each image's description, the first entry is the erroneous group and its probability, and the second is the real group of the image and its probability. For an erroneous image, the probability of the erroneous group is greater than the probability of the correct, expected one.
Erroneously detected images can be transferred from their current group to another, ‘correct’ one.
To do this, select the images to transfer, press the transfer button, and choose the target group.
-
Removing erroneously detected images.
If an image does not belong to its group and no other group suits it, it is better to delete it. To do this, click each such image and press the delete button.
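The Errors check described above can be sketched as follows. This assumes the model gives one probability per group for each test image; the (image, probabilities, real-group index) input format and the find_errors name are hypothetical. An image is reported when another group's probability is strictly greater than that of its real group, matching the rule stated above.

```python
def find_errors(results, group_names):
    """Return misclassified images.

    results: iterable of (image_id, probabilities, real_index), where
    probabilities[i] is the probability of group_names[i] (hypothetical format).
    Each error lists the fields in caption order:
    (image_id, wrong_group, p_wrong, real_group, p_real).
    """
    errors = []
    for image_id, probs, real in results:
        predicted = max(range(len(probs)), key=probs.__getitem__)
        # An image is an error when some other group outscores its real group.
        if probs[predicted] > probs[real]:
            errors.append((image_id, group_names[predicted], probs[predicted],
                           group_names[real], probs[real]))
    return errors
```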
The second type of anomaly is identical or very similar images, for example a lower-quality copy or a crop of the same image. Such images have almost no effect on the quality of analysis for datasets with more than 30 images per group; for small datasets, however, the individual characteristics of duplicate images can degrade it. To search for duplicates, select Type -> Duplicates in Anomalies and click Calculate. A list will appear.
The images in the list come in pairs, and the first number in each image caption is the pair number.
It is followed by the image's group and its distance to the other image of the pair. The smaller the distance, the less the images differ. The same transfer and deletion operations are available as for incorrectly identified images. It makes sense to remove duplicates, especially when they are in different groups.
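The duplicate search can be sketched as a pairwise distance check against max_equal_distance. This assumes each image is represented by a vector in the network's output space and that the distance is Euclidean; the document names neither the representation nor the metric, so both are assumptions, and find_duplicates is a hypothetical name.

```python
import math
from itertools import combinations


def find_duplicates(images, max_equal_distance):
    """Return candidate duplicate pairs, closest first.

    images: dict image id -> (group name, vector). The vector representation
    and the Euclidean metric are assumptions, not documented CV Tools details.
    Each result is (distance, image_a, group_a, image_b, group_b).
    """
    pairs = []
    # Compare every pair once: O(n^2), acceptable for small datasets.
    for (a, (group_a, vec_a)), (b, (group_b, vec_b)) in combinations(images.items(), 2):
        distance = math.dist(vec_a, vec_b)
        if distance <= max_equal_distance:
            pairs.append((distance, a, group_a, b, group_b))
    pairs.sort(key=lambda pair: pair[0])
    return pairs
```

Numbering the result with enumerate(pairs, start=1) yields pair numbers like those in the captions; pairs whose two groups differ are the first candidates for removal.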
The Image analysis screen lets you test the output of the trained neural network and find the most similar images in the groups. Similar images help explain why the classifier gave its answer, so you can adjust the Dataset and get a better result on the corrected Dataset.
To upload an image, drag it onto the upload button or click the + on it. A file selection dialog will appear; select an image and click OK.
The groups recognized by the neural network and their probabilities will appear in the Image classification table. When you select a group, images from that group will appear in the Similar images block, each with its distance to the uploaded image in the n-dimensional space of the neural network, where n is the number of groups in the Dataset.
The data in the table is as follows:
Group - the Dataset group.
Probability - the analyzer’s confidence (probability) that the image belongs to this group.
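Rows like those in the Image classification table can be produced from raw model outputs. Assumption: the probabilities come from a softmax over the network's per-group scores (logits), the usual setup for Hugging Face image classifiers; the document does not say how CV Tools computes them, and classification_table is a hypothetical name.

```python
import math


def classification_table(logits, group_names):
    """Turn per-group scores into (group, probability) rows, most probable first."""
    # Subtracting the largest score before exp keeps the softmax numerically stable.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    rows = [(name, e / total) for name, e in zip(group_names, exps)]
    rows.sort(key=lambda row: row[1], reverse=True)  # stable: ties keep input order
    return rows
```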
Additional settings:
The Search switch enables the search for similar images.
How many images to search sets how many similar images the system shows, in descending order of similarity.
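The similar-image search with How many images to search can be sketched as a k-nearest lookup within the selected group. Descending similarity means ascending distance. The Euclidean metric and the dict of image -> vector (n-dimensional, n being the number of groups) are assumptions, and most_similar is a hypothetical name.

```python
import heapq
import math


def most_similar(query, group_vectors, how_many):
    """Return up to how_many (distance, image) pairs from one group, nearest first.

    query: vector of the uploaded image; group_vectors: dict image -> vector.
    """
    # nsmallest avoids sorting the whole group when how_many is small.
    return heapq.nsmallest(
        how_many,
        ((math.dist(query, vec), image) for image, vec in group_vectors.items()),
    )
```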
With a trained neural network, CV Tools supports a REST interface for use by external systems.
-
Server_address/cv?file_path
where file_path is a local file path or an HTTP address.
Example:
http://localhost:8000/cv?/home/george/Projects/save/animals/bison/4a11160cd7.jpg
-
Server_address/cv?file_path:how_many
where file_path is a local file path or an HTTP address,
how_many - the number of top results to return.
Example:
http://localhost:8000/cv?/home/george/Projects/save/animals/bison/4a11160cd7.jpg:3
The result is returned in JSON format.
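A client for the /cv endpoint needs only the Python standard library. This is a minimal sketch: build_cv_url follows the URL format documented above, percent-encoding the path (so that spaces and similar characters survive) assumes the server decodes such escapes, and since the structure of the returned JSON is not documented here, classify simply returns the parsed JSON. Both function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_cv_url(server_address, file_path, how_many=None):
    """Build a request URL in the documented form Server_address/cv?file_path[:how_many]."""
    # Percent-encode characters such as spaces; '/' and ':' are kept so that
    # ordinary paths and http addresses look exactly like the examples above.
    # Assumption: the server decodes %-escapes in the query.
    query = urllib.parse.quote(file_path, safe="/:")
    if how_many is not None:
        query += f":{how_many}"
    return f"{server_address.rstrip('/')}/cv?{query}"


def classify(server_address, file_path, how_many=None, timeout=30):
    """Send the request to a running CV Tools server and return the parsed JSON."""
    url = build_cv_url(server_address, file_path, how_many)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, classify("http://localhost:8000", "/home/george/Projects/save/animals/bison/4a11160cd7.jpg", 3) sends the same request as the second example above.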
An example dataset used for preparing the article:
https://www.kaggle.com/datasets/iamsouravbanerjee/animal-image-dataset-90-different-animals
CV Tools also supports the Unified Remote API for use by external systems.
Usage example: https://github.com/unisi-tech/unisi/blob/main/tests/proxy/run_vision.py