Label.ai

Description

Label.ai is an application for companies and researchers in the field of machine learning classification. This project allows companies to validate their classification dataset by having a large number of people label image datasets. The process is simple:

- Companies provide Label.ai with labelled images.
- Label.ai outsources the images to individuals, who verify whether each image actually contains the given labels.
- Individuals are compensated according to how accurately and how much they label.

For example, consider an image of a cat that looks very similar to a dog. If this image were classified by an AI or algorithm (especially one that is not well optimized), it would likely be misclassified. Through the procedure above, Label.ai ensures that the image receives the proper classification.

Motivation

The main motivation for creating our platform, Label.ai, is that machine learning and many other techniques in artificial intelligence require large amounts of data, which is why datasets are considered the new gold. The issue is that massive, accurately labelled datasets are extremely time-consuming to create, and datasets are often imperfectly annotated. Label.ai aims to connect researchers and companies with independent labelling contractors who will label and verify the images for them. We believe that such a platform brings two main benefits. On one hand, it improves the quality of datasets for companies in the AI/ML field. On the other, it provides a secondary form of income for individual labellers, with minimal training required.

Application Overview

Our application is very simple to use. After starting up the 3 instances (PostgreSQL, frontend, backend), open the frontend interface. The frontend has 2 dashboards. The main dashboard lets users label images (the main functionality of Label.ai). The other dashboard shows the different reports (AllConfirmed, MisClassified, Discovered, Controversial, NoVotes) and is only available to admin users.

Required installations

- Node.js >= 14
- PostgreSQL (psql)
- All packages inside backend/requirements/base.txt
  - python -m pip install -r backend/requirements/base.txt

Postgresql on Mac

Open the Terminal application on your Mac.

Install Homebrew, which helps with installing and managing applications on macOS. Simply run:

```
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

Update Homebrew and install postgresql by running:
```
brew update
brew install postgresql
```

Check that PostgreSQL is installed by printing its version:

```
postgres --version
# This should print something like:
# > postgres (PostgreSQL) 12.2
```

Now initialize the directory on disk where PostgreSQL will store its databases. To do this, create a default postgres database on the command line. Run:

```
initdb /usr/local/var/postgres
# You might see: initdb: directory "/usr/local/var/postgres" exists but is not empty
# This means the database was already created and you are fine for this step
```

Start the database by running:
```
pg_ctl -D /usr/local/var/postgres start
```
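To confirm from Python that the server started, a minimal sketch is to check whether anything accepts TCP connections on PostgreSQL's default port, 5432 (the port used in the DATABASE_URL further down). The function name `postgres_is_listening` is an illustrative assumption, and the check only shows that the port is open, not that the listener is actually PostgreSQL.

```python
import socket


def postgres_is_listening(host: str = "127.0.0.1", port: int = 5432,
                          timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `postgres_is_listening()` after `pg_ctl ... start` should return True once the server is up; if it returns False, check the pg_ctl output for errors.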

The Website

Create a virtualenv:
```
python3.9 -m venv venv
```
Activate the virtualenv you have just created:
```
source venv/bin/activate
```
On Windows:
```
venv\Scripts\activate.bat
```
Install requirements:
```
pip install -r requirements/local.txt
```

Create a db (the --password flag takes no value; createdb will prompt for the password):

```
createdb -U postgres --password label_ai
```

On Windows:

```
createdb -U postgres label_ai
```

Set the environment variable for your database:

```
export DATABASE_URL=postgres://postgres:<password>@127.0.0.1:5432/label_ai
```

On Windows:

```
set DATABASE_URL=postgres://postgres:<password>@127.0.0.1:5432/label_ai
```
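To see how the pieces of DATABASE_URL map onto database settings, here is a minimal sketch that splits the URL with the standard library. The helper name `parse_database_url` is hypothetical, and the project's own settings may well use a dedicated package for this; the sketch only illustrates which part of the URL is the user, password, host, port and database name.

```python
from urllib.parse import unquote, urlparse


def parse_database_url(url: str) -> dict:
    """Split a postgres:// URL into a Django-style DATABASES entry."""
    parts = urlparse(url)
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parts.path.lstrip("/"),             # label_ai
        "USER": unquote(parts.username or ""),      # postgres
        "PASSWORD": unquote(parts.password or ""),  # <password>
        "HOST": parts.hostname or "",               # 127.0.0.1
        "PORT": str(parts.port or ""),              # 5432
    }
```

For example, `parse_database_url(os.environ["DATABASE_URL"])` (after `import os`) returns a dictionary whose NAME is label_ai when the variable is set as shown above.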

Apply migrations:
```
python manage.py migrate
```
Run the server:
```
python manage.py runserver 0.0.0.0:8000
```
Visit http://localhost:8000 and enjoy! The 2 functionalities currently supported are:
- List all images: http://localhost:8000/images/
- List all labels: http://localhost:8000/labels/

Features

backend/label_ai/labels/views.py:

List all labels

Lists all the labels in the Label table http://localhost:8000/labels/all

backend/label_ai/classifications/views.py:

List all classifications

Lists all the classifications in the Classification table http://localhost:8000//all

backend/label_ai/members/views.py:

List all members

Lists all the members in the Member table http://localhost:8000/members/all

backend/label_ai/submissions/views.py:

List all submissions

Lists all the submissions in the Submission table http://localhost:8000/submissions/all

Create Submission

Creates a submission entry in the Submission table http://localhost:8000/submissions/insert (requires the fields of the submission object in the body of the POST request)
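As a rough illustration of the Create Submission call, the sketch below builds a POST request for the insert URL using only the standard library. The field names (`member_id`, `classification_id`, `correct_label`) and the JSON encoding are assumptions for illustration; the README does not list the actual fields or the expected body format, so check the Submission model before relying on them.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # host and port used in the URLs above


def build_submission_request(fields: dict, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a POST request carrying the submission fields."""
    body = json.dumps(fields).encode("utf-8")
    return Request(
        f"{base_url}/submissions/insert",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical field names, shown only to illustrate the call shape.
request = build_submission_request(
    {"member_id": 1, "classification_id": 42, "correct_label": True}
)
```

With the backend running, the request can be sent with `urllib.request.urlopen(request)`.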

queries/create_all.sql:

Calculation of confidence for classifications (View + Joins)

For each classification c, calculate the lower bound of the Wilson confidence interval over the votes in the submissions corresponding to c, weighting each vote by the trust of the user who cast it.

Confidence is represented in ClassificationView, which serves as the endpoint for our application's classification queries.
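The confidence calculation can be sketched in Python as follows. This is not the SQL from queries/create_all.sql; it is a minimal sketch that assumes each vote contributes its user's trust as a weight, so the "number of positive votes" and "total votes" in the standard Wilson formula become sums of trust. The function names and the default z = 1.96 (a 95% interval) are assumptions.

```python
import math


def wilson_lower_bound(positive: float, total: float, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for positive/total."""
    if total <= 0:
        return 0.0  # no votes: no confidence
    p = positive / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    # Clamp at 0 to absorb floating-point noise when p == 0.
    return max(0.0, (centre - margin) / (1 + z2 / total))


def classification_confidence(votes, z: float = 1.96) -> float:
    """votes: iterable of (agrees, trust) pairs, one per submission."""
    votes = list(votes)
    total = sum(trust for _, trust in votes)
    positive = sum(trust for agrees, trust in votes if agrees)
    return wilson_lower_bound(positive, total, z)
```

Using the lower bound rather than the raw ratio means a classification with few votes scores lower than one with the same agreement ratio but many votes, which is the usual reason for choosing the Wilson interval.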

backend/label_ai/images/views.py:

List all images

Lists all the images in the Image table http://localhost:8000/images/all

ImagesByLabelView

Returns a list of images for a given label id. Only images whose confidence / total count meets a certain threshold are returned. http://localhost:8000/images/confirmed?label_id=<label_id>

MislabelledImagesView

Queries all images we believe have one or more classifications with confidence < 0.05 http://localhost:8000/images/mislabelled?count=

GetPromptImageClassification

Queries 1 random, unlabelled image classification prompt http://localhost:8000/images/prompt
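The image endpoints above take their parameters in the query string. A minimal sketch of building those URLs with the standard library is shown below; the helper names are hypothetical, and `count` is assumed to be an integer, since the README leaves its value blank.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # host and port used in the URLs above


def confirmed_images_url(label_id: int, base_url: str = BASE_URL) -> str:
    """URL for images confidently classified under the given label."""
    return f"{base_url}/images/confirmed?{urlencode({'label_id': label_id})}"


def mislabelled_images_url(count: int, base_url: str = BASE_URL) -> str:
    """URL for images believed to be mislabelled (count assumed to be an integer)."""
    return f"{base_url}/images/mislabelled?{urlencode({'count': count})}"


def prompt_url(base_url: str = BASE_URL) -> str:
    """URL for one random classification prompt."""
    return f"{base_url}/images/prompt"
```

Using `urlencode` instead of string concatenation keeps the query string correctly escaped if more parameters are added later.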

Core Features

1. Get random classification prompts
   - Parameters for the number of prompts and the user
   - User must not have classified the prompt previously
   - Prompts must be unconfirmed (actually need classification)
2. Get random classification prompts by label
   - Same as 1, with a label parameter
3. Submit a user classification (submission)
   - Parameters for the user, the classification and their response
   - How to handle the non-unique case? Triggers?
4. Get all images we are confident fall under a label
5. Get all the images with a confident classification which were not previously classified
6. Get all the misclassified images which were previously classified
7. Calculate confidence in classifications
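The prompt-selection rules in the first two features can be sketched in plain Python. The application presumably does this in SQL; the in-memory `Classification` record, the `pick_prompts` name and the `confirmed_at` threshold of 0.95 are all assumptions made for illustration only.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Classification:
    id: int
    image_id: int
    label_id: int
    confidence: float  # e.g. the Wilson lower bound


def pick_prompts(classifications, already_answered, count,
                 label_id: Optional[int] = None,
                 confirmed_at: float = 0.95, rng=None):
    """Pick up to `count` random prompts the user may still classify."""
    rng = rng or random.Random()
    candidates = [
        c for c in classifications
        if c.id not in already_answered        # user has not classified it
        and c.confidence < confirmed_at        # still unconfirmed
        and (label_id is None or c.label_id == label_id)  # optional label filter
    ]
    return rng.sample(candidates, min(count, len(candidates)))
```

Passing `label_id` gives the "by label" variant, and `already_answered` is the set of classification ids the user has already submitted for.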

Relational Schema

Here is the relational schema of our backend architecture for those interested!
