OpenPecha/prodigy-tools

OpenPecha

Prodigy Tools


Description

Tools for OpenPecha's use of Prodigy

Owner

Docs

How to create a new instance

Required Files

  1. Instance_Name.service file (service unit file used by systemd)
  2. Instance_Name.json file (Prodigy configuration file)
  3. Instance_Name.conf file (Nginx web server configuration file)
  4. Instance_Name_recipe.py (Prodigy recipe for the instance, a Python file)
  5. Input data file used as the recipe's source (can be .jsonl, .csv, etc.)
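
As a rough illustration of the naming convention in the list above, the sketch below derives the four instance-specific file names from an instance name. The function name `required_files` and the role labels are hypothetical, and the input data file is left out because its name does not depend on the instance name.

```python
from typing import Dict


def required_files(instance_name: str) -> Dict[str, str]:
    """Map each required file's role to its expected name for one instance."""
    return {
        "systemd unit": f"{instance_name}.service",
        "prodigy config": f"{instance_name}.json",
        "nginx config": f"{instance_name}.conf",
        "recipe": f"{instance_name}_recipe.py",
    }


# "bdrc_crop_images" is used here only as a sample instance name.
for role, filename in required_files("bdrc_crop_images").items():
    print(f"{role:15} {filename}")
```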

Creating Required Files

  1. Create the instance recipe, which streams images to the Prodigy web application.

    return {
        "dataset": dataset,
        "stream": stream_from_s3(obj_keys),
        "view_id": "image_manual",
        "config": {
            "labels": ["PAGE"]
        }
    }
    
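    • dataset: Name of the dataset where annotations are stored (for example, bdrc_crop)
    • stream: Generator that yields the images to annotate, identified by S3 key or image URL
    • view_id: Annotation interface to use; image_manual is the interface for annotating images
    • labels: List of labels available for annotating the image
  2. Create the Prodigy configuration JSON file.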

    "db_settings": {
        "sqlite": {
            "name": "bdrc_crop_images.sqlite",
            "path": "/usr/local/prodigy"
        }
    }
    
    • name: Name of the SQLite file where the annotations are saved
    • path: Path to where the SQLite file should be saved
  3. Create the .service file used by systemd, the system and service manager for Linux.

    Environment=PRODIGY_CONFIG="/usr/local/prodigy/prodigy-tools/configuration/bdrc_crop_images.json"
    
    • Environment=PRODIGY_CONFIG: Path to the Prodigy configuration JSON file
    ExecStart=/usr/bin/python3.9 -m prodigy bdrc-crop-images-recipe bdrc_crop '/usr/local/prodigy/prodigy-tools/data/page_cropping.csv' -F /usr/local/prodigy/prodigy-tools/recipes/bdrc_crop_images.py
    
    • bdrc-crop-images-recipe: Name of the recipe as registered in the recipe .py file
    • bdrc_crop: Name of the dataset
    • '/usr/local/prodigy/prodigy-tools/data/page_cropping.csv': Path to the data input source
    • /usr/local/prodigy/prodigy-tools/recipes/bdrc_crop_images.py: Path to instance recipe .py file
  4. Create the Nginx configuration (.conf) file.

    upstream prodigyimages {
        server localhost:8090 fail_timeout=20s;
        keepalive 32;
    }
    
    • localhost:8090: Host and port of the Prodigy instance that Nginx forwards requests to (here, port 8090 on the same machine)
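
To make step 1 more concrete, here is a minimal sketch of how the returned dictionary could be assembled. It is not the repository's recipe: `read_keys`, `stream_from_urls`, and `build_components` are hypothetical helpers, the CSV is assumed to hold one image URL or key in its first column, and `stream_from_urls` stands in for `stream_from_s3`, whose implementation is not shown here. The sketch relies on the `image_manual` interface reading each task's `"image"` field.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List


def read_keys(csv_file: Iterable[str]) -> List[str]:
    """Collect the first column of each non-empty CSV row (assumed layout)."""
    return [row[0].strip() for row in csv.reader(csv_file) if row and row[0].strip()]


def stream_from_urls(urls: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one task per image; image_manual reads the "image" field."""
    for url in urls:
        yield {"image": url}


def build_components(dataset: str, urls: Iterable[str]) -> dict:
    """Return the dictionary that a recipe function hands back to Prodigy."""
    return {
        "dataset": dataset,
        "stream": stream_from_urls(urls),
        "view_id": "image_manual",
        "config": {"labels": ["PAGE"]},
    }


# In a recipe file this logic would sit inside a function registered with
# Prodigy's decorator, e.g. @prodigy.recipe("bdrc-crop-images-recipe").
sample = io.StringIO(
    "https://example.com/page_0001.jpg\nhttps://example.com/page_0002.jpg\n"
)
components = build_components("bdrc_crop", read_keys(sample))
first_task = next(components["stream"])
print(first_task)
```

Because the stream is a generator, images are produced lazily as annotators request them, so a long input list does not have to be loaded up front.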

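Steps 2 and 3 share several paths, so it can help to see how the pieces line up. The sketch below renders the `db_settings` fragment and the two unit-file lines from the same values shown above. The helper names `prodigy_config` and `unit_lines` are hypothetical, and only these two lines are produced: a complete unit file needs further sections that are not covered here.

```python
import json
from typing import List


def prodigy_config(sqlite_name: str, sqlite_dir: str) -> str:
    """Render the db_settings portion of the Prodigy configuration as JSON."""
    settings = {"db_settings": {"sqlite": {"name": sqlite_name, "path": sqlite_dir}}}
    return json.dumps(settings, indent=4)


def unit_lines(
    python_bin: str,
    config_path: str,
    recipe_name: str,
    dataset: str,
    source_path: str,
    recipe_file: str,
) -> List[str]:
    """Return the Environment= and ExecStart= lines for the instance's unit file."""
    environment = f'Environment=PRODIGY_CONFIG="{config_path}"'
    exec_start = (
        f"ExecStart={python_bin} -m prodigy {recipe_name} {dataset} "
        f"'{source_path}' -F {recipe_file}"
    )
    return [environment, exec_start]


base = "/usr/local/prodigy/prodigy-tools"
print(prodigy_config("bdrc_crop_images.sqlite", "/usr/local/prodigy"))
lines = unit_lines(
    python_bin="/usr/bin/python3.9",
    config_path=f"{base}/configuration/bdrc_crop_images.json",
    recipe_name="bdrc-crop-images-recipe",
    dataset="bdrc_crop",
    source_path=f"{base}/data/page_cropping.csv",
    recipe_file=f"{base}/recipes/bdrc_crop_images.py",
)
print("\n".join(lines))
```
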
How to load an instance

  • To test the Nginx configuration: sudo nginx -t

  • To restart Nginx (only needed when the .conf file has been added or changed): sudo service nginx restart

  • To start the instance from its service file, run sudo systemctl daemon-reload and then sudo systemctl restart name_of_service_file.service

  • To check the instance's log: sudo journalctl -u name_of_service_file.service

  • To stop the instance: sudo systemctl stop name_of_service_file.service

  • To pull the latest prodigy-tools changes onto the server, run this from /usr/local/prodigy/prodigy-tools/: sudo -u prodigy git pull
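
The order of the commands above matters: the Nginx configuration is tested before Nginx is restarted, and systemd reloads its unit files before the instance is restarted. A small sketch of that ordering follows, with `reload_commands` and `run_all` as hypothetical helper names. The example only prints the commands; `run_all` would execute them and needs sudo rights.

```python
import subprocess
from typing import List


def reload_commands(service: str, nginx_conf_changed: bool) -> List[List[str]]:
    """Build the commands for (re)starting an instance, in the order to run them."""
    commands: List[List[str]] = []
    if nginx_conf_changed:
        # Validate the configuration before restarting Nginx.
        commands.append(["sudo", "nginx", "-t"])
        commands.append(["sudo", "service", "nginx", "restart"])
    # Make systemd re-read unit files, then restart the instance.
    commands.append(["sudo", "systemctl", "daemon-reload"])
    commands.append(["sudo", "systemctl", "restart", service])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run each command in turn; check=True raises on the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Dry run: print the commands instead of executing them.
for argv in reload_commands("name_of_service_file.service", nginx_conf_changed=True):
    print(" ".join(argv))
```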