IRySTeam/IRyS-BE-V2

IRyS

Intelligent Repository System

About The Project

IRyS (Intelligent Repository System) is a digital repository system for storing documents and searching them. When a document is stored, it is processed to extract important information such as metadata and entities. Search uses semantic similarity between documents, combined with the corresponding metadata or entities. Other features include authentication, notifications, repository management, and access management.

Getting Started

To get a local copy up and running, follow these steps.

Requirements

  • Pyenv (recommended) for Python version management
  • Python ^3.10.x
    • To install it using pyenv:
      pyenv install 3.10.x
  • Poetry for Python package and environment management
  • PostgreSQL
  • Redis
  • Elasticsearch cloud service
    • For instructions on setting up the Elasticsearch cloud service, refer to the Elasticsearch section.
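Before continuing, it can help to confirm the prerequisites above with a small preflight script. This is a minimal sketch that is not part of the repository: it checks the interpreter against the ^3.10 requirement (Poetry's caret means >=3.10, <4.0) and looks up command-line tools on PATH. The tool names it checks (pyenv, poetry, psql, redis-server) are assumptions; adjust them to how you installed each dependency.

```python
import shutil
import sys


def python_ok(version_info=sys.version_info) -> bool:
    """True if the interpreter satisfies ^3.10, i.e. >=3.10 and <4.0."""
    return (3, 10) <= tuple(version_info[:2]) < (4, 0)


def missing_tools(tools) -> list:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Hypothetical tool list; Elasticsearch Cloud needs no local binary.
    required = ["pyenv", "poetry", "psql", "redis-server"]
    print("Python version OK:", python_ok())
    print("Missing tools:", missing_tools(required) or "none")
```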

BERT model

Run the following commands to download the BERT model:

cd bertserving
wget https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip
unzip cased_L-12_H-768_A-12.zip
List of released pretrained BERT models:
  • BERT-Base, Uncased: 12-layer, 768-hidden, 12-heads, 110M parameters
  • BERT-Large, Uncased: 24-layer, 1024-hidden, 16-heads, 340M parameters
  • BERT-Base, Cased: 12-layer, 768-hidden, 12-heads, 110M parameters
  • BERT-Large, Cased: 24-layer, 1024-hidden, 16-heads, 340M parameters
  • BERT-Base, Multilingual Cased (New): 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
  • BERT-Base, Multilingual Cased (Old): 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
  • BERT-Base, Chinese: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters
If you don't download the BERT model locally, the script in the Docker container will download it for you.
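The wget and unzip commands above can also be expressed in Python, which is handy on systems without those tools. This is a minimal sketch, not the repository's download script: it skips the download if the archive is already present and assumes the archive unpacks into a folder named after the zip file (e.g. cased_L-12_H-768_A-12/). The function names are hypothetical.

```python
import zipfile
from pathlib import Path
from urllib.request import urlretrieve

MODEL_URL = "https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip"


def download_model(url: str, dest_dir: Path) -> Path:
    """Download the model archive into dest_dir unless it is already there."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    zip_path = dest_dir / url.rsplit("/", 1)[-1]
    if not zip_path.exists():
        urlretrieve(url, zip_path)
    return zip_path


def extract_model(zip_path: Path, dest_dir: Path) -> Path:
    """Unzip the archive and return the folder assumed to be named after it."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return dest_dir / zip_path.stem


if __name__ == "__main__":
    target = Path("bertserving")
    archive = download_model(MODEL_URL, target)
    print("Model extracted to", extract_model(archive, target))
```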

Elasticsearch

Cloud (Recommended)

To set up the Elasticsearch cloud service, follow the steps below:

  1. Create an account in Elastic Cloud.
  2. Create a new deployment.
  3. The deployment will be created in a few minutes. After that, Elasticsearch will give you a password for the default user elastic. Set ELASTICSEARCH_USER to elastic and ELASTICSEARCH_PASSWORD to the password given by Elasticsearch.
  4. Go to the Manage deployment page, copy the cloud deployment ID, and paste it into the ELASTICSEARCH_CLOUD_ID environment variable.
  5. Create a new API key, copy it, and paste it into the ELASTICSEARCH_API_KEY environment variable.
  6. Set the ELASTICSEARCH_CLOUD environment variable to True.
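To check that the variables from these steps are enough to connect, they can be turned into client arguments for elasticsearch-py 8.x, whose Elasticsearch client accepts cloud_id together with either api_key or basic_auth. This is a minimal sketch, not how IRyS itself builds its client: preferring the API key over the password is an assumption, and the function name is hypothetical.

```python
import os


def cloud_client_kwargs(env=os.environ) -> dict:
    """Build Elasticsearch client arguments from the cloud variables."""
    if env.get("ELASTICSEARCH_CLOUD", "False").lower() != "true":
        raise ValueError("ELASTICSEARCH_CLOUD is not set to True")
    kwargs = {"cloud_id": env["ELASTICSEARCH_CLOUD_ID"]}
    if env.get("ELASTICSEARCH_API_KEY"):
        # Assumption: prefer the API key created in step 5.
        kwargs["api_key"] = env["ELASTICSEARCH_API_KEY"]
    else:
        # Fall back to the elastic user's password from step 3.
        kwargs["basic_auth"] = (env["ELASTICSEARCH_USER"], env["ELASTICSEARCH_PASSWORD"])
    return kwargs


if __name__ == "__main__":
    from elasticsearch import Elasticsearch  # elasticsearch-py 8.x

    client = Elasticsearch(**cloud_client_kwargs())
    print(client.info())
```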

Local

To set up Elasticsearch locally, follow the steps below:

  1. Install Elasticsearch using Docker by following this link.
  2. Set the ELASTICSEARCH_CLOUD environment variable to False.
  3. Set the ELASTICSEARCH_HOST environment variable to localhost and ELASTICSEARCH_SCHEME to http.
  4. Set ELASTICSEARCH_PORT to the port assigned during installation.
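For a local instance, the scheme, host, and port variables combine into a single URL such as http://localhost:9200. This is a minimal sketch, not the application's own code: the fallback values are the example values from the Environment Variables section, the function name is hypothetical, and the connection check uses elasticsearch-py 8.x.

```python
import os


def local_es_url(env=os.environ) -> str:
    """Combine scheme, host and port into a URL such as http://localhost:9200."""
    # Assumed fallbacks: the example values from the Environment Variables section.
    scheme = env.get("ELASTICSEARCH_SCHEME", "http")
    host = env.get("ELASTICSEARCH_HOST", "localhost")
    port = int(env.get("ELASTICSEARCH_PORT", "9200"))
    return f"{scheme}://{host}:{port}"


if __name__ == "__main__":
    from elasticsearch import Elasticsearch  # elasticsearch-py 8.x

    client = Elasticsearch(
        local_es_url(),
        basic_auth=(os.environ.get("ELASTICSEARCH_USER", "elastic"),
                    os.environ.get("ELASTICSEARCH_PASSWORD", "")),
    )
    print("Reachable:", client.ping())
```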

Local development

Installing required dependency

  1. Install each dependency from the Requirements section above.
  2. Install the Python dependencies by running
    poetry install
    NOTE: If you get an error while installing the psycopg2-binary package, run $ poetry run pip install psycopg2-binary first, then re-run $ poetry install
  3. Run poetry shell to open the Poetry shell.
  4. Train the machine learning model for document classification by running this command:
    python3 app/classification/mlutil/classifier_training.py
  5. Install the pre-commit git hook (for automatic formatting):
    pre-commit install

Configure App

  1. Find all files below.
  2. Duplicate those files and rename each copy from the [prefix_name].example pattern to [prefix_name].
  3. Open the newly created files and adjust their content to your environment. For an explanation of each variable, see the Environment Variables section.
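Steps 1 and 2 can be automated by copying every [prefix_name].example file to [prefix_name]. This is a minimal sketch, assuming the example files can sit anywhere under the project root; it never overwrites a file that already exists, so local edits are kept. The function name is hypothetical, and you may want to limit the search if your virtual environment lives inside the project folder.

```python
import shutil
from pathlib import Path


def copy_example_files(root: Path) -> list:
    """Copy every '<name>.example' under root to '<name>', keeping existing files."""
    created = []
    for example in sorted(root.rglob("*.example")):
        target = example.with_suffix("")  # strips only the trailing .example
        if not target.exists():
            shutil.copyfile(example, target)
            created.append(target)
    return created


if __name__ == "__main__":
    for path in copy_example_files(Path(".")):
        print("Created", path, "(edit it for your environment)")
```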

Database Migration (Optional)

Migrate Up

If you want to migrate the database, you can run the following command.

alembic upgrade head

Migrate Down / Rollback

If you want to fully rollback the database, you can run the following command.

alembic downgrade base

If you want to roll back to a specific version, you can run the following command.

alembic downgrade [version]

To see the list of available versions, you can run the following command.

alembic history

Add new migration

If you want to add a new migration, run the following command to generate a new migration file.

alembic revision --autogenerate -m "migration message"

Don't forget to import the model in the migrations/env.py file if it is not imported there yet; otherwise autogenerate will not detect it.

from app.<folder>.models import *
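If you run migrations from scripts (for example in CI), a thin wrapper keeps the commands above in one place. This is a minimal sketch, not part of the repository: it only builds and runs the same alembic command lines shown in this section through subprocess, and the function names are hypothetical. Alembic also exposes these operations from Python through its alembic.command module, if you prefer to avoid a subprocess.

```python
import subprocess


def alembic_args(action: str, target: str = "", message: str = "") -> list:
    """Build the alembic command line for the migration tasks in this section."""
    if action == "upgrade":
        return ["alembic", "upgrade", target or "head"]
    if action == "downgrade":
        return ["alembic", "downgrade", target or "base"]
    if action == "history":
        return ["alembic", "history"]
    if action == "revision":
        return ["alembic", "revision", "--autogenerate", "-m", message]
    raise ValueError(f"unknown action: {action}")


def run_alembic(action: str, **options) -> None:
    # check=True raises CalledProcessError if alembic exits with an error.
    subprocess.run(alembic_args(action, **options), check=True)


if __name__ == "__main__":
    run_alembic("upgrade")  # same as: alembic upgrade head
```

For example, run_alembic("revision", message="migration message") mirrors the autogenerate command above.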

Running the needed services

Start the services with Docker Compose by running

docker-compose -f docker-compose-local.yml up

The following services will be running:

  1. bert-serving: Used for sentence embedding using BERT
  2. redis: Used as the Celery result backend and message broker
  3. celery_worker: Used for running Celery tasks
  4. celery_beat: Used for running Celery beat (cron job scheduler)
  5. flower: Used for monitoring celery tasks, located at http://localhost:5557
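Once the containers are up, a quick TCP check confirms that the exposed ports accept connections. This is a minimal sketch, not part of the repository: it assumes Redis is published on localhost:6379 (the port in the example CELERY_BROKER_URL) and Flower on localhost:5557; the bert-serving ports are not listed in this README, so they are left out.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed ports: Redis from the example broker URL, Flower from the list above.
    services = {"redis": 6379, "flower": 5557}
    for name, port in services.items():
        state = "up" if is_port_open("localhost", port) else "down"
        print(f"{name:8} localhost:{port} {state}")
```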

Below are some useful Docker commands:

  1. To rebuild the Docker containers, run
    docker-compose -f docker-compose-local.yml up --build
  2. To remove stopped Docker containers, run
    docker container prune
  3. To remove dangling (untagged) Docker images, run
    docker rmi $(docker images --filter "dangling=true" -q --no-trunc)
  4. To exec into a docker container, run
    docker exec -it <container_name> bash

Running the app

  1. Run poetry shell to open the Poetry shell.
  2. Lastly, run the app with ENV set to local, development, or production, for example:
    ENV=local python3 main.py
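How main.py reads ENV is not shown in this README. The sketch below, a hypothetical launcher rather than the project's code, rejects any value other than the three documented ones before starting the app. The mapping from ENV to the DEV_DB_* or PROD_DB_* variables in db_prefix is purely an assumption based on the variable names.

```python
import os
import subprocess
import sys

VALID_ENVS = ("local", "development", "production")


def resolve_env(env=os.environ) -> str:
    """Return ENV if it is one of the documented values, otherwise raise."""
    value = env.get("ENV")
    if value not in VALID_ENVS:
        raise ValueError(f"ENV must be one of {VALID_ENVS}, got {value!r}")
    return value


def db_prefix(app_env: str) -> str:
    # Hypothetical mapping: production reads PROD_DB_*, everything else DEV_DB_*.
    return "PROD_DB_" if app_env == "production" else "DEV_DB_"


if __name__ == "__main__":
    app_env = resolve_env()
    print(f"Starting main.py with ENV={app_env} (database vars: {db_prefix(app_env)}*)")
    subprocess.run([sys.executable, "main.py"], env={**os.environ, "ENV": app_env}, check=True)
```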

Usage

  1. To access the documentation, go to http://localhost:8000/docs in your web browser.
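The /docs page suggests a FastAPI-style app, which by default also serves the underlying OpenAPI schema at /openapi.json. Assuming that default has not been changed, the sketch below lists the available API paths from a script instead of the browser. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_openapi(base_url: str = "http://localhost:8000") -> dict:
    """Download the OpenAPI schema that backs the /docs page."""
    # Assumption: the schema is served at the FastAPI default path /openapi.json.
    with urlopen(f"{base_url.rstrip('/')}/openapi.json", timeout=5) as response:
        return json.load(response)


def list_paths(schema: dict) -> list:
    """Return the API paths declared in the schema, sorted."""
    return sorted(schema.get("paths", {}))


if __name__ == "__main__":
    for path in list_paths(fetch_openapi()):
        print(path)
```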

Environment Variables

Each entry lists the variable name, its description, and an example value in parentheses.

  • DEV_DB_HOST: Development database host address (localhost)
  • DEV_DB_USER: Development database username (postgres)
  • DEV_DB_PASSWORD: Development database password (postgres)
  • DEV_DB_NAME: Development database name used by the application (IRyS_v1)
  • PROD_DB_HOST: Production database host address (localhost)
  • PROD_DB_USER: Production database username (postgres)
  • PROD_DB_PASSWORD: Production database password (postgres)
  • PROD_DB_NAME: Production database name used by the application (IRyS_v1)
  • ELASTICSEARCH_CLOUD: Whether Elasticsearch Cloud is used (True)
  • ELASTICSEARCH_CLOUD_ID: Elasticsearch Cloud deployment ID (fcggg111hgg2jjh2:jhhhllk)
  • ELASTICSEARCH_USER: Elasticsearch username, for both cloud and local setups (elastic)
  • ELASTICSEARCH_PASSWORD: Elasticsearch password, for both cloud and local setups (password)
  • ELASTICSEARCH_API_KEY: Elasticsearch API key, when using Elasticsearch Cloud (1234567890)
  • ELASTICSEARCH_SCHEME: Elasticsearch scheme, when using local Elasticsearch (http)
  • ELASTICSEARCH_HOST: Elasticsearch host address, when using local Elasticsearch (localhost)
  • ELASTICSEARCH_PORT: Elasticsearch port, when using local Elasticsearch (9200)
  • MAIL_USERNAME: Email username (username)
  • MAIL_PASSWORD: Email password (password)
  • MAIL_FROM: Email sender address
  • MAIL_PORT: Email server port (587)
  • MAIL_SERVER: Email server (smtp.gmail.com)
  • CELERY_BROKER_URL: Celery broker URL (redis://localhost:6379/0)
  • CELERY_RESULT_BACKEND: Celery result backend URL (redis://localhost:6379/0)
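The four database variables for each environment combine into a single PostgreSQL connection URL. This is a minimal sketch, not the application's own configuration code: it assumes the default PostgreSQL port (no port variable is listed) and URL-escapes the username and password so characters such as @ or / do not break the URL. The function name is hypothetical.

```python
import os
from urllib.parse import quote_plus


def database_url(prefix: str = "DEV_DB_", env=os.environ) -> str:
    """Assemble a PostgreSQL URL from the <prefix>HOST/USER/PASSWORD/NAME variables."""
    host = env[f"{prefix}HOST"]
    user = quote_plus(env[f"{prefix}USER"])
    password = quote_plus(env[f"{prefix}PASSWORD"])  # escape characters like @ or /
    name = env[f"{prefix}NAME"]
    # No port variable is documented, so the PostgreSQL default (5432) applies.
    return f"postgresql://{user}:{password}@{host}/{name}"
```

A URL in this form can be passed to SQLAlchemy's create_engine, whose default PostgreSQL driver is psycopg2.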

Note:

  1. For more on Elasticsearch, see the Elasticsearch section.

Each entry lists the variable name, its description, and its default value.

  • CELERY_BROKER_URL: The URL of the broker to use (default: redis://redis:6379/0)
  • CELERY_RESULT_BACKEND: The URL of the result backend to use (default: redis://redis:6379/0)

Note:

  1. CELERY_BROKER_URL and CELERY_RESULT_BACKEND should match the Redis configuration in the docker-compose.yml file.
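A quick way to verify this note is to parse both URLs and compare their host and port with the Redis service. This is a minimal sketch, not part of the repository: it assumes plain redis:// URLs (TLS rediss:// URLs are rejected), falls back to the defaults listed above, and the function name is hypothetical. With the defaults, both URLs resolve to the redis service on port 6379.

```python
import os
from urllib.parse import urlparse


def redis_targets(env=os.environ) -> set:
    """Return the distinct (host, port) pairs used by the Celery broker and backend."""
    targets = set()
    for key in ("CELERY_BROKER_URL", "CELERY_RESULT_BACKEND"):
        # Fall back to the documented default when the variable is unset.
        url = urlparse(env.get(key, "redis://redis:6379/0"))
        if url.scheme != "redis":
            raise ValueError(f"{key} is not a redis:// URL: {url.geturl()}")
        targets.add((url.hostname, url.port or 6379))
    return targets


if __name__ == "__main__":
    print("Celery uses Redis at:", redis_targets())
```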

Each entry lists the variable name, its description, and its default value.

  • MODEL_DOWNLOAD_URL: The URL of the BERT model to download (default: https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip)
  • MODEL_NAME: The name of the BERT model to download (default: uncased_L-12_H-768_A-12)

Note:

  1. For more on BERT, see the BERT model section.
  2. The value of MODEL_NAME will be used as the name of the folder that contains the BERT model.
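Because MODEL_NAME becomes the folder name, it should match the archive named in MODEL_DOWNLOAD_URL. This matters here because the manual download command in the BERT model section fetches the cased model, while the defaults above point to the uncased one. The sketch below is a hypothetical consistency check, assuming each archive unpacks into a folder named after the zip file.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

DEFAULT_URL = "https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip"


def model_name_from_url(url: str) -> str:
    """Derive the folder name (archive name without .zip) from the download URL."""
    archive = PurePosixPath(urlparse(url).path).name
    if not archive.endswith(".zip"):
        raise ValueError(f"expected a .zip archive, got {archive!r}")
    return archive[: -len(".zip")]


def model_env_consistent(env: dict) -> bool:
    """True if MODEL_NAME matches the archive in MODEL_DOWNLOAD_URL."""
    url = env.get("MODEL_DOWNLOAD_URL", DEFAULT_URL)
    name = env.get("MODEL_NAME", "uncased_L-12_H-768_A-12")
    return model_name_from_url(url) == name
```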

Maintainers

List of Maintainers

License

Copyright (c) 2023, IRyS-Team.
