Merge pull request #54 from duggalsu/test_documentation
Update documentation
duggalsu authored Feb 2, 2024
2 parents af54ac0 + 375c1b9 commit a29e7a7
Showing 2 changed files with 25 additions and 100 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -129,3 +129,5 @@ _docs
# YOLO segment pt file
src/api/core/operators/yolov8n-seg.pt

# SonarQube
.scannerwork
123 changes: 23 additions & 100 deletions README.md
@@ -30,7 +30,20 @@
1. Set environment variables by replacing the placeholder credentials in `/src/api/.env-template` with your own. Rename the file to `development.env`.
(For production, update the RabbitMQ and Elasticsearch host and credentials in the `.env` files)

For development, replace the following in `development.env`:
- Replace the value of `MQ_USERNAME` with the value of `RABBITMQ_DEFAULT_USER` from `docker-compose.yml`
- Replace the value of `MQ_PASSWORD` with the value of `RABBITMQ_DEFAULT_PASS` from `docker-compose.yml`
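
For illustration only, a minimal `development.env` could then look like the sketch below; the RabbitMQ values are placeholders that must match whatever `docker-compose.yml` defines, and any other keys from `.env-template` should be kept as they are.

```
# development.env (illustrative values only)
MQ_USERNAME=<value of RABBITMQ_DEFAULT_USER from docker-compose.yml>
MQ_PASSWORD=<value of RABBITMQ_DEFAULT_PASS from docker-compose.yml>
```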

2. Install packages for local development. (These are installed automatically inside the containers by `docker compose up`; the local install below is only needed for development in a virtual environment.)

```
# Install locally in a virtual environment
$ cd src/api/
$ python -m venv venv && source venv/bin/activate
$ pip install -r requirements.txt
```


3. Run `docker-compose up`. This will bring up the following containers:

Elasticsearch: Used to store searchable representations of multilingual text, images and videos.

@@ -42,15 +55,13 @@

The first time you run `docker-compose up` it will take several minutes for all services to come up. It's usually instantaneous after that, as long as you don't make changes to the Dockerfile associated with each service.
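
To check on the services while they start, one convenient option (not part of the original instructions) is:

```
# List the compose services and their state, and tail the startup logs
$ docker compose ps
$ docker compose logs -f
```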

4. To verify that every service is up, visit the following URLs:

elasticsearch: http://localhost:9200

rabbitmq UI: http://localhost:15672
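
Equivalently, from the command line (a convenience check; `curl` is assumed to be installed):

```
# Elasticsearch should answer with cluster metadata as JSON
$ curl http://localhost:9200
# The RabbitMQ management UI should return an HTTP 200 response
$ curl -I http://localhost:15672
```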

5. Install required operators
Each operator has to be installed separately:

@@ -60,19 +71,15 @@

```
$ pip install -r vid_vec_rep_resnet_requirements.txt
```

6. Then, in a new terminal, start the server with:

```
$ cd src/api
$ docker exec -it feluda_api python server.py
```

7. Verify that the server is running by opening: http://localhost:7000
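
A quick command-line check (again assuming `curl`) is:

```
# The search server should answer on port 7000 once server.py is running
$ curl http://localhost:7000
```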

8. Then start the indexer with:

```
$ docker exec -it search_indexer python receive.py
```

#### Server endpoints

@@ -89,109 +96,25 @@ The `/upload_image`, `/upload_video` and `/upload_text` endpoints index data dir
http://localhost:7000/search: Receives a query image / video / text and returns the top 10 matches found in the Elasticsearch index, in descending order of relevance.
Note: A text search returns two sets of matches: `simple_text_matches` and `text_vector_matches`. The former is useful for same-language search and the latter for multilingual search.

#### Examples

```
curl --location --request POST 'http://localhost:7000/upload_text' \
--header 'Content-Type: application/json' \
--data-raw '{"source_id": "1",
"media_type": "text",
"source": "A",
"text": "Symptoms of COVID-19 are variable, but often include fever, cough, fatigue, breathing difficulties, and loss of smell and taste. Symptoms begin one to fourteen days after exposure to the virus. Around one in five infected individuals do not develop any symptoms.",
"metadata": {"test": "text indexing"}}'
```

```
curl --location --request POST 'http://localhost:7000/upload_image' \
--header 'Content-Type: application/json' \
--data-raw '{"source_id": "1",
"media_type": "image",
"source": "B",
"image_url": "https://tattle-story-scraper.s3.ap-south-1.amazonaws.com/e9ec45b7-3e9a-46a2-ba8e-46e606b85da6",
"metadata": {"test": "image indexing"}
}'
```

```
curl --location --request POST 'http://localhost:7000/search' \
--header 'Content-Type: application/json' \
--data-raw '{"source_id": "2",
"media_type": "text",
"source": "A",
"text": "कोविड-19 के लक्षण परिवर्तनशील हैं, लेकिन अक्सर बुखार, खांसी, थकान, सांस लेने में कठिनाई और गंध और स्वाद की हानि शामिल हैं। वायरस के संपर्क में आने के एक से चौदह दिन बाद लक्षण दिखाई देने लगते हैं। पांच में से एक संक्रमित व्यक्ति में कोई लक्षण विकसित नहीं होते हैं।",
"metadata": {"test": "text search"}}'
```

```
curl --location --request POST 'http://localhost:7000/upload_video' \
--header 'Content-Type: application/json' \
--data-raw '{"source_id": "2",
"media_type": "video",
"source": "B",
"file_url": "https://s3.ap-south-1.amazonaws.com/sharechat-scraper.tattle.co.in/test-videos/135f1b16_1596110374725_c_v__963e6ace-961c-41be-a28c-b3010136cedf.mp4",
"metadata": {"test": "video indexing"}}'
```

#### Bulk indexing

Bulk indexing scripts for the data collected by various Tattle services should live in the service repository, such as [this one](https://github.com/tattle-made/sharechat-scraper/blob/development/workers/indexer/tattlesearch_indexer.py), and be triggered as required. This makes the data searchable via this search API.
The indexing status of each record can be updated via a [reporter](https://github.com/tattle-made/sharechat-scraper/blob/development/workers/reporter/tattlesearch_reporter.py).
While the former fetches data from the service's MongoDB and sends it to the API via HTTP requests, the latter is a RabbitMQ consumer that consumes reports generated by `receive.py` and adds them to the DB.
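
As a rough illustration of that HTTP path (the file name, record shape, and endpoint choice here are assumptions, not taken from the linked scripts), a bulk indexer could be as simple as a loop that POSTs one JSON record per line:

```
#!/bin/bash
# Illustrative only: post each line of records.jsonl (one JSON record per line)
# to the text indexing endpoint of a locally running API.
while read -r record; do
  curl --location --request POST 'http://localhost:7000/upload_text' \
    --header 'Content-Type: application/json' \
    --data-raw "$record"
done < records.jsonl
```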

#### Dockerization

The Dockerfiles in `src/api-server` and `src/indexer` implement multistage builds to reduce image size (from approximately 5 GB each to 1.6 GB each). Since both the server and indexer have the same dependencies, it could be useful to push the first stage of their Docker builds to Dockerhub as a separate image, and then pull that image in the Dockerfiles.

Building the first stage and pushing it to a Dockerhub repository -

```
cd src/api-server
docker build --target builder -t username/repository:tag .
docker push username/repository:tag
```

And then replacing the following code in both the Dockerfiles -

```
FROM python:3.7-slim as builder
RUN apt-get update \
&& apt-get -y upgrade \
&& apt-get install -y \
--no-install-recommends gcc build-essential \
--no-install-recommends libgl1-mesa-glx libglib2.0-0 \
# Vim is only for debugging in dev mode. Remove it in production
vim \
&& apt-get purge -y --auto-remove \
gcc build-essential \
libgl1-mesa-glx libglib2.0-0 \
&& rm -rf /var/lib/apt/lists/*
RUN pip install --upgrade pip
COPY requirements.txt /app/requirements.txt
WORKDIR /app
RUN pip install --user -r requirements.txt
```

with -

```
FROM username/repository:tag AS builder
```

Note that this builder image would need to be rebuilt if there is any change in the dependencies.

#### Updating Packages

1. Update packages in `src/api/requirements.in` or the operator-specific requirements file:
`src/api/core/operators/<operator>_requirements.in`
2. Use `pip-compile` to generate `requirements.txt`

Note:

- Use a custom `tmp` directory to avoid memory issues
- Do not use the `--generate-hashes` flag with `pip-compile`: the CPU build of `pytorch` is installed from the official PyTorch repository because it is not available on PyPI, so `pip-compile` would pin the hash of an architecture-specific file and the result would not be compatible with other architectures.
- If an operator resolves to a higher package version than the Feluda core `requirements.txt` allows, manually edit `<operator>_requirements.txt` down to the compatible version, then run `pip install`. If it completes without errors, that package version is valid for the operator.

```bash
$ cd src/api/
```
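
The diff truncates the rest of this block. As an illustration of the workflow described above (the `TMPDIR` path is an assumption, and the operator file name is just the example used earlier), the commands might look like:

```
# Point pip-compile at a roomy tmp directory to avoid memory/space issues
$ cd src/api/
$ TMPDIR=/path/to/large/tmp pip-compile requirements.in
# Compile an operator-specific requirements file the same way
$ TMPDIR=/path/to/large/tmp pip-compile core/operators/vid_vec_rep_resnet_requirements.in
```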
