Merge pull request #286 from tattle-made/development
chore: merging development to main
aatmanvaidya authored May 1, 2024
2 parents 0cc9624 + 6239cef commit bba03e4
Showing 25 changed files with 862 additions and 120 deletions.
46 changes: 46 additions & 0 deletions .github/workflows/docker-push-media-worker-staging.yml
@@ -0,0 +1,46 @@
name: Publish Media Worker to Dockerhub for Staging

permissions:
  contents: read

on: workflow_dispatch

jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - name: Set up QEMU
        uses: docker/setup-qemu-action@68827325e0b33c7199eb31dd4e31fbe9023e06e3 # v3.0.0

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@2b51285047da1547ffb1b2203d8be4c0af6b1f20 # v3.2.0

      - name: Login to Docker Hub
        uses: docker/login-action@e92390c5fb421da1463c202d546fed0ec5c39f20 # v.3.1.0
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build and push amd64
        uses: docker/build-push-action@2cdde995de11925a030ce8070c3d77a52ffcf1c0 # v5.3.0
        with:
          context: "{{defaultContext}}:src/"
          file: worker/media/Dockerfile.media_worker
          platforms: linux/amd64
          build-args: |
            "UID=1000"
            "GID=1000"
          push: true
          tags: tattletech/feluda-operator-media:worker-amd64-latest

      - name: Build and push arm64
        uses: docker/build-push-action@2cdde995de11925a030ce8070c3d77a52ffcf1c0 # v5.3.0
        with:
          context: "{{defaultContext}}:src/"
          file: worker/media/Dockerfile.media_worker_graviton
          platforms: linux/arm64
          build-args: |
            "UID=1000"
            "GID=1000"
          push: true
          tags: tattletech/feluda-operator-media:worker-arm64-latest
58 changes: 29 additions & 29 deletions CHANGELOG.md
@@ -1084,45 +1084,45 @@ refactor: merging development to main ([`22bb325`](https://github.com/tattle-mad

* feat: feluda store supports audio (#78)

* feat: feluda store supports audio

* fix: delete and refresh for ES

* feat: feluda store supports audio

* fix: delete and refresh for ES

* dhore: profiling audio operator ([`f6987a6`](https://github.com/tattle-made/feluda/commit/f6987a6d3aa4ff018b5ebac248a4469437df80d3))

* feat: add poc multiprocess test ([`f43646b`](https://github.com/tattle-made/feluda/commit/f43646b4145af6cd8c6ea718be895ecbca77d271))

* feat: audio operator to extract embedding vectors (#59)

* feat: audio emebddings

* chore: deleting music files

* chore: renaming files

* docs: documentation for audio embedding operator

* feat: audio emebddings

* chore: deleting music files

* chore: renaming files

* docs: documentation for audio embedding operator

* docs: adding work to be done for the operator ([`484d5ae`](https://github.com/tattle-made/feluda/commit/484d5aed902b46d627c060625fad2a64a6246461))

* feat: c-profiling test for video vec (#60)

* feat: c-profiling test for video vec

* feat: c-profiling test for video vec

* feat: test to find time taken for video vec ([`247f5db`](https://github.com/tattle-made/feluda/commit/247f5db90dc708f04bd0818f790d95c3e2c67a42))

* feat: add workflow to push vidvec specific operator to dockerhub ([`17e0d57`](https://github.com/tattle-made/feluda/commit/17e0d576492e379b2e4165e2b728868ab3fad455))

* feat: operator to detect objects using YOLO (#44)

* feat: operator to detect objects using YOLO

* test file comment main function

* feat: operator to detect objects using YOLO

* test file comment main function

* chore: moving ultralytics install to opreator ([`17b9d10`](https://github.com/tattle-made/feluda/commit/17b9d107464f24875ca7008c4cf81cf0466e45d3))

* feat: operator to extract text in images using tesseract (#40)

* feat: opreator to detect text in images using tesseract
* feat: opreator to detect text in images using tesseract
* chore: adding test images and making test multilingual ([`edec4a9`](https://github.com/tattle-made/feluda/commit/edec4a97763dd81e7dd7013833c690890563a1e9))

* feat: add license ([`a44e233`](https://github.com/tattle-made/feluda/commit/a44e233bf36fb4ad0d5bbd41a29523dc1f5364aa))
@@ -1194,8 +1194,8 @@ refactor: merging development to main ([`22bb325`](https://github.com/tattle-mad

* fix: video search (#52)

* chore: moving test files to a folder
* fix: video search
* chore: moving test files to a folder
* fix: video search
* docs: commenting TODO in search.py ([`af54ac0`](https://github.com/tattle-made/feluda/commit/af54ac0b7e2ef1afc139939e88cfc0f3fcc2dbfc))

* fix: search api as client ([`2573490`](https://github.com/tattle-made/feluda/commit/25734905ed04d6f302542ae152a3954d20e7ad31))
@@ -1238,10 +1238,10 @@ refactor: merging development to main ([`22bb325`](https://github.com/tattle-mad

* refactor: benchmark test sh file (#64)

* refactor: benchmark test sh file

* ci: dockerfile udpate for benchmark.sh

* refactor: benchmark test sh file

* ci: dockerfile udpate for benchmark.sh

* chore: echo statements for benchmark file ([`37e768a`](https://github.com/tattle-made/feluda/commit/37e768a38af2dcc169c395dace49bd708eeeaef1))

* refactor: cleanup deprecated thigns. ([`4c67853`](https://github.com/tattle-made/feluda/commit/4c67853e75b8511cfb89158a358f60523851b138))
@@ -1264,8 +1264,8 @@ refactor: merging development to main ([`22bb325`](https://github.com/tattle-mad

* test: worker to queue and index video files (#84)

* refactor: small improvements

* refactor: small improvements

* test: worker to queue and index video vec ([`6eaf19b`](https://github.com/tattle-made/feluda/commit/6eaf19b39298762b6f9a3f34e50858d7314d02e6))

### Unknown
@@ -1437,8 +1437,8 @@ Add ElasticSearch benchmarking ([`03915d3`](https://github.com/tattle-made/felud

* [WIP] test: evaluating audio vec ES index and search (#77)

* test: evaluating audio vec ES index and search

* test: evaluating audio vec ES index and search

* docs: delete stored documents ([`ad94ad7`](https://github.com/tattle-made/feluda/commit/ad94ad745d85c734fe1c7671ce43c4f2a9e28876))

* Merge pull request #76 from duggalsu/add_arch_to_docker_tag
19 changes: 14 additions & 5 deletions src/core/config.py
@@ -8,16 +8,15 @@
"""

import logging
from typing import List, Optional
from typing import List, Optional, Union
import yaml
from dataclasses import dataclass
from dacite import from_dict

log = logging.getLogger(__name__)


@dataclass
class StoreParameters:
class StoreESParameters:
    host_name: str
    image_index_name: str
    text_index_name: str
@@ -26,10 +25,20 @@ class StoreParameters:


@dataclass
class StoreConfig:
class StorePostgresParameters:
    table_names: List[str]


@dataclass
class StoreEntity:
    label: str
    type: str
    parameters: StoreParameters
    parameters: Union[StoreESParameters, StorePostgresParameters]


@dataclass
class StoreConfig:
    entities: List[StoreEntity]


@dataclass
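The `config.py` change above replaces a single store definition with a list of `StoreEntity` items, each carrying its own `type` and `parameters`. As a rough illustration of that shape, here is a minimal sketch that builds the dataclasses from an already-parsed mapping using only the standard library (the project itself imports `dacite`). The `"es"` and `"postgres"` type strings, the `PARAMETER_CLASSES` table, the `build_store_config` helper, the sample values, and the trimmed `StoreESParameters` (only the three fields visible in the diff) are assumptions for illustration, not the project's actual code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class StoreESParameters:
    # Trimmed: only the fields visible in the diff are reproduced here.
    host_name: str
    image_index_name: str
    text_index_name: str


@dataclass
class StorePostgresParameters:
    table_names: List[str]


@dataclass
class StoreEntity:
    label: str
    type: str
    parameters: Union[StoreESParameters, StorePostgresParameters]


@dataclass
class StoreConfig:
    entities: List[StoreEntity]


# Hypothetical mapping from the `type` string to its parameter class.
PARAMETER_CLASSES = {"es": StoreESParameters, "postgres": StorePostgresParameters}


def build_store_config(raw: Dict[str, Any]) -> StoreConfig:
    """Build a StoreConfig from a mapping such as yaml.safe_load would return."""
    entities = []
    for item in raw["entities"]:
        params_cls = PARAMETER_CLASSES[item["type"]]
        entities.append(
            StoreEntity(
                label=item["label"],
                type=item["type"],
                parameters=params_cls(**item["parameters"]),
            )
        )
    return StoreConfig(entities=entities)


raw_config = {
    "entities": [
        {
            "label": "search",
            "type": "es",
            "parameters": {
                "host_name": "localhost",
                "image_index_name": "image",
                "text_index_name": "text",
            },
        },
        {
            "label": "metadata",
            "type": "postgres",
            "parameters": {"table_names": ["media"]},
        },
    ]
}
config = build_store_config(raw_config)
```

Dispatching on the explicit `type` field keeps the choice of parameter class unambiguous, which may be easier to debug than relying on whichever `Union` member happens to match the supplied keys.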
7 changes: 4 additions & 3 deletions src/core/feluda.py
@@ -22,7 +22,7 @@ def __init__(self, configPath):
        if self.config.store:
            from core import store

            self.store = store.get_store(self.config.store)
            self.store = store.get_stores(self.config.store)
        if self.config.queue:
            # print("---> 1", self.config.queue)
            from core.queue import Queue
@@ -61,8 +61,9 @@ def start_component(self, component_type: ComponentType):
        if component_type == ComponentType.SERVER and self.server:
            self.server.start()
        elif component_type == ComponentType.STORE and self.store:
            self.store.connect()
            self.store.optionally_create_index()
            for store in self.store:
                self.store[store].connect()
                self.store[store].initialise()
        elif component_type == ComponentType.QUEUE and self.queue:
            self.queue.connect()
            self.queue.initialize()
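The loop in `start_component` indexes `self.store` by key, which suggests that `get_stores` returns a mapping from some identifier to a store object. The sketch below illustrates that start-up pattern under that assumption. `FakeStore`, this stand-in `get_stores`, the `start_stores` helper, and the choice of the entity label as the key are all hypothetical; the real `core.store` module is not shown in this diff.

```python
from typing import Dict, List


class FakeStore:
    """Stand-in for a real store; it only records the calls made on it."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.calls: List[str] = []

    def connect(self) -> None:
        self.calls.append("connect")

    def initialise(self) -> None:
        self.calls.append("initialise")


def get_stores(labels: List[str]) -> Dict[str, FakeStore]:
    # Assumption: one store object per configured entity, keyed by its label.
    return {label: FakeStore(label) for label in labels}


def start_stores(stores: Dict[str, FakeStore]) -> None:
    # Same order as in the diff: connect first, then initialise.
    for store in stores.values():
        store.connect()
        store.initialise()


stores = get_stores(["es_vec", "postgres_meta"])
start_stores(stores)
```

Iterating over `.values()` avoids the repeated `self.store[store]` lookup and keeps the loop variable from shadowing the `store` module imported in `__init__`.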
76 changes: 50 additions & 26 deletions src/core/models/media_factory.py
@@ -6,10 +6,11 @@
from werkzeug.datastructures import FileStorage
import wget
from core.models.media import MediaType
from core.models.s3_utils import AWSS3Utils
import logging
import os
import tempfile
import boto3
from pydub import AudioSegment

log = logging.getLogger(__name__)

@@ -70,23 +71,6 @@ def make_from_file_in_memory(image_data: FileStorage):
        pass

class VideoFactory:
    aws_access_key_id = os.getenv('AWS_ACCESS_KEY_ID')
    aws_secret_access_key = os.getenv('AWS_SECRET_ACCESS_KEY')
    aws_region = os.getenv('AWS_REGION')
    aws_bucket = os.getenv('AWS_BUCKET')
    session = boto3.Session(
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key,
        region_name=aws_region
    )
    s3 = session.client('s3')
    @staticmethod
    def download_file_from_s3(bucket_name, file_key, local_file_path):
        try:
            VideoFactory.s3.download_file(bucket_name, file_key, local_file_path)
            print(f"File {file_key} downloaded successfully as {local_file_path}")
        except Exception as e:
            print(f"Error downloading file {file_key}: {e}")

    @staticmethod
    def make_from_url(video_url):
@@ -104,13 +88,13 @@ def make_from_url(video_url):
                print("Error downloading video:", e)
                raise Exception("Error Downloading Video")
        else:
            bucket_name = VideoFactory.aws_bucket
            bucket_name = AWSS3Utils.aws_bucket
            file_key = video_url
            file_name = file_key.split("/")[-1]
            file_path = os.path.join(temp_dir, file_name)
            try:
                print("Downloading video from S3")
                VideoFactory.download_file_from_s3(bucket_name, file_key, file_path)
                AWSS3Utils.download_file_from_s3(bucket_name, file_key, file_path)
                print("Video downloaded")
            except Exception as e:
                print("Error downloading video from S3:", e)
@@ -134,21 +118,61 @@ class AudioFactory:
    @staticmethod
    def make_from_url(audio_url):
        temp_dir = tempfile.gettempdir()

        if audio_url.startswith("http"):
            temp_url = audio_url.split("?")[0]
            file_name = temp_url.split("/")[-1] + ".wav"
            file_path = os.path.join(temp_dir, file_name)
            try:
                print("Downloading audio from URL")
                wget.download(audio_url, out=file_path)
                print("Audio downloaded")
            except Exception as e:
                print("Error downloading audio:", e)
                raise Exception("Error Downloading audio")
        else:
            bucket_name = AWSS3Utils.aws_bucket
            file_key = audio_url
            file_name = file_key.split("/")[-1]
            file_path = os.path.join(temp_dir, file_name)
            try:
                print("Downloading audio from S3")
                AWSS3Utils.download_file_from_s3(bucket_name, file_key, file_path)
                print("Audio downloaded")
            except Exception as e:
                print("Error downloading audio from S3:", e)
                raise Exception("Error Downloading audio")

        return {"path": file_path}

    @staticmethod
    def make_from_url_to_wav(audio_url):
        temp_dir = tempfile.gettempdir()
        temp_url = audio_url.split("?")[0]
        file_name = temp_url.split("/")[-1] + ".wav"
        audio_file = temp_dir + os.sep + file_name
        file_name = temp_url.split("/")[-1]
        audio_file = os.path.join(temp_dir, file_name)

        try:
            print("Downloading audio from url")
            print("Downloading audio from URL")
            wget.download(audio_url, out=audio_file)
            print("audio downloaded")
            print("\naudio downloaded")

            _, file_extension = os.path.splitext(file_name)
            if file_extension != '.wav':
                audio = AudioSegment.from_file(audio_file, format=file_extension[1:])
                wav_file = os.path.splitext(audio_file)[0] + '.wav'
                audio.export(wav_file, format='wav')
                os.remove(audio_file)
                audio_file = wav_file
        except Exception as e:
            log.exception("Error downloading audio:", e)
            raise Exception("Error Downloading audio")
            logging.exception("Error downloading or converting audio:", e)
            raise Exception("Error downloading or converting audio")
        return {"path": audio_file}

    @staticmethod
    def make_from_file_on_disk(audio_path):
        return {"path": audio_path}



media_factory = {
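The new `make_from_url_to_wav` derives a local file name from the URL, downloads to that path, and converts to WAV only when the extension is not `.wav`. The path-planning part of that logic can be sketched on its own, without `wget` or `pydub`, as below. The `plan_audio_paths` helper and the example URL are hypothetical, and the case-insensitive extension check is an assumption that differs from the diff, which compares against `'.wav'` exactly.

```python
import os
import tempfile
from typing import Tuple


def plan_audio_paths(audio_url: str) -> Tuple[str, str, bool]:
    """Return (download_path, wav_path, needs_conversion) for an audio URL."""
    temp_dir = tempfile.gettempdir()
    # Drop the query string, then keep the last path segment as the file name.
    file_name = audio_url.split("?")[0].split("/")[-1]
    download_path = os.path.join(temp_dir, file_name)
    stem, extension = os.path.splitext(download_path)
    wav_path = stem + ".wav"
    # Assumption: treat ".WAV" like ".wav"; the diff compares case-sensitively.
    needs_conversion = extension.lower() != ".wav"
    return download_path, wav_path, needs_conversion


download_path, wav_path, needs_conversion = plan_audio_paths(
    "https://example.com/media/clip.mp3?token=abc"
)
```

Keeping the path computation separate from the download and conversion steps makes it possible to test the naming rules without network access or an audio decoder.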
23 changes: 23 additions & 0 deletions src/core/models/s3_utils.py
@@ -0,0 +1,23 @@
import boto3
import os

class AWSS3Utils:
    aws_access_key_id = os.getenv('AWS_ACCESS_KEY_ID')
    aws_secret_access_key = os.getenv('AWS_SECRET_ACCESS_KEY')
    aws_region = os.getenv('AWS_REGION')
    aws_bucket = os.getenv('AWS_BUCKET')
    session = boto3.Session(
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key,
        region_name=aws_region
    )
    s3 = session.client('s3')

    @staticmethod
    def download_file_from_s3(bucket_name, file_key, local_file_path):
        try:
            AWSS3Utils.s3.download_file(bucket_name, file_key, local_file_path)
            print(f"File {file_key} downloaded successfully!")
        except Exception as e:
            print(f"Error downloading file {file_key}: {e}")
            raise Exception("Error Downloading file from S3")
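`AWSS3Utils` builds its boto3 session at import time and reports failures with `print`. One possible variation, sketched below, passes the client in as an argument so the function can be exercised without AWS credentials, logs through the `logging` module, and chains the original exception. The `S3DownloadError` class, the standalone `download_file_from_s3` signature, and the `FailingClient` test double are hypothetical; the only behaviour assumed of a real client is a `download_file(bucket, key, filename)` method, which the boto3 S3 client provides.

```python
import logging

log = logging.getLogger(__name__)


class S3DownloadError(Exception):
    """Hypothetical error type raised when an S3 download fails."""


def download_file_from_s3(client, bucket_name: str, file_key: str, local_file_path: str) -> None:
    """Download one object via the given client, logging and re-raising on failure."""
    try:
        client.download_file(bucket_name, file_key, local_file_path)
    except Exception as e:
        # log.exception records the traceback; %s is filled in lazily by logging.
        log.exception("Error downloading file %s from S3", file_key)
        raise S3DownloadError(f"Error downloading {file_key} from S3") from e
    log.info("File %s downloaded to %s", file_key, local_file_path)


class FailingClient:
    """Test double that mimics a client whose download always fails."""

    def download_file(self, bucket, key, filename):
        raise OSError("simulated network failure")
```

In real use the client would come from `boto3.client("s3")`. The `%s` placeholder matters: a call such as `logging.exception("Error downloading or converting audio:", e)`, which passes an extra argument without a placeholder, makes the logging module report a formatting error instead of the intended message.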