Merge pull request #286 from tattle-made/development

chore: merging development to main
tattle-made · May 1, 2024 · bba03e4
2 parents 0cc9624 + 6239cef
commit bba03e4
Showing 25 changed files with 862 additions and 120 deletions.
diff --git a/.github/workflows/docker-push-media-worker-staging.yml b/.github/workflows/docker-push-media-worker-staging.yml
@@ -0,0 +1,46 @@
+name: Publish Media Worker to Dockerhub for Staging
+
+permissions:
+  contents: read
+
+on: workflow_dispatch
+
+jobs:
+  docker:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@68827325e0b33c7199eb31dd4e31fbe9023e06e3 # v3.0.0
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@2b51285047da1547ffb1b2203d8be4c0af6b1f20 # v3.2.0
+
+      - name: Login to Docker Hub
+        uses: docker/login-action@e92390c5fb421da1463c202d546fed0ec5c39f20 # v3.1.0
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_TOKEN }}
+
+      - name: Build and push amd64
+        uses: docker/build-push-action@2cdde995de11925a030ce8070c3d77a52ffcf1c0 # v5.3.0
+        with:
+          context: "{{defaultContext}}:src/"
+          file: worker/media/Dockerfile.media_worker
+          platforms: linux/amd64
+          build-args: |
+            "UID=1000"
+            "GID=1000"
+          push: true
+          tags: tattletech/feluda-operator-media:worker-amd64-latest
+
+      - name: Build and push arm64
+        uses: docker/build-push-action@2cdde995de11925a030ce8070c3d77a52ffcf1c0 # v5.3.0
+        with:
+          context: "{{defaultContext}}:src/"
+          file: worker/media/Dockerfile.media_worker_graviton
+          platforms: linux/arm64
+          build-args: |
+            "UID=1000"
+            "GID=1000"
+          push: true
+          tags: tattletech/feluda-operator-media:worker-arm64-latest
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1084,45 +1084,45 @@ refactor: merging development to main ([`22bb325`](https://github.com/tattle-mad
 
 * feat: feluda store supports audio (#78)
 
-* feat: feluda store supports audio
-
-* fix: delete and refresh for ES
-
+* feat: feluda store supports audio
+
+* fix: delete and refresh for ES
+
 * dhore: profiling audio operator ([`f6987a6`](https://github.com/tattle-made/feluda/commit/f6987a6d3aa4ff018b5ebac248a4469437df80d3))
 
 * feat: add poc multiprocess test ([`f43646b`](https://github.com/tattle-made/feluda/commit/f43646b4145af6cd8c6ea718be895ecbca77d271))
 
 * feat: audio operator to extract embedding vectors (#59)
 
-* feat: audio emebddings
-
-* chore: deleting music files
-
-* chore: renaming files
-
-* docs: documentation for audio embedding operator
-
+* feat: audio emebddings
+
+* chore: deleting music files
+
+* chore: renaming files
+
+* docs: documentation for audio embedding operator
+
 * docs: adding work to be done for the operator ([`484d5ae`](https://github.com/tattle-made/feluda/commit/484d5aed902b46d627c060625fad2a64a6246461))
 
 * feat: c-profiling test for video vec (#60)
 
-* feat: c-profiling test for video vec
-
+* feat: c-profiling test for video vec
+
 * feat: test to find time taken for video vec ([`247f5db`](https://github.com/tattle-made/feluda/commit/247f5db90dc708f04bd0818f790d95c3e2c67a42))
 
 * feat: add workflow to push vidvec specific operator to dockerhub ([`17e0d57`](https://github.com/tattle-made/feluda/commit/17e0d576492e379b2e4165e2b728868ab3fad455))
 
 * feat: operator to detect objects using YOLO (#44)
 
-* feat: operator to detect objects using YOLO
-
-* test file comment main function
-
+* feat: operator to detect objects using YOLO
+
+* test file comment main function
+
 * chore: moving ultralytics install to opreator ([`17b9d10`](https://github.com/tattle-made/feluda/commit/17b9d107464f24875ca7008c4cf81cf0466e45d3))
 
 * feat: operator to extract text in images using tesseract (#40)
 
-* feat: opreator to detect text in images using tesseract
+* feat: opreator to detect text in images using tesseract
 * chore: adding test images and making test multilingual ([`edec4a9`](https://github.com/tattle-made/feluda/commit/edec4a97763dd81e7dd7013833c690890563a1e9))
 
 * feat: add license ([`a44e233`](https://github.com/tattle-made/feluda/commit/a44e233bf36fb4ad0d5bbd41a29523dc1f5364aa))
@@ -1194,8 +1194,8 @@ refactor: merging development to main ([`22bb325`](https://github.com/tattle-mad
 
 * fix: video search (#52)
 
-* chore: moving test files to a folder
-* fix: video search
+* chore: moving test files to a folder
+* fix: video search
 * docs: commenting TODO in search.py ([`af54ac0`](https://github.com/tattle-made/feluda/commit/af54ac0b7e2ef1afc139939e88cfc0f3fcc2dbfc))
 
 * fix: search api as client ([`2573490`](https://github.com/tattle-made/feluda/commit/25734905ed04d6f302542ae152a3954d20e7ad31))
@@ -1238,10 +1238,10 @@ refactor: merging development to main ([`22bb325`](https://github.com/tattle-mad
 
 * refactor: benchmark test sh file (#64)
 
-* refactor: benchmark test sh file
-
-* ci: dockerfile udpate for benchmark.sh
-
+* refactor: benchmark test sh file
+
+* ci: dockerfile udpate for benchmark.sh
+
 * chore: echo statements for benchmark file ([`37e768a`](https://github.com/tattle-made/feluda/commit/37e768a38af2dcc169c395dace49bd708eeeaef1))
 
 * refactor: cleanup deprecated thigns. ([`4c67853`](https://github.com/tattle-made/feluda/commit/4c67853e75b8511cfb89158a358f60523851b138))
@@ -1264,8 +1264,8 @@ refactor: merging development to main ([`22bb325`](https://github.com/tattle-mad
 
 * test: worker to queue and index video files (#84)
 
-* refactor: small improvements
-
+* refactor: small improvements
+
 * test: worker to queue and index video vec ([`6eaf19b`](https://github.com/tattle-made/feluda/commit/6eaf19b39298762b6f9a3f34e50858d7314d02e6))
 
 ### Unknown
@@ -1437,8 +1437,8 @@ Add ElasticSearch benchmarking ([`03915d3`](https://github.com/tattle-made/felud
 
 * [WIP] test: evaluating audio vec ES index and search (#77)
 
-* test: evaluating audio vec ES index and search
-
+* test: evaluating audio vec ES index and search
+
 * docs: delete stored documents ([`ad94ad7`](https://github.com/tattle-made/feluda/commit/ad94ad745d85c734fe1c7671ce43c4f2a9e28876))
 
 * Merge pull request #76 from duggalsu/add_arch_to_docker_tag

diff --git a/src/core/config.py b/src/core/config.py
@@ -8,16 +8,15 @@
 """
 
 import logging
-from typing import List, Optional
+from typing import List, Optional, Union
 import yaml
 from dataclasses import dataclass
 from dacite import from_dict
 
 log = logging.getLogger(__name__)
 
-
 @dataclass
-class StoreParameters:
+class StoreESParameters:
     host_name: str
     image_index_name: str
     text_index_name: str
@@ -26,10 +25,20 @@ class StoreParameters:
 
 
 @dataclass
-class StoreConfig:
+class StorePostgresParameters:
+    table_names: List[str]
+
+
+@dataclass
+class StoreEntity:
     label: str
     type: str
-    parameters: StoreParameters
+    parameters: Union[StoreESParameters, StorePostgresParameters]
+
+
+@dataclass
+class StoreConfig:
+    entities: List[StoreEntity]
 
 
 @dataclass

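Note on the config change above: `StoreConfig` now wraps a list of `StoreEntity` items, each carrying either Elasticsearch or Postgres parameters. A minimal sketch of the resulting shape, built by hand with plain dataclasses. `StoreESParameters` is trimmed to the three fields visible in the hunk (the real class has more), and every label, type string, and value below is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class StoreESParameters:
    # Only the fields visible in the hunk; the real class defines more.
    host_name: str
    image_index_name: str
    text_index_name: str


@dataclass
class StorePostgresParameters:
    table_names: List[str]


@dataclass
class StoreEntity:
    label: str
    type: str
    parameters: Union[StoreESParameters, StorePostgresParameters]


@dataclass
class StoreConfig:
    entities: List[StoreEntity]


# Hypothetical values, for illustration only.
config = StoreConfig(
    entities=[
        StoreEntity(
            label="search",
            type="es",
            parameters=StoreESParameters(
                host_name="es",
                image_index_name="image",
                text_index_name="text",
            ),
        ),
        StoreEntity(
            label="metadata",
            type="postgres",
            parameters=StorePostgresParameters(table_names=["media"]),
        ),
    ]
)

# Each entity can be inspected independently of its parameter type.
labels = [entity.label for entity in config.entities]
print(labels)
```

Code that previously read `config.store.parameters` directly would, under this shape, need to pick an entity from `config.store.entities` first.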
diff --git a/src/core/feluda.py b/src/core/feluda.py
@@ -22,7 +22,7 @@ def __init__(self, configPath):
         if self.config.store:
             from core import store
 
-            self.store = store.get_store(self.config.store)
+            self.store = store.get_stores(self.config.store)
         if self.config.queue:
             # print("---> 1", self.config.queue)
             from core.queue import Queue
@@ -61,8 +61,9 @@ def start_component(self, component_type: ComponentType):
         if component_type == ComponentType.SERVER and self.server:
             self.server.start()
         elif component_type == ComponentType.STORE and self.store:
-            self.store.connect()
-            self.store.optionally_create_index()
+            for store in self.store:
+                self.store[store].connect()
+                self.store[store].initialise()
         elif component_type == ComponentType.QUEUE and self.queue:
             self.queue.connect()
             self.queue.initialize()

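Note on the startup loop above: indexing `self.store[store]` while iterating over `self.store` suggests that `get_stores` returns a dict. That function is not part of this excerpt, so the dict shape is an assumption. A minimal sketch of the loop with a hypothetical `FakeStore` that records the calls it receives:

```python
class FakeStore:
    """Hypothetical stand-in for a store; records method calls in order."""

    def __init__(self):
        self.calls = []

    def connect(self):
        self.calls.append("connect")

    def initialise(self):
        self.calls.append("initialise")


# Assumption: stores are held in a dict keyed by some store identifier.
stores = {"es": FakeStore(), "postgres": FakeStore()}

# Mirrors the loop in start_component: iterating a dict yields its keys.
for key in stores:
    stores[key].connect()
    stores[key].initialise()

print({key: store.calls for key, store in stores.items()})
```

Iterating with `stores.values()` would be equivalent here and avoids the second lookup.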
diff --git a/src/core/models/media_factory.py b/src/core/models/media_factory.py
@@ -6,10 +6,11 @@
 from werkzeug.datastructures import FileStorage
 import wget
 from core.models.media import MediaType
+from core.models.s3_utils import AWSS3Utils
 import logging
 import os
 import tempfile
-import boto3
+from pydub import AudioSegment
 
 log = logging.getLogger(__name__)
 
@@ -70,23 +71,6 @@ def make_from_file_in_memory(image_data: FileStorage):
         pass
 
 class VideoFactory:
-    aws_access_key_id = os.getenv('AWS_ACCESS_KEY_ID')
-    aws_secret_access_key = os.getenv('AWS_SECRET_ACCESS_KEY')
-    aws_region = os.getenv('AWS_REGION')
-    aws_bucket = os.getenv('AWS_BUCKET')
-    session = boto3.Session(
-        aws_access_key_id=aws_access_key_id,
-        aws_secret_access_key=aws_secret_access_key,
-        region_name=aws_region
-    )
-    s3 = session.client('s3')
-    @staticmethod
-    def download_file_from_s3(bucket_name, file_key, local_file_path):
-        try:
-            VideoFactory.s3.download_file(bucket_name, file_key, local_file_path)
-            print(f"File {file_key} downloaded successfully as {local_file_path}")
-        except Exception as e:
-            print(f"Error downloading file {file_key}: {e}")
 
     @staticmethod
     def make_from_url(video_url):
@@ -104,13 +88,13 @@ def make_from_url(video_url):
                 print("Error downloading video:", e)
                 raise Exception("Error Downloading Video")
         else:
-            bucket_name = VideoFactory.aws_bucket
+            bucket_name = AWSS3Utils.aws_bucket
             file_key = video_url
             file_name = file_key.split("/")[-1]
             file_path = os.path.join(temp_dir, file_name)
             try:
                 print("Downloading video from S3")
-                VideoFactory.download_file_from_s3(bucket_name, file_key, file_path)
+                AWSS3Utils.download_file_from_s3(bucket_name, file_key, file_path)
                 print("Video downloaded")
             except Exception as e:
                 print("Error downloading video from S3:", e)
@@ -134,21 +118,61 @@ class AudioFactory:
     @staticmethod
     def make_from_url(audio_url):
         temp_dir = tempfile.gettempdir()
+
+        if audio_url.startswith("http"):
+            temp_url = audio_url.split("?")[0]
+            file_name = temp_url.split("/")[-1] + ".wav"
+            file_path = os.path.join(temp_dir, file_name)
+            try:
+                print("Downloading audio from URL")
+                wget.download(audio_url, out=file_path)
+                print("Audio downloaded")
+            except Exception as e:
+                print("Error downloading audio:", e)
+                raise Exception("Error Downloading audio")
+        else:
+            bucket_name = AWSS3Utils.aws_bucket
+            file_key = audio_url
+            file_name = file_key.split("/")[-1]
+            file_path = os.path.join(temp_dir, file_name)
+            try:
+                print("Downloading audio from S3")
+                AWSS3Utils.download_file_from_s3(bucket_name, file_key, file_path)
+                print("Audio downloaded")
+            except Exception as e:
+                print("Error downloading audio from S3:", e)
+                raise Exception("Error Downloading audio")
+
+        return {"path": file_path}
+
+    @staticmethod
+    def make_from_url_to_wav(audio_url):
+        temp_dir = tempfile.gettempdir()
         temp_url = audio_url.split("?")[0]
-        file_name = temp_url.split("/")[-1] + ".wav"
-        audio_file = temp_dir + os.sep + file_name
+        file_name = temp_url.split("/")[-1]
+        audio_file = os.path.join(temp_dir, file_name)
+
         try:
-            print("Downloading audio from url")
+            print("Downloading audio from URL")
             wget.download(audio_url, out=audio_file)
-            print("audio downloaded")
+            print("\naudio downloaded")
+
+            _, file_extension = os.path.splitext(file_name)
+            if file_extension != '.wav':
+                audio = AudioSegment.from_file(audio_file, format=file_extension[1:])
+                wav_file = os.path.splitext(audio_file)[0] + '.wav'
+                audio.export(wav_file, format='wav')
+                os.remove(audio_file)
+                audio_file = wav_file
         except Exception as e:
-            log.exception("Error downloading audio:", e)
-            raise Exception("Error Downloading audio")
+            log.exception("Error downloading or converting audio: %s", e)
+            raise Exception("Error downloading or converting audio")
         return {"path": audio_file}
 
     @staticmethod
     def make_from_file_on_disk(audio_path):
         return {"path": audio_path}
+
 
 
 media_factory = {

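Note on `make_from_url_to_wav` above: the local path comes from the URL with its query string dropped, and conversion is triggered by a case-sensitive comparison of the extension against `.wav`. A minimal sketch of just that path logic, using only the standard library. The helper names and the example URL are hypothetical, and the pydub conversion itself is left out.

```python
import os
import tempfile


def local_audio_path(audio_url, temp_dir=None):
    """Drop the query string and join the last URL segment onto temp_dir."""
    temp_dir = temp_dir or tempfile.gettempdir()
    url_without_query = audio_url.split("?")[0]
    file_name = url_without_query.split("/")[-1]
    return os.path.join(temp_dir, file_name)


def needs_conversion(audio_file):
    """Same check as the diff: anything whose extension is not '.wav'."""
    _, file_extension = os.path.splitext(audio_file)
    return file_extension != ".wav"


def wav_path_for(audio_file):
    """Swap the extension for '.wav', keeping directory and base name."""
    return os.path.splitext(audio_file)[0] + ".wav"


# Hypothetical URL, for illustration only.
path = local_audio_path("https://example.com/media/clip.mp3?token=abc", "/tmp")
print(path, needs_conversion(path), wav_path_for(path))
```

Because the comparison is case-sensitive, a file named `clip.WAV` would still go through the conversion branch.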
diff --git a/src/core/models/s3_utils.py b/src/core/models/s3_utils.py
@@ -0,0 +1,23 @@
+import boto3
+import os
+
+class AWSS3Utils:
+    aws_access_key_id = os.getenv('AWS_ACCESS_KEY_ID')
+    aws_secret_access_key = os.getenv('AWS_SECRET_ACCESS_KEY')
+    aws_region = os.getenv('AWS_REGION')
+    aws_bucket = os.getenv('AWS_BUCKET')
+    session = boto3.Session(
+        aws_access_key_id=aws_access_key_id,
+        aws_secret_access_key=aws_secret_access_key,
+        region_name=aws_region
+    )
+    s3 = session.client('s3')
+
+    @staticmethod
+    def download_file_from_s3(bucket_name, file_key, local_file_path):
+        try:
+            AWSS3Utils.s3.download_file(bucket_name, file_key, local_file_path)
+            print(f"File {file_key} downloaded successfully!")
+        except Exception as e:
+            print(f"Error downloading file {file_key}: {e}")
+            raise Exception("Error Downloading file from S3")
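
Note on `download_file_from_s3` above: unlike the version removed from `VideoFactory`, the helper now re-raises after printing, so the `except` blocks in the factories actually run when a download fails. A minimal sketch of that propagation. `FakeS3Client` is a hypothetical stand-in for the boto3 client, the client is passed in as an argument here (the real helper uses the class attribute `AWSS3Utils.s3`), and the bucket, key, and path are made up.

```python
class FakeS3Client:
    """Hypothetical stand-in for a boto3 S3 client; every download fails."""

    def download_file(self, bucket, key, filename):
        raise RuntimeError("simulated network failure")


def download_file_from_s3(client, bucket_name, file_key, local_file_path):
    # Same body as the helper in the diff, with the client injected.
    try:
        client.download_file(bucket_name, file_key, local_file_path)
        print(f"File {file_key} downloaded successfully!")
    except Exception as e:
        print(f"Error downloading file {file_key}: {e}")
        raise Exception("Error Downloading file from S3")


def fetch(client, file_key):
    """Caller in the style of the factories: reports the error it receives."""
    try:
        download_file_from_s3(client, "example-bucket", file_key, "/tmp/out")
        return "ok"
    except Exception as e:
        return str(e)


result = fetch(FakeS3Client(), "videos/clip.mp4")
print(result)
```

With the earlier behavior, which swallowed the exception, the caller would have printed its success message even though no file had been written.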