Added support to extract language from video #408

plon-Susk7 · 2024-10-14T16:07:00Z

Resolves issue #319.

Updated detect_lang_of_audio, test_detect_lang_of_audio, and the corresponding requirements file.
Used the ffmpeg library to extract audio from video.
Renamed detect_lang_of_audio to detect_lang_of_media, with matching renames for the test file and the requirements file.
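
For readers who want to see what the extraction step amounts to, here is a minimal sketch. It is not the code from this PR: the PR uses the ffmpeg Python binding, while this sketch calls the ffmpeg command-line tool through subprocess so that it needs only the standard library. The function names, the WAV output format, and the 16 kHz mono settings are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def build_extract_command(video_path: str, audio_path: str) -> list[str]:
    """Build an ffmpeg command that writes a video's audio track to a WAV file."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it already exists
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM audio
        "-ar", "16000",          # 16 kHz sample rate (assumed, not from the PR)
        "-ac", "1",              # mono
        audio_path,
    ]


def extract_audio(video_path: str) -> str:
    """Extract the audio next to the video and return the new file's path."""
    audio_path = str(Path(video_path).with_suffix(".wav"))
    # check=True raises CalledProcessError on a non-zero exit code;
    # capture_output=True keeps ffmpeg's stderr available on that exception.
    subprocess.run(
        build_extract_command(video_path, audio_path),
        check=True,
        capture_output=True,
    )
    return audio_path
```

Calling extract_audio("clip.mp4") would require the ffmpeg binary to be installed and on PATH, which is also a precondition for the Python binding.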

aatmanvaidya · 2024-10-15T16:37:43Z

Hi @plon-Susk7, I am getting an error when I run the test from inside the Docker container.

ResourceWarning: Enable tracemalloc to get the object allocation traceback
.Error extracting audio: ffmpeg error (see stderr output for detail)
E
======================================================================
ERROR: test_english_detection_video (core.operators.test_detect_lang_of_media.Test.test_english_detection_video)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/app/core/operators/test_detect_lang_of_media.py", line 28, in test_english_detection_video
    lang = detect_lang_of_media.run(audio_file_path,'video')
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/core/operators/detect_lang_of_media.py", line 236, in run
    extract_audio_from_video(audio_file["path"])
  File "/usr/app/core/operators/detect_lang_of_media.py", line 136, in extract_audio_from_video
    .run(quiet=True, overwrite_output=True)
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/venv/lib/python3.11/site-packages/ffmpeg/_run.py", line 325, in run
    raise Error('ffmpeg', out, err)
ffmpeg._run.Error: ffmpeg error (see stderr output for detail)

Am I doing something wrong?
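
The traceback shows only the generic message because run(quiet=True) captures ffmpeg's output instead of printing it. To my knowledge, the exception raised by the ffmpeg Python binding keeps the captured text in a stderr attribute, so logging it usually reveals the real cause (for example a missing binary or an unreadable input file). The sketch below is an illustration under that assumption: ffmpeg_error_details and FakeFfmpegError are hypothetical names, and the fake exception only stands in for ffmpeg.Error so the snippet runs without ffmpeg installed.

```python
def ffmpeg_error_details(exc: Exception) -> str:
    """Return the decoded stderr attached to an exception, if there is any."""
    stderr = getattr(exc, "stderr", None)
    if not stderr:
        return "(no stderr captured)"
    if isinstance(stderr, bytes):
        return stderr.decode("utf-8", errors="replace").strip()
    return str(stderr).strip()


class FakeFfmpegError(Exception):
    """Hypothetical stand-in for ffmpeg.Error, carrying captured stderr bytes."""

    def __init__(self, stderr: bytes):
        super().__init__("ffmpeg error (see stderr output for detail)")
        self.stderr = stderr


try:
    raise FakeFfmpegError(b"input.mp4: No such file or directory\n")
except FakeFfmpegError as exc:
    details = ffmpeg_error_details(exc)
    # Log the generic message together with what ffmpeg actually reported.
    print(f"Error extracting audio: {exc}\n{details}")
```

In the operator, the same helper could be called from the existing except block that prints "Error extracting audio".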

Also, there are some other small things I feel we should do:

1. Can you profile this operator on videos of different lengths for RAM usage and time? Here is the wiki that might help you - https://github.com/tattle-made/feluda/wiki/Optimization. You can just use time.time() to calculate the duration and use the memory code from the wiki to profile for RAM. You can choose videos of different lengths: 5, 10, 15, 20 and 30 minutes. Here are the results of profiling the same operator on audio files.
2. When we use the operators, we usually first download the media item from a hosted URL or our AWS cloud service. So once the processing is done, can you make sure the file is safely processed and deleted? There is already code in the operator to delete the file (code line). Can you also make sure that the video file gets deleted? In case you want it, here is a JSON file with a lot of CDN links for audio and video files you can use for testing.
3. Just like the other functions in the operator, can you add a two-line documentation comment describing the input and output of the run() function? This is just to make sure someone else using the operator knows what run() expects.
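
As an illustration of the profiling request, here is a minimal standard-library sketch. It is not the memory code from the wiki: tracemalloc is swapped in as a stand-in, and it only counts allocations made through Python's allocator, so memory used by native libraries would need a process-level tool. profile_call and fake_operator are hypothetical names.

```python
import time
import tracemalloc


def profile_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds, peak_mib)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        seconds = time.perf_counter() - start
        # get_traced_memory() returns (current_bytes, peak_bytes)
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, seconds, peak_bytes / (1024 * 1024)


def fake_operator(n):
    """Hypothetical workload standing in for the operator's run()."""
    return sum(list(range(n)))


result, seconds, peak_mib = profile_call(fake_operator, 100_000)
print(f"time: {seconds:.2f} s, peak Python allocations: {peak_mib:.1f} MiB")
```

To profile the operator itself, fake_operator would be replaced by the real run() call and the measurement repeated once per video length.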

Let me know if I can be of help in any of the above!

plon-Susk7 · 2024-10-16T04:47:06Z

I'll make the necessary changes soon.

aatmanvaidya · 2024-10-16T06:00:33Z

Hi, did you accidentally close the PR, or was there a specific reason for it?

plon-Susk7 · 2024-10-16T06:06:58Z

Sorry Aatman, this was accidental. I'll reopen it once the changes are made.

plon-Susk7 · 2024-10-16T11:03:39Z

On deleting the file after processing: we are converting the MP4 file to an audio file, so isn't removing the video the same as removing the audio here? Are we supposed to physically remove the file from disk?

aatmanvaidya · 2024-10-16T11:50:22Z

Yes, we have to physically remove the file from disk.
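
One way to guarantee that both the downloaded video and the extracted audio are removed from disk, even when processing fails, is a try/finally block. The sketch below is illustrative only: process_and_delete is a hypothetical helper, not the operator's existing cleanup code, and the temporary files merely stand in for real media.

```python
import os
import tempfile


def process_and_delete(paths, process):
    """Call process(paths), then delete every path even if process raises."""
    try:
        return process(paths)
    finally:
        for path in paths:
            try:
                os.remove(path)
            except FileNotFoundError:
                pass  # already removed, nothing left to clean up


# Throwaway files standing in for the video and the extracted audio.
video = tempfile.NamedTemporaryFile(suffix=".mp4", delete=False)
audio = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
video.close()
audio.close()

# The lambda is a placeholder for the real language-detection step.
result = process_and_delete([video.name, audio.name], lambda p: len(p))
print(os.path.exists(video.name), os.path.exists(audio.name))
```

Keeping the deletion in finally means a failed detection does not leave media behind on the worker.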

plon-Susk7 · 2024-10-16T16:37:27Z

| Video Length | CPU Time (s) | RAM Usage (MiB) |
| --- | --- | --- |
| 5 mins | 3.75 | 619.9 |
| 10 mins | 6.14 | 628.8 |
| 15 mins | 8.46 | 638.1 |
| 20 mins | 10.99 | 647.1 |
| 30 mins | 15.29 | 665.6 |
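
To read a trend out of these figures, a least-squares slope gives the approximate cost per extra minute of video. The short sketch below uses only the numbers reported above; the slope helper is a hypothetical name, and a straight-line fit is an assumption about how the operator scales.

```python
# (video minutes, CPU time in seconds, RAM in MiB), copied from the figures above
rows = [
    (5, 3.75, 619.9),
    (10, 6.14, 628.8),
    (15, 8.46, 638.1),
    (20, 10.99, 647.1),
    (30, 15.29, 665.6),
]


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


minutes = [r[0] for r in rows]
time_per_min = slope(minutes, [r[1] for r in rows])
ram_per_min = slope(minutes, [r[2] for r in rows])
print(f"{time_per_min:.2f} s and {ram_per_min:.2f} MiB per extra minute of video")
```

On these five data points the fit comes out at roughly 0.46 s and 1.8 MiB per additional minute, so most of the reported RAM appears to be a fixed baseline rather than a per-minute cost.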

plon-Susk7 · 2024-10-16T16:38:16Z

These are the stats after profiling.

aatmanvaidya · 2024-10-16T17:16:18Z

@plon-Susk7 thank you so much for your work and effort, things look good, merging the PR!

aatmanvaidya and others added 9 commits September 12, 2024 14:20

48bfc87  Merge pull request tattle-made#381 from tattle-made/development
         merge dev to main
690d48a  0.8.0
         Automatically generated by python-semantic-release
a73d9c3  added video support
74a5d8f  minor change to req
18a7a3b  added test cases
4b8bdd6  changed files names
9c29d79  minor change to test file
ac80fd7  added exception handling to function
60024d9  bug fixed for test file

aatmanvaidya self-requested a review October 14, 2024 16:18

aatmanvaidya changed the base branch from main to development October 14, 2024 16:18

61705f0  updated hash file

aatmanvaidya reviewed Oct 14, 2024

src/core/operators/detect_lang_of_media.py (resolved)

plon-Susk7 added 2 commits October 14, 2024 22:32

a56c8f9  added exception handling for file formats
7b0f8bc  made changes to run

plon-Susk7 closed this Oct 16, 2024

plon-Susk7 reopened this Oct 16, 2024

4348805  tests fixed

plon-Susk7 added 3 commits October 16, 2024 21:35

3cbc919  changed media from audio to media
bc3eb57  test cases updated
2e2f96d  removed redundant ifs

aatmanvaidya linked an issue Oct 16, 2024 that may be closed by this pull request: Operator to detect language in a Video file #319 (closed)

aatmanvaidya merged commit 3805859 into tattle-made:development Oct 16, 2024
4 of 5 checks passed

plon-Susk7 deleted the audio_to_media branch October 16, 2024 18:50
