Video downloader #303

dale-wahl · 2022-10-25T13:18:06Z

The only part that currently works without issue is the video downloader.

Right now, I am stuck on an issue with ffmpeg, which is annoyingly the basis of several different processors. It seems that when running ffmpeg as a Python subprocess, it produces errors such as the following:

[h264 @ 0x559bdfb273c0] Invalid NAL unit size (2767169 > 10809).
[h264 @ 0x559bdfb273c0] Error splitting the input into NAL units.
[h264 @ 0x559bdfb44040] Invalid NAL unit size (3041857 > 11882).
[h264 @ 0x559bdfb44040] Error splitting the input into NAL units.

Eventually ending as such:

Error while decoding stream #0:0: Invalid data found when processing input
    Last message repeated 7 times
frame=    3 fps=0.0 q=1.6 Lsize=N/A time=00:00:00.60 bitrate=N/A dup=0 drop=3 speed=5.05x    
video:20kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Conversion failed!

As far as I can tell, running the same command directly, rather than through subprocess, does not produce these errors. This seems to be true on Linux at least. Run directly, the command creates the desired number of frames (depending on the provided framerate), while as a Python subprocess only a few (3-5) frames are created before failure. I was able to determine that it has nothing to do with the unzipping process by hardcoding an unzipped folder and testing that as the source.

The processors/visualisation/video_frames.py is a bare bones wrapper for subprocess running ffmpeg.

dale-wahl · 2022-10-25T14:32:06Z

Even more confusion. I made a simple bash script wrapper.

#!/bin/bash

ffmpeg -i $1 -s 144x144 -r $2 $3 >> $4 2>>$4

I can run that with expected variables and get a good result. But not when 4CAT runs it. I can hardcode the unzipped videos and have 4CAT use them as parameters and still get the same NAL unit errors resulting in return code 69.

Tue Oct 25 14:22:11 2022: Ran command: /usr/src/app/ffmpeg_wrapper.sh /usr/src/app/test/https_video_twimg_com_ext_tw_video_1576577278083010561_pu_vid_576x772_74u7etzog9wuumoz_mp4_tag_12.mp4 5.0 /usr/src/app/result/video_frame_%07d.jpeg /usr/src/app/result/wrapper.log
Tue Oct 25 14:22:11 2022: Error Return Code with video /usr/src/app/test/https_video_twimg_com_ext_tw_video_1576577278083010561_pu_vid_576x772_74u7etzog9wuumoz_mp4_tag_12.mp4: 69

Meanwhile, I can literally open up python3 and run:

import subprocess
import shlex
command = '/usr/src/app/ffmpeg_wrapper.sh /usr/src/app/test/https_video_twimg_com_ext_tw_video_1576577278083010561_pu_vid_576x772_74u7etzog9wuumoz_mp4_tag_12.mp4 5.0 /usr/src/app/result/video_frame_%07d.jpeg /usr/src/app/result/wrapper.log'
result = subprocess.run(shlex.split(command))

Return code 0 and good results. Clearly I've lost my mind.

dale-wahl · 2022-10-26T08:46:14Z

I've no idea. The video_frames processor refuses to work even if I extract videos myself and just feed it that directory. I made a simple ffmpeg_wrapper.py that does, as far as I can tell, exactly the same thing. It works without issue in the same environment. Something must be interfering in the environment or libraries that 4CAT loads, but I haven't a clue what it could be. The Invalid NAL unit size error seems to be rare and I cannot pin down what it really means. Something about start bytes perhaps? What is causing the difference is a mystery. Unsure how to proceed.

dale-wahl · 2022-10-26T09:33:22Z

I was thinking specifically about the ffmpeg error being related to the byte sequence somehow being off. Then, looking at how we use subprocess across 4CAT, I found this: https://stackoverflow.com/a/52008583/8683110. And I'll be damned, but it worked.
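
For readers following along: the issue linked from this PR (akamhy/videohash#98) describes the cause as the Python subprocess inheriting stdin by default. Below is a minimal sketch of that idea, assuming the remedy is to give the child process an empty stdin. The helper name `run_without_stdin` and the file names in the comment are hypothetical, and this is not necessarily the exact change made in 4CAT.

```python
import subprocess
import sys


def run_without_stdin(command):
    """Run a command with its stdin connected to the null device.

    The child can then neither read nor consume whatever the parent
    process has on its own stdin.
    """
    return subprocess.run(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


# Hypothetical ffmpeg call, shaped like the wrapper script in this thread:
# run_without_stdin(["ffmpeg", "-i", "input.mp4", "-s", "144x144",
#                    "-r", "5.0", "video_frame_%07d.jpeg"])

# Stand-in child process so the sketch runs without ffmpeg installed:
# it reports how many characters it could read from its own stdin.
probe = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
result = run_without_stdin(probe)
print(result.returncode, result.stdout.strip())
```

ffmpeg also has a `-nostdin` option that disables interaction on standard input, which may be an alternative when the subprocess call itself cannot be changed.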

dale-wahl · 2022-10-27T15:03:02Z

Currently the videohash library will not work with 4CAT due to the subprocess bug. I have an open PR and a fork that works, so we just have to install that version.
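
One way to install the fork until the PR is merged is a direct-reference dependency in a setuptools dependency list. This is only a sketch: the variable name and layout are assumptions and not 4CAT's actual setup.py; only the fork URL comes from this thread.

```python
# Hypothetical fragment of a setup.py dependency list.
# "name @ url" is the PEP 508 direct-reference form, which lets pip
# install a package straight from a git repository.
install_requires = [
    "videohash @ git+https://github.com/dale-wahl/videohash",
]
```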

stijn-uva · 2022-12-22T15:41:51Z

OK, I've tested this and I think it's mostly ready for merging (aside from the hash processor). The migrate script can install ffmpeg in existing Docker containers; I haven't tested other options very extensively. Let's take a last look after the winter break.

dale-wahl · 2022-12-22T18:23:54Z

Ok. I haven’t looked through all your commits and must have missed commenting earlier, but I had already added my working videohash library to setup.py, so there shouldn’t be any issue with that processor in either a manual install or Docker. The Docker setup was also already updated to install ffmpeg; the only issue was that a manual install needed an additional step, which could have been solved with first run. Adding it to the newest migrate makes sense for upgrades.

This is a proxy check since we are using the .env file copied into the Docker container. It will always be the version used to create the Docker container, but if a user already updated the .env file and for some reason has not yet used that .env file to create the 4CAT container, this message would still appear.

dale-wahl · 2023-01-05T09:41:46Z

I see that the allow-indirect admin setting disables ytdlp entirely. I'm wondering if we shouldn't be more explicit in the setting description. Right now it just mentions "e.g. embedded in a linked tweet", but I think we should at least mention YouTube and perhaps even link to ytdlp supported sites. Hmmm, I should turn off that reference if allow-indirect is not selected.

dale-wahl · 2023-01-05T12:40:47Z

The last commit fixes the issue with timelines, but the "made with 4CAT" footer is still covered if the last timeline is the full width of the canvas.
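
To illustrate the sizing problem, here is a small sketch of how a canvas could be sized so that neither the widest timeline nor the footer is cut off. It is an assumption about the layout (timelines stacked vertically, footer below them), not the processor's actual code; `canvas_size` and `footer_height` are hypothetical names.

```python
def canvas_size(timeline_sizes, footer_height=20):
    """Return (width, height) for a canvas holding stacked timelines.

    timeline_sizes: list of (width, height) tuples, one per timeline.
    footer_height: hypothetical space reserved below the last timeline,
    so that a full-width last row cannot overlap the footer text.
    """
    if not timeline_sizes:
        return (0, footer_height)
    # Use the widest timeline, wherever it occurs, not just the last one.
    width = max(w for w, _ in timeline_sizes)
    height = sum(h for _, h in timeline_sizes) + footer_height
    return (width, height)


# The widest timeline is last here; the canvas still covers it and
# keeps separate room for the footer.
size = canvas_size([(300, 50), (120, 50), (480, 50)])
print(size)
```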

dale-wahl added 15 commits October 6, 2022 18:01

initial download_videos.py; only works with tiktok

6fc02a7

fix on tiktok-urls to retry retries

c28b609

Merge branch 'master' into video_downloader

c29b48b

download_videos extract urls from text and extensions from Content-Type

2643a90

Merge branch 'master' into video_downloader

dc2f481

twitter works now that search_twitterv2 and map_item have been updated

9526d9f

initial video hash processor

8bca07f

docker install ffmpeg and videohash

0d3bee6

catch no ffmpeg error

e06b99c

fix collage paths

f218aaf

Merge branch 'master' into video_downloader

0ef87da

processor to extract frames; only uses ffmpeg. FAILS

1584656

Testing various unzips

799fac1

Use python unzip (instead of hardcoded unzipped files)

ec00261

makes no difference anyway; issue with subprocesses/ffmpeg not unzipping process.

clean up and rename video hasher

2c298ba

DOES NOT WORK DUE TO CURRENT ffmpeg/subprocess issue

dale-wahl added 2 commits October 26, 2022 10:40

simplify video_frames

a1f9193

test direct python script... WORKS!!!

4da4e3b

dale-wahl requested a review from stijn-uva October 26, 2022 08:46

dale-wahl added 2 commits October 26, 2022 11:32

inspired by Gondor

abfff63

don't need you anymore

237b6e0

allow frame size, save metadata, remove staging_area (why was it still there??), update description

976f742

dale-wahl mentioned this pull request Oct 26, 2022

python subprocess inherits stdin by default and causes ffmpeg to fail akamhy/videohash#98

Open

dale-wahl added 4 commits October 26, 2022 13:51

NOTE: this requires modification in videohash

e49885d

https://github.com/dale-wahl/videohash submitted PR akamhy/videohash#99

update descriptions, convert output to csv

a09dfba

create bit hash network based on similarities between all hashes

5feb6c8

update title

914f81b

stijn-uva added 14 commits December 21, 2022 14:07

Merge branch 'master' into video_downloader

9c29d46

Allow Markdown in processor option labels

ee4cdc8

Reword scene detection interface strings

bbf7dde

Call them "timelines" instead of "stripes"

5672b53

Add 'scene timelines' preset

1cf3238

Don't require "parameters" key in presets

1f411d5

Make scene frame extractor work in presets

4c55c5e

Cheeky 4CAT link in canvas footer

577b9df

More video download status updates

64b2176

Carry over metadata when extracting scene frames

68a3fe4

Update docstrings

b5ebee9

Disable VideoHasher pending upstream fixes

dee5083

Merge branch 'master' into video_downloader

3276c21

Migrate fixes

f2505e6

dale-wahl added 7 commits January 3, 2023 12:36

add raw arg to set_or_create in migrate 1.30

2570080

add other raw

fac708b

clean up staging_areas

6de06cd

noticed some processors (those that used iterate_archive_contents) were not removing staging areas if one was provided

Merge branch 'master' into video_downloader

ce62d4f

Merge branch 'master' into video_downloader

f4ce7f7

only create scenes from downloaded vids

7fdfd7f

dale-wahl added 3 commits January 5, 2023 10:43

ytdlp references behind admin setting

9c3e86a

0 actually allows unlimited video downloads (if admin allows)

1096e3d

fix bug where last timeline width overwritten

fe25e14

had a set where the last timeline was the widest and the canvas was being cut short. Still seems that if the last is the widest, the thumbnails layer on top of "made with 4CAT".

Re-enable video hasher

dd3e393

stijn-uva merged commit 302205b into master Jan 6, 2023
