
Project Set Up #2

Open
technosaby opened this issue Jun 8, 2022 · 16 comments
Labels
enhancement New feature or request

Comments

@technosaby
Owner

The idea is to set up the project in Singularity inside Redhen's infrastructure.

@technosaby
Owner Author

@brucearctor I need some help here. I am not able to run the script (which generates the audio files from the videos) on the Case HPC. Do I need to create a Docker environment for it?

@brucearctor
Collaborator

Ultimately, yes, everything needs to be able to run in the infra -- likely just a bit of specific containerizing/packaging to get things working [ python/tensorflow/etc will run in that environment ]. Check with the community (e.g., Slack) or hop on one of the calls on Wednesday or Friday for some preliminary tips, if needed.

@turnermarkb
Collaborator

turnermarkb commented Jun 21, 2022 via email

@technosaby
Owner Author

For clips, see https://sites.google.com/case.edu/techne-data-requests/home

@turnermarkb Sorry, I could not understand your comments. I was planning to run the audio processing on the "/mnt/rds/redhen/gallina/tv/2022" folder first, then on the other years (2021, ...), and generate the audio files in my Gallina home directory. After that, I plan to do the tagging and store the results safely. Is this the correct approach, or do we need to run this on some specific set of files?

@turnermarkb
Collaborator

turnermarkb commented Jun 22, 2022 via email

@technosaby
Owner Author

technosaby commented Jun 26, 2022

@brucearctor I was able to create a TensorFlow-based local Docker image from GitHub workflows. Then I created a local Singularity container and copied it to the HPC. Now I plan to run the container on the HPC to execute my scripts.

Can you please check whether I am going in the correct direction? (Blog: https://technosaby.github.io/gsoc/phase1/week5). The latest code is in the main branch.

@turnermarkb
Collaborator

turnermarkb commented Jun 26, 2022 via email

@technosaby
Owner Author

technosaby commented Jun 26, 2022

@turnermarkb Thanks for your suggestion. Can you please explain what you mean by "outside of Singularity"? Do you mean using only Docker and not Singularity? If so, I could not find a way to run Docker containers directly on the HPC. Please let me know if there is a resource I have missed.

So I am using this approach:

Build scripts (local) -> Put in a Docker container (local) -> Build a Singularity .sif image (local) -> Copy to HPC -> Execute the container (HPC).
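A minimal sketch of the workflow above, written as a Python dry run that only prints the commands. The image tag, HPC host, destination folder, and `tag_audio.py` entry script are hypothetical placeholders, and the sketch assumes Docker and Singularity are installed locally so that `singularity build` can read the image straight from the local Docker daemon via a `docker-daemon://` source.

```python
import shlex
from typing import List, Tuple

# Hypothetical placeholders: the real image tag, HPC login host, and entry script differ.
IMAGE_TAG = "sound-tagger:latest"
SIF_FILE = "sound-tagger.sif"
REMOTE_DEST = "user@hpc.example.edu:containers/"
ENTRY_SCRIPT = "tag_audio.py"


def build_pipeline_steps(
    image_tag: str, sif_file: str, remote_dest: str, entry_script: str
) -> List[Tuple[str, List[str]]]:
    """Return (where, command) pairs for the build-locally, run-on-HPC workflow."""
    return [
        # Build the Docker image from the Dockerfile in the current directory.
        ("local", ["docker", "build", "-t", image_tag, "."]),
        # Convert the image held by the local Docker daemon into a Singularity .sif file.
        ("local", ["singularity", "build", sif_file, f"docker-daemon://{image_tag}"]),
        # Copy the .sif image to the cluster (the slow step for a large image).
        ("local", ["scp", sif_file, remote_dest]),
        # On the cluster, run the entry script inside the container,
        # from the directory the .sif file was copied into.
        ("hpc", ["singularity", "exec", sif_file, "python3", entry_script]),
    ]


if __name__ == "__main__":
    # Dry run: print the commands rather than executing them.
    for where, cmd in build_pipeline_steps(IMAGE_TAG, SIF_FILE, REMOTE_DEST, ENTRY_SCRIPT):
        print(f"[{where}] {shlex.join(cmd)}")
```

Keeping each step as a plain argument list makes it easy to swap `scp` for another transfer tool, or to drop the Singularity conversion if Docker turns out to be usable on the cluster.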

As there is no existing audio pipeline, I am planning to extract the audio from all the videos in /mnt/rds/redhen/gallina/tv/2021 and extend it to the other years later. As this data is large, I need to run it on the HPC.
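One way the audio-extraction step could look, as a sketch only: it assumes the videos are `.mp4` files anywhere under the year folder, that `ffmpeg` is available where the script runs, and that the tagger wants mono 16 kHz WAV. The sample rate and the output layout are illustrative choices, not taken from the project's own script.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

# Assumed target format for the tagger (illustrative, not from the project).
SAMPLE_RATE = 16000


def ffmpeg_command(video: Path, wav: Path, sample_rate: int = SAMPLE_RATE) -> List[str]:
    """Build an ffmpeg command that drops the video stream and writes mono WAV audio."""
    return [
        "ffmpeg", "-nostdin", "-y",  # never wait for stdin; overwrite existing output
        "-i", str(video),
        "-vn",                       # discard the video stream
        "-ac", "1",                  # downmix to mono
        "-ar", str(sample_rate),     # resample
        str(wav),
    ]


def plan_extraction(src_root: Path, out_root: Path) -> List[Tuple[Path, Path]]:
    """Pair every .mp4 under src_root with a .wav path mirroring its layout under out_root."""
    pairs = []
    for video in sorted(src_root.rglob("*.mp4")):
        wav = out_root / video.relative_to(src_root).with_suffix(".wav")
        pairs.append((video, wav))
    return pairs


def extract_all(src_root: Path, out_root: Path, dry_run: bool = True) -> None:
    """Run (or, by default, just print) one ffmpeg call per video."""
    for video, wav in plan_extraction(src_root, out_root):
        cmd = ffmpeg_command(video, wav)
        if dry_run:
            print(" ".join(cmd))
            continue
        wav.parent.mkdir(parents=True, exist_ok=True)
        # check=True stops on the first file ffmpeg fails to decode.
        subprocess.run(cmd, check=True)
```

For example, `extract_all(Path("/mnt/rds/redhen/gallina/tv/2021"), Path.home() / "audio" / "2021", dry_run=False)` would write one WAV per video and mirror the source folder layout, so each audio file can be traced back to its video.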

Please let me know if this approach is correct or if there is a faster way, since the TensorFlow-based .sif image is around 3 GB and takes a long time to copy.

@brucearctor
Collaborator

  1. @technosaby -- I like that you're getting containers going. It does seem possible that CWRU HPC supports Docker in addition to Singularity. The path you're on re: Docker/Singularity seems fine for your current stage. Singularity isn't going to hurt anything -- ultimately, the choice of runtime (Singularity or Docker) should be one minor implementation detail [ even though getting it to work and run in the HPC infrastructure is required ].

  2. Prove that the tagging and a 'pipeline' can work for a single video file, then multiple, then more ... don't worry about addressing for year/years at this time. You'll want to explore [ at some times manually ] over many files to ensure you're happy with the performance of your tagger, and that the output produced on a given file is in the desired format.

I think that @turnermarkb is also saying -- no need ( and probably not even desired ... until the end of your project ) to get things running over years of data. It is great if you are prepared to do so, but you'll want to run it over years with what you determine to be the optimal model, which I imagine that you will iterate on throughout the summer.
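The "single video file, then multiple, then more" advice can be sketched as a loop over growing batch sizes. The tagger interface and the stop-on-empty-output check below are hypothetical stand-ins for the real tagging script and for manual review of its output.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Sequence

# Hypothetical tagger interface: takes one audio file, returns its sound-effect tags.
Tagger = Callable[[Path], List[str]]


def staged_run(
    files: Iterable[Path], tagger: Tagger, stages: Sequence[int] = (1, 10, 100)
) -> Dict[Path, List[str]]:
    """Tag a growing prefix of `files` (1 file, then 10, then 100, ...).

    Files tagged in an earlier stage are not tagged again. As a stand-in for
    manual review, the run stops after any stage in which a file got no tags.
    """
    files = list(files)
    results: Dict[Path, List[str]] = {}
    for size in stages:
        for path in files[:size]:
            if path not in results:
                results[path] = tagger(path)
        empty = [p for p, tags in results.items() if not tags]
        if empty:
            print(f"stage {size}: {len(empty)} file(s) produced no tags; stopping to inspect")
            break
        print(f"stage {size}: {len(results)} file(s) tagged")
    return results
```

Growing the batch only after each stage looks right keeps the cost of a bad model or a wrong output format small, which matches the advice to iterate on the model before running it over years of data.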

@turnermarkb
Collaborator

turnermarkb commented Jun 26, 2022 via email

@technosaby
Owner Author


@brucearctor Thanks for your comments. I will defer this task and work on baselining.

For now I am processing the audio using my script.

@turnermarkb
Collaborator

turnermarkb commented Jun 27, 2022 via email

@technosaby
Owner Author

Final model updates and merging into the Singularity container for delivery will be taken care of in the last milestone.

@technosaby
Owner Author

As discussed in the last meeting with @turnermarkb, since the tagging is now working properly, it is the right time to do the packaging and then focus on improving it from there. So I will work on building a Singularity image from my codebase.

@brucearctor
Collaborator

Yes, start with the baseline of things working -- tagging works, now operationalize with good foundations -- then optimize/retrain/improve.

@technosaby
Owner Author

After copying the video files from /mnt/rds/redhen/gallina to my scratch folder and then running the scripts using the Singularity container built from the Dockerfile, all tags are generated properly @brucearctor @turnermarkb
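A sketch of the staging step described above, assuming the videos are copied into a scratch directory and the container is then pointed at that directory with `--bind`. The scratch path, the `tag_audio.py` entry script, and passing the directory as its argument are hypothetical; file names are assumed to be unique across the source folders.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def stage_videos(videos: Iterable[Path], scratch_dir: Path) -> List[Path]:
    """Copy videos into scratch_dir, skipping files already staged.

    Assumes file names are unique across the source folders.
    """
    scratch_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for video in videos:
        dest = scratch_dir / video.name
        if not dest.exists():
            shutil.copy2(video, dest)  # copy2 also preserves timestamps
        staged.append(dest)
    return staged


def container_command(sif: Path, scratch_dir: Path, entry_script: str) -> List[str]:
    """Build a `singularity exec` call that can see scratch_dir inside the container."""
    return [
        "singularity", "exec",
        "--bind", f"{scratch_dir}:{scratch_dir}",  # mount scratch at the same path
        str(sif),
        "python3", entry_script, str(scratch_dir),  # hypothetical script and argument
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on the cluster; binding scratch at the same path inside the container lets the script use the host paths unchanged.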

technosaby moved this from In Progress to Review in progress in @technosaby's Tagging Sound Effects Aug 1, 2022
technosaby moved this from Review in progress to Done in @technosaby's Tagging Sound Effects Aug 7, 2022