Workflow always stuck on stage_in_local_local_0_0 when using container #1
Comments
It could just be a slow download. You could wait for the workflow to finish (takes some time due to retries/backoffs), and then run In the meantime, you can check that access to the data and containers is working correctly. On the same host you are running the workflow, try the following two commands:
Are they getting stuck? Taking a long time?
Yes, the analyzer says it was a download failure.
I am wondering whether it is possible to use the downloaded sif image when submitting the plan instead of downloading it from the web. I just want the workflow to use the local image (the sif file). It seems the image value passed to the Container API cannot be a local file: with a file:// URL it throws an error when submitting.
container = None
if tc_target == 'container':
    container = Container('montage',
                          Container.SINGULARITY,
                          'file:///home/scitech/montage-workflow-v3/montage-workflow-v3.sif'
                          ).add_env(MONTAGE_HOME='/opt/Montage')
    tc.add_containers(container)
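One way to avoid typos in a hand-written file:// URL is to derive it from the path on disk. Below is a minimal sketch using only the Python standard library; `local_image_url` is a hypothetical helper, not part of the Pegasus API, and it assumes a POSIX submit host.

```python
from pathlib import Path


def local_image_url(sif_path: str, must_exist: bool = True) -> str:
    """Build an absolute file:// URL for a container image on the submit host.

    Hypothetical helper (not part of Pegasus): it expands '~' and resolves
    relative segments and symlinks, much like `realpath`, so the URL matches
    where the .sif file actually lives.
    """
    path = Path(sif_path).expanduser().resolve()
    if must_exist and not path.is_file():
        raise FileNotFoundError(f"container image not found: {path}")
    # On POSIX, Path.as_uri() yields 'file:///abs/path' (empty host, three slashes).
    return path.as_uri()


if __name__ == "__main__":
    # Example input: the path reported by `realpath` in this thread.
    # must_exist=False so the example runs on machines without that file.
    print(local_image_url(
        "/home/scitech/montage-workflow-v3/scratch/montage-workflow-v3.sif",
        must_exist=False,
    ))
```

The returned string could replace the hard-coded literal passed to `Container(...)`, so the image URL always tracks the real location of the file.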
It seems Pegasus is a little bit sensitive to the
I've followed your workaround. The sif file is located at
diff --git a/montage-workflow.py b/montage-workflow.py
index 14d6474..2c4a209 100755
--- a/montage-workflow.py
+++ b/montage-workflow.py
@@ -69,7 +69,8 @@ def build_transformation_catalog(tc_target, wf):
     if tc_target == 'container':
         container = Container('montage',
                               Container.SINGULARITY,
-                              'https://data.isi.edu/montage/images/montage-workflow-v3.sif'
+                              'file:///local-scratch/scitech/montage-workflow-v3/montage-workflow-v3.sif',
+                              image_site="local"
                               ).add_env(MONTAGE_HOME='/opt/Montage')
         tc.add_containers(container)
@@ -87,7 +88,7 @@ def build_transformation_catalog(tc_target, wf):
         else:
             # container
             transformation = Transformation(fname,
-                                            site='insidecontainer',
+                                            site='condorpool',
                                             pfn=os.path.join(base_dir, fname),
                                             container=container,
                                             is_stageable=False)
Then I got this result.
Do I also have to write a sites.yml? If so, can you share yours? I've tried several versions of sites.yml, and they all failed during planning. Two of the versions I tried:
pegasus: '5.0'
sites:
  - name: local
    directories:
      - type: localScratch
        path: /tmp/wf/scratch
        fileServers:
          - url: file:///home/scitech/montage-workflow-v3/scratch
            operation: all

pegasus: '5.0'
sites:
  - name: condorpool
    directories:
      - type: localScratch
        path: /tmp/wf/scratch
        fileServers:
          - url: file:///home/scitech/montage-workflow-v3/scratch
            operation: all
Did you change that file:// location to your location? You should not need a site catalog; the default here uses HTCondor's built-in file transfers. What version of Pegasus are you using?
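Since a mistyped or stale image location is one plausible reason for a stuck stage-in job, it may help to verify the URL on the submit host before planning. A minimal sketch with the Python standard library; `check_file_url` is a hypothetical helper (not part of Pegasus), and the two candidate URLs are the ones that appear in this thread.

```python
import os
from urllib.parse import unquote, urlparse


def check_file_url(url: str) -> str:
    """Return the local path behind a file:// URL, or raise if it cannot work.

    Hypothetical pre-flight check (not part of Pegasus), meant to be run on
    the submit host before pegasus-plan.
    """
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"not a file:// URL: {url}")
    # 'file://local-scratch/x.sif' parses 'local-scratch' as a host name;
    # a local path needs an empty host, i.e. three slashes after 'file:'.
    if parsed.netloc not in ("", "localhost"):
        raise ValueError(f"unexpected host {parsed.netloc!r} in {url}")
    path = unquote(parsed.path)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no file at {path} (from {url})")
    return path


if __name__ == "__main__":
    for candidate in (
        "file:///local-scratch/scitech/montage-workflow-v3/montage-workflow-v3.sif",
        "file:///home/scitech/montage-workflow-v3/scratch/montage-workflow-v3.sif",
    ):
        try:
            print("ok     ", check_file_url(candidate))
        except (ValueError, FileNotFoundError) as err:
            print("problem", err)
```

Only a URL that prints "ok" on the submit host points at an existing file; this says nothing about whether the planner accepts it, which still depends on the `image_site` setting.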
Yes, I've changed the image path in montage-workflow.py as shown in the previous comment. The image path is set to
$ realpath scratch/montage-workflow-v3.sif
/home/scitech/montage-workflow-v3/scratch/montage-workflow-v3.sif
I also tried changing the path to
container = None
if tc_target == 'container':
    container = Container('montage',
                          Container.SINGULARITY,
                          'file:///local-scratch/montage-workflow-v3.sif',
                          image_site="local"
                          ).add_env(MONTAGE_HOME='/opt/Montage')
    tc.add_containers(container)
My Pegasus version is
$ pegasus-version
5.0.8
I mean that
You can reproduce this in the Pegasus tutorial container. I downloaded montage-workflow-v3.sif and modified the apptainer command in example-dss-containers.sh to make it use the local image. Ten minutes after submitting the plan, stage_in_local_local_0_0 is the only job still running. It has now been running for over an hour, while the workflow gallery says the workflow should take only 5 minutes. In the run directory, nothing is written to the stdout or stderr files. The contents of stdin are shown below.