
feat(workflows-local-runner-unifiedstorage): update start container script to resolve symlink and use its target for mount #709



Open: wants to merge 3 commits into main


@ruijiang-rjian (Contributor) commented Jun 18, 2025

Description

With unified storage projects, the Workflows local runner will mount the new shared directory /home/sagemaker-user/shared/. Because /home/sagemaker-user/shared/ is a symlink that points into /mnt/custom-file-systems/.. (an s3fs mount), and the LL docker proxy currently rejects symlinks as mount sources, we have decided to go with option 1 proposed in the following doc as a short-term solution: the local runner uses the resolved symlink target for the mount.
https://quip-amazon.com/PGgoAw5bTv5N/1-Pager-LL-docker-proxy-change-for-SMUS-workflows-local-sidecar-container
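A minimal sketch of the resolution step, assuming GNU coreutils `readlink` (as shipped on Ubuntu). `readlink -f` follows every hop of a symlink chain and prints the canonical path, so even if the shared directory points through more than one link, the path handed to Docker is a real directory rather than a symlink. The helper name `resolve_mount_source` is hypothetical and not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): turn a possibly-symlinked directory
# into the canonical path that can be used as a Docker bind-mount source.
resolve_mount_source() {
    local path="$1"
    local target
    # GNU readlink -f follows every symlink in the chain and prints the
    # canonical absolute path; it exits non-zero if a parent component is missing.
    if ! target=$(readlink -f -- "$path"); then
        echo "cannot resolve: $path" >&2
        return 1
    fi
    # readlink -f still succeeds for a dangling final link, so verify that the
    # resolved target really is an existing directory before mounting it.
    if [ ! -d "$target" ]; then
        echo "resolved target is not a directory: $target" >&2
        return 1
    fi
    printf '%s\n' "$target"
}
```

For example, `MOUNT_DIR=$(resolve_mount_source "$HOME/shared") || exit 1` would stop a start script before Docker is ever invoked with a symlink or a dangling path.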

Type of Change

  • Image update - Bug fix
  • Image update - New feature
  • Image update - Breaking change
  • SMD image build tool update
  • Documentation update

Release Information

Does this change need to be included in patch version releases? By default, a pull request is only added to the next SMD image minor version release once it is merged into the template folder. Only critical bug fixes or security updates should be applied to new patch versions of existing image minor versions.

  • Yes (Critical bug fix or security update)
  • No (New feature or non-critical change)
  • N/A (Not an image update)

If yes, please explain why:
[Explain the criticality of this change and why it should be included in patch releases]

How Has This Been Tested?

Tested in SMUS with the updated scripts and docker-compose file. Created a symlink /home/sagemaker-user/d/c that points to /home/sagemaker-user/c:

sagemaker-user@default:~$ ls -l /home/sagemaker-user/d/c
lrwxrwxrwx 1 sagemaker-user users 22 Jun 17 21:25 /home/sagemaker-user/d/c -> /home/sagemaker-user/c

Restarted the local runner container; all 3 containers started successfully:

sagemaker-user@default:~$ bash /etc/sagemaker-ui/workflows/start-workflows-container.sh
Project is using S3 storage, project directory set to: /home/sagemaker-user/d/c
/home/sagemaker-user/c
resolved symlink target is: /home/sagemaker-user/c
Hit:1 https://download.docker.com/linux/ubuntu jammy InRelease
Hit:2 https://apt.corretto.aws stable InRelease
Hit:3 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:4 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:5 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:6 http://security.ubuntu.com/ubuntu jammy-security InRelease
Reading package lists... Done
Hit:1 https://download.docker.com/linux/ubuntu jammy InRelease
Hit:2 https://apt.corretto.aws stable InRelease
Hit:3 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:5 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:6 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
docker-ce-cli is already the newest version (5:28.2.2-1~ubuntu.22.04~jammy).
docker-compose-plugin is already the newest version (2.29.2-1~ubuntu.22.04~jammy).
0 upgraded, 0 newly installed, 0 to remove and 22 not upgraded.
WARNING! Your password will be stored unencrypted in /home/sagemaker-user/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credential-stores

Login Succeeded
[+] Running 3/3
 ✔ Container mwaa-292-db         Healthy   3.8s
 ✔ Container mwaa-292-scheduler  Started   3.9s
 ✔ Container mwaa-292-webserver  Started   3.9s
workflows_healthcheck: started

Checklist:

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works

Test Screenshots (if applicable):

Related Issues

[Link any related issues here]

Additional Notes

[Any additional information that might be helpful for reviewers]

@ruijiang-rjian requested a review from a team as a code owner on June 18, 2025 18:09
@agupta01 commented:

Were you able to verify a successful workflow run after the local runner started?

@ruijiang-rjian (Contributor, Author) replied:

> Were you able to verify a successful workflow run after the local runner started?

Yes, please see the "How Has This Been Tested?" section in the PR description.

@@ -7,6 +7,8 @@ is_s3_storage=${1:-"1"} # Default to 1 (Git storage) if no parameter is passed
if [ "$is_s3_storage" -eq 0 ]; then
PROJECT_DIR="$HOME/shared"
echo "Project is using S3 storage, project directory set to: $PROJECT_DIR"
MOUNT_DIR=$(readlink -f "$PROJECT_DIR") # resolve the symlink to its target path
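A hedged sketch of the branch in this hunk, wrapped in a hypothetical function `select_mount_dir` so it can be exercised in isolation. It quotes the path, so a directory name containing spaces survives word splitting, and it returns an error instead of silently producing an empty `MOUNT_DIR`. Treating any non-zero `is_s3_storage` value as Git storage with the project directory mounted unchanged is an assumption; the Git branch lies outside this hunk.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the PR's code): choose the host path to mount.
# Mirrors the hunk: is_s3_storage == 0 means S3 (unified) storage.
select_mount_dir() {
    local is_s3_storage="$1"
    local project_dir="$2"
    if [ "$is_s3_storage" -eq 0 ]; then
        local resolved
        # Quoted expansion keeps paths with spaces intact; fail loudly rather
        # than handing Docker an empty or unresolved mount source.
        if ! resolved=$(readlink -f -- "$project_dir") || [ -z "$resolved" ]; then
            echo "failed to resolve symlink: $project_dir" >&2
            return 1
        fi
        printf '%s\n' "$resolved"
    else
        # Assumption: Git storage mounts the project directory as-is.
        printf '%s\n' "$project_dir"
    fi
}
```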
A contributor commented on this line:

Do we not have unit tests for this file?

@ruijiang-rjian (Contributor, Author) replied:

We have unit tests for this file here: https://github.com/aws/sagemaker-distribution/blob/main/test/test_artifacts/v2/scripts/run_sagemaker_workflows_tests.sh
They check that the health checker and the Airflow APIs are running (i.e., the containers started successfully), so they should cover the end-to-end experience.

@nandab-work (Contributor) commented Jun 19, 2025:

Is it possible to check in the unit test that the symlink source was used to mount the local runner?

@ruijiang-rjian (Contributor, Author) replied:

Do you mean checking that /mnt/custom-file-system/s3/shared is the actual path the local runner mounts?

A contributor replied:

Yes, basically ensuring that we are mounting the symlink source of "$HOME/shared".
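One way to approach the check described in this thread, as a minimal sketch: a hypothetical helper that reads a container's mount sources (one path per line) on stdin and passes only if the resolved target of `$HOME/shared` is among them and the unresolved symlink path is not. Feeding it from `docker inspect --format '{{range .Mounts}}{{println .Source}}{{end}}' mwaa-292-scheduler` is an assumption: the PR does not say which container mounts the shared directory or how the existing test script would call such a check.

```shell
#!/usr/bin/env bash
# Hypothetical check (not from the PR): succeed only if the resolved target of
# $1 appears among the mount sources on stdin and the symlink path itself does not.
assert_mounts_resolved_target() {
    local link_path="$1"
    local target sources
    target=$(readlink -f -- "$link_path") || return 1
    sources=$(cat)
    # -x matches whole lines, -F treats the path as a fixed string, not a regex.
    if ! printf '%s\n' "$sources" | grep -qxF -- "$target"; then
        echo "expected mount source $target not found" >&2
        return 1
    fi
    # If the path really was a symlink, it must not have been mounted directly.
    if [ "$target" != "$link_path" ] && printf '%s\n' "$sources" | grep -qxF -- "$link_path"; then
        echo "symlink path $link_path was mounted directly" >&2
        return 1
    fi
}
```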
