Bug Report
push: hangs after data transfer to s3 compatible remote
Description
When pushing files to S3-compatible storage (configured with endpointurl) using
dvc push -vv --show-stack
or
dvc push -j 32 -vv --show-stack
DVC intermittently hangs after the data has been pushed to the storage:
2024-03-26 10:56:55,858 DEBUG: Preparing to transfer data from '/root/workdir/.dvc/cache' to 's3://my-s3-bucket/'
2024-03-26 10:56:55,858 DEBUG: Preparing to collect status from 'my-s3-bucket/'
2024-03-26 10:56:55,858 DEBUG: Collecting status from 'my-s3-bucket/'
2024-03-26 10:56:55,860 DEBUG: Querying 3 oids via object_exists
2024-03-26 10:56:56,544 DEBUG: Querying 0 oids via object_exists
2024-03-26 10:56:57,582 DEBUG: Preparing to transfer data from '/root/workdir/.dvc/cache/files/md5' to 's3://my-s3-bucket/files/md5'
2024-03-26 10:56:57,583 DEBUG: Preparing to collect status from 'my-s3-bucket/files/md5'
2024-03-26 10:56:57,591 DEBUG: Collecting status from 'my-s3-bucket/files/md5'
2024-03-26 10:56:57,592 DEBUG: Querying 2 oids via object_exists
2024-03-26 10:56:58,639 DEBUG: Estimated remote size: 749568 files
2024-03-26 10:56:58,639 DEBUG: Querying 28882 oids via traverse
2024-03-26 10:58:02,859 DEBUG: Preparing to collect status from '/root/workdir/.dvc/cache/files/md5'
2024-03-26 10:58:02,866 DEBUG: Collecting status from '/root/workdir/.dvc/cache/files/md5'
2024-03-26 10:58:03,559 DEBUG: transfer dir: md5: e73f784c9d7a9d79aa8ddbdef314e12d.dir with 18343 files
Pushing |0.00 [01:07, ?file/s]
100%|█████████▉|Pushing to s3 18.3k/18.3k [19:42<00:00, 19.8file/s]
No further verbose output appears after this point. However, pressing Ctrl+C consistently produces the following traceback:
2024-03-26 12:00:14,267 ERROR: interrupted by the user
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/dvc/repo/push.py", line 144, in push
    push_transferred, push_failed = ipush(
  File "/usr/local/lib/python3.10/dist-packages/dvc_data/index/push.py", line 84, in push
    result = transfer(
  File "/usr/local/lib/python3.10/dist-packages/dvc_data/hashfile/transfer.py", line 224, in transfer
    failed = _do_transfer(
  File "/usr/local/lib/python3.10/dist-packages/dvc_data/hashfile/transfer.py", line 93, in _do_transfer
    dir_fails = _add(src, dest, bound_file_ids, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/dvc_data/hashfile/transfer.py", line 165, in _add
    dest.add(
  File "/usr/local/lib/python3.10/dist-packages/dvc_data/hashfile/db/__init__.py", line 111, in add
    transferred = super().add(
  File "/usr/local/lib/python3.10/dist-packages/dvc_objects/db.py", line 188, in add
    generic.transfer(
  File "/usr/local/lib/python3.10/dist-packages/dvc_objects/fs/generic.py", line 319, in transfer
    copy(
  File "/usr/local/lib/python3.10/dist-packages/dvc_objects/fs/generic.py", line 87, in copy
    return _put(
  File "/usr/local/lib/python3.10/dist-packages/dvc_objects/fs/generic.py", line 172, in _put
    for i, result in enumerate(fut.result()):
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 453, in result
    self._condition.wait(timeout)
  File "/usr/lib/python3.10/threading.py", line 320, in wait
    waiter.acquire()
KeyboardInterrupt
It seems that some futures from the underlying S3FileSystem used by dvc-objects never return, and that they are awaited without a timeout.
Running dvc push -j 1 -vv works, but status collection seems considerably slower (approx. 30 min for 18k files).
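To illustrate the suspected failure mode, here is a minimal standalone sketch using only the Python standard library. It is not DVC or dvc-objects code: the helper name wait_with_timeout, the demo function, and the threading.Event standing in for an upload that never completes are all hypothetical. It shows that calling result() without a timeout on such a future would block indefinitely (as in the traceback above), whereas passing a timeout turns the hang into a detectable condition.

```python
import concurrent.futures as cf
import threading


def wait_with_timeout(fut, timeout):
    """Return (done, value); done is False if the future did not finish in time."""
    try:
        return True, fut.result(timeout=timeout)
    except cf.TimeoutError:
        # Without the timeout argument, fut.result() would block here forever.
        return False, None


def demo():
    # Hypothetical stand-in for a transfer whose completion never arrives.
    release = threading.Event()
    with cf.ThreadPoolExecutor(max_workers=2) as pool:
        stuck = pool.submit(release.wait)      # blocks until released
        ok = pool.submit(lambda: "uploaded")   # completes immediately
        results = [
            wait_with_timeout(ok, 1.0),
            wait_with_timeout(stuck, 0.2),
        ]
        # Unblock the worker so the pool can shut down cleanly.
        release.set()
    return results


if __name__ == "__main__":
    for done, value in demo():
        print("finished" if done else "timed out", value)
```

Whether a timeout at this level is the right fix for DVC is an open question; the sketch only shows why a future that never resolves produces a silent hang when awaited without one.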
Environment information
x86 Ubuntu 22.04 Docker
aiobotocore==2.12.1
awscli==1.32.51
awscli-plugin-endpoint==0.4
boto3==1.34.51
botocore==1.34.51
dvc==3.36.0
dvc-data==3.2.0
dvc-http==2.32.0
dvc-objects==3.0.6
dvc-render==1.0.1
dvc-s3==3.1.0
dvc-studio-client==0.20.0
dvc-task==0.4.0
fsspec==2024.3.1
s3fs==2024.3.1
s3transfer==0.10.1
Output of dvc doctor:
DVC version: 3.36.0 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-6.5.0-26-generic-x86_64-with-glibc2.35
Subprojects:
    dvc_data = 3.2.0
    dvc_objects = 3.0.6
    dvc_render = 1.0.1
    dvc_task = 0.4.0
    scmrepo = 2.1.1
Supports:
    http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
    https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
    s3 (s3fs = 2024.3.1, boto3 = 1.34.51)
Config:
    Global: /root/.config/dvc
    System: /etc/xdg/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: s3, s3
Workspace directory: overlay on overlay
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/da3f4f6485fee7c550b5a6ccb3e96e47
The same issue occurs with the latest DVC release (3.48).
Output of dvc config -l:
remote.dvc-cache.url=s3://my-s3-bucket/
remote.dvc-cache.endpointurl=https://endpoint.url
remote.dvc-cache.profile=scw
remote.scans.url=s3://my-s3-bucket-2/
remote.scans.endpointurl=https://endpoint.url
remote.scans.profile=scw
cache.type=hardlink
core.remote=dvc-cache
core.hardlink_lock=false