@sbassam (Contributor) commented Nov 7, 2025

Have you read the Contributing Guidelines?

Issue #

Describe your changes

Clearly and concisely describe what's in this pull request. Include screenshots, if necessary.


Note

Increases max file size to 50.1GB, switches multipart target part size to 250MB with sliding-window concurrency, adds progress bars to validation, and sets a download request timeout.

  • Uploads:
    • Multipart: Set TARGET_PART_SIZE_MB to 250; keep MAX_MULTIPART_PARTS at 250.
    • Implement sliding-window concurrency in MultipartUploadManager._upload_parts_concurrent to respect max_concurrent_parts while continuously feeding new parts.
    • Increase max supported file size to MAX_FILE_SIZE_GB = 50.1.
  • Download:
    • Add request_timeout=3600 to raw GET in DownloadManager.download.
  • File validation:
    • Wrap JSONL iteration with tqdm for progress; import tqdm.
    • Improve UTF-8 check to read in chunks.
  • Tests:
    • Update expectations for part sizing/count (e.g., 500MB → 2×250MB, 50GB → ~205 parts) and size-limit error message.
    • Adjust as_completed mocking to align with sliding-window behavior.
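
The part counts quoted in the test expectations follow from ceiling division by the target part size. A minimal sketch of that arithmetic, assuming binary units (1 MB = 1024 × 1024 bytes), which is what makes 50GB come out at 205 parts; `plan_part_count` is a hypothetical helper for illustration, not a function from this PR:

```python
TARGET_PART_SIZE_MB = 250   # value stated in this PR
MAX_MULTIPART_PARTS = 250   # value stated in this PR
MB = 1024 * 1024            # assumption: binary megabytes
GB = 1024 * MB


def plan_part_count(file_size_bytes: int) -> int:
    """Return how many parts a file needs at the target part size.

    Hypothetical helper: the real manager may also adjust the part size,
    which this sketch does not attempt to model.
    """
    if file_size_bytes <= 0:
        raise ValueError("file size must be positive")
    target = TARGET_PART_SIZE_MB * MB
    # Integer ceiling division avoids float rounding on large sizes.
    count = (file_size_bytes + target - 1) // target
    if count > MAX_MULTIPART_PARTS:
        raise ValueError(f"{count} parts exceeds the limit of {MAX_MULTIPART_PARTS}")
    return count


if __name__ == "__main__":
    print(plan_part_count(500 * MB))  # 500MB splits into 2 parts of 250MB
    print(plan_part_count(50 * GB))   # 50GB needs 205 parts
```

With these constants the largest file that fits is 250 × 250MB, so the 50.1GB size limit stays well under the part cap.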

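The sliding-window behavior described for `MultipartUploadManager._upload_parts_concurrent` can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `upload_parts_sliding_window` and the `upload_part` callable are hypothetical names, and it uses `concurrent.futures.wait` where the actual implementation appears to rely on `as_completed`.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Dict, Iterable, List, Tuple


def upload_parts_sliding_window(
    part_numbers: Iterable[int],
    upload_part: Callable[[int], str],  # hypothetical: returns an ETag-like string
    max_concurrent_parts: int,
) -> List[Tuple[int, str]]:
    """Upload parts while keeping at most `max_concurrent_parts` in flight."""
    results: Dict[int, str] = {}
    part_iter = iter(part_numbers)
    with ThreadPoolExecutor(max_workers=max_concurrent_parts) as pool:
        in_flight = {}
        # Fill the initial window.
        for part_number in part_iter:
            in_flight[pool.submit(upload_part, part_number)] = part_number
            if len(in_flight) >= max_concurrent_parts:
                break
        # Each time a part finishes, submit the next one so the window stays full.
        while in_flight:
            done, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for future in done:
                finished = in_flight.pop(future)
                results[finished] = future.result()  # re-raises upload errors
                next_part = next(part_iter, None)
                if next_part is not None:
                    in_flight[pool.submit(upload_part, next_part)] = next_part
    # Return parts in part-number order, as a completion request would need.
    return sorted(results.items())
```

Compared with submitting every part up front, this keeps only a bounded number of part buffers alive at once, which matters more as parts grow to 250MB.
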
Written by Cursor Bugbot for commit 7ac3d44.
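
For the chunked UTF-8 check mentioned under file validation, one way to validate a large file without loading it fully is an incremental decoder, which carries multi-byte sequences across chunk boundaries. A minimal sketch, assuming a hypothetical `is_valid_utf8` helper and a 1 MiB chunk size (neither is taken from the PR):

```python
import codecs


def is_valid_utf8(path: str, chunk_size: int = 1024 * 1024) -> bool:
    """Return True if the file at `path` decodes as UTF-8, reading in chunks."""
    # The incremental decoder buffers a partial multi-byte character at the
    # end of one chunk and completes it with the start of the next.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                decoder.decode(chunk)
        # final=True raises if the file ends in the middle of a character.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True
```

Decoding each chunk independently with `bytes.decode` would wrongly reject files where a character straddles a chunk boundary, which is why the sketch uses the incremental API.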

@sbassam marked this pull request as ready for review November 7, 2025 03:21
@sbassam requested a review from vorobyov01 November 8, 2025 03:38