
Optimize FileStore sync performance by reducing redundant directory scans #5434

Open
YinhaoHu opened this issue Dec 25, 2024 · 0 comments
Labels
kind/feature New feature or request

YinhaoHu commented Dec 25, 2024

What would you like to be added:

Optimize the store.List behavior for FileStore by increasing the limit argument when listing files during sync operations.

Why is this needed:

Currently, when syncing large directories with FileStore, there are significant performance issues:

  1. Current behavior:
  • listCommonPrefix uses maxResult=1000 for each list operation
  • FileStore reads the entire directory (say, over 10 million entries) just to return 1,000 files
  • This process repeats many times for large directories

  2. Problems:
  • Extremely inefficient for directories with millions of files
  • Each list operation unnecessarily scans the entire directory
  • Makes syncing large directories (e.g., 10 million files) practically impossible

I hit this case recently: reading the entire directory takes around 90 seconds, yet FileStore's List returns only 1,000 entries per call, so enumerating all 10 million files through listCommonPrefix is prohibitively slow.
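Taking those numbers at face value (one full scan ≈ 90 s, 1,000 keys returned per call), a rough estimate of the listing cost alone:

```go
package main

import "fmt"

func main() {
	const totalFiles = 10_000_000 // directory size from the report above
	const pageSize = 1_000        // maxResult used by listCommonPrefix
	const scanSeconds = 90.0      // observed time for one full directory scan

	calls := totalFiles / pageSize // List calls needed to page through everything
	hours := float64(calls) * scanSeconds / 3600

	fmt.Printf("%d List calls, ~%.0f hours just to enumerate the keys\n", calls, hours)
}
```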

  3. Proposed solution:

Increase the limit to a higher value in listCommonPrefix when the store is a FileStore.
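One way this could look (a minimal sketch; the type names and the larger limit value are assumptions, not the actual juicefs API):

```go
package main

import "fmt"

// ObjectStorage is a minimal stand-in for the object-store interface the
// sync code operates on; the real interface has List and other methods.
type ObjectStorage interface{ String() string }

type fileStore struct{ root string }

func (f fileStore) String() string { return "file://" + f.root }

type s3Store struct{ bucket string }

func (s s3Store) String() string { return "s3://" + s.bucket }

// listLimit picks maxResult for a List call. Hypothetical helper: for a
// local file store, where every List call rescans the whole directory, a
// much larger page amortizes the scan cost; other stores keep the default.
func listLimit(store ObjectStorage) int64 {
	const defaultLimit = 1000
	const fileStoreLimit = 1_000_000 // assumed value, tunable
	if _, ok := store.(fileStore); ok {
		return fileStoreLimit
	}
	return defaultLimit
}

func main() {
	fmt.Println(listLimit(fileStore{"/data"}))  // large pages for local FS
	fmt.Println(listLimit(s3Store{"mybucket"})) // default elsewhere
}
```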

@YinhaoHu YinhaoHu added the kind/feature New feature or request label Dec 25, 2024