More thread to use? #52

Badb0yBadb0y · 2025-02-10T06:26:26Z

Hi,

Is there a way to run the scraping with more threads?

We have 80k buckets. The query timeout is currently set to 30 minutes, and the HAProxy timeout is also 30 minutes, but I now need to increase both so the scrape doesn't time out.

If there were a way to use more threads, that would be good, so the scrape could finish faster.

Thank you

blemmenes · 2025-02-10T17:06:52Z

Hi @Badb0yBadb0y,

Thanks for the request. I don't think this is possible, however.

While we could use multiple threads on the request side, the only way I can think of to batch that data would be if the Ceph Admin Ops API supported sending a request with an offset (e.g. a bucket offset), so that you could divide the GET operations for your 80k buckets among the threads.

Looking at the Ceph docs, there doesn't appear to be the capability needed to do that. I'll think about this more and do some additional digging, though.
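Without an offset parameter, one partition that might still be available is per user: the Admin Ops "get bucket info" endpoint accepts a `uid` parameter, so in principle one request per user could run in parallel. The sketch below is only an illustration of that fan-out, not the exporter's code. `collect_bucket_stats` and `fetch_for_user` are hypothetical names, and `fetch_for_user` stands in for whatever would actually issue the per-user request.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def collect_bucket_stats(
    uids: List[str],
    fetch_for_user: Callable[[str], List[dict]],
    max_workers: int = 8,
) -> Dict[str, List[dict]]:
    """Fetch bucket stats for each user concurrently (hypothetical helper).

    fetch_for_user is an assumed callable that would wrap a single
    Admin Ops request returning one user's buckets.
    """
    results: Dict[str, List[dict]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per user; at most max_workers requests are in flight at once.
        futures = {pool.submit(fetch_for_user, uid): uid for uid in uids}
        for fut in as_completed(futures):
            uid = futures[fut]
            # result() re-raises any exception raised in the worker thread.
            results[uid] = fut.result()
    return results


if __name__ == "__main__":
    # Stand-in for a real HTTP call; returns fake bucket entries.
    def fake_fetch(uid: str) -> List[dict]:
        return [{"bucket": f"{uid}-bucket-{i}"} for i in range(3)]

    stats = collect_bucket_stats(["alice", "bob"], fake_fetch, max_workers=2)
    for uid, buckets in sorted(stats.items()):
        print(uid, len(buckets))
```

Note that this only helps when buckets are spread across several users: a single user who owns most of the buckets is still one request, and that request would bound the total scrape time.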

Thanks,
Berant

Badb0yBadb0y · 2025-02-11T04:44:40Z

Thank you very much. I actually have one user who owns 64k buckets, and that most likely blocks and slows down the query.
The buckets in it all share the same prefix, like picture-00[1-64k], and there are some other users with a couple of thousand buckets each.
Or could the bucket check be skipped somehow, checking only the users? I'm not sure what the best solution would be.
