Add support for repodata.json.zst #675

beenje · 2023-11-30T17:13:29Z

Adding support for repodata.json.zst (fix #573).

- repodata.json.zst is now created alongside the .bz2 and .gz versions.
- Added a test checking that the compressed files decompress to the same content as repodata.json.
- Ensure that repodata.json is updated for proxy channels when a compressed file is requested. Before this fix, if repodata.json.zst was requested and existed locally, it was never updated. Even with the support added in the first commit, it would only be updated when someone requested the repodata.json file (which triggers the creation of all compressed files). I didn't find an easy way to write a test for that, though one would be nice.
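For illustration, the round-trip check described above could look roughly like the sketch below. This is not the test from this PR: `compress_all` and `decompress` are hypothetical helper names, and the zst branch assumes the third-party `zstandard` package, which is skipped when it is not installed.

```python
import bz2
import gzip

try:
    # Third-party package (assumption); the .zst variant is skipped without it.
    import zstandard
except ImportError:
    zstandard = None


def compress_all(raw: bytes) -> dict:
    """Return {suffix: compressed bytes} for every available codec."""
    variants = {
        ".gz": gzip.compress(raw),
        ".bz2": bz2.compress(raw),
    }
    if zstandard is not None:
        variants[".zst"] = zstandard.ZstdCompressor().compress(raw)
    return variants


def decompress(suffix: str, blob: bytes) -> bytes:
    """Undo the compression that matches the given file suffix."""
    if suffix == ".gz":
        return gzip.decompress(blob)
    if suffix == ".bz2":
        return bz2.decompress(blob)
    if suffix == ".zst" and zstandard is not None:
        return zstandard.ZstdDecompressor().decompress(blob)
    raise ValueError(f"unsupported suffix: {suffix}")


def check_round_trip(raw: bytes) -> None:
    """Every compressed variant must decompress back to the original bytes."""
    for suffix, blob in compress_all(raw).items():
        assert decompress(suffix, blob) == raw, suffix
```

Calling `check_round_trip(path.read_bytes())` on a generated repodata.json would then cover every codec that is available in the environment.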

beenje · 2023-12-04T07:58:40Z

With the tests added in #677 I could easily add a test in this MR.

beenje · 2023-12-07T08:05:28Z

Doing some tests locally, I noticed that this change makes the download of the repodata.json from a proxy channel very slow (when the file is big).

The initial download from the remote repo is actually not the biggest issue, contrary to what I thought in #660. The problem is the compression, which is quite slow for big files. Compressing the conda-forge/linux-64/repodata.json file to gz, then bz2 and now zst takes several seconds. The download of the file is blocked during that time, which explains the time-out I saw on the client side.

I will look into whether the compression can be done in the background, and maybe add options to disable it (when using quetz as an internal conda server, the network between clients and server is usually fast).

ivergara · 2023-12-07T08:12:09Z

You might want to look at how to use the asynchronous capabilities of the package store. I fixed (#626) an oversight in how packages were uploaded some time ago. It was using synchronous "filesystem" calls, and it was blocking for too long for big files.

beenje · 2023-12-10T11:03:25Z

I don't think async will help in this case. The compression is done in add_static_file (and add_temp_static_file), which aren't async. They are called in the background by update_indexes. There is no issue there.
But for a proxy channel, we download the remote repodata.json, compress it and then serve it. The client has to wait during that time. It would be the same if it were async.
Compression should probably be done in the background for proxy channels as well.
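As a rough sketch of what "in the background" could mean here (not what this PR implements): return the uncompressed payload right away and hand the blocking compression to a worker thread. The function names are hypothetical, only gzip is shown, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import gzip
from pathlib import Path


def compress_to_gz(path: Path) -> Path:
    """Blocking step: write '<name>.gz' next to the original file."""
    target = path.with_name(path.name + ".gz")
    target.write_bytes(gzip.compress(path.read_bytes()))
    return target


async def serve_then_compress(path: Path):
    """Return the payload immediately; compression runs in a worker thread.

    The returned task can be awaited later (or simply left to finish) so the
    client does not wait for the compressed variants to be written.
    """
    payload = path.read_bytes()
    task = asyncio.create_task(asyncio.to_thread(compress_to_gz, path))
    return payload, task
```

With this shape the compressed variant may not exist yet when the first response is sent, so a request for it arriving in that window would still need a fallback.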

In the meantime, I added a new compression section in the config to enable/disable bz2, gz and zst compression.
By default, zst is disabled and the two others are enabled, to keep the same behaviour as today.
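To make those defaults concrete, here is a minimal sketch of a compression config held in a dataclass. The class and field names are assumptions made for illustration and are not confirmed to match the names used in quetz.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressionConfig:
    """Hypothetical container for the compression settings."""

    # Defaults follow the comment above: gz and bz2 enabled, zst disabled.
    gz_enabled: bool = True
    bz2_enabled: bool = True
    zst_enabled: bool = False

    def suffixes(self) -> list:
        """File suffixes of the compressed variants that should be produced."""
        enabled = []
        if self.gz_enabled:
            enabled.append(".gz")
        if self.bz2_enabled:
            enabled.append(".bz2")
        if self.zst_enabled:
            enabled.append(".zst")
        return enabled
```

Passing one such object around, rather than three separate booleans, keeps the function signatures stable if another codec is added later.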

Note that for the tests, I re-used what I implemented in #677, so I rebased this PR on that PR's branch. Hoping #677 can be merged soon.

codecov-commenter · 2023-12-10T13:54:08Z

Codecov Report

Attention: 8 lines in your changes are missing coverage. Please review.

Comparison is base (0b49467) 83.61% compared to head (f0d7927) 83.90%.
Report is 3 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| quetz/tasks/mirror.py | 84.21% | 6 Missing ⚠️ |
| quetz/tasks/common.py | 75.00% | 1 Missing ⚠️ |
| quetz/utils.py | 98.18% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #675      +/-   ##
==========================================
+ Coverage   83.61%   83.90%   +0.28%     
==========================================
  Files          79       79              
  Lines        6233     6324      +91     
==========================================
+ Hits         5212     5306      +94     
+ Misses       1021     1018       -3


New versions of mamba request repodata.json.zst first. The compressed files are created locally when downloading the non-compressed version. Quetz should always check whether the repodata.json file needs to be re-downloaded, so that all files stay consistent.
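One way to picture that check: map whatever variant the client asked for back to the uncompressed file name, and run the freshness check against that name. The sketch below is illustrative only; `base_index_name` is a hypothetical helper, not a function from quetz.

```python
# Suffixes of the compressed variants discussed in this PR.
COMPRESSION_SUFFIXES = (".zst", ".bz2", ".gz")


def base_index_name(requested: str) -> str:
    """Strip a known compression suffix so the freshness check always
    targets the uncompressed file (e.g. repodata.json)."""
    for suffix in COMPRESSION_SUFFIXES:
        if requested.endswith(suffix):
            return requested[: -len(suffix)]
    return requested
```

A request for `linux-64/repodata.json.zst` would then refresh `linux-64/repodata.json` first, and the compressed files derived from it stay in step.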

beenje marked this pull request as draft December 7, 2023 07:57

beenje force-pushed the issue-573 branch from 2c7749b to 49f664a Compare December 10, 2023 10:53

beenje force-pushed the issue-573 branch from 49f664a to c5c3e38 Compare December 10, 2023 11:38

beenje added 7 commits June 4, 2024 14:48

Use py-rattler to fetch repodata in proxy mode

8da77c2

rattler is very efficient to download repodata

Make download_repodata async

fccd24d

Add test for RemoteRepository rattler_channel

5dd4b9a

Add test-server directory from rattler for testing

38477fe

Add tests for download_repodata

edddb6a

serve_repo_data fixture copied from rattler

Fix dummy_repo fixture

876e929

dummy_remote_session_object wasn't cleaning after itself (using return instead of yield)

Increase timeout for ci tests

a32a1cd

Test with migration failed with: Error: The action 'Testing server' has timed out after 5 minutes.

beenje mentioned this pull request Jun 4, 2024

Use py-rattler to fetch repodata in proxy mode #677

Open

beenje added 9 commits June 5, 2024 11:52

Add repodata.json.zst support

af9c141

Add test to check compressed repodata

c18f044

Ensure compressed file not downloaded from remote

cc162d0

Make compression optional

863e277

Add options to enable bzip2, gzip and zstandard compression

Add debug log in download_remote_file

f32ae29

Use dataclass to store and pass compression config

7faec93

Add some documentation about compression

d712010

Add missing compression to plugins

75776eb

beenje force-pushed the issue-573 branch from f0d7927 to 75776eb Compare June 5, 2024 10:08
