Reimplemented xlocate as xbps-locate #585
How does this affect repodata size? Why not make it part of xbps-query? This also means x(bps-)locate loses the power of PCRE and delta-updating of the index.
_BSD_SOURCE was fixed in 48c9879; please rebase.
Making some calculations: about 60 bytes per file path plus some overhead, so let's say about 100 bytes per file. 100 x 50 x 13'000 ≈ 60 MB uncompressed. Maybe using an extra file like
You're right about losing the power of PCRE then; maybe a third-party library?
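For reference, the estimate above can be reproduced with plain shell arithmetic. This is only a sketch: the function names are made up for illustration, and reading the three factors as bytes per file, files per package and number of packages is an assumption about what the figures mean.

```shell
# Back-of-the-envelope index size from the three factors quoted above.
# Assumption: they mean bytes/file, files/package and packages.
estimate_bytes() {
    echo $(( $1 * $2 * $3 ))
}

# Convert bytes to MiB, rounded to the nearest integer.
to_mib() {
    echo $(( ($1 + 524288) / 1048576 ))
}

bytes=$(estimate_bytes 100 50 13000)
echo "$bytes bytes, about $(to_mib "$bytes") MiB"
```

The product is 65'000'000 bytes, so "≈ 60 MB" is the right order of magnitude under these assumptions.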
So at the very minimum 235 MB, assuming single-byte ASCII characters only and no plist overhead.
Okay! I wasn't aware that it would be that much overhead to include it directly into
It's worth noting that the existing xlocate index is large enough to already be in git, where it still takes ages to download if you don't already have a clone and can't take advantage of the delta updating that git provides.
After some research, I made a plist with all the files in xlocate.git:
% cat ../make-plist.sh
echo "<plist>"
echo "\t<dict>"
for pkg in *; do
echo "\t\t<key>$pkg</key>"
echo "\t\t<array>"
for file in $(awk '{print $1}' $pkg); do
echo "\t\t\t<string>$file</string>"
done
echo "\t\t</array>"
done
echo "\t</dict>"
echo "</plist>"
% sh ../make-plist.sh | zstd -f9o ../files.zstd
/*stdin*\ : 5.21% ( 238 MiB => 12.4 MiB, ../files.zstd)
% find * -print -exec cat {} \; | zstd -f9o ../files.zstd
/*stdin*\ : 6.58% ( 197 MiB => 13.0 MiB, ../files.zstd)
13 MiB is still a lot to just include into
Fetching gcc-fortran, which is about 13 MB, takes 5.3 s; cloning xlocate.git takes about 11 s. Updating the git clone afterwards is certainly faster, but how often is that needed if file lists don't really change with every version? I can't tell how accurate this comparison is or how linearly it behaves on slower networks. Please correct me if I'm wrong.
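A side note on the script above: whether `echo "\t..."` prints a tab or a literal backslash-t depends on which shell provides `sh`, so the measured size can differ slightly between systems. A minimal sketch of a `printf`-based variant follows. The function name `make_plist` and the directory argument are made up here; like the original, it takes the path from the first column of each per-package list and does not escape XML special characters.

```shell
# Hypothetical printf-based rewrite of make-plist.sh.
# $1 = directory holding one file list per package (path in column 1).
make_plist() {
    dir=$1
    printf '<plist>\n\t<dict>\n'
    for pkg in "$dir"/*; do
        [ -f "$pkg" ] || continue
        # Package name is the file name without the directory part.
        printf '\t\t<key>%s</key>\n\t\t<array>\n' "${pkg##*/}"
        awk '{print $1}' "$pkg" | while IFS= read -r file; do
            printf '\t\t\t<string>%s</string>\n' "$file"
        done
        printf '\t\t</array>\n'
    done
    printf '\t</dict>\n</plist>\n'
}
```

It would be used the same way as before, for example `make_plist . | zstd -f9o ../files.zstd` from inside the xlocate.git checkout.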
8a8c61a to e65aef4
That's the reason git is used: it provides the mechanism to download only the new parts of the index ("delta-updating"), keeping existing files as they are.
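To make the difference concrete, here is a rough sketch of the clone-once, pull-afterwards pattern. The function name and both arguments are placeholders; no real repository URL is assumed, and this is not how xlocate itself is implemented.

```shell
# Sketch: keep a local clone of a file index up to date.
# $1 = repository URL or path (placeholder), $2 = local directory.
update_index() {
    url=$1
    dir=$2
    if [ -d "$dir/.git" ]; then
        # Existing clone: only objects missing locally are transferred.
        git -C "$dir" pull --quiet --ff-only
    else
        # First use: the whole history has to be downloaded once.
        git clone --quiet "$url" "$dir"
    fi
}
```

Only the first call pays the full download cost; later calls transfer whatever changed since the last pull.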
e65aef4 to dc02b0e
I've now reimplemented xlocate inside xbps-query (-o and --ownedhash) for better integration. From there, you can still search by file or link, but also by hash! Every file hash is included into
Also, can someone with a binary repo make an index file with
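To illustrate the two kinds of lookup described above, here is a small sketch over a made-up flat index. The record layout (`<pkgver> <sha256> <path>` per line) and the function names are hypothetical and only serve the example; they are not the format or the code this PR uses. The path match uses awk's extended regular expressions, not PCRE.

```shell
# Hypothetical flat index: one "<pkgver> <sha256> <path>" record per line.
# $1 = index file, $2 = extended regex matched against the path.
owner_by_path() {
    awk -v re="$2" '$3 ~ re { print $1 ": " $3 }' "$1"
}

# $1 = index file, $2 = sha256 hex digest.
owner_by_hash() {
    # Appending "" forces a string comparison, so digests that happen
    # to look like numbers are not compared numerically.
    awk -v h="$2" '($2 "") == (h "") { print $1 ": " $3 }' "$1"
}
```

With such an index, a lookup by content could look like `owner_by_hash index.txt "$(sha256sum /usr/bin/foo | cut -d' ' -f1)"`.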
196d5ff to 4ac9897
I've implemented a new xbps tool: xbps-locate!
xbps-rindex collects data into index.plist inside *-repodata, but also files into files.plist.
xbps-locate will fetch the files.plist from the repo pool and search for the desired file. I cannot test
Also added to TODO: the cleanup in xbps-rindex doesn't clean files.plist yet.
I've also changed repo_open_* in lib/repo.c so that the archive iterator no longer just assumes that the files are in a fixed order (they are still written in order for compatibility) but checks the actual filename.
On my computer (Void Linux x86_64-musl) I have to manually disable _BSD_SOURCE and _SVID_SOURCE, so there is a commit for that; I don't know if it's needed only on my machine.
Thanks for looking into my code!
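For reviewers, the idea behind the repo_open_* change can be sketched in shell: decide what to do with each archive member by its name rather than by its position. This is an illustration only, not the C code in lib/repo.c; the function name and the printed labels are made up, and only the member names index.plist and files.plist come from the description above.

```shell
# Sketch of name-based dispatch while walking archive members.
# Reads member names from stdin, one per line, in any order.
classify_members() {
    while IFS= read -r name; do
        case "${name#./}" in
            index.plist) echo "index: $name" ;;
            files.plist) echo "files: $name" ;;
            *)           echo "skip: $name" ;;
        esac
    done
}
```

Assuming the repodata archive is something `tar` can list, `tar -tf "$repodata" | classify_members` would give the same classification regardless of member order (`$repodata` being a placeholder for the archive path).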