-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib: replace xbps_file_hash_check_dictionary with transaction_file lookup #359
base: master
Are you sure you want to change the base?
Conversation
…okup xbps_file_hash_check_dictionary was called for every file that is getting unpacked, because the files list is an array it iterated over the whole files array to find the matching file. With a lot of files this is really slow and a lot of time was spend in locking the proplib array and iterating over it. Time from my Ryzen 3700X system with nvme disk updating texlive-fontsextra with files 86375: time xbps-install -y texlive-fontsextra 6m34.61s real 6m27.71s user 0m03.95s system And with this patch: time xbps-install -y texlive-fontsextra 0m08.40s real 0m07.34s user 0m00.98s system
xbps_dbg_printf(xhp, "%s: %s\n", entry_pname, file->sha256); | ||
assert(file->sha256); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be nice to make it clear that a (null)
there is an error? Not sure.
file->sha256 ? file->sha256 : "error..."
For the record, a good test case here seems to be updating papirus-icon-theme. It took 2m37s going from I will build this locally and see how better it is. EDIT: I was testing on a loaded machine, so it isn't a clean test, but this PR still took 52s. Without load, 31s. |
This should greatly reduce the time needed, not sure why this example with more files is faster, maybe its just a lot more symlinks or smaller files that would need to be checksumed. For the implementation, I think I would rename a few things, |
TYPE_DIR, | ||
TYPE_FILE, | ||
TYPE_CONFFILE, | ||
} transaction_file_type_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically speaking *_t
is reserved for libc (https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html), but everyone uses it anyway.
const char *pkgname; | ||
const char *pkgver; | ||
char *sha256; | ||
const char *target; | ||
uint64_t size; | ||
transaction_file_type_t type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would these first 6 be the xbps_file
struct? If so, I guess transaction_file_type
would be switched over to file_type
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we need two struct a xbps_transaction_file
and a xbps_file
which is part of the former.
The idea would be to start using the xbps_file
struct in places where we currently use the xbps_dictionary
references to files.plist
so that we gradually switch from dynamic plist types to actual structs which should help to avoid some issues at compile time through the type system.
I think xbps_file
would be a simple way to start, doing this for packages would be the end goal, there are so many places that use xbps_dictionary
s for packages and there are a lot of places that can be improved on the way to switching to c structs.
if (sha256) | ||
item->new.sha256 = strdup(sha256); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is item->new
always guaranteed to be zero initialized? Might be reasonable to add an else item->new.sha256 = NULL;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally everything is zero initialized, IMHO not doing that by default when initializing the memory should only be done if you know you overwrite it immediately.
void xbps_transaction_files_free(void); | ||
int HIDDEN xbps_transaction_fetch(struct xbps_handle *, | ||
xbps_object_iterator_t); | ||
int HIDDEN xbps_transaction_pkg_deps(struct xbps_handle *, xbps_array_t, xbps_dictionary_t); | ||
|
||
struct transaction_file *xbps_transaction_file_new(struct xbps_handle *, const char *); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming is a bit confusing, in that xbps_transaction_file_new
was added, but no xbps_transaction_file_free
, and xbps_transaction_files_free
was added, without a corresponding _new
, and doesn't seem to be called at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the _new does not actually create anything, it gives you a reference to a file struct that already exists from collecting all files in the transaction, "_new" because it returns the new file as opposed to the old file.
Not sure how this should be done exactly, something like xbps_transaction_get_file
with either _new or _old or a parameter to choose which one you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xbps_transaction_files_free
deletes the whole hashtable/tree, this should be called from somewhere after the transaction is done, maybe from xbps_transaction_commit
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how this should be done exactly, something like xbps_transaction_get_file with either _new or _old or a parameter to choose which one you want.
I think the parameter helps avoid duplicated logic, so I'm in favor of that.
xbps_file_hash_check_dictionary
was called for every file that isgetting unpacked, because the files list is an array it iterated over
the whole files array to find the matching file.
With a lot of files this is really slow and a lot of time was spend
in locking the proplib array and iterating over it.
Time from my Ryzen 3700X system with nvme disk updating
texlive-fontsextra with files 86375:
And with this patch: