Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: replace xbps_file_hash_check_dictionary with transaction_file lookup #359

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Duncaen
Copy link
Member

@Duncaen Duncaen commented Dec 12, 2020

xbps_file_hash_check_dictionary was called for every file that is
getting unpacked, because the files list is an array it iterated over
the whole files array to find the matching file.
With a lot of files this is really slow and a lot of time was spend
in locking the proplib array and iterating over it.

Time from my Ryzen 3700X system with nvme disk updating
texlive-fontsextra with files 86375:

time xbps-install -y texlive-fontsextra
    6m34.61s real     6m27.71s user     0m03.95s system

And with this patch:

time xbps-install -y texlive-fontsextra
    0m08.40s real     0m07.34s user     0m00.98s system

Verified

This commit was signed with the committer’s verified signature.
Duncaen Duncan Overbruck
…okup

xbps_file_hash_check_dictionary was called for every file that is
getting unpacked, because the files list is an array it iterated over
the whole files array to find the matching file.
With a lot of files this is really slow and a lot of time was spend
in locking the proplib array and iterating over it.

Time from my Ryzen 3700X system with nvme disk updating
texlive-fontsextra with files 86375:

time xbps-install -y texlive-fontsextra
    6m34.61s real     6m27.71s user     0m03.95s system

And with this patch:

time xbps-install -y texlive-fontsextra
    0m08.40s real     0m07.34s user     0m00.98s system
Comment on lines +393 to +394
xbps_dbg_printf(xhp, "%s: %s\n", entry_pname, file->sha256);
assert(file->sha256);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to make it clear that a (null) there is an error? Not sure.

file->sha256 ? file->sha256 : "error..."

include/xbps_api_impl.h Outdated Show resolved Hide resolved

Verified

This commit was signed with the committer’s verified signature.
Duncaen Duncan Overbruck

Verified

This commit was signed with the committer’s verified signature.
Duncaen Duncan Overbruck
@ericonr
Copy link
Member

ericonr commented Feb 10, 2021

For the record, a good test case here seems to be updating papirus-icon-theme. It took 2m37s going from 20201001_1 to 20210201_1, all of it inside xbps-install, not any hooks.

I will build this locally and see how better it is.

EDIT: I was testing on a loaded machine, so it isn't a clean test, but this PR still took 52s. Without load, 31s.

@Duncaen
Copy link
Member Author

Duncaen commented Feb 10, 2021

This should greatly reduce the time needed, not sure why this example with more files is faster, maybe its just a lot more symlinks or smaller files that would need to be checksumed.

For the implementation, I think I would rename a few things, xbps_transaction_file_new does not really represent what the function actually does and maybe it would make sense to call the structure xbps_file instead og xbps_transaction_file as the struct could be reused for other things in the future.
Might also make sense to move some of the fields like "bool update" back into the transaction_files item structure as some of those are not important outside of the context of finding conflicts in the transaction.

TYPE_DIR,
TYPE_FILE,
TYPE_CONFFILE,
} transaction_file_type_t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically speaking *_t is reserved for libc (https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html), but everyone uses it anyway.

Comment on lines +106 to +111
const char *pkgname;
const char *pkgver;
char *sha256;
const char *target;
uint64_t size;
transaction_file_type_t type;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would these first 6 be the xbps_file struct? If so, I guess transaction_file_type would be switched over to file_type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need two struct a xbps_transaction_file and a xbps_file which is part of the former.
The idea would be to start using the xbps_file struct in places where we currently use the xbps_dictionary references to files.plist so that we gradually switch from dynamic plist types to actual structs which should help to avoid some issues at compile time through the type system.
I think xbps_file would be a simple way to start, doing this for packages would be the end goal, there are so many places that use xbps_dictionarys for packages and there are a lot of places that can be improved on the way to switching to c structs.

Comment on lines +511 to +512
if (sha256)
item->new.sha256 = strdup(sha256);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is item->new always guaranteed to be zero initialized? Might be reasonable to add an else item->new.sha256 = NULL;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally everything is zero initialized, IMHO not doing that by default when initializing the memory should only be done if you know you overwrite it immediately.

Comment on lines +158 to +164
void xbps_transaction_files_free(void);
int HIDDEN xbps_transaction_fetch(struct xbps_handle *,
xbps_object_iterator_t);
int HIDDEN xbps_transaction_pkg_deps(struct xbps_handle *, xbps_array_t, xbps_dictionary_t);

struct transaction_file *xbps_transaction_file_new(struct xbps_handle *, const char *);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming is a bit confusing, in that xbps_transaction_file_new was added, but no xbps_transaction_file_free, and xbps_transaction_files_free was added, without a corresponding _new, and doesn't seem to be called at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the _new does not actually create anything, it gives you a reference to a file struct that already exists from collecting all files in the transaction, "_new" because it returns the new file as opposed to the old file.

Not sure how this should be done exactly, something like xbps_transaction_get_file with either _new or _old or a parameter to choose which one you want.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xbps_transaction_files_free deletes the whole hashtable/tree, this should be called from somewhere after the transaction is done, maybe from xbps_transaction_commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how this should be done exactly, something like xbps_transaction_get_file with either _new or _old or a parameter to choose which one you want.

I think the parameter helps avoid duplicated logic, so I'm in favor of that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants