Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: explore gix APIs, experiment with gix-blame API #1453

Draft
wants to merge 100 commits into
base: main
Choose a base branch
from

Conversation

cruessler
Copy link
Contributor

This is a draft PR that is supposed to help me get familiar with gix’s APIs as well as to experiment with possible APIs for gix-blame. Any feedback that helps me not go down the wrong path, is much appreciated! :-)

Copy link
Owner

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great to see this PR and I am looking forward to witnessing it evolve over time :)!

gix-blame/tests/blame.rs Show resolved Hide resolved
git commit -q --allow-empty -m c5
git tag at-c5
git merge branch1 -m m1b1
echo "line 2" >> file.txt
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd do the same - start extremely simple, maybe do the first rough implementation so it passes this test, and then think of some tougher cases to throw at it, validating that they still come out right.

Maybe it's worth investing into a baseline test which parses the output of git blame to get the expected output, which then has to be matched by the algorithm. These are typically the second stage as they are more obscure, but make it easier to get correct expectations for a lot of possibly complex inputs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the input, sounds like a great plan!

@cruessler
Copy link
Contributor Author

@Byron I think I’ve now gotten to a point where I’ve learned enough about gixoxide’s APIs in order to hopefully be able write the first meaningful test. Would you mind looking at the PR’s current state? I’m not sure whether I it would be better to try and find a good API for struct Blame or to continue adding to the already quite large test it_works.

My goal now is to write a test that runs a complete blame for the sample repo that is part of this PR.

Copy link
Owner

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, you already dug deep into the most complex APIs gitoxide has to offer, I am pretty sure it can't get much worse than that. Having it, however, helps to know what data is needed which will help tremendously with defining the API later.

I think continuing with the test to try and complete the algorithm would be good, even though I'd probably start with refactoring the 'setup' portion of the test a little more to be able to focus on the blame algorithm.

But that's details, I think you are on the right track and this PR already does everything that's needed to do a blame.

Once the core algorithm is proven, as most naive and simple version of course, I'd probably create a baseline of some samples with git itself, parse the result and use it as expectation for the algorithm created here.

I hope that helps.


use gix_diff::blob::intern::Token;
use gix_hash::ObjectId;
use gix_odb::pack::FindExt;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd want to use the equally named gix_object::FindExt, it's easier to use.

@cruessler
Copy link
Contributor Author

Thanks, that’s very helpful already! I think I wanted to mostly make sure I’m still on the right track and not missing anything. I’ll continue working on the test, then, and worry about the public API later.

@Byron
Copy link
Owner

Byron commented Aug 11, 2024

Yes, I think it's great to keep up the explorative approach using cargo test and then put that behind an API for abstraction, along with the first simple 'actual' test. And from there, one can iterate to validate more complex cases, while slowly getting to the kind of observable and interruptible API that will eventually be needed for gitui.

This is not a material change as the loop gets only executed once. It is
mostly to keep indentation changes in a separate commit.
@cruessler
Copy link
Contributor Author

@Byron This is a quick status update! The algorithm is now able to generate a blame for the sample repo. I extended the test to compare this blame against git blame’s output. Next, I will review the algorithm in more detail (the loop count, for example, is still hard-coded), then I would start adding more cases, trying to abstract/extract pieces of code on the way.

There’s an exit condition `parent_ids.is_empty()` which is sufficient
for the first test case.
Otherwise, diffs with more than one hunk would yield incorrect results.
This required merging the separate setup section into the loop, in order
to be able to use `continue` when a diff did not contain changes for the
path currently being blamed.
@cruessler
Copy link
Contributor Author

@Byron Sooner than expected, there seems to have been quite a bit of progress! I extended the test repo so that it now also contains commits that don’t touch the file being blamed. I also added a file that contains multiline hunks. There’s now 2 test cases that cover one file each. Also, most of the logic is now inside the loop, so I was able to delete a couple of lines. I was thinking of maybe trying to tackle merge commits next. In its current state, the algorithm only diffs against a single parent. What do you think? (Also, I’ll address your comment above.)

@Byron
Copy link
Owner

Byron commented Aug 23, 2024

That's great news!

In its current state, the algorithm only diffs against a single parent.

That's interesting - I just know that Git usually ignores merge-commits, but I don't know if it includes them in blames nonetheless. I never thought about how that would work, and would probably see what Git would do. It's something I do in general, that is, try to understand what Git can do and ask if this would (eventually) be possible with the gitoxide implementation. When it comes to diffing, I definitely have a lot to learn.

Besides that, I definitely recommend to keep clippy happy and run it locally at least occasionally - ideally CI can stay green even during early development, and I found it worth it as it typically leads to cleaner code. Often I find myself doing a push&run, just to return one more time to quickly fix CI if it breaks so I can continue 'clean' in the morning :D.

Add assertions related to offset
The addition might match subsequent unblamed hunks.
The unchanged lines might match subsequent unblamed hunks.
At this point, this is not strictly necessary as the changes to
`offset_in_destination` are not read anywhere yet after they are changed
in the added branches.
The deletion might match subsequent unblamed hunks.
There might be more changes that are supposed to affect the unblamed
hunk. The deletion might, for example, be followed by another deletion
or by an addition both of which would have an effect on the offset that
needs to be applied to the unblamed hunk.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants