-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
merge: Parameterize merge functions + git merge-file
on marker_size
(with a stubbed lookup)
#101
base: main
Are you sure you want to change the base?
Conversation
@@ -255,6 +267,9 @@ def merge_files( | |||
"--quiet", | |||
# Send results to stdout instead of overwriting "current file". | |||
"--stdout", | |||
# Ensure markers are the expected length to ensure consistent rerere results. | |||
"--marker-size", | |||
str(marker_size), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, I thought there was no way to override attributes.
I'd probably put this in a separate commit, apart from the refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's part of the "parameterize marker size" commit. I think that seems like the right place to me?
Note that this does not override attributes: they weren't and aren't used at all here. This is merging arbitrary temp files and git merge-file
doesn't refer to the repo at all other than to check user preference for diff3 style, which isn't relevant here. (The rerere normalization strips those out, and otherwise we'd want to preserve their preference if the user wants to manually resolve).
The note about using git check-attr
to determine conflict-marker-size
is just so that the rerere
conflict IDs here would be cross-compatible with the ones created and used by vanilla git rebase
/git merge
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah sorry I thought this commit already had a behavior change
@@ -284,7 +302,11 @@ def replay_recorded_resolution( | |||
): | |||
return (b"", None, None) | |||
|
|||
(normalized_preimage, conflict_id) = normalize_conflicted_file(preimage) | |||
(normalized_preimage, conflict_id) = normalize_conflicted_file( | |||
marker_size=marker_size, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if we are supposed to use kwarg-style calls for non-kwargs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow what you mean by non-kwargs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
marker_size is a positional argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm still puzzling over what you mean here. Saying "it's a positional argument" on a line where it's passed by name doesn't make sense. normalize_conflicted_file
does not have opaque *args
: there are no positional-only parameters. There's nothing wrong with giving the names explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry I mixed that up.
I was just wondering if there is a convention of when to pass the name along an argument.
I think I usually pass the name only for arguments that have default values (and kwargs).
However, we use the name when calling Repository.bool_config
for example, this is consistent with your version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to prefer named arguments in general, unless the order is unambiguous (like something like call_with(fn, arg1, arg2)
) or irrelevant (e.g. max(a, b)
). They're particularly helpful during refactoring, since they're both resilient to re-ordering of parameters, their use makes the introduction of default parameters a little easier to check for, and having the names there can make a merge resolution much more obvious. Though, these are also ultimately justifications of habit :)
@@ -28,6 +28,12 @@ | |||
DEFAULT_CONFLICT_MARKER_SIZE = 7 | |||
|
|||
|
|||
def get_conflict_marker_size(__repo: Repository, __file: Path) -> int: | |||
# TODO: Determine on a per-file basis by its `conflict-marker-size` attribute. | |||
# See ll_merge_marker_size in git/ll-merge.c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding your comment about performance concerns: here's a possible way to avoid them:
- In merge_files(), run "git merge-file", without passing a marker size
- Only if the above "git merge-file" failed, ask "git check-attr" for the conflict marker size, and re-run "git merge-file"
In the common case (no conflict) we're as fast as ever.
I often use git-revise for rewrites that cause the same conflicts - that's
where rerere kicks in. However, there are typically very few of those
conflicts, so I don't think that matters. Also that's not the typical use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing an optimistic merge is a good idea to consider. I'd worry a little about being tempted (or tempting others) to look for or save cached resolutions for both marker sizes, which could lead to inconsistency. Probably avoidable, though! Or not a big deal.
I do wonder what the actual performance impact of batch-querying git check-attrs
for each un-encountered entry per tree, e.g. using a persistent process like git cat-file
is using. I suspect that the performance difference between that and re-merging is pretty small. (I was mainly leery of my hamfisted query-each-entry-individually 😄).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably avoidable, though!
Yeah, merge_files() would do two runs of "git merge-file" and if the first failed, callers would only ever see the results of the second.
Using a persistent process for git check-attrs
sounds great, and there are probably other reasons to use that: we can detect attributes like file encoding and whether a file is binary.
@@ -28,6 +28,12 @@ | |||
DEFAULT_CONFLICT_MARKER_SIZE = 7 | |||
|
|||
|
|||
def get_conflict_marker_size(__repo: Repository, __file: Path) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should maybe be a method of Repository (or at least the generic check-attr wrapper should be)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or at least the generic check-attr wrapper should be
Yep, I agree that a general check-attr wrapper thing would benefit from being lifted up, if implemented. It would make sense to allow that to be a persisted process and have those path → attributes cached.
As far as handling the "conflict-marker-size"
attribute in particular, it might benefit from staying standalone. There's some extra validation vs other attributes (turning it into an int, and defaulting to 7 if non-positive), and it's very specific to this area of code dealing with merge-conflict text.
3292c35
to
c393215
Compare
@mystor Just wanted to bump this. This branch has no conflicts with base and has been ready for a while. I've pushed up a no-op rebase in order to keep things fresh. |
c393215
to
068319d
Compare
068319d
to
3f2927c
Compare
3f2927c
to
4359ed9
Compare
NOTE: For convenience, this branch is pre-resolved with https://github.com/rwe/git-revise/tree/rerere-skip-unsized-marker which is PR'd as #95 and would have a minor conflict once merged. The standalone change is available on https://github.com/rwe/git-revise/tree/conflict-marker.
This parameterizes the relevant pieces of
merge.py
onconflict_marker
. The parameterization currently is dummied out to resolve to the default marker size, which is 7. However this refactoring makes that dependency explicit and therefore possible to extend to usegit check-attr
. See discussion here: #95 (comment), which describes a naïve and likely-not-performant example implementation of "real" resolution.