Draft: explore gix APIs, experiment with gix-blame API #1453

Draft
wants to merge 100 commits into
base: main
Changes from 8 commits
Commits
48d6050
Add content to file for blame test
cruessler Jul 13, 2024
7c8ff66
Start exploring gix APIs in gix-blame
cruessler Jul 13, 2024
4fb06e4
Use gix-traverse for graph traversal
cruessler Jul 14, 2024
7031d9e
Get diff between trees
cruessler Jul 14, 2024
2001525
Get diff between two files
cruessler Jul 14, 2024
75ec62a
Start to keep track of lines to blame
cruessler Aug 7, 2024
19710f1
Start to keep track of blamed lines
cruessler Aug 7, 2024
301967c
Add Blame
cruessler Aug 8, 2024
9c6cf0a
Wrap diffing in loop
cruessler Aug 21, 2024
27e21cd
Run loop more than once
cruessler Aug 21, 2024
1cad4df
Record commit ids instead of blob ids
cruessler Aug 21, 2024
1c595bc
Compare result against git blame
cruessler Aug 22, 2024
07dba00
Turn for into loop
cruessler Aug 23, 2024
e931a65
Move new_lines_to_blame out of closure
cruessler Aug 23, 2024
b54ef01
Remove unnecessary code
cruessler Aug 23, 2024
87a7ae5
Assign remaining lines to last suspect before break
cruessler Aug 23, 2024
08838ed
Add comment
cruessler Aug 23, 2024
30f546b
Extract diffing into function
cruessler Aug 23, 2024
f573bef
Rename file in fixture
cruessler Aug 23, 2024
c54dc04
Skip commits that don’t affect file
cruessler Aug 23, 2024
ca37a03
Add first test for multiline hunk blames
cruessler Aug 23, 2024
09b1d23
Fix clippy issues
cruessler Aug 23, 2024
857cbcc
Add first test for history with deleted lines
cruessler Aug 25, 2024
30fbb7d
Fix clippy issues
cruessler Aug 26, 2024
cfe40f5
Add test for more than one unchanged section
cruessler Aug 26, 2024
0f148ac
Add test for changed lines
cruessler Aug 26, 2024
34d7f55
Add test for single changed line between unchanged ones
cruessler Aug 26, 2024
f1482dc
Add missing test setup
cruessler Aug 26, 2024
ed4873d
Add test for lines added before other line
cruessler Aug 27, 2024
06e3405
Extract diffing into DiffRecorder
cruessler Aug 27, 2024
2eb1a16
Split DiffRecorder into ChangeRecorder and process_changes
cruessler Aug 28, 2024
1c35e06
Add test for lines added around other line
cruessler Aug 28, 2024
29c2738
Replace platform-dependent sed by echo
cruessler Aug 29, 2024
20c43cb
Add semicolon recommended by clippy
cruessler Aug 29, 2024
d273131
Annotate type
cruessler Aug 30, 2024
5a10add
Turn if into match
cruessler Aug 30, 2024
9767ddd
Add assert_hunk_valid!
cruessler Aug 30, 2024
f973e43
Extend test for delete line
cruessler Aug 30, 2024
ab69d6b
Add test for switched lines
cruessler Aug 30, 2024
de3f183
Condense empty lines
cruessler Aug 30, 2024
1ddc883
Take worktree_path as argument
cruessler Aug 30, 2024
b9b1214
Simplify tests through macro
cruessler Aug 31, 2024
6608af5
Add first tests for process_changes
cruessler Sep 2, 2024
e4d42fa
Replace PathBuf by Path
cruessler Sep 2, 2024
875e580
Add UnblamedHunk to be able to track offset
cruessler Sep 4, 2024
53fbe0c
Track offset in process_changes
cruessler Sep 5, 2024
bddcfd8
Fix clippy issues
cruessler Sep 5, 2024
70cdb19
Add BlameEntry::new
cruessler Sep 5, 2024
1f524cb
Correctly handle non-inclusive end
cruessler Sep 8, 2024
6ba878b
Add UnblamedHunk::new
cruessler Sep 8, 2024
6aa23a2
Remove obsolete comment
cruessler Sep 8, 2024
82a9aa0
Keep two ranges in UnblamedHunk for clarity
cruessler Sep 8, 2024
e846264
Better separate offset and offset_in_destination
cruessler Sep 8, 2024
3de5028
Better handle offset when no changes left
cruessler Sep 8, 2024
125326e
Better handle offset when no changes left
cruessler Sep 8, 2024
a964579
Add UnblamedHunk::offset
cruessler Sep 9, 2024
125ee47
Add test for change before addition
cruessler Sep 9, 2024
ce6b0c7
Add more test for process_changes
cruessler Sep 9, 2024
cfc0359
More reliably detect group header
cruessler Sep 9, 2024
7ca85e2
Remove unnecessary clone
cruessler Sep 9, 2024
01f747f
Record unchanged hunk at end of file
cruessler Sep 9, 2024
d64fb23
Add test for same line changed twice
cruessler Sep 10, 2024
77e5f03
Take offset into account
cruessler Sep 10, 2024
4b1c509
Add Offset
cruessler Sep 15, 2024
e038dad
Add LineRange
cruessler Sep 15, 2024
42fa847
Add BlameEntry::with_offset
cruessler Sep 15, 2024
c1badf3
Add Offset::Deleted
cruessler Sep 15, 2024
55a19cf
Count line numbers in destination
cruessler Sep 16, 2024
6474729
Take hunks with deletions only into account
cruessler Sep 16, 2024
70d56db
Sort result in test
cruessler Sep 16, 2024
9ce6d35
Replace Into<Range<u32>> by From<LineRange>
cruessler Sep 16, 2024
a2cd71b
Add match arm for unchanged hunks
cruessler Sep 16, 2024
de76eeb
Extract process_change
cruessler Sep 17, 2024
83a6e03
Start adding tests for process_change
cruessler Sep 17, 2024
b850da5
Take hunk offset into account for new hunk
cruessler Sep 17, 2024
e6da874
Fill match arms
cruessler Sep 17, 2024
6b16568
Add more tests for unchanged lines
cruessler Sep 17, 2024
7a7fd0a
Add test for deleted hunk
cruessler Sep 17, 2024
e6103df
Add more tests for added lines
cruessler Sep 17, 2024
3f0de4b
Fix offset of new hunk
cruessler Sep 17, 2024
68e5f17
Fix offset when no overlap
cruessler Sep 17, 2024
0b7cd03
Consume addition when before hunk
cruessler Sep 17, 2024
9aff3e4
Add semicolons recommended by clippy
cruessler Sep 17, 2024
362e7e6
Fix offset of new hunk
cruessler Sep 17, 2024
5909dc1
Fix expectation in test
cruessler Sep 17, 2024
34530fd
Apply offset to chunk after deletion
cruessler Sep 17, 2024
5a8af77
Split hunk that contains deletion
cruessler Sep 17, 2024
060f73d
Split addition related to more than one hunk
cruessler Sep 17, 2024
3a296e5
Rename range to range_in_blamed_file
cruessler Sep 19, 2024
bb16cc1
Add range_in_original_file to BlameEntry
cruessler Sep 19, 2024
1e4191d
Assert baseline length matches result's length
cruessler Sep 19, 2024
381b673
Coalesce adjacent blame entries
cruessler Sep 19, 2024
817b2ce
Add more context to comment
cruessler Sep 19, 2024
b953eaa
Fix added lines overlapping unblamed hunk's start
cruessler Sep 19, 2024
76b047c
Use LineRange::with_offset
cruessler Sep 19, 2024
6290e10
Don't consume addition preceding unblamed hunk
cruessler Sep 20, 2024
a93323a
Don't consume unchanged lines preceding unblamed hunk
cruessler Sep 20, 2024
ca8f9e2
Change offset for changes when there is no hunk
cruessler Sep 20, 2024
025ff2a
Don't consume deletion preceding unblamed hunk
cruessler Sep 20, 2024
d053429
Don't consume unblamed hunk following deletion
cruessler Sep 20, 2024
10 changes: 10 additions & 0 deletions Cargo.lock


10 changes: 10 additions & 0 deletions gix-blame/Cargo.toml
@@ -14,4 +14,14 @@ doctest = false
[dependencies]

[dev-dependencies]
gix-diff = { path = "../gix-diff" }
gix-filter = { path = "../gix-filter" }
gix-fs = { path = "../gix-fs" }
gix-hash = { path = "../gix-hash" }
gix-index = { path = "../gix-index" }
gix-object = { path = "../gix-object" }
gix-odb = { path = "../gix-odb" }
gix-ref = { path = "../gix-ref" }
gix-testtools = { path = "../tests/tools" }
gix-traverse = { path = "../gix-traverse" }
gix-worktree = { path = "../gix-worktree" }
307 changes: 306 additions & 1 deletion gix-blame/tests/blame.rs
@@ -1,4 +1,309 @@
use std::{ops::Range, path::PathBuf, str::FromStr};

use gix_diff::blob::intern::Token;
use gix_hash::ObjectId;
use gix_odb::pack::FindExt;
Owner


You'd want to use the equally named gix_object::FindExt instead; it's easier to use.

use gix_ref::{file::ReferenceExt, store::WriteReflog};

struct Blame {
    _resource_cache: gix_diff::blob::Platform,
}

impl Blame {
    fn new(worktree_root: impl Into<PathBuf>) -> Self {
        let worktree_root: PathBuf = worktree_root.into();
        let git_dir = worktree_root.join(".git");
        let index =
            gix_index::File::at(git_dir.join("index"), gix_hash::Kind::Sha1, false, Default::default()).unwrap();

        let capabilities = gix_fs::Capabilities::probe(&git_dir);
        let stack = gix_worktree::Stack::from_state_and_ignore_case(
            &worktree_root,
            false,
            gix_worktree::stack::State::AttributesAndIgnoreStack {
                attributes: Default::default(),
                ignore: Default::default(),
            },
            &index,
            index.path_backing(),
        );

        let resource_cache = gix_diff::blob::Platform::new(
            Default::default(),
            gix_diff::blob::Pipeline::new(
                gix_diff::blob::pipeline::WorktreeRoots {
                    old_root: None,
                    new_root: None,
                },
                gix_filter::Pipeline::new(Default::default(), Default::default()),
                vec![],
                gix_diff::blob::pipeline::Options {
                    large_file_threshold_bytes: 0,
                    fs: capabilities,
                },
            ),
            gix_diff::blob::pipeline::Mode::ToGit,
            stack,
        );

        Blame {
            _resource_cache: resource_cache,
        }
    }
}

#[test]
fn blame_works() {
    let _blame = Blame::new(fixture_path());
}

#[test]
fn it_works() {
    let _worktree = gix_testtools::scripted_fixture_read_only("make_blame_repo.sh").unwrap();
    // TODO
    // At a high level, what we want to do is the following:
    //
    // - get the commit that belongs to a commit id
    // - walk through parents
    // - for each parent, do a diff and mark lines that don’t have a suspect (this is the term
    //   used in `libgit2`) yet, but that have been changed in this commit
    //
    // The algorithm in `libgit2` works by going through parents and keeping a linked list of blame
    // suspects. It can be visualized as follows:
    //
    // <---------------------------------------->
    // <---------------><----------------------->
    // <---><----------><----------------------->
    // <---><----------><-------><-----><------->
    // <---><---><-----><-------><-----><------->
    // <---><---><-----><-------><-----><-><-><->

    let worktree = fixture_path();

    let store = gix_ref::file::Store::at(
        worktree.join(".git"),
        gix_ref::store::init::Options {
            write_reflog: WriteReflog::Disable,
            ..Default::default()
        },
    );
    let odb = odb_at("");

    let mut reference = gix_ref::file::Store::find(&store, "HEAD").unwrap();
cruessler marked this conversation as resolved.

    let mut buffer = Vec::new();

    let head_id = reference.peel_to_id_in_place(&store, &odb).unwrap();
    let (head, _) = odb.find_commit(&head_id, &mut buffer).unwrap();

    let mut buffer = Vec::new();
    let head_tree_iter = odb
        .find(&head.tree(), &mut buffer)
        .unwrap()
        .0
        .try_into_tree_iter()
        .unwrap();

    let mut traverse = gix_traverse::commit::Simple::new(Some(head_id), &odb);

    traverse.next();

    let iter = traverse.commit_iter();
    let parent_ids = iter.parent_ids().collect::<Vec<_>>();

    let last_parent_id = parent_ids.last().unwrap();

    let mut buffer = Vec::new();

    let (last_parent, _) = odb.find_commit(&last_parent_id, &mut buffer).unwrap();

    let mut buffer = Vec::new();
    let last_parent_tree_iter = odb
        .find(&last_parent.tree(), &mut buffer)
        .unwrap()
        .0
        .try_into_tree_iter()
        .unwrap();

    let mut recorder = gix_diff::tree::Recorder::default();
    let _result = gix_diff::tree::Changes::from(last_parent_tree_iter)
        .needed_to_obtain(head_tree_iter, gix_diff::tree::State::default(), &odb, &mut recorder)
        .unwrap();

    assert!(matches!(
        recorder.records[..],
        [gix_diff::tree::recorder::Change::Modification { .. }]
    ));

    let [ref modification]: [gix_diff::tree::recorder::Change] = recorder.records[..] else {
        todo!()
    };
    let gix_diff::tree::recorder::Change::Modification { previous_oid, oid, .. } = modification else {
        todo!()
    };

    // The following lines are trying to get a line-diff between two commits.
    let git_dir = fixture_path().join(".git");
    let index = gix_index::File::at(git_dir.join("index"), gix_hash::Kind::Sha1, false, Default::default()).unwrap();
    let stack = gix_worktree::Stack::from_state_and_ignore_case(
        worktree.clone(),
        false,
        gix_worktree::stack::State::AttributesAndIgnoreStack {
            attributes: Default::default(),
            ignore: Default::default(),
        },
        &index,
        index.path_backing(),
    );
    let capabilities = gix_fs::Capabilities::probe(&git_dir);
    let mut resource_cache = gix_diff::blob::Platform::new(
        Default::default(),
        gix_diff::blob::Pipeline::new(
            gix_diff::blob::pipeline::WorktreeRoots {
                old_root: None,
                new_root: None,
            },
            gix_filter::Pipeline::new(Default::default(), Default::default()),
            vec![],
            gix_diff::blob::pipeline::Options {
                large_file_threshold_bytes: 0,
                fs: capabilities,
            },
        ),
        gix_diff::blob::pipeline::Mode::ToGit,
        stack,
    );

    resource_cache
        .set_resource(
            *previous_oid,
            gix_object::tree::EntryKind::Blob,
            "file.txt".into(),
            gix_diff::blob::ResourceKind::OldOrSource,
            &odb,
        )
        .unwrap();
    resource_cache
        .set_resource(
            *oid,
            gix_object::tree::EntryKind::Blob,
            "file.txt".into(),
            gix_diff::blob::ResourceKind::NewOrDestination,
            &odb,
        )
        .unwrap();

    let outcome = resource_cache.prepare_diff().unwrap();
    let input = outcome.interned_input();

    assert_eq!(input.before, [Token(0), Token(1), Token(2),]);
    assert_eq!(input.after, [Token(0), Token(1), Token(2), Token(3)]);

    // Assumption: this works because “imara-diff will compute a line diff by default”, so each
    // token represents a line.
    let number_of_lines: u32 = input.after.len().try_into().unwrap();

    assert_eq!(number_of_lines, 4);

    let lines_to_blame: Vec<Range<u32>> = vec![0..number_of_lines];

    assert_eq!(lines_to_blame, vec![0..4]);

    #[derive(Debug, PartialEq)]
    struct BlameEntry {
        range: Range<u32>,
        oid: ObjectId,
    }

    let mut lines_blamed: Vec<BlameEntry> = vec![];

    let mut lines = Vec::new();

    use gix_ref::bstr::ByteSlice;

    // The following lines were inspired by `gix::object::blob::diff::Platform::lines`.
    gix_diff::blob::diff(
        gix_diff::blob::Algorithm::Histogram,
        &input,
        |before: Range<u32>, after: Range<u32>| {
            lines.clear();
            lines.extend(
                input.before[before.start as usize..before.end as usize]
                    .iter()
                    .map(|&line| input.interner[line].as_bstr()),
            );
            let end_of_before = lines.len();
            lines.extend(
                input.after[after.start as usize..after.end as usize]
                    .iter()
                    .map(|&line| input.interner[line].as_bstr()),
            );
            let hunk_before = &lines[..end_of_before];
            let hunk_after = &lines[end_of_before..];
            if hunk_after.is_empty() {
                // Intentionally empty.
            } else if hunk_before.is_empty() {
                assert_eq!(hunk_after, ["line 4\n"]);
            } else {
            }

            let mut new_lines_to_blame: Vec<Range<u32>> = Vec::new();

            for range in &lines_to_blame {
                if range.contains(&after.start) {
                    if range.contains(&after.end) {
                        // <---------->
                        // <--->
                        // <--> <->
                        new_lines_to_blame.push(range.start..after.start);
                        new_lines_to_blame.push((after.end + 1)..range.end);

                        lines_blamed.push(BlameEntry {
                            range: after.clone(),
                            oid: oid.clone(),
                        });
                    } else {
                        // <-------->
                        // <------->
                        // <-->
                        new_lines_to_blame.push(range.start..after.start);

                        lines_blamed.push(BlameEntry {
                            range: after.start..range.end,
                            oid: oid.clone(),
                        });
                    }
                } else {
                    // <------->
                    // <------>
                    // <-->
                    new_lines_to_blame.push((after.end + 1)..range.end);

                    lines_blamed.push(BlameEntry {
                        range: range.start..after.end,
                        oid: oid.clone(),
                    });
                }
            }

            assert_eq!(new_lines_to_blame, vec![0..3]);
            assert_eq!(
                lines_blamed,
                vec![BlameEntry {
                    range: 3..4,
                    oid: ObjectId::from_str("9c2a7090627d0fffa9ed001bf7be98f86c2c8068").unwrap()
                }]
            );
            assert_eq!(lines_blamed, vec![BlameEntry { range: 3..4, oid: *oid }]);
        },
    );

    assert_eq!(lines, ["line 4\n"]);
}
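
The closure in `it_works` splits each unblamed range around the changed hunk `after`. Both are half-open `Range<u32>` values, so `after.end` is already exclusive. As a minimal std-only sketch (not the code in this PR), the split could be expressed without the `+ 1` adjustment. The names `Split` and `split_unblamed` are hypothetical and exist only for this illustration.

```rust
use std::ops::Range;

/// Hypothetical result type for this sketch: the part of an unblamed range
/// that the changed hunk covers, plus whatever is left unblamed around it.
#[derive(Debug, PartialEq)]
struct Split {
    blamed: Option<Range<u32>>,
    still_unblamed: Vec<Range<u32>>,
}

/// Splits the half-open range `unblamed` around the half-open range `changed`.
/// Hypothetical helper; it only handles a single range and a single hunk.
fn split_unblamed(unblamed: &Range<u32>, changed: &Range<u32>) -> Split {
    // Clamp the changed hunk to the unblamed range.
    let start = changed.start.max(unblamed.start);
    let end = changed.end.min(unblamed.end);

    if start >= end {
        // No overlap: nothing is blamed, the whole range stays unblamed.
        return Split {
            blamed: None,
            still_unblamed: vec![unblamed.clone()],
        };
    }

    // Because `end` is exclusive, the remainder starts at `end`, not `end + 1`.
    let still_unblamed = [unblamed.start..start, end..unblamed.end]
        .into_iter()
        .filter(|range| !range.is_empty())
        .collect();

    Split {
        blamed: Some(start..end),
        still_unblamed,
    }
}
```

With the fixture's values, `split_unblamed(&(0..4), &(3..4))` blames `3..4` and leaves `0..3` unblamed, which matches the assertions in the test above.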

fn odb_at(name: &str) -> gix_odb::Handle {
    gix_odb::at(fixture_path().join(name).join(".git/objects")).unwrap()
}

fn fixture_path() -> PathBuf {
    gix_testtools::scripted_fixture_read_only("make_blame_repo.sh").unwrap()
}
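
The high-level plan in the `it_works` comment (walk through parents, diff, and assign a suspect to every line that changed) can be tried out on a toy model before wiring in the gix crates. The following is a rough sketch under loud assumptions: history is linear, each commit is represented by the full file content, and a small LCS-based line mapping stands in for the real diff (it is not `gix-diff` or `imara-diff`). `Version`, `map_lines` and `blame` are hypothetical names.

```rust
/// Hypothetical stand-in for a commit: an id plus the file's lines at that commit.
struct Version {
    id: &'static str,
    lines: Vec<&'static str>,
}

/// For each line in `new`, returns the index of the matching line in `old`,
/// or `None` if the line was added. Uses a plain LCS table (assumption: small inputs).
fn map_lines(old: &[&str], new: &[&str]) -> Vec<Option<usize>> {
    let (n, m) = (old.len(), new.len());
    // lcs[i][j] is the LCS length of old[i..] and new[j..].
    let mut lcs = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            lcs[i][j] = if old[i] == new[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }

    let mut mapping: Vec<Option<usize>> = vec![None; m];
    let (mut i, mut j) = (0, 0);
    while i < n && j < m {
        if old[i] == new[j] {
            mapping[j] = Some(i);
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            i += 1;
        } else {
            j += 1;
        }
    }
    mapping
}

/// Blames every line of the newest version. `history` is ordered newest first
/// and must not be empty.
fn blame(history: &[Version]) -> Vec<&'static str> {
    let newest = &history[0];
    let mut result: Vec<Option<&'static str>> = vec![None; newest.lines.len()];
    // Unblamed lines as (index in newest version, index in the version being inspected).
    let mut unblamed: Vec<(usize, usize)> = (0..newest.lines.len()).map(|i| (i, i)).collect();

    for pair in history.windows(2) {
        let (commit, parent) = (&pair[0], &pair[1]);
        let mapping = map_lines(&parent.lines, &commit.lines);
        let mut passed_to_parent = Vec::new();
        for (final_index, index) in unblamed {
            match mapping[index] {
                // Unchanged line: the parent becomes the new suspect.
                Some(parent_index) => passed_to_parent.push((final_index, parent_index)),
                // Line introduced by this commit: blame it here.
                None => result[final_index] = Some(commit.id),
            }
        }
        unblamed = passed_to_parent;
    }

    // Whatever is left belongs to the oldest commit in the history.
    let oldest = history.last().expect("history must not be empty");
    for (final_index, _) in unblamed {
        result[final_index] = Some(oldest.id);
    }
    result.into_iter().map(|id| id.expect("every line was blamed")).collect()
}
```

Feeding it four versions shaped like the fixture (each commit appends one line) attributes line `n` to commit `cn`. Merges, renames and large files are outside what this sketch models.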
27 changes: 14 additions & 13 deletions gix-blame/tests/fixtures/make_blame_repo.sh
@@ -6,18 +6,19 @@ git init -q
git config merge.ff false

git checkout -q -b main
git commit -q --allow-empty -m c1
git tag at-c1
git commit -q --allow-empty -m c2
git commit -q --allow-empty -m c3
git commit -q --allow-empty -m c4

git checkout -q -b branch1
git commit -q --allow-empty -m b1c1
git tag at-b1c1
git commit -q --allow-empty -m b1c2
echo "line 1" >> file.txt
git add file.txt
git commit -q -m c1

git checkout -q main
git commit -q --allow-empty -m c5
git tag at-c5
git merge branch1 -m m1b1
echo "line 2" >> file.txt
Owner


I'd do the same - start extremely simple, maybe do the first rough implementation so it passes this test, and then think of some tougher cases to throw at it, validating that they still come out right.

Maybe it's worth investing in a baseline test which parses the output of git blame to get the expected output, which then has to be matched by the algorithm. These typically come as a second stage because they are more obscure, but they make it easier to get correct expectations for a lot of possibly complex inputs.

Contributor Author


Thanks for the input, sounds like a great plan!

git add file.txt
git commit -q -m c2

echo "line 3" >> file.txt
git add file.txt
git commit -q -m c3

echo "line 4" >> file.txt
git add file.txt
git commit -q -m c4
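
The baseline test suggested in the review thread above needs a way to turn `git blame` output into expected entries. One possible starting point, sketched here with std only, reads the header lines of `git blame --porcelain`, which begin with the commit hash followed by the line number in the original file and the line number in the final file. `BaselineLine` and `parse_porcelain` are hypothetical names, and the sketch assumes 40-character SHA-1 hashes.

```rust
/// Hypothetical record for one blamed line taken from the baseline.
#[derive(Debug, PartialEq)]
struct BaselineLine {
    commit_id: String,
    line_in_original: u32,
    line_in_final: u32,
}

/// Extracts one entry per header line from `git blame --porcelain` output.
/// Assumption: hashes are 40 hex characters (SHA-1); everything else is skipped.
fn parse_porcelain(output: &str) -> Vec<BaselineLine> {
    let mut entries = Vec::new();

    for line in output.lines() {
        // File content is prefixed with a tab and never a header.
        if line.starts_with('\t') {
            continue;
        }

        let mut fields = line.split(' ');
        let Some(id) = fields.next() else { continue };
        let is_hash = id.len() == 40 && id.bytes().all(|b| b.is_ascii_hexdigit());
        if !is_hash {
            // Metadata such as `author ...` or `filename ...`.
            continue;
        }

        let original = fields.next().and_then(|f| f.parse::<u32>().ok());
        let final_line = fields.next().and_then(|f| f.parse::<u32>().ok());
        let (Some(line_in_original), Some(line_in_final)) = (original, final_line) else {
            continue;
        };

        // A fourth field (the number of lines in the group) may follow on the
        // first header of a group; this sketch ignores it.
        entries.push(BaselineLine {
            commit_id: id.to_string(),
            line_in_original,
            line_in_final,
        });
    }

    entries
}
```

The resulting list could then be coalesced into ranges and compared against the algorithm's output. Line numbers in this format are 1-based, so they would need to be shifted before being compared with 0-based ranges such as `3..4`.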