Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

irmin-pack-tool: Add a tool for last indexed accessible commits #2218

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

clecat
Copy link
Contributor

@clecat clecat commented Mar 17, 2023

This PR aims to help restoring a store to a previous state when a corruption occurs.

It uses the index of the store in order to find the latest commit indexed which is available in the suffix files.
It should become useful in the case of a control file being updated, but the suffixes staying behind (i.e. power outage).

There might be some work to do in order to ensure that we do it correctly, but a good direction we discussed @art-w and I for this PR would be to re-generate a working control file which a user could use in order to avoid deleting and reinstalling it's store.
We might however have some issues with data registered in the index and such, but we still wanted to create a draft PR to open the discussion.

One last point is that this PR currently makes several assumptions based on the control file and it's validity, and we might want to create additional checks which do not use it.

@clecat clecat added the no-changelog-needed No changelog is needed here label Mar 17, 2023
Copy link
Member

@metanivek metanivek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this!

@@ -95,5 +96,37 @@ However, it might be useful to sort the json output using the offset. You can do
$ jq -s 'sort_by(.off)' -- index
```

## last-valid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## last-valid
## last-accessible

let main_cmd =
let doc =
"gives the last accessible commit informations stored in the index of an \
upper store"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
upper store"
store"

Arg.(
required
& pos 0 (some string) None
& info [] ~docv:"root folder" ~doc:"the path to the store")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
& info [] ~docv:"root folder" ~doc:"the path to the store")
& info [] ~docv:"root folder" ~doc:"the path to the store root")

```

Note that this tool makes several assumptions about the state of the store:
- The store is an upper store.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The store is an upper store.
- The path given to <path-to-store> is the store's root.

Conceptually, the upper and (optional) lower layer are the store. It's a nit, but perhaps this point is that the path provided in <path-to-store> is the config root, not lower_root. I also make some suggestions below to attempt to make this consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, thx for the clarification

@@ -0,0 +1,8 @@
(executable
(public_name irmin-last_accessible)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think it would look better to not mix dash and underscore 🥺

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, thx

let r = Upper_control.read_payload ~path:control_file in
match r with Error err -> Io_errors.raise_error err | Ok payload -> payload

let get_suffixes_sizes root
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let get_suffixes_sizes root
let get_suffix_size root

Since you're only returning a single size from this function.

let r = ref 0 in
for i = payload.chunk_start_idx to payload.chunk_num do
let suffix = Irmin_pack.Layout.V5.suffix_chunk ~chunk_idx:i ~root in
let stats = Unix.stat suffix in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it matters too much but it might be nice to use the Io.size_of_path function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, thx

let payload = get_payload root_folder in
let sizes = get_suffixes_sizes root_folder payload in
let start_offset, dead_bytes = get_status_infos payload in
let off_max = sizes + start_offset - dead_bytes in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be fully backwards compatible, this needs to take into account the dead header size (see this function) by subtracting it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx, I thought that maybe the From_v1_v2_post_upgrade would need something specific, but I didnt know what and was waiting for tests to tell me what !

let r = ref { hash = "n/a"; off = Int63.zero; len = 0; kind = "n/a" } in
let off_max = Int63.of_int off_max in
let f k v =
let hash = Base64.encode_exn (Index.Key.encode k) in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use something like the following (haven't checked it compiles 😅) to give a string representation (since index's Key.t derives repr) that doesn't use base64.

Suggested change
let hash = Base64.encode_exn (Index.Key.encode k) in
let hash = (Irmin.Type.to_string Index.Key.t) k in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check that thx

@art-w
Copy link
Contributor

art-w commented Aug 1, 2023

Thanks! Do you mind addressing @metanivek comments and running dune build @fmt? I think this PR is an important step towards behind able to repair a broken control file :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-changelog-needed No changelog is needed here
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants