
[Feature Request]: Improve [empty] block indexing by using an alternative to symlinks with absolute paths #3853

Closed
easy2stake opened this issue Oct 16, 2024 · 6 comments · Fixed by #3861
Labels
enhancement New feature or request external Issues created by non node team members


@easy2stake

easy2stake commented Oct 16, 2024

Hi,

Semantic version: v0.18.2-arabica
Commit: 4309c8349857638b033a2a278da0f8ab182fdb26
Build Date: Tue Oct  8 01:11:02 PM EEST 2024
System version: amd64/linux
Golang version: go1.23.2

The "issue":

Right now, on Celestia Mocha, there are 634247 links inside the blocks/heights folder, all pointing to:
/home/celestia-bridge-t-shwap/.celestia-bridge-mocha-4/blocks/3D96B7D238E7E0456F6AF8E7CDF0A67BD6CF9C2089ECB559C659DCAA1F880353.ods

The above is the absolute path on my filesystem.
That is why I'm filing this as a feature request: I don't consider it a bug, but the fact that the path is absolute is problematic.

Why is this problematic:
The links above all use absolute paths to the block hash 3D96B7D238E7E0456F6AF8E7CDF0A67BD6CF9C2089ECB559C659DCAA1F880353.ods, which is a "special empty block" according to @Wondertan.

For node runners who operate archival bridge nodes, these links:

  • limit our ability to copy the database from one machine to another
  • make backups and snapshots complicated to restore from
  • trip up some cloning tools, such as rclone, which don't fully support symlinks when copying files
  • make it harder to minimize downtime if, for some reason, you have to restore
  • prevent you from even moving the datadir to another disk on the same machine, such as a bigger partition once the data grows large enough

Solution:
The first thing that crossed my mind was: why not use a KV DB for this?
But that seems like overkill, since this behavior only occurs when empty blocks are created, and we don't expect many empty blocks on mainnet.

Then another idea:
Why not keep the same symlinks but make them relative to the directory structure? The current structure:

.celestia-node-datadir/blocks/3D96B7D238E7E0456F6AF8E7CDF0A67BD6CF9C2089ECB559C659DCAA1F880353.ods
.celestia-node-datadir/blocks/heights/here-are-the-symlinks

Therefore, each symlink could point to ../3D96B7D238E7E0456F6AF8E7CDF0A67BD6CF9C2089ECB559C659DCAA1F880353.ods.
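A minimal sketch of the proposal, in Python for illustration only (it is not celestia-node code, and it assumes a POSIX filesystem). The relative target is computed from the link's own directory, so a link in blocks/heights ends up pointing at ../<file>.ods, and it keeps resolving after the whole datadir is moved. The file name EMPTY.ods and the link name heights/1 are hypothetical placeholders for the empty-block file and a height entry.

```python
import os
import shutil
import tempfile


def make_relative_symlink(target: str, link_path: str) -> str:
    """Create link_path as a symlink to target, stored relative to the link's own directory."""
    rel = os.path.relpath(target, start=os.path.dirname(link_path))
    os.symlink(rel, link_path)
    return rel


# Build a throwaway layout shaped like the one above; names are placeholders.
root = tempfile.mkdtemp()
blocks = os.path.join(root, "blocks")
heights = os.path.join(blocks, "heights")
os.makedirs(heights)
empty_ods = os.path.join(blocks, "EMPTY.ods")  # hypothetical stand-in for the empty-block .ods file
open(empty_ods, "wb").close()

rel = make_relative_symlink(empty_ods, os.path.join(heights, "1"))
print(rel)  # ../EMPTY.ods

# Move the whole datadir: the relative link still resolves at the new location.
moved = root + "-moved"
shutil.move(root, moved)
print(os.path.exists(os.path.join(moved, "blocks", "heights", "1")))  # True
```

An absolute link created the same way would dangle after the move, which is exactly the portability problem described above.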

I'm not sure whether this could have other implications, but I hope we can discuss the subject further.
The database should be fully portable: we should be able to move it from X to Y instead of it being tied to the folder where it was first initialized.

@easy2stake easy2stake added the enhancement New feature or request label Oct 16, 2024
@github-actions github-actions bot added the external Issues created by non node team members label Oct 16, 2024
@Wondertan
Member

Wondertan commented Oct 16, 2024

So, you're saying the root problem is that the symlinks use absolute paths rather than relative ones? Should be an easy fix.

@easy2stake
Author

easy2stake commented Oct 17, 2024

Correct !
Moving from absolute to relative paths will be super useful for all operators!

LE:
I think this should also be done retroactively.
When the node first starts, it should convert all absolute paths to relative paths, and record some sort of flag so that, once the conversion has been performed, it isn't attempted again on the next restart.

It should also log a message saying that converting all these links may take a while.
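The one-time conversion described above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the node's implementation or the migration script linked later in this thread: the "already done" flag is a hypothetical marker file, the .migrating suffix for temporary links is a hypothetical name, and POSIX rename semantics are assumed so that each link is swapped atomically.

```python
import logging
import os
import tempfile

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("symlink-migration")

TMP_SUFFIX = ".migrating"  # hypothetical suffix for temporary links


def migrate_symlinks(heights_dir: str, marker_path: str) -> int:
    """Rewrite absolute symlinks in heights_dir as relative ones, exactly once.

    Returns how many links were rewritten; 0 if marker_path shows it already ran.
    """
    if os.path.exists(marker_path):
        return 0
    log.info("converting absolute symlinks to relative paths; this may take a while")
    # Snapshot the entries first so temporary links created below are never iterated.
    with os.scandir(heights_dir) as it:
        links = [e.path for e in it if e.is_symlink() and not e.name.endswith(TMP_SUFFIX)]
    converted = 0
    for path in links:
        target = os.readlink(path)
        if not os.path.isabs(target):
            continue  # already relative
        rel = os.path.relpath(target, start=os.path.dirname(path))
        tmp = path + TMP_SUFFIX
        if os.path.lexists(tmp):
            os.remove(tmp)  # leftover from an interrupted run
        os.symlink(rel, tmp)
        os.replace(tmp, path)  # rename over the old link (atomic on POSIX)
        converted += 1
    open(marker_path, "w").close()  # the "already done" flag
    log.info("converted %d symlinks", converted)
    return converted


# Usage: two absolute links are migrated once; the second call is a no-op.
blocks = tempfile.mkdtemp()
heights = os.path.join(blocks, "heights")
os.mkdir(heights)
empty_ods = os.path.join(blocks, "EMPTY.ods")  # hypothetical empty-block file
open(empty_ods, "wb").close()
for h in ("1", "2"):
    os.symlink(empty_ods, os.path.join(heights, h))

marker = os.path.join(blocks, ".symlinks-relative")  # hypothetical marker name
first = migrate_symlinks(heights, marker)
second = migrate_symlinks(heights, marker)
```

Writing the marker only after every link is processed means an interrupted run is simply retried on the next start instead of being skipped.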


Ideally there should be no symlinks at all, because backup tools support them only partially.
If you don't know there are symlinks inside the DB, your backup may fail. If you do know, then you can take a proper backup.
Examples:

  • rsync: you need to pass the -l option for proper directory synchronization
  • rclone: fails unless -l is passed, and even then it only works for local-to-local copies. With remotes it fails to copy your link files no matter what you try (this was my situation recently)
  • tar: includes symlinks by default, and if the symlinks are relative, they still work after the archive is restored somewhere else
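To illustrate the tar point, here is a small Python round trip using the standard tarfile module (an illustration, not the tar CLI). tarfile stores symlinks as links by default, so a relative link restored into a completely different directory still resolves. The directory layout and EMPTY.ods are hypothetical placeholders. Where available, the sketch also uses tarfile's "data" extraction filter (Python 3.12+, backported to some earlier patch releases), which refuses links with absolute targets, so only relative links would survive that path at all.

```python
import os
import tarfile
import tempfile


def archive_and_restore(src_dir: str, dest_parent: str) -> str:
    """Tar src_dir, extract it under dest_parent, and return the restored path."""
    archive = os.path.join(tempfile.mkdtemp(), "datadir.tar")
    with tarfile.open(archive, "w") as tar:
        tar.add(src_dir, arcname="datadir")  # dereference=False: links stored as links
    # Use the "data" filter when this Python version provides it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive) as tar:
        tar.extractall(dest_parent, **extra)
    return os.path.join(dest_parent, "datadir")


# Usage: a datadir with one relative link, restored somewhere else entirely.
src = tempfile.mkdtemp()
os.makedirs(os.path.join(src, "blocks", "heights"))
open(os.path.join(src, "blocks", "EMPTY.ods"), "wb").close()
os.symlink(os.path.join("..", "EMPTY.ods"), os.path.join(src, "blocks", "heights", "1"))

restored = archive_and_restore(src, tempfile.mkdtemp())
link = os.path.join(restored, "blocks", "heights", "1")
print(os.path.islink(link), os.path.exists(link))  # True True
```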

@Wondertan
Member

We might provide automatic conversion from absolute to relative paths, depending on how hard it is. This is not a final release and we don't commit to full backward compatibility, so bugs like this are expected and may require a full resync.


If passing the -l option makes this work, then this issue can be solved via documentation rather than by migrating off symlinks. The symlinks are actually a hack around hardlink limits in ext4.
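For context on that hack: ext4 caps how many hard links a single inode can have (65,000), and link() fails with EMLINK past that point. The thread does not show the node's actual linking code, so the Python sketch below only assumes one plausible shape for it: try a hard link first and fall back to a relative symlink when the inode is full. The function name, the injectable link_fn parameter, and the full_inode test double are hypothetical, used here only to simulate the limit without creating 65,000 links.

```python
import errno
import os
import tempfile


def link_or_symlink(target: str, link_path: str, link_fn=os.link) -> str:
    """Hard-link link_path to target; if the inode is out of links (EMLINK), use a relative symlink."""
    try:
        link_fn(target, link_path)
        return "hardlink"
    except OSError as exc:
        if exc.errno != errno.EMLINK:
            raise  # any other failure is a real error
    rel = os.path.relpath(target, start=os.path.dirname(link_path))
    os.symlink(rel, link_path)
    return "symlink"


def full_inode(src, dst):
    """Test double that behaves like an inode already at ext4's link limit."""
    raise OSError(errno.EMLINK, os.strerror(errno.EMLINK), dst)


blocks = tempfile.mkdtemp()
heights = os.path.join(blocks, "heights")
os.mkdir(heights)
empty_ods = os.path.join(blocks, "EMPTY.ods")  # hypothetical empty-block file
open(empty_ods, "wb").close()

kind_a = link_or_symlink(empty_ods, os.path.join(heights, "1"))
kind_b = link_or_symlink(empty_ods, os.path.join(heights, "2"), full_inode)
```

Hard links carry no path at all, so they are unaffected by moving the datadir within one filesystem; the symlink fallback is where the absolute-versus-relative choice matters.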

@walldiss
Member

Thank you for the issue! We will use relative paths in symlinks starting from the next version.
The migration tool will not be included in the node itself; instead, there will be a script that can be run on demand. I've made a small script to perform the migration if needed. It will also be linked in the release notes.
Links migration script

@easy2stake
Author

I used a similar script to migrate my symlinks, so that would work.
Relative paths plus a migration script mentioned in the documentation/release notes: thanks a lot for considering this, it will be really helpful for ops.

As for backups, I agree it's worth mentioning the -l flag for rsync as well, for awareness. Then people will know what to do.

@easy2stake
Author

easy2stake commented Oct 18, 2024

I tested your script against my DB. It takes ~30 minutes to complete and works fine: 634325 entries updated.
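After a migration like this, a quick audit can confirm that no absolute links remain and that every link still resolves. The Python sketch below is a hypothetical helper, not part of celestia-node or the linked script; it only inspects the symlinks directly inside one directory, and the names used in the usage example are placeholders.

```python
import os
import tempfile


def audit_links(heights_dir: str):
    """Return (total, absolute, dangling) for the symlinks directly inside heights_dir."""
    total, absolute, dangling = 0, [], []
    with os.scandir(heights_dir) as it:
        for entry in it:
            if not entry.is_symlink():
                continue
            total += 1
            if os.path.isabs(os.readlink(entry.path)):
                absolute.append(entry.name)
            if not os.path.exists(entry.path):  # exists() follows the link
                dangling.append(entry.name)
    return total, sorted(absolute), sorted(dangling)


blocks = tempfile.mkdtemp()
heights = os.path.join(blocks, "heights")
os.mkdir(heights)
empty_ods = os.path.join(blocks, "EMPTY.ods")  # hypothetical empty-block file
open(empty_ods, "wb").close()
os.symlink(os.path.join("..", "EMPTY.ods"), os.path.join(heights, "1"))    # migrated
os.symlink(empty_ods, os.path.join(heights, "2"))                          # still absolute
os.symlink(os.path.join("..", "MISSING.ods"), os.path.join(heights, "3"))  # dangling

print(audit_links(heights))  # (3, ['2'], ['3'])
```

On a fully migrated directory, the expected result is the link count with two empty lists.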


3 participants