Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REQUEST: provide a bool arg to get_chunks that doesn't cache the data #14

Open
jason-green-io opened this issue Apr 1, 2024 · 4 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@jason-green-io
Copy link

I'l like to iterate over all chunks in a world, but since the Regions hold on to the data, my RAM goes poof. Is there a way to "uncache" the data as I go along?

@MestreLion MestreLion added good first issue Good for newcomers enhancement New feature or request labels Apr 9, 2024
@MestreLion
Copy link
Owner

Hum, that's a very good point! Yes, it can eat a lot of RAM if you load all chunks of a big world.

To implement that bool arg (cache?) a draft roadmap could be this:

  • First implement a Regions.uncache((rx, rz)) method, so clients can at least manually uncache a region when they're done with the data. It would merely replace the value in its dictionary from a (loaded) RegionFile back to its file Path.
  • Note this wouldn't free RAM if the client still have references to the RegionFile instances. It'd be up to them to also delete their references and let Python's GC take care of the rest. But at least the library itself wouldn't have any references of its own.
  • That's why it may be preferred to call it uncache() instead of a more traditional unload(), as I'm actually not unloading anything.
  • Optionally, if RegionFile instances have a reference to their parent Regions and their indexes in it (I don't remember if they do), then they could also have a RegionFile.uncache() method as a convenience. I'm not sure on that as it is a bit misleading, as it won't actually unload itself, it would just call the above and un-reference itself from Regions.
  • Then World.get_chunks() and friends (not sure if I implemented a Regions.get_chunks()) could have that cache arg, which defaults to True to keep current behavior.
  • It just came to me that Regions might not have an "easy" way to know the file paths. I don't remember who does the initial filesystem scan for the available region files, if Regions or World. So there might have additional steps and shuffling methods around to make Regions.uncache happen. Not hard, but might be not trivial.

In any case, as much as I endorse this feature, I'm currently too busy to come back to this project and implement this. But I'd welcome any PR that does, it could use the ideas I've outlined above as a guidance.

@MestreLion
Copy link
Owner

Improvement: regardless of who does the initial filesystem scan, Regions.uncache((rx, rz)) can derive the file path from its arguments, and it should check if the file (still) exists: if it does then replace the value with Path; if not then delete self[rx, rz] altogether.

@MestreLion MestreLion added the help wanted Extra attention is needed label Apr 17, 2024
@NielsPichon
Copy link

NielsPichon commented May 29, 2024

It just came to me that Regions might not have an "easy" way to know the file paths. I don't remember who does the initial filesystem scan for the available region files, if Regions or World.

My understanding is this is done by the world when loading the dimensions and the loading is handled here. upon calling regions[pos].

So a naive implementation of uncache could be something along the lines of:

# in Regions
def uncache(self, pos):
    # replace reference to RegionFile with that of the corresponding RegionPos
    self[pos] = self[pos].pos

If we also want to check the underlying file still exists, we can add the following in the method:

   # with `recursive` as boolean kwarg to the method, defaults to False

    path = os.path.join(self.world.path, self.dimension.subfolder(), self.category)
    candidates = list(pathlib.Path(path).glob(f"{'**/' if recursive else ''}r.{pos.filepart}.mca"))
    if len(candidates) == 0:
        del self[pos]

I am not entirely sure this last part is needed though, at least as long as the user manually calls uncache (which means they are responsible ans should not what they are doing IMO)

Optionally, if RegionFile instances have a reference to their parent Regions and their indexes in it (I don't remember if they do), then they could also have a RegionFile.uncache() method as a convenience. I'm not sure on that as it is a bit misleading, as it won't actually unload itself, it would just call the above and un-reference itself from Regions.

They do so we could have an uncache method there too:

# in RegionFile
def uncache(self):
    self.regions.uncache(self.pos)

@MestreLion
Copy link
Owner

Partially fixed by #16. Thanks to @NielsPichon now it's trivial to add the requested cache arg. As long as it's optional and defaults to False for backward-compatibility (at least for now), I'm fine with any PR implementing this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants