REQUEST: provide a bool arg to `get_chunks` that doesn't cache the data #14

jason-green-io · 2024-04-01T19:04:24Z

I'l like to iterate over all chunks in a world, but since the Regions hold on to the data, my RAM goes poof. Is there a way to "uncache" the data as I go along?

The text was updated successfully, but these errors were encountered:

MestreLion · 2024-04-09T02:30:37Z

Hum, that's a very good point! Yes, it can eat a lot of RAM if you load all chunks of a big world.

To implement that bool arg (cache?) a draft roadmap could be this:

First implement a Regions.uncache((rx, rz)) method, so clients can at least manually uncache a region when they're done with the data. It would merely replace the value in its dictionary from a (loaded) RegionFile back to its file Path.
Note this wouldn't free RAM if the client still have references to the RegionFile instances. It'd be up to them to also delete their references and let Python's GC take care of the rest. But at least the library itself wouldn't have any references of its own.
That's why it may be preferred to call it uncache() instead of a more traditional unload(), as I'm actually not unloading anything.
Optionally, if RegionFile instances have a reference to their parent Regions and their indexes in it (I don't remember if they do), then they could also have a RegionFile.uncache() method as a convenience. I'm not sure on that as it is a bit misleading, as it won't actually unload itself, it would just call the above and un-reference itself from Regions.
Then World.get_chunks() and friends (not sure if I implemented a Regions.get_chunks()) could have that cache arg, which defaults to True to keep current behavior.
It just came to me that Regions might not have an "easy" way to know the file paths. I don't remember who does the initial filesystem scan for the available region files, if Regions or World. So there might have additional steps and shuffling methods around to make Regions.uncache happen. Not hard, but might be not trivial.

In any case, as much as I endorse this feature, I'm currently too busy to come back to this project and implement this. But I'd welcome any PR that does, it could use the ideas I've outlined above as a guidance.

MestreLion · 2024-04-09T02:38:18Z

Improvement: regardless of who does the initial filesystem scan, Regions.uncache((rx, rz)) can derive the file path from its arguments, and it should check if the file (still) exists: if it does then replace the value with Path; if not then delete self[rx, rz] altogether.

NielsPichon · 2024-05-29T06:18:51Z

It just came to me that Regions might not have an "easy" way to know the file paths. I don't remember who does the initial filesystem scan for the available region files, if Regions or World.

My understanding is this is done by the world when loading the dimensions and the loading is handled here. upon calling regions[pos].

So a naive implementation of uncache could be something along the lines of:

# in Regions
def uncache(self, pos):
    # replace reference to RegionFile with that of the corresponding RegionPos
    self[pos] = self[pos].pos

If we also want to check the underlying file still exists, we can add the following in the method:

   # with `recursive` as boolean kwarg to the method, defaults to False

    path = os.path.join(self.world.path, self.dimension.subfolder(), self.category)
    candidates = list(pathlib.Path(path).glob(f"{'**/' if recursive else ''}r.{pos.filepart}.mca"))
    if len(candidates) == 0:
        del self[pos]

I am not entirely sure this last part is needed though, at least as long as the user manually calls uncache (which means they are responsible ans should not what they are doing IMO)

Optionally, if RegionFile instances have a reference to their parent Regions and their indexes in it (I don't remember if they do), then they could also have a RegionFile.uncache() method as a convenience. I'm not sure on that as it is a bit misleading, as it won't actually unload itself, it would just call the above and un-reference itself from Regions.

They do so we could have an uncache method there too:

# in RegionFile
def uncache(self):
    self.regions.uncache(self.pos)

MestreLion · 2024-09-29T06:51:31Z

Partially fixed by #16. Thanks to @NielsPichon now it's trivial to add the requested cache arg. As long as it's optional and defaults to False for backward-compatibility (at least for now), I'm fine with any PR implementing this

MestreLion added good first issue Good for newcomers enhancement New feature or request labels Apr 9, 2024

MestreLion added the help wanted Extra attention is needed label Apr 17, 2024

MestreLion mentioned this issue May 28, 2024

Memory grows out of hand #15

Closed

NielsPichon mentioned this issue May 29, 2024

Add the ability to uncache a loaded region #16

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

REQUEST: provide a bool arg to `get_chunks` that doesn't cache the data #14

REQUEST: provide a bool arg to `get_chunks` that doesn't cache the data #14

jason-green-io commented Apr 1, 2024

MestreLion commented Apr 9, 2024

MestreLion commented Apr 9, 2024

NielsPichon commented May 29, 2024 •

edited

Loading

MestreLion commented Sep 29, 2024

REQUEST: provide a bool arg to get_chunks that doesn't cache the data #14

REQUEST: provide a bool arg to get_chunks that doesn't cache the data #14

Comments

jason-green-io commented Apr 1, 2024

MestreLion commented Apr 9, 2024

MestreLion commented Apr 9, 2024

NielsPichon commented May 29, 2024 • edited Loading

MestreLion commented Sep 29, 2024

REQUEST: provide a bool arg to `get_chunks` that doesn't cache the data #14

REQUEST: provide a bool arg to `get_chunks` that doesn't cache the data #14

NielsPichon commented May 29, 2024 •

edited

Loading