Load FOF catalogues & Cosmology Metadata Changes #190

Merged: 39 commits, Sep 19, 2024
The file diff shown further down covers changes from 16 commits.

Commits (39):
143d65f Make reader more generic (robjmcgibbon, Jun 18, 2024)
3d0187a Load SOAP catalogues (robjmcgibbon, Jun 20, 2024)
9ca0d29 Read all SOAP dataset attributes (robjmcgibbon, Jun 21, 2024)
4ac1cba Formatter (robjmcgibbon, Jun 21, 2024)
8b94a0b Add new SWIFT attributes (robjmcgibbon, Jul 16, 2024)
0e6c14c Squash bug registering field names (robjmcgibbon, Aug 2, 2024)
186968a Add swiftsimio.metadata.soap to module list. (kyleaoman, Aug 20, 2024)
8b8937f Bugfixes following changes in particle_types.py (kyleaoman, Aug 21, 2024)
43fe3b3 Add missing visualisation.smoothing_length to pyproject.toml. (kyleaoman, Aug 21, 2024)
3292d56 Implement an option to mask a SOAP catalogue down to a single row. (kyleaoman, Aug 27, 2024)
90f9e4c Merge branch 'master' into load_fof_catalogues (robjmcgibbon, Sep 2, 2024)
6a1675b Fix some tests (robjmcgibbon, Sep 3, 2024)
7a48a7d Comment out failing test (robjmcgibbon, Sep 3, 2024)
b4d7649 Format (robjmcgibbon, Sep 3, 2024)
2ac42bf Add InvalidConversionError (robjmcgibbon, Sep 12, 2024)
64b8d94 Format (robjmcgibbon, Sep 12, 2024)
695f7e1 Added refactoring (JBorrow, Sep 18, 2024)
6b26291 Added test loading a SOAP catalogue (JBorrow, Sep 18, 2024)
988dc28 Nice names (robjmcgibbon, Sep 18, 2024)
276f4fe Packaging changes (JBorrow, Sep 18, 2024)
828d649 Added optional named column metadata (JBorrow, Sep 18, 2024)
aca5048 Add shared cell counts (robjmcgibbon, Sep 18, 2024)
e6b4a62 Formatting and cleanup (JBorrow, Sep 19, 2024)
d8ccdae Merge pull request #195 from SWIFTSIM/load_fof_catalogues_refactor (JBorrow, Sep 19, 2024)
c7cd6b0 Drop 3.8 and 3.9 (JBorrow, Sep 19, 2024)
c325047 Merge branch 'load_fof_catalogues' of https://github.com/swiftsim/swi… (JBorrow, Sep 19, 2024)
77e361c Fix broken test and remove unused writer object (JBorrow, Sep 19, 2024)
ba2491e Add `Writer` back (JBorrow, Sep 19, 2024)
a472d71 Use uv for package installation (JBorrow, Sep 19, 2024)
3188823 Use system? (JBorrow, Sep 19, 2024)
fab3d14 Formatting (JBorrow, Sep 19, 2024)
155e626 where does system go??? (JBorrow, Sep 19, 2024)
44e22e8 Remove pip caching (JBorrow, Sep 19, 2024)
68c0382 Go back to pip... (JBorrow, Sep 19, 2024)
66ea952 Merge branch 'master' into load_fof_catalogues (JBorrow, Sep 19, 2024)
d184d0b Fix incorrect import now that particle datasets have been renamed (JBorrow, Sep 19, 2024)
43582fc Fix volume render test (JBorrow, Sep 19, 2024)
2dd0eb8 Resolve warnings when accessing internal numpy objects (JBorrow, Sep 19, 2024)
45e2b8f Added SOAP docs (JBorrow, Sep 19, 2024)
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -16,7 +16,8 @@ packages = [
     "swiftsimio.visualisation.projection_backends",
     "swiftsimio.visualisation.slice_backends",
     "swiftsimio.visualisation.tools",
-    "swiftsimio.visualisation.smoothing_length"
+    "swiftsimio.visualisation.smoothing_length",
+    "swiftsimio.metadata.soap"
 ]

 [project]
12 changes: 9 additions & 3 deletions swiftsimio/__init__.py
@@ -1,5 +1,5 @@
 from .reader import *
-from .writer import SWIFTWriterDataset
+from .snapshot_writer import SWIFTSnapshotWriter
 from .masks import SWIFTMask
 from .statistics import SWIFTStatisticsFile
 from .__version__ import __version__
@@ -109,5 +109,11 @@ def load_statistics(filename) -> SWIFTStatisticsFile:
     return SWIFTStatisticsFile(filename=filename)


-# Rename this object to something simpler.
-Writer = SWIFTWriterDataset
+class Writer:
+    def __new__(cls, *args, **kwargs):
+        # Default to SWIFTSnapshotWriter if no filetype is passed
+        filetype = kwargs.get("filetype", "snapshot")
+        if filetype == "snapshot":
+            return SWIFTSnapshotWriter(*args, **kwargs)
+        # TODO implement other writers
+        # elif filetype == '
Review comment (Member):

Is it likely that anyone will want to write their own catalogue files with swiftsimio? Ideally we would just leave this feature alone until we are ready to make that change.

Given that people have to choose their 'file type' anyway, why don't we just make them choose the class they want to instantiate? We don't need to abstract this away.
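A minimal sketch of the two options discussed in this comment. `SnapshotWriter` is a hypothetical stand-in (not swiftsimio's `SWIFTSnapshotWriter`) and `_WRITERS` is an illustrative registry. The `Writer.__new__` factory dispatches on a keyword, much like the diff; the reviewer's alternative is simply to instantiate the chosen class directly.

```python
# Hypothetical stand-in for a concrete writer; not swiftsimio's SWIFTSnapshotWriter.
class SnapshotWriter:
    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs


# Illustrative registry of supported file types.
_WRITERS = {"snapshot": SnapshotWriter}


class Writer:
    """Factory in the style of the diff: dispatch on a `filetype` keyword."""

    def __new__(cls, *args, filetype="snapshot", **kwargs):
        try:
            writer_cls = _WRITERS[filetype]
        except KeyError:
            raise NotImplementedError(f"No writer for filetype {filetype!r}") from None
        # The returned object is not a Writer, so Writer.__init__ is never called.
        return writer_cls(*args, **kwargs)


# Reviewer's alternative: the caller chooses the class explicitly.
explicit = SnapshotWriter("unit_system")

# Factory route: same result, one extra layer of indirection.
dispatched = Writer("unit_system", filetype="snapshot")
```

One difference from the diff: `kwargs.get("filetype", "snapshot")` leaves `filetype` inside `kwargs`, so it is forwarded to the writer's constructor, whereas this sketch takes `filetype` as a keyword-only argument and raises `NotImplementedError` for unknown values. Because `__new__` returns an object that is not a `Writer`, `isinstance(obj, Writer)` is `False`, which is one practical argument for the reviewer's simpler approach.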

104 changes: 64 additions & 40 deletions swiftsimio/masks.py
@@ -3,6 +3,8 @@
 snapshots.
 """

+import warnings
+
 import unyt
 import h5py

@@ -48,6 +50,10 @@ def __init__(self, metadata: SWIFTMetadata, spatial_only=True):
         self.units = metadata.units
         self.spatial_only = spatial_only

+        if self.metadata.filetype == "FOF":
+            # No virtual snapshots or cells metadata for fof currently
+            raise NotImplementedError("Masking not supported for FOF filetype")
+
         if self.metadata.partial_snapshot:
             raise InvalidSnapshot(
                 "You cannot use masks on partial snapshots. Please use the virtual "
@@ -65,9 +71,11 @@ def _generate_empty_masks(self):
         types.
         """

-        for ptype in self.metadata.present_particle_names:
+        for group_name in self.metadata.present_group_names:
             setattr(
-                self, ptype, np.ones(getattr(self.metadata, f"n_{ptype}"), dtype=bool)
+                self,
+                group_name,
+                np.ones(getattr(self.metadata, f"n_{group_name}"), dtype=bool),
             )

         return
@@ -100,21 +108,24 @@ def _unpack_cell_metadata(self):
         # contain at least one of each type of particle).
         sort = None

-        for ptype, pname in zip(
-            self.metadata.present_particle_types, self.metadata.present_particle_names
+        for group, group_name in zip(
+            self.metadata.present_groups, self.metadata.present_group_names
         ):
-            part_type = f"PartType{ptype}"
-            counts = count_handle[part_type][:]
-            offsets = offset_handle[part_type][:]
+            if self.metadata.filetype == "SOAP":
+                counts = count_handle["Subhalos"][:]
+                offsets = offset_handle["Subhalos"][:]
+            elif self.metadata.filetype == "snapshot":
+                counts = count_handle[group][:]
+                offsets = offset_handle[group][:]

             # When using MPI, we cannot assume that these are sorted.
             if sort is None:
                 # Only compute once; not stable between particle
                 # types if some datasets do not have particles in a cell!
                 sort = np.argsort(offsets)

-            self.offsets[pname] = offsets[sort]
-            self.counts[pname] = counts[sort]
+            self.offsets[group_name] = offsets[sort]
+            self.counts[group_name] = counts[sort]

         # Also need to sort centers in the same way
         self.centers = unyt.unyt_array(centers_handle[:][sort], units=self.units.length)
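The sorting step in this hunk computes one permutation from the first group's offsets and reuses it for every group and for the cell centres. A small NumPy sketch with invented values (three cells stored out of order, as the MPI comment warns can happen) shows why that matters: the `stars` entry has an empty cell, so its offsets contain a tie, and sorting that group on its own would not necessarily put the cells in the same order as the other groups.

```python
import numpy as np

# Invented per-cell metadata for three cells that were written to the file
# out of order (cell 1 first, then cell 2, then cell 0).
centres = np.array([[2.5, 0.5, 0.5], [0.5, 0.5, 0.5], [1.5, 0.5, 0.5]])
offsets_in_file = {
    "gas": np.array([20, 0, 10]),
    "dark_matter": np.array([40, 0, 25]),
    "stars": np.array([3, 0, 3]),  # cell 2 holds no stars, so offsets tie
}
counts_in_file = {
    "gas": np.array([5, 10, 10]),
    "dark_matter": np.array([15, 25, 15]),
    "stars": np.array([2, 3, 0]),
}

sort = None
offsets, counts = {}, {}
for name in offsets_in_file:
    if sort is None:
        # Compute the permutation once, from the first group, and reuse it so
        # every group's arrays (and the centres) stay aligned cell by cell.
        sort = np.argsort(offsets_in_file[name])
    offsets[name] = offsets_in_file[name][sort]
    counts[name] = counts_in_file[name][sort]

centres = centres[sort]
```

After sorting, each group's `(offset, count)` pairs tile its rows contiguously in one shared cell order, and `centres` follows the same order.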
@@ -128,7 +139,7 @@ def _unpack_cell_metadata(self):

     def constrain_mask(
         self,
-        ptype: str,
+        group_name: str,
Review comment (Member):

This is fine, but in general I don't like _name as an addition to the variable name. Given that we know its type is a string, group is obviously a name.

         quantity: str,
         lower: unyt.array.unyt_quantity,
         upper: unyt.array.unyt_quantity,
@@ -139,13 +150,13 @@ def constrain_mask(

         We update the mask such that

-            lower < ptype.quantity <= upper
+            lower < group_name.quantity <= upper

         The quantities must have units attached.

         Parameters
         ----------
-        ptype : str
+        group_name : str
             particle type

         quantity : str
@@ -169,23 +180,17 @@ def constrain_mask(
             print("Please re-initialise the SWIFTMask object with spatial_only=False")
             return

-        current_mask = getattr(self, ptype)
+        current_mask = getattr(self, group_name)

-        particle_metadata = getattr(self.metadata, f"{ptype}_properties")
+        group_metadata = getattr(self.metadata, f"{group_name}_properties")
         unit_dict = {
-            k: v
-            for k, v in zip(
-                particle_metadata.field_names, particle_metadata.field_units
-            )
+            k: v for k, v in zip(group_metadata.field_names, group_metadata.field_units)
         }

         unit = unit_dict[quantity]

         handle_dict = {
-            k: v
-            for k, v in zip(
-                particle_metadata.field_names, particle_metadata.field_paths
-            )
+            k: v for k, v in zip(group_metadata.field_names, group_metadata.field_paths)
         }

         handle = handle_dict[quantity]
@@ -203,7 +208,7 @@ def constrain_mask(

         current_mask[current_mask] = new_mask

-        setattr(self, ptype, current_mask)
+        setattr(self, group_name, current_mask)

         return

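The `current_mask[current_mask] = new_mask` line refines an existing boolean mask in place. Assuming, as that assignment implies, that `new_mask` is evaluated only over the rows that are currently selected, it is shorter than `current_mask`, and writing it back through the boolean index updates exactly the previously selected positions. The NumPy sketch below uses invented values and drops units (the real method requires unyt quantities) to apply `lower < quantity <= upper`.

```python
import numpy as np

# Invented quantity for eight rows (units omitted for brevity).
values = np.array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5])

# A mask that already excludes some rows, e.g. after a spatial cut.
current_mask = np.array([True, True, False, True, True, False, True, True])

lower, upper = 1.0, 6.0

# Only the currently selected values are read and tested...
selected = values[current_mask]
new_mask = np.logical_and(selected > lower, selected <= upper)

# ...so new_mask has one entry per True in current_mask. Assigning through the
# boolean index overwrites exactly those positions and leaves the rest False.
current_mask[current_mask] = new_mask
```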
@@ -282,7 +287,7 @@ def _generate_cell_mask(self, restrict):

         return cell_mask

-    def _update_spatial_mask(self, restrict, ptype: str, cell_mask: np.array):
+    def _update_spatial_mask(self, restrict, group_name: str, cell_mask: np.array):
         """
         Updates the particle mask using the cell mask.

@@ -296,28 +301,28 @@ def _update_spatial_mask(self, restrict, ptype: str, cell_mask: np.array):
         restrict : list
             currently unused

-        ptype : str
+        group_name : str
             particle type to update

         cell_mask : np.array
             cell mask used to update the particle mask
         """

         if self.spatial_only:
-            counts = self.counts[ptype][cell_mask]
-            offsets = self.offsets[ptype][cell_mask]
+            counts = self.counts[group_name][cell_mask]
+            offsets = self.offsets[group_name][cell_mask]

             this_mask = [[o, c + o] for c, o in zip(counts, offsets)]

-            setattr(self, ptype, np.array(this_mask))
-            setattr(self, f"{ptype}_size", np.sum(counts))
+            setattr(self, group_name, np.array(this_mask))
+            setattr(self, f"{group_name}_size", np.sum(counts))

         else:
-            counts = self.counts[ptype][~cell_mask]
-            offsets = self.offsets[ptype][~cell_mask]
+            counts = self.counts[group_name][~cell_mask]
+            offsets = self.offsets[group_name][~cell_mask]

             # We must do the whole boolean mask business.
-            this_mask = getattr(self, ptype)
+            this_mask = getattr(self, group_name)

             for count, offset in zip(counts, offsets):
                 this_mask[offset : count + offset] = False
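`_update_spatial_mask` has two modes. With `spatial_only=True` it stores one `[offset, count + offset]` range per selected cell and records the total size; otherwise it keeps a full boolean mask and switches off the rows of every cell that was not selected. The sketch below uses invented counts and offsets for four cells of a single group, so that the two representations can be compared directly.

```python
import numpy as np

# Invented, already-sorted cell metadata for one group: four cells.
counts = np.array([3, 0, 4, 2])
offsets = np.array([0, 3, 3, 7])
cell_mask = np.array([True, False, True, False])  # cells overlapping the region

# spatial_only=True: one [start, stop) range per selected cell, plus a size.
ranges = np.array([[o, c + o] for c, o in zip(counts[cell_mask], offsets[cell_mask])])
size = int(np.sum(counts[cell_mask]))

# spatial_only=False: begin with every row selected, then switch off the rows
# of each cell that was not selected.
bool_mask = np.ones(int(np.sum(counts)), dtype=bool)
for count, offset in zip(counts[~cell_mask], offsets[~cell_mask]):
    bool_mask[offset : count + offset] = False
```

Both forms select the same seven rows here; the range form is compact and maps directly onto slab reads, while the boolean form is what `constrain_mask` can refine further.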
@@ -367,8 +372,8 @@ def constrain_spatial(self, restrict, intersect: bool = False):
             # we just make a new mask
             self.cell_mask = self._generate_cell_mask(restrict)

-        for ptype in self.metadata.present_particle_names:
-            self._update_spatial_mask(restrict, ptype, self.cell_mask)
+        for group_name in self.metadata.present_group_names:
+            self._update_spatial_mask(restrict, group_name, self.cell_mask)

         return

@@ -391,21 +396,40 @@ def convert_masks_to_ranges(self):
         # Use the accelerate.ranges_from_array function to convert
         # This into a set of ranges.

-        for ptype in self.metadata.present_particle_names:
+        for group_name in self.metadata.present_group_names:
             setattr(
                 self,
-                ptype,
+                group_name,
                 # Because it nests things in a list for some reason.
-                np.where(getattr(self, ptype))[0],
+                np.where(getattr(self, group_name))[0],
             )

-            setattr(self, f"{ptype}_size", getattr(self, ptype).size)
+            setattr(self, f"{group_name}_size", getattr(self, group_name).size)

-        for ptype in self.metadata.present_particle_names:
-            setattr(self, ptype, ranges_from_array(getattr(self, ptype)))
+        for group_name in self.metadata.present_group_names:
+            setattr(self, group_name, ranges_from_array(getattr(self, group_name)))

         return

+    def constrain_index(self, index: int):
+        """
+        Constrain the mask to a single row.
+
+        Intended for use with SOAP catalogues, mask to read only a single row.
+
+        Parameters
+        ----------
+        index : int
+            The index of the row to select.
+        """
+        if not self.metadata.filetype == "SOAP":
+            warnings.warn("Not masking a SOAP catalogue, nothing constrained.")
+            return
+        for group_name in self.metadata.present_group_names:
+            setattr(self, group_name, np.array([[index, index + 1]]))
+            setattr(self, f"{group_name}_size", 1)
+        return
+
     def get_masked_counts_offsets(self) -> (Dict[str, np.array], Dict[str, np.array]):
         """
         Returns the particle counts and offsets in cells selected by the mask
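`convert_masks_to_ranges` turns each boolean mask into row indices with `np.where` and then into contiguous ranges with swiftsimio's accelerated `ranges_from_array`. The helper `ranges_from_indices` below is a hypothetical NumPy re-implementation for illustration only, returning half-open `[start, stop)` pairs to match the `[o, c + o]` ranges built in `_update_spatial_mask`; `read_ranges` is a hypothetical stand-in for an HDF5 read. Read the same way, the `[[index, index + 1]]` range that `constrain_index` stores selects exactly one row.

```python
import numpy as np


def ranges_from_indices(indices: np.ndarray) -> np.ndarray:
    """Collapse sorted, unique row indices into half-open [start, stop) ranges.

    Illustrative NumPy stand-in; not swiftsimio's accelerated implementation.
    """
    if indices.size == 0:
        return np.empty((0, 2), dtype=np.int64)
    # A new run starts wherever consecutive indices differ by more than one.
    breaks = np.flatnonzero(np.diff(indices) != 1) + 1
    starts = indices[np.concatenate(([0], breaks))]
    stops = indices[np.concatenate((breaks - 1, [indices.size - 1]))] + 1
    return np.column_stack((starts, stops))


def read_ranges(dataset: np.ndarray, ranges: np.ndarray) -> np.ndarray:
    """Read only the rows covered by the ranges (stand-in for an HDF5 read)."""
    if len(ranges) == 0:
        return dataset[:0]
    return np.concatenate([dataset[start:stop] for start, stop in ranges])


# Boolean mask -> indices (np.where) -> ranges, as in convert_masks_to_ranges.
mask = np.array([True, True, False, True, True, True, False, False, True])
indices = np.where(mask)[0]
ranges = ranges_from_indices(indices)

# A constrain_index-style range [[i, i + 1]] reads exactly one row.
catalogue_column = np.arange(100, 109)
single_row = read_ranges(catalogue_column, np.array([[4, 5]]))
```

Reading through ranges rather than a boolean mask keeps the number of slab reads proportional to the number of contiguous runs, which is why a single-row SOAP selection is cheap.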
2 changes: 2 additions & 0 deletions swiftsimio/metadata/__init__.py
@@ -6,6 +6,8 @@
 from .particle import particle_types
 from .particle import particle_fields

+from .soap import soap_types
+
 from .unit import unit_types
 from .unit import unit_fields

2 changes: 2 additions & 0 deletions swiftsimio/metadata/metadata/metadata_fields.py
@@ -25,6 +25,8 @@
     "NumPart_ThisFile": "num_part",
     "CanHaveTypes": "has_type",
     "NumFilesPerSnapshot": "num_files_per_snapshot",
+    "OutputType": "output_type",
+    "SubhaloTypes": "subhalo_types",
 }

 # Some of these 'arrays' are really types of mass table, so unpack
42 changes: 21 additions & 21 deletions swiftsimio/metadata/particle/particle_types.py
@@ -4,31 +4,31 @@

 # Describes the conversion of particle types to names
 particle_name_underscores = {
-    0: "gas",
-    1: "dark_matter",
-    2: "boundary",
-    3: "sinks",
-    4: "stars",
-    5: "black_holes",
-    6: "neutrinos",
+    "PartType0": "gas",
+    "PartType1": "dark_matter",
+    "PartType2": "boundary",
+    "PartType3": "sinks",
+    "PartType4": "stars",
+    "PartType5": "black_holes",
+    "PartType6": "neutrinos",
 }

 particle_name_class = {
-    0: "Gas",
-    1: "DarkMatter",
-    2: "Boundary",
-    3: "Sinks",
-    4: "Stars",
-    5: "BlackHoles",
-    6: "Neutrinos",
+    "PartType0": "Gas",
+    "PartType1": "DarkMatter",
+    "PartType2": "Boundary",
+    "PartType3": "Sinks",
+    "PartType4": "Stars",
+    "PartType5": "BlackHoles",
+    "PartType6": "Neutrinos",
 }

 particle_name_text = {
-    0: "Gas",
-    1: "Dark Matter",
-    2: "Boundary",
-    3: "Sinks",
-    4: "Stars",
-    5: "Black Holes",
-    6: "Neutrinos",
+    "PartType0": "Gas",
+    "PartType1": "Dark Matter",
+    "PartType2": "Boundary",
+    "PartType3": "Sinks",
+    "PartType4": "Stars",
+    "PartType5": "Black Holes",
+    "PartType6": "Neutrinos",
 }

Review thread on lines -7 to +33:

Member:
Why did you change this?

Collaborator Author:
I changed present_particle_types to return "PartType{i}" rather than integers:

return [f"PartType{i}" for i in types]

So I could either change this file, or else I'd have to convert back to an integer when getting the particle name.

Member:
Are we not able to remove these for modern snapshots? They contain the particle type names and appropriately named fields. I'd like to use those so we don't have to add or handle additional or renamed particle types manually.
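The author's point in this thread can be shown in a few lines. `present_particle_types` here is a hypothetical stand-in that returns group names as described above, and the table is an excerpt of `particle_name_underscores` from the diff. String keys let the name lookup use the group name directly; with the old integer keys, the `PartType` prefix has to be stripped and parsed first.

```python
# Excerpt of the diff's table, now keyed by HDF5 group name.
particle_name_underscores = {
    "PartType0": "gas",
    "PartType1": "dark_matter",
    "PartType4": "stars",
}


def present_particle_types(types):
    # Hypothetical stand-in: return group names rather than integers.
    return [f"PartType{i}" for i in types]


present = present_particle_types([0, 1, 4])
names = [particle_name_underscores[group] for group in present]

# With the previous integer-keyed table, each group name would first have to
# be converted back to an integer.
old_particle_name_underscores = {0: "gas", 1: "dark_matter", 4: "stars"}
old_names = [
    old_particle_name_underscores[int(group.removeprefix("PartType"))]
    for group in present
]
```

The reviewer's alternative, reading the type names stored in modern snapshots, would replace these hand-maintained tables altogether; this diff does not show which attributes carry those names, so the sketch does not attempt it.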
1 change: 1 addition & 0 deletions swiftsimio/metadata/soap/__init__.py
@@ -0,0 +1 @@
+from .soap_types import *
32 changes: 32 additions & 0 deletions swiftsimio/metadata/soap/soap_types.py
@@ -0,0 +1,32 @@
+"""
+Includes the fancy names.
+"""
+
+# Describes the conversion of hdf5 groups to names
+def get_soap_name_underscore(group: str) -> str:
+    soap_name_underscores = {
+        "BoundSubhalo": "bound_subhalo",
+        "InputHalos": "input_halos",
+        "InclusiveSphere": "inclusive_sphere",
+        "ExclusiveSphere": "exclusive_sphere",
+        "SO": "spherical_overdensity",
+        "SOAP": "soap",
+        "ProjectedAperture": "projected_aperture",
+    }
+    split_name = group.split("/")
+    split_name[0] = soap_name_underscores[split_name[0]]
+    return "_".join(name.lower() for name in split_name)
Review comment on lines +5 to +18 (Member):
Is there a reason we do this and not handle the conversion in the same way as field names?

+
+
+def get_soap_name_nice(group: str) -> str:
+    soap_name_nice = {
+        "BoundSubhalo": "BoundSubhalo",
+        "InputHalos": "InputHalos",
+        "InclusiveSphere": "InclusiveSphere",
+        "ExclusiveSphere": "ExclusiveSphere",
+        "SO": "SphericalOverdensity",
+        "SOAP": "SOAP",
+        "ProjectedAperture": "ProjectedAperture",
+    }
+    split_name = group.split("/")
+    return "".join(name.capitalize() for name in split_name)
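In this diff `get_soap_name_nice` builds `soap_name_nice` but never reads it, and `str.capitalize()` lowercases everything after the first character, so `"BoundSubhalo"` becomes `"Boundsubhalo"` and `"SO"` becomes `"So"` rather than `"SphericalOverdensity"`. A possible rework that consults the lookup tables is sketched below; the tables are copied from the diff, and the example group paths (such as `"SO/200_crit"`) are assumptions for illustration.

```python
# Lookup tables copied from the diff above.
SOAP_NAME_UNDERSCORES = {
    "BoundSubhalo": "bound_subhalo",
    "InputHalos": "input_halos",
    "InclusiveSphere": "inclusive_sphere",
    "ExclusiveSphere": "exclusive_sphere",
    "SO": "spherical_overdensity",
    "SOAP": "soap",
    "ProjectedAperture": "projected_aperture",
}
SOAP_NAME_NICE = {
    "BoundSubhalo": "BoundSubhalo",
    "InputHalos": "InputHalos",
    "InclusiveSphere": "InclusiveSphere",
    "ExclusiveSphere": "ExclusiveSphere",
    "SO": "SphericalOverdensity",
    "SOAP": "SOAP",
    "ProjectedAperture": "ProjectedAperture",
}


def get_soap_name_underscore(group: str) -> str:
    # Same behaviour as the diff: map the top-level group, lowercase the rest.
    top, *rest = group.split("/")
    return "_".join([SOAP_NAME_UNDERSCORES[top], *(part.lower() for part in rest)])


def get_soap_name_nice(group: str) -> str:
    # Use the table for the top-level group and upper-case only the first
    # character of each sub-group (str.capitalize would lowercase the rest).
    top, *rest = group.split("/")
    tail = (part[:1].upper() + part[1:] for part in rest)
    return "".join([SOAP_NAME_NICE[top], *tail])


# Example group paths (assumed for illustration).
for path in ["BoundSubhalo", "SO/200_crit"]:
    print(path, get_soap_name_underscore(path), get_soap_name_nice(path))
```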