-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Load FOF catalogues & Cosmology Metadata Changes #190
Conversation
0: "gas", | ||
1: "dark_matter", | ||
2: "boundary", | ||
3: "sinks", | ||
4: "stars", | ||
5: "black_holes", | ||
6: "neutrinos", | ||
"PartType0": "gas", | ||
"PartType1": "dark_matter", | ||
"PartType2": "boundary", | ||
"PartType3": "sinks", | ||
"PartType4": "stars", | ||
"PartType5": "black_holes", | ||
"PartType6": "neutrinos", | ||
} | ||
|
||
particle_name_class = { | ||
0: "Gas", | ||
1: "DarkMatter", | ||
2: "Boundary", | ||
3: "Sinks", | ||
4: "Stars", | ||
5: "BlackHoles", | ||
6: "Neutrinos", | ||
"PartType0": "Gas", | ||
"PartType1": "DarkMatter", | ||
"PartType2": "Boundary", | ||
"PartType3": "Sinks", | ||
"PartType4": "Stars", | ||
"PartType5": "BlackHoles", | ||
"PartType6": "Neutrinos", | ||
} | ||
|
||
particle_name_text = { | ||
0: "Gas", | ||
1: "Dark Matter", | ||
2: "Boundary", | ||
3: "Sinks", | ||
4: "Stars", | ||
5: "Black Holes", | ||
6: "Neutrinos", | ||
"PartType0": "Gas", | ||
"PartType1": "Dark Matter", | ||
"PartType2": "Boundary", | ||
"PartType3": "Sinks", | ||
"PartType4": "Stars", | ||
"PartType5": "Black Holes", | ||
"PartType6": "Neutrinos", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed present_particle_types
to return "PartType{i}" rather than integers
swiftsimio/swiftsimio/reader.py
Line 652 in 143d65f
return [f"PartType{i}" for i in types] |
So I could either change this file, or else I'd have to convert back to an integer when getting the particle name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we not able to remove these for modern snapshots? As they contain the particle type names and fields named appropriately..? I'd like to use those so we don't have to add/handle additional or renamed particle types manually.
I'm now loading SOAP catalogues as well. The groups which we want to load are stored in the SOAP metadata. A slight complication is that we have subgroups, i.e. the 200c halos properties are stored in the group Masking seems to work for the SOAP catalogues. The FOF catalogues aren't sorted so we can't apply masks to them. |
SOAP contains some additional metadata for each dataset that we need to read in.
|
Hi Rob, little thing that I noticed trying to experiment a bit with this, if I install normally (
If I install via link ( |
Gotta add the submodule to the list in the pyproject.toml |
When you get to the stage of looking at the masking, could I make a request? I know that it was planned to be able to apply spatial masks like we do with particle data. Could we also have an option to read a user-defined slice of the subhalo table (might have to handle subhalo and fof tables separately? can't remember the structure in SOAP hdf5 files and the sample seems to have moved/vanished). It's possible that there's nothing extra to be done since with a
I would like to read only the single row of interest. I said slice above because there could be use cases (hypothetically) to read a range or other slice, and it's the same amount of work to implement this as masking to a single row. Like I said I don't think much is needed because the masking tools just about support this already, but I guess the code needs to understand what's an array of subhalo data vs fof data vs anything else. |
Sorry about moving the example catalogue, they're now at I can make a new method for SWIFTMask called something like |
Makes sense. In EAGLE we had |
@JBorrow update on the current status of this in case you have time to look it over before the meeting next week This change adds an attribute to the SWIFTMetadata class called "filetype" which can be either "snapshot", "FOF", or "SOAP". I haven't changed much when it comes to reading in the files, everything has just been made more generic so instead of requiring the data to be in groups named "PartTypeX", the groups can have any names. Masking appears to be working for the "SOAP" filetype, it's not supported for "FOF" since SWIFT does not output spatially sorted catalogues. SWIFT has added two new attributes to it's units metadata - one to indicate if a value is stored as physical, another to indicate if a value can be transformed between comoving and physical. This was motivated by quantities such as BirthDensities, which it doesn't make sense to convert. These attributes are now read in when reading a file, and the cosmo_array has a Things to be done:
|
I am not sure that there's any functionality that only makes sense for snapshots. Visualisation is still useful with halo catalogues; for instance I use that in a hacky way right now to create cross spectra with haloes... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would really like to see a comprehensive test suite here to make sure we're not missing anything.
swiftsimio/__init__.py
Outdated
if filetype == "snapshot": | ||
return SWIFTSnapshotWriter(*args, **kwargs) | ||
# TODO implement other writers | ||
# elif filetype == ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it likely that anyone will want to write their own catalogue files with swiftsimio? Ideally we would just leave this feature alone until we are ready to make that change.
Given that people have to choose their 'file type' anyway, why don't we just make them choose the class they want to instantiate? We don't need to abstract this away.
@@ -128,7 +139,7 @@ def _unpack_cell_metadata(self): | |||
|
|||
def constrain_mask( | |||
self, | |||
ptype: str, | |||
group_name: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, but in general I don't like _name
as an addition to the variable name. Given that we know its type is string, group
is obviously a name.
0: "gas", | ||
1: "dark_matter", | ||
2: "boundary", | ||
3: "sinks", | ||
4: "stars", | ||
5: "black_holes", | ||
6: "neutrinos", | ||
"PartType0": "gas", | ||
"PartType1": "dark_matter", | ||
"PartType2": "boundary", | ||
"PartType3": "sinks", | ||
"PartType4": "stars", | ||
"PartType5": "black_holes", | ||
"PartType6": "neutrinos", | ||
} | ||
|
||
particle_name_class = { | ||
0: "Gas", | ||
1: "DarkMatter", | ||
2: "Boundary", | ||
3: "Sinks", | ||
4: "Stars", | ||
5: "BlackHoles", | ||
6: "Neutrinos", | ||
"PartType0": "Gas", | ||
"PartType1": "DarkMatter", | ||
"PartType2": "Boundary", | ||
"PartType3": "Sinks", | ||
"PartType4": "Stars", | ||
"PartType5": "BlackHoles", | ||
"PartType6": "Neutrinos", | ||
} | ||
|
||
particle_name_text = { | ||
0: "Gas", | ||
1: "Dark Matter", | ||
2: "Boundary", | ||
3: "Sinks", | ||
4: "Stars", | ||
5: "Black Holes", | ||
6: "Neutrinos", | ||
"PartType0": "Gas", | ||
"PartType1": "Dark Matter", | ||
"PartType2": "Boundary", | ||
"PartType3": "Sinks", | ||
"PartType4": "Stars", | ||
"PartType5": "Black Holes", | ||
"PartType6": "Neutrinos", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we not able to remove these for modern snapshots? As they contain the particle type names and fields named appropriately..? I'd like to use those so we don't have to add/handle additional or renamed particle types manually.
# Describes the conversion of hdf5 groups to names | ||
def get_soap_name_underscore(group: str) -> str: | ||
soap_name_underscores = { | ||
"BoundSubhalo": "bound_subhalo", | ||
"InputHalos": "input_halos", | ||
"InclusiveSphere": "inclusive_sphere", | ||
"ExclusiveSphere": "exclusive_sphere", | ||
"SO": "spherical_overdensity", | ||
"SOAP": "soap", | ||
"ProjectedAperture": "projected_aperture", | ||
} | ||
split_name = group.split("/") | ||
split_name[0] = soap_name_underscores[split_name[0]] | ||
return "_".join(name.lower() for name in split_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we do this and not handle the conversion in the same way as field names?
""" | ||
|
||
# TODO: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is TODO
swiftsimio/reader.py
Outdated
try: | ||
# SOAP catalogues can be compressed/uncompressed | ||
is_compressed = dataset.attrs["Is Compressed"] | ||
except KeyError: | ||
is_compressed = True | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_compressed = dataset.attrs.get("Is Compressed", True)
swiftsimio/reader.py
Outdated
except AttributeError: | ||
# Compression is saved as str not bytes | ||
comp = dataset.attrs["Lossy compression filter"] | ||
except KeyError: | ||
# Can't load compression string! | ||
comp = "No compression info available" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
swiftsimio/reader.py
Outdated
try: | ||
physical = dataset.attrs["Value stored as physical"][0] == 1 | ||
except: | ||
physical = False | ||
return physical |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't like this at all; never have non-specific exceptions. Can you do:
return bool(dataset.attrs.get("Value stored as physical", [1])[0])
swiftsimio/reader.py
Outdated
def get_valid_transform(dataset): | ||
try: | ||
valid_transform = ( | ||
dataset.attrs["Property can be converted to comoving"][0] == 1 | ||
) | ||
except: | ||
valid_transform = True | ||
return valid_transform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same concern
swiftsimio/reader.py
Outdated
if group_metadata.metadata.filetype == "snapshot": | ||
group_nice_name = metadata.particle_types.particle_name_class[group] | ||
elif group_metadata.metadata.filetype == "FOF": | ||
group_nice_name = "FOFGroups" | ||
elif group_metadata.metadata.filetype == "SOAP": | ||
group_nice_name = metadata.soap_types.get_soap_name_nice(group) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why if we wrote all this code are we doing this three different ways for three catalogues that we wrote..?
Load fof catalogues refactor
…ftsimio into load_fof_catalogues
Allow swiftsimio to load FOF files. I've made the
SWIFTDataset
class more generic, so it now has afiletype
attribute (currently can be snapshot or FOF). Instead of storing an int for each particle type it now saves the names of the groups within the hdf5 that we want to load ('PartType{i}' for snapshots, and 'Groups' for fof files).This seems to work for loading both snapshots and fof files. I'll have a look at the tests next.