-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove phys prop #164
base: develop
Are you sure you want to change the base?
Remove phys prop #164
Conversation
- Remove more spurious normalizers
- Remove 2 extra editable annotations
- TODO: localize sub-sections
- Move out standalone property defintions to `properties/__init__.py`
Regarding re-grouping, I'm undecided on the final interface. Using electronic properties (e.g. eigenvalues, DOS, band structure), I'll illustrate the 2 options I see. Generically, any of these electronic property can be identified using any of the following attributes: particle kind*, semantic group**, k-point, spin. This list is not necessarily exhaustive. Overall, we'd like to present any user with an as similar as possible interface (similar idea in the old
(note: will add example figures later on) Hierarchies like these are easy to express in JSON key-value pairs, but mean deeper and different paths to traverse.
Alternatively, we could have a container section with repeating subsections that annotate all identifying attributes together, at the same level. This means more homogeneity in structure. This also mimics the plots (which are just overlays) better. It does make data traversal harder (especially without the metainfo browser to show
|
I am not sure I understand what "particle kind" and "semantic group" mean. I think particle kind might be info stored in ModelMethod, is that so? Can you put an example? And the same goes with semantic group: is this the degrees of freedom? Only corresponds to some index for orbital or plane-wave index depending on the basis? Furthermore, I don't follow your hierarchy for the different properties. How I consider this, from purely the perspective of the property per se is that we have:
The first JSON looks good. Now, also for @JFRudzinski: is the |
I find it more intuitive and easier to follow when there is a structured tree where i can conceptually drill down |
Sure, but imagine now having 5 of these trees side-by-side that have mostly similar, but not identical structures. |
You're pointing out the same observations as I did:
The 1st approach puts up a preferential structure / order to run over these indices. The 2nd approach groups them all together at the same level. This helps consistency and visualization. That's their main difference.
|
- Apply standard template to `DensityOfStates` - Add naming convention to `OrbitalsState` - Add few comments + correct typos
@JosePizarro3 Thank you for keeping an eye out and giving feedback. Sorry for the delay in updating you about PhysicalProperty, I hope this MR did not come as too much of a shock. There has been much movement in recent weeks and we are trying to make quick movement now in terms of schema dev and parser migration. I want to just slightly expand upon what Nathan shared: Starting from your prototype, Markus made his own implementation of PhysicalProperty, a bit more in a dataframe-like style. If you are interested we will be able to share more information in the near future and there will also be a cafe about it. Markus, Laurie, Hampus, and Nathan have been working to test and improve this implementation, and it is pretty close to usable now. Nathan did many tests of the new implementation into our schema. In the end, we came to the conclusion that applying this structure to every single property in our detailed schema was both tedious and probably not practical in the long run. Instead we focus for now on applying this new structure/tool to try to improve interoperability at a higher level, hence Nathan's mention of results and workflow2. In principle, it could also be used in data if an appropriate use case comes along. It's just that we don't implement it everywhere as default. This also helps us to make more progress on our schema immediately, which is our top priority. I'm happy to discuss further with you, also to see if there are specific use cases you have in mind that could be useful for further testing. Otherwise, we plan to tag you on any relevant MRs for potential input as we go through here. We appreciate your continued input! |
Sure, I just asked if PhysicalProperty was being reworked, and Variables deprecated. Whatever you and the others decide is ok for me. So the idea is to just do whatever in |
- Correct eigenvalues and DOS
Of course, I want to keep you updated 👍 I wouldn't exactly say that the idea is to do "whatever" in data. We are trying to develop some standardization within the schema and also some quasi templates for people to easily use when extending later. You will see here that Nathan kept several aspects of your PP implementation. It's just that we don't allow for the flexibility in data, at least not by default for all properties. If a use cases arises, PP can also be used in data. And yes, exactly, results and workflow2.results are responsible for interoperability. There are still some decisions exactly what is done with the results section. |
From my side, the second structure that Nathan suggested has some advantages for standardization and plotting. I think this will become more clear with some concrete examples... |
Pull Request Test Coverage Report for Build 13209227390Details
💛 - Coveralls |
- Update test case
- Fix plotting legend
- Remove re-use of quantities
Okay, feel free to give some preliminary feedback.
The most fleshed out example is the |
from nomad.metainfo import Quantity, Reference | ||
|
||
|
||
class ModelBaseSection(ArchiveSection): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a small note: the terminology "model" here is a bit confusing to me considering the other uses of model within our schema
I'm starting to set this up for testing but already having some import problems. ../nomad-simulations/src/nomad_simulations/schema_packages/properties/decomposable.py:1: in <module>
from nomad.metainfo import placeholder, Quantity, SubSection, Reference I can only find placeholder defined in the javascript part of nomad-FAIR. Perhaps it's due to the older mapping annotation branch. @ladinesa could you let me know when you rebase your nomad-FAIR branch, I tried myself, but the conflicts are too complicated/unknown to me...also I guess if you have any alternative insight into the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have left some comments throughout. I tried to test some things out but there are too many uncertainties for me at this preliminary stage to really be able to implement a working example.
However, here are some thoughts about the overall construction and potential approaches as I see them:
Outputs section structure
Until told otherwise, I will assume that we are sticking to the following structure of Outputs()
which is a repeating subsection of simulation:
class Outputs(ArchiveSection):
model_system_ref = Quantity(
...
)
model_method_ref = Quantity(
...
)
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# List of properties
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
property_quantity = Quantity()
property_section = Subsection()
So far this is consistent with the approach of runschema.
With the new support for system hierarchies, this also allows us to point to any branch of a system hierarchy, and so, more generally define properties that only apply to a portion of the system.
Simple property quantities
There are a number properties that can be simply be defined directly under Outputs as quantities. Nathan already started collecting some of these in common_properties.py
. This probably includes most basic thermodynamic properties, temperature
, pressure
, etc, as we also did in runschema. I think we would opt to flatten the outputs structure compared a bit with runschema, which subgrouped semantically related "simple" properties (e.g., in the thermodynamic subsection).
I presume there is no controversy here, but please share if you think otherwise.
Properties with metadata
These are properties that have more than just a value, they also have some surrounding quantities that provide additional context to that value (note: I treat the case with multiple value types separately below). These properties are stored as subsections in output.
Again, this was also done in runschema, e.g., GreensFunctions.
There is probably some questions concerning subsection and "main value" naming conventions and such, which would should nail down, but otherwise it is clear we will need this structure.
One question is whether we need or want some sort of base section / template for such a class.
Properties with metadata and types
These are properties that contain multiple value types depending on the context.
In runschema, there seems to be a lack of consistency with how these were dealt with:
-
types of dos are defined as unique subsections under Calculation, but are defined with respect to an identical base class.
-
types of greens functions are stored within the same repeating subsection under outputs, with a metadata quantity
type
to distinguish them. -
types of eigenvalues/BandEnergies are defined with respect to a single class which lists out different types of values as distinct quantities:
values_correlation
,values_exchange
, etc. This is somewhat similar to how energy and force contributions are treated.
In this MR, Nathan is proposing some template classes for treating this class of properties. The underlying base class for this template comes with a set of plotting functionalities. However, at its core, it is suggesting that we group repeating value types in a repeating subsection of the property subsection of outputs, i.e.: outputs.property_w_types[].groups[].value
, with shared metadata being stored directly under property_w_types[]
Decomposable properties with metadata and types
This is a special class of property with types that has a type total
which presumably the sum of the other types. Although I am not sure if it was still intended in this PR, it is possible that we decide to treat these in a special way compared with the previous section, by lifting the total
quantity above the groups
or contributions
.
There is something toward this implementation in decomposable.py
, although again perhaps Nathan was thinking of treating this also with the SemanticGroupContainer()
Interoperable Physical Properties
This refers to Markus' PP implementation, which is akin to a data frame structure. One major consideration in this implementation is the flexibility of the dimensions to enable interoperability of properties defined in quite different contexts.
SUMMARY
In hope this helps to clarify a bit the situation. I think what remains to decide is 1. what kind of structure do we want to treat properties with metadata and types, 2. do we want to treat decomposable properties the same or different than 1.
The associated question then is if Nathan's template fits our desires or what amendments do we need to achieve our wish list.
To address Alvin's already voiced concern we need to clarify the fundamental difference to Markus' PP approach, both technically and in the desired support. Here the main difference I see is the flexibility. Also, at least at the moment I am viewing Nathan's template as a way to treat values of differing types within the same property, not to group distinct properties together, which is the secondary purpose of the data frame approach.
I look forward to hearing some more thoughts.
from structlog.stdlib import BoundLogger | ||
|
||
from nomad_simulations.schema_packages.utils import RussellSaundersState | ||
|
||
|
||
class OrbitalsState(Entity): | ||
class OrbitalsState(ArchiveSection): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are the +/- 's of using Entity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Entity
is an ArchiveSection
with a specific semantic meaning, as defined in OBO. Can link the specifics later.
@@ -206,7 +187,7 @@ class Simulation(BaseSimulation, Schema): | |||
|
|||
model_method = SubSection(sub_section=ModelMethod.m_def, repeats=True) | |||
|
|||
outputs = SubSection(sub_section=Outputs.m_def, repeats=True) | |||
outputs = SubSection(sub_section=KResolvedElectronicProperties.m_def, repeats=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is just for testing, but what would be your plan generally? Same structure as before? I think you mentioned not listing out the properties under Outputs()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, I prefer the flat structure under outputs, though we need the same approach for handling browsing (in the GUI, and when downloaded).
Some properties (like most of the solid state electronics) have shared normalization logic, and might be better off grouped together.
# ! extract name from value | ||
|
||
|
||
class TotalEnergy(DecomposableProperty): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still how you want to treat total energy? Or was this a precursor to your SemanticGroup?
super().__init__(*args, **kwargs) | ||
self.name = 'Total Energy' # ! | ||
|
||
class EnergyContribution(PropertyContribution): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation caused some difficulties for because it prevents me from directly importing EnergyContribution into my parser schema for potential extensions. This is a special case, but is it necessary or advantageous to define it within the other class?
) | ||
|
||
normalize_all(archive) | ||
with open('test.archive.json', 'w') as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunately this archive did not get processed correctly for me, but I would like to browse this example within the gui at some point
Feel free to remove |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Molecular electronics is a subfield in nanotechnology, and is used for electronic devices usually.
Could we update the nomenclature as molecular_eigenstates.py? This way eigenstate - quantum state - physical property connection is clear.
Although I have no strong opinions on this, storing them within the same subsection makes the archive look more organized and clean, just in case of greens functions with a |
I am posting here a short summary of our meeting with Alvin this week, and some adjustments to the perspective that I posted above: Alvin proposes that instead of removing Chema's PP class, we simply remove the current validation checks, and amend this into a core base class for our properties. Beyond the benefit of less code changes, this implementation is already compatible with the mapping parser. He believes that the current variables implementation can be useful for a sort of normalization of quantities, when we have different types to define within a single property subsection. Note that this is no longer intended to be for flexibility of rank, the dataframe PP class from Markus should be used for such use-cases. For decomposable properties, we would opt for keeping the total outside of these groupings. Plotting functionalities can then be developed similar to this PR. @ndaelman-hu @ladinesa and myself should meet early next week to discuss this and to come to a consensus about how to move forward. My current perspective is that the proposals by Nathan and Alvin are not incompatible, rather with some adjustments we can accommodate the main features of each into one cohesive approach...at least, this is the goal. The new overall structure would be: Outputs section structureclass Outputs(ArchiveSection):
model_system_ref = Quantity(
...
)
model_method_ref = Quantity(
...
)
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# List of properties
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
property_section = Subsection(PP) i.e., all properties should use the PP base class (we should change the name to not confuse with the dataframe class). Simple property quantitiesNo simple properties as quantities under outputs Properties with metadatause the PP base class Properties with metadata and typesuse the PP base class with a standard groupings subsection Decomposable properties with metadata and typesuse the PP base class with a total value at the top level and standard groupings subsection Interoperable Physical Propertiesproperties that require flexibility use the data frame PP approach |
Make
data
solely reliant on sections and quantities, reserving the use ofDataframe
(newPhysicalProperty
) forresults
.Re-group properties to with a focus on shared normalized quantities, as well as visualization.
Closes #143 #85 #80