Study and proposal for improved subsetting

Table of Contents
VisIt's Current Subsetting Features
What is a subset?
Domain Subsets
Group Subsets
Material Subsets
Enumerated Scalar Subsets
Subset GUI Controls (VisIt's Subset Window)
Other limitations with current implementation
Other features with subsetting consequences
Tabularized Summary of Current Subsetting Features
Species selections and their relationship to subsetting
What about Data Selections?
Notes on current implementation
Design Goals of New Subsetting Support
Subset Selection Expressions (SSEs)
Illustrative Examples
Support for Knowledge of Cross-Intersections
General Algorithm for Querying an SSE for on/off State of a Set
Subset Metadata
Subset Details; Problem-sized Data
Application of SSEs to Plots
Phase 0 Implementation

VisIt's Current Subsetting Features

The ability to manage and display subsets of meshes is a key part of effective visualization, particularly for large, 3D datasets.

From its inception, VisIt has offered some degree of subsetting capability via its domain, group, material and, more recently, enumerated scalar features. There are also other features of VisIt's operators and GUI that provide some limited ability to define and manipulate mesh subsets. These features and their current limitations are described in the sub-sections below.

What is a subset?

When we talk about subsets, the first question is...subset of what? Here, we are talking about subsets of a mesh. So, in order to even talk about subsets, we need a mesh. Because a mesh is typically composed of node, edge, face and/or volume entities, we frequently wind up defining subsets by identifying groups of those entities. A mesh represents a sort of bucket of node, edge, face and volume entities from which we can draw entities to define various subsets.

However, once we define just one subset of a mesh, defining a next subset now involves a choice. We can use either the new subset or again the mesh as a source bucket to enumerate the entities in the new subset. That is, we can choose to identify node, edge, face and volume elements from the new subset or from the mesh to define the next subset. This choice in the source bucket from which we pick mesh entities to define yet more subsets is an important aspect to defining and constructing collections of subsets.

In addition, there are two fundamentally different ways in which we can identify mesh entities to define subsets.

Label the entities: For each entity, identify the subsets that contain that entity
List the entities: For each subset, identify the entities that are contained in it

In the Labeling approach, if each entity belongs to only one of a group of subsets, a single scalar field can serve to define the whole group of subsets. If there are only a small number of subsets, say fewer than 256, a field of type unsigned char is sufficient to define all the subsets in the group. In addition, in this specific case, the group of subsets winds up defining a partition of the mesh (all the subsets so defined are pair-wise disjoint -- they have no members in common). This is typically the way in which materials are defined on a mesh.

The Listing approach is more general and is more amenable to supporting such things as partial inclusion where a mesh entity is only partially included in a subset and, for large numbers of subsets, is more likely to have better scaling behavior.
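The two approaches can be compared with a small sketch. It is only an illustration under assumed names (`labels_to_lists`, `lists_to_labels` and the sample zone data are hypothetical and are not VisIt code). It shows that a labeling can always be turned into a listing, while the reverse only works when the subsets form a partition.

```python
from collections import defaultdict


def labels_to_lists(labels):
    """Convert a labeling (one subset id per entity) into a listing
    (subset id -> list of entity indices)."""
    listing = defaultdict(list)
    for entity, subset_id in enumerate(labels):
        listing[subset_id].append(entity)
    return dict(listing)


def lists_to_labels(listing, num_entities):
    """Convert a listing back into a labeling.

    This only succeeds when the subsets form a partition, that is,
    every entity is in exactly one subset."""
    labels = [None] * num_entities
    for subset_id, entities in listing.items():
        for entity in entities:
            if labels[entity] is not None:
                raise ValueError(f"entity {entity} is in more than one subset")
            labels[entity] = subset_id
    if any(label is None for label in labels):
        raise ValueError("some entity is in no subset")
    return labels


# Labeling: one material id per zone, stored as unsigned chars (hypothetical data).
zone_labels = bytes([0, 0, 1, 2, 1, 0])
zone_lists = labels_to_lists(zone_labels)
```

A listing in which two subsets share an entity cannot be converted back into a single label field, which is one reason the Listing approach is the more general of the two.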

A final point to make is that users use subsets to define pieces of a mesh that have a meaningful name and represent a semantically meaningful portion of a mesh.

Domain Subsets

Domain is a key concept in VisIt's representation of meshes in parallel. A domain in VisIt represents a chunk of mesh. In some sense it is a storage quantum of a mesh. It is a piece of mesh that is stored and manipulated internally within VisIt as one VTK (grid) object. Likewise, any variable defined on the mesh is stored as one VTK data array object. These VTK mesh and data array objects are the fundamental chunks of mesh and variable data that database plugins deliver to VisIt during I/O operations GetMesh() and GetVar().

Conceptually, domains represent disjoint parts of the mesh. For example, for 3D meshes, this means that no two domains have a non-empty volumetric (3D) intersection. In other words, no mesh element (or zone or cell) is in more than one domain. However, because domains do abut, they can wind up sharing mesh nodes (and faces). In addition, to ensure continuity in computed results across domain boundaries (e.g. iso-contours or surface normals for shading), each domain may include extra mesh elements (or zones or cells), called ghost zones. This allows such computations to proceed without requiring each domain to communicate with its neighbors whenever computations cross domain boundaries. These ghost zones exist only to reduce the need for communication and so do not figure into the requirement that domains are disjoint.

Because domain subsetting defines the fundamental chunks of mesh by which scalable, parallel I/O operations between the file(s) and VisIt occur, the design of an input database's domain subsetting structure is often dictated by the I/O requirements of the data producer with little flexibility for adaptation by the data consumer. VisIt is often described as piggy-backing on the data producer's parallel domain decomposition.

In some limited situations, and so far only for structured grids, the domain decomposition can be dynamic. VisIt can effectively change the domain decomposition from whatever was generated by the data producer at the time it decides to read the file(s). This is most easily done for large, monolithic structured grids such as large images, large image volumes or large NetCDF input files in which the input database is effectively one large, monolithic array. In these cases, VisIt can dynamically decide at run time how to decompose the input mesh into domains based on the number of MPI tasks it has available. Nonetheless, this decomposition is decided once and for all when the database is opened and it is fixed for the remainder of the VisIt session.
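As a rough illustration of this kind of run-time decomposition, the sketch below splits one axis of a monolithic structured array into contiguous slabs, one per task. The function name `decompose_slabs` and the slab strategy are assumptions made for the example; this is not the algorithm VisIt's database plugins actually use.

```python
def decompose_slabs(num_planes, num_tasks):
    """Split planes [0, num_planes) along one axis into contiguous slabs.

    Returns a list of half-open (start, stop) ranges, one per domain.
    Slab sizes differ by at most one plane, and no more domains are
    created than there are planes."""
    if num_planes <= 0 or num_tasks <= 0:
        raise ValueError("num_planes and num_tasks must be positive")
    num_domains = min(num_tasks, num_planes)
    base, extra = divmod(num_planes, num_domains)
    slabs = []
    start = 0
    for domain in range(num_domains):
        # The first `extra` domains each take one additional plane.
        size = base + (1 if domain < extra else 0)
        slabs.append((start, start + size))
        start += size
    return slabs


# Hypothetical case: a volume with 10 planes read by 4 MPI tasks.
slabs = decompose_slabs(10, 4)
```

Because the result depends only on the array size and the task count, it can be computed once when the database is opened, which matches the fixed-for-the-session behavior described above.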

In the context of subsetting, domain subsets are almost of no meaning to the end user. In fact, they do not so much provide useful subsetting functionality as they do represent the initial source buckets of mesh entities out of which other subsets can be defined. On the other hand, because domain subsets typically represent the parallel decomposition of the mesh into pieces upon which the data producer operates with different MPI ranks, the domain subsets as well as their neighbor relationships in the parallel decomposition are often relevant to code developer users.

Group Subsets

Group is a very limited mechanism for defining subsets formed as unions of domains. The key limitations of VisIt's Group mechanism are that the subsets must be definable in terms of unions of domains and that no domain can be in more than one group.

Group subsetting data is delivered from a plugin to VisIt proper during PopulateDatabaseMetadata and is carried around in an avtDatabaseMetadata object. An integer array of length equal to the number of domains is defined where each entry specifies the group id for each domain. In this way, it is a labeling mechanism applied to domain entities to define subsets.

A good example of the use of VisIt's Group mechanism to define subsets is the representation of levels in AMR meshes. In AMR meshes, the domains are the individual AMR patches and the levels are simply groups of all patches at the same level. AMR patches are in fact not disjoint parts of the mesh and this would seem to violate the previously stated requirement that domains are disjoint. However, there are other conditions specific to AMR mesh patches that VisIt enforces such that the disjoint requirement of domains continues to hold true. First, in AMR meshes, patches overlap spatially only when they represent different resolution levels in the AMR hierarchy. So, VisIt includes logic to ensure wherever AMR patches overlap, spatially, the higher resolution patch wins out. Specifically, VisIt blanks out (e.g. VTK i-blanking) those portions of a domain that are overlapped by another domain at a higher resolution.

VisIt's Group abstraction is also frequently used for structured grids to organize various mesh pieces into larger IJK-logical indexing spaces.

Finally, in the current implementation and usage of group subsets in VisIt, there is permitted only one group decomposition of any mesh into subsets. There cannot be multiple different group decompositions of a mesh into subsets. Though there do not appear to be any technical or practical reasons why this restriction exists, it is currently a limitation of group subsetting.

Material Subsets

Material is a very different kind of subsetting mechanism than the two previously mentioned. In material subsetting, each mesh element (zone or cell) is labeled with a material identifier. Each mesh zone can exist in only one material. So, material subsetting represents a partition of the mesh. But, material subsetting is wholly orthogonal to domain and group subsetting. VisIt has a special operation called Material Interface Reconstruction (or MIR) which operates upon a mesh and breaks it up into the pieces defined by the various material identifiers.

Ordinarily, each mesh element is wholly contained in its material. However, VisIt's material subsetting feature does allow for partial inclusion where a single mesh element is fractionally included in multiple different materials. Nonetheless, the element's inclusion is still disjoint among the materials that contain the pieces of it. MIR algorithms determine how to break up the element based upon the volume fractions of each material into disjoint pieces. When elements are fractionally included in different materials in this way it is known as mixing materials.
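A minimal sketch of clean versus mixed zones follows. It assumes a simple per-zone dictionary of volume fractions, which is a hypothetical layout chosen for readability and is not the layout of VisIt's avtMaterial object.

```python
import math

# Hypothetical per-zone volume fractions: material name -> fraction of the zone.
zone_fractions = [
    {"copper": 1.0},                                   # clean zone
    {"copper": 0.25, "steel": 0.75},                   # mixed zone
    {"steel": 1.0},                                    # clean zone
    {"copper": 0.5, "steel": 0.25, "plastic": 0.25},   # mixed zone
]


def is_valid(fractions):
    """Fractions must be positive and sum to one: the pieces of a zone
    are disjoint and together cover the whole zone."""
    positive = all(f > 0.0 for f in fractions.values())
    return positive and math.isclose(sum(fractions.values()), 1.0)


def is_mixed(fractions):
    """A zone is mixed when more than one material occupies it."""
    return len(fractions) > 1


def zones_with_material(all_fractions, material):
    """Zones that are wholly or partially included in `material`."""
    return [z for z, fr in enumerate(all_fractions) if material in fr]


mixed_zones = [z for z, fr in enumerate(zone_fractions) if is_mixed(fr)]
copper_zones = zones_with_material(zone_fractions, "copper")
```

Note that a mixed zone appears in the result for every material it contains, even though the pieces MIR produces from it remain disjoint.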

Mixing materials are common in applications where materials are permitted to advect through the mesh enabling the application to decouple mesh motion from material motion and exert finer grained control over mesh relaxation.

Material subset data is delivered from a database plugin to VisIt proper during a GetAuxiliaryData() call with type AUXILIARY_DATA_MATERIAL.

Finally, in the current implementation and usage of material subsets in VisIt, there is permitted only one material decomposition of any mesh into subsets. There cannot be multiple different material decompositions of a mesh into subsets. Though there do not appear to be any technical or practical reasons why this restriction exists, it is currently a limitation of material subsetting.

Enumerated Scalar Subsets

Enumerated Scalars is a more recent addition of subsetting capability to VisIt. A scalar variable is defined on the mesh but is also highly quantized in the range of values it takes on. For example, there may be only a handful of different values in the variable's range. Each value defines a subset. The vtkEnumThreshold operator breaks a mesh up into pieces similarly to a threshold operator based on the scalar values on the mesh. In this way, a typical enumerated scalar is similar to a material subset. But, there are many differences too. First, while materials define subsets solely in terms of mesh elements, enumerated scalars can define subsets using any mesh entity type; nodes, edges, faces or zones. Next, unlike materials which must define a partition, enumerated scalars permit the same mesh entities to exist in multiple subsets. Finally, the different subsets of an enumerated scalar can be organized hierarchically though this feature is not well known and rarely used.

Finally, there can be any number of different enumerated scalar decompositions of a mesh into subsets. One enumerated scalar can define one group of related subsets. For example, one enumerated scalar can define a group of ExodusII nodesets. Another enumerated scalar can be used to define sidesets. However, a current restriction in their use internally in VisIt is that only one such decomposition can be actively operating on a mesh at any one time. In other words, a user cannot combine the effects of two different enumerated scalar decompositions in the same visualization.

Enumerated scalar subset data is delivered from a database plugin to VisIt proper by a GetVar() call in which the returned vtkDataArray is often either a vtkIntArray or vtkBitArray.
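The sketch below illustrates two ways a highly quantized scalar could define subsets. Selecting by value puts each entity in exactly one subset. Treating each bit of the value as a membership flag lets one entity belong to several subsets. The function names and the bit-flag encoding are assumptions made for illustration and are not a description of how vtkEnumThreshold is implemented.

```python
def select_by_value(values, wanted):
    """Indices of entities whose enumerated value is one of `wanted`."""
    wanted = set(wanted)
    return [i for i, v in enumerate(values) if v in wanted]


def select_by_bitmask(values, subset_bit):
    """Indices of entities whose value has bit number `subset_bit` set.

    With this encoding an entity can be a member of several subsets."""
    mask = 1 << subset_bit
    return [i for i, v in enumerate(values) if v & mask]


# Hypothetical node-centered data for a mesh with five nodes.
node_values = [1, 2, 2, 3, 1]                    # one subset per node
node_flags = [0b01, 0b11, 0b10, 0b00, 0b11]      # bit 0 and bit 1 are two nodesets

by_value = select_by_value(node_values, [2, 3])
nodeset0 = select_by_bitmask(node_flags, 0)
nodeset1 = select_by_bitmask(node_flags, 1)
```

In the bit-flag data, nodes 1 and 4 are members of both nodesets, which a material-style labeling could not express.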

Subset GUI Controls (VisIt's Subset Window)

VisIt provides a GUI for users to control the display of subsets of the data. This is VisIt's subset controls window. This interface presents subsets to the user by name and organized into various categories (domains, materials, etc.). The user can then scroll through various lists of subsets which are often presented with meaningful names and turn on and off various subsets to be displayed in the visualization.

Other limitations with current implementation

The most significant limitation in the current implementation is that there can be only one domain decomposition (e.g. one bunch of subsets defining mesh domains), only one group decomposition, and only one material decomposition of any given mesh. If a database requires more than these three groups of subsets or requires that the subsets be more arbitrarily defined than these three mechanisms allow, the only option is to use enumerated scalar subsets.

Another significant limitation is that it is generally either very cumbersome or not possible to combine the effects of subsetting operations from different bunches of subsets. For example, it is not easy to combine a selection of domain and material subsets and display say "copper" on only domains 7-24. Likewise, it is not possible to combine a selection of subsets from one enumerated scalar definition with those of another enumerated scalar or with any other subsetting mechanism.

A key limitation in the subset controls user interface is that it operates solely on the basis of turning sets pre-defined by the database on and off. It does not allow for the creation of new sets in terms of existing ones or for the creation of new sets defined by the user either by explicit enumeration, by application of operators or by some external source (I do not understand data selections, but I think they are somehow related to this). Likewise, it does not operate on the basis of defining and then applying set expressions that represent the desired outcome in the visualization.

Finally, as we approach exascale where there could be on the order of 1 billion mesh pieces, in the current GUI the user may have to scroll through many pages of subset names to find or select subsets to be turned on or off. Likewise the GUI itself can wind up managing billions of Qt widget objects. In addition, internally in VisIt there is currently a list of domain subsets (a vector of integer identifiers that uniquely identify individual domains) that gets passed around and pared down as different operators change it. This explicit list of domains represents a potential scalability issue. In a re-design, it would be preferable to encapsulate this in an iterator-like class where there is no need to explicitly instantiate a list of domains. Another related issue is that the current implementation aims to represent not only domain sets and material sets but the set of sets defined by the cross-product of domains and materials. Internally in VisIt, this is handled with something called the SIL Matrix class. But, there is in fact no underlying need for this and it leads to other unnecessary complexities.
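One possible shape for such an iterator-like class is sketched below. The class name `DomainRanges` and its methods are hypothetical; the point is only that membership tests, counting and pared-down iteration can all be done without ever instantiating an explicit list of domain ids.

```python
from itertools import islice


class DomainRanges:
    """A set of domain ids stored as half-open (start, stop) ranges
    rather than as an explicit list of ids."""

    def __init__(self, ranges):
        # Ranges are assumed not to overlap.
        self._ranges = sorted(ranges)

    def __contains__(self, domain):
        return any(start <= domain < stop for start, stop in self._ranges)

    def __iter__(self):
        for start, stop in self._ranges:
            yield from range(start, stop)

    def __len__(self):
        return sum(stop - start for start, stop in self._ranges)

    def restrict(self, keep):
        """Lazily yield only the domains for which keep(domain) is true,
        the way an operator might pare the selection down."""
        return (d for d in self if keep(d))


# A billion domains described by a single range; nothing is materialized.
all_domains = DomainRanges([(0, 1_000_000_000)])
first_odd = list(islice(all_domains.restrict(lambda d: d % 2 == 1), 3))
```

The linear scan in `__contains__` is adequate for a handful of ranges; a real implementation with many ranges would presumably want a binary search.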

Other features with subsetting consequences

VisIt supports a number of operators that permit a user to construct subsets of the input data. These include such operators as box, clip, index-select, threshold, iso-volume.

For example, if there exists in the input database a node- or zone-centered field defined on the mesh that contains specific constant values over specific pre-defined regions of the mesh, then the user can apply the threshold operator with this field to effect various subsets. This is very similar to how enumerated scalars work but with some additional limitations.

As another example, a user can use the index-select operator (only for structured grids) to construct subsets defined over the logical indexing space of a structured mesh.

There are some key limitations in the approach of using operators for defining subsets in this way. First, for operators that operate on the range-space of a mesh field (e.g. threshold and iso-volume), the input database must have pre-defined on the mesh the fields needed to drive the operators. Next, for any of the operators, the user is required to define operator attributes that result in the desired subsets being constructed. In many cases, in order for the user to construct visualizations involving multiple subsets (and not just selecting one from among several), this requires applying different operator attributes to different instances of the same plot.

Another way of effecting some limited subsetting operations is to define different instances of the same mesh for commonly used subsets. For example, for Mili data, it is common practice to define the main mesh and a separate mesh called the free nodes mesh. The user then determines which subset to display at the moment s/he selects the particular mesh object in the GUI. For OpenFOAM data, an internal mesh is defined and a separate boundary mesh is defined that is simply the boundary of the main mesh.

Although these approaches do permit us to work around some limitations in the current subsetting mechanisms VisIt provides, in general these methods are simply too cumbersome for users.

Tabularized Summary of Current Subsetting Features

The various subsetting modalities described above are summarized in the table below.

Modality	Common Uses	I/O Path	Limitations	Other Notes
Domain	Storage and I/O chunking	`vtkDataSet` returned via a `GetMesh()` call	Only one domain decomposition allowed for each mesh	Impacts granularity of all I/O and parallel execution. Typically determined by data producer's file structure. Developer has little control to change.
Group	Unions of Domains	An integer array defined in `avtDatabaseMetadata` returned via `PopulateDatabaseMetadata`	Only one group decomposition allowed for each mesh. Must be a partition of domains.	Typically used for AMR levels and structured grid IJK-indexing over swaths involving many domains.
Material	Materials or alternative decomposition	An `avtMaterial` object returned from a `GetAuxiliaryData()` call	Only one material decomposition allowed for each mesh. Defined only over mesh zones. Must define a partition.	Partial, sub-element inclusion is permitted (e.g. mixing materials)
Enumerated Scalar	Arbitrary subsetting	A `vtkDataArray` returned from a `GetVar()` call	Cannot be combined in one visualization with domain, group, material or other enumerated scalar subsetting	Multiple enumerated scalar subsettings can be defined. Multiple inclusion in different subsets permitted. Subsets defined over any kind of mesh entity (node, edge, face or zone). Hierarchical subset structure supported.
Range-space Operators	ex. Threshold & Iso-Volume	User selected, pre-defined field returned from a `GetVar()` call	Requires pre-defined fields in input database. Subsets must define a partition only	Threshold interface permits combining effects from subsets of different type. No way to define operator attributes in input database and feed through to operators for more convenient UI.
Domain-space Operators	ex. Index-Select, Box, Onion-Peel	User selected spatial configuration applied to mesh returned from `GetMesh()` call	Subsets defined spatially or on logical indexing. Defining multiple subsets extremely cumbersome	No way to define operator attributes in input database and feed through to operators for more convenient UI.
Different GUI instances of same mesh	ex. Main Mesh, Boundary Mesh, Free nodes mesh	User selected, pre-defined mesh instances in GUI returned in `GetMesh()` call	In visualizations combining different object instances, difficult to combine any of the above subsetting approaches.	Extremely cumbersome to deal with except in relatively simple cases.

It is desirable to overhaul the internal subsetting mechanism in VisIt to remove existing limitations as well as add certain requested and desired features.

Species selections and their relationship to subsetting

In VisIt's current implementation of subsetting features, species selection is included. Species defines a finer composition of materials. It is easiest to understand species by example.

Consider the two materials, brass and steel. Neither brass nor steel is itself a pure element on the periodic table. They are instead alloys of other (pure) metals. For example, common yellow brass is, nominally, a mixture of Copper (Cu) and Zinc (Zn) while tool steel is composed primarily of Iron (Fe) but mixed with some Carbon (C) and a variety of other elements. For this example, let's suppose we are dealing with Brass (65% Cu, 35% Zn), T1 Steel (76.3% Fe, 0.7% C, 18% W, 4% Cr, 1% V) and O1 Steel (96.1% Fe, 0.90% C, 1.4% Mn, 0.50% Cr, 0.50% Ni, 0.50% W). Since T1 Steel and O1 Steel are composed of different elements, we think of each type of steel as a different material. Material subsetting would define 3 subsets, one for each of Brass, T1 Steel and O1 Steel. Species would then define the further decomposition of these materials into their other components.

Applications can then define mesh variables that are specific not only to each material but also to each of the species components. The combined effect of a variable over all species in a given material is a sum (super-position) of the effect on each species-specific part of the variable. When users wish to display variables that are specific to species components, they wish to be able to turn on and off various species components in any given visualization and see the effect of including or eliminating a given species in the sum. In a typical pseudocolor visualization, the colors will change as species are turned on and off because the variable values being displayed vary according to the terms included in the sum. Is this the same kind of operation as the subsetting operations defined above?

IMHO, species sub-selection is not the same thing as the other subsetting operations we have discussed above. Subsetting operations are aimed at breaking up the domain of a field while species sub-selection is aimed at breaking up the range of a field. As a species selection is varied, does the part of the mesh we wish to display also vary and does the user want to see the mesh part (domain) vary or only the field values (range) part vary? We could decide to remove those parts of the mesh where a given species selection results in zero values for the visualized field. However, we do not currently do that nor do users require or expect that. The sole expectation is for a field's value over the mesh to vary as species selection is varied.
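The distinction can be made concrete with a small sketch. The per-species numbers and the `evaluate` helper are invented for illustration. Changing the species selection changes the values of the field (its range) but leaves the set of zones (its domain) untouched.

```python
# Hypothetical per-zone, per-species parts of one variable in a brass region.
zones = [
    {"Cu": 6.5, "Zn": 3.5},
    {"Cu": 1.0, "Zn": 0.0},
]


def evaluate(zones, selected_species):
    """Per-zone value of the variable: the sum of the contributions
    from the selected species only."""
    return [
        sum(value for name, value in terms.items() if name in selected_species)
        for terms in zones
    ]


all_on = evaluate(zones, {"Cu", "Zn"})
zinc_only = evaluate(zones, {"Zn"})
```

With only zinc selected, the second zone evaluates to zero but is still present in the result, which mirrors the current behavior in which no part of the mesh is removed.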

Nonetheless, in the current implementation of VisIt's subset controls GUI, species selection is also included there and as a result, it has been included in the internal subsetting machinery (Subset Inclusion Lattice or SIL classes). We should re-consider whether continuing with this design is appropriate or should be changed.

What about Data Selections?

I am aware of some special feature in VisIt called Data Selections which involves in some way external text files to help specify selections (almost like Adobe Photoshop's selections abstraction). However, I am totally unfamiliar with this feature and its relationship to subsetting.

Notes on current implementation

The current implementation of subsets involves several parts:

avtDatabaseMetadata and siblings such as avtMeshMetadata, avtMaterialMetadata, etc.
- Used to define the names of domains, materials, groups, enumerated scalar sets
- Constructed upon opening a database in a plugin's PopulateDatabaseMetadata.
- In cases where the subset composition of the database does not vary with time, there are currently optimizations in VisIt to avoid attempting to re-read and re-construct subset knowledge upon changes in timesteps. However, this optimization has had robustness issues and it is unclear if it will continue to be necessary. This is the HasInvariantSIL() method on the avtFileFormat classes.
avtSIL classes and siblings
- SIL is an abbreviation for Subset Inclusion Lattice, which is a graph-like data structure for representing the subset structure of a database.
- A key complication is that there is a single avtSIL object for a whole database. If a database contains multiple meshes, all the subsetting structure from all the meshes is combined and commingled in a single avtSIL object.
- There are optimizations for specific kinds of subsetting structures (avtSILArray and avtSILMatrix) which are likely not necessary in the new implementation.
avtSILGenerator class
- This class basically transforms the subset metadata descriptions that come from a database plugin in avtDatabaseMetadata and constructs an avtSIL object for all the meshes in the input database.
- There are only a handful of common cases the avtSILGenerator class can generate an avtSIL object for. An alternative is for a database plugin to generate the avtSIL object directly but this is presently not done.
avtSILRestriction
- Class that represents a restriction (e.g. indicating which subsets are off and on) of the relevant subsets in a given visualization. The SIL is restricted either explicitly by a user's inputs or indirectly by operator and load-balancing actions.
avtSILRestrictionTraverser
- A class whose primary role is to determine which domain sets are involved for any given SIL restriction but also helps to answer other questions about the SIL including such things as which material(s) are on, which enumerated scalar sets are on and for enforcing on and off state of sets throughout the SIL hierarchy.
SILRestrictionAttributes and its compact variant, CompactSILRestrictionAttributes
- These are the serializable attributes object representations for SILs that VisIt passes between executables when subset selections are changed by the user and need to be communicated between GUI, mdserver, engine and viewer.
- A memory issue with an avtSILRestriction is that it contains the SIL it is restricting (
- The compact variant is an optimization that codifies a SIL restriction as an array of unsigned chars and relies upon the fact that sender and receiver already have the underlying SIL object with which the restriction is associated.
GetMesh() method of a database plugin
- returns a vtkDataSet for a piece of mesh, that is a domain subset
GetVar() method of a database plugin
- returns a cell or point vtkDataArray intended to map 1:1 with the nodes or zones of the mesh object returned by the GetMesh() method.
- represents a node- or zone-centered variable defined on the mesh
- In the context of subsetting, GetVar() is relevant when it returns an enumerated scalar variable.
GetAuxiliaryData() method of a database plugin
- returns an avtMaterial object when the requested auxiliary data type is AUXILIARY_DATA_MATERIAL.
- VisIt's avtMaterial object is nearly identical to Silo's DBmaterial struct
The subset controls window in the GUI
- Instantiates a QTreeWidgetItem for every set in the database. This could pose a scalability issue for databases containing millions of domain subsets.
SILRestriction python object and methods to manipulate it in the CLI
- Although we will introduce a new CLI interface, in the short term, we must maintain compatibility with this older interface.
- It may be necessary to maintain this older interface indefinitely though maybe with some warning messages to encourage other developers to update it.
Subset restriction operators
- Currently, these operators do not integrate well together. Part of the re-design will involve addressing that limitation.
- Domain filtering operator (based on spatial and data extents)
  - Eliminates domains from consideration based on bounds (an optimization that is presently not often used).
- Load balancing operator
  - Exists to decide which domains should be processed by which MPI tasks
- Material Interface Reconstruction (MIR)
  - This operation is significantly complicated by the need to deal with partial inclusion (e.g. mixing materials). Otherwise, for clean materials, this operation can be handled just like vtkEnumThreshold operator. But, restrictions are limited to mesh zone entities only.
- vtkEnumThreshold operator is capable of operating on restrictions involving node, edge, face and volume mesh entities.
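The idea behind the compact restriction representation mentioned in the list above can be sketched as follows. The encoding shown (one byte per set, 1 for on and 0 for off) and the function names are assumptions made for illustration; the actual CompactSILRestrictionAttributes encoding may differ. What the sketch preserves is the reliance on sender and receiver already sharing the same ordered list of sets.

```python
def encode_restriction(set_names, sets_on):
    """Encode on/off state as one unsigned char per set, in the order
    of the shared `set_names` list."""
    return bytes(1 if name in sets_on else 0 for name in set_names)


def decode_restriction(set_names, payload):
    """Recover the names of the sets that are on. Only meaningful when
    the receiver holds the same `set_names` list as the sender."""
    if len(payload) != len(set_names):
        raise ValueError("payload does not match the shared set list")
    return {name for name, flag in zip(set_names, payload) if flag}


# Hypothetical shared set list known to both executables.
shared_sets = ["D0", "D1", "D2", "copper", "steel"]
payload = encode_restriction(shared_sets, {"D0", "D2", "steel"})
restored = decode_restriction(shared_sets, payload)
```

Only the payload needs to be sent between executables; the set names and hierarchy never travel with it.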

Design Goals of New Subsetting Support

Scalability in numbers of subsets supported
- Note that this implies scaling in problem size as well due to the fact that domains are one kind of subset to be supported
Common subsetting infrastructure internally in VisIt supports all different categories of subsetting
Arbitrary, user-defined collections of subsets
Hierarchical relationships between subsets within or between different categories
Multiple independent decompositions of a mesh into material subsets or domain subsets, etc.
Ability to combine effects of subset selection between different categories of subsets (e.g. display nodeset 7 on domains 5 & 19)
Support subset entity enumeration internally in native datatype of the plugin.
Support both labeling and listing methods of subset enumeration
Subset Selection Expressions; boolean set expressions that define what is and is not to be displayed
Ability to define new subsets on-the-fly by entity enumeration
- Directly entered by the user or constructed via various enumeration constructors
- Produced and saved from various operators applied to the mesh
- Held in external files via some other external means (e.g. input deck nodelists)
Internal simplification and unification of the manner in which subset knowledge is handled and managed within VisIt
Handling subset structures that vary with time automagically, without special work from either VisIt or a database plugin.
Support time-varying subset structures efficiently without VisIt having to assume and/or the plugin specifying that it is invariant or not.

Subset Selection Expressions (SSEs)

A key element in the new subsetting mechanism is that it will operate in terms of expressions involving subsets. Users will use Subset Selection Expressions or SSEs to define the part(s) of the mesh to display in any given visualization. The current Expressions window will be enhanced with a new tab for SSEs. The old check-box based (SIL) interface which permits users only to turn sets on and off will continue to be supported in the same way the old Selected Files interface is supported.

The existence and names of sets will come primarily from the database just as the existence and names of variables do now. However, users will also be permitted to construct new sets from existing sets just as is currently possible with variables.

The user will be able to define new sets in one of the following ways...

Creation of new sets by subset selection expression (SSE) where new sets can be defined in terms of existing sets.
Creation of new sets by listing of mesh entities (e.g. lists of integers identifying nodes, edges, faces or volumes)...
- Explicitly entered by the user directly in the GUI and CLI
- Implicitly derived from application of operators such as box, clip, iso-volume, onion-peel, etc.
- Explicitly defined in some external data source (e.g. files) containing lists of mesh entities.
  - This may be related to the current notion of data selections

SSEs will involve unions, intersections and differences as well as optimized variants of these operators for commonly used cases. For example, the current approach to turning off some domains, say 1, 5 and 6, by checking each domain's box in the GUI, becomes the set expression Universe - Union(domains, 1,5,6).
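In terms of ordinary set algebra, the example expression can be sketched as below. The eight-domain mesh, the zone numbering and the helper `union_of` are hypothetical and are used only to show what Universe - Union(domains, 1,5,6) would select.

```python
# Hypothetical mesh: 8 domains, each holding 4 consecutively numbered zones.
domains = {d: set(range(4 * d, 4 * d + 4)) for d in range(8)}
universe = set().union(*domains.values())


def union_of(category, *ids):
    """Union of the named sets within one category (here, domains)."""
    return set().union(*(category[i] for i in ids))


# Universe - Union(domains, 1,5,6)
visible = universe - union_of(domains, 1, 5, 6)

# Domains that still contribute zones to the result.
domains_on = sorted(d for d, zones in domains.items() if zones & visible)
```

An optimized variant for this common case could avoid building the zone sets at all and work directly on domain ids.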

Just as databases can define variable expressions (e.g. Silo's DBPutDefvars()), so will they be permitted to define SSEs similarly.

Illustrative Examples

Consider, for example, a database which includes a mesh composed of 6 domains (D0, D1, ..., D5) and 3 materials ('C'opper, 'S'teel, 'P'lastic) and two nodesets (Ns0, Ns1).

If the user wished to display domains D[0-2], in the current SIL controls window, s/he would uncheck domains 3, 4 and 5. In the new subset selection expression (SSE) approach, the user will still be able to use the old check-box approach as the current SIL controls window does now. Or, the user will be allowed to define a subset selection expression, the result of which will be displayed. In this case, the user could define the expression (D0+D1+D2) where + is the set-union operation. This expression says to display the union of domains 0, 1 and 2. Alternatively, the user could define the expression U-(D3+D4+D5) where U is the universe (i.e. everything) and the - operator is the relative complement operator. This expression says to display everything (U) except (-) domains 3, 4 and 5.

Expressions involving just domains are simple to understand and introspect. For example, in many cases internally in VisIt code, a given plot or operator needs to ask questions about the involved subsets, such as: is domain 2 on or off? That is, is domain 2 being used in the current expression? In the expression (D0+D1+D2), a simple inspection of the expression indicates domain 2 is on.

This is somewhat less obvious with the expression U-(D3+D4+D5). A simple inspection of the expression tells us D2 doesn't appear in the expression. But that doesn't mean D2 is off. In fact, a simple inspection of the expression is not the correct way to answer the question, is domain 2 on? In general, to answer questions like this about the SSE, the expression needs to be evaluated. But it needs to be evaluated in terms that are relevant to the question being posed. If the question is one about domains, then the expression needs to be evaluated in terms of domains. If the question is one of materials, then the expression needs to be evaluated in terms of materials. Let's consider some examples...

In the SSE=U-(D3+D4+D5), to determine if domain 2 is on, we need to determine if there is a non-empty intersection between domain 2 and the current SSE. That is, is D2*SSE!=0 where * is the set-intersection operator and 0 is the empty set? To do this, we need to substitute every non-domain term in the SSE with its equivalent in terms of domains. That is easy for U. U is just the set of all domains (D0+D1+D2+D3+D4+D5). So, we have...

SSE=U-(D3+D4+D5)
U=(D0+D1+D2+D3+D4+D5)
SSE=(D0+D1+D2+D3+D4+D5)-(D3+D4+D5)
SSE=(D0+D1+D2)
D2*SSE=D2*(D0+D1+D2)=D2!=0

Since D2*SSE!=0, we can say yes domain 2 is on.
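
The substitution above maps directly onto ordinary set operations. Here is a minimal sketch in Python, assuming each domain is modeled as a one-element frozenset of domain ids. The names D, U and is_on are illustrative only and are not VisIt API; Python's |, & and - stand in for the +, * and - operators used in this document.

```python
# Illustrative only: domains modeled as one-element frozensets of domain ids.
D = [frozenset({i}) for i in range(6)]      # D[0] .. D[5]
U = frozenset().union(*D)                   # universe = union of all domains

def is_on(queried, sse):
    """A set is on when its intersection with the SSE is non-empty."""
    return len(queried & sse) > 0

sse = U - (D[3] | D[4] | D[5])              # SSE = U-(D3+D4+D5)

print(sorted(sse))                          # domains the SSE reduces to
print(is_on(D[2], sse), is_on(D[4], sse))   # query D2, then D4
```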

However, next consider the SSE, C+Ns1 where C is the copper material subset and Ns1 is a nodeset subset. In this SSE, is domain 2 on? As before, we need to evaluate this SSE in terms of domains. So, we need suitable substitutions for both C and Ns1 in terms of domains. Up until now, VisIt has never had features that would, for example, indicate on which domains a given material existed. For that reason, VisIt has always operated assuming every material exists on every domain. Likewise for a subset like a nodeset, Ns1. That would mean that all we can say for sure about C or Ns1 in terms of domains is

C<=U ==> C<=(D0+D1+D2+D3+D4+D5)
Ns1<=U ==> Ns1<=(D0+D1+D2+D3+D4+D5)

Therefore...

C+Ns1<=(D0+D1+D2+D3+D4+D5)

where <= is the is-contained-in relation. Thus, we know C is contained in the Universe and Ns1 is contained in the Universe and therefore so is their union, C+Ns1, which is the current SSE we need to evaluate. Now, does this help us evaluate the SSE in terms of domains? Well, yes and no. Since the relations are not equivalences, we cannot say SSE=U+U=U=(D0+D1+D2+D3+D4+D5). However, we can say that whatever C+Ns1 is equivalent to, it is contained in U. So, SSE<=(D0+D1+D2+D3+D4+D5) and whatever domains we then need for the RHS, we can assume they are needed for the LHS. This may result in our using more domains than are truly needed but we cannot do any better without detailed knowledge of which materials exist on which domains. Finally, since SSE<=(D0+D1+D2+D3+D4+D5), intersecting the RHS with D2 gives a non-empty result and so we must treat domain 2 as on in this SSE.
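
The default substitution can be sketched as a table lookup that falls back to the Universe. This is only a sketch under stated assumptions: KNOWN_BOUNDS and domain_upper_bound are hypothetical names, and every set is expressed as a frozenset of domain ids.

```python
# Illustrative only: every set is expressed as a frozenset of domain ids.
ALL_DOMAINS = frozenset(range(6))           # U = D0+...+D5

# Hypothetical table of known domain-wise upper bounds. Without any
# cross-intersection knowledge, nothing is listed for materials or nodesets.
KNOWN_BOUNDS = {f"D{i}": frozenset({i}) for i in range(6)}

def domain_upper_bound(term):
    """Domains that may contain `term`; defaults to the whole universe."""
    return KNOWN_BOUNDS.get(term, ALL_DOMAINS)

# SSE = C+Ns1: a union is bounded by the union of its terms' bounds.
sse_bound = domain_upper_bound("C") | domain_upper_bound("Ns1")

d2_is_on = bool(domain_upper_bound("D2") & sse_bound)
print(sorted(sse_bound), d2_is_on)
```

Because the bound is the whole universe, every domain is reported as on, which is the conservative answer described above.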

Next, let's consider a slightly more complicated example, (D1+D2+D3)-C. This expression says we want to see everything that is on domains 1, 2 and 3 but that is not copper. How do we evaluate the SSE on domains in this case?

SSE=(D1+D2+D3)-C
C<=U ==> C<=(D0+D1+D2+D3+D4+D5)

The problem here is that knowing that C is contained in U doesn't really help evaluate this SSE because C is being subtracted here (via relative complement). Now we can adjust the equations a bit...

(D1+D2+D3)-C=(D1+D2+D3)*~C where ~ is the set-complement operator.

Now, by simple reasoning, we can say...

~C<=U. In other words, everywhere copper does not exist is also contained in the Universe.

So, we have...

SSE=(D1+D2+D3)-C=(D1+D2+D3)*~C<=(D1+D2+D3)*(D0+D1+D2+D3+D4+D5)=(D1+D2+D3)

Is domain 2 on?

SSE*D2=(D1+D2+D3)*D2=D2!=0

So, yes, domain 2 is on.
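
In terms of domains, the step ~C<=U amounts to saying that no domain can safely be removed on account of C. A sketch of that step, assuming a hypothetical helper complement_upper_bound that removes only domains known to lie wholly inside the subtracted set (none, without further knowledge):

```python
ALL_DOMAINS = frozenset(range(6))           # U = D0+...+D5

def complement_upper_bound(wholly_inside):
    """Upper bound on ~B in domains: U minus domains known to lie wholly in B."""
    return ALL_DOMAINS - wholly_inside

lhs = frozenset({1, 2, 3})                  # (D1+D2+D3), exact in domains
# No cross-intersection knowledge: no domain is known to be wholly copper,
# so the bound on ~C is the whole universe.
not_c_bound = complement_upper_bound(frozenset())

sse_bound = lhs & not_c_bound               # (D1+D2+D3)*~C <= (D1+D2+D3)*U
d2_is_on = 2 in sse_bound
print(sorted(sse_bound), d2_is_on)
```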

Support for Knowledge of Cross-Intersections

Up until now, in evaluating SSEs, we have assumed we do not know anything about the relationships between sets in different categories. In particular, we do not know which domains contain at least some of a given material. In this section, we consider how the SSE evaluation process could be improved if such information were available. Our intention is to make it possible, though optional, for a database to provide inter-category cross-intersection information as an optimization for supporting SSEs.

For example, for each domain and each material, the database plugin could inform VisIt of all the pair-wise intersections of Di*Mj

Di*Mj=0, empty, no part of Mj exists on Di
Di*Mj=Mj, Mj is wholly contained in Di
Di*Mj=Di, Di is wholly contained in Mj
Di*Mj=X, part of Mj exists on part of Di (partial overlap)
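
As an illustration of the four cases, here is a sketch of how a plugin might classify one pair. It assumes, purely for the example, that both the domain and the material are available as Python sets of zone ids and that mixed-material zones are ignored. CrossIntersection and classify are hypothetical names.

```python
from enum import Enum

class CrossIntersection(Enum):
    EMPTY = "0"                 # Di*Mj=0
    MATERIAL_IN_DOMAIN = "Mj"   # Di*Mj=Mj
    DOMAIN_IN_MATERIAL = "Di"   # Di*Mj=Di
    PARTIAL = "X"               # Di*Mj=X

def classify(domain_zones, material_zones):
    """Classify Di*Mj given the zone ids in each set (both as Python sets)."""
    common = domain_zones & material_zones
    if not common:
        return CrossIntersection.EMPTY
    if common == domain_zones:
        # Also taken when the two sets are identical; the tie-break is arbitrary.
        return CrossIntersection.DOMAIN_IN_MATERIAL
    if common == material_zones:
        return CrossIntersection.MATERIAL_IN_DOMAIN
    return CrossIntersection.PARTIAL

d0 = {0, 1, 2, 3}
print(classify(d0, {0, 1, 2, 3, 4, 5}))   # a domain wholly inside a material
```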

Suppose we have the following arrangement of domains and materials.

The database plugin could, optionally, for example as the result of a GetAuxiliaryData query, return a table of cross-intersections like so...

	Copper	Steel	Plastic
D0	D0	0	0
D1	X	X	0
D2	0	X	X
D3	X	X	X
D4	0	X	X
D5	0	0	D5

From the table reading down each column we know the following...

C<=(D0+D1+D3) and ~C<=(D1+D2+D3+D4+D5)
S<=(D1+D2+D3+D4) and ~S<=(D0+D1+D2+D3+D4+D5)
P<=(D2+D3+D4+D5) and ~P<=(D0+D1+D2+D3+D4)

These column-wise contains expressions are useful for substituting material subsets C, S and P with an upper bound on the set of domains that contain the material. In other words, we can use these expressions to convert material subsets to domains. Without this information, the best we can assume is that each material is contained in the union of all domains, or the Universe. With these expressions, in some cases we can significantly reduce the number of domains we'll need. For example, for the Copper material set, C, we know we need only half of the domains, (D0+D1+D3).
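
The column-wise relations can be derived mechanically from the table. A sketch, assuming the table is held as a nested dictionary keyed by domain and then by material (a hypothetical layout chosen only for this example):

```python
# The example cross-intersection table, keyed by domain then material.
# "0" = empty, "X" = partial overlap, "Di" = domain wholly inside material.
TABLE = {
    "D0": {"C": "D0", "S": "0", "P": "0"},
    "D1": {"C": "X",  "S": "X", "P": "0"},
    "D2": {"C": "0",  "S": "X", "P": "X"},
    "D3": {"C": "X",  "S": "X", "P": "X"},
    "D4": {"C": "0",  "S": "X", "P": "X"},
    "D5": {"C": "0",  "S": "0", "P": "D5"},
}

def material_bound(material):
    """Domains that may hold `material`: every row whose entry is not "0"."""
    return {d for d, row in TABLE.items() if row[material] != "0"}

def complement_bound(material):
    """Domains that may hold something other than `material`:
    every row that is not wholly contained in it."""
    return {d for d, row in TABLE.items() if row[material] != d}

for m in "CSP":
    print(m, sorted(material_bound(m)), sorted(complement_bound(m)))
```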

Next, reading across each row, we know...

D0<=C
D1<=(C+S)
D2<=(S+P)
D3<=(C+S+P)
D4<=(S+P)
D5<=P

These row-wise contains expressions are useful for substituting domain subsets for materials when we need to evaluate the SSE for materials.

The rule for constructing the above is-contained-in relations is simply to exclude every box that has a 0 (empty set). So, reading down the Copper column, we excluded D2, D4 and D5. Likewise, reading across the D5 row, we excluded Copper and Steel.

Let's see how we might use this information to improve our evaluation of some SSEs. First, let's consider the very simple SSE=C and we want to know if domain 5 is on. This SSE says to display just the Copper material. Since C<=(D0+D1+D3), we can say that SSE<=(D0+D1+D3) and since D5*(D0+D1+D3)=0, we can say domain 5 is off in this SSE. Without the cross-intersection information, we would not have had enough information to know domain 5 was not necessary to display just the Copper material.

Likewise, given the SSE=(D4+D5)*Ns1, we can determine if material C is on or off using the same approach except substituting all non-material terms with their material upper bounds. So, we would have...

SSE=(D4+D5)*Ns1

Using upper bounds for D4 and D5 from the cross-intersection table, we have...

SSE<=((S+P)+P)*Ns1

Using the default upper bound for Ns1, Ns1<=U

SSE<=((S+P)+P)*U=(S+P)+P=S+P
SSE*C<=(S+P)*C=0

Therefore, we know material C isn't involved in the portion of the mesh this SSE defines.
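
The same query can be sketched with the row-wise bounds held as sets of material names. DOMAIN_TO_MATERIALS and material_upper_bound are hypothetical names; terms with no known bound, such as Ns1, fall back to the set of all materials.

```python
# Row-wise upper bounds read from the example table (entries that are not 0).
DOMAIN_TO_MATERIALS = {
    "D0": {"C"}, "D1": {"C", "S"}, "D2": {"S", "P"},
    "D3": {"C", "S", "P"}, "D4": {"S", "P"}, "D5": {"P"},
}
ALL_MATERIALS = {"C", "S", "P"}             # U expressed in materials

def material_upper_bound(term):
    """Materials that may appear in `term`; unknown terms default to U."""
    return DOMAIN_TO_MATERIALS.get(term, ALL_MATERIALS)

# SSE = (D4+D5)*Ns1
union_bound = material_upper_bound("D4") | material_upper_bound("D5")
sse_bound = union_bound & material_upper_bound("Ns1")

copper_is_on = "C" in sse_bound
print(sorted(sse_bound), copper_is_on)
```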

Now, it turns out that we are rarely interested in whether a given SSE winds up not needing a given material because we only want to know when materials are explicitly turned on or off by the user or are otherwise necessary to cull out individually for purposes of a MIR operation. So, in the above example, the fact that material C is not involved should not result in VisIt performing MIR.

General Algorithm for Querying an SSE for on/off State of a Set

What is most important to derive from this discussion is the basic algorithm for evaluating SSEs to answer questions about which sets are involved. To determine if a set of a given class (e.g. material, domain, nodeset, part, patch, etc.) is on or off...

For each set in the SSE not in the given class, find either an equivalence substitution if one is available or a contains-in substitution (upper bound) which by default is always Universe, to put all terms in the SSE in sets of the given class.
Evaluate the resulting SSE intersected with the set being queried
If the result is non-empty, the queried set is on. Otherwise, it is off
We should consider optimizations for when the on/off state of many sets needs to be queried
- One simple optimization is to evaluate the SSE without the final intersection term and then perform only the intersection evaluation for each set to be queried.
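
The steps above can be sketched as a small recursive evaluator. This is a sketch under stated assumptions, not a proposed implementation: expressions are nested tuples, all names are hypothetical, and each term carries a lower bound alongside its upper bound. The lower bound (members of the target class known to lie wholly inside the term) is an addition made here so that the - operator can be handled the same way as the ~C bound discussed earlier. By default it is empty.

```python
def bounds(expr, known, universe):
    """Return (lower, upper) bounds of `expr` as sets of target-class members.

    expr     : a set name, or a tuple (op, left, right) with op in "+", "*", "-"
    known    : name -> (lower, upper); an equivalence has lower == upper
    universe : all members of the target class; the default upper bound
    """
    if isinstance(expr, str):
        return known.get(expr, (frozenset(), universe))
    op, left, right = expr
    l_lo, l_hi = bounds(left, known, universe)
    r_lo, r_hi = bounds(right, known, universe)
    if op == "+":                      # union
        return l_lo | r_lo, l_hi | r_hi
    if op == "*":                      # intersection
        return l_lo & r_lo, l_hi & r_hi
    if op == "-":                      # relative complement
        return l_lo - r_hi, l_hi - r_lo
    raise ValueError(f"unknown operator {op!r}")

# Target class: domains. Domains and U have exact (equivalence) substitutions.
U = frozenset(range(6))
known = {f"D{i}": (frozenset({i}), frozenset({i})) for i in range(6)}
known["U"] = (U, U)

sse = ("-", ("+", "D1", ("+", "D2", "D3")), "C")    # (D1+D2+D3)-C
_, upper = bounds(sse, known, U)                    # evaluate the SSE once
on = {d for d in range(6) if known[f"D{d}"][1] & upper}   # one test per query
print(sorted(on))
```

Evaluating the SSE once and then intersecting per queried set is the simple optimization mentioned in the last step. Supplying an entry for C in known, for example from a cross-intersection table, tightens the result without changing the evaluator.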

Subset Metadata

A new class, avtSubsetsMetaData, is used to define groups of related subsets. Typically, a database will define an avtSubsetsMetaData object for each kind of subset grouping needed. For example, there will be one for domains, one for materials, etc. So, this one class is designed to support knowledge of all kinds of subsets: domains, materials, nodesets, sidesets, element blocks, groups, files, AMR patches and levels, parts in an assembly, etc. This one class supports knowledge of partial inclusion (mixing material case) where necessary. It supports hierarchical (graph-like) organizations of subsets and multiple decompositions of the mesh into different categories of subsets such as materials or parallel decompositions. It will support explicit names of subsets as well as nameschemes. It will also support coloring assignment to subsets.

Currently, this class is also designed to support knowledge of emptiness or non-emptiness of intersections between sets in different groups. This can be used, for example, to indicate which materials exist on which domains and, as necessary, vice versa. It can also be used to indicate which AMR patches exist on which levels as the current grouping mechanism does. As currently designed, storing this information in avtDatabaseMetaData likely represents a scalability issue and we'll need to use an approach akin to GetAuxiliaryData for this potentially large data. The same is true for hierarchical relationships between subsets such as in an assembly.

An avtSubsetsMetaData object involves the following members

string catName: name of the category of the group of subsets this object defines
enum catRole: the role of the subsets in the mesh (e.g. domain, material, boundary, etc.)
int catCount: the number of subsets of the group of subsets this object defines
enum decompMode: indicates if the group of subsets covers the whole mesh, partitions it or neither.
namescheme: how subsets are named (optional)
colorscheme: how subsets are colored (optional)
cross-intersections: empty-ness of intersections with sets in other categories (optional)
graph-edges: how subsets are organized hierarchically (optional)

A database plugin will construct one avtSubsetsMetaData object for each kind of group of subsets it needs to define. For typical cases of Silo data, there will be one for domains, materials, groups, as well as one for each enumerated scalar.
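
To make the member list concrete, here is a hypothetical Python mirror of it. The real class would be written in C++ inside VisIt; the enum values and the representations chosen for the optional members are assumptions made only for this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class CatRole(Enum):            # roles named in the text; the full list is not given
    DOMAIN = "domain"
    MATERIAL = "material"
    BOUNDARY = "boundary"

class DecompMode(Enum):         # covers the whole mesh, partitions it, or neither
    COVER = "cover"
    PARTITION = "partition"
    NEITHER = "neither"

@dataclass
class SubsetsMetaDataSketch:
    cat_name: str                               # catName
    cat_role: CatRole                           # catRole
    cat_count: int                              # catCount
    decomp_mode: DecompMode                     # decompMode
    namescheme: Optional[str] = None            # optional; representation assumed
    colorscheme: Optional[str] = None           # optional; representation assumed
    cross_intersections: dict = field(default_factory=dict)  # optional
    graph_edges: list = field(default_factory=list)          # optional

# One object per group of subsets, e.g. the six domains of the earlier example.
domains = SubsetsMetaDataSketch("domains", CatRole.DOMAIN, 6, DecompMode.PARTITION)
print(domains.cat_name, domains.cat_count)
```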

Subset Details; Problem-sized Data

Subset details are the problem-sized lists of mesh entities each subset contains. This is a new kind of object, avtSubsets, that database plugins will have to serve up in response to requests from VisIt for subset details.

As mentioned above, there are two fundamentally different ways to represent subset details. One is as a set of lists of mesh entities. Another is as a vtkBitArray defined over a specific class of mesh entities (nodes, edges, faces or volumes) that indicates for each entity which set(s) it is contained in. Although the latter approach is attractive because it looks a lot like how GetVar() currently operates, there are storage concerns in cases of large numbers of sets and/or sets that contain only a small number of mesh entities that make it unattractive.
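
A rough back-of-the-envelope comparison shows where the storage concern comes from. The sketch assumes 4-byte entity ids for the list form and one bit per (entity, set) pair for the bit-array form. Both are assumptions for illustration, as is the workload.

```python
def list_bytes(set_sizes, bytes_per_id=4):
    """Storage for one explicit id list per set (ids assumed 4 bytes each)."""
    return sum(set_sizes) * bytes_per_id

def bitarray_bytes(num_entities, num_sets):
    """Storage for one bit per (entity, set) pair, rounded up to whole bytes."""
    return (num_entities * num_sets + 7) // 8

# Hypothetical workload: 1,000,000 zones, 1,000 sets of 100 zones each.
sizes = [100] * 1000
print(list_bytes(sizes))                 # 400000
print(bitarray_bytes(1_000_000, 1000))   # 125000000
```

With many small sets, the lists need a few hundred kilobytes while the bit arrays need over a hundred megabytes.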

With the exception of subset categories of role domain, VisIt will expect to obtain the details of subsets by requesting an avtSubsets object. Subsets of role domain will be obtained via a GetMesh() call as they always have been. In the case that a database offers multiple different decompositions of the mesh into domains, some options are:

Adjusting the GetMesh() interface to accept an additional argument indicating which decomposition
- This seems over-kill especially when I am familiar with only a few cases of this now.
Defining an active decomposition
Organizing domain identifiers such that domains in different decompositions represent contiguous segments of the linear address space of positive integers.
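
The last option can be sketched as follows. The decomposition names and sizes are hypothetical and ids are zero-based here for convenience. Each decomposition owns one contiguous run of global domain ids, so a single integer is enough to identify both the decomposition and the domain within it.

```python
from bisect import bisect_right

# Hypothetical: two decompositions of the same mesh, with 6 and 4 domains.
DECOMP_SIZES = [("by_processor", 6), ("by_block", 4)]

# Starting global id of each decomposition's contiguous segment.
STARTS = []
_next = 0
for _name, _count in DECOMP_SIZES:
    STARTS.append(_next)
    _next += _count

def locate(global_id):
    """Map a global domain id to (decomposition name, local domain index)."""
    if not 0 <= global_id < _next:
        raise IndexError(global_id)
    k = bisect_right(STARTS, global_id) - 1
    return DECOMP_SIZES[k][0], global_id - STARTS[k]

print(locate(2), locate(7))
```

Under this scheme the GetMesh() interface would not need an extra argument, at the cost of making domain ids meaningful only together with the segment table.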

Consider, for example, how subset details are currently handled for materials or enumerated scalars. A single I/O request to the database plugin returns all the details for all the subsets in the given category. In the short term, this is probably an acceptable approach for the new subsetting mechanism. However, I am aware of meshes in reactor modeling that involve hundreds of thousands of assembly subsets. In that case, it may not be appropriate to require a single avtSubsets object to store all details for all subsets. Some a priori design for this possibility may be appropriate.

In the current design, we expect to handle avtSubsets as an auxiliary data query instead of introducing a new top-level plugin method. This is consistent with how materials are handled and is consistent with how non-VTK type objects are currently returned from plugins.

Application of SSEs to Plots

Phase 0 Implementation

Replace all internal infrastructure with the new subsetting machinery but externally (GUI, CLI, database plugins, etc.), nothing changes.
