Applying Hindley-Milner to unit-checking #3491

qlambert-pro · 2024-03-19T15:04:45Z

This is fairly complete proposal for unit-checking based on the Hindley-Milner inference algorithm. This corresponds to the implementation in Wolfram System Modeler.

Co-authored-by: Henrik Tidefelt <[email protected]>

HansOlsson · 2024-03-20T10:48:55Z

I will have to study it in more detail - and in particular would want to first consider just the definition of the unit-checking and then consider the variants for adding units.

For the unit-checking I noticed that several functions were not considered (but could trivially be added - Dymola already supported these checks except cat - and some part of array-handling):

cross - treated as multiplication
interval - return "s"
hold, sample, subSample, superSample, previous, fill, skew, noEvent, abs, pre, promote where the unit of the result corresponds to the unit of the first argument (and other arguments are ignored for unit-checking)
{}, [], array (and sort of scalar, vector, matrix) (assuming we don't scalarize first) where all arguments must have the same unit and the result then also has this unit
similarly for cat ignoring first argument
smooth where the unit corresponds to the unit of the second argument (and other arguments are ignored for unit-checking)
sqrt should be treated as power with rational exponent 1/2; I also find the general restrictions on the power operator somewhat lacking

And something for min, max, transpose, diagonal, sum, product.

Dymola also treats the different transcendental functions differently regarding "rad" vs. "1"; even though those differences are later ignored.

qlambert-pro · 2024-03-20T10:55:08Z

It was a conscious choice not to include an exhaustive list of builtin functions as, as you said, they can be trivially added later if we agree on the base mechanism. Similarly, for arrays, previous attempt at standardizing unit-checking have focus on scalar expressions, but as mentioned in the document it can be generalized to element-wise operations on arrays. And we are confident that it will work with matrix operations too, but we don't have a proof of concept for that last part.

DagBruck · 2024-03-21T06:58:18Z

Three comments on the proposal.

A number of examples would help explain a lot what it does, what are valid cases and what are not.
I would like to see any such scheme applied to some real-world libraries (not only MSL), and what the impact would be.
What does this proposal do that existing tools cannot do?

In general, I think this is the kind of extension that is best formalized through a layered standard, as there is not change for correct models.

HansOlsson · 2024-03-21T08:14:00Z

As far as I understand there are three parts to this (now that we added fractional powers in the unit-grammar):

The unit-constraints (such that a,b,c must have the same unit in c=a+b;; and the more complicated rule for multiplication). They are important, and although fairly clear there are some minor issues to discuss regarding the details.
How to handle unknown units and literals.
That we solve for the unit of variables one at a time (as a I understand Hindley-Milner is just a smart way of doing that).

Regarding (1): Of which unit-constraints are active. There are some minor details of which unit-constraints are active that are important (only active equations?, and clearly allowing different units for different instances of the class corresponding to "let-polymorphism"). Note that these details are important if we are considering storing the deduced units in the model (only possible for the current model, and even then it can be problematic). Having some way of de-activating unit-checking per equation (or model, package) also seems needed in practice, but is just a minor detail.

Regarding (2): The handling of literals is the major new change; and originally Dymola's unit-check had literals as total wild-cards which wasn't good triggering this MCP, whereas requiring units for every literal would be too much work for users for too small a benefit.

Regarding (3): One alternative for solving for units one variable at a time is to view the unit-constraints as a large (over and under-constrained) system of equations in rational numbers; another is to have more ad-hoc rules (so that we have one set of rules for checking units and another for reducing - we would then need to ensure consistency etc). Having some restriction on when to deduce (like for (global) constants can still be added to the current idea.

To illustrate the alternative of system of equations consider two problematic equations:

k*der(x)=-x;
cos(y*z)+cos(y/z)=f(time)

Assuming no units are given the approach with solving for variable-units one at a time cannot find any units.

With unit-constraints as equations we would for the first equation have:
unit.k+unit.x-[s]=unit.x
which gives: unit.k=[s], and no constraint for unit.x

And for the second equation:
unit.y+unit.z=[1]
unit.y-unit.z=[1]
which gives: unit.y=[1], unit.z=[1].

Obviously this alternative would be more powerful, but:

I'm not sure exactly how to efficiently implement the equation solving (well, just use a sparse solver...), especially considering the next item:
Diagnostics will be harder to understand when it works; instead of saying that we got the unit of x from expression x...y where y had unit ... we now get units for multiple variables from multiple expressions, making it harder to understand
Diagnostics when it fails will be harder still (will be discussed later)
I don't know if there are realistic use-cases where it will give a significant gain. The goal isn't that people write large models with 100+ variables and equations and then add the unit for one variable and everything is magically handled, but more that perhaps 80% of variables have unit and the algorithm checks for mistakes and fills in the rest.

Regarding diagnostics when it fails I would more say it is a matter of finding the possible problematic equations, and in my experience analyzing that even for a handful of equations takes care - and domain knowledge; whereas this other method seem designed for a large number of equations.

HansOlsson · 2024-03-21T10:50:47Z

For the power operator I would just split it based on the type of the exponent:

Integer: Integer-power even if not evaluated
Real: Treat the same way as: exp(ln(x)*y); - so both must have unit="1".

qlambert-pro · 2024-03-21T11:50:02Z

For the power operator I would just split it based on the type of the exponent:
* Integer: Integer-power even if not evaluated

What would be the unit of y in the following models then?

model Test
  Real x(unit = "m");
  Integer i;
  Real y = x^i;
initial equation
  i = 2;
equation
  when (time>0.4) then
    i = 3;
  end when;
end Test;

model Test2
  Real x(unit = "m");
  parameter Integer i = 1;
  Real y = x^i;
end Test2;

For context in the current state of the proposal, we would consider both models invalid because x doesn't have unit "1".

AHaumer · 2024-03-21T14:12:09Z

@qlambert-pro I would consider your example an ill-conditioned model without physical meaning and rewrite it:

model Test
  Real x(unit = "m");
  constant Real x0(unit = "m")=1;
  Integer i;
  Real y = (x/x0)^i;
initial equation
  i = 2;
equation
  when (time>0.4) then
    i = 3;
  end when;
end Test;

HansOlsson · 2024-03-21T16:11:33Z

@qlambert-pro I would consider your example an ill-conditioned model without physical meaning and rewrite it:
model Test
  Real x(unit = "m");
  constant Real x0(unit = "m")=1;
  Integer i;
  Real y = (x/x0)^i;
initial equation
  i = 2;
equation
  when (time>0.4) then
    i = 3;
  end when;
end Test;

I can see this - and it would be similar to figuring out the unit for Real z=if time>=0.4 then x else x*x;

gwr69 · 2024-03-21T17:44:05Z

@qlambert-pro I would consider your example an ill-conditioned model without physical meaning and rewrite it:
model Test
  Real x(unit = "m");
  constant Real x0(unit = "m")=1;
  Integer i;
  Real y = (x/x0)^i;
initial equation
  i = 2;
equation
  when (time>0.4) then
    i = 3;
  end when;
end Test;
I can see this - and it would be similar to figuring out the unit for Real z=if time>=0.4 then x else x*x;

This all makes probably little sense physically, but for cybernetic model parts I wonder how to easily lookup a magnitude with implicit unit = "1"? It looks cumbersome to have to use some constant with identical units to make expressions dimensionsless quickly, if you know what you are doing.

AHaumer · 2024-03-21T17:58:02Z

@gwr69 I'm an engineer, to me it's not cumbersome. I prepare my equations from textbook or from my own research before I start modeling. As long as equations are not meeting in units they're bad and not fit for modeling. In some cases - even from a physical point of view - you have to use dimensionless variables, i.e x/x_ref. In many cases x_ref has a meaningful value different from1.

gwr69 · 2024-03-21T18:41:16Z

@gwr69 I'm an engineer, to me it's not cumbersome. I prepare my equations from textbook or from my own research before I start modeling. As long as equations are not meeting in units they're bad and not fit for modeling. In some cases - even from a physical point of view - you have to use dimensionless variables, i.e x/x_ref. In many cases x_ref has a meaningful value different from1.

@AHaumer I always envy the engineers and how they use 7 base units for everything. Coming from economics and business modeling, I would certainly like to apply the rigorous procedures and unit checking that you describe—and to which I whole heartedly agree—to more abstract fields than engineering. Unfortunately, there are no monetary units in Modelica and units of time typically end with the "day" ignoring that even in scientific realms years are frequently used, even though they are typically variable units of time.

Of course, modeling in those abstract and "soft" realms will appear simple, if you only allow yourself the use of 1 for counting. But how to build models for predator-prey dynamics where the nonlinear terms call for dimensionless rates per predator or per prey and give them proper units different from 1 as there may quickly be equation errors if you neglect those differences? This is what I meant with "cumbersome" and sometimes one may need to strike a careful balance.

qlambert-pro · 2024-03-22T06:05:07Z

I wonder how to easily lookup a magnitude with implicit unit = "1"

This proposal helpfully provides an operator just for that purpose:
withoutUnit(expr, "unit")
It outputs the magnitude of expr after conversion from its unit to "unit". Which still provides you with unit-safety, but allows you to slip in a unit less world when you need it.

qlambert-pro · 2024-03-22T06:09:36Z

I can see this - and it would be similar to figuring out the unit for Real z=if time>=0.4 then x else x*x;

Which this proposal treat a similar way, it requires both side of an if whose condition hasn't been evaluated to have the same unit.

MartinOtter · 2024-03-22T08:40:01Z

I find it sub-optimal if a particular algorithm ("Hindley Midler") is required in the specification: To my knowledge, a concrete algorithm was never required in the Modelica Specification, simply because there might be different algorithms with pros and cons and the tool vendor should decide which algorithm to use.

Please, add examples to the proposed specification, as already mentioned.

If better unit-checking is introduced, it must be introduced in a way, that existing models can still be compiled and simulated, otherwise, people will be very upset. From a tool perspective, there should be an option to either ignore unit checking, give a warning or an error (giving a warning as a default). From my experience with Julia this is really essential, because very strict unit checking (as in Julia) can be a complete nightmare for a modeler who just wants to quickly produce a result and does not have time to search for a unit issue. The question is, how to define this in the specification, in order that unit checking is optional.

MartinOtter · 2024-03-22T08:43:46Z

As far as I understand there are three parts to this (now that we added fractional powers in the unit-grammar):
1. The unit-constraints (such that a,b,c must have the same unit in `c=a+b;`; and the more complicated rule for multiplication). They are important, and although fairly clear there are some minor issues to discuss regarding the details.

2. How to handle unknown units and literals.

3. That we solve for the unit of variables one at a time (as a I understand Hindley-Milner is just a smart way of doing that).
...
Regarding (3): One alternative for solving for units one variable at a time is to view the unit-constraints as a large (over and under-constrained) system of equations in rational numbers; another is to have more ad-hoc rules (so that we have one set of rules for checking units and another for reducing - we would then need to ensure consistency etc). Having some restriction on when to deduce (like for (global) constants can still be added to the current idea.

To illustrate the alternative of system of equations consider two problematic equations:
k*der(x)=-x;
cos(y*z)+cos(y/z)=f(time)
Assuming no units are given the approach with solving for variable-units one at a time cannot find any units.

With unit-constraints as equations we would for the first equation have: unit.k+unit.x-[s]=unit.x which gives: unit.k=[s], and no constraint for unit.x

And for the second equation: unit.y+unit.z=[1] unit.y-unit.z=[1] which gives: unit.y=[1], unit.z=[1].

Obviously this alternative would be more powerful, but:
* I'm not sure exactly how to efficiently implement the equation solving (well, just use a sparse solver...), especially considering the next item:

* Diagnostics will be harder to understand when it works; instead of saying that we got the unit of x from expression x...y where y had unit ... we now get units for multiple variables from multiple expressions, making it harder to understand

* Diagnostics when it fails will be harder still (will be discussed later)

* I don't know if there are realistic use-cases where it will give a significant gain. The goal isn't that people write large models with 100+ variables and equations and then add the unit for one variable and everything is magically handled, but more that perhaps 80% of variables have unit and the algorithm checks for mistakes and fills in the rest.
Regarding diagnostics when it fails I would more say it is a matter of finding the possible problematic equations, and in my experience analyzing that even for a handful of equations takes care - and domain knowledge; whereas this other method seem designed for a large number of equations.

I think the answer is simple from your analysis: Use only the units for one variable at a time, because simpler and better diagnostics. If the user is not satisfied with unit propagation, he just has to add more units at some places. Its much more critical if it is hard to figure out why a tool has selected a unit based on the analysis of many equations

qlambert-pro · 2024-03-22T08:49:13Z

I find it sub-optimal if a particular algorithm ("Hindley Midler") is required in the specification: To my knowledge, a concrete algorithm was never required in the Modelica Specification, simply because there might be different algorithms with pros and cons and the tool vendor should decide which algorithm to use.

Making the Hindley-Milner algorithm a requirement of the specification was never our intention, but it relates to a state of the art type-system, which shares a lot of properties with the way we want unit to work in Modelica. We felt the need to spell it out in this document for two reasons:

To demystify the way it works and to allow us to talk about constraint while being clear about this constraints lead to unit inference and checking
Because the type system in its usual form is, effectively, impaired to allow models without units (e.g. HelloWorld) to still work, and this comes with notable difference.

HansOlsson · 2024-03-22T08:56:10Z

I find it sub-optimal if a particular algorithm ("Hindley Midler") is required in the specification: To my knowledge, a concrete algorithm was never required in the Modelica Specification, simply because there might be different algorithms with pros and cons and the tool vendor should decide which algorithm to use.

Making the Hindley-Milner algorithm a requirement of the specification, but it relates to a state of the art type-system, which share a lot of properties with the way we want unit to work in Modelica. We felt the need to spell it out in this document for two reasons:

To demystify the way it works and to allow us to talk about constraint while being clear about this constraints lead to unit inference and checking

Because the type system in its usual form is, effectively, impaired to allow models without units (e.g. HelloWorld) to still work, and this comes with notable difference.

But consider the following alternative (not efficient):

Do
- Go through all unit-constraints in some order (using the same rules to deduce units from individual equations/expressions). If we can uniquely deduce the unit for one variable (or one expression) based on one equation/sub-expression do that.
While (some progress)

Will this give the same result? (Except for error diagnostics when it fails.)

qlambert-pro · 2024-03-22T09:05:57Z

If better unit-checking is introduced,

There is no such thing as unit-checking in the specs. This is literally what this MCP is about.

qlambert-pro · 2024-03-22T09:08:42Z

I think the answer is simple from your analysis: Use only the units for one variable at a time, because simpler and better diagnostics. If the user is not satisfied with unit propagation, he just has to add more units at some places. Its much more critical if it is hard to figure out why a tool has selected a unit based on the analysis of many equations

One could also write:
If the user is not satisfied with unit diagnostics, they just have to add more units at some places. Its much more critical if it is hard to know in what unit the source of a signal is expressed, as it can lead to silent numerical errors.

qlambert-pro · 2024-03-22T09:13:06Z

Will this give the same result? (Except for error diagnostics when it fails.)

I think the place it may differ is for models like HelloWorld. Where, a sensible unit-system would find an error, but the Modelica community wants it to be valid. Which we have found to add complexity, and justified us spelling the algorithm out.

HansOlsson · 2024-03-22T09:23:25Z

Will this give the same result? (Except for error diagnostics when it fails.)

I think the place it may differ is for models like HelloWorld. Where, a sensible unit-system would find an error, but the Modelica community wants it to be valid. Which we have found to add complexity, and justified us spelling the algorithm out.

I don't see how it would differ (I assume HelloWorld means a simple k*der(x)=-x; model where we don't derive unit for k?). I understand that solving it as a system of equations would give additional units, but I don't see where this simple algorithm would differ from Hindley-Milner - and I believe we need to understand this.

gwr69 · 2024-03-22T09:38:26Z

Will this give the same result? (Except for error diagnostics when it fails.)

I think the place it may differ is for models like HelloWorld. Where, a sensible unit-system would find an error, but the Modelica community wants it to be valid. Which we have found to add complexity, and justified us spelling the algorithm out.

I don't see how it would differ (I assume HelloWorld means a simple k*der(x)=-x; model where we don't derive unit for k?). I understand that solving it as a system of equations would give additional units, but I don't see where this simple algorithm would differ from Hindley-Milner - and I believe we need to understand this.

Great example! Make it der(x) = k * x for the economists and all of a sudden nobody would want to enter a magnitude matching 1/s and entering the value from your broker/bank will get you in numerical trouble soon. ;-)

What took getting used to for me (the BusinessSimulation as of v2.2.0-wsm passes all unit checks in WSM without abuse of “fudge factors”, e.g. 1[“”]) is that I would expect that users know what they enter (input and parameters and constants), but they can only have expectations regarding the results (output) as there may be equation errors hidden in a model. Setting units for an ouput connector is helpful as it makes unit checks fail earlier. At the same time it is also confusing as the unit values are not seen as expectations but as definite values to “solve unit equations.” Finding the location of a unit error still is not easy imho.

qlambert-pro · 2024-03-22T09:52:56Z

I don't see how it would differ (I assume HelloWorld means a simple k*der(x)=-x; model where we don't derive unit for k?). I understand that solving it as a system of equations would give additional units, but I don't see where this simple algorithm would differ from Hindley-Milner - and I believe we need to understand this.

So I don't think they would necessarily differ from Hindley-Milner. Although, the details contained in "If we can uniquely deduce the unit for one variable" can mean that a tool is able to infer something that another isn't but there is an inconsistency in the way a variable is used.
However, the version that is presented here support the following model for instance:

model HelloWorld
  Real x(start = 1);
equation
  der(x) = -x;
end HelloWorld;

on the basis that if a user doesn't set units for their variables they probably aren't interested in getting unit diagnostics. It was my understanding that people like Martin would probably would like that to be the case. But if I misunderstood, I am more than happy to consider that an error. It will simplify the proposal a lot.

qlambert-pro · 2024-03-22T09:55:57Z

Finding the location of a unit error still is not easy imho.

I think that this is a tool issue, on our side we are both thinking of ways to make the derivation more intuitive as well as showing the user how we reached our conclusions.

HansOlsson · 2024-03-25T12:38:42Z

RationaleMCP/0027/HindleyMilner.md

+The natural generalization to functions with multiple outputs applies.
+
+### Transcendental functions
+As examples of transcendental functions, consider `sin`, `sign`, and `atan2`.


The sign function is as far as I understand not a transcendental function, as they are defined as non-polynomial analytic functions, and I'm a bit unsure about atan2.

More importantly neither of those functions should use the rules for transcendental functions:

The sign-function is just comparing a value to zero; the unit of the input doesn't matter.

The atan2-function should ideally be defined as atan2(y, x)=atan(y/x); i.e. the two inputs should have the same unit - but it is perfectly normal to use atan2 for variables with unit.

The sign-function is just comparing a value to zero; the unit of the input doesn't matter.

It does if it is a temperature.

The atan2-function should ideally be defined as atan2(y, x)=atan(y/x); i.e. the two inputs should have the same unit - but it is perfectly normal to use atan2 for variables with unit.

This is what is specified here.

The sign-function is just comparing a value to zero; the unit of the input doesn't matter.

It does if it is a temperature.

No.

Or with more words: it's more complicated - the unit sort of matters if it is an absolute temperature, but not if it is a temperature difference (and the unit system doesn't separate the two). And in the SI-system, which we base the unit handling on, absolute temperatures are measures in Kelvin and using the sign for an absolute temperature in Kelvin is kind of meaningless.

Similarly you should normally use sign for voltages, not potentials; and for relative positions (or velocities) - not absolute positions. The unit system doesn't separate them, and there are lots of models using sign for variables with units (fluid, mechanical, etc) - and they make sense.

Fair enough for sign, I agree that we would need to ensures that the value is a difference.

I just had another look at sign, the specification doesn't enforce any unit on its argument, it is just specifying that sign(-1) should behave as a literal.

henrikt-ma · 2024-03-26T14:16:27Z

To illustrate the alternative of system of equations consider two problematic equations:
k*der(x)=-x;
cos(y*z)+cos(y/z)=f(time)
Assuming no units are given the approach with solving for variable-units one at a time cannot find any units.

I think this hasn't been answered yet, at least not the part with y and z, so I'll do it now.

I believe the use of the transcendental function cos makes it unnecessarily complicated due to the special handling of empty units, so let's make our own function instead:

function f
  input Real u(unit = "1");
  output Real y = u;
end f;

f(y*z) = f(y/z)

Here, we get the two constraints:

y.unit * z.unit ~ "1"
y.unit / z.unit ~ "1"

The one-sided for is:

y.unit * z.unit
y.unit / z.unit

The algorithm starts by turning one of these into a substituation. Let's pick the first one, and apply to the second:

y.unit → z.unit^(-1)`
z.unit^(-1) / z.unit that is, z.unit^(-2)

Finally, the remaining constraint is turned into a substution, and applied to the first substitution:

y.unit → "1"`
z.unit → "1"`

That the two mutually dependent constraints are not approached with linear methods over the rationals doesn't mean we don't find the solution.

RationaleMCP/0027/HindleyMilner.md

Co-authored-by: Hans Olsson <[email protected]>

HansOlsson · 2024-03-27T09:06:25Z

As far I understand one topic was to check: ThermoFluidStream ( DLR-SR/ThermofluidStream@3480936 ), MSL 4.0.0 and MSL and link to existing unit-issues for MSL.

Looking through unit-related pull requests for Dymola I find that the following was found in that way (I might have missed some):

Unit consistency for s-parameter in Modelica.Fluid.Vessels.PartialLumpedVessel ModelicaStandardLibrary#4398 (new 2024-04-24)
AdditionalSqrt ModelicaStandardLibrary#4396 (new 2024-04-24)
Proper unit for gamma. It is computed as the square root of something. ModelicaStandardLibrary#4377 (new)
Incorrect units, confusing style, and probably more in Modelica.Media.Air.ReferenceMoistAir ModelicaStandardLibrary#4368 likely actual model errors
Use sqrt(x) instead of x^0.5 in several places. ModelicaStandardLibrary#4358
Improve ElastoGap ModelicaStandardLibrary#4200 (missed)
Remove unit-casting ModelicaStandardLibrary#4158
Correct start-value for t0 in ThreePhaseTwoLevel ModelicaStandardLibrary#4152 - minor model error
Propose better unit-handling in equation for LogSweep ModelicaStandardLibrary#4055
Remove incorrect unit-casting ModelicaStandardLibrary#4053
Fix SI units by introducing reference phase impedance ZsRef ModelicaStandardLibrary#4052
Have different sources for different units to help unit-checking. ModelicaStandardLibrary#4051
Remove unit error changing ModelicaStandardLibrary#4041 - actual model error
Revert major part of #227, and remove unit from gain parameter of Modelica.Blocks.Continuous.* blocks and Modelica.Blocks.Math.Gain ModelicaStandardLibrary#3881
What is the unit of R? ModelicaStandardLibrary#2377
FixUnits ModelicaStandardLibrary#2375

Related ones:

The dT is a temperature difference ModelicaStandardLibrary#4032 was found in other ways as far as I recall
Add unit="1" to quaternions ModelicaStandardLibrary#4397 was found when looking at the logs, but isn't an error; just an improvement
Have final modifier for these rescalings. ModelicaStandardLibrary#4379 is just a first step for that one. Changing the conversion factor between Pa and bar is clearly a bad idea, but we might do more.

The log for ThermoFluidStream with current Dymola is attached:
TFS.txt
The issues are:

ThermofluidStream.Examples.Utilities.Piston deliberately has x.unit="1" and x0.unit="m" and still use x=x0
Many others - in particular functions (including log(T) for a temperature) and constants assumed to have unit; some overlap with Use sqrt(x) instead of x^0.5 in several places. ModelicaStandardLibrary#4358

qlambert-pro · 2024-03-27T10:44:37Z

It is going to take us a few days to bring our implementation on par with the proposal, we will get back to you as soon as we are there.

HansOlsson · 2024-03-27T11:18:33Z

For MSL "master" I will check modelica/ModelicaStandardLibrary#4358 with modelica/ModelicaStandardLibrary#4351 reverted to give a reasonable log-file for MSL.

However, there were was one minor issue in the latest Dymola that needs to be corrected first, but at least the log seems smaller than for ThermoFluidStream

HansOlsson · 2024-03-27T13:15:05Z

The log file for MSL master (with modelica/ModelicaStandardLibrary#4351 reverted) is:
MSL410.txt

Some issues are known (like magnetic ones that need sqrt-units and changes); e.g., Hysteresis37 ModelicaStandardLibrary#4241
In some models it may make sense to add units:
- For parameters in Modelica.Electrical.Analog.Examples.CauerLowPassOPV
- Possibly for Modelica.Magnetic.FundamentalWave.Examples.BasicMachines.InductionMachines.IMS_Start (etc) or it may be something else
- Some examples just use unit-sloppy test-equations like: Modelica.Media.Examples.WaterIF97
In some models I don't know what is happening like Modelica.Electrical.Spice3.Internal.Mos.mos2CalcCalcTempDependenciesRevised (etc); Units for gamma in Modelica.Electrical.Spice3.Internal.Mos.mos2CalcCalcTempDependenciesRevised ModelicaStandardLibrary#4369
Modelica.Fluid.Examples.TraceSubstances.RoomCO2 is due to lack of nthRoot; Modelica.Fluid.Vessels.ClosedVolume roots ModelicaStandardLibrary#4370
Most of the issues are for functions (and some models) where constants/literals don't have units in large expressions. Disabling the unit-checking in strategic places seems like the best solution in general without changing too much.

HansOlsson · 2024-03-27T13:18:51Z

BTW: The main change in Dymola's handling is that instead of creating a substitution list each constraint is (potentially) handled individually.

Added: A previously discussed consequence of this is that there is no need to treat the der-operator specially.

HansOlsson · 2024-03-28T13:21:37Z

BTW: The main change in Dymola's handling is that instead of creating a substitution list each constraint is (potentially) handled individually.

And that it is done before expansion (this has both benefits and disadvantages). Clearly unit-checking shouldn't in itself cause expansion as that will be too confusing. And ThermoFluidstream demonstrates that unit-checking after expansion of evaluable parameter allows a style where the unit depends on an evaluable parameter (whether people like that is another matter); the primary downside is that it adds complexity to the overall processing - including expansion.

HansOlsson · 2024-04-02T13:06:24Z

So in terms of this PR Dymola's handling can be described by replacing "Solvability of constraints", "substitution" and "the algorithm" by something like:

#### Solvability of constraints

After meta-expression evaluation a constraint's solvability can be determined using:
* If there are no empty units and zeros variable with unknown unit the constraints are verified.
* If there are no empty units, and only one variable with unknown unit and that variable only occurs once, the unit of that variable is inferred to satisfy the constraint.

The unit for that variable is then used in other constraints.

Obviously it will infer units in fewer cases than the Hindley-Milner based one - the benefits are that it gives easily understandable derivation of units, avoids special cases for der(x)=-x;. Additionally it isn't clear if the missed cases are relevant in practice.

qlambert-pro · 2024-04-02T13:31:46Z

What is solved also depends on the order in which the constraints are processed, as I pointed out above.

This approach would at least require defining an order of processing if we want to agree on which models are valid.

I also don't understand why it avoids der(x)=-x, since I would assume that there is a need to perform simplifications, and in what you outline there are no special rules for der, which means that the constraint extracted from der(x)=-x would be simplified to "1/s" = "1" which is clearly covered by the first case.

HansOlsson · 2024-04-02T14:23:52Z

What is solved also depends on the order in which the constraints are processed, as I pointed out above.

This approach would at least require defining an order of processing if we want to agree on which models are valid.

I cannot see that. Obviously the order will matter if there is an inconsistency, but I don't see how it can matter in other cases.

I also don't understand why it avoids der(x)=-x, since I would assume that there is a need to perform simplifications, and in what you outline there are no special rules for der, which means that the constraint extracted from der(x)=-x would be simplified to "1/s" = "1" which is clearly covered by the first case.

There are no simplifications of the constraints!

HansOlsson · 2024-04-02T15:04:14Z

As a simple example consider Modelica.Electrical.Machines.Examples.ControlledDCDrives.Utilities.DcdcInverter corrected in modelica/ModelicaStandardLibrary#4041 and how we can infer the unit of
iDCFilter.y_start (which now has the value 0 - not VMax).

VMax is declared with unit "V"
vDCFilter.y_start = VMax; gives vDCFilter.y_start.unit= "V"
vDCFilter.y.start=vDCFilter.y_start; gives vDCFilter.y.unit= "V"
vDCFilter.y = gain.u; gives gain.u.unit="V"
vDC = gain.u; gives vDC.unit="V"
pDC = vDC*iDC; gives iDC.unit="A" (as the declaration has pDC.unit="W")
iDCFilter.y = iDC; gives iDCFilter.y.unit="A"
iDCFilter.y.start=iDCFilter.y_start; gives iDCFilter.y_start.unit="A"

HansOlsson · 2024-04-02T15:16:32Z

BTW: Note that MCP/0027 is now rebased on master and this PR is intended to be merged into MCP/0027.

henrikt-ma · 2024-04-03T07:45:37Z

I also don't understand why it avoids der(x)=-x, since I would assume that there is a need to perform simplifications, and in what you outline there are no special rules for der, which means that the constraint extracted from der(x)=-x would be simplified to "1/s" = "1" which is clearly covered by the first case.

There are no simplifications of the constraints!

I think simplification is necessary in order to be able to reason about units. For example, I assume one would like the same unit inference in the following equations, and I don't see how that would happen without some sort of intelligent processing of the constraints:

x = y^2 * z
x = y * y * z
x = y * z * y

henrikt-ma · 2024-04-03T08:15:20Z

Obviously it will infer units in fewer cases than the Hindley-Milner based one - the benefits are that it gives easily understandable derivation of units, avoids special cases for der(x)=-x;. Additionally it isn't clear if the missed cases are relevant in practice.

Even without any "back-substitution" unit derivations can be long and more or less impossible to see through without tool support. An implementation which strives to keep down the number of back-substitutions (not difficult) will then end up with the same sort of derivations in 99.99% of the cases, a somewhat more intricate derivation in the remaining 0.01% of cases.

Solving equations is at the core of what a Modelica tool does, and the equations we encounter in the form of unit constraints are trivial compared to all the structural and numerical challenges we fight in normal Modelica equations every day. Tying the hands of tools so that they are not allowed to process these equations just because there are mutual dependencies seems exactly like the kind of thing the specification shouldn't interfere with. (It is bad enough that one wouldn't be allowed to solve all constraints due to presence of "non-expanding der".)

HansOlsson · 2024-04-03T08:20:53Z

I also don't understand why it avoids der(x)=-x, since I would assume that there is a need to perform simplifications, and in what you outline there are no special rules for der, which means that the constraint extracted from der(x)=-x would be simplified to "1/s" = "1" which is clearly covered by the first case.

There are no simplifications of the constraints!

I think simplification is necessary in order to be able to reason about units. For example, I assume one would like the same unit inference in the following equations, and I don't see how that would happen without some sort of intelligent processing of the constraints:

x = y^2 * z

x = y * y * z

x = y * z * y

I'm neither sure about that assumption, nor whether such equations (especially with unknown unit for y) are common. The idea with writing equations are that they should be clear. However, instead of discussing such abstract equations we should have the result of applying the algorithm itself.

qlambert-pro · 2024-04-25T05:03:01Z

Here is the data for us, I collected the issues and PRs you mentioned, as well as the ones we reported in the last year or so, and assessed whether we were detected them with our implementation of our proposal.

I have also attached the errors we report when running all examples in the three libraries/versions. They do not include the errors in the MSL when testing ThermofluidStream, neither do they include the errors in ModelicaTest.

Issues/PRs	WSM	Comments
#4032	No	This is not a unit issue.
#4358	Yes
#4200	Yes
#4158	Yes
#4152	Yes
#4055	Yes
#4053	Yes
#4052	Yes
#4051	Yes
#4041	Yes
#3881	Yes
#2377	Yes
#2375	Yes
#4241	Yes
#4369	Yes	After making Integer variables carry empty-unit.
#4370	Yes
#4368	Yes
#4117	Yes
#4116	Yes
#4115	Yes
#4114	Yes
#4113	Yes
#4112	Yes
#4111	Yes
#4109	Yes
#4108	Yes
#4119	Yes
#4103	Yes
#4096	Yes
#4076	No	No actual issue here.
#4193	Yes
#4192	Yes
#4097	Yes
#4080	Yes
#4079	Yes
#4078	Yes

modelica-4.0.0.txt
modelica-master.txt
thermofluidstream.txt

HansOlsson · 2024-04-25T07:21:10Z

modelica-master.txt

The messages are quite terse. I could recognize many of them, I find at least the following confusing:

Modelica.Electrical.Machines.Utilities.ParameterRecords.IM_SlipRingData at line 18:
Incompatible units. The unit of (0.04) / ((aimsData.turnsRatio) ^ (2)), namely "Hz-2.H-2/Ohm-2", is not compatible with "Ohm". The base unit factorizations of "Ohm" and "Hz-2.H-2/Ohm-2" are "A-2.kg.m2.s-3" and "1", respectively.

It is from parameter SI.Resistance Rr=0.04/turnsRatio^2. I agree that it is incorrect, but saying that the rhs-unit is "Hz-2.H-2/Ohm-2", is a bit weird. If we look at its equation:

VsNominal/VrLockedRotor*(2*pi*fsNominal*Lm)/sqrt(Rs^2 + (2*pi*
    fsNominal*(Lm + Lssigma))^2)

we see that Rs^2 and fsNominal*(Lm + Lssigma))^2 are seen as compatible, but the first term seem to give the unit which means that (2*pi*fsNominal*Lm)/sqrt(Rs^2 + (2*pi*fsNominal*(Lm + Lssigma))^2) of the type (w*L)/sqrt(R^2+(w*L)^2) doesn't get unit "1", as one would expect, but something else that is compatible.

Modelica.ComplexBlocks.Sources.ComplexRampPhasor at line 15:
Incompatible units. The unit of complexRamp.magnitude1, namely "A", is not compatible with "1". The base unit factorizations of "1" and "A" are "1" and "A", respectively.

The "Ampere" just seem confusing for a model without any currents.
However, looking at the use of that model in e.g., Modelica.Magnetic.QuasiStatic.FluxTubes.Examples.BasicExamples.QuadraticCoreAirgap I can see the problem (reported by Dymola as well).

The model improvement would be to use magnitude1*(magnitude2/magnitude1)^min(1, (time-startTime)/duration)

I also looked at the new PRs I added yesterday:
Proper unit for gamma. It is computed as the square root of something. ModelicaStandardLibrary#4377 - seems to be found
Unit consistency for s-parameter in Modelica.Fluid.Vessels.PartialLumpedVessel ModelicaStandardLibrary#4398 - missing as far as I can see (I see that you also have struggled with arrays)

qlambert-pro · 2024-04-25T07:25:20Z

We haven't really tried for arrays yet.

The errors do come from validating the examples, I realise now that the messages my lack that context. If it makes it easier I could give you the example that was being validated.

henrikt-ma · 2024-04-25T08:10:03Z

The messages are quite terse. I could recognize many of them, I find at least the following confusing:

We need to focus on the reject/accept (detect/miss) aspect, and primarily see messages as a quality-of-implementation thing. I say primarily, because I realize that a good proposal for unit checking must not make it impossible to generate good messages. Regarding the WSM implementation, I feel confident that we'll be able to generate more intuitive messages with some additional effort, but I'd like to avoid this delaying the standardization process.

HansOlsson · 2024-04-25T10:47:57Z

The messages are quite terse. I could recognize many of them, I find at least the following confusing:

We need to focus on the reject/accept (detect/miss) aspect, and primarily see messages as a quality-of-implementation thing.

Yes in general, but it limits understanding for the discussion and the 2nd example showed that it gave a message for a model in isolation, even though the problem only occurs when using that model in other models.

That can be:

Just confusing; no big deal - but I don't understand how it would occur; especially as I would imagine that one could work around this issue by modifying the models using this sub-model.
Or assuming that the model must have the same unit everywhere. That would be deeply problematic and might indicate a lack of "let-polymorphism" in the algorithm.

However, the bigger message I see is that we need a way to escape the unit-checking in specific classes/equations based on how some functions are written.

HansOlsson · 2024-04-30T08:55:50Z

Looking back I realize that we still don't know the following regarding the Hindley-Milner part:

Does it find anything new in actual Modelica libraries, compared to just looking at each unit-constraints by itself (repeatedly)? Note that this can be both units for more variables, or errors.
Is it equivalent to solving the unit-constraints as an over/under-determined system of integer-equations or not?
How does one actually generate good diagnostics based on this?

All of this matters. Specifying a new algorithm for finding units with unclear diagnostics that only help in contrived examples does not seem attractive.

HansOlsson · 2024-05-16T14:33:18Z

Regarding arrays we have tried the following in Dymola (as far as I recall - will have to double-check):

Subscript:
- Evaluable: unit inference for the specific array element
- Loop-variable: homogenous unit for the array
- Otherwise: if the array has a homogenous unit: use that - otherwise view as unknown, without deducing it
No subscript: homogenous unit for the array

As far as I can see this can deduce correct units for lots of models in MSL, and without any false warnings.
(As far as I recall no major new problems were found in models.)

Merely assuming that all arrays have homogenous unit isn't good, and will fail for some cases (in particular the block that output {abs,arg} as a Real vector for complex numbers; and Media-functions of course).

I'm aware of the following issues:

The loop handling is a bit simplified as the loop isn't necessarily over the entire array. One possibility would be to only do it if loop over entire array.
The "otherwise" case could be refined later.

However, it seems like a pragmatic solution.

Initial commit of ReadMe.md for MCP-0027

7524b11

qlambert-pro changed the base branch from master to MCP/0027 March 19, 2024 15:05

qlambert-pro changed the title ~~Applying Hindley-Milner to unit-checking.~~ Applying Hindley-Milner to unit-checking Mar 19, 2024

qlambert-pro added the MCP0027 Unit checking label Mar 19, 2024

qlambert-pro force-pushed the MCP/0027+Hindley-Milner branch from c0fe5d3 to 365663f Compare March 20, 2024 05:34

Provide a draft specification using the Hindley-Milner algorithm.

365663f

Co-authored-by: Henrik Tidefelt <[email protected]>

HansOlsson reviewed Mar 25, 2024

View reviewed changes

qlambert-pro changed the base branch from MCP/0027 to master March 25, 2024 12:38

HansOlsson added this to the 2024-2 milestone Mar 25, 2024

HansOlsson reviewed Mar 26, 2024

View reviewed changes

RationaleMCP/0027/HindleyMilner.md Outdated Show resolved Hide resolved

Fix typo

0dc006c

Co-authored-by: Hans Olsson <[email protected]>

HansOlsson changed the base branch from master to MCP/0027 April 2, 2024 12:49

HansOlsson modified the milestones: 2024-2, 2024-3 May 16, 2024

Applying Hindley-Milner to unit-checking #3491

Are you sure you want to change the base?

Applying Hindley-Milner to unit-checking #3491

Conversation

qlambert-pro commented Mar 19, 2024

HansOlsson commented Mar 20, 2024 • edited Loading

qlambert-pro commented Mar 20, 2024 • edited Loading

DagBruck commented Mar 21, 2024

HansOlsson commented Mar 21, 2024

HansOlsson commented Mar 21, 2024

qlambert-pro commented Mar 21, 2024 • edited Loading

AHaumer commented Mar 21, 2024 • edited Loading

HansOlsson commented Mar 21, 2024

gwr69 commented Mar 21, 2024

AHaumer commented Mar 21, 2024

gwr69 commented Mar 21, 2024

qlambert-pro commented Mar 22, 2024 • edited Loading

qlambert-pro commented Mar 22, 2024

MartinOtter commented Mar 22, 2024 • edited Loading

MartinOtter commented Mar 22, 2024

qlambert-pro commented Mar 22, 2024 • edited Loading

HansOlsson commented Mar 22, 2024

qlambert-pro commented Mar 22, 2024

qlambert-pro commented Mar 22, 2024

qlambert-pro commented Mar 22, 2024 • edited Loading

HansOlsson commented Mar 22, 2024

gwr69 commented Mar 22, 2024

qlambert-pro commented Mar 22, 2024 • edited Loading

qlambert-pro commented Mar 22, 2024

HansOlsson Mar 25, 2024

Choose a reason for hiding this comment

qlambert-pro Mar 25, 2024

Choose a reason for hiding this comment

HansOlsson Mar 25, 2024

Choose a reason for hiding this comment

qlambert-pro Mar 25, 2024 • edited Loading

Choose a reason for hiding this comment

qlambert-pro Apr 2, 2024 • edited Loading

Choose a reason for hiding this comment

henrikt-ma commented Mar 26, 2024

HansOlsson commented Mar 27, 2024 • edited Loading

qlambert-pro commented Mar 27, 2024

HansOlsson commented Mar 27, 2024

HansOlsson commented Mar 27, 2024 • edited Loading

HansOlsson commented Mar 27, 2024 • edited Loading

HansOlsson commented Mar 28, 2024

HansOlsson commented Apr 2, 2024

qlambert-pro commented Apr 2, 2024 • edited Loading

HansOlsson commented Apr 2, 2024

HansOlsson commented Apr 2, 2024

HansOlsson commented Apr 2, 2024

henrikt-ma commented Apr 3, 2024

henrikt-ma commented Apr 3, 2024 • edited Loading

HansOlsson commented Apr 3, 2024

qlambert-pro commented Apr 25, 2024

HansOlsson commented Apr 25, 2024

qlambert-pro commented Apr 25, 2024 • edited Loading

henrikt-ma commented Apr 25, 2024

HansOlsson commented Apr 25, 2024

HansOlsson commented Apr 30, 2024

HansOlsson commented May 16, 2024

HansOlsson commented Mar 20, 2024 •

edited

Loading

qlambert-pro commented Mar 20, 2024 •

edited

Loading

qlambert-pro commented Mar 21, 2024 •

edited

Loading

AHaumer commented Mar 21, 2024 •

edited

Loading

qlambert-pro commented Mar 22, 2024 •

edited

Loading

MartinOtter commented Mar 22, 2024 •

edited

Loading

qlambert-pro commented Mar 22, 2024 •

edited

Loading

qlambert-pro commented Mar 22, 2024 •

edited

Loading

qlambert-pro commented Mar 22, 2024 •

edited

Loading

qlambert-pro Mar 25, 2024 •

edited

Loading

qlambert-pro Apr 2, 2024 •

edited

Loading

HansOlsson commented Mar 27, 2024 •

edited

Loading

HansOlsson commented Mar 27, 2024 •

edited

Loading

HansOlsson commented Mar 27, 2024 •

edited

Loading

qlambert-pro commented Apr 2, 2024 •

edited

Loading

henrikt-ma commented Apr 3, 2024 •

edited

Loading

qlambert-pro commented Apr 25, 2024 •

edited

Loading