Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve function (for purity) #3589

Open
HansOlsson opened this issue Oct 24, 2024 · 7 comments
Open

Improve function (for purity) #3589

HansOlsson opened this issue Oct 24, 2024 · 7 comments
Labels
enhancement New feature or request
Milestone

Comments

@HansOlsson
Copy link
Collaborator

Continuing #1937

The current definitions of pure/impure are lacking in a number of ways, and that is getting problematic - and for users the problem is that since the definitions are so lacking the tools don't bother to implement and check the semantics.

Specifically impure is too coarse-grained and we can broadly see three levels:

  • Functions depending on the "external environment"; but acting purely when it is held fixed.
  • Functions behaving non-deterministically, i.e., the same inputs will give different outputs even if nothing changed. They might also depend on the "external environment"
  • Functions modifying the "external environment" (having side-effects). (Optionally including all of the effects above.)

I used "level" to indicate that I don't see a need to care:

  • whether a non-deterministic function depends on the "external environment",
  • whether a function modifying the "external environment" depends on it, and/or acts non-deterministically.

Being able to tell these apart has the following benefits in terms of clarity and future changes:

  • Future restriction: both parameter-bindings and when-clauses can use impure functions, but for parameter-bindings only the first two levels are a good idea (whereas all work for when-clauses).
  • Future extension related to Clarify impure in systems of equations. #3571: we could say that functions from the first level are actually acceptable in systems of equations, even if formally impure, since they will behave in a pure way in those cases (assuming no really impure functions are used in the system).

And also performance for the first level.

A preliminary implementation in Dymola uses:

It would be possible to turn this into syntax if we so desire (see #1937 for variants - including impure input).

However, it might also be best to start with them as annotations to more quickly get this into libraries in a tool-independent way, and for the moment only view restrictions as tool-warnings - and then once we have more experience we can modify the syntax and semantics (and state that the annotations behave in a similar way for backwards compatibility).

For pure functions the corresponding issue is thread-safety. I assume that should be a separate issue.

@HansOlsson HansOlsson added the enhancement New feature or request label Oct 24, 2024
@henrikt-ma
Copy link
Collaborator

I agree that we need something more expressive than just pure/impure, and largely agree with the picture presented in the introductory comment above.

I think a lot of this can be solved by extending the function variability described in https://specification.modelica.org/master/operators-and-expressions.html#function-variability. That is, by allowing function variability to be declared rather than just inferred from default arguments, much of what we need would follow automatically. Similar to today, function variability could remain inferred, but now not only from default arguments, but also from the function variability of functions used in the body.

A key advantage of extending the function variability concept is that we get the semantic restrictions based on variability for free.

With an extended function variability concept, we could use pure/impure just to orthogonally express whether there are side-effects.

According to this idea, annotation(__Dymola_impureConstant=true) would instead be expressed like so:

impure constant function f
  …
end f;

I see a small syntactic challenge in that we don't have a variability keyword for the highest variability (non-discrete-time), while one would also like function variability to be implicit when no variability prefix is explicitly given. That is, to express annotation(__Dymola_impureRandom=true) one would need something different than the current selection of variability prefixes, for instance:

impure impure function f

or

impure "random" function f

or

impure false function f

(When choosing syntax, one should also consider what it will look like when combined with pure, which makes impure impure a poor candidate in my opinion, since it would be difficult to remember that pure impure is not the same thing as impure constant.)

Similar to pure/impure it should be deprecated from start to not specify function variability for an external function. For safety and increased backward compatibility, we could state that the default function variability of an impure external function is "random", while it might make more sense for a pure external function to be constant by default.

@HansOlsson
Copy link
Collaborator Author

I agree that we need something more expressive than just pure/impure, and largely agree with the picture presented in the introductory comment above.

I think a lot of this can be solved by extending the function variability described in https://specification.modelica.org/master/operators-and-expressions.html#function-variability. That is, by allowing function variability to be declared rather than just inferred from default arguments, much of what we need would follow automatically. Similar to today, function variability could remain inferred, but now not only from default arguments, but also from the function variability of functions used in the body.

I'm not convinced that it is really about variability. However, I have no problem with using impure constant, even though I think it is more a matter of input vs. output for those functions.
(Thus it is more "const" in sort of the C/C++ member function sense than in the Modelica sense.)

A key advantage of extending the function variability concept is that we get the semantic restrictions based on variability for free.

With an extended function variability concept, we could use pure/impure just to orthogonally express whether there are side-effects.

According to this idea, annotation(__Dymola_impureConstant=true) would instead be expressed like so:

impure constant function f
  …
end f;

I see a small syntactic challenge in that we don't have a variability keyword for the highest variability (non-discrete-time), while one would also like function variability to be implicit when no variability prefix is explicitly given. That is, to express annotation(__Dymola_impureRandom=true) one would need something different than the current selection of variability prefixes, for instance:

In particular I think the variability arguments breaks down here, as I don't see that "impure random" being on the "top of hierarchy" since it doesn't actually impact other functions (as level 3).

Additionally, these functions don't really have side-effects (they depend on, but don't influence the external state), which would imply they are pure - but that seems even more confusing.

Similar to pure/impure it should be deprecated from start to not specify function variability for an external function. For safety and increased backward compatibility, we could state that the default function variability of an impure external function is "random", while it might make more sense for a pure external function to be constant by default.

I would more say that we keep external functions as impacting the external state. However, having the possibility to deduce these properties between functions would be helpful; the idea is that functions may in clever ways go down a level but not up (a function might use the random value in a way that removes the randomness - e.g., quick-sort of unique values internally using a random function to select the pivot; but a non-random function cannot become random).

However, this also indicates that more discussion is needed and we have to figure out how to introduce this in a good way. It may indicate that the annotations are useful as an intermediate solution, while we figure out the best way forward.

@HansOlsson
Copy link
Collaborator Author

Thinking more I realized that impureRandom isn't an ideal name.

The problem is that it differentiates it from impure constant, not from default impure - so impureNonWrite might be better, indicating that it doesn't write to the external environment (or impureOutput=false).

@HansOlsson HansOlsson added this to the 2024-November milestone Nov 6, 2024
@HansOlsson
Copy link
Collaborator Author

@HansOlsson add two examples.

@HansOlsson
Copy link
Collaborator Author

For examples I will first present two example that I want to support in a good way:

First a corrected example based on modelica/ModelicaStandardLibrary#4472

model checkExist
  Boolean Esiste(start = false);
equation 
  if (time > 0.0) then
    Modelica.Utilities.Streams.print(String(time),"file.txt");
  end if;
  when sample(0.1,0.1) then
    Esiste = Modelica.Utilities.Files.exist("file.txt");
  end when;
end checkExist;

And a second simplified example:

  model M
    parameter String tabFile;
    Modelica.Blocks.Sources.CombiTimeTable combiTimeTable(tableOnFile=Modelica.Utilities.Files.exist(tabFile),
      table=[0.0,0.0; 1,1; 2,4; 3,9],
      tableName="table",
      fileName=tabFile)
      annotation (Placement(transformation(extent={{-256,-42},{-236,-22}})));
  end M;
  parameter String tabFile="test.txt";
  M m[10](each tabFile=tabFile);

The intents of the examples are:

  • The first wants to check that the file exists after it has been written to; for that purpose it is important that print is impure, so even if exist is impureConstant so we do not assume that we get the same result for exist all the time. (More realistic variants of this will have it more separated, so something somewhere in the model writes to the file and instead of checking the file at regular interval we check it at the end). This example already works with existing semantics.
  • The second example is about two things:
    • Performance, specifically avoiding checking that the same file exists 10 different times. In a more realistic case the call would not only check if the file exists, but actually read the contents. Note that even if we used another function for reading the file it might still start by checking if the file exists.
    • Deterministic behavior. If all non-pure functions called for parameter bindings are impureConstant we don't have to care how they are ordered, and will still get predictable result. If they are impureNonWrite we still don't have to care about ordering, even if the results is not predictable.

Unpredictable cases are e.g.:

  parameter String s;
  parameter Real A[:,:]=...;
  parameter Real B[:,:]=...;
  parameter Boolean ok1=Modelica.Utilities.Streams.writeRealMatrix(s, "A", A);
  parameter Boolean ok2=Modelica.Utilities.Streams.writeRealMatrix(s, "B", B, append=true);

The problem here is that there's no guarantee that the "first" write-call will be ordered before the "second", so maybe B is written first and then A is added without appending to the file and thus overwriting B (the intent was the other order). I don't have a good solution for how to handle this, but adding impureConstant will at least ensure that we can see whether this issue may occur or not.

I will not consider evaluable parameters in detail and think we can deal with that later. However, we can say that if we have to evaluate a parameter bound to a general impure function it becomes problematic (what about side-effects?). However, if it is impureConstant we can at least say that simulation is valid as long as the external environment wasn't changed.

@henrikt-ma
Copy link
Collaborator

henrikt-ma commented Nov 29, 2024

Thinking more I realized that impureRandom isn't an ideal name.

The problem is that it differentiates it from impure constant, not from default impure - so impureNonWrite might be better, indicating that it doesn't write to the external environment (or impureOutput=false).

I still don't like the idea of hiding this information in annotations, and still believe that there should be orthogonal specification of whether a function may have side effects and with what variability it must be assumed to depend on the external environment.

If one wants a Files.exist which can be used in parameter bindings, it would be a pure parameter function, and it could make sense to name the function existAtInitialization in this case. Since this function returns a Boolean, the non-discrete-time variability would not be relevant, so the Files.exist without the parameter restriction would be a pure discrete function. Of course, that function should still be possible to used for initialization of a non-fixed parameter by placing it in an initial equation (we just have to fix #2602).

@henrikt-ma
Copy link
Collaborator

henrikt-ma commented Nov 29, 2024

For functions that have non-discrete-time dependency on the external environment, I guess one could actually use the following syntax:

pure not discrete function outsideTemperature

where not discrete would be the way to spell out non-discrete-time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants