lang types
The Ecstasy type system, language, compiler, and core libraries are designed to work in harmony to securely and effectively deliver the capabilities described in this guide. The term "type system" is often simultaneously vague and overused, so this chapter is intended to bring the concepts down to earth, and to illustrate the purpose of the design, and the value that it delivers to the programmer.
First, in Ecstasy there is actually such a thing as a TypeSystem class, so we're going to quickly stop using that word in the general programming language sense, other than to explain what it means in the generic sense: A language's type system refers to the abstractions and their classifications that a language provides around the raw data that the language machinery manages on behalf of the programmer. In a good design, those abstractions and classifications are used to efficiently avoid or constrain undesirable behavior, and to simplify and streamline the actual desired behavior.
In other words: At some level it's all ones and zeros, but a type system helps the developer pretend instead that it's actually `Int`, `Boolean`, `String`, `Map`, and `ShoppingCart` (etc.) objects, with lots of rules about what is legal or illegal to do with each of them.
So let's get the type system basics out of the way:
- The Ecstasy type system is static. That simply means that the compiler "statically" (i.e. at compile-time) enforces language type rules. Not all of the rules can be enforced at compile time, but as many as can be enforced at compile time, are enforced at compile time. Some rules that cannot be enforced at compile time are enforced at load-and-link-time instead. And some rules are enforced at runtime, only because they cannot be enforced any earlier.
- The Ecstasy type system is strong. That simply means that Ecstasy doesn't allow running code to ever mistakenly use an object of one type in a place that requires a different type. For example, assigning an `Int` value to a `String` variable will always fail: it is rejected by the compiler as a compile-time error, or if that isn't possible then it is rejected by the linker as a link-time error, or if that isn't possible then it is detected at runtime -- and if it is detected at runtime, the illegal operation is prevented before it can do any harm to our precious ones and zeros!
- The Ecstasy type system carries complete runtime type information. At runtime, the class of every object is known, and the type of every reference is known. Detailed structural information about the class is also known, and is available via reflection. Furthermore, the compiler embeds specific compile-time type information that cannot otherwise be reconstructed at runtime; for example, when two objects are being compared for equality, the types that those references were "known to be of" by the compiler are included in the compiled form of the code, so that the runtime can make use of that information. Finally, type information is not erased, such as in the case of generic data types; this is referred to as a "reified type system", or as "reified types" -- which literally means "types made real".
- The Ecstasy type system is based on nominative types. This over-complicated term just means that types can be identified by their names, and their composition is formed by referring to other types by their names. For example, a class of one name may extend another class by specifying the other class' name. (An example from an earlier chapter is when the `Point3D` class extended the `Point` class.) To construct an instance of a type, the name of the class being instantiated must be specified, such as `new Point(0,0)`. And so on.
- The Ecstasy type system also supports "duck typing", but only for interfaces. This hilarious term is a play on the saying: "If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck." What this means is that if a class provides all of the members of an interface, then the class automatically implements that interface, even if the class does not specify that interface by name in its `implements` list. This exception to the nominative typing rule is permitted because the class "looks like" that interface and "swims like" that interface and "quacks like" that interface. Duck typing is useful for representing functionality from an existing class as an interface -- especially when you aren't able to change the class to add an explicit `implements` clause. It turns out that this is quite common in the real world, such as when that class is in a module that is already deployed to production, or when that class is in someone else's module and you want to or need to work with that module -- but you're not able to change their class in their module.
- Other than the explicit duck typing support for interfaces, Ecstasy does not support structural typing. This just means that two classes are not considered to be of the same type just because they happen to have the same number of properties of the same types, for example.
- Ecstasy supports a type algebra that includes parameterized types; union, intersection, and difference types; tuple types; explicitly immutable types; access-controlled types; and annotated types.
- In Ecstasy, "everything is an object", and therefore everything is of an object type. Every bit is an object, of the `Bit` class. Every integer is an object, of the appropriate `Int` class. And so on. There is no separate "primitive type system" that the language pre-defines and hard-codes, like in C++, Java, or C#. This Ecstasy type system design is called the "turtles type system", because it's turtles the whole way down; it has no external references or dependencies on types outside of its own object type system. For example, an `Int` is built on an array of `Bit`, which in turn is built on an `IntLiteral`, which in turn is built on a `String`, which in turn is built on an array of `Char`, which in turn is built on a `UInt32` code point, which in turn is built on an array of `Bit`, and it just keeps going, recursively, forever.
- Ecstasy separates types from classes. This is a simple concept in some ways, and a terrifyingly complex concept in other ways. So let's try to keep it simple: The class of an object is what the object actually is. But in Ecstasy, an object can never be directly "touched" or "held" by the programmer; instead, every interaction with an object is via a reference to that object. When we "assign an object to a variable", what we really mean is that we are storing a reference-to-that-object in that variable. When we "set a property to a value", what we are actually doing is storing a reference-to-the-value in that property (and by "value", we mean "some object"). When we "pass an object to a method", we are actually passing a reference-to-the-object to the method. Even from within an object, the object's own interactions with itself are performed via its `this` reference; an object can't even touch itself! But in each case, it is the reference that has a type; the type is part of the reference, while the class is part of the object. And since this is Ecstasy, everything is an object, including every reference.
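The difference between the class of an object and the type of a reference can be sketched with a single object that is reached through two references. This is only an illustrative sketch: the module name `TypeVsClass` and the variable names are made up for this example, and it assumes that an `is()` type test narrows the type of the reference inside the `if` block.

```ecstasy
module TypeVsClass {
    void run() {
        @Inject Console console;

        // one object (whose class is String), reached through two references
        String s = "hello";
        Object o = s;

        // the reference "s" has the type String, so "size" can be used
        console.print(s.size);

        // the reference "o" has the type Object, which has no "size" member,
        // so writing "o.size" here would be a compile-time error, even though
        // the object behind the reference is still a String

        // a type test narrows the type of the reference; the object is unchanged
        if (o.is(String)) {
            console.print(o.size);
        }
    }
}
```

The object never changes in this sketch; only the type of the reference used to reach it does.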
So now that we have defined what a type system is, or at least what it's supposed to do, we need to define what a type is. This is a bit harder than it sounds, so we're going to approach it from three different directions:
- A type is about its composition. Ecstasy types are defined by how they are formed:
  - Nominative. If there is a class called `Int`, then there is a type called `Int`. This includes any `class`, `interface`, `mixin`, `const`, `enum`, `module`, `package`, `service`, or `typedef` name.
  - This Type. Within a class, when a method or property declaration refers to its own class name, the implied type is the "auto-narrowing this type", which is a special form of a nominative type. For example, the `toUnchecked()` method on `Int64` returns an `@Unchecked Int64`, which means "this type, unchecked".
  - Access. With a nominative type, the default type view of the class is the public interface, but it is possible to specify any of four pre-defined types exposed by a class: `public`, `protected`, `private`, and `struct`. The first three of these are self-explanatory; the `struct` access refers to the structure type of the class, which provides the underlying passive field storage for each property of the class that may hold a value. For example, the type "`(private Person)`" refers to the "private view" of the `Person` class.
  - Immutable. A type can be indicated to be explicitly immutable. For example, the type `Map<String, Int>` can refer to either a mutable or immutable `Map`, but the type `immutable Map<String, Int>` refers explicitly to the immutable form of the class.
  - Annotated. A type annotation in Ecstasy is when a mixin is used to add information to another type. For example, the `Unchecked` mixin can be applied to an `Int64` using an annotation: `@Unchecked Int64`.
  - Parameterized. A parameterized type is parameterized using other types. For example, the `List` interface is declared as parameterized by an `Element` type, so parameterizing the nominative `List` type with the nominative `Int` type results in the `List<Int>` parameterized type.
  - Tuple. A `Tuple` is a specially parameterized type form, which supports zero or more fields, each of which is identified by a zero-based index and is defined with its own type. For example, a `Tuple<Int, String>` is a tuple with two fields (also called a pair), with field `[0]` being an `Int` and field `[1]` being a `String`.
  - Union. A union type is used to refer to one of two different types, as in "type A or type B". For example, `String` is a `const` class, and `Nullable` is an `enum` class. If a variable can hold either the `Null` value or a `String`, we say that the variable type is a union `Nullable|String`, pronounced as "nullable or string". Because that "`Nullable|`" prefix is so common, there exists a short-hand that means the same thing: `String?`.
  - Intersection. An intersection type is used to refer to the combination of two different types, as in "type A plus type B", or "both type A and type B". As mentioned above, `List` is parameterized by `Element`. There is also an interface called `Freezable`; an `Element` that is also known to be `Freezable` can be asserted to have the intersection type `Element+Freezable`; you can see this type at work in the `ListFreezer` implementation.
  - Difference. A difference type is used to refer to the absence of a type, as in "type A minus type B", or "type A and not type B". For example, the `toChecked()` method on `Int64` returns the `(Int64 - Unchecked)` type, which means "this type, and not `Unchecked`".
- A type is about its capabilities. Ecstasy types are defined by what they can do:
  - A type contains type members. To simplify slightly, the members are the properties and the methods of the type.
  - If the type of a reference has a readable property (the property, exposed as a `Ref`), then anyone with that reference can use it to read the value of that property on the object.
  - If the type of a reference has a writable property (the property, exposed as a `Var`), then anyone with that reference can use it to read and write the value of that property on the object.
  - If the type of a reference has a method, then anyone with that reference can use it to invoke that method on the object.
  - In each of these cases, it doesn't matter if the members are `public`, `protected`, or `private`; these keywords are used for organization, and not for security. The actual type of the reference is what determines what is allowed to be done using the reference.
- A type is about constraints and guarantees. Here are the most common examples:
  - If a property or a variable has a certain type, then only references of that type can be stored in that property or variable, and it is guaranteed that any reference obtained from the property or variable will be of that type.
  - If a method or function declares a parameter of a certain type, then only arguments of that type can be passed, and from the method's or function's point of view, it is guaranteed that the arguments will be of the defined types.
  - If a method or function declares a return value of a certain type, then only values of that type can be returned by the code in the method or function, and from the caller's point of view, it is guaranteed that any returned values will be of the defined type.
  - If a class defines a type parameter with a type constraint, then the class can only be parameterized by types that match that type constraint. For example, in the definition `class NumList<Element extends Number> extends List<Element>`, it is guaranteed that the type `Element` is a `Type<Number>`, which is an excellent segue to the next topic ...
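Several of the compositional forms described in the first list above can be seen side by side in a few variable declarations. The following is a minimal sketch rather than code from the Ecstasy libraries; the module name `TypeForms` and all of the variable names are assumptions made for this illustration.

```ecstasy
module TypeForms {
    void run() {
        @Inject Console console;

        // parameterized: the List type, parameterized by the Int type
        List<Int> numbers = [1, 2, 3];

        // tuple: field [0] is an Int, and field [1] is a String
        Tuple<Int, String> pair = (1, "one");

        // union: written out in full, and written using the short-hand
        Nullable|String first  = Null;
        String?         second = "text";

        // explicitly immutable: an array of String that cannot be modified
        immutable String[] words = ["hello", "world"];

        console.print(numbers.size);
        console.print(pair[1]);
        console.print(first);
        console.print(second);
        console.print(words[0]);
    }
}
```

Each declaration is also an example of a constraint and a guarantee: only a reference of the declared type can be stored in the variable, and any reference read back from the variable is known to be of that type.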
We've already explained that assigning a `Null` value to a non-`Nullable` type is not legal:

    String s = Null; // error: String required; Nullable found

It's also obvious that you can't do things like this:

    Int n = "Hello!"; // error: Int required; String found
    if (n) {...}      // error: Boolean required; Int found

Yet some similar code would be legal:

    // even though the variable is not declared explicitly as being Nullable,
    // it is declared as being an Object, and the Null value is an Object
    // (because "everything is an Object")
    Object o = Null;

    // similarly, since Nullable is an enum containing the single value Null,
    // the Null value is an Enum value
    Enum e = Null;

    // the String class doesn't know anything about this interface,
    // but the String class does have a "size" property of type Int
    interface HasASizeProperty { @RO Int size; }
    HasASizeProperty example = "Quack!";

There are two fundamental relationships that define which of the above are legal vs illegal, and why:
- The is-a relationship defines a relationship between two types, such that if type `B` is-a type `A`, then a reference of type `B` can be used anywhere that a reference of type `A` is required. The cliché example of an is-a relationship in an object-oriented language is when type `B` is a sub-class of type `A`, but that is just one of many examples of an is-a relationship in Ecstasy.
- The assignable-to relationship defines a relationship between two types, such that type `B` is assignable-to type `A` if either (1) type `B` is-a type `A`, or (2) type `B` has an `@Auto` conversion method that returns a type that is-a type `A`. Just like there are no "primitive" hard-wired types in the language, there are also no hard-wired compiler conversions; the only way that an object can be converted to a different type is for the class of that object to define a method that performs the conversion.
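To make the second case of the assignable-to relationship more concrete, here is a hedged sketch of a class that declares its own `@Auto` conversion. The `Meters` class, its `toInt()` method, and the module name `AutoConversion` are all hypothetical and exist only for this illustration; the sketch also assumes that a user-defined class can annotate a conversion method with `@Auto` in the same way that the core library classes do.

```ecstasy
module AutoConversion {
    // hypothetical class, used only for illustration
    const Meters(Int value) {
        // the @Auto annotation allows the compiler to call this method
        // implicitly when a Meters is supplied where an Int is required
        @Auto Int toInt() {
            return value;
        }
    }

    void run() {
        @Inject Console console;

        Meters distance = new Meters(5);

        // Meters is not an Int (there is no is-a relationship), but Meters
        // is assignable-to Int by way of the @Auto conversion method
        Int n = distance;
        console.print(n);
    }
}
```

Without the `@Auto` annotation on the method, the assignment `Int n = distance` would be a compile-time type error, and the conversion would have to be written out as `distance.toInt()`.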
With that information, let's review the above examples to understand why each was either legal or illegal:
- The assignment `String s = Null` is illegal, because `Null` is not a `String`, and `Null` has no `@Auto` conversion method to `String`.
- The assignment `Int n = "Hello!"` is illegal, because `String` is not an `Int`, and `String` has no `@Auto` conversion method to `Int`.
- The statement `if (n) {...}` is illegal, because `Int` is not a `Boolean`, and `Int` has no `@Auto` conversion method to `Boolean`.
- The assignment `Object o = Null` is legal, because `Null` is-a `Object`.
- The assignment `Enum e = Null` is legal, because `Null` is-a `Enum`.
- The assignment `HasASizeProperty example = "Quack!"` is legal, because the `String` type is-a `HasASizeProperty` (by duck typing).
The full definition of the is-a relationship is technically complex, but we can cover 99% of its complexity in a small number of statements; for two types, `A` and `B`, type `B` is-a type `A` if any of the following are true:
- `A` and `B` are the same exact type;
- `A` and `B` are both class types, and `B` is a sub-class of `A`;
- `A` is an interface, and `B` implements or delegates an interface that is-a `A`;
- `A` is an interface, and `B` duck-types that interface;
- `A` is a mixin, and `B` is annotated by (or incorporates) that mixin;
- `B` is a mixin, and it mixes into a type that is-a `A`;
- `A` is a union type of two types `A1` and `A2`, and `B` is-a `A1` and/or `B` is-a `A2`;
- `A` is an intersection type of two types `A1` and `A2`, and `B` is-a `A1` and `B` is-a `A2`;
- `A` is an explicitly `immutable A1`, and `B` is-a `A1` and `B` is implicitly immutable (a `const`, `enum`, `package`, or `module`);
- `B` is an explicitly `immutable B1`, and `B1` is-a `A`;
- `B` specifies a public/protected/private access such as `(protected B1)` and `A` does not specify an access, and `B1` is-a `A`;
- Both `A` and `B` specify a public/protected/private access on `A1` and `B1` respectively, and the level of access specified for `B` is greater than or equal to the level of access specified for `A` and `B1` is-a `A1`;
- `B` specifies a `struct` access and `A` does not specify an access, and the type `Struct` is-a `A`;
- Both `A` and `B` specify a `struct` access on `A1` and `B1` respectively, and `B1` is-a `A1`;
- `B` is a parameterized `B1` and `A` is not parameterized, and `B1` is-a `A`;
- `B` is a parameterized `B1` and `A` is a parameterized `A1`, and `B1` is-a `A1` and for each parameter `Pb` of `B`, either `A` has no corresponding parameter `Pa`, or `Pa` and `Pb` are the same exact type.
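A few of the simpler rules from this list can be observed directly as assignments that the compiler accepts. The sketch below is illustrative only: the `Named` interface, the `Animal` and `Cat` classes, and the module name `IsADemo` are assumptions invented for this example.

```ecstasy
module IsADemo {
    interface Named {
        @RO String name;
    }

    class Animal(String name) implements Named {}

    class Cat(String name) extends Animal(name) {}

    void run() {
        Cat cat = new Cat("Tabby");

        Cat    sameType  = cat;   // A and B are the same exact type
        Animal superType = cat;   // B is a sub-class of A
        Named  iface     = cat;   // B implements an interface that is-a A
        Cat?   unionType = cat;   // A is the union Nullable|Cat, and B is-a Cat
        Object anything  = cat;   // everything is an Object
    }
}
```

In every one of these assignments the object is the same `Cat`; what differs is the type of the reference through which it is seen.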
With each of the above rules, it is possible to prove absolutely that some type `B` is-a type `A`. There are also cases in which it is possible to weakly prove that some type `B` is-a type `A` at compile time; it is a weak proof because it may actually be proven wrong at runtime. The technical reason why such a condition is permitted to exist is known as type variance; variance occurs when a type is permitted to vary, including (1) in a sub-class, (2) when an interface is implemented, and (3) when a type is parameterized. Let's look at an example in which type variance is extremely safe:

    interface Lookup {
        Object find(String key);
    }

    class StringCache implements Lookup {
        @Override
        String find(String key) {...}
    }

The interface defined a `find()` method that returned any `Object`, but the class (which implemented the interface) explicitly narrowed the return type to `String`. The simple rule involved here is that variance is generally considered to be safe when narrowing a return type or widening a parameter type; this aligns well with Postel's law: "Be liberal in what you accept, and conservative in what you send." Another common law states: "Contra-variant parameters; covariant returns", which means: When narrowing a type, such as by extending a class or implementing an interface, method parameters can safely widen and return values can safely narrow.
- Invariance - When a type narrows (e.g. when a class is subclassed) and its method parameters and return types do not change, they are type invariant.
- Covariance - When a type narrows and its method parameter and/or return types narrow as well, they are type covariant (because they are varying in the same direction).
- Contra-variance - When a type narrows and its method parameter and/or return types widen, they are type contra-variant (because they are varying in the opposite direction).
Ecstasy allows contra-variant parameters and covariant returns, and this is fully checked at compile time. Variance checks on generic types (types that have type parameters) cannot be fully performed at compile time. Ecstasy defines two terms that are used for the type variance rules on generic types:
- Consumes: A type consumes another type `T` iff it contains a method or property that consumes `T`. A method consumes `T` iff (i) it has a parameter of type `T`; (ii) it has a parameter of a type that produces `T`; or (iii) it has a return of a type that consumes `T`. A property consumes `T` iff (i) the type of the property is `T` and the property is settable; (ii) the type of the property produces `T` and the property is settable; or (iii) the type of the property consumes `T`.
- Produces: A type produces another type `T` iff it contains a method or property that produces `T`. A method produces `T` iff (i) it has a return of type `T`; (ii) it has a parameter of a type that consumes `T`; or (iii) it has a return of a type that produces `T`. A property produces `T` iff (i) the type of the property is `T`; (ii) the type of the property consumes `T` and the property is settable; or (iii) the type of the property produces `T`.
Here is an example of a generic type that produces the type parameter `Element`, but does not consume `Element`:

    interface IndexedExtractor<Element> {
        Element getElement(Int index);
    }

Let's consider four different possible assignments of this type, parameterized by two different values of `Element`:

    // these two assignments are obviously correct;
    // the left hand side type and right hand side type match
    IndexedExtractor<Object> objExtractor1 = new IndexedExtractor<Object>() {...};
    IndexedExtractor<String> strExtractor1 = new IndexedExtractor<String>() {...};

    // if an extractor produces a String, and a String "is a" Object, then it is
    // reasonable to assume that a String extractor "is a" Object extractor, and
    // Ecstasy allows this to compile, even though it may need to add run-time
    // checks as a result
    IndexedExtractor<Object> objExtractor2 = strExtractor1;

    // an Object extractor is allowed to produce any Object, while a String
    // Extractor can only produce String objects, so an IndexedExtractor<Object>
    // does not meet the contract of the IndexedExtractor<String> type, and the
    // compiler will reject this assignment as a type error
    IndexedExtractor<String> strExtractor2 = objExtractor1;

In other words, when a type only produces a type parameter, as illustrated in this `IndexedExtractor<Element>` example, the widening of the type parameter `Element` results in the widening of the type `IndexedExtractor<Element>`, and assignment is allowed because `IndexedExtractor<String>` is-a `IndexedExtractor<Object>` -- although only weakly so. This is an example of type covariance, since the generic type and its type parameter both widen and narrow together.
Here is an example of a type that consumes the type parameter `Element`, but does not produce `Element`:

    interface Logger<Element> {
        public void add(Element value);
    }

Let's consider four different possible assignments of this type, parameterized by two different values of `Element`:

    // these two assignments are obviously correct;
    // the left hand side type and right hand side type match
    Logger<Object> objLogger1 = new Logger<Object>() {...};
    Logger<String> strLogger1 = new Logger<String>() {...};

    // if there is a logger that will only log String values, and we need a
    // reference to a logger that will log any Object, then it's obvious that
    // a Logger<String> cannot meet the contract of the Logger<Object> type,
    // and the compiler will reject this assignment as a type error
    Logger<Object> objLogger2 = strLogger1;

    // if there is a logger that will log any Object, and a String "is a" Object,
    // then it is reasonable to assume that an Object logger "is a" String logger,
    // and Ecstasy allows this to compile, even though it may need to add run-time
    // checks as a result
    Logger<String> strLogger2 = objLogger1;

In other words, when a type only consumes a type parameter, as illustrated in this `Logger<Element>` example, the narrowing of the type parameter `Element` results in the widening of the type `Logger<Element>`, and assignment is allowed because `Logger<Object>` is-a `Logger<String>` -- although only weakly so. This is an example of type contra-variance, since the generic type widens when its type parameter narrows, and vice-versa.
Not coincidentally, the Ecstasy `Array` class, which implements the `List` interface, has a legal is-a relationship (via duck typing) with the two example interfaces above:

    Array<String> strs = ["hello", "world"];
    IndexedExtractor<String> strExtractor3 = strs;
    Logger<String> strLogger3 = strs;

The `Array` type both consumes and produces the type parameter `Element`, which affects the "is-a" rules for type variance. Once again, let's consider four different possible assignments of the type, parameterized by two different values of `Element`:

    // these two assignments are obviously correct;
    // the left hand side type and right hand side type match
    Array<Object> objArray1 = [1, "test", Null];
    Array<String> strArray1 = ["hello", "world"];

    // this is the big question: is an "Array of String" an "Array of Object"?
    // after all, a String is-a Object, and an Array is-a Array, so Ecstasy
    // chooses to allow "Array<String> is-a Array<Object>" to be (weakly) true
    Array<Object> objArray2 = strArray1;

    // an Object array could contain anything, while a String array can only
    // contain String objects, so an Array<Object> does not meet the contract
    // of the Array<String> type, and the compiler will reject this assignment
    // as a type error
    Array<String> strArray2 = objArray1;

Up until this point, all of the "weak" is-a examples of type variance with generic types have been fairly straight-forward from a technical standpoint, but this `Array` example is different: It's quite simple to abuse this "weak" is-a result by employing the classic cliché "cats and dogs" example:

    Cat[] cats = [new Cat("Tabby"), new Cat("Russian Blue")];
    Animal[] animals = cats;
    animals += new Dog("Wolf"); // <- throws an exception!

The exception itself is worth looking at:

    Exception: Missing operation "+" on Array<test:Cat>
        at run() (test.x:9)

The exception tells us that the source code was in the aptly-named file "test.x" and that the exception occurred at line 9, where an attempt was made to add a `Dog` object to an `Array<Cat>`. It's not a type mismatch exception, though; instead, it's an exception that says "the compiler assumed that there would be an operator on the underlying type that could add a Dog to the array, and the actual type at runtime didn't have any such operator". In other words, the compiler purposefully allowed what it knew to be a potentially incorrect assumption, in order to support covariant generic types.
Many languages, including Java and C#, are purposefully type invariant for generic types; in Java or C#, a `List<Cat>` cannot be assigned to a `List<Animal>`. Java supports a limited wild-card syntax, combinable with `super` and `extends` keywords, that in practice is nearly unusable. C# does support limited type variance for interfaces by adding `in` and `out` keywords to an interface's type parameters that need to be allowed to vary; Scala supports a similar notation using `+`/`-`. After carefully evaluating the approaches used in a dozen popular languages, for the design of the Ecstasy language we chose to automatically infer type variance based on the produces/consumes rules, instead of either (i) disallowing variance or (ii) introducing a complicated explicit syntax. The Ecstasy design is based on the real-world utility of type variance to developers, the benefits of automatically inferring valid type variance, and the near-zero incidence of real-world errors caused by type variance -- even in the absence of the additional rule-based compile-time type checking that Ecstasy applies to generic types. And regarding the dogs and cats: No animals were harmed in the making of this language.
The real goal in the Ecstasy design was to somehow automatically "do the right thing", without the developer actually having to stop their progress and switch gears to "cut-and-paste from StackOverflow.com" mode, to look up the complex rules of type variance in the language documentation, or to try random code incantations until something actually compiles and looks-like-it-works without ever understanding the underlying technical concerns.