A long story about MLF
  1. A long story about MLF, part 1: Introduction, ML
  2. A long story about MLF, part 2: System F and principality
  3. Moderately practical uses of System F over ML
  4. A long story about MLF, part 3: introduction to MLF
  5. A long story about MLF, part 4: gMLF, graphical MLF types

A long story about MLF, part 2: System F and principality

(Gabriel Scherer (gasche) @ 2010-11-07 10:25:10)

System F

System F is a different type system featuring explicit polymorphism: the System F term language contains explicit indications of which terms are polymorphic and when we’re taking monomorphic instances of them.

e ::=                       System F expressions
    | x, y, z               variables
    | λ(x:σ) e              λ-abstraction
    | e₁ e₂                 application
    | let x = e₁ in e₂      local definition
    | Λ(α) e                type abstraction (introduction of polymorphism)
    | e[σ]                  type application (elimination of polymorphism)

The following example declares the polymorphic identity function and uses it on the constant 5, in ML and System F:

  let id = λ(x) x in id 5                            (* ML *)

  let id : ∀(α)α→α = Λ(α) λ(x:α) x in id[int] 5      (* System F *)

System F terms, being fully explicit about polymorphism, are much heavier to write and read. They can, however, express richer polymorphism than ML:

λ(f : ∀(α) α→α) (f[int] 5, f[bool] true)

This function takes a polymorphic parameter: its type is (∀(α) α→α) → (int*bool). This was not possible in ML, where the types on both sides of an arrow had to be monotypes. System F has higher-ranked types:

σ ::=                    System F types                  
    | α, β, γ            type variables
    | σ → σ              function types
    | ∀(α) σ             type quantification

let-bindings are usually not included in presentations of System F. They’re useful here for homogeneity with the other term languages presented — ML and MLF. It is interesting to note that, similarly to ML, let cannot be macro-expanded away as in the untyped λ-calculus. The expansion would be possible for an annotated let-binding:

let x:σ = e₁ in e₂     ⟶      (λ(x:σ) e₂) e₁

With the non-annotated let, however, the type σ is not syntactically apparent. Note that this doesn’t introduce any implicitness or non-trivial inference into the System F type system: System F terms have unique types, so σ can be uniquely determined from e₁.
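
Here is a quick OCaml illustration of the ML side of that remark (a hedged sketch, with binding names of my own): the let-bound identity is generalized and can be used at two different types, while its macro-expanded counterpart is rejected.

  (* Accepted: the let-bound id is generalized to ∀(α) α→α,
     so it can be used at both int and bool. *)
  let ok =
    let id = fun x -> x in
    (id 5, id true)

  (* Rejected if uncommented: the λ-bound id gets a single monotype,
     so it cannot be applied to both an int and a bool. *)
  (* let ko = (fun id -> (id 5, id true)) (fun x -> x) *)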

Inference and System F

Is it possible to keep the expressivity of System F, while not being so explicit about types and polymorphism?

If we drop type annotations completely from System F terms, including type introductions and eliminations, we get back exactly the ML terms as previously defined. But there are terms that have a System F type yet are not typable in ML. Is there an algorithm to infer the System F type of a term? An easier question would be: given the System F type of a term, can we automatically recover the type annotations, introductions and eliminations?

The answer to both questions is no. Those two problems — which are confusingly named type reconstruction and type inference, or typability and type checking — are undecidable: this was proved by J.B. Wells in “Typability and type checking in System F are equivalent and undecidable”.

In other words, it is not possible to get the expressive power of System F without using some type annotations. System F terms, using explicit polymorphism, use a lot of type annotations that often feel redundant. Is it possible to do better?

This question is an active research topic. The programming languages based on the ML type system have evolved and are looking for more expressivity. They have included extensions of the ML type system allowing some restricted forms of higher-rank polymorphism, available to users who insert the right type annotations at dedicated places. The idea is to bridge the gap between ML and System F, by finding the right compromise between inference and polymorphism annotations.
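
For instance, in OCaml one such dedicated place is a record field, which can be given an explicitly polymorphic type; this is enough to recover the rank-2 example from the System F section. The sketch below is my own, with illustrative names.

  (* The field f is annotated with the polymorphic type ∀(α) α→α. *)
  type poly_id = { f : 'a. 'a -> 'a }

  (* A counterpart of λ(f : ∀(α) α→α) (f[int] 5, f[bool] true);
     the instantiations at int and bool are inferred. *)
  let use (g : poly_id) = (g.f 5, g.f true)

  let result = use { f = (fun x -> x) }    (* result = (5, true) *)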

Related work?

A comprehensive overview of the topic would necessarily include a Related Work section, with an overview of the other branches of research in this area — on a personal note, the Related Work section of a paper is the second thing I read, just after the introduction.

However, I am not familiar enough with the research in these areas to feel confident about writing such a review. In case you’re interested, you should read the Related Work section of the Recasting MLF paper by Didier Le Botlan and Didier Rémy, or the more detailed section of Boris Yakobowski's PhD thesis. I was surprised by the important involvement of Martin Odersky, whom I only knew as the main designer of the Scala language.

Principality in rich type systems

Rich type systems make type inference harder: more programs are typable, which means more work for the inference algorithm. However, no such monotonic relation holds for principality: enriching the type system can make principality easier to achieve.

The simply typed λ-calculus — which is essentially our presentation of ML, restricted to monotypes only — has an easy type inference algorithm based on first-order unification, but it does not enjoy principality. For example, the identity function (λ(x) x) admits any monotype (τ→τ), none of them being more generalא than the others.

By enriching the type system with the polymorphic types of ML, we gain principality: the new type scheme ∀(α) α→α is able to express the commonality in all the derivable monotypes. Type quantification is exactly what we needed to express that, for a given term, some parts of the inferred types do not matter.

In other words, principality of type inference requires expressiveness at the type level. During the type inference process on a term, one often encounters different possibilities. If you cannot express the alternatives in their full generality in the result type, you cannot have principality.

From System F to MLF

MLF is not an intermediate point on a continuum from ML to System F. It is an extension of System F that regains principality by enriching the expressiveness of the type level.

Consider the following choose function:

choose := (λ(x) λ(y) if true then x else y)
choose :  ∀(α) α→α→α

If we pass it the identity function as argument, what is the type of the result?

id := λ(x) x
id :  ∀(α) α→α

In ML, which does not allow polymorphism under arrows, the result has type:

σ₁ :=  ∀(β) (β→β) → (β→β)
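
As a sanity check, here is a hedged OCaml transcription (binding names are my own); the inferred scheme has exactly the shape of σ₁, with the quantification hoisted outside the arrow.

  let choose x y = if true then x else y   (* choose : 'a -> 'a -> 'a *)
  let id x = x                             (* id : 'a -> 'a *)

  (* Eta-expanded so that OCaml's value restriction does not obscure
     the inferred scheme. *)
  let choose_id f = choose id f            (* choose_id : ('a -> 'a) -> 'a -> 'a *)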

But in a system as expressive as System F, there is a second possibility:

σ₂ := (∀(β) β→β) → (∀(β) β→β)

Both types are admissible, and neither is more general than the other: we cannot have principality with the System F type language.

It is quite clear that σ₂ is not an instance of σ₁: by going from an inner quantification (σ₂) to an outer quantification (σ₁), we have lost the information that the polymorphism is local.

What information have we lost in σ₂? It is sound to consider (∀(β) β→β)→(int→int) as an instance of σ₂. It is not sound, however, for (int→int)→(int→int): terms with type σ₂ are functions that may use their argument polymorphically, so we cannot pass them an (int→int).

The information we have lost is the information that the argument and the result types are equal. When the type was written ∀(α)α→α→α, that equality was made explicit by the use of the same variable α for all components of the type. If we knew, somehow, that the argument and result types of a term of type (∀(β) β→β) → (∀(β) β→β) are equal, then we could soundly instantiate it to (int→int)→(int→int).

MLF provides a new way to express equality information inside types. The most general type for choose id, expressing both the local polymorphism and the equality between the two sides of the arrow, is written:

∀(α ≥ ∀(β)β→β) α → α
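
Informally (the precise MLF instance relation is the topic of the next part, so take this as my own reading), this bounded quantification has, among others, the following instances:

  (∀(β) β→β) → (∀(β) β→β)      (σ₂: take α to be the bound ∀(β)β→β itself)
  ∀(β) (β→β) → (β→β)           (σ₁: weaken the bound to a monotype β→β)
  (int→int) → (int→int)        (instantiate both occurrences of α together)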

א: In the explanation of principality, I claim that the STλC does not have principality, as it doesn’t have a principal type for e.g. λx.x. This is a somewhat controversial statement that actually depends on further details of the STλC. In particular, the atomic types may be considered either as a set of predefined constant types (int, bool…), or as free type variables (X, Y…). In the second case, one could define an instance relation allowing free variables to be instantiated, and say that the type X→X is a most general type for the expression λx.x.

I’m not sure there is a widely accepted consensus on that point. See for example the intense debate on LtU. What I meant, anyway, is that the most general type is not internalized in the language: adding an explicit polymorphism construct ∀α.α→α allows one to express this generality inside the language of types. This is the very point I’m trying to make: type inference needs the ability to express any such variability directly inside the language of types.