Deriving Distributive Laws for Graded Linear Types
Abstract
The recent notion of graded modal types provides a framework for extending type theories with fine-grained data-flow reasoning. The Granule language explores this idea in the context of linear types. In this practical setting, we observe that the presence of graded modal types can introduce an additional impediment when programming: when composing programs, it is often necessary to ‘distribute’ data types over graded modalities, and vice versa. In this paper, we show how to automatically derive these distributive laws as combinators for programming. We discuss the implementation and use of this automated deriving procedure in Granule, providing easy access to these distributive combinators. This work is also applicable to Linear Haskell (which retrofits Haskell with linear types via grading) and we apply our technique there to provide the same automatically derived combinators. Along the way, we discuss interesting considerations for pattern matching analysis via graded linear types. Lastly, we show how other useful structural combinators can also be automatically derived.
1 Introduction
Linear type systems capture and enforce the idea that some data travels a linear dataflow path through a program, being consumed exactly once. This is enforced by disallowing the structural rules of weakening and contraction on variables carrying such values. Non-linear dataflow is then typically captured by a modality !A characterising values/variables on which all structural rules are permitted [13]. This binary characterisation (linear versus non-linear) is made more fine-grained in Bounded Linear Logic (BLL) via a family of modalities !_{x<p} A where p is a polynomial term in x capturing the maximum number of times that a value can be used [14]. The proof rules of BLL track upper bounds on non-linear use via these indices. Various works have generalised the indices of BLL's modalities to arbitrary semirings [7, 12, 24] (often referred to as coeffect systems). These generalised systems provide a unified way to capture various program properties relating to dataflow via graded necessity modalities □_r where the grade r is drawn from a semiring. The functional programming language Granule (https://granule-project.github.io/) provides a vehicle for exploring graded modal types (both graded necessity and graded possibility, which can model side effects [20]) alongside linear and indexed types (GADTs) [23]. There are various similar systems in the recent literature: Abel and Bernardy give an alternate graded modal calculus [2] and Linear Haskell retrofits linearity onto the Haskell type system [4] via a graded type system akin to the coeffect calculus of Petricek et al. [24]. The popularity of graded types is growing and their practical implications are being explored in research languages (e.g. Granule), and popular functional languages (e.g. Haskell [4] and Idris 2 [5]).
When programming with graded modal types, we have observed there is often a need to ‘distribute’ a graded modality over a type, and vice versa, in order to compose programs. That is, we may find ourselves in possession of a value of type □_r (F A) (for some parametric data type F) which needs to be passed to a pre-existing function (of our own codebase or a library) which requires a value of type F (□_r A), or perhaps vice versa. A distributive law (in the categorical sense, e.g., [28]) provides a conversion from one to the other. In this paper, we present a procedure to automatically synthesise these distributive operators, applying a generic programming methodology [15] to compute these operations given the base type (e.g., F in the above description). This serves to ease the use of graded modal types in practice, removing boilerplate code by automatically generating these ‘interfacing functions’ on-demand, for user-defined data types as well as built-in types.
Throughout, we refer to distributive laws of the form □_r (F A) ⊸ F (□_r A) as push operations (as they ‘push’ the graded modality inside the type constructor F), and dually F (□_r A) ⊸ □_r (F A) as pull operations (as they ‘pull’ the graded modality outside the type constructor F).
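These two shapes can be previewed in ordinary Haskell using a hypothetical Box type whose grade is carried as a runtime natural number (an assumption of this sketch only: in Granule and GrMiniP grades are static and live in the type):

```haskell
-- Hypothetical value-level model of a graded modality; the Int field
-- stands in for the static grade r of a type □_r a.
data Box a = Box Int a deriving (Show, Eq)

-- push: distribute the box over a pair, duplicating the grade.
pushPair :: Box (a, b) -> (Box a, Box b)
pushPair (Box r (x, y)) = (Box r x, Box r y)

-- pull: collect the boxes; the resulting grade is the meet of the two
-- grades (min models the greatest-lower bound for the <= ordering).
pullPair :: (Box a, Box b) -> Box (a, b)
pullPair (Box n x, Box m y) = Box (min n m) (x, y)
```

For example, pullPair (pushPair (Box 2 (x, y))) returns Box 2 (x, y), previewing the inverse properties established in Section 3.1.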
The primary contributions of this paper are then:
•
an overview of the application of distributive laws in the context of graded modal types;
•
an automatic procedure for calculating distributive laws from types and a formal analysis of their properties;
•
a realisation of this approach in both Granule (embedded into the compiler) and Linear Haskell (expressed within the language itself, leveraging Haskell’s advanced features);
•
and derivations of related combinators for structural use of values in a graded linear setting.
Along the way, we also analyse choices made around the typed analysis of pattern matching in various recent works on graded modal types. Through the comparison between Granule and Linear Haskell, we also highlight ways in which Linear Haskell could be made more general and flexible in the future.
Section 2 defines a core calculus GrMiniP with linear and graded modal types which provides an idealised, simply-typed subset of Granule with which we develop the core contribution. Section 3 gives the procedures for deriving push and pull operators for the core calculus, and verifies that these are distributive laws of endofunctors over the graded comonadic modality. Section 4 describes the details of how these procedures are realised in the Granule language. Section 5 relates this work to Linear Haskell, and demonstrates how the push and pull combinators for user-defined data types may be automatically generated at compile-time using Template Haskell. Section 6 gives a comparison of the recent literature with regard to the typed (graded) analysis of pattern matching, which is germane to the typing and derivation of our distributive laws. Section 7 covers how other structural combinators for Granule and Linear Haskell may be derived. Finally, Section 8 discusses more related and future work.
We start with an extended motivating example typifying the kind of software engineering impedance problem that distributive laws solve. We use Granule since it is the main vehicle for the developments here, and introduce some of the key concepts of graded modal types (in a linear context) along the way.
1.1 Motivating Example
Consider the situation of projecting the first element of a pair. In Granule, this first-projection is defined and typed as the following polymorphic function (whose syntax is reminiscent of Haskell or ML):
Linearity is the default, so this represents a linear function applied to linear values. (Granule uses the syntax → rather than ⊸ for linear functions, for the sake of familiarity with standard functional languages.) However, the second component of the pair has a graded modal type, written b [0], which means that we can use the value “inside” the graded modality 0 times, by first ‘unboxing’ this capability via the pattern match [y], which allows weakening to be applied in the body to discard y of type b. In the calculus of Section 2, we denote ‘b [0]’ as the type □₀ b (Granule’s graded modalities are written postfix, with the ‘grade’ inside the box).
The type for fst is however somewhat restrictive: what if we are trying to use such a function with a value (call it myPair) whose type is not of the form (a, b [0]) but rather (a, b) [r] for some grading term r which permits weakening? Such a situation readily arises when we are composing functional code, say between libraries or between a library and user code. In this situation, fst myPair is ill-typed. Instead, we could define a different first projection function for use with myPair : (a,b) [r] as:
This implementation uses various language features of Granule to make it as general as possible. Firstly, the function is polymorphic in the grade r and in the semiring s of which r is an element. Next, a refinement constraint 0 ⊑ r specifies that, by the preordering associated with the semiring s, 0 is approximated by r (essentially, that r permits weakening). The rest of the type and implementation looks more familiar for computing a first projection, but now the graded modality is over the entire pair.
From a software engineering perspective, it is cumbersome to create alternate versions of generic combinators every time we are in a slightly different situation with regards the position of a graded modality. Fortunately, this is an example to which a general distributive law can be deployed. In this case, we could define the following distributive law of graded modalities over products, call it pushPair:
This ‘pushes’ the graded modality r into the pair (via pattern matching on the modality and the pair inside it, and then reintroducing the modality on the right hand side via [x] and [y]), distributing the graded modality to each component. Given this combinator, we can now apply fst (pushPair myPair) to yield a value of type a [r], on which we can then apply the Granule standard library function extract, defined:
to get the original a value we desired:
The pushPair function could be provided by the standard library, and thus we have not had to write any specialised combinators ourselves: we have applied supplied combinators to solve the problem.
Now imagine we have introduced some custom data type List on which we have a map function:
Note that, via a graded modality, the type of map specifies that the parameter function, of type a → b, is non-linear, with an interval grade bounding how many times it may be used. Imagine now we have a value myPairList : (List (a, b)) [r] and we want to map first projection over it. But fst expects (a, b [0]) and even with pushPair we require (a, b) [r]. We need another distributive law, this time of the graded modality over the List data type. Since List was user-defined, we now have to roll our own pushList operation, and so we are back to having to make specialised combinators for our data types.
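To make the boilerplate concrete, here is what such a hand-rolled pushList amounts to, sketched in plain Haskell with an illustrative value-level Box standing in for the graded modality (the names and encoding are ours, not Granule’s):

```haskell
-- Illustrative stand-in for a graded box; the Int is the grade.
data Box a = Box Int a deriving (Show, Eq)

data List a = Nil | Cons a (List a) deriving (Show, Eq)

-- Hand-written distributive law of the box over List: match on each
-- constructor, box the head with the same grade, recurse on the tail.
pushList :: Box (List a) -> List (Box a)
pushList (Box _ Nil)         = Nil
pushList (Box r (Cons x xs)) = Cons (Box r x) (pushList (Box r xs))
```

Every new data type would need its own such definition; the derivation procedure of this paper generates them instead.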
The crux of this paper is that such distributive laws can be automatically calculated given the definition of a type. With our Granule implementation of this approach (Section 4), we can then solve this combination problem via the following composition of combinators:
where the push operations are written with their base type via @ (a type application) and whose definitions and types are automatically generated during type checking. Thus the push operation is a data-type generic function [15]. This generic function is defined inductively over the structure of types, thus a programmer can introduce a new user-defined algebraic data type and have the implementation of the generic distributive law derived automatically. This reduces both the initial and future effort (e.g., if an ADT definition changes or new ADTs are introduced).
Dual to the above, there are situations where a programmer may wish to pull a graded modality out of a structure. This is possible with a dual distributive law, which could be written by hand as:
Note that the resulting grade is defined by the greatest-lower bound (meet) of n and m, if it exists as defined by the preorder for semiring s (that is, ⊓ is not a total operation). This allows some flexibility in the use of the pull operation when grades differ in different components but have a greatest-lower bound which can be ‘pulled out’. Our approach also allows such operations to be generically derived.
2 Core Calculus, GrMiniP
We define here a simplified monomorphic subset of Granule, which we call the GrMiniP calculus. (GrMiniP is akin to the GrMini calculus given by Orchard et al. [23], but we include here the notions of data type constructors, eliminations, and pattern matching since these are important to the development.) This calculus extends the linear λ-calculus with a semiring-graded necessity modality [23]: for a preordered semiring (R, ·, 1, +, 0, ⊑) (that is, a semiring with a preorder ⊑ with respect to which · and + are monotonic) there is a family of types □_r A for r ∈ R. Granule allows multiple graded modalities (and thus multiple grading semirings) to be used simultaneously within a program, but we focus here on just one graded modality, parameterising the language by the semiring R. We also include notions of data constructor and their elimination via case expressions as a way to unify the handling of regular type constructors.
The syntax of GrMiniP terms and types is given by:
(terms)             t ::= x | λx.t | t₁ t₂ | [t] | C t₁ … tₙ | case t of p₁ ↦ t₁; …; pₙ ↦ tₙ | letrec x = t₁ in t₂
(patterns)          p ::= x | _ | [p] | C p₁ … pₙ
(types)             A, B ::= A ⊸ B | α | A ⊗ B | A ⊕ B | 1 | □_r A | μX.A | X
(data constructors) C ::= unit | (,) | inl | inr
Terms of GrMiniP consist of those of the linear λ-calculus, plus a promotion construct [t] for introducing a graded modality, data constructor introduction and elimination (case), as well as recursive let bindings (letrec). We will give more insight into the syntax via the typing rules. Types comprise linear function types A ⊸ B, type variables α, multiplicative products ⊗, additive sums ⊕, unit types, graded modal types □_r A, recursive types μX.A, and recursion variables X (which must be guarded by a μ binding). Type variables are used in our derivation procedure but are treated here as constants by the core calculus, without any rules relating to their binding or unification.
2.1 Typing
Typing is via judgments of the form Γ ⊢ t : A, assigning a type A to term t under the context Γ. Typing contexts contain linear or graded assumptions given by the grammar:

Γ ::= ∅ | Γ, x : A | Γ, x :[r] A

where x : A is a linear assumption and x :[r] A is an assumption graded by r drawn from the preordered semiring (called a graded assumption). This delineation of linear and graded assumptions avoids issues with substitution under a promotion, following the technique of Terui’s discharged assumptions [29]. Linear assumptions (including those of a graded modal type □_r A) must therefore be used exactly once, whilst graded assumptions may be used non-linearly subject to the constraint of the grade r.
Figure 1 gives the typing rules. We briefly explain each rule in turn.
The var, abs and app rules follow the standard rules for the linear λ-calculus, modulo the context addition operation Γ₁ + Γ₂, which is used in the rules any time the contexts of two subterms need to be combined. Context addition acts as union on disjoint linear and graded assumptions but is defined via semiring addition when a graded assumption appears in both contexts:

(Γ₁, x :[r] A) + (Γ₂, x :[s] A) = (Γ₁ + Γ₂), x :[r+s] A
Context addition is undefined in the case of contexts whose linear assumptions are non-disjoint, enforcing the lack of contraction on linear assumptions.
The next rules relate to grading. Structural weakening is provided by (weak) for assumptions graded by 0, and ‘dereliction’ (der) converts a linear assumption to a graded assumption, graded by 1. Graded modalities are introduced by ‘promotion’ (pr), scaling the graded assumptions via semiring multiplication, defined on contexts containing only graded assumptions as:

r · ∅ = ∅        r · (Γ, x :[s] A) = (r · Γ), x :[r·s] A
In the (pr) rule, [Γ] denotes contexts with only graded assumptions. The approx rule captures grade approximation, allowing a grade r in an assumption to be converted to another grade s, on the condition that r is approximated by s, via the relation r ⊑ s as provided by the semiring’s preorder.
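For intuition, the two context operations can be sketched over purely graded contexts, modelling a context as a map from variable names to ℕ-semiring grades (a simplification of ours that omits linear assumptions and types):

```haskell
import qualified Data.Map as Map

-- A purely graded context: variable names mapped to natural-number grades.
type Ctxt = Map.Map String Int

-- Context addition: union, combining shared graded assumptions with
-- semiring addition.
ctxtAdd :: Ctxt -> Ctxt -> Ctxt
ctxtAdd = Map.unionWith (+)

-- Scaling, as used by promotion (pr): every grade is multiplied by r.
ctxtScale :: Int -> Ctxt -> Ctxt
ctxtScale r = Map.map (r *)
```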
The letrec rule provides recursive bindings in the standard way.
Data constructors with zero or more arguments are introduced via the con rule. Here, the constructors that concern us are units, products, and coproducts (sums), given by D, a global set of data constructors with their types, defined:

D = { unit : 1,   (,) : A ⊸ B ⊸ A ⊗ B,   inl : A ⊸ A ⊕ B,   inr : B ⊸ A ⊕ B }
Constructors are eliminated by pattern matching via the case rule. Patterns are typed by judgments of the form ?r ⊢ p : A ▷ Γ, meaning that a pattern p has type A and produces a context Γ of typed binders (used, e.g., in the typing of the case branches). The information ?r to the left of the turnstile denotes optional grade information arising from being in an unboxing pattern and is syntactically defined as either:

(enclosing grade) ?r ::= ∅ | r

where ∅ means the present pattern is not nested inside an unboxing pattern and r means that the present pattern is nested inside an unboxing pattern for a graded modality with grade r.
The rules of pattern typing are given in Figure 2. The rule (PBox) provides graded modal elimination (an ‘unboxing’ pattern), propagating grade information into the typing of the sub-pattern. Thus a pattern [p] can be used to eliminate a graded modal value. Variable patterns are typed via two rules depending on whether the variable occurs inside an unboxing pattern ([Pvar]) or not (Pvar), with the [Pvar] rule producing a binding graded with the enclosing box’s grade r. As with variable patterns, constructor patterns are split between rules for patterns which either occur inside an unboxing pattern or not. In the former case, the grade information is propagated to the subpattern(s), with the additional constraint that if there is more than one data constructor for the type (written #A > 1), then the grade must approximate 1 (written 1 ⊑ r), as pattern matching incurs a usage to inspect the constructor (discussed further in Section 6). The operation #A counts the number of data constructors of a type:

#1 = 1    #(A ⊗ B) = 1    #(A ⊸ B) = 1    #(□_r A) = 1    #α = 1    #(A ⊕ B) = 2    #(μX.A) = #A

and #X is undefined (or effectively 0), since we do not allow unguarded recursion variables in types. A type A must therefore involve a sum type for #A > 1.
Since a wildcard pattern discards a value, this is only allowed inside an unboxing pattern where the enclosing grade r permits weakening, captured via the condition 0 ⊑ r in rule [Pwild].
We now provide two examples of graded modalities. In addition to those shown here, Granule provides various other graded modalities, including grading by semirings of security-level lattices and “sensitivities” akin to DFuzz [10].
Example 2.1.
The natural numbers semiring (ℕ, ×, 1, +, 0) provides exact usage, where the preorder is equality. Graded modalities in this semiring count the number of times a value is used. As a simple example, we can define a function which copies its input value to produce a pair of values:
The capability to use a value of type a twice is captured by the □₂ graded modality. In the body of the abstraction, the bound variable must be used linearly; it is used linearly as the scrutinee of the case, which eliminates the graded modality to yield a graded assumption that can be used twice to form the resulting pair.
Example 2.2.
BLL-style grading with upper bounds on usages can be obtained by replacing the equality relation of the ℕ-semiring with the less-than-or-equal ordering ≤, giving a notion of approximation. This is useful for programs that use variables differently in control-flow branches, e.g., when eliminating a sum type. This notion is further refined by the semiring of natural number intervals, which captures lower and upper bounds on usages. Intervals are given as pairs written r..s, where r and s represent the lower and upper bounds respectively, with 0 = 0..0 and 1 = 1..1, and with addition and multiplication acting on the endpoints. Ordering is given by r..s ⊑ r'..s' iff r' ≤ r and s ≤ s', i.e., an interval is approximated by any wider interval. Using this, we can define the following function to eliminate a coproduct via a pair of functions, each of which is used either 0 or 1 times:
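The interval operations just described can be sketched at the value level in Haskell (names and encoding are ours; Granule builds this semiring in):

```haskell
-- Natural-number intervals r..s, written here with a :.. constructor.
data Interval = Int :.. Int deriving (Show, Eq)

-- Semiring addition and multiplication act on the endpoints.
iadd, imul :: Interval -> Interval -> Interval
iadd (a :.. b) (c :.. d) = (a + c) :.. (b + d)
imul (a :.. b) (c :.. d) = (a * c) :.. (b * d)

-- Approximation: an interval is approximated by any wider interval.
ileq :: Interval -> Interval -> Bool
ileq (a :.. b) (c :.. d) = c <= a && b <= d

-- Greatest-lower bound (partial): the widest interval inside both,
-- if the two intervals overlap at all.
imeet :: Interval -> Interval -> Maybe Interval
imeet (a :.. b) (c :.. d)
  | lo <= hi  = Just (lo :.. hi)
  | otherwise = Nothing
  where lo = max a c
        hi = min b d
```

For instance, imeet (0 :.. 2) (2 :.. 4) is Just (2 :.. 2).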
2.2 Equational Theory
Figure 3 defines an equational theory for GrMiniP. The equational theory is typed, but for brevity we elide the typing for most cases since it follows exactly from the structure of the terms. The fully typed rules are provided in the appendix [18]. For those rules which are type restricted, we include the full typed-equality judgment here. The typability of these equations relies on previous work on the Granule language which proves that pattern matching and substitution are well typed [23].
The β and η rules follow the standard rules from the λ-calculus, where x # t is a freshness predicate, denoting that variable x does not appear inside term t.
For recursive letrec bindings, the β rule substitutes occurrences of the bound variable x in the body with the bound term, ensuring that recursive uses of x inside the bound term can themselves be substituted through subsequent reduction. The LetRecDistrib rule allows distributivity of functions over letrec expressions, stating that if a function f can be applied to the entire letrec expression, then this is equivalent to applying f to just the body term.
Term elimination is via case, requiring rules for both β- and η-equality on case expressions, as well as rules for associativity and distributivity. In case-β, a term is matched against a pattern in the context of the branch body, through the use of a partial matching function which may substitute the terms bound by the pattern into the branch body if the match is successful. This partial function is defined inductively on the structure of patterns.
As a side condition of the rule, we require that the selected branch be minimal, i.e., the first pattern in the case for which the match is defined. The case-η rule states that if all branches of the case expression share a common term which differs between branches only in the occurrences of terms that match the pattern used, then we can substitute the scrutinee for the pattern inside that common term.
Associativity of case expressions is provided by the CaseAssoc rule. This rule allows us to restructure nested case expressions such that the output terms of the inner case may be matched against the patterns of the outer case, to achieve the same resulting output terms. The [CaseAssoc] rule provides a graded alternative to the CaseAssoc rule, where the nested case expression is graded, provided that the patterns of the outer case expression are also graded. Notably, this rule only holds when the patterns of the inner case expression are linear (i.e., variable or constant patterns), so that there are no nested box patterns, represented via a syntactic predicate on the patterns. As with letrec, distributivity of functions over a case expression is given by CaseDistrib.
Lastly, generalisation of an arbitrary boxed pattern to a variable is permitted through the CaseGen rule. Here, a boxed pattern and the output term of the case may be converted to a variable if the output term is equivalent to the pattern inside the box. The term being matched against must therefore have a grade r which approximates 1 (i.e., 1 ⊑ r), as witnessed in the typing derivation.
3 Automatically Deriving push and pull
Now that we have established the core theory, we describe the algorithmic calculation of distributive laws in GrMiniP. Note that whilst GrMiniP is simply typed (monomorphic), it includes type variables (ranged over by α) to enable the distributive laws to be derived on parametric types. In the implementation, these will really be polymorphic type variables, but the derivation procedure need only treat them as some additional syntactic type construct.
Notation
Let K be an n-ary type constructor (i.e., a constructor which takes n type arguments), whose free type variables provide the parameter types. We write K α₁ … αₙ for the application of K to type variables αᵢ for all 1 ≤ i ≤ n.
Push
We automatically calculate push for K applied to type variables as the operation:

push_K : □_r (K α₁ … αₙ) ⊸ K (□_r α₁) … (□_r αₙ)

where we require 1 ⊑ r, due to the graded constructor-pattern rule (e.g., if K contains a sum).
For types closed with respect to recursion variables, let push_K = ⟦K α₁ … αₙ⟧∅, given by an intermediate interpretation ⟦A⟧Σ, where Σ is a context of push combinators for the recursive type variables:
In the case of push on a value of unit type, we pattern match on the value, eliminating the graded modality via the unboxing pattern match and returning the unit value. For type variables, push is simply the identity on the value, while for recursion variables we look up the variable’s binding in Σ and apply it to the value. For sum and product types, push works by pattern matching on the type’s constructor(s) and then inductively applying push to the boxed arguments, re-applying the constructor(s) to the results. Unlike pull below, the push operation can be derived for function types, with a contravariant use of pull. For recursive types, we inductively apply push to the value with a fresh recursion variable bound in Σ, representing a recursive application of push.
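In an illustrative value-level Box model (grades as runtime naturals; an encoding of ours, not part of GrMiniP), the sum case makes the 1 ⊑ r side condition visible: inspecting the constructor costs a use, so push over a sum is guarded by a grade check:

```haskell
-- Illustrative stand-in for a graded box; the Int is the grade.
data Box a = Box Int a deriving (Show, Eq)

-- push for a sum: matching on Left/Right inspects the constructor, so
-- the grade must cover at least one use (modelling 1 ⊑ r as r >= 1).
pushEither :: Box (Either a b) -> Maybe (Either (Box a) (Box b))
pushEither (Box r v)
  | r < 1     = Nothing   -- grade cannot pay for the inspection
  | otherwise = Just (case v of
      Left  x -> Left  (Box r x)
      Right y -> Right (Box r y))
```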
There is no derivation of a distributive law for types which are themselves graded modalities (see further work discussion in Section 8).
The appendix [18] proves that this interpretation is type sound, i.e., its derivations are well-typed.
Pull
We automatically calculate pull for K applied to type variables as the operation:

pull_K : K (□_{r₁} α₁) … (□_{rₙ} αₙ) ⊸ □_{r₁ ⊓ … ⊓ rₙ} (K α₁ … αₙ)

Type constructor K here is applied to arguments each of the form □_{rᵢ} αᵢ, i.e., each with a different grading, of which the greatest-lower bound r₁ ⊓ … ⊓ rₙ is the resulting grade (see pullPair from Section 1.1). (The greatest-lower bound ⊓ is a partial operation which can be defined in terms of the semiring’s preorder: q = r ⊓ s iff q ⊑ r and q ⊑ s and there exists no other q' with q' ⊑ r and q' ⊑ s and q ⊑ q'.)
For types closed with respect to recursion variables, let pull_K = ⟦K α₁ … αₙ⟧∅ where:
Just like push, we cannot apply pull to graded modalities themselves. Unlike push, we cannot apply pull to function types. That is, we cannot derive a distributive law of the form (□_r α ⊸ □_r β) ⊸ □_r (α ⊸ β), since introducing the concluding □_r would require the incoming function to itself be inside a graded modality, due to the promotion rule (pr), which does not match the type scheme for pull.
The rest of the derivation above is similar but dual to that of push.
The appendix [18] proves that this interpretation is type sound, i.e., its derivations are well-typed.
Example 3.1.
To illustrate the above procedures, we show the derivation of push for an example type:
Remark 1.
One might ponder whether linear logic’s exponential !A [13] is modelled by the graded necessity modality over intervals, i.e., !A ≜ □_{0..∞} A. This is a reasonable assumption, but □_{0..∞} has a slightly different meaning to !, exposed here: whilst push is derivable for □_{0..∞} in GrMiniP, linear logic does not permit !(A ⊗ B) ⊸ !A ⊗ !B. Models of ! provide only a monoidal functor structure, which gives pull for !, but not push [3]. This structure can be recovered in Granule through the introduction of a partial type-level operation which selectively disallows push for graded modalities in semirings which model the ! modality of linear logic [16]. (The work in [16] arose as a result of the writing of this paper.)
The algorithmic definitions of ‘push’ and ‘pull’ can be leveraged in a programming context to automatically yield these combinators for practical purposes. We discuss how this is leveraged inside the Granule compiler in Section 4 and two techniques for leveraging it for Linear Haskell in Section 5. Before that, we study the algebraic behaviour of the derived distributive laws.
3.1 Properties
We consider here the properties of these derived operations. Prima facie, the above push and pull operations are simply distributive laws between two (parametric) type constructors, K and □_r, the latter being the graded modality. However, both K and □_r have additional structure. If the mathematical terminology of ‘distributive laws’ is warranted, then such additional structure should be preserved by push and pull (e.g., as in how a distributive law between a monad and a comonad must preserve the behaviour of the monad and comonad operations after applying the distributive law [26]); we explain here the relevant additional structure and verify the distributive law properties.
Firstly, we note that these distributive laws are mutually inverse:
Proposition 3.1 (Pull is right inverse to push).
For all n-ary types K α₁ … αₙ which do not contain function types, for all type variables α₁ … αₙ and for all grades r where 1 ⊑ r if #(K α₁ … αₙ) > 1, then:

push_K ∘ pull_K = id
Proposition 3.2 (Pull is left inverse to push).
For all n-ary types K α₁ … αₙ which do not contain function types, for all type variables α₁ … αₙ and for all grades r where 1 ⊑ r if #(K α₁ … αₙ) > 1, then:

pull_K ∘ push_K = id
The appendix [18] gives the proofs, leveraging the equational theory of GrMiniP.
Applying a mathematical perspective, K is also an endofunctor, with its object mapping provided by the type constructor itself and its morphism mapping defined as follows:
Definition 3.1 (K functor).
Given a function f : A ⊸ B (a closed function term), then K f is the morphism mapping of the endofunctor K, defined:
For types we can also automatically derive the morphism mapping of a covariant functor, which we write as fmap_K; its definition is standard (e.g., as applied in Haskell [22]) and given in the appendix [18]. Distributive laws between endofunctors should be natural transformations, which is indeed the case for our derivations:
Proposition 3.3 (Naturality of push).
For all unary type constructors K such that push_K is defined, and given a closed function term f : A ⊸ B, then push_K is natural, i.e.:

K (□_r f) ∘ push_K = push_K ∘ □_r (K f)
Proposition 3.4 (Naturality of pull).
For all unary type constructors K such that pull_K is defined, and given a closed function term f : A ⊸ B, then pull_K is natural, i.e.:

□_r (K f) ∘ pull_K = pull_K ∘ K (□_r f)
The appendix [18] gives the proofs. Note that the naturality results here are for cases of unary type constructors that are covariant functors, written K α with a single parameter α. This can easily generalise to n-ary type constructors.
Definition 3.2 (Graded comonadic operations).
The GrMiniP calculus (and Granule) permits the derivation of graded comonadic operations [23] for the semiring graded necessity □, defined:

ε : □₁ A ⊸ A        δ : □_{r·s} A ⊸ □_r (□_s A)
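As a value-level sketch (with a hypothetical Box carrying its grade as a runtime natural, an encoding of ours), these two operations behave as follows:

```haskell
-- Hypothetical value-level box; the Int is the grade.
data Box a = Box Int a deriving (Show, Eq)

-- Counit: only a box whose grade covers exactly one use yields its value.
extractB :: Box a -> Maybe a
extractB (Box 1 x) = Just x
extractB _         = Nothing

-- Comultiplication: a box graded r*s splits into nested boxes graded
-- r (outer) and s (inner); the caller chooses the factorisation.
duplicateB :: Int -> Int -> Box a -> Maybe (Box (Box a))
duplicateB r s (Box g x)
  | g == r * s = Just (Box r (Box s x))
  | otherwise  = Nothing
```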
The derived distributive laws preserve these graded comonadic operations, i.e., the distributive laws are well-behaved with respect to the graded comonadic structure of □, captured by the following properties:
Proposition 3.5 (Push preserves graded comonads).
For all K such that push_K is defined and K does not contain ⊸ (to avoid issues of contravariance in push), then:
Proposition 3.6 (Pull preserves graded comonads).
For all K such that pull_K is defined, then:
The appendix [18] gives the proofs.
4 Implementation in Granule
The Granule type checker implements the algorithmic derivation of push and pull distributive laws as covered in the previous section. Whilst the syntax of GrMiniP types had unit, sum, and product types as primitives, in Granule these are provided by a more general notion of type constructor which can be extended by user-defined, generalized algebraic data types (GADTs). The procedure outlined in Section 3 is therefore generalised slightly so that it can be applied to any data type: the case for sums is generalised to types with an arbitrary number of data constructors.
Our deriving mechanism is exposed to programmers via explicit (visible) type application (akin to that provided in GHC Haskell [9]) on the reserved names push and pull. Written push @T or pull @T, this signals to the compiler that we wish to derive the corresponding distributive law at the type T. For example, for the List : Type → Type data type from the standard library, we can write the expression push @List, which the type checker recognises as a function of type:

push @List : ∀ {a : Type, s : Semiring, r : s} . {(1 : s) ⊑ r} ⇒ (List a) [r] → List (a [r])
Note this function is not only polymorphic in the grade, but polymorphic in the semiring itself. Granule identifies different graded modalities by their semirings, and thus this operation is polymorphic in the graded modality. When the type checker encounters such a type application, it triggers the derivation procedure of Section 3, which also calculates the type. The result is then stored in the state of the frontend to be passed to the interpreter (or compiler) after type checking. The derived operations are memoized so that they need not be re-calculated if a particular distributive law is required more than once. Otherwise, the implementation largely follows Section 3 without surprises, apart from some additional machinery for specialising the types of data constructors coming from (generalized) algebraic data types.
Examples
Section 1 motivated the crux of this paper with a concrete example, which we can replay here in concrete Granule, using its type application technique for triggering the automatic derivation of the distributive laws. Previously, we defined pushPair by hand which can now be replaced with:
Note that in Granule, (,) is an infix type constructor for products, as well as the term-level pair constructor. We could then replace the previous definition of fst’ from Section 1 with:
The point however in the example is that we need not even define this intermediate combinator, but can instead write the following wherever we need to compute the first projection of myPair : (a, b) [r]:
We already saw that we can then generalise this by applying this first projection inside of the list myPairList : (List (a, b)) [r] directly, using push @List.
In a slightly more elaborate example, we can use the pull combinator for pairs to implement a function that duplicates a pair (given that both elements can be consumed twice):
Note pull here computes the greatest-lower bound of intervals 0..2 and 2..4, which is 2..2, i.e., we can provide a pair of a and b values which can each be used exactly twice, which is what is required for copy.
As another example, interacting with Granule’s indexed types (GADTs), consider a simple programming task of taking the head of a sized-list (vector) and duplicating it into a pair. The head operation is typed head : ∀ {a : Type, n : Nat} . (Vec (n + 1) a) [0..1] → a, which has a graded modal input with grade 0..1, meaning the input vector is used 0 or 1 times: the head element is used once (linearly) for the return, but the tail is discarded.
This head element can then be copied if it has this capability via a graded modality, e.g., a value of type (Vec (n + 1) (a [2]))[0..1] permits:
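For instance, a sketch of duplicating the head element when it carries the [2] capability (the name copyHead is illustrative):

```
copyHead : forall {a : Type, n : Nat} . (Vec (n + 1) (a [2])) [0..1] -> (a, a)
copyHead xs = let [y] = head xs in (y, y)
```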
Here we “unbox” the graded modal value of type a [2] to get a non-linear variable y which we can use precisely twice. However, what if we are in a programming context where we have a value Vec (n + 1) a with no graded modality on the type a? We can employ two idioms here: (i) take a value of type (Vec (n + 1) a) [0..2] and split its modality in two: (Vec (n + 1) a) [2] [0..1] (ii) then use push on the inner graded modality [2] to get (Vec (n + 1) (a [2])) [0..1].
Using push @Vec we can thus write the following to duplicate the head element of a vector:
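A sketch of this composition, assuming a standard-library combinator (here called disject) that splits the grade [0..2] into nested grades [2] and [0..1]; both disject and the exact promotion idiom are illustrative assumptions, and the actual library names may differ:

```
copyHead' : forall {a : Type, n : Nat} . (Vec (n + 1) a) [0..2] -> (a, a)
copyHead' xs = let [v] = disject xs in copyHead [push @Vec v]
```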
which employs combinators from the standard library and the derived distributive law push @Vec, of type push @Vec : forall {a : Type, n : Nat, s : Semiring, r : s} . (Vec n a) [r] -> Vec n (a [r]).
5 Application to Linear Haskell
While Granule has been pushing the state-of-the-art in graded modal types, similar features have been added to more mainstream languages. Haskell has recently added support for linear types via an underlying graded system which enables linear types as a smooth extension to GHC's current type system [4] (released as part of GHC v9.0.1 in February 2021: https://www.haskell.org/ghc/download_ghc_9_0_1.html). Functions may be linear with respect to their input (meaning, the function will consume its argument exactly once if the result is consumed exactly once), but they can also consume their argument r times for some multiplicity r via explicit 'multiplicity polymorphism'. Unlike Granule, Linear Haskell limits this multiplicity to the set containing only one and many (the paper speculates about extending this with zero); full natural numbers or other semirings are not supported.
In Linear Haskell, the function type (a %r -> b) can be read as "a function from a to b which uses a according to r", where r is either 1 (also written as 'One) or ω (written as 'Many). For example, the following defines and types the linear and non-linear functions swap and copy in Linear Haskell:
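These definitions might be sketched as follows (requiring GHC ≥ 9.0 with the LinearTypes and DataKinds extensions):

```haskell
{-# LANGUAGE LinearTypes, DataKinds #-}

-- swap consumes its argument pair exactly once
swap :: (a, b) %1 -> (b, a)
swap (x, y) = (y, x)

-- copy uses its argument twice, hence the multiplicity 'Many
copy :: a %Many -> (a, a)
copy x = (x, x)
```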
Assigning the type a %1 -> (a, a) to copy would result in a type error due to a mismatch in multiplicity.
The approach of Linear Haskell (as formally defined by Bernardy et al. [4]) resembles earlier coeffect systems [12, 24] and more recent work on graded systems which have appeared since [2, 8]. In these approaches there is no underlying linear type system (as there is in Granule) but grading is instead pervasive with function arrows carrying a grade describing the use of their parameter. Nevertheless, recent systems also provide a graded modality as this enables more fine-grained usage information to be ascribed to compound data. For example, without graded modalities it cannot be explained that the first projection on a pair uses the first component once and its second component not at all (instead a single grade would have to be assigned to the entire pair).
We can define a graded modality in Linear Haskell via the following Box data type that is parameterized over the multiplicity r and the value type a, defined as:
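A sketch of this definition uses a GADT with a multiplicity-annotated constructor (requires the LinearTypes, GADTs, and DataKinds extensions; Multiplicity comes from GHC.Types):

```haskell
{-# LANGUAGE LinearTypes, GADTs, DataKinds, KindSignatures #-}
import GHC.Types (Multiplicity)

-- A value (Box x :: Box r a) witnesses that x may be consumed r times
data Box (r :: Multiplicity) a where
  Box :: a %r -> Box r a
```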
A Box type is necessary to make explicit the notion that a value may be consumed a certain number of times, where ordinarily Linear Haskell is concerned only with whether individual functions consume their arguments linearly or not. Thus, distributive laws of the form discussed in this paper then become useful in practice when working with Linear Haskell.
The pushPair and pullPair functions from Section 1 can then be implemented in Linear Haskell with the help of this Box type:
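These might be sketched as follows, using the Box type above; the bodies mirror the Granule versions, pattern matching through the box:

```haskell
-- distribute a boxed pair into a pair of boxes
pushPair :: Box r (a, b) %1 -> (Box r a, Box r b)
pushPair (Box (x, y)) = (Box x, Box y)

-- and the converse direction
pullPair :: (Box r a, Box r b) %1 -> Box r (a, b)
pullPair (Box x, Box y) = Box (x, y)
```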
Interestingly, pushPair could also be implemented as a function of type (a, b) %r -> (Box r a, Box r b), and in general we can formulate push combinators typed (f a) %r -> f (Box r a), i.e., consuming the input r times, returning a box with multiplicity r, but we stick with the above formulation for consistency.
While this paper outlines more sophisticated methods for automatically synthesizing these functions in the context of Granule, in Linear Haskell (which has a simpler notion of grading) distributive laws over unary and binary type constructors can be captured with type classes:
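One possible formulation of these classes follows; the class names Pushable, Pullable, and Pushable2 appear in the text, while the method names and the Pullable2 counterpart are our illustrative choices:

```haskell
class Pushable f where
  push :: Box r (f a) %1 -> f (Box r a)

class Pullable f where
  pull :: f (Box r a) %1 -> Box r (f a)

-- binary versions for two-parameter type constructors such as (,)
class Pushable2 f where
  push2 :: Box r (f a b) %1 -> f (Box r a) (Box r b)

class Pullable2 f where
  pull2 :: f (Box r a) (Box r b) %1 -> Box r (f a b)
```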
Separate classes here are defined for unary and binary cases, as working generically over both is tricky in Haskell. Implementing an instance of Pushable2 for pairs is then:
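A sketch of the pair instance, as described next:

```haskell
instance Pushable2 (,) where
  push2 (Box (x, y)) = (Box x, Box y)
```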
This implementation follows the procedure of Section 3. A pair type is handled by pattern matching on (Box (x, y)) and boxing both fields in the pair (Box x, Box y).
A Haskell programmer may define instances of these type classes for their own data types, but as discussed in Section 1, this is tedious from a software engineering perspective. Template Haskell is a meta-programming system for Haskell that allows compile-time generation of code, and one of the use cases for Template Haskell is generating boilerplate code [27]. The instances of Pushable and Pullable for algebraic data types are relatively straightforward, so we implemented procedures that will automatically generate these type class instances for arbitrary user-defined types (though types with so-called “phantom” parameters are currently not supported).
For example, if a programmer wanted to define a Pushable instance for the data type of a linear List a, they would write:
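Assuming a linear list type, the invocation might look like the following sketch (the splice syntax is standard Template Haskell; the List definition is our rendering of a linear list):

```haskell
{-# LANGUAGE TemplateHaskell, LinearTypes, GADTs #-}

data List a where
  Nil  :: List a
  Cons :: a %1 -> List a %1 -> List a

$(derivePushable ''List)
```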
Here, derivePushable is a Template Haskell procedure (available online: https://github.com/granule-project/deriving-distributed-linear-haskell) that takes the name of a user-defined data type and emits a top-level declaration, following the strategy outlined in Section 3. For the List a data type above, we can walk through the type as derivePushable would. Because List a has two constructors, a case statement is necessary (our code generator will use a 'case lambda'). This case will have branches for each of the Cons and Nil constructors: in the body of the former it must box the first field (of type a) and recursively apply push to the second field (of type List a) after boxing it, and in the body of the latter it must simply return Nil. The full code generated by derivePushable ''List is given below.
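Following that walkthrough, the generated instance might look like this sketch (using LambdaCase for the 'case lambda'):

```haskell
{-# LANGUAGE LambdaCase #-}

instance Pushable List where
  push = \case
    Box Nil         -> Nil                          -- nullary constructor: nothing to box
    Box (Cons x xs) -> Cons (Box x) (push (Box xs)) -- box the head, recurse on the boxed tail
```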
Later, in Section 7, we will discuss other combinators and type classes that are useful in Linear Haskell.
The applicability of our proposed deriving method to Haskell shows that it is useful beyond the context of the Granule project. Despite Haskell not having a formal semantics, we believe the same equational reasoning can be applied to show that the properties in Section 3.1 also hold for these distributive laws in Linear Haskell. As more programming languages adopt type system features similar to Granule and Linear Haskell, we expect that deriving/synthesizing or otherwise generically implementing combinators derived from distributive laws will be increasingly useful.
6 Typed-analysis of Consumption in Pattern Matching
This paper's study of distributive laws provides an opportunity to consider design decisions for the typed analysis of pattern matching, since the operations of Section 3 are derived by pattern matching in concert with grading. We compare here the choices made surrounding the typing of pattern matching in four works: (1) Granule and its core calculus [23]; (2) the graded modal calculus of Abel and Bernardy [2]; (3) the dependent graded system GraD of Choudhury et al. [8]; and (4) Linear Haskell [4].
Granule
Pattern matching against a graded modality (with pattern [p]) in Granule is provided by the [Pbox] rule (Figure 2), which triggers typing the pattern p 'under' a grade r at type A. This was denoted via optional grade information in the pattern typing judgment, which then pushes grading down onto the variables bound within p. Furthermore, it is only under such a pattern that wildcard patterns are allowed ([Pwild]), requiring 0 ⊑ r, i.e., r can approximate 0 (where 0 denotes weakening). None of the other systems considered here have such a facility for weakening via pattern matching.
For a type with more than one constructor, pattern matching its constructors underneath an r-graded box requires 1 ⊑ r. For example, eliminating sums inside an r-graded box requires 1 ⊑ r, as distinguishing the left or right injection constitutes a consumption which reveals information (i.e., pattern matching on the 'tag' of the data constructors). By contrast, a type with only one constructor cannot convey any information by its constructor and so matching on it is not counted as a consumption: eliminating products places no requirements on r. The idea that unary data types do not incur consumption (since no information is conveyed by the constructor) is a refinement here to the original Granule paper of Orchard et al. [23], whose [Pcon] rule imposed its consumption premise for every constructor pattern rather than only for multi-constructor types as here (although the implementation already reflected this idea).
The calculus of Abel and Bernardy
Abel and Bernardy's unified modal system resembles Granule, but with pervasive grading (rather than base linearity), akin to the coeffect calculus [24] and Linear Haskell [4] (discussed in Section 5). Similarly to the situation in Granule, their calculus also places a grading requirement when pattern matching on a sum type, given by their sum-elimination typing rule [2, Fig 1, p.4].
The key aspects of that rule are that variables bound in the case branches are used with the grade of the scrutinee, as denoted by graded assumptions in the contexts typing the branches, and that this grade is constrained to mark a consumption, which is exactly our constraint that 1 ⊑ r (their ordering just runs in the opposite direction to ours). For the elimination of pair and unit types there is no such constraint, matching our idea that constructor arity affects usage, which Granule captures by imposing the consumption constraint only for multi-constructor types. Their typed analysis of patterns is motivated by their parametricity theorems.
GraD
The dependent graded type system GraD of Choudhury et al. also considers the question of how to give the typing of pattern matching on sum types, with a case rule in their system [8, p.8] which closely resembles the sum-elimination rule of Abel and Bernardy just discussed.
The direction of the preordering in GraD is the same as that in Granule but, modulo this ordering and some slight restructuring, their case rule captures the same idea: both branches of the case analysis must use the scrutinee at least once, as indicated by a constraint requiring the grade on the scrutinee to be at least 1 [8, p.8]. Choudhury et al. also provide a heap-based semantics which serves to connect the meaning of grades with a concrete operational model of usage, which then motivates the grading on sum elimination here. In the simply-typed version of GraD, matching on the components of a product requires that each component is consumed linearly.
Linear Haskell
The paper on Linear Haskell by Bernardy et al. [4] has a case expression for eliminating arbitrary data constructors, with grading similar to the rules seen above. Initially, this rule is for the setting of the {1, ω} multiplicity semiring (described in Section 5) and has no requirements on the grading to represent the notion of inspection, consumption, or usage due to matching on (multiple) constructors. This is reflected in the current implementation, where we can define the following sum elimination:
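For instance, the following sum elimination is accepted with no constraint arising from matching on two constructors; this is a sketch and the name elim is illustrative (the continuation arguments use the unrestricted arrow since only one of them is applied):

```haskell
elim :: Either a b %1 -> (a %1 -> c) -> (b %1 -> c) -> c
elim (Left x)  f _ = f x
elim (Right y) _ g = g y
```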
However, later, when considering the generalisation to other semirings, they state that the typing rules are mostly unchanged with the caveat that the grade π of the case guard must exclude 0 [4, §7.2, p.25]. This appears a more coarse-grained restriction than in the other three systems, excluding even the possibility of Granule's weakening wildcard pattern, which requires 0 ⊑ r. Currently, such a pattern must be marked as 'Many in Linear Haskell (i.e., it cannot explain that the first projection on a pair does not use the pair's second component). Furthermore, the condition does not require that π actually represents a consumption, unlike the approaches of the other three systems. The argument put forward by Abel and Bernardy that their restriction marks a consumption for the sake of parametricity is a compelling one, and the concrete model of Choudhury et al. gives further confidence that this restriction captures well an operational model. Thus, it seems there is a chance for cross-fertilisation between the works mentioned here and Linear Haskell's vital work, towards a future where grading is a key tool in the typed functional programmer's toolbox.
7 Deriving Other Useful Structural Combinators
So far we have motivated the use of distributive laws, and demonstrated that they are useful in practice when programming in languages with linear and graded modal types. The same methodology we have been discussing can also be used to derive other useful generic combinators for programming with linear and graded modal types. In this section, we consider two structural combinators, drop and copyShape, in Granule as well as related type classes for dropping, copying, and moving resources in Linear Haskell.
7.1 A Combinator for Weakening (“drop”)
The built-in type constants of Granule can be split into those which permit structural weakening, such as Int, Char, and String, and those which do not, such as Handle (file handles) and Chan (concurrent channels). Those that permit weakening contain non-abstract values that can in theory be systematically inspected in order to consume them. Granule provides a built-in implementation of drop for these primitive types, which is then used by a derivation procedure to derive weakening on compound types: the derived combinator has the form drop @T : T -> () for closed types T, defined by an intermediate derivation on the structure of T.
Note we cannot use this procedure in a polymorphic context (over a type variable a) since type polymorphism ranges over all types, including those which cannot be dropped, like Chan.
7.2 A Combinator for Copying “shape”
The “shape” of values of a parametric data type f a can be determined by a function shape : f a -> f (), usually derived when f is a functor by mapping every element to the unit value (dropping the elements) [19]. This provides a way of capturing the size, shape, and form of a data structure. Often when programming with data structures which must be used linearly, we may wish to reason about properties of the data structure (such as its length or “shape”) but we may not be able to drop the contained values. Instead, we wish to extract the shape without consuming the original data structure itself.
This can be accomplished with a function which copies the data structure exactly, returning this duplicate along with a data structure of the same shape, but with the terminal nodes replaced with values of the unit type (the ‘spine’). For example, consider a pair of integers: (1, 2). Then applying copyShape to this pair would yield (((), ()), (1, 2)). The original input pair is duplicated and returned on the right of the pair, while the left value contains a pair with the same structure as the input, but with values replaced with (). This is useful, as it allows us to use the left value of the resulting pair to reason about the structure of the input (e.g., its depth / size), while preserving the original input. This is particularly useful for deriving size and length combinators for collection-like data structures.
As with “drop”, we can derive such a function automatically, with derived combinators of the form copyShape @T : T -> (T', T), where T' is the type T with its element components replaced by the unit type, defined by an intermediate derivation on the structure of T. The implementation recursively follows the structure of the type, replicating the constructors, reaching the crucial case where a value at polymorphic type is mapped to the unit value () in the third equation of the derivation.
Granule implements both of these derived combinators in a similar way to push/pull, providing copyShape and drop, which can be derived for a type T via type application, e.g., drop @T : T -> () if it can be derived. Otherwise, the type checker produces an error, explaining why drop is not derivable at type T.
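For example, a hypothetical use of the derived weakening combinator on a compound type built from droppable primitives (the name discard is illustrative):

```
discard : (Int, Char) -> ()
discard = drop @(Int, Char)
```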
7.3 Other Combinators in Linear Haskell
In Section 5, we demonstrated that the push and pull combinators derived from distributive laws are useful in Haskell with its linear types extension, and that they can be automatically derived using compile-time meta-programming with Template Haskell.
To the best of our knowledge, nothing comparable to the Pushable and Pullable type classes proposed here has been previously discussed in the literature on Linear Haskell. However, several other type classes have been proposed for inclusion in the standard library to deal with common use cases when programming with linear function types.888See the linear-base library: https://github.com/tweag/linear-base. One of these classes, Consumable, roughly corresponds to the drop combinator above, while the other two, Dupable and Movable, are for when a programmer wants to allow a data type to be duplicated or moved in linear code.
The consume function is a linear function from a value to unit, whereas dup2 is a linear function from a value to a pair of that same value. The move linear function maps a to Ur a, where Ur is the “unrestricted” modality. Thus, move can be used to implement both consume and dup2:
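A sketch of these classes as they appear in linear-base (Data.Unrestricted.Linear), together with move-based definitions of consumption and duplication; the names consumeViaMove and dupViaMove are our illustrative choices:

```haskell
class Consumable a where
  consume :: a %1 -> ()

class Consumable a => Dupable a where
  dup2 :: a %1 -> (a, a)

class Dupable a => Movable a where
  move :: a %1 -> Ur a

-- once moved into Ur, the value is unrestricted,
-- so it can be dropped or duplicated freely:
consumeViaMove :: Movable a => a %1 -> ()
consumeViaMove x = case move x of Ur _ -> ()

dupViaMove :: Movable a => a %1 -> (a, a)
dupViaMove x = case move x of Ur y -> (y, y)
```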
A ‘copy shape’ class may also be a useful addition to Linear Haskell in the future.
8 Discussion and Conclusion
Section 6 considered, in some detail, systems related to Granule and different typed analyses of pattern matching. Furthermore, in applying our approach to both Granule and Linear Haskell we have already provided some detailed comparison between the two. This section considers some wider related work alongside ideas for future work, then concludes the paper.
Generic Programming Methodology
The deriving mechanism for Granule is based on the methodology of generic functional programming [15], where functions may be defined generically for all possible data types in the language; generic functions are defined inductively on the structure of the types. This technique has notably been used before in Haskell, where there has been a strong interest in deriving type class instances automatically. Particularly relevant to this paper is the work on generic deriving [22], which allows Haskell programmers to automatically derive arbitrary class instances using standard datatype-generic programming techniques as described above. In this paper we opted to rely on compile-time metaprogramming using Template Haskell [27] instead, but it is possible that some of the combinators we describe could be implemented using generic deriving as well, which is future work.
Non-graded Distributive Laws
Distributive laws are standard components in abstract mathematics. Distributive laws between categorical structures used for modelling modalities (like monads and comonads) are well explored. For example, Brookes and Geva defined a categorical semantics using monads combined with comonads via a distributive law capturing both intensional and effectful aspects of a program [6]. Power and Watanabe study in detail different ways of combining comonads and monads via distributive laws [26]. Such distributive laws have been applied in the programming languages literature, e.g., for modelling streams of partial elements [30].
Graded Distributive Laws
Gaboardi et al. define families of graded distributive laws for graded monads and comonads [11]. These include the ability to interact the grades, e.g., with operations distributing a graded comonad over a graded monad, where the grades on each modality capture information about the distributive law. In comparison, our distributive laws here are more prosaic since they involve only a graded comonad (semiring graded necessity) distributed over a functor, and vice versa. That said, the scheme of Gaboardi et al. suggests that there might be interesting graded distributive laws between graded modalities and the indexed types, for example a combinator which internally replicates a vector. However, it is less clear how useful such combinators would be in general or how systematic their construction would be. In contrast, the distributive laws explained here appear frequently and have a straightforward uniform calculation.
We noted in Section 3 that neither of our distributive laws can be derived over graded modalities themselves, i.e., we cannot derive push or pull where the structure being distributed over is itself a graded modality. Such an operation would itself be a distributive law between two graded modalities, which may have further semantic and analysis consequences beyond the normal derivations here for regular types. Exploring this is future work, for which the previous work on graded distributive laws can provide a useful scheme for considering the possibilities. Furthermore, Granule has both graded comonads and graded monads, so there is scope for exploring possible graded distributive laws between these in the future following Gaboardi et al. [11].
In work that is somewhat adjacent to this paper, we define program synthesis procedures for graded linear types [17]. This program synthesis approach can synthesize pull distributive laws that are equivalent to the algorithmically derived operations of this paper. Interestingly, the program synthesis approach cannot derive the push laws, as its core theory has a simplified notion of pattern typing that does not capture the full power of Granule's pattern matching as described in this paper.
Conclusions
The slogan of graded types is to imbue types with information reflecting and capturing the underlying program semantics and structure. This provides a mechanism for fine-grained intensional reasoning about programs at the type level, advancing the power of type-based verification. Graded types are a burgeoning technique that is gaining traction in mainstream functional programming and is being explored from multiple different angles in the literature. The work described here addresses the practical aspects of applying these techniques in real-world programming. Our hope is that this aids the development of the next generation of programming languages with rich type systems for high-assurance programming.
Acknowledgments
Thanks to Harley Eades III and Daniel Marshall for discussion and comments on a draft and to the reviewers and participants at TLLA-Linearity 2020. This work was supported by the EPSRC, grant EP/T013516/1 Verifying Resource-like Data Use in Programs via Types. The first author is also supported by an EPSRC DTA.
References
- [2] Andreas Abel & Jean-Philippe Bernardy (2020): A Unified View of Modalities in Type Systems. Proc. ACM Program. Lang. 4(ICFP), 10.1145/3408972.
- [3] Nick Benton, Gavin Bierman, Valeria De Paiva & Martin Hyland (1992): Linear lambda-calculus and categorical models revisited. In: International Workshop on Computer Science Logic, Springer, pp. 61–84, 10.1007/3-540-56992-8_6.
- [4] Jean-Philippe Bernardy, Mathieu Boespflug, Ryan R. Newton, Simon Peyton Jones & Arnaud Spiwack (2017): Linear Haskell: Practical Linearity in a Higher-Order Polymorphic Language. Proc. ACM Program. Lang. 2(POPL), 10.1145/3158093.
- [5] Edwin Brady (2021): Idris 2: Quantitative Type Theory in Practice. In Anders Møller & Manu Sridharan, editors: 35th European Conference on Object-Oriented Programming (ECOOP 2021), Leibniz International Proceedings in Informatics (LIPIcs) 194, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, pp. 9:1–9:26, 10.4230/LIPIcs.ECOOP.2021.9. Available at https://drops.dagstuhl.de/opus/volltexte/2021/14052.
- [6] Stephen Brookes & Kathryn V Stone (1993): Monads and Comonads in Intensional Semantics. Technical Report, Pittsburgh, PA, USA. Available at https://dl.acm.org/doi/10.5555/865105.
- [7] Aloïs Brunel, Marco Gaboardi, Damiano Mazza & Steve Zdancewic (2014): A core quantitative coeffect calculus. In: European Symposium on Programming Languages and Systems, Springer, pp. 351–370, 10.1007/978-3-642-54833-8_19.
- [8] Pritam Choudhury, Harley Eades III, Richard A. Eisenberg & Stephanie Weirich (2021): A Graded Dependent Type System with a Usage-Aware Semantics. Proc. ACM Program. Lang. 5(POPL), 10.1145/3434331.
- [9] Richard A Eisenberg, Stephanie Weirich & Hamidhasan G Ahmed (2016): Visible type application. In: European Symposium on Programming, Springer, pp. 229–254, 10.1007/978-3-662-49498-1_10.
- [10] Marco Gaboardi, Andreas Haeberlen, Justin Hsu, Arjun Narayan & Benjamin C Pierce (2013): Linear dependent types for differential privacy. In: Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pp. 357–370, 10.1145/2429069.2429113.
- [11] Marco Gaboardi, Shin-ya Katsumata, Dominic Orchard, Flavien Breuvart & Tarmo Uustalu (2016): Combining Effects and Coeffects via Grading. In: Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, ICFP 2016, Association for Computing Machinery, New York, NY, USA, p. 476–489, 10.1145/2951913.2951939.
- [12] Dan R. Ghica & Alex I. Smith (2014): Bounded linear types in a resource semiring. In: Programming Languages and Systems, Springer, pp. 331–350, 10.1007/978-3-642-54833-8_18.
- [13] Jean-Yves Girard (1987): Linear logic. Theoretical computer science 50(1), pp. 1–101, 10.1016/0304-3975(87)90045-4.
- [14] Jean-Yves Girard, Andre Scedrov & Philip J Scott (1992): Bounded linear logic: a modular approach to polynomial-time computability. Theoretical computer science 97(1), pp. 1–66, 10.1016/0304-3975(92)90386-T.
- [15] Ralf Hinze (2000): A new approach to generic functional programming. In: Proceedings of the 27th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pp. 119–132, 10.1145/325694.325709.
- [16] Jack Hughes, Daniel Marshall, James Wood & Dominic Orchard (2021): Linear Exponentials as Graded Modal Types. In: 5th International Workshop on Trends in Linear Logic and Applications (TLLA 2021), Rome (virtual), Italy. Available at https://hal-lirmm.ccsd.cnrs.fr/lirmm-03271465.
- [17] Jack Hughes & Dominic Orchard (2020): Resourceful Program Synthesis from Graded Linear Types. In: Logic-Based Program Synthesis and Transformation - 30th International Symposium, LOPSTR 2020, Bologna, Italy, September 7-9, 2020, Proceedings, pp. 151–170, 10.1007/978-3-030-68446-4_8.
- [18] Jack Hughes, Michael Vollmer & Dominic Orchard (2021): Deriving Distributive Laws for Graded Linear Types (Additional Material), 10.5281/zenodo.5575771. Available at https://doi.org/10.5281/zenodo.5575771.
- [19] C Barry Jay & J Robin B Cockett (1994): Shapely types and shape polymorphism. In: European Symposium on Programming, Springer, pp. 302–316, 10.1007/3-540-57880-3_20.
- [20] Shin-ya Katsumata (2014): Parametric effect monads and semantics of effect systems. In Suresh Jagannathan & Peter Sewell, editors: The 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’14, San Diego, CA, USA, January 20-21, 2014, ACM, pp. 633–646, 10.1145/2535838.2535846.
- [21] Shin-ya Katsumata (2018): A Double Category Theoretic Analysis of Graded Linear Exponential Comonads. In Christel Baier & Ugo Dal Lago, editors: Foundations of Software Science and Computation Structures - 21st International Conference, FOSSACS 2018, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2018, Thessaloniki, Greece, April 14-20, 2018, Proceedings, Lecture Notes in Computer Science 10803, Springer, pp. 110–127, 10.1007/978-3-319-89366-2_6.
- [22] José Pedro Magalhães, Atze Dijkstra, Johan Jeuring & Andres Löh (2010): A Generic Deriving Mechanism for Haskell. SIGPLAN Not. 45(11), p. 37–48, 10.1145/2088456.1863529.
- [23] Dominic Orchard, Vilem-Benjamin Liepelt & Harley Eades III (2019): Quantitative program reasoning with graded modal types. PACMPL 3(ICFP), pp. 110:1–110:30, 10.1145/3341714.
- [24] Tomas Petricek, Dominic Orchard & Alan Mycroft (2014): Coeffects: a calculus of context-dependent computation. In: Proceedings of the 19th ACM SIGPLAN international conference on Functional programming, ACM, pp. 123–135, 10.1145/2692915.2628160.
- [25] Tomas Petricek, Dominic A. Orchard & Alan Mycroft (2013): Coeffects: Unified Static Analysis of Context-Dependence. In Fedor V. Fomin, Rusins Freivalds, Marta Z. Kwiatkowska & David Peleg, editors: Automata, Languages, and Programming - 40th International Colloquium, ICALP 2013, Riga, Latvia, July 8-12, 2013, Proceedings, Part II, Lecture Notes in Computer Science 7966, Springer, pp. 385–397, 10.1007/978-3-642-39212-2_35.
- [26] John Power & Hiroshi Watanabe (2002): Combining a monad and a comonad. Theoretical Computer Science 280(1-2), pp. 137–162, 10.1016/S0304-3975(01)00024-X.
- [27] Tim Sheard & Simon Peyton Jones (2002): Template Meta-Programming for Haskell. In: Proceedings of the 2002 ACM SIGPLAN Workshop on Haskell, Haskell ’02, Association for Computing Machinery, New York, NY, USA, p. 1–16, 10.1145/581690.581691.
- [28] Ross Street (1972): The formal theory of monads. Journal of Pure and Applied Algebra 2(2), pp. 149–168, 10.1016/0022-4049(72)90019-9.
- [29] K. Terui (2001): Light Affine Lambda Calculus and Polytime Strong Normalization. In: LICS ’01, IEEE Computer Society, pp. 209–220, 10.1109/LICS.2001.932498.
- [30] Tarmo Uustalu & Varmo Vene (November 2006): The Essence of Dataflow Programming. Lecture Notes in Computer Science 4164, pp. 135–167, 10.1007/11894100_5.