Saturday, January 19, 2013

Flexible Implicits (from Agda) for Scala

[This is my first blog post over my new favorite programming language, Scala. The target audience for this message are advanced Scala users and programming language researchers with knowledge of the language.]

Scala's implicit syntax has a few limitations. It's not just ugly - it reduces both elegance and expressiveness, especially in combination with path-dependent types. In this blog post, I will propose a general solution, taken from a  language feature of Agda, a fully dependently-typed language. I even mention (very briefly) how my proposal allows extending lightweight modular staging (LMS) (Rompf and Odersky, 2010) to describe computations on different locations.

Problem #1: Bakery of doom

Let us consider this Scala code (a simplification of code which you end up writing when using macros - the real Context type is scala.reflect.macros.Context):

trait Context {
  type Tree
}
def treeTransf1(c: Context)(tree: c.Tree): c.Tree
def treeTransf2(c: Context)(tree: c.Tree): c.Tree
def treeTransf(c: Context)(tree: c.Tree): c.Tree =
  treeTransf2(c)(treeTransf1(c)(tree))

In this code, what we really want to pass around is tree, but since it has a path-dependent type, namely c.Tree , we need to also pass around c: Context. When this question was raised among scala hackers, more than one proposed allowing this syntax:

def treeTransf1(tree: c.Tree)(implicit c: Context): c.Tree

Adriaan Moors then explained that one could think of the above as:

def treeTransf1[val c: Context](tree: c.Tree): c.Tree

because implicits are similar to type parameters, but did not propose actually using such a syntax. I'm proposing that syntax instead, or rather a variant:

def treeTransf1[[c: Context]](tree: c.Tree): c.Tree

Continuing the above example, I'd write:

def treeTransf2[[c: Context]](tree: c.Tree): c.Tree
def treeTransf[[c: Context]](tree: c.Tree): c.Tree =
  treeTransf2(treeTransf1(tree))

I will explain why it solves other problems as well. I am aware that the bakery of doom can be worked around through other solutions. However, generalizing implicits is more general and elegant.

What's the new syntax?

  • It introduces new-style implicit parameter list, signaled by double brackets [[paramName: ParamType]].
  • Such lists can be at any position among other parameter lists.
  • Analogously, I propose to allow lists of type parameters at any position. This way, type parameter bounds will be able to refer to parameters in previous parameter lists.
  • To specify explicitly an implicit argument for new-style implicit parameters, you'll need to bracket the argument with double brackets [[]] again.
  • If you are worrying about source compatibility, you needn't: this is an alternative syntax, which can coexist with the existing one.
  • In all the examples given, such implicit parameters need to be inferred based on later parameters, using unification like type parameters. Unlike in Agda, only implicit values can be inferred in the current proposal — an alternative would be to allow values which are not marked as implicits if unification finds them as a unique solution.

Why is it simpler?

  • It removes an arbitrary limitation, that is, having a single list of implicit parameters as the last one. Already today, I can workaround this arbitrary limitation by writing:
    def bippy(arg1: Foo)(implicit arg2: Bar): Bar2 => Bippy
    but such a function cannot be used conveniently; moreover, since it returns a function, it pays the overhead of partial function application. A call to the analogous:
    def bippy(arg1: Foo)[[arg2: Bar]](arg3: Bar2): Bippy
    would be erased to a standard method call.
  • In particular, this is simpler than the solution which was proposed:
    def treeTransf1(tree: c.Tree)(implicit c: Context): c.Tree
    which would be hard to support in Scalac, since parameters are used before being passed, and cannot be supported using the dependent function type constructor Pi, standard in dependently-typed lambda calculi.

Problem #2: for what parameter is this argument?

Sometimes, for syntactic reasons, you need to specify implicits arguments explicitly - that's ugly and unnecessary.

def buildMap[K, V](data: Seq[(K, V)])(implicit ordK: Ordering[K]): Map[K, V]

Let's try to call this function and extract immediately an entry from this map:

val m = buildMap(List(("Italy", "Euro"), ("US", "Dollar")))
m("Italy")
But suppose we never use m again.  We'd like to rewrite the above code as:

buildMap(List(("Italy", "Euro"), ("US", "Dollar")))("Italy")

but we can't - Scalac thinks "Italy" is an argument for the ordK parameter, and complains because it has the wrong type. The reason is that at the call site implicit and explicit applications cannot be distinguished. With the extension I propose, we could write instead:

def buildMap[K, V](data: Seq[(K, V)])[[ordK: Ordering[K]]]: Map[K, V]

and call this as

buildMap(List(("Italy", "Euro"), ("US", "Dollar")))("Italy")

We can still specify ordK explicitly, but we need to use a different syntax for these implicit arguments.

buildMap(List(("Italy", "Euro"), ("US", "Dollar")))[[implicitly[Ordering[String]]]]("Italy")

Here I just passed implicitly[Ordering[String]] for simplicity, but other values are possible.

Problem #3: tagging types to separate different worlds/regions/locations

Problem #1 is described as a consequence of using the cake pattern. However, the same problem can arise in different scenarios.
Suppose you want to divide values of some type in different incompatible worlds and enforce statically that they cannot be mixed. While the need for this may not arise as often, one example is in the reflection/macro API, where code being compiled (for instance in macros) should not be mixed with code being run (accessible through reflection) — see the documentation on reflection universes. For instance, in the above code the type Tree is parameterized on a value of type Context.  Trees from different Contexts should not be mixed together. The same applies to different Exprs.

Consider a method which, given two expressions, combines them together, producing an expression representing their sum:

def combine(c: Context)(tree1: c.Expr[Int], tree2: c.Expr[Int]): c.Expr[Int]

(I'll ignore the possibility of using splicing, because splicing is specific to the reflection library, while the problem at hand is more general).
The need to pass the context makes it less convenient to use. This would be more annoying if it were an infix operator. With our syntax, the problem disappears:

def combine[[c: Context]](tree1: c.Expr[Int], tree2: c.Expr[Int]): c.Expr[Int]

Now we can even make this a binary operator:

implicit class RichExpr[[c: Context]](tree1: c.Expr[Int]) {
  def +(tree2: c.Expr[Int]): c.Expr[Int]
}


Another example is handling of multiple access spaces. In various scenarios, you might want to distinguish between local memory and remote memory, which can be allocated in different locations. Converting between different locations requires copying the data and is thus expensive; hence such conversions should not happen implicitly, but should be inserted explicitly by the programmer, who should take care of having as few conversions as possible.

trait Location {
  type Rep[+T] //as in Lightweight modular staging.
}


trait Matrix[T: Numeric] {
}

implicit class RichMatrix[[l: Location]](m1: l.Rep[Matrix[Int]]) {
  def *(m2: l.Rep[Matrix[Int]]): l.Rep[Matrix[Int]] =
    ??? //omitted
}

Using such technique, one can embed different type systems where a region separation is enforced — for instance phase types as described by Cohen et al. (2012), or (a generalization of) client/server locations, as used by Ibrahim et al. (2009) and Oleg Kiselyov (2010). In the latter case, using locations is necessary to track multiple hosts.

Conclusion

I hope my examples suggest that this proposal, while simple and (in hindsight) pretty obvious, has several advantages over the current situation. What I find amazing is that while this is (or maybe only seems?) a matter of syntax, it affects the expressivity of the language, while in programming language theory syntax is usually considered of little importance.
Feedback is more than welcome.

Bibliography:

Tiark Rompf and Martin Odersky. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs, GPCE 2010.

Michael Cohen, Haitao Steve Zhu, Senem Ezgi Emgin, Yu David Liu, Energy Types, OOPSLA 2012.

Ali Ibrahim, Yang Jiao, Eli Tilevich, and William R. Cook. Remote batch invocation for compositional object services, ECOOP 2009.

Oleg Kiselyov, Semi-implicit batched remote code execution as staging. http://okmij.org/ftp/meta-future/, 2010.

No comments:

Post a Comment