Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
209 changes: 209 additions & 0 deletions _posts/2013-12-11-elixir-s-new-continuable-enumerators.markdown
Original file line number Diff line number Diff line change
@@ -0,0 +1,209 @@
---
layout: post
title: Elixir's new continuable enumerators
author: Peter Minten
category: "What's New in Elixir"
excerpt: In 0.12.0 Elixir's enumerators have gained the ability to suspend value
production and to terminate early.
---

As you may have heard in the upcoming 0.12.0 release Elixir's enumerators gained
some new features. In this blog post I'll explain what's new, what it enables
and how it works.

For those of you who use the development version of Elixir these changes are
already available. For the exact differences in code you can look at the
[relevant pull request](https://github.com/elixir-lang/elixir/pull/1922).

## A recap of enumerators, and some terminology

The basic idea of enumerators is that you traverse some data structure or
resource (lines from a file) by putting the thing that is traversed in control.
That is if you're reading from a file you have a loop that reads lines from a
file and for each line calls a function. Just calling a function isn't all that
useful for most tasks as there'd be no way to remember previous lines (ugly
hacks aside), so some accumulator value is passed to the function and a new
accumulator is returned by it.

For example here's how you can count the total length of strings in a list.

```elixir
Enumerable.reduce(l, 0, fn x, acc -> String.length(x) + acc end)
```

Often the actual call to `Enumerable.reduce/3` is hidden inside another
function. Say that we want to define a `sum` function. The usual way is to
write it like this:

```elixir
def sum(coll) do
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You name it coll here, calls it enumerable later. Maybe pick one of them? Maybe enum?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Fixed.

Enumerable.reduce(coll, 0, fn x, acc -> x + acc end)
end
```

This could get called as `Enum.map(1..10, &(&1 * &1)) |> sum()` to get the sum of
squares. Desugaring this means `sum(Enum.map(1..10, &(&1 * &1)))`.

The general pattern is this:

```elixir
def outer_function(coll, ...) do
...
Enumerable.reduce(coll, initial_consumer_acc, consumer)
...
end

something_that_returns_an_enumerable(...) |> outer_function(...)
```

You'll notice the slightly uncommon terminology of "outer function" and
"consumer" (normally called an "iteratee"). That's intentional, naming an
iteratee a consumer better reflects that it consumes values.

Along the same lines I call the reduce function for a specific enumerable a
producer, it produces values which are given to a consumer.

The outer function is the function to which the enumerable is passed.
Syntactically it looks like this is the consumer, but it's really a function
that combines the producer and the consumer. For simple consumers (say `fn x,
acc -> length(x) + acc end`) the consumer will often be written directly in the
source text of the outer function, but let's try to keep those concepts
distinguished.

## Two issues with classic Elixir enumerators

Enumerators are great, but they have their limitations. One issue is that it's
not possible to define a function that only returns at most 3 elements without
traversing all elements or using ugly tricks such as `throw` (with a
`try...catch` construct in the outer function). The `throw` trick is used in
`Enum` and `Stream` to implement functions such as `Enum.take/2` and
`Stream.take_while/2`. It works, but it's not what I'd call stylish.

A bigger problem, that doesn't have a workaround, is that there's no way to
interleave two enumerables. That is, it's not possible to define a function that
for two enumerables `A` and `B` returns a list `[A1, B1, A2, B2, A3, ...]`
(where `A1` is the first element of A) without first traversing both lists and
then interleaving the collected values. Interleaving is important because it's
the basis of a zip function. Without interleaving you cannot implement
`Stream.zip/2`.

The underlying problem, in both cases, is that the producer is fully in control.
The producer simply pushes out as many elements to the consumer as it wants and
then says "I'm done". There's no way aside from `throw`/`raise` for a consumer
to tell a producer "stop producing". There is definitely no way to tell a
producer "stop for now but be prepared to continue where you left off later".

## Power to the consumer!

At CodeMeshIO José Valim and Jessica Kerr sat down and discussed this problem.
They came up with a solution inspired by a [Monad.Reader
article](http://themonadreader.files.wordpress.com/2010/05/issue16.pdf) (third
article). It's an elegant extension of the old system, based on a simple idea.
Instead of returning only an accumulator at every step (for every produced
value) the consumer returns a combination of an accumulator and an instruction
to the producer. Three instructions are available:

* `:cont` - Keep producing.
* `:halt` - Stop producing.
* `:suspend` - Temporarily stop producing.

A consumer that always returns `:cont` makes the producer behave exactly the
same as in the old system. A consumer may return `:halt` to have the producer
terminate earlier than it normally would.

The real magic is in `:suspend` though. It tells a producer to return the
accumulator and a continuation function.

```elixir
{ :suspended, n_, cont } = Enumerable.reduce(1..5, { :cont, 0 }, fn x, n ->
if x == 3 do
{ :suspend, n }
else
{ :cont, n + x }
end
end)
```

After running this code `n_` will be `3` (1 + 2) and `cont` will be a
function. We'll get back to `cont` in a minute but first take a look at some of
the new elements here. The initial accumulator has an instruction as well, so
you could suspend or halt a producer immediately, if you really want to. The
value passed to the consumer (`n`) does not contain the instruction. The return
value of the producer also has a symbol in it. Like with the instructions of
consumers there are three possible values:

* `:done` - Completed normally.
* `:halted` - Consumer returned a `:halt` instruction.
* `:suspended` - Consumer return a `:suspend` instruction.

Together with the other values returned the possible return values from a
producer are `{ :done, acc } | { :halted, acc } | { :suspended, acc,
continuation }`.

Back to the continuation. A continuation is a function that given an accumulator
returns a new producer result. In other words it's a way to swap out the
accumulator but keep the same producer in the same state.

## Implementing `interleave`

Using the power of suspension it is now possible to create an interleave
function.

```elixir
def interleave(a, b) do
step = fn x, acc -> [x|acc] end
af = &Enumerable.reduce(a, &1, step)
bf = &Enumerable.reduce(b, &1, step)
do_interleave(af, bf, []) |> :lists.reverse()
end

defp do_interleave(a, b, acc) do
case a.({ :cont, acc }) do
{ :suspended, acc, a } ->
case b.({ :cont, acc }) do
{ :suspended, acc, b } ->
do_interleave(a, b, acc)
{ :done, acc } ->
# Get remainder of a's entries
{ :done, acc } = a.({ :cont, acc }, fn x, acc -> [x|acc] end)
acc
end
{ :done, acc } ->
{ :done, acc } = b.({ :cont, acc }, fn x, acc -> [x|acc] end)
acc
end
end
```

Lets go through this step by step. The main `interleave` function first
partially applies `Enumerable.reduce/3` to get function values that work just
like the continuations. This makes things easier for `do_interleave`.

The `do_interleave` function first calls `a` (`af` from `interleave`) with the
`step` function so that the available element of `a` gets added to the
accumulator and `a` immediately suspends afterwards. Then the same is done for
`b`. If either producer is done all the remaining elements of the other get
added to the accumulator list.

Note that `acc` is sometimes used to mean a tuple like `{ :cont, x }` and
sometimes the accumulator value proper. It's a bit confusing, yes.

This example shows that through clever combination of an outer function
(`do_interleave`) and an inner function `step` two producers can be interleaved.

## `Enum.reduce` and `Enumerable.reduce` are now slightly different

In the old system `Enum.reduce/3` simply called `Enumerable.reduce/3` with the
same arguments. In order to keep the `Enum.reduce/3` function simple to use this
is no longer the case in the new system. `Enum.reduce/3` now calls
`Enumerable.reduce/3`
with a very simple wrapper around the passed function: `fn x, acc -> { :cont,
f.(x, acc) } end`. Thus `Enum.reduce/3` works just as it has always done, it
calls the supplied function for every produced element without the possibility
of halting or suspending the producer.

## Conclusion

The new system of enumerators certainly makes things a bit more complicated but
also adds power. I suspect many interesting and "interesting" functions can be
built on top of it.