Pure vs. impure iterators in Go

# Pure vs. impure iterators in Go

## TL;DR ¶

– Go has now standardised iterators.
– Iterators are powerful.
– Being functions under the hood, iterators can be closures.
– The classification of iterators suggested by the documentation is ambiguous.
– Dividing iterators into two categories, “pure” and “impure”, seems to me preferrable.
– Whether iterators should be designed as “pure” whenever possible is unclear.

## The advent of iterators in Go ¶

The iterator pattern was popularised by the classic “Gang of Four” book as

> [providing] a way to access the elements of an aggregate object sequentially without exposing its underlying representation.

Until recently, the data structures over which you could iterate via a
`for`- `range` loop were limited to arrays (either directly or
through a pointer), slices, strings, maps, channels, and integers. However, in
the wake of Go 1.18’s support for parametric polymorphism
(a.k.a. “generics”), Go 1.23 standardised the way of defining custom
iterators, saw the addition of the iter package in the standard
library, and welcomed a couple of _iterator factories_ (i.e. functions or
methods that return an iterator) in the slices and maps
packages. And Go 1.24 marked the inception in the standard library
of even more iterator factories, such as `strings.SplitSeq`:

> `// SplitSeq returns an iterator over all substrings of s separated by sep. // The iterator yields the same strings that would be returned by Split(s, sep), // but without constructing the slice. // It returns a single-use iterator. func SplitSeq(s, sep string) iter.Seq[string]`

If you’re not familiar with the syntax and semantics of iterators in Go 1.23+, I recommend you peruse Ian Lance Taylor’s introductory post published on the Go blog.

Once you get past the first impression of bewilderment (“Why some many
`func` s?!”), it’s pretty smooth sailing. Moreover, callers of iterators
typically are isolated from whatever complexity is involved in their
implementation.

As a first example, consider the program below (playground), which
features a factory function named `fib0` that produces an iterator over the
sequence of Fibonacci numbers:

As you can expect, the program prints the Fibonacci numbers less than 100 in increasing order:

“`
0 1 1 2 3 5 8 13 21 34 55 89
“`

## The power of iterators ¶

The standardisation of iterators is a welcome addition to the Go language. Iterators indeed provide tangible benefits:

– They promote **flexibility** and **separation of concerns**: callers can remain oblivious as to how the sequence is produced and can instead focus on what to do with the data; an iterator that abstracts the paginated consumption of GitHub’s API is a good example.
– They encourage **encapsulation** insofar as they expose data as sequences that cannot be mutated like slices and maps can.
– They have the potential to boost **performance**: instead of materialising a data structure containing the entire body of data, they dole out elements one by one only as required by the caller, thereby promising, in many (though not all) situations, a lower latency and reduced heap allocations; they’re also more performant than iterator implementations relying on channels.
– They allow for **infinite sequences**(e.g. the sequence of prime numbers), an affordance that finite data structures like slices and maps never could provide.

Because iterators are so powerful, they’re likely to mushroom in libraries even beyond Go’s standard library. Therefore, to forestall any confusion in the discourse about iterators, the terminology surrounding them should be as precise as possible.

## Official guidance on iterator classification ¶

The phrase “single-use iterator” may have caught your eye in
`strings.SplitSeq`’s documentation.
It is explained in a section of the iter package:

> Most iterators provide the ability to walk an entire sequence: when called, the iterator does any setup necessary to start the sequence, then calls yield on successive elements of the sequence, and then cleans up before returning. Calling the iterator again walks the sequence again.
>
> Some iterators break that convention, providing the ability to walk a sequence only once. These “single-use iterators” typically report values from a data stream that cannot be rewound to start over. Calling the iterator again after stopping early may continue the stream, but calling it again after the sequence is finished will yield no values at all. Doc comments for functions or methods that return single-use iterators should document this fact:
>
> `// Lines returns an iterator over lines read from r. // It returns a single-use iterator. func (r *Reader) Lines() iter.Seq[string]`

This passage of the documentation seemingly divides iterators into two categories. I’ll attempt to elucidate them through a couple of examples.

### An example of a “pure” iterator ¶

The result of my `fib0` function clearly exemplifies the first category:

If we assign `fib0`’s result to a variable and then range over that iterator
multiple times, we indeed walk the sequence of Fibonacci
numbers _from the beginning_ each time:

“`
package main import ( “fmt” “iter” ) func fib0() iter.Seq[int] { return func(yield func(int) bool) { for a, b := 0, 1; yield(a); a, b = b, a+b { // deliberately empty } } } func main() { seq := fib0() for n := range seq { if n > 10 { break } fmt.Printf(“%d “, n) } fmt.Println() for n := range seq { if n > 100 { break } fmt.Printf(“%d “, n) } fmt.Println() for n := range seq { if n > 1000 { break } fmt.Printf(“%d “, n) } }
“`

Output:

“`
0 1 1 2 3 5 8 0 1 1 2 3 5 8 13 21 34 55 89 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
“`

In my interpretation of the documentation, the first category of iterators
corresponds to _externally pure functions_, i.e. iterators that may
internally rely on mutation but display no externally observable side effects.
Therefore, “pure” seems to me an apt qualifier for such iterators; “stateless”
comes a close second.

## Example of a “single-use” iterator ¶

What about the second category?

> Some iterators break that convention, providing the ability to walk a sequence only once. These “single-use iterators” typically report values from a data stream that cannot be rewound to start over. Calling the iterator again after stopping early may continue the stream, but calling it again after the sequence is finished will yield no values at all.

Consider the example below:

“`
func fib1() iter.Seq[int] { a, b := 0, 1 return func(yield func(int) bool) { for ; yield(a); a, b = b, a+b { // deliberately empty body } } }
“`

At a glance, iterator factory function `fib1` seems almost identical to `fib0`,
and you would be forgiven to dismiss their difference in implementation as
unimportant; in fact, you’d be in very good company if you
did. However, the difference between `fib1` and `fib0`, far from being merely
cosmetic, actually changes the semantics of their results. To understand why,
you need some familiarity with _closures_.

If necessary, refer to the appendix to this post for a refresher on closures.

Recall that, in Go, iterators are functions; as such, iterators can be
closures! In `fib0`, variables `a` and `b` are declared within the resulting
iterator. By contrast, in `fib1`, variables `a` and `b` are _free_
_variables_ of the resulting iterator, which also happens to mutate
those variables. Therefore, if we repeatedly range over `fib1`’s result
(playground) as we did over `fib0`’s result earlier, we observe a very different
behaviour:

“`
package main import ( “fmt” “iter” ) func fib1() iter.Seq[int] { a, b := 0, 1 return func(yield func(int) bool) { for ; yield(a); a, b = b, a+b { // deliberately empty } } } func main() { seq := fib1() for n := range seq { if n > 10 { break } fmt.Printf(“%d “, n) } fmt.Println() for n := range seq { if n > 100 { break } fmt.Printf(“%d “, n) } fmt.Println() for n := range seq { if n > 1000 { break } fmt.Printf(“%d “, n) } }
“`

Output:

“`
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
“`

In plain terms, the iterator “remembers” where in the sequence of Fibonacci numbers the caller last stopped and, when the caller resumes ranging over it, the iterator picks up right where it left off; because the sequence produced by this iterator cannot be rewound but can be resumed, the iterator could be perhaps be described as “single-use” but “resumable”.

## A problematic classification ¶

The classification of iterators suggested in the documentation is not ideal, in my opinion.

First, its lack of a term for iterators that I described earlier as “pure” is puzzling. Among all iterators, “pure” iterators arguably distinguish themselves as the easiest ones to reason about; as such, don’t they deserve their own qualifier? A parallel with dynamical systems comes to mind: linear dynamical systems are singled out from nonlinear ones precisely because they’re regarded as simpler and much easier to apprehend.

Second, I find the description of the second category confusingly imprecise and
ambiguous. In particular, whether the term “single-use” is intended to
encompass only some or _all_ “impure” iterators is unclear to me… and to
other Gophers too, if the results of an informal poll that I recently
ran on Gophers Slack are to be believed. If the “single-use”
qualifier is intended for only a subcategory of “impure” iterators, I’m not
sure why this subcategory should deserve a particular focus in the
documentation. And if the “single-use” qualifier is intended for all “impure”
iterators, it strikes me as a misnomer; after all, many forms of “impure”
iterators are possible, not all of which could reasonably be described as
“single-use”, as we shall see.

### A wondrous zoo of “impure” iterators ¶

In the program below, the iterator resulting from `fib2` (playground)
could reasonably be described as “usable twice” and “non-resumable”
(or, perhaps, _restarting_):

“`
package main import ( “fmt” “iter” ) func fib2() iter.Seq[int] { var n int return func(yield func(n int) bool) { if n > 1 { return } for a, b := 0, 1; yield(a); a, b = b, a+b { // deliberately empty body } n++ } } func main() { seq := fib2() for n := range seq { if n > 10 { break } fmt.Printf(“%d “, n) } fmt.Println() for n := range seq { if n > 100 { break } fmt.Printf(“%d “, n) } fmt.Println() for n := range seq { if n > 1000 { break } fmt.Printf(“%d “, n) } }
“`

Output:

“`
0 1 1 2 3 5 8 0 1 1 2 3 5 8 13 21 34 55 89
“`

And, in the program below, the iterator resulting from `fib3`
(playground) could reasonably be described as “usable twice” and
“resumable”:

“`
package main import ( “fmt” “iter” ) func fib3() iter.Seq[int] { var n int a, b := 0, 1 return func(yield func(n int) bool) { if n > 1 { return } for ; yield(a); a, b = b, a+b { // deliberately empty body } n++ } } func main() { seq := fib3() for n := range seq { if n > 10 { break } fmt.Printf(“%d “, n) } fmt.Println() for n := range seq { if n > 100 { break } fmt.Printf(“%d “, n) } fmt.Println() for n := range seq { if n > 1000 { break } fmt.Printf(“%d “, n) } }
“`

Output:

“`
0 1 1 2 3 5 8 13 21 34 55 89
“`

I’m sure you could think of more variations around this theme. As soon as function purity goes out the window, the possibilities are endless.

## Should iterators be “pure” whenever possible? ¶

Another question arises: if “pure” iterators are easier to reason about than “impure” ones are, shouldn’t iterators be designed as “pure” whenever possible?

### Performance considerations ¶

Perhaps they should, at least when we emphasise performance as a design
criterion. “Pure” iterators indeed tend to incur fewer heap allocations than
their “impure” counterparts do. Consider `strings.Lines` as a
case study; shown below is its source code in Go
1.24.3:

“`
// Lines returns an iterator over the newline-terminated lines in the string s. // The lines yielded by the iterator include their terminating newlines. // If s is empty, the iterator yields no lines at all. // If s does not end in a newline, the final yielded line will not end in a newline. // It returns a single-use iterator. func Lines(s string) iter.Seq[string] { return func(yield func(string) bool) { for len(s) > 0 { var line string if i := IndexByte(s, ‘n’); i >= 0 { line, s = s[:i+1], s[i+1:] } else { line, s = s, “” } if !yield(line) { return } } return } }
“`

The resulting iterator is “impure” because it mutates `s`, which happens to be
its only free variable. As a consequence of escape analysis, `s` to escape to
heap:

“`
$ go build -gcflags=’-m=2′ ./strings -snip- strings/iter.go:18:12: s escapes to heap: -snip-
“`

But `strings.Lines` could instead have been designed to produce a “pure”
iterator:

“`
func Lines(s string) iter.Seq[string] { return func(yield func(string) bool) { + s := s // local copy! for len(s) > 0 { var line string if i := strings.IndexByte(s, ‘n’); i >= 0 { line, s = s[:i+1], s[i+1:] } else { line, s = s, “” } if !yield(line) { return } } return } }
“`

In this alternative version of `strings.Lines`, its `s` parameter remains a
free variable of the resulting iterator, but the iterator does not mutate that
variable; instead, the iterator exclusively operates on a local copy of the
source string. With this simple change, escape analysis no longer concludes that variable `s`
needs escape to heap, and one fewer memory allocation is required.
See this related comment by Alan Donovan on GitHub.

### Consistency with related iterators ¶

However, performance is only one possible design criterion: as Ian Lance
Taylor shrewdly pointed out to me, consistency in behaviour
with closely linked iterator factories is another. For example, package
`bytes` provides a function named `Lines` that’s
analogous to `strings.Lines`; shown below is its source code in Go
1.24.3:

“`
// Lines returns an iterator over the newline-terminated lines in the byte slice s. // The lines yielded by the iterator include their terminating newlines. // If s is empty, the iterator yields no lines at all. // If s does not end in a newline, the final yielded line will not end in a newline. // It returns a single-use iterator. func Lines(s []byte) iter.Seq[[]byte] { return func(yield func([]byte) bool) { for len(s) > 0 { var line []byte if i := IndexByte(s, ‘n’); i >= 0 { line, s = s[:i+1], s[i+1:] } else { line, s = s, nil } if !yield(line[:len(line):len(line)]) { return } } return } }
“`

As much as I’d like to redesign `bytes.Lines` so as to make
its product “pure”, I can’t think of an efficient way to do so. Contrary to strings, slices are mutable; moreover,
multiples slices can reference the same underlying array. Therefore, even if
the resulting iterator were modified to operate on a local copy of
the source slice, it still wouldn’t be “pure”, as demonstrated by the following
program (playground):

“`
package main import ( “bytes” “fmt” “iter” “os” ) func main() { src := []byte(“foonbarnbazn”) seq := Lines(src) for s := range seq { os.Stdout.Write(s) } for s := range seq { // pass that mutates the source slice if len(s) == 0 { continue } s[0] = ‘a’ } fmt.Println() for s := range seq { os.Stdout.Write(s) } } func Lines(s []byte) iter.Seq[[]byte] { return func(yield func([]byte) bool) { s := s // local copy for len(s) > 0 { var line []byte if i := bytes.IndexByte(s, ‘n’); i >= 0 { line, s = s[:i+1], s[i+1:] } else { line, s = s, nil } if !yield(line[:len(line):len(line)]) { return } } return } }
“`

Output:

“`
foo bar baz aoo aar aaz
“`

For `bytes.Lines` to produce a pure iterator, a “deep” copy of the source slice
(i.e. a slice that would reference a clone of the source slice’s backing array)
would be required (playground); and going down that road
seems to me like it would defeat the purpose of iterators.

If `bytes.Lines` can’t easily be designed to produce a “pure” iterator,
`strings.Lines` should arguably still return an “impure” iterator as well. In
general, should iterator “purity” be pursued at all costs, or should
consistency with related iterators be a factor? The jury is still out.

## Conclusion ¶

As Austin Clements recently wrote:

> Iterators are still young in Go […]

Conventions around iterators have not yet been firmly established, let alone percolated through the Go community; there is still room for experimentation and time to iron the terminology kinks out. The sooner, the better, though, if you ask me.

## Acknowledgments ¶

Thanks to Valentin Deleplace for spurring me to write this post and for reviewing an early draft of it.

## Appendix: refresher on closures ¶

A closure is a function that _captures_ variables declared in an
outer lexical scope. Such variables are called _free variables_ of the function
in question. In the following example, anonymous function `f` reads a variable
named `truth` variable that is declared in the `main` function:

“`
package main import “fmt” func main() { truth := true f := func() bool { return truth } fmt.Println(f()) }
“`

Output:

“`
true
“`

So far, nothing to write home about. However, the power of closures lies in
their ability to _update_ their free variables! Such closures
essentially are _stateful_ functions: functions that maintain a state
(composed of their free variables) from one invocation to the next.

In the following example (playground), factory function
`falseAfterN` returns a closure that captures a local variable named `truth`,
updates that variable during each invocation, and varies its boolean results on
the basis of the free variable’s value:

“`
package main import “fmt” func falseAfterN(n uint) func() bool { var count uint return func() bool { if count

Leave a Reply Cancel reply