Challenge: make this Go function inlinable and free of bounds checks

# Challenge: make this Go function inlinable and free of bounds checks

In this post, I challenge you to refactor a small function in such a way as to make it inlinable and free of bounds checks, for better performance.

**Disclaimer**: this post assumes version 1.24.2 of the (official) Go
compiler; you may get different results with other versions of the Go
compiler or with other implementations of the Go language.

## Function inlining & bounds-check elimination ¶

Some familiarity with function inlining and bounds-check elimination is a prerequisite for attempting my challenge. The following three sections serve as a crash course on those topics. Feel free to skip straight to the challenge itself.

### Function inlining 101 ¶

Inlining is a compiler strategy that can roughly be described as “substituting the body of a function for calls to that function”. Because it eliminates some function-call overhead, inlining often results in faster program execution. However, function inlining also increases the size of the resulting binaries, and indiscriminate inlining could cause compilers to produce prohibitively large binaries. Therefore, compilers typically restrict eligibility for inlining to functions that they deem “simple enough”.

For each function, the Go compiler allocates an inlining budget and computes an
inlining cost. If a function’s inlining cost does not exceed its budget, the
compiler deems the function _eligible for inlining_ (a.k.a. _inlinable_) and
inlines it; otherwise, it doesn’t. At the time of writing (Go
1.24.2), the default inlining budget is
80, but the compiler may, thanks to profile-guided
optimisation (PGO), decide to increase the inlining budget for functions
called on the hot path; for this post, let’s keep things simple and ignore PGO,
though. Some language constructs (such as recursion, “defer”
statements, and calls to the `recover` function) preclude
function inlining outright. The Go compiler’s inlining strategy is not specified
by the language itself and may change from one release to the next.

The most straightforward way to determine whether a function is eligible for inlining is to run a command of the following form and inspect its output:

“`
$ go build -gcflags ‘-m=2’ 2>&1 | grep
“`

Even if a function is inlinable, you may occasionally want to prevent it from
being inlined; you can instruct the compiler not to inline a function by adding
a `//go:noinline` directive above the function’s declaration. But you cannot
force the compiler to inline arbitrary functions, for reasons explained above;
a more thorough rationale for this limitation can be found in issue
#21536.

By refactoring an initially non-inlinable function (and/or by relaxing its semantics), you can sometimes lower a function’s inlining cost to the point of making it eligible for inlining; however, unless you’re very familiar with the Go compiler’s inliner (and keep abreast of changes brought to it by maintainers of the Go project), you may rightly perceive making a function eligible for inlining tends more as an art than as a science.

### Bounds-check elimination 101 ¶

I touched upon bounds-check elimination (BCE) in a post published earlier this year on this blog; allow me to quote it:

> […] Go is said to be memory-safe; in particular, per the language specification, implementations must trigger a run-time panic if a slice-indexing operation is ever out of bounds. Such bounds checks are relatively cheap, but they are not free. When the compiler can prove that some slice access cannot be out of bounds, it may omit, for better performance, the corresponding bounds check from the resulting executable.

What I wrote about slices in that post is also true of `string` s, which are
little more than slices of `byte` s under the hood.

Bounds-check elimination, like inlining strategies, remains an implementation detail; it is not specified by the language itself and may change from one release of the Go compiler to the next.

To identify where the Go compiler introduces bounds checks in a package, run a command of the following form:

“`
$ go build -gcflags ‘-d=ssa/check_bce/debug=1’
“`

The presence of bounds checks is not systematically detrimental to performance,
and eliminating all bounds checks from a program often proves impossible
anyway. However, eliminating some bounds checks, perhaps by _hoisting_ them
out of loops, typically is conducive to faster program execution. You can find
instances of hoisting bounds checks out of loops in Go’s standard
library.

Despite frequent improvements to the Go compiler’s BCE logic, you sometimes need to refactor your code in order to convince the compiler to eliminate or displace some bounds checks. This too is more an art than a science, and typically requires some trial and error.

### A delicate balancing act ¶

Function inlining and BCE are closely intertwined. Inlinability sometimes comes at the cost of a couple of additional bounds checks; conversely, eliminating some bounds checks may rob a function from its eligibility for inlining. In some lucky cases, though, you can have your cake and eat it too!

Now that you know enough about inlining and BCE, I can finally present my challenge to you.

## Can you make this function inlinable and free of bounds checks? ¶

Consider the following source file (which I’ve uploaded to a Gist, for your convenience):

“`
package headers // TrimOWS trims up to n bytes of optional whitespace (OWS) // from the start of and/or the end of s. // If no more than n bytes of OWS are found at the start of s // and no more than n bytes of OWS are found at the end of s, // it returns the trimmed result and true. // Otherwise, it returns some unspecified string and false. func TrimOWS(s string, n int) (string, bool) { if s == “” { return s, true } s, ok := trimLeftOWS(s, n) if !ok { return “”, false } s, ok = trimRightOWS(s, n) if !ok { return “”, false } return s, true } func trimLeftOWS(s string, n int) (string, bool) { var i int for len(s) > 0 { if !isOWS(s[0]) { break } if i >= n { return “”, false } s = s[1:] i++ } return s, true } func trimRightOWS(s string, n int) (string, bool) { var i int for len(s) > 0 { if !isOWS(s[len(s)-1]) { break } if i >= n { return “”, false } s = s[:len(s)-1] i++ } return s, true } // See https://httpwg.org/specs/rfc9110.html#whitespace. func isOWS(b byte) bool { return b == ‘t’ || b == ‘ ‘ }
“`

The implementation of `TrimOWS` above incurs no bounds checks, as evidenced by
the empty output of the following command:

“`
$ go build -gcflags ‘-d=ssa/check_bce/debug=1’ $
“`

However, it is not inlinable:

“`
$ go build -gcflags ‘-m=2’ 2>&1 | grep TrimOWS ./ows.go:9:6: cannot inline TrimOWS: function too complex: cost 130 exceeds budget 80
“`

Here is my challenge to you: can you refactor function `TrimOWS` so that it be
inlinable (without relying on profile-guided optimisation) and free of bounds
checks?

To allow you to catch bugs as you refactor the code, I’ve included a suite of unit tests; and to allow you to later measure performance changes between implementations, I’ve also included some benchmarks:

“`
package headers import “testing” const maxOWS = 1 // number of OWS bytes tolerated on either side var trimOWStests = []struct { desc string s string want string ok bool }{ { desc: “empty”, s: “”, want: “”, ok: true, }, { desc: “no OWS”, s: “foo”, want: “foo”, ok: true, }, { desc: “internal OWS”, s: “foo ttbar”, want: “foo ttbar”, ok: true, }, { desc: “leading and trailing OWS”, s: “tfoo “, want: “foo”, ok: true, }, { desc: “non-OWS whitespace”, s: “nfoot”, want: “nfoo”, ok: true, }, { desc: “too much leading OWS”, s: ” tfoot”, ok: false, }, { desc: “too much trailing OWS”, s: ” foot “, ok: false, }, { desc: “too much leading and trailing OWS”, s: ” tfoot “, ok: false, }, } func TestTrimOWS(t *testing.T) { for _, tc := range trimOWStests { f := func(t *testing.T) { allocs := testing.AllocsPerRun(10, func() { str, truth = TrimOWS(tc.s, maxOWS) }, ) if allocs > 0 { const tmpl = “TrimOWS(%q, %d) allocates %.2f objects” t.Errorf(tmpl, tc.s, maxOWS, allocs) } got, ok := TrimOWS(tc.s, maxOWS) if !tc.ok && ok { // In cases where TrimOWS must fail, // its string result is unspecified. const tmpl = “TrimOWS(%q, %d): _, %t; want _, %t” t.Fatalf(tmpl, tc.s, maxOWS, ok, tc.ok) } if tc.ok && (!ok || got != tc.want) { const tmpl = “TrimOWS(%q, %d): %q, %t; want %q, %t” t.Errorf(tmpl, tc.s, maxOWS, got, ok, tc.want, tc.ok) } } t.Run(tc.desc, f) } } func BenchmarkTrimOWS(b *testing.B) { for _, tc := range trimOWStests { f := func(b *testing.B) { for range b.N { str, truth = TrimOWS(tc.s, maxOWS) } } b.Run(tc.desc, f) } } var ( // sinks for avoiding dead-code elimination in benchmarks str string truth bool )
“`

Ready for some micro-optimisation? Clone the Gist and `cd` into your clone:

“`
$ git clone https://gist.github.com/jub0bs/0151185bad8ea4d70c4fd1d6ac78ab40 $ cd 0151185bad8ea4d70c4fd1d6ac78ab40
“`

And off you go!

### Hints ¶

Here is series of hints that you might find useful in case you get stuck.

**Hint 0**

Try eliminating unnecessary branching. In particular, the initial empty-string
check in `TrimOWS` is redundant; it’s an attempt at
micro-optimisation, but its performance benefits are moot. Moreover, `TrimOWS`
doesn’t need to check the boolean results of `trimRightOWS`; instead, it could
simply bubble up `trimRightOWS`’s results to the caller.

“`
func TrimOWS(s string, n int) (string, bool) { – if s == “” { – return s, true – } s, ok := trimLeftOWS(s, n) if !ok { return “”, false } – s, ok = trimRightOWS(s, n) – if !ok { – return “”, false – } – return s, true + return trimRightOWS(s, n) }
“`

These changes don’t introduce any bounds check and somewhat reduce `TrimOWS`’s
inlining cost:

“`
$ go build -gcflags ‘-d=ssa/check_bce/debug=1’ $ $ go build -gcflags ‘-m=2’ 2>&1 | grep TrimOWS ./ows.go:9:6: cannot inline TrimOWS: function too complex: cost 121 exceeds budget 80
“`

**Hint 1**

Although helper functions `trimLeftOWS` and `trimRightOWS` are themselves
inlinable, manually inlining them in `TrimOWS` helps inlinability without
hurting readability:

“`
func TrimOWS(s string, n int) (string, bool) { – s, ok := trimLeftOWS(s, n) – if !ok { – return “”, false – } – return trimRightOWS(s, n) -} – -func trimLeftOWS(s string, n int) (string, bool) { var i int for len(s) > 0 { if !isOWS(s[0]) { break } if i >= n { return “”, false } s = s[1:] i++ } – return s, true -} – -func trimRightOWS(s string, n int) (string, bool) { – var i int + i = 0 for len(s) > 0 { if !isOWS(s[len(s)-1]) { break } if i >= n { return “”, false } s = s[:len(s)-1] i++ } return s, true }
“`

Not only do these changes not introduce any bounds checks, but they
dramatically reduce `TrimOWS`’s inlining cost:

“`
$ go build -gcflags ‘-d=ssa/check_bce/debug=1’ $ $ go build -gcflags ‘-m=2’ 2>&1 | grep TrimOWS ./ows.go:9:6: cannot inline TrimOWS: function too complex: cost 88 exceeds budget 80
“`

Manually inlining my little `isOWS` function as well is tempting and does
somewhat reduce `TrimOWS`’s inlining cost, but doing so isn’t ideal because it
violates the DRY principle:

> Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

**Hint 2**

Try walking along the string rather than repeatedly munching at it:

“`
func TrimOWS(s string, n int) (string, bool) { – var i int – for len(s) > 0 { – if !isOWS(s[0]) { + for i := 0; i = n { return “”, false } – s = s[1:] – i++ } – i = 0 – for len(s) > 0 { – if !isOWS(s[len(s)-1]) { + for i := len(s) – 1; i >= 0; i– { + if !isOWS(s[i]) { + s = s[:i+1] break } – if i >= n { + if i &1 | grep TrimOWS ./ows.go:9:6: cannot inline TrimOWS: function too complex: cost 94 exceeds budget 80
“`

Fret not! Performance optimisation seldom is a straight line, and refactorings initially perceived as setbacks may in fact unlock new avenues for improvement.

**Hint 3**

Use range-over-int loops wherever possible, as they tend to be cheaper (and easier to reason about!) than classic three-clause loops:

“`
func TrimOWS(s string, n int) (string, bool) { – for i := 0; i &1 | grep TrimOWS ./ows.go:9:6: cannot inline TrimOWS: function too complex: cost 87 exceeds budget 80
“`

We’re back on tracks!

**Hint 4**

Using naked returns can help reduce a function’s inlining cost:

“`
-func TrimOWS(s string, n int) (string, bool) { +func TrimOWS(s string, n int) (_ string, _ bool) { for i := range len(s) { if !isOWS(s[i]) { s = s[i:] break } if i >= n { – return “”, false + return } } for i := len(s) – 1; i >= 0; i– { if !isOWS(s[i]) { s = s[:i+1] break } if i &1 | grep TrimOWS ./ows.go:9:6: cannot inline TrimOWS: function too complex: cost 83 exceeds budget 80
“`

Down to 83, now. We’re breathing rare air, here!

**Hint 5**

You can return early from the second loop; by doing so, you can also clarify
that, if execution reaches the bottom of `TrimOWS`, the string returned by the
function is empty:

“`
func TrimOWS(s string, n int) (_ string, _ bool) { for i := range len(s) { if !isOWS(s[i]) { s = s[i:] break } if i >= n { return } } for i := len(s) – 1; i >= 0; i– { if !isOWS(s[i]) { – s = s[:i+1] – break + return s[:i+1], true } if i &1 | grep TrimOWS ./ows.go:9:6: cannot inline TrimOWS: function too complex: cost 82 exceeds budget 80
“`

Down to 82. Tantalisingly close to the inlining budget!

**Hint 6**

At this stage, you may be (as I was) running out of ideas for further reducing
`TrimOWS`’s inlining cost. But try offsetting the loop variable of the second
loop by one…

“`
func TrimOWS(s string, n int) (_ string, _ bool) { for i := range len(s) { if !isOWS(s[i]) { s = s[i:] break } if i >= n { return } } – for i := len(s) – 1; i >= 0; i– { – if !isOWS(s[i]) { – return s[:i+1], true + for i := len(s); i > 0; i– { + if !isOWS(s[i-1]) { + return s[:i], true } – if i &1 | grep TrimOWS ./ows.go:9:6: can inline TrimOWS with cost 78 as: func(string, int) (string, bool) { for loop; for loop; return “”, true }
“`

Mission accomplished. 🎉

## My solution ¶

Below is my complete final implementation (click to reveal), both inlinable and free of bounds checks:

**Final implementation**

However, I think that such directives are unlikely to ever be supported by the Go compiler. Recall that neither the inlining logic nor the bounds-check-elimination logic are not enshrined in the language specification; they’re mere implementation details of the compiler and, as such, they may (and do, in my experience) evolve. The introduction of such compiler directives as discussed above would open the undesirable possibility that existing programs break as a result of changes to the compiler, a possibility that would violate Go 1.0’s compatibility promise.

Third-party tools, such as Jordan Lewis’s `gcassert`
command, purport (among other goals) to fill the gap left by the
absence of such compiler directives. However, I must admit that I’m reluctant
to pepper my code with third-party annotations; doing so would go against the
spirit of minimalism that I’ve grown accustomed to since I took my first step
in the world of Go.

And perhaps a good suite of benchmarks is all I need. After all, some inlinability regressions and bounds-check-elimination regressions may only have a negligible effect on performance and not be worth attending to.

## Takeaways ¶

– Eliminating most bounds checks from a function while also making it eligible for inlining can unlock an execution speedup.
– Through trial and error, a series of simple refactorings can lower a function’s inlining cost and reduce the number of expensive bounds checks deemed necessary by the Go compiler.
– Micro-optimisation sometimes is a futile endeavour but often proves intellectually rewarding.

If you’d like to learn more about inlining and bounds-check elimination, I
recommend you peruse Tapir Liu’s _Go Optimizations 101_
e-book.

I hope that you’ve enjoyed this challenge. Till next time.

Leave a Reply Cancel reply