Functional Programming in Java, Second Edition: p.139 Intro to "Optimizing Recursions" - very confusing

We read:

Recursion is a powerful and charming way to solve problems. It’s highly expressive—using recursion we can provide a solution to a problem by applying the same solution to its subproblems, an approach known as divide and conquer. Various applications employ recursion, such as for finding the shortest distances on a map, computing minimum cost or maximum profit, or reducing waste.

Well, as Gödel’s Theory of recursive functions tells us, using recursion is just another way of looking at computation (besides Turing Machines, Lambda Calculus, finite automata with access to two stacks or what have you), in particular in settings where data is immutable (i.e. functional languages)

Proposing:

Recursion is just another way to avoid having to write (a lot, imbricated and complex) iterative for or while loops. While many problems are easy to express using loops - which is why inherently recursive languages like Prolog have been given looping constructs [see for example this paper by the designer of the PICAT language: https://www.sci.brooklyn.cuny.edu/~zhou/papers/loops.pdf ], many problems are best expressed using recursion. These include in particular visiting and building recursive data structures like lists, trees or graphs. More generally, recursive algorithms should be used if a large problem shall be divided into smaller but similar subproblems (“divide-and-conquer” for independent and “dynamic programming” for interdependent subproblems). The theory and practice of “search and optimization” uses recursion extensively in both specification and implementation.

The last sentence of this chapter also says:

The approach we used here made recursions feasible for large input. Next we’ll see how to make them practical from a performance point of view.

But we didn’t even have any input! Similary suggesting:

The approach we used here made tail-recursive procedures performing “deep recursion” feasible. Next we’ll see how to make them practical from a performance point of view.

And then we continue …

Most languages in use today support recursion [well, one would hope so, it’s 60 years after LISP, emulating a stack in the database is not fun…]. However, a recursive procedure, if written naively, grows the stack whenever it calls itself - a new stack frame is created whenever one would perform a passage through a loop in an iterative solution. Going through a 10’000 entry list would thus mean 10’000 frames on the stack. Stack overflow becomes a risk for large inputs. Recursion-friendly languages are able to optimize excessive stack growth away in certain circumstances via “tail-call optimization” (TCO) during the compilation phase, essentially transforming recursion into iteration.

At this point, I refer to Wikipedia, which says:

[Tail call - Wikipedia]

For example, in the JVM, tail-recursive calls can be eliminated (as this reuses the existing call stack), but general tail calls cannot be (as this changes the call stack). [ scala - What limitations does the JVM impose on tail-call optimization - Software Engineering Stack Exchange ] As a result, functional languages such as [Scala] that target the JVM can efficiently implement direct tail recursion, but not mutual tail recursion.

But there seems to be a problem because “tail-call optimization” as explained in the book is not the one I know.

Going by for example this page:

[Java Tail Call Optimization | Delft Stack]

we find the following:

The tail call optimization is a process where we can avoid allocating one new stack frame for a function because the calling function will return a value it receives from the called function.

This applies to not only recursion but to any call chain where no operations are performed in the calling procedure except the return:

One of the most common uses is the tail-recursion (a recursive function where a function calls itself at the end/tail), where the recursive method is written to take advantage of tail call optimization and can use constant stack space.

So we have:

  • procedure with a “tail call”: a procedure where the last instruction is a return someOtherProcedure(some, values);
  • procedure with a “recursive tail call”: as above but the procedure calls itself;
  • tail call optimization: compiler-performed optimization that makes sure no additional stack frame is created for “tail call”, which may or may not be recursive (not entirely feasible on the JVM apparently).

The book says:

TCO lets us convert regular recursive calls into tail calls to make recursions practical for large inputs.

On the next page, page 140, we read:

Let’s start with a piece of code for computing a factorial using a simple unoptimized recursion.

But what is shown is not only an unoptimized recursion, it is not even a tail recursion (as correctly noted, there is the multiplication operation waiting to be performed when we “return from the recursive descent”), so we can’t hope for compiler optimization at all. We need to change the order of operations to make it a tail recursion so that the compiler can optimize it away into an iteration using a single stack frame.

On page 141:

Ideally we would like to rely on the compiler to provide such optimization, but since it doesn’t, we can use lambda expressions to do this manually, as we’ll see next.

Well, I don’t see how it could. Even Prolog doesn’t manage. We need to break this ice ourselves.

Which we then proceed to do with a stream-based implementation.

Note that the “stream” based approach is said to be “lazy”, and it is, but is that really the key characteristic? What it essentially does is:

  • emulate the stack, using a stream
  • emulate tail-call optimization (i.e. a non-growing stack) by handing the last used used emulated stack frame (an instance of TailCall which could alternatively be called StackFrame) to the garbage collector ASAP, as soon as the next stack frame has been generated with apply(), inside Stream.iterate(). Very slick. :ok_hand:

An experiment to compare the naive version with the accumulator-based version with the one based on a stream

Instead of “factorial”, which blows up function-wise (while we just want to blow up stack-wise), let’s just use sum-over-integers instead of product-over-integers. This relieves us also from having to later modify the simple example with BigIntegers to accept humonguous numbers, a problems that is entirely incidental to showing how the “very slick stack simulator” (VSSS) is programmed out.

import org.junit.jupiter.api.Test;

import java.util.function.Function;
import java.util.stream.Stream;

// ----------------------------------------------------------------------------------------------
// The "very slick stack simulator" (VSSS) for optimizable (not necessarily recursive) tail calls
// ----------------------------------------------------------------------------------------------

@FunctionalInterface
interface TailCall<T> {
    TailCall<T> apply();

    default boolean isComplete() {
        return false;
    }

    default T result() {
        throw new Error("not implemented");
    }

    default T invoke() {
        return Stream.iterate(this, TailCall::apply)
                .filter(TailCall::isComplete)
                .findFirst()
                .get() // The linter tells us: "Optional.get() without "isPresent()" check"
                .result();
    }
}

class TailCalls {

    // call() simply exists so that usage has a symmetric look
    public static <T> TailCall<T> call(final TailCall<T> nextCall) {
        return nextCall;
    }

    public static <T> TailCall<T> done(final T value) {
        return new TailCall<T>() {
            @Override
            public boolean isComplete() {
                return true;
            }

            @Override
            public T result() {
                return value;
            }

            @Override
            public TailCall<T> apply() {
                throw new Error("not implemented");
            }
        };
    }
}

// -----------------------------
// An application of VSSS
// -----------------------------

class SumRecursivelyWithVSSS {

    public static long sum(final int number) {
        return sum_inner(1, number).invoke();
    }

    // Note that this method is *never* called recursively!

    public static TailCall<Long> sum_inner(final long accumulator, final int number) {
        if (number == 1) {
            // Return the "TailCalls" instance that ends the stream
            return TailCalls.done(accumulator);
        } else {
            // "call()" does nothing, and we could just return the closure directly, but it looks nice
            return TailCalls.call(
                    // When called by Stream.iterate(), this closure is supposed to generate&return the
                    // "next TailCall instance" of the stream
                    () -> sum_inner(accumulator + number, number - 1)
            );
        }
    }
}

// -----------------------------
// Summing using a recursive call that is a proper tail call (and should be optimizable)
// -----------------------------

class SumRecursivelyUsingTailCalls {

    public static long sum(final int number) {
        return sum_inner(1, number);
    }

    // Tail-recursive: uses an accumulator while going "down", on the way back "up"
    // there are only "returns". This can be optimized so that only 1 stack frame is used
    // but the Java compiler (or the JVM?) does not do that fully.
    // Stack overflows after some time.

    private static long sum_inner(final long accumulator, final int number) {
        if (number == 1) {
            return accumulator;
        } else {
            // A (in this ase recursive) tail call: no further computation needs to be
            // done on return -- just return. The compiler can find out that it really
            // doesn't need to keep the current stack frame when calling sum_inner()
            // But JVM seems to additional constraints on stack tracing so the call cannot
            // be optimized away fully.
            return sum_inner(accumulator + number, number - 1);
        }
    }
}

// -----------------------------
// Summing using a recursive call that is not a tail call (avoid if possible, though
// it is not always possible)
// -----------------------------

class SumRecursivelyNaively {

    public static long sum(final int number) {
        return (number == 1) ? 1 : (number + sum(number - 1));
    }
}

// -----------------------------
// JUnit Test Cases: Executable stuff that you can click on!
// -----------------------------

public class Chapter8 {

    final static int limit = 70000;

    static boolean callSum(final boolean skip, final String name, int n, Function<Integer, Long> sum) {
        boolean skipNextTime = skip;
        if (!skip) {
            try {
                long res = sum.apply(n);
                // Properly, this test should be in the caller
                if (n == limit - 1) {
                    System.out.println("Reached the end: " + name + "(" + n + ") = " + res);
                }
            } catch (StackOverflowError err) {
                System.out.println("Stack overflow for " + name + "(" + n + ")");
                skipNextTime = true;
            }
        }
        return skipNextTime;
    }

    @Test
    void runThem() {
        boolean skipNaiveVersion = false;
        boolean skipTailCallVersion = false;
        boolean skipVsssVersion = false;
        for (int n = 1; n < limit && !(skipTailCallVersion && skipNaiveVersion && skipVsssVersion); n += 1) {
            skipNaiveVersion = callSum(skipNaiveVersion, "SumRecursivelyNaively.sum", n, SumRecursivelyNaively::sum);
            skipTailCallVersion = callSum(skipTailCallVersion, "SumRecursivelyUsingTailCalls.sum", n, SumRecursivelyUsingTailCalls::sum);
            skipVsssVersion = callSum(skipVsssVersion, "SumRecursivelyWithVSSS.sum", n, SumRecursivelyWithVSSS::sum);
        }
    }

}

Let’s run the above with a stacksize of 200 KiB (JVM option -Xss200K).

Stack overflow for SumRecursivelyNaively.sum(38919)
Stack overflow for SumRecursivelyUsingTailCalls.sum(58377)
Reached the end: SumRecursivelyWithVSSS.sum(69999) = 2449965000

The SumRecursivelyNaively.sum() is not tail-recursive and the stack overflows rather rapidly.

The accumulator-based version SumRecursivelyUsingTailCalls.sum(), which corresponds to what is in the the book, is properly “tail recursive” according to the usual definition of a “tail call”. According to various discussion on the Internet, the Java compiler should optimize this, but it does … not really … it runs out of stack a bit later than the naive version but it does run out. Could be the JVM that wants to keep at least a stack trace.

Again, according to:

[Java Tail Call Optimization | Delft Stack]

The first reason [for a lack of TCO in the JVM] is that Tail Call Optimization is pricey. Second, according to a rule in Java, we get the stack trace whenever we face an error.

And the stack traces are well-defined concepts that hardly can track one stack frame for every logical call. This will not be possible if tail call optimization exists in Java.

Okay.

The stream-based version SumRecursivelyWithVSSS.sum() however, survives to deliver the result!

The topic of tail recursion was more of an experimentation to show some of the capabilities. Eventually the JVM, I hope, will implement TCO and that’s what we should be relying on. Will revisit this in the future base on how things evolve. Thank you.

1 Like