fanf: (Default)
[personal profile] fanf

https://dotat.at/@/2025-05-13-if-is.html

About half a year ago I encountered a paper bombastically titled "the ultimate conditional syntax". It has the attractive goal of unifying pattern match with boolean if tests, and its solution is in some ways very nice. But it seems over-complicated to me, especially for something that's a basic work-horse of programming.

I couldn't immediately see how to cut it down to manageable proportions, but recently I had an idea. I'll outline it under the "penultimate conditionals" heading below, after reviewing the UCS and explaining my motivation.

what the UCS?

The ultimate conditional syntax does several things which are somewhat intertwined and support each other.

  • An "expression is pattern" operator allows you to do pattern matching inside boolean expressions.

    Like "match" but unlike most other expressions, "is" binds variables whose scope is the rest of the boolean expression that might be evaluated when the "is" is true, and the consequent "then" clause.

  • You can "split" tests to avoid repeating parts that are the same in successive branches. For example,

        if num < 0 then -1 else
        if num > 0 then +1 else 0
    

    can be written

        if num < 0 then -1
               > 0 then +1
                   else  0
    

    The example shows a split before an operator, where the left hand operand is the same and the rest of the expression varies. You can split after the operator when the operator is the same, which is common for "is" pattern match clauses.

  • Indentation-based syntax (an offside rule) reduces the amount of punctuation that splits would otherwise need. An explicit version of the example above is

        if { x { { < { 0 then −1 } };
                 { > { 0 then +1 } }; else 0 } }
    

    (This example is written in the paper on one line. I've split it for narrow screens, which exposes what I think is a mistake in the nesting.)

  • You can also intersperse let bindings between splits. I doubt the value of this feature, since "is" can also bind values, but interspersed let does have its uses. The paper has an example using let to avoid rightward drift:

        if  let tp1_n = normalize(tp1)
            tp1_n is Bot then Bot
            let tp2_n = normalize(tp2)
            tp2_n is Bot then Bot
            let m = merge(tp1_n, tp2_n)
            m is Some(tp) then tp
            m is None then glb(tp1_n, tp2_n)
    

    It's probably better to use early return to avoid rightward drift.

    The desugaring uses let bindings when lowering the UCS to simpler constructions.

whence UCS

Pattern matching in the tradition of functional programming languages supports nested patterns that are compiled in a way that eliminates redundant tests. For example, this example checks that e1 is Some(_) once, not twice as written.

    if e1 is Some(Left(lv)) then e2
             Some(Right(rv)) then e3
             None then e4

Being cheeky, I'd say UCS introduces more causes of redundant checks, then goes to great effort to to eliminate redundant checks again.

Splits reduce redundant code at the source level; the bulk of the paper is about eliminating redundant checks in the lowering from source to core language.

I think the primary cause of this extra complexity is treating the is operator as a two-way test rather than a multi-way match. Splits are introduced as a more general (more complicated) way to build multi-way conditions out of two-way tests.

There's a secondary cause: the tradition of expression-oriented functional languages doesn't like early returns. A nice pattern in imperative code is to write a function as a series of preliminary calculations and guards with early returns that set things up for the main work of the function. Rust's ? operator and let-else statement support this pattern directly. UCS addresses the same pattern by wedging calculate-check sequences into if statements, as in the normalize example above.

out of scope

I suspect UCS's indentation-based syntax will make programmers more likely to make mistakes, and make compilers have more trouble producing nice error messages. (YAML has put me off syntax that doesn't have enough redundancy to support good error recovery.)

So I wondered if there's a way to have something like an "is pattern" operator in a Rust-like language, without an offside rule, and without the excess of punctuation in the UCS desugaring.

But I couldn't work out how to make the scope of variable bindings in patterns cover all the code that might need to use them. The scope needs to extend into the consequent then clause, but also into any follow-up tests -- and those tests can branch so the scope might need to reach into multiple then clauses.

The problem was the way I was still thinking of the then and else clauses as part of the outer if. That implied the expression has to be closed off before the then, which troublesomely closes off the scope of any is-bound variables. The solution -- part of it, at least -- is actually in the paper, where then and else are nested inside the conditional expression.

penultimate conditionals

There are two ingredients:

  • The then and else clauses become operators that cause early return from a conditional expression.

    They can be lowered to a vaguely Rust syntax with the following desugaring rules. The 'if label denotes the closest-enclosing if; you can't use then or else inside the expr of a then or else unless there's another intervening if.
    then expr && break 'if expr
    else expr || break 'if expr
    else expr || _ && break 'if expr

    There are two desugarings for else depending on whether it appears in an expression or a pattern.

    If you prefer a less wordy syntax, you might spell then as => (like match in Rust) and else as || =>. (For symmetry we might allow && => for then as well.)

  • An is operator for multi-way pattern-matching that binds variables whose scope covers the consequent part of the expression.

    The basic form is like the UCS,

    scrutinee is pattern

    which matches the scrutinee against the pattern returning a boolean result. For example,

        foo is None
    

    Guarded patterns are like,

    scrutinee is pattern && consequent

    where the scope of the variables bound by the pattern covers the consequent. The consequent might be a simple boolean guard, for example,

        foo is Some(n) && n < 0
    

    or inside an if expression it might end with a then clause,

        if foo is Some(n) && n < 0 => -1
        // ...
    

    Simple multi-way patterns are like,

    scrutinee is { pattern || pattern || ... }

    If there is a consequent then the patterns must all bind the same set of variables (if any) with the same types.

    More typically, a multi-way match will have consequent clauses, like

    scrutinee is {
    pattern && consequent ||
    pattern && consequent ||
    => otherwise }

    When a consequent is false, we go on to try other alternatives of the match, like we would when the first operand of boolean || is false.

    To help with layout, you can include a redundant || before the first alternative.

    For example,

        if foo is {
            || Some(n) && n < 0 => -1
            || Some(n) && n > 0 => +1
            || Some(n) => 0
            || None => 0
        }
    

    Alternatively,

        if foo is { Some(n) && ( n < 0 => -1 ||
                                 n > 0 => +1 ||
                                 => 0 )
                 || None => 0 }
    

    (They should compile the same way.)

The evaluation model is like familiar shortcutting && and || and the syntax is supposed to reinforce that intuition. The UCS paper spends a lot of time discussing backtracking and how to eliminate it, but penultimate conditionals evaluate straightforwardly from left to right.

The paper briefly mentions as patterns, like

    Some(Pair(x, y) as p)

which in Rust would be written

    Some(p @ Pair(x, y))

The is operator doesn't need a separate syntax for this feature:

    Some(p is Pair(x, y))

For large examples, the penultimate conditional syntax is about as noisy as Rust's match, but it scales down nicely to smaller matches. However, there are differences in how consequences and alternatives are punctuated which need a bit more discussion.

dangling syntax

The precedence and associativity of the is operator is tricky: it has two kinds of dangling-else problem.

The first kind occurs with a surrounding boolean expression. For example, when b = false, what is the value of this?

    b is true || false

It could bracket to the left, yielding false:

    (b is true) || false

Or to the right, yielding true:

    b is { true || false }

This could be disambiguated by using different spellings for boolean or and pattern alternatives. But that doesn't help for the second kind which occurs with an inner match.

    foo is Some(_) && bar is Some(_) || None

Does that check foo is Some(_) with an always-true look at bar

    ( foo is Some(_) ) && bar is { Some(_) || None }

Or does it check bar is Some(_) and waste time with foo?

    foo is { Some(_) && ( bar is Some(_) ) || None }

I have chosen to resolve the ambiguity by requiring curly braces {} around groups of alternative patterns. This allows me to use the same spelling || for all kinds of alternation. (Compare Rust, which uses || for boolean expressions, | in a pattern, and , between the arms of a match.)

Curlies around multi-way matches can be nested, so the example in the previous section can also be written,

    if foo is {
        || Some(n) && n < 0 => -1
        || Some(n) && n > 0 => +1
        || { Some(0) || None } => 0
    }

The is operator binds tigher than && on its left, but looser than && on its right (so that a chain of && is gathered into a consequent) and tigher than || on its right so that outer || alternatives don't need extra brackets.

examples

I'm going to finish these notes by going through the ultimate conditional syntax paper to translate most of its examples into the penultimate syntax, to give it some exercise.

Here we use is to name a value n, as a replacement for the |> abs pipe operator, and we use range patterns instead of split relational operators:

    if foo(args) is {
        || 0 => "null"
        || n && abs(n) is {
            || 101.. => "large"
            || ..10 => "small"
            || => "medium"
        )
    }

In both the previous example and the next one, we have some extra brackets where UCS relies purely on an offside rule.

    if x is {
        || Right(None) => defaultValue
        || Right(Some(cached)) => f(cached)
        || Left(input) && compute(input) is {
            || None => defaultValue
            || Some(result) => f(result)
        }
    }

This one is almost identical to UCS apart from the spellings of and, then, else.

    if name.startsWith("_")
    && name.tailOption is Some(namePostfix)
    && namePostfix.toIntOption is Some(index)
    && 0 <= index && index < arity
    && => Right([index, name])
    || => Left("invalid identifier: " + name)

Here are some nested multi-way matches with overlapping patterns and bound values:

    if e is {
        // ...
        || Lit(value)
            && Map.find_opt(value) is Some(result)
            => Some(result)
        // ...
        || { Lit(value) ||
             Add(Lit(0), value) ||
             Add(value, Lit(0)) }
            => {
                print_int(value);
                Some(value)
            }
        // ...
    }

The next few examples show UCS splits without the is operator. In my syntax I need to press a few more buttons but I think that's OK.

    if x == 0 => "zero"
    || x == 1 => "unit"
    || => "?"

    if x == 0 => "null"
    || x > 0 => "positive"
    || => "negative"

    if predicate(0, 1) => "A"
    || predicate(2, 3) => "B"
    || => "C"

The first two can be written with is instead, but it's not briefer:

    if x is {
        || 0 => "zero"
        || 1 => "unit"
        || => "?"
    }

    if x is {
        || 0 => "null"
        || 1.. => "positive"
        || => "negative"
    }

There's little need for a split-anything feature when we have multi-way matches.

    if foo(u, v, w) is {
        || Some(x) && x is {
            || Left(_) => "left-defined"
            || Right(_) => "right-defined"
        }
        || None => "undefined"
    }

A more complete function:

    fn zip_with(f, xs, ys) {
        if [xs, ys] is {
            || [x :: xs, y :: ys]
                && zip_with(f, xs, ys) is Some(tail)
                => Some(f(x, y) :: tail)
            || [Nil, Nil] => Some(Nil)
            || => None
        }
    }

Another fragment of the expression evaluator:

    if e is {
        // ...
        || Var(name) && Map.find_opt(env, name) is {
            || Some(Right(value)) => Some(value)
            || Some(Left(thunk)) => Some(thunk())
        }
        || App(lhs, rhs) => // ...
        // ...
    }

This expression is used in the paper to show how a UCS split is desugared:

    if Pair(x, y) is {
        || Pair(Some(xv), Some(yv)) => xv + yv
        || Pair(Some(xv), None) => xv
        || Pair(None, Some(yv)) => yv
        || Pair(None, None) => 0
    }

The desugaring in the paper introduces a lot of redundant tests. I would desugar straightforwardly, then rely on later optimizations to eliminate other redundancies such as the construction and immediate destruction of the pair:

    if Pair(x, y) is Pair(xx, yy) && xx is {
        || Some(xv) && yy is {
            || Some(yv) => xv + yv
            || None => xv
        }
        || None && yy is {
            || Some(yv) => yv
            || None => 0
        }
    }

Skipping ahead to the "non-trivial example" in the paper's fig. 11:

    if e is {
        || Var(x) && context.get(x) is {
            || Some(IntVal(v)) => Left(v)
            || Some(BoolVal(v)) => Right(v)
        }
        || Lit(IntVal(v)) => Left(v)
        || Lit(BoolVal(v)) => Right(v)
        // ...
    }

The next example in the paper compares C# relational patterns. Rust's range patterns do a similar job, with the caveat that Rust's ranges don't have a syntax for exclusive lower bounds.

    fn classify(value) {
        if value is {
            || .. -4.0 => "too low"
            || 10.0 .. => "too high"
            || NaN => "unknown"
            || => "acceptable"
        }
    }

I tend to think relational patterns are the better syntax than ranges. With relational patterns I can rewrite an earlier example like,

    if foo is {
        || Some(< 0) => -1
        || Some(> 0) => +1
        || { Some(0) || None } => 0
    }

I think with the UCS I would have to name the Some(_) value to be able to compare it, which suggests that relational patterns can be better than UCS split relational operators.

Prefix-unary relational operators are also a nice way to write single-ended ranges in expressions. We could simply write both ends to get a complete range, like >= lo < hi or like

    if value is > -4.0 < 10.0 => "acceptable"
    || => "far out"

Near the start I quoted a normalize example that illustrates left-aligned UCS expression. The penultimate version drifts right like the Scala version:

    if normalize(tp1) is {
        || Bot => Bot
        || tp1_n && normalize(tp2) is {
            || Bot => Bot
            || tp2_n && merge(tp1_n, tp2_n) is {
                || Some(tp) => tp
                || None => glb(tp1_n, tp2_n)
            }
        }
    }

But a more Rusty style shows the benefits of early returns (especially the terse ? operator) and monadic combinators.

    let tp1 = normalize(tp1)?;
    let tp2 = normalize(tp2)?;
    merge(tp1, tp2)
        .unwrap_or_else(|| glb(tp1, tp2))

antepenultimate breath

When I started writing these notes, my penultimate conditional syntax was little more than a sketch of an idea. Having gone through the previous section's exercise, I think it has turned out better than I thought it might.

The extra nesting from multi-way match braces doesn't seem to be unbearably heavyweight. However, none of the examples have bulky then or else blocks which are where the extra nesting is more likely to be annoying. But then, as I said before it's comparable to a Rust match:

    match scrutinee {
        pattern => {
            consequent
        }
    }

    if scrutinee is {
        || pattern => {
            consequent
        }
    }

The || lines down the left margin are noisy, but hard to get rid of in the context of a curly-brace language. I can't reduce them to | like OCaml because what would I use for bitwise OR? I don't want presence or absence of flow control to depend on types or context. I kind of like Prolog / Erlang , for && and ; for ||, but that's well outside what's legible to mainstream programmers. So, dunno.

Anyway, I think I've successfully found a syntax that does most of what UCS does, but much in a much simpler fashion.

July 2025

S M T W T F S
  1 2345
6789101112
13141516171819
20212223242526
2728293031  

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated 2025-07-04 10:11
Powered by Dreamwidth Studios