The syntax thread!

Oh, there are so many languages whose syntax I like, but I will start with FORTH. It has the ‘purest’ and simplest syntax: it’s just a stream of WORDS. Whitespace delineates WORDS, and a WORD is any sequence of characters excluding whitespace. ^.^
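To show how little there is to that syntax, here is a tiny sketch in Python (Python is only a stand-in here, and `words` is a made-up helper name, not something from any real FORTH):

```python
def words(source):
    """Split FORTH source into WORDS: any run of non-whitespace characters."""
    # str.split() with no arguments splits on any run of spaces, tabs, or newlines.
    return source.split()

print(words(": DOUBLE   2 * ;\n3 DOUBLE"))
```

That is the whole 'parser'. Everything else in FORTH is about what each WORD does when it runs.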

You can think of a WORD as a function in FORTH, although whether a given one is a built-in or not can be debatable, and you can rebind WORDs. WORDs are only run one at a time, so a single WORD will run to completion before the interpreter even looks at the next. You can write a minimal FORTH interpreter in just a few hundred lines of code in a verbose language (like a dozen or so in Python). FORTH runs on satellites even to this day, and on more things than people would expect; it is fast, short, and easy to update.

The basic concept of FORTH is that there is a STACK, which you can push to and pop from, though in most implementations there are WORDS that allow you to do things like assign variables as well.

For example DUP will duplicate what is on top of the stack, so if the number 1 was on it, now there will be two 1's. To put a 1 on the stack you can just use the WORD 1, which defaults to a built-in function that just parses that integer and puts it onto the stack. All such integers are ‘defaulted’ in such a way, though in some implementations you can override that if you so wish. + is a built-in function too (again, it can be overridden by the user, but ew in this case?): it just pops the top two items off the stack and pushes back the sum of the two.

So if you run:

1 2 +

Then you will end up with 3 on the stack: it first pushes a 1 when it runs 1, then pushes a 2 when it runs 2, then pops the 2 and the 1 and pushes a 3 when it runs the + WORD.
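If you want to watch the stack change WORD by WORD, here is a rough Python sketch of just this subset (integers, DUP, and +). The `run` function and its trace format are my own invention for illustration, not how any particular FORTH is built:

```python
def run(source, stack=None):
    """Run a tiny FORTH-like subset: integers, DUP, and +."""
    stack = [] if stack is None else stack
    for word in source.split():
        if word == "DUP":
            stack.append(stack[-1])      # copy the top item
        elif word == "+":
            b = stack.pop()              # top of stack comes off first
            a = stack.pop()
            stack.append(a + b)
        else:
            stack.append(int(word))      # any other WORD is parsed as an integer
        print(f"{word:>4}  ->  {stack}")  # show the stack after each WORD
    return stack

run("1 2 +")
```

The trace shows the stack going from [1] to [1, 2] to [3], one WORD at a time.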

To define a function you use the : WORD, which reads the next WORD from the input stream as the name, then compiles the WORDs that follow into that name until it hits a ; WORD. So a function like : DOUBLE 2 * ; would define a DOUBLE WORD that pushes a 2 then runs *, which will pop the top two stack values, multiply them together, and push the result, so doing 3 DOUBLE would leave 6 on the stack. You can optionally have a comment inside of ( ... ) after the name in the function definition (note the space after the opening paren, since ( is itself a WORD). By convention the stack input, then --, then the stack output is put there, so for the DOUBLE WORD you’d document it like : DOUBLE ( a -- b ) 2 * ; to state that it pops one input and pushes one output, and of course you can write text there to describe it as well.

For note, even things like ; are just functions too. A typical definition of ; is : ; POSTPONE EXIT REVEAL POSTPONE [ ; IMMEDIATE, where the WORD POSTPONE means to take a WORD that you would normally call immediately and instead postpone it until the WORD being defined (here ;) actually runs, a bit like taking the pointer to a function in most languages. EXIT, postponed this way, ends the new definition with a return, REVEAL makes the new WORD visible so it can be used, [ switches from compilation mode back to interpretation mode, and IMMEDIATE marks ; itself to run even while compiling. FORTH itself needs very, very few built-ins defined (though there are usually a lot more for speed reasons), and even the act of defining a function is actually taking the function pointers of WORDS and laying them down one after another, either as threaded code or directly as machine code (often with some simple optimizations to inline things), hence why FORTH is usually quite FAST. ^.^
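Here is a rough Python sketch of what : and ; do, assuming a much simplified model. It stores each new WORD as a list of WORDs and re-interprets that list on every call, where a real FORTH compiles it. It also wants the closing ) of a comment to stand alone as its own WORD. The names `run` and `user_words` are made up for this example:

```python
def run(source, stack=None, user_words=None):
    """Interpret integers, *, ( ... ) comments, and : NAME ... ; definitions."""
    stack = [] if stack is None else stack
    user_words = {} if user_words is None else user_words
    tokens = iter(source.split())
    for word in tokens:
        if word == ":":
            name = next(tokens)              # the WORD right after : is the name
            body = []
            for w in tokens:                 # collect WORDs up to the closing ;
                if w == ";":
                    break
                body.append(w)
            user_words[name] = body
        elif word == "(":
            for w in tokens:                 # skip the comment up to )
                if w == ")":
                    break
        elif word in user_words:             # user WORDs win, so rebinding works
            run(" ".join(user_words[word]), stack, user_words)
        elif word == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            stack.append(int(word))          # anything else is an integer
    return stack

print(run(": DOUBLE ( a -- b ) 2 * ; 3 DOUBLE"))
```

Because user-defined WORDs are looked up before the built-ins, you could even rebind * in this sketch, which mirrors how rebinding works in FORTH.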

For note, FORTH is usually case insensitive, but full-caps is most often used by convention except in strings (" WORD).

Let’s see a simple popular tutorial set of code:

: STAR	           [CHAR] * EMIT ;
: STARS	           0 DO STAR LOOP CR ;
: SQUARE	       DUP 0 DO DUP STARS LOOP DROP ;
: TRIANGLE	       1 DO I STARS LOOP ;
: TOWER ( n -- )   DUP TRIANGLE SQUARE ;

This allows you to call, say, 4 TOWER and it will print out:

*
**
***
****
****
****
****

For each function:

  • : STAR [CHAR] * EMIT ; This defines a WORD named STAR. The [CHAR] WORD takes the next WORD and compiles the character code of its first character as a number, so 42 for * in this case, and EMIT then pops that number and prints it to the screen as a character. So calling STAR will just print a *. For note, [CHAR] comes from ANS FORTH; on an older FORTH that lacks it, just change [CHAR] * to 42. ^.^
  • : STARS 0 DO STAR LOOP CR ; This defines a WORD named STARS, which pushes a 0 onto the stack, then calls the WORD DO, which pops the top two numbers off the stack (the limit N that was passed in and the starting index 0) and runs the WORDs up until LOOP repeatedly, counting from 0 up to but not including N, so N times. Lastly CR prints a newline (\n in other words). So calling 3 STARS will print ***\n.
  • : SQUARE DUP 0 DO DUP STARS LOOP DROP ; This defines a WORD named SQUARE, which first DUPlicates what’s on top of the stack, then pushes a 0, then loops that number of times, each time duplicating the input again and calling STARS on the copy, then drops the number that’s left on the stack. So calling 2 SQUARE would print **\n**\n.
  • : TRIANGLE 1 DO I STARS LOOP ; This loops from 1 up to but not including the passed in number and calls STARS with each value of I (the WORD that pushes the current index of the DO loop), so calling 4 TRIANGLE would print *\n**\n***\n, a triangle one size smaller than the number passed in.
  • : TOWER ( n -- ) DUP TRIANGLE SQUARE ; This just duplicates the input, then calls TRIANGLE (which pops one of those off) then calls SQUARE (popping the original off), and that’s it.
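If the stack shuffling is hard to follow, here is a hand translation of those five WORDs into Python as a sketch. Each function works on one shared `stack` list, with the matching FORTH in the comments. The `stack` and `out` names are my own, and one known difference is that Python's `range` runs zero times when the start equals the limit, while a plain FORTH DO does not:

```python
stack = []   # the data stack shared by every WORD
out = []     # collected output, joined at the end

def star():
    out.append(chr(42))          # [CHAR] * EMIT (42 is the character code of *)

def stars():                     # ( n -- )
    limit = stack.pop()
    for _ in range(0, limit):    # 0 DO ... LOOP
        star()
    out.append("\n")             # CR

def square():                    # ( n -- )
    stack.append(stack[-1])      # DUP
    limit = stack.pop()
    for _ in range(0, limit):    # 0 DO ... LOOP
        stack.append(stack[-1])  # DUP
        stars()
    stack.pop()                  # DROP

def triangle():                  # ( n -- )
    limit = stack.pop()
    for i in range(1, limit):    # 1 DO ... LOOP
        stack.append(i)          # I
        stars()

def tower():                     # ( n -- )
    stack.append(stack[-1])      # DUP
    triangle()
    square()

stack.append(4)                  # 4 TOWER
tower()
print("".join(out), end="")
```

Running it prints the same seven lines as 4 TOWER and leaves the stack empty, which matches the ( n -- ) stack comment.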

But FORTH is pure function. Most programs in FORTH will read a lot like English sentences, you can define any kind of DSEL you want, etc… FORTH and LISP have very similar ideas, and both can create DSELs with impunity, but where I’d say FORTH is a purity of function, LISP is a purity of form; they do things in very different ways.

FORTH being so short and easy to implement even in raw machine code makes it really common to bootstrap micro-projects that need to be reprogrammable, plus it’s very fun to program in as it is so different than essentially any other language. ^.^
