Beginning Factor - Shufflers & Combinators

In the previous installment of Beginning Factor, we discussed some of the attributes of stack-based languages and the syntax for defining words in Factor. This time around, I’d like to introduce stack shufflers, quotations & combinators, and touch on some more basic data types and their properties.

Up until now, we’ve essentially been using Factor as an overqualified RPN calculator. I just wanted to make sure that you don’t underestimate Factor because of these particular examples; Factor is an extremely capable and modern language that can be used for everything from web applications, to game development, to complex text parsing, and so on. I’m purposefully using over-simplified examples as a means to demonstrate specific points about the language. Stick with me, and I assure you the examples will gradually get more expressive.

Stack Shufflers

Because of Factor’s stack-based nature, you sometimes need the ability to rearrange or copy items in the stack to ensure that they are in the correct position for future use. The way to go about this is a group of words called “stack shufflers”.

Much like the name implies, stack shufflers are merely words that change the order (or number) of the items on the top of the stack. It is said that shuffle words “control the flow of data between words that perform actions.” In fact, you already know one stack shuffler… drop

There are three basic varieties of stack shufflers. Here they are along with the most commonly used word for each type:

  1. Removing Shufflers

    drop ( x -- )
    removes the top item from the stack and discards it
  2. Duplicating Shufflers

    dup ( x -- x x )
    duplicates the top item on the stack
  3. Permuting Shufflers

    swap ( x y -- y x )
    exchanges the positions of the first and second items on the top of the stack

Swapping Two Items on the Top of a Stack

…and here are a couple of word definitions that use these shufflers appropriately:

: sq ( x -- y )
    dup * ;

: neg ( x -- -x )
    0 swap - ;

(scratchpad) 5 sq .
25
(scratchpad) 1 neg .
-1

Caveats

There are many more shuffle words that support intricate rearrangements and the duplication/removal of multiple items from differing locations within the stack. The problem is, code that is full of stack shufflers can easily become confusing. In general, you should try to minimize the use of stack shufflers to keep things understandable.

Of course there are exceptions to that rule, times when shufflers make the most sense, but the preferred alternative to complicated stack shufflers is the use of a “combinator” that fits your use case. That said, combinators will take a bit of explaining…

Quotations

Before we can get to the idea of combinators, we first have to discuss quotations:

(scratchpad) 5 [ "hello" print ] times
hello
hello
hello
hello
hello

In this example, [ "hello" print ] is a quotation.

In layman’s terms, a quotation is a way to encapsulate a snippet of code so it doesn’t get called right away, but can be passed around on the stack and called later. The computer science term for this is an “anonymous function”. It’s referred to as “anonymous” because it is fundamentally a word that has no name.

If we didn’t have quotations, that same example would get ugly real fast:

(scratchpad) "hello" dup dup dup dup print print print print print
hello
hello
hello
hello
hello

Imagine if you wanted to print “hello” 1000 times!

To create a quotation, you just surround your code snippet with square brackets, and it will be pushed onto the stack: [ ... ]. In order to do anything useful with that quotation once it’s on the stack, what you need is a combinator.

What is a Combinator?

A combinator is just a fancy name for a word that takes a quotation as one of its inputs. In the example above, times is a combinator which takes an integer (n) and a quotation from the stack, and it calls that quotation n times.

Factor uses quotations & combinators extensively for conditionals, sequence traversal, namespaces, closures, and more…but that’s jumping the gun a bit. Before we dive into all of that, I’d like to get back to the idea of minimizing the use of stack shufflers by replacing them with appropriate combinators instead.

Combinators That Express Intent

While our code examples thus far have been easy to follow, when you start tackling more realistic problems, stack shufflers will begin to obfuscate your code. If we can represent our intentions consistently with a combinator instead, then code becomes cleaner and you can consequently focus more on your problem domain and less on mentally organizing your stack.

To illustrate this point, I’d like to introduce a couple of simple combinators:

dip ( x quot -- x )
calls a quotation while temporarily hiding the top item on the stack
keep ( x quot -- x )
calls a quotation with an item on the stack, restoring that item after the quotation returns

While both of these combinators have the same stack effect declaration, their usage is a bit different:

(scratchpad) 1 2 4 [ + ] dip

--- Data stack:
3
4
(scratchpad) clear
(scratchpad) 1 2 4 [ + ] keep

--- Data stack:
1
6
4

These two combinators alone can greatly reduce the number of stack shufflers your code will need. If you’re curious about how these combinators work, they both secretly take advantage of an auxiliary stack (called the “retain stack”) to temporarily store items while the supplied quotation is being executed. There are a few other retain stack combinators that are worth exploring as well.

Cleave, Spread, and Apply

The cleave, spread, and apply combinators are your best weapons when trying to reduce the use of stack shufflers while simultaneously expressing intent. The key point being that they should express intent…if you find yourself writing code where these combinators don’t fit logically, then try another option.

  1. Cleave Combinators

    These are used when you want to apply multiple quotations to the same set of items on the top of the stack.

    Let’s say that you want to find the average of a bunch of numbers in an array. The steps are straightforward, you take the sum of all the numbers and divide that sum by how many numbers you have (the length of the array):

    (scratchpad) { 1 2 3 } dup sum swap length / .
    2

    We can eliminate the need for those stack shufflers and better express our intent by using a cleave combinator to achieve the same thing:

    bi ( x p q -- )
    applies quotation p to x, then applies quotation q to x

    Cleave Combinator Demonstration

    (scratchpad) { 1 2 3 } [ sum ] [ length ] bi / .
    2

    The different cleave combinators change either the number of quotations applied to your items (bi vs. tri), or the number of items used as input for your quotations (bi vs. 2bi).

  2. Spread Combinators

    These are used when you want to apply a different quotation to different items on the top of the stack. The spread combinators are closely related to dip, but provide a bit more flexibility while also expressing intent.

    Let’s say that you have two coordinate positions in the form of { x y }, and you’d like to extract the x-coordinate from the first position, and the y-coordinate from the second position to form a new position with those values:

    (scratchpad) { 1 2 } { 3 4 } swap first swap second 2array .
    { 1 4 }

    We can eliminate the need for those stack shufflers and better express our intent by using a spread combinator to achieve the same thing:

    bi* ( x y p q -- )
    applies quotation p to x, then applies quotation q to y

    Spread Combinator Demonstration

    (scratchpad) { 1 2 } { 3 4 } [ first ] [ second ] bi* 2array .
    { 1 4 }

    When you want to do the same thing with more than two quotations/items, then using spread combinators eliminates the need for nested dips or shufflers and the added clarity becomes much more evident.

    The different spread combinators change the number of quotations applied to the corresponding number of items on the stack (bi* vs. tri*).

  3. Apply Combinators

    These are used when you want to apply a single quotation to multiple items on the top of the stack.

    Let’s say that you have two strings, each containing a name, and you want to see if those names are the same. In order to ignore case when doing the comparison, you decide to convert both strings to upper case before checking for equality:

    (scratchpad) "john" "John" swap >upper swap >upper = .
    t

    We can eliminate the need for those stack shufflers and better express our intent by using an apply combinator to achieve the same thing:

    bi@ ( x y quot -- )
    applies the quotation to x, then to y

    Apply Combinator Demonstration

    (scratchpad) "john" "John" [ >upper ] bi@ = .
    t

    The different apply combinators change the number of items on the stack your quotation is applied to (bi@ vs. tri@).

The cleave, spread, and apply combinators are all closely related; if you’re having trouble keeping them apart, try to memorize the naming convention:

  • If there is no suffix, it is a “cleave”
  • If the suffix is *, it is a “spread”
  • If the suffix is @, it is an “apply”

Once you learn these combinators, you should be able to express almost any pattern of complicated stack shufflers. Note that there are also generic forms for all of these combinators that can take additional inputs from the stack. If you find that you resort to using the generic forms more often then not, that’s usually a good indication that you should rethink your approach or put your data into a more appropriate structure.

Data Type Details

Before I turn you loose, I wanted to offer a few extra details about some of Factor’s basic data types…

Sequences

In the cleave and spread examples above, I was sneaky and used sequences without explaining them formally. A sequence is a finite, ordered, collection of elements. Any data type that implements the sequence mixin class (meaning a data type that knows its length and will let you set/get an element at a specific index) gains the ability to use the powerful built-in sequence operators. Read through that documentation to get an idea on how to manipulate sequences and their elements.

Factor has many sequence types that you may already be familiar with, such as arrays (fixed-size mutable sequences) and vectors (resizable mutable sequences), but there are also other data types that you might not expect to be sequences, such as strings. In Factor, a string is merely an array of Unicode 5.0 code points.

Using the sequence operators and combinators together, you can create all sorts of powerful abstractions that I’ll talk more about next time. Here are a couple of examples to whet your appetite:

(scratchpad) { 1 2 3 4 5 } [ even? ] filter .
{ 2 4 }
(scratchpad) { 1 2 3 } [ . ] each
1
2
3
(scratchpad) "Hello" [ alpha? ] all? .
t
(scratchpad) "Hello!!!" [ alpha? ] all? .
f

Keep in mind that in Factor, sequences are zero-based.

Numbers

The last topic for the day is numbers. So far, we have only used integers, but Factor also supports rational numbers (fractions), floats (decimal approximations of a number), and complex numbers (imaginary numbers).

(scratchpad) 100 330 / .
10/33
(scratchpad) 5/4 1/2 + .
1+3/4
(scratchpad) 5/4 0.5 + .
1.75
(scratchpad) 2 5 rect> .
C{ 2 5 }

Integers have one additional property that makes them special…non-negative integers are sequences. A positive integer (n) is a sequence of length n, whose elements are its non-negative predecessors (i.e. 0 to n-1 inclusive). This property can be very helpful when performing counted loops or other control flow statements.

Because of this property, the following two lines are equivalent:

{ 0 1 2 3 4 5 6 7 8 9 } [ . ] each
10 [ . ] each

Next Time

Next time we’ll stick with a theme of “flow” and discuss control flow for your words and also the typical work flow when developing code with Factor. Hope you enjoyed this installment!

18 Comments

  1. Jon
    Posted December 4th, 2008 at 3:15am | Permalink

    Again, nice and useful stuff! I was surprised by the fact that non-negative integers are (regarded as) sequences, but it sort of explained why “5 >array” gives me { 0 1 2 3 4 }, which is very handy.

  2. Slava Pestov
    Posted December 4th, 2008 at 4:23am | Permalink

    A minor nitpick I just noticed: a float is not a “decimal approximations of a number”. For example, the decimal number 0.4 has no floating point representation, because in base-2 0.4 is a non-terminating decimal (just like 1/3 is non-terminating in base-10). A float is a number of the form m*2^e where m and e are (binary) integers.

  3. Posted December 4th, 2008 at 8:11am | Permalink

    Great tutorial.
    Do you think you could do a tutorial on using different text editors with factor. The text editor I’d like to use is scintilla on Windows XP.

  4. Jon
    Posted December 4th, 2008 at 9:37am | Permalink

    I wanted to know a bit more on the workings of the retain stack, so I followed your link to the “retain stack combinators”. I then looked up the definition for “dip”, and as that seemed to be only “swap slip”, I looked up “slip”. I was surprised to see that it involved “dip”! What kind of circular magic is this? And where’s the retain stack hiding?

  5. kenmyers
    Posted December 4th, 2008 at 9:42am | Permalink

    I have SO just RSSed this site. Good work.

  6. Andy
    Posted December 4th, 2008 at 10:33am | Permalink

    Nice tutorial, thanks.

    Looking forward to more instalments.

  7. Jeff
    Posted December 4th, 2008 at 11:01am | Permalink

    I’ve just played with Factor a bit, but this post was really, really helpful to understand how to work with the stack in a more manageable way. In the bit of factor coding I’ve done I found that having to maintain a mental image of the stack while coding can get pretty annoying. In a Lisp I think the arguments and let statements make it easier to see right away what you are working with, although it is for sure more verbose. Anyway, great stuff. Thanks again, and I’ll be sure to keep up with the rest of the series.

  8. Posted December 4th, 2008 at 11:29am | Permalink

    Great stuff! Thanks for the tutorials, I’ve played with Factor a bit but haven’t had any luck finding basic tutorials before this.

  9. Doug Coleman (erg)
    Posted December 4th, 2008 at 12:57pm | Permalink

    Stephen,
    We have Scite integration already. USE: editors.scite
    If your scite binary is in \Program Files\wscite\Scite.exe then it will work out of the box. Otherwise, make a file in your home directory (use the factor code home . to see this) and put: USING: namespaces editors.scite ; “my\path\to\scite” \ scite-path set-global

    This is pretty lame, but I can’t find a download for Scintilla on Windows. If Scintilla is actually a text editor that is different from Scite, then make an editors\scintilla that looks like the editors\scite one and submit it to the Factor project.

    Have fun!

  10. Slava Pestov
    Posted December 4th, 2008 at 1:22pm | Permalink

    Jon: the compiler handles both dip and slip specially. Direct calls to either one with literal quotations never result in the definition executing: the definition is only there to handle strange corner cases such as ‘[ 1 2 ] \ dip execute’ or ‘[ 1 ] { 2 } >quotation append dip’.

  11. Posted December 5th, 2008 at 1:40am | Permalink

    “…I found that having to maintain a mental image of the stack while coding can get pretty annoying.”
    – Jeff

    That most definitely gets easier over time with the more Factor code you write. Factor does support lexical bindings via let statements (like Lisp’s) and also locals for named arguments…see local variables and lexical closures for details. I’ll be talking about those in a future installment :-)

  12. Jon
    Posted December 5th, 2008 at 4:13am | Permalink

    Aaron: You said that non-negative integers are sequences, but to me it seems that negative integers may be sequences as well:
    -5 sequence?
    –> t
    -5 length
    –> -5
    Is this a bug?

  13. Jon
    Posted December 5th, 2008 at 4:33am | Permalink

    Aaron: I think your “Beginning Factor” articles should be listed under “Learning Factor” here: http://concatenative.org/wiki/view/Factor
    Or here: http://concatenative.org/wiki/view/Factor/Getting%20started

  14. Posted December 5th, 2008 at 10:18am | Permalink

    “…it seems that negative integers may be sequences as well. Is this a bug?”
    – Jon

    Nope, it’s not a bug…the data type of integers have implemented the minimum requirements for the sequence mixin class, so any integer (negative or positive) will report back that it is a sequence with a length of itself.

    The problem comes from that returned length…-5 is out of bounds for zero-based sequences. It’s not possible to have a negative-length sequence. So if you actually try to use its underlying sequence properties, a negative integer will error out:

    (scratchpad) -5 >array .
    Invalid array size: -5
    Maximum: 576460752303423487
    (scratchpad) 5 >array .
    { 0 1 2 3 4 }
    

    So, you’re right that they are technically sequences, but they are sequences that you can’t do anything with.

  15. Posted December 5th, 2008 at 1:49pm | Permalink

    Thanks Doug. :)

  16. Jon
    Posted December 6th, 2008 at 4:33pm | Permalink

    Can integers be entered in hexadecimal or octal form (as literals)?

  17. Posted December 7th, 2008 at 3:25pm | Permalink

    “Can integers be entered in hexadecimal or octal form (as literals)?”
    – Jon

    Yep…you can enter and print numbers as bin, hex, and oct…they will always be stored on the stack in decimal form, but they can be taken out however you like:

    (scratchpad) BIN: 101010
    
    --- Data Stack:
    42
    (scratchpad) .b
    101010
    (scratchpad) HEX: 2a
    
    --- Data Stack:
    42
    (scratchpad) .h
    2a
    (scratchpad) OCT: 52
    
    --- Data Stack:
    42
    (scratchpad) .o
    52
    

    You can also create them from a string with bin>, hex>, and oct>. See “Convert between numbers and strings” by doing "number-strings" help in the listener.

  18. cadar
    Posted December 8th, 2008 at 6:36pm | Permalink

    Thanks. This was educational.

Post a Comment

Your email address is never published nor shared. Required fields are marked *

*
*