Pharo Smalltalk as a DSL Without a DSL

Nov 4, 2017

If you have ever written a DSL in Eclipse, for example, you know that even the base projects Eclipse creates are somewhat daunting to deal with at first. If creating a DSL ‘facility’ before the actual DSL seems a bit overweight, you might be interested in an environment without that overhead. The following is a simple (and, honestly, rather easy to write up) example of an environment that needs no such facility, since its code is already written the way a DSL tends to be written.

Imagine you have an application you’ve been asked to add scripting capability to. The application stores documents in a JSON document format which includes formatting information. Those documents must be exposed, formatted, through a scripting interface so that a remote document handling process can use them. Thinking about it, you assume there’s a class called ‘Document’ that maps to the JSON documents stored in, say, UnQLite, to use a simple NoSQL example.

You know that people writing scripts are going to want various specific output formats: HTML, XHTML, PDF and so on. The common way of extending a class like Document in Java and most other object languages would be via inheritance and method overrides, which would give you something like a Document base class with one subclass per output format, each overriding the rendering behaviour.

You look at this, and the initial problem is that you’re creating specific objects by overriding a generic object. You could try casting based on some input parameter, but the proliferation of output formats starts to make the object model messy. You could alternatively implement polymorphic methods in the base class, but that would require an additional method per format, and the more formats you need to support, the messier the base class becomes.

But your task isn’t to clean up whatever object hierarchy exists; in fact, you haven’t even looked at the actual class hierarchy yet. You have to make dealing with whatever classes are there as simple as possible, and you decide a simple noun -> verb -> modifier style API would be ideal.

So, you decide to start from the API, and then figure out the implementation from the way you want the API to behave.

The obvious noun is the reference to the document. If you already have an implicit search that can find a document by the name assigned to the reference, and you have a good naming convention, you can simply use the document name as the noun.

'Letter to Andrew Glynn - 10-10-2017'

Ok, so far so good. It’s a clean way of referring to the document: those writing the scripts know the convention and, in any case, have some way of looking up a document from a searchable list.

Next, you need a verb. You’re not necessarily printing the document or displaying it, just outputting it formatted, likely to a file or some other stream. ‘Render’ is a generic verb that implies formatting but not much else, so you decide to use it. But ‘render’ by itself doesn’t specify the format wanted, so you need a modifier. You can use some sort of symbolic name as a modifier to indicate the format. Since you want to make it clear in the API that something is needed beyond render, you put a colon after the verb. The verb alone is not transitive enough to be immediately understandable once something follows it, so you modify it to ‘renderOn:’.

Great, now you have a good way to create the API. You can just put a symbolic name after the colon that tells the actual code what the scripter’s desired format is, such as

'Letter to Janelle Klein - 10-10-2017' renderOn: #pdf.

To make it generic, in API style, you replace the literal with a variable, such as:

thisDocument renderOn: #pdf.

That looks easy to write, and its meaning is obvious for anyone revising the script.

A minor difficulty is that since a symbolic name is completely arbitrary, the scripting tool you plan to build has no way of checking which names are valid, unless you stick them in an enum. Worse, every way you think of implementing it is as messy as your initial design thoughts, or messier.

This is where a DSL framework becomes useful. You can write the object code as cleanly as possible, and then use the DSL framework as an abstraction to simplify usage for the scripting environment.

Although you were assuming the existing code was Java, you now find out that it is in fact written in Smalltalk. Since Smalltalk is an object language, it can’t be that different, even though you’re not specifically familiar with it, so you decide to go and see how you could implement a DSL in Smalltalk for the existing code. You look up Smalltalk DSL libraries and don’t find anything, which is a bit frustrating but not surprising given how niche the language is. You do find good parsers and emitters, though, so all the building blocks for a DSL are there; it’s just a bit more work to write one. Armed with this information, you dive into the environment to see how it does things.

However, in Smalltalk, the equivalent command, using only the objects in the language and base libraries, is this:

thisDocument renderOn: PDF.

The only difference between your posited DSL format and the actual Smalltalk code format is that, since it lacks the hash sign, ‘PDF’ is obviously not a symbolic name. You figure Smalltalk must be a 4th gen language, or at least act like one, rather than a 3rd gen language like Java, and that the real code is somewhere ‘under the hood’.

Since Smalltalk only has messages and objects, and renderOn: is the message, PDF must be an object.

You use Spotter and search for PDF, and you find it’s in the Artefact PDF library, and is in fact a class.

This seems a bit odd, given that most object languages represent classes as templates or something similar, so you decide to look at the Document class hierarchy. Even more oddly, in Spotter you find Document and a few related classes specific to certain types, such as DTDDocumentValidator, but no Document hierarchy: Document has no subclasses at all.

So, you go to look at the Document class. Given your Java background, you immediately go to the methods list, looking for renderOn:
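That a class can stand where an argument is expected is less odd than it first looks, because in Smalltalk a class is itself an object. A minimal sketch for a Pharo Playground follows; OrderedCollection is used only because it exists in every image, and the variable name target is arbitrary.

```smalltalk
"Evaluate in a Pharo Playground, line by line, with 'Print it'.
 A class is an ordinary object: it can be stored in a variable,
 passed as an argument and sent messages like anything else."
| target |
target := OrderedCollection.
target isClass.                 "true"
target class.                   "OrderedCollection class, its metaclass"
target new add: 3; yourself.    "an OrderedCollection(3)"
```

Nothing here is special syntax for classes; the same message sends would work on any other object that understood them.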

But there’s no method by that name.

At this point you’re starting to get annoyed, and you have the same reaction most people who are not used to Smalltalk have initially, “where the f** is the code?”

Then you remember that messages are not the same as methods, so renderOn: must be a message. Since few languages have messages, you figure you need to understand what they do and why Smalltalk bothers with them. After all, why not just call the method?

But the Document class has no renderOn: message either. At this point your annoyance gets magnified, “where the F***** is the GD code???”
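The distinction between a class defining a method and its instances understanding a message can be checked in code rather than by browsing. The following is a small sketch for a Pharo Playground; Point and #printString are stand-ins chosen because they exist in every image, not anything from the application described here.

```smalltalk
"Evaluate line by line in a Pharo Playground with 'Print it'.
 Point is only a stand-in for Document."

"Does the class itself define a method for this selector?"
Point includesSelector: #printString.

"Do its instances respond to the message, inherited methods included?"
Point canUnderstand: #printString.

"A message is a selector plus arguments, looked up in the receiver's
 class at run time; perform:with: makes that lookup explicit."
3 perform: #+ with: 4.    "7"
```

The first two answers differ whenever the method is inherited rather than defined in the class itself, which is one reason a method can seem to be missing from the class you are looking at.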

You remember that Spotter can find messages and methods, as well as classes and objects, so you calm back down a bit and put renderOn: into Spotter.

It comes up with a few classes that do have that message. Since most of them are graphics classes, and they’re all in the graphics library, you pick the one that seems most likely to be relevant, in a class called AthensTextLine (Athens is the graphics subsystem), and find the following implementation:

renderOn: aTextRenderer
    commands ifNil: [ ^ self ].
    commands do: [ :cmd |
        cmd renderOn: aTextRenderer ].

At this point your annoyance is at boiling level. You’ve found the code, but it barely does anything, and what it does isn’t immediately obvious in its specifics, though the intent is obvious enough.

So, where is the f*** code?

What you’ve run into is the fact that Smalltalk is very decomposed and, as a result, requires neither the convoluted object hierarchies nor the long, complex methods common in other object languages.

Each element of a Document is an instance of the DocumentElement class, which is a subclass of AthensTextLine. To handle a message like renderOn: it has a method specific to a formatted text line in a Document. Since all class objects are always live, PDF is a reference to the class object for the PDF class.

The implication is that the PDF class object, which arrives in the method as aTextRenderer, knows how to render a given DocumentElement the way the PDF format requires. It does so via commands that operate on the elements of the concrete instance, thisDocument, but that use the full graphical formatting system in order to accomplish it in a consistent manner.
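The shape of that collaboration can be sketched roughly as follows. This is not the code of the application or of the Artefact library: SketchDocument, SketchElement, SketchHTML, the selector renderElement: and the accessor text are all hypothetical names, and the methods are shown in the usual Class >> selector notation, to be entered through the System Browser rather than pasted into a Playground.

```smalltalk
"Hypothetical sketch only; none of these classes exist in Pharo."

SketchDocument >> renderOn: aRenderer
    "The document knows nothing about output formats.
     It simply forwards the message to each of its elements."
    elements do: [ :each | each renderOn: aRenderer ]

SketchElement >> renderOn: aRenderer
    "Hand this element to the renderer, which decides how an
     element of this kind looks in its own format."
    ^ aRenderer renderElement: self

SketchHTML class >> renderElement: anElement
    "Class-side method, so the class object SketchHTML can be
     passed directly as the argument of renderOn:."
    ^ '<p>', anElement text, '</p>'
```

Under these assumptions, supporting another format means adding one more class that answers renderElement:, while SketchDocument and SketchElement stay untouched.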

Because the code is written this way, there’s no Document type hierarchy needed, nor any polymorphic methods (Smalltalk has polymorphic classes, but the implications of that are very different). By using reflection, which is built into the Object class and therefore into every actual object, a single Document class can know how to present itself in the desired format, using the full graphical formatting library. If, rather than PDF, we want an XML or HTML rendering of the JSON, we can simply do this:

thisDocument renderOn: XML

and

thisDocument renderOn: HTML

Since all of Smalltalk works in a similar way, and messages are arbitrary (if an object doesn’t understand a message, it raises a ‘Message Not Understood’ error as soon as the message is sent, and execution halts at that point), there’s no need for a separate facility to create simple, tight abstractions to make development easier.
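What happens when a message is not understood can be observed directly. The snippet below is a minimal sketch for a Pharo Playground; #renderSketchOn: is a made-up selector that nothing implements, and it is sent with perform:with: so that the example makes no assumption about how the editor treats unknown selectors.

```smalltalk
"Evaluate in a Pharo Playground with 'Print it'.
 #renderSketchOn: is a made-up selector that nothing implements."
[ Object new perform: #renderSketchOn: with: Object ]
    on: MessageNotUnderstood
    do: [ :error | error message selector ]    "#renderSketchOn:"
```

The error carries the original message, so the handler can inspect the selector and arguments that were sent.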

Thus, Smalltalk code tends to work and look like a DSL without needing any further facility to do so, which is part of the reason it is assumed to be ‘higher level’ than Java. At the same time, since the VM, interpreter and JIT compiler are themselves written in Smalltalk, it is also, strictly speaking, a lower level language than Java.


Written by Andrew Glynn
