
The Sapphire Programming Language: defining the grammar

Parsing and Lexing
by Dermot Hogan
Monday 24 March 2008.

We’ve now come up with the first formal definition of Sapphire in ANTLR 3. There’ll be several (possibly many) more iterations before we are ready to integrate it into the DLR – but that’s no problem since both ANTLR 3.1 and the DLR itself are still under development (though both are now in beta).

While we’ve referenced Ruby a lot here, I want to make it clear that Sapphire isn’t a fork of Ruby – it’s a completely new language, starting from a clean code base and aimed at the DLR and .NET (and Mono). We don’t have any compatibility baggage or any historical code base to maintain. Sapphire will be a fast, clean, efficient and safe dynamic language. It will be accessible to both those who know Ruby and those from a more traditional C/Java/Delphi background.

We’re using two new tools to build Sapphire: ANTLR 3 and Microsoft’s DLR. In fact, the emergence of these tools over the last few months has moved Sapphire from “wouldn’t it be nice if ...” to “It’s now doable. So let’s do it!”. We’ve been kicking the idea of Sapphire around for a couple of years now but without either of these tools, it just would not be possible in a reasonable timeframe to generate a new dynamic language like Sapphire.

Using ANTLR 3 and the DLR, there is a clear set of steps which will result in Sapphire. These are:

- Define a base ANTLR3 lexer

- Build a C# ‘sub-lexer’ that handles the nasty bits of a Ruby-like syntax. This is actually one of the most difficult bits in the whole business. Without a decent sub-lexer, you’re not going to get very far. We’ve pretty thoroughly debugged the current Ruby In Steel (RiS) sub-lexer over the last year or so of selling RiS, so we’re pretty sure it’s solid.

- Define an ANTLR 3 parser

- Define an ANTLR 3 ‘tree grammar’ that connects the parser to the DLR (this is pure magic in my view! ANTLR 3 fits like a glove onto the DLR here)

- Build the DLR ‘generators’ that actually create the IL code that will be executed by the DLR when you run a Sapphire program.

The key points here are the last two - the tree grammar and the generators. These are ‘declarative’ in nature. That is, you write down what you want to happen – walk the AST and emit some code – and the ANTLR tree grammar and the magic of the DLR handle the rest. Now, I’m simplifying quite a lot here – inheritance, encapsulation and scoping are still pretty knotty problems. And I haven’t even touched on closures. But the point is that with ANTLR 3 handling the ‘front end’ and the DLR handling the ‘back end’, the remaining work is manageable. Not easy, I’ll grant that – but eminently doable.
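To give a feel for what ‘walk the AST and emit some code’ means, here is a toy sketch in plain Ruby. It uses neither ANTLR nor the DLR: Node, Emitter and the instruction names (push, add, mul) are all invented for illustration, and the real tree grammar and generators will look quite different.

```ruby
# Toy illustration only: Node, Emitter and the instruction names are
# hypothetical and are not part of ANTLR 3 or the DLR.
Node = Struct.new(:type, :value, :children)

class Emitter
  attr_reader :code

  def initialize
    @code = []
  end

  # Dispatch on the node type, roughly as a tree grammar matches one rule
  # per kind of node, and append instructions for a simple stack machine.
  def walk(node)
    case node.type
    when :int
      @code << "push #{node.value}"
    when :add, :mul
      node.children.each { |child| walk(child) }
      @code << node.type.to_s
    else
      raise ArgumentError, "unknown node type: #{node.type}"
    end
    self
  end
end

# AST for 1 + 2 * 3
ast = Node.new(:add, nil, [
  Node.new(:int, 1, []),
  Node.new(:mul, nil, [Node.new(:int, 2, []), Node.new(:int, 3, [])])
])

puts Emitter.new.walk(ast).code
```

The walker never decides evaluation order itself: that is already fixed by the shape of the tree, which is why the front end and back end can be kept so separate.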

So what have we got so far? Well, the ANTLR 3 lexer and sub-lexer are more-or-less done (I took them from our main Ruby In Steel product and modified them). The Sapphire ANTLR 3 parser exists and we can produce ASTs from it to test our syntax.

The example here comes from a var block in which the names and expected types of variables may be optionally asserted:

var @a, @b :string;  @a
   @c :int;
   @b :int

Here’s a flavour of what the ANTLR parser grammar looks like (we’ll publish it in full a bit later on).

 : dotoperation arrayref block? -> ^(CALL[$start, "call_7"] dotoperation arrayref block?)
 | dotoperation call_args block?  -> ^(CALL[$start, "call_8"] dotoperation call_args block?)
 | dotoperation block? -> ^(CALL[$start, "call_9"] dotoperation block?)

The point is that this is ‘clean’ – and very much cleaner than Matz’s original yacc syntax. Translating that into ANTLR took me a long time and some very late nights indeed at one point in the original RiS development.

Currently, ANTLR 3 tree grammars are in the next version of ANTLR (3.1 – not yet released). And the DLR is still in beta and I don’t want to commit to anything definitively until the DLR is fully released. So still some way to go, but the foundations are there.

What we’ve removed from Ruby

The first stage of the design has been to determine what we don’t want in Sapphire. Starting from the original Ruby ANTLR 3 specification we’re using in Ruby In Steel, so far we’ve removed:

- BEGIN ... END sections (these occur at the start and end of Ruby programs).

- =begin and =end block comment delimiters. We’ve replaced them with standard C/Java block comments /* ... */

- Modified if, while etc. statements. These are actually easy to implement, but don’t add much to the usefulness of a language. They can also be quite unclear in many circumstances.

- No unless or until. It’s clearer to use a negated if or while.

- No commands. To invoke a method, you must use (...). Again, this is for clarity. A ‘method’ with no brackets is a field accessor.

- Curly brackets are only used for hashes; square brackets are only used for arrays. Currently, the only way to delimit a block is to use do ... end. Again, for clarity (mainly) and safety.

- Operator precedence. In Ruby there are about a dozen different levels of operator precedence. This is far too many. I can only remember the ‘usual’ add/multiply precedence. All the rest I use brackets for. So we’ve kept the add/multiply precedence so that 1 + 2 * 3 gives 7 as you would expect and not 9, but all the rest (|, &, <<, etc.) have the same (lower) precedence.
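For anyone who doesn’t use these Ruby forms every day, the sketch below shows a few of the removed constructs next to the more explicit spelling. This is plain Ruby, not Sapphire, and the variable names are just for illustration. The last pair shows why brackets matter once you go beyond add/multiply.

```ruby
# Plain Ruby, not Sapphire: each pair shows a form from the list above,
# followed by a more explicit spelling of the same thing.
log = []
count = 3

# Statement modifier vs. a conventional if block.
log << "modifier" if count > 2
if count > 2
  log << "block if"
end

# unless vs. a negated if.
log << "unless" unless count.zero?
if !count.zero?
  log << "negated if"
end

# until vs. a negated while.
n = 0
n += 1 until n >= count
m = 0
while !(m >= count)
  m += 1
end

# Curly-bracket block vs. do ... end.
doubled_braces = [1, 2, 3].map { |x| x * 2 }
doubled_do_end = [1, 2, 3].map do |x|
  x * 2
end

# In Ruby, & binds tighter than |, so brackets change the result.
implicit = 4 | 1 & 2      # parsed as 4 | (1 & 2), which is 4
explicit = (4 | 1) & 2    # brackets state the grouping, which gives 0

puts log.inspect
puts [n, m, doubled_braces, doubled_do_end, implicit, explicit].inspect
```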

What we’ve added

So far we’ve added get and set keywords. These work like attr_reader and attr_writer in Ruby and are syntactically similar to Ruby def. Much clearer in my view.
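As a reminder of what attr_reader and attr_writer do in Ruby, here is a small comparison in plain Ruby. The class names are made up for the example, and Sapphire’s own get and set syntax is deliberately not shown since it isn’t published yet.

```ruby
# Plain Ruby for comparison; this is not Sapphire syntax.
class PointShort
  attr_reader :x   # generates a method x that returns @x
  attr_writer :x   # generates a method x= that assigns @x

  def initialize(x)
    @x = x
  end
end

# The same accessors written out by hand with def.
class PointLong
  def initialize(x)
    @x = x
  end

  def x
    @x
  end

  def x=(value)
    @x = value
  end
end

[PointShort, PointLong].each do |klass|
  point = klass.new(1)
  point.x = 5
  puts "#{klass}: #{point.x}"
end
```

Both classes present the same interface to the caller; only the way the accessors are declared differs.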

We’ve also added a var declaration section (which is optional). When used, it allows you to assign types to variables. Whether these are enforced by the DLR runtime is a compiler option.
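One way to picture ‘optional types whose enforcement can be switched on or off’ is the rough Ruby sketch below. Everything in it (the TypedVars class, the enforce flag, declare, set and get) is invented for illustration; it says nothing about how Sapphire or the DLR will actually implement this.

```ruby
# Hypothetical sketch in plain Ruby: TypedVars and its methods are invented
# for illustration and are not Sapphire or DLR features.
class TypedVars
  def initialize(enforce:)
    @enforce = enforce
    @types = {}
    @values = {}
  end

  # Record the expected class for a name; the type is optional.
  def declare(name, type = nil)
    @types[name] = type
  end

  # Check the value against the declared type only when enforcement is on.
  def set(name, value)
    type = @types[name]
    if @enforce && type && !value.is_a?(type)
      raise TypeError, "#{name} expects #{type}, got #{value.class}"
    end
    @values[name] = value
  end

  def get(name)
    @values[name]
  end
end

strict = TypedVars.new(enforce: true)
strict.declare(:c, Integer)
strict.set(:c, 42)          # accepted
begin
  strict.set(:c, "oops")    # rejected because enforcement is on
rescue TypeError => e
  puts e.message
end

lenient = TypedVars.new(enforce: false)
lenient.declare(:c, Integer)
lenient.set(:c, "oops")     # accepted because enforcement is off
```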

Blocks are first class objects. You can assign a block to a variable and invoke it like a method. Similar to Smalltalk blocks, in fact.
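Ruby can already approximate this with proc and lambda, so here is a short plain-Ruby sketch of the idea. It is only an analogy: the variable names are made up and Sapphire’s block syntax may well differ.

```ruby
# Plain Ruby, shown only as an analogy for assigning a block to a variable.
square = proc { |x| x * x }
adder = lambda do |a, b|
  a + b
end

puts square.call(4)      # invoke the stored block explicitly
puts adder.(2, 3)        # .() is shorthand for call

# A block held in a variable can be handed to a method that expects a block.
puts [1, 2, 3].map(&square).inspect
```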

I’ll elaborate on the syntax of get and set and of the optional typing system next week.

  • The Sapphire Programming Language: defining the grammar
    31 March 2008, by ion

    Please, please, please do not make a distinction between a method and an accessor!

    IMHO, it’s one of Ruby’s great features that you can choose for yourself what should be an accessor and what should be a method with your API staying constant. Compare for instance:

    class Distance; FEET_PER_M = 0.3048; attr_accessor :meters; def initialize; @meters = 0.0; end; def feet; @meters*FEET_PER_M; end; def feet= val; @meters = val/FEET_PER_M; val; end end

    class Distance2; FEET_PER_M = 0.3048; attr_accessor :feet; def initialize; @feet = 0.0; end; def meters; @feet/FEET_PER_M; end; def meters= val; @feet = val*FEET_PER_M; val; end end

    >> d = Distance.new; d.feet = 10; d.meters

    => 32.8083989501312

    >> d2 = Distance2.new; d2.feet = 10; d2.meters

    => 32.8083989501312

    Otherwise Sapphire sounds awesome.

    • The Sapphire Programming Language: defining the grammar
      1 April 2008, by Huw Collingbourne

      The distinction we make will be equivalent to that made by Ruby. You will be able to have simple accessors, with a used to get and/or set the @a instance variable. Or you will be able to hide @a and have no visible accessors. Or you will be able to define a method called def dosomethingtoa which will access @a and do some extra processing en route. It’s not the range of possible behaviour that will differ, it’s the way in which it is done. To clarify:

      - No instance variables will be visible outside an object: Ob.@a is not allowed
      - When an instance variable is declared in the public area of a class, it automatically creates matching accessors (equivalent to Ruby’s attr_accessor :a) without requiring explicit accessor declarations
      - When an instance variable is declared in the private area of a class, no accessors are defined
      - An instance variable’s accessors may be ’null’ in which case they will not get or set a variable and they will return this information to the caller
      - Getter and setter methods may optionally be explicitly declared using the get and set keywords instead of the def keyword
      - Setter methods will not require = to be appended
      - You may create a method of any name to access (get or set) an instance variable

      I hope that explains a bit better. I think much of the confusion here has come about because we have not yet explained the interface mechanism to classes in Sapphire or how the privacy of a class’s internals is protected. I should emphasise that our two main aims here are: 1) to implement complete encapsulation with full modularity and data hiding, 2) to make the syntax as lightweight as possible, avoid using methods as ’pseudo-keywords’ and remove the requirement to use ’special characters’ (here = ) when naming methods.

      In short, the range of behaviour possible for methods and accessors in Sapphire is equivalent to that in Ruby but the conventions used are different. Moreover, contrary to what some people seem to believe, our encapsulation will, in fact, be more rigorous than in Ruby. I’ll try to explain this soon. I think once people have understood how fundamental encapsulation is to Sapphire (it is one of the core principles) the rest will ’fall into place’ ;-)

      best wishes


  • The Sapphire Programming Language: defining the grammar
    29 March 2008, by dolzenko

    What we’ve removed from Ruby
    * Modified if, while etc. statements.
    * No unless or until. It’s clearer to use a negated if or while.
    * No commands. To invoke a method, you must use (...).
    * Curly brackets are only used for hashes; square brackets are only used for arrays.

    Nice to see, as those rules long ago became part of my coding style in Ruby.
