Ruby LL(2) parser

I’ve finally got an LL(2) parser built for Ruby. It’s taken me about a month, starting from knowing zilch about parsing and less about Ruby.

I had a couple of false starts. I started off by using Flex to build a component for Visual Studio colourising, but decided against using Bison and the existing YACC Ruby module since it tied me into using C/C++ for the syntax analysis; that wasn’t a direction I wanted to go.

I then tried a free C# YACC toolset, which had a number of problems: it didn’t produce the same grammar as Bison and was slow to compile. I also learnt the hard way how much grief debugging a YACC-style grammar can cause. State tables might be fast, but finding out what’s wrong is a nightmare.

Then I came across ANTLR. Magic! Absolute magic!! It produces easy-to-debug code – you can see what’s going on – and it has intelligible diagnostic messages. I love Terence Parr’s sense of humour too – “Why program by hand in five days what you can spend five years of your life automating?”. A man after my own heart.

Ruby isn’t easy to fit into an LL(2) scheme, but it can be done - with one exception that I’ve come across so far. I’ve managed to parse nearly all the Ruby test modules so things are looking good. The next step is to bolt it into the Visual Studio/Steel IDE.
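For anyone wondering what the “2” in LL(2) buys you, here is a toy sketch in Python. It is not the Ruby grammar described in this post: the two-rule grammar, the token names and the `Parser` class are all made up for illustration. Both alternatives of `statement` begin with an identifier, so one token of lookahead cannot choose between them; peeking at the second token can.

```python
# Toy illustration only: a hand-written parser with two tokens of lookahead.
# The grammar, token names and class below are hypothetical and are not
# taken from the Ruby grammar described in this post.
#
#   statement := ID '=' atom      (assignment)
#              | ID atom*         (command-style call)
#   atom      := NUM | ID
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")


def tokenize(text):
    tokens = []
    for number, ident, op in TOKEN_RE.findall(text):
        if number:
            tokens.append(("NUM", number))
        elif ident:
            tokens.append(("ID", ident))
        elif op.strip():  # skip stray whitespace
            tokens.append(("OP", op))
    tokens.append(("EOF", ""))
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def la(self, k):
        """Return the k-th lookahead token (1-based), clamped to EOF."""
        index = min(self.pos + k - 1, len(self.tokens) - 1)
        return self.tokens[index]

    def match(self, kind, value=None):
        tok = self.la(1)
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {kind} {value or ''}, got {tok}")
        self.pos += 1
        return tok

    def statement(self):
        # Both alternatives start with ID, so LA(1) alone cannot choose.
        # LA(2) settles it: '=' means assignment, anything else is a call.
        if self.la(1)[0] == "ID" and self.la(2) == ("OP", "="):
            name = self.match("ID")[1]
            self.match("OP", "=")
            return ("assign", name, self.atom())
        return self.call()

    def call(self):
        name = self.match("ID")[1]
        args = []
        while self.la(1)[0] in ("NUM", "ID"):
            args.append(self.atom())
        return ("call", name, args)

    def atom(self):
        kind, value = self.la(1)
        if kind == "NUM":
            self.pos += 1
            return ("num", int(value))
        if kind == "ID":
            self.pos += 1
            return ("var", value)
        raise SyntaxError(f"expected a number or name, got {(kind, value)}")
```

Calling `Parser("x = 1").statement()` takes the assignment branch, while `Parser("puts x 2").statement()` falls through to the call branch. Real Ruby has many more cases where two tokens are still not enough, which is where predicates come in.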

Parsing is quite fast for the most part. The main slowdowns come where I’ve had to use predicates to resolve ambiguous Ruby syntax. Syntax-wise, Ruby simply sucks.
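As a rough idea of what a predicate does, consider one well-known Ruby ambiguity: `x -1` is a subtraction if `x` is a local variable, but a method call with the argument `-1` if it is not. Whether this is one of the cases handled in the grammar above is an assumption on my part; it is simply a convenient example. The Python sketch below is hypothetical throughout (the `parse_program` function, the tuple shapes and the whitespace-based tokenising are all invented), and it shows the parser consulting context before committing to an alternative.

```python
# Toy illustration only: a semantic predicate choosing between two readings
# of `name -1`. Everything here is hypothetical; it is not the grammar
# described in this post, and it handles just two statement shapes.

def parse_program(lines):
    local_vars = set()  # names assigned so far; the predicate consults this
    tree = []
    for line in lines:
        tokens = line.split()
        if len(tokens) == 3 and tokens[1] == "=":
            # `x = 5` makes x a known local variable from here on.
            local_vars.add(tokens[0])
            tree.append(("assign", tokens[0], int(tokens[2])))
        elif len(tokens) == 2 and tokens[1].startswith("-"):
            name, operand = tokens[0], int(tokens[1][1:])
            # Semantic predicate: the tokens alone are ambiguous, so the
            # choice depends on whether the name is a known local.
            if name in local_vars:
                tree.append(("subtract", name, operand))
            else:
                tree.append(("call", name, [-operand]))
        else:
            raise SyntaxError(f"unsupported statement: {line!r}")
    return tree
```

For example, `parse_program(["x = 5", "x -1", "foo -1"])` treats the second line as a subtraction and the third as a call. The extra check runs every time the parser reaches that decision point, which is one reason predicate-heavy rules tend to cost more than plain lookahead.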

One Response to “Ruby LL(2) parser”

  1. yawl Says:

    Good work. You may be interested to know that I have a Ruby parser available using ANTLR as well, which is able to parse all of Ruby’s stable snapshot and Ruby on Rails:
    http://seclib.blogspot.com/2006/04/compile-time-type-inference-for-ruby.html
