
The devil is in the detail

Syntax coloring/collapsing - the road to IntelliSense
by Dermot Hogan
Thursday 17 August 2006.

We’ve just finished up the release of 0.7.5. As the number indicates, this isn’t a release with a lot of new functionality in it – but that doesn’t mean we haven’t put a lot of work into it. Far from it …

I’ve really concentrated on two things. The first is getting the installation sorted out, with a proper bootstrap program that installs the ProjectAggregator and the Ruby In Steel program as one operation. (This wasn’t done before because the Visual Studio Setup tool currently only allows one msi file, as far as I can see.) The setup is (hopefully) properly localised as well.

The second area is syntax coloring - I’m pretty sure that Steel is currently the best Ruby colorer and outliner around. Believe it or not, the Ruby coloring does not come for free in Visual Studio. There’s a lot of hard work involved.

But that probably sounds a bit odd – why on earth bother about getting some obscure Ruby construct to color properly when I could be working on much more important stuff?

Two reasons:
- First, if the coloring isn’t right it just looks unprofessional. Personally, I find that sort of detail quite important: I like working with good tools.
- And secondly, the coloring is the basis of a sequence that leads to IntelliSense.

It goes like this. To build a good parser, you need an accurate lexer. The lexer is the thing that emits tokens like end and def, while the parser makes sense of entire phrases (like def x; end).
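To make the lexer/parser split concrete, here is a minimal sketch of a toy lexer in Ruby. It is not Steel’s lexer: the `tokenize` method, the `KEYWORDS` list and the token type names are assumptions made up for illustration, and it handles only a tiny subset of Ruby.

```ruby
require "strscan"

# Hypothetical, deliberately incomplete keyword list.
KEYWORDS = %w[def end class module if else].freeze

# Split source text into [type, text] pairs. A parser would consume
# these pairs; the lexer itself knows nothing about phrases.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.skip(/[ \t]+/)
      next # whitespace between tokens is dropped
    elsif (text = scanner.scan(/;|\n/))
      tokens << [:terminator, text]
    elsif (text = scanner.scan(/[A-Za-z_]\w*[?!]?/))
      type = KEYWORDS.include?(text) ? :keyword : :identifier
      tokens << [type, text]
    else
      tokens << [:other, scanner.getch]
    end
  end
  tokens
end

tokenize("def x; end").each { |type, text| puts "#{type}: #{text}" }
```

The lexer reports that def and end are keywords and x is an identifier. Deciding that the four tokens together form a complete (empty) method definition is the parser’s job.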

If you have a precise lexer you can color properly. Unfortunately, this isn’t a trivial undertaking in Ruby. It isn’t just a question of coloring keywords, etc. The lexer has to know quite a bit about what constitutes Ruby. Getting that right has taken some time and effort. The real trick, though, is to keep the lexer separate from the parser – incidentally, something which Ruby’s own lexer/parser (written in C) doesn’t manage to do. However, with a bit of hard work, I’ve got a good lexer that is a separate functional object from the parser: the parser does not poke around in the lexer.

The next step is to build a good AST (abstract syntax tree). And to do this you need an accurate parser. My parser now correctly analyses all of the 2,500 or so Ruby files that come with the main distributions. I’m not saying it’s perfect, but I think it’s pretty close.

And once you’ve got an accurate AST (which describes the program in terms of entities like expressions and complete methods), you can start asking interesting questions, like: what methods can a particular object at a certain location have? To do this you need an ‘inference engine’ that determines the type of a Ruby object - without running Ruby. Again, it’s not at all easy – but I’m getting there. In fact, I’m writing this because I’m having a few minutes off – my first real IntelliSense version of Steel just displayed all the methods of Object when I typed ‘.’ after an identifier, so I’m feeling pretty pleased with myself.

The point is that to get to the inference engine, you need a lexer, parser and AST. All as accurate as you can get them - and as a by-product, you get good coloring! And you also get good collapsing and outlining. For example, you have to be able to distinguish between a ‘{’ for a hash and the same character used for a block, in order to collapse the block, but not the hash.
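As a rough illustration of the hash-versus-block problem, the sketch below guesses what a ‘{’ means by looking only at the nearest non-blank character before it. The `brace_kind` method is hypothetical and the rule is a simplification chosen for this example; it is not how Steel does it, and it will get some cases wrong (a ‘{’ after a keyword such as return, for instance).

```ruby
# Guess whether the "{" at index pos opens a block or a hash literal,
# using only the text immediately before it.
def brace_kind(source, pos)
  before = source[0...pos].rstrip
  return :hash if before.empty?
  # After a name or a closing parenthesis, "{" most likely starts a block.
  # After "=", "(", "," or "[" it most likely starts a hash.
  before.match?(/[\w?!)]\z/) ? :block : :hash
end

["items.each { |i| puts i }", "x = { a: 1 }", "foo(1) { 2 }"].each do |code|
  kind = brace_kind(code, code.index("{"))
  puts "#{code}  =>  #{kind}"
end
```

The cases this one-character lookbehind cannot settle are the reason the lexer has to carry real knowledge of Ruby’s syntax rather than just match patterns.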

PS. I’ve just noticed that defined? doesn’t color correctly. Oh well… ;-)

  • The devil is in the detail
    17 August 2006, by lucas

    Are you using some elegant pattern to parse these things, or is it really just low-level parsing? I have written many tools to convert from language to language and technology to technology (Java to ASP, PHP to Python, XML to etc.). It seems to me that building object graphs and parsing and tagging code for coloring are very similar, so I was curious about your parsing design.

    I would recommend a Visitor pattern for this. It has simplified code I have written over many years, and it seems like it may help make your parsing easier, but hey, what do I know :)

    Thanks for the hard work. I have tried installing Ruby In Steel on my 64-bit Windows machine (Server 2003) and had some problems with the package loading correctly, but I think it is Visual Studio’s problem, not your product’s.

    thanks again

    Lucas

    • The devil is in the detail
      19 August 2006, by Dermot

      Parsing Ruby takes a bit more than pattern matching unfortunately.

      I read somewhere that the Perl grammar was defined "somewhere between smoke, mirrors and the Perl lexer". I think that’s even more the case with Ruby. The main problem is that the grammar just isn’t written down. Even if you look at the Ruby parse.y file (not recommended), something as basic as the operator precedence just isn’t defined.

      I chose a slightly offbeat way of tackling this problem. It’s this: there are 2500 or so Ruby files on my disk which I know compile: they are my experimental ’data’ points, produced in a wide range of styles by a good number of people (of varying ability). So my hunch is that these experimental data points are a good ’cover’ for Ruby. The next step I adopted was to produce a ’theory’ to account for these data points. This is my Ruby grammar.

      I then tested my ’theory’ against the ’data’ and found which bits of the ’theory’ didn’t work. I then adjusted the ’theory’ until it matched all the ’data’, that is my parser/lexer successfully parsed all 2500 files. Hard work, but well worth it.
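The ‘theory versus data’ loop described above can be sketched as a small test harness: run a parser over every Ruby file under a directory and list the ones it rejects. The sketch below uses Ripper, the parser interface bundled with modern Ruby, purely as a stand-in for the ANTLR-based parser discussed here; `failing_files` is a hypothetical helper name.

```ruby
require "ripper"
require "tmpdir"

# Return the paths under root whose source the parser rejects.
# Ripper.sexp returns nil when it hits a syntax error.
def failing_files(root)
  Dir.glob(File.join(root, "**", "*.rb")).sort.reject do |path|
    Ripper.sexp(File.read(path))
  end
end

# Tiny demonstration corpus: one valid file and one broken one.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "good.rb"), "def x; end\n")
  File.write(File.join(dir, "bad.rb"), "def x(\n")
  failures = failing_files(dir).map { |path| File.basename(path) }
  puts "Files that did not parse: #{failures.inspect}"
end
```

Each file in the failure list is a data point the grammar does not yet explain. Fix the grammar, run the harness again, and repeat until the list is empty.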

      The parser I’ve got as a result is a very clean (and I think elegant, but I’m biased) LL(2) recursive descent one implemented in ANTLR. I’m not saying it’s perfect, but I don’t think there are any major flaws. I’ll fix up my ’theory’ as data comes in that it doesn’t match. But I haven’t found any flaws for a bit now, and I’ve got to the point where I trust the parser a good bit more than the heuristic inference engine that I use to deduce what Ruby is up to.

      Most of the heuristics aren’t pattern matching so much as Sherlock Holmes.

      Btw, Ruby In Steel should work on a server, so maybe there’s something wrong with my installation code. If you want to pursue this via the support contact page, I’ll be happy to help.

  • The devil is in the detail
    17 August 2006, by Justin

    A couple of comments. First, I downloaded Ruby In Steel a month or two ago and am very happy with the color coding. Once you have gotten used to syntax coloring, it’s hard to give it up. My comments:

    1) Are you saying you’ve fixed the "heredoc" bug, where the ending ’tag’ cannot be tabbed over? That would be awesome.

    2) Regarding parsing/lexing for IntelliSense - I don’t know that you can really do what you want without actually running Ruby. Methods can appear at almost any time. The simplest example is attr_accessor. Of course, you can special-case it to "fake" the methods created in that way - but it doesn’t really get you that far. I’ve thought about this some, and I think the only way to really know what methods are available is to actually run Ruby. Of course, the immediate problem is how to keep Ruby from damaging your system (hitting the database, deleting files, etc.). Until recently there didn’t seem to be a solution, but _why has recently released his Sandbox project, which actually allows changes to be isolated. It’s something you may want to look into.
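To illustrate the attr_accessor special case mentioned above, here is a minimal sketch that "fakes" the generated methods by scanning source text instead of running it. The `declared_accessors` helper is hypothetical and only understands plain symbol arguments on a single line.

```ruby
# Collect the method names that attr_* declarations would create,
# by scanning the source text rather than executing it.
def declared_accessors(source)
  methods = []
  source.scan(/^\s*attr_(accessor|reader|writer)\s+(.+)$/) do |kind, args|
    args.scan(/:(\w+)/).flatten.each do |name|
      methods << name unless kind == "writer"       # reader method
      methods << "#{name}=" unless kind == "reader" # writer method
    end
  end
  methods
end

source = "class Person\n  attr_accessor :name, :age\n  attr_reader :id\nend\n"
puts declared_accessors(source).inspect
```

This covers the declarative cases only. Methods created through define_method or method_missing stay invisible to a scan like this, which is the limitation the comment points at.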

    Good luck with the project, it looks great so far. Please continue to provide support for pure Ruby development too - not just Rails-oriented Ruby.

    • The devil is in the detail
      17 August 2006, by Dermot

      1) heredocs. Yes, I think I have (I’d be interested to know if I haven’t anyway!). There is one limitation: nested heredocs. These are a real pain in the butt to handle and while I can do it, I’ve got better things to do with my life. If anyone really, really needs these I’ll fix it.

      The big problem with heredocs is that you can have something like this:

      x(s<<EOH, y, z)
      blah
      EOH

      Only the blah is in the heredoc. Sorting that out in an LL(2) parser is painful.

      I’ve even managed to sort out crap like this

      x<<<<EOH
      blah
      EOH

      (yes, I found a real example in the Ruby distribution).
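For readers who want to try the two shapes above, here is a small runnable sketch. The method `x`, its three parameters and the variable `buf` are made up for the demonstration, and the first call is written with the heredoc opener standing alone as the first argument, which is an assumption about the intended form.

```ruby
# Hypothetical method that just returns its arguments.
def x(s, y, z)
  [s, y, z]
end

# The heredoc body starts on the NEXT line; ", 1, 2)" is ordinary code
# that the lexer must finish before it switches to reading the body.
args = x(<<EOH, 1, 2)
blah
EOH

# "<<" used as the append operator, followed directly by a heredoc opener.
buf = String.new
buf<<<<EOH
blah
EOH

puts args.inspect
puts buf.inspect
```

In both cases the string that reaches the program is just the line blah plus its newline. Everything else on the opening line is normal Ruby that has to be tokenized as usual.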

      The one problem area is that Visual Studio doesn’t color properly if you change the end of the heredoc. So if you change the final EOH into EOT VS doesn’t recognise the fact. There’s a complicated internal VS technical reason for this (not connected to Ruby coloring). But I’m not planning to fix that right now.

      2) It is tricky to do IntelliSense without running Ruby, but you can actually do a lot. You can’t do everything and it’s certainly going to be possible to fool the IntelliSense system by writing ’clever’ code. But most Ruby code is straightforward (and if it isn’t, IntelliSense is the least of your problems, believe me). For example with

      a = Array.new

      I know that a is now an Array. That’s reasonably easy, but I can do similar and also much more complicated analysis all the way through. The one real problem is defs. A priori, I have no idea what a def returns. I’m trying out a few ideas on that right now.
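A minimal sketch of the easy case: infer a variable’s class from the text of its assignment, then list that class’s methods as completion candidates. Both helpers (`infer_types`, `completions`) are hypothetical, work line by line with regular expressions rather than on an AST, and look the class up in the interpreter running the tool, not in the user’s program.

```ruby
# Map each local variable to a class name, based only on
# "name = Const.new" assignments and two literal forms.
# The analysed source is never executed.
def infer_types(source)
  types = {}
  source.each_line do |line|
    case line
    when /^\s*([a-z_]\w*)\s*=\s*([A-Z]\w*)\.new\b/ then types[$1] = $2
    when /^\s*([a-z_]\w*)\s*=\s*\[/                then types[$1] = "Array"
    when /^\s*([a-z_]\w*)\s*=\s*"/                 then types[$1] = "String"
    end
  end
  types
end

# Completion candidates: public instance methods of the inferred class.
def completions(source, variable)
  class_name = infer_types(source)[variable]
  return [] unless class_name && Object.const_defined?(class_name)
  Object.const_get(class_name).instance_methods.map(&:to_s).sort
end

code = "a = Array.new\ns = \"hello\"\n"
puts infer_types(code).inspect
puts completions(code, "a").include?("push")
```

The hard part noted above stays hard in this sketch too: for something like `b = some_method(a)`, nothing in the assignment text says what the def returns.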

      The thing is, though, once you have a good symbol table you can do an awful lot: cross-referencing, checking for dangerous constructs, class browsers. Things I haven’t even thought of yet.

      3) The main thrust will be Ruby. I really think that if you have a good Ruby environment, Rails follows naturally.

  • The devil is in the detail
    17 August 2006, by Drazen Dotlic

    Hi,

    could you please post a screenshot that demonstrates all the constructs that version 0.7.5 should support and color correctly.

    On my machine, after uninstalling 0.7.0xx and installing 0.7.5 there is no noticeable difference in syntax coloring. It looks quite... simple. I think the only thing I’ve noticed is that now the braces are also colored, but that’s all.

    I’d expect at least that class names and/or module names are colored differently. Plus, you can essentially color instance and static members "for free" - they have "@" and "@@" at the beginning of their name, so no need for full AST. The hardest thing should be coloring methods properly, but if you have rough intellisense working, you’ve solved this already (bravo!).
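A sketch of the ‘for free’ classification suggested above: a name’s leading characters are enough to tell instance variables from class variables (what the comment calls static members), with no parse tree involved. The `variable_kind` method and its category names are assumptions for illustration.

```ruby
# Classify a name by its leading sigil or capital letter.
def variable_kind(name)
  case name
  when /\A@@[A-Za-z_]\w*\z/ then :class_variable
  when /\A@[A-Za-z_]\w*\z/  then :instance_variable
  when /\A\$[A-Za-z_]\w*\z/ then :global_variable
  when /\A[A-Z]\w*\z/       then :constant
  else :other
  end
end

%w[@name @@count $stdout Array foo].each do |name|
  puts "#{name}: #{variable_kind(name)}"
end
```

The catch is knowing where to apply it. An @ inside a string or a comment must not be colored as a variable, so even this simple rule depends on the lexer getting its context right.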

    • The devil is in the detail
      17 August 2006, by Dermot

      I didn’t keep track of all the constructs that I had problems with. I just spent night after night hammering away at parsing all the Ruby code I could find. I’ll see if I can construct a ’nasty’ piece of code that does illustrate some of the weird things that Ruby programmers write.

      But for example, code in a dynamic string is very tricky to handle. You have to know that you are in a dynamic string so that a } in a dynamic string doesn’t screw up the outlining and parsing (all the braces have to match). You also have to be aware of all the substitution rules that apply in such strings, otherwise you go wrong. From what I remember, SciTE doesn’t do this properly.
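The brace problem in dynamic strings can be seen with Ripper, the lexer interface bundled with modern Ruby (used here only as a convenient demonstration, not as the lexer described in this thread). In the sketch below, `token_counts` is a hypothetical helper that tallies token types for a piece of source.

```ruby
require "ripper"

# Count how often each token type occurs in the given source.
def token_counts(source)
  Ripper.lex(source).each_with_object(Hash.new(0)) do |(_pos, type, _text), counts|
    counts[type] += 1
  end
end

# The string contains a literal "}" and an interpolation "#{x}".
counts = token_counts('s = "a } b #{x}"')

puts "brace tokens:           #{counts[:on_rbrace]}"
puts "interpolation closers:  #{counts[:on_embexpr_end]}"
puts "string content tokens:  #{counts[:on_tstring_content]}"
```

A lexer that tracks string state reports the literal } as string content and the } that closes the interpolation as its own token type, so neither unbalances the braces used for outlining. A naive character count would see two stray closing braces.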

      It’s quite easy to add extra coloring for things like module names, instance variables and so on. I’ll add some extra color options in the next release. Up till now, I’ve really just stuck with the standard (boring) VS colors.

      The real point is though, that I didn’t spend all that time getting Ruby to color properly JUST because I was aesthetically offended. I did it so that I could get the parsing exactly right. As in any respectable software project, there is a BIG difference between being 95% there and 100% done.
