Skip to main content
Redhat Developers  Logo
  • Products

    Platforms

    • Red Hat Enterprise Linux
      Red Hat Enterprise Linux Icon
    • Red Hat AI
      Red Hat AI
    • Red Hat OpenShift
      Openshift icon
    • Red Hat Ansible Automation Platform
      Ansible icon
    • View All Red Hat Products

    Featured

    • Red Hat build of OpenJDK
    • Red Hat Developer Hub
    • Red Hat JBoss Enterprise Application Platform
    • Red Hat OpenShift Dev Spaces
    • Red Hat OpenShift Local
    • Red Hat Developer Sandbox

      Try Red Hat products and technologies without setup or configuration fees for 30 days with this shared Openshift and Kubernetes cluster.
    • Try at no cost
  • Technologies

    Featured

    • AI/ML
      AI/ML Icon
    • Linux
      Linux Icon
    • Kubernetes
      Cloud icon
    • Automation
      Automation Icon showing arrows moving in a circle around a gear
    • View All Technologies
    • Programming Languages & Frameworks

      • Java
      • Python
      • JavaScript
    • System Design & Architecture

      • Red Hat architecture and design patterns
      • Microservices
      • Event-Driven Architecture
      • Databases
    • Developer Productivity

      • Developer productivity
      • Developer Tools
      • GitOps
    • Automated Data Processing

      • AI/ML
      • Data Science
      • Apache Kafka on Kubernetes
    • Platform Engineering

      • DevOps
      • DevSecOps
      • Ansible automation for applications and services
    • Secure Development & Architectures

      • Security
      • Secure coding
  • Learn

    Featured

    • Kubernetes & Cloud Native
      Openshift icon
    • Linux
      Rhel icon
    • Automation
      Ansible cloud icon
    • AI/ML
      AI/ML Icon
    • View All Learning Resources

    E-Books

    • GitOps Cookbook
    • Podman in Action
    • Kubernetes Operators
    • The Path to GitOps
    • View All E-books

    Cheat Sheets

    • Linux Commands
    • Bash Commands
    • Git
    • systemd Commands
    • View All Cheat Sheets

    Documentation

    • Product Documentation
    • API Catalog
    • Legacy Documentation
  • Developer Sandbox

    Developer Sandbox

    • Access Red Hat’s products and technologies without setup or configuration, and start developing quicker than ever before with our new, no-cost sandbox environments.
    • Explore Developer Sandbox

    Featured Developer Sandbox activities

    • Get started with your Developer Sandbox
    • OpenShift virtualization and application modernization using the Developer Sandbox
    • Explore all Developer Sandbox activities

    Ready to start developing apps?

    • Try at no cost
  • Blog
  • Events
  • Videos

A Beginners Guide to DSL Writing in Ruby

March 29, 2017
Heiko Rupp
Related products:
Developer Tools

Share:

    In this article, I want to describe how to write a DSL / parser in Ruby with a treetop parser.

    Writing the Grammar and Parser in Ruby first has the advantage of interactivity. Ruby is interpreted and has a very quick startup time.

    To get going I’ll start writing a JSON parser. This may sound strange as there are already JSON parsers out there, but I don’t try to replace them, but rather show the concepts along that line rather simple grammar.

    At the end, I will also give you a peek at a grammar for a more complex DSL for alert trigger creation.

    To get an overview of the grammar, have a look at JSON.org.

    The Ruby parser

    For Ruby, I have chosen a treetop parser, which is a Parsing Expression Grammar (PEG).

    Grammars in it look like this:

    Basic setup of a grammar file.
    grammar Grammarname (1)
      rule FirstRule (2)
       .. projections..
      end
      rule NextRule
       .. projections..
      end
    end
    1 The name of the grammar determines the name of the parser. See below
    2 The first rule is the starting point of the grammar.

    For our Json parser the top of the file json-simple.treetop would look like this:

    # A treetop grammar for JSON
    grammar JSONGrammar
      rule json            (1)
        array / object     (2)
      end
    1 Topmost rule, parsing starts here
    2 Content of the rule used for matching

    A Json object would thus be either an array or an object. As both are not terminal symbols in a simple character or string (like 'this'), more rules are needed to resolve this. We will look at them below.

    Before we continue with the grammar, let’s have a look at how to compile the grammar and use it.

    Compiling the grammar

    Treetop brings a command line tool called tt, which will compile the grammar into Ruby code.

    Compiling the grammar using tt.
    $ tt json-simple.treetop (1)
    $ ls json.rb      (2)
    json.rb
    1 The input grammar file
    2 The resulting Ruby parser

    This worked well, but calling tt during a development cycle is a bit too heavy. Luckily, there is a way to compile grammar files on the fly when you first use them.

    Here you see how a parser driver can load the definition and instantiate the parser.

    json_parser.rb
    require 'treetop'
    class JsonParser
      Treetop.load(File.join(__dir__, 'json-simple.treetop'))
      @@parser = JSONGrammarParser.new
      def self.parse(data, env = {})
        # Pass the data over to the parser instance
        tree = @@parser.parse(data)
        # If the AST is nil then there was an error during parsing
        # we need to report a simple error message to help the user

    The key part is those two lines:

      Treetop.load(File.join(base_path, 'json-simple.treetop'))  (1)
      @@parser = JSONGrammarParser.new    (2)
    1 We are loading the .treetop grammar file
    2 The name of the parser is determined from the grammar

    The name of our parser is determined from the grammar instruction by appending Parser to the grammar name given there.

    Let’s try our grammar.

    Testing the Grammar with irb
    $ irb
    2.3.0 :001 > require_relative 'json_parser' (1)
     => true
    2.3.0 :002 > JsonParser.parse '{}' (2)
    NameError: undefined local variable or method `_nt_array' for #<JSONGrammarParser:0x007f895a9b31c0>
            from (eval):21:in `_nt_json'
    1 Load the parser
    2 Try to parse the empty object

    You see that the parsing fails with an exception about an unknown method _nt_array. If you look at the grammar above you can see that we have said that a json object is either an array or an object but did not define them anywhere.

    While the compiler is not telling us where in our source file it thinks the error is, the message above (which could be less cryptic) tells us that _nt_array is missing from _nt_json or in other words the rule json uses a non-terminal ( _nt ) named array, which is not defined.

    TIP As the grammar is evaluated left to right, it also means that the error reporting is left to right.

     

    In the above case, the missing object declaration is not reported, as the missing array already made the compiler fail.

    Continuing with the grammar

    So let’s continue with the grammar.

    The definition of an array.
      rule array
        '[' space a_value? (space ',' space a_value)* space ']'
      end

    In this example, you find two new things terminals enclosed in single quotes like the opening and
    closing brackets, '[' and ']'. The parser expects such a symbol or keyword literally at this place.

    So for the above, an array needs to start with a [ character and end with ] to be recognized.

    The other new concept you see here are the quantifiers, that you probably already know from
    regular expressions:

    • ? The item may show up zero or one time
    • + The item must show up at least one time (not shown above)
    • * The item may be given any number of times (including zero)

    And last but not least items can be grouped together with parens ( and ).

    So the array has to start with [ has a space rule, zero or one value and then zero or many times a block, which consists of a space, a literal colon, space and a value.

    Now let’s look at the a_value.

    The definition of a_value.
      rule a_value
        object / array / string / number / true / false / null
      end

    Here you see the slash / which means or. A value can thus be an object, an array, a string, a number or one of true, false and null. As treetop is a PEG parser generator, it will evaluate the list from left to right.

    Sometimes you want to have a rule match different options like the above, but where only a part of the matching expression is a choice. In this case, you can put a sub-expression in parens:

    Grouping of sub-expressions.
      rule eval_boolean
        boolean ( 'and' / 'or' ) boolean
      end

    The last excerpt from the grammar is the space.

    The definition of a space.
      rule space
        [\s\n\t\r]*
      end

    Here you see that the space consists of zero or more space, newline or tab characters.

    TIP The full grammar up to this point is available in the file json-simple.treetop.

    Generating values from the input

    Up to this point, we have a grammar that can parse JSON input, but for the parser to be useful we actually want to transform the input into an object that contains nested lists and objects.

    You may have seen above that the parser returns the abstract syntax tree (AST) it has generated:

        tree = @@parser.parse(data)

    We could now walk the tree and determine the values from the walk. But there is a better way to do this in the treetop.

    Individual grammar nodes can provide methods that can be called during the parsing run. There are several options to do this - I will show two.

    Inline in the grammar

    In this case, the function is directly inside the .treetop file. This is nice for a very simple function but otherwise is probably more confusing than helpful. Also, even if it is Ruby code, your IDE is most likely not helping you.

      rule val_in_parens
        '(' string ')'  (1)
        {  (2)
          def value (3)
            elements[1].text_value (4)
          end
        }
      end
    1 The usual matching
    2 The inline function definition is delimited by { and }
    3 The name of the function can be anything with and without parameters.
    4 Evaluation of the current node, see below
      Even if the name and signature of the function is arbitrary, it needs to be known to a
    potential calling node.

    Via Mixin

    Here we externalize the functions into a Module and only provide a pointer to it inside the
    grammar.

      rule val_in_parens
        '(' string ')' <ValueNode> (1)
      end
    1 The match definition gets the name of the Module passed in angle brackets

    We then have a Ruby file that contains the definition:

    parser_extensions.rb
    module ValueNode
      def value
        elements[1].text_value
      end
    end

    You see that this is the same as inside the grammar, but you get all the help from the IDE. It is now needed though to require that additional file from our parser driver.json_parser.rb

    json_parser.rb
    # Our JsonParser
    require_relative 'parser_extensions'
    require 'treetop'
    class JsonParser

    Evaluation from the parser driver

    Now that we have the functions defined, we can start evaluating the input. This is simply done by calling a function on the generated AST:

    json_parser.rb
      def self.parse(data, env = {})
        # Pass the data over to the parser instance
        tree = @@parser.parse(data)
    [...]
        # Compute the hash and return it
        tree.value
      end

    As we are calling the function value we also need to make sure that the topmost grammar node has it defined. Otherwise, an error like this is raised:

      In our case this is not needed, as our json rule only diverts to array or object and thus does not add any value (pun intended).
    2.3.0 :003 > JsonParser.parse '{}'
    NoMethodError: undefined method 'value' for #<Treetop::Runtime::SyntaxNode:0x007fc204068970>

    Now the big question is how can we access the elements in a rule to evaluate them?

    Internally the parsed result is out into a Ruby-array called elements. Data starts at index 0.

    This is a bit tricky sometimes and I have added the element positions below the matches.

      rule kv_pair
         string space? ':' space? a_value
    #     0      1      2     3    4
      end

    To obtain the key (a string) you can thus use elements[0] and elements[4] for the value. Our value function for kv_pair could thus look like this:

      def value (parent)
        parent[elements[0].value.to_sym] = elements[4].value
      end

    This works quite nicely and if you compile the grammar with tt as seen above, you can see that the compiled parser uses the same elements to provide some aliases:

    Generated code in the parser.
      module KvPair0
        def string
          elements[0]
        end
        def a_value
          elements[4]
        end
      end

    It is thus possible to use the aliases in the above for it to look like this:

      def value (parent)
        parent[string.value.to_sym] = a_value.value
      end

    The parser gets a lot less brittle this way. If your rule now prepends a matching character literal,
    the aliases stay the same while the indexes into the elements would have changed.

    Now suppose you need another string in your input. Treetop would then name them string1 and string2. A more elegant solution here is to "tag" those input rules with the desired alias as in,

      rule kv_pair
         key:string space ':' space a_value
      end

    So that you can access the key string as key:

    parser_extensions.rb
    module KVNode
      def value(parent)
        parent[key.value.to_sym] = a_value.value
      end
    end

    Still, the value part of the key-value-pair uses the un-aliased a_node. If you ever decide to
    rename this, you also need to rename the ruby code.

      Treetop v1.6.8 seems to only provide the aliases for non-terminal symbols that are required. For cases where you have optional symbols, you still need to fall back to tags.

    Handling of bad input

    Now sometimes parsing an input fails because the input does not match the defined grammar. The parser does not return an AST in this case, which we can test for and return some debugging
    information to the user:

    json_parser.rb
        # If the AST is nil then there was an error during parsing
        # we need to report a simple error message to help the user
        if(tree.nil?)
          puts "Input : >>|#{data}|<<"
          puts @@parser.terminal_failures.join("\n")
          raise JsonParserException, "Parse error: #{@@parser.failure_reason}"
        end
        # Compute the hash and return it
        tree.value

    Let’s see how this looks in irb.

    Parser error as seen in irb
    >> require_relative 'json_parser'
    => true
    >> JsonParser.parse '{ x'
    Input : >>|{ x|<<
    String matching [\s\n\t] expected.
    String matching '"' expected.
    JsonParserException: Parse error: Expected one of [\s\n\t\r], '"' at line 1, column 3 (byte 3) after {
    	from /Users/hrupp/src/dsl_writing/ruby/json_parser.rb:21:in `parse'
    	from (irb):2

    While working in your own code, you can change the error handling to your liking.

    Traps

    In this section, I want to point out a few traps that one can easily fall into.

    Too much space

    Suppose the following grammar

      rule object
        '{' space kv_pair? space '}'
      end
      rule kv_pair
         space string space ':' space value space
      end

    Parsing or filling of the elements array may fail in unexpected ways. The parser is unable to
    determine to which element the various space characters should be matched. This becomes a bit more obvious when we 'insert' the kv_pair ` declaration into the `object one:

      rule object
        '{' space space string space ':' space value space space '}' <ObjectNode>
    #    0   1     2    3       4     5   6     7     8     9     10
      end

    At positions 1,2 and 8,9, each time the parser is asked to (greedily) consume white space. One could argue that positions 1 and 8 do this and 2 and 9 could be no-ops, but apparently, this is not the case.

    A solution can be to remove the surrounding space in kv_pair:

      rule kv_pair
         string space ':' space value
      end

    Adjacent things that should not be

    Take this grammar snippet:

      rule array
        '[' a_value?']'
      end

    Treetop will fail to compile the grammar, as there is no whitespace between a_value? and ']', so the expression makes no sense to it.

    Forgotten grouping

    In the next snippet, we have two aBool that can be tied together via 'and' or 'or'.

      rule eval_boolean
        aBool 'and' / 'or' aBool
      end

    Treetop will, in this case, try to match aBool 'and' or 'or' aBool, which is most of the time not what we intended. To fix this we need to put the alternatives into a group.

      rule eval_boolean
        aBool ( 'and' / 'or' ) aBool
      end

    Naming things

    Treetop will internally provide aliases for matches as we have seen before. Take this example:

      rule object
        '{' string ':' value '}'

    It will create an alias 'value'. Now when you define a function value on it, you will get
    a silent clash.

      rule object
        '{' string ':' value '}'
        {
           def value

    Best here is to name your functions in a way that you would never name the rules or grammar elements.

    Looking at a DSL for alert trigger creation

    The Hawkular project also has an alerting component that can be used to inform the user when conditions are met. Those could be when certain values are larger than a threshold, when a string is matching or when the availability of a resource goes down. The definition of the alert trigger with its conditions etc. is done via REST-api.

    With HawkFX, a pet project of mine, I have created an explorer for Hawkular. And in this regard, I have also started to add a DSL for trigger definition. I have blogged about it as well.

    I will not show the full code here, but some excerpts to give you some ideas how a more complex grammar may look like. The full code is available in the HawkFX repo on GitHub.

    Some examples

    Trigger for availability, with high severity that is initially disabled
    define trigger "MyTrigger"            (1)
      disabled                            (2)
      severity HIGH                       (3)
      ( availability "mymetric" is DOWN ) (4)
    1 We define a new trigger with the name MyTrigger
    2 It is disabled by at creation time
    3 When it fires an alert, then the alert severity is High
    4 It fires when the availability of mymetric is Down.
     
    Trigger that fires when two conditions are true
    define trigger "MyTrigger"
      AND(                                   (1)
        ( threshold counter "mycount" < 5 )  (2)
        ( string "mymetric" CO "ERROR" )     (3)
      )
    1 The trigger only fires if all conditions are true
    2 The condition is true if the counter-metric 'mycount' is less than 5
    3 This condition is true if the string-metric 'mymetric' contains the string 'ERROR'
    Trigger that disables itself after firing
    define trigger "MyTrigger"
     ( threshold "myvalue" > 3 )
     auto-disable (1)
    1 When this keyword is present, the Trigger will auto-disable after firing.

    The grammar

    Now that we have seen some examples, we can look at the grammar rules.

    Start rule to define a trigger
       rule define
         'define trigger ' space? name:string space? (1)
         disabled:disabled? space?
         sev:severity? space?
         id:id? space? conditions space? act:action? (2)
         ae:auto_enable? space? ad:auto_disable? space?
         <DefineNode>                                (3)
       end
    1 'define trigger' needs to be written as is
    2 We need conditions. Pretty much everything else is optional
    3 Evaluation happens in the DefineNode module

    The rule for disabled is pretty simple and just checks if the keyword 'disabled' is present.

    Rule for disabled
       rule disabled
         'disabled'
       end

    Inside the Ruby code it is queried on the value method for the DefineNode module:

    module DefineNode
      def value(env = {})
        t = Hawkular::Alerts::Trigger.new({})  (1)
        t.enabled = true if disabled.empty?    (2)
        t.name = name.value env
    [..]
    1 Trigger object that we want to prepare
    2 We check if disabled is set.

    Remember from above that disabled is available as a variable for us to use because we set it in the define rule via disabled:disabled?.

    Rules for conditions
      rule conditions
        conds:(and_cond / or_cond / condition)  <ConditionsNode>
      end

    Conditions can be either a single definition, or conditions joined by 'AND' or 'OR'. Next is the rule
    for the 'AND' case:

    Rules for conjunctions
      rule and_cond
        'AND(' space? first:condition more:( space condition )* space? ')' <AndConditionNode>
      end

    Here you see a trick that is often used: we parse the first condition in first:condition and then define the remaining conditions as a list of any number of conditions in the more part.

    In code this looks as follows:

    The AndConditionNode evaluation in Ruby
    module  AndConditionNode
      def value(env = {})
        first.value env                                      (1)
        more.elements.each {|elem| elem.condition.value env} (2)
        env[:trigger].firing_match = :ALL
      end
    end
    1 Evaluate the first condition
    2 Evaluate the additional and optional conditions.

    I am at the end of the example. Again, the full code is available in the HawkFX repository on GitHub.

    References

    Treetop home page http://treetop.rubyforge.org/index.html

    Parsing time ranges in Treetop https://github.com/jcinnamond/treetop-time-range

    Json.org http://json.org/index.html


    Join Red Hat Developers, a developer program for you to learn, share, and code faster – and get access to Red Hat software for your development.  The developer program and software are both free!


    Download Red Hat Enterprise Linux Developer Suite, which includes Red Hat Enterprise Linux 7 server, a collection of development tools, and much more.

    Last updated: February 6, 2024

    Recent Posts

    • How to enable Ansible Lightspeed intelligent assistant

    • Why some agentic AI developers are moving code from Python to Rust

    • Confidential VMs: The core of confidential containers

    • Benchmarking with GuideLLM in air-gapped OpenShift clusters

    • Run Qwen3-Next on vLLM with Red Hat AI: A step-by-step guide

    Red Hat Developers logo LinkedIn YouTube Twitter Facebook

    Products

    • Red Hat Enterprise Linux
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform

    Build

    • Developer Sandbox
    • Developer Tools
    • Interactive Tutorials
    • API Catalog

    Quicklinks

    • Learning Resources
    • E-books
    • Cheat Sheets
    • Blog
    • Events
    • Newsletter

    Communicate

    • About us
    • Contact sales
    • Find a partner
    • Report a website issue
    • Site Status Dashboard
    • Report a security problem

    RED HAT DEVELOPER

    Build here. Go anywhere.

    We serve the builders. The problem solvers who create careers with code.

    Join us if you’re a developer, software engineer, web designer, front-end designer, UX designer, computer scientist, architect, tester, product manager, project manager or team lead.

    Sign me up

    Red Hat legal and privacy links

    • About Red Hat
    • Jobs
    • Events
    • Locations
    • Contact Red Hat
    • Red Hat Blog
    • Inclusion at Red Hat
    • Cool Stuff Store
    • Red Hat Summit
    © 2025 Red Hat

    Red Hat legal and privacy links

    • Privacy statement
    • Terms of use
    • All policies and guidelines
    • Digital accessibility

    Report a website issue