9.5.解析器逻辑

\(9.5.\)Parser Logic

1.Parsing logic

  In order to carry out parsing, we are going to build a CompilationEngine class which consist of a set of methods, one method for every non-terminal rule.

  • The code of each compilexxx method follows the right-hand side of the rule xxx
  • Each compilexxx method is responsible for advancing and handling its own part of the input.

  To illustrate this, let's focus on one of these methods compileWhileStatement:

  1. We follow the right-hand side of the rule, and parse the input it dictates.

    • In whileStatement, it says we should expect to see the token while. If we find it in the input, we record the fact. Then the left parenthesis, and so on and so forth,
    • At some point we will get a non-terminal rule like expression, then we go to next step:
  2. If the right-hand side specifies a non-terminal rule xxx, we call the corresponding method compilexxx

  3. We do this recursively untile we exhaust the input.

  And to initialize this process, we:

  1. Advance the tokenizer.
  2. Call compileWhileStatement.

2.Parser design

  We also take the compileWhileStatement as an example:

  1. If we look at the grammar, we first see that you should see a while , so we call a method eat('while'), which means you should expect now to eat the while token.
  2. We do the same thing eat('(').
  3. We meet a non-terminal rule expression, so we call compileExpression();
  4. And so on and so forth...
1
2
3
4
5
6
7
8
9
10
class CompilationEngine { 

compileWhileStatement() {
eat('while’);
eat('(');
compileExpression();
eat(')');
...
}
}

  When it comes to eat method:

  • It's a private method.
  • It says you should expect to see the given string, if not, you should throw an error.
  • Otherwise, simply advance the input.
1
2
3
4
5
6
eat(string) {
if (currentToken <> string)
error...
else
advance...
}

3.Some observation about grammars and parsing

  • LL grammar: It's a grammar that can be parsed by a recursive descent algorithm parser without backtracking.

    • When you make a decision that what you have is a while statement, you don't have to kind of retract your progress and find out a mistake.
  • LL(k) parser: It's an LL parser that needs to look ahead at most k tokens before you know which rule you are dealing with.

    • The grammar we saw so far is LL1, because once we have a particular token in hand like let, we immediately know which rule we have to invoke.
    • There are also things that don't belong to LL1, like English. For example, when we meet a token word "lift", we still have no idea what we should do with it. A verb? A noun? We have to look more tokens downstream to know which rule to use. In this process, we may need to backtrack and reconsider some of the rules we have to use...