User:OrenBochman/ParserNG/antlr

The preprocessor is tricky business to code.

It has recursive rules a complicated (for parsing) format.
double curly expressions can have different semantics
triple curlies too
single curlies also on occasion.

next follow some advanced Antlr specific features illustrated by things they could help to fix.

Recursive Lexer / Parser Rules

When creating a recursive expresions definition, the lowest nesting level is the lowest precedence of the operator matched in that rule.

Asociativity

expr: mult ('+' mult)* ; // left-associative via (...)*
mult: pow ('*' pow)* ;
pow : atom ('^' pow)? ; // right-associative via tail recursion
atom: ID| INT ;

In The PreProcessor

thus:


grammar:PreProcessor;

members
{
String myName;
boolean isTemplate(String name){...}
String getTemplate(String name, List<String> args){...}
boolean isTemplateArg(String name){...}
String getTemplateArg(String name){...}
boolean isMagicWord(String name){...}
String getMagicWord(String name, List<String> args){...}
boolean isParserFunction(String name){...}
String getParserFunction(String name, List<String> args){...}
}
wikiExpr: curlyExpr;
curlyExpr: parserFunction| magicWord | templateVar | template ;
angleExpr: ext | nowiki | noInclude | includeOnly | comment | pre ;
template: {isTemplate(input.LT(2).text)}?=>'{{' NAME TEMPLATE_ARGS | curlyExpr | /* empty */ '}}'; // right-associative via tail recursion
templateVar: {isTemplate(input.LT(2).text)}?=>'{{{' NAME | /* empty */ '}}}'; // right-associative via tail recursion
parserFunction : {isParserFunction(input.LT(2).text)}?=>'{{' NAME PF_ARRGS '}}'; //curly atom
magicWord : {isMagicWord(input.LT(2).text)}?=>'{{' NAME MW_ARGS '}}'; //curly atom
//lexer rules
PF_ARRGS: ... ;
MW_ARGS: ... ;
TEMPLATE_ARGS: ... ;

Semantic and Syntactic Predicates

two advanced ideas are

syntactic predicates

parts of a template look like:

parts: 
 title=LITERAL (PIPE (args=LITERAL)*)*;

semantic predicates

Using Semantics To Simplify Syntax Parsing

does {{LITERAL| ... }} reffer to a parser function, magic word or a template ?

since the parser should be privy to a symbol table of parser_functions, magic_words, templates, variables, etc they can be used to simplify (read: speed up parsing using a simple semantic check (isMagicWord) instead of a deep syntax scan.) in Antlr speak lookahead is said to be analogous to checking all options of the maze, using predicates is called sending in a trained monkiess to stand in the required junction points.

IncludeOnly and NoInclude semantics

IncludeOnly means include the following text only within the scope of the tansculded page (expanded page).
NoInclude means include the following text only within the scope of the template page (expanded page).

//todo prototype and paste here :-)

Look Ahead

This is an example of using look ahead in the parser. There is also lookahead in the lexer. But it does not return string only ints (of chars).

Implementing some parser functions

if switch etc are the where parsers shine. the parser rules below use sysntactic predicates:

stat: keyIF expr stat
| keyCALL ID ';'
| ';'
;

/** An ID whose text is "#if" */
keyIF : {input.LT(1).getText().equals("#if")}? ID ;
/** An ID whose text is "#switch" */
keyCALL : {input.LT(1).getText().equals("#switch")}? ID ;

Implementing some Magic Words

Magic words can be trickier to take care of since. one idea is to keep them in a data structure like a hash or a trie.

@members{

  public boolean isMagic(String token){

     return (magic.isKey(token))); // a hash of magic words token (and thier functor objects)
  }


}




keyCALL : {isMagic(input.LT(1).getText())}? ID ;