User:OrenBochman/ParserNG/WikiTable
An ANTLR spec for the WikiTable markup
ANTLR spec
grammar wikiTable;
@header {
package p;
}
//@header {import org.antlr.test;} // not auto-copied to lexer
@lexer::header{
package p;
//import org.antlr.test;
//
}
@lexer::members {
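// nesting depth: how deeply nested in a table are we? (0 = outside any table)
int inTable=0;
// queue of emitted tokens (allows a rule to emit more than one token)
List<Token> tokens = new ArrayList<Token>();
public void emit(Token token) {
state.token = token;
tokens.add(token);
}
public Token nextToken() {
super.nextToken(); // runs the lexer rules, which fill the queue through emit()
if ( tokens.size()==0 ) {
return Token.EOF_TOKEN;
}
return tokens.remove(0);
}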
}
@members{
//int inTable=0;
//public void foo(){};
//int rows=0;
}
//Parser Rules
wikiTable : TBL_START xml_attributes? caption? head? rows TBL_END;
caption : CAPTION_START HS xml_attributes? captionText=TEXT+;
fragment
head : (hCell hCellInLine*)+;
rows : (firstRow|row) row*;
firstRow : cells;
row : rowStart xml_attributes? cells;
rowStart : ROW_START;
cells :((cell|hCell) (cellInline|hCellInLine)*)+;
cell : CELL_START xml_attributes? text=TEXT*;
cellInline : CELL_INLINE_STRT xml_attributes? text=TEXT*;
hCell : HEAD_START xml_attributes? text=TEXT*;
hCellInLine : HEAD_INLINE_STRT xml_attributes? text=TEXT*;
//this is the recursive definition allowing table nesting
//cells :( {input.LT(0)==CELL_START||input.LT(0)==HEAD_START}?=>(HEAD_START | CELL_START) XHTML_ATTRIBUTES? (TEXT|wikiTable)+ (CELL_INLINE_STRT XHTML_ATTRIBUTES? (TEXT|wikiTable)+)* )+ ;
//this needs to be in the parser for LT(2) to mean the second parser token
xml_attributes: {input.LT(2).getText().equals("=")}? xml_attribute+ PIPE? ;
xml_attribute: name=TEXT EQ DQUOTE value=TEXT* DQUOTE ;
//Lexer Rules
TBL_START : {getCharPositionInLine()==0}?=> '{|'{inTable++; } ;
TBL_END : {getCharPositionInLine()==0&&inTable>0}?=> '|}'{inTable--;} ;
HEAD_START : {getCharPositionInLine()==0&&inTable>0}?=> '!';
HEAD_INLINE_STRT: {inTable>0}?=> '!!';
CELL_START : {getCharPositionInLine()==0&&inTable>0}?=> '|'; //this should only be recognised within a table
PIPE : {getCharPositionInLine()>0||inTable==0}?=> '|'; //outside a table or not at the start of a line
CELL_INLINE_STRT: {inTable>0}?=> '||'; //this should only be recognised within a table
ROW_START : {getCharPositionInLine()==0&&inTable>0}?=> '|-' ;
CAPTION_START : {getCharPositionInLine()==0&&inTable>0}?=> '|+' ;
TEXT : ('a'..'z'|'A'..'Z'|'0'..'9'|'.'|'-'|';'|':'|',')+; //simplified
DQUOTE : '"';
//WS : (HS | VS) ; //{ $channel = HIDDEN; } ;
HS : ( ' ' | '\t' )+ { $channel = HIDDEN; } ;
VS : ( '\r' | '\n' )+ { $channel = HIDDEN; } ;
EQ : '=';
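The gated predicates on the marker rules above decide what a `|` or `!` means from two pieces of state: the column and the `inTable` depth. The following plain-Java sketch mirrors that decision outside ANTLR. The names `MarkerClassifier` and `classify` are hypothetical, the longest-match ordering is an assumption about how the generated lexer resolves `|` against `||`, `|-`, and so on, and the sketch classifies a single position rather than tokenizing a whole input.

```java
// Hypothetical helper, not part of the grammar: mirrors the gated predicates
// on the marker rules (TBL_START, ROW_START, CELL_START, PIPE, ...).
final class MarkerClassifier {

    /**
     * Returns the name of the token the predicates would allow for the marker
     * starting at offset pos of line, or "OTHER" if no marker starts there.
     * inTable is the current nesting depth, as in the lexer member.
     */
    static String classify(String line, int pos, int inTable) {
        boolean atLineStart = pos == 0;   // getCharPositionInLine()==0
        boolean inside = inTable > 0;     // inTable>0
        String rest = line.substring(pos);

        // Two-character markers first (assumed longest match), then single ones.
        if (rest.startsWith("{|") && atLineStart) return "TBL_START";
        if (rest.startsWith("|}") && atLineStart && inside) return "TBL_END";
        if (rest.startsWith("|-") && atLineStart && inside) return "ROW_START";
        if (rest.startsWith("|+") && atLineStart && inside) return "CAPTION_START";
        if (rest.startsWith("||") && inside) return "CELL_INLINE_STRT";
        if (rest.startsWith("!!") && inside) return "HEAD_INLINE_STRT";
        if (rest.startsWith("!") && atLineStart && inside) return "HEAD_START";
        if (rest.startsWith("|")) {
            // PIPE's predicate is the exact complement of CELL_START's.
            return (atLineStart && inside) ? "CELL_START" : "PIPE";
        }
        return "OTHER";
    }
}
```

For example, `MarkerClassifier.classify("| Orange || Apple", 0, 1)` yields `CELL_START`, while the same `|` with `inTable` equal to 0 yields `PIPE`.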
Status
- This is a lexer + a parser.
- Tested against the examples in table.
- A tree grammar or a string template could be used to transform the result into XHTML etc.
- To simplify development, full Unicode is not supported, but the character set of the TEXT rule could be changed with minimal impact.
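As an illustration of the XHTML transformation mentioned above, here is a minimal plain-Java sketch that uses neither a tree grammar nor a string template. It assumes the parse result has already been collected into rows of plain-text cells (`List<List<String>>`). That shape is an assumption, not something the grammar produces, and `XhtmlTableRenderer` is a hypothetical name.

```java
import java.util.List;

// Hypothetical renderer: assumes the parser's result has been flattened into
// rows of plain-text cells. Attributes, captions and header cells are ignored
// to keep the sketch short.
final class XhtmlTableRenderer {

    /** Escapes the characters that are unsafe in XHTML text nodes. */
    static String escape(String text) {
        // '&' must be replaced first so the other entities are not re-escaped.
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders rows of cells as a single XHTML table element. */
    static String render(List<List<String>> rows) {
        StringBuilder out = new StringBuilder("<table>");
        for (List<String> row : rows) {
            out.append("<tr>");
            for (String cell : row) {
                out.append("<td>").append(escape(cell)).append("</td>");
            }
            out.append("</tr>");
        }
        return out.append("</table>").toString();
    }
}
```

A tree grammar or template would replace the string concatenation here, but the row and cell nesting it has to walk is the same.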
Problems
The spec has a recognizer nondeterminism [1].
- ANTLR is unable to decide which path to take when it meets a HEAD_START symbol, since the symbol could belong to either of these:
  - The optional header.
  - The body, when there is no optional header but the body starts with a header cell. (this is a mistake)
- This is only a warning, and the second option is discarded. How could this nondeterminism be removed?
  - Add a variable with table-wide scope:
    boolean hasHead=true;
  - Use it in a predicate on the optional header:
    {hasHead}?
  - Add an action after the optional header to flip it:
    {hasHead=false;}
- ANTLR also complains that the first non-header cell might belong to either of these:
  - The (optional) first row, i.e. the one without a |- indicator.
  - The optional other rows after it.
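The flag idea from the list above can be tried out in a hand-written recognizer before it is wired into the grammar. The sketch below is not ANTLR output and makes no claim about how ANTLR hoists the predicate. It only shows the intended effect: leading header cells go to the head while `hasHead` is true, and every later header cell goes to the rows. `HeadGate` and its fields are hypothetical names, and the token stream is reduced to a list of token type names with TEXT and attributes left out.

```java
import java.util.List;

// Hypothetical hand-written counterpart of `head? rows` with a flag-gated head.
final class HeadGate {
    private final List<String> tokens;   // token type names, e.g. "HEAD_START"
    private int pos = 0;
    private boolean hasHead;             // the table-wide flag
    int headCells = 0;                   // header cells consumed by the head
    int bodyCells = 0;                   // cells of either kind left to the rows

    HeadGate(List<String> tokens, boolean hasHead) {
        this.tokens = tokens;
        this.hasHead = hasHead;
    }

    private boolean at(String type) {
        return pos < tokens.size() && tokens.get(pos).equals(type);
    }

    /** Consumes the cell and row markers between TBL_START and TBL_END. */
    void parseBody() {
        // Gate: the head alternative is only viable while hasHead is true.
        if (hasHead) {
            while (at("HEAD_START") || at("HEAD_INLINE_STRT")) {
                pos++;
                headCells++;
            }
        }
        hasHead = false;   // flipped after the optional head
        // Everything that follows belongs to the rows, header cell or not.
        while (pos < tokens.size()) {
            if (!at("ROW_START")) {
                bodyCells++;
            }
            pos++;
        }
    }
}
```

With the flag false from the start, the same input is attributed entirely to the rows, which is the path that ANTLR's warning says it discards.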
Table in Table Test
You type:
<!-- outer -->
{| border="1"
| Orange || Apple || align="right" | 12,333.00
|-
| Bread || Pie || align="right" | 500.00
|-
| Butter || Ice cream || align="right" | 1.00
<!-- inner -->
{| border="1"
| Orange || Apple || align="right" | 12,333.00
|-
| Bread || Pie || align="right" | 500.00
|-
| Butter || Ice cream || align="right" | 1.00
|}
|}
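The nesting in this test input can be checked independently of the grammar. The sketch below mirrors the lexer's `inTable` counter by counting `{|` and `|}` markers that start a line. It assumes each marker sits at the start of its own line, as the lexer predicates require, and `NestingDepth` and `maxDepth` are hypothetical names.

```java
// Hypothetical check, not part of the grammar: mirrors the lexer's inTable
// counter by looking only at table markers that start a line.
final class NestingDepth {

    /** Returns the deepest table nesting reached, or -1 if markers are unbalanced. */
    static int maxDepth(String wikiText) {
        int depth = 0;
        int max = 0;
        for (String line : wikiText.split("\r?\n")) {
            if (line.startsWith("{|")) {          // TBL_START: inTable++
                depth++;
                max = Math.max(max, depth);
            } else if (line.startsWith("|}")) {   // TBL_END: inTable--
                if (depth == 0) {
                    return -1;                    // closing marker without an open table
                }
                depth--;
            }
        }
        return depth == 0 ? max : -1;             // unclosed tables are unbalanced too
    }
}
```

A table-in-table input with one outer and one inner table should report a depth of 2.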
References
- [1] The Definitive ANTLR Reference: Building Domain-Specific Languages; Terence Parr; 2007; ISBN 0-9787392-5-6; p. 127