User:OrenBochman/ParserNG/Transliterator Antlr
Transliterator Filter ANTLR
"To make ANTLR generate lexers that behave like the UNIX utility sed (copy standard in to standard out except as specified by the replace patterns), use a filter rule that does the input to output copying:" (ANTLR docs [1])
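class cfgSed extends Lexer;
options {
    k=2;
    filter=IGNORE;
    // widened from '\3'..'\177' so that the Unicode range below is inside the vocabulary
    charVocabulary = '\u0003'..'\uFFFE';
}
{
    // if a dictionary is needed; loadDictionary() has to be supplied by the user
    java.util.Map<String,String> dictionary = loadDictionary();
}
// example of Unicode to Unicode conversion;
// X and Y are placeholders for the bounds of the source range
protected
ALPHA1 : '\u000X'..'\u000Y' ;
KENJI : src:ALPHA1
    { System.out.print(dictionary.get(src.getText())); } // filter output
    ;
// everything not matched above is copied to the output unchanged
protected
IGNORE
    : ( "\r\n" | '\r' | '\n' )
      {newline(); System.out.println("");}
    | c:. {System.out.print(c);}
    ;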
Based on [2].
The idea is to use a dictionary (map) or a conversion function to replace each character of the detected character set.
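The replacement step can be sketched in plain Java, outside any grammar. This is a minimal sketch, not the page's implementation: the class CharReplacer, the method replace and the two-entry sample dictionary (Cyrillic to Latin) are hypothetical names chosen for illustration. As in the filter rule, characters without a replacement are copied through unchanged.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

class CharReplacer {
    // Copies input to output; a code point is replaced whenever
    // `replacement` returns a non-null string for it.
    static String replace(String input, IntFunction<String> replacement) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            String r = replacement.apply(cp);
            if (r != null) {
                out.append(r);           // detected character: emit its replacement
            } else {
                out.appendCodePoint(cp); // anything else: copy through, like IGNORE
            }
        });
        return out.toString();
    }

    // Hypothetical sample dictionary (assumption, for illustration only).
    static Map<Integer, String> sampleDictionary() {
        Map<Integer, String> d = new HashMap<>();
        d.put(0x0436, "zh"); // U+0436 CYRILLIC SMALL LETTER ZHE
        d.put(0x0430, "a");  // U+0430 CYRILLIC SMALL LETTER A
        return d;
    }
}
```

Because replace takes a function, a dictionary lookup (cp -> dictionary.get(cp)) and a computed conversion can be passed in the same way.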
Usage
This filter can be:
- Integrated into the lexer (one scan would be fastest).
- Run as a separate step (modular, slower, easily configurable).
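The separate-step option could be sketched as a java.io.FilterReader placed in front of whatever consumes the text. This is a sketch under assumptions: TransliteratingReader and its helper transliterateAll are hypothetical names, the mapping is restricted to one char in, one char out, and it is assumed that the downstream lexer can read from a java.io.Reader.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;

class TransliteratingReader extends FilterReader {
    private final Map<Character, Character> table;

    TransliteratingReader(Reader in, Map<Character, Character> table) {
        super(in);
        this.table = table;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        return c < 0 ? c : map((char) c); // pass end-of-stream through untouched
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        for (int i = off; i < off + n; i++) { // loop is skipped when n == -1
            buf[i] = map(buf[i]);
        }
        return n;
    }

    // Unmapped characters are returned unchanged.
    private char map(char c) {
        Character r = table.get(c);
        return r != null ? r : c;
    }

    // Convenience helper: run the whole pass over an in-memory string.
    static String transliterateAll(String input, Map<Character, Character> table) {
        try (Reader r = new TransliteratingReader(new StringReader(input), table)) {
            StringBuilder out = new StringBuilder();
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The mapping table can be swapped without touching any grammar, which is the configurability this option buys. The cost is a second pass over the input.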
Issues
- Works best if the transliteration is a fixed code-point offset or a call to an outside module.
- A dictionary combined with lookahead provides the most transliteration power.
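Both cases can be illustrated in Java. This is a sketch under assumptions: the class LookaheadTransliterator and its methods shift and longestMatch are hypothetical. shift applies a fixed offset to a code-point range (for example, hiragana U+3041..U+3096 maps to katakana by adding 0x60). longestMatch looks up to k characters ahead and prefers the longest dictionary key, comparable to the k=2 lookahead in the grammar.

```java
import java.util.HashMap;
import java.util.Map;

class LookaheadTransliterator {
    // Offset transliteration: code points in [first, last] are moved by `offset`,
    // everything else is copied unchanged.
    static String shift(String input, int first, int last, int offset) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp ->
            out.appendCodePoint(cp >= first && cp <= last ? cp + offset : cp));
        return out.toString();
    }

    // Dictionary transliteration with up to k characters of lookahead:
    // at each position the longest matching key wins.
    static String longestMatch(String input, Map<String, String> dictionary, int k) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            boolean matched = false;
            for (int len = Math.min(k, input.length() - i); len >= 1; len--) {
                String r = dictionary.get(input.substring(i, i + len));
                if (r != null) {
                    out.append(r);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.append(input.charAt(i)); // no key starts here: copy one char
                i++;
            }
        }
        return out.toString();
    }

    // Hypothetical Latin-to-Cyrillic sample entries (assumption, for illustration).
    static Map<String, String> sampleDictionary() {
        Map<String, String> d = new HashMap<>();
        d.put("s", "\u0441");  // U+0441 CYRILLIC SMALL LETTER ES
        d.put("sh", "\u0448"); // U+0448 CYRILLIC SMALL LETTER SHA
        return d;
    }
}
```

Without lookahead, "sh" would be consumed as "s" followed by an unmapped "h". With k=2 the two-character key is tried first, so the digraph is converted as a unit.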