As part of my research work on PHP, I needed a preprocessor that extracts all the PHP code from a PHP file and discards any HTML it encounters.
Initially I thought it would be a hard job, but I later found that ANTLR 3 makes it really easy!
I wrote something like this:
options { filter=true; }
PHP : '' { System.out.println(getText()); };
This happens to work really well :) Now I can continue my PHP parser…
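To make the idea concrete outside ANTLR, here is a minimal plain-Java sketch of the same filtering behaviour. It is not the grammar itself: it assumes that a PHP block opens with `<?php` and ends at the first `?>`, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class NaivePhpExtractor {
    // Assumed delimiters; the short open tag "<?" is not handled in this sketch.
    static final String OPEN = "<?php";
    static final String CLOSE = "?>";

    // Collects every span from OPEN up to and including the first CLOSE after it.
    static List<String> extract(String source) {
        List<String> blocks = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = source.indexOf(OPEN, pos);
            if (start < 0) {
                break; // no more PHP blocks, the rest is HTML
            }
            int end = source.indexOf(CLOSE, start + OPEN.length());
            if (end < 0) {
                // An unterminated block runs to the end of the file.
                blocks.add(source.substring(start));
                break;
            }
            blocks.add(source.substring(start, end + CLOSE.length()));
            pos = end + CLOSE.length();
        }
        return blocks;
    }

    public static void main(String[] args) {
        String page = "<html><?php echo 1; ?><p>hi</p><?php echo 2; ?></html>";
        for (String block : extract(page)) {
            System.out.println(block);
        }
    }
}
```

Everything outside the two delimiters is simply skipped, which is roughly what `filter=true` does for input that no lexer rule matches.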
*: Of course, my code doesn’t work when the PHP code contains strings like '?>'. Here’s a new version that should work :)
SINGLE_QUOTED_STRING : '\'' ( '\\' . | ~('\'' | '\\') )* '\'' ;
DOUBLE_QUOTED_STRING : '"' ( '\\' . | ~('"' | '\\') )* '"' ;
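The effect of these string rules can be sketched in plain Java as well. This is only an illustration under assumptions, not the ANTLR lexer: it assumes the `<?php` open tag, treats a backslash inside a string as escaping the next character, and ignores PHP comments and heredocs, so an apostrophe inside a comment would confuse it. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwarePhpExtractor {
    static final String OPEN = "<?php"; // assumed open tag
    static final String CLOSE = "?>";

    // src.charAt(i) must be a quote character. Returns the index just past
    // the matching closing quote, or src.length() if the string never ends.
    static int skipString(String src, int i) {
        char quote = src.charAt(i);
        i++;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\') {
                i += 2; // a backslash escapes the next character
            } else if (c == quote) {
                return i + 1;
            } else {
                i++;
            }
        }
        return src.length();
    }

    // Like a naive tag search, but a "?>" inside a quoted string
    // does not terminate the block.
    static List<String> extract(String src) {
        List<String> blocks = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = src.indexOf(OPEN, pos);
            if (start < 0) {
                break;
            }
            int i = start + OPEN.length();
            int end = src.length(); // unterminated block runs to end of input
            while (i < src.length()) {
                char c = src.charAt(i);
                if (c == '\'' || c == '"') {
                    i = skipString(src, i);
                } else if (src.startsWith(CLOSE, i)) {
                    end = i + CLOSE.length();
                    break;
                } else {
                    i++;
                }
            }
            blocks.add(src.substring(start, end));
            pos = end;
        }
        return blocks;
    }

    public static void main(String[] args) {
        String page = "<p><?php echo '?>'; ?></p>";
        for (String block : extract(page)) {
            System.out.println(block);
        }
    }
}
```

The design mirrors the grammar: strings are consumed as a unit first, so the closing tag is only recognised when the scanner is outside any string.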