There’s no place like ::1

Extracting PHP From HTML Files With ANTLR 3

As part of my research on PHP, I needed a preprocessor that extracts all the PHP code from a PHP file and discards any HTML it encounters.

Initially I thought it would be a hard job, but later found that ANTLR 3 makes it really easy!

I wrote something like this:

lexer grammar FuzzyPHP;

options { filter=true; }

PHP : '<?php' .* '?>' { System.out.println(getText()); };

This happens to work really well :) Now I can continue with my PHP parser…

Update: Of course, my code doesn't work when the PHP contains strings like '?>'. Here's a new version that should work :)

PHP : '<?php' ( options { greedy=false; } : SINGLE_QUOTED_STRING | DOUBLE_QUOTED_STRING | . )* '?>' { System.out.println(getText()); };

fragment SINGLE_QUOTED_STRING : '\'' ( '\\' . | ~('\'' | '\\') )* '\'' ;

fragment DOUBLE_QUOTED_STRING : '"' ( '\\' . | ~('"' | '\\') )* '"' ;
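For comparison, here is roughly the same idea written as a plain Java scanner, without ANTLR. It is only a sketch: the class name PhpExtractor and its extract method are made up for illustration, and it is not the code ANTLR generates. It looks for `<?php`, then walks forward to the first `?>` that is not inside a single- or double-quoted string, skipping backslash-escaped characters along the way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written equivalent of the fuzzy lexer; names are illustrative only.
public class PhpExtractor {

    /** Returns every "<?php ... ?>" block found in source, in order. HTML is dropped. */
    public static List<String> extract(String source) {
        List<String> blocks = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = source.indexOf("<?php", pos);
            if (start < 0) {
                break; // no more PHP blocks
            }
            int i = start + "<?php".length();
            char quote = 0; // 0 = outside a string, otherwise the opening quote character
            int end = -1;
            while (i < source.length()) {
                char c = source.charAt(i);
                if (quote != 0) {
                    if (c == '\\') {
                        i += 2; // skip the backslash and the character it escapes
                        continue;
                    }
                    if (c == quote) {
                        quote = 0; // closing quote
                    }
                    i++;
                } else if (c == '\'' || c == '"') {
                    quote = c; // opening quote
                    i++;
                } else if (source.startsWith("?>", i)) {
                    end = i + 2; // closing tag found outside any string
                    break;
                } else {
                    i++;
                }
            }
            if (end < 0) {
                break; // unterminated block: dropped, as the grammar requires a closing '?>'
            }
            blocks.add(source.substring(start, end));
            pos = end;
        }
        return blocks;
    }

    public static void main(String[] args) {
        String page = "<html><?php echo '?>'; ?><p>hi</p><?php echo \"a\"; ?></html>";
        for (String block : extract(page)) {
            System.out.println(block);
        }
    }
}
```

Like the grammar, this sketch knows nothing about PHP comments or heredocs, so a stray quote in a comment would still confuse it.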
