Tuesday, November 4, 2008

Regex in markup

In an older post (available in Italian only) I made heavy use of regex to retrieve the text content of HTML tags. Now I want to show you a simple trick to do this in a more elegant way.
Be careful: this post is not meant to explain regex at all!
We all know that a markup tag with content looks like <tag_name>content</tag_name>.
To retrieve the content we only need to match everything between the opening and the closing tag_name tag. We match it as:
"everything that is not the 'less than' character".
So the regex looks like: /<tag_name>([^<]+)<\/tag_name>/

Here is a PHP interactive session:

Interactive shell

php > $string = "<tag_name_one>http://tag_name_one.com</tag_name_one><tag_name_two>123 ## @ jhkasdfh</tag_name_two>";
php > $tag_name = "tag_name_one";
php > $result = array();
php > preg_match("/<$tag_name>([^<]+)<\/$tag_name>/", $string, $result);
php > print_r($result);
Array
(
    [0] => <tag_name_one>http://tag_name_one.com</tag_name_one>
    [1] => http://tag_name_one.com
)
php > $tag_name = "tag_name_two";
php > preg_match("/<$tag_name>([^<]+)<\/$tag_name>/", $string, $result);
php > print_r($result);
Array
(
    [0] => <tag_name_two>123 ## @ jhkasdfh</tag_name_two>
    [1] => 123 ## @ jhkasdfh
)
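
The same trick can be wrapped in a small function. What follows is only a sketch: the function name tag_content is made up for this example, and it assumes PHP 7.1 or later because of the nullable return type. It also passes the tag name through preg_quote(), so that a name containing regex metacharacters cannot break the pattern.

```php
<?php
// Hypothetical helper (not part of the session above): returns the text
// content of the first <$tagName>...</$tagName> pair, or null if none matches.
function tag_content(string $tagName, string $markup): ?string
{
    // preg_quote() escapes regex metacharacters in the tag name;
    // the second argument also escapes the "/" delimiter.
    $name = preg_quote($tagName, '/');

    // [^<]+ means "one or more characters that are not '<'",
    // so the capture stops right before the closing tag.
    $pattern = "/<{$name}>([^<]+)<\/{$name}>/";

    if (preg_match($pattern, $markup, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

$string = "<tag_name_one>http://tag_name_one.com</tag_name_one>"
        . "<tag_name_two>123 ## @ jhkasdfh</tag_name_two>";

var_dump(tag_content("tag_name_one", $string)); // string: http://tag_name_one.com
var_dump(tag_content("tag_name_two", $string)); // string: 123 ## @ jhkasdfh
var_dump(tag_content("missing_tag", $string));  // NULL
```

Keep in mind the limits of this pattern: it does not match an empty tag (the + needs at least one character), a tag written with attributes, or a tag whose content contains other tags, because the first '<' inside the content stops the match.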