Tuesday, June 24, 2008

regex replace part of a match in PHP

So, I'm working in PHP on the new company site, and we're pulling all sorts of feeds from our blogs and showing them on the site. I'm using Magpie RSS to do this, and for the most part it works really well, except for funny characters. You know, the ones you get in Word that look just a little different, like curly quotes and apostrophes. They're pretty awesome. Well, Magpie RSS handles them awesome too. It replaces all of these funny characters with one type of character: the question mark. Now I usually enjoy that little guy, it's all squiggly and informs you to intonate correctly at the end of a sentence, but here the big Q is a pain in the ascot. And I can never remember how to replace only part of a match in regular expressions, so I am writing the PHP syntax down.

For example, I have the improperly formatted text "Charlie?s Angels"

I can use preg_replace and group the different sets of matching characters with parens:

/(\?)([a-zA-Z])/

So this matches the ?s and groups the '?' and the 's' separately, and each group can be used as a back reference in my replace string with $1 and $2 respectively. $0 refers to the whole match, and numbered back references run from 1 through 99.
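
To see what each back reference actually holds, here is a quick sketch that wraps each one in square brackets. The variable names are just mine for illustration, nothing special about them.

```php
<?php
// Illustrative only: wrap each reference in brackets to see what it captured.
$pattern = '/(\?)([a-zA-Z])/';
$input   = 'Charlie?s Angels';

$whole  = preg_replace($pattern, '[$0]', $input); // $0: the whole match
$first  = preg_replace($pattern, '[$1]', $input); // $1: the '?'
$second = preg_replace($pattern, '[$2]', $input); // $2: the letter after it

echo $whole, "\n";  // Charlie[?s] Angels
echo $first, "\n";  // Charlie[?] Angels
echo $second, "\n"; // Charlie[s] Angels
```

One thing to watch: if a digit has to come right after a back reference in the replace string, write it as ${1}1 so PHP doesn't read it as $11.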

The full expression would be

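$str = "Charlie?s Angels";
// Keep only the letter ($2) and put an apostrophe where the '?' ($1) was
$s = preg_replace('/(\?)([a-zA-Z])/', "'$2", $str); // $s is now "Charlie's Angels"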
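
Since I'll probably want this in more than one place, here is a rough sketch of the same expression wrapped in a function. The name fix_apostrophes is made up for this post (it is not part of Magpie RSS), and it assumes every '?' followed directly by a letter was supposed to be an apostrophe.

```php
<?php
// Hypothetical helper, not part of Magpie RSS.
function fix_apostrophes(string $text): string
{
    // Only a '?' directly followed by a letter is rewritten;
    // a '?' before a space or at the end of the string is left alone.
    return preg_replace('/(\?)([a-zA-Z])/', "'$2", $text);
}

echo fix_apostrophes('Charlie?s Angels'), "\n";          // Charlie's Angels
echo fix_apostrophes('Who are Charlie?s Angels?'), "\n"; // Who are Charlie's Angels?
```

The real question mark at the end of the sentence survives because no letter follows it. The flip side is that a genuine question mark jammed right up against a letter, with no space, would get turned into an apostrophe too.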

Now if only I can keep this in my noggin for a while...

Thursday, June 5, 2008