
Regular Expressions

rollynoel edited this page Jun 13, 2013 · 2 revisions

Boo has built-in support for regular expression literals.

You surround the regular expression with / /, or @/ / for more complex expressions that contain whitespace. Boo even has an =~ operator like Perl.

Here are some samples of using regular expressions in Boo:

//pattern matching using Perl's match operator (=~)
samplestring = "Here is foo"
if samplestring =~ /foo/:
    print "it's a match"

//sample regex object
re = /foo(bar)/
print re.GetType() //-> System.Text.RegularExpressions.Regex

//split a line on spaces:
words = @/ /.Split(samplestring) //you can also use \s for any type of whitespace
print join(words, ",")

//similar example with unpacking
s = "First Last"
first, last = @/ /.Split(s)

//another way to do matching without =~
m = /abc/.Match("123abc456")
if m.Success:
    print "Found match at position:", m.Index
    print "Matched text:", m.ToString()

//more complex example with named groups
s = """
Joe Jackson
131 W. 5th Street
New York, NY 10023
"""

r = /(?<=\n)\s*(?<city>[^\n]+)\s*,\s*(?<state>\w+)\s+(?<zip>\d{5}(-\d{4})?).*$/.Match(s)

print r.Groups["city"]
print r.Groups["state"]
print r.Groups["zip"]

//just for reference, the match operator (=~) works with regular strings, too:
if samplestring =~ "foo":
    print "it matches"

//the "not match" operator (!~) has not been implemented yet, but you
//can simply prefix the expression with "not":
if not samplestring =~ /badfoo/:
    print "no match"
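The sample pattern /foo(bar)/ above contains a capture group that the samples never read back. As a minimal sketch, assuming the standard .NET Match.Groups indexer (the variable name found and the input string are illustrative only), numbered groups can be read like this:

```boo
// Sketch: reading numbered capture groups from a match.
// "found" and the input text are made up for illustration.
found = /foo(bar)/.Match("xxfoobarxx")
if found.Success:
    print found.Groups[0].Value  // whole match: foobar
    print found.Groups[1].Value  // first capture group: bar
```

Group 0 is always the entire match; groups in parentheses are numbered from 1 in the order their opening parentheses appear.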

Specifying regex options

One limitation of regular expression literals is that you can't pass RegexOptions flags such as RegexOptions.Compiled.

You can, however, enable the ignore-case option inline. Add (?i) to the beginning of your regex pattern, like so:

pattern = /(?i)google/
print "google" =~ pattern
print "Google" =~ pattern
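The same inline syntax covers a few other .NET options, for example (?m) for multiline mode and (?s) for single-line mode. RegexOptions.Compiled has no inline form. The snippet below is only a sketch with made-up sample text and variable names, showing (?m) on its own and combined with (?i):

```boo
// Sketch: inline options other than ignore case.
// "text" and "lineStarts" are illustrative names.
text = "alpha\nbeta"

// (?m) lets ^ match at the start of every line, not just the start of the string
lineStarts = /(?m)^\w/.Matches(text)
print lineStarts.Count //-> 2

// options can be combined: ignore case plus multiline
print /(?im)^BETA$/.IsMatch(text)
```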

But in other cases you may want to just use the .NET Regex class explicitly:

import System.Text.RegularExpressions

samplestring = "Here is foo"
re = Regex("FOO", RegexOptions.IgnoreCase | RegexOptions.Compiled)

//one way:
if samplestring =~ re:
    print "we matched"

//another way, using IsMatch:
if re.IsMatch(samplestring):
    print "we matched"

Regex Replace method

This is the equivalent of Perl's substitution operator: s/foo/bar/g.

import System.Text.RegularExpressions

text = "four score and seven years ago"
print text

//replace each word with "X"
t2 = /\w+/.Replace(text, "X")
print t2 //-> X X X X X X

//replace only the first occurrence of a word with X,
//starting after the 15th character:
t3 = /\w+/.Replace(text, "X", 1, 15)
print t3 //-> four score and X years ago

//use a closure (or a regular method) to capitalize each word:
t4 = /\w+/.Replace(text) do (m as Match):
    s = m.ToString()
    if System.Char.IsLower(s[0]): //built-in char type will be added soon
        return System.Char.ToUpper(s[0]) + s[1:]
    return s
print t4 //-> Four Score And Seven Years Ago

//back references are supported too, using the dollar sign:
phonenumber = "5551234567"
phonenumber = /(\d{3})(\d{3})(\d{4})/.Replace(phonenumber, "($1) $2-$3")
print phonenumber //-> (555) 123-4567

regex primitive type

Boo also has a built-in primitive type called "regex" (lowercase) that means the same thing as the .NET Regex class. For example:

import System.Text.RegularExpressions

re = /foo(bar)/

if re isa regex:
    print "re is a regular expression"

//or declare the type explicitly:
re2 as regex = /foo(bar)/
print re2 isa regex

//using the regex primitive constructor
//(you'll need an import statement for RegexOptions)
re3 = regex("FOO", RegexOptions.IgnoreCase | RegexOptions.Compiled)
print re3 isa regex

See also:

Using Regular Expressions in .NET
.NET Framework Regular Expressions
The Regular Expression Library has hundreds of user-contributed regex samples.
txt2regex - command line to assist with constructing regular expressions