We need some self serve RegEx #143

jcdavison · 2014-08-21T19:58:18Z

Create some kind of supplemental kata/code/project that demonstrates how to use regex.

Things that I think could be useful to emphasize:

How to use Rubular
Why parsing web content with regex is a bad pattern.
Introduce some of the more useful regex-ish methods, e.g., .match() and .sub()
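
As a rough starting point for that last item, here is a minimal sketch of String#match, String#sub, and String#gsub in Ruby. The helper names and the sample string are hypothetical, made up for illustration; nothing here is taken from existing curriculum.

```ruby
# Hypothetical helper names and sample data, for illustration only.

# String#match returns a MatchData object on success and nil on failure.
def ship_date_parts(line)
  m = line.match(/(\d{4})-(\d{2})-(\d{2})/)
  return nil if m.nil?

  [m[1], m[2], m[3]] # the three capture groups: year, month, day
end

# String#sub replaces only the first match.
def mask_first_number(line)
  line.sub(/\d+/, "N")
end

# String#gsub replaces every match.
def mask_all_numbers(line)
  line.gsub(/\d+/, "N")
end

line = "Order 66 shipped on 2014-08-21"
p ship_date_parts(line)
p mask_first_number(line)
p mask_all_numbers(line)
```

The nil check matters: calling m[1] on a failed match raises NoMethodError, which is probably worth showing students early.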


jfarmer · 2014-08-24T00:35:43Z

In the last Fundamentals workshop, I did two sessions on regular expressions. These were "split sessions," so they covered the same material. I uploaded my raw, unfiltered notes to Slack (https://files.slack.com/files-pri/T02DYKN91-F02GXLKDN/download/regex_sessions.tar.gz).

When I was running the sessions, I started by talking about a text editor or word processor's typical search functionality and then offering up "queries" that we could express easily as people but which these applications couldn't handle. For example:

Every word that begins with a number
Uncapitalized words at the beginning of a sentence.
Phone numbers

You could tell a human to search a Word document for these things and they'd understand what to do, notwithstanding how tedious the task would be. I then said that regular expressions are a language that allows us to express/articulate more sophisticated patterns.

They achieve this by allowing us to do two things that Word's search functionality doesn't:

The ability to express character classes, i.e., instead of searching for a fixed character, we can search for one of the N characters contained in a character class.
The ability to express the idea of repetition, i.e., look for an X repeated 0 or more times, 1 or more times, exactly 5 times, 0 or 1 times, and so on.
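
To make those two abilities concrete, here is a small Ruby sketch contrasting a fixed search with a character class and with a repetition. The sample text and method names are hypothetical, not from my session notes.

```ruby
# Hypothetical sample text, for illustration only.
SAMPLE_TEXT = "gray skies, grey seas, and a loooong day".freeze

# A fixed search, like a word processor's Find box, matches one exact string.
def fixed_hits(text, needle)
  text.scan(needle)
end

# Character class: [ae] stands for "one character, either a or e",
# so a single pattern finds both spellings.
def grey_hits(text)
  text.scan(/gr[ae]y/)
end

# Repetition: o+ stands for "an o repeated 1 or more times".
def long_hits(text)
  text.scan(/lo+ng/)
end

p fixed_hits(SAMPLE_TEXT, "grey")
p grey_hits(SAMPLE_TEXT)
p long_hits(SAMPLE_TEXT)
```

The fixed search finds only the one spelling it was given, while the class-based pattern finds both.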

Before introducing the short-hand notation for common character classes like \d, \s, and so on, I made sure they understood regular expressions like

/[Aa]pples/
/[0123456789][0123456789][0123456789][0123456789][0123456789]/
/[0123456789]{5}/
/[23456789][0123456789]{2}-[23456789][0123456789]{2}-[0123456789]{4}/

I then taught them the "range" shorthand:

/[0-9]{5}/
/[2-9][0-9]{2}-[2-9][0-9]{2}-[0-9]{4}/
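
One way to let students convince themselves that these forms are interchangeable is to run them against the same input. This is a sketch, not something from the sessions: the constant names and sample strings are hypothetical, and the phone numbers assume the hyphenated North American style that the pattern suggests.

```ruby
# The same five-digit pattern written three ways.
SPELLED_OUT = /[0123456789][0123456789][0123456789][0123456789][0123456789]/
COUNTED     = /[0123456789]{5}/
RANGE_FORM  = /[0-9]{5}/

# Range form of the phone-number pattern.
PHONE = /[2-9][0-9]{2}-[2-9][0-9]{2}-[0-9]{4}/

# String#[] with a regex returns the matched text, or nil if nothing matches.
def first_match(text, pattern)
  text[pattern]
end

[SPELLED_OUT, COUNTED, RANGE_FORM].each do |pattern|
  p first_match("ZIP 94107", pattern)
end

p first_match("call 415-555-0123", PHONE)
p first_match("call 115-555-0123", PHONE) # leading 1 is outside [2-9]
```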

After that, I went into more typical regexes. You can see in the notes where I spelled out the "vocabulary."

    . = any character whatsoever
    ? = 0 or 1 times
    * = 0 or more times
    + = 1 or more times
  {N} = exactly N times
{X,Y} = between X and Y times
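
Here is a sketch that exercises each entry in that vocabulary against one small word list. The word list and method name are hypothetical. The \A and \z anchors are not part of the vocabulary above; I'm assuming them here only so that each word is tested as a whole. One Ruby-specific caveat: by default . matches any character except a newline, unless the /m flag is set.

```ruby
# Hypothetical word list, for illustration only.
COT_WORDS = %w[ct cat cot coot cooot].freeze

# Array#grep keeps the words that the pattern matches.
# \A and \z anchor each pattern to the start and end of the word.
def whole_matches(pattern)
  COT_WORDS.grep(pattern)
end

p whole_matches(/\Ac.t\z/)       # .     any single character
p whole_matches(/\Aco?t\z/)      # ?     0 or 1 times
p whole_matches(/\Aco*t\z/)      # *     0 or more times
p whole_matches(/\Aco+t\z/)      # +     1 or more times
p whole_matches(/\Aco{2}t\z/)    # {N}   exactly N times
p whole_matches(/\Aco{2,3}t\z/)  # {X,Y} between X and Y times
```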

Character classes, e.g., [aeiou], [01234]
There are "convenience" shorthand ways of writing many of these:

\d = [0123456789], i.e., any digit
\D = [^0123456789], i.e., any non-digit
\s = (any whitespace character, incl. space, tab, newline)
\S = any non-whitespace character
[A-Za-z0-9]
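
A short Ruby sketch of the four shorthands above, run against one string that contains a space, a tab, and a newline. The sample string and method names are hypothetical.

```ruby
# Hypothetical sample: contains spaces, a tab (\t), and a newline (\n).
ROOM_NOTE = "Room 42,\tfloor 3\n".freeze

# \d matches one digit, the same as [0123456789].
def digits(text)
  text.scan(/\d/)
end

# \D matches one non-digit; deleting every \D leaves only the digits.
def strip_non_digits(text)
  text.gsub(/\D/, "")
end

# \s matches one whitespace character (space, tab, newline, ...).
def whitespace_chars(text)
  text.scan(/\s/)
end

# \S+ matches a run of non-whitespace characters.
def tokens(text)
  text.scan(/\S+/)
end

p digits(ROOM_NOTE)
p strip_non_digits(ROOM_NOTE)
p whitespace_chars(ROOM_NOTE)
p tokens(ROOM_NOTE)
```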

We can also express ranges where it makes sense:

[0-9] = [0123456789]
[a-z] = [abcdefghijklmnopqrstuvwxyz]
[A-Z] = [ABCDEFGHIJKLMNOPQRSTUVWXYZ]

and combine them, e.g.,

[a-z,] = [abcdefghijklmnopqrstuvwxyz,]
[A-Za-z] = [ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]
[a-z0-9] = [abcdefghijklmnopqrstuvwxyz0123456789]
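
To show how the combined classes behave, here is a sketch that runs all three against the same string. The sample string and method name are hypothetical. Note that the comma in [a-z,] is just one more literal character in the class, not a separator.

```ruby
# Hypothetical sample string, for illustration only.
GREETING = "Hello, world 42".freeze

# Return every run of one or more characters drawn from the given class.
def runs(text, pattern)
  text.scan(pattern)
end

p runs(GREETING, /[a-z,]+/)   # lowercase letters plus a literal comma
p runs(GREETING, /[A-Za-z]+/) # upper- and lowercase letters
p runs(GREETING, /[a-z0-9]+/) # lowercase letters and digits
```

The capital H is excluded by the first and third patterns because neither class contains uppercase letters.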

jcdavison assigned zspencer Aug 21, 2014

zspencer added this to the 0.2.0 milestone Aug 23, 2014

zspencer added the New Curriculum label Aug 23, 2014

jcdavison modified the milestones: The Distant Future, 0.2.0 Sep 12, 2014
