We need some self serve RegEx #143

Open
jcdavison opened this issue Aug 21, 2014 · 1 comment

@jcdavison

Create some kind of supplemental kata/code/project that demonstrates how to use regex.

Things that I think could be useful to emphasize:

  • How to use Rubular
  • Why parsing web content with regex is a bad pattern.
  • Some of the more useful regex-ish methods, e.g., .match() and .sub()
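
A minimal sketch of the methods named in the last bullet, using Ruby's built-in String#match and String#sub. String#gsub is not mentioned above; it is included here only as the replace-all sibling of sub. The sample string and constant names are made up for illustration.

```ruby
# Hypothetical sample data, not taken from the issue.
LINE = "Order 66 shipped to 94107"

# String#match returns a MatchData object, or nil when nothing matches.
# Indexing the MatchData with [0] gives the matched text.
FIRST_NUMBER = LINE.match(/[0-9]+/)

# String#sub replaces only the first match ...
SUB_RESULT = LINE.sub(/[0-9]+/, "#")

# ... while String#gsub replaces every match.
GSUB_RESULT = LINE.gsub(/[0-9]+/, "#")

puts FIRST_NUMBER[0]   # 66
puts SUB_RESULT        # Order # shipped to 94107
puts GSUB_RESULT       # Order # shipped to #
```

Because match returns nil when there is no match, callers usually check the result before indexing into it.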
@zspencer zspencer added this to the 0.2.0 milestone Aug 23, 2014
@jfarmer

jfarmer commented Aug 24, 2014

In the last Fundamentals workshop, I did two sessions on regular expressions. These were "split sessions," so they covered the same material. I uploaded my raw, unfiltered notes to Slack (https://files.slack.com/files-pri/T02DYKN91-F02GXLKDN/download/regex_sessions.tar.gz).

When I was running the sessions, I started by talking about a text editor or word processor's typical search functionality and then offering up "queries" that we could express easily as people but which these applications couldn't handle. For example:

  1. Every word that begins with a number
  2. Uncapitalized words at the beginning of a sentence.
  3. Phone numbers

You could tell a human to search a Word document for these things and they'd understand what to do, notwithstanding how tedious the task would be. I then said that regular expressions are a language that allows us to express/articulate more sophisticated patterns.

It achieves this by allowing us to do two things that Word's search functionality doesn't:

  1. The ability to express character classes, i.e., instead of searching for a fixed character, we can search for one of the N characters contained in a character class.
  2. The ability to express the idea of repetition, i.e., look for an X repeated 0 or more times, 1 or more times, exactly 5 times, 0 or 1 times, and so on.

Before introducing the short-hand notation for common character classes like \d, \s, and so on, I made sure they understood regular expressions like

/[Aa]pples/
/[0123456789][0123456789][0123456789][0123456789][0123456789]/
/[0123456789]{5}/
/[23456789][0123456789]{2}-[23456789][0123456789]{2}-[0123456789]{4}/
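
As a rough illustration, the first three patterns can be tried against a few strings in Ruby. The sample strings and the matching helper are hypothetical, and Regexp#match? assumes Ruby 2.4 or later.

```ruby
# The first three patterns from the list above.
APPLES     = /[Aa]pples/
FIVE_LONG  = /[0123456789][0123456789][0123456789][0123456789][0123456789]/
FIVE_SHORT = /[0123456789]{5}/

# Hypothetical sample strings.
SAMPLES = ["apples", "Apples", "APPLES", "zip 94107", "zip 9410"]

# Hypothetical helper: keep only the strings the pattern matches somewhere.
# Regexp#match? returns true or false without building a MatchData.
def matching(pattern, strings)
  strings.select { |s| pattern.match?(s) }
end

p matching(APPLES, SAMPLES)      # ["apples", "Apples"]
p matching(FIVE_LONG, SAMPLES)   # ["zip 94107"]
p matching(FIVE_SHORT, SAMPLES)  # ["zip 94107"]
```

The last two lines print the same result, which is the point of {5}: it says "repeat the class five times" without writing the class out five times.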

I then taught them the "range" shorthand:

/[0-9]{5}/
/[2-9][0-9]{2}-[2-9][0-9]{2}-[0-9]{4}/

After that, I went into more typical regexes. You can see in the notes where I spelled out the "vocabulary."

    . = any single character (in Ruby, a newline only with the /m flag)
    ? = 0 or 1 times
    * = 0 or more times
    + = 1 or more times
  {N} = exactly N times
{X,Y} = between X and Y times
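
A small Ruby sketch of each entry in this vocabulary. The constant names and test strings are made up, and the \A and \z anchors (start and end of string) are an addition not covered in the notes; they are there so that a partial match does not count.

```ruby
ANY_CHAR   = /\Ac.t\z/      # .     : any single character between "c" and "t"
OPTIONAL   = /\Acolou?r\z/  # ?     : "u" appears 0 or 1 times
ANY_COUNT  = /\Aab*\z/      # *     : "b" appears 0 or more times
AT_LEAST_1 = /\Aab+\z/      # +     : "b" appears 1 or more times
EXACTLY_3  = /\Aa{3}\z/     # {N}   : "a" appears exactly 3 times
TWO_TO_4   = /\Aa{2,4}\z/   # {X,Y} : "a" appears between 2 and 4 times

p ANY_CHAR.match?("cat")      # true
p ANY_CHAR.match?("ct")       # false, "." needs one character
p OPTIONAL.match?("color")    # true
p OPTIONAL.match?("colour")   # true
p ANY_COUNT.match?("a")       # true, zero "b"s is fine
p AT_LEAST_1.match?("a")      # false, needs at least one "b"
p EXACTLY_3.match?("aaaa")    # false, one "a" too many
p TWO_TO_4.match?("aaa")      # true
```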

Character classes, e.g., [aeiou], [01234]
There are "convenience" shorthand ways of writing many of these:

\d = [0123456789], i.e., any digit
\D = [^0123456789], i.e., any non-digit
\s = any whitespace character, incl. space, tab, newline
\S = any non-whitespace character
[A-Za-z0-9]
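
One way to see these shorthand classes at work in Ruby is with String#scan and String#gsub. The sample text and constant names are invented for this sketch.

```ruby
# Hypothetical sample text containing a space, a tab, and a newline.
TEXT = "Room 42,\tfloor 3\n"

DIGITS      = TEXT.scan(/\d/)      # every digit as its own match
DIGIT_RUNS  = TEXT.scan(/\d+/)     # runs of one or more digits
CHUNKS      = TEXT.scan(/\S+/)     # runs of non-whitespace characters
ONLY_DIGITS = TEXT.gsub(/\D/, "")  # delete every non-digit
NO_SPACE    = TEXT.gsub(/\s/, "")  # delete every whitespace character

p DIGITS       # ["4", "2", "3"]
p DIGIT_RUNS   # ["42", "3"]
p CHUNKS       # ["Room", "42,", "floor", "3"]
p ONLY_DIGITS  # "423"
p NO_SPACE     # "Room42,floor3"
```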

We can also express ranges where it makes sense:

[0-9] = [0123456789]
[a-z] = [abcdefghijklmnopqrstuvwxyz]
[A-Z] = [ABCDEFGHIJKLMNOPQRSTUVWXYZ]

and combine them, e.g.,

[a-z,] = [abcdefghijklmnopqrstuvwxyz,]
[A-Za-z] = [ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]
[a-z0-9] = [abcdefghijklmnopqrstuvwxyz0123456789]
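
With ranges and repetition in hand, the three queries from the start of this comment could be sketched in Ruby roughly as follows. These are one possible reading of each query, not the patterns from the workshop notes. The sample strings are invented, and \b (word boundary), \A (start of string), the (?:...|...) alternation, and the capture group are additions beyond the vocabulary above.

```ruby
# 1. Words that begin with a number. \b keeps a match from starting mid-word.
LEADING_DIGIT = /\b[0-9][A-Za-z0-9]*/

# 2. Uncapitalized words at the beginning of a sentence: a lowercase word
#    at the start of the text, or after ., ! or ? followed by spaces.
#    The parentheses capture just the word.
SENTENCE_START = /(?:\A|[.!?] +)([a-z]+)/

# 3. Phone numbers, using the range pattern shown earlier.
PHONE = /[2-9][0-9]{2}-[2-9][0-9]{2}-[0-9]{4}/

# Hypothetical sample strings, one per query.
WORDS  = "Meet at 4pm in room 2b".scan(LEADING_DIGIT)
STARTS = "the cat sat. It purred. then it left.".scan(SENTENCE_START).flatten
PHONES = "Call 415-555-0199, not 123-456-7890.".scan(PHONE)

p WORDS   # ["4pm", "2b"]
p STARTS  # ["the", "then"]
p PHONES  # ["415-555-0199"]
```

The second number is skipped because its first digit falls outside [2-9].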

@jcdavison jcdavison modified the milestones: The Distant Future, 0.2.0 Sep 12, 2014