Regular Expressions in R and Python

What is a regular expression?

A regular expression (regex) is neither a library nor a programming language. It is a sequence of characters that specifies a search pattern to look for in a given text (a string).

Text can contain almost anything: letters, numbers, whitespace, and special characters. As long as the part you are looking for follows a consistent pattern, a regex can describe that pattern and return the matching part of the string.
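Here is a minimal sketch using Python's standard-library `re` module. The sample text is made up for illustration. The first pattern pulls a date out of a longer string, and the second returns every run of digits:

```python
import re

# Made-up sample text for illustration
text = "Order #4521 shipped on 2023-07-14 to Jane Doe."

# re.search returns the first match (a Match object), or None if there is none
match = re.search(r"\d{4}-\d{2}-\d{2}", text)
if match:
    print(match.group())  # 2023-07-14

# re.findall returns every non-overlapping match as a list of strings
print(re.findall(r"\d+", text))  # ['4521', '2023', '07', '14']
```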

Basic regex characters

Characters

  • Escape character: \
  • Any single character (in Python, any except a newline by default): .
  • Digit: \d
  • Not a digit: \D
  • Word character (letter, digit, or underscore): \w
  • Not a word character: \W
  • Whitespace: \s
  • Not whitespace: \S
  • Word boundary: \b
  • Not a word boundary: \B
  • Beginning of a string: ^
  • End of a string: $
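The following Python sketch shows several of these characters on an arbitrary sample string. Raw strings (r"...") stop Python from interpreting the backslashes itself, so the backslashes reach the regex engine intact:

```python
import re

s = "cat scatter cat_2 3.14"

# \w matches letters, digits and underscore, so "cat_2" stays one token,
# while the "." in "3.14" splits the number in two
print(re.findall(r"\w+", s))        # ['cat', 'scatter', 'cat_2', '3', '14']

# \b anchors "cat" to word edges, so "scatter" and "cat_2" are skipped
print(re.findall(r"\bcat\b", s))    # ['cat']

# An unescaped "." matches any character (here the space in "2 3");
# escaping it as \. matches only a literal dot
print(re.findall(r"\d.\d+", s))     # ['2 3']
print(re.findall(r"\d\.\d+", s))    # ['3.14']

# ^ and $ anchor the match to the start and end of the string
print(re.findall(r"^\w+", s))       # ['cat']
print(re.findall(r"\d+$", s))       # ['14']

# \s+ matches runs of whitespace, which is handy for splitting
print(re.split(r"\s+", s))          # ['cat', 'scatter', 'cat_2', '3.14']
```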

Groupings

  • Any one of the characters in the brackets: [ ]
  • Any one character not in the brackets: [^ ]
  • Either/or (alternation): |
  • Capturing group: ( )
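This Python sketch shows the grouping syntax on made-up sample text:

```python
import re

s = "grey gray green 2024-03-15"

# [ ] matches one character from the set, so "gr[ae]y" finds both spellings
print(re.findall(r"gr[ae]y", s))      # ['grey', 'gray']

# [^ ] negates the set: runs of characters that are not digits, whitespace or "-"
print(re.findall(r"[^\d\s-]+", s))    # ['grey', 'gray', 'green']

# | tries the alternatives left to right
print(re.findall(r"grey|green", s))   # ['grey', 'green']

# ( ) captures sub-matches that can be pulled out separately
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", s)
print(m.groups())                     # ('2024', '03', '15')
print(m.group(1))                     # 2024
```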

Quantifiers

  • 0 or more: *
  • 1 or more: +
  • 0 or 1: ?
  • Exactly n times: {n}
  • Between min and max times: {min,max}
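Here is a Python sketch of the quantifiers. Each quantifier applies to the token immediately before it (a character, a class, or a group). `re.fullmatch` is used in the first part so that the whole word has to match:

```python
import re

words = ["color", "colour", "colouur", "ac", "abc", "abbbc"]

# ? makes the preceding "u" optional (0 or 1 times)
optional = [w for w in words if re.fullmatch(r"colou?r", w)]
print(optional)      # ['color', 'colour']

# * allows 0 or more "b"s, + requires at least one
zero_or_more = [w for w in words if re.fullmatch(r"ab*c", w)]
one_or_more = [w for w in words if re.fullmatch(r"ab+c", w)]
print(zero_or_more)  # ['ac', 'abc', 'abbbc']
print(one_or_more)   # ['abc', 'abbbc']

# {n} repeats exactly n times; {min,max} repeats between min and max times
print(re.findall(r"\d{3}", "12 345 6789"))    # ['345', '678']
print(re.findall(r"\d{2,3}", "12 345 6789"))  # ['12', '345', '678']
```

Quantifiers are greedy by default. That is why \d{2,3} takes three digits from "6789" and then cannot match the single leftover "9".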

Regex examples

  • Phone numbers
  • Dates
  • Names
  • URLs
  • Email addresses
  • Addresses
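The patterns from the full write-up are not reproduced here. As a rough, hypothetical illustration, below are a few simplified Python patterns for some of these cases. They assume US-style phone numbers and MM/DD/YYYY dates. They are also deliberately loose: real-world email and URL validation needs considerably more care.

```python
import re

# Made-up sample text for illustration
text = (
    "Call (555) 123-4567 or 555-987-6543 before 03/15/2024. "
    "Email jane.doe@example.com or see https://www.example.com/docs for details."
)

# Hypothetical, simplified patterns (not taken from the original write-up)
patterns = {
    # optional parentheses around a 3-digit area code, then 3 and 4 digits
    "phone": r"\(?\d{3}\)?[-\s]?\d{3}-\d{4}",
    # MM/DD/YYYY, without checking that the month or day is valid
    "date": r"\d{2}/\d{2}/\d{4}",
    # local part, "@", then a domain containing at least one dot
    "email": r"[\w.+-]+@[\w-]+\.[\w.-]+",
    # http or https followed by any run of non-whitespace characters
    "url": r"https?://\S+",
}

for name, pattern in patterns.items():
    print(name, re.findall(pattern, text))
```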

Medium article

Link to full write-up on Towards Data Science here.

Additional resources

Follow me