Basic Regular Expression


A regular expression (regex, or just re) is a notation for describing patterns of text, used for string matching and searching.

The above definition seems a bit vague and confusing, but don’t worry! Let’s just try out an example:

Suppose you are writing a novel. You named your main character Jack and have written 300 pages describing his life and adventures. You give the draft to a friend so that she can read it and give you some feedback. A week later, she comes to you and says: “Oh, the plot is so cool, I love the way you make up so many monsters for him to fight! But, hey, sometimes it spoiled my mood because you mistyped your character’s name; sometimes you call him Jeck or something. What happened?” “Oh my god! Really?” you think. That night, you open Microsoft Word and read through the first several pages, but cannot find where you called your character Jeck. Tired of reading and searching, you suddenly remember that Word has a search function. You press Ctrl + F, type “Jeck”, press Enter, and happily find every occurrence of “Jeck” in your text.

That’s it; the story ends here. The word “Jeck” that you typed into Word’s search bar is a regular expression, albeit in its simplest form.

Instead of using Word, let us simulate the process in Python. We will use Python’s built-in module re, short for regular expression.
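A minimal sketch of that search, assuming a short made-up excerpt stands in for the full 300-page draft. The function re.findall returns a list of all non-overlapping matches of a pattern in a string; here the pattern is just the plain word “Jeck”.

```python
import re

# A hypothetical excerpt standing in for the full draft (made up for illustration).
text = ("Jack woke up early. Jeck grabbed his sword and left the village. "
        "The monster roared, but Jack did not run. That night, Jeck slept by the fire.")

# re.findall returns a list of every non-overlapping match of the pattern.
matches = re.findall('Jeck', text)
print(matches)       # ['Jeck', 'Jeck']
print(len(matches))  # 2
```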


The result shows that you mistakenly typed Jack as Jeck twice.

You correct the two and think everything is fine now. But maybe not: you know you mistyped his name because you were writing at night, when you often felt very sleepy. So perhaps you typed not just Jeck, but also Juck, Jick, or something similar. Hence, you want to search for any string that starts with a ‘J’, followed by any single character, and then ‘ck’. How do you do that?

One way is to search for ‘Jbck’, then for ‘Jcck’, then for ‘Jdck’, and so on, then for ‘J0ck’, then for ‘J1ck’, and so on. After a finite number of searches, you will find every misspelling of Jack’s name. But it costs you quite a lot of time and effort.
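To see why this is tedious, here is a minimal sketch of the brute-force approach in Python. The sample text is hypothetical, and the sketch only tries letters and digits as the middle character, so a typo such as ‘J-ck’ would still slip through.

```python
import string

# Hypothetical text containing a few misspellings of "Jack" (made up for illustration).
text = "Jack met Jeck, Juck and J0ck at the inn."

found = []
searches = 0
# Brute force: build 'J?ck' for every letter and digit and search for each one separately.
for ch in string.ascii_letters + string.digits:
    candidate = 'J' + ch + 'ck'
    if candidate == 'Jack':
        continue  # the correct spelling is not a typo
    searches += 1
    count = text.count(candidate)
    if count > 0:
        found.append((candidate, count))

print(found)     # [('Jeck', 1), ('Juck', 1), ('J0ck', 1)]
print(searches)  # 61
```

Sixty-one separate searches, and the list of candidate characters is still incomplete.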

Another way is to use a regular expression. In regex, a single arbitrary character is denoted by ‘.’ (a dot). So you just need to search for “J.ck” in regular expression mode.

The code below demonstrates the idea:
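A minimal sketch using re.findall with the pattern “J.ck”; the sample text is made up for illustration.

```python
import re

# Hypothetical text containing a few misspellings of "Jack".
text = "Jack met Jeck, Juck and J0ck at the inn."

# '.' matches any single character (except a newline, by default).
matches = re.findall('J.ck', text)
print(matches)  # ['Jack', 'Jeck', 'Juck', 'J0ck']
```

One pattern replaces all the separate searches. Note that it also matches the correct spelling Jack, because to ‘.’ the letter ‘a’ is just another character.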


By now, I bet you have a feel for what regex is. Without regex, we can easily search for an exact string in a text. With regex, we relax that constraint: instead of an exact string, we find any string that matches a pattern.

Some common special expressions in regex are:

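.        any single character (except a newline, by default)
c*       zero or more occurrences of the character c
c+       one or more occurrences of the character c
c?       zero or one occurrence of the character c
^        beginning of the string (or of each line, in multiline mode)
$        end of the string (or of each line, in multiline mode)
[a-z]    any one character in the range from a to z
[abc]    any one of the characters in the brackets (an OR over all the choices)
[^abc]   (caret inside brackets) any one character that is NOT among the choices in the brackets
\        escape sign: the character right after it is taken literally, not with its regex meaning (e.g. \. matches a literal dot)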

Here are some more examples:
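One way to try each expression from the list is with re.findall; the pattern and sample pairs below are made up for illustration. The r prefix in r'3\.14' makes it a raw string, so Python passes the backslash to the regex engine unchanged instead of treating it as a string escape.

```python
import re

# (pattern, sample text) pairs; the samples are made up for illustration.
examples = [
    ('lo*l', 'll lol lool'),        # *   : zero or more 'o'     -> ['ll', 'lol', 'lool']
    ('lo+l', 'll lol lool'),        # +   : one or more 'o'      -> ['lol', 'lool']
    ('colou?r', 'color colour'),    # ?   : zero or one 'u'      -> ['color', 'colour']
    ('^J.ck', 'Jack met Jeck'),     # ^   : only at the start    -> ['Jack']
    ('J.ck$', 'Jack met Jeck'),     # $   : only at the end      -> ['Jeck']
    ('[a-c]', 'abcdef'),            # range inside brackets      -> ['a', 'b', 'c']
    ('J[ae]ck', 'Jack Jeck Juck'),  # [ ] : 'a' or 'e'           -> ['Jack', 'Jeck']
    ('J[^a]ck', 'Jack Jeck Juck'),  # [^ ]: anything but 'a'     -> ['Jeck', 'Juck']
    (r'3\.14', '3.14 3514'),        # \.  : a literal dot        -> ['3.14']
    ('3.14', '3.14 3514'),          # unescaped '.' matches any  -> ['3.14', '3514']
]

results = []
for pattern, sample in examples:
    found = re.findall(pattern, sample)
    results.append(found)
    print(f'{pattern:10} {sample:16} {found}')
```

The last two lines show why the escape sign matters: without the backslash, the dot in ‘3.14’ also matches the ‘5’ in ‘3514’.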

Feel free to modify the patterns and sample strings to try out all of the expressions listed above!

Test your understanding

1. What is the output of re.findall('^text[_?m-z]*', 'text_message')?
2. What regex matches the string "Hello Regex"?
3. In regex, what is the difference between "u+" and "u*"?
4. What is the output of re.findall('n+', 'binning')?
5. In regex, what does "[e-h]" mean?
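Once you have answered, you can check questions 1, 4 and 5 by running them in Python. Question 5 gives no input string, so the sample string used for it here is an assumption.

```python
import re

# Run these only after answering the questions yourself.
q1 = re.findall('^text[_?m-z]*', 'text_message')
q4 = re.findall('n+', 'binning')
q5 = re.findall('[e-h]', 'abcdefghij')  # hypothetical sample string to probe the range
print(q1)
print(q4)
print(q5)
```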
