The Coder's Apprentice


When you are facing a problem with text mining, data processing, finding patterns in large collections of data, or scraping data from web pages, and you explore the Internet to find a solution to your problem, you will find that often the very first answer given to questions in this respect is “Why don’t you use regular expressions?” or even “Just use regex,” without further explanations. Rather smug answers, as many people have never heard of regular expressions, and if they have, might find them scary and incomprehensible. In fact, at first glance they come over as so arcane and confusing that most people rather shy away than delve into them. Which is a pity, as regular expressions are a powerful tool that should not be missing in the toolbox of anyone who deals with unstructured data on a regular basis.

In this chapter I will explain how to write and use basic regular expressions with Python. You will find them indeed a powerful way to quickly express and discover complex and diverse patterns in data, providing access to functionalities that would be very hard to implement in vanilla Python. While this chapter does not contain a complete overview of regular expressions, after studying it you will be able to understand and use regular expressions for most, if not all, pattern-matching problems that you encounter in practice, and be confident in telling the uninitiated: “You should use regular expressions to solve your problems.” Now you can feel smug too!