Regular Expressions: The Complete Tutorial

Jan Goyvaerts

Copyright 2006, 2007 Jan Goyvaerts. All rights reserved. Last updated July 2007.

No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the author.

This book is published exclusively at [...]

Every effort has been made to make this book as complete and as accurate as possible, but no warranty or fitness is implied. The information is provided on an “as is” basis. The author and the publisher shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book.

Table of Contents

Tutorial
1. Regular Expression Tutorial
2. Literal Characters
3. First Look at How a Regex Engine Works Internally
4. Character Classes or Character Sets
5. The Dot Matches (Almost) Any Character
6. Start of String and End of String Anchors
7. Word Boundaries
8. Alternation with The Vertical Bar or Pipe Symbol
9. Optional Items
10. Repetition with Star and Plus
11. Use Round Brackets for Grouping
12. Named Capturing Groups
13. Unicode Regular Expressions
14. Regex Matching Modes
15. Possessive Quantifiers
16. Atomic Grouping
17. Lookahead and Lookbehind Zero-Width Assertions
18. Testing The Same Part of a String for More Than One Requirement
19. Continuing at The End of The Previous Match
20. If-Then-Else Conditionals in Regular Expressions
21. XML Schema Character Classes
22. POSIX Bracket Expressions
23. Adding Comments to Regular Expressions
24. Free-Spacing Regular Expressions

Examples
1. Sample Regular Expressions
2. Matching Floating Point Numbers with a Regular Expression
3. How to Find or Validate an Email Address
4. Matching a Valid Date
5. Matching Whole Lines of Text
6. Deleting Duplicate Lines From a File
8. Find Two Words Near Each Other
9. Runaway Regular Expressions: Catastrophic Backtracking
10. Repeating a Capturing Group vs. Capturing a Repeated Group

Tools & Languages
1. Specialized Tools and Utilities for Working with Regular Expressions
2. Using Regular Expressions with Delphi for .NET and Win32
3. EditPad Pro: Convenient Text Editor with Full Regular Expression Support
4. What Is grep?
5. Using Regular Expressions in Java
6. Java Demo Application using Regular Expressions
7. Using Regular Expressions with JavaScript and ECMAScript
8. JavaScript RegExp Example: Regular Expression Tester
9. MySQL Regular Expressions with The REGEXP Operator
10. Using Regular Expressions with The Microsoft .NET Framework
11. C# Demo Application
12. Oracle Database 10g Regular Expressions
13. The PCRE Open Source Regex Library
14. Perl’s Rich Support for Regular Expressions
15. PHP Provides Three Sets of Regular Expression Functions
16. POSIX Basic Regular Expressions
17. PostgreSQL Has Three Regular Expression Flavors
18. PowerGREP: Taking grep Beyond The Command Line
19. Python’s re Module
20. How to Use Regular Expressions in REALbasic
21. RegexBuddy: Your Perfect Companion for Working with Regular Expressions
22. Using Regular Expressions with Ruby
23. Tcl Has Three Regular Expression Flavors
24. VBScript’s Regular Expression Support
25. VBScript RegExp Example: Regular Expression Tester
26. How to Use Regular Expressions in Visual Basic
27. XML Schema Regular Expressions

Reference
1. Basic Syntax Reference
2. Advanced Syntax Reference
3. Unicode Syntax Reference
4. Syntax Reference for Specific Regex Flavors
5. Regular Expression Flavor Comparison
6. Replacement Text Reference

Introduction

A regular expression (regex or regexp for short) is a special text string for describing a search pattern. You can think of regular expressions as wildcards on steroids. You are probably familiar with wildcard notations such as *.txt to find all text files in a file manager. The regex equivalent is «.*\.txt».

But you can do much more with regular expressions. In a text editor like EditPad Pro or a specialized text processing tool like PowerGREP, you could use the regular expression «\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b» to search for an email address. Any email address, to be exact. A very similar regular expression (replace the first \b with ^ and the last one with $) can be used by a programmer to check if the user entered a properly formatted email address. In just one line of code, whether that code is written in Perl, PHP, Java, a .NET language or a multitude of other languages.

Complete Regular Expression Tutorial

Do not worry if the above example or the quick start make little sense to you. Any non-trivial regex looks daunting to anybody not familiar with them. But with just a bit of experience, you will soon be able to craft your own regular expressions like you have never done anything else. The tutorial in this book explains everything bit by bit.

This tutorial is quite unique because it not only explains the regex syntax, but also describes in detail how the regex engine actually goes about its work. You will learn quite a lot, even if you have already been using regular expressions for some time. This will help you to understand quickly why a particular regex does not do what you initially expected, saving you lots of guesswork and head scratching when writing more complex regexes.

Applications & Languages That Support Regexes

There are many software applications and programming languages that support regular expressions. If you are a programmer, you can save yourself lots of time and effort. You can often accomplish with a single regular expression in one or a few lines of code what would otherwise take dozens or hundreds.

Not Only for Programmers

If you are not a programmer, you can use regular expressions in many situations just as well. They will make finding information a lot easier. You can use them in powerful search and replace operations to quickly make changes across large numbers of files. A simple example is «gr[ae]y» which will find both sp

1. Regular Expression Tutorial

In this tutorial, I will teach you all you need to know to be able to craft powerful time-saving regular expressions.
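
As a first taste, the three patterns quoted in the introduction can be tried out in a few lines of Python. This is only a minimal sketch under stated assumptions: the function names are hypothetical, the email pattern is written out in one common form of the pattern quoted in the introduction, and the case-insensitive flag is my assumption, since the pattern itself lists only uppercase letters.

```python
import re

# Regex equivalent of the wildcard *.txt.
TXT_FILE = re.compile(r".*\.txt")

# Character class: the second-to-last letter may be "a" or "e".
GRAY = re.compile(r"gr[ae]y")

# Email pattern for searching inside longer text (word boundaries at both ends).
# re.IGNORECASE is an assumption so that lowercase addresses match too.
EMAIL_SEARCH = re.compile(
    r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE
)

# Same pattern with the first \b replaced by ^ and the last by $,
# so the whole input must be an email address.
EMAIL_VALIDATE = re.compile(
    r"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$", re.IGNORECASE
)


def is_text_file(name: str) -> bool:
    """Return True if the whole file name matches the *.txt equivalent."""
    return TXT_FILE.fullmatch(name) is not None


def find_gray(text: str) -> list[str]:
    """Return every occurrence of 'gray' or 'grey' in the text."""
    return GRAY.findall(text)


def find_emails(text: str) -> list[str]:
    """Return every substring that looks like an email address."""
    return EMAIL_SEARCH.findall(text)


def is_valid_email(value: str) -> bool:
    """Return True if the input as a whole is formatted like an email address."""
    return EMAIL_VALIDATE.search(value) is not None


if __name__ == "__main__":
    print(is_text_file("notes.txt"))
    print(find_gray("The gray cat and the grey dog"))
    print(find_emails("Contact jane.doe@example.com or bob@test.org today."))
    print(is_valid_email("see jane.doe@example.com"))
```

The search variant finds addresses embedded in surrounding text, while the anchored variant rejects any input that contains more than the address itself. That is the difference between finding and validating that the introduction describes.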