Tutorial: Using Regular Expressions


Section 1. Introduction to the tutorial

Who is this tutorial for?

This tutorial is aimed at programmers who work with tools that use regular expressions, and who would like to become more comfortable with the intricacies of regular expressions. Even programmers who have used regular expressions in the past, but have forgotten some of the details, can benefit from this tutorial as a refresher.

After completing this tutorial, you will not yet be an expert in using regular expressions to best advantage. But this tutorial combined with lots of practice with varying cases is about all you need to be an expert. The concepts of regular expressions are extremely simple and powerful -- it is their application that takes some work.

Just what is a regular expression, anyway?

Take the tutorial to get the long answer. The short answer is that a regular expression is a compact way of describing complex patterns in texts. You can use them to search for patterns and, once found, to modify the patterns in complex ways. You can also use them to launch programmatic actions that depend on patterns.

A tongue-in-cheek comment by programmers is worth thinking about: "Sometimes you have a programming problem and it seems like the best solution is to use regular expressions; now you have two problems." Regular expressions are amazingly powerful and deeply expressive. That is the very reason writing them is just as error-prone as writing any other complex programming code. It is always better to solve a genuinely simple problem in a simple way; when you go beyond simple, think about regular expressions.

Presented by developerWorks, your source for great tutorials
ibm.com/developer

What tools use regular expressions?

Many tools incorporate regular expressions as part of their functionality. UNIX-oriented command-line tools like grep, sed, and awk are mostly wrappers for regular-expression processing. Many text editors allow search and/or replacement based on regular expressions. Many programming languages, especially scripting languages such as Perl, Python, and TCL, build regular expressions into the heart of the language. Even most command-line shells, such as Bash or the Windows console, allow restricted regular expressions as part of their command syntax.

There are a few variations in regular-expression syntax between the different tools that use them. Some tools add enhanced capabilities that are not available everywhere. In general, for the simplest cases, this tutorial will use examples based around grep or sed. For a few more exotic capabilities, Perl or Python examples will be chosen. For the most part, examples will work anywhere; but check the documentation on your own tool for syntax variations and capabilities.

Note on presentation

For purposes of presenting examples in this tutorial, regular expressions described will be surrounded by forward slashes. This style of delimiting regular expressions is used by sed, awk, Perl, and other tools. For instance, an example might mention:

    /[A-Z]+(abc|xyz)*/

Read ahead to understand this example; for now just understand that the actual regular expression is everything between the slashes.

Many examples will be accompanied by an illustration that shows a regular expression, and text that is highlighted for every match on that expression.

Tutorial navigation

Navigating through the tutorial is easy:

· Select Next and Previous to move forward and backward through the tutorial.
· When you're finished with a section, select the Main menu for the next section.
· Within a section, use the Section menu.

If you'd like to tell us what you think, or if you have a question for the author about the content of the tutorial, use the Feedback button.

Contact

David Mertz is a writer, a programmer, and a teacher who always endeavors to improve his communication to readers (and tutorial takers). He welcomes any comments; please direct them to mertz@gnosis.cx.

Section 2. Basic pattern matching in text

Character literals

    /a/

    Mary had a little lamb.
    And everywhere that Mary
    went, the lamb was sure
    to go.

    /Mary/

    Mary had a little lamb.
    And everywhere that Mary
    went, the lamb was sure
    to go.

The very simplest pattern matched by a regular expression is a literal character or a sequence of literal characters. Anything in the target text that consists of exactly those characters in exactly the order listed will match. A lowercase character is not identical to its uppercase version, and vice versa. A space in a regular expression, by the way, matches a literal space in the target (this is unlike most programming languages or command-line tools, where spaces separate keywords).

"Escaped" character literals

    /.*/

    Special characters must be escaped.*

    /\.\*/

    Special characters must be escaped.*

A number of characters have special meanings to regular expressions. A symbol with a special meaning can be matched, but to do so you must prefix it with the backslash character (this includes the backslash character itself: to match one backslash in the target, your regular expression should include "\\").
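
The behavior of literals and escapes can be checked with a small sketch. It assumes Python's re module rather than grep or sed, and the variable names are illustrative only.

```python
import re

text = "Mary had a little lamb. Special characters must be escaped.*"

# A literal pattern matches exactly those characters, case-sensitively.
literal_hits = re.findall(r"Mary", text)
lowercase_hits = re.findall(r"mary", text)

# Unescaped, ".*" means "any run of characters"; escaped, it means
# a literal period followed by a literal asterisk.
unescaped = re.search(r".*", text).group()
escaped = re.search(r"\.\*", text).group()

# re.escape() builds the escaped form of an arbitrary literal string.
auto_escaped = re.search(re.escape(".*"), text).group()

print(literal_hits, lowercase_hits, escaped, auto_escaped)
```

The unescaped pattern swallows the whole line, while the escaped one finds only the two trailing characters.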

Positional special characters

    /^Mary/

    Mary had a little lamb.
    And everywhere that Mary
    went, the lamb was sure
    to go.

    /Mary$/

    Mary had a little lamb.
    And everywhere that Mary
    went, the lamb was sure
    to go.

Two special characters are used in almost all regular expression tools to mark the beginning and end of a line: caret (^) and dollar-sign ($). To match a caret or dollar-sign as a literal character, you must escape it (that is, precede it with a backslash "\").

An interesting thing about the caret and dollar-sign is that they match zero-width patterns. That is, the length of the string matched by a caret or dollar-sign by itself is zero (but the rest of the regular expression can still depend on the zero-width match). Many regular expression tools provide another zero-width pattern for word-boundary (\b). Words might be divided by whitespace like spaces, tabs, newlines, or other characters like nulls; the word-boundary pattern matches the actual point where a word starts or ends, not the particular whitespace characters.

The "wildcard" character

    /.a/

    Mary had a little lamb.
    And everywhere that Mary
    went, the lamb was sure
    to go.

In regular expressions, a period can stand for any character. Normally, the newline character is not included, but most tools have optional switches to force inclusion of the newline character also. Using a period in a pattern is a way of requiring that "something" occurs here, without having to decide what.

Users who are familiar with DOS command-line wildcards will know the question-mark as filling the role of "some character" in command masks. But in regular expressions, the question-mark has a different meaning, and the period is used as a wildcard.
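
The anchors, the word boundary, and the wildcard can be tried out on the nursery rhyme. This is a sketch that assumes Python's re module; note that Python needs the re.MULTILINE flag before ^ and $ apply to each line, which line-oriented tools such as grep do implicitly.

```python
import re

rhyme = (
    "Mary had a little lamb.\n"
    "And everywhere that Mary\n"
    "went, the lamb was sure\n"
    "to go."
)

# re.MULTILINE makes ^ and $ apply to every line of the target.
starts = re.findall(r"^Mary", rhyme, re.MULTILINE)
ends = re.findall(r"Mary$", rhyme, re.MULTILINE)

# Anchors are zero-width: a bare ^ matches, but the matched string is empty.
empty = re.search(r"^", rhyme).group()

# \b marks the edge of a word without consuming any whitespace.
la_words = re.findall(r"\bla\w*", rhyme)

# The period stands for any single character except a newline.
any_then_a = re.findall(r".a", rhyme)

print(starts, ends, repr(empty), la_words, any_then_a)
```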

Grouping regular expressions

    /(Mary)( )(had)/

    Mary had a little lamb.
    And everywhere that Mary
    went, the lamb was sure
    to go.

A regular expression can have literal characters in it, and also zero-width positional patterns. Each literal character or positional pattern is an atom in a regular expression. You may also group several atoms together into a small regular expression that is part of a larger regular expression. One might be inclined to call such a grouping a "molecule," but normally it is also called an atom.

In older UNIX-oriented tools like grep, subexpressions must be grouped with escaped parentheses, as in /\(Mary\)/. In Perl and most more recent tools (including egrep), grouping is done with bare parentheses, but matching a literal parenthesis requires escaping it in the pattern (the example follows the Perl style).

Character classes

    /[a-z]a/

    Mary had a little lamb.
    And everywhere that Mary
    went, the lamb was sure
    to go.

Rather than name only a single character, you can include a pattern in a regular expression that matches any of a set of characters.

A set of characters can be given as a simple list inside square brackets; for example, /[aeiou]/ will match any single lowercase vowel. For letter or number ranges you may also use only the first and last letter of a range, with a dash in the middle; for example, /[A-Ma-m]/ will match any lowercase or uppercase letter in the first half of the alphabet.

Many regular expression tools also provide escape-style shortcuts to the most commonly used character classes, such as \s for a whitespace character and \d for a digit. You could always define these character classes with square brackets, but the shortcuts can make regular expressions more compact and readable.
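
Groups and character classes can be explored with a short sketch, assuming Python's re module, which follows the Perl style of bare parentheses. The sample strings are made up for illustration.

```python
import re

line = "Mary had a little lamb."

# Bare parentheses create numbered groups; each group is itself an atom.
groups = re.search(r"(Mary)( )(had)", line).groups()

# A bracketed list matches any single character from the set.
vowels = re.findall(r"[aeiou]", "lamb was sure")

# Ranges: any letter from the first half of the alphabet, either case.
first_half = re.findall(r"[A-Ma-m]", "Mary went")

# Shortcuts: \d is a digit, \s is a whitespace character.
digits = re.findall(r"\d", "A37 B4")
whitespace_count = len(re.findall(r"\s", "a b\tc\n"))

print(groups, vowels, first_half, digits, whitespace_count)
```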

Complement operator

    /[^a-z]a/

    Mary had a little lamb.
    And everywhere that Mary
    went, the lamb was sure
    to go.

The caret symbol can actually have two different meanings in regular expressions. Most of the time, it means to match the zero-length pattern for line beginnings. But if it is used at the beginning of a character class, it reverses the meaning of the character class. Everything not included in the listed character set is matched.

Alternation of patterns

    /cat|dog|bird/

    The pet store sold cats, dogs, and birds.

    /=xxx|yyy=/

    =xxx xxx= # =yyy yyy= # =xxx= # =yyy=

    /(=)(xxx)|(yyy)(=)/

    =xxx xxx= # =yyy yyy= # =xxx= # =yyy=

    /=(xxx|yyy)=/

    =xxx xxx= # =yyy yyy= # =xxx= # =yyy=

Using character classes is a way of indicating that either one thing or another thing can occur in a particular spot. But what if you want to specify that either of two whole subexpressions occurs in a position in the regular expression? For that, you use the alternation operator, the vertical bar ("|"). This is the symbol that is also used to indicate a pipe in UNIX/DOS shells, and is sometimes called the pipe character.

The pipe character in a regular expression indicates an alternation between everything in the group enclosing it. Even if there are several groups to the left and right of a pipe character, the alternation greedily asks for everything on both sides. To select the scope of the alternation, you must define a group that encompasses the patterns that may match. The example illustrates this.
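
The scope of alternation is easy to get wrong, so here is a small sketch of it alongside the complement operator. It assumes Python's re module; the angle-bracket sample string is invented for the illustration.

```python
import re

# A leading caret inside brackets negates the class.
not_lower_then_a = re.findall(r"[^a-z]a", "Mary had a little lamb.")

pets = "The pet store sold cats, dogs, and birds."
found = re.findall(r"cat|dog|bird", pets)

# Without a group, the bar splits the WHOLE pattern: "<x" or "y>".
# With a group, it only chooses between x and y inside the brackets.
sample = "<x> <y> <x y>"
unscoped = re.findall(r"<x|y>", sample)
scoped = [m.group() for m in re.finditer(r"<(x|y)>", sample)]

print(not_lower_then_a, found, unscoped, scoped)
```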

The basic abstract quantifier

    /@( )*@/

    Match with zero in the middle: @@
    Subexpression occurs, but.: @ ABC@
    Many occurrences: @ @
    Repeat entire pattern: @ @

One of the most powerful and common things you can do with regular expressions is specify how many times an atom occurs in a complete regular expression. Sometimes you want to specify something about the occurrence of a single character, but very often you are interested in specifying the occurrence of a character class or a grouped subexpression.

There is only one quantifier included with "basic" regular expression syntax, the asterisk ("*"); this has the meaning "some or none" or "zero or more." If you want to specify that any number of an atom may occur as part of a pattern, follow the atom by an asterisk.

Without quantifiers, grouping expressions doesn't really serve much purpose, but once we can add a quantifier to a subexpression we can say something about the occurrence of the subexpression as a whole. Take a look at the example.
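
A quantified group can be sketched as follows. The subexpression "ab" and the sample strings are assumptions chosen for this illustration (they are not the tutorial's own example), and Python's re module stands in for the tool.

```python
import re

# "@", then zero or more repetitions of the whole group "ab", then "@".
pattern = re.compile(r"@(ab)*@")

zero_in_middle = bool(pattern.search("@@"))
interrupted = bool(pattern.search("@abX@"))
many = bool(pattern.search("@ababab@"))
partial_repeat = bool(pattern.search("@aba@"))

print(zero_in_middle, interrupted, many, partial_repeat)
```

The asterisk applies to the group as a unit: a stray character or an incomplete repetition between the two "@" signs prevents the match.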

Section 3. Intermediate pattern matching in text

More abstract quantifiers

    /A+B*C?D/

    AAAD
    ABBBBCD
    BBBCD
    ABCCD
    AAABBBC

In a way, the lack of any quantifier symbol after an atom quantifies the atom anyway: it says the atom occurs exactly once. Extended regular expressions (which most tools support) add a few other useful numbers to "once exactly" and "zero or more times." The plus-sign ("+") means "one or more times" and the question-mark ("?") means "zero or one times." These quantifiers are by far the most common enumerations you wind up naming.

If you think about it, you can see that the extended regular expressions do not actually let you "say" anything the basic ones do not. They just let you say it in a shorter and more readable way. For example, "(ABC)+" is equivalent to "(ABC)(ABC)*"; and "X(ABC)?Y" is equivalent to "XABCY|XY". If the atoms being quantified are themselves complicated grouped subexpressions, the question-mark and plus-sign can make things a lot shorter.
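
The plus-sign and question-mark, and their stated equivalences, can be spot-checked with a sketch. It assumes Python's re module, and uses re.fullmatch so that a string counts only if the pattern covers all of it; the list of samples is arbitrary.

```python
import re

targets = ["AAAD", "ABBBBCD", "BBBCD", "ABCCD", "AAABBBC"]

# One or more A, zero or more B, an optional C, then exactly one D.
pattern = re.compile(r"A+B*C?D")
fully_matched = [t for t in targets if pattern.fullmatch(t)]

# Compare each shorthand against its longer spelling on a few samples.
samples = ["", "ABC", "ABCABC", "ABCAB", "XY", "XABCY", "XABCABCY"]
plus_same = all(
    bool(re.fullmatch(r"(ABC)+", s)) == bool(re.fullmatch(r"(ABC)(ABC)*", s))
    for s in samples
)
optional_same = all(
    bool(re.fullmatch(r"X(ABC)?Y", s)) == bool(re.fullmatch(r"XABCY|XY", s))
    for s in samples
)

print(fully_matched, plus_same, optional_same)
```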

Numeric quantifiers

    /a{5} b{,6} c{4,8}/

    aaaaa bbbbb ccccc
    aaa bbb ccc
    aaaaa bbbbbbbbbbbbbb ccccc

    /a+ b{3,} c?/

    aaaaa bbbbb ccccc
    aaa bbb ccc
    aaaaa bbbbbbbbbbbbbb ccccc

    /a{5} b{6,} c{4,8}/

    aaaaa bbbbb ccccc
    aaa bbb ccc
    aaaaa bbbbbbbbbbbbbb ccccc

Using extended regular expressions, you can specify arbitrary pattern occurrence counts using a more verbose syntax than the question-mark, plus-sign, and asterisk quantifiers. The curly-braces ("{" and "}") can surround a precise count of how many occurrences you are looking for.

The most general form of the curly-brace quantification uses two range arguments (the first must be no larger than the second, and both must be non-negative integers). The occurrence count is specified this way to fall between the minimum and maximum indicated (inclusive). As shorthand, either argument may be left empty: if so, the minimum/maximum is specified as zero/infinity, respectively. If only one argument is used (with no comma in there), exactly that many occurrences are matched.
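
The three curly-brace examples can be run as a sketch, assuming Python's re module (which accepts the {,n} shorthand). The helper name matching_lines is made up for the illustration.

```python
import re

lines = [
    "aaaaa bbbbb ccccc",
    "aaa bbb ccc",
    "aaaaa bbbbbbbbbbbbbb ccccc",
]

def matching_lines(pattern):
    """Return the lines in which the pattern is found anywhere."""
    return [line for line in lines if re.search(pattern, line)]

exact_and_ranges = matching_lines(r"a{5} b{,6} c{4,8}")
open_ended = matching_lines(r"a+ b{3,} c?")
at_least_six_b = matching_lines(r"a{5} b{6,} c{4,8}")

print(exact_and_ranges)
print(open_ended)
print(at_least_six_b)
```

Only the first line satisfies the first pattern (the third has too many b's for {,6}), all three lines satisfy the second, and only the third satisfies the last.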

Backreferences

    /(abc|xyz) \1/

    jkl abc xyz
    jkl xyz abc
    jkl abc abc
    jkl xyz xyz

    /(abc|xyz) (abc|xyz)/

    jkl abc xyz
    jkl xyz abc
    jkl abc abc
    jkl xyz xyz

One powerful option in creating search patterns is specifying that a subexpression that was matched earlier in a regular expression is matched again later in the expression. We do this using backreferences. Backreferences are named by the numbers 1 through 9, preceded by the backslash/escape character when used in this manner. These backreferences refer to each successive group in the match pattern, as in /(one)(two)(three)/\1\2\3/. Each numbered backreference refers to the group that, in this example, has the word corresponding to the number.

It is important to note something the example illustrates. What gets matched by a backreference is the same literal string matched the first time, even if the pattern that matched the string could have matched other strings. Simply repeating the same grouped subexpression later in the regular expression does not match the same targets as using a backreference (but you have to decide what you actually want to match in either case).

Backreferences refer back to whatever occurred in the previous grouped expressions, in the order those grouped expressions occurred. Because of the naming convention (\1-\9), many tools limit you to nine backreferences. Some tools allow actual naming of backreferences and/or saving them to program variables. Section 4 touches on these topics.
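
The difference between a backreference and a repeated group shows up clearly in a sketch. Python's re module is assumed here; the four target lines pair "abc" and "xyz" in every combination.

```python
import re

lines = ["jkl abc xyz", "jkl xyz abc", "jkl abc abc", "jkl xyz xyz"]

# \1 must repeat the exact text that group 1 matched.
with_backreference = [l for l in lines if re.search(r"(abc|xyz) \1", l)]

# Repeating the group lets each copy choose its alternative independently.
with_repeated_group = [l for l in lines if re.search(r"(abc|xyz) (abc|xyz)", l)]

print(with_backreference)
print(with_repeated_group)
```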

Don't match more than you want to

    /th.*s/
    -- Match the words that start
    -- with 'th' and end with 's'.

    this
    thus
    thistle
    this line matches too much

Quantifiers in regular expressions are greedy. That is, they match as much as they possibly can.

Probably the easiest mistake to make in composing regular expressions is to match too much. When you use a quantifier, you want it to match everything (of the right sort) up to the point where you want to finish your match. But when using the "*", "+", or numeric quantifiers, it is easy to forget that the last bit you are looking for might occur later in a line than the one you are interested in.
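
Greedy matching can be observed directly by printing what was matched. This is a sketch assuming Python's re module.

```python
import re

words = ["this", "thus", "thistle"]
word_matches = [re.search(r"th.*s", w).group() for w in words]

# On a longer line, ".*" runs on to the LAST "s" it can reach.
line = "this line matches too much"
greedy = re.search(r"th.*s", line).group()

print(word_matches)
print(repr(greedy))
```

The single words behave as intended, but on the full line the match stretches from "th" all the way to the final "s" of "matches".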

Tricks for restraining matches

    /th.*s/
    -- Match the words that start
    -- with 'th' and end with 's'.

    /th[^s]*./
    -- Match the words that start
    -- with 'th' and end with 's'.

    this
    thus
    thistle
    this line matches too much

If you find that your regular expressions are matching too much, a useful procedure is to reformulate the problem in your mind. Rather than thinking "what am I trying to match later in the expression?" ask yourself "what do I need to avoid matching in the next part?". Often this leads to more parsimonious pattern matches. Often the way to avoid a pattern is to use the complement operator and a character class. Look at the example, and think about how it works.

The trick here is that there are two different ways of formulating almost the same sequence. You can either think you want to keep matching until you get to XYZ, or you can think you want to keep matching unless you get to XYZ. These are subtly different.

For people who have thought about basic probability, the same pattern occurs. The chance of rolling a 6 on a die in one roll is 1/6. What is the chance of rolling a 6 in six rolls? A naive calculation puts the odds at 1/6+1/6+1/6+1/6+1/6+1/6, or 100%. This is wrong, of course (after all, the chance after twelve rolls isn't 200%). The correct calculation is "how do I avoid rolling a 6 for six rolls?" -- in other words, 5/6*5/6*5/6*5/6*5/6*5/6, or about 33%. The chance of getting a 6 is the same chance as not avoiding it (or about 67%). In fact, if you imagine transcribing a series of dice rolls, you could apply a regular expression to the written record, and similar thinking applies.
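
Both halves of this argument, the complemented character class and the dice arithmetic, can be checked with a short sketch. Python's re module and the variable names are assumptions of the example.

```python
import re

targets = ["this", "thus", "thistle", "this line matches too much"]

# "Keep matching unless you reach an s": [^s]* cannot run past the first
# "s", and the final period then consumes that "s".
restrained = [re.search(r"th[^s]*.", t).group() for t in targets]

# The same "avoid it" reasoning applied to six rolls of a die.
p_no_six = (5 / 6) ** 6
p_at_least_one_six = 1 - p_no_six

print(restrained)
print(round(p_no_six, 4), round(p_at_least_one_six, 4))
```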

Comments on modification tools

Not all tools that use regular expressions allow you to modify target strings. Some simply locate the matched pattern; the most widely used regular expression tool is probably grep, which is a tool for searching only. Text editors, for example, may or may not allow replacement in their regular expression search facility. As always, consult the documentation on your individual tool.

Of the tools that allow you to modify target text, there are a few differences to keep in mind. The way you actually specify replacements will vary between tools: a text editor might have a dialog box; command-line tools will use delimiters between match and replacement; programming languages will typically call functions with arguments for match and replacement patterns.

Another important difference to keep in mind is what is getting modified. UNIX-oriented command-line tools typically utilize pipes and STDOUT for changes to buffers, rather than modify files in-place. Using a sed command, for example, will write the modifications to the console, but will not change the original target file. Text editors or programming languages are more likely to actually modify a file in-place.

A note on modification examples

For purposes of this tutorial, examples will continue to use the sed-style slash delimiters. Specifically, the examples will indicate the substitution command and the global modifier, as with "s/this/that/g". This expression means: "Replace the string 'this' with the string 'that' everywhere in the target text."

Examples will consist of the modification command, an input line, and an output line. The output line will have any changes emphasized. Also, each input/output line will be preceded by a less-than or greater-than symbol to help distinguish them (the order will be as described also), which is suggestive of redirection symbols in UNIX shells.

A literal-string modification example

Let's take a look at a few modification examples that build on what we have already covered.

    s/cat/dog/g
    < wild dogs, bobcats, lions, and other wild cats
    > wild dogs, bobdogs, lions, and other wild dogs

This one simply substitutes some literal text for some other literal text. The search-and-replace capability of many tools can do this much, even without using regular expressions.

A pattern-match modification example

    s/cat|dog/snake/g
    < wild dogs, bobcats, lions, and other wild cats
    > wild snakes, bobsnakes, lions, and other wild snakes

    s/[a-z]+i[a-z]*/nice/g
    < wild dogs, bobcats, lions, and other wild cats
    > nice dogs, bobcats, nice, and other nice cats

Most of the time, if you are using regular expressions to modify a target text, you will want to match more general patterns than just literal strings. Whatever is matched is what gets replaced (even if it is several different strings in the target).
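
These substitutions translate directly into a programming-language call. The sketch below assumes Python's re.sub, which replaces every occurrence by default (the equivalent of the "g" modifier) and returns a new string rather than changing the original.

```python
import re

line = "wild dogs, bobcats, lions, and other wild cats"

literal = re.sub(r"cat", "dog", line)
alternation = re.sub(r"cat|dog", "snake", line)
general = re.sub(r"[a-z]+i[a-z]*", "nice", line)

# count=1 limits the substitution to the first match (no "g" modifier).
first_only = re.sub(r"cat", "dog", line, count=1)

print(literal)
print(alternation)
print(general)
print(first_only)
```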

Modification using backreferences

    s/([A-Z])([0-9]{2,4}) /\2:\1 /g
    < A37 B4 C107 D54112 E1103 XXX
    > 37:A B4 107:C D54112 1103:E XXX

It is nice to be able to insert a fixed string everywhere a pattern occurs in a target text. But frankly, doing that is not very context sensitive. A lot of times, we do not want just to insert fixed strings, but rather to insert something that bears much more relation to the matched patterns. Fortunately, backreferences come to our rescue here. You can use backreferences in the pattern-matches themselves, but it is even more useful to be able to use them in replacement patterns. By using replacement backreferences, you can pick and choose from the matched patterns to use just the parts you are interested in.

To aid readability, subexpressions will be grouped with bare parentheses (as with Perl), rather than with escaped parentheses (as with sed).

Another warning on mismatching

This tutorial has already warned about the danger of matching too much with your regular expression patterns. But the danger is so much more serious when you do modifications that it is worth repeating. If you replace a pattern that matches a larger string than you thought of when you composed the pattern, you have potentially deleted some important data from your target.

It is always a good idea to try out your regular expressions on diverse target data that is representative of your production usage. Make sure you are matching what you think you are matching. A stray quantifier or wildcard can make a surprisingly wide variety of texts match what you thought was a specific pattern. And sometimes you just have to stare at your pattern for a while, or find another set of eyes, to figure out what is really going on even after you see what matches. Familiarity might breed contempt, but it also instills competence.
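
One way to act on this warning is to list what a pattern matches before substituting anything. The sketch below assumes Python's re module, and the preview step is a suggestion of this example rather than a feature of any particular tool.

```python
import re

line = "A37 B4 C107 D54112 E1103 XXX"
pattern = re.compile(r"([A-Z])([0-9]{2,4}) ")

# Dry run: inspect exactly what will be replaced.
preview = [m.group() for m in pattern.finditer(line)]

# Then substitute, swapping the two groups around a colon.
swapped = pattern.sub(r"\2:\1 ", line)

print(preview)
print(swapped)
```

"B4" has too few digits and "D54112" too many before the space, so both are left alone.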

Section 4. Advanced regular expression extensions

About advanced features

Some very useful enhancements are included in some regular expression tools. These enhancements often make the composition and maintenance of regular expressions considerably easier. But check with your own tool to see what is supported.

The programming language Perl is probably the most sophisticated tool for regular-expression processing, which explains much of its popularity. The examples illustrated will use Perl-ish code to explain concepts. Other programming languages, especially other scripting languages such as Python, have a similar range of enhancements. But for purposes of illustration, Perl's syntax most closely mirrors the regular expression tools it builds on, such as ed, ex, grep, sed, and awk.

Non-greedy quantifiers

    /th.*s/
    -- Match the words that start
    -- with 'th' and end with 's'.

    this line matches just right
    this # thus # thistle

    /th.*?s/
    -- Match the words that start
    -- with 'th' and end with 's'.

    this # thus # thistle
    this line matches just right

    /th.*?s /
    -- Match the words that start
    -- with 'th' and end with 's'.
    -- (FINALLY!)

    this # thus # thistle
    this line matches just right

Earlier in the tutorial, the problems of matching too much were discussed, and some workarounds were suggested. Some regular expression tools make this easier by providing optional non-greedy quantifiers. These quantifiers grab as little as possible while still matching whatever comes next in the pattern (instead of as much as possible).

Non-greedy quantifiers have the same syntax as regular greedy ones, except with the quantifier followed by a question-mark. For example, a non-greedy pattern might look like: "/A[A-Z]*?B/". In English, this means "match an A, followed by only as many capital letters as are needed to find a B."

One little thing to look out for is the fact that the pattern "/[A-Z]*?./" will always match zero capital letters. If you use non-greedy quantifiers, watch out for matching too little, which is a symmetric danger.
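
Greedy and non-greedy quantifiers can be compared side by side in a sketch. Python's re module, which supports the "*?" syntax, is assumed; the string "AXYBZB" is invented for the illustration.

```python
import re

line = "this # thus # thistle"

greedy = re.findall(r"th.*s", line)
non_greedy = re.findall(r"th.*?s", line)

# Only as many capital letters as are needed to reach the first B.
shortest = re.search(r"A[A-Z]*?B", "AXYBZB").group()

# Matching too little: the lazy class takes nothing, the period takes "A".
too_little = re.search(r"[A-Z]*?.", "ABC").group()

print(greedy)
print(non_greedy)
print(shortest, too_little)
```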

Pattern-match modifiers

    /M.*[ise] /

    MAINE # Massachusetts # Colorado #
    mississippi # Missouri # Minnesota #

    /M.*[ise] /i

    MAINE # Massachusetts # Colorado #
    mississippi # Missouri # Minnesota #

    /M.*[ise] /gis

    MAINE # Massachusetts # Colorado #
    mississippi # Missouri # Minnesota #

We already saw one pattern-match modifier in the modification examples: the global modifier. In fact, in many regular expression tools, we should have been using the "g" modifier for all our pattern matches. Without the "g", many tools will match only the first occurrence of a pattern on a line in the target. So this is a useful modifier (but not one you necessarily want to use always). Let us look at some others.

As a little mnemonic, it is nice to remember the word "gismo" (it even seems somehow appropriate). The most frequent modifiers are:

· g - Match globally
· i - Case-insensitive match
· s - Treat string as single line
· m - Treat string as multiple lines
· o - Only compile pattern once

The o option is an implementation optimization, and not really a regular expression issue (but it helps the mnemonic). The single-line option allows the wildcard to match a newline character (it won't otherwise). The multiple-line option causes "^" and "$" to match the beginning and end of each line in the target, not just the beginning and end of the target as a whole (with sed or grep this is the default). The insensitive option ignores differences between case of letters.
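
In a programming language the modifiers usually appear as flags rather than trailing letters. The sketch below assumes Python's re module, where re.IGNORECASE, re.DOTALL, and re.MULTILINE play the roles of i, s, and m; it uses simpler patterns than the example above so that each flag's effect is isolated.

```python
import re

states = (
    "MAINE # Massachusetts # Colorado #\n"
    "mississippi # Missouri # Minnesota #"
)

# g: re.search stops at the first hit; re.findall is global.
first = re.search(r"M\w+", states).group()
every = re.findall(r"M\w+", states)

# i: ignore the case of letters.
any_case = re.findall(r"M\w+", states, re.IGNORECASE)

# s: let the period match a newline as well.
without_dotall = re.search(r"Colorado.*mississippi", states)
with_dotall = re.search(r"Colorado.*mississippi", states, re.DOTALL)

# m: let ^ match at the start of every line, not just the whole target.
string_start = re.findall(r"^\w+", states)
line_starts = re.findall(r"^\w+", states, re.MULTILINE)

print(first, every, any_case, line_starts)
```

Python has no "o" flag; calling re.compile once and reusing the compiled pattern serves a similar purpose.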

Changing backreference behavior

    s/([A-Z])(?:-[a-z]{3}-)([0-9]*)/\1\2/g
    < A-xyz-37 # B:abcd:142 # C-wxy-66
    > A37 # B:abcd:142 # C66

Backreferencing in replacement patterns is very powerful; but it is also easy to use more than nine groups in a complex regular expression. Quite apart from using up the available backreference names, it is often more legible to refer to the parts of a replacement pattern in sequential order. To handle this issue, some regular expression tools allow "grouping without backreferencing."

A group that should not also be treated as a backreference has a question-mark colon at the beginning of the group, as in "(?:pattern)". In fact, you can use this syntax even when your backreferences are in the search pattern itself.

Naming backreferences

    import re
    txt = "A-xyz-37 # B:abcd:142 # C-wxy-66 # D-qrs-93"
    # The groups named "pre" and "id" are referenced by name in the replacement.
    new = re.sub(r"(?P<pre>[A-Z])(-[a-z]{3}-)(?P<id>[0-9]*)",
                 r"\g<pre>\g<id>", txt)
    print(new)

    A37 # B:abcd:142 # C66 # D93

The language Python offers a particularly handy syntax for really complex pattern backreferences. Rather than just play with the numbering of matched groups, you can give them a name.

The syntax of using regular expressions in Python is a standard programming language function/method style of call, rather than Perl- or sed-style slash delimiters. Check your own tool to see if it supports this facility.
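
The effect of a non-backreferenced group on numbering, and the lookup of named groups, can be seen in a sketch. Python's re module is assumed; the group names "pre" and "id" follow the example above.

```python
import re

txt = "A-xyz-37 # B:abcd:142 # C-wxy-66"

# (?:...) groups without capturing, so the digits are group 2, not group 3.
m = re.search(r"([A-Z])(?:-[a-z]{3}-)([0-9]*)", txt)
numbered = m.groups()
result = re.sub(r"([A-Z])(?:-[a-z]{3}-)([0-9]*)", r"\1\2", txt)

# Named groups can be read back by name from the match object.
named = re.search(r"(?P<pre>[A-Z])-[a-z]{3}-(?P<id>[0-9]*)", txt).groupdict()

print(numbered, result, named)
```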

Lookahead assertions

    s/([A-Z]-)(?=[a-z]{3})([A-Za-z0-9]*)/\2\1/g
    < A-xyz37 # B-ab6142 # C-Wxy66 # D-qrs93
    > xyz37A- # B-ab6142 # C-Wxy66 # qrs93D-

    s/([A-Z]-)(?![a-z]{3})([A-Za-z0-9]*)/\2\1/g
    < A-xyz37 # B-ab6142 # C-Wxy66 # D-qrs93
    > A-xyz37 # ab6142B- # Wxy66C- # D-qrs93

Another trick of advanced regular expression tools is "lookahead assertions." These are similar to regular grouped subexpressions, except they do not actually grab what they match. There are two advantages to using lookahead assertions. On the one hand, a lookahead assertion can function in a similar way to a group that is not backreferenced; that is, you can match something without counting it in backreferences. More significantly, however, a lookahead assertion can specify that the next chunk of a pattern has a certain form, but let a different subexpression actually grab it (usually for purposes of backreferencing that other subexpression).

There are two kinds of lookahead assertions: positive and negative. As you would expect, a positive assertion specifies that something does come next, and a negative one specifies that something does not come next. Emphasizing their connection with non-backreferenced groups, the syntax for lookahead assertions is similar: (?=pattern) for positive assertions, and (?!pattern) for negative assertions.
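
Positive and negative lookahead can be tried in a sketch, assuming Python's re module. The second group here uses [A-Za-z0-9]* so that the uppercase "W" in "Wxy66" can be captured; the "foobar" string is invented to show that a lookahead consumes nothing.

```python
import re

txt = "A-xyz37 # B-ab6142 # C-Wxy66 # D-qrs93"

# Swap only where three lowercase letters DO follow the dash.
positive = re.sub(r"([A-Z]-)(?=[a-z]{3})([A-Za-z0-9]*)", r"\2\1", txt)

# Swap only where three lowercase letters do NOT follow the dash.
negative = re.sub(r"([A-Z]-)(?![a-z]{3})([A-Za-z0-9]*)", r"\2\1", txt)

# The assertion is zero-width: "bar" is required but not part of the match.
zero_width = re.search(r"foo(?=bar)", "foobar").group()

print(positive)
print(negative)
print(zero_width)
```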

Making regular expressions more readable

    /                    # identify URLs within a text file
      [^="]              # do not match URLs in IMG tags like:
                         #   <img src="http://this.com/pic.png">
      (http|ftp|gopher)  # make sure we find a resource type...
      :\/\/              # ...followed by colon-slash-slash
      [^ \n\r]+          # not space, newline, or carriage return in URL
      (?=[\s\.,])        # assert next: whitespace/period/comma
    /

    The URL for my site is: http://mysite.com/mydoc.html. You
    might also enjoy ftp://yoursite.com/index.html for a good
    place to download files.

In the later examples we have started to see just how complicated regular expressions can ge
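
A commented pattern of this kind can be written in Python with the re.VERBOSE flag, which ignores whitespace and "#" comments outside character classes. This is a sketch under that assumption; the capturing group around the URL, the appended img tag, and the trailing-punctuation cleanup are additions of this example.

```python
import re

url_pattern = re.compile(r"""
    [^="]                    # skip URLs that sit inside a tag attribute
    (                        # group 1: the URL itself
      (?:http|ftp|gopher)    # a resource type
      ://                    # followed by colon-slash-slash
      [^ \n\r]+              # anything but space, newline, carriage return
    )
    (?=[\s.,])               # next must be whitespace, period, or comma
""", re.VERBOSE)

text = (
    "The URL for my site is: http://mysite.com/mydoc.html. You\n"
    "might also enjoy ftp://yoursite.com/index.html for a good\n"
    'place to download files. <img src="http://this.com/pic.png">'
)

raw_urls = [m.group(1) for m in url_pattern.finditer(text)]

# The greedy class keeps a sentence-ending period that precedes whitespace,
# so strip trailing punctuation afterwards.
urls = [u.rstrip(".,") for u in raw_urls]

print(raw_urls)
print(urls)
```

The URL inside the img tag is skipped because it is preceded by a double quote. The first match still carries the sentence's final period, which is a reminder to test a pattern against realistic text before relying on it.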
