The REFind and REFindNoCase functions return the location in the search string of the first match of the regular expression. Even though the search string in the next example contains two matches of the regular expression, the function returns only the index of the first:
<cfset IndexOfOccurrence=REFind(" BIG ", "Some BIG BIG string")>
<!--- The value of IndexOfOccurrence is 5 --->
To find all instances of the regular expression, you must call the REFind and REFindNoCase functions multiple times.
Both the REFind and REFindNoCase functions take an optional third parameter that specifies the starting index in the search string for the search. By default, the starting location is index 1, the beginning of the string.
To find the second instance of the regular expression in this example, you call REFind with a starting index of 8:
<cfset IndexOfOccurrence=REFind(" BIG ", "Some BIG BIG string", 8)>
<!--- The value of IndexOfOccurrence is 9 --->
In this case, the function returns an index of 9, the starting index of the second string " BIG ".
To find the second occurrence of the string, you must know that the first match occurred at index 5 and that its length was 5. However, by default REFind returns only the starting index of the match, not its length. So, you must either know the length of the matched string before you call REFind the second time, or you must use subexpressions in the regular expression.
The REFind and REFindNoCase functions let you get information about matched subexpressions. If you set these functions' fourth parameter, ReturnSubExpression, to True, the functions return a CFML structure with two arrays, pos and len, containing the positions and lengths of text strings that match the subexpressions of a regular expression, as the following example shows:
<cfset sLenPos=REFind(" BIG ", "Some BIG BIG string", 1, "True")>
<cfoutput> <cfdump var="#sLenPos#"> </cfoutput><br>
The following figure shows the output of the cfdump tag:
Element one of the pos array contains the starting index in the search string of the string that matched the regular expression. Element one of the len array contains the length of the matched string. For this example, the index of the first " BIG " string is 5 and its length is also 5. If there are no occurrences of the regular expression, the pos and len arrays each contain one element with a value of 0.
You can use the returned information with other string functions, such as mid. The following example returns that part of the search string matching the regular expression:
<cfset myString="Some BIG BIG string">
<cfset sLenPos=REFind(" BIG ", myString, 1, "True")>
<cfoutput> #mid(myString, sLenPos.pos[1], sLenPos.len[1])# </cfoutput>
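Because a failed search returns 0 in pos[1] and len[1], it can be worth checking the result before passing it to mid, which is likely to raise an error when given a start position of 0. The following is a minimal sketch of that check; the pattern " SMALL " is a hypothetical value chosen only because it does not occur in the search string:

```cfml
<cfset myString = "Some BIG BIG string">
<!--- " SMALL " is an illustrative pattern that does not occur in myString --->
<cfset sLenPos = REFind(" SMALL ", myString, 1, "True")>
<cfif sLenPos.pos[1] GT 0>
    <!--- A match was found, so it is safe to extract it --->
    <cfoutput>#mid(myString, sLenPos.pos[1], sLenPos.len[1])#</cfoutput>
<cfelse>
    <!--- pos[1] is 0: the regular expression did not match --->
    <cfoutput>No match found.</cfoutput>
</cfif>
```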
Each additional element in the pos array contains the position of the first match of each subexpression in the search string. Each additional element in len contains the length of the subexpression's match.
In the previous example, the regular expression " BIG " contained no subexpressions. Therefore, each array in the structure returned by REFind contains a single element.
After executing the previous example, you can call REFind a second time to find the second occurrence of the regular expression. This time, you use the information returned by the first call to make the second:
<cfset newstart = sLenPos.pos[1] + sLenPos.len[1] - 1>
<!--- subtract 1 because you need to start at the first space --->
<cfset sLenPos2=REFind(" BIG ", "Some BIG BIG string", newstart, "True")>
<cfoutput> <cfdump var="#sLenPos2#"> </cfoutput><br>
The following figure shows the output of the cfdump tag:
If you include subexpressions in your regular expression, each element of pos and len after element one contains the position and length of the first occurrence of each subexpression in the search string.
In the following example, the expression [A-Za-z]+ is a subexpression of a regular expression. The first match for the expression ([A-Za-z]+)[ ]+\1 is "is is".
<cfset sLenPos=REFind("([A-Za-z]+)[ ]+\1", "There is is a cat in in the kitchen", 1, "True")>
<cfoutput> <cfdump var="#sLenPos#"> </cfoutput><br>
The following figure shows the output of the cfdump tag:
The entries sLenPos.pos[1] and sLenPos.len[1] contain information about the match of the entire regular expression. The array elements sLenPos.pos[2] and sLenPos.len[2] contain information about the first subexpression ("is"). Because REFind returns information on the first regular expression match only, the sLenPos structure does not contain information about the second match to the regular expression, "in in".
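To collect every match rather than only the first, one approach is to call REFind in a loop and move the starting index past each match. The following is a sketch of that idea, not the only way to do it; the variable names matches and start are illustrative. It finds non-overlapping matches, so a pattern such as " BIG ", whose matches share a space, would need the start index adjusted as described earlier.

```cfml
<cfset sString = "There is is a cat in in the kitchen">
<cfset regex = "([A-Za-z]+)[ ]+\1">
<cfset matches = ArrayNew(1)>
<cfset start = 1>
<cfloop condition="start LTE Len(sString)">
    <cfset sLenPos = REFind(regex, sString, start, "True")>
    <!--- pos[1] is 0 when there are no further matches --->
    <cfif sLenPos.pos[1] EQ 0>
        <cfbreak>
    </cfif>
    <cfset ArrayAppend(matches, Mid(sString, sLenPos.pos[1], sLenPos.len[1]))>
    <!--- Continue after the current match; Max guards against a
          zero-length match leaving the start index unchanged --->
    <cfset start = sLenPos.pos[1] + Max(sLenPos.len[1], 1)>
</cfloop>
<cfdump var="#matches#">
```

For this search string, the matches array should contain two elements, "is is" and "in in".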
The regular expression in the following example uses two subexpressions. Therefore, each array in the output structure contains the position and length of the first match of the entire regular expression, the first match of the first subexpression, and the first match of the second subexpression.
<cfset sString = "apples and pears, apples and pears, apples and pears">
<cfset regex = "(apples) and (pears)">
<cfset sLenPos = REFind(regex, sString, 1, "True")>
<cfoutput> <cfdump var="#sLenPos#"> </cfoutput><br><br>
The following figure shows the output of the cfdump tag:
For a full discussion of subexpression usage, see the sections on REFind and REFindNoCase in the ColdFusion Functions chapter in CFML Reference.
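As an illustration of how the pos and len arrays line up with the subexpressions, the following sketch loops over the returned arrays and uses Mid to print the text that each element refers to. It assumes that every subexpression took part in the match, which holds for this regular expression; the loop index i is an illustrative name.

```cfml
<cfset sString = "apples and pears, apples and pears, apples and pears">
<cfset regex = "(apples) and (pears)">
<cfset sLenPos = REFind(regex, sString, 1, "True")>
<cfif sLenPos.pos[1] GT 0>
    <cfoutput>
    <!--- Element 1 is the whole match; later elements are the subexpressions --->
    <cfloop index="i" from="1" to="#ArrayLen(sLenPos.pos)#">
        Element #i#: position #sLenPos.pos[i]#, length #sLenPos.len[i]#,
        text "#Mid(sString, sLenPos.pos[i], sLenPos.len[i])#"<br>
    </cfloop>
    </cfoutput>
</cfif>
```

Here element 1 should report position 1 and length 16 ("apples and pears"), element 2 position 1 and length 6 ("apples"), and element 3 position 12 and length 5 ("pears").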
The regular expression quantifiers ?, *, +, {min,} and {min,max} specify a minimum and/or maximum number of instances of a given expression to match. By default, ColdFusion locates the greatest number of characters in the search string that match the regular expression. This behavior is called maximal matching.
For example, you use the regular expression "<b>(.*)</b>" to search the string "<b>one</b> <b>two</b>". The regular expression "<b>(.*)</b>" matches both of the following:
<b>one</b>
<b>one</b> <b>two</b>
By default, ColdFusion always tries to match the regular expression to the largest string in the search string. The following code shows the results of this example:
<cfset sLenPos=REFind("<b>(.*)</b>", "<b>one</b> <b>two</b>", 1, "True")>
<cfoutput> <cfdump var="#sLenPos#"> </cfoutput><br>
The following figure shows the output of the cfdump tag:
Thus, the starting position of the string is 1 and its length is 21, which corresponds to the larger of the two possible matches.
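However, sometimes you might want to override this default behavior to find the shortest string that matches the regular expression. ColdFusion includes minimal-matching quantifiers that let you match on the smallest string. The following table describes these expressions:
Expression    Description
*?            minimal-matching version of *
+?            minimal-matching version of +
??            minimal-matching version of ?
{min,}?       minimal-matching version of {min,}
{min,max}?    minimal-matching version of {min,max}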
If you modify the previous example to use the minimal-matching syntax, the code is as follows:
<cfset sLenPos=REFind("<b>(.*?)</b>", "<b>one</b> <b>two</b>", 1, "True")>
<cfoutput> <cfdump var="#sLenPos#"> </cfoutput><br>
The following figure shows the output of the cfdump tag:
Thus, the length of the string found by the regular expression is 10, corresponding to the string "<b>one</b>".
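The difference is also visible in the text captured by the subexpression. The following sketch runs both forms of the expression against the same string and prints element two of each result; the variable names greedy and minimal are illustrative, and HTMLEditFormat is used only so that the markup is displayed as text rather than rendered by the browser.

```cfml
<cfset sString = "<b>one</b> <b>two</b>">
<cfset greedy = REFind("<b>(.*)</b>", sString, 1, "True")>
<cfset minimal = REFind("<b>(.*?)</b>", sString, 1, "True")>
<cfoutput>
<!--- Element 2 of pos and len describes the subexpression in parentheses --->
Maximal: #HTMLEditFormat(Mid(sString, greedy.pos[2], greedy.len[2]))#<br>
Minimal: #HTMLEditFormat(Mid(sString, minimal.pos[2], minimal.len[2]))#<br>
</cfoutput>
```

With maximal matching, the subexpression should span "one</b> <b>two" (position 4, length 14); with minimal matching, it should contain only "one" (position 4, length 3).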