One of the strengths of Verity is its ability to perform full-text searches on documents of many formats. However, you often want to restrict a search to certain portions of a document to improve search relevance. For example, if a Verity collection contains some documents about baseball and others about caves, a search for the word bat might retrieve several irrelevant results.
If the documents are structured documents, you can take advantage of the ability to search zones and fields. The following are some examples of structured documents:
Note: Although your word processor might open with what appears to be a blank page, the document has many regions such as title, subject, and author. Refer to your application's documentation or online help system for how to view a document's properties.
You can perform zone searches on markup language documents. The Verity zone filter includes built-in support for HTML and several file formats; for a list of supported file formats, see "Building a Search Interface". Verity searches XML files by treating the XML tags as zones. When you use the zone filter, the Verity engine builds zone information into the collection's full-word index. This index, enhanced with zone information, permits quick and efficient searches over zones. The zone filter can automatically define a zone, or you can define it yourself in the style.zon file. You can use zone searching to limit your search to a particular zone. This can produce more accurate, but not necessarily faster, search results than searching an entire file.
Note: The contents of a zone cannot be returned in the results list of an application.
The following examples perform zone searching on XML files. In a list of rock bands, you could have XML files with tags for the instruments and for comments. In the following XML file, the word Pete appears only in a comment element:
<band.xml>
    <Lead_Guitar>Dan</Lead_Guitar>
    <Rhythm_Guitar>Jake</Rhythm_Guitar>
    <Bass_Guitar>Mike</Bass_Guitar>
    <Drums>Chris</Drums>
    <COMMENT_A>Dan plays guitar, better than Pete.</COMMENT_A>
    <COMMENT_B>Jake plays rhythm guitar.</COMMENT_B>
</band.xml>
The following CFML code shows a search for the word Pete:
<cfsearch name="band_search"
    collection="my_collection"
    type="simple"
    criteria="Pete">
This search returns the preceding XML file because the search term Pete appears in the COMMENT_A element. In contrast, Pete is the lead guitarist in the following XML file:
<band.xml>
    <Lead_Guitar>Pete</Lead_Guitar>
    <Rhythm_Guitar>Roger</Rhythm_Guitar>
    <Bass_Guitar>John</Bass_Guitar>
    <Drums>Kenny</Drums>
    <COMMENT_A>Who knows who's better than this band?</COMMENT_A>
    <COMMENT_B>Ticket prices correlated with decibels.</COMMENT_B>
</band.xml>
To retrieve only the files in which Pete is the lead guitarist, perform a zone search using the IN operator according to the following syntax:
(query) <IN> (zone1, zone2, ...)
Note: As with other operators, IN can be uppercase or lowercase. Unlike AND, OR, or NOT, you must enclose IN in angle brackets (<IN>).
Thus, the following explicit search retrieves files in which Pete is the lead guitarist:
(Pete) <in> Lead_Guitar
This is expressed in CFML as follows:
<cfsearch name="band_search"
    collection="my_collection"
    type="explicit"
    criteria="(Pete) <in> Lead_Guitar">
To retrieve files in which Pete plays either lead or rhythm guitar, use the following explicit search:
(Pete) <in> (Lead_Guitar,Rhythm_Guitar)
This is expressed in CFML as follows:
<cfsearch name="band_search"
    collection="my_collection"
    type="explicit"
    criteria="(Pete) <in> (Lead_Guitar,Rhythm_Guitar)">
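The effect of restricting a query to zones can be pictured outside ColdFusion with a short Python sketch. It is not how the Verity engine works internally; it only models the matching rule, assuming each XML element name acts as a zone. The function name zone_match and the shortened sample documents are hypothetical, introduced here for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Shortened copies of the two sample files from the text.
BAND_A = """<band.xml>
<Lead_Guitar>Dan</Lead_Guitar>
<Rhythm_Guitar>Jake</Rhythm_Guitar>
<COMMENT_A>Dan plays guitar, better than Pete.</COMMENT_A>
</band.xml>"""

BAND_B = """<band.xml>
<Lead_Guitar>Pete</Lead_Guitar>
<Rhythm_Guitar>Roger</Rhythm_Guitar>
<COMMENT_A>Who knows who's better than this band?</COMMENT_A>
</band.xml>"""


def zone_match(xml_text, word, zones=None):
    """Return True if `word` occurs in the document.

    When `zones` is given, only elements whose tag is in `zones` are
    examined, which mimics (query) <IN> (zone1, zone2, ...).
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if zones is not None and element.tag not in zones:
            continue
        words = re.findall(r"\w+", element.text or "")
        if word.lower() in (w.lower() for w in words):
            return True
    return False
```

Called without zones, zone_match(BAND_A, "Pete") is true because of the comment, just like the simple search. With zones={"Lead_Guitar", "Rhythm_Guitar"}, only BAND_B matches, which corresponds to the explicit search (Pete) <in> (Lead_Guitar,Rhythm_Guitar).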
Fields are extracted from the document and stored in the collection for retrieval and searching, and can be returned on a results list. Zones, on the other hand, are merely the definitions of "regions" of a document for searching purposes, and are not physically extracted from the document in the same way that fields are extracted.
You must define a region of text as a zone before it can be a field. Therefore, a region can be a zone only, or it can be both a zone and a field. Which of the two you choose depends on your particular requirements.
A field must be defined in the style file, style.ufl, before you create the collection. To map zones to fields (to display field data), you must define and add these extra fields to style.ufl.
You can specify the values for the cfindex attributes TITLE, KEY, URL, and CUSTOM as document fields for use with relational operators in the criteria attribute. (The cfsearch tag returns the SCORE and SUMMARY attributes automatically; their values differ for each record and change as the search criteria change.) Text comparison operators can reference the following document fields:
cf_title
cf_key
cf_url
cf_custom1
cf_custom2
To explore how to use document fields to refine a search, consider the following database table, named Calls. This table has four fields and three records, as the following table shows:
A Verity search for the word certain returns three records. However, you can use the document fields to restrict your search; for example, to retrieve only the HomeSite problems that have the word certain in the problem description.
These are the requirements to run this procedure:
The following table shows the relationship between the database columns and the cfindex attributes:
You begin by selecting all data in a query:
<cfquery name = "Calls" datasource = "MyDSN">
Select * from Calls
</cfquery>
The following code shows the cfindex tag for indexing the collection (the type attribute is set to custom for tabular data):
<cfindex
    query = "Calls"
    collection = "training"
    action = "UPDATE"
    type = "CUSTOM"
    title = "Short_Description"
    key = "Call_ID"
    body = "Problem_Description"
    custom1 = "Product">
To perform the refined search for HomeSite problems with the word certain in the problem description, the cfsearch tag uses the CONTAINS operator in its criteria attribute:
<cfsearch
    collection = "training"
    name = "search_calls"
    criteria = "certain and CF_CUSTOM1 <CONTAINS> HomeSite">
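The logic of this criteria string can be sketched in Python as a filter that combines a full-text condition with a field condition. This is only an approximation of the matching rule, not the Verity engine: it assumes that the text condition is checked against the indexed body and that CONTAINS is a whole-word test on the field. The records below are hypothetical stand-ins, not the data from the Calls table, and the names has_word and refined_search are introduced here for illustration.

```python
import re

# Hypothetical records standing in for the indexed Calls rows; the values are
# invented for illustration and are not the data from the original table.
RECORDS = [
    {"key": "1", "title": "Editor crash", "custom1": "HomeSite",
     "body": "The editor closes when certain files are opened."},
    {"key": "2", "title": "Slow queries", "custom1": "ColdFusion",
     "body": "Pages time out under certain loads."},
    {"key": "3", "title": "Install error", "custom1": "HomeSite",
     "body": "Setup stops before the license screen."},
]


def has_word(text, word):
    """Case-insensitive whole-word test."""
    return word.lower() in (w.lower() for w in re.findall(r"\w+", text))


def refined_search(records, word, product):
    """Return keys of records whose body has `word` and whose custom1 has `product`."""
    return [r["key"] for r in records
            if has_word(r["body"], word) and has_word(r["custom1"], product)]
```

With these stand-in records, refined_search(RECORDS, "certain", "HomeSite") keeps only the record with key 1: record 2 fails the field condition and record 3 fails the text condition. The same two-part test is what the criteria string "certain and CF_CUSTOM1 <CONTAINS> HomeSite" expresses.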
The following code displays the results of the refined search:
<table border="1" cellspacing="5">
    <tr>
        <th align="LEFT">KEY</th>
        <th align="LEFT">TITLE</th>
        <th align="LEFT">CUSTOM1</th>
    </tr>
    <cfoutput query = "search_calls">
    <tr>
        <td>#KEY#</td>
        <td>#TITLE#</td>
        <td>#CUSTOM1#</td>
    </tr>
    </cfoutput>
</table>
In a browser, the following retrieved results appear: