Boolean and Proximity Operators
Wildcards
Free-Text Queries
Vector Space Queries
Property Value Queries
Query Examples
List of Property Names
You can search for any word or phrase on a Web site by typing the
word or phrase into a query form and clicking the button to execute the
query (for example, the Execute Query button on the sample query form).
Searches produce a list of files that contain the word or phrase, no matter
where it appears in the text.
This list gives the rules for formulating queries:
- Multiple consecutive words are treated as a phrase; they must appear
in the same order within a matching document.
- Queries are case-insensitive, so you can type your query in uppercase
or lowercase.
- You can search for any word except for those in the exception list
(for English, this includes a, an, and, as, and other common words),
which are ignored during a search.
- Words in the exception list are treated as placeholders in phrase and
proximity queries. For example, if you searched for "Word for Windows",
the results could give you "Word for Windows" and "Word and Windows",
because for is a noise word and appears in the exception list.
- Punctuation marks such as the period (.), colon (:), semicolon (;),
and comma (,) are ignored during a search.
- To use specially treated characters such as &, |, ^, #, @, $, (, ),
in a query, enclose your query in quotation marks (").
- To search for a word or phrase containing quotation marks, enclose the
entire phrase in quotation marks and then double the quotation marks
around the word or words you want to surround with quotes. For example,
"World-Wide Web or ""Web""" searches for World-Wide Web or "Web".
- You can use Boolean operators (AND, OR, and NOT) and the proximity
operator (NEAR) to specify additional search information.
- The wildcard character (*) can match words with a given prefix. The
query esc* matches the terms "ESC," "escape," and so on.
- Free-text queries can be specified without regard to query syntax.
- Vector space queries can be specified.
- ActiveX (OLE) and file attribute property value queries can be issued.
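The quotation-mark rule above is easy to get wrong by hand, so here is a minimal Python sketch of it. The function name quote_phrase is hypothetical and is not part of Index Server; it only builds the query text described in the rule.

```python
def quote_phrase(phrase: str) -> str:
    """Wrap a phrase in quotation marks, doubling any embedded ones.

    Hypothetical helper: it only formats query text, following the
    quoting rule in the list above.
    """
    return '"' + phrase.replace('"', '""') + '"'


# The example from the rules list: search for World-Wide Web or "Web".
print(quote_phrase('World-Wide Web or "Web"'))
```

The printed text is the quoted query form given in the rule, with the embedded quotation marks doubled.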
Boolean and Proximity Operators
Boolean and proximity operators can create a more precise query.
| To Search For |
Example |
Results |
| Both terms in the same page |
access and basic -Or-
access & basic |
Pages with both the words "access" and "basic" |
| Either term in a page |
cgi or isapi -Or-
cgi | isapi |
Pages with the words "cgi" or "isapi" |
| The first term without the second term |
access and not basic -Or-
access & ! basic |
Pages with the word "access" but not "basic" |
| Pages not matching a property value |
not @size = 100 -Or-
! @size = 100 |
Pages that are not 100 bytes |
| Both terms in the same page, close together |
excel near project -Or-
excel ~ project |
Pages with the word "excel" near the word "project" |
Hints:
- You can add parentheses to nest expressions within a query. The
expressions in parentheses are evaluated before the rest of the query.
- Use double quotes (") to indicate that a Boolean or NEAR operator
keyword should be ignored in your query. For example, "Abbot and Costello"
will match pages with the phrase, not pages that match the Boolean
expression. In addition to being an operator, the word and is a noise
word in English.
- The NEAR operator is similar to the AND operator in that NEAR returns
a match if both words being searched for are in the same page. However,
the NEAR operator differs from AND because the rank assigned by NEAR
depends on the proximity of words. That is, the rank of a page with
the searched-for words closer together is greater than or equal to
the rank of a page where the words are farther apart. If the searched-for
words are more than 50 words apart, they are not considered near enough,
and the page is assigned a rank of zero.
- The NOT operator can be used only after an AND operator in content
queries; it can be used only to exclude pages that match a previous
content restriction. For property value queries, the NOT operator can
be used apart from the AND operator.
- The AND operator has a higher precedence than OR. For example, the
first three queries are equivalent, but the fourth is not:
a AND b OR c
c OR a AND b
c OR (a AND b)
(c OR a) AND b
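The precedence rule can be checked with a truth table. The sketch below is an illustration only: it uses Python's own and/or operators, which happen to have the same relative precedence (and binds tighter than or), as a stand-in for the query operators. The names a, b, and c stand for "the page contains that term".

```python
from itertools import product

# Each query from the hint, written with Python's boolean operators.
queries = {
    "a AND b OR c": lambda a, b, c: a and b or c,
    "c OR a AND b": lambda a, b, c: c or a and b,
    "c OR (a AND b)": lambda a, b, c: c or (a and b),
    "(c OR a) AND b": lambda a, b, c: (c or a) and b,
}


def truth_table(fn):
    """Evaluate fn for all eight combinations of a, b, and c."""
    return tuple(bool(fn(a, b, c)) for a, b, c in product([False, True], repeat=3))


tables = {text: truth_table(fn) for text, fn in queries.items()}
reference = tables["a AND b OR c"]
for text, table in tables.items():
    verdict = "same as" if table == reference else "differs from"
    print(f"{text!r} {verdict} 'a AND b OR c'")
```

A page that contains only c shows the difference: it satisfies the first three forms but not (c OR a) AND b.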
Note: The symbols (&, |, !, ~) and the English
keywords AND, OR, NOT, and NEAR work the same
way in all languages supported by Index Server. Localized keywords are
also available when the browser locale is set to one of the following
six languages:
| Language |
Keywords |
| German |
UND, ODER, NICHT, NAH |
| French |
ET, OU, SANS, PRES |
| Spanish |
E, O, NO, CERCA |
| Dutch |
EN, OF, NIET, NABIJ |
| Swedish |
OCH, ELLER, INTE, NÄRA |
| Italian |
E, O, NO, VICINO |
Wildcards
Wildcard operators help you find pages containing
words similar to a given word.
| To Search For |
Example |
Results |
| Words with the same prefix |
comput* |
Pages with words that have the prefix "comput," such as "computer," "computing," and so on |
| Words based on the same stem word |
fly** |
Pages with words based on the same stem as "fly," such as "flying," "flown," "flew," and so on |
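As a rough illustration of the prefix wildcard in the first row, the following Python sketch tests words against a pattern with a single trailing asterisk. The function name is hypothetical, and the check ignores case on the assumption that wildcard matching is case-insensitive like other content queries. The stemming form (fly**) depends on the server's language resources and is not modeled here.

```python
def matches_prefix_wildcard(pattern: str, word: str) -> bool:
    """Return True if word starts with the prefix in a 'prefix*' pattern.

    Hypothetical helper for illustration; it handles only the single
    trailing-asterisk form, not the double-asterisk stemming form.
    """
    if not pattern.endswith("*") or pattern.endswith("**"):
        raise ValueError("expected a pattern ending in a single *")
    prefix = pattern[:-1]
    return word.lower().startswith(prefix.lower())


words = ["computer", "computing", "Compute", "company"]
print([w for w in words if matches_prefix_wildcard("comput*", w)])
```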
Free-Text Queries
The query engine finds pages that best
match the words and phrases in a free-text query. This is done by automatically
finding pages that match the meaning, not the exact wording, of the query.
Boolean, proximity, and wildcard operators are ignored within a free-text
query. Free-text queries are prefixed with $contents.
| To Search For |
Example |
Results |
| Files that match free-text |
$contents how do I print in Microsoft Excel? |
Pages that mention printing and Microsoft Excel. |
Vector Space Queries
The query engine supports vector space queries. Vector queries return
pages that match a list of words and phrases. The rank of each page indicates
how well the page matched the query.
| To Search For |
Example |
Results |
| Pages that contain specific words |
light, bulb |
Files with words that best match the words being searched for |
| Pages that contain weighted prefixes, words, and phrases |
invent*, light[50], bulb[10], "light bulb"[400] |
Files that contain words prefixed by "invent," the words "light," "bulb," and the phrase "light bulb" (the terms are weighted) |
- Components in vector queries are separated
by commas.
- Components in vector queries can be weighted
by using the [weight] syntax.
- Pages returned by vector queries do not necessarily
match every term in the query.
- Vector queries work best when the results
are sorted by rank.
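The comma and [weight] rules above can be sketched as a small query builder in Python. The function name build_vector_query is hypothetical; it only assembles the query text and does not contact a server.

```python
def build_vector_query(components):
    """Join (term, weight) pairs into a vector query string.

    A weight of None leaves the component unweighted. Phrases should
    already be enclosed in quotation marks by the caller.
    """
    parts = []
    for term, weight in components:
        parts.append(term if weight is None else f"{term}[{weight}]")
    return ", ".join(parts)


query = build_vector_query([
    ("invent*", None),
    ("light", 50),
    ("bulb", 10),
    ('"light bulb"', 400),
])
print(query)
```

This reproduces the weighted example from the table above.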
Property Value Queries
Property value queries can be used to find files that have property values
that match given criteria. The properties over which you can query include
basic file information like file name and file size, and ActiveX properties
including the document summary (abstract) that is stored in files created
by ActiveX-aware applications.
There are two types of property queries:
- Relational property queries consist of an "at" character (@), a property
name, a relational operator, and a property value. For example, to
find all of the files larger than one million bytes, issue the query
@size > 1000000.
- Regular expression property queries consist of a number sign (#), a
property name, and a regular expression for the property value. For
example, to find all of the video (.avi) files, issue the query #filename
*.avi. Regular expressions will never match the special properties
contents (#contents) and all (#all). There may also be additional
format-specific properties that cannot be matched (for example, #HtmlHRef
for HTML pages).
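The two forms can be assembled mechanically. The Python sketch below uses two hypothetical helper names, relational_query and regex_query, and covers only the simple comparison operators; it formats query text and nothing more.

```python
# Simple comparison operators accepted by this sketch.
COMPARISONS = {"<", "<=", "=", "!=", ">=", ">"}


def relational_query(prop: str, operator: str, value) -> str:
    """Build a relational property query: @ + name + operator + value."""
    if operator not in COMPARISONS:
        raise ValueError(f"unsupported operator: {operator!r}")
    return f"@{prop} {operator} {value}"


def regex_query(prop: str, expression: str) -> str:
    """Build a regular expression property query: # + name + expression."""
    if prop.lower() in ("contents", "all"):
        # The text above says these special properties are never matched.
        raise ValueError("regular expressions never match contents or all")
    return f"#{prop} {expression}"


print(relational_query("size", ">", 1000000))
print(regex_query("filename", "*.avi"))
```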
Property Names
Property names are preceded by either the
"at" (@) or number sign (#) character. Use @ for relational queries, and
# for regular expression queries.
If no property name is specified, @contents
is assumed.
Properties available for all files include:
| Property Name |
Description |
| All |
Matches any property |
| Contents |
Words and phrases in the file and textual properties |
| Filename |
Name of the file |
| Size |
File size |
| Write |
Last time the file was modified |
ActiveX property values can also be used
in queries. Web sites with files created by most ActiveX-aware applications
can be queried for these properties:
| Property Name |
Description |
| DocTitle |
Title of the document |
| DocSubject |
Subject of the document |
| DocAuthor |
The document's author |
| DocKeywords |
Keywords for the document |
| DocComments |
Comments about the document |
For a complete list of property names, see the List
of Property Names later on this page.
Relational Operators
Relational operators are used in relational property queries.
| To Search For |
Example |
Results |
| Property values in relation to a fixed value |
@size < 100
@size <= 100
@size = 100
@size != 100
@size >= 100
@size > 100 |
Files whose size matches the query |
| Property values with all of a set of bits on |
@attrib ^a 0x820 |
Compressed files with the archive bit on |
| Property values with some of a set of bits on |
@attrib ^s 0x20 |
Files with the archive bit on |
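The difference between the all-bits (^a) and some-bits (^s) tests can be sketched with ordinary bitwise arithmetic in Python. The two constants are the Win32 file attribute values for the archive bit (0x20) and the compressed bit (0x800); the function names are hypothetical, and the sketch describes the comparison semantics only, not how the server evaluates them.

```python
FILE_ATTRIBUTE_ARCHIVE = 0x20
FILE_ATTRIBUTE_COMPRESSED = 0x800


def all_bits_on(value: int, mask: int) -> bool:
    """Semantics of ^a: every bit in mask is set in value."""
    return (value & mask) == mask


def some_bits_on(value: int, mask: int) -> bool:
    """Semantics of ^s: at least one bit in mask is set in value."""
    return (value & mask) != 0


mask = FILE_ATTRIBUTE_COMPRESSED | FILE_ATTRIBUTE_ARCHIVE  # 0x820
archived_only = FILE_ATTRIBUTE_ARCHIVE

print(hex(mask))
print(all_bits_on(archived_only, mask))   # compressed bit is missing
print(some_bits_on(archived_only, mask))  # archive bit is present
```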
Property Values
| To Search For |
Example |
Results |
| A specific value |
@DocAuthor = Bill Barnes |
Files authored by "Bill Barnes" |
| Values beginning with a prefix |
#DocAuthor George* |
Files whose author property begins with "George" |
| Files with any of a set of extensions |
#filename *.|(exe|,dll|,sys|) |
Files with .exe, .dll, or .sys extensions |
| Files modified after a certain date |
@write > 96/2/14 10:00:00 |
Files modified after February 14, 1996 at 10:00 GMT |
| Files modified after a relative date |
@write > -1d2h |
Files modified in the last 26 hours |
| Vectors matching a vector |
@vectorprop = { 10, 15, 20 } |
ActiveX documents with a vectorprop value of { 10, 15, 20 } |
| Vectors where each value matches a criterion |
@vectorprop >^a 15 |
ActiveX documents with a vectorprop value in which all values in the vector are greater than 15 |
| Vectors where at least one value matches a criterion |
@vectorprop =^s 15 |
ActiveX documents with a vectorprop value in which at least one value is 15 |
- Be sure to use the pound (#) character before the property name when
using a regular expression in a property value, and an "at" (@) character
otherwise. The equal (=) relational operator is assumed for
regular-expression queries.
- File name (#filename) is the only property that supports regular
expressions with wildcards to the left of text. This is the only case
where wildcards to the left are efficient.
- Date and time values are of the form yyyy/mm/dd hh:mm:ss. The first
two characters of the year and the entire time can be omitted. Dates
and times are in Greenwich Mean Time (GMT).
- Dates and times relative to the current time can be expressed with a
minus (-) character followed by zero or more integer and time unit
pairs. Time units are expressed as: (y) for years, (m) for months,
(w) for weeks, (d) for days, (h) for hours, (n) for minutes, and
(s) for seconds.
- Currency values are of the form x.y, where x is the whole value amount
and y is the fractional amount. There is no assumption about units.
- Boolean values are (t) or (true) for TRUE and (f) or (false) for FALSE.
- Vectors (VT_VECTOR) are expressed as an opening brace ({), followed by
a comma-separated list of values, then a closing brace (}).
- Single-value expressions that are compared against vectors are expressed
as a relational operator, then a (^a) for all of or a (^s) for some of.
- Numeric values can be in decimal or hexadecimal (preceded by 0x).
- The contents property does not support relational operators. If a
relational operator is specified, no results will be found. For example,
@contents Microsoft will find documents containing Microsoft, but
@contents=Microsoft will find none.
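The relative date form can be read as a sum of unit offsets. The Python sketch below parses such a value into a timedelta; the function name parse_relative_offset is hypothetical. It handles only the fixed-length units (w, d, h, n, s), because the text does not say how many days the server counts for a year (y) or a month (m).

```python
import re
from datetime import timedelta

# Seconds per fixed-length unit. Years (y) and months (m) are omitted on
# purpose: their length is not specified in the text above.
UNIT_SECONDS = {"w": 604800, "d": 86400, "h": 3600, "n": 60, "s": 1}


def parse_relative_offset(text: str) -> timedelta:
    """Parse a value such as '-1d2h' into the offset before the current time."""
    if not text.startswith("-"):
        raise ValueError("relative values start with a minus (-) character")
    body = text[1:].lower()
    pairs = re.findall(r"(\d+)([a-z])", body)
    if "".join(number + unit for number, unit in pairs) != body:
        raise ValueError(f"malformed relative value: {text!r}")
    total = 0
    for number, unit in pairs:
        if unit not in UNIT_SECONDS:
            raise ValueError(f"unit {unit!r} is not handled by this sketch")
        total += int(number) * UNIT_SECONDS[unit]
    return timedelta(seconds=total)


offset = parse_relative_offset("-1d2h")
print(offset == timedelta(hours=26))
```

The value -1d2h from the table above works out to 26 hours, which agrees with the result described there.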
Regular Expressions
Regular expressions in property queries are defined as follows:
- Any character except asterisk (*), period (.), question mark (?), and
vertical bar (|) defaults to matching just itself.
- Regular expressions can be enclosed in matching quotes ("), and must
be enclosed in quotes if they contain a space ( ) or closing
parenthesis ()).
- The characters *, ., and ? behave as they behave in Windows; they match
any number of characters, match (.) or end of string, and match any one
character, respectively.
- The character | is an escape character. After |, the following
characters have special meaning:
  - ( opens a group. Must be followed by a matching ).
  - ) closes a group. Must be preceded by a matching (.
  - [ opens a character class. Must be followed by a matching
  (un-escaped) ].
  - { opens a counted match. Must be followed by a matching }.
  - } closes a counted match. Must be preceded by a matching {.
  - , separates OR clauses.
  - * matches zero or more occurrences of the preceding expression.
  - ? matches zero or one occurrences of the preceding expression.
  - + matches one or more occurrences of the preceding expression.
  - Anything else, including |, matches itself.
- Between square brackets ([]) the following characters have special
meaning:
  - ^ matches everything but following classes. Must be the first
  character.
  - ] matches ]. May only be preceded by ^, otherwise it closes the class.
  - - range operator. Preceded and followed by normal characters.
  - Anything else matches itself (or begins or ends a range at itself).
- Between curly braces ({}) the following syntax applies:
  - |{m|} matches exactly m occurrences of the preceding expression.
  (0 < m < 256).
  - |{m,|} matches at least m occurrences of the preceding expression.
  (1 < m < 256).
  - |{m,n|} matches between m and n occurrences of the preceding
  expression, inclusive. (0 < m < 256, 0 < n < 256).
- To match *, ., and ?, enclose them in brackets (for example,
|[*]sample will match "*sample").
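To make the rules above concrete, here is a rough Python sketch that translates this regular expression syntax into the syntax of Python's re module. It is an approximation written from the rules as listed, not the server's implementation, and the function name to_python_regex is hypothetical. Matching case-insensitively is also an assumption.

```python
import re


def to_python_regex(pattern: str) -> str:
    """Translate a property-query regular expression into Python re syntax."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "|" and i + 1 < n:
            esc = pattern[i + 1]
            i += 2
            if esc == "(":
                out.append("(?:")      # |( opens a group
            elif esc == ")":
                out.append(")")        # |) closes a group
            elif esc == ",":
                out.append("|")        # |, separates OR clauses
            elif esc in "*?+":
                out.append(esc)        # repeat the preceding expression
            elif esc == "[":
                j = i
                if j < n and pattern[j] == "^":
                    j += 1
                if j < n and pattern[j] == "]":
                    j += 1             # a leading ] is a literal
                end = pattern.index("]", j)
                out.append("[" + pattern[i:end].replace("\\", "\\\\") + "]")
                i = end + 1
            elif esc == "{":
                end = pattern.index("|}", i)
                out.append("{" + pattern[i:end] + "}")
                i = end + 2
            else:
                out.append(re.escape(esc))  # anything else matches itself
            continue
        if ch == "*":
            out.append(".*")           # any number of characters
        elif ch == "?":
            out.append(".")            # any one character
        elif ch == ".":
            out.append(r"(?:\.|$)")    # a period or end of string
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


rx = re.compile(to_python_regex("*.|(exe|,dll|,sys|)"), re.IGNORECASE)
print([name for name in ["setup.exe", "kernel.DLL", "readme.txt"] if rx.fullmatch(name)])
```

The example pattern is the one used earlier for files with .exe, .dll, or .sys extensions.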
Query Examples
| Example |
Results |
| @size > 1000000 |
Pages larger than one million bytes |
| @write > 95/12/23 |
Pages modified after December 23, 1995 |
| Apple tree |
Pages with the phrase "apple tree" |
| "apple tree" |
Same as above |
| @contents apple tree |
Same as above |
| Microsoft and @size > 1000000 |
Pages with the word "Microsoft" that are larger than one million bytes |
| "microsoft and @size > 1000000" |
Pages with the phrase specified (not the same as above) |
| #filename *.avi |
Video files (the # prefix is used because the query contains a regular expression) |
| @attrib ^s 32 |
Pages with the archive attribute bit on |
| @docauthor = John Smith |
Pages with the given author |
| $contents why is the sky blue? |
Pages that match the query |
| @size < 100 & #filename *.gif |
Graphics Interchange Format (GIF) files less than 100 bytes in size |
List of Property Names
These properties are always available for queries. Additional properties
may also be available depending on the configuration of the Web server.
| Friendly Name |
Datatype |
Property |
| Access |
DBTYPE_DATE |
Last time file was accessed. |
| All |
(not applicable) |
Searches every property for a string. Can be queried but not retrieved. |
| AllocSize |
DBTYPE_I8 |
Size of disk allocation for file. |
| Attrib |
DBTYPE_UI4 |
File attributes. Documented in Win32 SDK. |
| ClassId |
DBTYPE_GUID |
Class ID of object, for example, WordPerfect, Word, and so on. |
| Change |
DBTYPE_DATE |
Last time file was changed (includes changes to attributes). |
| Characterization |
DBTYPE_WSTR | DBTYPE_BYREF |
Characterization, or abstract, of document. Computed by Index Server. |
| Contents |
(not applicable) |
Main contents of file. Can be queried but not retrieved. |
| Create |
DBTYPE_DATE |
Time file was created. |
| DocAppName |
DBTYPE_STR | DBTYPE_BYREF |
Name of application that created the file. |
| DocAuthor |
DBTYPE_STR | DBTYPE_BYREF |
Author of document. |
| DocCategory |
DBTYPE_STR |
Type of document such as a memo, schedule, or whitepaper. |
| DocCharCount |
DBTYPE_I4 |
Number of characters in document. |
| DocComments |
DBTYPE_STR | DBTYPE_BYREF |
Comments about document. |
| DocCompany |
DBTYPE_STR |
Name of the company for which the document was written. |
| DocCreatedTm |
DBTYPE_DATE |
Time document was created. |
| DocEditTime |
DBTYPE_DATE |
Total time spent editing document. |
| DocKeywords |
DBTYPE_STR | DBTYPE_BYREF |
Document keywords. |
| DocLastAuthor |
DBTYPE_STR | DBTYPE_BYREF |
Most recent user who edited document. |
| DocLastPrinted |
DBTYPE_DATE |
Time document was last printed. |
| DocLastSavedTm |
DBTYPE_DATE |
Time document was last saved. |
| DocManager |
DBTYPE_STR |
Name of the manager of the document's author. |
| DocPageCount |
DBTYPE_I4 |
Number of pages in document. |
| DocRevNumber |
DBTYPE_STR | DBTYPE_BYREF |
Current version number of document. |
| DocSubject |
DBTYPE_STR | DBTYPE_BYREF |
Subject of document. |
| DocTemplate |
DBTYPE_STR | DBTYPE_BYREF |
Name of template for document. |
| DocTitle |
DBTYPE_STR | DBTYPE_BYREF |
Title of document. |
| DocWordCount |
DBTYPE_I4 |
Number of words in document. |
| FileIndex |
DBTYPE_I8 |
Unique ID of file. |
| FileName |
DBTYPE_WSTR | DBTYPE_BYREF |
Name of file. |
| HitCount |
DBTYPE_I4 |
Number of hits (words matching query) in file. |
| HtmlHRef |
DBTYPE_WSTR | DBTYPE_BYREF |
Text of HTML HREF. Can be queried but not retrieved. |
| HtmlHeading1 |
DBTYPE_WSTR | DBTYPE_BYREF |
Text of HTML document in style H1. Can be queried but not retrieved. |
| HtmlHeading2 |
DBTYPE_WSTR | DBTYPE_BYREF |
Text of HTML document in style H2. Can be queried but not retrieved. |
| HtmlHeading3 |
DBTYPE_WSTR | DBTYPE_BYREF |
Text of HTML document in style H3. Can be queried but not retrieved. |
| HtmlHeading4 |
DBTYPE_WSTR | DBTYPE_BYREF |
Text of HTML document in style H4. Can be queried but not retrieved. |
| HtmlHeading5 |
DBTYPE_WSTR | DBTYPE_BYREF |
Text of HTML document in style H5. Can be queried but not retrieved. |
| HtmlHeading6 |
DBTYPE_WSTR | DBTYPE_BYREF |
Text of HTML document in style H6. Can be queried but not retrieved. |
| Path |
DBTYPE_WSTR | DBTYPE_BYREF |
Full physical path to file, including file name. |
| Rank |
DBTYPE_I4 |
Rank of row. Ranges from 0 to 1000. Larger numbers indicate better matches. |
| RankVector |
DBTYPE_I4 | DBTYPE_VECTOR |
Ranks of individual components of a vector query. |
| SecurityChange |
DBTYPE_DATE |
Last time security was changed on file. |
| ShortFileName |
DBTYPE_WSTR | DBTYPE_BYREF |
Short (8.3) file name. |
| Size |
DBTYPE_I8 |
Size of file, in bytes. |
| USN |
DBTYPE_I8 |
Update Sequence Number. NTFS drives only. |
| VPath |
DBTYPE_WSTR | DBTYPE_BYREF |
Full virtual path to file, including file name. If more than one path is possible, the best match for the specific query is chosen. |
| WorkId |
DBTYPE_I4 |
Internal ID for file. Used within Index Server. |
| Write |
DBTYPE_DATE |
Last time file was written. |