mnoGoSearch 3.2.43 reference manual: Full-featured search engine software
mnoGoSearch lets you customize the search results (the output of search.cgi or search.php). To do so, provide a template file, search.htm, located in the /etc/ directory of the mnoGoSearch installation.
The template file is an ordinary HTML file divided into sections. You can open the template file in your favorite browser to get an idea of what the search results will look like.
Note: Each template line should not exceed 1024 bytes.
Each section begins with a <!--sectionname--> delimiter and ends with a <!--/sectionname--> delimiter; each delimiter should be on a line of its own.
Each section consists of HTML-formatted text with special meta symbols. Every meta symbol is replaced by its corresponding string. You can think of meta symbols as variables that take their values when the search results are displayed.
Variables can be written in the following formats:
$(x) - plain value
$&(x) - HTML-escaped value with search words highlighted
$%(x) - value escaped for use in URLs
$^(x) - value with search words highlighted
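To make the difference between the plain, HTML-escaped and URL-escaped forms concrete, here is a rough Python sketch. It only models the escaping; search word highlighting is left out, and the function name `render` and the exact escaping rules (Python's `html.escape` and `urllib.parse.quote`) are assumptions for illustration, not mnoGoSearch's actual implementation.

```python
import html
from urllib.parse import quote


def render(kind: str, value: str) -> str:
    """Model three of the variable formats (highlighting is not modeled).

    kind is "" for $(x), "&" for $&(x) and "%" for $%(x).
    """
    if kind == "":
        return value                          # $(x): value as is
    if kind == "&":
        return html.escape(value, quote=True)  # $&(x): safe inside HTML
    if kind == "%":
        return quote(value, safe="")          # $%(x): safe inside a URL
    raise ValueError(f"unknown format: {kind!r}")


print(render("&", '<b>"x"</b>'))
print(render("%", "a b&c"))
```

This is why the form examples in this chapter use $&(q) inside VALUE="...": a query containing quotes or angle brackets would otherwise break the HTML.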
The following section names are defined:
The top section is included first on every page. You should begin this section with <HTML><HEAD> and so on. It is also the natural place to provide a search form. There are several special meta symbols you may use in this section:
$(self) - argument for the FORM ACTION attribute
$(q) - the search query
$(cat) - current category value
$(tag) - current tag value
$(rN) - random number (here N is a number)
If you want to include random banners on your pages, use $(rN). You should also place a string like "RN xxxx" in the 'variables' section (see below), which gives a range of 0..xxxx for $(rN). You can use as many random numbers as you want.
Example: $(r0), $(r1), $(r45) etc.
A simple top section might look like this:
<!--top-->
<HTML>
<HEAD>
<TITLE>mnoGoSearch: $(q)</TITLE>
</HEAD>
<BODY>
<FORM METHOD=GET ACTION="$(self)">
<INPUT TYPE="hidden" NAME="ul" VALUE="">
<INPUT TYPE="hidden" NAME="ps" VALUE="20">
Search for: <INPUT TYPE="text" NAME="q" SIZE=30 VALUE="$&(q)">
<INPUT TYPE="submit" VALUE="Search!"><BR>
</FORM>
<!--/top-->
The following variables can be set in the FORM.
lang limits results by language. The value is a two-letter language code.
<SELECT NAME="lang">
<OPTION VALUE="en" SELECTED="$(lang)">English
.....
</SELECT>
ul is the filter for the URL. It allows you to limit results to a particular site, section, etc. For example, you can put the following in the form
<SELECT NAME="ul">
<OPTION VALUE="" SELECTED="$(ul)">Entire site
<OPTION VALUE="/manual/" SELECTED="$(ul)">Manual
<OPTION VALUE="/products/" SELECTED="$(ul)">Products
<OPTION VALUE="/support/" SELECTED="$(ul)">Support
</SELECT>
to limit your search to a particular section.
The expression SELECTED="$(ul)" in the above example (and in all the examples below) lets the selected option be reproduced on subsequent pages. When the search front-end finds that expression, it prints the string SELECTED only if the given OPTION VALUE is equal to the value of that variable.
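The SELECTED rule can be pictured with a small Python sketch. It is only an illustration of the behaviour described above: the function name `expand_selected` is hypothetical, the regular expression handles only the exact OPTION layout used in this chapter, and dropping the attribute entirely when the values differ is an assumption.

```python
import re

# Matches: <OPTION VALUE="..." SELECTED="$(name)">
_OPTION = re.compile(
    r'<OPTION\s+VALUE="([^"]*)"\s+SELECTED="\$\((\w+)\)">', re.IGNORECASE
)


def expand_selected(line: str, variables: dict) -> str:
    """Print SELECTED only when OPTION VALUE equals the variable's value."""
    def replace(match: re.Match) -> str:
        value, name = match.group(1), match.group(2)
        if variables.get(name, "") == value:
            return f'<OPTION VALUE="{value}" SELECTED>'
        return f'<OPTION VALUE="{value}">'  # assumed: attribute is dropped

    return _OPTION.sub(replace, line)


line = '<OPTION VALUE="/manual/" SELECTED="$(ul)">Manual'
print(expand_selected(line, {"ul": "/manual/"}))
print(expand_selected(line, {"ul": ""}))
```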
ps is the default page size (i.e. how many documents are displayed per page).
q is the query itself.
pn is ps*np. This variable is not used by mnoGoSearch, but may be useful, for example, in an <!INCLUDE CONTENT="..."> directive if you want to include results produced by another search engine.
The following variables control advanced search capabilities:
m can be used to choose the default search type if your query consists of more than one word. If m=any, the search will try to find at least one word. If m=all, the search is more restrictive: all words must be in the document. If m=bool, the query string is treated as a boolean expression.
dt selects the type of time limit. Three types are supported.
If 'dt' is 'back', you want to limit the results to recent pages, and you should specify this "recentness" in the variable 'dp' in the form xxxA[yyyB[zzzC]]. (Spaces are allowed between xxx and A, between A and yyy, and so on.) xxx, yyy, zzz are numbers (they can be negative!). A, B, C can be one of the following (the letters are the same as in the strptime/strftime functions):
s - second
M - minute
h - hour
d - day
m - month
y - year
Examples:
4h30M - 4 hours and 30 minutes
1y6m-15d - 1 year and six months minus 15 days
1h-60M+1s - 1 hour minus 60 minutes plus 1 second
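As a rough illustration of the xxxA[yyyB[zzzC]] format, the following Python sketch turns a 'dp' value into a number of seconds, following the letter table (M is minute, m is month). The function name `parse_dp`, the fixed lengths used for a month (30 days) and a year (365 days), and the special handling of a bare "0" are assumptions made for this example; mnoGoSearch itself may compute calendar offsets differently.

```python
import re

# Seconds per unit. Month and year lengths are simplifying assumptions.
SECONDS = {
    "s": 1,
    "M": 60,
    "h": 3600,
    "d": 86400,
    "m": 30 * 86400,
    "y": 365 * 86400,
}

# One term: optional sign, digits, optional spaces, unit letter.
_TERM = re.compile(r"\s*([+-]?\d+)\s*([sMhdmy])")


def parse_dp(dp: str) -> int:
    """Return the offset described by a 'dp' value, in seconds."""
    dp = dp.strip()
    if dp == "0":  # assumed to mean "no limit", as in the form example
        return 0
    total, pos = 0, 0
    while pos < len(dp):
        term = _TERM.match(dp, pos)
        if term is None:
            raise ValueError(f"bad dp value at offset {pos}: {dp!r}")
        total += int(term.group(1)) * SECONDS[term.group(2)]
        pos = term.end()
    return total


print(parse_dp("4h30M"))      # 4 hours and 30 minutes, in seconds
print(parse_dp("1h-60M+1s"))  # 1 hour minus 60 minutes plus 1 second
```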
If 'dt' is 'er' (which is short for newer/older), the search is limited to pages newer or older than the given date. The variable dx is the newer/older flag (1 means "newer" or "after", -1 means "older" or "before"). The date is split into fields as follows:
'dm' - month (0 - January, 1 - February, .., 11 - December)
'dy' - year (four digits, for example 1999 or 2000)
'dd' - day (1...31)
If 'dt' is 'range', the search is done within a given range of dates. The variables 'db' and 'de' are used here and stand for the beginning and end dates. Each date is a string in the form dd/mm/yyyy, where dd is the day, mm is the month and yyyy is a four-digit year.
You can also use 'dstmp' variable in combination with 'dx' to specify date limit in seconds since 00:00:00 UTC, January 1, 1970. If 'dx' is -1 then search returns documents older than the given 'dstmp' value, otherwise, newer than the given 'dstmp' value.
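The relationship between the 'dy'/'dm'/'dd' fields, a 'dstmp' value and the 'dx' flag can be sketched in Python as follows. The helper names `to_dstmp` and `matches` are hypothetical, interpreting the date as midnight UTC is an assumption, and whether the real comparison is strict or inclusive is not stated in this chapter.

```python
import calendar


def to_dstmp(dy: int, dm: int, dd: int) -> int:
    """Convert year, zero-based month and day to seconds since the epoch.

    'dm' counts from 0 (January), so 1 is added for the calendar module.
    Midnight UTC is assumed.
    """
    return calendar.timegm((dy, dm + 1, dd, 0, 0, 0))


def matches(doc_timestamp: int, dstmp: int, dx: int) -> bool:
    """dx == -1 keeps older documents; any other value keeps newer ones."""
    if dx == -1:
        return doc_timestamp < dstmp
    return doc_timestamp > dstmp


limit = to_dstmp(2000, 0, 1)  # 1 January 2000
print(limit)
print(matches(limit + 3600, limit, 1))
```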
Here is an example of the FORM part where you can choose between different time limiting options.
<!-- 'search with time limits' options -->
<TR><TD>
<TABLE CELLPADDING=2 CELLSPACING=0 BORDER=0>
<CAPTION>
Limit results to pages published within a specified period of time.<BR>
<FONT SIZE=-1><I>(Please select only one option)</I></FONT>
</CAPTION>
<TR>
<TD VALIGN=center><INPUT TYPE=radio NAME="dt" VALUE="back" CHECKED></TD>
<TD><SELECT NAME="dp">
<OPTION VALUE="0" SELECTED="$(dp)">anytime
<OPTION VALUE="10M" SELECTED="$(dp)">within the last ten minutes
<OPTION VALUE="1h" SELECTED="$(dp)">within the last hour
<OPTION VALUE="7d" SELECTED="$(dp)">within the last week
<OPTION VALUE="14d" SELECTED="$(dp)">within the last 2 weeks
<OPTION VALUE="1m" SELECTED="$(dp)">within the last month
<OPTION VALUE="3m" SELECTED="$(dp)">within the last 3 months
<OPTION VALUE="6m" SELECTED="$(dp)">within the last 6 months
<OPTION VALUE="1y" SELECTED="$(dp)">within the last year
<OPTION VALUE="2y" SELECTED="$(dp)">within the last 2 years
</SELECT>
</TD>
</TR>
<TR>
<TD VALIGN=center><INPUT type=radio NAME="dt" VALUE="er"></TD>
<TD><SELECT NAME="dx">
<OPTION VALUE="1" SELECTED="$(dx)">After
<OPTION VALUE="-1" SELECTED="$(dx)">Before
</SELECT>
<SELECT NAME="dm">
<OPTION VALUE="0" SELECTED="$(dm)">January
<OPTION VALUE="1" SELECTED="$(dm)">February
<OPTION VALUE="2" SELECTED="$(dm)">March
<OPTION VALUE="3" SELECTED="$(dm)">April
<OPTION VALUE="4" SELECTED="$(dm)">May
<OPTION VALUE="5" SELECTED="$(dm)">June
<OPTION VALUE="6" SELECTED="$(dm)">July
<OPTION VALUE="7" SELECTED="$(dm)">August
<OPTION VALUE="8" SELECTED="$(dm)">September
<OPTION VALUE="9" SELECTED="$(dm)">October
<OPTION VALUE="10" SELECTED="$(dm)">November
<OPTION VALUE="11" SELECTED="$(dm)">December
</SELECT>
<INPUT TYPE=text NAME="dd" VALUE="$(dd)" SIZE=2 maxlength=2> ,
<SELECT NAME="dy">
<OPTION VALUE="1990" SELECTED="$(dy)">1990
<OPTION VALUE="1991" SELECTED="$(dy)">1991
<OPTION VALUE="1992" SELECTED="$(dy)">1992
<OPTION VALUE="1993" SELECTED="$(dy)">1993
<OPTION VALUE="1994" SELECTED="$(dy)">1994
<OPTION VALUE="1995" SELECTED="$(dy)">1995
<OPTION VALUE="1996" SELECTED="$(dy)">1996
<OPTION VALUE="1997" SELECTED="$(dy)">1997
<OPTION VALUE="1998" SELECTED="$(dy)">1998
<OPTION VALUE="1999" SELECTED="$(dy)">1999
<OPTION VALUE="2000" SELECTED="$(dy)">2000
<OPTION VALUE="2001" SELECTED="$(dy)">2001
</SELECT>
</TD>
</TR>
<TR>
<TD VALIGN=center><INPUT TYPE=radio NAME="dt" VALUE="range"></TD>
<TD>
Between <INPUT TYPE=text NAME="db" VALUE="$(db)" SIZE=11 MAXLENGTH=11>
and <INPUT TYPE=text NAME="de" VALUE="$(de)" SIZE=11 MAXLENGTH=11>
</TD>
</TR>
</TABLE>
</TD></TR>
<!-- end of stl options -->
The bottom section is always included last on every page, so you should provide all closing tags that have counterparts in the top section. It is not obligatory to place this section at the end of the template file, but doing so lets you view your template as an ordinary HTML file in a browser to get an idea of what it looks like.
Below is an example of a bottom section:
<!--bottom-->
<P>
<HR>
<DIV ALIGN=right>
<A HREF="http://search.mnogo.ru/">
<IMG SRC="mnogosearch.gif" BORDER=0 ALT="[Powered by mnoGoSearch search engine software]">
</A>
</DIV>
</BODY>
</HTML>
<!--/bottom-->
The restop section is included just before the search results. It is a good idea to provide some general information about the search results here. You can do so using the following meta symbols:
$(W) - information about the search words: for every search word, the count of the exact word form found and the count of all forms of the word found, separated by a "/" sign. For example, if the search result is test: 25/73, the number of occurrences of the word form "test" found is 25, and the number of occurrences of all its forms ("test", "tests", "testing", etc.) found is 73.
Below is an example of 'restop' section:
<!--restop-->
<TABLE BORDER=0 WIDTH=100%>
<TR>
<TD>Search<BR>results:</TD>
<TD><small>$(WE)</small></TD>
<TD><small>$(W)</small></TD>
</TR>
</TABLE>
<HR>
<CENTER>
Displaying documents $(first)-$(last) of total <B>$(total)</B> found.
</CENTER>
<!--/restop-->
The res section is used to display information about each document found. The following meta symbols are used:
$(URL) Document URL
$(Title) Document Title
$(Score) Document Rating (as calculated by mnoGoSearch)
$(Body) Document text, the document excerpt if CachedCopy feature is used, or the first couple of lines otherwise, to give an idea of what the document is about.
$(Content-Type) Document Content-type (for example, text/html)
$(Last-Modified) Document Last-Modified date
$(Last-Modified-Timestamp) Document Last-Modified date as a number of seconds since 00:00:00 UTC, January 1, 1970.
$(Content-Length) Document Size (in bytes)
$(Content-Length-K) Document Size (in kilobytes)
$(Order) Document Number (in order of appearance)
$(meta.description) Document Description (from META DESCRIPTION tag)
$(meta.keywords) Document Keywords (from META KEYWORDS tag)
$(DY) Document category with links, e.g. /home/computers/software/www/
$(CL) Clone List (see the Section called CLONE section for details)
$(BrowserCharset) Charset used to display search results
$(PerSite) Total number of documents from this site if grouping by site is enabled, 0 otherwise.
Note: It is possible to specify the maximum number of characters returned by any of the above variables. For example, $(URL) may return a long URL that breaks the page's table structure. To limit the number of characters in displayed URLs, use $(URL:xx), where xx is the maximum number of characters. For example, $(URL:40) will return the URL, and if it is longer than 40 characters, only 40 characters will be displayed, including the trailing dots.
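The length limit can be pictured with the following Python sketch. It is a model of the described behaviour only: the function name `truncate` is hypothetical, and the use of exactly three trailing dots is an assumption, since this chapter does not say how many are printed.

```python
def truncate(value: str, max_chars: int) -> str:
    """Shorten value to at most max_chars characters, dots included."""
    if len(value) <= max_chars:
        return value
    dots = "..."  # assumed number of trailing dots
    if max_chars <= len(dots):
        return dots[:max_chars]
    return value[: max_chars - len(dots)] + dots


url = "http://www.example.com/a/very/long/path/to/some/file.html"
short = truncate(url, 40)
print(short, len(short))
```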
Here is an example of a res section:
<!--res-->
<DL><DT>
<b>$(Order).</b><a href="$(URL)" TARGET="_blank"> <b>$(Title)</b></a> [<b>$(Score)</b>]<DD>
$(Body)...<BR>
<b>URL: </b> <A HREF="$(URL)" TARGET="_blank">$(URL)</A>($(Content-Type))<BR>
$(Last-Modified), $(Content-Length) bytes<BR>
<b>Description: </b>$(meta.description)<br>
<b>Keywords: </b>$(meta.keywords)<br>
</DL>
<UL>
$(CL)
</UL>
<!--/res-->
The contents of the clone section are included in the result in place of the $(CL) meta symbol, once for every document clone found. This is used to list all URLs with the same contents (such as mirrors). You can use the same meta symbols here as in the 'res' section. Of course, some information about the clone, such as $(Body) or $(Title), will be the same, so it is of little use to place it here.
Below is an example of 'clone' section.
<!--clone-->
<li><A HREF="$(URL)" TARGET="_blank">$(URL)</A> $(Last-Modified)
<!--/clone-->
The resbot section is included just after the last 'res' section. You usually put a navigation bar here to let the user go to the next or previous results page.
This is an example of 'resbot' section:
<!--resbot-->
<HR>
<CENTER>
Result pages: $(NL)$(NB)$(NR)
</CENTER>
<!--/resbot-->
A navigator is a complex thing and therefore it is constructed from the following template sections:
The navleft and navleft_nop sections are used to print the link to the previous page. If that page exists, <!--navleft--> is used. On the first page there is no previous page, so <!--navleft_nop--> is used instead.
<!--navleft-->
<TD><A HREF="$(NH)"><IMG...></A><BR>
<A HREF="$(NH)">Prev</A></TD>
<!--/navleft-->
<!--navleft_nop-->
<TD><IMG...><BR>
<FONT COLOR=gray>Prev</FONT></TD>
<!--/navleft_nop-->
The navbar0 section is used to print the current page in the page list.
<!--navbar0-->
<TD><IMG...><BR>$(NN)</TD>
<!--/navbar0-->
The navright and navright_nop sections are used to print the link to the next page. If that page exists, <!--navright--> is used, and on the last page <!--navright_nop--> is used instead.
<!--navright-->
<TD>
<A HREF="$(NH)"><IMG...></A>
<BR>
<A HREF="$(NH)">Next</A></TD>
<!--/navright-->
<!--navright_nop-->
<TD>
<IMG...>
<BR>
<FONT COLOR=gray>Next</FONT></TD>
<!--/navright_nop-->
The navbar1 section is used to print the links to the other pages in the page list.
<!--navbar1-->
<TD>
<A HREF="$(NH)">
<IMG...></A><BR>
<A HREF="$(NH)">$(NN)</A>
</TD>
<!--/navbar1-->
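How the navigator sections fit together can be summarized with a short Python sketch. It only models which section is chosen for each position; the function name `navigator`, the 1-based page numbering and the fact that every page is listed (rather than a window of nearby pages) are assumptions for illustration.

```python
def navigator(current: int, total: int) -> list:
    """Return (section name, target page) pairs for one page list.

    Pages are numbered 1..total. The target is None for the *_nop sections.
    """
    parts = []
    # Link to the previous page, or its inactive variant on the first page.
    if current > 1:
        parts.append(("navleft", current - 1))
    else:
        parts.append(("navleft_nop", None))
    # navbar0 for the current page, navbar1 for every other page.
    for page in range(1, total + 1):
        parts.append(("navbar0" if page == current else "navbar1", page))
    # Link to the next page, or its inactive variant on the last page.
    if current < total:
        parts.append(("navright", current + 1))
    else:
        parts.append(("navright_nop", None))
    return parts


for section, page in navigator(1, 3):
    print(section, page)
```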
As its name implies, the notfound section is displayed when no documents are found. You usually give a short message saying so, and maybe some hints on how to make the search less restrictive.
Below is an example of a notfound section:
<!--notfound-->
<CENTER>
Sorry, but search hasn't returned results.<P>
<I>Try to compose less restrictive search query or check spelling.</I>
</CENTER>
<HR>
<!--/notfound-->
The noquery section is displayed when the user gives an empty query. Below is an example of a noquery section:
<!--noquery-->
<CENTER>
You haven't typed any word(s) to search for.
</CENTER>
<HR>
<!--/noquery-->
Example of an error section:
<!--error-->
<CENTER>
<FONT COLOR="#FF0000">An error occurred!</FONT>
<P>
<B>$(E)</B>
</CENTER>
<!--/error-->
There is also a special variables section, in which you can set up some values for search.
A variables section usually looks like this:
<!--variables
DBAddr mysql://foo:bar@localhost/search/
DBMode single
VarDir /usr/local/mnogosearch/var/
LocalCharset iso-8859-1
BrowserCharset iso-8859-1
TrackQuery no
Cache no
DetectClones yes
HlBeg <font color="blue"><b><i>
HlEnd </i></b></font>
R1 100
R2 256
Synonym synonym/english.syn
-->
BrowserCharset specifies which charset will be used to display the results. It may differ from LocalCharset. All template variables which correspond to data from search result (such as document title, description, text) will be converted from LocalCharset to BrowserCharset. The contents of the template itself are not converted, it must be in BrowserCharset.
Use "Cache yes/no" to enable/disable search results cache.
Use "DetectClones yes/no" to enable/disable clone detection.
There is an Alias command in search.htm, that is similar to the one in indexer.conf, but it affects only search results while having no effect on indexing. See the Section called Aliases in Chapter 3 for details.
The Synonym command is used to load a specified synonyms list. Synonyms file name is either absolute or relative to the /etc directory of mnoGoSearch's installation.
You can also use a special variables section in combination with operators supported by mnoGoSearch template language. You can set some variable values which will be used during search, for example search limits.
mnoGoSearch template language provides a set of operators making it possible to create templates with complicated logic.
mnoGoSearch supports several conditional operators in search templates: IF, IFCS, IFNOT, IFLIKE, ELSEIF (ELIF), ELSELIKE (ELIKE), IFLE, IFLT, IFGE, IFGT. Comparison is case-insensitive for IF, IFNOT, IFLIKE, ELSEIF and ELSELIKE, and case-sensitive for IFCS. The operators IFLE, IFLT, IFGE and IFGT perform numeric comparison: less-or-equal, less, greater-or-equal and greater, respectively.
<!IF NAME="Content-Type" Content="application/pdf">
<img src="pdf.png">
<!ELIF NAME="Content-Type" Content="text/plain">
<img src="text.png">
<!ENDIF>

<!IFLIKE NAME="URL" CONTENT="http*">
This is an HTTP address
<!ELIKE NAME="URL" CONTENT="ftp*">
This is an FTP address
<!ELSE>
This is an unknown address type
<!ENDIF>
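The comparison rules of the conditional operators can be modeled in a few lines of Python. This is a sketch of the semantics described above, not the template engine itself: the function name `compare` is hypothetical, and treating the IFLIKE pattern as a shell-style wildcard (via `fnmatch`) is an assumption based on the "http*" example.

```python
import operator
from fnmatch import fnmatchcase

_NUMERIC = {
    "IFLE": operator.le,
    "IFLT": operator.lt,
    "IFGE": operator.ge,
    "IFGT": operator.gt,
}


def compare(op: str, value: str, content: str) -> bool:
    """Evaluate one conditional operator on a variable value."""
    if op in ("IF", "ELSEIF"):
        return value.lower() == content.lower()   # case-insensitive
    if op == "IFCS":
        return value == content                   # case-sensitive
    if op == "IFNOT":
        return value.lower() != content.lower()
    if op in ("IFLIKE", "ELSELIKE"):
        # Assumed wildcard matching, case-insensitive.
        return fnmatchcase(value.lower(), content.lower())
    if op in _NUMERIC:
        return _NUMERIC[op](int(value), int(content))  # numeric comparison
    raise ValueError(f"unknown operator: {op}")


print(compare("IF", "Text/HTML", "text/html"))
print(compare("IFLT", "9", "10"))
```

Note the numeric operators: compared as strings, "9" would sort after "10", which is why IFLT and its relatives convert both sides to numbers first.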
It is possible to use nested conditional operators. This gives much power for search template construction. Please find some examples in the default template etc/search.htm-dist.
The SET operator is designed to set a variable value from a constant, from another variable, or even from a more complex expression.
<!SET NAME="a" Content="Some string">
<!SET NAME="b" Content="Another string">
<!SET NAME="c" Content="$(a)">
<!SET NAME="d" Content="a is '$(a)', b is '$(b)', c is '$(c)'">
Arithmetic operators INC and DEC respectively increment and decrement a variable value, as an integer.
<!SET NAME="a" Content="10">a is $(a)
<!INC NAME="a">After increment, a is $(a)
<!DEC NAME="a">After decrement, a is $(a)
Arithmetic operators ADD, SUB, MUL perform integer addition, subtraction and multiplication of the two values specified in the NAME and CONTENT attributes and write the result back into the variable specified in the NAME attribute.
<!SET NAME="b" CONTENT="20">
<!SET NAME="a" CONTENT="10">
<!MUL NAME="a" CONTENT="$(b)">a*b=$(a)
<!SET NAME="a" CONTENT="10">
<!ADD NAME="a" CONTENT="$(b)">a+b=$(a)
<!SET NAME="a" CONTENT="10">
<!SUB NAME="a" CONTENT="$(b)">a-b=$(a)
Two loop operators, WHILE and WHILENOT, are available.
<!SET NAME="a" Content="10">
<!WHILENOT NAME="a" Content="0">
a is $(a)
<!DEC NAME="a">
<!ENDWHILE>
URLDECODE - decodes URL-encoded string. Decodes any %## encoding in the given string. The decoded string is written into the variable specified in the NAME attribute.
<!URLDECODE NAME="decoded" Content="$(url)">URL is $(decoded)
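For readers who want to check what a decoded value should look like, the same %## decoding can be reproduced in Python with the standard library. The wrapper name `urldecode` is hypothetical, and whether mnoGoSearch also turns "+" into a space is not stated here; this sketch leaves "+" untouched.

```python
from urllib.parse import unquote


def urldecode(text: str) -> str:
    """Decode every %## sequence in text ('+' is left as is)."""
    return unquote(text)


print(urldecode("http%3A%2F%2Fwww.host.com%2Fa%20b"))
```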
HTMLENCODE - Converts special characters to HTML entities.
The conversion rules are:
'&' becomes '&amp;'.
'<' becomes '&lt;'.
'>' becomes '&gt;'.
'"' becomes '&quot;'.
<!HTMLENCODE NAME="encoded" Content="$(url)">URL is $(encoded)
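The four HTMLENCODE conversions (ampersand, less-than, greater-than and double quote) are simple enough to mirror in Python, which can be handy for predicting the output of a template. The function name `htmlencode` is hypothetical; the only subtle point is that the ampersand has to be replaced first, otherwise the ampersands introduced by the other replacements would be encoded a second time.

```python
def htmlencode(text: str) -> str:
    """Replace the four special characters with HTML entities."""
    return (
        text.replace("&", "&amp;")   # must come first
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
    )


print(htmlencode('<a href="x?a=1&b=2">'))
```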
EREG - Replaces a regular expression.
<!EREG NAME="a" CONTENT="string" MATCH="pattern" RESULT="replacement">
EREG scans the string given in CONTENT attribute for matches to the string given in MATCH and stores the matched text into variable given in NAME using RESULT value as a replacement pattern. Replacement string may contain substrings in the form $N, where N is a digit 0..9, which is replaced by the text matching the N'th parenthesized substring. $0 produces the entire contents of matching string. CONTENT string may have references to other variables.
<!SET NAME="str" CONTENT="http://www.host.com/path/file.ext">
<!EREG NAME="a" CONTENT="$(str)" MATCH="^([a-z]*)://([^/]*)/(.*)" RESULT="scheme=$1; host=$2 path+file=$3">$(a)
scheme=http; host=www.host.com path+file=path/file.ext
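The $N substitution performed by EREG can be approximated in Python as shown below. This is a sketch under assumptions: the function name `ereg` is hypothetical, Python's `re` syntax is close to but not identical to the regular expression dialect mnoGoSearch uses, and returning an empty string when nothing matches is a guess.

```python
import re


def ereg(content: str, match: str, result: str) -> str:
    """Match content against a pattern and expand $0..$9 in result."""
    found = re.search(match, content)
    if found is None:
        return ""  # assumed behaviour when there is no match

    def expand(ref: re.Match) -> str:
        number = int(ref.group(1))
        if number > found.re.groups:
            return ""  # reference to a group the pattern does not have
        return found.group(number) or ""

    return re.sub(r"\$(\d)", expand, result)


print(
    ereg(
        "http://www.host.com/path/file.ext",
        r"^([a-z]*)://([^/]*)/(.*)",
        "scheme=$1; host=$2 path+file=$3",
    )
)
```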
You may use <!INCLUDE Content="http://hostname/path"> to include external URLs into search results.
WARNING: You can use <!INCLUDE> ONLY in the following template sections:
This is an example of include usage:
WARNING: Since the template file contains sensitive information such as passwords, it is highly recommended to give the file permissions that prevent anyone but you and the search program from reading it. Otherwise your passwords may leak.
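One way to check for the most common mistake, a template readable by every user on the system, is a small script like the following. It is a sketch for POSIX systems only; the function name `is_world_readable` and the example path are hypothetical, and which owner and group the file should have depends on how your web server runs the search program.

```python
import os
import stat


def is_world_readable(path: str) -> bool:
    """Return True if users outside the owner and group can read path."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


if __name__ == "__main__":
    template = "search.htm"  # hypothetical path, adjust to your installation
    if os.path.exists(template) and is_world_readable(template):
        print(f"{template} is readable by everyone; consider tightening its mode")
```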