8.3. How to write search result templates

DataparkSearch users can customize the search results (the output of search.cgi). To do so, provide a template file, search.htm, located in the /etc/ directory of your DataparkSearch installation.

The template file is an ordinary HTML file divided into sections. Because of this, you can open the template file in your browser to get an idea of how the search results will look.

Note: No line of the template should exceed 1024 bytes.

Each section begins with a <!--sectionname--> delimiter and ends with a <!--/sectionname--> delimiter; each delimiter should be on a line of its own.

Each section consists of HTML-formatted text with special meta symbols. Every meta symbol is replaced by its corresponding string. You can think of meta symbols as variables that take their values when the search results are displayed.

The format of variables is as follows:

$(x) - plain value
$&(x) - HTML-escaped value and search words highlighted.
$*(x) - HTML-escaped value.
$%(x) - value escaped to be used in URLs
$^(x) - search words highlighted.
$(x:128) - value truncated to the first 128 bytes, if longer.
$(x:UTF-8) - value written in UTF-8 charset. You may specify any charset supported.
$(x:128:right) - value truncated to the last 128 bytes, if longer.
$(x:cite:160) - make value citation on search keywords, no longer than 160 characters (approx.).
$(url.host:idnd) - convert hostname from punycode into the BrowserCharset encoding.
$(x:json) - JSON encoding for characters.
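
As a rough illustration of how these forms are used, the fragment below applies several of them in a result listing. It is only a sketch: the choice of variables (q, self, Title, URL and Body, all described later in this section) and the lengths 60 and 160 are assumptions made for the example.

```html
<!-- HTML-escaped query, safe to print inside page text -->
You searched for: $*(q)
<!-- URL-escaped query, for use inside a link -->
<A HREF="$(self)?q=$%(q)">Repeat this search</A>
<!-- HTML-escaped title with the search words highlighted -->
<B>$&(Title)</B>
<!-- URL cut to at most 60 bytes so that it cannot break the layout -->
<SMALL>$(URL:60)</SMALL>
<!-- Excerpt built around the search keywords, about 160 characters -->
$(Body:cite:160)
```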

8.3.1. Template sections

The following section names are defined: TOP, BOTTOM, RESTOP, RES, BETWEENRES, CLONE, RESBOT, navleft, navleft_nop, navbar0, navright, navright_nop, navbar1, notfound, noquery, error, variables.

8.3.1.1. TOP section

This section is included first on every page. You should begin this section with <HTML><HEAD> and so on. It is also the natural place to provide a search form. The following special meta symbols may be used in this section:

$(self)  - argument for FORM ACTION tag
$(q)     - a search query
$(cat)    - current category value
$(tag)      - current tag value
$(rN) - random number (here N is a number)

If you want to include random banners on your pages, use $(rN). You should also place a string like "RN xxxx" in the 'variables' section (see below), which gives a range 0..xxxx for $(rN). You can use as many random numbers as you want.

Example: $(r0), $(r1), $(r45) etc.
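
A banner rotation along these lines might be set up as follows. This is a sketch only: the /banners/ path and the file naming are hypothetical, and the range 9 is an arbitrary choice.

```html
<!--variables
R1    9
-->

<!--top-->
<!-- $(r1) is replaced by a random number in the range 0..9 -->
<IMG SRC="/banners/banner$(r1).gif" ALT="banner">
<!--/top-->
```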

A simple top section might look like this:

<!--top-->
<HTML>
<HEAD>
 <TITLE>Search Query: $(q)</TITLE>
</HEAD>
<BODY>

<FORM METHOD=GET ACTION="$(self)">
 <INPUT TYPE="hidden" NAME="ul" VALUE="">
 <INPUT TYPE="hidden" NAME="ps" VALUE="20">
 Search for: <INPUT TYPE="text" NAME="q" SIZE=30 
 VALUE="$&(q)">
 <INPUT TYPE="submit" VALUE="Search!"><BR>
</FORM>
<!--/top-->

The following variables can be used in the FORM.

lang limits results by language. The value is a two-letter language code.

<SELECT NAME="lang">
<OPTION VALUE="en" SELECTED="$(lang)">English
.....
</SELECT>
    

ul is the URL filter. It allows you to limit results to a particular site, section, etc. For example, you can put the following in the form

Search through:

<SELECT NAME="ul">
<OPTION VALUE=""            SELECTED="$(ul)">Entire site
<OPTION VALUE="/manual/"    SELECTED="$(ul)">Manual
<OPTION VALUE="/products/"  SELECTED="$(ul)">Products
<OPTION VALUE="/support/"   SELECTED="$(ul)">Support
</SELECT>

to limit the search to a particular section.

The expression SELECTED="$(ul)" in the example above (and in all the examples below) allows the selected option to be reproduced on subsequent pages. When the search front-end finds that expression, it prints the string SELECTED only if the given OPTION VALUE is equal to the value of that variable.
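
To make the substitution concrete: with the ul selector shown earlier and a request in which ul is /manual/, the front-end would be expected to emit something like the following. This illustrates the rule just described and is not captured output; the exact whitespace may differ.

```html
<SELECT NAME="ul">
<OPTION VALUE="">Entire site
<OPTION VALUE="/manual/" SELECTED>Manual
<OPTION VALUE="/products/">Products
<OPTION VALUE="/support/">Support
</SELECT>
```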

ps is the default page size (i.e. how many documents to display per page).

q is the query itself.

pn is ps*np. This variable is not used by DataparkSearch itself, but it may be useful, for example, in an <!INCLUDE CONTENT="..."> directive if you want to include results produced by another search engine.

The following variables concern advanced search capabilities:

  • m chooses the default search type when the query consists of more than one word. With m=any, the search tries to find at least one of the words. With m=all, the search is more restrictive: all words must be in the document. With m=bool, the query string is treated as a boolean expression.

  • dt is the time limit type. Three types are supported.

    If 'dt' is 'back', results are limited to recent pages, and you should specify this "recentness" in the variable 'dp' in the form xxxA[yyyB[zzzC]] (spaces are allowed between xxx and A, between A and yyy, and so on). xxx, yyy, zzz are numbers (they can be negative!). A, B, C can be one of the following (the letters are the same as in the strptime/strftime functions):

     s - second
     M - minute
     h - hour
     d - day
     m - month
     y - year

    Examples:

      4h30M     - 4 hours and 30 minutes
      1y6m-15d  - 1 year and 6 months minus 15 days
      1h-60M+1s - 1 hour minus 60 minutes plus 1 second

    If 'dt' is 'er' (which is short for newer/older), the search is limited to pages newer or older than the given date. The variable dx is the newer/older flag (1 means "newer" or "after", -1 means "older" or "before"). The date is separated into fields as follows:

      'dm' - month (0 - January, 1 - February, .., 11 - December)
      'dy' - year (four digits, for example 1999 or 2000)
      'dd' - day (1...31)

    If 'dt' is 'range', the search is limited to the given range of dates. The variables 'db' and 'de' are used here and stand for the beginning and end dates. Each date is a string in the form dd/mm/yyyy, where dd is the day, mm is the month and yyyy is the four-digit year.

    This is an example of the FORM part where you can choose between the different time limiting options.

    <!-- 'search with time limits' options -->
    <TR><TD>
    <TABLE CELLPADDING=2 CELLSPACING=0 BORDER=0>
    <CAPTION> 
    Limit results to pages published within
    a specified period of time.<BR>
    <FONT SIZE=-1><I>(Please select only one option)
    </I></FONT>
    </CAPTION>
    <TR>
    <TD VALIGN=center><INPUT TYPE=radio NAME="dt" 
    VALUE="back" CHECKED></TD>
    <TD><SELECT NAME="dp">
    <OPTION VALUE="0" SELECTED="$(dp)">anytime
    <OPTION VALUE="10M" SELECTED="$(dp)">in the last ten minutes
    <OPTION VALUE="1h" SELECTED="$(dp)">in the last hour
    <OPTION VALUE="7d" SELECTED="$(dp)">in the last week
    <OPTION VALUE="14d" SELECTED="$(dp)">in the last 2 weeks
    <OPTION VALUE="1m" SELECTED="$(dp)">in the last month
    <OPTION VALUE="3m" SELECTED="$(dp)">in the last 3 months
    <OPTION VALUE="6m" SELECTED="$(dp)">in the last 6 months
    <OPTION VALUE="1y" SELECTED="$(dp)">in the last year
    <OPTION VALUE="2y" SELECTED="$(dp)">in the last 2 years
    </SELECT>
    </TD>
    </TR>
    <TR>
    <TD VALIGN=center><INPUT type=radio NAME="dt" VALUE="er">
    </TD>
    <TD><SELECT NAME="dx">
    <OPTION VALUE="1" SELECTED="$(dx)">After
    <OPTION VALUE="-1" SELECTED="$(dx)">Before
    </SELECT>

    or on

    <SELECT NAME="dm">
    <OPTION VALUE="0" SELECTED="$(dm)">January
    <OPTION VALUE="1" SELECTED="$(dm)">February
    <OPTION VALUE="2" SELECTED="$(dm)">March
    <OPTION VALUE="3" SELECTED="$(dm)">April
    <OPTION VALUE="4" SELECTED="$(dm)">May
    <OPTION VALUE="5" SELECTED="$(dm)">June
    <OPTION VALUE="6" SELECTED="$(dm)">July
    <OPTION VALUE="7" SELECTED="$(dm)">August
    <OPTION VALUE="8" SELECTED="$(dm)">September
    <OPTION VALUE="9" SELECTED="$(dm)">October
    <OPTION VALUE="10" SELECTED="$(dm)">November
    <OPTION VALUE="11" SELECTED="$(dm)">December
    </SELECT>
    <INPUT TYPE=text NAME="dd" VALUE="$(dd)" SIZE=2 maxlength=2>
    ,
    <SELECT NAME="dy" >
    <OPTION VALUE="1990" SELECTED="$(dy)">1990
    <OPTION VALUE="1991" SELECTED="$(dy)">1991
    <OPTION VALUE="1992" SELECTED="$(dy)">1992
    <OPTION VALUE="1993" SELECTED="$(dy)">1993
    <OPTION VALUE="1994" SELECTED="$(dy)">1994
    <OPTION VALUE="1995" SELECTED="$(dy)">1995
    <OPTION VALUE="1996" SELECTED="$(dy)">1996
    <OPTION VALUE="1997" SELECTED="$(dy)">1997
    <OPTION VALUE="1998" SELECTED="$(dy)">1998
    <OPTION VALUE="1999" SELECTED="$(dy)">1999
    <OPTION VALUE="2000" SELECTED="$(dy)">2000
    <OPTION VALUE="2001" SELECTED="$(dy)">2001
    </SELECT>
    </TD>
    </TR>
    <TR>
    <TD VALIGN=center><INPUT TYPE=radio NAME="dt" VALUE="range">
    </TD>
    <TD>
    Between
    <INPUT TYPE=text NAME="db" VALUE="$(db)" SIZE=11 MAXLENGTH=11>
    and
    <INPUT TYPE=text NAME="de" VALUE="$(de)" SIZE=11 MAXLENGTH=11>
    </TD>
    </TR>
    </TABLE>
    </TD></TR>
    <!-- end of stl options -->
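
The search mode m described above can be offered in the form with the same SELECTED technique used for ul. The fragment below is a sketch; the option labels are arbitrary.

```html
<SELECT NAME="m">
<OPTION VALUE="all" SELECTED="$(m)">All words
<OPTION VALUE="any" SELECTED="$(m)">Any word
<OPTION VALUE="bool" SELECTED="$(m)">Boolean expression
</SELECT>
```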

8.3.1.2. BOTTOM section

This section is always included last on every page, so you should provide all closing tags that have their counterparts in the top section. Although it is not obligatory to place this section at the end of the template file, doing so helps you view your template as an ordinary HTML file in a browser to get an idea of how it looks.

Below is an example of bottom section:

<!--bottom-->
<P>
<HR>
<DIV ALIGN=right>
<A HREF="http://www.maxime.net.ru/">
<IMG SRC="dpsearch.gif" BORDER=0 
ALT="[Powered by DataparkSearch search engine software]">
</A>
</DIV>
</BODY>
</HTML>
<!--/bottom-->

8.3.1.3. RESTOP section

This section is included just before the search results. It is a good place to show summary information about the search. You can do so with the following meta symbols:

  • $(first) - number of the first document displayed on this page

  • $(last) - number of the last document displayed on this page

  • $(total) - total number of documents found

  • $(grand_total) - total number of documents found before grouping by site

  • $(WE) - search results with full statistics for every word form searched

  • $(W) - search results with information about the number of the word form found and the number of all word forms found delimited with "/" sign for every search word, e.g. if the search result is test: 25/73, it means that the number of word form "test" found is 25, and the number of all its forms ("test", "tests", "testing", etc.) found is 73.

  • $(WS) - search results in short form with the number of all word forms found.

  • $(SearchTime) - search query execution time.

  • $(ndocs) - number of documents in database.

Below is an example of 'restop' section:

<!--restop-->
<TABLE BORDER=0 WIDTH=100%>
<TR>
<TD>Search<BR>results:</TD>
<TD><small>$(WE)</small></TD>
<TD><small>$(W)</small></TD>
</TR>
</TABLE>
<HR>
<CENTER>
Displaying documents $(first)-$(last) of total <B>$(total)</B> found.
</CENTER>
<!--/restop-->

8.3.1.4. RES section

This section is used for displaying various information about every found document. The following meta symbols are used:

  • $(URL) Document URL

  • $(Title) Document Title

  • $(Score) Document Rating (as calculated by DataparkSearch)

  • $(Body) Document text: the document excerpt if one is stored, or the first couple of lines otherwise, to give an idea of what the document is about.

  • $(Content-Type) Document Content-type (for example, text/html)

  • $(Last-Modified) Document Last-Modified date

  • $(Content-Length) Document Size in bytes

  • $(FancySize) Document Size in bytes, kilobytes or megabytes, whichever fits best.

  • $(Order) Overall Document Number (in order of appearance), i.e. from 1 to $(total).

  • $(Pos) Document Number on the page (in order of appearance), i.e. from 1 to $(ps).

  • $(meta.description) Document Description (from META DESCRIPTION tag)

  • $(meta.keywords) Document Keywords (from META KEYWORDS tag)

  • $(DY) Document category with links, e.g. /home/computers/software/www/

  • $(CL) Clone List (see Section 8.3.1.6 for details)

  • $(BrowserCharset) Charset used to display search results

  • $(PerSite) Total number of documents from this site if grouping by site is enabled, 0 otherwise.

Note: It is possible to specify the maximum number of characters returned by any of the above variables. For example, $(URL) may return a long URL that breaks the page table structure. To limit the number of characters in the displayed URL, use $(URL:xx), where xx is the maximum number of characters:

$(URL:40)

will return the URL, and if it is longer than 40 characters, only 40 characters will be displayed, including the trailing dots:

http://very.long.url/path/veery/long/...

Here is an example of res section:

<!--res-->
<DL><DT>
<b>$(Order).</b><a href="$(URL)" TARGET="_blank">
<b>$(Title)</b></a> [<b>$(Score)</b>]<DD>
$(Body)...<BR>
<b>URL: </b>
<A HREF="$(URL)" TARGET="_blank">$(URL)</A>($(Content-Type))<BR>
$(Last-Modified), $(Content-Length) bytes<BR>
<b>Description: </b>$(meta.description)<br>
<b>Keywords: </b>$(meta.keywords)<br>
</DL>
<UL>
$(CL)
</UL>
<!--/res-->

8.3.1.5. BETWEENRES section

The content of this section is inserted between the search results produced by the RES section. You can use it if the format of your search result page requires a separator between records, as JSON does, for example (see doc/samples/json.htm).
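
For instance, a JSON-style output could pair a res section with a betweenres section that holds only a comma. This is a minimal sketch and not a copy of doc/samples/json.htm: the field names are hypothetical, and it assumes that the :json form leaves the surrounding quotation marks to the template.

```html
<!--res-->
{ "url": "$(URL:json)", "title": "$(Title:json)" }
<!--/res-->

<!--betweenres-->
,
<!--/betweenres-->
```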

8.3.1.6. CLONE section

The contents of this section are inserted in place of the $(CL) meta symbol for every document clone found. This is used to list all URLs with the same contents (mirrors, etc.). You can use the same $(D*) meta symbols here as in the 'res' section. Of course, some information about a clone, such as $(DS), $(DR), $(DX), will be the same as for the original document, so there is little use in placing it here.

Below is an example of 'clone' section.

<!--clone-->
<li><A HREF="$(DU)" TARGET="_blank">$(DU)</A> ($(DC)) $(DM)
<!--/clone-->

8.3.1.7. RESBOT section

This section is included just after the last 'res' section. You usually place a navigation bar here to let the user go to the next or previous results page.

This is an example of 'resbot' section:

<!--resbot-->
<HR>
<CENTER>
Result pages: $(NL)$(NB)$(NR)
</CENTER>
<!--/resbot-->

The navigator is complex and is therefore constructed from the following template sections:

8.3.1.8. navleft, navleft_nop section

These are used for printing the link to the previous page. If that page exists, <!--navleft--> is used. On the first page there is no previous page, so <!--navleft_nop--> is used.

<!--navleft-->
<TD><A HREF="$(NH)"><IMG...></A><BR>
<A HREF="$(NH)">Prev</A></TD>
<!--/navleft-->

<!--navleft_nop-->
<TD><IMG...><BR>
<FONT COLOR=gray>Prev</FONT></TD>
<!--/navleft_nop-->

8.3.1.9. navbar0 section

This is used for printing the current page in the page list.

<!--navbar0-->
<TD><IMG...><BR>$(NP)</TD>
<!--/navbar0-->

8.3.1.10. navright, navright_nop section

These are used for printing the link to the next page. If that page exists, <!--navright--> is used, and on the last page <!--navright_nop--> is used instead.

<!--navright-->
<TD>
<A HREF="$(NH)"><IMG...></A>
<BR>
<A HREF="$(NH)">Next</A></TD>
<!--/navright-->

<!--navright_nop-->
<TD>
<IMG...>
<BR>
<FONT COLOR=gray>Next</FONT></TD>
<!--/navright_nop-->

8.3.1.11. navbar1 section

This is used for printing the links to the other pages in the page list.

<!--navbar1-->
<TD>
<A HREF="$(NH)">
<IMG...></A><BR>
<A HREF="$(NH)">$(NP)</A>
</TD>
<!--/navbar1-->

8.3.1.12. notfound section

As its name implies, this section is displayed when no documents are found. You usually give a short message saying so, and maybe some hints on how to make the search less restrictive.

Below is an example of notfound section:

<!--notfound-->
<CENTER>
Sorry, but search hasn't returned results.<P>
<I>Try to compose less restrictive search query or check spelling.</I>
</CENTER>
<HR>
<!--/notfound-->

8.3.1.13. noquery section

This section is displayed when the user gives an empty query. Below is an example of noquery section:

<!--noquery-->
<CENTER>
You haven't typed any word(s) to search for.
</CENTER>
<HR>
<!--/noquery-->

8.3.1.14. error section

This section is displayed when an internal error occurs while searching, for example, when the database server is not running. You may use the following meta symbol: $(E) - error text.

Example of error section:

<!--error-->
<CENTER>
<FONT COLOR="#FF0000">An error occurred!</FONT>
<P>
<B>$(E)</B>
</CENTER>
<!--/error-->

8.3.2. Variables section

There is also a special variables section, in which you can set up some values for search.

Special variables section usually looks like this:

<!--variables
DBAddr		  mysql://foo:bar@localhost/search/?dbmode=single
VarDir            /usr/local/dpsearch/var/
LocalCharset	  iso-8859-1
BrowserCharset    iso-8859-1
TrackQuery	  no
Cache		  no
DetectClones	  yes
HlBeg		  <font color="blue"><b><i>
HlEnd		  </i></b></font>
R1		  100
R2		  256
Synonym		  synonym/english.syn
ResultContentType text/xml
Locale            fr_FR.ISO_8859-1
TZ                Australia/Sydney
-->

Note: The DBAddr option works as in indexer.conf. The host part of the DBAddr argument takes effect only for natively supported databases and does not matter for ODBC databases. For ODBC, use the database name part of DBAddr to specify the ODBC DSN.

The VarDir command specifies a custom path to the directory where indexer stores data when used in cache mode. By default, the /var directory of the DataparkSearch installation is used.

LocalCharset specifies the charset of the database. It must be the same as LocalCharset in indexer.conf.

BrowserCharset specifies which charset is used to display results. It may differ from LocalCharset. All template variables that carry data from the search result (such as document title, description, text) are converted from LocalCharset to BrowserCharset. The contents of the template itself are not converted; they must already be in BrowserCharset.

Use "Cache yes/no" to enable/disable search results cache.

Use "DetectClones yes/no" to enable/disable clone detection. This is disabled by default for search.

Use "GroupBySite yes/no/full" to enable/disable grouping results by url.site_id. With the yes option, consecutive pages from the same site are grouped. With the full option, all pages from the same site are grouped.

Note: If searchd is used you should place GroupBySite in your searchd.conf file, or pass it as CGI parameter.

If cache storage mode is used, you also need to create a SITE limit (see Section 5.2.8).

Use the PagesInGroup command to specify the number of additional results from the same site when Google-like grouping is enabled.

You may use the MaxSiteLevel command to specify the maximal domain name level used for site_id calculation. Default value: 2. One exception: domains of three or fewer letters at level 2 count as domain names at level 1. For example: domain.ext - level 2, www.domain.ext - level 3, domain.com.ext - level 2. A negative value for MaxSiteLevel means that grouping is performed on a per-directory basis, i.e. for level -1, www.site.ext/dir1/ and www.site.ext/dir2 are grouped as different sites.
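
Taken together, the grouping commands above might appear in the variables section as in the following sketch. The values are illustrative assumptions, not recommended settings, and when searchd is used GroupBySite belongs in searchd.conf instead, as noted above.

```html
<!--variables
GroupBySite    yes
PagesInGroup   2
MaxSiteLevel   2
-->
```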

The HlBeg and HlEnd commands configure search result highlighting. Found words are surrounded by these tags.

There is an Alias command in search.htm that is similar to the one in indexer.conf, but it affects only search results and has no effect on indexing. See Section 3.7 for details.

R1 and R2 specify ranges for random variables $(R1) and $(R2).

The Synonym command loads the specified synonym list. The synonym file name is either absolute or relative to the /etc directory of the DataparkSearch installation.

The DateFormat command changes the output format of the Last-Modified date. Use strftime conversion specifiers in your format string.

Note: If searchd is used, you may specify DateFormat in your searchd.conf file, but there you should enclose the string in quotes ("), or pass it as a CGI parameter.
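
As a sketch of a DateFormat setting, the line below uses standard strftime specifiers: %d (day of month), %b (abbreviated month name), %Y (four-digit year), %H and %M (hour and minute). The particular format is an arbitrary choice, and writing it unquoted in search.htm is an assumption drawn from the note above.

```html
<!--variables
DateFormat    %d %b %Y %H:%M
-->
```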

"Log2stderr yes/no" command is used to enable error logging to stderr.

The ResultsLimit command is used to limit the maximum number of results shown. If searchd is used, this command may be specified in searchd.conf.

The ResultContentType command is used to specify the Content-Type header for the results page. Default value: text/html.

Locale command is used to specify LC_ALL locale settings for search results output. Default value: unspecified (uses the value specified before in system settings).

TZ command is used to specify time zone for timestamps shown on search results pages. Default value: system default.

With the MakePrefixes yes command you can have a search query extended automatically with all prefixes of the query words. This is suitable, for example, for making search suggestions. (See also Section 3.10.56.)

8.3.3. Includes in templates

You may use <!INCLUDE Content="http://hostname/path"> to include external URLs into search results.

WARNING: You can use <!INCLUDE> ONLY in the following template sections:

<!--top-->
<!--bottom-->
<!--restop-->
<!--resbot-->
<!--notfound-->
<!--error-->

This is an example of includes usage:

<!--top-->
....
<!INCLUDE CONTENT="http://hostname/banner?query=$&(q)">
...
<!--/top-->

8.3.4. Conditional template operators

DataparkSearch supports conditional operators in search templates: <!IF, <!ELSE, <!ENDIF, <!ELIF, <!ELSEIF, <!SET, <!COPY, <!IFLIKE, <!IFREGEX, <!ELIKE, <!EREGEX, <!ELSELIKE, <!ELSEREGEX.

<!IF   NAME="Content-Type" Content="application/pdf">
<img src="pdf.png">
<!ELIF NAME="Content-Type" Content="text/plain">
<img src="text.png">
<!ENDIF>

It's possible to use nested conditional operators. This gives more power for search template construction. See samples in etc/search.htm-dist file.
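
A nested construction might look like the sketch below. It reuses the NAME/Content syntax of the example above; writing the <!ELSE operator as <!ELSE> is an assumption made by analogy with <!ENDIF>, and other.png is a hypothetical file name. The shipped etc/search.htm-dist remains the authoritative sample.

```html
<!IF NAME="Content-Type" Content="application/pdf">
<img src="pdf.png">
<!ELSE>
<!IF NAME="Content-Type" Content="text/plain">
<img src="text.png">
<!ELSE>
<img src="other.png">
<!ENDIF>
<!ENDIF>
```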

8.3.5. Security issues

WARNING: Since the template file contains information such as the database password, it is highly recommended to give the file permissions that prevent anyone but you and the search program from reading it. Otherwise your passwords may leak.