9.3. Database schema

The full database schema used by DataparkSearch is defined in the database-creation SQL scripts located in the create subdirectory.

Table 9-1. server table schema

rec_id      Unique record identifier.
enabled     Flag that enables or disables the record for the indexer.
url         URL or pattern.
tag         Tag value.
category    rec_id from the categories table.
command     =S - this record is a server.
            =F - this record is a filter.
ordre       Sorting key; it defines the order in which records are loaded from the server table.
parent      If not null, this record was added automatically by the indexer, and the url field contains a server name accepted by the record that this field points to.
weight      Weight of this record for PopRank calculation.
pop_weight  Weight of one link from pages of this server. Calculated automatically; manual changes have no effect.
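
As an illustration of how the enabled, command and ordre columns fit together, the following Python sketch builds a throwaway in-memory table and reads back the enabled records in ordre order. The column types, the sample rows and the load_enabled helper are assumptions made for this example only; the authoritative definitions are the SQL scripts in the create subdirectory, and the query the indexer actually issues is not described in this section.

```python
import sqlite3

# Column names follow Table 9-1; the column types are assumed for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE server (
        rec_id     INTEGER PRIMARY KEY,
        enabled    INTEGER NOT NULL DEFAULT 0,
        url        TEXT NOT NULL,
        tag        TEXT,
        category   INTEGER,
        command    TEXT NOT NULL,          -- 'S' = server, 'F' = filter
        ordre      INTEGER NOT NULL DEFAULT 0,
        parent     INTEGER,
        weight     REAL NOT NULL DEFAULT 1,
        pop_weight REAL NOT NULL DEFAULT 0
    )
    """
)

# Hypothetical sample records: two servers (one disabled) and one filter.
sample_rows = [
    (1, 1, "http://www.example.com/", "S", 20),
    (2, 0, "http://old.example.com/", "S", 10),
    (3, 1, "*.zip", "F", 5),
]
conn.executemany(
    "INSERT INTO server (rec_id, enabled, url, command, ordre) "
    "VALUES (?, ?, ?, ?, ?)",
    sample_rows,
)


def load_enabled(connection):
    """Return enabled records sorted by the ordre key (hypothetical helper)."""
    cursor = connection.execute(
        "SELECT rec_id, command, url FROM server "
        "WHERE enabled = 1 ORDER BY ordre"
    )
    return cursor.fetchall()


loaded = load_enabled(conn)
for rec_id, command, url in loaded:
    kind = "server" if command == "S" else "filter"
    print(rec_id, kind, url)
```

The disabled record (rec_id 2) is skipped, and the remaining records come back with the lowest ordre value first.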

Other server parameters are stored in the srvinfo table. Possible values for several of these parameters are given in the table below.

Table 9-2. Values of several server parameters in the srvinfo table

sname value                  Possible sval values.
Alias                        Alias used for the URL.
Period                       Reindexing period in seconds.
DeleteOlder                  How long to keep URLs before deleting them from the database.
RemoteCharset                Default charset value.
DefaultLang                  Default language value.
Request.Authorization        For basic authorization.
Request.Proxy                Proxy server used to access documents from this resource.
Request.Proxy-Authorization  Proxy server authorization.
MaxHops                      Maximum depth, counted in mouse clicks (link hops), from the start URL.
Index                        Flag that enables or disables indexing of documents.
Follow                       =0, "page"
                             =1, "path"
                             =2, "site"
                             =3, "world"

Robots                       Flag that enables or disables use of the robots.txt file.
DetectClones                 Flag that enables or disables detection of "clones".
MaxNetErrors                 Maximum number of network errors allowed for this server.
NetDelayTime                 Indexing delay after a network error occurs.
ReadTimeout                  Network timeout value.
match_type                   =0, DPS_MATCH_FULL - pattern must match the whole URL.
                             =1, DPS_MATCH_BEGIN - pattern is a URL prefix.
                             =2, DPS_MATCH_SUBSTR - pattern is a URL substring.
                             =3, DPS_MATCH_END - pattern is a URL suffix.
                             =4, DPS_MATCH_REGEX - pattern is a regular expression.
                             =5, DPS_MATCH_WILD - pattern is a wildcard pattern (the * and ? wildcards may be used).
                             =6, DPS_MATCH_SUBNET - not yet supported.

case_sense                   =1 - case-insensitive comparison.
                             =0 - case-sensitive comparison.

nomatch                      =1 - URLs that do not match this record are accepted.
                             =0 - URLs that match this record are accepted.

Method                       Specifies the document action for this command.
                             =Allow - all matching documents will be indexed and scanned for new links.
                             =Disallow - all matching documents will be ignored and deleted from the database.
                             =HrefOnly - all matching documents will only be scanned for new links (not indexed).
                             =CheckOnly - all matching documents will be requested with an HTTP HEAD request instead of HTTP GET, i.e. only brief information about the documents (size, last modified, content type) will be fetched.
                             =Skip - all matching documents will be skipped while indexing.
                             =CheckMP3 - all matching documents will additionally be checked for MP3 tags if their Content-Type is audio/mpeg.
                             =CheckMP3Only - same as CheckMP3, but if no MP3 tag is present, no processing based on Content-Type is performed.
                             =TagIf - all documents will be marked with the specified tag.
                             =CategoryIf - all documents will be marked with the specified category.
                             =IndexIf - all documents will be indexed if the value of the specified section matches the given pattern.
                             =NoIndexIf - all documents will be ignored and deleted from the database if the value of the specified section matches the given pattern.

Section                      Section name used in pattern matching for the IndexIf and NoIndexIf methods.
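
The interaction of the match_type, case_sense and nomatch values listed above can be sketched in Python as follows. This is only an approximation of the documented semantics, not DataparkSearch's own matching code: the function name record_accepts and its parameters are hypothetical, the regular expression is assumed to be searched anywhere in the URL, and Python's fnmatch additionally understands [...] character sets, which the table does not mention. The case_insensitive argument corresponds to case_sense=1 as the table states it.

```python
import fnmatch
import re

# Numeric codes as listed for match_type in Table 9-2.
DPS_MATCH_FULL, DPS_MATCH_BEGIN, DPS_MATCH_SUBSTR = 0, 1, 2
DPS_MATCH_END, DPS_MATCH_REGEX, DPS_MATCH_WILD = 3, 4, 5


def record_accepts(url, pattern, match_type,
                   case_insensitive=False, nomatch=False):
    """Return True if a record with these settings accepts the URL.

    Hypothetical helper: it mirrors the table, not the real implementation.
    """
    if match_type == DPS_MATCH_REGEX:
        # Assumption: the expression may match anywhere in the URL.
        flags = re.IGNORECASE if case_insensitive else 0
        hit = re.search(pattern, url, flags) is not None
    else:
        if case_insensitive:
            url, pattern = url.lower(), pattern.lower()
        if match_type == DPS_MATCH_FULL:
            hit = url == pattern
        elif match_type == DPS_MATCH_BEGIN:
            hit = url.startswith(pattern)
        elif match_type == DPS_MATCH_SUBSTR:
            hit = pattern in url
        elif match_type == DPS_MATCH_END:
            hit = url.endswith(pattern)
        elif match_type == DPS_MATCH_WILD:
            # fnmatchcase never folds case itself; * and ? act as wildcards.
            hit = fnmatch.fnmatchcase(url, pattern)
        else:
            # DPS_MATCH_SUBNET (6) is documented as not yet supported.
            raise ValueError("unsupported match_type: %r" % (match_type,))
    # nomatch inverts the result: non-matching URLs are the accepted ones.
    return hit != bool(nomatch)


print(record_accepts("http://example.com/a.html",
                     "http://example.com/", DPS_MATCH_BEGIN))
print(record_accepts("http://example.com/A.ZIP", "*.zip",
                     DPS_MATCH_WILD, case_insensitive=True))
```

Keeping nomatch as a final inversion step means each match type is implemented once, and a record with nomatch=1 simply accepts every URL the same record with nomatch=0 would reject.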