mnoGoSearch 3.2.43 reference manual

Full-featured search engine software


Table of Contents
1. Introduction
mnoGoSearch Features
Where to get mnoGoSearch
Disclaimer
Authors
Contributors (in no particular order)
Frequently Asked Questions
2. Installation
SQL database requirements
Supported operating systems
Tools required for installation
Installing mnoGoSearch
Possible installation problems
Creating binary distribution
Installation registration
3. Indexing
Indexing in general
Configuration
Running indexer
SQL back-end notes
How to create SQL table structure
How to drop SQL table structure
Subsection control
How to clear database
Database Statistics
Link validation
Parallel indexing
Supported HTTP response codes
Content-Encoding support
indexer configuration
Specifying WEB space to be indexed
Aliases
ServerTable
FlushServerTable
External parsers
Extended indexing features
News extensions
Indexing SQL database tables (htdb: virtual URL scheme)
Indexing binaries output (exec: and cgi: virtual URL schemes)
Mirroring
Using syslog
Disabling Apache logging
Storing cached copies
Configuring cached copies
Using cached copies at search time
4. mnoGoSearch HTML parser
Tag parser
Special characters
META tags
Links
Comments
5. Storing mnoGoSearch data
SQL storage types
Various modes of words storage
Storage mode - single
Storage mode - multi
Storage mode - blob
Substring search notes
Cache mode storage
mnoGoSearch performance issues
MySQL performance
Post-indexing optimization
Oracle notes
Introduction
Compilation, Installation and Configuration
IBM DB2 notes
6. Subsections
Categories
Tags
Tags in SQL version
7. Languages support
Character sets
Supported character sets
Several languages in one database
UTF-8 mode
non-UTF-8 mode
Recoding
Recoding at search time
Character sets aliases
Document charset detection
Automatic charset guesser
Default charset
Default Language
Making multi-language search pages
How does it work?
Possible troubles
Segmenters for Chinese, Thai and Japanese languages
Japanese language phrase segmenter
Chinese language phrase segmenter
Thai language phrase segmenter
Multilingual servers support
8. Searching documents
Using search front-ends
Performing search
Search parameters
Changing different document parts weights at search time
Using front-end with an shtml page
Using several templates
Advanced boolean search
Restrict searched words to a section
Phrase search
How search handles expired documents
How to write search result templates
Template sections
Template operators
Includes in templates
Security issues
Designing search.html
How is the results page created
Your HTML
Forms considerations
Relative links in search.htm
Adding Search form to other pages
Relevancy
Ordering documents
Boolean search
Crosswords
Search queries tracking
Search results cache
Fuzzy search
Ispell
Synonyms
Loading synonyms and word forms from SQL database
Dumping ispell data
Transliteration
Searching numbers
9. Miscellaneous
Reporting bugs
Currently known bugs
Core dump reports
Using libmnogosearch library
udm-config script
mnoGoSearch API
MySQL fulltext parser plugin
Database schema
I. Reference
I. mnoGoSearch commands reference
AddType -- associates file names or extensions with mime types
Affix -- includes ispell affix file
Alias -- associates master and mirror sites
AliasProg -- calls external URL parser
Allow -- allows indexing of defined URLs
AlnumFactor -- this command is obsolete
AlwaysFoundWord -- defines word that is always treated as found
AuthBasic -- defines basic HTTP authorization user name and password
BaseFiles -- this command is obsolete
BrowserCharset -- defines browser charset
Cache -- enables or disables caching of search results
Category -- defines documents category
CheckMP3 -- checks for MP3 meta information
CheckMP3Only -- checks for MP3 meta information
CheckOnly -- checks for file existence only
CrossWords -- specifies whether to use crosswords
CustomLog -- logging to stdout using the given format
CVSIgnore -- enables or disables indexing internal CVS files
DateFactor -- gives a lower score to old documents
DateFormat -- defines date format
DBAddr -- sets database address
DefaultContentType -- defines default Content-Type
DefaultLang -- defines default language
DetectClones -- enables or disables clone detection
Disallow -- disallows indexing defined URLs
DocMemCacheSize -- this command is obsolete
DocSizeWeight -- changes the impact of document size on the document score
DocTimeOut -- defines maximal time for document downloading
DoStore -- this command is obsolete
ExcerptSize -- defines maximal length of excerpt
ExcerptStopword -- whether to highlight stopwords
ExcerptPadding -- defines excerpt padding length
FlushServerTable -- flushes server.active to inactive
FollowSymLinks -- whether to dereference symlinks
ForceIISCharset1251 -- assume windows-1251 charset
GroupBySite -- this command is obsolete
GuesserUseMeta -- enables or disables using meta tags
HlBeg -- configures search results highlighting
HlEnd -- configures search results highlighting
HoldBadHrefs -- defines timeout for holding bad URLs
HrefOnly -- scan HTML pages only for URLs
HTDBAddr -- HTDBAddr
HTDBDoc -- HTDBDoc
HTDBLimit -- HTDBLimit
HTDBList -- HTDBList
HTTPHeader -- adds desired headers in indexer HTTP request
ImportEnv -- imports an environment variable
Include -- includes additional configuration file
Index -- prevents indexer from storing words into database
IndexIf -- allows indexing documents whose section matches the given pattern
IndexTime -- enables or disables Last-Modified HTTP header processing
IspellCorrectFactor -- this command is obsolete
IspellInCorrectFactor -- this command is obsolete
IspellUsePrefixes -- allows using ispell prefixes while searching
LangMapFile -- loads language map for charset and language guesser
LangMapUpdate -- no description available yet
Limit -- describes a fast limit
LoadChineseList -- loads Chinese word frequency list
LoadThaiList -- loads Thai word frequency list
LocalCharset -- defines local charset
Locale -- sets a desired locale
LogsOnly -- this command is obsolete
MaxDocSize -- defines maximal document size
MaxDocPerSite -- defines the maximum number of documents to pick up from each site
MaxHops -- defines the maximum path length in "mouse clicks"
MaxNetErrors -- defines the maximum number of network errors
MaxWordLength -- defines maximal word length
Mime -- defines external parser for given mime-type
MinCoordFactor -- gives a higher score to documents in which the found words are closer to the beginning
MinWordLength -- defines minimal word length
MirrorHeadersRoot -- defines root directory of mirrored document's headers
MirrorPeriod -- defines period for mirrored files
MirrorRoot -- defines root directory to enable sites mirroring
NetErrorDelayTime -- defines document processing delay
NewsExtensions -- enables news extensions
NoIndexIf -- disallows indexing documents whose section matches the given pattern
NumberFactor -- this command is obsolete
NumSections -- specifies the number of sections configured in indexer.conf
NumWordFactor -- gives a higher score to documents having more found words
OptimizeInterval -- this command is obsolete
OptimizeRatio -- this command is obsolete
ParserTimeOut -- defines amount of time for parser execution
Period -- defines reindex period
PopRankFeedBack -- calculates sites weights
PopRankShowCntRatio -- PopRankShowCntRatio
PopRankShowCntWeight -- PopRankShowCntWeight
PopRankSkipSameSite -- skips links from same site
PopRankUseShowCnt -- PopRankUseShowCnt
PopRankUseTracking -- PopRankUseTracking
Proxy -- defines HTTP proxy address
ProxyAuthBasic -- defines HTTP proxy user name and password
R0 - R9 -- sets random number
ReadTimeOut -- defines stalled connections timeout
Realm -- describes web-space to index using regex/wild patterns
RemoteCharset -- defines default character set for next Server command(s)
RemoteFileNameCharset -- defines default character set of file and directory names
ReplaceVar -- creates or modifies a variable
ResultsLimit -- ResultsLimit
ReverseAlias -- ReverseAlias
Robots -- allows using robots.txt
Section -- defines document's section
Server -- describes web-space you want to index
ServerTable -- loads servers from database
ServerWeight -- defines server's weight
Spell -- loads ispell file
SQLWordForms -- loads synonyms or word forms from the database
StartHops -- 'Hops' value for start URLs.
StopwordFile -- loads stopwords file
StoredFiles -- this command is obsolete
StrictModeThreshold -- threshold to switch to a less strict search mode
Subnet -- Subnet
Suggest -- displays suggestions for misspelled search words
Synonym -- loads synonyms file
SyslogFacility -- sets syslog facility
Tag -- generic grouping tag
URL -- inserts URL into database
URLDataThreshold -- improves search performance for queries returning small number of results
URLDAddr -- this command is obsolete
URLDataFiles -- this command is obsolete
URLSelectCacheSize -- sets URLs cache size for indexer
UseCookie -- activates using per-session cookies during indexing
UseCRC32URLId -- enables generation of CRC32 URL IDs
UseNumericOperators -- activates interpreting numeric operators in a search query
UseRemoteContentType -- specifies if the indexer should get content type from server
UserScore -- specifies an SQL query to calculate a user-defined score for desired documents
UserScoreFactor -- sets the effect of the "UserScore" command
VarDir -- defines mnogosearch var directory
VaryLang -- defines languages for multilingual indexing
wf -- sets the default weights of different document parts
WordCacheSize -- defines maximal in-memory words cache size
WordDistanceWeight -- changes the impact of word distance on the document score
WrdFiles -- this command is obsolete
A. mnoGoSearch change history
Changes in 3.2
Changes in 3.2.43 (17 October 2007)
Changes in 3.2.42 (12 April 2007)
Changes in 3.2.41 (03 February 2007)
Changes in 3.2.40 (10 November 2006)
Changes in 3.2.39 (5 June 2006)
Changes in 3.2.38 (15 March 2006)
Changes in 3.2.37 (17 February 2006)
Changes in 3.2.36 (27 January 2006)
Changes in 3.2.35 (23 November 2005)
Changes in 3.2.34 (22 September 2005)
Changes in 3.2.33 (13 June 2005)
Changes in 3.2.32 (30 March 2005)
Changes in 3.2.31 (17 February 2005)
Changes in 3.2.30 (21 January 2005)
Changes in 3.2.29 (24 December 2004)
Changes in 3.2.28 (17 December 2004)
Changes in 3.2.27 (10 December 2004)
Changes in 3.2.26 (03 December 2004)
Changes in 3.2.25 (22 November 2004)
Changes in 3.2.24 (04 November 2004)
Changes in 3.2.23 (20 October 2004)
Changes in 3.2.22 (14 October 2004)
Changes in 3.2.21 (01 September 2004)
Changes in 3.2.20 (23 August 2004)
Changes in 3.2.19 (07 July 2004)
Changes in 3.2.18 (07 June 2004)
Changes in 3.2.17 (05 May 2004)
Changes in 3.2.16 (12 April 2004)
Changes in 3.2.15 (26 September 2003)
Changes in 3.2.14 (29 July 2003)
Changes in 3.2.13 (10 July 2003)
Changes in 3.2.12 (25 June 2003)
Changes in 3.2.11 (20 June 2003)
Changes in 3.2.10 (11 April 2003)
Changes in 3.2.9 (07 April 2003)
Changes in 3.2.8 (30 January 2003)
Changes in 3.2.7 (11 October 2002)
Changes in 3.2.6 (19 June 2002)
Changes in 3.2.5 (27 May 2002)
Changes in 3.2.4 (15 May 2002)
Changes in 3.2.3 (24 November 2001)
Changes in 3.2.2 (24 October 2001)
Changes in 3.2.1 (27 September 2001)
Changes in 3.2.0 (24 September 2001)
Changes in 3.2.0.b2 (08 August 2001)
Changes in 3.2.0.b1 (03 July 2001)
Index
List of Tables
3-1. Verbose levels
7-1. Language groups
7-2. Charsets aliases
8-1. Available search parameters
9-1. server table schema
9-2. Several server parameters values in srvinfo table