Appendix A. mnoGoSearch change history

Changes in 3.2

Changes in 3.2.43 (17 October 2007)

  • Fixed that indexer crashed in some cases when running with many threads.

  • Bug#1716 "Can't limit indexer to documents matching language" was fixed.

  • Bug#1713 "Square brackets in DOCTYPE makes XML parser fail" was fixed.

  • Bug#1740 "'UseRemoteContentType yes' doesn't work." was fixed (a bug since 3.2.42).

  • Fixed an XSS (cross-site scripting) security problem in the default template search.htm-dist. Passing special values of the "t" query string variable to search.cgi resulted in code injection near the OPTION tags of the <SELECT NAME="t"> option list in the extended search form.

    This problem happened only with the <SELECT NAME="t"> list, which is inside an HTML comment in the default template. Other SELECT lists were not affected unless you put them into an HTML comment.

    To prevent this problem, search.cgi was modified to understand variable references with "HTML-encoded" output format:

    
<OPTION VALUE="val" SELECTED="$&(var)">
    
    Previously only non-encoded variable references worked in OPTION tags:
    
<OPTION VALUE="val" SELECTED="$(var)">
    
    The default template search.htm-dist was modified to use HTML-encoded output format in variable references in all OPTION tags.

    After upgrading to this release, modify existing templates by replacing every <OPTION VALUE="val" SELECTED="$(var)"> with <OPTION VALUE="val" SELECTED="$&(var)">.
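
    The template change described for this release (switching SELECTED="$(var)" to SELECTED="$&(var)" in OPTION tags) could be automated along the following lines. This is a minimal sketch, not part of mnoGoSearch: the function name encode_option_references is hypothetical, and the pattern assumes templates use exactly the attribute form shown above.

```python
import re

# Matches SELECTED="$(var)" inside an OPTION tag and captures the variable name.
# Assumption: the attribute is written exactly as in the changelog example.
_OPTION_SELECTED = re.compile(
    r'(<OPTION\b[^>]*\bSELECTED=")\$\(([^)]*)\)(")',
    re.IGNORECASE,
)


def encode_option_references(template: str) -> str:
    """Rewrite SELECTED="$(var)" to SELECTED="$&(var)" in OPTION tags only."""
    # In a Python replacement string "$" and "&" are literal characters.
    return _OPTION_SELECTED.sub(r'\1$&(\2)\3', template)
```

    References that already use the HTML-encoded form are left untouched, so running the rewrite twice gives the same result as running it once.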

Changes in 3.2.42 (12 April 2007)

  • DBMode=blob is now supported with DB2.

  • A minor indexing performance improvement was made for the case when no LangMapFile commands are specified.

  • Fixed that "UseCookie yes" made indexer crash when fetching data from HTDB sources.

  • Bug#1024 "Clear database limitations do not work: error ORA-01795" was fixed.

  • Bug#1016 "Indexer is selecting wrong Content-Type" was fixed.

  • Bug#1044 "-Ewordstat: incorrect unicode sequence" was fixed.

  • Bug#1110 "'invalid UTF-8 byte sequence detected' when INSERT INTO dictXX" was fixed. This error happened when indexing into PostgreSQL with DBMode=multi. The "intag" column type was changed from TEXT to BYTEA in the tables "dict00".."dictFF".

  • Bug#1182 "Indexer crashes with -a -y 'content/type'" was fixed.

  • Bug#1356 "MySQL fulltext parser plugin does not compile" was fixed.

  • Bug#1398 "DateFactor does not work with DBMode=blob" was fixed.

  • Bug#1427 "ORA-01785: maxinum number of expressions in a list is 1000" was fixed.

  • Bug#1436 "Cannot run -Ewordstat, ORA-01400: cannot insert NULL" was fixed.

  • Bug#1615 "The identifier "PATH_MAX" is undefined" was fixed.

  • Bug#1641 "Documentation problem" was fixed.

  • Bug#1693 "User defined sections don't work for text/plain files" was fixed.

Changes in 3.2.41 (03 February 2007)

  • DBMode=blob is now supported with Firebird/Interbase.

  • The "UserScore" and "UserScoreFactor" commands were added. These commands allow mixing the score calculated by mnoGoSearch with a user defined score. The "us" search.cgi parameter was added to choose which UserScore to use for the current search session.

  • Hindi language map was added. Thanks to Yannick LE NY for the contribution.

  • "indexer -Eblob" performance improvements were made: it's now up to two times faster depending on the database. Converting now uses fewer memory allocations and memory moves, and also utilizes the DISABLE/ENABLE KEYS technique with MySQL when writing to the table "bdict".

  • The "DEFAULT" clause was removed from all BLOB/TEXT fields in MySQL create scripts, to avoid errors when running with mysqld in "strict" mode.

  • The "ServerTable" command is now ignored when loading search.htm. It's useful when ServerTable/DBAddr commands are written in a separate file which is included from both indexer.conf and search.htm. Thanks to Michael Hanselmann for the patch.

  • The search cache now respects the "u" variable. Thanks to Michael Hanselmann for the patch.

  • search.cgi now detects and reports missing template ENDIF and ENDWHILE operators. Previously it fell into an endless loop.

Changes in 3.2.40 (10 November 2006)

  • Fixed that cookies with path "/foo" matched neither "/foobar" nor "/foo/bar.html". Only cookies with trailing slashes in path (e.g. "/foo/") worked as expected.

  • Fixed that section references like 'title:word1 body:word2' didn't work in DBMode=single and DBMode=multi.

  • Bug#1123 "--without-extra-charsets does not work" was fixed.

  • Bug#1142 "indexer gives PG errors when indexing" was fixed.

  • Fixed that substring search with DBMode=blob returned extra empty documents in some cases.

  • Fixed that "fl" was ignored with "ul" or date limit specified at the same time.

  • Fixed that indexer crashed with "UseCookies yes" in some cases.

  • A fix in the OCI driver was made: OCILobGetLength() returns the length in characters, not in bytes. Conversion from number of characters into number of bytes was added when allocating space for the fetch buffer.

  • Bug#1300 "Low ranking with GroupBySite=yes" was fixed.

  • A fix in the MySQL driver was made to make indexer work with mysql-3.23 again: it no longer uses SQL syntax that appeared in mysql-4.0 or later (bug since 3.2.38).

  • Bug#1085 "configure cannot find mysql include/lib files" was fixed.

  • Fixed that search.cgi crashed because of stack overflow when compiled with 'CFLAGS=-O'.

Changes in 3.2.39 (5 June 2006)

  • "Locale" search.htm command was added. Month and day names in $(Last-Modified) are now printed according to the desired locale. Example: "Locale fr_FR.UTF8".

  • SQLite3 support was added. Use --with-sqlite3 to configure to build with SQLite3 support.

  • "fl" now can be used in DBAddr.

  • Excluding fast limits were added. Add "-" before a limit name to make it excluding. For example, fl=-name will exclude all documents covered by limit "name".

  • Now it's possible to specify "Limit" commands in search.htm, i.e. without having to cache them previously using "indexer -Eblob" or "indexer -Erewritelimits". Useful for limits whose SQL queries work very quickly (usually returning a small amount of documents).

  • Search query syntax now understands section name references. For example, "title:web body:server" will find documents having "web" in title and "server" in body. Copy "Section" commands from indexer.conf into search.htm to make search recognize section names.

  • Automatic phrase search was implemented for complex words having dots, dashes, underscores, commas and slashes (-_.,/) as delimiters between word parts. For example, `max_allowed_packet' now automatically searches for phrase `"max allowed packet"', not just for three separate words.

  • Search now uses operator LIKE instead of operator "=" when loading fast limits from the database. The "fl" pattern can match more than one limit. Documents covered by either of matching limits are returned.

  • Better excerpt generation for phrase search. Separate words are not included into excerpts anymore.

  • More accurate soundex calculation for better misspelled word suggestions. Re-run "indexer -Ewordstat" after upgrade if you use suggestions.

  • Minor query tracking performance improvements were done for MySQL, Oracle, Mimer and Interbase.

  • "CREATE TABLE url" and "CREATE TABLE urlinfo" MySQL statements were fixed. Now these tables can be bigger than 4Gb.

  • "indexer -Eblob" now writes a timestamp marker into the "bdict" table. Execute "SELECT CAST(intag AS CHAR) FROM bdict WHERE word='#ts'" to find out when "indexer -Eblob" was last run.

  • Some thread safety improvements were made.

  • PostgreSQL driver now uses half as much memory when "indexer -Eblob" is running. Also, a memory leak in PostgreSQL driver was fixed.

  • Fixed that path and file names were not URL-unescaped when talking to a FTP server.

  • Bug#769 "Missing Alias var in clone template" was fixed. Thanks to Jens for the original patch.

  • Fixed that misspelled word suggestions didn't work for DBMode=blob.

  • Bug#1105 "Column 'rec_id' in field list is ambiguous" was fixed.

  • Bug#1099 "Illegal using sequences in Oracle" was fixed.

  • Fixed that the variables declared using ReplaceVar were not converted from LocalCharset to BrowserCharset with an empty search query typed.
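
The automatic phrase search added in this release turns a complex word into a phrase by treating the listed delimiters (-_.,/) as separators between word parts. The sketch below only illustrates that transformation. The function name to_phrase_query is hypothetical, and the real query parser may handle edge cases differently.

```python
import re

# Delimiters named in the changelog entry: dash, underscore, dot, comma, slash.
_DELIMITERS = re.compile(r"[-_.,/]+")


def to_phrase_query(word: str) -> str:
    """Turn a complex word such as max_allowed_packet into a quoted phrase."""
    parts = [part for part in _DELIMITERS.split(word) if part]
    if len(parts) < 2:
        # A plain word has no inner delimiters, so it stays a single-word query.
        return word
    return '"' + " ".join(parts) + '"'
```

For the changelog's own example, to_phrase_query("max_allowed_packet") yields the phrase "max allowed packet" rather than three independent words.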

Changes in 3.2.38 (15 March 2006)

  • Fetching of a list of indexer targets is now much faster with big MySQL databases. The SQL query for target lookup was rewritten to create a temporary table and then join it with the "url" table. This made it possible to get rid of a filesort over a huge amount of data. Also, a key using three columns (next_index_time,seed,rec_id) was added for the "url" table, to allow an index read instead of a full table scan.

  • MinCoordFactor improvements were made to be more distinguishable for small word positions, i.e. in the very beginning of the document.

  • Fixed that User.Date with YYYY-MM-DD and DD.MM.YYYY formats gave wrong month value.

  • Crash with long words in synonym look up code was fixed.

  • Compilation problem on RedHat ES 4 was fixed.

  • Fixed that after processing a "mailto:" or a disallowed link, indexer wouldn't store crosswords from any further links on the same document:

                
    <html><body>
    <a href="a.html">These crosswords were stored fine.</a>
    <a href="mailto:test@test.com">A test mailto link, breaking stopwords.</a>
    <a href="b.html">These crosswords were not stored.</a>
    <a href="c.html">Neither were these.</a>
    </body></html>
    

Changes in 3.2.37 (17 February 2006)

  • ChangeLog has been moved into documentation.

  • Fixed that DBMode=blob table structure files were not created for databases other than MySQL.

  • Fixed that crosswords didn't take into account the "fl" parameter.

  • Fixed that empty "fl" value didn't return any results, instead of returning all matching results without filtering.

  • Fixed that search unnecessarily loaded the "fl" limits with an empty search query, e.g. when switching between "simple" and "extended" search form. Thanks to Goga for providing the fix.

  • Fixed that search cache did not take into account the "fl" parameter, so one could get the same result with different limits if Cache is on.

  • Fast limits can be built on any tables (not only url and urlinfo) with MySQL.

  • Fixed that "de" parameter didn't work inclusively, e.g. de=01/01/2006 included only those documents modified before "01/01/2006 00:00:00", instead of "01/01/2006 23:59:59".

  • Running "indexer -Eblob" and "indexer" simultaneously does not lock search anymore with MySQL.

  • --enable-chasen and --enable-mecab were changed to --with-chasen and --with-mecab, making it possible to specify a directory name.

  • Fixed several memleaks and uninitialized variables reported by valgrind.

  • Fixed that libmnogosearch.so wasn't installed. Bug since 3.2.36.
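
The inclusive behaviour of the "de" parameter fixed in this release amounts to extending the end date to the last second of that day. A minimal sketch of the idea follows. The function name end_date_upper_bound is hypothetical, and the sketch assumes the value is written as day/month/year, which the example de=01/01/2006 does not disambiguate.

```python
from datetime import datetime, time


def end_date_upper_bound(de: str) -> datetime:
    """Return the last second of the day named by a "de" value."""
    # Assumption: the value is formatted as DD/MM/YYYY.
    day = datetime.strptime(de, "%d/%m/%Y").date()
    return datetime.combine(day, time(23, 59, 59))
```

A document modified at any time on the end date then compares as less than or equal to this bound, so it is included in the result.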

Changes in 3.2.36 (27 January 2006)

  • indexer now supports DBMode=blob, which is now the fastest DBMode for both indexing and searching.

  • libmnogosearch.so now can be installed as MySQL fulltext parser. See "MySQL fulltext parser plugin" manual section for details.

  • It's now possible to use variables in an external parser command line. This example passes URL and TAG values in the parser command line:

    
        Mime "text/pdf" "text/plain" "/path/to/parser -u ${URL} -t ${TAG}"
            

    See the list of all available variables in "indexer -v6" output, in the lines beginning with "Response." prefix.

  • An optional fourth parameter for Mime command was added, to post extra information to an external parser, together with document content. For example:

    
        Mime mytype "text/plain" "cat" "${URL} # ${HTTP.Content}"
            
  • "SQLWordForms sql" search.htm command was added. It introduces a new fuzzy search method that loads synonyms or word forms from the SQL database. It can be used as a faster replacement for Synonym and Ispell fuzzy search methods.

  • "indexer -Edumpspell" command was added to dump spell data in a format suitable for loading into SQL database for further use with "SQLWordForms".

  • A new "when" optional parameter was added into "Section" indexer.conf command. It supports three values: "afterheaders", "afterguesser" and "afterparser", and allows to create user defined sections at different moments of document processing, which for example makes it possible to replace HTTP headers sent by a remote server.

  • "Limit" indexer.conf command and "fl" search parameter were added, introducing fast limits support, which improves searching through a part of the database, especially for DBMode=blob.

  • A new command "ReplaceVar name value" was added.

  • Synonym files now understand "Mode: reverse" and "Mode: oneway" commands to change word expansion behaviour between "all words expand to all words on the same line" and "only the leftmost word expands to other words on the same line".

  • "NumWordFactor num" search.htm command was added, where num is between 0 and 255. It specifies how much the number of found words in a document affects its final score. 255 means maximum effect, 0 means ignore the count of found words.

  • "MinCoordFactor num" search.htm command was added. Use this command to give more score for those documents having the first found word closer to the beginning of the document. Use with a number between 0 and 255. The default value is 0, which means no effect.

  • "URLDataThreshold num" search.htm command was added. It allows to improve search performance with DBMode=blob for the queries returning a small number of results (not more than several hundreds). If search returns less than "num" documents, full URL information is not loaded from the "bdict" table and the "url" table is used instead. The default value is 0, which means always read URL data from the "bdict" table. Find the number which is good for your installation experimentally.

  • "UseNumericOperators yes/no" search.htm command was added. When set to "yes", the "<" and ">" signs are treated as numeric comparison operators, e.g. "<100" finds all documents which have numbers less than 100 in their body or title or other sections according to the "wf" settings. Default value is "no", i.e. numeric operators are ignored.

  • New character set name aliases were added: "armscii8", "koi8r", "koi8u" and "ujis", for MySQL names compatibility.

  • Fixed that XML character set declaration was not processed, e.g.: <?xml version="1.0" encoding="utf-8"?>

  • Fixed that query tracking didn't work with Oracle, DB2, Firebird, Mimer, Sybase (Bug#742).

  • Fixed that "crossdict" table wasn't created for Oracle, DB2, Mimer and Interbase/Firebird (Bug#748).

  • Fixed that $(PerSite) value was calculated incorrectly with several DBAddr search.htm commands.

  • Fixed that template operators inside a HTML comment were interpreted instead of being printed just as a comment part (Bug#708, part2).

  • Fixed that <!EREG> didn't work with "<" and ">" characters inside REPLACE attribute (Bug#1010).

  • Fixed that <META NAME="ROBOTS" CONTENT="NOINDEX"> didn't prevent indexing of the url.file, url.path, url.site, url.proto sections (Bug#679).

  • indexer now chooses character set value in this order: "Content-Type" HTTP header, "Content-Type" META tag, RemoteCharset value from indexer.conf. Previously RemoteCharset was incorrectly selected in the first instance (bug#575).

  • Fixed that "Sun, 6 Nov 1994 08:49:37 GMT" date format was not recognized when indexing a NEWS server (Bug#694).

  • Syntax error in PostgreSQL trigger was fixed (Bug#784).

  • Build error on IRIX using native CC compiler was fixed (bug#778).

  • Bug#760 "Empty title and body in search.cgi - Can't get BLOB from oracle" was fixed.

  • Fixed that "mconv" incorrectly exited with "An output error" message in some cases.

  • Fixed that search.cgi could crash when running with DBMode=blob in some cases. Thanks to Goga for proposing the fix.

  • Fixed that the "regexp" keyword didn't work as an alias for "regex" in some indexer.conf commands.
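
The character set selection order fixed in this release ("Content-Type" HTTP header first, then the "Content-Type" META tag, then RemoteCharset) can be illustrated with a short sketch. The function names are hypothetical and the charset extraction is deliberately simplified. It is not the indexer's actual code.

```python
import re
from typing import Optional

# Simplified extraction of the charset parameter from a Content-Type value.
_CHARSET = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)


def charset_from_content_type(value: Optional[str]) -> Optional[str]:
    """Return the lower-cased charset parameter, or None if there is none."""
    if not value:
        return None
    match = _CHARSET.search(value)
    return match.group(1).lower() if match else None


def choose_charset(http_content_type: Optional[str],
                   meta_content_type: Optional[str],
                   remote_charset: Optional[str]) -> Optional[str]:
    """Pick a charset in the order: HTTP header, META tag, RemoteCharset."""
    return (charset_from_content_type(http_content_type)
            or charset_from_content_type(meta_content_type)
            or remote_charset)
```

RemoteCharset acts purely as a fallback here, which is the behaviour the fix describes.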

Changes in 3.2.35 (23 November 2005)

  • A new "wtime" column was added into "qtrack" table to store time spent for search, in milliseconds. Everyone who uses "trackquery" feature needs to add this column (e.g. using ALTER TABLE) or recreate "qtrack" table.

  • IndexIf/NoIndexIf now understand variables, e.g. the following command means not to index documents having content type "text/plain" from the site 'site':

    
        NoIndexIf "${URL}#${Content-Type}" "http://site/*#text/plain"
            
  • indexer and search.cgi now load my.cnf file by default. Use "DBAddr mysql://user:passwd@host/dbname/?MyCnfGroup=group" to read options from the named group. If MyCnfGroup=no is specified, then the option file is not loaded (Bug#771).

  • "DateFactor number" search.htm command was added. Use with a number in the range 0..255 to change effect of Last-Modified of a document on its score. The default value is 0, which means don't take Last-Modified into account. If DateFactor is set to a non-zero value, then a more fresh document gets better score than an older document with the same content.

  • Indexer now treats the documents having "xml" and "rss" substrings in Content-Type header as XML documents. E.g. "application/xml", "application/rss" are now understood as XML as well. Previously only the exact "text/xml" string worked.

  • DBMode=blob now works with PostgreSQL.

  • "Deflate" DBAddr parameter was added into indexer.conf, e.g. "DBAddr mysql://root@localhost/test/?DBMode=multi&Deflate=yes". With "Deflate=yes" specified, indexer compresses data when converting with "indexer -Eblob", which makes a smaller database size and faster search.

  • It is possible to rewrite only URL data for DBMode=blob: "indexer -Erewriteurl". It's useful for very quick rewrite of URL data after adding "Deflate=yes", without touching word information.

  • CustomLog indexer.conf command was added to log to stdout using a user defined format, e.g.: "CustomLog '[${PID}] ${CurrentTime} ${Status} ${URL} ${Content-Type}'".

  • Several minor search performance improvements were made.

  • Several bugs in "AlwaysFoundWord" were fixed.

  • Fixed that loading URL data in "DBMode=blob" didn't work on big endian platforms (e.g. MacOS X). As a result search loaded data from "url" table, which was slow.

  • Fixed that "Section url.file" and "Section url.path" didn't work well when indexing FTP sites having national letters in directory and file names (Bug#658). Directory and file names (after %XX URL-unescaping) are considered to have the same character set as the one specified in RemoteCharset (or iso-8859-1 by default). A new indexer.conf command "RemoteFileNameCharset" was added for the case when URL character set is different from RemoteCharset.

  • Fixed that MySQL-4.1 running in utf8 failed to create "qinfo" table with "Specified key was too long" error (Bug#1041).

  • Fixed that the "<!DOCTYPE...>" tag was removed from the template (Bug#781, Bug#1026).

  • Fixed that "<![CDATA[]]>" tags were not correctly processed by XML parser.
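
The relaxed XML detection introduced in this release treats any Content-Type containing the substring "xml" or "rss" as XML. A minimal sketch of such a check is shown below. The function name is hypothetical, and case-insensitive matching is an assumption not stated in the entry.

```python
def is_xml_content_type(content_type: str) -> bool:
    """Return True if the Content-Type value should be parsed as XML."""
    # Assumption: matching ignores letter case.
    value = content_type.lower()
    return "xml" in value or "rss" in value
```

With a substring rule, types such as "application/xhtml+xml" also match, which is worth keeping in mind when an HTML parser is expected for them.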

Changes in 3.2.34 (22 September 2005)

  • Per session Cookie support was added, use new "UseCookie yes/no" indexer.conf command to switch on/off.

  • "sybase" database type was added. e.g. sybase://sa@localhost/db/. Tested with ASE-12.5 with native ctlib as well as unixODBC interfaces.

  • Relevancy improvements: "WordDistanceWeight number" search.htm command was added. Use with a number in the range 0..255 to change effect of distance between the searched words on the resulting score. The default value is 255, which means maximum effect of word distance.

  • Relevancy improvements: "DocSizeWeight number" search.htm command was added. Use with a number in the range 0..255 to give lower score to a longer document and higher score to a shorter document if both documents contain the same number of found words. The default value is 255, which means maximum effect of document size.

  • New "nfw" search.cgi parameter. It uses the same format as "fw". If all found words appear in only one section, then the resulting score becomes lower. It can be used for example to ignore spam in the KEYWORDS meta tag. I.e. if you use high "fw" and "nfw" values for the section corresponding to KEYWORDS, then the score will be high only if a word appeared in KEYWORDS and also in the title or another section, but not only in KEYWORDS.

  • New "StrictModeThreshold number" search.htm command. If search returned fewer results than the given number, then search automatically switches from m=all mode (all words) to the less strict m=any mode (any word). Default value is 0, which means don't switch automatically to less strict mode.

  • A new special "User.Date" section was added. It makes it possible to use a user defined meta tag (or even any other part) of an HTML document as an alternative "Last-Modified" value, e.g.:

    
        Section User.Date 0 10 '<META NAME="Date" +CONTENT="([^"]*)">' "$1"
            
  • "Cached Copy" now looks better for "text/vnd.wap.wml" (WAP documents).

  • Language guesser now understands "cn" as synonym for "zh" to detect Chinese.

  • "DefaultContentType" search.htm command was added. Helps when "Content-Type" header is not stored in the database and automatic guesser fails to detect a document type. Previously "text/plain" was assumed.

  • search.cgi now can do Cyrillic->Latin and Latin->Cyrillic transliteration. New "tl=yes" search.cgi parameter was added to activate transliteration.

  • Self-links (i.e. when a page has a link to itself) do not affect popularity rank anymore.

  • It is possible to use phrase as a synonym now.

  • Added "AlwaysFoundWord" search template command. It specifies dummy word that is always considered found.

  • PgSQL driver has been slightly optimized.

  • Several improvements to search template to be compatible with XHTML.

  • Fixed that "<![CDATA[...]]>" entries didn't work well in search.htm.

  • Fixed search.cgi crash, which showed up on Debian and Suse in some cases (Bug#1004, Bug#1025).

  • Fixed that after indexing with MinWordLength in indexer.conf phrase search didn't work properly.

  • Fixed that search could split words into parts because of invoking Chinese/Thai segmenter in wrong cases.

  • Fixed that search query and word statistics were displayed in LocalCharset instead of BrowserCharset when no documents were found.

  • Fixed that search.cgi crashed if NumSections was smaller than actual number of sections stored in the database.

  • Fixed minor bug in synonyms code. One wasn't able to use synonyms feature if there are less than three synonyms defined.

  • Several stability and performance improvements were made.
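
The StrictModeThreshold behaviour added in this release (retrying with m=any when m=all returns too few results) can be sketched as follows. Everything here is illustrative: search_with_fallback and the run_search callback are hypothetical stand-ins, not mnoGoSearch API.

```python
from typing import Callable, List


def search_with_fallback(run_search: Callable[[str, str], List[str]],
                         query: str,
                         threshold: int = 0) -> List[str]:
    """Search in "all" mode, retrying in "any" mode below the threshold."""
    results = run_search(query, "all")
    # A threshold of 0 (the documented default) disables the fallback.
    if threshold > 0 and len(results) < threshold:
        results = run_search(query, "any")
    return results
```

The callback receives the query and the match mode, so the same function can wrap any backend used for experimentation.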

Changes in 3.2.33 (13 June 2005)

  • "indexer -Eblob" doesn't block parallel search execution anymore with MySQL.

  • Japanese stoplist was added. Thanks to Alexander Sharapov.

  • <!EREG> template operator was added.

  • <!IFLE>, <!IFLT>, <!IFGE>, <!IFGT> template operators were added for less-or-equal, less, greater-or-equal, greater numeric comparison.

  • New "IndexTime yes/no" indexer.conf command was added. If set to 'yes' then last_mod_time is set to indexing time, instead of value provided by "Last-Modified" HTTP header. Useful for indexing of dynamic pages.

  • "Realm site *" now follows only links from the same site as the current URL.

  • New "Realm urllist" realm type was added.

  • $(CurrentTimestamp) and $(Last-Modified-Timestamp) search.htm variables were added, representing current date and a document modification date in numeric (Unix timestamp) format.

  • New "dstmp" search parameter was added. It can be used instead of dy/dd/dm.

  • New "StartHops" indexer.conf command was added.

  • New "ExcerptStopword yes/no" search.htm command was added, to choose whether stopwords should be highlighted in excerpts.

  • Relevancy improvements were made (better word distance calculation, word count is taken into account now).

  • Excerpt generating performance improvements were made.

  • Boolean indexer.conf and search.htm commands now understand yes/no as well as 1/0 arguments.

  • Fixed that entities like &#27; didn't work with Big5 (bug#755).

  • Fixed that <!INCLUDE> didn't work in 3.2.32.

  • Fixed that indexer exited with "Duplicate error" message with PostgreSQL 8.0.

  • Fixed that Server/Realm commands didn't work after ServerTable command in some cases.

  • Fixed that indexer started with -N flag could hang in some cases.

  • Fixed that indexer could crash when processing a malformed BASE HREF tag.

  • Fixed that "ip" column in "qtrack" table was not filled by PHP module.

  • Fixed that search results were wrongly displayed if search limits returned no documents in some cases.

  • Fixed that a page was not removed from search index in some cases even if it was already removed from site.

  • Fixed that "Alias regex" didn't work in search.htm.

  • Several stability improvements were made.

Changes in 3.2.32 (30 March 2005)

  • MaxDocPerSite indexer.conf command was added.

  • Fixed that "MaxHops N" allowed following only N-1 hops. Now it allows following N hops.

  • HTML title and META tags are not included into body excerpts when excerpts are built from CachedCopy.

  • Misspelled word suggestion now works in DBMode=single too. Earlier it worked only in DBMode=multi.

Changes in 3.2.31 (17 February 2005)

  • Misspelled search word suggestions were added. If a search query didn't return any results, a "Did you mean: similar query" link is displayed. To start using this feature, one needs to run "indexer -Ewordstat" once after indexing, as well as add "Suggest yes" into search.htm. Suggestions currently work only for those languages using Latin script.

  • Fixed that indexer crashed when "Section CachedCopy" was configured by mistake as a user defined section, i.e. with "expression" and "replacement" arguments (bug#749).

  • Fixed that "Section body" was hardcoded and didn't work as a user defined section (bug#751).

  • Fixed that indexer compiled with --enable-trace crashed if /tmp/udm_agent.0.trace was not writable.

  • Converter to DBMode=blob was improved to use less memory.

  • Queries returning big number of rows are now a bit faster with MySQL.

  • Ispell dictionaries are loaded a bit faster.

  • Several minor stability improvements were made.

Changes in 3.2.30 (21 January 2005)

  • HTMLENCODE function was added into template language.

  • Fixed that indexer crashed when trying to execute an external parser if /tmp was not writable.

  • Fixed that cached copy was created for CheckOnly and CheckMP3Only documents (bug#628).

  • Fixed that indexer crashed if MirrorPeriod was set without MirrorRoot (bug#642).

  • Fixed that indexer with -nX, e.g. "indexer -n10" marked more than X documents as non-expired, making them temporarily not available for indexing during subsequent indexer start.

  • Fixed that mnogosearch didn't process URLs containing slashes in the query string correctly. In particular, the socket=/tmp/mysql.sock part was ignored in DBAddr.

  • Fixed that search.cgi crashed in certain cases.

  • Minor bugs #625, #729 were fixed.

  • Bugs in phrase search with stopwords (#546, #645) were fixed.

  • A new WordCacheSize indexer.conf command was added. Increasing it allows to improve indexing speed in dbmode=multi.

  • searchd has been removed from the distribution.

Changes in 3.2.29 (24 December 2004)

  • A subtle bug which led to search.cgi crashes in rare cases was fixed.

  • PHP-4.3.x backward compatibility changes were made. These changes fix mnoGoSearch PHP module compilation failure introduced in 3.2.25.

  • Search speed improvements in blob mode were made. search.cgi now loads url data in a smarter way.

  • search.cgi now accepts only double quotes as phrase delimiters. Apostrophes don't work as phrase delimiters anymore. This change fixes problems with French and English languages, where apostrophes are commonly used as word parts (e.g. dog's or isn't).

  • Several fixes in boolean and phrase search were made.

  • A bug with accented characters in URL was fixed.

  • Bug#718 "Incorrect parsing of XHTML isolated tags in templates" was fixed.

  • Cosmetic bug #722 was fixed.

Changes in 3.2.28 (17 December 2004)

  • Fixed that PagesPerScreen search.htm command didn't work in 3.2.27.

  • A bug in PopRank calculation was fixed.

  • A bug in phrase and boolean search was fixed (Bug#651). It showed up when the same word appeared more than once in the same query, e.g. a boolean search "(~a & b) | (a & ~b)" or a phrase search "as well as".

  • A bug with indexer crash when fetching a long "file:" url was fixed.

  • Cosmetic bugs #705, #706, #712 were fixed.

Changes in 3.2.27 (10 December 2004)

  • A cross-site scripting vulnerability was fixed. It could be used to obfuscate/fake the output and/or steal cookies by inserting arbitrary html/javascript code when navigating through next/prev search results page as well as extended/simple search form links. The Common Vulnerabilities and Exposures project (cve.mitre.org) has assigned the name CAN-2004-1059 to this issue. Thanks to Michael Krax and Mark J Cox for reporting the problem and for advising how to fix it.

  • Default template search.htm-dist improvements were made. Thanks to Igor Pashev for the suggestions (Feature request#687).

  • ODBC driver improvements were made.

  • Fixed that htdb:/ made indexer crash without HTDBAddr specified. It now reuses DBAddr connection to access the source data.

  • Fixed that Include command didn't work in search.htm (Bug#637).

  • Fixed that DateFormat didn't affect clone dates (Bug#696).

Changes in 3.2.26 (03 December 2004)

  • Search speed improvements were made. This change mostly affects blob mode users. It makes searches returning tens of thousands of results up to eight times faster in some cases.

  • <!--noindex-->...<!--/noindex--> syntax was added as a synonym for <noindex>...</noindex>.

  • It is now possible to use %XX syntax to escape special characters like ':', '<', '>', '?', '#', '@' in user and password part of DBAddr command, for example,

    
        DBAddr mysql://root:%3A%3B%3C%3D%3E%3F%40@localhost/test/
            
  • It's now possible to switch MySQL query logging on/off using sqllog parameter in DBAddr, e.g.

    DBAddr mysql://user:pass@host/db/?sqllog=0
            
  • Fixed that Allow/Disallow filters were applied to alias rather than the original URL.

  • Fixed that Oracle didn't work via EasySoft ODBC driver.

  • A bug in robots.txt processing was fixed (Bug#686).
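
The %XX escaping of the user and password parts of DBAddr, added in this release, is ordinary URL percent-encoding. The sketch below builds such a value with Python's standard urllib.parse.quote. The helper name build_dbaddr is hypothetical and only illustrates how the escaped form in the changelog example can be produced.

```python
from urllib.parse import quote


def build_dbaddr(scheme: str, user: str, password: str,
                 host: str, database: str) -> str:
    """Assemble a DBAddr value with percent-escaped credentials."""
    # safe="" makes quote() escape every reserved character, including "/".
    escaped_user = quote(user, safe="")
    escaped_password = quote(password, safe="")
    return f"{scheme}://{escaped_user}:{escaped_password}@{host}/{database}/"
```

Calling build_dbaddr("mysql", "root", ":;<=>?@", "localhost", "test") reproduces the address shown in the changelog example for this release.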

Changes in 3.2.25 (22 November 2004)

  • DetectClone/NoDetectClone optional flag was added into Section indexer.conf command. It specifies whether a section should be taken into account for clone detection.

  • URLDECODE function was added into template language.

  • search.cgi now loads ispell files several times faster. Those who use searchd for faster ispell support can reconfigure not to use searchd anymore.

  • LoadChineseList/LoadThaiList now use file names relative to /etc directory of mnoGoSearch installation.

  • Better error reporting on LoadChineseList/LoadThaiList failures.

  • Database merge mode, i.e. search with several DBAddr, was fixed.

  • Fixed that documents with <META NAME="Robots" CONTENT="NOINDEX"> were processed incorrectly in some cases.

Changes in 3.2.24 (04 November 2004)

  • Indexing speed improvements in "multi" DBMode.

  • Search results limit by a user defined section.

  • Search result ordering by a user defined section.

  • Case sensitive conditional template operator was added: <!IFCS NAME="a" CONTENT="ABC">ABC<!ENDIF>

  • Fixed that PHP --with-mnogosearch didn't compile if mnoGoSearch was build with --enable-pthreads.

  • Windows character set canonical names were renamed to conform to the IANA registry, e.g. cp1251 -> windows-1251 (see http://www.iana.org/assignments/character-sets). mnoGoSearch uses canonical names when displaying HTTP headers and META tags. Some browsers didn't understand cp1251-style notation.

Changes in 3.2.23 (20 October 2004)

  • New template operators: WHILE, WHILENOT, INC, DEC.

  • Blob-mode converter was improved.

  • install.pl was improved.

  • Bug with empty dictFF was fixed.

  • Bug #513 was fixed.

Changes in 3.2.22 (14 October 2004)

  • New template section was added: <!--variables--> ... <!--/variables-->. Unlike all other sections, it is executed *before* the search process starts. Thus you can set some variables in addition to those passed in QUERY_STRING, e.g.:

    
        <!--variables-->
        <!SET NAME="ul" CONTENT="/subdir/">
        <!--/variables-->
            
  • User defined sections were added. It is now possible to assign a section to a document part by specifying its context using a regular expression, for example:

    Section mytitle 10 100 "<h1>(.*)</h1>" $1
            

    Text between "<h1>" and "</h1>" will be handled as a separate section.

  • A possibility to use a non-standard socket with PgSQL was added.

  • Indexing speed improvements with UseCRC32URLId were done.

  • multi mode improvements were done.

  • A bug in MySQL create scripts for multi mode was fixed.

  • AliasProg bug was fixed.

  • UdmMarkForReindex bug was fixed.

  • Buffer overflow in UdmMatch was fixed.

  • Bugs #599, #622, #624, #627, #631, #647 were fixed.
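
The regular expression from the user-defined Section example above can be tried out in Python. This is a rough sketch only: it uses Python's re module, whose dialect may differ from the regex engine used by indexer, the function name extract_section is hypothetical, and the numeric arguments of the Section command are not modelled.

```python
import re

# Pattern taken from the Section example; "$1" there corresponds to group 1 here.
H1_PATTERN = re.compile(r"<h1>(.*)</h1>")

def extract_section(html):
    """Return the text captured by the first group, or None if nothing matches."""
    match = H1_PATTERN.search(html)
    return match.group(1) if match else None

print(extract_section("<html><h1>Release notes</h1><p>body</p></html>"))
```

Note that "(.*)" is greedy, so with several <h1> elements on one line the capture would run from the first "<h1>" to the last "</h1>".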

Changes in 3.2.21 (01 September 2004)

  • Blob mode now works with Oracle, MSSQL, DB2, Mimer.

  • It's now possible to export single mode into blob mode.

  • Sorting search results in URL order was added: search.cgi?s=[U|u]

  • Optimization in database drivers code was done.

  • Memory leak in Oracle driver introduced in 3.2.20 was fixed.

  • Compilation failure on some platforms was fixed.

Changes in 3.2.20 (23 August 2004)

  • One can now run indexer.conf by making it executable and adding a "#!/usr/local/mnogosearch/sbin/indexer -d" line at the beginning. This change simplifies maintaining several search databases on the same machine.

  • "Include inc.conf" command now tries to open a file relative to the current indexer.conf before opening a file relative to /usr/local/mnogosearch/etc/. This change provides yet more flexibility to maintain several databases. Thanks to Carsten Bleek and Axel Schwenke.

  • Italian synonym file was added. Thanks to Marcello.

  • Several ul arguments for multiple selects are accepted by search.cgi again.

  • Oracle driver improvements were done.

  • URL parser improvements were done.

  • Documentation improvements were done.

  • Test improvements were done.

  • Bugs #539, #606, #613, #614, #616, #620, #631, #632, #638, #639, #640, #643, #644, #649, #650, #652 were fixed.

Changes in 3.2.19 (07 July 2004)

  • HTDB path part substitution now unescapes path parts when building a HTDBList/HTDBDoc query. E.g. if one uses "SELECT id FROM tab WHERE category='$1'" as a HTDBList query together with "Server htdb:/red%20cars/", the unescaped value "red cars" will be substituted instead of $1. This change makes it possible to use spaces and punctuation characters in htdb:/ URLs.

  • 1K limit in the maximum possible HTDBDoc and HTDBList query length was fixed.

  • Loading URL data was optimized in blob-mode.

  • Bugs #518, #555, #578, #590, #610, #612 were fixed.

  • A bug in excerpt code was fixed.
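
The HTDB path part substitution described above can be illustrated with a short Python sketch. It is not the indexer's implementation: the function names are hypothetical, and no SQL quoting is performed, so it only shows how "red%20cars" becomes "red cars" before replacing $1.

```python
import re
from urllib.parse import unquote

def htdb_path_parts(url):
    """Split an htdb:/ URL into unescaped path parts."""
    path = url[len("htdb:"):]
    return [unquote(part) for part in path.split("/") if part]

def substitute(query, url):
    """Replace $1, $2, ... in the query with the corresponding unescaped path part."""
    parts = htdb_path_parts(url)

    def replace(match):
        index = int(match.group(1)) - 1
        # Leave the placeholder untouched when there is no such path part.
        return parts[index] if 0 <= index < len(parts) else match.group(0)

    return re.sub(r"\$(\d+)", replace, query)

print(substitute("SELECT id FROM tab WHERE category='$1'", "htdb:/red%20cars/"))
```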

Changes in 3.2.18 (07 June 2004)

  • Caché (http://www.intersystems.com/) database support was added.

  • Excerpts code improvements.

  • Several fixes in words highlighting code.

  • The stored_href search.htm variable is now set only when a cached copy is actually present. Bug since 3.2.17.

  • Several searchd related fixes.

  • 4 Kilobyte CachedCopy limit with Oracle was fixed.

  • New fast blob-mode is available for testing. Note: you should index your data with "multi" mode and then run "indexer -Eblob" to convert "multi" tables into "blob". Use "dbmode=blob" in the search.htm DBAddr command. This mode works only with MySQL for now, but will be extended to work with other databases in the future.

Changes in 3.2.17 (05 May 2004)

  • OpenLink Virtuoso Universal Server support was updated and checked in both single and multi mode.

  • Bugs #497, #509, #510, #561, #566 were fixed.

Changes in 3.2.16 (12 April 2004)

  • multi mode was improved. It is now much faster.

  • crc, crc-multi, cache modes were removed.

  • FollowSymlinks indexer.conf command was added for ftp:// and file:// URL schemes.

  • IndexIf indexer.conf command was added. You can now build a sophisticated set of rules describing which documents should or should not be indexed. For example, you can index documents only in German language, or only documents not containing some words in their body or title.

  • Cached copies are now stored in the SQL database. There is no "stored" program anymore. It is much easier to configure cached copies now.

  • storedoc.cgi functionality was moved into search.cgi, there is no separate "storedoc.cgi" program anymore.

  • Traditional Chinese frequency dictionary was added.

  • The syntax of the LoadChineseList and LoadThaiList commands was modified.

  • libparanoia-alike checking was added. Use --with-paranoia switch for configure to enable.

  • <!IFLIKE, <!ELIKE, <!ELSELIKE conditional operators were added into search.htm template language.

  • It's now possible to specify srvinfo table name in ServerTable command parameter.

  • MimerSQL support ( http://www.mimer.com/) via UnixODBC was added.

  • mnoGoSearch was tested with Oracle, IBM DB2, MSSQL, MySQL, PostgreSQL, Interbase, Mimer, SQLite.

  • Some characters were considered word separators by mistake. This affected Indian (Tamil, Devanagari) and some other languages.

  • robots.txt processing was fixed.

  • Several other bugs were fixed: #442, #445, #448, #449, #453, #454, #458, #461, #479, #480, #481.

Changes in 3.2.15 (26 September 2003)

  • A native XML parser was written; libexpat is not needed anymore.

  • HTDBAddr command was added to index SQL tables from different databases.

  • Range support for ftp was added, MP3 tags indexing is now possible for ftp.

  • Phrase segmentation for search queries in Thai, Chinese and Japanese languages was added.

  • HTDBLimit command was added to avoid huge memory usage for big tables.

  • Thai language phrases segmenter was added. Use LoadThaiList command to enable.

  • One can now increase and decrease the indexer log level using SIGUSR1 and SIGUSR2 signals.

  • ResultsLimit command was added to allow reducing the maximum number of results.

  • search.cgi now prints more HTML 4.01 compliant HREF values, i.e. "&amp;" rather than "&".

  • GuesserUseMeta command was added.

  • SQLite support was added.

  • Built-in database support was removed; use SQLite instead.

  • hops calculation for multilingual documents was fixed.

  • Several bugs (#400, #402, #407, #409, #412, #435) were fixed.

Changes in 3.2.14 (29 July 2003)

  • Search can now order results by relevancy, popularity, date. An option to choose results ordering was added into search.htm-dist.

  • Ability for automatic language maps update was added. Use "LangMapUpdate yes" command to enable.

  • MaxHops is now checked before adding new URLs into database.

  • "splitter" crash after indexing only a few documents was fixed.

  • Empty search results with multiple DBAddr were fixed.

  • NoMatch option for Realm and Server commands was fixed.

  • Memory leaks were fixed.

  • Memory corruption during relevancy calculation was fixed.

  • Normalization of words which appear in dictionaries for several languages was fixed.

  • An unclosed file during cached check-up was fixed.

Changes in 3.2.13 (10 July 2003)

  • Check-up functionality for "stored" database was added.

  • "stored" connection locking for multi-threaded version was added.

  • A trap in search.cgi being executed without "stored" was fixed.

  • "indexer -Ecreate" and "indexer -Edrop" now work for Oracle and MS SQL databases.

  • "indexer -q" was restored. A bug from 3.2.12.

  • A trap in multi-threaded indexer being executed in cache dbmode without "cached" running was fixed.

  • "indexer -Esqlmon" now starts indexer in SQL monitor mode. One can execute SQL queries against back-ends given in DBAddr indexer.conf commands.

  • Optional readline support for "indexer -Esqlmon" was added.

  • configure failure with expat path explicitly specified was fixed.

  • "Follow world" indexer.conf command was fixed.

  • ServerTable syntax was fixed in etc/indexer.conf-dist sample

Changes in 3.2.12 (25 June 2003)

  • HTTPS for systems without /dev/random or /dev/urandom was fixed.

  • You can create and drop the database structure using "indexer -Ecreate" and "indexer -Edrop" respectively.

  • Phrases detection was fixed.

  • Installation problem that appeared in some cases was fixed.

Changes in 3.2.11 (20 June 2003)

  • Buffer overflow exploit was fixed in search.cgi.

  • The 256-byte limit on URL length was removed. Please update the db structure when upgrading from the previous version.

  • Check-up functionality for cached database was added.

  • MeCab Japanese morphological analyzer support was added. Use --enable-mecab option for configure to enable it.

  • Log2stderr command was added.

  • UdmStrCRC32 was replaced by UdmStrHash32 everywhere except crc32 itself. It's faster and produces fewer collisions. Full re-indexing is needed after upgrading.

  • PopRankUseTracking, PopRankUseShowCnt, PopRankShowCntRatio and PopRankShowCntWeight commands were added.

  • Multi DBAddr support was added. LogdAddr, StoredAddr, TrackQuery commands were removed. See the new parameters of the DBAddr command.

  • Charset guessing for the case when no language maps are loaded was fixed.

  • search.cgi was not able to open cache-mode files in some cases. The creation mode for var/tree/url* files was fixed.

  • qtrack table was separated into qtrack and qinfo tables.

Changes in 3.2.10 (11 April 2003)

  • <base href=...> processing was fixed.

  • Bug #339: all words truncating at 4 characters was fixed.

  • <!ELSEIF and <!ELIF processing in templates was fixed.

Changes in 3.2.9 (07 April 2003)

  • Synonyms list for French language was added.

  • Big synonyms list for Russian language was added.

  • VaryLang command was added for indexing multilingual servers.

  • -s switch for cached was added to specify sleep time at start-up.

  • <META NAME="robots" CONTENT="NOARCHIVE"> processing was added.

  • The server table was split into server and srvinfo.

Changes in 3.2.8 (30 January 2003)

  • Unigrams were removed from language and charset guessing. This makes guessing faster and in some cases better.

  • Lithuanian stopword file was added. Thanks Arnoldas Lukaševičius.

  • mconv utility was added.

  • Georgian geostd8 charset support was added.

  • "DateFormat" template variable was added.

  • indexer now can use UDM_CONF_DIR environment variable.

  • MP3 parser doesn't convert into HTML anymore. New sections MP3.Album, MP3.Song, MP3.Artist and MP3.Year were added.

  • "Server" and "Realm" commands can now take a new optional argument to specify an action which will be applied to documents matching this command. For example, "Server HrefOnly http://localhost/" forces indexer to download the given documents and get new links from them, but without indexing the documents' content.

  • "Follow" command was removed. Use "Server" or "Realm" instead.

  • text/xml indexing was added; it requires the Expat library to be installed. Use --with-expat configure switch to activate.

  • search.htm now supports environment variables, e.g. $(ENV.HTTP_REMOTE_ADDR)

  • New <!IF> ... <!ELSEIF> ... <!ELSE> ... <!ENDIF> syntax is now supported in search.htm.

  • New <!SET NAME="dst_var_name" CONTENT="value"> search.htm command.

  • New <!COPY NAME="dst_var_name" CONTENT="src_var_name"> search.htm command.

  • Japanese euc-jp language map was added.

  • Chinese sentence segmenter was added. You should enable GB2312 charset support and add LoadChineseList command to enable it.

  • ChaSen Japanese morphological analysis system support was added. Use --enable-chasen option for configure to enable it.

  • "Limit" command syntax was simplified.

  • zlib support is now enabled in "configure" by default.

  • "PopRankSkipSameSite" command was added. It allows links from the same site to be excluded from the Popularity Rank calculation.

Changes in 3.2.7 (11 October 2002)

  • Popularity rank was added.

  • New search CGI parameters "sp" and "sy" were added to enable/disable word forms and synonyms in search results respectively.

  • Chinese stoplist and language map were added.

  • Search limit by url.content_type was added.

  • Document score is now displayed as a percentage.

  • Now one can index specified tag attributes.

  • Search results now can be grouped by site.

  • Default MaxDocSize value is now 2 MB.

  • Pages can be indexed in their hops order using "indexer -o". Distinct criteria on site_id for PgSQL was added.

  • New "ParserTimeOut" indexer.conf command to avoid indexer hanging with external parser.

Changes in 3.2.6 (19 June 2002)

  • If a document is fetched using a compressed (gzip/compress/deflate) transfer encoding, the original (uncompressed) size is now stored in url.docsize.

  • search.cgi no longer fetches the whole document from "stored" to display search word excerpts. Fetching stops once enough excerpts have been built.

  • Fixed that the CVS version could not be built when jade/openjade was not installed. In 3.2.5, "make install" failed on attempt to install the documentation.

  • Some bugs were fixed.

Changes in 3.2.5 (27 May 2002)

  • Separate DBMode command was removed.

  • DBAddr command was extended to support this syntax: DBAddr mysql://user:pass@host/dbname/?dbmode=multi

  • "ServerTable" indexer.conf command syntax was changed. Use this style syntax: "ServerTable mysql://user:pass@host/dbname/tablename" to load servers information from "tablename" SQL table. Note that now you can load servers information using a database different from the specified one in "DBAddr" command.

  • "DBAddr" argument format for Interbase was changed. Use "DBAddr ibase://hostname/path/to/mnogosearch.gdb/" with trailing slash after the *.gdb file name.

  • A trap on too long and escaped URLs was fixed.

  • Some memory leaks were fixed.

  • Fixed that incorrect SQL queries in "single" and "crc" DBModes were sent to the server when the first word on a page was a stopword. Thanks to luc at lvb.net.
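
The extended DBAddr syntax above follows the general URL layout, so its parts can be picked apart with Python's standard urllib.parse. The sketch below is an illustration only: parse_dbaddr and the dictionary keys are hypothetical names, not part of mnoGoSearch.

```python
from urllib.parse import urlsplit, parse_qs, unquote

def parse_dbaddr(value):
    """Split a DBAddr URL into driver, credentials, host, database and parameters."""
    parts = urlsplit(value)
    return {
        "driver": parts.scheme,
        "user": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
        "host": parts.hostname,
        "database": parts.path.strip("/"),
        # Keep the last value if a parameter is repeated.
        "params": {key: values[-1] for key, values in parse_qs(parts.query).items()},
    }

print(parse_dbaddr("mysql://user:pass@host/dbname/?dbmode=multi"))
```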

Changes in 3.2.4 (15 May 2002)

  • Renamed template variables responsible for displaying document sections. Take a look into search.htm-dist.

  • Added a possibility to specify the length of document sections stored in the database (body, title, etc).

  • Added OptimizeInterval and OptimizeRatio stored.conf commands.

  • <!--stored--> section is now processed like <!--clone--> section.

  • "stored" now supports "delete" and "optimize" operations.

  • search.cgi used with "stored" can now return excerpts from document around a place where search word is found.

  • Added new "StoredFiles" stored.conf command to limit a number of archives used by "stored" daemon.

  • news:// and nntp:// retrieval system now supports authorization in both AuthBasic indexer.conf command and in URL part, for example: news://user:pass@servername/

  • Indexer's code is now more thread safe.

  • Added cache mode limits for searchd.

  • All cgi programs now use syslog (like indexer does).

  • Added documents mixing while indexing to avoid "rapid fire".

  • "Charset" indexer.conf command has been renamed to "RemoteCharset".

  • Added ISO-2022-JP charset support.

  • Added TSCII and MacGujarati charsets support.

  • Asian Big5, gb2312, EUC-KR and Shift-JIS charsets are not compiled by default anymore. This reduces the size of the binaries. Use --with-extra-charsets to activate compilation of these charsets.

  • Added NL, TL, BG, SV, DA, FR, ES, DE, HR language maps built on the Bible.

  • Added Esperanto language maps. Thanks to Arto Sarle <arto [at] sarle [dot] com>

  • Now one can load several files for the same lang + charset combination. It improves guesser results quality.

  • Some other improvements in language and charset guesser.

  • Removed command DeleteNoServer. Use "Follow world" instead.

  • Removed "SearchdAddr hostname:port" template command. Use "DBAddr searchd://hostname:port/" instead.

  • Fixed that query words were not converted to LocalCharset before storing in "qtrack" table.

  • Fixed that the same word form could appear twice in $(W) variable.

  • $(DT) now displays URL if title is empty. Useful for text/plain documents.

  • Fixed that indexer could loop robots.txt fetching in some cases.

  • Fixed some compilation problems on Mac OS X and IRIX.

  • Shared libraries are now built by default.

Changes in 3.2.3 (24 November 2001)

  • It is now possible to specify a non-default path to the MySQL socket when connecting to localhost, for example:

    
        DBAddr mysql://user:pass@hostname/database/?socket=/tmp/mysql.sock
            
  • Added 'src', 'width', 'height', 'size' and 'class' attributes processing in templates.

  • Added wordinfo and searchwords highlighting when searchd is used. Attention: you need to clear the search cache, because its format was changed.

  • Added search results cache support for searchd.

  • Added queries tracking at searchd.

  • Added indexer switch -b to block start of several indexer instances. Useful for example when indexers are started from crontab.

  • Added new template section <!--stored-->. search.cgi now checks that the document is present in "stored" and fills this section only on success.

  • Added new formatting in template variables. It is now possible to limit the displayed length of a variable. Use $(DU:40) to limit URLs to 40 characters. This helps, for example, to avoid breaking the table structure in search results when a URL is too long.

  • Added new fields in query tracking system. Now it stores user's IP address. Don't forget to ALTER qtrack table according to new structure.

  • Added new MacCE, MacCroatian, MacGreek, MacRoman, MacTurkish, MacIceland, MacRomania, MacThai, MacArabic, MacHebrew character sets.

  • Added Catalan stopwordslist, thanks Jordi Gay Sensat <jgay [at] ajgirona [dot] org>

  • Added Hungarian stopwordslist, thanks MURANYI Andras <muranyia@iqconsulting.hu>

  • Added Azerbaijan language maps for guesser.

  • Added Swedish stoplist. Thanks Johan Olde <johan.olde@phosworks.se>

  • Fixed that indexer could not connect to stored on remote machine.

  • Fixed that $(W) variable was not recoded to BrowserCharset when no search results were found.

  • Fixed that MinWordLen and MaxWordLen didn't work in search.htm.

  • Fixed that the Alias command was not working in search.htm. Thanks Matthew Sullivan <matthew@netscape.com>

  • Fixed that robots.txt content was indexed as an ordinary text file in some cases.

  • Fixed that <META NAME="Robots" CONTENT="NOINDEX,NOFOLLOW"> was not working in some cases.

  • Fixed that variables were not substituted by their values in <!INCLUDE CONTENT="http://servername/include.cgi?q=$%(q)">

  • Fixed that links like http://xx/yy?a=b&#38;c=d were not properly converted to http://xx/yy?a=b&c=d.

  • Fixed that mnogosearch could not connect to remote Interbase server, as well as other minor Interbase bugs. Thanks Henner.Kollmann [at] gmx [dot] de.

  • Fixed that search.cgi crashed when categories list is requested but table doesn't exist.

  • Fixed minor bug in robots.txt processing. Thanks Tim Pierce <twp@unchi.org>.

  • Fixed compilation problems with ODBC libraries. Bug since 3.2.0.

  • Fixed EasySoft ODBC libraries linking problem which appeared because EasySoft changed the names of their libraries. Now configure substitutes the new library names into the Makefile.
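
The link conversion mentioned in the list above (a=b&#38;c=d becoming a=b&c=d) amounts to decoding HTML character references in an href value. The following Python sketch shows that step with the standard html module; normalize_link is a hypothetical name, and the indexer's own decoding code is not shown in this document.

```python
from html import unescape

def normalize_link(href):
    """Decode HTML character references (such as &#38; or &amp;) found in a link."""
    return unescape(href)

print(normalize_link("http://xx/yy?a=b&#38;c=d"))
```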

Changes in 3.2.2 (24 October 2001)

  • Added meta "Content-Language" processing, added "lang" attribute processing for <html> and <body> tags.

  • Added IBM DB2 support. Tested with DB2 EE V7.1.

  • Stored and storedoc.cgi were added. It is now possible to store and display compressed copies of indexed documents with search words highlighting.

  • Tag values are now passed using the "tag" form variable so that the variable meaning is clearer. The old "g" form variable does not work anymore.

  • Major documentation improvements and reorganization.

  • Fixed that category and language limits were not working.

  • Fixed that StopwordFile command didn't work in search.htm

  • Fixed that full/substring/beginning/ending word match didn't work.

  • Fixed crash in ServerTable code.

  • Fixed crash in synonyms code on some platforms.

  • qtrack table fields types changed.

  • Fixed a bug in MySQL single mode code. It could kill the mysqld server when a document was big enough.

  • Fixed that ISO-8859-1 entities like &eacute; were not properly converted to Unicode.

  • Fixed that HTML parser considered scripts body as a text in some cases.

  • install.pl installation script has been added.

  • Some minor configure script and code clean-ups.

Changes in 3.2.1 (27 September 2001)

  • New "Listen" searchd.conf command. It allows binding searchd to a specified host and/or port.

  • searchd can now reload searchd.conf on receiving the HUP signal.

  • Added some signal safety in searchd.

  • Fixed that searchd.conf-dist was not included in the distribution.

  • Fixed that national letters in the code range 128-255 were considered as word separators when searchd is used. They also were not displayed in search results (body, title, etc fields).

  • Fixed some bugs in HTML tag parsers that caused indexer to stall or crash in some cases.

  • Fixed that "Proxy" command was ignored.

  • Fixed that robots.txt related code could stall or crash in threaded version.

  • Fixed a compilation problem with Oracle.

  • Fixed compilation problem with errno.h on Solaris.

Changes in 3.2.0 (24 September 2001)

  • Now one can compile with several SQL databases support at the same time.

  • Now one can make a binary distribution using "make bin-dist".

  • Added new program searchd. Among other features, it allows building a search cluster distributed across several machines.

  • Support for synonyms fuzzy search has been added.

  • Common words endings fuzzy search using ispell now works in 3.2 branch.

  • New "ReverseAlias" indexer.conf command. This command has the same format as the "Alias" command. However, URL mapping is executed right after a new link has been found. The URL is stored in the database after ReverseAlias rules have been applied. Among other things, it allows indexing PHP driven sites which add a unique session ID in the form "PHPSESSION=344646342345df". ReverseAlias is able to remove such substrings from URLs.

  • New "Subnet xxx.xxx.xxx.xxx" indexer.conf command. It works like Realm but checks an IP address matching instead of URL. For example, "Subnet 195.239.38.*" or "Subnet NoMatch 192.*.*.*".

  • Search results highlighting (HlBeg and HlEnd search.htm commands) now works in 3.2 branch.

  • CT-Lib support has been added. Now one can use mnoGoSearch together with Sybase and MS SQL natively, without ODBC drivers. Both the original Sybase CT-Lib and FreeTDS CT-Lib are supported. However, the CT-Lib driver is still in beta.

  • indexer now works approximately twice as fast with Interbase.

  • Added deflate and compress Content-Encoding support.

  • New VarDir command in search.htm. It works like the same indexer.conf command but at search time.

  • New "Section" indexer.conf command. It is to be used instead of old ***Weight commands, which have been removed. Take a look into indexer.conf-dist and search.txt for an explanation.

  • Now it is possible to index user-defined META tags as well as HTTP response headers.

  • New "Alias" command in search.htm. It works like "Alias" in indexer.conf but at a search time.

  • Added support for external includes in search template. Format differs from 3.1.x version. Take a look into "templates.txt" for usage information.

  • "Alias" command has been extended. Now it can optionally use powerful URL mapping using regular expressions like in "Realm" command.

  • Posix threads should now work not only on Linux and FreeBSD. Detection of threads for a number of platforms has been added.

  • libudmsearch compilation with pthreads was fixed. This fixes crashes of Apache with the PHP mnoGoSearch extension module when mnoGoSearch was compiled with pthreads support.

  • Tag parser has been rewritten. It now properly processes tag attributes with '>' signs, for example <META NAME=email Contents="<general@mnogosearch.org>">. Earlier, '>' signs inside quotes were considered tag endings.

  • Apple Darwin fixes for configure scripts.

  • Extended number of query parameters stored in qtrack table.

  • Added url.charset field. Charset is now stored separately from content_type field. Please recreate or ALTER "url" table structure.

  • "Clones yes/no" has been renamed to "DetectClones yes/no" to avoid confusion.
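
The "Subnet" patterns quoted above ("195.239.38.*" and "NoMatch 192.*.*.*") can be mimicked in Python for experimentation. This sketch assumes shell-style wildcard semantics, which this document does not spell out, and subnet_matches is a hypothetical helper rather than anything from the mnoGoSearch API.

```python
from fnmatch import fnmatchcase

def subnet_matches(ip, pattern, nomatch=False):
    """Check an IP address against a wildcard pattern; nomatch inverts the result."""
    hit = fnmatchcase(ip, pattern)
    return not hit if nomatch else hit

print(subnet_matches("195.239.38.7", "195.239.38.*"))
print(subnet_matches("10.0.0.1", "192.*.*.*", nomatch=True))
```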

Changes in 3.2.0.b2 (08 August 2001)

  • Added Thai TIS-620 (aka ISO-8859-11) charset support.

  • Content encoding support added (currently gzip only). Requires libz to compile. Use --with-zlib to activate.

  • Fixed that $(DE) was not working.

  • Fixed that the correct charset was forgotten after robots.txt processing.

  • Fixed several bugs in new "cache mode".

Changes in 3.2.0.b1 (03 July 2001)

  • Charsets processing has been rewritten. Now mnoGoSearch supports almost all widely used charsets: various single-byte charsets as well as multi-byte charsets including UTF-8, Chinese (BIG5, GB2312), Korean (EUC-KR), Japanese (S-JIS). All internal processing works using UNICODE representation. Using UTF8 as a LocalCharset one can build a multi-lingual search engine with languages which could not be indexed at the same time in 3.1.x branch, for example German+Greek+Russian+Chinese.

  • Character sets module has a new automatic language and charset detection. Currently more than 70 various charsets and languages can be detected automatically when they are not specified in "Content-type" and "Content-Language" server's response headers or html META tags.

  • News extensions are now compiled without --enable-news-extensions. Use "NewsExtensions yes" indexer.conf command to activate them.

  • search.cgi has been rewritten.

  • Cache-mode has been rewritten.

  • Fixed template variables format. Now $(x) is the plain variable value, $&(x) is an HTML-escaped value, and $%(x) is a value escaped for use in URLs.
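
The three variable reference formats can be imitated with a small Python sketch. It is an analogy rather than the search.cgi implementation: render is a hypothetical function, unknown variables are assumed to expand to an empty string, and the exact set of characters that search.cgi escapes may differ from what html.escape and urllib.parse.quote produce.

```python
import re
from html import escape
from urllib.parse import quote

# Matches $(name), $&(name) and $%(name).
VAR_REF = re.compile(r"\$([&%]?)\((\w+)\)")

def render(template, variables):
    """Expand variable references using plain, HTML-escaped or URL-escaped output."""
    def substitute(match):
        mode, name = match.groups()
        value = variables.get(name, "")  # assumption: unknown variables are empty
        if mode == "&":
            return escape(value)          # HTML-escaped output
        if mode == "%":
            return quote(value, safe="")  # URL-escaped output
        return value                      # plain output

    return VAR_REF.sub(substitute, template)

print(render("$(q) | $&(q) | $%(q)", {"q": "a<b c"}))
```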