mnoGoSearch has built-in parsers for text, HTML, XML, DOCX, RTF, message (*.eml, *.mht) and MP3 file formats, and understands the following mime types in the Content-Type HTTP header (or in the AddType command when indexing local files):
For text/plain, text/tab-separated-values, text/css - the built-in text parser is invoked.
For text/html - the built-in HTML parser is invoked.
For text/xml, application/xml, text/vnd.wap.wml, as well as for all mime types that contain the sub-strings "+xml" or "rss" (e.g. application/rss+xml, application/vnd.wap.xhtml+xml etc.) - the built-in XML parser is invoked.
Note: If the built-in XML parser encounters an XML document that starts with <urlset xmlns="..."> or <sitemapindex xmlns="...">, it considers the document to be a sitemap file and only collects links from this file, without putting its words into the search index.
For application/vnd.openxmlformats-officedocument.wordprocessingml.document - the built-in DOCX parser is invoked.
For text/rtf, application/rtf and application/x-rtf - the built-in RTF parser is invoked.
For message/rfc822 - the built-in message parser is invoked.
For audio/mpeg - the built-in MP3 parser is invoked.
For the mime types application/http and message/http - the document is considered to be a full HTTP response consisting of headers (including the status line, e.g. HTTP/1.0 200 OK) followed by content. The headers are separated from the content and parsed, then one of the known parsers is recursively executed for the content (without headers) according to the Content-Type header value.
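
The selection rules above can be illustrated with a short Python sketch. This is not mnoGoSearch source code: the function names (choose_parser, split_http_response, resolve_parser, looks_like_sitemap) and the parser labels ("text", "html", "xml" and so on) are hypothetical and chosen only for this example. The sketch also assumes that Content-Type parameters such as charset are ignored when matching, and that headers end at the first empty line.

```python
import xml.etree.ElementTree as ET

# Hypothetical parser labels; the real internal names may differ.
EXACT_TYPES = {
    "text/plain": "text",
    "text/tab-separated-values": "text",
    "text/css": "text",
    "text/html": "html",
    "text/xml": "xml",
    "application/xml": "xml",
    "text/vnd.wap.wml": "xml",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
    "text/rtf": "rtf",
    "application/rtf": "rtf",
    "application/x-rtf": "rtf",
    "message/rfc822": "message",
    "audio/mpeg": "mp3",
}
HTTP_TYPES = {"application/http", "message/http"}


def choose_parser(content_type):
    """Return a parser label for a Content-Type value, or None if unknown."""
    # Assumption: parameters such as "; charset=utf-8" do not affect the choice.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in EXACT_TYPES:
        return EXACT_TYPES[mime]
    # Any other type containing "+xml" or "rss" goes to the XML parser.
    if "+xml" in mime or "rss" in mime:
        return "xml"
    return None


def split_http_response(raw):
    """Split a raw HTTP response into (status_line, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, _, header_block = head.decode("iso-8859-1").partition("\r\n")
    headers = {}
    for line in header_block.split("\r\n"):
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


def resolve_parser(content_type, content):
    """Pick a parser, unwrapping application/http and message/http recursively."""
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in HTTP_TYPES:
        _, headers, body = split_http_response(content)
        inner_type = headers.get("content-type")
        if inner_type is None:
            return None, body
        # Recurse on the content without headers, using the inner Content-Type.
        return resolve_parser(inner_type, body)
    return choose_parser(content_type), content


def looks_like_sitemap(xml_bytes):
    """True if the root element is <urlset> or <sitemapindex>."""
    root = ET.fromstring(xml_bytes)
    local_name = root.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
    return local_name in {"urlset", "sitemapindex"}


raw = b"HTTP/1.0 200 OK\r\nContent-Type: text/html; charset=utf-8\r\n\r\n<html></html>"
parser, body = resolve_parser("message/http", raw)
print(parser, body)
```

In this sketch exact matches are checked before the "+xml" and "rss" sub-string test, so a type such as text/html is never misrouted, and a stored HTTP response is unwrapped before its content is handed to a parser.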