support.charset module

This module contains tools for working with Sphinx charset table files. These files are useful for doing case and accent folding. See whoosh.analysis.CharsetTokenizer and whoosh.analysis.CharsetFilter.

whoosh.support.charset.default_charset

An extensive case- and accent-folding charset table. Taken from http://speeple.com/unicode-maps.txt

whoosh.support.charset.charset_table_to_dict(tablestring)

Takes a string with the contents of a Sphinx charset table file and returns a mapping object (specifically, a defaultdict) of the kind expected by the unicode.translate() method (str.translate() in Python 3). The mapping takes a character number (code point) to a unicode character, or to None if the character is not a valid word character.
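To make the shape of that mapping concrete, here is a minimal sketch that builds a tiny translate-compatible table by hand, using only the standard library. The table contents are a made-up illustration, not the output of charset_table_to_dict(), and the choice of a defaultdict whose default is None is an assumption based on the description above.

```python
from collections import defaultdict

# Hypothetical hand-built table, for illustration only. A real table would
# come from charset_table_to_dict(); this one just mimics its shape.
# Any code point not listed falls back to None ("not a word character").
table = defaultdict(lambda: None)

for ch in "abcdefghijklmnopqrstuvwxyz":
    table[ord(ch)] = ch            # lowercase letters map to themselves
    table[ord(ch.upper())] = ch    # uppercase letters fold to lowercase

table[0x00E9] = "e"                # e with acute accent folds to plain "e"
table[0x00C9] = "e"                # capital E with acute accent folds to "e"

# str.translate() looks up each character's code point in the table;
# a value of None removes the character from the result.
folded = "Caf\u00e9!".translate(table)
print(folded)
```

With this table, the accented letter and the capital are folded, and the exclamation mark is dropped because its code point maps to None.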

The Sphinx charset table format is described at http://www.sphinxsearch.com/docs/current.html#conf-charset-table.
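As a rough illustration of that format, the sketch below parses a small subset of it: comma-separated entries that are a single character, a range such as a..z, or a mapping such as A..Z->a..z, with characters written literally or as U+XXXX codes. The function name parse_simple_charset_table is hypothetical, and this is not Whoosh's implementation; it ignores other features of the format and returns a plain dict rather than a defaultdict.

```python
def parse_simple_charset_table(tablestring):
    """Parse a small subset of the Sphinx charset table syntax.

    Hypothetical helper for illustration; not part of Whoosh.
    """
    def codepoint(token):
        # Accept either a literal character or a "U+XXXX" hex code.
        token = token.strip()
        if token.upper().startswith("U+"):
            return int(token[2:], 16)
        return ord(token)

    def span(text):
        # "a..z" -> (start, end); a lone "a" -> (a, a)
        if ".." in text:
            start, end = text.split("..")
            return codepoint(start), codepoint(end)
        cp = codepoint(text)
        return cp, cp

    mapping = {}
    for item in tablestring.split(","):
        item = item.strip()
        if not item:
            continue
        source, _, target = item.partition("->")
        src_start, src_end = span(source)
        # Without "->", each character in the entry maps to itself.
        dst_start = span(target)[0] if target else src_start
        for offset in range(src_end - src_start + 1):
            mapping[src_start + offset] = chr(dst_start + offset)
    return mapping


table = parse_simple_charset_table("0..9, A..Z->a..z, a..z, U+00E9->e")
print(table[ord("Q")], table[ord("7")], table[0x00E9])
```

Characters that never appear in the table string are simply absent from this dict, so str.translate() would leave them unchanged. The mapping described above instead yields None for such characters.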