support.bitvector module
An implementation of an object that acts like a collection of on/off bits.
Base classes
class whoosh.idsets.DocIdSet
Base class for a set of positive integers, implementing a subset of the built-in set type’s interface with extra docid-related methods. This is the superclass for alternatives to the built-in set that are more memory-efficient and specialized for storing sorted lists of positive integers. Because they are pure Python, they will inevitably be slower than set for most operations.
- after(i): Returns the next integer in the set after i, or None if there is none.
- before(i): Returns the previous integer in the set before i, or None if there is none.
- first(): Returns the first (lowest) integer in the set.
- invert_update(size): Updates the set in place so that it contains the numbers in the range [0, size) that are not currently in this set.
- last(): Returns the last (highest) integer in the set.
class whoosh.idsets.BaseBitSet
Implementation classes
class whoosh.idsets.BitSet(source=None, size=0)
A DocIdSet backed by an array of bits. This can also be useful as a plain bit array (e.g. for a Bloom filter). It is much more memory-efficient than a large built-in set of integers, but wastes memory for sparse sets.
Parameters:
- size – the maximum size of the bit array.
- source – an iterable of positive integers to add to this set.
- bits – an array of unsigned bytes ("B") to use as the underlying bit array. This is used by some of the object’s methods.
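To show how an array of bits can store a set of integers, here is a minimal sketch. ToyBitSet is a hypothetical class, not Whoosh's BitSet; it assumes the common layout where integer n is stored in byte n >> 3 at bit n & 7. It also makes the sparse-set tradeoff concrete: the array's length depends on the largest member, not on how many members there are.

```python
from array import array


class ToyBitSet:
    """Hypothetical bit-array set, one bit per possible integer (not Whoosh code)."""

    def __init__(self, source=(), size=0):
        source = list(source)
        if not size and source:
            # Guess the size from the largest member.
            size = max(source) + 1
        # Each byte holds 8 flags, so round up to whole bytes.
        self.bits = array("B", bytes((size + 7) // 8))
        for n in source:
            self.add(n)

    def add(self, n):
        byte, bit = n >> 3, n & 7
        if byte >= len(self.bits):
            # Grow the array with zero bytes so it can hold n.
            self.bits.extend(bytes(byte + 1 - len(self.bits)))
        self.bits[byte] |= 1 << bit

    def __contains__(self, n):
        byte = n >> 3
        return 0 <= byte < len(self.bits) and bool(self.bits[byte] & (1 << (n & 7)))

    def __iter__(self):
        # Yield members in ascending order by scanning every bit.
        for byte_index, value in enumerate(self.bits):
            for bit in range(8):
                if value & (1 << bit):
                    yield (byte_index << 3) | bit


bs = ToyBitSet([1, 10, 15, 7, 2])
print(list(bs))          # [1, 2, 7, 10, 15]
print(len(bs.bits))      # 2 (two bytes cover the integers 0..15)
sparse = ToyBitSet([1_000_000])
print(len(sparse.bits))  # 125001 bytes to hold a single integer
```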
class whoosh.idsets.OnDiskBitSet(dbfile, basepos, bytecount)
A DocIdSet backed by an array of bits on disk.
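>>> from whoosh.filedb.filestore import RamStorage
>>> from whoosh.idsets import BitSet, OnDiskBitSet
>>> st = RamStorage()
>>> f = st.create_file("test.bin")
>>> bs = BitSet([1, 10, 15, 7, 2])
>>> bytecount = bs.to_disk(f)
>>> f.close()
>>> # ...
>>> f = st.open_file("test.bin")
>>> odbs = OnDiskBitSet(f, 0, bytecount)  # the bits start at offset 0
>>> list(odbs)
[1, 2, 7, 10, 15]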
Parameters:
- dbfile – a StructFile object to read from.
- basepos – the base position of the bytes in the given file.
- bytecount – the number of bytes to use for the bit array.
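A membership test against an on-disk bit array only needs to read a single byte, which is why basepos and bytecount are enough to locate the data. The helper on_disk_contains() below is hypothetical, not Whoosh's implementation; it uses io.BytesIO in place of a StructFile and assumes the stored bytes use the layout where integer n lives in byte n >> 3 at bit n & 7.

```python
import io


def on_disk_contains(f, basepos, bytecount, n):
    """Hypothetical: test bit n of a bit array stored at basepos in file f.

    Assumes integer n is stored in byte n >> 3, bit n & 7.
    """
    byte_index = n >> 3
    if n < 0 or byte_index >= bytecount:
        return False
    f.seek(basepos + byte_index)
    value = f.read(1)[0]  # indexing bytes gives an int in Python 3
    return bool(value & (1 << (n & 7)))


# A 3-byte header, then the bit array for {1, 2, 7, 10, 15}.
header = b"HDR"
bits = bytes([0b10000110, 0b10000100])
f = io.BytesIO(header + bits)
members = [n for n in range(16) if on_disk_contains(f, len(header), len(bits), n)]
print(members)  # [1, 2, 7, 10, 15]
```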
class whoosh.idsets.SortedIntSet(source=None, typecode='I')
A DocIdSet backed by a sorted array of integers.
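A sorted array trades slower insertion for compact storage and O(log n) lookups by binary search, which suits sparse sets better than a bit array. ToySortedIntSet is a hypothetical sketch, not Whoosh's SortedIntSet; it uses the standard array and bisect modules and passes the typecode parameter straight to array().

```python
from array import array
from bisect import bisect_left, insort


class ToySortedIntSet:
    """Hypothetical sorted-array set (illustration only, not Whoosh code)."""

    def __init__(self, source=(), typecode="I"):
        # "I" is a C unsigned int, typically 4 bytes per item.
        self.data = array(typecode, sorted(set(source)))

    def add(self, n):
        if n not in self:
            insort(self.data, n)  # keeps the array sorted; O(len) insertion

    def __contains__(self, n):
        i = bisect_left(self.data, n)  # O(log len) binary search
        return i < len(self.data) and self.data[i] == n

    def __iter__(self):
        return iter(self.data)


s = ToySortedIntSet([15, 1, 7])
s.add(3)
print(list(s))         # [1, 3, 7, 15]
print(3 in s, 4 in s)  # True False
```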
class whoosh.idsets.MultiIdSet(idsets, offsets)
Wraps multiple SERIAL sub-DocIdSet objects and presents them as an aggregated, read-only set.
Parameters:
- idsets – a list of DocIdSet objects.
- offsets – a list of offsets corresponding to the DocIdSet objects in idsets.
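The role of offsets can be sketched as follows, assuming each sub-set stores ids relative to its own offset (so a global id is offsets[i] plus a local id), that offsets is sorted in ascending order, and that each sub-set's ids stay below the next offset. multi_contains() and multi_iter() are hypothetical helpers, not MultiIdSet's actual methods.

```python
from bisect import bisect_right


def multi_contains(idsets, offsets, n):
    """Hypothetical: is global id n in the aggregate of serial sub-sets?"""
    i = bisect_right(offsets, n) - 1  # last sub-set starting at or before n
    if i < 0:
        return False
    return (n - offsets[i]) in idsets[i]


def multi_iter(idsets, offsets):
    # Serial sub-sets do not overlap, so visiting them in order yields sorted ids.
    for idset, offset in zip(idsets, offsets):
        for local in sorted(idset):
            yield offset + local


subsets = [{0, 3}, {1, 2}, {0}]
offsets = [0, 5, 10]
print(list(multi_iter(subsets, offsets)))  # [0, 3, 6, 7, 10]
print(multi_contains(subsets, offsets, 7), multi_contains(subsets, offsets, 4))  # True False
```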