Creating a Fast, In-Memory Search Engine in Python With Whoosh
If you need powerful full-text search capabilities without relying on external tools like Elasticsearch, Whoosh is a lightweight Python library that lets you create efficient search engines right in your application. In this guide, we'll create an in-memory search engine that can index and query documents quickly and easily.
1. Installation
pip install whoosh
2. Basic Setup
We start by defining a schema for the documents we want to index:
from whoosh.fields import Schema, TEXT, ID
from whoosh.filedb.filestore import RamStorage
schema = Schema(title=TEXT(stored=True), content=TEXT(stored=True), path=ID(stored=True))
storage = RamStorage()
index = storage.create_index(schema)
3. Adding Documents to the Index
writer = index.writer()
documents = [
    {"title": "Intro to Python", "content": "Python is a versatile language.", "path": "/docs/python"},
    {"title": "Advanced Flask", "content": "Learn advanced web patterns in Flask.", "path": "/docs/flask"},
    {"title": "FastAPI Performance", "content": "Build fast APIs with Python.", "path": "/docs/fastapi"},
]
for doc in documents:
    writer.add_document(title=doc["title"], content=doc["content"], path=doc["path"])
writer.commit()
4. Searching the Index
from whoosh.qparser import QueryParser
with index.searcher() as searcher:
    query = QueryParser("content", index.schema).parse("python")
    results = searcher.search(query)
    for r in results:
        print(f"Title: {r['title']}, Path: {r['path']}")
5. Advanced Querying
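Whoosh's query language supports AND, OR, wildcards, phrase searching, fuzzy matches (enabled through a parser plugin), and more. Here’s a phrase query with a slop of 2, which lets the two words match when they appear close together rather than strictly side by side: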
query = QueryParser("content", index.schema).parse('"fast apis"~2')
6. Use Cases
- Internal documentation search
- Log search utilities
- Static blog/site indexing
Conclusion
Whoosh is an excellent option when you need full-text search with zero infrastructure. It's especially effective for in-memory applications or embedded tools where portability and speed matter.