Creating a Fast, In-Memory Search Engine in Python With Whoosh

If you need full-text search without running an external service like Elasticsearch, Whoosh is a lightweight, pure-Python library that lets you build a search engine directly inside your application. In this guide, we'll create an in-memory search engine that indexes and queries a handful of documents.

1. Installation

pip install whoosh

2. Basic Setup

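We start by defining a schema for the documents we want to index. TEXT fields are analyzed into individual searchable terms, ID fields are indexed as a single exact value, and stored=True keeps the original value so it can be returned with search results. RamStorage keeps the whole index in memory, so nothing is written to disk and the index disappears when the process exits: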

from whoosh.fields import Schema, TEXT, ID
from whoosh.filedb.filestore import RamStorage

schema = Schema(title=TEXT(stored=True), content=TEXT(stored=True), path=ID(stored=True))
storage = RamStorage()
index = storage.create_index(schema)

3. Adding Documents to the Index

writer = index.writer()

documents = [
    {"title": "Intro to Python", "content": "Python is a versatile language.", "path": "/docs/python"},
    {"title": "Advanced Flask", "content": "Learn advanced web patterns in Flask.", "path": "/docs/flask"},
    {"title": "FastAPI Performance", "content": "Build fast APIs with Python.", "path": "/docs/fastapi"},
]

for doc in documents:
    # The dict keys match the schema field names, so each dict can be unpacked directly.
    writer.add_document(**doc)

# Nothing is searchable until the writer commits.
writer.commit()

4. Searching the Index

from whoosh.qparser import QueryParser

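# "content" is the default field for terms that don't name a field explicitly.
query = QueryParser("content", index.schema).parse("python")

with index.searcher() as searcher:
    # search() returns at most 10 hits by default; pass limit=None to get all of them.
    results = searcher.search(query)
    for r in results:
        print(f"Title: {r['title']}, Path: {r['path']}")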

5. Advanced Querying

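The default query parser supports AND, OR, NOT, wildcards, and phrase searching; fuzzy term matching becomes available once you add FuzzyTermPlugin to the parser. Here’s a phrase match example, where the ~2 suffix lets the words appear within two positions of each other rather than strictly adjacent: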

query = QueryParser("content", index.schema).parse('"fast apis"~2')

6. Use Cases

  • Internal documentation search
  • Log search utilities
  • Static blog/site indexing

Conclusion

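Whoosh is a good fit when you need full-text search with zero infrastructure: it is pure Python, runs inside your process, and needs no separate server. It's especially effective for in-memory applications or embedded tools where portability and speed matter, provided you can rebuild the index on startup, since a RamStorage index does not persist between runs.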
