CrossEncoderRanker combines Weaviate hybrid search with LLM-based relevance assessment: it retrieves candidate documents, scores each one against the query, and returns them reranked by relevance. Built with DSPy and published on Modaic Hub.

Features

Hybrid Search: Uses Weaviate's hybrid search, which combines vector and keyword search
Intelligent Reranking: Uses an LLM-based cross-encoder with Chain-of-Thought reasoning
Async Processing: Scores documents concurrently to reduce end-to-end latency
Easy Integration: Load pre-compiled programs directly from Modaic Hub
Flexible Configuration: Customize collection, properties, and ranking parameters
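
To make the hybrid search idea concrete, here is a minimal sketch of blending a vector ranking with a keyword ranking. Weaviate performs its own fusion server-side; the `normalize` and `fuse_scores` helpers, the min-max scaling, and the sample scores below are illustrative assumptions and not part of CrossEncoderRanker or the Weaviate client. The `alpha` weight follows the usual convention where 1 means pure vector search and 0 means pure keyword search.

```python
def normalize(scores):
    """Min-max scale a {doc_id: score} mapping into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(vector_scores, keyword_scores, alpha=0.5):
    """Blend two score sets; alpha=1 is pure vector, alpha=0 is pure keyword."""
    vec = normalize(vector_scores)
    kw = normalize(keyword_scores)
    fused = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(vec) | set(kw)
    }
    # Highest fused score first; ties are broken by doc_id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Made-up scores: cosine-like similarities and BM25-like keyword scores.
vector_scores = {"doc_a": 0.91, "doc_b": 0.80, "doc_c": 0.42}
keyword_scores = {"doc_b": 7.3, "doc_c": 6.9, "doc_d": 1.2}

for doc_id, score in fuse_scores(vector_scores, keyword_scores, alpha=0.5):
    print(f"{doc_id}: {score:.3f}")
```

A document that ranks well in both lists (here `doc_b`) rises to the top, which is the behavior hybrid search is meant to provide before the LLM reranking step.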

Quick Start

Installation

uv add modaic weaviate-client python-dotenv

Environment Setup

Create a .env file with your credentials:

MODAIC_TOKEN=your_modaic_token
WEAVIATE_URL=your_weaviate_cluster_url
WEAVIATE_API_KEY=your_weaviate_api_key
OPENAI_API_KEY=your_openai_api_key
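
A missing credential usually surfaces later as an opaque authentication error, so it can help to check the environment up front. The sketch below uses only the standard library; the `missing_env_vars` helper is a hypothetical name, and the variable list is copied from the .env example above.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = (
    "MODAIC_TOKEN",
    "WEAVIATE_URL",
    "WEAVIATE_API_KEY",
    "OPENAI_API_KEY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Demonstration with an explicit mapping instead of the real environment.
example_env = {"MODAIC_TOKEN": "abc", "WEAVIATE_URL": ""}
print(missing_env_vars(example_env))
# → ['WEAVIATE_URL', 'WEAVIATE_API_KEY', 'OPENAI_API_KEY']
```

In a real script, call `missing_env_vars()` with no arguments after `load_dotenv()` and stop early if the returned list is not empty.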

Usage

from dotenv import load_dotenv
from modaic import AutoProgram

load_dotenv()

# Load the pre-compiled program from Modaic Hub
ce_ranker = AutoProgram.from_precompiled(
    "connor/CrossEncoderRanker",
    config_options={
        "collection_name": "IRPapersText_Default",  # Your Weaviate collection
        "return_properties": ["content"],            # Properties to retrieve
        "k": 5                                       # Number of results
    }
)

# Query and get ranked results
response = ce_ranker("What does HyDE stand for?")

# Display results
for idx, ranked_doc in enumerate(response, 1):
    print(f"Rank {idx}: {ranked_doc[:300]}...\n")

Example Output

Rank 1: Figure 1: An illustration of the HyDE model. Document snippets are shown.
HyDE serves all types of queries without changing the underlying InstructGPT and
Contriever/mContriever models...

Rank 2: Table 5: nDCG@10 comparing InstructGPT vs. 3-shot GPT-3 on BEIR...

Rank 3: In this paper, we introduce HyDE, a new approach for building effective
dense retrievers in a completely unsupervised manner...

How It Works

Hybrid Search: Performs initial retrieval from Weaviate using hybrid search
Relevance Assessment: Each document is scored using a DSPy ChainOfThought signature
Concurrent Processing: Documents are scored asynchronously for efficiency
Reranking: Results are sorted by relevance score and returned
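
The scoring and reranking steps above can be sketched with `asyncio`. This is not the code in ce_ranker.py: `score_document` is a hypothetical stand-in that uses term overlap where the real program would call the LLM through a DSPy ChainOfThought module, and `rerank` is an assumed name for the concurrent score-then-sort step.

```python
import asyncio


async def score_document(query, document):
    """Stand-in scorer: fraction of query terms found in the document.

    Hypothetical placeholder for the LLM relevance call.
    """
    await asyncio.sleep(0)  # yield control, as a network call would
    terms = set(query.lower().split())
    words = set(document.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


async def rerank(query, documents, k):
    """Score all documents concurrently, then return the top k by score."""
    scores = await asyncio.gather(
        *(score_document(query, doc) for doc in documents)
    )
    # sorted() is stable, so equally scored documents keep retrieval order.
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:k]]


candidates = [
    "dense retrieval with contrastive learning",
    "hyde generates hypothetical document embeddings",
    "hypothetical answers improve zero-shot retrieval",
]
top = asyncio.run(rerank("hypothetical document embeddings", candidates, k=2))
print(top)
```

Because `asyncio.gather` returns results in the order the coroutines were passed, each score lines up with its document even though the calls overlap in time.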

Configuration Options

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| `collection_name` | `str` | Weaviate collection name | Required |
| `return_properties` | `list[str]` | Properties to retrieve from Weaviate | Required |
| `k` | `int` | Number of results to return | Required |
| `lm` | `str` | Language model to use | `"openai/gpt-4.1-mini"` |
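
As a rough illustration of how these options fit together, the dataclass below mirrors the table: three required fields and one default. `RankerConfigSketch` is a hypothetical stand-in, not the real `CERankerConfig`, and the validation rules (non-empty names, positive `k`) are assumptions added for the example.

```python
from dataclasses import dataclass


@dataclass
class RankerConfigSketch:
    """Hypothetical mirror of the configuration table, not the real CERankerConfig."""

    collection_name: str
    return_properties: list[str]
    k: int
    lm: str = "openai/gpt-4.1-mini"  # default taken from the table

    def __post_init__(self):
        # Assumed sanity checks; the real config class may behave differently.
        if not self.collection_name:
            raise ValueError("collection_name must be a non-empty string")
        if not self.return_properties:
            raise ValueError("return_properties must list at least one property")
        if self.k < 1:
            raise ValueError("k must be a positive integer")


config = RankerConfigSketch(
    collection_name="IRPapersText_Default",
    return_properties=["content"],
    k=5,
)
print(config.lm)
```

Omitting `lm` falls back to the default model, while leaving out any of the three required fields raises a `TypeError` at construction time.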

Publishing Your Own Program

Want to publish your own DSPy program to Modaic Hub? Check out the source code:

ce_ranker.py: Main implementation, including the CERankerAgent class
pyproject.toml: Project dependencies and metadata

Publishing Steps

from ce_ranker import CERankerAgent, CERankerConfig

config = CERankerConfig(
    collection_name="YourCollection",
    return_properties=["content"],
    k=5
)

agent = CERankerAgent(config)
agent.push_to_hub(
    "username/ProgramName",
    with_code=True,
    commit_message="Initial release"
)

Architecture

┌─────────────────────────────────────────────────┐
│                   User Query                    │
└────────────────────┬────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────┐
│          Weaviate Hybrid Search                 │
│   (Vector Search + Keyword Search)              │
└────────────────────┬────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────┐
│        Async Relevance Assessment               │
│   (DSPy ChainOfThought + LLM Scoring)           │
└────────────────────┬────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────┐
│         Reranked Results (Top K)                │
└─────────────────────────────────────────────────┘

Resources

Modaic SDK Documentation: Learn more about Modaic
Modaic Hub: Browse programs
DSPy Framework: GitHub Repository
Weaviate: Vector Database Documentation

Requirements

Python >= 3.11
modaic >= 0.8.0
weaviate-client >= 4.18.1

License

MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.