# CrossEncoderRanker

> **A DSPy-powered cross-encoder ranker for intelligent document retrieval and reranking**

CrossEncoderRanker combines hybrid search from Weaviate with LLM-based relevance assessment to deliver highly accurate document ranking. Built with [DSPy](https://github.com/stanfordnlp/dspy) and published on [Modaic Hub](https://modaic.ai).

---

## Features

- **Hybrid Search**: Leverages Weaviate's hybrid search (combining vector and keyword search)
- **Intelligent Reranking**: Uses LLM-based cross-encoder with Chain-of-Thought reasoning
- **Async Processing**: Concurrent document scoring for improved performance
- **Easy Integration**: Load pre-compiled programs directly from Modaic Hub
- **Flexible Configuration**: Customize collection, properties, and ranking parameters

---

## Quick Start

### Installation

```bash
pip install modaic weaviate-client python-dotenv
```

### Environment Setup

Create a `.env` file with your credentials:

```env
MODAIC_TOKEN=your_modaic_token
WEAVIATE_URL=your_weaviate_cluster_url
WEAVIATE_API_KEY=your_weaviate_api_key
OPENAI_API_KEY=your_openai_api_key
```

### Usage

```python
from dotenv import load_dotenv
from modaic import AutoProgram

load_dotenv()

# Load the pre-compiled program from Modaic Hub
ce_ranker = AutoProgram.from_precompiled(
    "connor/CrossEncoderRanker",
    config_options={
        "collection_name": "IRPapersText_Default",  # Your Weaviate collection
        "return_properties": ["content"],           # Properties to retrieve
        "k": 5                                      # Number of results
    }
)

# Query and get ranked results
response = ce_ranker("What does HyDE stand for?")

# Display results
for idx, ranked_doc in enumerate(response, 1):
    print(f"Rank {idx}: {ranked_doc[:300]}...\n")
```

### Example Output

```
Rank 1: Figure 1: An illustration of the HyDE model. Document snippets are shown. HyDE serves all types of queries without changing the underlying InstructGPT and Contriever/mContriever models...

Rank 2: Table 5: nDCG@10 comparing InstructGPT vs. 3-shot GPT-3 on BEIR...

Rank 3: In this paper, we introduce HyDE, a new approach for building effective dense retrievers in a completely unsupervised manner...
```

---

## How It Works

1. **Hybrid Search**: Performs initial retrieval from Weaviate using hybrid search
2. **Relevance Assessment**: Each document is scored using a DSPy ChainOfThought signature
3. **Concurrent Processing**: Documents are scored asynchronously for efficiency
4. **Reranking**: Results are sorted by relevance score and returned

---

## Configuration Options

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `collection_name` | `str` | Weaviate collection name | Required |
| `return_properties` | `list[str]` | Properties to retrieve from Weaviate | Required |
| `k` | `int` | Number of results to return | Required |
| `lm` | `str` | Language model to use | `"openai/gpt-4.1-mini"` |

---

## Publishing Your Own Program

Want to publish your own DSPy program to Modaic Hub? Check out the source code:

- **`ce_ranker.py`**: Main implementation with `CERankerAgent` class
- **`pyproject.toml`**: Project dependencies and metadata

### Publishing Steps

```python
from ce_ranker import CERankerAgent, CERankerConfig

config = CERankerConfig(
    collection_name="YourCollection",
    return_properties=["content"],
    k=5
)

agent = CERankerAgent(config)
agent.push_to_hub(
    "username/ProgramName",
    with_code=True,
    commit_message="Initial release"
)
```

---

## Architecture

```
┌─────────────────────────────────────────────────┐
│                   User Query                    │
└────────────────────┬────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────┐
│             Weaviate Hybrid Search              │
│        (Vector Search + Keyword Search)         │
└────────────────────┬────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────┐
│           Async Relevance Assessment            │
│       (DSPy ChainOfThought + LLM Scoring)       │
└────────────────────┬────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────┐
│            Reranked Results (Top K)             │
└─────────────────────────────────────────────────┘
```

---

## Resources

- **Modaic SDK Documentation**: [Learn more about Modaic](https://docs.modaic.dev)
- **Modaic Hub**: [Browse programs](https://modaic.dev)
- **DSPy Framework**: [GitHub Repository](https://github.com/stanfordnlp/dspy)
- **Weaviate**: [Vector Database Documentation](https://weaviate.io/developers/weaviate)

---

## Requirements

- Python >= 3.11
- `modaic >= 0.8.0`
- `weaviate-client >= 4.18.1`

---

## License

MIT License - See LICENSE file for details

---

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
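
---

## Appendix: Reranking Flow Sketch

The concurrent scoring and reranking steps described under "How It Works" can be illustrated with a minimal, dependency-free sketch. This is not the code in `ce_ranker.py`: the `score_document` and `rerank` functions are hypothetical names, and the term-overlap scorer is only a stand-in for the DSPy ChainOfThought call so the example runs without Weaviate or an LLM. It shows the general pattern of scoring candidates concurrently with `asyncio.gather`, then sorting by score and keeping the top `k`.

```python
import asyncio


async def score_document(query: str, document: str) -> float:
    """Hypothetical stand-in for the LLM relevance call.

    Returns the fraction of query terms that appear in the document.
    A real implementation would await a language model here instead.
    """
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    terms = {t.strip("?.,!").lower() for t in query.split()}
    terms.discard("")
    text = document.lower()
    return sum(t in text for t in terms) / len(terms) if terms else 0.0


async def rerank(query: str, documents: list[str], k: int) -> list[str]:
    """Score all documents concurrently, then return the k highest-scoring."""
    scores = await asyncio.gather(
        *(score_document(query, doc) for doc in documents)
    )
    # sorted() is stable, so documents with equal scores keep retrieval order
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:k]]


# Candidates as they might come back from the initial hybrid search (made-up text)
candidates = [
    "BM25 is a keyword ranking function.",
    "HyDE stands for Hypothetical Document Embeddings.",
    "Dense retrievers encode queries into vectors.",
]

top = asyncio.run(rerank("What does HyDE stand for?", candidates, k=2))
for idx, doc in enumerate(top, 1):
    print(f"Rank {idx}: {doc}")
```

Because every scoring call is awaited together, total latency is bounded by the slowest call rather than the sum of all calls, which is the motivation for scoring documents asynchronously.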