
text-to-cypher

LLM-driven automated knowledge graph construction from text using DSPy and Neo4j.

Project Structure

text-to-cypher/
├── README.md
├── examples/
│   └── wikipedia-abstracts-v0_0_1.ndjson
├── img/
├── main.py
├── pyproject.toml
├── uv.lock
└── src/
    ├── __init__.py
    └── neo4j.py

Description

Build knowledge graphs automatically from text using DSPy, Modaic, and Neo4j. This implementation uses OpenAI's GPT-4o to extract entities and relationships from Wikipedia abstracts, generating Cypher statements that create structured knowledge graphs in Neo4j.

Key Features

  • DSPy-Powered: Uses DSPy's Chain of Thought for structured entity and relationship extraction
  • Modaic Integration: Leverages Modaic's PrecompiledProgram for reusable, shareable DSPy programs
  • Schema-Aware: Passes the current Neo4j graph schema to the model, enabling it to reuse existing nodes and relationships
  • Batch Processing: Processes multiple text samples from NDJSON files
  • Hugging Face Hub: Push trained programs to the Hub for sharing and versioning

Optimized Schema Context

The current graph schema is included in the prompt as a list of node labels, relationship types, and properties. This lets the model reuse existing schema elements and link newly extracted entities to ones already in the graph.
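
As a rough illustration of the idea, the sketch below renders a schema as plain text that could be placed in a prompt. The function name format_schema, its arguments, and the output layout are assumptions made for this example; the project's own fmt_schema in src/neo4j.py may read the schema from the database and format it differently.

```python
from typing import Mapping, Sequence


def format_schema(
    node_props: Mapping[str, Sequence[str]],
    relationships: Sequence[tuple[str, str, str]],
) -> str:
    """Render node labels, their properties, and relationship patterns as text.

    Hypothetical helper: the input shapes are chosen for illustration only.
    """
    lines = ["Nodes:"]
    for label in sorted(node_props):
        props = ", ".join(node_props[label]) or "(no properties)"
        lines.append(f"  (:{label}) properties: {props}")
    lines.append("Relationships:")
    for start, rel_type, end in relationships:
        lines.append(f"  (:{start})-[:{rel_type}]->(:{end})")
    return "\n".join(lines)


# Example schema (made up for this sketch)
schema_text = format_schema(
    {"Person": ["name", "born"], "City": ["name"]},
    [("Person", "BORN_IN", "City")],
)
print(schema_text)
```

Keeping the schema text short and sorted makes the prompt stable between runs, which may help when comparing generated Cypher across abstracts.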

Quick Start

  1. Clone the repository
  2. Install dependencies using uv
  3. Create a .env file and add the required environment variables
  4. Set up Neo4j (local Docker or cloud-hosted)
  5. Process the example Wikipedia abstracts with uv run main.py
  6. View your Knowledge Graph in Neo4j Browser

Installation

Prerequisites

  • Python 3.13+
  • OpenAI API Key
  • uv (Python package manager)
  • Neo4j instance (local Docker or cloud-hosted)

Installation with uv

Install dependencies using uv:

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install project dependencies
uv sync

Environment Variables

Create a .env file in the project root with the following variables:

# OpenAI API Key
OPENAI_API_KEY=<your-openai-api-key>

# Neo4j Configuration
NEO4J_URI=bolt://localhost:7687  # or neo4j+s://xxx.databases.neo4j.io for cloud
NEO4J_USER=neo4j                 # optional for local Docker with NEO4J_AUTH=none
NEO4J_PASSWORD=<your-password>   # optional for local Docker with NEO4J_AUTH=none

# Modaic Token (optional, for pushing to Hub)
MODAIC_TOKEN=<your-modaic-token>
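
Because the user and password are optional for a local container started with NEO4J_AUTH=none, connection code has to cope with missing credentials. A minimal sketch of one way to do that follows; the helper name neo4j_settings and the fallback URI are assumptions for this example, not the project's actual code.

```python
import os
from typing import Mapping, Optional


def neo4j_settings(
    env: Mapping[str, str] = os.environ,
) -> tuple[str, Optional[tuple[str, str]]]:
    """Return (uri, auth) where auth is None when credentials are not set.

    Hypothetical helper: the default URI below is an assumption that matches
    the local Docker setup described in this README.
    """
    uri = env.get("NEO4J_URI", "bolt://localhost:7687")
    user = env.get("NEO4J_USER")
    password = env.get("NEO4J_PASSWORD")
    # Only build an auth tuple when both values are present and non-empty
    auth = (user, password) if user and password else None
    return uri, auth


uri, auth = neo4j_settings({"NEO4J_URI": "bolt://localhost:7687"})
print(uri, auth)
```

The resulting pair can be passed to the official Neo4j Python driver as GraphDatabase.driver(uri, auth=auth), which accepts auth=None for servers with authentication disabled.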

Neo4j Setup

Option 1: Local Docker (Development)

Run Neo4j locally using Docker:

docker run \
    --name text-to-cypher \
    --publish=7474:7474 \
    --publish=7687:7687 \
    --env "NEO4J_AUTH=none" \
    neo4j:5.15

Access Neo4j Browser at http://localhost:7474

Option 2: Neo4j Aura (Cloud)

  1. Create a free instance at neo4j.com/cloud/aura
  2. Get your connection URI (e.g., neo4j+s://xxx.databases.neo4j.io)
  3. Add credentials to your .env file

Usage

Process Wikipedia Abstracts

Run the main script to process example Wikipedia abstracts and build a knowledge graph:

uv run main.py

This will:

  1. Load Wikipedia abstracts from examples/wikipedia-abstracts-v0_0_1.ndjson
  2. For each abstract, generate a Cypher statement using GPT-4o
  3. Execute the Cypher statement in Neo4j
  4. Build a connected knowledge graph
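
The steps above can be sketched as a small loop. This is an outline under stated assumptions, not the contents of main.py: the function build_graph, its callable parameters, and the "text" field name in the NDJSON records are all hypothetical, and the model and database are replaced by stand-ins so the example runs on its own.

```python
import json
from typing import Callable, Iterable


def build_graph(
    lines: Iterable[str],
    generate: Callable[[str, str], str],
    execute: Callable[[str], None],
    schema: Callable[[], str],
    text_key: str = "text",  # assumed field name; check the NDJSON file
) -> int:
    """Turn each NDJSON record into a Cypher statement and execute it."""
    processed = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        # Re-read the schema every time so later records see earlier additions
        statement = generate(record[text_key], schema())
        execute(statement)
        processed += 1
    return processed


# Stand-ins for the LLM call and the Neo4j session
executed: list[str] = []
sample = ['{"text": "Ada Lovelace was born in London."}', ""]
count = build_graph(
    sample,
    generate=lambda text, schema: "MERGE (:Person {name: 'Ada Lovelace'})",
    execute=executed.append,
    schema=lambda: "Nodes:\nRelationships:",
)
print(count, executed)
```

Fetching the schema inside the loop rather than once up front is what lets each abstract build on nodes and relationships created by the previous ones.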

View Your Knowledge Graph

Navigate to Neo4j Browser:

  • Local: http://localhost:7474/browser/
  • Cloud: Your Neo4j Aura console URL

Run Cypher queries to explore your graph:

MATCH (n) RETURN n LIMIT 25;
MATCH (p:Person)-[r]->(n) RETURN p, r, n LIMIT 50;

Development

Push to Hugging Face Hub

To share your trained DSPy program on Hugging Face Hub:

# In main.py, uncomment the push_to_hub section
generate_cypher.push_to_hub(
    "your-username/text-to-cypher",
    with_code=True,
    tag="v0.0.1",
    commit_message="Initial release"
)

Customize the Model

Edit GenerateCypherConfig in main.py to change the model or the token limit:

class GenerateCypherConfig(PrecompiledConfig):
    model: str = "openai/gpt-4o"  # Change model
    max_tokens: int = 1024          # Adjust token limit

Process Custom Text

Modify main.py to process your own text:

text = "Your custom text here..."
cypher = generate_cypher(text=text, neo4j_schema=neo4j.fmt_schema())
neo4j.query(cypher.statement.replace('```', ''))
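
Removing only the backticks leaves a language tag behind if the model labels its fence (for example a fence opened with the word cypher), and the leftover word would then be sent to Neo4j as part of the query. A slightly more defensive sketch is shown below; the helper name strip_code_fence and the set of recognised tags are assumptions for this example.

```python
# Language tags that may follow the opening fence (assumed set; extend as needed)
FENCE_TAGS = ("cypher", "cql")


def strip_code_fence(statement: str) -> str:
    """Remove a surrounding Markdown code fence, including a language tag."""
    text = statement.strip()
    if text.startswith("```"):
        body = text[3:]
        first_line, newline, rest = body.partition("\n")
        # Drop the opening line only when it holds nothing but a known tag
        if newline and first_line.strip().lower() in FENCE_TAGS:
            text = rest
        else:
            text = body
    text = text.rstrip()
    if text.endswith("```"):
        text = text[:-3]
    return text.strip()


fenced = "```cypher\nMERGE (n:Person {name: 'Ada'})\n```"
print(strip_code_fence(fenced))
```

With such a helper in place, the last line of the snippet above could become neo4j.query(strip_code_fence(cypher.statement)).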

Clean Up

Stop Neo4j Docker Container

docker stop text-to-cypher
docker rm text-to-cypher

Remove Virtual Environment

rm -rf .venv

License

This project is licensed under the MIT License - see the LICENSE file for details.
