Update README.md
151
README.md
@@ -7,51 +7,76 @@ LLM-driven automated knowledge graph construction from text using DSPy and Neo4j
```sh
dspy-neo4j-knowledge-graph/
├── README.md
├── examples
├── requirements.txt
├── run.py
└── src
├── examples/
│   └── wikipedia-abstracts-v0_0_1.ndjson
├── img/
├── main.py
├── pyproject.toml
├── uv.lock
└── src/
    ├── __init__.py
    └── neo4j.py
```

## Description
Model entities and relationships and build a Knowledge Graph using DSPy, Neo4j, and OpenAI's GPT-4. When given a paragraph or block of text, the app uses the DSPy library and OpenAI's GPT-4 to extract entities and relationships and generate a Cypher statement which is run in Neo4j to create the Knowledge Graph.
Build knowledge graphs automatically from text using DSPy, Modaic, and Neo4j. This implementation uses OpenAI's GPT-4o to extract entities and relationships from Wikipedia abstracts, generating Cypher statements that create structured knowledge graphs in Neo4j.

### Key Features
- **DSPy-Powered**: Uses DSPy's Chain of Thought for structured entity and relationship extraction
- **Modaic Integration**: Leverages Modaic's PrecompiledProgram for reusable, shareable DSPy programs
- **Schema-Aware**: Passes the current Neo4j graph schema to the model, enabling it to reuse existing nodes and relationships
- **Batch Processing**: Processes multiple text samples from NDJSON files
- **Hugging Face Hub**: Push trained programs to the Hub for sharing and versioning

### Optimized Schema Context
The current graph schema is passed to the model as a list of nodes, relationships and properties in the context of the prompt. This allows the model to use elements from the existing schema and make connections between existing entities and relationships.
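
As a rough illustration of what "passing the schema as context" can look like, the sketch below renders node labels, their properties, and relationship patterns as a plain-text block for a prompt. The function name `format_schema_context` and the dictionary layout are assumptions made for this example; this is not the project's `neo4j.fmt_schema()` implementation, whose output format is not shown here.

```python
def format_schema_context(schema: dict) -> str:
    """Render a schema dict as prompt-ready text (hypothetical layout)."""
    lines = ["Nodes:"]
    # One line per node label, with its known property names in parentheses.
    for label, props in sorted(schema.get("nodes", {}).items()):
        lines.append(f"- {label} ({', '.join(props)})" if props else f"- {label}")
    lines.append("Relationships:")
    # One line per (start label, relationship type, end label) triple.
    for start, rel_type, end in schema.get("relationships", []):
        lines.append(f"- (:{start})-[:{rel_type}]->(:{end})")
    return "\n".join(lines)


# Illustrative schema only; labels and properties are made up.
example_schema = {
    "nodes": {"Person": ["name"], "City": ["name", "country"]},
    "relationships": [("Person", "BORN_IN", "City")],
}
print(format_schema_context(example_schema))
```

With a listing like this in the prompt, the model can match a newly extracted entity against an existing label such as `Person` instead of inventing a near-duplicate.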

## Quick Start
1. Clone the repository.
2. Create a [Python virtual environment](#python-virtual-environment) and install the required packages.
3. Create a `.env` file and add the required [environment variables](#environment-variables).
4. [Run Neo4j using Docker](#usage).
5. Run `python3 run.py` and paste your text in the prompt.
6. Navigate to `http://localhost:7474/browser/` to view the Knowledge Graph in Neo4j Browser.
1. Clone the repository
2. Install dependencies using [uv](#installation-with-uv)
3. Create a `.env` file and add the required [environment variables](#environment-variables)
4. Set up [Neo4j](#neo4j-setup) (local Docker or cloud-hosted)
5. Run `uv run main.py` to process example Wikipedia abstracts
6. View your Knowledge Graph in Neo4j Browser

## Installation

### Prerequisites
* Python 3.12
* Python 3.13+
* OpenAI API Key
* Docker
* [uv](https://docs.astral.sh/uv/) (Python package manager)
* Neo4j instance (local Docker or cloud-hosted)

### Installation with uv
Install dependencies using uv:
```sh
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install project dependencies
uv sync
```

### Environment Variables
Before you begin, make sure to create a `.env` file and add your OpenAI API key.
Create a `.env` file in the project root with the following variables:

```sh
NEO4J_URI=bolt://localhost:7687
OPENAI_API_KEY=<your-api-key>
# OpenAI API Key
OPENAI_API_KEY=<your-openai-api-key>

# Neo4j Configuration
NEO4J_URI=bolt://localhost:7687 # or neo4j+s://xxx.databases.neo4j.io for cloud
NEO4J_USER=neo4j # optional for local Docker with NEO4J_AUTH=none
NEO4J_PASSWORD=<your-password> # optional for local Docker with NEO4J_AUTH=none

# Modaic Token (optional, for pushing to Hub)
MODAIC_TOKEN=<your-modaic-token>
```
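
A minimal sketch of how these variables might be read at startup, assuming they are already present in the process environment (for example, exported in the shell or loaded from `.env` by a separate tool). The helper name `load_settings` and the fallback URI are assumptions for illustration; the project may read its configuration differently.

```python
import os


def load_settings(env=None) -> dict:
    """Collect the settings listed above from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    openai_key = env.get("OPENAI_API_KEY")
    if not openai_key:
        # The OpenAI key is the only variable treated as mandatory in this sketch.
        raise RuntimeError("OPENAI_API_KEY is required")
    return {
        "openai_api_key": openai_key,
        # Assumed fallback: the local Docker URI shown in the example above.
        "neo4j_uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
        # May be None for a local container started with NEO4J_AUTH=none.
        "neo4j_user": env.get("NEO4J_USER"),
        "neo4j_password": env.get("NEO4J_PASSWORD"),
        # Optional; only needed when pushing to the Hub.
        "modaic_token": env.get("MODAIC_TOKEN"),
    }
```

Accepting a plain mapping keeps the helper easy to exercise without touching the real environment, e.g. `load_settings({"OPENAI_API_KEY": "sk-test"})`.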

### Python Virtual Environment
Create a Python virtual environment and install the required packages.
```sh
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
## Neo4j Setup

## Usage
Run Neo4j using Docker.
### Option 1: Local Docker (Development)
Run Neo4j locally using Docker:
```sh
docker run \
    --name dspy-kg \
@@ -61,16 +86,79 @@ docker run \
    neo4j:5.15
```

Access Neo4j Browser at `http://localhost:7474`

### Option 2: Neo4j Aura (Cloud)
1. Create a free instance at [neo4j.com/cloud/aura](https://neo4j.com/cloud/aura)
2. Get your connection URI (e.g., `neo4j+s://xxx.databases.neo4j.io`)
3. Add credentials to your `.env` file

## Usage

### Process Wikipedia Abstracts
Run the main script to process example Wikipedia abstracts and build a knowledge graph:
```sh
uv run main.py
```

This will:
1. Load Wikipedia abstracts from `examples/wikipedia-abstracts-v0_0_1.ndjson`
2. For each abstract, generate a Cypher statement using GPT-4o
3. Execute the Cypher statement in Neo4j
4. Build a connected knowledge graph
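
The loading step above can be sketched as a small generator over NDJSON lines. The `text` field name matches what `main.py` reads (`data["text"]`); the helper name `iter_texts`, the blank-line handling, and the sample records are assumptions added for this example.

```python
import io
import json
from typing import Iterator, TextIO


def iter_texts(stream: TextIO, field: str = "text") -> Iterator[str]:
    """Yield the text field of each NDJSON record, skipping blank lines."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # Tolerate empty lines between records.
        yield json.loads(line)[field]


# In-memory stand-in for examples/wikipedia-abstracts-v0_0_1.ndjson.
sample = io.StringIO(
    '{"text": "Ada Lovelace was a mathematician."}\n'
    "\n"
    '{"text": "Neo4j is a graph database."}\n'
)
texts = list(iter_texts(sample))
```

Each yielded string would then be passed to the Cypher-generation step, one abstract at a time.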

### View Your Knowledge Graph
Navigate to Neo4j Browser:
- Local: `http://localhost:7474/browser/`
- Cloud: Your Neo4j Aura console URL

Run Cypher queries to explore your graph:
```cypher
MATCH (n) RETURN n LIMIT 25
MATCH (p:Person)-[r]->(n) RETURN p, r, n LIMIT 50
```

## Development

### Push to Hugging Face Hub
To share your trained DSPy program on Hugging Face Hub:

```python
# In main.py, uncomment the push_to_hub section
generate_cypher.push_to_hub(
    "your-username/text-to-cypher",
    with_code=True,
    tag="v0.0.1",
    commit_message="Initial release"
)
```

### Customize the Model
Modify the `GenerateCypherConfig` in `main.py` to customize:
```python
class GenerateCypherConfig(PrecompiledConfig):
    model: str = "openai/gpt-4o" # Change model
    max_tokens: int = 1024 # Adjust token limit
```

### Process Custom Text
Modify `main.py` to process your own text:
```python
text = "Your custom text here..."
cypher = generate_cypher(text=text, neo4j_schema=neo4j.fmt_schema())
neo4j.query(cypher.statement.replace('```', ''))
```
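
The `replace` call above removes the fence markers themselves, but if the model wraps its output in a fence with a language tag (such as `cypher`), the tag would remain in the statement. One possible alternative is sketched below; `strip_code_fence` is a hypothetical helper, not part of the project, and it assumes at most one fence wrapping the whole statement.

```python
def strip_code_fence(statement: str) -> str:
    """Remove a single surrounding Markdown code fence, if present."""
    text = statement.strip()
    if text.startswith("```"):
        # Drop the whole opening fence line, including an optional language tag.
        first_newline = text.find("\n")
        text = text[first_newline + 1:] if first_newline != -1 else text[3:]
    if text.endswith("```"):
        text = text[:-3]
    return text.strip()


fenced = "```cypher\nMERGE (p:Person {name: 'Ada'})\n```"
cleaned = strip_code_fence(fenced)
```

Statements without a fence pass through unchanged apart from whitespace trimming, so the helper could be applied to every model response before it is run.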

## Clean Up
Stop and remove the Neo4j container.

### Stop Neo4j Docker Container
```sh
docker stop dspy-kg
docker rm dspy-kg
```

Deactivate the Python virtual environment.
### Remove Virtual Environment
```sh
deactivate
rm -rf .venv
```

@@ -79,7 +167,6 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file

## References
- [DSPy docs](https://dspy-docs.vercel.app/docs/intro)
- [Modaic docs](https://docs.modaic.com/)
- [Neo4j docs](https://neo4j.com/docs/)

## Contact
**Primary Contact:** [@chrisammon3000](https://github.com/chrisammon3000)
- [uv docs](https://docs.astral.sh/uv/)

11
main.py
@@ -51,21 +51,22 @@ class GenerateCypher(PrecompiledProgram):
generate_cypher = GenerateCypher(GenerateCypherConfig())

if __name__ == "__main__":
    """
    from pathlib import Path
    import json

    """
    examples_path = Path(__file__).parent / "examples" / "wikipedia-abstracts-v0_0_1.ndjson"
    with open(examples_path, "r") as f:
        for line in f:
            data = json.loads(line)
            text = data["text"]
            print(text[:50])
            print("TEXT TO PROCESS:\n", text[:50])
            cypher = generate_cypher(text=text, neo4j_schema=neo4j.fmt_schema())
            neo4j.query(cypher.statement.replace('```', ''))
            print("CYPHER STATEMENT:\n", cypher.statement)
    """
    schema = neo4j.fmt_schema()
    print(schema)
    generate_cypher.push_to_hub("farouk1/text-to-cypher", with_code=True, tag="v0.0.4", commit_message="Syntax fix")
    print("SCHEMA:\n", schema)
    generate_cypher.push_to_hub("farouk1/text-to-cypher", with_code=True, tag="v0.0.5", commit_message="Update README.md")