diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..1add746
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,64 @@
+# Contributing to Modaic
+## Finding an Issue to Work On
+- Check the [issues](https://github.com/modaic-ai/modaic/issues) page for open issues.
+- If you are new to the project, start with the [good first issue](https://github.com/modaic-ai/modaic/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) label.
+- Also check the [help wanted](https://github.com/modaic-ai/modaic/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22) label for issues that are more complex but still manageable.
+- If you have questions about an issue, ask in the comments.
+- If you want to work on an issue, comment on the issue to say that you will work on it.
+- If you have already started working on an issue, let us know in the comments.
+
+## Setting up the Development Environment
+- First, fork the repository and clone it locally.
+- We use [uv](https://docs.astral.sh/uv/) to manage the dependencies. Refer to the site for installation instructions.
+- Then install the development dependencies with the following command:
+```bash
+cd modaic
+uv sync --dev
+```
+
+## Code Style and Formatting
+**Comments**
+
+Include docstrings for all public functions and classes.
+Do not use `#` comments unless they are prefixed with a code tag and are necessary for the code to be understandable. We use the following code tags:
+- `# CAVEAT:` - A heads-up that there’s something tricky or non-obvious here that the reader should keep in mind.
+- `# NOTE:` - A note to the reader offering some necessary context.
+- `# TODO:` - Something needs to be done.
+- `# DOCME:` - Needs to be documented.
+- `# BUG:` - Something is wrong.
+- `# FIXME:` - Something needs to be fixed.
+- `# HACK:` - A temporary, hacky workaround that should be changed.
+To get the most use out of code tags, we recommend installing the [TODO Tree Extension](https://marketplace.visualstudio.com/items?itemName=Gruntfuggly.todo-tree).
+
+**Formatting**
+
+We use the ruff linter/formatter to check for code style and formatting. It is installed with the dev dependencies. To use it, install the [Ruff Extension](https://marketplace.visualstudio.com/items?itemName=charliermarsh.ruff) on VSCode. Make sure your VSCode Python interpreter is set to the Python path in the `.venv` created by uv. You can change it by pressing `Ctrl+Shift+P` on Windows or `Cmd+Shift+P` on Mac and typing `Python: Select Interpreter`. You should see one named `.venv`. This will ensure the formatter rules match the Modaic-specific formatting.
+
+When working with ruff, you may find these settings useful for auto-formatting code. You can add these to your VSCode settings (`Cmd+Shift+P` -> "Preferences: Open Workspace Settings (JSON)").
+```json
+    "editor.formatOnSave": true,
+    "ruff.path": ["ruff"],
+    "[python]": {
+        "editor.defaultFormatter": "charliermarsh.ruff",
+        "editor.formatOnSave": true,
+        "editor.codeActionsOnSave": {
+            "source.fixAll": "explicit"
+        }
+    }
+```
+
+## Contributing to the Documentation
+- Follow instructions for setting up the local development environment [above](#setting-up-the-development-environment).
+- Next ensure you have [node](https://nodejs.org/en/download/) installed.
+- Install the node dependencies for the documentation.
+```bash
+npm install
+```
+- To run the documentation locally with live reload, run the following command:
+```bash
+npm run dev
+```
+- To build the documentation, run the following command:
+```bash
+npm run build
+```
\ No newline at end of file
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..e12c714
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,31 @@
+MIT License
+
+Copyright (c) 2025 Modaic Inc
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+---
+
+Additional Terms:
+
+1. You may not modify this Software in any way that changes the default hub
+   endpoint, nor distribute derivative works that route agents or models to
+   a hub other than modaic.dev.
+
+2. All other rights are granted as per the MIT License.
diff --git a/README.md b/README.md
index 8f9b62b..df70a3b 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,224 @@
-# second-org-repo
+[![Docs](https://img.shields.io/badge/docs-available-brightgreen.svg)](https://docs.modaic.dev)
+[![PyPI](https://img.shields.io/pypi/v/modaic)](https://pypi.org/project/modaic/)
+
+# Modaic 🐙
+**Mod**ular **A**gent **I**nfrastructure **C**ollection, a Python framework for maintaining DSPy applications.
+
+## Overview
+
+Modaic provides a comprehensive toolkit for creating intelligent DSPy pipelines that can work with diverse data sources including tables, documents, and databases. Built on top of DSPy, it offers a way to share and manage DSPy pipelines with integrated vector, SQL, and graph database support.
+
+## Key Features
+
+- **Hub Support**: Load and share precompiled DSPy programs from Modaic Hub
+- **Context Management**: Structured handling of molecular and atomic context types
+- **Database Integration**: Support for Vector (Milvus, Pinecone, Qdrant), SQL (SQLite, MySQL, PostgreSQL), and Graph (Memgraph, Neo4j)
+- **Program Framework**: Precompiled and auto-loading DSPy programs
+- **Table Processing**: Advanced Excel/CSV processing with SQL querying capabilities
+
+
+## Installation
+
+### Using uv (recommended)
+
+```bash
+uv add modaic
+```
+
+Optional (for hub operations):
+
+```bash
+export MODAIC_TOKEN=""
+```
+
+### Using pip
+Please note that you will not be able to push DSPy programs to the Modaic Hub with pip.
+```bash
+pip install modaic
+```
+## Quick Start
+
+### Creating a Simple Program
+
+```python
+from modaic import PrecompiledProgram, PrecompiledConfig
+
+class WeatherConfig(PrecompiledConfig):
+    weather: str = "sunny"
+
+class WeatherProgram(PrecompiledProgram):
+    config: WeatherConfig
+
+    def __init__(self, config: WeatherConfig, **kwargs):
+        super().__init__(config, **kwargs)
+
+    def forward(self, query: str) -> str:
+        return f"The weather in {query} is {self.config.weather}."
+
+weather_program = WeatherProgram(WeatherConfig())
+print(weather_program(query="Tokyo"))
+```
+
+Save and load locally:
+
+```python
+weather_program.save_precompiled("./my-weather")
+
+from modaic import AutoProgram, AutoConfig
+
+cfg = AutoConfig.from_precompiled("./my-weather", local=True)
+loaded = AutoProgram.from_precompiled("./my-weather", local=True)
+print(loaded(query="Kyoto"))
+```
+
+### Working with Tables
+
+```python
+from pathlib import Path
+from modaic.context import Table, TableFile
+import pandas as pd
+
+# Load from Excel/CSV
+excel = TableFile.from_file(
+    file_ref="employees.xlsx",
+    file=Path("employees.xlsx"),
+    file_type="xlsx",
+)
+csv = TableFile.from_file(
+    file_ref="data.csv",
+    file=Path("data.csv"),
+    file_type="csv",
+)
+
+# Create from DataFrame
+df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
+table = Table(df=df, name="my_table")
+
+# Query with SQL (refer to in-memory table as `this`)
+result = table.query("SELECT * FROM this WHERE col1 > 1")
+
+# Convert to markdown
+markdown = table.markdown()
+```
+
+### Database Integration
+
+#### SQL Database
+```python
+from modaic.databases import SQLDatabase, SQLiteBackend
+
+# Configure and connect
+backend = SQLiteBackend(db_path="my_database.db")
+db = SQLDatabase(backend)
+
+# Add table
+db.add_table(table)
+
+# Query
+rows = db.fetchall("SELECT * FROM my_table")
+```
+
+#### Graph Database
+```python
+from modaic.context import Context, Relation
+from modaic.databases import GraphDatabase, MemgraphConfig, Neo4jConfig
+
+# Configure backend (choose one)
+mg = GraphDatabase(MemgraphConfig())
+# or
+neo = GraphDatabase(Neo4jConfig())
+
+# Define nodes
+class Person(Context):
+    name: str
+    age: int
+
+class KNOWS(Relation):
+    since: int
+
+alice = Person(name="Alice", age=30)
+bob = Person(name="Bob", age=28)
+
+# Save nodes
+alice.save(mg)
+bob.save(mg)
+
+# Create relationship (Alice)-[KNOWS]->(Bob)
+rel = (alice >> KNOWS(since=2020) >> bob)
+rel.save(mg)
+
+# Query
+rows = mg.execute_and_fetch("MATCH (a:Person)-[r:KNOWS]->(b:Person) RETURN a, r, b LIMIT 5")
+```
+#### Vector Database
+```python
+from modaic import Embedder
+from modaic.context import Text
+from modaic.databases import VectorDatabase, MilvusBackend
+
+# Setup embedder and backend
+embedder = Embedder("openai/text-embedding-3-small")
+backend = MilvusBackend.from_local("vector.db") # milvus lite
+
+# Initialize database
+vdb = VectorDatabase(backend=backend, embedder=embedder, payload_class=Text)
+
+# Create collection and add records
+vdb.create_collection("my_collection", payload_class=Text)
+vdb.add_records("my_collection", [Text(text="hello world"), Text(text="modaic makes sharing DSPy programs easy")])
+
+# Search
+results = vdb.search("my_collection", query="hello", k=3)
+top_hit_text = results[0][0].context.text
+```
+
+## Architecture
+### Program Types
+
+1. **PrecompiledProgram**: Statically defined programs with explicit configuration
+2. **AutoProgram**: Dynamically loaded programs from Modaic Hub or local repositories
+
+### Database Support
+
+| Database Type | Providers | Use Case |
+|---------------|-----------|----------|
+| **Vector** | Milvus | Semantic search, RAG |
+| **SQL** | SQLite, MySQL, PostgreSQL | Structured queries, table storage |
+
+## Examples
+
+### TableRAG Example
+
+The TableRAG example demonstrates a complete RAG pipeline for table-based question answering:
+
+```python
+from modaic import PrecompiledConfig, PrecompiledProgram
+from modaic.context import Table
+from modaic.databases import VectorDatabase, SQLDatabase
+from modaic.types import Indexer
+
+class TableRAGConfig(PrecompiledConfig):
+    k_recall: int = 50
+    k_rerank: int = 5
+
+class TableRAGProgram(PrecompiledProgram):
+    config: TableRAGConfig # ! Important: config must be annotated with the config class
+
+    def __init__(self, config: TableRAGConfig, indexer: Indexer, **kwargs):
+        super().__init__(config, **kwargs)
+        self.indexer = indexer
+        # Initialize DSPy modules for reasoning
+
+    def forward(self, user_query: str) -> str:
+        # Retrieve relevant tables
+        # Generate SQL queries
+        # Combine results and provide answer
+        pass
+```
+
+## Support
+
+For issues and questions:
+- GitHub Issues: `https://github.com/modaic-ai/modaic/issues`
+- Docs: `https://docs.modaic.dev`
diff --git a/auto_classes.json b/auto_classes.json
new file mode 100644
index 0000000..9e401c7
--- /dev/null
+++ b/auto_classes.json
@@ -0,0 +1,4 @@
+{
+  "AutoConfig": "modaic.PrecompiledConfig",
+  "AutoProgram": "scratch.push_org.MyProgram"
+}
\ No newline at end of file
diff --git a/config.json b/config.json
new file mode 100644
index 0000000..4475ebc
--- /dev/null
+++ b/config.json
@@ -0,0 +1,3 @@
+{
+  "model": null
+}
\ No newline at end of file
diff --git a/program.json b/program.json
new file mode 100644
index 0000000..23da2b4
--- /dev/null
+++ b/program.json
@@ -0,0 +1,9 @@
+{
+  "metadata": {
+    "dependency_versions": {
+      "python": "3.11",
+      "dspy": "3.0.4",
+      "cloudpickle": "3.1"
+    }
+  }
+}
\ No newline at end of file
diff --git a/pyproject.toml b/pyproject.toml
new file mode 100644
index 0000000..3ecc562
--- /dev/null
+++ b/pyproject.toml
@@ -0,0 +1,20 @@
+[project]
+name = "second-org-repo"
+dynamic = ["version"]
+description = "Modular Agent Infrastructure Collection, a python framework for managing and sharing DSPy agents"
+authors = [{ name = "Tyrin", email = "tytodd@mit.edu" }, {name = "Farouk", email = "farouk@modaic.dev"}]
+readme = "README.md"
+homepage = "https://github.com/modaic-ai/modaic"
+license = {file = "LICENSE"}
+requires-python = ">=3.10"
+dependencies = ["dspy>=2.6.27", "gitpython>=3.1.45", "opik==1.8.42", "platformdirs>=4.3.8", "tomlkit>=0.13.3"]
+
+[project.urls]
+Homepage = "https://github.com/modaic-ai/modaic"
+Modaic = "https://www.modaic.dev"
+Docs = "https://docs.modaic.dev"
+
+
+[tool.uv.sources]
+modaic = { workspace = true }
+
diff --git a/scratch/push_org.py b/scratch/push_org.py
new file mode 100644
index 0000000..1f79e31
--- /dev/null
+++ b/scratch/push_org.py
@@ -0,0 +1,13 @@
+from modaic import PrecompiledProgram
+
+
+class MyProgram(PrecompiledProgram):
+    def __init__(self, **kwargs):
+        super().__init__(**kwargs)
+
+    def forward(self, **kwargs):
+        return "Hello, world!"
+
+
+program = MyProgram()
+program.push_to_hub("modaic/second-org-repo", with_code=True)
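
Review note: the `auto_classes.json` added in this diff maps `AutoConfig` and `AutoProgram` to dotted import paths such as `scratch.push_org.MyProgram`. The sketch below illustrates one way a mapping of that shape could be resolved into classes using only the standard library. It is not Modaic's actual loader: `resolve_dotted_path` is a hypothetical helper, and the mapping points at standard-library classes (an assumption made so the example runs without `modaic` installed).

```python
import importlib
import json


def resolve_dotted_path(dotted_path: str):
    """Import the module part of `dotted_path` and return the named attribute.

    Hypothetical helper for illustration; not part of the modaic API.
    """
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.Attribute', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Same shape as auto_classes.json, but the values are stand-in
# standard-library classes rather than the paths from the diff.
auto_classes = json.loads(
    '{"AutoConfig": "collections.OrderedDict", "AutoProgram": "json.JSONDecoder"}'
)

# Resolve every entry to the class object it names.
resolved = {name: resolve_dotted_path(path) for name, path in auto_classes.items()}
print(resolved["AutoProgram"].__name__)
```

Under this reading, a path like `scratch.push_org.MyProgram` only resolves if `scratch/push_org.py` is importable from wherever the program is loaded, which may be worth confirming before merging a scratch module path into the manifest.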