init

2026-01-20 20:06:28 -08:00
parent 36d43d820e
commit 14dfbaa6da
8 changed files with 1572 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -1,2 +1,93 @@
 # regspy
 regspy is a regex pattern generator: you enter some data -> select what you want matched and/or not matched -> ??? -> Pattern!
 ![alt text](imgs/demo.gif)
 This project started as me trying to learn dspy. It's vibe coded to shit and back, but it works and has some accomplishments:
 - Runs on small models with as few as 3B parameters, so it should run on just about anything.
 - It outperforms grex ~~in metrics that were defined by me~~.
 - Learns from what you feed it: if it generated a pattern you liked, add it to the training set!
 - No human written prompts or rules or "make sure to NOT explode" bs.
 - Context-aware generation: it learns from failed patterns and, most importantly, WHY they failed.
 - Generates patterns based on a scoring system that ranks patterns by:
 	 - **matches_all**: Percentage of required items the pattern matches
 	 - **excludes_all**: Percentage of excluded items the pattern avoids
 		 - *If no excluded items are selected, this metric's weight is divided equally among the others.*
 	 - **coherence**: How similar extra matches are to target items
 	 - **generalization**: Use of character classes (\\d, \\w) vs literals
 	 - **simplicity**: How short the pattern is and whether it avoids branching
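
The way these criteria combine can be sketched in a few lines of Python. This is only an illustration, not the code in `regexgen.py`: the helper names (`effective_weights`, `match_fraction`, `total_score`) are hypothetical, the weights are the defaults from `config.json`, and both the use of `fullmatch` and the reading of "divided equally" as an even split across the four remaining criteria are assumptions.

```python
import re

# Default weights as they appear in config.json.
DEFAULT_WEIGHTS = {
    "matches_all": 0.35,
    "excludes_all": 0.25,
    "coherence": 0.15,
    "generalization": 0.15,
    "simplicity": 0.10,
}


def effective_weights(weights, has_excludes):
    """Return the weights to use; spread excludes_all's share evenly when unused."""
    if has_excludes:
        return dict(weights)
    others = [name for name in weights if name != "excludes_all"]
    share = weights["excludes_all"] / len(others)
    return {name: weights[name] + share for name in others}


def match_fraction(pattern, items):
    """Fraction of items the pattern fully matches (assumed matching rule)."""
    if not items:
        return 1.0
    compiled = re.compile(pattern)
    return sum(1 for item in items if compiled.fullmatch(item)) / len(items)


def total_score(scores, weights, has_excludes):
    """Weighted sum of per-criterion scores, each assumed to lie in [0, 1]."""
    active = effective_weights(weights, has_excludes)
    return sum(active[name] * scores[name] for name in active)
```

With the default weights and no excluded items, each of the other four criteria picks up an extra 0.0625, so `matches_all` would count for 0.4125 and the weights still sum to 1.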
 Is it perfect? Hell no. The training set, scoring system, and hint generation could all be improved, so if you want to have a go at it, I included a CLAUDE.md for you.
 But if you're an everyday smooth brain like me who needs a simple pattern on the fly because for some reason your brain is physically incapable of remembering that lookaheads exist, regspy should be of some help.
 ## Features
 - **Visual Text Selection**: Highlight text to create match examples (cyan) or exclusions (red)
 - **LLM-Powered Generation**: Uses local Ollama with qwen2.5-coder:3b for intelligent pattern creation
 - **Training Dataset**: 227+ curated examples with ability to add your own
 - **Pre-compilation**: Optional rule extraction for faster runtime inference
 - **Session Config**: Adjust model, temperature, and scoring weights on the fly
 ## Installation
 - **AutoHotkey v2.0** - [Download](https://www.autohotkey.com/)
 - **Python Libs**:
  ```bash
  pip install dspy grex ollama
  ```
 - **Ollama**:
  ```bash
  ollama serve
  ollama pull qwen2.5-coder:3b
  ```
 - **Run**:
  ```bash
  AutoHotkey64.exe regspy.ahk # Or just double click regspy.ahk
  ```
 ### CLI flags
 ```bash
 # Run test suite
 python regexgen.py --test
 # Pre-compile for faster runtime
 python regexgen.py --compile
 # Generate regex from JSON input
 python regexgen.py input.json output.json
 # With custom config
 python regexgen.py input.json output.json --config config.json
 # Dataset management
 python regexgen.py --list-dataset output.json
 python regexgen.py --add-example example.json
 python regexgen.py --delete-example <index>
 ```
 ## Architecture
 ```
 ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
 │   AutoHotkey    │────▶│  Web Frontend   │────▶│     Python      │
 │   (Host)        │◀────│  (WebView2)     │◀────│   (DSPy/LLM)    │
 └─────────────────┘     └─────────────────┘     └─────────────────┘
     Window              Text Selection          Regex Generation
     Management          Highlighting            Multi-criteria
     IPC Bridge          Results Display         Scoring
 ```
 ## Configuration
 The Config tab allows session-level adjustments:
 - **Model**: Ollama model name (default: `qwen2.5-coder:3b`)
 - **Temperature**: LLM creativity (default: 0.4)
 - **Max Attempts**: Refinement iterations (default: 10)
 - **Reward Threshold**: Stop early if score exceeds (default: 0.85)
 - **Scoring Weights**: Adjust the 5 criteria weights
 - **Context Window** (`num_ctx`): Ollama context size (default: 8192). Ollama defaults to 4096 which can truncate prompts with many training examples. If you see "truncating input prompt" warnings in Ollama logs, bump this up. Uses ~200MB extra VRAM per 4K increase on 3B models.
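
How **Max Attempts** and **Reward Threshold** interact can be sketched as a simple loop. This is a sketch under assumptions, not regspy's actual implementation: `refine`, `generate`, and `score` are hypothetical names, the feedback string format is invented for illustration, and treating "exceeds" as "reaches or exceeds" is a guess.

```python
from typing import Callable, Optional, Tuple


def refine(
    generate: Callable[[Optional[str]], str],
    score: Callable[[str], float],
    max_attempts: int = 10,
    reward_threshold: float = 0.85,
) -> Tuple[Optional[str], float]:
    """Try up to max_attempts candidates; stop early once one scores high enough."""
    best_pattern, best_score = None, float("-inf")
    feedback = None  # nothing to learn from on the first attempt
    for _ in range(max_attempts):
        candidate = generate(feedback)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_pattern, best_score = candidate, candidate_score
        # Assumption: reaching the threshold is enough to stop.
        if candidate_score >= reward_threshold:
            break
        # Hypothetical feedback format handed to the next attempt.
        feedback = f"{candidate!r} scored {candidate_score:.2f}"
    return best_pattern, best_score
```

The loop keeps the best candidate seen so far, so running out of attempts still returns something usable rather than the last (possibly worse) pattern.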
--- a/auto_classes.json
+++ b/auto_classes.json
@@ -0,0 +1,4 @@
 {
  "AutoConfig": "regspy.cli.RegexConfig",
  "AutoProgram": "regspy.cli.RegexProgram"
 }
--- a/config.json
+++ b/config.json
@@ -0,0 +1,24 @@
 {
  "model": "qwen2.5-coder:3b",
  "ollama_url": "http://localhost:11434",
  "temperature": 0.4,
  "num_ctx": 8192,
  "enable_cache": false,
  "max_attempts": 10,
  "reward_threshold": 0.85,
  "fail_count": null,
  "use_cot": true,
  "dataset_file": "/Users/fadel/Desktop/dev/regspy/dspy/regex-dspy-train.json",
  "compiled_program_path": "/Users/fadel/Desktop/dev/regspy/dspy/regex_compiled.json",
  "compile_threads": 8,
  "compile_candidates": 16,
  "compile_num_rules": 5,
  "debug": true,
  "weights": {
    "matches_all": 0.35,
    "excludes_all": 0.25,
    "coherence": 0.15,
    "generalization": 0.15,
    "simplicity": 0.1
  }
 }
--- a/program.json
+++ b/program.json
@@ -0,0 +1,44 @@
 {
  "program.predict": {
    "traces": [],
    "train": [],
    "demos": [],
    "signature": {
      "instructions": "Generate a regex pattern from examples.",
      "fields": [
        {
          "prefix": "Text:",
          "description": "The full text to search within"
        },
        {
          "prefix": "Match Items:",
          "description": "Strings the pattern MUST match"
        },
        {
          "prefix": "Exclude Items:",
          "description": "Strings the pattern must NOT match"
        },
        {
          "prefix": "Pattern Hints:",
          "description": "Analysis hints about the match items"
        },
        {
          "prefix": "Reasoning: Let's think step by step in order to",
          "description": "${reasoning}"
        },
        {
          "prefix": "Pattern:",
          "description": "Regex pattern"
        }
      ]
    },
    "lm": null
  },
  "metadata": {
    "dependency_versions": {
      "python": "3.13",
      "dspy": "3.1.2",
      "cloudpickle": "3.1"
    }
  }
 }
--- a/push.py
+++ b/push.py
@@ -0,0 +1,52 @@
 #!/usr/bin/env python3
 """
 Push RegSpy program to Modaic Hub.
 Usage:
  uv run push.py your-username/regspy
  uv run push.py your-username/regspy --with-code --commit-message "Initial"
 """
 from __future__ import annotations
 import argparse
 from regspy import RegexConfig, RegexProgram
 def create_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Push RegSpy PrecompiledProgram to Modaic Hub",
    )
    parser.add_argument("repo", help="Hub repo in the form username/name")
    parser.add_argument("--with-code", action="store_true", help="Bundle code with the push")
    parser.add_argument("--commit-message", help="Optional commit message")
    parser.add_argument("--branch", help="Optional branch name")
    parser.add_argument("--tag", help="Optional tag name")
    parser.add_argument("--private", action="store_true", help="Push to a private repo")
    return parser
 def main() -> None:
    parser = create_parser()
    args = parser.parse_args()
    program = RegexProgram(RegexConfig())
    push_kwargs: dict[str, object] = {
        "with_code": args.with_code,
        "private": args.private,
    }
    if args.commit_message:
        push_kwargs["commit_message"] = args.commit_message
    if args.branch:
        push_kwargs["branch"] = args.branch
    if args.tag:
        push_kwargs["tag"] = args.tag
    program.push_to_hub(args.repo, **push_kwargs)
    print(f"Pushed to {args.repo}")
 if __name__ == "__main__":
    main()
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -0,0 +1,7 @@
 [project]
 name = "regspy"
 version = "0.1.0"
 description = "Add your description here"
 readme = "README.md"
 requires-python = ">=3.13"
 dependencies = ["dspy>=3.1.2", "grex>=1.0.2", "modaic>=0.10.3"]
--- a/regspy/__init__.py
+++ b/regspy/__init__.py
@@ -0,0 +1,10 @@
 """RegSpy package exports."""
 from .cli import (
    RegexConfig,
    RegexProgram,
    compile_and_save,
    generate_regex,
 )
 __all__ = ["RegexConfig", "RegexProgram", "compile_and_save", "generate_regex"]
--- a/regspy/cli.py
+++ b/regspy/cli.py