141 lines
5.0 KiB
Markdown
141 lines
5.0 KiB
Markdown
|
|
# regspy
|
|
|
|
regspy is a regex pattern generator, you enter some data -> select what you want matched and or not matched -> ??? -> Pattern!
|
|
|
|

|
|
|
|
This project started as me trying to learn dspy, its vibe coded to shit and back but it works and has some accomplishments:
|
|
- Runs on small models with 3B parameter at a minimum, so it should run on anything.
|
|
- It outperforms grex ~~in metrics that were defined by me~~.
|
|
- Learns from what you feed it, it generated a pattern you liked? add it to the training set!
|
|
- No human written prompts or rules or "make sure to NOT explode" bs.
|
|
- Context aware generation, it learns from failed patterns and most importantly WHY it failed.
|
|
- Generates patterns based on a scoring system that ranks patterns by:
|
|
- **matches_all**: Percentage of required items the pattern matches
|
|
- **excludes_all**: Percentage of excluded items the pattern avoids
|
|
- *If no excluded items are selected, this metrics weights are divided equally amongst the others.*
|
|
- **coherence**: How similar extra matches are to target items
|
|
- **generalization**: Use of character classes (\\d, \\w) vs literals
|
|
- **simplicity**: How short patterns are and without the use of branching
|
|
|
|
Is it perfect? hell no, the training set, scoring system, hint generation could be improved upon, so if you want have a go at it i included a CLAUDE.md for you.
|
|
|
|
But if you're a everyday smooth brain like me that needs a simple pattern on the fly because for some reason your brain is physically impossible of remembering that lookaheads exist, regspy should be of some help.
|
|
|
|
## Features
|
|
|
|
- **Visual Text Selection**: Highlight text to create match examples (cyan) or exclusions (red)
|
|
- **LLM-Powered Generation**: Uses local Ollama with qwen2.5-coder:3b for intelligent pattern creation
|
|
- **Training Dataset**: 227+ curated examples with ability to add your own
|
|
- **Pre-compilation**: Optional rule extraction for faster runtime inference
|
|
- **Session Config**: Adjust model, temperature, and scoring weights on the fly
|
|
|
|
## Installation
|
|
|
|
- **AutoHotkey v2.0** - [Download](https://www.autohotkey.com/)
|
|
- **Python Libs**:
|
|
```bash
|
|
pip install dspy grex ollama modaic
|
|
```
|
|
- **Ollama**:
|
|
```bash
|
|
ollama serve
|
|
ollama pull qwen2.5-coder:3b
|
|
```
|
|
- **Run**:
|
|
```bash
|
|
AutoHotkey64.exe regspy.ahk # Or just double click regspy.ahk
|
|
```
|
|
|
|
## Architecture
|
|
|
|
```
|
|
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
|
|
│ AutoHotkey │────▶│ Web Frontend │────▶│ Python │
|
|
│ (Host) │◀────│ (WebView2) │◀────│ (DSPy/LLM) │
|
|
└─────────────────┘ └─────────────────┘ └─────────────────┘
|
|
Window Text Selection Regex Generation
|
|
Management Highlighting Multi-criteria
|
|
IPC Bridge Results Display Scoring
|
|
```
|
|
|
|
## Configuration
|
|
|
|
The Config tab allows session-level adjustments:
|
|
|
|
- **Model**: Ollama model name (default: `qwen2.5-coder:3b`)
|
|
- **Temperature**: LLM creativity (default: 0.4)
|
|
- **Max Attempts**: Refinement iterations (default: 10)
|
|
- **Reward Threshold**: Stop early if score exceeds (default: 0.85)
|
|
- **Scoring Weights**: Adjust the 5 criteria weights
|
|
- **Context Window** (`num_ctx`): Ollama context size (default: 8192). Ollama defaults to 4096 which can truncate prompts with many training examples. If you see "truncating input prompt" warnings in Ollama logs, bump this up. Uses ~200MB extra VRAM per 4K increase on 3B models.
|
|
|
|
## Modaic Hub
|
|
|
|
RegSpy uses Modaic's `PrecompiledProgram` so you can publish and load programs with bundled prompts, config, and optionally code.
|
|
|
|
### Push to Modaic Hub
|
|
|
|
Use the helper script:
|
|
|
|
```bash
|
|
uv run push.py your-username/regspy --with-code
|
|
```
|
|
|
|
Or directly in Python:
|
|
|
|
```python
|
|
from regspy import RegexConfig, RegexProgram
|
|
|
|
program = RegexProgram(RegexConfig())
|
|
program.push_to_hub("me/regspy", with_code=True)
|
|
```
|
|
|
|
### DSPy Programs: Auto Classes
|
|
|
|
Load a program:
|
|
|
|
```python
|
|
from modaic import AutoProgram
|
|
|
|
program = AutoProgram.from_precompiled("raet/regspy")
|
|
```
|
|
|
|
Load a config:
|
|
|
|
```python
|
|
from modaic import AutoConfig
|
|
|
|
config = AutoConfig.from_precompiled("raet/regspy")
|
|
```
|
|
|
|
Override a config:
|
|
|
|
```python
|
|
from modaic import AutoProgram
|
|
|
|
program = AutoProgram.from_precompiled(
|
|
"raet/regspy",
|
|
config={"use_cot": False, "temperature": 0.2},
|
|
)
|
|
```
|
|
|
|
Specify runtime parameters:
|
|
|
|
```python
|
|
from modaic import AutoProgram
|
|
|
|
program = AutoProgram.from_precompiled("raet/regspy", timeout=60)
|
|
```
|
|
|
|
Load a specific revision:
|
|
|
|
```python
|
|
from modaic import AutoProgram
|
|
|
|
program = AutoProgram.from_precompiled("raet/regspy", rev="dev")
|
|
program = AutoProgram.from_precompiled("raet/regspy", rev="v1.0")
|
|
program = AutoProgram.from_precompiled("raet/regspy", rev="918ad95")
|
|
```
|