Try It: earl-mcgowen.com/prompt-search
GET /api/prompt-search?q=...&k=5
Overview
This project implements a local-first prompt intelligence system that transforms a GitHub repository of system prompts into a searchable, reusable, and agentic AI asset.
This started as a simple experiment:
“What if prompts weren’t just text… but a dataset you could query, rank, and learn from?”
What came out of it is something much more powerful:
- A semantic search engine for prompts
- A Flask API integrated into my AI gateway
- A SvelteKit frontend for exploration
- Fully powered by my local GPU using Ollama
The Dataset: Prompt Engineering in the Wild
The foundation of this system is an open GitHub repository:
system-prompts-and-models-of-ai-tools (GitHub)
This repository has become one of the most widely circulated collections of real-world AI system prompts:
- 120k+ stars and 30k+ forks, making it one of the largest prompt collections on GitHub
- Frequently referenced as a “goldmine” for understanding how LLMs are guided in production
- Used to study real system prompts behind modern AI tools
Across the AI community, repositories like this are often treated as:
A “Rosetta Stone” for understanding how AI systems are designed to behave
What makes this particularly valuable is that these prompts reflect:
- Real production constraints
- Safety rules and guardrails
- Tool usage patterns
- Company-specific prompt design philosophies
The repository includes:
- System prompts from tools like Cursor, Claude, and VSCode agents
- Instruction templates and reusable patterns
- Real-world implementations of AI behavior design
From Repository → Dataset
What stands out is not just the content, but the structure.
Across hundreds of files, clear patterns emerge in how systems:
- Define roles and constraints
- Structure reasoning steps
- Guide tool usage
- Enforce consistency and safety
These are not isolated prompts. They are designed systems of control and interaction.
Treating this repository as a dataset enables:
- Semantic search (by intent, not keywords)
- Cross-tool comparison of prompt strategies
- Pattern extraction across multiple implementations
- Generation of new prompts based on proven structures
This insight led to the development of a semantic search engine and prompt generation system built on top of this dataset.
What Was Built
This system functions as a Prompt Intelligence Platform.
Semantic Search Engine
- Queries prompts by meaning rather than keywords
- Uses cosine similarity over embeddings
- Returns ranked, relevant prompt snippets
This is a form of RAG (Retrieval-Augmented Generation) applied to prompts instead of documents.
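The ranking step can be sketched in pure Python. This is an illustrative sketch, not the system's actual code: the function names are mine, and the toy 2-d vectors stand in for real embedding vectors that would come from an Ollama embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query_vec, index, k=5):
    """Rank indexed prompts by similarity to the query embedding.

    index: list of (prompt_id, embedding) pairs; in the real system the
    embeddings would be produced by an Ollama embedding model.
    """
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in index]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

# Toy example: 2-d stand-ins for real embedding vectors
index = [("cursor-agent", [0.9, 0.1]), ("claude-system", [0.2, 0.8])]
top = search([1.0, 0.0], index, k=1)  # "cursor-agent" ranks first
```

Embedding once at index time and only embedding the query at search time is what keeps lookups fast: each search is a handful of dot products rather than an LLM call.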
Prompt Generation
The system extends beyond retrieval.
It enables prompt synthesis.
Prompt Synthesis Flow
User Query: "build a coding agent prompt"
↓ retrieve top 5 similar prompts
↓ extract patterns
↓ synthesize
Output: A structured, production-ready system prompt
This transforms the system into a Prompt Generator powered by real-world prompt data.
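The synthesis step above can be sketched as a meta-prompt builder. Assume retrieval has already returned the top-k snippets; the function name and the wording of the meta-prompt are illustrative, not the system's actual template.

```python
def build_synthesis_prompt(query, snippets):
    """Assemble a meta-prompt that asks an LLM to synthesize a new
    system prompt from retrieved real-world examples."""
    examples = "\n\n".join(
        f"### Example {i}\n{snippet}" for i, snippet in enumerate(snippets, 1)
    )
    return (
        "You are a prompt engineer. Using the patterns in the examples "
        f"below, write a production-ready system prompt for: {query}\n\n"
        f"{examples}"
    )

# The assembled meta-prompt would then be sent to a local model,
# e.g. via Ollama's /api/generate endpoint.
meta = build_synthesis_prompt(
    "build a coding agent prompt",
    ["You are a coding assistant...", "You are an autonomous agent..."],
)
```

The key design choice is that the LLM never writes a prompt from scratch: it is always conditioned on retrieved examples, so proven structures shape the output.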
Use Cases
This system enables higher-quality prompt development by shifting from intuition-based design to data-driven prompt engineering.
Instead of isolated experimentation, prompts can be analyzed across contexts, compared structurally, and refined into reusable patterns.
It supports:
- Standardization of prompt design across teams
- Creation of reusable prompt templates
- Faster iteration through semantic comparison
- Continuous learning from accumulated prompt data
Over time, this evolves into a learning system, where each new prompt builds on prior knowledge rather than starting from scratch.
Industry Applications
Healthcare
- Clinical decision support prompts
- Medical summarization prompts
- Patient interaction prompts
Legal / Attorneys
- Case summarization prompts
- Contract analysis prompts
- Legal research prompts
What This Project Represents
This project reframes prompt engineering as a data system problem.
Instead of treating prompts as isolated artifacts, they are modeled as structured data with:
- Lineage
- Embeddings
- Queryability
- Measurable relevance
This enables a pipeline similar to modern data systems:
- Raw prompts → embedded representations
- Queries → retrieval + ranking
- Outputs → generated prompts
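The first stage of that pipeline, raw prompts into embedded representations, can be sketched against SQLite. This is a hypothetical schema, not the system's actual one: embeddings are stored as JSON arrays here, whereas an extension such as sqlite-vec could hold native vectors instead.

```python
import json
import sqlite3

def init_db(path=":memory:"):
    """Create a minimal prompt store (illustrative schema)."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS prompts ("
        "id INTEGER PRIMARY KEY, source TEXT, text TEXT, embedding TEXT)"
    )
    return con

def add_prompt(con, source, text, embedding):
    # Embeddings serialized as JSON; the embedding itself would come
    # from an Ollama embedding model at ingest time
    con.execute(
        "INSERT INTO prompts (source, text, embedding) VALUES (?, ?, ?)",
        (source, text, json.dumps(embedding)),
    )
    con.commit()

def load_index(con):
    """Load (id, embedding) pairs for in-memory similarity ranking."""
    rows = con.execute("SELECT id, embedding FROM prompts").fetchall()
    return [(pid, json.loads(emb)) for pid, emb in rows]
```

Keeping the store in a single SQLite file fits the local-first design: the whole index can be versioned, backed up, or rebuilt alongside the repository it was derived from.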
It also introduces a feedback loop:
- New prompts are added
- Performance can be evaluated
- The system continuously improves
What begins as a static repository becomes a living prompt system, capable of evolving over time.
Running Everything Locally
The entire system runs on my home setup:
- Ollama (embeddings + LLMs)
- SQLite vector database
- Flask API gateway
- SvelteKit frontend
- Exposed via ngrok (soon Cloudflare Tunnel)
This gives me:
- Full control
- No API costs
- Private data processing
- Production-like architecture
Final Thoughts
Prompt engineering is often treated as a short-lived, trial-and-error process.
At scale, it becomes something else entirely.
When prompts are structured, stored, and analyzed collectively, they become:
A dataset. A system component. A layer of intelligence.
Patterns emerge. Design principles become visible. Effective strategies can be reused and refined.
The result is a new class of systems: systems that do not merely use prompts.
They **learn from them.**
