Zorora: Deep Research Engine
Zorora is a locally deployed deep research engine that searches academic databases, web sources, and newsroom articles, then synthesizes the findings with credibility scoring and citation graphs. Built for macOS (Apple Silicon) with a minimal RAM footprint, it runs directly on your machine: all content, outputs, and chats are stored locally rather than in the cloud, giving you complete control and privacy.
Zorora offers both a Web UI and a Terminal REPL interface.
What is Zorora?
Zorora is more than a basic research tool; it is a deep research engine that:
- Searches EVERYTHING - Academic databases (7 sources) + web search + Asoba newsroom
- Follows citation trails - Multi-hop research that explores cited papers
- Cross-references claims - Groups similar claims and counts agreement across sources
- Scores credibility - Transparent rules-based scoring of source authority
- Builds citation graphs - Visualizes relationships between sources
- Synthesizes with confidence - Generates comprehensive answers with citation levels
Core Value Proposition
Zorora provides:
- Complete Privacy - All processing and storage on your machine
- Local-First Architecture - Zero cloud dependencies (except source fetching)
- RAM Efficiency - Runs on MacBook Air M3 with 4B model (4-6 GB RAM)
- Dual Interfaces - Terminal REPL for engineers, Web UI for non-engineers
- Deterministic Reliability - Pattern-based routing (no LLM orchestration failures)
- Multi-Provider Support - Configure models from HuggingFace, OpenAI, and Anthropic APIs
Core Features
Deep Research Capabilities
6-Phase Research Pipeline:
- Parallel Source Aggregation - Searches academic (7 sources), web (Brave + DDG), and newsroom simultaneously
- Citation Following - Multi-hop exploration of cited papers (configurable depth: 1-3)
- Cross-Referencing - Groups claims by similarity and counts agreement
- Credibility Scoring - Rules-based scoring of source authority (academic journals, predatory publishers, retractions; see the sketch after this list)
- Citation Graph Building - Constructs directed graphs showing source relationships
- Synthesis - Generates comprehensive answers with confidence levels and citations
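Phase 4's scoring is deliberately transparent. A minimal sketch of rules-based credibility scoring, where the rules and weights are illustrative rather than Zorora's actual scoring table:

```python
# Illustrative rules and weights only -- not Zorora's actual scoring table.
PREDATORY_PUBLISHERS = {"example-predatory-press.com"}

def credibility_score(source: dict) -> float:
    score = 0.5                                       # neutral baseline
    if source.get("peer_reviewed"):
        score += 0.3                                  # academic journals rank higher
    if source.get("domain") in PREDATORY_PUBLISHERS:
        score -= 0.4                                  # predatory publishers are penalized
    if source.get("retracted"):
        return 0.0                                    # retractions zero out credibility
    return max(0.0, min(1.0, score))

print(credibility_score({"peer_reviewed": True, "domain": "nature.com"}))  # high (~0.8)
```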
Research Depth Levels:
- Quick - Initial sources only (skips citation following, depth=1, ~25-35s)
- Balanced - Adds citation following (1 hop, depth=2, ~35-50s) - Coming soon
- Thorough - Multi-hop citation exploration (up to 3 levels deep, depth=3, ~50-70s) - Coming soon
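The depth level maps directly to the `depth` argument of the Python API (the same call shown in Quick Start); until Balanced and Thorough ship, `depth=1` is the supported value:

```python
from engine.research_engine import ResearchEngine

engine = ResearchEngine()

# depth=1 -> Quick (initial sources only, no citation following).
# depth=2 / depth=3 will map to Balanced and Thorough once released.
state = engine.deep_research("What drives lithium-ion cathode degradation?", depth=1)
print(state.synthesis)
```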
Additional Features
- Research persistence - Save/load findings with metadata
- Code generation - Dedicated Codestral model for coding tasks
- Multi-step development - `/develop` workflow: explore → plan → approve → execute → lint
- Slash commands - Force workflows: `/search`, `/ask`, `/code`, `/develop`, `/image`, `/vision` (see the REPL example after this list)
- Deterministic routing - Pattern-based decision tree (no LLM routing failures)
- Hybrid deployment - Local 4B orchestrator + remote 32B specialists
- RAM-efficient - Runs on MacBook Air M3 with 4B model
- Dual interfaces - Terminal REPL for engineers, Web UI for non-engineers
- Multi-provider support - Configure models from HuggingFace, OpenAI, and Anthropic APIs
- Visual settings management - Web UI settings modal for easy configuration
- Vision and image generation - Dedicated models for image analysis and text-to-image generation
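For example, in the Terminal REPL a slash command forces the matching workflow regardless of how the query is phrased (the queries below are illustrative):

```
[1] ⚙ > /search room-temperature superconductivity replication attempts
[2] ⚙ > /code write a BibTeX parser in Python
[3] ⚙ > /develop add CSV export for saved findings
```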
Architecture
Zorora uses deterministic routing with pattern matching instead of LLM-based orchestration. This design choice enables reliable operation with small 4B models while maintaining RAM efficiency.
Design Philosophy
- Deterministic over clever - Code-controlled workflows, not LLM orchestration
- Research-first - Optimized for multi-source synthesis and citation management
- RAM-efficient - Runs on MacBook Air with 4B orchestrator model
- Persistent knowledge - Save and retrieve research findings locally
- Simple and reliable - Hardcoded pipelines that just work
Architecture Diagram
```
User Query / Slash Command / Web UI Request
                    ↓
Pattern Matching (simplified_router.py) / Flask Routes (ui/web/app.py)
                    ↓
  ├─→ DEEP RESEARCH WORKFLOW (6-phase pipeline)
  │     ├─► Phase 1: Parallel Source Aggregation
  │     │     ├─► Academic (7 sources: Scholar, PubMed, CORE, arXiv, bioRxiv, medRxiv, PMC)
  │     │     ├─► Web (Brave Search + DuckDuckGo)
  │     │     └─► Newsroom (Asoba API)
  │     ├─► Phase 2: Citation Following (configurable depth: 1-3)
  │     ├─► Phase 3: Cross-Referencing (groups claims by similarity)
  │     ├─► Phase 4: Credibility Scoring (rules-based)
  │     ├─► Phase 5: Citation Graph Building
  │     └─► Phase 6: Synthesis (Reasoning Model)
  ├─→ CODE WORKFLOW (Codestral specialist)
  ├─→ DEVELOPMENT WORKFLOW (/develop - multi-step)
  ├─→ FILE OPERATIONS (save/load/list)
  ├─→ IMAGE WORKFLOWS (generate/analyze)
  └─→ SIMPLE Q&A (/ask - direct model)
```
Key Principles
- No LLM-based orchestration - Patterns determine routing, code controls execution
- Hardcoded workflows - Fixed pipelines for predictable results
- Persistent research - Everything saved to `~/.zorora/research/` with metadata
- Specialist models - Codestral for code, reasoning model for synthesis, vision for images
- Multi-provider support - Configure models from LM Studio (local), HuggingFace, OpenAI, and Anthropic APIs
- Visual configuration - Web UI settings modal for easy model/endpoint management
- Hybrid inference - Mix local models (4B orchestrator) with remote HuggingFace endpoints (32B Codestral)
Core Components
1. Simplified Router (simplified_router.py)
Uses pattern matching to route queries to workflows. No LLM involved - pure pattern matching ensures consistent, fast routing (0ms decision time).
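A minimal sketch of the approach; the patterns and workflow names below are illustrative, not the actual contents of `simplified_router.py`:

```python
import re

# Illustrative routing table; the real rules live in simplified_router.py.
ROUTES = [
    (re.compile(r"^/search\b"), "deep_research"),
    (re.compile(r"^/ask\b"), "simple_qa"),
    (re.compile(r"^/code\b"), "code"),
    (re.compile(r"^/develop\b"), "development"),
    (re.compile(r"^/(image|vision)\b"), "image"),
]

def route(query: str) -> str:
    """Pure pattern matching -- no LLM call, so the decision is instant."""
    for pattern, workflow in ROUTES:
        if pattern.match(query.strip()):
            return workflow
    return "deep_research"  # free-text queries default to deep research

assert route("/code quicksort in Rust") == "code"
```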
2. Research Engine (engine/research_engine.py)
High-level interface for deep research:
- Starting research
- Loading past research
- Searching research history
- Executing deep research workflow
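In code, these operations look roughly like this. `deep_research` matches the Quick Start example, while the history-related names (`load_research`, `search_history`, `research_id`) are assumptions used only to illustrate the interface:

```python
from engine.research_engine import ResearchEngine

engine = ResearchEngine()

# Start new research (same call as the Quick Start API example).
state = engine.deep_research("perovskite solar cell stability", depth=1)

# Load and search past research. NOTE: these method/attribute names are
# assumptions for illustration; check the API reference for the real ones.
past = engine.load_research(state.research_id)
hits = engine.search_history("perovskite")
```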
3. Deep Research Workflow (workflows/deep_research/)
Hardcoded pipeline for multi-source research:
- `aggregator.py` - Source aggregation
- `credibility.py` - Credibility scoring
- `synthesizer.py` - Synthesis generation
- `workflow.py` - Workflow orchestrator
4. Storage Layer (engine/storage.py)
Local-first storage:
- SQLite database (`~/.zorora/zorora.db`) for fast indexed queries
- JSON files (`~/.zorora/research/findings/`) for full research state
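Because the index is plain SQLite, it can be inspected directly with the standard library; the table names printed below follow the storage diagram in Local-First Design:

```python
import sqlite3
from pathlib import Path

# The index is ordinary SQLite, so standard tooling works against it.
db = sqlite3.connect(Path.home() / ".zorora" / "zorora.db")
for (name,) in db.execute("SELECT name FROM sqlite_master WHERE type='table'"):
    print(name)  # expect research_findings, sources, citations
db.close()
```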
5. Web UI (ui/web/app.py)
Flask-based web interface:
- Research query interface with depth selection
- Settings modal for configuration
- Research results display with synthesis, sources, and credibility scores
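A minimal sketch of how a Flask layer like this wires the engine to HTTP; the endpoint path and payload shape here are assumptions, not the actual `ui/web/app.py`:

```python
from flask import Flask, jsonify, request

from engine.research_engine import ResearchEngine

app = Flask(__name__)
engine = ResearchEngine()

# Hypothetical endpoint; the real routes live in ui/web/app.py.
@app.post("/api/research")
def research():
    body = request.get_json()
    state = engine.deep_research(body["query"], depth=int(body.get("depth", 1)))
    return jsonify({"synthesis": state.synthesis})

if __name__ == "__main__":
    app.run(port=5000)  # `zorora web` serves at http://localhost:5000
```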
Local-First Design
Storage Architecture
```
Research Request
        ↓
ResearchEngine.deep_research()
        ↓
DeepResearchWorkflow.execute()
        ↓
LocalStorage.save_research()
  ├─► SQLite Index (~/.zorora/zorora.db)
  │     ├─► research_findings (metadata)
  │     ├─► sources (indexed)
  │     └─► citations (graph)
  └─► JSON Files (~/.zorora/research/findings/<id>.json)
        └─► Full research state (sources, findings, synthesis)
```
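Because findings are plain JSON on disk, past research can also be read without going through the engine at all; the `synthesis` field below follows the diagram above, though exact field names may differ:

```python
import json
from pathlib import Path

findings_dir = Path.home() / ".zorora" / "research" / "findings"
for path in findings_dir.glob("*.json"):
    state = json.loads(path.read_text())
    # "synthesis" follows the storage diagram; exact field names may differ.
    print(path.stem, str(state.get("synthesis", ""))[:80])
```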
Privacy & Control
- All processing on your machine - No cloud computation
- Local storage only - Research data never leaves your machine
- Zero cloud dependencies - Except source fetching (academic databases, web search)
- Complete control - You own all data, all outputs, all chats
Performance
- Routing decision: 0ms (pattern matching, no LLM)
- Research workflow: Varies by depth
- Quick (depth=1): ~25-35s
- Balanced (depth=2): ~35-50s - Coming soon
- Thorough (depth=3): ~50-70s - Coming soon
- Storage queries: <100ms (SQLite indexed)
- Code generation: 10-90 seconds (local: 10-30s, HF 32B: 60-90s)
- RAM usage: 4-6 GB (4B orchestrator model)
Why This Architecture?
Problem: 4B Models Can’t Orchestrate
Traditional multi-model orchestration requires the LLM to:
- Generate valid JSON plans
- Make routing decisions
- Handle multi-step iteration
- Recover from tool failures
4B models fail at all of these. They can’t reliably generate JSON, struggle with function calling, and get stuck in loops.
Solution: Code Handles Complexity
Instead of asking the 4B model to be smart, we made the code smart:
- Pattern matching routes queries (no LLM decision)
- Hardcoded workflows execute pipelines (no LLM planning)
- Fixed iteration count (no LLM loop detection)
- Deterministic error handling (no LLM recovery)
Result: 100% reliability with 4B models, 1/3 the RAM usage of 8B orchestrators, complete privacy with local storage.
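Concretely, "code handles complexity" means the pipeline is an ordinary function: order, step count, and failure handling are fixed in code, and the model is called only for synthesis. A skeleton with stubbed phases (not Zorora's actual `workflow.py`):

```python
from typing import Any, Callable

# Stubbed phases standing in for the workflows/deep_research/ modules
# (aggregator.py, credibility.py, synthesizer.py); bodies are placeholders.
def aggregate(state: dict) -> dict:
    state["sources"] = ["src-1", "src-2"]
    return state

def cross_reference(state: dict) -> dict:
    state["claims"] = {}
    return state

def score_credibility(state: dict) -> dict:
    state["scores"] = {}
    return state

def synthesize(state: dict) -> dict:
    state["synthesis"] = "..."  # the only step that would call an LLM
    return state

PIPELINE: list[Callable[[dict], dict]] = [
    aggregate, cross_reference, score_credibility, synthesize,
]

def run(query: str) -> dict[str, Any]:
    """Fixed order, fixed step count, deterministic error handling --
    nothing here asks the model to plan, route, or recover."""
    state: dict[str, Any] = {"query": query}
    for step in PIPELINE:
        state = step(state)
    return state
```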
Trade-offs
What we lost:
- Flexibility for complex multi-tool queries
- LLM creativity in tool selection
- Adaptive workflows based on results
What we gained:
- 100% routing reliability (pattern matching never fails)
- Predictable behavior (same query = same workflow)
- RAM efficiency (4B model = 4-6 GB vs 8B = 12-16 GB)
- Simple debugging (no “why did it choose that tool?”)
- Fast responses (no LLM routing overhead)
Quick Start
Installation
Install from GitHub:

```
pip install git+https://github.com/AsobaCloud/zorora.git
```
Run Your First Query
Terminal:
```
zorora
[1] ⚙ > What are the latest developments in large language model architectures?
```
Web UI:
```
zorora web
# Opens at http://localhost:5000
```
API:
```python
from engine.research_engine import ResearchEngine

engine = ResearchEngine()
state = engine.deep_research("Your research question", depth=1)
print(state.synthesis)
```
Next Steps
- Getting Started - Installation and configuration guide
- Guides - Comprehensive guides for all features
- API Reference - Programmatic access documentation
- Technical Concepts - Deep dive into architecture
- Use Cases - Real-world examples
See Also
- Architecture Documentation - Detailed architecture explanation
- Research Pipeline - How the 6-phase pipeline works
- Storage Design - Local-first storage architecture
- Routing Mechanism - Deterministic pattern matching