Asoba Zorora Documentation

Research Pipeline

How Zorora’s 6-phase research pipeline works.

Overview

Zorora’s deep research workflow executes a 6-phase pipeline that searches across academic databases, web sources, and newsroom articles, then synthesizes findings with credibility scoring and citation graphs.

Pipeline Phases

Phase 1: Parallel Source Aggregation

What happens: The original query is dispatched simultaneously to academic databases, web search engines (Brave and DuckDuckGo), and the Asoba newsroom API.

Implementation:

import asyncio

# All three source categories are searched concurrently
academic_sources, web_sources, newsroom_sources = await asyncio.gather(
    academic_search(query),    # 7 sources, themselves searched in parallel
    web_search(query),         # Brave + DDG
    newsroom_search(query),    # Asoba API
)

Output: Raw sources from all three categories

Performance: ~8 seconds (parallel execution)

Phase 2: Citation Following

What happens: Citations are extracted from the initial sources, and the cited papers are fetched and added to the source set, one layer per depth level.

Depth Levels: Depth 1 uses only the initial sources; each additional level follows one more layer of citations.

Implementation:

# Follow one layer of citations, then recurse through the remaining depth
if depth > 1:
    cited_papers = extract_citations(initial_sources)                # citation identifiers
    cited_sources = fetch_cited_papers(cited_papers, depth=depth - 1)
    sources.extend(cited_sources)

Output: Extended source set with citation relationships

Performance: Adds ~10-20 seconds per depth level
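A sketch of how the recursion could bottom out, reusing the names from the snippet above; fetch_paper is a hypothetical helper and the body is illustrative, not Zorora's actual implementation:

def fetch_cited_papers(cited_papers, depth):
    """Fetch each cited paper, then follow its citations while depth remains."""
    fetched = [fetch_paper(paper_id) for paper_id in cited_papers]  # fetch_paper: hypothetical
    if depth > 1:
        deeper = extract_citations(fetched)
        fetched.extend(fetch_cited_papers(deeper, depth - 1))
    return fetched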

Phase 3: Cross-Referencing

What happens: Discrete claims are extracted from all sources, grouped by similarity, and counted so that each claim carries the number of sources agreeing with it.

Implementation:

claims = extract_claims(sources)                     # one entry per factual claim
grouped_claims = group_by_similarity(claims)         # similar claims merge into groups
agreement_counts = count_agreement(grouped_claims)   # sources agreeing on each claim

Output: Grouped claims with agreement counts

Performance: ~2 seconds
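How similarity grouping is implemented is not specified here; one minimal, purely illustrative sketch uses standard-library string similarity (a real implementation might use embeddings instead):

from difflib import SequenceMatcher

def group_by_similarity(claims, threshold=0.8):
    """Greedily assign each claim to the first existing group it resembles."""
    groups = []
    for claim in claims:
        for group in groups:
            if SequenceMatcher(None, claim.lower(), group[0].lower()).ratio() >= threshold:
                group.append(claim)
                break
        else:
            groups.append([claim])   # no match: start a new group
    return groups

def count_agreement(groups):
    """Agreement count = number of claims (one per source) in each group."""
    return {group[0]: len(group) for group in groups}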

Phase 4: Credibility Scoring

What happens: Every source is assigned a numeric credibility score and a category derived from it.

Scoring Rules: Domain-based scoring, citation modifiers, and cross-reference agreement combine into a single score (see Credibility Scoring under Implementation Details).

Implementation:

for source in sources:
    score = calculate_credibility(source)              # numeric credibility score
    source.credibility_score = score
    source.credibility_category = categorize(score)    # score bucketed into a category

Output: Sources with credibility scores and categories

Performance: ~2 seconds

Phase 5: Citation Graph Building

What happens: Citation relationships among the collected sources are assembled into a graph, from which the key (most-cited) papers are identified.

Implementation:

graph = build_citation_graph(sources)      # nodes are papers, edges are citations
key_papers = identify_key_papers(graph)    # the most-cited papers in the set

Output: Citation graph structure

Performance: ~1 second
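A sketch of what build_citation_graph and identify_key_papers could look like, assuming networkx and source objects with id, title, and citations attributes (all assumptions; the graph library is not specified here):

import networkx as nx

def build_citation_graph(sources):
    """Directed graph: an edge A -> B means paper A cites paper B."""
    graph = nx.DiGraph()
    for source in sources:
        graph.add_node(source.id, title=source.title)
        for cited_id in source.citations:
            graph.add_edge(source.id, cited_id)
    return graph

def identify_key_papers(graph, top_n=5):
    """Key papers = the most-cited nodes within the result set."""
    return sorted(graph.nodes, key=graph.in_degree, reverse=True)[:top_n]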

Phase 6: Synthesis

What happens: All source content is assembled into a single prompt and passed to the local reasoning model, which writes the final synthesis with inline source tags.

Implementation:

prompt = f"""
SOURCES:
[Academic]: {academic_content}
[Web]: {web_content}
[Newsroom]: {newsroom_content}

QUESTION: {query}

Synthesize findings from ALL sources above.
Cite sources using [Academic], [Web], or [Newsroom] tags.
"""
synthesis = reasoning_model.generate(prompt)

Output: Final synthesis with citations

Performance: ~15-25 seconds (local reasoning model)

Pipeline Execution

Complete Flow

Query
  ↓
Phase 1: Parallel Source Aggregation (~8s)
  ├─► Academic (7 sources)
  ├─► Web (Brave + DDG)
  └─► Newsroom
  ↓
Phase 2: Citation Following (~10-20s, if depth > 1)
  ↓
Phase 3: Cross-Referencing (~2s)
  ↓
Phase 4: Credibility Scoring (~2s)
  ↓
Phase 5: Citation Graph Building (~1s)
  ↓
Phase 6: Synthesis (~15-25s)
  ↓
Result (with citations and confidence levels)
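Stitching the per-phase snippets together, a condensed and illustrative orchestration function might look like the following; build_prompt is a hypothetical helper that assembles the Phase 6 prompt:

import asyncio

async def deep_research(query, depth=1):
    # Phase 1: parallel source aggregation
    academic, web, newsroom = await asyncio.gather(
        academic_search(query), web_search(query), newsroom_search(query)
    )
    sources = academic + web + newsroom
    # Phase 2: citation following (only when depth > 1)
    if depth > 1:
        cited_papers = extract_citations(sources)
        sources.extend(fetch_cited_papers(cited_papers, depth=depth - 1))
    # Phase 3: cross-referencing
    grouped_claims = group_by_similarity(extract_claims(sources))
    # Phase 4: credibility scoring
    for source in sources:
        source.credibility_score = calculate_credibility(source)
        source.credibility_category = categorize(source.credibility_score)
    # Phase 5: citation graph building
    graph = build_citation_graph(sources)
    key_papers = identify_key_papers(graph)
    # Phase 6: synthesis
    prompt = build_prompt(query, sources, grouped_claims, key_papers)  # hypothetical helper
    return reasoning_model.generate(prompt)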

Total Time by Depth

Approximate totals, obtained by summing the per-phase timings above:

Depth 1: ~28-38 seconds (no citation following)
Depth 2: ~38-58 seconds (one extra citation layer)
Depth 3: ~48-78 seconds (two extra citation layers)

Data Flow

Source Aggregation

Query
  ↓
┌─────────────────────────────────────┐
│ Parallel Source Aggregation         │
├─────────────────────────────────────┤
│ Academic Search (7 sources)         │
│ Web Search (Brave + DDG)            │
│ Newsroom Search (Asoba API)         │
└─────────────────────────────────────┘
  ↓
Raw Sources (academic, web, newsroom)

Processing Pipeline

Raw Sources
  ↓
Citation Following (if depth > 1)
  ↓
Extended Sources
  ↓
Cross-Referencing
  ↓
Grouped Claims
  ↓
Credibility Scoring
  ↓
Scored Sources
  ↓
Citation Graph Building
  ↓
Research Graph
  ↓
Synthesis
  ↓
Final Result

Implementation Details

Academic Search

Sources: Google Scholar, PubMed, CORE, arXiv, bioRxiv, medRxiv, and PMC, queried in parallel as shown below.

Parallel Execution:

results = await asyncio.gather(
    scholar_search(query),
    pubmed_search(query),
    core_search(query),
    arxiv_search(query),
    biorxiv_search(query),
    medrxiv_search(query),
    pmc_search(query)
)

Web Search

Primary: Brave Search API

Fallback: DuckDuckGo

Newsroom Search

Source: Asoba API
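A sketch of the primary/fallback arrangement for web search; brave_search and ddg_search are hypothetical names, since this document only states that Brave is primary and DuckDuckGo the fallback:

async def web_search(query):
    """Try the Brave Search API first; fall back to DuckDuckGo on failure."""
    try:
        return await brave_search(query)
    except Exception:    # rate limit, outage, missing API key, ...
        return await ddg_search(query)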

Credibility Scoring

Domain-Based Scoring:

Citation Modifiers:

Cross-Reference Agreement:
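Three scoring factors are named above, but not their weights or thresholds. The sketch below is purely illustrative: every constant, attribute name, and category boundary is an assumption, not Zorora's actual rule set:

# Assumed base scores by domain type, for illustration only
DOMAIN_SCORES = {"journal": 0.9, "preprint": 0.7, "news": 0.6, "web": 0.4}

def calculate_credibility(source):
    score = DOMAIN_SCORES.get(source.domain_type, 0.5)   # domain-based scoring
    score += min(source.citation_count, 200) / 1000      # citation modifier, capped
    score += 0.05 * source.agreement_count               # cross-reference agreement
    return min(score, 1.0)

def categorize(score):
    if score >= 0.75:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"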

Synthesis

Model: Reasoning model (qwen2.5:32b or configured alternative)

Prompt Structure:

SOURCES:
[Academic]: {academic_content}
[Web]: {web_content}
[Newsroom]: {newsroom_content}

QUESTION: {query}

Synthesize findings from ALL sources above.
Cite sources using [Academic], [Web], or [Newsroom] tags.
Highlight key findings and areas of consensus/disagreement.
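The qwen2.5:32b tag suggests an Ollama-served model, though how the reasoning model is invoked is not specified here; if it is Ollama, the call might look like this:

import ollama

def synthesize(prompt, model="qwen2.5:32b"):
    """Send the assembled synthesis prompt to the locally served reasoning model."""
    response = ollama.generate(model=model, prompt=prompt)
    return response["response"]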

Performance Optimization

Parallel Execution

All source searches happen in parallel (see the asyncio.gather call above), so Phase 1 latency is bounded by the slowest individual source rather than the sum of all of them.

Caching

Research Results:

Source Data:
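Cache backend, keys, and TTLs are not specified here; a minimal in-memory sketch of caching completed research results by query and depth (all constants assumed, deep_research from the orchestration sketch above):

import hashlib
import time

_cache = {}          # key -> (timestamp, result)
TTL_SECONDS = 3600   # assumed TTL

async def cached_research(query, depth=1):
    key = hashlib.sha256(f"{query}:{depth}".encode()).hexdigest()
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]                         # cache hit: skip the whole pipeline
    result = await deep_research(query, depth)
    _cache[key] = (time.time(), result)
    return result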
