
AI-Assisted Research Workflows: Best Practices

Executive Summary

| Finding | Key Insight | Recommendation |
|---|---|---|
| Plan before executing | Claude tends to jump straight to writing without sufficient research | Use explicit "research → plan → execute" phases |
| Opus for strategy, Sonnet for execution | Model selection matters by phase | Spend budget on thinking (Opus), not typing (Sonnet) |
| Deep Research APIs exist | Perplexity Sonar, OpenAI Deep Research, Gemini Deep Research | Consider OpenRouter for Perplexity API access |
| Context assembly is underrated | LLMs work better with curated context than raw search | Pre-gather resources before AI reasoning |
| Multi-agent beats monolithic | Specialized agents outperform single prompts | Separate researcher, writer, validator roles |

Background

The Core Problem

Most AI-assisted writing produces shallow articles because the AI jumps straight to writing without sufficient research or strategic thinking. The fix isn't a better prompt—it's a better pipeline.

This report surveys best practices for AI-assisted research workflows in 2025-2026, drawing on the sources catalogued at the end of this report.


The Research Pipeline Problem

Why Single-Shot Prompts Fail

A typical approach:

"Write a comprehensive article about compute governance"

This fails because:

  1. No context gathering - AI uses only training data, misses recent developments
  2. No strategic planning - AI doesn't think about what actually matters
  3. Premature writing - Starts generating prose before understanding the topic
  4. No validation - Errors compound without feedback loops

The Better Architecture

Key Insight from Anthropic

"Asking Claude to research and plan first significantly improves performance for problems requiring deeper thinking upfront—without this, Claude tends to jump straight to coding a solution." — Claude Code Best Practices

Context Assembly → Strategic Planning → Targeted Research → Drafting → Validation → Grading
      (local)          (Opus)            (Perplexity)       (Sonnet)    (scripts)   (Haiku)
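
As a sketch, the whole pipeline can be expressed as one orchestration function. Every helper name below is illustrative; the phases they wrap are described in the rest of this report.

async function createArticle(topic) {
  const context = await assembleContext(topic);                  // Phase 1: local search + Haiku summary
  const plan = await planWithOpus(topic, context);               // Phase 2: strategy, not prose
  const research = await deepResearch(plan.researchQuestions);   // Phase 3: Perplexity or WebSearch
  let draft = await draftWithSonnet(plan, research);             // Phase 4: execution
  const errors = await runValidators(draft);                     // Phase 5: free local checks
  if (errors.length > 0) draft = await fixWithHaiku(draft, errors);
  const grade = await gradeArticle(draft);                       // Phase 6: quality gate
  return { draft, grade };
}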

Phase 1: Context Assembly

Goal: Gather everything relevant before invoking expensive AI reasoning.

What to Gather

| Source | Method | Cost |
|---|---|---|
| Related wiki pages | Glob/Grep for topic mentions | Free |
| Existing resources | Query resources database | Free |
| Entity relationships | Check backlinks, cross-refs | Free |
| Summarize context | Haiku to compress | $0.01-0.05 |

Why This Matters

Context Window Economics

Feeding raw search results to Opus wastes expensive tokens on noise. Pre-curated context lets the expensive model focus on thinking, not filtering.

Implementation Pattern

// 1. Find related pages (free)
const relatedPages = await searchWiki(topic);

// 2. Find existing resources (free)
const resources = await queryResourcesDB(topic);

// 3. Summarize with Haiku ($0.02)
const contextBundle = await summarizeContext({
  model: 'haiku',
  pages: relatedPages,
  resources: resources
});

Phase 2: Strategic Planning (Opus)

Goal: Figure out what this article should actually cover and why.

This Is Where Quality Comes From

The difference between a mediocre and excellent article is usually in the framing, not the prose. Opus excels at strategic thinking—use it here.

What Opus Should Decide

| Question | Why It Matters |
|---|---|
| What are the key cruxes/debates? | Structures the entire article |
| What's the right framing? | Determines reader takeaway |
| What's already well-covered elsewhere? | Avoids duplication |
| What specific questions need external research? | Directs Phase 3 |
| What's the relationship to existing pages? | Enables cross-linking |

Prompt Pattern

Given this context bundle about [TOPIC]:

[CONTEXT_BUNDLE]

You are planning a wiki article. Before any writing, think through:

1. **Cruxes**: What are the 2-3 key debates or uncertainties about this topic?
2. **Framing**: What's the most useful frame for readers? (risk? opportunity? tradeoff?)
3. **Gap analysis**: What does existing coverage miss?
4. **Research questions**: What specific questions need external research?
5. **Structure**: What sections would best serve readers?

Do NOT write the article. Output a structured plan in JSON.
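
One way the planning call might look, as a sketch using the Anthropic SDK. The model identifier and the buildPlanningPrompt helper are assumptions, not fixed choices.

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();  // reads ANTHROPIC_API_KEY from the environment

const planResponse = await anthropic.messages.create({
  model: 'claude-opus-4-20250514',  // substitute whichever Opus model is current
  max_tokens: 8000,
  messages: [{
    role: 'user',
    content: buildPlanningPrompt(topic, contextBundle),  // hypothetical helper that renders the prompt above
  }],
});

// The prompt requests JSON, so parse the first text block into a structured plan.
const plan = JSON.parse(planResponse.content[0].text);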

Budget

| Complexity | Est. Input | Est. Output | Cost |
|---|---|---|---|
| Simple topic | 20K tokens | 2K tokens | $0.50-1.00 |
| Complex topic | 50K tokens | 5K tokens | $2.00-4.00 |

Phase 3: Targeted Research

Goal: Fill specific gaps identified in the plan—not open-ended browsing.

Option A: Claude Code WebSearch

Uses Claude's built-in web search. Good integration but limited depth.

// Directed by the plan
for (const question of plan.researchQuestions) {
  await webSearch(question);
}

Cost: Included in Claude API pricing

Depth: Moderate (single search per query)

Option B: Perplexity Sonar via OpenRouter

Perplexity Sonar Deep Research is purpose-built for comprehensive research. Available via OpenRouter API.

| Model | Use Case | Pricing |
|---|---|---|
| sonar | Quick lookups | $1/1M tokens |
| sonar-pro | Deeper search | $3/1M tokens + $5/1K searches |
| sonar-deep-research | Comprehensive reports | $3/1M tokens + $5/1K searches |

How Sonar Deep Research Works

"Sonar Deep Research autonomously searches, reads, and evaluates sources, refining its approach as it gathers information. This enables comprehensive report generation across domains." — OpenRouter

Integration Example:


import OpenAI from 'openai';

const openrouter = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
});

const research = await openrouter.chat.completions.create({
  model: 'perplexity/sonar-deep-research',
  messages: [{
    role: 'user',
    content: `Research: ${plan.researchQuestions.join('\n')}`
  }],
});

Option C: Other Deep Research APIs

| Provider | API Available? | Notes |
|---|---|---|
| Perplexity | ✅ via OpenRouter | Best for research depth |
| OpenAI Deep Research | ⚠️ Limited | Azure AI Foundry only |
| Gemini Deep Research | ❌ | No API (consumer only) |
| Grok DeepSearch | ⚠️ Limited | xAI API, X integration |

Option D: Open Source

Open Deep Research (HuggingFace) provides an open-source implementation with 10K+ GitHub stars.


Phase 4: Drafting (Sonnet)

Goal: Execute the plan with research in hand.

Common Mistake

Don't give Sonnet the raw research dump. Give it: (1) the plan from Opus, (2) curated findings, (3) the style guide.

Prompt Pattern

You are writing a wiki article based on this plan:

[OPUS_PLAN]

Using these research findings:

[CURATED_RESEARCH]

Following this style guide:

[STYLE_GUIDE_EXCERPT]

Write the article. Use tables over bullet lists. Include citations.
Escape all dollar signs (\\$100 not $100).
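
A sketch of the corresponding drafting call, reusing the Anthropic client from Phase 2. The model identifier and the plan, curatedResearch, and styleGuideExcerpt variables are assumptions.

const draftResponse = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',  // substitute whichever Sonnet model is current
  max_tokens: 16000,
  messages: [{
    role: 'user',
    content: [
      'You are writing a wiki article based on this plan:',
      JSON.stringify(plan, null, 2),
      'Using these research findings:',
      curatedResearch,
      'Following this style guide:',
      styleGuideExcerpt,
      'Write the article. Use tables over bullet lists. Include citations.',
      'Escape all dollar signs (\\$100 not $100).',
    ].join('\n\n'),
  }],
});

const draft = draftResponse.content[0].text;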

Why Sonnet, Not Opus?

Drafting is execution, not strategy. Sonnet:

  • Follows instructions well
  • Costs 1/10th of Opus
  • Produces similar prose quality when given a good plan

Cost: $0.50-1.50 per article


Phase 5: Validation

Goal: Catch errors before they compound.

Automated Checks (Free)

npm run crux -- validate compile        # Syntax errors
npm run crux -- validate unified --rules=dollar-signs,comparison-operators
npm run crux -- validate entity-links   # Broken links

Fix Loop (Haiku)

If validation fails, use Haiku to fix mechanical issues:

if (validationErrors.length > 0) {
  await fixWithHaiku(draft, validationErrors);
  // Re-validate
}

Cost: $0.02-0.10 per fix cycle
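
One possible shape for fixWithHaiku, assuming validation errors arrive as human-readable strings and reusing the Anthropic client from the earlier phases.

async function fixWithHaiku(draft, validationErrors) {
  const response = await anthropic.messages.create({
    model: 'claude-3-5-haiku-20241022',  // substitute whichever Haiku model is current
    max_tokens: 16000,
    messages: [{
      role: 'user',
      content: 'Fix ONLY these mechanical issues. Do not rewrite prose.\n\n' +
        'Issues:\n' + validationErrors.join('\n') +
        '\n\nArticle:\n' + draft,
    }],
  });
  return response.content[0].text;  // corrected draft, ready for re-validation
}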


Phase 6: Grading

Goal: Ensure quality meets threshold before accepting.

Use existing grading infrastructure:

node scripts/content/grade-by-template.mjs --page new-article

Quality Gates

| Grade | Action |
|---|---|
| Q4-Q5 (80+) | Accept |
| Q3 (60-79) | Targeted improvements |
| Q1-Q2 (below 60) | Significant rework or reject |
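
A minimal sketch of applying the gate in code, assuming the grading script yields a 0-100 score.

// Map a numeric grade to a pipeline action, mirroring the quality-gate table above.
function qualityGate(score) {
  if (score >= 80) return 'accept';                 // Q4-Q5
  if (score >= 60) return 'targeted-improvements';  // Q3
  return 'rework-or-reject';                        // Q1-Q2
}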

Cost Comparison

Old Approach: Single-Shot Opus

| Component | Cost |
|---|---|
| Opus writes entire article | $3-5 |
| Often needs rework | +$2-3 |
| Total | $5-8 |
| Quality | Inconsistent |

New Pipeline Approach

| Phase | Model | Cost |
|---|---|---|
| Context assembly | Haiku | $0.05 |
| Strategic planning | Opus | $1.50-3.00 |
| Deep research | Perplexity | $0.50-1.00 |
| Drafting | Sonnet | $0.50-1.00 |
| Validation | Local | $0.00 |
| Fixes | Haiku | $0.05-0.10 |
| Grading | Haiku | $0.05 |
| Total | | $2.65-5.20 |
| Quality | | More consistent |

Key Insight

The pipeline approach often costs less AND produces better results because expensive reasoning (Opus) is focused where it matters—strategic planning—not wasted on prose generation.


Multi-Agent Architectures

For complex articles, consider specialized agents:

Agent Laboratory Pattern

Agent Laboratory (arXiv 2025) achieves 84% cost reduction using three stages:

  1. Literature review agent - Gathers sources
  2. Experimentation agent - Tests claims
  3. Report writing agent - Produces output

CrewAI Pattern

from crewai import Agent, Task, Crew

researcher = Agent(role='Researcher', goal='Find authoritative sources')
analyst = Agent(role='Analyst', goal='Identify key insights')
writer = Agent(role='Writer', goal='Produce clear prose')

crew = Crew(agents=[researcher, analyst, writer], tasks=[...])

Master-Planner-Executor-Writer

Multi-agent search architecture:

  • Master: Coordinates overall workflow
  • Planner: Decomposes tasks into DAG
  • Executor: Runs tool calls
  • Writer: Synthesizes into prose
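
A compressed sketch of this division of labour. It assumes the planner returns a DAG of tasks with dependsOn edges; planner, executor, writer, and topologicalSort are all illustrative names.

async function masterWorkflow(question) {
  const dag = await planner(question);                 // decompose into tasks with dependencies
  const results = {};
  for (const task of topologicalSort(dag.tasks)) {     // respect dependency order
    const inputs = task.dependsOn.map((id) => results[id]);
    results[task.id] = await executor(task, inputs);   // run the tool calls for this task
  }
  return writer(question, Object.values(results));     // synthesize findings into prose
}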

Implementation Recommendations

For LongtermWiki

| Priority | Recommendation |
|---|---|
| High | Convert page-improver to SDK (done ✅) |
| High | Add context assembly phase before Opus |
| Medium | Integrate Perplexity via OpenRouter for deep research |
| Medium | Create page-creator with full pipeline |
| Low | Explore multi-agent CrewAI for complex topics |

Environment Setup

# .env additions
ANTHROPIC_API_KEY=sk-ant-...      # For Claude Code SDK
OPENROUTER_API_KEY=sk-or-...      # For Perplexity access

Budget Guidelines

| Article Type | Budget | Model Mix |
|---|---|---|
| Simple stub expansion | $1-2 | Haiku + Sonnet |
| Standard knowledge-base page | $3-5 | Opus planning + Sonnet execution |
| Complex research report | $5-10 | Opus + Perplexity + Sonnet |
| Flagship article | $10-15 | Opus + Deep Research + Opus review |

Products & APIs Landscape

This section catalogs tools available for AI-assisted research workflows as of early 2026.

Web Search / Grounding APIs

These provide real-time web search for grounding LLM responses with citations.

| Product | Pricing | Key Features | Best For |
|---|---|---|---|
| Perplexity Sonar | $1-5/1K queries | Deep research mode, multi-step reasoning | Comprehensive research |
| Exa AI | $5/1K queries + free tier | Semantic search, embeddings-based, research agents | AI-native search |
| Tavily | $0.008/credit, 1K free/mo | SOC 2 certified, LangChain native, MCP support | Production RAG pipelines |
| You.com API | Tiered plans | 93% SimpleQA score, MCP server, Deep Search | High-accuracy grounding |
| Brave Search API | $4/1K + $5/1M tokens | 94.1% F1 SimpleQA, AI Grounding mode | Privacy-focused, MCP |
| OpenRouter :online | $4/1K results | Works with any model, Exa-powered | Model flexibility |

Quick Recommendation

For LongtermWiki: Start with Perplexity Sonar via OpenRouter for deep research, and Brave or Tavily for quick grounding. Both have MCP servers for easy Claude Code integration.

Academic Literature Search

Specialized tools for searching and analyzing scientific papers.

| Product | Pricing | Database | API? | Best For |
|---|---|---|---|---|
| Elicit | Freemium + paid plans | Semantic Scholar (200M papers) | | Systematic reviews, data extraction |
| Consensus | Freemium | Semantic Scholar | Limited | Evidence synthesis, yes/no questions |
| Undermind | $16/mo | Semantic Scholar, PubMed, arXiv | | Deep niche literature discovery |
| Semantic Scholar API | Free | 200M+ papers | ✅ | Building custom research tools |
| ResearchRabbit | Free | Cross-database | | Citation mapping, discovery |

Specialized Corpus Search

Highly Relevant for AI Safety

Scry is particularly interesting for LongtermWiki because it includes curated content from LessWrong, EA Forum, AI safety research, and alignment papers—exactly the sources we cite most.

| Product | Pricing | Corpus | API? | Best For |
|---|---|---|---|---|
| Scry | Free / $9/mo | 72M docs: arXiv, LessWrong, EA Forum, X, Wikipedia | ✅ SQL+vector | AI safety research, reproducible queries |

Scry Key Features:

  • SQL + vector search with arbitrary query composition
  • Semantic operations: mixing concepts, debiasing ("X but not Y"), contrastive axes
  • Curated sources: arXiv, bioRxiv, PhilPapers, LessWrong, EA Forum, HN, Twitter/X, Bluesky
  • Reproducibility: visible SQL queries, structured metadata, iterative refinement
  • Custom embeddings: store named concept vectors for reuse

Scry vs ChatGPT Deep Research: Scry emphasizes control and reproducibility (you write SQL, see exactly what matched), while Deep Research is opaque but broader. Scry is better for iterative exploration of a fixed corpus; Deep Research for one-shot web synthesis.

Academic vs Web Search

For AI safety topics, you often need both: academic tools (Elicit, Semantic Scholar) for papers + web search (Perplexity, Exa) for reports, blog posts, and recent developments.

Scientific Research Agents

Full agentic systems for autonomous research.

| Product | Access | Focus | Key Capability |
|---|---|---|---|
| FutureHouse Falcon | Web + API | Scientific literature | Deep synthesis across thousands of papers |
| FutureHouse Crow | Web + API | Quick scientific Q&A | Fast factual answers with citations |
| OpenAI Deep Research | ChatGPT Pro/Plus | General research | Multi-step web research, o3-powered |
| Gemini Deep Research | Consumer only | General + Google ecosystem | Gmail, Drive, Docs integration |
| Grok DeepSearch | xAI API | General + X/Twitter | Real-time social + web, very fast |

API Availability

OpenAI Deep Research offers only limited API access (via Azure AI Foundry) and Gemini Deep Research offers none, so neither is practical for programmatic pipelines. For automated pipelines, use Perplexity Sonar, Exa Research, or FutureHouse.

Web Scraping / Content Extraction

For when you need to extract content from specific URLs.

| Product | Pricing | Key Features |
|---|---|---|
| Firecrawl | $16-719/mo | LLM-ready markdown, 67% token reduction |
| Jina Reader | Free tier available | URL to markdown, simple API |
| Apify | Usage-based | Web scraping platform, many actors |

MCP Servers for Claude Code

Model Context Protocol servers enable direct integration with Claude Code.

| MCP Server | Function | Source |
|---|---|---|
| Brave Search | Web search grounding | Official |
| Exa | Semantic web search | Official |
| Tavily | Search + extract | Official |
| You.com | Web search | Official |
| Perplexity | Deep research | Community |
| Firecrawl | URL scraping | Official |

Cost Comparison Matrix

Estimated costs for a typical research task (finding 20 sources with summaries):

| Approach | Est. Cost | Quality | Speed |
|---|---|---|---|
| Manual Google + reading | $0 (your time) | High | Slow |
| Perplexity Sonar Deep Research | $0.50-1.00 | High | Fast |
| Exa Research Pro | $0.50-1.50 | High | Fast |
| Claude WebSearch (5 searches) | ≈$0.10 | Medium | Fast |
| Elicit (with extraction) | $0.50-2.00 | High (academic) | Medium |
| FutureHouse Falcon | Unknown | Very High (scientific) | Medium |
| OpenAI Deep Research | Included in $20/mo | High | Slow (minutes) |

Open Questions

| Question | Why It Matters | Current State |
|---|---|---|
| Perplexity vs Exa vs Tavily? | Affects research phase design | Need empirical comparison |
| Optimal context window size? | Too much adds noise, too little misses information | ≈30-50K tokens seems good |
| Human-in-loop checkpoints? | Quality vs automation tradeoff | After planning phase? |
| Caching research results? | Reuse across similar articles | Not implemented yet |
| MCP vs direct API? | Integration complexity vs flexibility | MCP simpler but less control |
| FutureHouse for AI safety? | Scientific focus may miss grey literature | Worth testing |

Sources

Best Practices

Deep Research APIs

Search APIs

Academic Research Tools

Scientific Research Agents

Research Frameworks

Model Comparisons