Internal guide recommending a multi-phase AI research pipeline (context assembly → Opus planning → Perplexity research → Sonnet drafting → validation) over single-shot prompts, with cost estimates of $2.65-5.20 per article. Provides comprehensive survey of 2025-2026 research APIs (Perplexity, Exa, Tavily, Elicit, Scry) with pricing and feature comparisons.
AI-Assisted Research Workflows: Best Practices
Executive Summary
| Finding | Key Insight | Recommendation |
|---|---|---|
| Plan before executing | Claude tends to jump straight to writing without sufficient research | Use explicit "research → plan → execute" phases |
| Opus for strategy, Sonnet for execution | Model selection matters by phase | Spend budget on thinking (Opus), not typing (Sonnet) |
| Deep Research APIs exist | Perplexity Sonar, OpenAI Deep Research, Gemini Deep Research | Consider OpenRouter for Perplexity API access |
| Context assembly is underrated | LLMs work better with curated context than raw search | Pre-gather resources before AI reasoning |
| Multi-agent beats monolithic | Specialized agents outperform single prompts | Separate researcher, writer, validator roles |
Background
Most AI-assisted writing produces shallow articles because the AI jumps straight to writing without sufficient research or strategic thinking. The fix isn't a better prompt—it's a better pipeline.
This report surveys best practices for AI-assisted research workflows in 2025-2026, drawing from:
- Anthropic's Claude Code best practices
- Multi-agent orchestration frameworks (LangChain, CrewAI)
- Deep research API providers (Perplexity, OpenAI, Google, xAI)
- Academic work on autonomous research agents (Agent Laboratory)
The Research Pipeline Problem
Why Single-Shot Prompts Fail
A typical approach:
"Write a comprehensive article about compute governance"
This fails because:
- No context gathering - AI uses only training data, misses recent developments
- No strategic planning - AI doesn't think about what actually matters
- Premature writing - Starts generating prose before understanding the topic
- No validation - Errors compound without feedback loops
The Better Architecture
"Asking Claude to research and plan first significantly improves performance for problems requiring deeper thinking upfront—without this, Claude tends to jump straight to coding a solution." — Claude Code Best Practices
Context Assembly (local) → Strategic Planning (Opus) → Targeted Research (Perplexity) → Drafting (Sonnet) → Validation (scripts) → Grading (Haiku)
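Sketched end to end, the pipeline reads as a short script. This is a minimal orchestration sketch in JavaScript; the phase helpers (assembleContext, planWithOpus, researchWithPerplexity, draftWithSonnet, validateDraft, fixWithHaiku, gradeDraft) are hypothetical names, each corresponding to a phase detailed below.
// End-to-end pipeline sketch (hypothetical helpers, expanded phase by phase below)
const context = await assembleContext(topic);         // Phase 1: local search + Haiku summary
const plan = await planWithOpus(topic, context);      // Phase 2: Opus strategic plan
const findings = await researchWithPerplexity(plan);  // Phase 3: targeted deep research
let draft = await draftWithSonnet(plan, findings);    // Phase 4: Sonnet drafting
const errors = await validateDraft(draft);            // Phase 5: local validation scripts
if (errors.length > 0) draft = await fixWithHaiku(draft, errors);
const grade = await gradeDraft(draft);                // Phase 6: Haiku grading vs quality gates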
Phase 1: Context Assembly
Goal: Gather everything relevant before invoking expensive AI reasoning.
What to Gather
| Source | Method | Cost |
|---|---|---|
| Related wiki pages | Glob/Grep for topic mentions | Free |
| Existing resources | Query resources database | Free |
| Entity relationships | Check backlinks, cross-refs | Free |
| Summarize context | Haiku to compress | $0.01-0.05 |
Why This Matters
Feeding raw search results to Opus wastes expensive tokens on noise. Pre-curated context lets the expensive model focus on thinking, not filtering.
Implementation Pattern
// 1. Find related pages (free)
const relatedPages = await searchWiki(topic);
// 2. Find existing resources (free)
const resources = await queryResourcesDB(topic);
// 3. Summarize with Haiku ($0.02)
const contextBundle = await summarizeContext({
  model: 'haiku',
  pages: relatedPages,
  resources: resources
});
Phase 2: Strategic Planning (Opus)
Goal: Figure out what this article should actually cover and why.
The difference between a mediocre and an excellent article usually lies in the framing, not the prose. Opus excels at strategic thinking—use it here.
What Opus Should Decide
| Question | Why It Matters |
|---|---|
| What are the key cruxes/debates? | Structures the entire article |
| What's the right framing? | Determines reader takeaway |
| What's already well-covered elsewhere? | Avoids duplication |
| What specific questions need external research? | Directs Phase 3 |
| What's the relationship to existing pages? | Enables cross-linking |
Prompt Pattern
Given this context bundle about [TOPIC]:
[CONTEXT_BUNDLE]
You are planning a wiki article. Before any writing, think through:
1. **Cruxes**: What are the 2-3 key debates or uncertainties about this topic?
2. **Framing**: What's the most useful frame for readers? (risk? opportunity? tradeoff?)
3. **Gap analysis**: What does existing coverage miss?
4. **Research questions**: What specific questions need external research?
5. **Structure**: What sections would best serve readers?
Do NOT write the article. Output a structured plan in JSON.
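A minimal sketch of invoking this prompt with the Anthropic SDK. The model ID and max_tokens value are placeholders, planningPrompt is a hypothetical helper that interpolates the topic and context bundle into the template above, and parsing the response as JSON assumes Opus follows the "output a structured plan in JSON" instruction.
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// Ask Opus for a structured plan, not prose
const planResponse = await anthropic.messages.create({
  model: 'claude-opus-4',   // placeholder model ID
  max_tokens: 4096,
  messages: [{ role: 'user', content: planningPrompt(topic, contextBundle) }]
});

const plan = JSON.parse(planResponse.content[0].text);  // assumes valid JSON came back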
Budget
| Complexity | Est. Input | Est. Output | Cost |
|---|---|---|---|
| Simple topic | 20K tokens | 2K tokens | $0.50-1.00 |
| Complex topic | 50K tokens | 5K tokens | $2.00-4.00 |
Phase 3: Targeted Research
Goal: Fill specific gaps identified in the plan—not open-ended browsing.
Option A: Claude Code WebSearch
Uses Claude's built-in web search. Good integration but limited depth.
// Directed by the plan
for (const question of plan.researchQuestions) {
  await webSearch(question);
}
Cost: Included in Claude API pricing
Depth: Moderate (single search per query)
Option B: Perplexity Sonar via OpenRouter
Perplexity Sonar Deep Research is purpose-built for comprehensive research. Available via OpenRouter API.
| Model | Use Case | Pricing |
|---|---|---|
| sonar | Quick lookups | $1/1M tokens |
| sonar-pro | Deeper search | $3/1M tokens + $5/1K searches |
| sonar-deep-research | Comprehensive reports | $3/1M tokens + $5/1K searches |
"Sonar Deep Research autonomously searches, reads, and evaluates sources, refining its approach as it gathers information. This enables comprehensive report generation across domains." — OpenRouter
Integration Example:
import OpenAI from 'openai';

const openrouter = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
});

const research = await openrouter.chat.completions.create({
  model: 'perplexity/sonar-deep-research',
  messages: [{
    role: 'user',
    content: `Research: ${plan.researchQuestions.join('\n')}`
  }],
});
Option C: Other Deep Research APIs
| Provider | API Available? | Notes |
|---|---|---|
| Perplexity | ✅ via OpenRouter | Best for research depth |
| OpenAI Deep Research | ⚠️ Limited | Azure AI Foundry only |
| Gemini Deep Research | ❌ | No API (consumer only) |
| Grok DeepSearch | ⚠️ Limited | xAI API, X integration |
Option D: Open Source
Open Deep Research (HuggingFace) provides an open-source implementation with 10K+ GitHub stars.
Phase 4: Drafting (Sonnet)
Goal: Execute the plan with research in hand.
Don't give Sonnet the raw research dump. Give it: (1) the plan from Opus, (2) curated findings, (3) the style guide.
Prompt Pattern
You are writing a wiki article based on this plan:
[OPUS_PLAN]
Using these research findings:
[CURATED_RESEARCH]
Following this style guide:
[STYLE_GUIDE_EXCERPT]
Write the article. Use tables over bullet lists. Include citations.
Escape all dollar signs (\\$100 not $100).
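A minimal drafting sketch, reusing the Anthropic client from Phase 2. draftingPrompt is a hypothetical helper that fills the template above with the Opus plan, curated research findings, and style guide excerpt; the model ID is a placeholder.
// Phase 4: execute the Opus plan with Sonnet
const draftResponse = await anthropic.messages.create({
  model: 'claude-sonnet-4',   // placeholder model ID
  max_tokens: 8192,
  messages: [{ role: 'user', content: draftingPrompt(plan, curatedResearch, styleGuideExcerpt) }]
});

const draft = draftResponse.content[0].text;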
Why Sonnet, Not Opus?
Drafting is execution, not strategy. Sonnet:
- Follows instructions well
- Costs roughly one-fifth as much as Opus per token
- Produces similar prose quality when given a good plan
Cost: $0.50-1.50 per article
Phase 5: Validation
Goal: Catch errors before they compound.
Automated Checks (Free)
npm run crux -- validate compile # Syntax errors
npm run crux -- validate unified --rules=dollar-signs,comparison-operators
npm run crux -- validate entity-links # Broken links
Fix Loop (Haiku)
If validation fails, use Haiku to fix mechanical issues:
if (validationErrors.length > 0) {
  draft = await fixWithHaiku(draft, validationErrors);
  // Re-validate before accepting
}
Cost: $0.02-0.10 per fix cycle
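One possible shape for fixWithHaiku, reusing the Anthropic client from Phase 2; the function name mirrors the snippet above and the model ID is a placeholder.
// Mechanical fixes only: hand the validator output straight to Haiku
async function fixWithHaiku(draft, validationErrors) {
  const response = await anthropic.messages.create({
    model: 'claude-haiku-4',   // placeholder model ID
    max_tokens: 8192,
    messages: [{
      role: 'user',
      content: `Fix ONLY these validation errors, changing nothing else:\n${validationErrors.join('\n')}\n\n---\n${draft}`
    }]
  });
  return response.content[0].text;
}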
Phase 6: Grading
Goal: Ensure quality meets threshold before accepting.
Use existing grading infrastructure:
node scripts/content/grade-by-template.mjs --page new-article
Quality Gates
| Grade | Action |
|---|---|
| Q4-Q5 (80+) | Accept |
| Q3 (60-79) | Targeted improvements |
| Q1-Q2 (below 60) | Significant rework or reject |
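The gates above reduce to a small dispatch function; the thresholds follow the table, and the action labels are illustrative.
// Apply the quality gates from the table above
function qualityGate(score) {
  if (score >= 80) return 'accept';                 // Q4-Q5
  if (score >= 60) return 'targeted-improvements';  // Q3
  return 'rework-or-reject';                        // Q1-Q2
}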
Cost Comparison
Old Approach: Single-Shot Opus
| Component | Cost |
|---|---|
| Opus writes entire article | $3-5 |
| Often needs rework | +$2-3 |
| Total | $5-8 |
| Quality | Inconsistent |
New Pipeline Approach
| Phase | Model | Cost |
|---|---|---|
| Context assembly | Haiku | $0.05 |
| Strategic planning | Opus | $1.50-3.00 |
| Deep research | Perplexity | $0.50-1.00 |
| Drafting | Sonnet | $0.50-1.00 |
| Validation | Local | $0.00 |
| Fixes | Haiku | $0.05-0.10 |
| Grading | Haiku | $0.05 |
| Total | | $2.65-5.20 |
| Quality | | More consistent |
The pipeline approach often costs less AND produces better results because expensive reasoning (Opus) is focused where it matters—strategic planning—not wasted on prose generation.
Multi-Agent Architectures
For complex articles, consider specialized agents:
Agent Laboratory Pattern
Agent Laboratory (arXiv 2025) achieves 84% cost reduction using three stages:
- Literature review agent - Gathers sources
- Experimentation agent - Tests claims
- Report writing agent - Produces output
CrewAI Pattern
from crewai import Agent, Task, Crew
researcher = Agent(role='Researcher', goal='Find authoritative sources')
analyst = Agent(role='Analyst', goal='Identify key insights')
writer = Agent(role='Writer', goal='Produce clear prose')
crew = Crew(agents=[researcher, analyst, writer], tasks=[...])
Master-Planner-Executor-Writer
Multi-agent search architecture:
- Master: Coordinates overall workflow
- Planner: Decomposes tasks into DAG
- Executor: Runs tool calls
- Writer: Synthesizes into prose
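One way to make the pattern concrete is to look at the Planner's output: a task DAG that the Executor walks in dependency order before the Writer synthesizes. The shape and field names here are illustrative, not taken from any particular framework.
// Illustrative Planner output: a task DAG for the Executor to run
const taskGraph = [
  { id: 'q1', tool: 'webSearch', input: 'compute governance proposals 2025', dependsOn: [] },
  { id: 'q2', tool: 'webSearch', input: 'chip export control effectiveness', dependsOn: [] },
  { id: 'synthesis', tool: 'writer', input: 'combine q1 and q2 into a section', dependsOn: ['q1', 'q2'] }
];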
Implementation Recommendations
For LongtermWiki
| Priority | Recommendation |
|---|---|
| High | Convert page-improver to SDK (done ✅) |
| High | Add context assembly phase before Opus |
| Medium | Integrate Perplexity via OpenRouter for deep research |
| Medium | Create page-creator with full pipeline |
| Low | Explore multi-agent CrewAI for complex topics |
Environment Setup
# .env additions
ANTHROPIC_API_KEY=sk-ant-... # For Claude Code SDK
OPENROUTER_API_KEY=sk-or-... # For Perplexity access
Budget Guidelines
| Article Type | Budget | Model Mix |
|---|---|---|
| Simple stub expansion | $1-2 | Haiku + Sonnet |
| Standard knowledge-base page | $3-5 | Opus planning + Sonnet execution |
| Complex research report | $5-10 | Opus + Perplexity + Sonnet |
| Flagship article | $10-15 | Opus + Deep Research + Opus review |
Products & APIs Landscape
This section catalogs tools available for AI-assisted research workflows as of early 2026.
Web Search / Grounding APIs
These provide real-time web search for grounding LLM responses with citations.
| Product | Pricing | Key Features | Best For |
|---|---|---|---|
| Perplexity Sonar | $1-5/1K queries | Deep research mode, multi-step reasoning | Comprehensive research |
| Exa AI | $5/1K queries + free tier | Semantic search, embeddings-based, research agents | AI-native search |
| Tavily | $0.008/credit, 1K free/mo | SOC 2 certified, LangChain native, MCP support | Production RAG pipelines |
| You.com API | Tiered plans | 93% SimpleQA score, MCP server, Deep Search | High-accuracy grounding |
| Brave Search API | $4/1K + $5/1M tokens | 94.1% F1 SimpleQA, AI Grounding mode | Privacy-focused, MCP |
| OpenRouter :online | $4/1K results | Works with any model, Exa-powered | Model flexibility |
For LongtermWiki: Start with Perplexity Sonar via OpenRouter for deep research, plus Brave or Tavily for quick grounding. All three have MCP servers for easy Claude Code integration.
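For the quick-grounding case, OpenRouter's :online suffix (table above) can be appended to a model ID to add web search; a minimal sketch reusing the OpenRouter client from Phase 3, with a placeholder model ID.
// Quick grounding: append :online to any OpenRouter model ID
const grounded = await openrouter.chat.completions.create({
  model: 'anthropic/claude-sonnet-4:online',   // placeholder model ID
  messages: [{ role: 'user', content: 'What changed in EU AI Act implementation this quarter?' }]
});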
Academic Literature Search
Specialized tools for searching and analyzing scientific papers.
| Product | Pricing | Database | API? | Best For |
|---|---|---|---|---|
| Elicit | Freemium + paid plans | Semantic Scholar (200M papers) | ✅ | Systematic reviews, data extraction |
| Consensus | Freemium | Semantic Scholar | Limited | Evidence synthesis, yes/no questions |
| Undermind | $16/mo | Semantic Scholar, PubMed, arXiv | ❌ | Deep niche literature discovery |
| Semantic Scholar API | Free | 200M+ papers | ✅ | Building custom research tools |
| ResearchRabbit | Free | Cross-database | ❌ | Citation mapping, discovery |
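Of these, the Semantic Scholar API is the easiest to script against, since it is free and works without a key for light use; a minimal sketch assuming the Graph API paper-search endpoint.
// Free academic search via the Semantic Scholar Graph API
const params = new URLSearchParams({
  query: 'compute governance',
  fields: 'title,year,abstract,url',
  limit: '20'
});
const res = await fetch(`https://api.semanticscholar.org/graph/v1/paper/search?${params}`);
const { data: papers } = await res.json();   // data is an array of paper records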
Specialized Corpus Search
Scry is particularly interesting for LongtermWiki because it includes curated content from LessWrong, EA Forum, AI safety research, and alignment papers—exactly the sources we cite most.
| Product | Pricing | Corpus | API? | Best For |
|---|---|---|---|---|
| Scry | Free / $9/mo | 72M docs: arXiv, LessWrong, EA Forum, X, Wikipedia | ✅ SQL+vector | AI safety research, reproducible queries |
Scry Key Features:
- SQL + vector search with arbitrary query composition
- Semantic operations: mixing concepts, debiasing ("X but not Y"), contrastive axes
- Curated sources: arXiv, bioRxiv, PhilPapers, LessWrong, EA Forum, HN, Twitter/X, Bluesky
- Reproducibility: visible SQL queries, structured metadata, iterative refinement
- Custom embeddings: store named concept vectors for reuse
Scry vs ChatGPT Deep Research: Scry emphasizes control and reproducibility (you write SQL, see exactly what matched), while Deep Research is opaque but broader. Scry is better for iterative exploration of a fixed corpus; Deep Research for one-shot web synthesis.
For AI safety topics, you often need both: academic tools (Elicit, Semantic Scholar) for papers + web search (Perplexity, Exa) for reports, blog posts, and recent developments.
Scientific Research Agents
Full agentic systems for autonomous research.
| Product | Access | Focus | Key Capability |
|---|---|---|---|
| FutureHouse Falcon | Web + API | Scientific literature | Deep synthesis across thousands of papers |
| FutureHouse Crow | Web + API | Quick scientific Q&A | Fast factual answers with citations |
| OpenAI Deep Research | ChatGPT Pro/Plus | General research | Multi-step web research, o3-powered |
| Gemini Deep Research | Consumer only | General + Google ecosystem | Gmail, Drive, Docs integration |
| Grok DeepSearch | xAI API | General + X/Twitter | Real-time social + web, very fast |
OpenAI and Gemini Deep Research are not generally available via API for programmatic access (OpenAI's is limited to Azure AI Foundry). For automated pipelines, use Perplexity Sonar, Exa Research, or FutureHouse.
Web Scraping / Content Extraction
For when you need to extract content from specific URLs.
| Product | Pricing | Key Features |
|---|---|---|
| Firecrawl | $16-719/mo | LLM-ready markdown, 67% token reduction |
| Jina Reader | Free tier available | URL to markdown, simple API |
| Apify | Usage-based | Web scraping platform, many actors |
MCP Servers for Claude Code
Model Context Protocol servers enable direct integration with Claude Code.
| MCP Server | Function | Source |
|---|---|---|
| Brave Search | Web search grounding | Official |
| Exa | Semantic web search | Official |
| Tavily | Search + extract | Official |
| You.com | Web search | Official |
| Perplexity | Deep research | Community |
| Firecrawl | URL scraping | Official |
Cost Comparison Matrix
Estimated costs for a typical research task (finding 20 sources with summaries):
| Approach | Est. Cost | Quality | Speed |
|---|---|---|---|
| Manual Google + reading | $0 (your time) | High | Slow |
| Perplexity Sonar Deep Research | $0.50-1.00 | High | Fast |
| Exa Research Pro | $0.50-1.50 | High | Fast |
| Claude WebSearch (5 searches) | ≈$0.10 | Medium | Fast |
| Elicit (with extraction) | $0.50-2.00 | High (academic) | Medium |
| FutureHouse Falcon | Unknown | Very High (scientific) | Medium |
| OpenAI Deep Research | Included in $20/mo | High | Slow (minutes) |
Open Questions
| Question | Why It Matters | Current State |
|---|---|---|
| Perplexity vs Exa vs Tavily? | Affects research phase design | Need empirical comparison |
| Optimal context window size? | Too much noise, too little misses info | ≈30-50K tokens seems good |
| Human-in-loop checkpoints? | Quality vs automation tradeoff | After planning phase? |
| Caching research results? | Reuse across similar articles | Not implemented yet |
| MCP vs direct API? | Integration complexity vs flexibility | MCP simpler but less control |
| FutureHouse for AI safety? | Scientific focus may miss grey literature | Worth testing |
Sources
Best Practices
- Claude Code: Best practices for agentic coding — Anthropic Engineering
- Multi-Step LLM Chains: Best Practices — Deepchecks
- 20 Agentic AI Workflow Patterns — Skywork AI
Deep Research APIs
- Perplexity Sonar Deep Research — OpenRouter
- OpenRouter Web Search — Real-time web grounding for any model
- Introducing Deep Research — OpenAI
- Deep Research AI Tools Comparison — Bright Inventions
Search APIs
- Exa AI — Semantic search API for AI applications
- Tavily — Web API for AI agents, $25M raise
- You.com APIs — 93% SimpleQA, Deep Search API
- Brave AI Grounding — 94.1% F1 score on SimpleQA
- Complete Guide to Web Search APIs 2025 — Firecrawl
Academic Research Tools
- Elicit — AI research assistant, 200M papers
- Semantic Scholar — Free academic search API
- Scry — SQL+vector search over 72M docs (LessWrong, EA Forum, arXiv, etc.)
- 8 Best AI Research Assistant Tools — Documind
- Best AI Research Tools for Literature Review — Medium
Scientific Research Agents
- FutureHouse Platform — Superintelligent AI agents for science
- FutureHouse customer story — Claude
- Agent Laboratory: Using LLM Agents as Research Assistants — arXiv 2025
Research Frameworks
- Autonomous Agents papers — GitHub (updated daily)
- LLM Agents Explained: Complete Guide — Dynamiq
- Top 5 MCP Search Tools 2025 — Oreate AI
Model Comparisons
- AI Deep Research: Claude vs ChatGPT vs Grok — AIMultiple
- Agentic LLMs in 2025 — Data Science Dojo
- Deep Research Survey — HuggingFace