
AI-Assisted Rhetoric Highlighting


A proposed automated system for detecting and flagging persuasive-but-misleading rhetoric, including logical fallacies, emotionally loaded language, selective quoting, and citation misrepresentation. It could serve as a reading aid or as an author-side linting tool.


Part of the Design Sketches for Collective Epistemics series by Forethought Foundation.

Overview

Rhetoric Highlighting is a proposed automated system that identifies potentially manipulative rhetorical moves in text—logical fallacies, emotionally loaded language, selective quoting, misrepresented citations, buried assumptions, and statistical distortions—and flags them to readers or writers. The concept was outlined in Forethought Foundation's 2025 report "Design Sketches for Collective Epistemics."

Unlike fact-checking, which assesses whether claims are true, rhetoric highlighting assesses how claims are presented. A statement can be technically true while being deeply misleading through framing, emphasis, omission, or emotional manipulation. Rhetoric highlighting aims to make these moves visible.

The system could operate in two modes:

  • Reader mode: Highlights and annotates published text, helping readers spot passages that may be manipulating them
  • Writer mode: Functions as a "rhetoric linter" that helps authors strengthen their reasoning and avoid accidental misrepresentation before publishing

How It Would Work


Step-by-Step Pipeline

  1. Text decomposition: Parse the document into sentences and extract explicit and implied claims
  2. Context retrieval: Fetch cited passages, background information, and relevant context to evaluate claims against
  3. Rhetoric classification: Run trained classifiers on each sentence to detect multiple categories of rhetorical issues
  4. Impact assessment: Evaluate severity—a minor hedging issue matters less than a fundamentally misrepresented citation
  5. User-facing output: Display results as color-coded highlights with hover explanations, click-through details, and category-based filtering
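
A minimal sketch of these five steps, assuming a generic `call_llm` helper standing in for any chat-completion API. The prompts, the `RhetoricFlag` fields, and the one-call-per-step structure are illustrative assumptions, not Forethought's specification:

```python
from dataclasses import dataclass

@dataclass
class RhetoricFlag:
    sentence: str
    category: str      # e.g. "selective quoting", "loaded language"
    severity: float    # 0.0 (minor) to 1.0 (severe)
    explanation: str   # shown to the reader on hover

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion API call."""
    raise NotImplementedError

def analyze(document: str) -> list[RhetoricFlag]:
    flags: list[RhetoricFlag] = []
    # 1. Text decomposition: split into sentences and claims
    sentences = call_llm(
        f"List each sentence and implied claim, one per line:\n{document}"
    ).splitlines()
    for sentence in sentences:
        # 2. Context retrieval: cited passages, background facts
        context = call_llm(f"Retrieve context needed to evaluate: {sentence}")
        # 3. Rhetoric classification against the category taxonomy
        category = call_llm(
            f"Name the rhetorical issue in '{sentence}' given context "
            f"'{context}', or answer 'none'."
        )
        if category.strip() == "none":
            continue
        # 4. Impact assessment: how much does this flag matter?
        severity = float(call_llm(
            f"Rate from 0 to 1 how misleading the {category} in '{sentence}' is."
        ))
        # 5. User-facing output: draft the hover explanation
        explanation = call_llm(f"Explain briefly why '{sentence}' is {category}.")
        flags.append(RhetoricFlag(sentence, category, severity, explanation))
    return flags
```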

Categories of Rhetoric Detected

| Category | Description | Example |
|---|---|---|
| Logical fallacies | Arguments that don't logically follow | Ad hominem attacks, false dichotomies, appeal to authority |
| Emotionally loaded language | Words chosen to manipulate feelings rather than inform | "Catastrophic failure" vs. "significant setback" |
| Selective quoting | Quotes taken out of context to change meaning | Cherry-picking a sentence that reverses the author's actual conclusion |
| Citation misrepresentation | Cited sources don't support the claims made | Paper cited as "proving X" when it actually found mixed results |
| Statistical distortions | Misleading use of numbers | Relative vs. absolute risk, base rate neglect, misleading axes |
| Buried assumptions | Key assumptions hidden in phrasing | "Given that X is inevitable..." when X is contested |
| False balance | Presenting fringe views as equally credible | "Some scientists say climate change is real, others disagree" |
| Anchoring | Initial framing that biases interpretation | Leading with an extreme scenario to make moderate claims seem reasonable |

Technical Feasibility

Cost Analysis

Forethought provides a detailed cost estimate. For one hour of reading (approximately 30 pages):

| Parameter | Value |
|---|---|
| Pages per hour of reading | approximately 30 |
| Sentences per page | about 20 |
| LLM calls per sentence | about 5 (decomposition, retrieval, classification, assessment, drafting) |
| Tokens per call | about 1,000 |
| Total tokens per hour | about 3 million |

At current (2025) LLM pricing:

  • Cheapest models: about $1 per hour of reading
  • Most capable models: Hundreds of dollars per hour of reading
  • Expected trajectory: As inference costs fall roughly 10x/year, costs should reach $0.10–1.00 per hour within 2-3 years
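
The 3-million-token figure is just the product of the table's parameters. The quick check below reproduces the $1-to-hundreds range using illustrative per-token prices, not quotes for any specific model or provider:

```python
pages = 30            # pages per hour of reading
sentences = 20        # sentences per page
calls = 5             # LLM calls per sentence (decompose, retrieve, ...)
tokens = 1_000        # tokens per call

tokens_per_hour = pages * sentences * calls * tokens
print(f"{tokens_per_hour:,} tokens")  # 3,000,000 tokens

# Illustrative prices in USD per million tokens; real 2025 pricing
# varies widely by model and provider.
for model, usd_per_million in [("cheap model", 0.30), ("frontier model", 60.00)]:
    cost = tokens_per_hour / 1_000_000 * usd_per_million
    print(f"{model}: ${cost:.2f} per hour of reading")  # $0.90 vs $180.00
```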

Speed Constraints

The multi-step pipeline creates latency challenges:

  • Each sentence requires multiple sequential LLM calls
  • Real-time highlighting while reading may require pre-processing
  • Caching and batching can help for static content
  • Streaming/progressive display could improve perceived responsiveness
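
For static content, the simplest of these mitigations is a content-addressed cache, so a widely read article pays the pipeline cost only once. A minimal sketch, reusing the hypothetical `analyze` function from the pipeline section:

```python
import hashlib

_cache: dict[str, list] = {}

def analyze_cached(document: str) -> list:
    """Run the full pipeline only the first time a document is seen."""
    key = hashlib.sha256(document.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = analyze(document)  # expensive multi-call path
    return _cache[key]
```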

Current Economic Viability

Given 2025 costs, rhetoric highlighting is currently viable only for:

  • High-stakes content: Policy documents, legal filings, major publications
  • Widely-read content: Articles with millions of readers (cost amortized)
  • Author-side use: Writers checking their own work before publication (lower coverage needed)
  • Educational contexts: Teaching critical thinking with annotated examples

Existing Work and Related Tools

Academic Research on Automated Rhetoric Detection

The field of computational argumentation and rhetoric analysis has been growing significantly:

| Research area | Key work | Status |
|---|---|---|
| Argument mining | Centre for Argument Technology (ARG-tech), University of Dundee, led by Professor Chris Reed. Developed the Argument Interchange Format (AIF) standard ontology and AIFdb, the largest publicly accessible corpus of annotated argumentation. Annual ArgMining workshops at ACL since 2014. | Active research infrastructure |
| Logical fallacy detection | Jin et al. (2022), "Logical Fallacy Detection", introduced the LOGIC dataset with 2,449 instances across 13 fallacy types, plus the LogicClimate challenge set. GPT-4 achieves 79–90% accuracy depending on conditions (Carstens et al., 2024). | Published benchmark; LLMs improving |
| Propaganda detection | SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles (14 techniques, 250 teams). Extended through SemEval-2023 (23 persuasion techniques across 9 languages) and SemEval-2024 (multilingual meme analysis). | Multi-year shared-task series |
| Deceptive reasoning | RuozhiBench (2025): 677 questions testing LLMs against deceptive reasoning. The best model achieved only 62%, vs. 90%+ for humans. | Active benchmark |
| Claim verification | FEVER (Fact Extraction and VERification) benchmark and subsequent work | Active benchmark |
| Citation verification | SciFact and related datasets for verifying scientific claims against cited papers | Active research |
| Hedge/weasel word detection | Ganter & Strube (2009) used Wikipedia's weasel-word annotations; updated in 2024. Detects vague language like "some people say" and "researchers believe." | Established subfield |
| Rhetorical figure detection | Systematic survey (2024) covering 24 rhetorical figures and computational detection methods | Active research |

Existing Tools and Prototypes

| Tool | Description | Approach | Adoption |
|---|---|---|---|
| FallacyCheck | Browser extension using inoculation theory; detects 13 fallacy types (MUM 2024) | LLM; proactive questioning | Research prototype |
| Skeptic Reader | Chrome/Firefox extension scoring balance, coherence, and objectivity via GPT-4o | LLM; scoring | Early stage |
| FallacyFilter | Chrome extension detecting biases and logical fallacies | LLM; browser extension | Small |
| IBM Project Debater | Argument mining across 10B sentences; public APIs for claim/evidence detection | NLP pipeline; APIs | Enterprise; niche |
| Kialo | Argument mapping with hierarchical pro/con trees across 49 languages | Human-driven | 1M+ users; 400K+ discussions |
| Grammarly | Writing assistant flagging tone and clarity issues | Rule-based + ML | 30M+ daily users |
| Ad Fontes Media | Rates news sources on reliability and bias axes | Human rating | Widely cited bias chart |
| Ground News | Cross-outlet story comparison; "Blindspot" feature for lopsided coverage | Aggregation | Growing mobile app |
| Logically.ai | AI-powered fact-checking and harmful-content detection | Commercial platform | Gov/enterprise clients |
| fallacycheck.com | Automated fallacy detection crawling news, editorials, and social media | Web crawling | Small; niche |

LLM-Based Approaches

Recent LLM capabilities make several components of rhetoric highlighting more feasible than they were with traditional NLP:

  • Zero-shot fallacy detection: GPT-4 achieves 79–90% accuracy on the LOGIC benchmark depending on conditions. Prompt enrichment with counterarguments and explanations (NAACL 2025) improved F1 by up to 0.60 in zero-shot settings (see the prompt sketch below)
  • Citation verification: LLMs can compare claims against cited sources and identify misrepresentations, though reliability varies
  • Tone analysis: Modern models can distinguish between informative and manipulative framing with increasing sophistication
  • Inoculation approach: The most successful real-world deployments (Google Jigsaw's prebunking videos, FallacyCheck) use inoculation theory—teaching users to recognize techniques rather than filtering content
  • Argument reconstruction: LLMs can extract implicit premises and unstated assumptions from natural language

However, LLMs also introduce new challenges: they can confidently flag non-issues, miss subtle manipulation, and themselves produce rhetoric that would warrant highlighting.
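
One mitigation for that overconfidence is to force the model to return a machine-readable label with an explicit confidence score, which the UI can then threshold (see "Calibrated confidence for flags" below). A zero-shot sketch; the prompt wording and JSON schema are assumptions, not taken from any of the benchmarks above:

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion API call."""
    raise NotImplementedError

def detect_fallacy(sentence: str) -> dict:
    prompt = (
        "You are a rhetoric auditor. Classify the sentence below.\n"
        'Reply only with JSON: {"category": "<fallacy type or none>", '
        '"confidence": <number between 0 and 1>, '
        '"explanation": "<one sentence>"}\n\n'
        f"Sentence: {sentence}"
    )
    return json.loads(call_llm(prompt))
```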

Target Applications

Near-Term (High-Value, Low-Scale)

  1. Academic peer review: Flag citation misrepresentation and logical gaps in manuscript reviews
  2. Preprint servers: Annotate papers on arXiv/medRxiv before formal peer review
  3. Policy analysis: Highlight rhetorical moves in government reports, legislative proposals
  4. Journalism tools: Help reporters identify manipulation in sources' statements

Medium-Term (Broader Deployment)

  1. Author-side plugins: Writing tools that warn about ambiguous phrasing, unsupported claims
  2. Educational platforms: Teach critical thinking by showing rhetoric patterns in real text
  3. Fact-checker augmentation: Speed up professional fact-checkers by pre-identifying issues

Long-Term (Universal Access)

  1. Browser extensions: Real-time annotation of any web content
  2. Social media integration: Platform-level rhetoric flagging
  3. Email and messaging: Highlight manipulation in personal communications

Suggested Prototypes (from Forethought)

  • Author-side plugin: Warning about ambiguous phrasing or unsupported claims during writing
  • Cite-checker: Verifying that paper quotations accurately represent the source
  • Marked-up news articles: Demonstrations of rhetoric patterns highlighted in published news

Worked Example: AI Lab Blog Post

Consider a hypothetical AI lab blog post announcing a new model:

"Our groundbreaking model achieves superhuman performance on every major benchmark, making it the most capable AI system ever created. Independent researchers have confirmed that this represents a fundamental leap in intelligence. While some have raised safety concerns, our rigorous testing shows the model is completely safe for deployment."

A rhetoric highlighting system would annotate this passage as follows:

| Sentence fragment | Flag | Explanation |
|---|---|---|
| "superhuman performance on every major benchmark" | Overgeneralization | Most models excel on some benchmarks but not others. "Every" is likely false or requires significant qualification. |
| "most capable AI system ever created" | Superlative claim without qualification | Capability depends on the metric. This implies universal superiority, which is almost never true. |
| "Independent researchers have confirmed" | Vague attribution | Which researchers? What specifically did they confirm? "Have confirmed" implies a consensus that may not exist. |
| "a fundamental leap in intelligence" | Emotionally loaded language | "Leap" and "intelligence" are both contested terms that imply more than benchmark improvements warrant. |
| "While some have raised safety concerns" | Dismissive framing | "While some" minimizes safety concerns and positions them as a minority view to be acknowledged and then dismissed. |
| "completely safe for deployment" | Absolute safety claim | No system is "completely safe." This is a red flag for missing caveats about limitations and risk mitigation. |

In writer mode, these flags would appear as the author drafts the post, encouraging more precise language: "achieves state-of-the-art on 7 of 12 major benchmarks" instead of "every," and "our evaluations found no critical safety issues in tested scenarios" instead of "completely safe."
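
Serialized for the highlighting UI, the first row of the table might look like the record below. The field names and the 0.7 severity are illustrative assumptions, with `suggestion` supplying the writer-mode rewrite:

```python
annotation = {
    "fragment": "superhuman performance on every major benchmark",
    "flag": "overgeneralization",
    "severity": 0.7,  # hypothetical impact-assessment score
    "explanation": (
        "Most models excel on some benchmarks but not others; "
        "'every' is likely false or requires qualification."
    ),
    "suggestion": "achieves state-of-the-art on 7 of 12 major benchmarks",
}
```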

Extensions and Open Ideas

Rhetoric diff: Compare two versions of a statement—an original and a revision—to visualize how rhetoric changed. Useful for tracking how press releases evolve from internal drafts, or how a claim morphs as it's reported across outlets. "The original paper said 'modest improvement'; the press release said 'breakthrough.'"
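
A rhetoric diff could be built directly on the analysis pipeline: flag both versions, then report which flags appeared or disappeared. A rough sketch, again assuming the hypothetical `analyze` function from earlier:

```python
def rhetoric_diff(original: str, revision: str) -> dict:
    """Compare rhetoric flags across two versions of a text."""
    before = {f.sentence: f.category for f in analyze(original)}
    after = {f.sentence: f.category for f in analyze(revision)}
    return {
        "introduced": {s: c for s, c in after.items() if before.get(s) != c},
        "removed": {s: c for s, c in before.items() if s not in after},
    }
```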

Author rhetoric profiles: Aggregate rhetoric patterns across an author's or organization's body of work. "This author uses emotional language 3x more than the domain average" or "This organization's press releases consistently use absolute safety claims." This connects rhetoric highlighting to reliability tracking.

Rhetoric translation: Automatically rewrite flagged sentences in neutral language and show the comparison side-by-side. Not to replace the original, but to help readers see what the same information looks like without the rhetorical moves. "Here's what this paragraph says if you remove the loaded framing."

Symmetric debate highlighting: When analyzing content about a contested topic, highlight rhetoric on all sides symmetrically, not just the side the system's training data identifies as "wrong." This addresses the concern that rhetoric highlighting could become a partisan tool.

Integration with LLM output: Apply rhetoric highlighting to AI-generated content itself. Users could enable a mode where their AI assistant's responses are simultaneously checked for rhetorical manipulation—a self-auditing feature that builds trust.

Calibrated confidence for flags: Rather than binary flag/no-flag, each annotation could come with a confidence score: "85% likely this is selective quoting" vs. "55% likely this is emotionally loaded language." Users set their own threshold for what to display.
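
The reader-side filter is then trivial; the 0.7 default below is an arbitrary illustration:

```python
def visible_flags(annotations: list[dict], threshold: float = 0.7) -> list[dict]:
    """Show only flags whose confidence meets the reader's threshold."""
    return [a for a in annotations if a["confidence"] >= threshold]
```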

Collaborative annotation refinement: When the system flags something incorrectly, users can dispute it. Disputed flags are reviewed by other users (similar to the community notes bridging algorithm), creating a feedback loop that improves the system's accuracy over time.

Challenges and Risks

False Positives and Chilling Effects

The most significant risk is that rhetoric highlighting could discourage legitimate persuasion. Not all emotional language is manipulative; not all simplification is distortion. A system that flags too aggressively could:

  • Make writing sterile and unengaging
  • Discourage strong advocacy for important causes
  • Create a new form of tone policing
  • Advantage bland corporate communication over passionate individual voices

Subjectivity of "Misleading"

What counts as manipulative rhetoric is often subjective:

  • Cultural context matters—rhetorical norms differ across communities
  • Some "fallacies" are reasonable heuristics in everyday reasoning
  • The boundary between persuasion and manipulation is genuinely fuzzy
  • Political framing is inherently contestable

Gaming and Arms Races

Sophisticated communicators could adapt to avoid detection while maintaining manipulation:

  • Use more subtle rhetorical techniques
  • Structure arguments to technically avoid flagged patterns
  • Preemptively address flags in ways that make them seem unreasonable
  • This could create an arms race similar to SEO vs. search algorithms

Power Dynamics

  • Who controls the definitions? The choice of what constitutes "misleading rhetoric" embeds values
  • Asymmetric impact: Could disproportionately flag certain communication styles, dialects, or cultural norms
  • Corporate capture: Could be tuned to favor certain political perspectives or commercial interests

Connection to AI Safety

Rhetoric highlighting connects to AI safety in multiple ways:

  • AI-generated persuasion: As AI systems become better at generating persuasive content, tools that help humans detect manipulation become more important for maintaining epistemic health
  • Sycophancy detection: The same techniques could be applied to AI outputs, flagging when AI systems use rhetorically manipulative patterns to tell users what they want to hear
  • Policy discourse: Improving the quality of debate about AI governance could lead to better regulatory outcomes
  • Civilizational competence: Populations that can better identify manipulation are better positioned to make wise collective decisions about transformative AI

Key Uncertainties


  • Can automated rhetoric detection distinguish genuine persuasion from manipulation reliably enough to be useful?
  • Will the chilling effect on legitimate speech outweigh the benefits of flagging manipulation?
  • How quickly will costs fall enough to make real-time rhetoric highlighting viable for everyday reading?
  • Can the system be made robust to adversarial adaptation by sophisticated communicators?
  • What governance structure can ensure rhetoric highlighting definitions remain balanced across perspectives?
