AI-Assisted Rhetoric Highlighting
A proposed automated system for detecting and flagging persuasive-but-misleading rhetoric, including logical fallacies, emotionally loaded language, selective quoting, and citation misrepresentation. Could serve as a reading aid or author-side linting tool.
Part of the Design Sketches for Collective Epistemics series by Forethought Foundation.
Overview
Rhetoric Highlighting is a proposed automated system that identifies potentially manipulative rhetorical moves in text—logical fallacies, emotionally loaded language, selective quoting, misrepresented citations, buried assumptions, and statistical distortions—and flags them to readers or writers. The concept was outlined in Forethought Foundation's 2025 report "Design Sketches for Collective Epistemics."
Unlike fact-checking, which assesses whether claims are true, rhetoric highlighting assesses how claims are presented. A statement can be technically true while being deeply misleading through framing, emphasis, omission, or emotional manipulation. Rhetoric highlighting aims to make these moves visible.
The system could operate in two modes:
- Reader mode: Highlights and annotates published text, helping readers see where a passage may be manipulating them
- Writer mode: Functions as a "rhetoric linter" that helps authors strengthen their reasoning and avoid accidental misrepresentation before publishing
How It Would Work
Step-by-Step Pipeline
- Text decomposition: Parse the document into sentences and extract explicit and implied claims
- Context retrieval: Fetch cited passages, background information, and relevant context to evaluate claims against
- Rhetoric classification: Run trained classifiers on each sentence to detect multiple categories of rhetorical issues
- Impact assessment: Evaluate severity—a minor hedging issue matters less than a fundamentally misrepresented citation
- User-facing output: Display results as color-coded highlights with hover explanations, click-through details, and category-based filtering (a code sketch of the full pipeline follows this list)
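As a concrete illustration, the five stages might compose as follows. This is a minimal sketch assuming a generic `call_llm` helper; every function name, prompt, and data structure here is an assumption for illustration, not an interface from the Forethought report.

```python
from dataclasses import dataclass

@dataclass
class Flag:
    sentence: str
    category: str      # e.g. "selective_quoting", "loaded_language"
    severity: float    # 0.0 (minor hedging issue) to 1.0 (misrepresented citation)
    explanation: str   # shown on hover in reader mode

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion API; swap in a real client."""
    raise NotImplementedError

def highlight_rhetoric(document: str) -> list[Flag]:
    flags: list[Flag] = []
    # 1. Text decomposition: split into sentences and claims.
    sentences = call_llm(f"Split into sentences, one per line:\n{document}").splitlines()
    for sentence in sentences:
        # 2. Context retrieval: fetch cited passages and background to judge against.
        context = call_llm(f"Retrieve context needed to evaluate: {sentence}")
        # 3. Rhetoric classification: detect rhetorical issues in this sentence.
        category = call_llm(f"Context: {context}\nName the rhetorical issue in: {sentence}, or 'none'")
        if category.strip().lower() == "none":
            continue
        # 4. Impact assessment: score severity so minor issues can be filtered out.
        severity = float(call_llm(f"Rate 0-1 how misleading this {category} is: {sentence}"))
        # 5. Draft the user-facing explanation attached to the highlight.
        explanation = call_llm(f"Explain in one sentence why this is {category}: {sentence}")
        flags.append(Flag(sentence, category, severity, explanation))
    return flags
```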
Categories of Rhetoric Detected
| Category | Description | Example |
|---|---|---|
| Logical fallacies | Arguments that don't logically follow | Ad hominem attacks, false dichotomies, appeal to authority |
| Emotionally loaded language | Words chosen to manipulate feelings rather than inform | "Catastrophic failure" vs. "significant setback" |
| Selective quoting | Quotes taken out of context to change meaning | Cherry-picking a sentence that reverses the author's actual conclusion |
| Citation misrepresentation | Cited sources don't support the claims made | Paper cited as "proving X" when it actually found mixed results |
| Statistical distortions | Misleading use of numbers | Relative vs. absolute risk, base rate neglect, misleading axes |
| Buried assumptions | Key assumptions hidden in phrasing | "Given that X is inevitable..." when X is contested |
| False balance | Presenting fringe views as equally credible | "Some scientists say climate change is real, others disagree" |
| Anchoring | Initial framing that biases interpretation | Leading with an extreme scenario to make moderate claims seem reasonable |
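For the classification step, this taxonomy would need to be encoded as a fixed label set. The enum below is one illustrative way to do that; the names are assumptions, not a schema from the report.

```python
from enum import Enum

class RhetoricCategory(Enum):
    LOGICAL_FALLACY = "logical_fallacy"
    LOADED_LANGUAGE = "emotionally_loaded_language"
    SELECTIVE_QUOTING = "selective_quoting"
    CITATION_MISREPRESENTATION = "citation_misrepresentation"
    STATISTICAL_DISTORTION = "statistical_distortion"
    BURIED_ASSUMPTION = "buried_assumption"
    FALSE_BALANCE = "false_balance"
    ANCHORING = "anchoring"
```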
Technical Feasibility
Cost Analysis
Forethought provides a back-of-the-envelope cost estimate. For one hour of reading (approximately 30 pages):
| Parameter | Value |
|---|---|
| Pages per hour of reading | approximately 30 |
| Sentences per page | about 20 |
| LLM calls per sentence | about 5 (decomposition, retrieval, classification, assessment, drafting) |
| Tokens per call | about 1,000 |
| Total tokens per hour | about 3 million |
At current (2025) LLM pricing:
- Cheapest models: about $1 per hour of reading
- Most capable models: Hundreds of dollars per hour of reading
- Expected trajectory: With inference costs falling roughly 10x per year, costs should reach $0.10–1.00 per hour within 2–3 years (the arithmetic is checked in the sketch below)
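The table's parameters reproduce the headline figures directly. The per-token prices below are illustrative assumptions about 2025-era rates, not numbers from the report:

```python
# Back-of-the-envelope cost check using the report's parameters.
pages_per_hour = 30
sentences_per_page = 20
calls_per_sentence = 5   # decomposition, retrieval, classification, assessment, drafting
tokens_per_call = 1_000

tokens_per_hour = pages_per_hour * sentences_per_page * calls_per_sentence * tokens_per_call
assert tokens_per_hour == 3_000_000  # matches the "about 3 million" row above

# Illustrative (assumed) prices per million tokens.
for label, usd_per_million in [("cheap model", 0.30), ("frontier model", 60.00)]:
    cost = tokens_per_hour / 1_000_000 * usd_per_million
    print(f"{label}: ~${cost:.2f} per hour of reading")
# cheap model: ~$0.90 per hour of reading
# frontier model: ~$180.00 per hour of reading
```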
Speed Constraints
The multi-step pipeline creates latency challenges:
- Each sentence requires multiple sequential LLM calls
- Real-time highlighting while reading may require pre-processing
- Caching and batching can help for static content
- Streaming/progressive display could improve perceived responsiveness (see the batching sketch below)
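Because sentences can be classified independently, much of this latency can be hidden by running per-sentence calls concurrently. A minimal sketch, assuming an async LLM client stubbed here with a sleep:

```python
import asyncio

async def classify(sentence: str) -> str:
    """Stand-in for an async LLM call; ~500 ms of simulated API latency."""
    await asyncio.sleep(0.5)
    return "none"

async def classify_document(sentences: list[str], max_concurrent: int = 20) -> list[str]:
    # Independent sentences run in parallel; a semaphore caps in-flight
    # requests so long documents don't blow through rate limits.
    # Wall-clock time is roughly (n / max_concurrent) * per-call latency,
    # versus n * per-call latency if every call were sequential.
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(sentence: str) -> str:
        async with sem:
            return await classify(sentence)

    return await asyncio.gather(*(bounded(s) for s in sentences))

# 600 sentences (one hour of reading) at 0.5 s each:
# ~300 s sequential vs. ~15 s with 20-way concurrency.
results = asyncio.run(classify_document([f"sentence {i}" for i in range(600)]))
```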
Current Economic Viability
Given 2025 costs, rhetoric highlighting is currently viable only for:
- High-stakes content: Policy documents, legal filings, major publications
- Widely-read content: Articles with millions of readers (cost amortized)
- Author-side use: Writers checking their own work before publication (lower coverage needed)
- Educational contexts: Teaching critical thinking with annotated examples
Existing Work and Related Tools
Academic Research on Automated Rhetoric Detection
The field of computational argumentation and rhetoric analysis has grown substantially:
| Research Area | Key Work | Status |
|---|---|---|
| Argument mining | Centre for Argument Technology (ARG-tech), University of Dundee, led by Professor Chris Reed. Developed the Argument Interchange Format (AIF) standard ontology and AIFdb, the largest publicly accessible corpus of annotated argumentation. Annual ArgMining workshops at ACL since 2014. | Active research infrastructure |
| Logical fallacy detection | Jin et al. (2022) "Logical Fallacy Detection" introduced the LOGIC dataset with 2,449 instances across 13 fallacy types, plus the LogicClimate challenge set. GPT-4 achieves 79-90% accuracy depending on conditions (Carstens et al., 2024). | Published benchmark; LLMs improving |
| Propaganda detection | SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles (14 techniques, 250 teams). Extended through SemEval-2023 (23 persuasion techniques across 9 languages) and SemEval-2024 (multilingual meme analysis). | Multi-year shared task series |
| Deceptive reasoning | RuozhiBench (2025): 677 questions testing LLMs against deceptive reasoning. Best model achieved only 62% vs. humans at 90%+. | Active benchmark |
| Claim verification | FEVER (Fact Extraction and VERification) benchmark and subsequent work | Active benchmark |
| Citation verification | SciFact and related datasets for scientific claim verification against cited papers | Active research |
| Hedge/weasel word detection | Ganter & Strube (2009) used Wikipedia's weasel-word annotations; updated in 2024. Detects vague language like "some people say," "researchers believe." | Established subfield |
| Rhetorical figure detection | Systematic survey (2024) covering 24 different rhetorical figures and computational detection methods | Active research |
Existing Tools and Prototypes
| Tool | Description | Approach | Adoption |
|---|---|---|---|
| FallacyCheck | Browser extension using inoculation theory; detects 13 fallacy types (MUM 2024) | LLM; proactive questioning | Research prototype |
| Skeptic Reader | Chrome/Firefox extension scoring balance, coherence, objectivity via GPT-4o | LLM; scoring | Early stage |
| FallacyFilter | Chrome extension detecting biases and logical fallacies | LLM; browser extension | Small |
| IBM Project Debater | Argument mining across 10B sentences; public APIs for claim/evidence detection | NLP pipeline; APIs | Enterprise; niche |
| Kialo | Argument mapping with hierarchical pro/con trees across 49 languages | Human-driven | 1M+ users; 400K+ discussions |
| Grammarly | Writing assistant flagging tone and clarity issues | Rule-based + ML | 30M+ daily users |
| Ad Fontes Media | Rates news sources on reliability and bias axes | Human rating | Widely cited bias chart |
| Ground News | Cross-outlet story comparison; "Blindspot" feature for lopsided coverage | Aggregation | Growing mobile app |
| Logically.ai | AI-powered fact-checking and harmful content detection | Commercial platform | Gov/enterprise clients |
| fallacycheck.com | Automated fallacy detection crawling news, editorials, social media | Web crawling | Small; niche |
LLM-Based Approaches
Recent LLM capabilities make several components of rhetoric highlighting more feasible than they were with traditional NLP:
- Zero-shot fallacy detection: GPT-4 achieves 79-90% accuracy on the LOGIC benchmark depending on conditions. Prompt enrichment with counterarguments and explanations (NAACL 2025) improved F1 by up to 0.60 in zero-shot settings.
- Citation verification: LLMs can compare claims against cited sources and identify misrepresentations, though reliability varies
- Tone analysis: Modern models can distinguish between informative and manipulative framing with increasing sophistication
- Inoculation approach: The most successful real-world deployments (Google Jigsaw's prebunking videos, FallacyCheck) use inoculation theory—teaching users to recognize techniques rather than filtering content
- Argument reconstruction: LLMs can extract implicit premises and unstated assumptions from natural language
However, LLMs also introduce new challenges: they can confidently flag non-issues, miss subtle manipulation, and themselves produce rhetoric that would warrant highlighting.
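As a concrete illustration of the zero-shot approach, a detector can be as simple as the sketch below. The prompt wording, the seven example fallacy types (the LOGIC benchmark itself uses 13), and the injected `call_llm` parameter are all assumptions for illustration:

```python
FALLACY_TYPES = [
    "ad hominem", "false dichotomy", "appeal to authority", "straw man",
    "slippery slope", "hasty generalization", "circular reasoning",
]

def detect_fallacy(statement: str, call_llm) -> str:
    """Zero-shot fallacy classification with a single prompt.

    call_llm: any function mapping a prompt string to a completion string.
    """
    prompt = (
        "You are checking an argument for logical fallacies.\n"
        f"Candidate types: {', '.join(FALLACY_TYPES)}.\n"
        "If none clearly applies, answer exactly 'none'.\n"
        f"Argument: {statement}\n"
        "Answer with one type or 'none':"
    )
    # Normalize the answer and fall back to 'none' on anything off-list,
    # since models sometimes add punctuation or invent new labels.
    answer = call_llm(prompt).strip().lower().rstrip(".")
    return answer if answer in FALLACY_TYPES else "none"
```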
Target Applications
Near-Term (High-Value, Low-Scale)
- Academic peer review: Flag citation misrepresentation and logical gaps in manuscript reviews
- Preprint servers: Annotate papers on arXiv/medRxiv before formal peer review
- Policy analysis: Highlight rhetorical moves in government reports, legislative proposals
- Journalism tools: Help reporters identify manipulation in sources' statements
Medium-Term (Broader Deployment)
- Author-side plugins: Writing tools that warn about ambiguous phrasing, unsupported claims
- Educational platforms: Teach critical thinking by showing rhetoric patterns in real text
- Fact-checker augmentation: Speed up professional fact-checkers by pre-identifying issues
Long-Term (Universal Access)
- Browser extensions: Real-time annotation of any web content
- Social media integration: Platform-level rhetoric flagging
- Email and messaging: Highlight manipulation in personal communications
Suggested Prototypes (from Forethought)
- Author-side plugin: Warning about ambiguous phrasing or unsupported claims during writing
- Cite-checker: Verifying that paper quotations accurately represent the source (a minimal sketch follows this list)
- Marked-up news articles: Demonstrations of rhetoric patterns highlighted in published news
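The verbatim half of a cite-checker needs no LLM at all: fuzzy string matching catches doctored or misremembered quotations, while judging whether an accurate quote still misrepresents the source's conclusion would require the LLM pipeline above. The function and threshold below are illustrative assumptions:

```python
import difflib

def quote_appears_in_source(quote: str, source_text: str, threshold: float = 0.9) -> bool:
    """Check whether a quotation plausibly appears verbatim in the source.

    Slides a window of the quote's length across the source and keeps the
    best fuzzy-match ratio; small punctuation or OCR differences pass,
    paraphrases and doctored quotes fail.
    """
    source_words = source_text.split()
    window = len(quote.split())
    best = 0.0
    for i in range(max(1, len(source_words) - window + 1)):
        candidate = " ".join(source_words[i:i + window])
        ratio = difflib.SequenceMatcher(None, quote.lower(), candidate.lower()).ratio()
        best = max(best, ratio)
    return best >= threshold

# A verbatim quote passes; a subtly altered one does not.
source = "The intervention produced a modest improvement in outcomes."
print(quote_appears_in_source("a modest improvement in outcomes", source))    # True
print(quote_appears_in_source("a dramatic improvement in outcomes", source))  # False
```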
Worked Example: AI Lab Blog Post
Consider a hypothetical AI lab blog post announcing a new model:
"Our groundbreaking model achieves superhuman performance on every major benchmark, making it the most capable AI system ever created. Independent researchers have confirmed that this represents a fundamental leap in intelligence. While some have raised safety concerns, our rigorous testing shows the model is completely safe for deployment."
A rhetoric highlighting system would annotate this passage as follows:
| Sentence Fragment | Flag | Explanation |
|---|---|---|
| "superhuman performance on every major benchmark" | Overgeneralization | Most models excel on some benchmarks but not others. "Every" is likely false or requires significant qualification. |
| "most capable AI system ever created" | Superlative claim without qualification | Capability depends on the metric. This implies universal superiority, which is almost never true. |
| "Independent researchers have confirmed" | Vague attribution | Which researchers? What specifically did they confirm? "Have confirmed" implies consensus that may not exist. |
| "a fundamental leap in intelligence" | Emotionally loaded language | "Leap" and "intelligence" are both contested terms that imply more than benchmark improvements warrant. |
| "While some have raised safety concerns" | Dismissive framing | "While some" minimizes safety concerns and positions them as a minority view to be acknowledged then dismissed. |
| "completely safe for deployment" | Absolute safety claim | No system is "completely safe." This is a red flag for missing caveats about limitations and risk mitigation. |
In writer mode, these flags would appear as the author drafts the post, encouraging more precise language: "achieves state-of-the-art on 7 of 12 major benchmarks" instead of "every," and "our evaluations found no critical safety issues in tested scenarios" instead of "completely safe."
Extensions and Open Ideas
Rhetoric diff: Compare two versions of a statement—an original and a revision—to visualize how rhetoric changed. Useful for tracking how press releases evolve from internal drafts, or how a claim morphs as it's reported across outlets. "The original paper said 'modest improvement'; the press release said 'breakthrough.'"
Author rhetoric profiles: Aggregate rhetoric patterns across an author's or organization's body of work. "This author uses emotional language 3x more than the domain average" or "This organization's press releases consistently use absolute safety claims." This connects rhetoric highlighting to reliability tracking.
Rhetoric translation: Automatically rewrite flagged sentences in neutral language and show the comparison side-by-side. Not to replace the original, but to help readers see what the same information looks like without the rhetorical moves. "Here's what this paragraph says if you remove the loaded framing."
Symmetric debate highlighting: When analyzing content about a contested topic, highlight rhetoric on all sides symmetrically, not just the side the system's training data identifies as "wrong." This addresses the concern that rhetoric highlighting could become a partisan tool.
Integration with LLM output: Apply rhetoric highlighting to AI-generated content itself. Users could enable a mode where their AI assistant's responses are simultaneously checked for rhetorical manipulation—a self-auditing feature that builds trust.
Calibrated confidence for flags: Rather than binary flag/no-flag, each annotation could come with a confidence score: "85% likely this is selective quoting" vs. "55% likely this is emotionally loaded language." Users set their own threshold for what to display.
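Mechanically, threshold filtering is trivial, as the sketch below shows; the hard part, not shown, is calibrating the scores so that an 85% flag is actually correct about 85% of the time. The data structure and numbers are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class ScoredFlag:
    span: str
    category: str
    confidence: float  # calibrated probability that the flag is correct

def visible_flags(flags: list[ScoredFlag], threshold: float) -> list[ScoredFlag]:
    """Return only the flags at or above the reader's chosen display threshold."""
    return [f for f in flags if f.confidence >= threshold]

flags = [
    ScoredFlag("superhuman performance on every major benchmark", "overgeneralization", 0.85),
    ScoredFlag("a fundamental leap in intelligence", "loaded_language", 0.55),
]
print(visible_flags(flags, threshold=0.7))  # only the 0.85 flag survives
```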
Collaborative annotation refinement: When the system flags something incorrectly, users can dispute it. Disputed flags are reviewed by other users (similar to the community notes bridging algorithm), creating a feedback loop that improves the system's accuracy over time.
Challenges and Risks
False Positives and Chilling Effects
The most significant risk is that rhetoric highlighting could discourage legitimate persuasion. Not all emotional language is manipulative; not all simplification is distortion. A system that flags too aggressively could:
- Make writing sterile and unengaging
- Discourage strong advocacy for important causes
- Create a new form of tone policing
- Advantage bland corporate communication over passionate individual voices
Subjectivity of "Misleading"
What counts as manipulative rhetoric is often subjective:
- Cultural context matters—rhetorical norms differ across communities
- Some "fallacies" are reasonable heuristics in everyday reasoning
- The boundary between persuasion and manipulation is genuinely fuzzy
- Political framing is inherently contestable
Gaming and Arms Races
Sophisticated communicators could adapt to avoid detection while maintaining manipulation:
- Use more subtle rhetorical techniques
- Structure arguments to technically avoid flagged patterns
- Preemptively address flags in ways that make them seem unreasonable
- The net effect could be an arms race similar to SEO versus search-ranking algorithms
Power Dynamics
- Who controls the definitions? The choice of what constitutes "misleading rhetoric" embeds values
- Asymmetric impact: Could disproportionately flag certain communication styles, dialects, or cultural norms
- Corporate capture: Could be tuned to favor certain political perspectives or commercial interests
Connection to AI Safety
Rhetoric highlighting connects to AI safety in multiple ways:
- AI-generated persuasion: As AI systems become better at generating persuasive content, tools that help humans detect manipulation become more important for maintaining epistemic health
- Sycophancy detection: The same techniques could be applied to AI outputs, flagging when AI systems use rhetorically manipulative patterns to tell users what they want to hear
- Policy discourse: Improving the quality of debate about AI governance could lead to better regulatory outcomes
- Civilizational competence: Populations that can better identify manipulation are better positioned to make wise collective decisions about transformative AI
Key Uncertainties
- Can automated rhetoric detection distinguish genuine persuasion from manipulation reliably enough to be useful?
- Will the chilling effect on legitimate speech outweigh the benefits of flagging manipulation?
- How quickly will costs fall enough to make real-time rhetoric highlighting viable for everyday reading?
- Can the system be made robust to adversarial adaptation by sophisticated communicators?
- What governance structure can ensure rhetoric highlighting definitions remain balanced across perspectives?
Further Reading
- Original Report: Design Sketches for Collective Epistemics — Rhetoric Highlighting — Forethought Foundation
- Related Research: Logical Fallacy Detection — Jin et al. (2022), introducing the LOGIC benchmark
- Computational Argumentation: Argument Mining — ACL Workshop series since 2014
- Overview: Design Sketches for Collective Epistemics — parent page with all five proposed tools