Methods

Hybrid Multi-Agent Factuality Framework

Our hybrid system combines a multi-agent orchestration built with Google’s Agent Development Kit (ADK) and a set of factuality factor machine learning models trained on LIAR-PLUS. The agents and models collaborate to evaluate news articles along four factuality dimensions and produce an interpretable, overall credibility assessment.

System architecture overview

We decomposed the factuality evaluation process into three layers:

  1. Claim extraction and verification – identify and verify atomic factual claims in an article.
  2. Parallel factuality analysis – four specialized LLM agents independently score different factuality factors in parallel.
  3. Sequential score aggregation – a merger agent combines factor-level outputs into an overall truthfulness assessment.
Multi-agent system architecture diagram showing an orchestrator feeding a claim extraction and verification pipeline, a parallel layer of four factor agents with ML tools, and a merger agent that aggregates outputs.
Multi-agent system architecture: sequential claim pipeline, parallel factor analysis, and merger aggregation.

Layer 1 Claim extraction & verification

Claim extraction agent

The claim extraction agent decomposes an article into testable factual claims that satisfy three constraints:

  • Factual rather than purely opinion or rhetoric.
  • Testable against external sources.
  • Self-contained and unambiguous.

The agent outputs a structured list of claims that downstream components treat as the unit of verification.

Claim verification agent with external search

For each extracted claim, a verification agent calls a google_search tool within ADK to retrieve relevant evidence. For every claim, the agent:

  1. Constructs a targeted search query.
  2. Calls the search tool to retrieve candidate sources.
  3. Assesses the credibility and agreement of retrieved information.
  4. Assigns a verification status and confidence score.

This follows a retrieval-augmented reasoning pattern: instead of relying solely on an LLM’s internal knowledge, it explicitly grounds its judgments in up-to-date external evidence, reducing hallucinations.

Layer 2 Parallel factuality factor agents

The core analysis layer consists of four independent LlmAgent instances, each responsible for one factuality factor:

Frequency heuristic agent
Scores how strongly the article leans on repetition, bandwagon language, and popularity cues as substitutes for evidence.
Sensationalism agent
Evaluates emotional exaggeration and dramatic framing, using both lexical cues (e.g., all-caps, exclamation marks, “shocking” adjectives) and sentiment analysis.
Malicious account agent
Detects language patterns associated with inauthentic behavior, such as spam-like link density, high mention counts, or unusual punctuation and capitalization ratios.
Naive realism agent
Measures how dogmatically the article presents its viewpoint as the only reasonable truth, focusing on absolutist phrasing, dismissal of alternatives, and lack of uncertainty markers.

All four agents are grouped under a ParallelAgent so that they can execute concurrently. Importantly, agents do not share intermediate analysis, preventing cross-contamination between factors.

Tools: ML models as callable functions

Step 1
The agent performs structured reasoning over the article text to decide what it wants to know.
Step 2
It calls the corresponding predictive model tool (for example get_frequency_heuristic_prediction(text)) to get a score and confidence.
Step 3
It then blends the tool output with its own explanation to produce a final score and justification.

Prompts tell agents to treat model outputs as informative but not ground truth. If an agent disagrees with the model score, it has to explain why. The hybrid design lets us:

  • Use supervised models for pattern-based grounding learned from thousands of labeled statements.
  • Lean on LLMs for contextual nuance and interpretability at the article level.

Layer 3 Sequential orchestration & aggregation

A SequentialAgent acts as the root controller and runs the full pipeline:

  1. Invoke claim extraction and verification.
  2. Launch parallel factuality factor evaluations.
  3. Pass all factor outputs to the merger agent.

The merger agent then synthesizes the four factor-level outputs into an overall truthfulness score, produces a final explanation that highlights which factors mattered most, and enforces a strict JSON schema for all agent outputs.

Agent-to-Agent (A2A) Deployment

To make the system easier to integrate into other applications, we deploy each specialized factuality factor agent as an Agent-to-Agent (A2A) service using ADK’s to_a2a functionality.

  • Each agent can run as a standalone service on its own port, ready to be integrated into future pipelines.
  • Execution environments are isolated, guaranteeing that prompts, tool calls, and outputs remain cleanly separated between factors.

Methodological trade-offs

Our design has several trade-offs:

  • Pros: interpretable factor scores, modular agents, and explicit grounding in supervised models and external evidence.
  • Cons: increased computational overheadfrom multi-agent orchestration and multi-iteration reasoning.