The Problem With Confident AI — And How We Built Around It
On confident errors, honest failures, and the architecture behind CanadaGPT
Hallucination is the original sin of generative AI. Ask a large language model a question it doesn’t know with certainty, and there’s a reasonable chance it will answer anyway — fluently, confidently, and incorrectly. For casual use, that’s an inconvenience. For a platform built around parliamentary accountability, it’s a fundamental design problem.
When we built CanadaGPT, we knew we couldn’t afford to be wrong with confidence. Democratic accountability depends on precision. A misattributed vote, an outdated policy position, a fabricated quote — these aren’t edge cases to tolerate, they’re the exact failure modes that undermine trust in institutions. We needed a different architecture.
The Core Problem With Conventional AI
Most AI assistants — ChatGPT, Claude, Gemini — are trained on enormous datasets and generate answers by predicting what a plausible response looks like, based on everything they’ve absorbed. When that training data is rich and accurate, results can be impressive. When it’s outdated, ambiguous, or simply absent, the model fills in the gaps — often without any signal to the user that it’s doing so.
Take a simple example: “Who is the Prime Minister of Canada?” A leading AI assistant trained before Mark Carney took office may carry thousands of references to Justin Trudeau in that role. Asked today, it may still return his name — with full confidence, no caveat. That’s not a hallucination in the dramatic sense, but it represents exactly the kind of confident-but-wrong response that erodes trust over time.
A Different Approach: GraphRAG
CanadaGPT’s AI assistant, Gordie, is built on a technique called GraphRAG: Graph Retrieval-Augmented Generation. The name is technical, but the principle is straightforward: rather than asking AI to generate an answer from training data, we use AI to formulate a precise query against a structured database, and let the data itself provide the answer.
The underlying infrastructure of CanadaGPT is a Neo4j graph database containing structured parliamentary data — votes, debates, bills, committee appearances, and the relationships between them. This database can be queried directly by anyone, human or AI, using Cypher — a precise query language that returns exact results, not approximations.
So when you ask Gordie who the current Prime Minister is, it doesn’t reach into a probabilistic model of language. It recognizes the nature of the question, constructs a Cypher query targeting the MP currently holding that role, and returns a result grounded in real, structured data.
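The pattern can be sketched roughly as follows. To be clear about what's assumed: the node labels (`MP`, `Role`), the `HOLDS` relationship, and the in-memory stand-in for the database are illustrative inventions, not CanadaGPT's actual schema or code — a real deployment would run the Cypher against Neo4j through its driver.

```python
# Sketch of the GraphRAG pattern: the model's job is to produce a precise
# Cypher query; the answer comes from returned records, not model weights.
# Schema (MP/Role labels, HOLDS relationship) is a hypothetical example.

CURRENT_PM_QUERY = """
MATCH (mp:MP)-[:HOLDS {current: true}]->(r:Role {name: 'Prime Minister'})
RETURN mp.name AS name
"""

# Stand-in for a Neo4j session: a tiny in-memory "graph" so the sketch runs.
FAKE_GRAPH = {
    ("Role", "Prime Minister"): {"holder": "Mark Carney"},
}

def run_query(cypher: str) -> list[dict]:
    """Pretend to execute Cypher; real code would call session.run(cypher)."""
    if "Role {name: 'Prime Minister'}" in cypher:
        record = FAKE_GRAPH[("Role", "Prime Minister")]
        return [{"name": record["holder"]}]
    return []  # no match: the database stays silent rather than guessing

def answer(question: str) -> str:
    # Step 1: recognize the intent (a real system uses the LLM for this step).
    if "prime minister" in question.lower():
        rows = run_query(CURRENT_PM_QUERY)
        if rows:
            # Step 2: the answer is grounded in a returned record.
            return rows[0]["name"]
    return "I couldn't find that."

print(answer("Who is the current Prime Minister of Canada?"))  # prints "Mark Carney"
```

The point of the division of labour is that the model only ever writes the query; the name in the answer comes out of the database, so a stale training corpus can't smuggle in an outdated Prime Minister.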
The Failure Mode That Actually Matters
Here’s the insight that shaped our architecture most: what matters is not just what an AI gets wrong, but how it gets it wrong.
A conventional AI that hallucinates typically delivers its error with the same tone and confidence as a correct answer. The user has no way to distinguish them. That’s a deeply broken failure mode for a platform meant to be trusted.
GraphRAG fails differently. If Gordie formulates an incorrect query, the database returns an error or no results — not a plausible-sounding fabrication. The system can then attempt a corrected query. When it genuinely can’t find an answer, it says so. That kind of epistemic honesty — I couldn’t find that rather than here’s something that sounds right — is a feature, not a limitation.
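That fail-honest loop can be sketched minimally, with the same caveat as before — the retry logic, exception shape, and query strings here are assumptions for illustration, not Gordie's implementation:

```python
# Sketch of the fail-honest loop: a bad query surfaces as an error or an
# empty result set, never as a fluent fabrication. All names illustrative.

class QueryError(Exception):
    """Raised when the database rejects a malformed query."""

def execute(cypher: str) -> list[dict]:
    """Stand-in for session.run(); real code would hit Neo4j."""
    if "MATCH" not in cypher:
        raise QueryError("invalid Cypher")
    if "Prime Minister" in cypher:
        return [{"name": "Mark Carney"}]
    return []  # syntactically valid, but nothing in the graph matches

def ask(candidate_queries: list[str]) -> str:
    """Try each candidate query in turn; admit failure instead of guessing."""
    for cypher in candidate_queries:
        try:
            rows = execute(cypher)
        except QueryError:
            continue  # malformed query: reformulate, try the next candidate
        if rows:
            return rows[0]["name"]
    # Every attempt errored or came back empty: say so, honestly.
    return "I couldn't find that."

# First query is malformed, second is valid but empty, third succeeds.
print(ask([
    "RETRN mp.name",
    "MATCH (mp:MP {name: 'Nobody'}) RETURN mp.name",
    "MATCH (mp:MP)-[:HOLDS]->(:Role {name: 'Prime Minister'}) RETURN mp.name",
]))  # prints "Mark Carney"
```

Notice the asymmetry: errors and empty results are distinguishable states the system can react to, whereas a hallucinated answer is indistinguishable from a correct one.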
Still Iterating
No AI system is immune to error, and we’re not claiming otherwise. We continuously monitor Gordie’s query patterns, review edge cases, and refine the system’s ability to translate natural language into accurate database queries. That work is ongoing and probably always will be.
But the foundational design choice — anchoring AI reasoning to a structured, queryable graph of verifiable parliamentary data — means our errors tend to be visible, recoverable, and rare, rather than silent, confident, and corrosive.
That’s the kind of AI infrastructure Canadian democratic accountability deserves.
That Prime Minister mix-up example hits close. I asked a standard AI about the current Speaker and got the wrong name with zero doubt.
Gordie skips that by querying the real data graph instead. Smart move for accountability.
The feds took a similar tack. Their AI register now publishes details on every government system to build trust.
Details at https://open.canada.ca/data/en/dataset/fcbc0200-79ba-4fa4-94a6-00e32facea6b