Stochastic Decoding
Large language models generate text by sampling one token at a time from probability distributions. Even when the prompt is identical, small differences in token selection can compound into materially different outputs across runs.
Several factors compound this variability:

- Temperature and nucleus sampling intentionally introduce randomness to avoid repetitive output.
- A different early token changes the later context, so divergence grows with response length.
- Broad prompts admit multiple valid continuations, widening answer spread across generations.
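The sampling step described above can be sketched concretely. The function below is an illustrative temperature-plus-nucleus sampler over raw logits; it is a minimal sketch of the general technique, not any particular model's implementation.

```python
import math
import random

def sample_token(logits, temperature=0.8, top_p=0.9, rng=random):
    """Temperature-scale logits, keep the smallest prefix of tokens (by
    descending probability) whose cumulative mass reaches top_p, then
    sample one token index from that nucleus."""
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Truncate to the nucleus: most-probable tokens up to cumulative top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cum = [], 0.0
    for i in order:
        nucleus.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalize over the nucleus and draw one token.
    mass = sum(probs[i] for i in nucleus)
    r = rng.random() * mass
    for i in nucleus:
        r -= probs[i]
        if r <= 0:
            return i
    return nucleus[-1]
```

Calling this repeatedly on the same logits yields different tokens run to run, which is exactly the divergence described above: one different early token shifts all later context.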
This variability is expected behavior, not an edge case. The literature consistently treats modern LLMs as probabilistic sequence generators and studies methods to improve consistency by aggregating multiple sampled paths.
Representative threads in that literature:

- Autoregressive language modeling, where next-token probabilities drive generated text.
- Nucleus sampling, which improves output quality but preserves non-deterministic behavior.
- Aggregation of multiple sampled reasoning paths, which can improve answer reliability.
- Quantified limits on model truthfulness, reinforcing the need for structured validation workflows.
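The aggregation idea above reduces, in its simplest form, to a majority vote over answers sampled from independent generations. The helper below is a minimal sketch of that vote, not any specific paper's procedure.

```python
from collections import Counter

def majority_answer(sampled_answers):
    """Return the most frequent answer among independently sampled
    generations; ties break toward the earliest-seen answer, since
    Counter.most_common preserves insertion order for equal counts."""
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    return Counter(sampled_answers).most_common(1)[0][0]
```

For example, if five sampled reasoning paths end in `["42", "42", "17", "42", "17"]`, the vote returns `"42"`, smoothing over the run-to-run divergence of any single generation.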
TENSOR does not try to force LLMs to become deterministic. Instead, it constrains decision execution with a deterministic graph contract.
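To illustrate what a deterministic graph contract might look like, the sketch below assumes a hypothetical transition table and a `propose` callback standing in for the stochastic model call. The node names, outcomes, and function names are illustrative assumptions, not TENSOR's actual API.

```python
# Hypothetical contract: each node maps allowed outcomes to next nodes.
# Any outcome outside this table is rejected, regardless of what the
# model proposes. (Illustrative names, not TENSOR's real schema.)
ALLOWED = {
    "triage": {"escalate": "investigate", "dismiss": "closed"},
    "investigate": {"confirmed": "closed", "unconfirmed": "closed"},
}

def run_contract(start, propose):
    """Walk the graph from `start`. `propose(node)` is the stochastic
    model call; only outcomes named in the contract advance the state,
    so execution is deterministic given the sequence of valid outcomes."""
    node, path = start, [start]
    while node in ALLOWED:
        outcome = propose(node)
        if outcome not in ALLOWED[node]:
            raise ValueError(f"outcome {outcome!r} violates contract at node {node!r}")
        node = ALLOWED[node][outcome]
        path.append(node)
    return path
```

The model remains free to vary in how it reaches a decision, but the control plane only accepts outcomes the graph declares, and the resulting execution path is fully auditable.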
AI can accelerate investigations, but the control plane must remain deterministic. TENSOR supplies that control plane.