Confidence Scoring

The problem with binary memory

Most memory systems store facts as binary: either stored or not stored, with every stored fact treated as equally true. This fails in production because:

A user preference stored 8 months ago is treated the same as one set yesterday
There’s no way to express that an inferred preference is less certain than a direct statement
Contradicting information doesn’t reduce confidence in existing beliefs

Log-odds confidence

Engram represents confidence as a value in [0, 1] but updates it using log-odds arithmetic:

logit(p) = log(p / (1 - p))

When new evidence arrives, the update is:

new_confidence = sigmoid(logit(current_confidence) + Δ)

This matters because:

Additive updates on raw probabilities break near the boundaries (0 and 1)
Log-odds values range over the whole real line, and sigmoid maps any of them back into (0, 1), so an update can never leave the valid probability range
The same Δ has less effect when confidence is already very high or very low — which matches intuition
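The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not Engram's implementation: the helper names (`logit`, `sigmoid`, `update_confidence`) and the `eps` clamp that keeps `logit` defined at exactly 0 or 1 are assumptions made for the example.

```python
import math

def logit(p: float, eps: float = 1e-9) -> float:
    # Clamp away from 0 and 1 so the log is defined (eps is an assumed value).
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def update_confidence(current: float, delta: float) -> float:
    # Add the evidence in log-odds space, then map back to a probability.
    return sigmoid(logit(current) + delta)

# The same delta moves a mid-range confidence far more than an extreme one.
for start in (0.50, 0.90, 0.99):
    updated = update_confidence(start, 0.5)
    print(f"{start:.2f} -> {updated:.4f} (change {updated - start:+.4f})")
```

Even a large delta applied to a confidence of 0.99 stays strictly below 1, which an additive update on the raw probability would not guarantee.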

Confidence updates

Reinforcement

When a memory is recalled (accessed), it receives a small boost:

Δ = +0.02  (UsageReinforcementBoost)

When new information confirms an existing belief:

Δ = +0.05  (ReinforcementConfidenceBoost)
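As a rough worked example, the sketch below applies both boosts to a single memory. It assumes the Δ values are added in log-odds space, as in the update formula from the previous section; the constant names mirror the labels above, while `apply_delta` and the starting confidence of 0.70 are hypothetical.

```python
import math

USAGE_REINFORCEMENT_BOOST = 0.02       # applied on each recall
REINFORCEMENT_CONFIDENCE_BOOST = 0.05  # applied when new information confirms

def apply_delta(confidence: float, delta: float) -> float:
    # sigmoid(logit(confidence) + delta), assuming 0 < confidence < 1
    log_odds = math.log(confidence / (1.0 - confidence)) + delta
    return 1.0 / (1.0 + math.exp(-log_odds))

confidence = 0.70  # illustrative starting value
for _ in range(3):  # three recalls
    confidence = apply_delta(confidence, USAGE_REINFORCEMENT_BOOST)
confidence = apply_delta(confidence, REINFORCEMENT_CONFIDENCE_BOOST)  # one confirmation

print(f"Confidence after reinforcement: {confidence:.4f}")
```

Because the boosts add up in log-odds space, three recalls plus one confirmation are equivalent to a single update with Δ = 0.11, regardless of the order in which they arrive.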

Contradiction

When a hard contradiction is detected:

new_confidence = max(0.10, current_confidence - 0.20)
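The contradiction rule is a flat subtraction with a floor, which can be expressed directly. The function and constant names here are hypothetical; only the values 0.20 and 0.10 come from the formula above.

```python
CONTRADICTION_PENALTY = 0.20  # subtracted on a hard contradiction
CONFIDENCE_FLOOR = 0.10       # confidence never ends up below this

def apply_contradiction(current_confidence: float) -> float:
    # Subtract the penalty on the raw probability, then clamp to the floor.
    return max(CONFIDENCE_FLOOR, current_confidence - CONTRADICTION_PENALTY)

for start in (0.85, 0.40, 0.25):
    print(f"{start:.2f} -> {apply_contradiction(start):.2f}")
```

Unlike reinforcement, this step operates on the raw probability rather than in log-odds space, so the floor is what keeps the result inside the valid range.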

Temporal update

When newer information supersedes an existing belief, the old memory is archived (confidence drops below 0.40) and a new memory is created.

Decay

Unused memories decay over time. The base decay formula:

new_confidence = current_confidence × (1 - base_decay_rate × effective_decay_multiplier)

Where effective_decay_multiplier accounts for competition from similar memories (see Lifecycle).
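To get a feel for how the decay formula behaves over repeated steps, here is a small sketch. The `base_decay_rate` of 0.01, the multiplier values, and the 30-step horizon are illustrative assumptions, not Engram defaults, and `decay_step` is a hypothetical helper.

```python
def decay_step(current_confidence: float,
               base_decay_rate: float,
               effective_decay_multiplier: float = 1.0) -> float:
    # One application of the base decay formula.
    return current_confidence * (1.0 - base_decay_rate * effective_decay_multiplier)

# Compare a memory with no competition (multiplier 1.0) against one
# whose decay is doubled by similar memories (multiplier 2.0).
for multiplier in (1.0, 2.0):
    confidence = 0.80
    for _ in range(30):
        confidence = decay_step(confidence, base_decay_rate=0.01,
                                effective_decay_multiplier=multiplier)
    print(f"multiplier={multiplier}: 0.80 -> {confidence:.4f} after 30 steps")
```

Since each step multiplies by a constant factor, decay is geometric: confidence approaches zero but never reaches it while the factor stays between 0 and 1.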

Calibration

A confidence of 0.80 should mean the memory is correct ~80% of the time. Engram’s confidence model is designed to be calibrated, not just to act as a relative ranking score. This is tested in the Agent Memory Benchmark Suite 2 using Expected Calibration Error (ECE):

ECE = mean(|bucket_confidence - bucket_accuracy|)

Systems that use raw cosine similarity as a “confidence” score produce ECE ≈ 0.50 (worse than random). Engram’s log-odds model achieves ECE ≈ 0.18 with embedding-only mode.
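The ECE formula above can be computed along the following lines. This is a sketch under stated assumptions: it uses ten equal-width confidence buckets and the unweighted mean over non-empty buckets, as the formula is written here (many ECE definitions weight each bucket by its size instead). The function name and sample data are hypothetical, and the benchmark suite's actual procedure may differ.

```python
def expected_calibration_error(confidences, outcomes, n_buckets=10):
    # Group predictions into equal-width confidence buckets.
    buckets = [[] for _ in range(n_buckets)]
    for conf, correct in zip(confidences, outcomes):
        idx = min(int(conf * n_buckets), n_buckets - 1)
        buckets[idx].append((conf, correct))

    # Per bucket: |mean confidence - observed accuracy|.
    gaps = []
    for bucket in buckets:
        if not bucket:
            continue
        bucket_confidence = sum(c for c, _ in bucket) / len(bucket)
        bucket_accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        gaps.append(abs(bucket_confidence - bucket_accuracy))

    # Unweighted mean over non-empty buckets.
    return sum(gaps) / len(gaps) if gaps else 0.0

# Made-up sample: stored confidences and whether each memory was correct.
sample_conf = [0.92, 0.88, 0.75, 0.71, 0.35, 0.30]
sample_ok = [True, True, True, False, False, True]
print(f"ECE: {expected_calibration_error(sample_conf, sample_ok):.3f}")
```

A perfectly calibrated set, for example five memories at 0.80 of which four are correct, yields an ECE of approximately zero.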

Checking confidence

# Get the current confidence of a specific memory
memory = client.memories.get(memory_id="uuid-here")
print(f"Confidence: {memory.confidence:.3f}")

# Get confidence history (mutation audit trail)
stats = client.cognitive.get_confidence_stats(memory_id="uuid-here")

# Manually reinforce or penalise
client.cognitive.reinforce(memory_id="uuid-here", boost=0.05)
client.cognitive.penalize(memory_id="uuid-here", penalty=0.10)

Uncertainty detection

# Surface memories with low confidence or active contradictions
uncertainty = client.cognitive.detect_uncertainty(agent_id=agent.id)

for item in uncertainty.low_confidence:
    print(f"[{item.confidence:.2f}] {item.content}")
