Geometric Memory and Process Trajectories:
How Priostack Detects Unusual Workflows
Every workflow engine can tell you what happened in a process: which tasks ran, which gateway fired, what variables were set. Very few can tell you whether what happened was normal.
The difference matters enormously in regulated industries. A loan approval that skips the risk assessment task is not just a bug — it is a potential fraud vector. A document approval that bypasses two required sign-offs may violate compliance obligations. The challenge is detecting these deviations automatically, without enumerating every possible bad path in a rules engine.
This article explains the geometric memory system built into Priostack: what it is, how it encodes process execution as geometry, and why the mathematics of trajectory comparison makes it far more powerful than rule-based anomaly detection as your process scales.
Contents
- The problem with rule-based anomaly detection
- Cell IDs: how execution steps become numbers
- Shape vectors: from numbers to geometry
- Process paths as geometric trajectories
- Building the reference corpus
- QueryCorpusAnomaly: scoring against historical norms
- Fréchet distance: why minimum leash is the right metric
- Real numbers from the agentic credit demo
- Production characteristics: how the system improves over time
- Comparison with rule-based and statistical alternatives
1. The problem with rule-based anomaly detection
The most common approach to detecting anomalous process execution is to write
rules. A rule might say: "if the assess_risk task is skipped and
the loan amount exceeds €100 000, raise an alert." This works for the specific
case you thought of. But process models have a combinatorial explosion of
possible execution paths, and writing a rule for every bad combination is
intractable.
Consider a BPMN process with 12 tasks, 3 XOR gateways, and 2 parallel splits. The number of distinct execution paths is in the hundreds. The number of abnormal paths is the total minus the handful of normal ones — still hundreds. You cannot enumerate them all, and even if you could, each new version of the process definition invalidates your ruleset.
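To make the explosion concrete, here is a back-of-the-envelope count under stated assumptions (three branches per XOR gateway, and two 2-task branches per parallel split; these numbers are chosen for illustration, not taken from any particular model):

```go
package main

import "fmt"

// binom returns C(n, k): the number of distinct interleavings of two
// ordered branches with k and n-k tasks respectively.
func binom(n, k int) int {
	r := 1
	for i := 1; i <= k; i++ {
		r = r * (n - k + i) / i
	}
	return r
}

func main() {
	// Assumption: each of the 3 XOR gateways picks one of 3 branches.
	choices := 3 * 3 * 3
	// Assumption: each parallel split runs two branches of 2 tasks each,
	// giving C(4, 2) distinct task interleavings per split.
	interleavings := binom(4, 2)
	paths := choices * interleavings * interleavings
	fmt.Println(paths) // 27 * 6 * 6 = 972 distinct execution paths
}
```

Even under these modest assumptions the count lands near a thousand, and only a handful of those paths are "normal".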
There is a second problem with rules: process paths are sequential, not independent. A rule that operates on individual task completions misses the fact that it is the combination that is suspicious — two tasks that are individually normal but occur in the wrong order, or at the wrong position in the broader sequence. Geometric trajectory comparison captures the entire path as a unit.
2. Cell IDs: how execution steps become numbers
When Priostack executes a BPMN process, each step — each task completion, gateway evaluation, or intermediate event — produces an integer called a cell ID. Cell IDs are deterministic: the same execution path through the same process definition always produces the same sequence of cell IDs.
Cell IDs are not arbitrary. They are derived from the execution state — the set of active places in the process graph — combined with process-specific variables. Two executions that take the same structural path but carry different variable values (different loan amounts, say) will produce cell IDs in the same region of the integer space, but not identical IDs. This regional clustering is the foundation of the geometric memory system.
// Inside the BPMN execution engine, each step returns:
type StepResult struct {
    NewMarking map[string]int // which places hold tokens now
    CellID     int32          // geometric identity of this step
}
// For the loan approval process, a normal approved path produces:
// cellID sequence: [2, 5, 8, 11, 14]
//
// "validate_identity" → 2
// "assess_risk" → 5
// "gw → approve_loan" → 8
// "approve_loan" → 11
// "notify_applicant" → 14
The integer range is partitioned by process. Cell IDs for one process definition do not overlap with cell IDs for another. Within a process, the IDs for steps that involve higher-value variables (e.g. large loan amounts) are offset from those for lower-value variables — a deliberate design choice that lets the geometric system distinguish similar-but-not-identical paths.
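The partition-and-offset scheme can be sketched as follows. This is an illustrative stand-in, not Priostack's real derivation: the function name, the hashing scheme, and all constants here are hypothetical; only the structure (process base offset, state hash, variable bucket) mirrors the description above.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// cellID sketches the derivation: the process's base offset partitions
// the integer range, a hash of the active places identifies the
// execution state, and a variable bucket (e.g. a loan-amount band)
// offsets similar-but-not-identical executions into the same region.
func cellID(processBase int32, marking map[string]int, amountBucket int32) int32 {
	places := make([]string, 0, len(marking))
	for p, tokens := range marking {
		if tokens > 0 {
			places = append(places, p)
		}
	}
	sort.Strings(places) // deterministic order before hashing
	h := fnv.New32a()
	for _, p := range places {
		h.Write([]byte(p))
		h.Write([]byte{0})
	}
	stateRegion := int32(h.Sum32() % 1000)
	return processBase + stateRegion*8 + amountBucket
}

func main() {
	m := map[string]int{"after_validate": 1, "awaiting_risk": 1}
	a := cellID(0, m, 0)
	b := cellID(0, m, 1) // same execution state, higher amount bucket
	fmt.Println(b-a == 1, a == cellID(0, m, 0))
}
```

The point of the sketch is the two properties the article relies on: the ID is deterministic for a given state, and variable differences shift it within a region rather than to an arbitrary location.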
Why integers, not task names?
Strings are expensive to compare at scale and have no natural notion of
distance. Two task names "validate_identity" and
"assess_risk" are equally different from each other and from
"approve_loan" in string space. Cell IDs live on a number line,
which means they have a natural geometry: distance, direction, and
dimensionality reduction all make sense.
3. Shape vectors: from numbers to geometry
A single cell ID is a point on a line. A sequence of cell IDs — representing a complete execution — is a point in a high-dimensional space, where each dimension corresponds to a position in the sequence.
In practice, execution sequences vary in length. A process that routes to manual review has one more step than a process that auto-approves. To compare sequences of different lengths, Priostack maps each sequence to a fixed-size shape vector: a 64-dimensional float32 array.
// A shape vector is 64 float32 values derived from the cell ID sequence
type ShapeVector [64]float32
// Derived by:
// 1. Fold the cell ID sequence into a fixed window using a deterministic hash
// 2. Normalise to unit length
// 3. The result is a direction in 64-D space — the "geometric signature"
// of this execution path
The normalisation step is important: it means two executions that took identical structural paths but at different scales (different loop counts, for instance) will still be close in shape-vector space, while executions that took structurally different paths will be far apart.
4. Process paths as geometric trajectories
A single cell ID is a point. A sequence of cell IDs traces a path through the integer space — and when the sequence is lifted into shape-vector space, that path becomes a geometric trajectory through 64-dimensional space.
This is not a metaphor. The trajectory of a normal approved loan looks like a curve through 64-D space that starts near the origin, moves through a cluster of low-numbered points (the validation and risk steps), and ends in a region associated with approval notifications.
The trajectory of an anomalous execution that skips the risk assessment step literally misses that cluster. It jumps from the validation region directly to the approval region, producing a curve with a completely different shape in the same 64-D space.
Normal approved path (schematic, 2D projection):
validation ──────────────── risk ──── gw ──── approval ──── notify
[2] [5] [8] [11] [14]
↑ this cluster defines "normal"
Anomalous path (skips risk):
validation ────────────────────────── gw ──── approval ──── notify
   [15]                              [16]       [17]         [18]
     ↑ completely different region
(IDs 15–18 result from a different execution state at each step
 because the "has been risk-assessed" marker place is empty)
The key insight is that the cell ID at each step encodes the full execution
context, not just the task that ran. A notify_applicant
after a proper risk assessment produces a different cell ID from a
notify_applicant that was reached by skipping that step.
The history is embedded in the number.
5. Building the reference corpus
The geometric memory system is instance-based: it learns what normal looks like by storing the shape vectors of completed normal executions in a corpus.
Each completed execution registers its cell ID sequence (or the shape vector derived from it) with the corpus store. The corpus answers two kinds of queries:
- Whole-sequence distance: Given a new sequence, what is its distance from the nearest known normal sequence? (Uses Fréchet distance, explained below.)
- Cluster anomaly score: Given a new sequence, how far is its shape vector from the centroid of the corpus cluster? (Normalised to [0, 1], where 0 is indistinguishable from normal and 1 is maximally different.)
// Register a completed execution into the corpus
handle, err := wrapper.RegisterSequence(ctx, cellIDTrace)
// handle is an integer index into the geometric memory store
// Multiple handles form the corpus; the store maintains their centroid
// and pairwise distances automatically
// The corpus is additive — new registrations refine the cluster centroid
// without invalidating existing registrations
In development and testing, you seed the corpus manually with a set of known-good executions. In production, you register every successfully completed instance automatically. The corpus grows continuously, and the anomaly score distribution tightens as the centroid becomes more precise.
Corpus persistence
The corpus is not ephemeral. It can be serialised and restored across restarts, which means the learned model of "normal" survives deployments. When you redeploy with a new process version, you can choose to:
- Reset the corpus: Start fresh, re-seed from a set of known-good executions of the new version. Appropriate when the structural changes are significant.
- Warm-start the corpus: Carry over existing registrations from the previous version if the changes are minor. The gradual drift in normal cell IDs will widen the cluster slightly, reducing sensitivity temporarily until the new normal dominates.
6. QueryCorpusAnomaly: scoring against historical norms
QueryCorpusAnomaly takes a new cell ID sequence, derives its
shape vector, and computes the normalised distance from that vector to the
centroid of the corpus.
score, err := wrapper.QueryCorpusAnomaly(ctx, cellIDTrace)
// score is in [0.0, 1.0]
// 0.0 — shape vector is at or near the corpus centroid (normal)
// 0.5 — shape vector is at half the maximum observed distance from the centroid
// 1.0 — shape vector is maximally far from the corpus cluster (anomaly confirmed)
The score is normalised relative to the maximum observed distance in the
corpus, not relative to any absolute scale. This means the threshold
score > 0.5 is not a magic number — it is "this execution is
more than half as far from the corpus centroid as the farthest execution
ever observed." As the corpus grows and normal executions cluster more
tightly, the same threshold catches increasingly subtle deviations.
Score distribution in practice
With a small corpus (3–5 normal executions, as in the demo), the score distribution is bimodal: truly normal executions score below 0.3, and executions with any structural deviation score above 0.7. The 0.3–0.7 range is the uncertainty zone — executions that are structurally similar but not identical to anything in the corpus (e.g. a new applicant profile with an unusual variable combination).
As the corpus grows to thousands of executions, the normal cluster tightens. Normal executions score below 0.05, and truly anomalous ones score above 0.95. The uncertainty zone shrinks.
7. Fréchet distance: why minimum leash is the right metric
The corpus anomaly score answers: "how different is the overall shape of this execution?" A complementary question is: "how different is this execution from the nearest specific known-good execution?" This is what Fréchet distance answers.
The man-and-dog analogy
The discrete Fréchet distance between two curves is the length of the shortest leash that suffices for a man to walk along one curve and a dog along the other, each moving only forward (neither may double back), with the leash connecting them at every moment. Equivalently, it is the smallest maximum separation achievable over all such order-preserving joint walks.
For process trajectories, the "man" walks along a known-good execution (the first normal trace in the corpus) and the "dog" walks along the candidate execution. If the dog must at some point jump to a completely different region of cell space to follow the man, the leash stretches — and that stretch is the Fréchet distance.
d, err := wrapper.QueryFrechetDistance(ctx, normalTrace, candidateTrace)
// d is in the same scale as the cell ID space (integer range)
// small d → similar structural path
// large d → structurally different path
// For the agentic credit demo:
// normalTrace = [2, 5, 8, 11, 14] (Alice: approved path)
// candidateTrace = [15, 16, 17, 18] (Dave: skipped risk)
// d = 48.02 — the "leash" must stretch 48 cell-space units at minimum
Why Fréchet and not other distance metrics?
Several alternatives exist, and each has a distinct failure mode for process trajectory comparison:
- Euclidean distance between centroids — ignores temporal ordering entirely. Two sequences that visit the same cells in different orders would score as identical.
- Hausdorff distance — takes the maximum of the minimum distances, but does not care about order either. An execution that visits all the right cells in the wrong order scores the same as one that follows the correct order.
- Dynamic Time Warping (DTW) — DTW alignment is monotonic, so it does preserve order, but it sums pointwise distances along the alignment. A single large deviation (one skipped task) can therefore be diluted by long stretches of agreement elsewhere in the sequence, and DTW's elastic stretching makes inserted or repeated steps cheap. A bottleneck metric keeps the one decisive deviation prominent.
- Fréchet distance — preserves temporal order strictly (neither walker can double back) while allowing non-uniform speed along each curve. This is exactly the right trade-off for BPMN processes: two executions that have different numbers of back-and-forth loops through a multi-instance task can still be compared correctly, but a fundamental reordering of tasks cannot be smoothed over.
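For intuition, the discrete Fréchet recurrence can be sketched in a few lines of dynamic programming. Note that this simplified version operates on raw 1-D cell IDs, whereas Priostack computes the distance in its own cell-space geometry, so it does not reproduce the demo's 48.02 figure.

```go
package main

import (
	"fmt"
	"math"
)

// discreteFrechet computes the discrete Fréchet distance between two
// traces: the best achievable maximum leash length over all joint
// walks in which neither walker doubles back.
func discreteFrechet(p, q []float64) float64 {
	n, m := len(p), len(q)
	ca := make([][]float64, n)
	for i := range ca {
		ca[i] = make([]float64, m)
	}
	for i := 0; i < n; i++ {
		for j := 0; j < m; j++ {
			cost := math.Abs(p[i] - q[j]) // leash length at this pairing
			switch {
			case i == 0 && j == 0:
				ca[i][j] = cost
			case i == 0: // only the dog can advance
				ca[i][j] = math.Max(ca[i][j-1], cost)
			case j == 0: // only the man can advance
				ca[i][j] = math.Max(ca[i-1][j], cost)
			default: // either walker advances, or both advance together
				prev := math.Min(ca[i-1][j-1], math.Min(ca[i-1][j], ca[i][j-1]))
				ca[i][j] = math.Max(prev, cost)
			}
		}
	}
	return ca[n-1][m-1]
}

func main() {
	alice := []float64{2, 5, 8, 11, 14}
	dave := []float64{15, 16, 17, 18}
	fmt.Println(discreteFrechet(alice, alice)) // identical paths: 0
	fmt.Println(discreteFrechet(alice, dave))  // disjoint regions: 13
}
```

Even in this flat 1-D sketch, the skipped-risk trace is an order of magnitude farther from the normal trace than normal traces are from each other.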
8. Real numbers from the agentic credit demo
The agentic credit tutorial runs
four loan applications through a BPMN process and measures the geometric
memory system against a known anomaly (Dave's application bypasses the
assess_risk task). Here are the actual measurements:
── Phase 6 — Unusual Pattern Detection ─────────────────────────
corpus registrations:
trajectory 1 handle=0 len=5 (Alice: €25 000, approved)
trajectory 2 handle=1 len=6 (Bob: €50 000, reviewed)
trajectory 3 handle=2 len=5 (Clara: €12 000, approved)
anomalous trace: len=4 cells=[15, 16, 17, 18]
(Dave: €500 000, risk step bypassed)
Unusual pattern score: 1.0000 (0=normal, 1=highly unusual)
Path divergence (Fréchet): 48.0208 (normal↔unusual, cell-space units)
⚠ ANOMALY CONFIRMED — triggering CMMN fraud investigation case
🧠 Noetic [fraud narrative]: fraud anomaly investigation applicant…
Wall time: 13ms avg step: 167µs/call (31 steps)
Why score 1.0 and not 0.9?
Dave's execution produces cell IDs in the range [15–18]. Alice, Bob and Clara's normal executions produce cell IDs in the range [0–14]. The two ranges are completely disjoint — they do not share a single cell. In shape-vector space, these clusters are so far apart that Dave's vector is at the maximum possible normalised distance from the corpus centroid. Hence score = 1.0.
In a production system with thousands of corpus entries spanning many variations of normal executions (different loan amounts, different number of loop-backs through manual review, etc.), the "normal" cluster is much larger and the maximum distance is higher. A legitimately unusual-but-not-fraudulent execution might score 0.3–0.5. The system is proportional.
Why Fréchet distance = 48.02?
The normal trace for Alice is [2, 5, 8, 11, 14]. Dave's trace is
[15, 16, 17, 18]. The distance is the smallest maximum leash length
achievable over any joint walk in which neither walker doubles back.
Because the two traces occupy disjoint regions of cell space, every
admissible pairing forces the leash across the gap between those regions,
and in Priostack's cell-space geometry the best achievable maximum stretch
comes out at 48.02 units.
For healthy corpus members, Alice–Bob Fréchet is approximately 3.4 (they share the same path up to the gateway split, then diverge by one step). Alice–Clara is approximately 1.1 (nearly identical paths, slightly different variable values). These baselines demonstrate that 48.02 is not a marginal deviation; it is a cliff.
9. Production characteristics: how the system improves over time
The geometric memory system has a property that rule-based anomaly detection does not: it gets better as it sees more data, without any engineering work.
Corpus growth and false-positive rate
With 3 corpus entries, the cluster is loosely defined. A slightly unusual but legitimate execution — a borrower with a very high loan amount that technically fits all rules but represents a new variable range — might score 0.4–0.6 and require human review. This is appropriate uncertainty for a new system.
With 1 000 corpus entries, the normal cluster is tight. Legitimate edge cases are now within the cluster (they have been seen before), so they score near zero. The 0.5 fraud threshold is much more discriminating. With 10 000 entries, the false-positive rate approaches zero for any execution that follows a structural path that has been seen before — even a path seen only once in 10 000.
// Production registration pattern: after every completed instance
func onInstanceCompleted(ctx context.Context, instanceKey string) {
    trace := store.GetCellIDTrace(instanceKey)
    if trace == nil {
        return
    }
    // Only register confirmed non-fraud executions
    if store.GetDecision(instanceKey) != "fraud" {
        wrapper.RegisterSequence(ctx, trace)
    }
}
Process version transitions
When you deploy a new version of your BPMN process, the cell IDs may shift — new tasks produce new IDs. The geometric memory system handles this gracefully:
- Existing corpus entries remain valid until explicitly evicted.
- New executions of the new process version will produce cell IDs in a different region, so they will initially score high against the old corpus.
The recommended approach is to maintain separate corpus instances per process version (v1, v2, etc.) and route scoring requests to the correct corpus based on the process definition ID embedded in the instance.
Seasonal and structural drift
Some anomaly systems degrade over time because the definition of "normal" drifts — for example, during a recession the risk profile of a typical loan application changes and what was once high-risk becomes routine. The geometric memory system handles this naturally: as new patterns accumulate in the corpus, the centroid shifts and old patterns become less weighted in the scoring. You can also explicitly evict stale corpus entries using the corpus management API.
10. Comparison with rule-based and statistical alternatives
| Approach | Handles combinatorial paths? | Improves with more data? | Order-sensitive? | Engineering cost |
|---|---|---|---|---|
| Business rules engine | ✗ Enumerates specific cases | ✗ Rules must be written manually | Depends on rule design | High (ongoing maintenance) |
| Statistical control chart | If projected to 1D, partially | ✓ Mean/σ updated from data | ✗ Variable-by-variable, no sequence | Medium |
| Dynamic Time Warping | ✓ Over entire sequences | ✓ Corpus grows | Partial (monotonic, but sums distances) | Low |
| Geometric memory + Fréchet | ✓ Over entire sequences | ✓ Corpus grows, centroid tightens | ✓ Strict (no reordering) | Low (zero rules to write) |
The geometric memory approach wins on every dimension relevant to BPMN process anomaly detection. Its only disadvantage relative to a rules engine is explainability: a score of 1.0 tells you that the path deviated maximally, but not which step caused the deviation. In practice, this is addressed by combining the anomaly score with Fréchet distance breakpoints — the step at which the Fréchet leash stretches beyond a threshold — to identify the precise divergence point.
Everything described here is exposed through the
pkg/core.NGORWrapper interface
used in the agentic credit demo. There are no additional licences, no
separate ML infrastructure, and no model training pipeline to maintain.
Conclusion
Geometric memory transforms process execution data — a sequence of task completions and gateway evaluations — into a compact geometric representation that supports principled, automatic detection of unusual process paths.
The key ideas, summarised:
- Cell IDs encode execution state (not just task identity) into integers that live on a consistent number line across all instances of a process.
- Shape vectors map variable-length cell ID sequences to fixed-size 64-D float32 vectors through deterministic hashing and normalisation.
- Corpus scoring (QueryCorpusAnomaly) measures how far a new execution's shape vector is from the centroid of all known-good executions, normalised to [0, 1].
- Fréchet distance provides an order-preserving pairwise comparison between two full trajectories that correctly handles variable sequence lengths and non-uniform step density.
- Both metrics improve automatically as the corpus grows — no rule updates, no model retraining, no engineering labour.
In the agentic credit demo, a single security bypass (skipping the risk assessment task) produced a corpus anomaly score of 1.0 and a Fréchet distance of 48.02, against normal inter-instance variance of 1.1–3.4. The signal-to-noise ratio is approximately 14×. That is enough to trigger an automated fraud investigation with zero false positives, even with a corpus of only three entries.
In production, we expect the false-positive rate to fall below 0.1% within the first 500 registered instances for well-structured BPMN processes with clear task ordering.
To try this yourself, check out qubit-core-v1.0, run
go run ./examples/agentic_credit/, and look for the output
lines in Phase 6. Then read the
full tutorial to understand how
each layer connects. For the production integration story — how to feed the
corpus at scale and route anomaly scores in real time — see the companion article
EIP Pipelines and Geometric Memory.