Learning Loop

Compound learning is the moat.

This document specifies b1e55ed’s compound learning engine: how it attributes outcomes, adjusts synthesis weights, scores producers, and feeds results back into the corpus.

Thesis

A trading system that does not learn from its own outcomes is a static tool. The learning loop turns every closed position into training data. Module motto:

“The system that learns from its own outcomes will outperform systems that don’t.”

Components

1) Outcome attribution (per-trade / daily)

Goal: match a closed position back to the conviction score that opened it.

Inputs

positions row (id, opened_at, closed_at, realized_pnl, conviction_id, regime_at_entry, max_drawdown_during)
conviction_scores row (id = positions.conviction_id)
conviction_log rows (cycle_id + symbol) capturing domain scores at entry

Outputs

Update conviction_scores.outcome and conviction_scores.outcome_ts
Emit learning.outcome.v1 event

Metrics written

realized_pnl
time_held_hours
max_drawdown_pct
direction_correct (derived from PnL sign)
regime_at_entry
domain_scores_at_entry (domain → score)
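
As an illustration of how the metrics above might be derived from a closed position, here is a minimal sketch. The `ClosedPosition` dataclass and `outcome_metrics` function are hypothetical names, not the engine's actual API. The sketch assumes max_drawdown_during is already stored as a percentage and that a break-even trade counts as not direction-correct.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ClosedPosition:
    # Field names mirror the positions columns listed above (subset).
    opened_at: datetime
    closed_at: datetime
    realized_pnl: float
    regime_at_entry: str
    max_drawdown_during: float


def outcome_metrics(pos: ClosedPosition, domain_scores: dict) -> dict:
    """Build the metrics payload for one closed position (illustrative only)."""
    held = pos.closed_at - pos.opened_at
    return {
        "realized_pnl": pos.realized_pnl,
        "time_held_hours": held.total_seconds() / 3600.0,
        # Assumption: max_drawdown_during is already expressed in percent.
        "max_drawdown_pct": pos.max_drawdown_during,
        # Assumption: PnL == 0 is treated as "not correct".
        "direction_correct": pos.realized_pnl > 0,
        "regime_at_entry": pos.regime_at_entry,
        "domain_scores_at_entry": dict(domain_scores),
    }
```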

2) Domain weight adjustment (weekly/monthly)

Goal: nudge synthesis weights toward domains that predicted better outcomes.

Window: rolling 30 days (ADJUSTMENT_WINDOW_DAYS = 30).

Observation threshold: no adjustment unless at least 20 closed positions (MIN_OBSERVATIONS = 20).

Safety constraints

MAX_WEIGHT_DELTA = 0.02 (±2% per cycle)
MIN_DOMAIN_WEIGHT = 0.05 (5% floor)
MAX_DOMAIN_WEIGHT = 0.40 (40% ceiling)

Algorithm (v1)

For each closed position in the window, compute outcome sign y ∈ {+1, -1} from realized_pnl.
Pull domain scores at entry from conviction_log for the score’s cycle_id and symbol.
For each domain, compute correlation between domain score and outcome sign.
Translate correlation → delta (scaled, clamped to ±MAX_WEIGHT_DELTA).
Clamp to floor/ceiling and renormalize to sum to 1.0.
Persist to data/learned_weights.yaml and record in learning_weights.
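
The steps above can be sketched roughly as follows. This is a hedged illustration rather than the code in engine/brain/learning.py: the function names `pearson` and `adjust_weights` are hypothetical, Pearson correlation is assumed as the correlation measure, and the scaling (delta = correlation × MAX_WEIGHT_DELTA) is an assumption, since the spec only says "scaled, clamped". Persistence is omitted.

```python
import math

MAX_WEIGHT_DELTA = 0.02
MIN_DOMAIN_WEIGHT = 0.05
MAX_DOMAIN_WEIGHT = 0.40
MIN_OBSERVATIONS = 20


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def adjust_weights(weights, observations, max_delta=MAX_WEIGHT_DELTA):
    """weights: domain -> weight; observations: list of (domain_scores, realized_pnl)."""
    if len(observations) < MIN_OBSERVATIONS:
        return dict(weights)  # below the observation threshold: no adjustment
    # Outcome sign y in {+1, -1}; assumption: zero PnL maps to -1.
    signs = [1.0 if pnl > 0 else -1.0 for _, pnl in observations]
    adjusted = {}
    for domain, weight in weights.items():
        scores = [s.get(domain, 0.0) for s, _ in observations]
        corr = pearson(scores, signs)
        # Assumed scaling: full correlation earns the full allowed delta.
        delta = max(-max_delta, min(max_delta, corr * max_delta))
        adjusted[domain] = max(MIN_DOMAIN_WEIGHT, min(MAX_DOMAIN_WEIGHT, weight + delta))
    total = sum(adjusted.values())
    return {d: w / total for d, w in adjusted.items()}
```

One caveat with this ordering: renormalizing after the clamp can move a weight marginally past the floor or ceiling, so a stricter variant might re-clamp and renormalize until the weights settle.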

3) Producer scoring

Goal: track which producers are reliable.

Producer scoring is designed to evolve. In the current implementation, the system stores producer health in producer_health and emits a conservative scoring summary based on staleness and error rate.

Constraints

No adjustments until at least 20 observations.

3b) Producer karma (flywheel)

Goal: close the attribution loop with per-producer outcome tracking.

When a position closes:

attribute_outcome() retrieves all SIGNAL_ACCEPTED_V1 events linked to the trade
Each contributing producer receives an EMA karma update (α = 0.05): karma_new = karma_old × 0.95 + outcome × 0.05
Results stored in producer_karma table
ATTRIBUTION_OUTCOME_V1 event emitted

Phase 0: equal weights across contributing producers. Positive outcomes applied immediately; negative outcomes tracked but dampened. Karma starts at 1.0 for all new producers.

Files: engine/execution/karma.py, engine/integration/outcome_writer.py
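
To make the karma update concrete, here is a small sketch of the EMA step with equal weighting across contributing producers. `update_karma` is a hypothetical helper, not the function in engine/execution/karma.py. The outcome encoding (for example +1 for a win, -1 for a loss) is an assumption, and the dampening of negative outcomes is left out because the spec does not state the factor.

```python
ALPHA = 0.05          # EMA smoothing factor from the spec
INITIAL_KARMA = 1.0   # starting karma for new producers


def update_karma(karma: dict, producer_ids: list, outcome: float) -> dict:
    """Return a new karma map after applying one trade outcome.

    Every contributing producer gets the same update (Phase 0 equal weights):
    karma_new = karma_old * (1 - ALPHA) + outcome * ALPHA
    """
    updated = dict(karma)
    for producer_id in producer_ids:
        old = updated.get(producer_id, INITIAL_KARMA)
        updated[producer_id] = old * (1 - ALPHA) + outcome * ALPHA
    return updated
```

With this encoding, a new producer that contributes to a losing trade moves from 1.0 to 0.90, while a winning trade leaves it at 1.0.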

4) Corpus feedback

Goal: update patterns and skills based on realized outcomes.

Pattern outcomes are tracked in pattern_matches (when pattern matching is wired).
Skill lifecycle is file-based in corpus/skills/.

Lifecycle rules (initial)

Pending skill promoted to active when score >= 3
Active skill archived when score <= -3

Skill score storage

A score: <int> line in the first ~40 lines of the markdown file.
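
A minimal sketch of reading the score line and applying the lifecycle rules above. The function names, the state strings "pending", "active", and "archived", and the exact `score: <int>` line format matched by the regular expression are assumptions. The 40-line cutoff stands in for the approximate "~40 lines" in the spec.

```python
import re
from typing import Optional

HEADER_LINES = 40  # approximation of "the first ~40 lines"
SCORE_RE = re.compile(r"^score:\s*(-?\d+)\s*$", re.MULTILINE)


def read_skill_score(markdown_text: str) -> Optional[int]:
    """Return the integer score from the file header, or None if absent."""
    head = "\n".join(markdown_text.splitlines()[:HEADER_LINES])
    match = SCORE_RE.search(head)
    return int(match.group(1)) if match else None


def next_state(state: str, score: int) -> str:
    """Apply the initial lifecycle rules to a skill's current state."""
    if state == "pending" and score >= 3:
        return "active"
    if state == "active" and score <= -3:
        return "archived"
    return state
```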

Cold start behavior

First 30 days: observe only. No weight adjustments.
    Quote: “Patience is not inaction. It is intelligent waiting.”
30–90 days: warm period. Adjustments are allowed, but MAX_WEIGHT_DELTA is halved to ±1%.
90+ days: full adjustments active (±2%).
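
The cold start schedule can be expressed as a single lookup. This is a sketch under assumptions: `max_delta_for_age` is a hypothetical helper, and the boundary handling (day 30 counts as warm, day 90 counts as full) is a guess, since the ranges above share their endpoints.

```python
MAX_WEIGHT_DELTA = 0.02


def max_delta_for_age(days_live: int) -> float:
    """Return the per-cycle weight delta allowed at a given system age."""
    if days_live < 30:
        return 0.0                   # observe only, no adjustments
    if days_live < 90:
        return MAX_WEIGHT_DELTA / 2  # warm period: ±1%
    return MAX_WEIGHT_DELTA          # full adjustments: ±2%
```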

Overfitting protection

The system tracks rolling performance around adjustments. If 3 consecutive cycles degrade performance, weights are reverted to preset defaults.

Quote: “The market rewards adaptation. It punishes curve-fitting.”
Reversion quote: “Sometimes the wisest adjustment is to undo the last one.”
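
One way the reversion check could look, as a sketch only. The spec does not define how "degrade" is measured, so this assumes a single rolling performance number is recorded per adjustment cycle and that a cycle degrades when its number is lower than the previous cycle's. `should_revert` is a hypothetical name.

```python
def should_revert(perf_history: list, streak: int = 3) -> bool:
    """True when each of the last `streak` cycles scored below the one before it."""
    if len(perf_history) < streak + 1:
        return False  # not enough history to observe `streak` degradations
    recent = perf_history[-(streak + 1):]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))
```

For example, a history of 1.0, 0.9, 0.8, 0.7 contains three consecutive degradations and would trigger a revert to preset defaults, while 1.0, 0.9, 0.95, 0.7 would not.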

Operator review

Weekly/monthly adjustments are stored in the database (learning_weights) and persisted as an overlay YAML file (data/learned_weights.yaml). Operators can:

inspect the change history
delete the overlay file to revert immediately
approve/reject changes once the approval UI is implemented

Forecast-Level Learning (P4)

The learning loop above operates at the position/trade level. The P4 intelligence layer adds a parallel forecast-level learning loop that works at higher resolution.

Outcome Resolver

The outcome resolver is the data collection mechanism for the forecast-level loop. It runs every 30 minutes via cron and resolves elapsed FORECAST_V1 events against actual prices.

What it produces: FORECAST_OUTCOME_V1 events containing:

forecast_event_id — link to the original forecast
producer_id — which producer made the call
direction_correct — was the direction right?
brier_score — (confidence - outcome)² calibration metric
return_actual_pct — actual price change
regime_at_forecast — what regime was active
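
For reference, the Brier score listed above can be computed as below. This is a minimal sketch: `brier_score` is a hypothetical helper, and it assumes confidence is a probability in [0, 1] for the forecast direction, with outcome encoded as 1 when the direction was correct and 0 otherwise.

```python
def brier_score(confidence: float, direction_correct: bool) -> float:
    """Squared error between stated confidence and the binary outcome.

    0.0 is a perfect, fully confident call; 1.0 is a fully confident miss.
    """
    outcome = 1.0 if direction_correct else 0.0
    return (confidence - outcome) ** 2
```

Lower is better: a confidence of 0.8 on a correct call scores about 0.04, and the same confidence on a wrong call scores about 0.64.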

Idempotency: Each forecast can only be resolved once (tracked via forecast_resolution_state table). Safe to run repeatedly.

Price sources: Local price_history table first, Binance public klines API as fallback.

How to run:

b1e55ed resolve-outcomes

Cron setup:

*/30 * * * * /usr/local/bin/b1e55ed resolve-outcomes >> /var/log/b1e55ed/resolver.log 2>&1

Performance Aggregator

The performance aggregator computes rolling statistics from FORECAST_OUTCOME_V1 events:

producer_performance table: Per-producer win rates, average Brier scores, average confidence, and confidence-outcome correlation — grouped by asset, horizon, and regime.
producer_correlation table: Pairwise agreement rates between producers, including agreement/disagreement win rates and sample counts.

These tables feed the hierarchical weighting engine (P4.1) and the meta-producer (P4.4). Minimum threshold: 5 resolved outcomes per group.
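
A sketch of the per-group aggregation with the minimum-sample threshold. This is illustrative and not the code in engine/brain/performance_aggregator.py: `aggregate` is a hypothetical function, outcomes are assumed to be plain dicts, and the `asset` and `horizon` keys are assumed to be available on each outcome (they are not in the event field list above).

```python
from collections import defaultdict

MIN_RESOLVED_OUTCOMES = 5  # minimum resolved outcomes per group


def aggregate(outcomes):
    """Group outcomes by (producer, asset, horizon, regime) and compute stats.

    Groups with fewer than MIN_RESOLVED_OUTCOMES rows are skipped.
    """
    groups = defaultdict(list)
    for o in outcomes:
        key = (o["producer_id"], o["asset"], o["horizon"], o["regime_at_forecast"])
        groups[key].append(o)

    stats = {}
    for key, rows in groups.items():
        n = len(rows)
        if n < MIN_RESOLVED_OUTCOMES:
            continue
        stats[key] = {
            "sample_count": n,
            "win_rate": sum(1 for r in rows if r["direction_correct"]) / n,
            "avg_brier": sum(r["brier_score"] for r in rows) / n,
        }
    return stats
```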

MetaProducer

The meta-producer is the learning loop’s output layer. It reads only from performance tables and FORECAST_OUTCOME_V1 history, never from raw market data.

What it learns: Which ensemble patterns (combination of producer calls) historically led to correct outcomes. When the current ensemble state matches a historically successful pattern, it emits a forecast with the pattern’s win rate as confidence.

Activation gate: 500 resolved outcomes must exist before the meta-producer emits any non-abstention forecast (MIN_FORECASTS_FOR_ACTIVATION = 500). Below this, it always abstains.

Shadow mode: Even after activation, the meta-producer defaults to shadow=True. It logs what it would have emitted but produces abstentions. This ensures the pattern library matures before affecting synthesis.

The full learning chain:

FORECAST_V1 → (horizon elapses) → OutcomeResolver → FORECAST_OUTCOME_V1
    → PerformanceAggregator → producer_performance + producer_correlation
    → MetaProducer (pattern matching) → FORECAST_V1 (meta ensemble signal)
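
The activation gate and shadow mode described above can be sketched as one decision function. The names and return shape here are hypothetical and not taken from engine/producers/meta.py. The function returns a pair: what is actually emitted, and what would be logged in shadow mode. A `pattern_win_rate` of None is assumed to mean no historical pattern matched.

```python
MIN_FORECASTS_FOR_ACTIVATION = 500


def meta_decision(resolved_count, pattern_win_rate, shadow=True):
    """Return (emitted, shadow_logged) for the current ensemble state."""
    abstain = {"action": "abstain", "confidence": None}
    # Gate: too little resolved history, or no matching pattern -> abstain.
    if resolved_count < MIN_FORECASTS_FOR_ACTIVATION or pattern_win_rate is None:
        return abstain, None
    # The pattern's historical win rate is used as the forecast confidence.
    candidate = {"action": "forecast", "confidence": pattern_win_rate}
    if shadow:
        # Shadow mode: record the candidate but still emit an abstention.
        return abstain, candidate
    return candidate, None
```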

For full details on the interpreter stack and activation timeline, see producer-intelligence.md.

Files

engine/brain/learning.py — learning engine
engine/integration/outcome_writer.py — writes outcomes when positions close
engine/integration/learning_loop.py — cadence scheduling + persistence glue
engine/brain/outcome_resolver.py — forecast outcome resolver (P4)
engine/brain/performance_aggregator.py — rolling producer stats (P4)
engine/producers/meta.py — meta-producer / ensemble pattern learner (P4)
data/learned_weights.yaml — learned weights overlay (auto-generated)

Tests

tests/unit/test_learning.py
tests/unit/test_learning_weights.py
tests/unit/test_learning_corpus.py
tests/integration/test_learning_e2e.py
