Toolchain Refactor Plan

Purpose

This document defines the target toolchain for the analysis-first financial modeling workflow in this repository.

It replaces the implicit workbook-first workflow with an explicit analytical pipeline centered on:

  • BUFM model contracts
  • scenario definitions
  • ecosystem coupling logic
  • profitability-pool analysis
  • generated outputs for BD, finance, and stakeholder communication

The existing scripts in scripts/ should be treated as legacy prompts and transitional artifacts, not as architecture to preserve.

Why Refactor

The current repo already contains the right long-term seed in models/: testable Python BUFM modules and supporting docs.

However, the surrounding toolchain still reflects an earlier phase:

  • a monolithic Excel workbook generator
  • workbook-oriented transformation/report scripts
  • one-off naming and migration utilities
  • documentation references that still assume workbook-centric workflows

That is not the right center of gravity for the modeling method we now want.

Our actual objective is not to maintain a large spreadsheet stack. It is to develop a better analytical method for:

  • understanding aggregate ecosystem economics
  • understanding participant-level economics
  • tracing profitability pools across the system
  • supporting multiple commercial segmentations
  • producing persuasive BD materials for specific counterparties

Governing Principles

  1. Analysis first. The primary work happens in model logic, scenario definitions, and analytical outputs. Excel is an optional output.
  2. BUFMs are explicit. Each business unit model must have a clear boundary, inputs, outputs, KPIs, and profitability drivers.
  3. Couplings are explicit. Ecosystem behavior comes from modeled buy/sell relationships, utilization effects, and shared constraints.
  4. Multiple segmentations are supported. The same operational ecosystem may map to one integrated company or multiple distinct commercial players.
  5. Profitability pools are first-class outputs. We must be able to explain who captures value and why.
  6. Generated artifacts are renderers, not sources of truth. Excel, markdown briefs, charts, and slide-ready tables should all be derived from the same analytical core.
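Principle 5 can be made concrete with a small function. This is a minimal sketch, not the repository's actual implementation; the participant names and the `profit_pool_shares` helper are illustrative assumptions.

```python
# Minimal sketch of a profitability-pool view: given participant-level
# operating margins, report each participant's share of the total pool.
# Participant names ("rider", "eed", "bcs") are illustrative.

def profit_pool_shares(margins: dict[str, float]) -> dict[str, float]:
    """Share of total ecosystem profit captured by each participant."""
    total = sum(margins.values())
    return {name: margin / total for name, margin in margins.items()}

shares = profit_pool_shares({"rider": 200_000.0, "eed": 500_000.0, "bcs": 300_000.0})
# rider captures 20%, eed 50%, bcs 30% of a 1,000,000 pool
```

An output like this is exactly the "who captures value and why" answer the principle demands, and it stays valid under any commercial segmentation overlay.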

Target Architecture

1. Model Core

Keep models/ as the computational center of gravity.

Target responsibilities:

  • one module per BUFM or glue layer
  • deterministic calculation functions
  • small, well-typed input and output contracts
  • zero dependency on spreadsheet layout
  • unit tests for every model block

Expected components:

  • ffd_model.py
  • eed_model.py
  • participant BUFMs such as rider, BCS, SNS, EPS
  • coupling functions or modules for inter-BUFM transactions

2. Contract Layer

Add explicit schemas for model inputs, outputs, and scenario structure.

Recommended implementation:

  • pydantic models for validation
  • separate contracts for:
      • BUFM input blocks
      • BUFM output blocks
      • coupling definitions
      • commercial segmentation overlays
      • report/export requests

Illustrative structure:

src/emob_financial_models/contracts/
  bufm.py
  coupling.py
  scenario.py
  segmentation.py
  outputs.py
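To make the coupling and segmentation contracts tangible, here is a hedged sketch of their possible shapes. All names and fields are assumptions for illustration, and stdlib dataclasses again stand in for the pydantic models this plan recommends.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class CouplingDefinition:
    """One modeled buy/sell relationship between two BUFMs.
    Field names are illustrative, not a committed schema."""
    seller: str           # e.g. "eed"
    buyer: str            # e.g. "rider"
    quantity_driver: str  # output field on the seller that sets volume
    unit_price: float

@dataclass(frozen=True)
class SegmentationOverlay:
    """Maps operational BUFMs onto commercial players, so the same
    ecosystem can be analyzed as one integrated company or as
    multiple distinct players."""
    name: str
    player_of: dict[str, str] = field(default_factory=dict)  # BUFM id -> player id

# The same ecosystem under two hypothetical segmentations:
integrated = SegmentationOverlay(
    name="integrated",
    player_of={"rider": "opco", "eed": "opco", "bcs": "opco"},
)
separated = SegmentationOverlay(
    name="separated-players",
    player_of={"rider": "mobility-co", "eed": "energy-co", "bcs": "swap-co"},
)
```

The point of the overlay is that profitability pools can be re-aggregated by `player_of` without touching any model logic.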

3. Scenario Layer

Create scenario definitions as structured files, not code and not workbook tabs.

Each scenario should be able to define:

  • market assumptions
  • technical assumptions
  • capacity assumptions
  • pricing assumptions
  • coupling assumptions
  • ownership and segmentation assumptions

This is the layer that lets us answer:

  • Is the whole system economically viable?
  • Which participant captures the value?
  • How does the answer change if one company is a thin integrator?
  • How does the answer change if roles are split across separate players?

Illustrative structure:

scenarios/
  baseline/
    togo-baseline.yaml
    togo-thin-integrator.yaml
    togo-separated-players.yaml
  sensitivity/
    fuel-price-high.yaml
    low-utilization.yaml
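A baseline scenario file might look like the following. Every key name here is an illustrative assumption, not a committed schema; the real structure is defined in Phase 3.

```yaml
# scenarios/baseline/togo-baseline.yaml (illustrative keys only)
name: togo-baseline
market:
  addressable_riders: 50000
  adoption_rate: 0.15
technical:
  swap_time_minutes: 3
capacity:
  stations: 40
  batteries_per_station: 24
pricing:
  swap_price: 1.20
couplings:
  - seller: eed
    buyer: rider
    unit_price: 1.20
segmentation:
  overlay: integrated
```

Keeping scenarios as data rather than code means a sensitivity variant is a small diff against the baseline, which makes review and comparison cheap.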

4. Run Layer

Build a single command-line entrypoint for analysis runs.

Recommended implementation:

  • typer CLI
  • rich terminal output

Illustrative commands:

emob-model run-bufm --model rider --scenario scenarios/baseline/togo-baseline.yaml
emob-model run-ecosystem --scenario scenarios/baseline/togo-baseline.yaml
emob-model run-profit-pools --scenario scenarios/baseline/togo-separated-players.yaml
emob-model compare-segmentations --scenario scenarios/baseline/togo-baseline.yaml
emob-model export-brief --scenario scenarios/baseline/togo-thin-integrator.yaml
emob-model export-excel --scenario scenarios/baseline/togo-baseline.yaml
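The dispatch shape behind those commands can be sketched with stdlib argparse; this is a stand-in for the recommended typer CLI, which gives the same structure with less boilerplate. Command and option names mirror the illustrative commands above, and the handler is a placeholder.

```python
import argparse

def run_ecosystem(scenario: str) -> str:
    # Placeholder handler; the real one would load the scenario,
    # execute the BUFMs and couplings, and package results.
    return f"ran ecosystem for {scenario}"

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="emob-model")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("run-bufm", "run-ecosystem", "run-profit-pools",
                 "compare-segmentations", "export-brief", "export-excel"):
        cmd = sub.add_parser(name)
        cmd.add_argument("--scenario", required=True)
        if name == "run-bufm":
            cmd.add_argument("--model", required=True)
    return parser

args = build_parser().parse_args(
    ["run-ecosystem", "--scenario", "scenarios/baseline/togo-baseline.yaml"]
)
```

Whatever the framework, the important property is a single entrypoint: every analysis run goes through the same parsing, validation, and result-packaging path.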

5. Data And Result Layer

Store run outputs as structured analytical artifacts.

Recommended approach:

  • JSON or parquet outputs for run snapshots
  • polars for analytical transforms
  • duckdb for querying scenario results across many runs

Illustrative structure:

runs/
  2026-04-01-togo-baseline/
    inputs.json
    ecosystem-summary.json
    participant-pnl.parquet
    profitability-pools.parquet
    bd-brief.md
    charts/
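The cross-run query this layer enables can be sketched with stdlib json; at scale the plan recommends parquet plus duckdb, but the shape of the question ("compare one KPI across runs") is the same. File and key names are illustrative.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def write_run(runs_dir: Path, run_id: str, summary: dict) -> None:
    """Persist one run snapshot under runs/<run-id>/."""
    run_dir = runs_dir / run_id
    run_dir.mkdir(parents=True)
    (run_dir / "ecosystem-summary.json").write_text(json.dumps(summary))

def kpi_across_runs(runs_dir: Path, kpi: str) -> dict[str, float]:
    """Collect one KPI from every run snapshot under runs/."""
    return {
        path.parent.name: json.loads(path.read_text())[kpi]
        for path in sorted(runs_dir.glob("*/ecosystem-summary.json"))
    }

# Demo against a throwaway directory; run ids and values are invented.
with TemporaryDirectory() as tmp:
    runs = Path(tmp)
    write_run(runs, "2026-04-01-togo-baseline", {"ecosystem_margin": 1.4e6})
    write_run(runs, "2026-04-02-fuel-price-high", {"ecosystem_margin": 0.9e6})
    margins = kpi_across_runs(runs, "ecosystem_margin")
```

Because every run lands in the same structured layout, "how did this KPI move across scenarios" becomes a query rather than a spreadsheet archaeology exercise.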

6. Presentation Layer

Generate stakeholder-facing outputs from the same run artifacts.

Output types should include:

  • markdown summary briefs
  • chart packs
  • tables for slide decks
  • participant-specific profitability views
  • optional Excel workbook export
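Since every output type is a renderer over the same run artifacts, a markdown brief reduces to a small templating function. This sketch assumes a summary dict and a participant P&L mapping; section titles and field names are invented for illustration.

```python
def render_brief(summary: dict, pnl: dict[str, float]) -> str:
    """Render a markdown BD brief from run artifacts (illustrative fields)."""
    lines = [
        f"# BD Brief: {summary['scenario']}",
        "",
        f"Ecosystem operating margin: {summary['margin']:,.0f}",
        "",
        "## Participant profitability",
        "",
        "| Participant | Margin |",
        "| --- | --- |",
    ]
    lines += [f"| {name} | {margin:,.0f} |" for name, margin in pnl.items()]
    return "\n".join(lines)

brief = render_brief(
    {"scenario": "togo-baseline", "margin": 1_400_000},
    {"rider": 300_000, "eed": 650_000, "bcs": 450_000},
)
```

The same `summary` and `pnl` inputs would feed the chart pack and the Excel adapter, which is what keeps all renderers consistent with each other.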

Excel should be treated as:

  • a familiar handoff artifact
  • a snapshot format
  • a presentation surface

Excel should not be treated as:

  • the modeling engine
  • the system of record
  • the coupling logic authority

7. Interactive Exploration Layer

For internal analysis, prefer modern Python-native analytical tools over spreadsheet manipulation.

Recommended first choice:

  • marimo notebooks for reactive, versionable analytical exploration

Possible future additions:

  • fastapi for API access
  • a lightweight UI if scenario exploration becomes frequent

Do not introduce an API or front-end before model contracts and scenario structure are stable.

Target Toolchain

Core

  • uv for Python project management
  • pytest for tests
  • pydantic for validated contracts
  • typer for CLI
  • rich for command output

Analysis

  • polars for tabular analytics
  • duckdb for local analytical querying
  • plotly for charts
  • marimo for exploratory analysis

Optional Later

  • fastapi for service/API exposure
  • orchestration only if scenario execution becomes materially complex

Proposed Repository Shape

Illustrative target structure:

emob-financial-models/
├── models/                      # Current computational seed, to be normalized
├── scenarios/                   # Structured scenario definitions
├── runs/                        # Generated analytical outputs
├── docs/
│   ├── analysis/
│   │   ├── overview.md
│   │   └── toolchain-refactor-plan.md
│   └── models/
├── src/
│   └── emob_financial_models/
│       ├── contracts/
│       ├── coupling/
│       ├── runners/
│       ├── exports/
│       └── cli.py
├── tests/
│   ├── unit/
│   ├── integration/
│   └── scenario/
└── scripts/                     # Legacy transitional adapters only

Notes:

  • models/ may eventually move under src/, but that is not required for phase 1.
  • scripts/ should shrink over time and eventually contain only thin adapters or maintenance utilities.

Legacy Script Disposition

Keep Temporarily As Legacy Adapters

  • scripts/create_workbook.py
  • scripts/generate_report.py
  • scripts/transform_data.py
  • scripts/sync_excel_to_docs.py

These should be reclassified as transitional tooling and no longer treated as the main workflow.

Retire After Replacement

Most rename/fix/update utilities under scripts/ should be retired after we have:

  • stable model contracts
  • a scenario runner
  • export adapters for required outputs

Examples include:

  • naming migration scripts
  • terminology patch scripts
  • workbook fix scripts

These are historical maintenance artifacts, not strategic capabilities.

Migration Plan

Phase 1. Freeze The New Doctrine

Status:

  • explicit analysis-first doctrine
  • explicit profitability-pool objective
  • explicit support for multiple commercial segmentations

This phase is now largely documented.

Phase 2. Normalize Existing Models

Goal:

  • make each current BUFM callable through a consistent contract

Tasks:

  • define shared input/output schemas
  • standardize naming across model results
  • identify which coupling assumptions are currently embedded and extract them
  • separate pure model logic from legacy workbook-derived assumptions

Deliverable:

  • consistent Python API across BUFMs

Phase 3. Build The Scenario Schema

Goal:

  • represent assumptions and segmentations explicitly

Tasks:

  • define scenario file structure
  • define commercial segmentation overlay format
  • add baseline example scenarios

Deliverable:

  • scenario files for at least one baseline market and two segmentation variants

Phase 4. Build The Runner

Goal:

  • execute BUFMs, ecosystem runs, and profitability-pool analyses from the command line

Tasks:

  • implement CLI
  • implement result packaging
  • implement participant-level and aggregate views

Deliverable:

  • one repeatable CLI-driven analysis workflow

Phase 5. Rebuild Outputs From The New Core

Goal:

  • regenerate stakeholder outputs from analytical results

Tasks:

  • generate markdown brief
  • generate chart pack
  • generate machine-readable result files
  • add optional Excel export adapter

Deliverable:

  • Excel becomes one output option, not the workflow center

Phase 6. Retire Legacy Workflow

Goal:

  • stop evolving workbook-first scripts

Tasks:

  • mark legacy scripts clearly
  • remove stale references from docs
  • keep only required backward-compatible export paths

Deliverable:

  • repo workflow is clearly analysis-first in both code and docs

Immediate Implementation Backlog

Recommended next build sequence:

  1. Add project packaging with uv
  2. Add contract models for one BUFM and one scenario
  3. Build a minimal typer CLI
  4. Run one baseline scenario end to end
  5. Produce one aggregate summary and one participant-profitability summary
  6. Add profitability-pool table output
  7. Add one Excel export adapter last

Definition Of Success

The refactor is successful when:

  • analysts run scenarios without touching workbook formulas
  • BD materials can be generated for both system-level and participant-level views
  • alternative commercial segmentations can be compared explicitly
  • Excel is a renderer, not the computational authority
  • the repo communicates a modern analytical method rather than a legacy spreadsheet workflow