Toolchain Refactor Plan

Purpose

This document defines the target toolchain for the analysis-first financial modeling workflow in this repository.

It replaces the implicit workbook-first workflow with an explicit analytical pipeline centered on:

BUFM model contracts
scenario definitions
ecosystem coupling logic
profitability-pool analysis
generated outputs for BD, finance, and stakeholder communication

The existing scripts in scripts/ should be treated as legacy prompts and transitional artifacts, not as architecture to preserve.

Why Refactor

The current repo already contains the right long-term seed in models/: testable Python BUFM modules and supporting docs.

However, the surrounding toolchain still reflects an earlier phase:

a monolithic Excel workbook generator
workbook-oriented transformation/report scripts
one-off naming and migration utilities
documentation references that still assume workbook-centric workflows

That is not the right center of gravity for the modeling method we now want.

Our actual objective is not to maintain a large spreadsheet stack. It is to develop a better analytical method for:

understanding aggregate ecosystem economics
understanding participant-level economics
tracing profitability pools across the system
supporting multiple commercial segmentations
producing persuasive BD materials for specific counterparties

Governing Principles

Analysis first. The primary work happens in model logic, scenario definitions, and analytical outputs. Excel is optional output.
BUFMs are explicit. Each business unit model must have a clear boundary, inputs, outputs, KPIs, and profitability drivers.
Couplings are explicit. Ecosystem behavior comes from modelled buy/sell relationships, utilization effects, and shared constraints.
Multiple segmentations are supported. The same operational ecosystem may map to one integrated company or multiple distinct commercial players.
Profitability pools are first-class outputs. We must be able to explain who captures value and why.
Generated artifacts are renderers, not sources of truth. Excel, markdown briefs, charts, and slide-ready tables should all be derived from the same analytical core.
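
As a minimal sketch of how explicit couplings can feed a profitability-pool view, the example below treats each coupling as a payment from a buyer to a seller and derives each participant's profit and share of the total pool. The `Coupling` type, the function name, and all figures are hypothetical and chosen only for illustration; they are not taken from the existing models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coupling:
    """One modelled buy/sell relationship (hypothetical shape)."""
    buyer: str
    seller: str
    amount: float  # payment from buyer to seller per period


def profit_pools(external_revenue, own_costs, couplings):
    """Return {participant: (profit, share_of_total_profit)}."""
    participants = set(external_revenue) | set(own_costs)
    profit = {
        p: external_revenue.get(p, 0.0) - own_costs.get(p, 0.0)
        for p in participants
    }
    # Each coupling moves value from the buyer to the seller.
    for c in couplings:
        profit[c.buyer] = profit.get(c.buyer, 0.0) - c.amount
        profit[c.seller] = profit.get(c.seller, 0.0) + c.amount
    total = sum(profit.values())
    return {p: (v, v / total if total else 0.0) for p, v in profit.items()}


# Illustrative figures only.
example_pools = profit_pools(
    external_revenue={"rider": 1000.0},
    own_costs={"rider": 300.0, "BCS": 200.0, "EPS": 100.0},
    couplings=[Coupling("rider", "BCS", 400.0), Coupling("BCS", "EPS", 150.0)],
)
```

Because every coupling is subtracted from one participant and added to another, couplings cancel at the system level: they change who captures the value, not how much value the ecosystem generates.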

Target Architecture

1. Model Core

Keep models/ as the computational center of gravity.

Target responsibilities:

one module per BUFM or glue layer
deterministic calculation functions
small, well-typed input and output contracts
zero dependency on spreadsheet layout
unit tests for every model block

Expected components:

ffd_model.py
eed_model.py
participant BUFMs such as rider, BCS, SNS, and EPS
coupling functions or modules for inter-BUFM transactions
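
The responsibilities above could look roughly like the following sketch: a small typed input contract, a small typed output contract, and a pure calculation function with no I/O and no spreadsheet references. The field names and the arithmetic are assumptions made for illustration and do not reflect the actual rider BUFM logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiderInputs:
    """Hypothetical input contract; field names are illustrative only."""
    trips_per_day: float
    fare_per_trip: float
    energy_cost_per_day: float
    operating_days: int


@dataclass(frozen=True)
class RiderOutputs:
    """Hypothetical output contract."""
    revenue: float
    energy_cost: float
    gross_margin: float


def run_rider(inputs: RiderInputs) -> RiderOutputs:
    """Pure function: the same inputs always produce the same outputs."""
    revenue = inputs.trips_per_day * inputs.fare_per_trip * inputs.operating_days
    energy_cost = inputs.energy_cost_per_day * inputs.operating_days
    return RiderOutputs(
        revenue=revenue,
        energy_cost=energy_cost,
        gross_margin=revenue - energy_cost,
    )


# Illustrative call with made-up numbers.
rider_result = run_rider(
    RiderInputs(trips_per_day=20, fare_per_trip=0.5,
                energy_cost_per_day=3.0, operating_days=300)
)
```

A function of this shape is straightforward to unit test, since a test only needs to construct the input object and compare fields on the output.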

2. Contract Layer

Add explicit schemas for model inputs, outputs, and scenario structure.

Recommended implementation:

pydantic models for validation
separate contracts for:
  BUFM input blocks
  BUFM output blocks
  coupling definitions
  commercial segmentation overlays
  report/export requests

Illustrative structure:

src/emob_financial_models/contracts/
  bufm.py
  coupling.py
  scenario.py
  segmentation.py
  outputs.py
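
To illustrate what a validated contract in `coupling.py` might enforce, here is a dependency-free sketch. It uses a standard-library dataclass with manual checks as a stand-in; in the recommended setup pydantic would take over the validation. The class name, fields, and rules are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CouplingDefinition:
    """Hypothetical coupling contract; pydantic would replace the manual checks."""
    buyer: str
    seller: str
    unit_price: float
    volume: float

    def __post_init__(self) -> None:
        # Reject definitions that cannot describe a real transaction.
        if self.buyer == self.seller:
            raise ValueError("buyer and seller must be different BUFMs")
        if self.unit_price < 0 or self.volume < 0:
            raise ValueError("unit_price and volume must be non-negative")

    @property
    def transfer(self) -> float:
        """Total payment from buyer to seller."""
        return self.unit_price * self.volume


# Illustrative values only.
valid_coupling = CouplingDefinition("rider", "BCS", unit_price=2.0, volume=100.0)
```

Failing at construction time means an invalid scenario is rejected before any model runs, rather than surfacing later as an implausible result.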

3. Scenario Layer

Create scenario definitions as structured files, not code and not workbook tabs.

Each scenario should be able to define:

market assumptions
technical assumptions
capacity assumptions
pricing assumptions
coupling assumptions
ownership and segmentation assumptions

This is the layer that lets us answer:

Is the whole system economically viable?
Which participant captures the value?
How does the answer change if one company acts as a thin integrator?
How does the answer change if roles are split across separate players?
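
The last two questions amount to regrouping the same participant-level results under different ownership maps. A minimal sketch, assuming participant profits have already been computed; the company names and figures are invented for illustration:

```python
def regroup(participant_profit, ownership):
    """Aggregate participant-level profit by owning company."""
    by_company = {}
    for participant, profit in participant_profit.items():
        company = ownership[participant]
        by_company[company] = by_company.get(company, 0.0) + profit
    return by_company


# Made-up participant results, identical under both segmentations.
participant_profit = {"BCS": 120.0, "SNS": 80.0, "EPS": -50.0}

# Segmentation 1: one integrated company owns every role.
integrated = regroup(
    participant_profit,
    {"BCS": "CompanyA", "SNS": "CompanyA", "EPS": "CompanyA"},
)

# Segmentation 2: one role is carried by a separate player.
separated = regroup(
    participant_profit,
    {"BCS": "CompanyA", "SNS": "CompanyA", "EPS": "CompanyB"},
)
```

In this made-up case the integrated view shows one profitable company, while the separated view shows that one player runs at a loss. That is the kind of difference the segmentation overlays are meant to expose.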

Illustrative structure:

scenarios/
  baseline/
    togo-baseline.yaml
    togo-thin-integrator.yaml
    togo-separated-players.yaml
  sensitivity/
    fuel-price-high.yaml
    low-utilization.yaml
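
One possible way to work with these files is sketched below, assuming each YAML file has already been parsed into a plain dictionary (for example with PyYAML's `yaml.safe_load`). The section keys mirror the six assumption groups listed above, but their exact names are hypothetical. Treating sensitivity files as overlays applied to a baseline is also an assumption, not something this plan specifies.

```python
REQUIRED_SECTIONS = (
    "market", "technical", "capacity", "pricing", "coupling", "segmentation",
)


def missing_sections(scenario: dict) -> list:
    """List the required top-level sections the scenario does not define."""
    return [s for s in REQUIRED_SECTIONS if s not in scenario]


def apply_overlay(base: dict, overlay: dict) -> dict:
    """Return a new scenario in which overlay values replace base values.

    Nested sections are merged recursively; the base is left unmodified.
    """
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overlay(merged[key], value)
        else:
            merged[key] = value
    return merged


# Illustrative content only; keys and values are invented.
baseline = {
    "market": {"fuel_price": 1.2, "riders": 5000},
    "technical": {}, "capacity": {}, "pricing": {},
    "coupling": {}, "segmentation": {},
}
fuel_price_high = {"market": {"fuel_price": 1.8}}
stressed = apply_overlay(baseline, fuel_price_high)
```

Keeping sensitivities as small overlays would avoid duplicating the full baseline in every variant, at the cost of needing the baseline to interpret any single sensitivity file.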

4. Run Layer

Build a single command-line entrypoint for analysis runs.

Recommended implementation:

typer CLI
rich terminal output

Illustrative commands:

emob-model run-bufm --model rider --scenario scenarios/baseline/togo-baseline.yaml
emob-model run-ecosystem --scenario scenarios/baseline/togo-baseline.yaml
emob-model run-profit-pools --scenario scenarios/baseline/togo-separated-players.yaml
emob-model compare-segmentations --scenario scenarios/baseline/togo-baseline.yaml
emob-model export-brief --scenario scenarios/baseline/togo-thin-integrator.yaml
emob-model export-excel --scenario scenarios/baseline/togo-baseline.yaml
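
The command surface above can be prototyped before committing to a framework. The sketch below uses the standard-library `argparse` as a stand-in for typer so that it runs without extra dependencies; the subcommand names and options come from the commands listed above, while the dispatch body is a placeholder.

```python
import argparse

SCENARIO_ONLY_COMMANDS = (
    "run-ecosystem",
    "run-profit-pools",
    "compare-segmentations",
    "export-brief",
    "export-excel",
)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="emob-model")
    sub = parser.add_subparsers(dest="command", required=True)

    # run-bufm is the only command that also selects a single model.
    run_bufm = sub.add_parser("run-bufm", help="Run a single BUFM")
    run_bufm.add_argument("--model", required=True)
    run_bufm.add_argument("--scenario", required=True)

    for name in SCENARIO_ONLY_COMMANDS:
        cmd = sub.add_parser(name)
        cmd.add_argument("--scenario", required=True)
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    # Placeholder: a real implementation would dispatch to a runner here.
    print(f"{args.command}: scenario={args.scenario}")
    return 0
```

Calling `main(["run-ecosystem", "--scenario", "scenarios/baseline/togo-baseline.yaml"])` exercises the parser without a shell, which keeps the CLI itself testable.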

5. Data And Result Layer

Store run outputs as structured analytical artifacts.

Recommended approach:

JSON or Parquet files for run snapshots
polars for analytical transforms
duckdb for querying scenario results across many runs

Illustrative structure:

runs/
  2026-04-01-togo-baseline/
    inputs.json
    ecosystem-summary.json
    participant-pnl.parquet
    profitability-pools.parquet
    bd-brief.md
    charts/
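
A run directory like the one above could be produced by a small packaging helper. The sketch below writes only the JSON artifacts and the empty `charts/` folder using the standard library; the Parquet files would need an additional library and are left out. The function name and signature are hypothetical.

```python
import json
from datetime import date
from pathlib import Path


def write_run(root: Path, run_date: date, scenario_name: str,
              inputs: dict, summary: dict) -> Path:
    """Create a dated run directory and write its JSON snapshots."""
    run_dir = root / f"{run_date.isoformat()}-{scenario_name}"
    # Refuse to overwrite an existing snapshot.
    run_dir.mkdir(parents=True, exist_ok=False)
    (run_dir / "charts").mkdir()
    (run_dir / "inputs.json").write_text(
        json.dumps(inputs, indent=2, sort_keys=True)
    )
    (run_dir / "ecosystem-summary.json").write_text(
        json.dumps(summary, indent=2, sort_keys=True)
    )
    return run_dir
```

Writing the resolved inputs next to the results keeps each run self-describing, so a snapshot can be interpreted later even if the scenario file has changed.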

6. Presentation Layer

Generate stakeholder-facing outputs from the same run artifacts.

Output types should include:

markdown summary briefs
chart packs
tables for slide decks
participant-specific profitability views
optional Excel workbook export

Excel should be treated as:

a familiar handoff artifact
a snapshot format
a presentation surface

Excel should not be treated as:

the modeling engine
the system of record
the coupling logic authority
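
To make the renderer idea concrete, the following sketch turns a participant-profit mapping into a markdown table for a brief. It assumes profits have already been computed by the analytical core; the function name, table layout, and figures are illustrative only.

```python
def render_brief(title: str, pools: dict) -> str:
    """Render participant profits as a markdown table, largest first."""
    total = sum(pools.values())
    lines = [
        f"# {title}",
        "",
        "| Participant | Profit | Share |",
        "| --- | ---: | ---: |",
    ]
    ranked = sorted(pools.items(), key=lambda kv: kv[1], reverse=True)
    for name, profit in ranked:
        share = profit / total if total else 0.0
        lines.append(f"| {name} | {profit:,.0f} | {share:.1%} |")
    return "\n".join(lines) + "\n"


# Made-up numbers for illustration.
brief = render_brief("Togo baseline", {"BCS": 100.0, "rider": 300.0})
```

An Excel exporter would follow the same pattern: it reads the same computed results and only decides how to lay them out.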

7. Interactive Exploration Layer

For internal analysis, prefer modern Python-native analytical tools over spreadsheet manipulation.

Recommended first choice:

marimo notebooks for reactive, versionable analytical exploration

Possible future additions:

fastapi for API access
a lightweight UI if scenario exploration becomes frequent

Do not introduce an API or front-end before model contracts and scenario structure are stable.

Recommended Technology Stack

Core

uv for Python project management
pytest for tests
pydantic for validated contracts
typer for CLI
rich for command output

Analysis

polars for tabular analytics
duckdb for local analytical querying
plotly for charts
marimo for exploratory analysis

Optional Later

fastapi for service/API exposure
orchestration only if scenario execution becomes materially complex

Proposed Repository Shape

Illustrative target structure:

emob-financial-models/
├── models/                      # Current computational seed, to be normalized
├── scenarios/                   # Structured scenario definitions
├── runs/                        # Generated analytical outputs
├── docs/
│   ├── analysis/
│   │   ├── overview.md
│   │   └── toolchain-refactor-plan.md
│   └── models/
├── src/
│   └── emob_financial_models/
│       ├── contracts/
│       ├── coupling/
│       ├── runners/
│       ├── exports/
│       └── cli.py
├── tests/
│   ├── unit/
│   ├── integration/
│   └── scenario/
└── scripts/                     # Legacy transitional adapters only

Notes:

models/ may eventually move under src/, but that is not required for phase 1.
scripts/ should shrink over time and eventually contain only thin adapters or maintenance utilities.

Legacy Script Disposition

Keep Temporarily As Legacy Adapters

scripts/create_workbook.py
scripts/generate_report.py
scripts/transform_data.py
scripts/sync_excel_to_docs.py

These should be reclassified as transitional tooling and no longer treated as the main workflow.

Retire After Replacement

Most rename/fix/update utilities under scripts/ should be retired after we have:

stable model contracts
a scenario runner
export adapters for required outputs

Examples include:

naming migration scripts
terminology patch scripts
workbook fix scripts

These are historical maintenance artifacts, not strategic capabilities.

Migration Plan

Phase 1. Freeze The New Doctrine

Status:

explicit analysis-first doctrine
explicit profitability-pool objective
explicit support for multiple commercial segmentations

This phase is now largely documented.

Phase 2. Normalize Existing Models

Goal:

make each current BUFM callable through a consistent contract

Tasks:

define shared input/output schemas
standardize naming across model results
identify which coupling assumptions are currently embedded and extract them
separate pure model logic from legacy workbook-derived assumptions

Deliverable:

consistent Python API across BUFMs
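
One way such a consistent API might be expressed is a shared callable signature plus a name-based registry, so that a runner can invoke any BUFM the same way. This is a sketch under stated assumptions: the registry, the decorator, and the toy `rider` function are hypothetical and not part of the current `models/` code.

```python
from typing import Callable, Dict, Mapping

# Every BUFM takes a mapping of inputs and returns a mapping of results.
BufmFn = Callable[[Mapping[str, float]], Mapping[str, float]]

REGISTRY: Dict[str, BufmFn] = {}


def register(name: str) -> Callable[[BufmFn], BufmFn]:
    """Decorator that exposes a BUFM function under a stable name."""
    def decorator(fn: BufmFn) -> BufmFn:
        if name in REGISTRY:
            raise ValueError(f"duplicate BUFM name: {name}")
        REGISTRY[name] = fn
        return fn
    return decorator


@register("rider")
def rider(inputs: Mapping[str, float]) -> Mapping[str, float]:
    # Toy calculation for illustration; not the real rider model.
    return {"gross_margin": inputs["revenue"] - inputs["cost"]}


def run_bufm(name: str, inputs: Mapping[str, float]) -> Mapping[str, float]:
    """Look up a BUFM by name and run it through the shared signature."""
    return REGISTRY[name](inputs)
```

With a uniform signature in place, the plain mappings could later be swapped for the typed contract models without changing how the runner calls each BUFM.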

Phase 3. Build The Scenario Schema

Goal:

represent assumptions and segmentations explicitly

Tasks:

define scenario file structure
define commercial segmentation overlay format
add baseline example scenarios

Deliverable:

scenario files for at least one baseline market and two segmentation variants

Phase 4. Build The Runner

Goal:

execute BUFMs, ecosystem runs, and profitability-pool analyses from the command line

Tasks:

implement CLI
implement result packaging
implement participant-level and aggregate views

Deliverable:

one repeatable CLI-driven analysis workflow

Phase 5. Rebuild Outputs From The New Core

Goal:

regenerate stakeholder outputs from analytical results

Tasks:

generate markdown brief
generate chart pack
generate machine-readable result files
add optional Excel export adapter

Deliverable:

Excel becomes one output option, not the workflow center

Phase 6. Retire Legacy Workflow

Goal:

stop evolving workbook-first scripts

Tasks:

mark legacy scripts clearly
remove stale references from docs
keep only required backward-compatible export paths

Deliverable:

repo workflow is clearly analysis-first in both code and docs

Immediate Implementation Backlog

Recommended next build sequence:

1. Add project packaging with uv
2. Add contract models for one BUFM and one scenario
3. Build a minimal typer CLI
4. Run one baseline scenario end to end
5. Produce one aggregate summary and one participant-profitability summary
6. Add profitability-pool table output
7. Add one Excel export adapter last

Definition Of Success

The refactor is successful when:

analysts run scenarios without touching workbook formulas
BD materials can be generated for both system-level and participant-level views
alternative commercial segmentations can be compared explicitly
Excel is a renderer, not the computational authority
the repo communicates a modern analytical method rather than a legacy spreadsheet workflow