Toolchain Refactor Plan¶
Purpose¶
This document defines the target toolchain for the analysis-first financial modeling workflow in this repository.
It replaces the implicit workbook-first workflow with an explicit analytical pipeline centered on:
- BUFM model contracts
- scenario definitions
- ecosystem coupling logic
- profitability-pool analysis
- generated outputs for BD, finance, and stakeholder communication
The existing scripts in scripts/ should be treated as legacy prompts and transitional artifacts, not as architecture to preserve.
Why Refactor¶
The current repo already contains the right long-term seed in models/: testable Python BUFM modules and supporting docs.
However, the surrounding toolchain still reflects an earlier phase:
- a monolithic Excel workbook generator
- workbook-oriented transformation/report scripts
- one-off naming and migration utilities
- documentation references that still assume workbook-centric workflows
That is not the right center of gravity for the modeling method we now want.
Our actual objective is not to maintain a large spreadsheet stack. It is to develop a better analytical method for:
- understanding aggregate ecosystem economics
- understanding participant-level economics
- tracing profitability pools across the system
- supporting multiple commercial segmentations
- producing persuasive BD materials for specific counterparties
Governing Principles¶
- Analysis first. The primary work happens in model logic, scenario definitions, and analytical outputs. Excel is optional output.
- BUFMs are explicit. Each business unit model must have a clear boundary, inputs, outputs, KPIs, and profitability drivers.
- Couplings are explicit. Ecosystem behavior comes from modelled buy/sell relationships, utilization effects, and shared constraints.
- Multiple segmentations are supported. The same operational ecosystem may map to one integrated company or multiple distinct commercial players.
- Profitability pools are first-class outputs. We must be able to explain who captures value and why.
- Generated artifacts are renderers, not sources of truth. Excel, markdown briefs, charts, and slide-ready tables should all be derived from the same analytical core.
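The segmentation and profitability-pool principles above can be made concrete with a minimal sketch. It assumes, hypothetically, that each BUFM yields a single operating-profit figure and that a segmentation is a plain mapping from BUFM name to commercial player; the function name `profit_pools`, the player names, and all figures are illustrative and not part of the existing code.

```python
from collections import defaultdict


def profit_pools(bufm_profit: dict[str, float],
                 segmentation: dict[str, str]) -> dict[str, dict[str, float]]:
    """Roll BUFM-level profit up to commercial players and compute pool shares."""
    player_profit: dict[str, float] = defaultdict(float)
    for bufm, profit in bufm_profit.items():
        player_profit[segmentation[bufm]] += profit

    # Assumption: only positive profits form the pool; loss-makers get a zero share.
    pool = sum(p for p in player_profit.values() if p > 0)
    return {
        player: {
            "profit": profit,
            "pool_share": max(profit, 0.0) / pool if pool > 0 else 0.0,
        }
        for player, profit in player_profit.items()
    }


# Hypothetical operating profits per BUFM (illustrative numbers only).
bufm_profit = {"rider": 40.0, "BCS": 100.0, "SNS": 60.0, "EPS": -20.0}

# The same operational ecosystem under two commercial segmentations.
integrated = {name: "IntegratedCo" for name in bufm_profit}
separated = {"rider": "Riders", "BCS": "StationCo", "SNS": "StationCo", "EPS": "EnergyCo"}

integrated_view = profit_pools(bufm_profit, integrated)
separated_view = profit_pools(bufm_profit, separated)
```

The operational numbers are identical in both calls; only the mapping changes, which is what lets the question "who captures the value" be answered per segmentation.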
Target Architecture¶
1. Model Core¶
Keep models/ as the computational center of gravity.
Target responsibilities:
- one module per BUFM or glue layer
- deterministic calculation functions
- small, well-typed input and output contracts
- zero dependency on spreadsheet layout
- unit tests for every model block
Expected components:
- ffd_model.py
- eed_model.py
- participant BUFMs such as rider, BCS, SNS, EPS
- coupling functions or modules for inter-BUFM transactions
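As a rough illustration of the responsibilities listed above, a model block might look like the sketch below: typed inputs and outputs, a pure calculation function, and no reference to any spreadsheet layout. The class names, field names, and cost structure are hypothetical and are not taken from the existing modules in models/.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitInputs:
    """Hypothetical input contract for one business unit."""
    units_per_day: float
    price_per_unit: float
    variable_cost_per_unit: float
    fixed_cost_per_day: float


@dataclass(frozen=True)
class UnitOutputs:
    """Hypothetical output contract: daily revenue, cost, and profit."""
    revenue: float
    variable_cost: float
    operating_profit: float


def run_unit(inputs: UnitInputs) -> UnitOutputs:
    """Pure function: the same inputs always produce the same outputs."""
    revenue = inputs.units_per_day * inputs.price_per_unit
    variable_cost = inputs.units_per_day * inputs.variable_cost_per_unit
    operating_profit = revenue - variable_cost - inputs.fixed_cost_per_day
    return UnitOutputs(revenue, variable_cost, operating_profit)


result = run_unit(UnitInputs(units_per_day=150.0, price_per_unit=2.0,
                             variable_cost_per_unit=0.5, fixed_cost_per_day=100.0))
```

Because the function has no hidden state, a unit test reduces to constructing an input object and asserting on the output fields.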
2. Contract Layer¶
Add explicit schemas for model inputs, outputs, and scenario structure.
Recommended implementation:
- pydantic models for validation
- separate contracts for:
  - BUFM input blocks
  - BUFM output blocks
  - coupling definitions
  - commercial segmentation overlays
  - report/export requests
Illustrative structure:
src/emob_financial_models/contracts/
  bufm.py
  coupling.py
  scenario.py
  segmentation.py
  outputs.py
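To show the kind of validation a contract is meant to enforce, here is a minimal sketch of a coupling definition. It uses standard-library dataclasses only to stay dependency-free; the same shape should translate to a pydantic model as recommended above. The field names and validation rules are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CouplingDefinition:
    """Hypothetical contract for one buy/sell relationship between two BUFMs."""
    buyer: str
    seller: str
    unit_price: float
    units_per_period: float

    def __post_init__(self) -> None:
        # Reject definitions that cannot describe a real inter-BUFM transaction.
        if self.buyer == self.seller:
            raise ValueError("a coupling needs two distinct BUFMs")
        if self.unit_price < 0 or self.units_per_period < 0:
            raise ValueError("price and volume must be non-negative")

    @property
    def transfer(self) -> float:
        """Amount that is a cost to the buyer and revenue to the seller."""
        return self.unit_price * self.units_per_period


coupling = CouplingDefinition(buyer="rider", seller="BCS",
                              unit_price=2.0, units_per_period=150.0)
```

Keeping the transfer as a single derived value means the buyer's cost and the seller's revenue cannot drift apart, which is the point of making couplings explicit.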
3. Scenario Layer¶
Create scenario definitions as structured files, not code and not workbook tabs.
Each scenario should be able to define:
- market assumptions
- technical assumptions
- capacity assumptions
- pricing assumptions
- coupling assumptions
- ownership and segmentation assumptions
This is the layer that lets us answer:
- Is the whole system economically viable?
- Which participant captures the value?
- How does the answer change if one company is thin?
- How does the answer change if roles are split across separate players?
Illustrative structure:
scenarios/
  baseline/
    togo-baseline.yaml
    togo-thin-integrator.yaml
    togo-separated-players.yaml
  sensitivity/
    fuel-price-high.yaml
    low-utilization.yaml
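One possible reading of this layout is that files under sensitivity/ hold only the assumptions that differ from a baseline. The plan does not state this, so treat it as an assumption. Under that reading, combining the two is a recursive merge, sketched here on plain dictionaries standing in for parsed scenario files; the keys and values are hypothetical.

```python
import copy


def apply_overlay(base: dict, overlay: dict) -> dict:
    """Return a new scenario with overlay values replacing base values."""
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested assumption blocks key by key.
            merged[key] = apply_overlay(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical parsed contents of a baseline and a sensitivity file.
baseline = {
    "pricing": {"fuel_price": 1.0, "service_fee": 2.0},
    "capacity": {"utilization": 0.7},
}
fuel_price_high = {"pricing": {"fuel_price": 1.5}}

scenario = apply_overlay(baseline, fuel_price_high)
```

The baseline dictionary is left unmodified, so several sensitivities can be derived from the same baseline in one session.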
4. Run Layer¶
Build a single command-line entrypoint for analysis runs.
Recommended implementation:
- typer CLI
- rich terminal output
Illustrative commands:
emob-model run-bufm --model rider --scenario scenarios/baseline/togo-baseline.yaml
emob-model run-ecosystem --scenario scenarios/baseline/togo-baseline.yaml
emob-model run-profit-pools --scenario scenarios/baseline/togo-separated-players.yaml
emob-model compare-segmentations --scenario scenarios/baseline/togo-baseline.yaml
emob-model export-brief --scenario scenarios/baseline/togo-thin-integrator.yaml
emob-model export-excel --scenario scenarios/baseline/togo-baseline.yaml
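The command surface above can be prototyped before any dependency is added. The sketch below uses the standard-library argparse as a stand-in for typer, purely to keep the example self-contained; the subcommand and option names are copied from the illustrative commands, and `build_parser` is a hypothetical helper.

```python
import argparse

# Subcommands that take only a scenario file, per the illustrative commands.
SCENARIO_ONLY = (
    "run-ecosystem",
    "run-profit-pools",
    "compare-segmentations",
    "export-brief",
    "export-excel",
)


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per analysis entrypoint."""
    parser = argparse.ArgumentParser(prog="emob-model")
    sub = parser.add_subparsers(dest="command", required=True)

    run_bufm = sub.add_parser("run-bufm", help="run a single BUFM")
    run_bufm.add_argument("--model", required=True)
    run_bufm.add_argument("--scenario", required=True)

    for name in SCENARIO_ONLY:
        cmd = sub.add_parser(name)
        cmd.add_argument("--scenario", required=True)
    return parser


args = build_parser().parse_args(
    ["run-bufm", "--model", "rider", "--scenario", "scenarios/baseline/togo-baseline.yaml"]
)
```

Each subcommand would then dispatch to a runner function; that dispatch is omitted here because the runners do not exist yet.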
5. Data And Result Layer¶
Store run outputs as structured analytical artifacts.
Recommended approach:
- JSON or parquet outputs for run snapshots
- polars for analytical transforms
- duckdb for querying scenario results across many runs
Illustrative structure:
runs/
  2026-04-01-togo-baseline/
    inputs.json
    ecosystem-summary.json
    participant-pnl.parquet
    profitability-pools.parquet
    bd-brief.md
    charts/
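A run snapshot of this shape could be written with a few lines of standard-library code. The sketch below covers only the two JSON files; the parquet outputs would need polars or a similar library and are left out. The function name `write_run_snapshot` and the payload contents are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_run_snapshot(root: Path, run_name: str, inputs: dict, summary: dict) -> Path:
    """Write inputs and the ecosystem summary into a fresh run directory."""
    run_dir = root / run_name
    # exist_ok=False: refuse to overwrite an earlier run with the same name.
    run_dir.mkdir(parents=True, exist_ok=False)
    (run_dir / "inputs.json").write_text(json.dumps(inputs, indent=2, sort_keys=True))
    (run_dir / "ecosystem-summary.json").write_text(
        json.dumps(summary, indent=2, sort_keys=True)
    )
    return run_dir


# Usage against a temporary directory standing in for runs/.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = write_run_snapshot(
        Path(tmp),
        "2026-04-01-togo-baseline",
        inputs={"scenario": "togo-baseline"},
        summary={"operating_profit": 180.0},
    )
    reloaded_inputs = json.loads((run_dir / "inputs.json").read_text())
    written_files = sorted(p.name for p in run_dir.iterdir())
```

Storing the exact inputs next to the results is what makes a run reproducible and comparable later.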
6. Presentation Layer¶
Generate stakeholder-facing outputs from the same run artifacts.
Output types should include:
- markdown summary briefs
- chart packs
- tables for slide decks
- participant-specific profitability views
- optional Excel workbook export
Excel should be treated as:
- a familiar handoff artifact
- a snapshot format
- a presentation surface
Excel should not be treated as:
- the modeling engine
- the system of record
- the coupling logic authority
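To illustrate the renderer idea, the sketch below turns a participant-profit result into a markdown table. It assumes the run artifact can be reduced to a mapping from participant name to operating profit; `render_brief` and the sample figures are hypothetical.

```python
def render_brief(title: str, participant_profit: dict[str, float]) -> str:
    """Render a markdown brief from already-computed results; no model logic here."""
    lines = [
        f"# {title}",
        "",
        "| Participant | Operating profit |",
        "| --- | ---: |",
    ]
    # Largest profit first, so the main value-capturing participant leads the table.
    ranked = sorted(participant_profit.items(), key=lambda item: item[1], reverse=True)
    for name, profit in ranked:
        lines.append(f"| {name} | {profit:,.0f} |")
    lines.append(f"| **Total** | {sum(participant_profit.values()):,.0f} |")
    return "\n".join(lines)


brief = render_brief("Togo baseline", {"A": 1000.0, "B": 2500.0})
```

An Excel export adapter would consume the same mapping, which keeps every output format derived from one analytical result.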
7. Interactive Exploration Layer¶
For internal analysis, prefer modern Python-native analytical tools over spreadsheet manipulation.
Recommended first choice:
- marimo notebooks for reactive, versionable analytical exploration
Possible future additions:
- fastapi for API access
- a lightweight UI if scenario exploration becomes frequent
Do not introduce an API or front-end before model contracts and scenario structure are stable.
Recommended Technology Stack¶
Core¶
- uv for Python project management
- pytest for tests
- pydantic for validated contracts
- typer for CLI
- rich for command output
Analysis¶
- polars for tabular analytics
- duckdb for local analytical querying
- plotly for charts
- marimo for exploratory analysis
Optional Later¶
- fastapi for service/API exposure
- orchestration only if scenario execution becomes materially complex
Proposed Repository Shape¶
Illustrative target structure:
emob-financial-models/
├── models/ # Current computational seed, to be normalized
├── scenarios/ # Structured scenario definitions
├── runs/ # Generated analytical outputs
├── docs/
│ ├── analysis/
│ │ ├── overview.md
│ │ └── toolchain-refactor-plan.md
│ └── models/
├── src/
│ └── emob_financial_models/
│ ├── contracts/
│ ├── coupling/
│ ├── runners/
│ ├── exports/
│ └── cli.py
├── tests/
│ ├── unit/
│ ├── integration/
│ └── scenario/
└── scripts/ # Legacy transitional adapters only
Notes:
- models/ may eventually move under src/, but that is not required for phase 1.
- scripts/ should shrink over time and eventually contain only thin adapters or maintenance utilities.
Legacy Script Disposition¶
Keep Temporarily As Legacy Adapters¶
- scripts/create_workbook.py
- scripts/generate_report.py
- scripts/transform_data.py
- scripts/sync_excel_to_docs.py
These should be reclassified as transitional tooling and no longer treated as the main workflow.
Retire After Replacement¶
Most rename/fix/update utilities under scripts/ should be retired after we have:
- stable model contracts
- a scenario runner
- export adapters for required outputs
Examples include:
- naming migration scripts
- terminology patch scripts
- workbook fix scripts
These are historical maintenance artifacts, not strategic capabilities.
Migration Plan¶
Phase 1. Freeze The New Doctrine¶
Status:
- explicit analysis-first doctrine
- explicit profitability-pool objective
- explicit support for multiple commercial segmentations
This phase is now largely documented.
Phase 2. Normalize Existing Models¶
Goal:
- make each current BUFM callable through a consistent contract
Tasks:
- define shared input/output schemas
- standardize naming across model results
- identify which coupling assumptions are currently embedded and extract them
- separate pure model logic from legacy workbook-derived assumptions
Deliverable:
- consistent Python API across BUFMs
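One way to read "consistent Python API" is that every BUFM is callable by name and returns the same core result keys. The sketch below shows that idea with a small registry; the registry, the required keys, and the rider formula are assumptions for illustration, not a description of the current modules.

```python
from typing import Callable, Mapping

# Hypothetical minimum set of keys every BUFM result must expose.
REQUIRED_KEYS = ("revenue", "cost", "operating_profit")

BufmRunner = Callable[[Mapping[str, float]], dict]
_REGISTRY: dict[str, BufmRunner] = {}


def register(name: str) -> Callable[[BufmRunner], BufmRunner]:
    """Decorator that makes a BUFM function callable by name."""
    def decorator(fn: BufmRunner) -> BufmRunner:
        _REGISTRY[name] = fn
        return fn
    return decorator


def run_bufm(name: str, inputs: Mapping[str, float]) -> dict:
    """Run a registered BUFM and check that it honours the shared result shape."""
    result = _REGISTRY[name](inputs)
    missing = [key for key in REQUIRED_KEYS if key not in result]
    if missing:
        raise KeyError(f"{name} result is missing {missing}")
    return result


@register("rider")
def run_rider(inputs: Mapping[str, float]) -> dict:
    # Illustrative economics only; the real rider model lives in models/.
    revenue = inputs["trips_per_day"] * inputs["fare_per_trip"]
    cost = inputs["trips_per_day"] * inputs["cost_per_trip"]
    return {"revenue": revenue, "cost": cost, "operating_profit": revenue - cost}


rider_result = run_bufm(
    "rider", {"trips_per_day": 20, "fare_per_trip": 1.0, "cost_per_trip": 0.5}
)
```

A name-based lookup of this kind is also what a `run-bufm --model rider` style command would need behind it.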
Phase 3. Build The Scenario Schema¶
Goal:
- represent assumptions and segmentations explicitly
Tasks:
- define scenario file structure
- define commercial segmentation overlay format
- add baseline example scenarios
Deliverable:
- scenario files for at least one baseline market and two segmentation variants
Phase 4. Build The Runner¶
Goal:
- execute BUFMs, ecosystem runs, and profitability-pool analyses from the command line
Tasks:
- implement CLI
- implement result packaging
- implement participant-level and aggregate views
Deliverable:
- one repeatable CLI-driven analysis workflow
Phase 5. Rebuild Outputs From The New Core¶
Goal:
- regenerate stakeholder outputs from analytical results
Tasks:
- generate markdown brief
- generate chart pack
- generate machine-readable result files
- add optional Excel export adapter
Deliverable:
- Excel becomes one output option, not the workflow center
Phase 6. Retire Legacy Workflow¶
Goal:
- stop evolving workbook-first scripts
Tasks:
- mark legacy scripts clearly
- remove stale references from docs
- keep only required backward-compatible export paths
Deliverable:
- repo workflow is clearly analysis-first in both code and docs
Immediate Implementation Backlog¶
Recommended next build sequence:
- Add project packaging with uv
- Add contract models for one BUFM and one scenario
- Build a minimal typer CLI
- Run one baseline scenario end to end
- Produce one aggregate summary and one participant-profitability summary
- Add profitability-pool table output
- Add one Excel export adapter last
Definition Of Success¶
The refactor is successful when:
- analysts run scenarios without touching workbook formulas
- BD materials can be generated for both system-level and participant-level views
- alternative commercial segmentations can be compared explicitly
- Excel is a renderer, not the computational authority
- the repo communicates a modern analytical method rather than a legacy spreadsheet workflow