🏗️ Pipeline Architecture

This document describes the internal architecture of the Energy Measurement Pipeline: its stage-based design, its execution flow, and its extensibility mechanisms.


⚙️ Modular Stage Pipeline

The pipeline follows a pipe-and-filter model built from modular units called stages. Each stage performs one focused task, reads from and writes to a shared context, and then hands control to the next stage.

Each stage implements the same interface:

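from abc import ABC, abstractmethod
from typing import Any

class PipelineStage(ABC):
    @abstractmethod
    def run(self, context: dict[str, Any]) -> None:
        """Execute the stage, reading from and writing to the shared context."""
        ...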
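
As a minimal sketch of the pipe-and-filter idea, the snippet below chains two stages through one context dictionary. The stage classes, the `run_stages` helper, and the `"commit"` and `"report"` keys are hypothetical and exist only for illustration; the interface is repeated so the sketch runs on its own.

```python
from abc import ABC, abstractmethod
from typing import Any


class PipelineStage(ABC):
    """Stage interface, repeated here so the sketch is self-contained."""

    @abstractmethod
    def run(self, context: dict[str, Any]) -> None:
        ...


class RecordCommitStage(PipelineStage):
    """Hypothetical stage: stores a value for later stages to read."""

    def run(self, context: dict[str, Any]) -> None:
        context["commit"] = "abc123"  # placeholder commit id


class ReportStage(PipelineStage):
    """Hypothetical stage: consumes what the previous stage stored."""

    def run(self, context: dict[str, Any]) -> None:
        context["report"] = f"measured {context['commit']}"


def run_stages(stages: list[PipelineStage], context: dict[str, Any]) -> dict[str, Any]:
    # Each stage mutates the shared context, then control moves to the next one.
    for stage in stages:
        stage.run(context)
    return context


result = run_stages([RecordCommitStage(), ReportStage()], {})
```

Because every stage exposes the same `run` signature, the runner needs no knowledge of what an individual stage does.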

🧱 Stage Categories

Stages are grouped by when and how often they are executed:

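| Stage Type      | Frequency                   | Example Tasks            |
|-----------------|-----------------------------|--------------------------|
| Pre-Stages      | Once per batch              | Check RAPL/perf access   |
| Pre-Test Stages | Once per unique commit      | Checkout, compile, setup |
| Batch Stages    | Repeated for every test run | Measure energy, cleanup  |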


🔁 Execution Flow

Pipeline (per batch)
├── Pre-Stages (1x)
│   └── e.g. VerifyPerfStage
├── Pre-Test Stages (1x per commit, parallelized)
│   ├── CheckoutStage
│   ├── BuildStage
│   └── JavaSetupStage
└── Batch Stages (N x per commit)
    ├── TemperatureCheckStage
    ├── MeasureEnergyStage
    └── PostTestStage

This model enables:

  • ✅ Pre-building commits once and reusing them

  • ✅ Concurrent stage execution where safe

  • ✅ Fine-grained extensibility per stage group
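
The three frequencies can be sketched as nested loops. This is an illustrative outline, not the actual orchestrator: stages are modelled as plain callables, `run_batch` and `named` are hypothetical helpers, and the pre-test stages run sequentially here for clarity even though the pipeline parallelizes them.

```python
from typing import Any, Callable

# Stand-in for PipelineStage.run: a callable that mutates the shared context.
Stage = Callable[[dict[str, Any]], None]


def run_batch(
    commits: list[str],
    pre_stages: list[Stage],
    pre_test_stages: list[Stage],
    batch_stages: list[Stage],
    repetitions: int,
) -> list[str]:
    """Run the three stage groups at their documented frequencies."""
    log: list[str] = []

    # Pre-stages: once per batch.
    batch_context: dict[str, Any] = {"log": log}
    for stage in pre_stages:
        stage(batch_context)

    # Pre-test stages: once per unique commit (order-preserving de-duplication).
    contexts: dict[str, dict[str, Any]] = {}
    for commit in dict.fromkeys(commits):
        contexts[commit] = {"log": log, "commit": commit}
        for stage in pre_test_stages:
            stage(contexts[commit])

    # Batch stages: repeated for every test run of every commit.
    for context in contexts.values():
        for _ in range(repetitions):
            for stage in batch_stages:
                stage(context)
    return log


def named(name: str) -> Stage:
    """Build a hypothetical stage that only records that it ran."""
    def stage(context: dict[str, Any]) -> None:
        context["log"].append(f"{name}:{context.get('commit', '-')}")
    return stage


log = run_batch(
    commits=["a1", "b2", "a1"],
    pre_stages=[named("verify_perf")],
    pre_test_stages=[named("checkout"), named("build")],
    batch_stages=[named("measure")],
    repetitions=2,
)
```

The duplicate `"a1"` is checked out and built only once, which is the reuse the first bullet refers to.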


🧠 Shared Context

Every stage receives the same context: dict[str, Any], which is used for:

  • Passing commit information

  • Communicating control signals (abort_pipeline, build_failed, etc.)

  • Sharing paths, results, and state between stages

Example usage:

context["build_failed"] = True
context["abort_pipeline"] = True
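
How the orchestrator reacts to these flags is not spelled out here. One plausible reading, sketched below under that assumption, is that the runner checks `abort_pipeline` before each stage and stops as soon as it is set. The functions `failing_build`, `measure`, and `run_until_abort` are hypothetical.

```python
from typing import Any, Callable

Stage = Callable[[dict[str, Any]], None]  # stand-in for PipelineStage.run


def failing_build(context: dict[str, Any]) -> None:
    # A stage signals failure by setting control flags in the shared context.
    context["build_failed"] = True
    context["abort_pipeline"] = True


def measure(context: dict[str, Any]) -> None:
    context["measured"] = True


def run_until_abort(stages: list[Stage], context: dict[str, Any]) -> list[str]:
    """Run stages in order, stopping once any stage requests an abort."""
    executed = []
    for stage in stages:
        if context.get("abort_pipeline"):
            break  # a previous stage asked to stop
        stage(context)
        executed.append(stage.__name__)
    return executed


context: dict[str, Any] = {}
executed = run_until_abort([failing_build, measure], context)
```

Using `context.get(...)` rather than indexing means stages that never touch the flag need no special handling.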

🔌 Plugin System for Stages

Each stage is a self-contained Python class and can be loaded dynamically from user-defined files.

Requirements for a Custom Stage

  • Inherits from PipelineStage

  • Implements the run(self, context: dict[str, Any]) -> None method

  • Exposes a module-level get_stage() function that returns a stage instance (used for dynamic loading)

Example

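# modules/python_env_stage.py
import subprocess
import sys
from typing import Any

from pipeline.stage_interface import PipelineStage  # path assumed from the directory layout

class PythonEnvStage(PipelineStage):
    def run(self, context: dict[str, Any]) -> None:
        # Install dependencies with the current interpreter's pip; raises on failure.
        subprocess.run([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"], check=True)

def get_stage():
    return PythonEnvStage()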

Then include it in your config:

"modules_enabled": ["python_env_stage.py"]

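The loading mechanism itself is not shown in this document. A common way to import a module from a file path and call its `get_stage()` factory uses the standard library's `importlib.util`, sketched below. The `load_stage` helper and the throwaway `echo_stage.py` module are hypothetical; a real custom stage would subclass PipelineStage.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_stage(path: Path):
    """Import a module from a file path and return what its get_stage() builds."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load stage module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if not hasattr(module, "get_stage"):
        raise AttributeError(f"{path.name} does not expose get_stage()")
    return module.get_stage()


# Demo module written to a temporary directory, standing in for modules/.
source = '''
class EchoStage:
    def run(self, context):
        context["echo"] = "loaded"

def get_stage():
    return EchoStage()
'''

with tempfile.TemporaryDirectory() as modules_dir:
    module_path = Path(modules_dir) / "echo_stage.py"
    module_path.write_text(source)
    stage = load_stage(module_path)

context = {}
stage.run(context)
```

Checking for `get_stage` explicitly gives a clearer error than letting the attribute lookup fail deep inside the loader.
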
📁 Directory Layout

.
├── main.py               # CLI entry point
├── pipeline/             # Pipeline engine and interfaces
│   ├── core_stages/      # Built-in stages (checkout, measure, etc.)
│   ├── pipeline.py       # Orchestrator for stages
│   └── stage_interface.py
├── modules/              # Optional user-defined custom stages
├── config/               # Config models (Pydantic)
├── plot.py               # Plotting results
├── sort.py               # Sort results by Git history
└── system_setup.sh       # System preparation script

🛠 Example Execution Flow (High-Level)

main.py measure → load config → prepare repo
        ↓
   gather commits + batch them
        ↓
Run pre-stages (e.g. perf check)
        ↓
Run pre-test stages (in parallel):
    - Checkout → Build → JavaSetup
        ↓
For each commit:
    Repeat batch stages (MeasureEnergy, PostTest) N times
        ↓
Restore repo HEAD
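
The batching strategy used in the "gather commits + batch them" step is not described here. As a minimal sketch, assuming fixed-size batches taken in input order, a deterministic splitter could look like this; `batch_commits` and `batch_size` are hypothetical names.

```python
def batch_commits(commits: list[str], batch_size: int) -> list[list[str]]:
    """Split commits into consecutive fixed-size batches, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [commits[i:i + batch_size] for i in range(0, len(commits), batch_size)]


batches = batch_commits(["c1", "c2", "c3", "c4", "c5"], batch_size=2)
```

The same input always yields the same batches, and the final batch may be shorter than the others.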

💡 Design Principles

| Principle        | Implementation                             |
|------------------|--------------------------------------------|
| Modularity       | Each stage is an isolated Python class     |
| Extensibility    | Users can add their own stages dynamically |
| Separation       | Config-driven behavior, no hardcoded logic |
| Reproducibility  | Deterministic commit batching + reuse      |
| Minimal coupling | Context dictionary avoids global state     |


🔄 Parallelism Strategy

  • Pre-Test stages for different commits are executed in parallel using ProcessPoolExecutor

  • Batch stages are run sequentially per commit to preserve measurement integrity
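
The split between parallel preparation and sequential measurement can be sketched as follows. This is an illustration under assumptions, not the pipeline's code: `prepare_commit`, `prepare_commits`, `measure_sequentially`, and the `build_dir` key are hypothetical, and the executor is injectable so the sketch can also be tried with threads.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Any, Callable


def prepare_commit(commit: str) -> dict[str, Any]:
    """Hypothetical stand-in for running the pre-test stages on one commit."""
    context: dict[str, Any] = {"commit": commit, "build_failed": False}
    # A real worker would check out and build the commit here.
    context["build_dir"] = f"build/{commit}"  # placeholder path
    return context


def prepare_commits(
    commits: list[str],
    max_workers: int = 2,
    executor_factory: Callable[..., Executor] = ProcessPoolExecutor,
) -> list[dict[str, Any]]:
    """Prepare commits concurrently; results come back in input order."""
    with executor_factory(max_workers=max_workers) as executor:
        # Executor.map preserves the order of `commits`.
        return list(executor.map(prepare_commit, commits))


def measure_sequentially(contexts: list[dict[str, Any]], repetitions: int) -> list[str]:
    """Run the measurement loop one commit at a time, never concurrently."""
    runs = []
    for context in contexts:
        if context["build_failed"]:
            continue  # skip commits whose preparation failed
        for i in range(repetitions):
            runs.append(f"{context['commit']}#{i}")
    return runs
```

With the default `ProcessPoolExecutor`, call `prepare_commits` from under an `if __name__ == "__main__":` guard, since worker processes may re-import the main module on some platforms. Passing `executor_factory=ThreadPoolExecutor` gives the same results without spawning processes, which is convenient for quick experiments.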


🚀 Optimization Ideas (Planned)

  • [ ] Detect and skip already measured commits

  • [ ] Smart batching based on CPU temperature

  • [ ] Reuse compiled artifacts across sessions

  • [ ] Live log dashboards

  • [ ] Advanced scheduling policies