Pipeline Architecture
This document provides a deeper look at the internal architecture of the Energy Measurement Pipeline, explaining its stage-based design, execution flow, and extensibility mechanisms.
Modular Stage Pipeline
The pipeline is implemented as a pipe-and-filter model composed of modular units called stages. Each stage performs a focused task and passes control to the next stage via a shared context.
Each stage implements the same interface:
from abc import ABC, abstractmethod
from typing import Any


class PipelineStage(ABC):
    @abstractmethod
    def run(self, context: dict[str, Any]) -> None:
        ...
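To make the shared-context idea concrete, here is a minimal sketch of stages run in sequence. The two stage classes and the run_stages helper are hypothetical names used only for illustration; the real orchestrator in pipeline/pipeline.py may work differently.

```python
from abc import ABC, abstractmethod
from typing import Any


class PipelineStage(ABC):
    @abstractmethod
    def run(self, context: dict[str, Any]) -> None:
        ...


# Hypothetical stages, for illustration only.
class RecordCommitStage(PipelineStage):
    def run(self, context: dict[str, Any]) -> None:
        context["commit"] = "abc123"


class ReportStage(PipelineStage):
    def run(self, context: dict[str, Any]) -> None:
        # Reads what an earlier stage wrote into the shared context.
        context["report"] = f"measured {context['commit']}"


def run_stages(stages: list[PipelineStage], context: dict[str, Any]) -> dict[str, Any]:
    """Run each stage in order, passing the same context dict along."""
    for stage in stages:
        stage.run(context)
    return context


result = run_stages([RecordCommitStage(), ReportStage()], {})
```

Because every stage receives the same dictionary, a later stage can consume values written by an earlier one without any global state.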
Stage Categories
Stages are grouped by when and how often they are executed:
| Stage Type | Frequency | Example Tasks |
|---|---|---|
| Pre-Stages | Once per batch | Check RAPL/perf access |
| Pre-Test Stages | Once per unique commit | Checkout, compile, setup |
| Batch Stages | Repeated for every test run | Measure energy, cleanup |
Execution Flow
Pipeline (per batch)
├── Pre-Stages (1x)
│   └── e.g. VerifyPerfStage
├── Pre-Test Stages (1x per commit, parallelized)
│   ├── CheckoutStage
│   ├── BuildStage
│   └── JavaSetupStage
└── Batch Stages (N x per commit)
    ├── TemperatureCheckStage
    ├── MeasureEnergyStage
    └── PostTestStage
This model enables:
- Pre-building commits once and reusing them
- Concurrent stage execution where safe
- Fine-grained extensibility per stage group
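The ordering of the three stage groups can be sketched as a small driver function. This is an illustration under stated assumptions, not the project's orchestrator: run_batch, its parameters, and the log format are hypothetical, stages are modeled as plain callables, and the pre-test stages run sequentially here for simplicity.

```python
from typing import Any, Callable

# A stage is modeled here as a plain callable taking the shared context.
Stage = Callable[[dict[str, Any]], None]


def run_batch(
    commits: list[str],
    pre_stages: list[Stage],
    pre_test_stages: list[Stage],
    batch_stages: list[Stage],
    repetitions: int,
) -> list[str]:
    """Return a log of the order in which stage groups ran."""
    log: list[str] = []

    def run_group(name: str, stages: list[Stage], context: dict[str, Any]) -> None:
        for stage in stages:
            stage(context)
        log.append(f"{name}:{context.get('commit', '-')}")

    # Pre-stages: once per batch.
    run_group("pre", pre_stages, {})

    # Pre-test stages: once per unique commit (sequential in this sketch).
    unique_commits = list(dict.fromkeys(commits))
    for commit in unique_commits:
        run_group("pre_test", pre_test_stages, {"commit": commit})

    # Batch stages: repeated N times per commit.
    for commit in unique_commits:
        for _ in range(repetitions):
            run_group("batch", batch_stages, {"commit": commit})
    return log
```

De-duplicating the commit list before the pre-test loop is what allows a commit that appears more than once to be built only once and then reused.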
Plugin System for Stages
Each stage is a self-contained Python class and can be loaded dynamically from user-defined files.
Requirements for a Custom Stage
- Inherits from PipelineStage
- Implements the run(context: dict) method
- Exposes a get_stage() function (used for dynamic loading)
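The get_stage() requirement suggests how a loader could pick up user files. The sketch below shows one way to do this with the standard importlib machinery; load_stage and load_enabled_stages are hypothetical helper names, and the pipeline's real loader may be implemented differently.

```python
import importlib.util
from pathlib import Path
from typing import Any


def load_stage(module_path: Path) -> Any:
    """Import a stage file by path and return the object from get_stage()."""
    spec = importlib.util.spec_from_file_location(module_path.stem, module_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load stage module: {module_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if not hasattr(module, "get_stage"):
        raise AttributeError(f"{module_path} does not expose get_stage()")
    return module.get_stage()


def load_enabled_stages(modules_dir: Path, modules_enabled: list[str]) -> list[Any]:
    """Load every listed file, preserving the order given in the config."""
    return [load_stage(modules_dir / name) for name in modules_enabled]
```

A factory function such as get_stage() keeps the loader independent of class names: it only needs to know the file path and one well-known entry point.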
Example
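# modules/python_env_stage.py
import os
from typing import Any

# Import path assumed from the directory layout (pipeline/stage_interface.py).
from pipeline.stage_interface import PipelineStage


class PythonEnvStage(PipelineStage):
    def run(self, context: dict[str, Any]) -> None:
        os.system("pip install -r requirements.txt")


def get_stage():
    # Entry point used by the pipeline to load this stage dynamically.
    return PythonEnvStage()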
Then include it in your config:
"modules_enabled": ["python_env_stage.py"]
Directory Layout
.
├── main.py                 # CLI entry point
├── pipeline/               # Pipeline engine and interfaces
│   ├── core_stages/        # Built-in stages (checkout, measure, etc.)
│   ├── pipeline.py         # Orchestrator for stages
│   └── stage_interface.py
├── modules/                # Optional user-defined custom stages
├── config/                 # Config models (Pydantic)
├── plot.py                 # Plotting results
├── sort.py                 # Sort results by Git history
└── system_setup.sh         # System preparation script
Example Execution Flow (High-Level)
main.py measure → load config → prepare repo
        ↓
gather commits + batch them
        ↓
Run pre-stages (e.g. perf check)
        ↓
Run pre-test stages (in parallel):
    - Checkout → Build → JavaSetup
        ↓
For each commit:
    Repeat batch stages (MeasureEnergy, PostTest) N times
        ↓
Restore repo HEAD
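The "gather commits + batch them" step is not specified further here. As a rough sketch, assuming fixed-size batches taken in commit order (the batch_commits name and the batch_size parameter are hypothetical), deterministic batching could look like this:

```python
from typing import Iterator


def batch_commits(commits: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive, fixed-size batches in the original commit order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(commits), batch_size):
        yield commits[start:start + batch_size]


batches = list(batch_commits(["c1", "c2", "c3", "c4", "c5"], batch_size=2))
```

Slicing in order means the same commit list and batch size always produce the same batches, which is one simple way to keep runs reproducible.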
Design Principles
| Principle | Implementation |
|---|---|
| Modularity | Each stage is an isolated Python class |
| Extensibility | Users can add their own stages dynamically |
| Separation | Config-driven behavior, no hardcoded logic |
| Reproducibility | Deterministic commit batching + reuse |
| Minimal coupling | Context dictionary avoids global state |
Parallelism Strategy
- Pre-Test stages for different commits are executed in parallel using ProcessPoolExecutor
- Batch stages are run sequentially per commit to preserve measurement integrity
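The split between parallel preparation and sequential measurement can be sketched as follows. The functions prepare_commit, measure_commit, and run_with_parallel_prep are hypothetical placeholders, and the executor_factory parameter is an assumption added so the executor can be swapped (for example in tests); only the use of ProcessPoolExecutor comes from the description above.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable


def prepare_commit(commit: str) -> str:
    """Placeholder for checkout + build; returns a hypothetical build directory."""
    return f"build/{commit}"


def measure_commit(build_dir: str, run_index: int) -> str:
    """Placeholder for a single energy measurement run."""
    return f"{build_dir}#run{run_index}"


def run_with_parallel_prep(
    commits: list[str],
    repetitions: int,
    executor_factory: Callable[[], Executor] = ProcessPoolExecutor,
) -> list[str]:
    # Pre-test work for different commits is independent, so it can fan out.
    with executor_factory() as pool:
        build_dirs = list(pool.map(prepare_commit, commits))

    # Measurements run one at a time so concurrent load does not skew readings.
    results: list[str] = []
    for build_dir in build_dirs:
        for run_index in range(repetitions):
            results.append(measure_commit(build_dir, run_index))
    return results
```

When the default ProcessPoolExecutor is used on a platform that spawns worker processes, the call to run_with_parallel_prep should sit under an `if __name__ == "__main__":` guard, and the worker function must be defined at module level so it can be pickled.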
Optimization Ideas (Planned)
[ ] Detect and skip already measured commits
[ ] Smart batching based on CPU temperature
[ ] Reuse compiled artifacts across sessions
[ ] Live log dashboards
[ ] Advanced scheduling policies