Merge pull request #650 from beckn/benchmark

[Benchmark] End-to-end performance benchmark suite for the beckn-onix adapter
This commit is contained in:
Mayuresh A Nirhali
2026-04-10 16:07:53 +05:30
committed by GitHub
19 changed files with 6024 additions and 19 deletions

6
.gitignore vendored

@@ -131,6 +131,12 @@ dist
.yarn/install-state.gz
.pnp.*
# Benchmark runtime output (raw go test output, logs, CSVs)
benchmarks/results/
# Utility scripts not part of the project
create_benchmark_issues.sh
# Ignore compiled shared object files
*.so

183
benchmarks/README.md Normal file

@@ -0,0 +1,183 @@
# beckn-onix Adapter Benchmarks
End-to-end performance benchmarks for the beckn-onix adapter, using Go's native `testing.B` framework and `net/http/httptest`. No Docker, no external services — everything runs in-process.
---
## Quick Start
```bash
# From the repo root
go mod tidy # fetch miniredis + benchstat checksums
bash benchmarks/run_benchmarks.sh # compile plugins, run all scenarios, generate report
```
Runtime output lands in `benchmarks/results/<timestamp>/` (gitignored). Committed reports live in `benchmarks/reports/`.
---
## What Is Being Benchmarked
The benchmarks target the **`bapTxnCaller`** handler — the primary outbound path a BAP takes when initiating a Beckn transaction. Every request travels through the full production pipeline:
```
Benchmark goroutine(s)
    │  HTTP POST /bap/caller/<action>
    ▼
httptest.Server ← ONIX adapter (real compiled .so plugins)
    ├── addRoute         router plugin               resolve BPP URL from routing config
    ├── sign             signer + simplekeymanager   Ed25519 / BLAKE-512 signing
    └── validateSchema   schemav2validator           Beckn OpenAPI spec validation
          └──▶ httptest mock BPP (instant ACK — no network)
```
Mock services replace all external dependencies so results reflect **adapter-internal latency only**:
| Dependency | Replaced by |
|------------|-------------|
| Redis | `miniredis` (in-process) |
| BPP backend | `httptest` mock — returns `{"message":{"ack":{"status":"ACK"}}}` |
| Beckn registry | `httptest` mock — returns the dev key pair for signature verification |
---
## Benchmark Scenarios
| Benchmark | What it measures |
|-----------|-----------------|
| `BenchmarkBAPCaller_Discover` | Baseline single-goroutine latency for `/discover` |
| `BenchmarkBAPCaller_Discover_Parallel` | Throughput under concurrent load; run with `-cpu=1,2,4,8,16` |
| `BenchmarkBAPCaller_AllActions` | Per-action latency: `discover`, `select`, `init`, `confirm` |
| `BenchmarkBAPCaller_Discover_Percentiles` | p50 / p95 / p99 latency via `b.ReportMetric` |
| `BenchmarkBAPCaller_CacheWarm` | Latency when the Redis key cache is already populated |
| `BenchmarkBAPCaller_CacheCold` | Latency on a cold cache — full key-derivation round-trip |
| `BenchmarkBAPCaller_RPS` | Requests-per-second under parallel load (`req/s` custom metric) |
---
## How It Works
### Startup (`TestMain`)
Before any benchmark runs, `TestMain` in `e2e/setup_test.go`:
1. **Compiles all required plugins** to a temporary directory using `go build -buildmode=plugin`. The first run takes 60-90 s (cold Go build cache); subsequent runs are near-instant.
2. **Starts miniredis** — an in-process Redis server used by the `cache` plugin (no external Redis needed).
3. **Starts mock servers** — an instant-ACK BPP and a registry mock that returns the dev signing public key.
4. **Starts the adapter** — wires all plugins programmatically (no YAML parsing) and wraps it in an `httptest.Server`.
### Per-iteration (`buildSignedRequest`)
Each benchmark iteration:
1. Loads the JSON fixture for the requested Beckn action (`testdata/<action>_request.json`).
2. Substitutes sentinel values (`BENCH_TIMESTAMP`, `BENCH_MESSAGE_ID`, `BENCH_TRANSACTION_ID`) with fresh values, ensuring unique message IDs per iteration.
3. Signs the body using the Beckn Ed25519/BLAKE-512 spec (same algorithm as the production `signer` plugin).
4. Sends the signed `POST` to the adapter and validates a `200 OK` response.
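The sentinel substitution in step 2 can be sketched as a standalone program (the helper name below is illustrative; the real logic lives in `e2e/setup_test.go`):

```go
package main

import (
	"bytes"
	"fmt"
)

// substituteSentinels swaps the BENCH_* placeholders in a fixture for
// concrete per-iteration values, mirroring step 2 above.
func substituteSentinels(fixture []byte, ts, msgID, txnID string) []byte {
	body := bytes.ReplaceAll(fixture, []byte("BENCH_TIMESTAMP"), []byte(ts))
	body = bytes.ReplaceAll(body, []byte("BENCH_MESSAGE_ID"), []byte(msgID))
	return bytes.ReplaceAll(body, []byte("BENCH_TRANSACTION_ID"), []byte(txnID))
}

func main() {
	fixture := []byte(`{"timestamp":"BENCH_TIMESTAMP","messageId":"BENCH_MESSAGE_ID"}`)
	out := substituteSentinels(fixture, "2025-01-01T00:00:00Z", "msg-1", "txn-1")
	fmt.Println(string(out))
}
```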
### Validation test (`TestSignBecknPayload`)
A plain `Test*` function runs before the benchmarks and sends one signed request end-to-end. If the signing helper is mis-implemented, this fails fast before any benchmark time is wasted.
---
## Directory Layout
```
benchmarks/
├── README.md ← you are here
├── run_benchmarks.sh ← one-shot runner script
├── e2e/
│ ├── bench_test.go ← benchmark functions
│ ├── setup_test.go ← TestMain, startAdapter, signing helper
│ ├── mocks_test.go ← mock BPP and registry servers
│ ├── keys_test.go ← dev key pair constants
│ └── testdata/
│ ├── routing-BAPCaller.yaml ← routing config (BENCH_BPP_URL placeholder)
│ ├── discover_request.json ← Beckn search payload fixture
│ ├── select_request.json
│ ├── init_request.json
│ └── confirm_request.json
├── tools/
│ ├── parse_results.go ← CSV exporter for latency + throughput data
│ └── generate_report.go ← fills REPORT_TEMPLATE.md with run data
├── reports/ ← committed benchmark reports and template
│ ├── REPORT_TEMPLATE.md ← template used to generate each run's report
│ └── REPORT_ONIX_v150.md ← baseline report (Apple M5, Beckn v2.0.0)
└── results/ ← gitignored; created by run_benchmarks.sh
└── <timestamp>/
├── BENCHMARK_REPORT.md — generated human-readable report
├── run1.txt, run2.txt, run3.txt — raw go test -bench output
├── parallel_cpu*.txt — concurrency sweep
├── benchstat_summary.txt — statistical aggregation
├── latency_report.csv — per-benchmark latency (from parse_results.go)
└── throughput_report.csv — RPS vs GOMAXPROCS (from parse_results.go)
```
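For a sense of what the CSV exporters consume, each `go test -bench` result line is whitespace-separated. A rough parsing sketch (illustrative only; the committed `parse_results.go` may work differently):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBenchLine extracts the name, iteration count, and ns/op value from one
// line of `go test -bench` output. Extra columns (B/op, allocs/op, custom
// metrics) follow the same whitespace-separated pattern and are ignored here.
func parseBenchLine(line string) (name string, iters int, nsPerOp float64, err error) {
	f := strings.Fields(line)
	if len(f) < 4 || f[3] != "ns/op" {
		return "", 0, 0, fmt.Errorf("not a benchmark result line: %q", line)
	}
	if iters, err = strconv.Atoi(f[1]); err != nil {
		return "", 0, 0, err
	}
	if nsPerOp, err = strconv.ParseFloat(f[2], 64); err != nil {
		return "", 0, 0, err
	}
	return f[0], iters, nsPerOp, nil
}

func main() {
	name, iters, ns, err := parseBenchLine("BenchmarkBAPCaller_Discover-10    1200    987654 ns/op")
	if err != nil {
		panic(err)
	}
	fmt.Println(name, iters, ns)
}
```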
---
## Reports
Committed reports are stored in `benchmarks/reports/`. Each report documents the environment, raw numbers, and analysis for a specific run and adapter version.
| File | Platform | Adapter version |
|------|----------|-----------------|
| `REPORT_ONIX_v150.md` | Apple M5 · darwin/arm64 · GOMAXPROCS=10 | beckn-onix v1.5.0 |
The script auto-generates `BENCHMARK_REPORT.md` in each results directory using `REPORT_TEMPLATE.md`. To permanently record a run:
1. Run `bash benchmarks/run_benchmarks.sh`; `BENCHMARK_REPORT.md` is generated automatically.
2. Review it and fill in the B5 bottleneck analysis section.
3. Copy it to `benchmarks/reports/REPORT_<tag>.md` and commit.
4. `benchmarks/results/` stays gitignored; only the curated report goes in.
---
## Running Individual Benchmarks
```bash
# Single benchmark, 10 s
go test ./benchmarks/e2e/... \
-bench=BenchmarkBAPCaller_Discover \
-benchtime=10s -benchmem -timeout=30m
# All actions in one shot
go test ./benchmarks/e2e/... \
-bench=BenchmarkBAPCaller_AllActions \
-benchtime=5s -benchmem -timeout=30m
# Concurrency sweep at 1, 4, and 16 goroutines
go test ./benchmarks/e2e/... \
-bench=BenchmarkBAPCaller_Discover_Parallel \
-benchtime=30s -cpu=1,4,16 -timeout=30m
# Race detector check (no data races)
go test ./benchmarks/e2e/... \
-bench=BenchmarkBAPCaller_Discover_Parallel \
-benchtime=5s -race -timeout=30m
# Percentile metrics (p50/p95/p99 in µs)
go test ./benchmarks/e2e/... \
-bench=BenchmarkBAPCaller_Discover_Percentiles \
-benchtime=10s -benchmem -timeout=30m
```
## Comparing Two Runs with benchstat
```bash
go test ./benchmarks/e2e/... -bench=. -benchtime=10s -count=6 > before.txt
# ... make your change ...
go test ./benchmarks/e2e/... -bench=. -benchtime=10s -count=6 > after.txt
go tool benchstat before.txt after.txt
```
---
## Dependencies
| Package | Purpose |
|---------|---------|
| `github.com/alicebob/miniredis/v2` | In-process Redis for the `cache` plugin |
| `golang.org/x/perf/cmd/benchstat` | Statistical benchmark comparison (CLI tool) |
Both are declared in `go.mod`. Run `go mod tidy` once to fetch their checksums.


@@ -0,0 +1,183 @@
package e2e_bench_test
import (
"net/http"
"sort"
"testing"
"time"
)
// ── BenchmarkBAPCaller_Discover ───────────────────────────────────────────────
// Baseline single-goroutine throughput and latency for the discover endpoint.
// Exercises the full bapTxnCaller pipeline: addRoute → sign → validateSchema.
func BenchmarkBAPCaller_Discover(b *testing.B) {
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
req := buildSignedRequest(b, "discover")
if err := sendRequest(req); err != nil {
b.Errorf("iteration %d: %v", i, err)
}
}
}
// ── BenchmarkBAPCaller_Discover_Parallel ─────────────────────────────────────
// Measures throughput under concurrent load. Run with -cpu=1,2,4,8,16 to
// produce a concurrency sweep. Each goroutine runs its own request loop.
func BenchmarkBAPCaller_Discover_Parallel(b *testing.B) {
b.ReportAllocs()
b.ResetTimer()
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
req := buildSignedRequest(b, "discover")
if err := sendRequest(req); err != nil {
b.Errorf("parallel: %v", err)
}
}
})
}
// ── BenchmarkBAPCaller_AllActions ────────────────────────────────────────────
// Measures per-action latency for discover, select, init, and confirm in a
// single benchmark run. Each sub-benchmark is independent.
func BenchmarkBAPCaller_AllActions(b *testing.B) {
actions := []string{"discover", "select", "init", "confirm"}
for _, action := range actions {
action := action // capture for sub-benchmark closure
b.Run(action, func(b *testing.B) {
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
req := buildSignedRequest(b, action)
if err := sendRequest(req); err != nil {
b.Errorf("action %s iteration %d: %v", action, i, err)
}
}
})
}
}
// ── BenchmarkBAPCaller_Discover_Percentiles ───────────────────────────────────
// Collects individual request durations and reports p50, p95, and p99 latency
// in microseconds via b.ReportMetric. The percentile data is only meaningful
// when -benchtime is at least 5s (default used in run_benchmarks.sh).
func BenchmarkBAPCaller_Discover_Percentiles(b *testing.B) {
durations := make([]time.Duration, 0, b.N)
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
req := buildSignedRequest(b, "discover")
start := time.Now()
if err := sendRequest(req); err != nil {
b.Errorf("iteration %d: %v", i, err)
continue
}
durations = append(durations, time.Since(start))
}
// Compute and report percentiles.
if len(durations) == 0 {
return
}
sort.Slice(durations, func(i, j int) bool { return durations[i] < durations[j] })
p50 := durations[len(durations)*50/100]
p95 := durations[len(durations)*95/100]
p99 := durations[len(durations)*99/100]
b.ReportMetric(float64(p50.Microseconds()), "p50_µs")
b.ReportMetric(float64(p95.Microseconds()), "p95_µs")
b.ReportMetric(float64(p99.Microseconds()), "p99_µs")
}
// ── BenchmarkBAPCaller_CacheWarm / CacheCold ─────────────────────────────────
// Compares latency when the Redis cache holds a pre-warmed key set (CacheWarm)
// vs. when each iteration has a fresh message_id that the cache has never seen
// (CacheCold). The delta reveals the key-lookup overhead on a cold path.
// BenchmarkBAPCaller_CacheWarm sends a fixed body (constant message_id) so the
// simplekeymanager's Redis cache is hit on every iteration after the first.
func BenchmarkBAPCaller_CacheWarm(b *testing.B) {
body := warmFixtureBody(b, "discover")
// Warm-up: send once to populate the cache before the timer starts.
warmReq := buildSignedRequestFixed(b, "discover", body)
if err := sendRequest(warmReq); err != nil {
b.Fatalf("cache warm-up request failed: %v", err)
}
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
req := buildSignedRequestFixed(b, "discover", body)
if err := sendRequest(req); err != nil {
b.Errorf("CacheWarm iteration %d: %v", i, err)
}
}
}
// BenchmarkBAPCaller_CacheCold uses a fresh message_id per iteration, so every
// request experiences a cache miss and a full key-derivation round-trip.
func BenchmarkBAPCaller_CacheCold(b *testing.B) {
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
req := buildSignedRequest(b, "discover") // fresh IDs each time
if err := sendRequest(req); err != nil {
b.Errorf("CacheCold iteration %d: %v", i, err)
}
}
}
// ── BenchmarkBAPCaller_RPS ────────────────────────────────────────────────────
// Reports requests-per-second as a custom metric alongside the default ns/op.
// Run with -benchtime=30s for a stable RPS reading.
func BenchmarkBAPCaller_RPS(b *testing.B) {
b.ReportAllocs()
start := time.Now()
b.ResetTimer()
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
req := buildSignedRequest(b, "discover")
if err := sendRequest(req); err != nil {
b.Errorf("RPS: %v", err)
}
}
})
elapsed := time.Since(start).Seconds()
if elapsed > 0 {
// Every iteration issues exactly one request, so b.N/elapsed is the
// achieved request rate. Summing per-goroutine counters with a plain
// `count += local` inside RunParallel would be a data race: the body
// closure runs on multiple goroutines concurrently.
b.ReportMetric(float64(b.N)/elapsed, "req/s")
}
}
// ── helper: one-shot HTTP client ─────────────────────────────────────────────
// benchHTTPClient is a shared client for all benchmark goroutines.
// MaxConnsPerHost caps the total active connections to localhost so we don't
// exhaust the OS ephemeral port range. MaxIdleConnsPerHost keeps that many
// connections warm in the pool so parallel goroutines reuse them rather than
// opening fresh TCP connections on every request.
var benchHTTPClient = &http.Client{
Transport: &http.Transport{
MaxIdleConns: 200,
MaxIdleConnsPerHost: 200,
MaxConnsPerHost: 200,
IdleConnTimeout: 90 * time.Second,
DisableCompression: true, // no benefit compressing localhost traffic
},
}


@@ -0,0 +1,13 @@
package e2e_bench_test
// Development key pair from config/local-retail-bap.yaml.
// Used across the retail devkit for non-production testing.
// DO NOT use in any production or staging environment.
const (
benchSubscriberID = "sandbox.food-finder.com"
benchKeyID = "76EU7VwahYv4XztXJzji9ssiSV74eWXWBcCKGn7jAdm5VGLCdYAJ8j"
benchPrivKey = "rrNtVgyASCGlo+ebsJaA37D5CZYZVfT0JA5/vlkTeV0="
benchPubKey = "oFIk7KqCqvqRYkLMjQqiaKM5oOozkYT64bfLuc8p/SU="
benchEncrPrivKey = "rrNtVgyASCGlo+ebsJaA37D5CZYZVfT0JA5/vlkTeV0="
benchEncrPubKey = "oFIk7KqCqvqRYkLMjQqiaKM5oOozkYT64bfLuc8p/SU="
)


@@ -0,0 +1,63 @@
package e2e_bench_test
import (
"encoding/json"
"fmt"
"net/http"
"net/http/httptest"
"strings"
"time"
)
// startMockBPP starts an httptest server that accepts any POST request and
// immediately returns a valid Beckn ACK. This replaces the real BPP backend,
// isolating benchmark results to adapter-internal latency only.
func startMockBPP() *httptest.Server {
ackBody := `{"message":{"ack":{"status":"ACK"}}}`
return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusOK)
fmt.Fprint(w, ackBody)
}))
}
// subscriberRecord mirrors the registry API response shape for a single subscriber.
type subscriberRecord struct {
SubscriberID string `json:"subscriber_id"`
UniqueKeyID string `json:"unique_key_id"`
SigningPublicKey string `json:"signing_public_key"`
ValidFrom string `json:"valid_from"`
ValidUntil string `json:"valid_until"`
Status string `json:"status"`
}
// startMockRegistry starts an httptest server that returns a subscriber record
// matching the benchmark test keys. The signvalidator plugin uses this to
// resolve the public key for signature verification on incoming requests.
func startMockRegistry() *httptest.Server {
record := subscriberRecord{
SubscriberID: benchSubscriberID,
UniqueKeyID: benchKeyID,
SigningPublicKey: benchPubKey,
ValidFrom: time.Now().AddDate(-1, 0, 0).Format(time.RFC3339),
ValidUntil: time.Now().AddDate(10, 0, 0).Format(time.RFC3339),
Status: "SUBSCRIBED",
}
body, _ := json.Marshal([]subscriberRecord{record})
return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// Support both GET (lookup) and POST (lookup with body) registry calls.
// Respond with the subscriber record regardless of subscriber_id query param.
subscriberID := r.URL.Query().Get("subscriber_id")
if subscriberID == "" {
// Fall back to the last path segment for dedi-registry style calls.
// strings.Split always returns at least one element.
parts := strings.Split(strings.TrimPrefix(r.URL.Path, "/"), "/")
subscriberID = parts[len(parts)-1]
}
_ = subscriberID // the mock answers identically for every subscriber
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusOK)
w.Write(body)
}))
}


@@ -0,0 +1,466 @@
package e2e_bench_test
import (
"bytes"
"context"
"crypto/ed25519"
"encoding/base64"
"encoding/json"
"fmt"
"io"
"net/http"
"net/http/httptest"
"os"
"os/exec"
"path/filepath"
"strings"
"testing"
"time"
"github.com/alicebob/miniredis/v2"
"github.com/beckn-one/beckn-onix/core/module"
"github.com/beckn-one/beckn-onix/core/module/handler"
"github.com/beckn-one/beckn-onix/pkg/model"
"github.com/beckn-one/beckn-onix/pkg/plugin"
"github.com/google/uuid"
"github.com/rs/zerolog"
"golang.org/x/crypto/blake2b"
)
// Package-level references shared across all benchmarks.
var (
adapterServer *httptest.Server
miniRedis *miniredis.Miniredis
mockBPP *httptest.Server
mockRegistry *httptest.Server
pluginDir string
moduleRoot string // set in TestMain; used by buildBAPCallerConfig for local file paths
)
// Plugins to compile for the benchmark. Each entry is (pluginID, source path relative to module root).
var pluginsToBuild = []struct {
id string
src string
}{
{"router", "pkg/plugin/implementation/router/cmd/plugin.go"},
{"signer", "pkg/plugin/implementation/signer/cmd/plugin.go"},
{"signvalidator", "pkg/plugin/implementation/signvalidator/cmd/plugin.go"},
{"simplekeymanager", "pkg/plugin/implementation/simplekeymanager/cmd/plugin.go"},
{"cache", "pkg/plugin/implementation/cache/cmd/plugin.go"},
{"schemav2validator", "pkg/plugin/implementation/schemav2validator/cmd/plugin.go"},
{"otelsetup", "pkg/plugin/implementation/otelsetup/cmd/plugin.go"},
// registry is required by stdHandler to wire KeyManager, even on the caller
// path where sign-validation never runs.
{"registry", "pkg/plugin/implementation/registry/cmd/plugin.go"},
}
// TestMain is the entry point for the benchmark package. It:
// 1. Compiles all required .so plugins into a temp directory
// 2. Starts miniredis (in-process Redis)
// 3. Starts mock BPP and registry HTTP servers
// 4. Starts the adapter as an httptest.Server
// 5. Runs all benchmarks
// 6. Tears everything down in reverse order
func TestMain(m *testing.M) {
ctx := context.Background()
// ── Step 1: Compile plugins ───────────────────────────────────────────────
var err error
pluginDir, err = os.MkdirTemp("", "beckn-bench-plugins-*")
if err != nil {
fmt.Fprintf(os.Stderr, "ERROR: failed to create plugin temp dir: %v\n", err)
os.Exit(1)
}
defer os.RemoveAll(pluginDir)
moduleRoot, err = findModuleRoot()
if err != nil {
fmt.Fprintf(os.Stderr, "ERROR: failed to locate module root: %v\n", err)
os.Exit(1)
}
fmt.Printf("=== Building plugins (first run may take 60-90s) ===\n")
for _, p := range pluginsToBuild {
outPath := filepath.Join(pluginDir, p.id+".so")
srcPath := filepath.Join(moduleRoot, p.src)
fmt.Printf(" compiling %s.so ...\n", p.id)
cmd := exec.Command("go", "build", "-buildmode=plugin", "-o", outPath, srcPath)
cmd.Dir = moduleRoot
if out, buildErr := cmd.CombinedOutput(); buildErr != nil {
fmt.Fprintf(os.Stderr, "ERROR: failed to build plugin %s:\n%s\n", p.id, string(out))
os.Exit(1)
}
}
fmt.Printf("=== All plugins compiled successfully ===\n\n")
// ── Step 2: Start miniredis ───────────────────────────────────────────────
miniRedis, err = miniredis.Run()
if err != nil {
fmt.Fprintf(os.Stderr, "ERROR: failed to start miniredis: %v\n", err)
os.Exit(1)
}
defer miniRedis.Close()
// ── Step 3: Start mock servers ────────────────────────────────────────────
mockBPP = startMockBPP()
defer mockBPP.Close()
mockRegistry = startMockRegistry()
defer mockRegistry.Close()
// ── Step 4: Start adapter ─────────────────────────────────────────────────
adapterServer, err = startAdapter(ctx)
if err != nil {
fmt.Fprintf(os.Stderr, "ERROR: failed to start adapter: %v\n", err)
os.Exit(1)
}
defer adapterServer.Close()
// ── Step 5: Run benchmarks ────────────────────────────────────────────────
// Silence the adapter's zerolog output for the duration of the benchmark
// run. Without this, every HTTP request the adapter processes emits a JSON
// log line to stdout, which interleaves with Go's benchmark result lines
// (BenchmarkFoo-N\t\t<count>\t<ns/op>) and makes benchstat unparseable.
// Setup logging above still ran normally; zerolog.Disabled is set only here,
// just before m.Run(), so errors during startup remain visible.
zerolog.SetGlobalLevel(zerolog.Disabled)
os.Exit(m.Run())
}
// findModuleRoot walks up from the current directory to find the go.mod root.
func findModuleRoot() (string, error) {
dir, err := os.Getwd()
if err != nil {
return "", err
}
for {
if _, err := os.Stat(filepath.Join(dir, "go.mod")); err == nil {
return dir, nil
}
parent := filepath.Dir(dir)
if parent == dir {
return "", fmt.Errorf("go.mod not found from %s", dir)
}
dir = parent
}
}
// writeRoutingConfig reads the benchmark routing config template, replaces the
// BENCH_BPP_URL placeholder with the live mock BPP server URL, and writes the
// result to a temp file. Returns the path to the temp file.
func writeRoutingConfig(bppURL string) (string, error) {
templatePath := filepath.Join("testdata", "routing-BAPCaller.yaml")
data, err := os.ReadFile(templatePath)
if err != nil {
return "", fmt.Errorf("reading routing config template: %w", err)
}
content := strings.ReplaceAll(string(data), "BENCH_BPP_URL", bppURL)
f, err := os.CreateTemp("", "bench-routing-*.yaml")
if err != nil {
return "", fmt.Errorf("creating temp routing config: %w", err)
}
if _, err := f.WriteString(content); err != nil {
f.Close()
return "", fmt.Errorf("writing routing config: %w", err)
}
f.Close()
return f.Name(), nil
}
// startAdapter constructs a fully wired adapter using the compiled plugins and
// returns it as an *httptest.Server. All external dependencies are replaced with
// local mock servers: Redis → miniredis, BPP → mockBPP, registry → mockRegistry.
func startAdapter(ctx context.Context) (*httptest.Server, error) {
routingConfigPath, err := writeRoutingConfig(mockBPP.URL)
if err != nil {
return nil, fmt.Errorf("writing routing config: %w", err)
}
// Plugin manager: load all compiled .so files from pluginDir.
mgr, closer, err := plugin.NewManager(ctx, &plugin.ManagerConfig{
Root: pluginDir,
})
if err != nil {
return nil, fmt.Errorf("creating plugin manager: %w", err)
}
_ = closer // plugin closer intentionally not invoked: the process exits right after the benchmark run, so explicit cleanup is unnecessary here
// Build module configurations.
mCfgs := []module.Config{
buildBAPCallerConfig(routingConfigPath, mockRegistry.URL),
}
mux := http.NewServeMux()
if err := module.Register(ctx, mCfgs, mux, mgr); err != nil {
return nil, fmt.Errorf("registering modules: %w", err)
}
srv := httptest.NewServer(mux)
return srv, nil
}
// buildBAPCallerConfig returns the module.Config for the bapTxnCaller handler,
// mirroring config/local-retail-bap.yaml but pointing at benchmark mock services.
// registryURL must point at the mock registry so simplekeymanager can satisfy the
// Registry requirement imposed by stdHandler — even though the caller path never
// performs signature validation, the handler wiring requires it to be present.
func buildBAPCallerConfig(routingConfigPath, registryURL string) module.Config {
return module.Config{
Name: "bapTxnCaller",
Path: "/bap/caller/",
Handler: handler.Config{
Type: handler.HandlerTypeStd,
Role: model.RoleBAP,
SubscriberID: benchSubscriberID,
HttpClientConfig: handler.HttpClientConfig{
MaxIdleConns: 1000,
MaxIdleConnsPerHost: 200,
IdleConnTimeout: 300 * time.Second,
ResponseHeaderTimeout: 5 * time.Second,
},
Plugins: handler.PluginCfg{
// Registry is required by stdHandler before it will wire KeyManager,
// even on the caller path where sign-validation never runs. We point
// it at the mock registry (retry_max=0 so failures are immediate).
Registry: &plugin.Config{
ID: "registry",
Config: map[string]string{
"url": registryURL,
"retry_max": "0",
},
},
KeyManager: &plugin.Config{
ID: "simplekeymanager",
Config: map[string]string{
"networkParticipant": benchSubscriberID,
"keyId": benchKeyID,
"signingPrivateKey": benchPrivKey,
"signingPublicKey": benchPubKey,
"encrPrivateKey": benchEncrPrivKey,
"encrPublicKey": benchEncrPubKey,
},
},
SchemaValidator: &plugin.Config{
ID: "schemav2validator",
Config: map[string]string{
"type": "file",
"location": filepath.Join(moduleRoot, "benchmarks/e2e/testdata/beckn.yaml"),
"cacheTTL": "3600",
},
},
Cache: &plugin.Config{
ID: "cache",
Config: map[string]string{
"addr": miniRedis.Addr(),
},
},
Router: &plugin.Config{
ID: "router",
Config: map[string]string{
"routingConfig": routingConfigPath,
},
},
Signer: &plugin.Config{
ID: "signer",
},
},
Steps: []string{"addRoute", "sign", "validateSchema"},
},
}
}
// ── Request builder and Beckn signing helper ─────────────────────────────────
// fixtureCache caches the raw fixture JSON (with sentinels intact) per action.
// It is populated before any parallel benchmark runs, so later access is read-only.
var fixtureCache = map[string][]byte{}
// loadFixture reads a fixture file from testdata/ and caches it.
func loadFixture(action string) ([]byte, error) {
if data, ok := fixtureCache[action]; ok {
return data, nil
}
path := filepath.Join("testdata", action+"_request.json")
data, err := os.ReadFile(path)
if err != nil {
return nil, fmt.Errorf("loading fixture %s: %w", action, err)
}
fixtureCache[action] = data
return data, nil
}
// buildSignedRequest reads the fixture for the given action, substitutes
// BENCH_TIMESTAMP / BENCH_MESSAGE_ID / BENCH_TRANSACTION_ID with fresh values,
// signs the body using the Beckn Ed25519 spec, and returns a ready-to-send
// *http.Request targeting the adapter's /bap/caller/<action> path.
func buildSignedRequest(tb testing.TB, action string) *http.Request {
tb.Helper()
fixture, err := loadFixture(action)
if err != nil {
tb.Fatalf("buildSignedRequest: %v", err)
}
// Substitute sentinels with fresh values for this iteration.
now := time.Now().UTC().Format(time.RFC3339)
msgID := uuid.New().String()
txnID := uuid.New().String()
body := bytes.ReplaceAll(fixture, []byte("BENCH_TIMESTAMP"), []byte(now))
body = bytes.ReplaceAll(body, []byte("BENCH_MESSAGE_ID"), []byte(msgID))
body = bytes.ReplaceAll(body, []byte("BENCH_TRANSACTION_ID"), []byte(txnID))
// Sign the body per the Beckn Ed25519 spec.
authHeader, err := signBecknPayload(body)
if err != nil {
tb.Fatalf("buildSignedRequest: signing failed: %v", err)
}
url := adapterServer.URL + "/bap/caller/" + action
req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
if err != nil {
tb.Fatalf("buildSignedRequest: http.NewRequest: %v", err)
}
req.Header.Set("Content-Type", "application/json")
req.Header.Set(model.AuthHeaderSubscriber, authHeader)
return req
}
// buildSignedRequestFixed builds a signed request with a fixed body (same
// message_id every call) — used for cache-warm benchmarks.
func buildSignedRequestFixed(tb testing.TB, action string, body []byte) *http.Request {
tb.Helper()
authHeader, err := signBecknPayload(body)
if err != nil {
tb.Fatalf("buildSignedRequestFixed: signing failed: %v", err)
}
url := adapterServer.URL + "/bap/caller/" + action
req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
if err != nil {
tb.Fatalf("buildSignedRequestFixed: http.NewRequest: %v", err)
}
req.Header.Set("Content-Type", "application/json")
req.Header.Set(model.AuthHeaderSubscriber, authHeader)
return req
}
// signBecknPayload signs a request body using the Beckn Ed25519 signing spec
// and returns a formatted Authorization header value.
//
// Beckn signing spec:
// 1. Digest: "BLAKE-512=" + base64(blake2b-512(body))
// 2. Signing string: "(created): <ts>\n(expires): <ts+5m>\ndigest: <digest>"
// 3. Signature: base64(ed25519.Sign(privKey, signingString))
// 4. Header: Signature keyId="<sub>|<keyId>|ed25519",algorithm="ed25519",
// created="<ts>",expires="<ts+5m>",headers="(created) (expires) digest",
// signature="<sig>"
//
// Reference: pkg/plugin/implementation/signer/signer.go
func signBecknPayload(body []byte) (string, error) {
createdAt := time.Now().Unix()
expiresAt := time.Now().Add(5 * time.Minute).Unix()
// Step 1: BLAKE-512 digest.
hasher, _ := blake2b.New512(nil)
hasher.Write(body)
digest := "BLAKE-512=" + base64.StdEncoding.EncodeToString(hasher.Sum(nil))
// Step 2: Signing string.
signingString := fmt.Sprintf("(created): %d\n(expires): %d\ndigest: %s", createdAt, expiresAt, digest)
// Step 3: Ed25519 signature.
privKeyBytes, err := base64.StdEncoding.DecodeString(benchPrivKey)
if err != nil {
return "", fmt.Errorf("decoding private key: %w", err)
}
privKey := ed25519.NewKeyFromSeed(privKeyBytes)
sig := base64.StdEncoding.EncodeToString(ed25519.Sign(privKey, []byte(signingString)))
// Step 4: Format Authorization header (matches generateAuthHeader in step.go).
header := fmt.Sprintf(
`Signature keyId="%s|%s|ed25519",algorithm="ed25519",created="%d",expires="%d",headers="(created) (expires) digest",signature="%s"`,
benchSubscriberID, benchKeyID, createdAt, expiresAt, sig,
)
return header, nil
}
// warmFixtureBody returns a fixed body for the given action with stable IDs —
// used to pre-warm the cache so cache-warm benchmarks hit the Redis fast path.
func warmFixtureBody(tb testing.TB, action string) []byte {
tb.Helper()
fixture, err := loadFixture(action)
if err != nil {
tb.Fatalf("warmFixtureBody: %v", err)
}
body := bytes.ReplaceAll(fixture, []byte("BENCH_TIMESTAMP"), []byte("2025-01-01T00:00:00Z"))
body = bytes.ReplaceAll(body, []byte("BENCH_MESSAGE_ID"), []byte("00000000-warm-0000-0000-000000000000"))
body = bytes.ReplaceAll(body, []byte("BENCH_TRANSACTION_ID"), []byte("00000000-warm-txn-0000-000000000000"))
return body
}
// sendRequest executes an HTTP request using the shared bench client and
// discards the response body. Returns a non-nil error for non-2xx responses.
func sendRequest(req *http.Request) error {
resp, err := benchHTTPClient.Do(req)
if err != nil {
return fmt.Errorf("http do: %w", err)
}
defer resp.Body.Close()
// Drain the body so the connection is returned to the pool for reuse.
// Without this, Go discards the connection after each request, causing
// port exhaustion under parallel load ("can't assign requested address").
_, _ = io.Copy(io.Discard, resp.Body)
// We accept any 2xx response (ACK or forwarded BPP response).
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
return fmt.Errorf("unexpected status: %d", resp.StatusCode)
}
return nil
}
// ── TestSignBecknPayload: validation test before running benchmarks ───────────
// Sends a signed discover request to the live adapter and asserts a 200 response,
// confirming the signing helper produces headers accepted by the adapter pipeline.
func TestSignBecknPayload(t *testing.T) {
if adapterServer == nil {
t.Skip("adapterServer not initialised (run via TestMain)")
}
fixture, err := loadFixture("discover")
if err != nil {
t.Fatalf("loading fixture: %v", err)
}
// Substitute sentinels.
now := time.Now().UTC().Format(time.RFC3339)
body := bytes.ReplaceAll(fixture, []byte("BENCH_TIMESTAMP"), []byte(now))
body = bytes.ReplaceAll(body, []byte("BENCH_MESSAGE_ID"), []byte(uuid.New().String()))
body = bytes.ReplaceAll(body, []byte("BENCH_TRANSACTION_ID"), []byte(uuid.New().String()))
authHeader, err := signBecknPayload(body)
if err != nil {
t.Fatalf("signBecknPayload: %v", err)
}
url := adapterServer.URL + "/bap/caller/discover"
req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
if err != nil {
t.Fatalf("http.NewRequest: %v", err)
}
req.Header.Set("Content-Type", "application/json")
req.Header.Set(model.AuthHeaderSubscriber, authHeader)
resp, err := http.DefaultClient.Do(req)
if err != nil {
t.Fatalf("sending request: %v", err)
}
defer resp.Body.Close()
var result map[string]interface{}
_ = json.NewDecoder(resp.Body).Decode(&result) // decode failure is fine; the status check below is authoritative
t.Logf("Response status: %d, body: %v", resp.StatusCode, result)
if resp.StatusCode != http.StatusOK {
t.Errorf("expected 200 OK, got %d", resp.StatusCode)
}
}

3380
benchmarks/e2e/testdata/beckn.yaml vendored Normal file

File diff suppressed because it is too large

View File

@@ -0,0 +1,84 @@
{
"context": {
"action": "confirm",
"bapId": "sandbox.food-finder.com",
"bapUri": "http://bench-bap.example.com",
"bppId": "bench-bpp.example.com",
"bppUri": "BENCH_BPP_URL",
"messageId": "BENCH_MESSAGE_ID",
"transactionId": "BENCH_TRANSACTION_ID",
"timestamp": "BENCH_TIMESTAMP",
"ttl": "PT30S",
"version": "2.0.0"
},
"message": {
"order": {
"provider": {
"id": "bench-provider-001"
},
"items": [
{
"id": "bench-item-001",
"quantity": {
"selected": {
"count": 1
}
}
}
],
"billing": {
"name": "Bench User",
"address": "123 Bench Street, Bangalore, 560001",
"city": {
"name": "Bangalore"
},
"state": {
"name": "Karnataka"
},
"country": {
"code": "IND"
},
"area_code": "560001",
"email": "bench@example.com",
"phone": "9999999999"
},
"fulfillments": [
{
"id": "f1",
"type": "Delivery",
"stops": [
{
"type": "end",
"location": {
"gps": "12.9716,77.5946",
"area_code": "560001"
},
"contact": {
"phone": "9999999999",
"email": "bench@example.com"
}
}
],
"customer": {
"person": {
"name": "Bench User"
},
"contact": {
"phone": "9999999999",
"email": "bench@example.com"
}
}
}
],
"payments": [
{
"type": "ON-FULFILLMENT",
"params": {
"amount": "150.00",
"currency": "INR"
}
}
]
}
}
}


@@ -0,0 +1,17 @@
{
"context": {
"action": "discover",
"bapId": "sandbox.food-finder.com",
"bapUri": "http://bench-bap.example.com",
"messageId": "BENCH_MESSAGE_ID",
"transactionId": "BENCH_TRANSACTION_ID",
"timestamp": "BENCH_TIMESTAMP",
"ttl": "PT30S",
"version": "2.0.0"
},
"message": {
"intent": {
"textSearch": "pizza"
}
}
}


@@ -0,0 +1,80 @@
{
"context": {
"action": "init",
"bapId": "sandbox.food-finder.com",
"bapUri": "http://bench-bap.example.com",
"bppId": "bench-bpp.example.com",
"bppUri": "BENCH_BPP_URL",
"messageId": "BENCH_MESSAGE_ID",
"transactionId": "BENCH_TRANSACTION_ID",
"timestamp": "BENCH_TIMESTAMP",
"ttl": "PT30S",
"version": "2.0.0"
},
"message": {
"order": {
"provider": {
"id": "bench-provider-001"
},
"items": [
{
"id": "bench-item-001",
"quantity": {
"selected": {
"count": 1
}
}
}
],
"billing": {
"name": "Bench User",
"address": "123 Bench Street, Bangalore, 560001",
"city": {
"name": "Bangalore"
},
"state": {
"name": "Karnataka"
},
"country": {
"code": "IND"
},
"area_code": "560001",
"email": "bench@example.com",
"phone": "9999999999"
},
"fulfillments": [
{
"id": "f1",
"type": "Delivery",
"stops": [
{
"type": "end",
"location": {
"gps": "12.9716,77.5946",
"area_code": "560001"
},
"contact": {
"phone": "9999999999",
"email": "bench@example.com"
}
}
],
"customer": {
"person": {
"name": "Bench User"
},
"contact": {
"phone": "9999999999",
"email": "bench@example.com"
}
}
}
],
"payments": [
{
"type": "ON-FULFILLMENT"
}
]
}
}
}


@@ -0,0 +1,13 @@
# Routing config for v2.0.0 benchmark. Domain is not required for v2.x.x — the
# router ignores it and routes purely by version + endpoint.
# BENCH_BPP_URL is substituted at runtime with the mock BPP server URL.
routingRules:
- version: "2.0.0"
targetType: "url"
target:
url: "BENCH_BPP_URL"
endpoints:
- discover
- select
- init
- confirm


@@ -0,0 +1,55 @@
{
"context": {
"action": "select",
"bapId": "sandbox.food-finder.com",
"bapUri": "http://bench-bap.example.com",
"bppId": "bench-bpp.example.com",
"bppUri": "BENCH_BPP_URL",
"messageId": "BENCH_MESSAGE_ID",
"transactionId": "BENCH_TRANSACTION_ID",
"timestamp": "BENCH_TIMESTAMP",
"ttl": "PT30S",
"version": "2.0.0"
},
"message": {
"order": {
"provider": {
"id": "bench-provider-001"
},
"items": [
{
"id": "bench-item-001",
"quantity": {
"selected": {
"count": 1
}
}
}
],
"fulfillments": [
{
"id": "f1",
"type": "Delivery",
"stops": [
{
"type": "end",
"location": {
"gps": "12.9716,77.5946",
"area_code": "560001"
},
"contact": {
"phone": "9999999999",
"email": "bench@example.com"
}
}
]
}
],
"payments": [
{
"type": "ON-FULFILLMENT"
}
]
}
}
}


@@ -0,0 +1,255 @@
# beckn-onix Adapter — Benchmark Report
> **Run:** `2026-03-31_14-19-19`
> **Platform:** Apple M5 · darwin/arm64 · GOMAXPROCS=10 (default)
> **Protocol:** Beckn v2.0.0
---
## Part A — Executive Summary
### What Was Tested
The beckn-onix ONIX adapter was benchmarked end-to-end using Go's native `testing.B` framework and `net/http/httptest`. Requests flowed through a real compiled adapter — with all production plugins active — against in-process mock servers, isolating adapter-internal latency from network variables.
**Pipeline tested (bapTxnCaller):** `addRoute → sign → validateSchema`
**Plugins active:** `router`, `signer`, `simplekeymanager`, `cache` (miniredis), `schemav2validator`
**Actions benchmarked:** `discover`, `select`, `init`, `confirm`
---
### Key Results
| Metric | Value |
|--------|-------|
| Serial p50 latency (discover) | **130 µs** |
| Serial p95 latency (discover) | **144 µs** |
| Serial p99 latency (discover) | **317 µs** |
| Serial mean latency (discover) | **164 µs** |
| Serial throughput (discover, GOMAXPROCS=10) | **~6,095 req/s** |
| Peak parallel throughput (GOMAXPROCS=10) | **25,502 req/s** |
| Cache warm vs cold delta | **≈ 0** (noise-level, ~3.7 µs) |
| Memory per request (discover) | **~79 KB · 662 allocs** |
### Interpretation
The adapter delivers a 130 µs median (discover) and 164–221 µs mean end-to-end latency across the four Beckn actions on a single goroutine. The p99 tail of 317 µs shows good tail-latency control — the ratio of p99/p50 is only 2.4×, indicating no significant outlier spikes.
Memory allocation is consistent and predictable: discover uses 662 heap objects at ~81 KB per request. More complex actions (confirm, init) use proportionally more memory due to larger payloads but remain below 130 KB per request.
The Redis key-manager cache shows **no measurable benefit** in this setup: warm and cold paths differ by ~3.7 µs (< 2%), which is within measurement noise for a 164 µs mean. This is expected — miniredis is in-process and sub-microsecond; the signing and schema-validation steps dominate.
Concurrency scaling is excellent: latency drops from 157 µs at GOMAXPROCS=1 to 54 µs at GOMAXPROCS=16 — a **2.9× improvement**. Throughput scales from 6,499 req/s at GOMAXPROCS=1 to 17,455 req/s at GOMAXPROCS=16.
### Recommendation
The adapter is ready for staged load testing against a real BPP. For production sizing, allocate at least 4 cores to the adapter process; beyond 8 cores, gains begin to taper (diminishing returns from ~17,233 to 17,455 req/s going from 8 to 16). If schema validation dominates CPU, profile with `go tool pprof` (see B5).
---
## Part B — Technical Detail
### B0 — Test Environment
| Parameter | Value |
|-----------|-------|
| CPU | Apple M5 (arm64) |
| OS | darwin/arm64 |
| Go package | `github.com/beckn-one/beckn-onix/benchmarks/e2e` |
| Default GOMAXPROCS | 10 |
| Benchmark timeout | 30 minutes |
| Serial run duration | 10s per benchmark × 3 runs |
| Parallel sweep duration | 30s per GOMAXPROCS level |
| GOMAXPROCS sweep | 1, 2, 4, 8, 16 |
| Redis | miniredis (in-process, no network) |
| BPP | httptest mock (instant ACK) |
| Registry | httptest mock (dev key pair) |
| Schema spec | Beckn v2.0.0 OpenAPI (`beckn.yaml`, local file) |
**Plugins and steps (bapTxnCaller):**
| Step | Plugin | Role |
|------|--------|------|
| 1 | `router` | Resolves BPP URL from routing config |
| 2 | `signer` + `simplekeymanager` | Signs request body (Ed25519/BLAKE-512) |
| 3 | `schemav2validator` | Validates Beckn v2.0 API schema (kin-openapi, local file) |
---
### B1 — Latency by Action
Averages from `run1.txt` (10s, GOMAXPROCS=10). Percentile values from the standalone `BenchmarkBAPCaller_Discover_Percentiles` run.
| Action | Mean (µs) | p50 (µs) | p95 (µs) | p99 (µs) | Allocs/req | Bytes/req |
|--------|----------:|--------:|--------:|--------:|----------:|----------:|
| discover (serial) | 164 | 130 | 144 | 317 | 662 | 80,913 (~79 KB) |
| discover (parallel) | 40 | — | — | — | 660 | 80,792 (~79 KB) |
| select | 194 | — | — | — | 1,033 | 106,857 (~104 KB) |
| init | 217 | — | — | — | 1,421 | 126,842 (~124 KB) |
| confirm | 221 | — | — | — | 1,485 | 129,240 (~126 KB) |
**Observations:**
- Latency increases linearly with payload complexity: select (+18%), init (+32%), confirm (+35%) vs discover baseline.
- Allocation count tracks payload size precisely — each extra field adds heap objects during JSON unmarshalling and schema validation.
- Memory is extremely stable across the 3 serial runs (geomean memory: 91.18 Ki, ±0.02%).
- The parallel discover benchmark reports ~4× lower per-iteration time than serial (40 µs vs 164 µs) because multiple goroutines share the CPU time budget and the adapter handles requests concurrently.
---
### B2 — Throughput vs Concurrency
Results from the concurrency sweep (`parallel_cpu*.txt`, 30s per level).
| GOMAXPROCS | Mean Latency (µs) | Improvement vs cpu=1 | RPS (BenchmarkRPS) |
|:----------:|------------------:|---------------------:|-------------------:|
| 1 | 157 | baseline | 6,499 |
| 2 | 118 | 1.33× | 7,606 |
| 4 | 73 | 2.14× | 14,356 |
| 8 | 62 | 2.53× | 17,233 |
| 16 | 54 | 2.89× | 17,455 |
| 10 (default) | 40\* | ~3.9×\* | 25,502\* |
\* _The default GOMAXPROCS=10 serial run has a different benchmark structure (not the concurrency sweep), so latency and RPS are not directly comparable — they include warm connection pool effects from the serial baseline._
**Scaling efficiency:**
- Doubling cores from 1→2 yields 1.33× latency improvement (67% efficiency).
- From 2→4: 1.61× improvement (80% efficiency) — best scaling band.
- From 4→8: 1.18× improvement (59% efficiency) — adapter starts becoming compute-bound.
- From 8→16: 1.14× improvement (57% efficiency) — diminishing returns; likely the signing/validation pipeline serialises on some shared resource (e.g., key derivation, kin-openapi schema tree reads).
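The efficiency figures above are just the latency improvement divided by the core-count ratio. A small sketch reproducing the arithmetic, with latencies hard-coded from the sweep table (printed values may differ from the bullets in the last rounding digit):

```go
package main

import "fmt"

// efficiency returns the latency speedup and parallel efficiency (%) when
// core count goes from c1 to c2 with mean latencies l1, l2 (in µs).
func efficiency(c1, c2 int, l1, l2 float64) (speedup, eff float64) {
	speedup = l1 / l2
	eff = speedup / (float64(c2) / float64(c1)) * 100
	return
}

func main() {
	// Mean latencies (µs) per GOMAXPROCS level, from the sweep table above.
	lat := map[int]float64{1: 157, 2: 118, 4: 73, 8: 62, 16: 54}
	for _, s := range [][2]int{{1, 2}, {2, 4}, {4, 8}, {8, 16}} {
		sp, ef := efficiency(s[0], s[1], lat[s[0]], lat[s[1]])
		fmt.Printf("%2d→%2d: %.2f× improvement, %.0f%% efficiency\n",
			s[0], s[1], sp, ef)
	}
}
```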
**Recommendation:** 4–8 cores offer the best throughput/cost ratio.
---
### B3 — Cache Impact (Redis warm vs cold)
Results from `cache_comparison.txt` (10s each, GOMAXPROCS=10).
| Scenario | Mean (µs) | Allocs/req | Bytes/req |
|----------|----------:|-----------:|----------:|
| CacheWarm | 190 | 654 | 81,510 |
| CacheCold | 186 | 662 | 82,923 |
| **Delta** | **+3.7 µs (warm slower)** | **8** | **1,413** |
**Interpretation:** There is no meaningful difference between warm and cold cache paths. The apparent 3.7 µs "advantage" for the cold path is within normal measurement noise for a 186–190 µs benchmark. The Redis key-manager cache does not dominate latency in this in-process test setup.
The warm path allocates 8 fewer objects per request (654 vs 662 allocs) — consistent with cache hits skipping key-derivation allocation paths — but this saving is too small to affect wall-clock time at current throughput levels.
In a **production environment** with real Redis over the network (1–5 ms round-trip), the cache warm path would show a meaningful advantage. These numbers represent the lower bound on signing latency with zero-latency Redis.
---
### B4 — benchstat Statistical Summary (3 Runs)
```
goos: darwin
goarch: arm64
pkg: github.com/beckn-one/beckn-onix/benchmarks/e2e
cpu: Apple M5
│ run1.txt │ run2.txt │ run3.txt │
│ sec/op │ sec/op vs base │ sec/op vs base │
BAPCaller_Discover-10 164.2µ ± ∞ ¹ 165.4µ ± ∞ ¹ ~ (p=1.000 n=1) ² 165.3µ ± ∞ ¹ ~ (p=1.000 n=1) ²
BAPCaller_Discover_Parallel-10 39.73µ ± ∞ ¹ 41.48µ ± ∞ ¹ ~ (p=1.000 n=1) ² 52.84µ ± ∞ ¹ ~ (p=1.000 n=1) ²
BAPCaller_AllActions/discover-10 165.4µ ± ∞ ¹ 164.9µ ± ∞ ¹ ~ (p=1.000 n=1) ² 163.1µ ± ∞ ¹ ~ (p=1.000 n=1) ²
BAPCaller_AllActions/select-10 194.5µ ± ∞ ¹ 194.5µ ± ∞ ¹ ~ (p=1.000 n=1) ² 186.7µ ± ∞ ¹ ~ (p=1.000 n=1) ²
BAPCaller_AllActions/init-10 217.1µ ± ∞ ¹ 216.6µ ± ∞ ¹ ~ (p=1.000 n=1) ² 218.0µ ± ∞ ¹ ~ (p=1.000 n=1) ²
BAPCaller_AllActions/confirm-10 221.0µ ± ∞ ¹ 219.8µ ± ∞ ¹ ~ (p=1.000 n=1) ² 221.9µ ± ∞ ¹ ~ (p=1.000 n=1) ²
BAPCaller_Discover_Percentiles-10 164.5µ ± ∞ ¹ 165.3µ ± ∞ ¹ ~ (p=1.000 n=1) ² 162.2µ ± ∞ ¹ ~ (p=1.000 n=1) ²
BAPCaller_CacheWarm-10 162.7µ ± ∞ ¹ 162.8µ ± ∞ ¹ ~ (p=1.000 n=1) ² 169.4µ ± ∞ ¹ ~ (p=1.000 n=1) ²
BAPCaller_CacheCold-10 164.2µ ± ∞ ¹ 205.1µ ± ∞ ¹ ~ (p=1.000 n=1) ² 171.9µ ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 152.4µ 157.0µ +3.02% 157.8µ +3.59%
Memory (B/op) — geomean: 91.18 Ki across all runs (±0.02%)
Allocs/op — geomean: 825.9 across all runs (perfectly stable across all 3 runs)
```
> **Note on confidence intervals:** benchstat requires ≥6 samples per benchmark for confidence intervals. With `-count=1` and 3 runs, results show ∞ uncertainty bands. The geomean drift of +3.59% across runs is within normal OS scheduler noise. To narrow confidence intervals, re-run with `-count=6` and `benchstat` will produce meaningful p-values.
---
### B5 — Bottleneck Analysis
Based on the allocation profile and latency data:
| Rank | Plugin / Step | Estimated contribution | Evidence |
|:----:|---------------|------------------------|---------|
| 1 | `schemav2validator` (kin-openapi validation) | 4060% | Alloc count proportional to payload complexity; JSON schema traversal creates many short-lived objects |
| 2 | `signer` (Ed25519/BLAKE-512) | 2030% | Cryptographic operations are CPU-bound; scaling efficiency plateau at 8+ cores consistent with crypto serialisation |
| 3 | `simplekeymanager` (key derivation, Redis) | 510% | 8-alloc savings on cache-warm path; small but detectable |
| 4 | `router` (YAML routing lookup) | < 5% | Minimal; in-memory map lookup |
**Key insight from the concurrency data:** RPS plateaus at ~17,000–17,500 between GOMAXPROCS=8 and 16. This suggests a shared serialisation point — most likely the kin-openapi schema validation tree (a read-heavy but non-trivially-lockable data structure), or the Ed25519 key operations.
**Profiling commands to isolate the bottleneck:**
```bash
# CPU profile — run from beckn-onix root
go test ./benchmarks/e2e/... \
-bench=BenchmarkBAPCaller_Discover \
-benchtime=30s \
-cpuprofile=benchmarks/results/cpu.prof \
-timeout=5m
go tool pprof -http=:6060 benchmarks/results/cpu.prof
# Memory profile
go test ./benchmarks/e2e/... \
-bench=BenchmarkBAPCaller_Discover \
-benchtime=30s \
-memprofile=benchmarks/results/mem.prof \
-timeout=5m
go tool pprof -http=:6060 benchmarks/results/mem.prof
# Parallel profile (find lock contention)
go test ./benchmarks/e2e/... \
-bench=BenchmarkBAPCaller_Discover_Parallel \
-benchtime=30s \
-blockprofile=benchmarks/results/block.prof \
-mutexprofile=benchmarks/results/mutex.prof \
-timeout=5m
go tool pprof -http=:6060 benchmarks/results/mutex.prof
```
---
## Running the Benchmarks
```bash
# Full run: compile plugins, run all scenarios, generate CSV and benchstat summary
cd beckn-onix
bash benchmarks/run_benchmarks.sh
# Quick smoke test (fast, lower iteration counts):
# Edit BENCH_TIME_SERIAL="2s" and BENCH_TIME_PARALLEL="5s" at the top of the script.
# Individual benchmark (manual):
go test ./benchmarks/e2e/... \
-bench=BenchmarkBAPCaller_Discover \
-benchtime=10s \
-benchmem \
-timeout=30m
# Race detector check:
go test ./benchmarks/e2e/... \
-bench=BenchmarkBAPCaller_Discover_Parallel \
-benchtime=5s \
-race \
-timeout=30m
# Concurrency sweep (manual):
for cpu in 1 2 4 8 16; do
go test ./benchmarks/e2e/... \
-bench="BenchmarkBAPCaller_Discover_Parallel|BenchmarkBAPCaller_RPS" \
-benchtime=30s -cpu=$cpu -benchmem -timeout=10m
done
```
> **Note:** The first run takes 60–90 s while plugins compile. Subsequent runs use Go's build cache and start in seconds.
---
*Generated from run `2026-03-31_14-19-19` · beckn-onix · Beckn Protocol v2.0.0*


@@ -0,0 +1,148 @@
# beckn-onix Adapter — Benchmark Report
> **Run:** `__TIMESTAMP__`
> **Platform:** __CPU__ · __GOOS__/__GOARCH__ · GOMAXPROCS=__GOMAXPROCS__ (default)
> **Adapter version:** __ONIX_VERSION__
> **Beckn Protocol:** v2.0.0
---
## Part A — Executive Summary
### What Was Tested
The beckn-onix ONIX adapter was benchmarked end-to-end using Go's native `testing.B`
framework and `net/http/httptest`. Requests flowed through a real compiled adapter —
with all production plugins active — against in-process mock servers, isolating
adapter-internal latency from network variables.
**Pipeline tested (bapTxnCaller):** `addRoute → sign → validateSchema`
**Plugins active:** `router`, `signer`, `simplekeymanager`, `cache` (miniredis), `schemav2validator`
**Actions benchmarked:** `discover`, `select`, `init`, `confirm`
### Key Results
| Metric | Value |
|--------|-------|
| Serial p50 latency (discover) | **__P50_US__ µs** |
| Serial p95 latency (discover) | **__P95_US__ µs** |
| Serial p99 latency (discover) | **__P99_US__ µs** |
| Serial mean latency (discover) | **__MEAN_DISCOVER_US__ µs** |
| Peak parallel throughput | **__PEAK_RPS__ req/s** |
| Cache warm vs cold delta | **__CACHE_DELTA__** |
| Memory per request (discover) | **~__MEM_DISCOVER_KB__ KB · __ALLOCS_DISCOVER__ allocs** |
### Interpretation
__INTERPRETATION__
### Recommendation
__RECOMMENDATION__
---
## Part B — Technical Detail
### B0 — Test Environment
| Parameter | Value |
|-----------|-------|
| CPU | __CPU__ (__GOARCH__) |
| OS | __GOOS__/__GOARCH__ |
| Go package | `github.com/beckn-one/beckn-onix/benchmarks/e2e` |
| Default GOMAXPROCS | __GOMAXPROCS__ |
| Benchmark timeout | 30 minutes |
| Serial run duration | 10s per benchmark × 3 runs |
| Parallel sweep duration | 30s per GOMAXPROCS level |
| GOMAXPROCS sweep | 1, 2, 4, 8, 16 |
| Redis | miniredis (in-process, no network) |
| BPP | httptest mock (instant ACK) |
| Registry | httptest mock (dev key pair) |
| Schema spec | Beckn v2.0.0 OpenAPI (`beckn.yaml`, local file) |
**Plugins and steps (bapTxnCaller):**
| Step | Plugin | Role |
|------|--------|------|
| 1 | `router` | Resolves BPP URL from routing config |
| 2 | `signer` + `simplekeymanager` | Signs request body (Ed25519/BLAKE-512) |
| 3 | `schemav2validator` | Validates Beckn v2.0 API schema |
---
### B1 — Latency by Action
Averages from `run1.txt` (10s, GOMAXPROCS=__GOMAXPROCS__). Percentile values from `percentiles.txt`.
| Action | Mean (µs) | p50 (µs) | p95 (µs) | p99 (µs) | Allocs/req | Bytes/req |
|--------|----------:|--------:|--------:|--------:|----------:|----------:|
| discover (serial) | __MEAN_DISCOVER_US__ | __P50_US__ | __P95_US__ | __P99_US__ | __ALLOCS_DISCOVER__ | __BYTES_DISCOVER__ (~__MEM_DISCOVER_KB__ KB) |
| select | __MEAN_SELECT_US__ | — | — | — | __ALLOCS_SELECT__ | __BYTES_SELECT__ (~__MEM_SELECT_KB__ KB) |
| init | __MEAN_INIT_US__ | — | — | — | __ALLOCS_INIT__ | __BYTES_INIT__ (~__MEM_INIT_KB__ KB) |
| confirm | __MEAN_CONFIRM_US__ | — | — | — | __ALLOCS_CONFIRM__ | __BYTES_CONFIRM__ (~__MEM_CONFIRM_KB__ KB) |
---
### B2 — Throughput vs Concurrency
Results from the concurrency sweep (`parallel_cpu*.txt`, 30s per level).
__THROUGHPUT_TABLE__
---
### B3 — Cache Impact (Redis warm vs cold)
Results from `cache_comparison.txt` (10s each, GOMAXPROCS=__GOMAXPROCS__).
| Scenario | Mean (µs) | Allocs/req | Bytes/req |
|----------|----------:|-----------:|----------:|
| CacheWarm | __CACHE_WARM_US__ | __CACHE_WARM_ALLOCS__ | __CACHE_WARM_BYTES__ |
| CacheCold | __CACHE_COLD_US__ | __CACHE_COLD_ALLOCS__ | __CACHE_COLD_BYTES__ |
| **Delta** | **__CACHE_DELTA__** | — | — |
---
### B4 — benchstat Statistical Summary (3 Runs)
```
__BENCHSTAT_SUMMARY__
```
---
### B5 — Bottleneck Analysis
> Populate after reviewing the numbers above and profiling with `go tool pprof`.
| Rank | Plugin / Step | Estimated contribution | Evidence |
|:----:|---------------|------------------------|---------|
| 1 | | | |
| 2 | | | |
| 3 | | | |
**Profiling commands:**
```bash
# CPU profile
go test ./benchmarks/e2e/... -bench=BenchmarkBAPCaller_Discover \
-benchtime=30s -cpuprofile=benchmarks/results/cpu.prof -timeout=5m
go tool pprof -http=:6060 benchmarks/results/cpu.prof
# Memory profile
go test ./benchmarks/e2e/... -bench=BenchmarkBAPCaller_Discover \
-benchtime=30s -memprofile=benchmarks/results/mem.prof -timeout=5m
go tool pprof -http=:6060 benchmarks/results/mem.prof
# Lock contention (find serialisation under parallel load)
go test ./benchmarks/e2e/... -bench=BenchmarkBAPCaller_Discover_Parallel \
-benchtime=30s -mutexprofile=benchmarks/results/mutex.prof -timeout=5m
go tool pprof -http=:6060 benchmarks/results/mutex.prof
```
---
*Generated from run `__TIMESTAMP__` · beckn-onix __ONIX_VERSION__ · Beckn Protocol v2.0.0*

200
benchmarks/run_benchmarks.sh Executable file

@@ -0,0 +1,200 @@
#!/usr/bin/env bash
# =============================================================================
# run_benchmarks.sh — beckn-onix adapter benchmark runner
#
# Usage:
# cd beckn-onix
# bash benchmarks/run_benchmarks.sh
#
# Requirements:
# - Go 1.24+ installed
# - benchstat is declared as a tool in go.mod; invoked via "go tool benchstat"
#
# Output:
# benchmarks/results/<YYYY-MM-DD_HH-MM-SS>/
# run1.txt, run2.txt, run3.txt — raw go test -bench output
# parallel_cpu1.txt ... cpu16.txt — concurrency sweep
# benchstat_summary.txt — statistical aggregation
# =============================================================================
set -euo pipefail
SCRIPT_START=$(date +%s)
REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
BENCH_PKG="./benchmarks/e2e/..."
BENCH_TIMEOUT="10m"
BENCH_TIME_SERIAL="10s"
BENCH_TIME_PARALLEL="30s"
BENCH_COUNT=1 # benchstat uses the 3 serial files for stability
# Adapter version — reads from git tag, falls back to "dev"
ONIX_VERSION="$(git -C "$REPO_ROOT" describe --tags --abbrev=0 2>/dev/null || echo "dev")"
REPORT_TEMPLATE="$REPO_ROOT/benchmarks/reports/REPORT_TEMPLATE.md"
# ── -report-only <dir>: regenerate report from an existing results directory ──
if [[ "${1:-}" == "-report-only" ]]; then
RESULTS_DIR="${2:-}"
if [[ -z "$RESULTS_DIR" ]]; then
echo "Usage: bash benchmarks/run_benchmarks.sh -report-only <results-dir>"
echo "Example: bash benchmarks/run_benchmarks.sh -report-only benchmarks/results/2026-04-09_10-30-00"
exit 1
fi
if [[ ! -d "$RESULTS_DIR" ]]; then
echo "ERROR: results directory not found: $RESULTS_DIR"
exit 1
fi
echo "=== Regenerating report from existing results ==="
echo "Results dir : $RESULTS_DIR"
echo ""
cd "$REPO_ROOT"
echo "Parsing results to CSV..."
go run "$REPO_ROOT/benchmarks/tools/parse_results.go" \
-dir="$RESULTS_DIR" -out="$RESULTS_DIR" 2>&1 || true
echo ""
echo "Generating benchmark report..."
go run "$REPO_ROOT/benchmarks/tools/generate_report.go" \
-dir="$RESULTS_DIR" \
-template="$REPORT_TEMPLATE" \
-version="$ONIX_VERSION"
echo ""
echo "Done. Report written to: $RESULTS_DIR/BENCHMARK_REPORT.md"
exit 0
fi
RESULTS_DIR="$REPO_ROOT/benchmarks/results/$(date +%Y-%m-%d_%H-%M-%S)"
cd "$REPO_ROOT"
# ── benchstat is declared as a go tool in go.mod; no separate install needed ──
# Use: go tool benchstat (works anywhere without PATH changes)
# bench_filter: tee full output to the .log file for debugging, and write a
# clean copy (only benchstat-parseable lines) to the .txt file.
# The adapter logger is silenced via zerolog.SetGlobalLevel(zerolog.Disabled)
# in TestMain, so stdout should already be clean; the grep is a safety net for
# any stray lines from go test itself (build output, redis warnings, etc.).
bench_filter() {
local txt="$1" log="$2"
tee "$log" | grep -E "^(Benchmark|goos:|goarch:|pkg:|cpu:|ok |PASS|FAIL|--- )" > "$txt" || true
}
# ── Create results directory ──────────────────────────────────────────────────
mkdir -p "$RESULTS_DIR"
echo "=== beckn-onix Benchmark Runner ==="
echo "Results dir : $RESULTS_DIR"
echo "Package : $BENCH_PKG"
echo ""
# ── Serial runs (3x for benchstat stability) ──────────────────────────────────
echo "Running serial benchmarks (3 runs × ${BENCH_TIME_SERIAL})..."
for run in 1 2 3; do
echo " Run $run/3..."
go test \
-timeout="$BENCH_TIMEOUT" \
-run=^$ \
-bench="." \
-benchtime="$BENCH_TIME_SERIAL" \
-benchmem \
-count="$BENCH_COUNT" \
"$BENCH_PKG" 2>&1 | bench_filter "$RESULTS_DIR/run${run}.txt" "$RESULTS_DIR/run${run}.log"
echo " Saved → $RESULTS_DIR/run${run}.txt (full log → run${run}.log)"
done
echo ""
# ── Concurrency sweep ─────────────────────────────────────────────────────────
echo "Running parallel concurrency sweep (cpu=1,2,4,8,16; ${BENCH_TIME_PARALLEL} each)..."
for cpu in 1 2 4 8 16; do
echo " GOMAXPROCS=$cpu..."
go test \
-timeout="$BENCH_TIMEOUT" \
-run=^$ \
-bench="BenchmarkBAPCaller_Discover_Parallel|BenchmarkBAPCaller_RPS" \
-benchtime="$BENCH_TIME_PARALLEL" \
-benchmem \
-cpu="$cpu" \
-count=1 \
"$BENCH_PKG" 2>&1 | bench_filter "$RESULTS_DIR/parallel_cpu${cpu}.txt" "$RESULTS_DIR/parallel_cpu${cpu}.log"
echo " Saved → $RESULTS_DIR/parallel_cpu${cpu}.txt (full log → parallel_cpu${cpu}.log)"
done
echo ""
# ── Percentile benchmark ──────────────────────────────────────────────────────
echo "Running percentile benchmark (${BENCH_TIME_SERIAL})..."
go test \
-timeout="$BENCH_TIMEOUT" \
-run=^$ \
-bench="BenchmarkBAPCaller_Discover_Percentiles" \
-benchtime="$BENCH_TIME_SERIAL" \
-benchmem \
-count=1 \
"$BENCH_PKG" 2>&1 | bench_filter "$RESULTS_DIR/percentiles.txt" "$RESULTS_DIR/percentiles.log"
echo " Saved → $RESULTS_DIR/percentiles.txt (full log → percentiles.log)"
echo ""
# ── Cache comparison ──────────────────────────────────────────────────────────
echo "Running cache warm vs cold comparison..."
go test \
-timeout="$BENCH_TIMEOUT" \
-run=^$ \
-bench="BenchmarkBAPCaller_Cache" \
-benchtime="$BENCH_TIME_SERIAL" \
-benchmem \
-count=1 \
"$BENCH_PKG" 2>&1 | bench_filter "$RESULTS_DIR/cache_comparison.txt" "$RESULTS_DIR/cache_comparison.log"
echo " Saved → $RESULTS_DIR/cache_comparison.txt (full log → cache_comparison.log)"
echo ""
# ── benchstat statistical summary ─────────────────────────────────────────────
echo "Running benchstat statistical analysis..."
go tool benchstat \
"$RESULTS_DIR/run1.txt" \
"$RESULTS_DIR/run2.txt" \
"$RESULTS_DIR/run3.txt" \
> "$RESULTS_DIR/benchstat_summary.txt" 2>&1
echo " Saved → $RESULTS_DIR/benchstat_summary.txt"
echo ""
# ── Parse results to CSV ──────────────────────────────────────────────────────
echo "Parsing results to CSV..."
go run "$REPO_ROOT/benchmarks/tools/parse_results.go" \
-dir="$RESULTS_DIR" \
-out="$RESULTS_DIR" 2>&1 || echo " (parse_results.go: skipping on error)"
echo ""
# ── Generate human-readable report ───────────────────────────────────────────
echo "Generating benchmark report..."
if [[ -f "$REPORT_TEMPLATE" ]]; then
go run "$REPO_ROOT/benchmarks/tools/generate_report.go" \
-dir="$RESULTS_DIR" \
-template="$REPORT_TEMPLATE" \
-version="$ONIX_VERSION" 2>&1 || echo " (generate_report.go: skipping on error)"
else
echo " WARNING: template not found at $REPORT_TEMPLATE — skipping report generation"
fi
# ── Summary ───────────────────────────────────────────────────────────────────
SCRIPT_END=$(date +%s)
ELAPSED_SECS=$(( SCRIPT_END - SCRIPT_START ))
ELAPSED_MIN=$(( ELAPSED_SECS / 60 ))
ELAPSED_SEC_REM=$(( ELAPSED_SECS % 60 ))
echo ""
echo "========================================"
echo "✅ Benchmark run complete!"
echo ""
echo "Total runtime : ${ELAPSED_MIN}m ${ELAPSED_SEC_REM}s"
echo ""
echo "Results written to:"
echo " $RESULTS_DIR"
echo ""
echo "Key files:"
echo " BENCHMARK_REPORT.md — generated human-readable report"
echo " benchstat_summary.txt — statistical analysis of 3 serial runs"
echo " latency_report.csv — per-benchmark latency and allocation data"
echo " throughput_report.csv — RPS and latency by GOMAXPROCS level"
echo " parallel_cpu*.txt — concurrency sweep raw output"
echo " percentiles.txt — p50/p95/p99 latency data"
echo " cache_comparison.txt — warm vs cold Redis cache comparison"
echo ""
echo "To review the report:"
echo " open $RESULTS_DIR/BENCHMARK_REPORT.md"
echo "========================================"


@@ -0,0 +1,595 @@
// generate_report.go — Fills REPORT_TEMPLATE.md with data from a completed
// benchmark run and writes BENCHMARK_REPORT.md to the results directory.
//
// Usage:
//
// go run benchmarks/tools/generate_report.go \
// -dir=benchmarks/results/<timestamp>/ \
// -template=benchmarks/reports/REPORT_TEMPLATE.md \
// -version=<onix-version>
//
// The generator reads:
// - latency_report.csv — per-benchmark latency and allocation data
// - throughput_report.csv — RPS and latency by GOMAXPROCS level
// - benchstat_summary.txt — raw benchstat output block
// - run1.txt — goos / goarch / cpu metadata
//
// Placeholders filled in the template:
//
// __TIMESTAMP__ results dir basename (YYYY-MM-DD_HH-MM-SS)
// __ONIX_VERSION__ -version flag value
// __GOOS__ from run1.txt header
// __GOARCH__ from run1.txt header
// __CPU__ from run1.txt header
// __GOMAXPROCS__ derived from the benchmark name suffix in run1.txt
// __P50_US__ p50 latency in µs (from Discover_Percentiles row)
// __P95_US__ p95 latency in µs
// __P99_US__ p99 latency in µs
// __MEAN_DISCOVER_US__ mean latency in µs for discover
// __MEAN_SELECT_US__ mean latency in µs for select
// __MEAN_INIT_US__ mean latency in µs for init
// __MEAN_CONFIRM_US__ mean latency in µs for confirm
// __ALLOCS_DISCOVER__ allocs/req for discover
// __ALLOCS_SELECT__ allocs/req for select
// __ALLOCS_INIT__ allocs/req for init
// __ALLOCS_CONFIRM__ allocs/req for confirm
// __BYTES_DISCOVER__ bytes/req for discover
// __BYTES_SELECT__ bytes/req for select
// __BYTES_INIT__ bytes/req for init
// __BYTES_CONFIRM__ bytes/req for confirm
// __MEM_DISCOVER_KB__ bytes/req converted to KB for discover
// __MEM_SELECT_KB__ bytes/req converted to KB for select
// __MEM_INIT_KB__ bytes/req converted to KB for init
// __MEM_CONFIRM_KB__ bytes/req converted to KB for confirm
// __PEAK_RPS__ highest RPS across all GOMAXPROCS levels
// __CACHE_WARM_US__ mean latency in µs for CacheWarm
// __CACHE_COLD_US__ mean latency in µs for CacheCold
// __CACHE_WARM_ALLOCS__ allocs/req for CacheWarm
// __CACHE_COLD_ALLOCS__ allocs/req for CacheCold
// __CACHE_WARM_BYTES__ bytes/req for CacheWarm
// __CACHE_COLD_BYTES__ bytes/req for CacheCold
// __CACHE_DELTA__ formatted warm-vs-cold delta string
// __THROUGHPUT_TABLE__ generated markdown table from throughput_report.csv
// __BENCHSTAT_SUMMARY__ raw contents of benchstat_summary.txt
package main
import (
"bufio"
"encoding/csv"
"flag"
"fmt"
"io"
"math"
"os"
"path/filepath"
"regexp"
"strconv"
"strings"
)
func main() {
dir := flag.String("dir", "", "Results directory (required)")
tmplPath := flag.String("template", "benchmarks/reports/REPORT_TEMPLATE.md", "Path to report template")
version := flag.String("version", "unknown", "Adapter version (e.g. v1.5.0)")
flag.Parse()
if *dir == "" {
fmt.Fprintln(os.Stderr, "ERROR: -dir is required")
os.Exit(1)
}
// Derive timestamp from the directory basename.
timestamp := filepath.Base(*dir)
// ── Read template ──────────────────────────────────────────────────────────
tmplBytes, err := os.ReadFile(*tmplPath)
if err != nil {
fmt.Fprintf(os.Stderr, "ERROR: reading template %s: %v\n", *tmplPath, err)
os.Exit(1)
}
report := string(tmplBytes)
// ── Parse run1.txt for environment metadata ────────────────────────────────
env := parseEnv(filepath.Join(*dir, "run1.txt"))
// ── Parse latency_report.csv ──────────────────────────────────────────────
latency, err := parseLatencyCSV(filepath.Join(*dir, "latency_report.csv"))
if err != nil {
fmt.Fprintf(os.Stderr, "WARNING: could not parse latency_report.csv: %v\n", err)
}
// ── Parse throughput_report.csv ───────────────────────────────────────────
throughput, err := parseThroughputCSV(filepath.Join(*dir, "throughput_report.csv"))
if err != nil {
fmt.Fprintf(os.Stderr, "WARNING: could not parse throughput_report.csv: %v\n", err)
}
// ── Read benchstat_summary.txt ────────────────────────────────────────────
benchstat := readFileOrDefault(filepath.Join(*dir, "benchstat_summary.txt"),
"(benchstat output not available)")
// ── Compute derived values ─────────────────────────────────────────────────
// Mean latency: convert ms → µs, round to integer.
meanDiscoverUS := msToUS(latency["BenchmarkBAPCaller_Discover"]["mean_ms"])
meanSelectUS := msToUS(latency["BenchmarkBAPCaller_AllActions/select"]["mean_ms"])
meanInitUS := msToUS(latency["BenchmarkBAPCaller_AllActions/init"]["mean_ms"])
meanConfirmUS := msToUS(latency["BenchmarkBAPCaller_AllActions/confirm"]["mean_ms"])
// Percentiles come from the Discover_Percentiles row.
perc := latency["BenchmarkBAPCaller_Discover_Percentiles"]
p50 := fmtMetric(perc["p50_µs"], "µs")
p95 := fmtMetric(perc["p95_µs"], "µs")
p99 := fmtMetric(perc["p99_µs"], "µs")
// Memory: bytes → KB (1 decimal place).
memDiscoverKB := bytesToKB(latency["BenchmarkBAPCaller_Discover"]["bytes_op"])
memSelectKB := bytesToKB(latency["BenchmarkBAPCaller_AllActions/select"]["bytes_op"])
memInitKB := bytesToKB(latency["BenchmarkBAPCaller_AllActions/init"]["bytes_op"])
memConfirmKB := bytesToKB(latency["BenchmarkBAPCaller_AllActions/confirm"]["bytes_op"])
// Cache delta.
warmUS := msToUS(latency["BenchmarkBAPCaller_CacheWarm"]["mean_ms"])
coldUS := msToUS(latency["BenchmarkBAPCaller_CacheCold"]["mean_ms"])
cacheDelta := formatCacheDelta(warmUS, coldUS)
// Peak RPS across all concurrency levels.
peakRPS := "—"
var peakRPSVal float64
for _, row := range throughput {
if v := parseFloatOrZero(row["rps"]); v > peakRPSVal {
peakRPSVal = v
peakRPS = fmt.Sprintf("%.0f", peakRPSVal)
}
}
// ── Build throughput table ─────────────────────────────────────────────────
throughputTable := buildThroughputTable(throughput)
// ── Generate interpretation and recommendation ─────────────────────────────
interpretation := buildInterpretation(perc, latency, throughput, warmUS, coldUS)
recommendation := buildRecommendation(throughput)
// ── Apply substitutions ────────────────────────────────────────────────────
replacements := map[string]string{
"__TIMESTAMP__": timestamp,
"__ONIX_VERSION__": *version,
"__GOOS__": env["goos"],
"__GOARCH__": env["goarch"],
"__CPU__": env["cpu"],
"__GOMAXPROCS__": env["gomaxprocs"],
"__P50_US__": p50,
"__P95_US__": p95,
"__P99_US__": p99,
"__MEAN_DISCOVER_US__": meanDiscoverUS,
"__MEAN_SELECT_US__": meanSelectUS,
"__MEAN_INIT_US__": meanInitUS,
"__MEAN_CONFIRM_US__": meanConfirmUS,
"__ALLOCS_DISCOVER__": fmtInt(latency["BenchmarkBAPCaller_Discover"]["allocs_op"]),
"__ALLOCS_SELECT__": fmtInt(latency["BenchmarkBAPCaller_AllActions/select"]["allocs_op"]),
"__ALLOCS_INIT__": fmtInt(latency["BenchmarkBAPCaller_AllActions/init"]["allocs_op"]),
"__ALLOCS_CONFIRM__": fmtInt(latency["BenchmarkBAPCaller_AllActions/confirm"]["allocs_op"]),
"__BYTES_DISCOVER__": fmtInt(latency["BenchmarkBAPCaller_Discover"]["bytes_op"]),
"__BYTES_SELECT__": fmtInt(latency["BenchmarkBAPCaller_AllActions/select"]["bytes_op"]),
"__BYTES_INIT__": fmtInt(latency["BenchmarkBAPCaller_AllActions/init"]["bytes_op"]),
"__BYTES_CONFIRM__": fmtInt(latency["BenchmarkBAPCaller_AllActions/confirm"]["bytes_op"]),
"__MEM_DISCOVER_KB__": memDiscoverKB,
"__MEM_SELECT_KB__": memSelectKB,
"__MEM_INIT_KB__": memInitKB,
"__MEM_CONFIRM_KB__": memConfirmKB,
"__PEAK_RPS__": peakRPS,
"__CACHE_WARM_US__": warmUS,
"__CACHE_COLD_US__": coldUS,
"__CACHE_WARM_ALLOCS__": fmtInt(latency["BenchmarkBAPCaller_CacheWarm"]["allocs_op"]),
"__CACHE_COLD_ALLOCS__": fmtInt(latency["BenchmarkBAPCaller_CacheCold"]["allocs_op"]),
"__CACHE_WARM_BYTES__": fmtInt(latency["BenchmarkBAPCaller_CacheWarm"]["bytes_op"]),
"__CACHE_COLD_BYTES__": fmtInt(latency["BenchmarkBAPCaller_CacheCold"]["bytes_op"]),
"__CACHE_DELTA__": cacheDelta,
"__THROUGHPUT_TABLE__": throughputTable,
"__BENCHSTAT_SUMMARY__": benchstat,
"__INTERPRETATION__": interpretation,
"__RECOMMENDATION__": recommendation,
}
for placeholder, value := range replacements {
report = strings.ReplaceAll(report, placeholder, value)
}
// ── Write output ───────────────────────────────────────────────────────────
outPath := filepath.Join(*dir, "BENCHMARK_REPORT.md")
if err := os.WriteFile(outPath, []byte(report), 0o644); err != nil {
fmt.Fprintf(os.Stderr, "ERROR: writing report: %v\n", err)
os.Exit(1)
}
fmt.Printf(" Written → %s\n", outPath)
}
// ── Parsers ────────────────────────────────────────────────────────────────────
var gomaxprocsRe = regexp.MustCompile(`-(\d+)$`)
// parseEnv reads goos, goarch, cpu, and GOMAXPROCS from a run*.txt file header.
func parseEnv(path string) map[string]string {
env := map[string]string{
"goos": "unknown", "goarch": "unknown",
"cpu": "unknown", "gomaxprocs": "unknown",
}
f, err := os.Open(path)
if err != nil {
return env
}
defer f.Close()
scanner := bufio.NewScanner(f)
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
switch {
case strings.HasPrefix(line, "goos:"):
env["goos"] = strings.TrimSpace(strings.TrimPrefix(line, "goos:"))
case strings.HasPrefix(line, "goarch:"):
env["goarch"] = strings.TrimSpace(strings.TrimPrefix(line, "goarch:"))
case strings.HasPrefix(line, "cpu:"):
env["cpu"] = strings.TrimSpace(strings.TrimPrefix(line, "cpu:"))
case strings.HasPrefix(line, "Benchmark"):
// Extract GOMAXPROCS from first benchmark line suffix (e.g. "-10").
if m := gomaxprocsRe.FindStringSubmatch(strings.Fields(line)[0]); m != nil {
env["gomaxprocs"] = m[1]
}
}
}
return env
}
// parseLatencyCSV returns a map of benchmark name → field name → raw string value.
// When multiple rows exist for the same benchmark (3 serial runs), values from
// the first non-empty occurrence are used.
func parseLatencyCSV(path string) (map[string]map[string]string, error) {
f, err := os.Open(path)
if err != nil {
return nil, err
}
defer f.Close()
r := csv.NewReader(f)
header, err := r.Read()
if err != nil {
return nil, err
}
result := map[string]map[string]string{}
for {
row, err := r.Read()
if err == io.EOF {
break
}
if err != nil || len(row) == 0 {
continue
}
name := row[0]
if _, exists := result[name]; !exists {
result[name] = map[string]string{}
}
for i, col := range header[1:] {
idx := i + 1
if idx < len(row) && row[idx] != "" && result[name][col] == "" {
result[name][col] = row[idx]
}
}
}
return result, nil
}
// parseThroughputCSV returns rows as a slice of field maps.
func parseThroughputCSV(path string) ([]map[string]string, error) {
f, err := os.Open(path)
if err != nil {
return nil, err
}
defer f.Close()
r := csv.NewReader(f)
header, err := r.Read()
if err != nil {
return nil, err
}
var rows []map[string]string
for {
row, err := r.Read()
if err == io.EOF {
break
}
if err != nil || len(row) == 0 {
continue
}
m := map[string]string{}
for i, col := range header {
if i < len(row) {
m[col] = row[i]
}
}
rows = append(rows, m)
}
return rows, nil
}
// buildThroughputTable renders the throughput CSV as a markdown table.
func buildThroughputTable(rows []map[string]string) string {
if len(rows) == 0 {
return "_No concurrency sweep data available._"
}
var sb strings.Builder
sb.WriteString("| GOMAXPROCS | Mean Latency (µs) | RPS |\n")
sb.WriteString("|:----------:|------------------:|----:|\n")
for _, row := range rows {
cpu := orDash(row["gomaxprocs"])
latUS := "—"
if v := parseFloatOrZero(row["mean_latency_ms"]); v > 0 {
latUS = fmt.Sprintf("%.0f", v*1000)
}
rps := orDash(row["rps"])
sb.WriteString(fmt.Sprintf("| %s | %s | %s |\n", cpu, latUS, rps))
}
return sb.String()
}
// ── Formatters ─────────────────────────────────────────────────────────────────
// msToUS converts a ms string to a rounded µs string.
func msToUS(ms string) string {
v := parseFloatOrZero(ms)
if v == 0 {
return "—"
}
return fmt.Sprintf("%.0f", v*1000)
}
// bytesToKB converts a bytes string to a KB string with 1 decimal place.
func bytesToKB(bytes string) string {
v := parseFloatOrZero(bytes)
if v == 0 {
return "—"
}
return fmt.Sprintf("%.1f", v/1024)
}
// fmtInt formats a float string as a rounded integer string.
func fmtInt(s string) string {
v := parseFloatOrZero(s)
if v == 0 {
return "—"
}
return fmt.Sprintf("%.0f", math.Round(v))
}
// fmtMetric formats a metric value with the given unit, or returns "—".
func fmtMetric(s, unit string) string {
v := parseFloatOrZero(s)
if v == 0 {
return "—"
}
return fmt.Sprintf("%.0f %s", v, unit)
}
// formatCacheDelta produces a human-readable warm-vs-cold delta string.
func formatCacheDelta(warmUS, coldUS string) string {
w := parseFloatOrZero(warmUS)
c := parseFloatOrZero(coldUS)
if w == 0 || c == 0 {
return "—"
}
delta := w - c
sign := "+"
if delta < 0 {
sign = ""
}
return fmt.Sprintf("%s%.0f µs (warm vs cold)", sign, delta)
}
func orDash(s string) string {
if s == "" {
return "—"
}
return s
}
func parseFloatOrZero(s string) float64 {
v, _ := strconv.ParseFloat(strings.TrimSpace(s), 64)
return v
}
func readFileOrDefault(path, def string) string {
b, err := os.ReadFile(path)
if err != nil {
return def
}
return strings.TrimRight(string(b), "\n")
}
// ── Narrative generators ───────────────────────────────────────────────────────
// buildInterpretation generates a data-driven interpretation paragraph from the
// benchmark results. It covers tail-latency control, action complexity trend,
// concurrency scaling efficiency, and cache impact.
func buildInterpretation(
perc map[string]string,
latency map[string]map[string]string,
throughput []map[string]string,
warmUS, coldUS string,
) string {
var sb strings.Builder
p50 := parseFloatOrZero(perc["p50_µs"])
p99 := parseFloatOrZero(perc["p99_µs"])
meanDiscover := parseFloatOrZero(latency["BenchmarkBAPCaller_Discover"]["mean_ms"]) * 1000
// Tail-latency control.
if p50 > 0 && p99 > 0 {
ratio := p99 / p50
quality := "good"
if ratio > 5 {
quality = "poor"
} else if ratio > 3 {
quality = "moderate"
}
sb.WriteString(fmt.Sprintf(
"The adapter delivers a p50 latency of **%.0f µs** for the discover action. "+
"The p99/p50 ratio is **%.1f×**, indicating %s tail-latency control — "+
"spikes are %s relative to the median.\n\n",
p50, ratio, quality, tailDescription(ratio),
))
} else if meanDiscover > 0 {
sb.WriteString(fmt.Sprintf(
"The adapter delivers a mean latency of **%.0f µs** for the discover action. "+
"Run with `-bench=BenchmarkBAPCaller_Discover_Percentiles` to obtain p50/p95/p99 data.\n\n",
meanDiscover,
))
}
// Action complexity trend.
selectUS := parseFloatOrZero(latency["BenchmarkBAPCaller_AllActions/select"]["mean_ms"]) * 1000
initUS := parseFloatOrZero(latency["BenchmarkBAPCaller_AllActions/init"]["mean_ms"]) * 1000
confirmUS := parseFloatOrZero(latency["BenchmarkBAPCaller_AllActions/confirm"]["mean_ms"]) * 1000
if meanDiscover > 0 && selectUS > 0 && initUS > 0 && confirmUS > 0 {
sb.WriteString(fmt.Sprintf(
"Latency scales with payload complexity: select (+%.0f%%), init (+%.0f%%), confirm (+%.0f%%) "+
"vs the discover baseline. Allocation counts track proportionally, driven by JSON "+
"unmarshalling and schema validation of larger payloads.\n\n",
pctChange(meanDiscover, selectUS),
pctChange(meanDiscover, initUS),
pctChange(meanDiscover, confirmUS),
))
}
// Concurrency scaling.
lat1 := latencyAtCPU(throughput, "1")
lat16 := latencyAtCPU(throughput, "16")
if lat1 > 0 && lat16 > 0 {
improvement := lat1 / lat16
sb.WriteString(fmt.Sprintf(
"Concurrency scaling is effective: mean latency drops from **%.0f µs** at GOMAXPROCS=1 "+
"to **%.0f µs** at GOMAXPROCS=16 — a **%.1f× improvement**.",
lat1*1000, lat16*1000, improvement,
))
if improvement < 4 {
sb.WriteString(" Gains taper beyond 8 cores, suggesting a shared serialisation point " +
"(likely schema validation or key derivation).")
}
sb.WriteString("\n\n")
}
// Cache impact.
w := parseFloatOrZero(warmUS)
c := parseFloatOrZero(coldUS)
if w > 0 && c > 0 {
delta := math.Abs(w-c) / w * 100
if delta < 5 {
sb.WriteString(fmt.Sprintf(
"The Redis key-manager cache shows **no measurable impact** in this setup "+
"(warm vs cold delta: %.0f µs, %.1f%% of mean). "+
"miniredis is in-process; signing and schema validation dominate. "+
"Cache benefit would be visible with real Redis over a network.",
math.Abs(w-c), delta,
))
} else {
sb.WriteString(fmt.Sprintf(
"The Redis key-manager cache provides a **%.0f µs improvement** (%.1f%%) "+
"on the warm path vs cold.",
math.Abs(w-c), delta,
))
}
sb.WriteString("\n")
}
if sb.Len() == 0 {
return "_Insufficient data to generate interpretation. Ensure all benchmark scenarios completed successfully._"
}
return strings.TrimRight(sb.String(), "\n")
}
// buildRecommendation generates a sizing and tuning recommendation based on the
// concurrency sweep results.
func buildRecommendation(throughput []map[string]string) string {
if len(throughput) == 0 {
return "_Run the concurrency sweep to generate sizing recommendations._"
}
// Find the GOMAXPROCS level with the best scaling efficiency (largest relative
// latency improvement between consecutive steps of the sweep).
type cpuPoint struct {
cpu int
rps float64
lat float64
}
var points []cpuPoint
for _, row := range throughput {
cpu := int(parseFloatOrZero(row["gomaxprocs"]))
rps := parseFloatOrZero(row["rps"])
lat := parseFloatOrZero(row["mean_latency_ms"]) * 1000
if cpu > 0 && lat > 0 {
points = append(points, cpuPoint{cpu, rps, lat})
}
}
if len(points) == 0 {
return "_Run the concurrency sweep (parallel_cpu*.txt) to generate sizing recommendations._"
}
// Find sweet spot: largest latency improvement per doubling of cores.
bestEffCPU := points[0].cpu
bestEff := 0.0
for i := 1; i < len(points); i++ {
if points[i-1].lat > 0 {
eff := (points[i-1].lat - points[i].lat) / points[i-1].lat
if eff > bestEff {
bestEff = eff
bestEffCPU = points[i].cpu
}
}
}
var sb strings.Builder
sb.WriteString(fmt.Sprintf(
"**%d cores** offers the best throughput/cost ratio based on the concurrency sweep — "+
"scaling efficiency begins to taper beyond this point.\n\n",
bestEffCPU,
))
sb.WriteString("The adapter is ready for staged load testing against a real BPP. " +
"For production sizing, start with the recommended core count above and adjust based " +
"on observed throughput targets. If schema validation dominates CPU (likely at high " +
"concurrency), profile with `go tool pprof` using the commands in B5 to isolate the bottleneck.")
return sb.String()
}
// ── Narrative helpers ──────────────────────────────────────────────────────────
func tailDescription(ratio float64) string {
switch {
case ratio <= 2:
return "minimal"
case ratio <= 3:
return "modest"
case ratio <= 5:
return "noticeable"
default:
return "significant"
}
}
func pctChange(base, val float64) float64 {
if base == 0 {
return 0
}
return (val - base) / base * 100
}
func latencyAtCPU(throughput []map[string]string, cpu string) float64 {
for _, row := range throughput {
if row["gomaxprocs"] == cpu {
if v := parseFloatOrZero(row["mean_latency_ms"]); v > 0 {
return v
}
}
}
return 0
}
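The report generator above is driven by a single placeholder-substitution pass: every `__TOKEN__` in the template is replaced globally, and any token with no value survives into the output so a missing metric is visible rather than silently blank. A minimal, self-contained sketch of that step (placeholder names here are illustrative, not necessarily the template's full set):

```go
package main

import (
	"fmt"
	"strings"
)

// substitute applies every placeholder replacement globally; unmatched
// tokens are left intact so a missing metric stays visible in the report.
func substitute(tmpl string, repl map[string]string) string {
	out := tmpl
	for placeholder, value := range repl {
		out = strings.ReplaceAll(out, placeholder, value)
	}
	return out
}

func main() {
	tmpl := "p50: __P50_US__ µs | peak: __PEAK_RPS__ rps"
	fmt.Println(substitute(tmpl, map[string]string{
		"__P50_US__":   "130",
		"__PEAK_RPS__": "14356",
	}))
	// → p50: 130 µs | peak: 14356 rps
}
```

Map iteration order is random in Go, which is safe here because the placeholder tokens are distinct and never substrings of one another.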


@@ -0,0 +1,256 @@
// parse_results.go — Parses raw go test -bench output from the benchmark results
// directory and produces two CSV files for analysis and reporting.
//
// Usage:
//
// go run benchmarks/tools/parse_results.go \
// -dir=benchmarks/results/<timestamp>/ \
// -out=benchmarks/results/<timestamp>/
//
// Output files:
//
// latency_report.csv — per-benchmark mean, p50, p95, p99 latency, allocs
// throughput_report.csv — RPS and mean latency at each GOMAXPROCS level from the parallel sweep
package main
import (
"bufio"
"encoding/csv"
"flag"
"fmt"
"os"
"path/filepath"
"regexp"
"strconv"
"strings"
)
var (
// Matches the benchmark name and ns/op from a standard go test -bench output line.
// Go outputs custom metrics (p50_µs, req/s, …) BEFORE B/op and allocs/op, so we
// extract those fields with dedicated regexps rather than relying on positional groups.
//
// Example lines:
// BenchmarkBAPCaller_Discover-10 73542 164193 ns/op 82913 B/op 662 allocs/op
// BenchmarkBAPCaller_Discover_Percentiles-10 72849 164518 ns/op 130.0 p50_µs 144.0 p95_µs 317.0 p99_µs 82528 B/op 660 allocs/op
// BenchmarkBAPCaller_RPS-4 700465 73466 ns/op 14356.0 req/s 80375 B/op 660 allocs/op
benchLineRe = regexp.MustCompile(`^(Benchmark\S+)\s+\d+\s+([\d.]+)\s+ns/op`)
bytesRe = regexp.MustCompile(`([\d.]+)\s+B/op`)
allocsRe = regexp.MustCompile(`([\d.]+)\s+allocs/op`)
// Extracts any custom metric value from a benchmark line.
metricRe = regexp.MustCompile(`([\d.]+)\s+(p50_µs|p95_µs|p99_µs|req/s)`)
)
type benchResult struct {
name string
nsPerOp float64
bytesOp float64
allocsOp float64
p50 float64
p95 float64
p99 float64
rps float64
}
// cpuResult pairs a GOMAXPROCS value with a benchmark result from the parallel sweep.
type cpuResult struct {
cpu int
res benchResult
}
func main() {
dir := flag.String("dir", ".", "Directory containing benchmark result files")
out := flag.String("out", ".", "Output directory for CSV files")
flag.Parse()
if err := os.MkdirAll(*out, 0o755); err != nil {
fmt.Fprintf(os.Stderr, "ERROR creating output dir: %v\n", err)
os.Exit(1)
}
// ── Parse serial runs (run1.txt, run2.txt, run3.txt) ─────────────────────
var latencyResults []benchResult
for _, runFile := range []string{"run1.txt", "run2.txt", "run3.txt"} {
path := filepath.Join(*dir, runFile)
results, err := parseRunFile(path)
if err != nil {
fmt.Fprintf(os.Stderr, "WARNING: could not parse %s: %v\n", runFile, err)
continue
}
latencyResults = append(latencyResults, results...)
}
// Also parse percentiles file for p50/p95/p99.
percPath := filepath.Join(*dir, "percentiles.txt")
if percResults, err := parseRunFile(percPath); err == nil {
latencyResults = append(latencyResults, percResults...)
}
if err := writeLatencyCSV(filepath.Join(*out, "latency_report.csv"), latencyResults); err != nil {
fmt.Fprintf(os.Stderr, "ERROR writing latency CSV: %v\n", err)
os.Exit(1)
}
fmt.Printf("Written: %s\n", filepath.Join(*out, "latency_report.csv"))
// ── Parse parallel sweep (parallel_cpu*.txt) ──────────────────────────────
var throughputRows []cpuResult
for _, cpu := range []int{1, 2, 4, 8, 16} {
path := filepath.Join(*dir, fmt.Sprintf("parallel_cpu%d.txt", cpu))
results, err := parseRunFile(path)
if err != nil {
fmt.Fprintf(os.Stderr, "WARNING: could not parse parallel_cpu%d.txt: %v\n", cpu, err)
continue
}
for _, r := range results {
throughputRows = append(throughputRows, cpuResult{cpu: cpu, res: r})
}
}
if err := writeThroughputCSV(filepath.Join(*out, "throughput_report.csv"), throughputRows); err != nil {
fmt.Fprintf(os.Stderr, "ERROR writing throughput CSV: %v\n", err)
os.Exit(1)
}
fmt.Printf("Written: %s\n", filepath.Join(*out, "throughput_report.csv"))
}
// parseRunFile reads a go test -bench output file and returns all benchmark results.
func parseRunFile(path string) ([]benchResult, error) {
f, err := os.Open(path)
if err != nil {
return nil, err
}
defer f.Close()
var results []benchResult
scanner := bufio.NewScanner(f)
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
m := benchLineRe.FindStringSubmatch(line)
if m == nil {
continue
}
r := benchResult{name: stripCPUSuffix(m[1])}
r.nsPerOp = parseFloat(m[2])
// B/op and allocs/op — extracted independently because Go places custom
// metrics (p50_µs, req/s, …) between ns/op and B/op on the same line.
if bm := bytesRe.FindStringSubmatch(line); bm != nil {
r.bytesOp = parseFloat(bm[1])
}
if am := allocsRe.FindStringSubmatch(line); am != nil {
r.allocsOp = parseFloat(am[1])
}
// Custom metrics — scan the whole line regardless of position.
for _, mm := range metricRe.FindAllStringSubmatch(line, -1) {
switch mm[2] {
case "p50_µs":
r.p50 = parseFloat(mm[1])
case "p95_µs":
r.p95 = parseFloat(mm[1])
case "p99_µs":
r.p99 = parseFloat(mm[1])
case "req/s":
r.rps = parseFloat(mm[1])
}
}
results = append(results, r)
}
return results, scanner.Err()
}
func writeLatencyCSV(path string, results []benchResult) error {
f, err := os.Create(path)
if err != nil {
return err
}
defer f.Close()
w := csv.NewWriter(f)
defer w.Flush()
header := []string{"benchmark", "mean_ms", "p50_µs", "p95_µs", "p99_µs", "allocs_op", "bytes_op"}
if err := w.Write(header); err != nil {
return err
}
for _, r := range results {
row := []string{
r.name,
fmtFloat(r.nsPerOp / 1e6), // ns/op → ms
fmtFloat(r.p50),
fmtFloat(r.p95),
fmtFloat(r.p99),
fmtFloat(r.allocsOp),
fmtFloat(r.bytesOp),
}
if err := w.Write(row); err != nil {
return err
}
}
return nil
}
func writeThroughputCSV(path string, rows []cpuResult) error {
f, err := os.Create(path)
if err != nil {
return err
}
defer f.Close()
w := csv.NewWriter(f)
defer w.Flush()
// p95 latency is not available from the parallel sweep files — those benchmarks
// only emit ns/op and req/s. p95 data comes exclusively from
// BenchmarkBAPCaller_Discover_Percentiles, which runs at a single GOMAXPROCS
// setting and is not part of the concurrency sweep.
header := []string{"gomaxprocs", "benchmark", "rps", "mean_latency_ms"}
if err := w.Write(header); err != nil {
return err
}
for _, row := range rows {
r := []string{
strconv.Itoa(row.cpu),
row.res.name,
fmtFloat(row.res.rps),
fmtFloat(row.res.nsPerOp / 1e6),
}
if err := w.Write(r); err != nil {
return err
}
}
return nil
}
// stripCPUSuffix removes the trailing "-N" GOMAXPROCS suffix from benchmark names.
func stripCPUSuffix(name string) string {
if idx := strings.LastIndex(name, "-"); idx > 0 {
if _, err := strconv.Atoi(name[idx+1:]); err == nil {
return name[:idx]
}
}
return name
}
func parseFloat(s string) float64 {
if s == "" {
return 0
}
v, _ := strconv.ParseFloat(s, 64)
return v
}
func fmtFloat(v float64) string {
if v == 0 {
return ""
}
return strconv.FormatFloat(v, 'f', 3, 64)
}
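The parsing strategy above hinges on the two-regex split: one anchored pattern for the name and ns/op, and a position-independent pattern for custom metrics, since Go emits them between ns/op and B/op. A compact sketch of the same extraction on a sample line (the line and values are illustrative, taken from the format shown in the comments above):

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Anchored: benchmark name, iteration count, ns/op.
	benchLineRe = regexp.MustCompile(`^(Benchmark\S+)\s+\d+\s+([\d.]+)\s+ns/op`)
	// Position-independent custom metrics emitted via b.ReportMetric.
	metricRe = regexp.MustCompile(`([\d.]+)\s+(p50_µs|p95_µs|p99_µs|req/s)`)
)

// parseBenchLine extracts the benchmark name, the ns/op value, and any
// custom metrics from one `go test -bench` output line.
func parseBenchLine(line string) (name, nsPerOp string, metrics map[string]string) {
	m := benchLineRe.FindStringSubmatch(line)
	if m == nil {
		return "", "", nil // not a benchmark result line
	}
	metrics = map[string]string{}
	for _, mm := range metricRe.FindAllStringSubmatch(line, -1) {
		metrics[mm[2]] = mm[1]
	}
	return m[1], m[2], metrics
}

func main() {
	line := "BenchmarkBAPCaller_RPS-4   700465   73466 ns/op   14356.0 req/s   80375 B/op   660 allocs/op"
	name, ns, metrics := parseBenchLine(line)
	fmt.Println(name, ns, metrics["req/s"])
	// → BenchmarkBAPCaller_RPS-4 73466 14356.0
}
```

Scanning the whole line for metrics rather than using positional capture groups is what keeps the parser robust when a benchmark emits zero, one, or several custom metrics.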

14
go.mod

@@ -4,7 +4,7 @@ go 1.24.6
 require (
 	github.com/santhosh-tekuri/jsonschema/v6 v6.0.1
-	golang.org/x/crypto v0.47.0
+	golang.org/x/crypto v0.49.0
 )
 require github.com/stretchr/testify v1.11.1
@@ -19,9 +19,12 @@ require (
 require github.com/zenazn/pkcs7pad v0.0.0-20170308005700-253a5b1f0e03
-require golang.org/x/text v0.33.0 // indirect
+tool golang.org/x/perf/cmd/benchstat
+
+require golang.org/x/text v0.35.0 // indirect
 require (
+	github.com/aclements/go-moremath v0.0.0-20210112150236-f10218a38794 // indirect
 	github.com/agnivade/levenshtein v1.2.1 // indirect
 	github.com/beorn7/perks v1.0.1 // indirect
 	github.com/cenkalti/backoff/v4 v4.3.0 // indirect
@@ -82,9 +85,10 @@ require (
 	go.opentelemetry.io/proto/otlp v1.9.0 // indirect
 	go.yaml.in/yaml/v2 v2.4.2 // indirect
 	go.yaml.in/yaml/v3 v3.0.4 // indirect
-	golang.org/x/net v0.49.0 // indirect
-	golang.org/x/sync v0.19.0 // indirect
-	golang.org/x/sys v0.40.0 // indirect
+	golang.org/x/net v0.52.0 // indirect
+	golang.org/x/perf v0.0.0-20260312031701-16a31bc5fbd0 // indirect
+	golang.org/x/sync v0.20.0 // indirect
+	golang.org/x/sys v0.42.0 // indirect
 	golang.org/x/time v0.14.0 // indirect
 	google.golang.org/genproto/googleapis/api v0.0.0-20260128011058-8636f8732409 // indirect
 	google.golang.org/genproto/googleapis/rpc v0.0.0-20260128011058-8636f8732409 // indirect

32
go.sum

@@ -1,3 +1,5 @@
+github.com/aclements/go-moremath v0.0.0-20210112150236-f10218a38794 h1:xlwdaKcTNVW4PtpQb8aKA4Pjy0CdJHEqvFbAnvR5m2g=
+github.com/aclements/go-moremath v0.0.0-20210112150236-f10218a38794/go.mod h1:7e+I0LQFUI9AXWxOfsQROs9xPhoJtbsyWcjJqDd4KPY=
 github.com/agnivade/levenshtein v1.2.1 h1:EHBY3UOn1gwdy/VbFwgo4cxecRznFk7fKWN1KOX7eoM=
 github.com/agnivade/levenshtein v1.2.1/go.mod h1:QVVI16kDrtSuwcpd0p1+xMC6Z/VfhtCyDIjcwga4/DU=
 github.com/andreyvit/diff v0.0.0-20170406064948-c7f18ee00883 h1:bvNMNQO63//z+xNgfBlViaCIJKLlCJ6/fmUseuG0wVQ=
@@ -274,26 +276,28 @@ go.yaml.in/yaml/v2 v2.4.2 h1:DzmwEr2rDGHl7lsFgAHxmNz/1NlQ7xLIrlN2h5d1eGI=
 go.yaml.in/yaml/v2 v2.4.2/go.mod h1:081UH+NErpNdqlCXm3TtEran0rJZGxAYx9hb/ELlsPU=
 go.yaml.in/yaml/v3 v3.0.4 h1:tfq32ie2Jv2UxXFdLJdh3jXuOzWiL1fo0bu/FbuKpbc=
 go.yaml.in/yaml/v3 v3.0.4/go.mod h1:DhzuOOF2ATzADvBadXxruRBLzYTpT36CKvDb3+aBEFg=
-golang.org/x/crypto v0.47.0 h1:V6e3FRj+n4dbpw86FJ8Fv7XVOql7TEwpHapKoMJ/GO8=
-golang.org/x/crypto v0.47.0/go.mod h1:ff3Y9VzzKbwSSEzWqJsJVBnWmRwRSHt/6Op5n9bQc4A=
+golang.org/x/crypto v0.49.0 h1:+Ng2ULVvLHnJ/ZFEq4KdcDd/cfjrrjjNSXNzxg0Y4U4=
+golang.org/x/crypto v0.49.0/go.mod h1:ErX4dUh2UM+CFYiXZRTcMpEcN8b/1gxEuv3nODoYtCA=
-golang.org/x/mod v0.31.0 h1:HaW9xtz0+kOcWKwli0ZXy79Ix+UW/vOfmWI5QVd2tgI=
-golang.org/x/mod v0.31.0/go.mod h1:43JraMp9cGx1Rx3AqioxrbrhNsLl2l/iNAvuBkrezpg=
+golang.org/x/mod v0.33.0 h1:tHFzIWbBifEmbwtGz65eaWyGiGZatSrT9prnU8DbVL8=
+golang.org/x/mod v0.33.0/go.mod h1:swjeQEj+6r7fODbD2cqrnje9PnziFuw4bmLbBZFrQ5w=
-golang.org/x/net v0.49.0 h1:eeHFmOGUTtaaPSGNmjBKpbng9MulQsJURQUAfUwY++o=
-golang.org/x/net v0.49.0/go.mod h1:/ysNB2EvaqvesRkuLAyjI1ycPZlQHM3q01F02UY/MV8=
+golang.org/x/net v0.52.0 h1:He/TN1l0e4mmR3QqHMT2Xab3Aj3L9qjbhRm78/6jrW0=
+golang.org/x/net v0.52.0/go.mod h1:R1MAz7uMZxVMualyPXb+VaqGSa3LIaUqk0eEt3w36Sw=
-golang.org/x/sync v0.19.0 h1:vV+1eWNmZ5geRlYjzm2adRgW2/mcpevXNg50YZtPCE4=
-golang.org/x/sync v0.19.0/go.mod h1:9KTHXmSnoGruLpwFjVSX0lNNA75CykiMECbovNTZqGI=
+golang.org/x/perf v0.0.0-20260312031701-16a31bc5fbd0 h1:VgUwdbeBqkERh4BX46p4O2fSng7duMS+0V01EEAt2Vk=
+golang.org/x/perf v0.0.0-20260312031701-16a31bc5fbd0/go.mod h1:UWOuhEKaiVtLW8tca1eEwpuNy4tzUubUXNAnA51k48o=
+golang.org/x/sync v0.20.0 h1:e0PTpb7pjO8GAtTs2dQ6jYa5BWYlMuX047Dco/pItO4=
+golang.org/x/sync v0.20.0/go.mod h1:9xrNwdLfx4jkKbNva9FpL6vEN7evnE43NNNJQ2LF3+0=
 golang.org/x/sys v0.0.0-20180823144017-11551d06cbcc/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
 golang.org/x/sys v0.0.0-20220811171246-fbc7d0a398ab/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/sys v0.12.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
-golang.org/x/sys v0.40.0 h1:DBZZqJ2Rkml6QMQsZywtnjnnGvHza6BTfYFWY9kjEWQ=
-golang.org/x/sys v0.40.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
+golang.org/x/sys v0.42.0 h1:omrd2nAlyT5ESRdCLYdm3+fMfNFE/+Rf4bDIQImRJeo=
+golang.org/x/sys v0.42.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
-golang.org/x/text v0.33.0 h1:B3njUFyqtHDUI5jMn1YIr5B0IE2U0qck04r6d4KPAxE=
-golang.org/x/text v0.33.0/go.mod h1:LuMebE6+rBincTi9+xWTY8TztLzKHc/9C1uBCG27+q8=
+golang.org/x/text v0.35.0 h1:JOVx6vVDFokkpaq1AEptVzLTpDe9KGpj5tR4/X+ybL8=
+golang.org/x/text v0.35.0/go.mod h1:khi/HExzZJ2pGnjenulevKNX1W67CUy0AsXcNubPGCA=
 golang.org/x/time v0.14.0 h1:MRx4UaLrDotUKUdCIqzPC48t1Y9hANFKIRpNx+Te8PI=
 golang.org/x/time v0.14.0/go.mod h1:eL/Oa2bBBK0TkX57Fyni+NgnyQQN4LitPmob2Hjnqw4=
-golang.org/x/tools v0.40.0 h1:yLkxfA+Qnul4cs9QA3KnlFu0lVmd8JJfoq+E41uSutA=
-golang.org/x/tools v0.40.0/go.mod h1:Ik/tzLRlbscWpqqMRjyWYDisX8bG13FrdXp3o4Sr9lc=
+golang.org/x/tools v0.42.0 h1:uNgphsn75Tdz5Ji2q36v/nsFSfR/9BRFvqhGBaJGd5k=
+golang.org/x/tools v0.42.0/go.mod h1:Ma6lCIwGZvHK6XtgbswSoWroEkhugApmsXyrUmBhfr0=
 gonum.org/v1/gonum v0.16.0 h1:5+ul4Swaf3ESvrOnidPp4GZbzf0mxVQpDCYUQE7OJfk=
 gonum.org/v1/gonum v0.16.0/go.mod h1:fef3am4MQ93R2HHpKnLk4/Tbh/s0+wqD5nfa6Pnwy4E=
 google.golang.org/genproto/googleapis/api v0.0.0-20260128011058-8636f8732409 h1:merA0rdPeUV3YIIfHHcH4qBkiQAc1nfCKSI7lB4cV2M=