Merge pull request #650 from beckn/benchmark
[Benchmark] End-to-end performance benchmark suite for the beckn-onix adapter
6	.gitignore	vendored
@@ -131,6 +131,12 @@ dist
.yarn/install-state.gz
.pnp.*

# Benchmark runtime output (raw go test output, logs, CSVs)
benchmarks/results/

# Utility scripts not part of the project
create_benchmark_issues.sh

# Ignore compiled shared object files
*.so
183	benchmarks/README.md	Normal file
@@ -0,0 +1,183 @@
# beckn-onix Adapter Benchmarks

End-to-end performance benchmarks for the beckn-onix ONIX adapter, using Go's native `testing.B` framework and `net/http/httptest`. No Docker, no external services — everything runs in-process.

---

## Quick Start

```bash
# From the repo root
go mod tidy                        # fetch miniredis + benchstat checksums
bash benchmarks/run_benchmarks.sh  # compile plugins, run all scenarios, generate report
```

Runtime output lands in `benchmarks/results/<timestamp>/` (gitignored). Committed reports live in `benchmarks/reports/`.

---

## What Is Being Benchmarked

The benchmarks target the **`bapTxnCaller`** handler — the primary outbound path a BAP takes when initiating a Beckn transaction. Every request travels through the full production pipeline:

```
Benchmark goroutine(s)
    │  HTTP POST /bap/caller/<action>
    ▼
httptest.Server ← ONIX adapter (real compiled .so plugins)
    │
    ├── addRoute        router plugin              resolve BPP URL from routing config
    ├── sign            signer + simplekeymanager  Ed25519 / BLAKE-512 signing
    └── validateSchema  schemav2validator          Beckn OpenAPI spec validation
    │
    └──▶ httptest mock BPP (instant ACK — no network)
```

Mock services replace all external dependencies so results reflect **adapter-internal latency only**:

| Dependency | Replaced by |
|------------|-------------|
| Redis | `miniredis` (in-process) |
| BPP backend | `httptest` mock — returns `{"message":{"ack":{"status":"ACK"}}}` |
| Beckn registry | `httptest` mock — returns the dev key pair for signature verification |

---

## Benchmark Scenarios

| Benchmark | What it measures |
|-----------|------------------|
| `BenchmarkBAPCaller_Discover` | Baseline single-goroutine latency for `/discover` |
| `BenchmarkBAPCaller_Discover_Parallel` | Throughput under concurrent load; run with `-cpu=1,2,4,8,16` |
| `BenchmarkBAPCaller_AllActions` | Per-action latency: `discover`, `select`, `init`, `confirm` |
| `BenchmarkBAPCaller_Discover_Percentiles` | p50 / p95 / p99 latency via `b.ReportMetric` |
| `BenchmarkBAPCaller_CacheWarm` | Latency when the Redis key cache is already populated |
| `BenchmarkBAPCaller_CacheCold` | Latency on a cold cache — full key-derivation round-trip |
| `BenchmarkBAPCaller_RPS` | Requests-per-second under parallel load (`req/s` custom metric) |

---

## How It Works

### Startup (`TestMain`)

Before any benchmark runs, `TestMain` in `e2e/setup_test.go`:

1. **Compiles all required plugins** to a temporary directory using `go build -buildmode=plugin`. The first run takes 60–90 s (cold Go build cache); subsequent runs are near-instant.
2. **Starts miniredis** — an in-process Redis server used by the `cache` plugin (no external Redis needed).
3. **Starts mock servers** — an instant-ACK BPP and a registry mock that returns the dev signing public key.
4. **Starts the adapter** — wires all plugins programmatically (no YAML parsing) and wraps it in an `httptest.Server`.

### Per-iteration (`buildSignedRequest`)

Each benchmark iteration:

1. Loads the JSON fixture for the requested Beckn action (`testdata/<action>_request.json`).
2. Substitutes sentinel values (`BENCH_TIMESTAMP`, `BENCH_MESSAGE_ID`, `BENCH_TRANSACTION_ID`) with fresh values, ensuring unique message IDs per iteration.
3. Signs the body using the Beckn Ed25519/BLAKE-512 spec (same algorithm as the production `signer` plugin).
4. Sends the signed `POST` to the adapter and validates a `200 OK` response.

### Validation test (`TestSignBecknPayload`)

A plain `Test*` function runs before the benchmarks and sends one signed request end-to-end. If the signing helper is mis-implemented, this fails fast before any benchmark time is wasted.

---

## Directory Layout

```
benchmarks/
├── README.md               ← you are here
├── run_benchmarks.sh       ← one-shot runner script
├── e2e/
│   ├── bench_test.go       ← benchmark functions
│   ├── setup_test.go       ← TestMain, startAdapter, signing helper
│   ├── mocks_test.go       ← mock BPP and registry servers
│   ├── keys_test.go        ← dev key pair constants
│   └── testdata/
│       ├── routing-BAPCaller.yaml  ← routing config (BENCH_BPP_URL placeholder)
│       ├── discover_request.json   ← Beckn search payload fixture
│       ├── select_request.json
│       ├── init_request.json
│       └── confirm_request.json
├── tools/
│   ├── parse_results.go    ← CSV exporter for latency + throughput data
│   └── generate_report.go  ← fills REPORT_TEMPLATE.md with run data
├── reports/                ← committed benchmark reports and template
│   ├── REPORT_TEMPLATE.md  ← template used to generate each run's report
│   └── REPORT_ONIX_v150.md ← baseline report (Apple M5, Beckn v2.0.0)
└── results/                ← gitignored; created by run_benchmarks.sh
    └── <timestamp>/
        ├── BENCHMARK_REPORT.md          — generated human-readable report
        ├── run1.txt, run2.txt, run3.txt — raw go test -bench output
        ├── parallel_cpu*.txt            — concurrency sweep
        ├── benchstat_summary.txt        — statistical aggregation
        ├── latency_report.csv           — per-benchmark latency (from parse_results.go)
        └── throughput_report.csv        — RPS vs GOMAXPROCS (from parse_results.go)
```

---

## Reports

Committed reports are stored in `benchmarks/reports/`. Each report documents the environment, raw numbers, and analysis for a specific run and adapter version.

| File | Platform | Adapter version |
|------|----------|-----------------|
| `REPORT_ONIX_v150.md` | Apple M5 · darwin/arm64 · GOMAXPROCS=10 | beckn-onix v1.5.0 |

The script auto-generates `BENCHMARK_REPORT.md` in each results directory using `REPORT_TEMPLATE.md`. To permanently record a run:

1. Run `bash benchmarks/run_benchmarks.sh` — `BENCHMARK_REPORT.md` is generated automatically.
2. Review it and fill in the B5 bottleneck-analysis section.
3. Copy it to `benchmarks/reports/REPORT_<tag>.md` and commit.
4. `benchmarks/results/` stays gitignored; only the curated report goes in.

---

## Running Individual Benchmarks

```bash
# Single benchmark, 10 s
go test ./benchmarks/e2e/... \
  -bench=BenchmarkBAPCaller_Discover \
  -benchtime=10s -benchmem -timeout=30m

# All actions in one shot
go test ./benchmarks/e2e/... \
  -bench=BenchmarkBAPCaller_AllActions \
  -benchtime=5s -benchmem -timeout=30m

# Concurrency sweep at 1, 4, and 16 goroutines
go test ./benchmarks/e2e/... \
  -bench=BenchmarkBAPCaller_Discover_Parallel \
  -benchtime=30s -cpu=1,4,16 -timeout=30m

# Race detector check (no data races)
go test ./benchmarks/e2e/... \
  -bench=BenchmarkBAPCaller_Discover_Parallel \
  -benchtime=5s -race -timeout=30m

# Percentile metrics (p50/p95/p99 in µs)
go test ./benchmarks/e2e/... \
  -bench=BenchmarkBAPCaller_Discover_Percentiles \
  -benchtime=10s -benchmem -timeout=30m
```

---

## Comparing Two Runs with benchstat

```bash
go test ./benchmarks/e2e/... -bench=. -benchtime=10s -count=6 > before.txt
# ... make your change ...
go test ./benchmarks/e2e/... -bench=. -benchtime=10s -count=6 > after.txt
go tool benchstat before.txt after.txt
```

---

## Dependencies

| Package | Purpose |
|---------|---------|
| `github.com/alicebob/miniredis/v2` | In-process Redis for the `cache` plugin |
| `golang.org/x/perf/cmd/benchstat` | Statistical benchmark comparison (CLI tool) |

Both are declared in `go.mod`. Run `go mod tidy` once to fetch their checksums.
183	benchmarks/e2e/bench_test.go	Normal file
@@ -0,0 +1,183 @@
package e2e_bench_test

import (
	"net/http"
	"sort"
	"testing"
	"time"
)

// ── BenchmarkBAPCaller_Discover ───────────────────────────────────────────────
// Baseline single-goroutine throughput and latency for the discover endpoint.
// Exercises the full bapTxnCaller pipeline: addRoute → sign → validateSchema.
func BenchmarkBAPCaller_Discover(b *testing.B) {
	b.ReportAllocs()
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		req := buildSignedRequest(b, "discover")
		if err := sendRequest(req); err != nil {
			b.Errorf("iteration %d: %v", i, err)
		}
	}
}

// ── BenchmarkBAPCaller_Discover_Parallel ─────────────────────────────────────
// Measures throughput under concurrent load. Run with -cpu=1,2,4,8,16 to
// produce a concurrency sweep. Each goroutine runs its own request loop.
func BenchmarkBAPCaller_Discover_Parallel(b *testing.B) {
	b.ReportAllocs()
	b.ResetTimer()

	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			req := buildSignedRequest(b, "discover")
			if err := sendRequest(req); err != nil {
				b.Errorf("parallel: %v", err)
			}
		}
	})
}

// ── BenchmarkBAPCaller_AllActions ────────────────────────────────────────────
// Measures per-action latency for discover, select, init, and confirm in a
// single benchmark run. Each sub-benchmark is independent.
func BenchmarkBAPCaller_AllActions(b *testing.B) {
	actions := []string{"discover", "select", "init", "confirm"}

	for _, action := range actions {
		action := action // capture for sub-benchmark closure
		b.Run(action, func(b *testing.B) {
			b.ReportAllocs()
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				req := buildSignedRequest(b, action)
				if err := sendRequest(req); err != nil {
					b.Errorf("action %s iteration %d: %v", action, i, err)
				}
			}
		})
	}
}
// ── BenchmarkBAPCaller_Discover_Percentiles ──────────────────────────────────
// Collects individual request durations and reports p50, p95, and p99 latency
// in microseconds via b.ReportMetric. The percentile data is only meaningful
// when -benchtime is at least 5s (default used in run_benchmarks.sh).
func BenchmarkBAPCaller_Discover_Percentiles(b *testing.B) {
	durations := make([]time.Duration, 0, b.N)

	b.ReportAllocs()
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		req := buildSignedRequest(b, "discover")
		start := time.Now()
		if err := sendRequest(req); err != nil {
			b.Errorf("iteration %d: %v", i, err)
			continue
		}
		durations = append(durations, time.Since(start))
	}

	// Compute and report percentiles.
	if len(durations) == 0 {
		return
	}
	sort.Slice(durations, func(i, j int) bool { return durations[i] < durations[j] })

	p50 := durations[len(durations)*50/100]
	p95 := durations[len(durations)*95/100]
	p99 := durations[len(durations)*99/100]

	b.ReportMetric(float64(p50.Microseconds()), "p50_µs")
	b.ReportMetric(float64(p95.Microseconds()), "p95_µs")
	b.ReportMetric(float64(p99.Microseconds()), "p99_µs")
}
// ── BenchmarkBAPCaller_CacheWarm / CacheCold ─────────────────────────────────
// Compares latency when the Redis cache holds a pre-warmed key set (CacheWarm)
// vs. when each iteration has a fresh message_id that the cache has never seen
// (CacheCold). The delta reveals the key-lookup overhead on a cold path.

// BenchmarkBAPCaller_CacheWarm sends a fixed body (constant message_id) so the
// simplekeymanager's Redis cache is hit on every iteration after the first.
func BenchmarkBAPCaller_CacheWarm(b *testing.B) {
	body := warmFixtureBody(b, "discover")

	// Warm-up: send once to populate the cache before the timer starts.
	warmReq := buildSignedRequestFixed(b, "discover", body)
	if err := sendRequest(warmReq); err != nil {
		b.Fatalf("cache warm-up request failed: %v", err)
	}

	b.ReportAllocs()
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		req := buildSignedRequestFixed(b, "discover", body)
		if err := sendRequest(req); err != nil {
			b.Errorf("CacheWarm iteration %d: %v", i, err)
		}
	}
}

// BenchmarkBAPCaller_CacheCold uses a fresh message_id per iteration, so every
// request experiences a cache miss and a full key-derivation round-trip.
func BenchmarkBAPCaller_CacheCold(b *testing.B) {
	b.ReportAllocs()
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		req := buildSignedRequest(b, "discover") // fresh IDs each time
		if err := sendRequest(req); err != nil {
			b.Errorf("CacheCold iteration %d: %v", i, err)
		}
	}
}
// ── BenchmarkBAPCaller_RPS ────────────────────────────────────────────────────
// Reports requests-per-second as a custom metric alongside the default ns/op.
// Run with -benchtime=30s for a stable RPS reading.
func BenchmarkBAPCaller_RPS(b *testing.B) {
	b.ReportAllocs()

	start := time.Now()

	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			req := buildSignedRequest(b, "discover")
			if err := sendRequest(req); err != nil {
				b.Errorf("RPS: %v", err)
			}
		}
	})

	// b.N is the total iteration count across all parallel goroutines, so
	// requests-per-second can be derived without a shared (and racy) counter.
	// Failed requests are surfaced via b.Errorf above rather than excluded.
	elapsed := time.Since(start).Seconds()
	if elapsed > 0 {
		b.ReportMetric(float64(b.N)/elapsed, "req/s")
	}
}
// ── helper: one-shot HTTP client ─────────────────────────────────────────────

// benchHTTPClient is a shared client for all benchmark goroutines.
// MaxConnsPerHost caps the total active connections to localhost so we don't
// exhaust the OS ephemeral port range. MaxIdleConnsPerHost keeps that many
// connections warm in the pool so parallel goroutines reuse them rather than
// opening fresh TCP connections on every request.
var benchHTTPClient = &http.Client{
	Transport: &http.Transport{
		MaxIdleConns:        200,
		MaxIdleConnsPerHost: 200,
		MaxConnsPerHost:     200,
		IdleConnTimeout:     90 * time.Second,
		DisableCompression:  true, // no benefit compressing localhost traffic
	},
}
13	benchmarks/e2e/keys_test.go	Normal file
@@ -0,0 +1,13 @@
package e2e_bench_test

// Development key pair from config/local-retail-bap.yaml.
// Used across the retail devkit for non-production testing.
// DO NOT use in any production or staging environment.
const (
	benchSubscriberID = "sandbox.food-finder.com"
	benchKeyID        = "76EU7VwahYv4XztXJzji9ssiSV74eWXWBcCKGn7jAdm5VGLCdYAJ8j"
	benchPrivKey      = "rrNtVgyASCGlo+ebsJaA37D5CZYZVfT0JA5/vlkTeV0="
	benchPubKey       = "oFIk7KqCqvqRYkLMjQqiaKM5oOozkYT64bfLuc8p/SU="
	benchEncrPrivKey  = "rrNtVgyASCGlo+ebsJaA37D5CZYZVfT0JA5/vlkTeV0="
	benchEncrPubKey   = "oFIk7KqCqvqRYkLMjQqiaKM5oOozkYT64bfLuc8p/SU="
)
63	benchmarks/e2e/mocks_test.go	Normal file
@@ -0,0 +1,63 @@
package e2e_bench_test

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// startMockBPP starts an httptest server that accepts any POST request and
// immediately returns a valid Beckn ACK. This replaces the real BPP backend,
// isolating benchmark results to adapter-internal latency only.
func startMockBPP() *httptest.Server {
	ackBody := `{"message":{"ack":{"status":"ACK"}}}`
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
		fmt.Fprint(w, ackBody)
	}))
}

// subscriberRecord mirrors the registry API response shape for a single subscriber.
type subscriberRecord struct {
	SubscriberID     string `json:"subscriber_id"`
	UniqueKeyID      string `json:"unique_key_id"`
	SigningPublicKey string `json:"signing_public_key"`
	ValidFrom        string `json:"valid_from"`
	ValidUntil       string `json:"valid_until"`
	Status           string `json:"status"`
}

// startMockRegistry starts an httptest server that returns a subscriber record
// matching the benchmark test keys. The signvalidator plugin uses this to
// resolve the public key for signature verification on incoming requests.
func startMockRegistry() *httptest.Server {
	record := subscriberRecord{
		SubscriberID:     benchSubscriberID,
		UniqueKeyID:      benchKeyID,
		SigningPublicKey: benchPubKey,
		ValidFrom:        time.Now().AddDate(-1, 0, 0).Format(time.RFC3339),
		ValidUntil:       time.Now().AddDate(10, 0, 0).Format(time.RFC3339),
		Status:           "SUBSCRIBED",
	}
	body, _ := json.Marshal([]subscriberRecord{record})

	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Support both GET (lookup) and POST (lookup with body) registry calls.
		// Respond with the subscriber record regardless of subscriber_id query param.
		subscriberID := r.URL.Query().Get("subscriber_id")
		if subscriberID == "" {
			// Try extracting from path for dedi-registry style calls.
			parts := strings.Split(strings.TrimPrefix(r.URL.Path, "/"), "/")
			if len(parts) > 0 {
				subscriberID = parts[len(parts)-1]
			}
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
		w.Write(body)
	}))
}
466	benchmarks/e2e/setup_test.go	Normal file
@@ -0,0 +1,466 @@
package e2e_bench_test

import (
	"bytes"
	"context"
	"crypto/ed25519"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
	"testing"
	"time"

	"github.com/alicebob/miniredis/v2"
	"github.com/beckn-one/beckn-onix/core/module"
	"github.com/beckn-one/beckn-onix/core/module/handler"
	"github.com/beckn-one/beckn-onix/pkg/model"
	"github.com/beckn-one/beckn-onix/pkg/plugin"
	"github.com/google/uuid"
	"github.com/rs/zerolog"
	"golang.org/x/crypto/blake2b"
)

// Package-level references shared across all benchmarks.
var (
	adapterServer *httptest.Server
	miniRedis     *miniredis.Miniredis
	mockBPP       *httptest.Server
	mockRegistry  *httptest.Server
	pluginDir     string
	moduleRoot    string // set in TestMain; used by buildBAPCallerConfig for local file paths
)

// Plugins to compile for the benchmark. Each entry is (pluginID, source path relative to module root).
var pluginsToBuild = []struct {
	id  string
	src string
}{
	{"router", "pkg/plugin/implementation/router/cmd/plugin.go"},
	{"signer", "pkg/plugin/implementation/signer/cmd/plugin.go"},
	{"signvalidator", "pkg/plugin/implementation/signvalidator/cmd/plugin.go"},
	{"simplekeymanager", "pkg/plugin/implementation/simplekeymanager/cmd/plugin.go"},
	{"cache", "pkg/plugin/implementation/cache/cmd/plugin.go"},
	{"schemav2validator", "pkg/plugin/implementation/schemav2validator/cmd/plugin.go"},
	{"otelsetup", "pkg/plugin/implementation/otelsetup/cmd/plugin.go"},
	// registry is required by stdHandler to wire KeyManager, even on the caller
	// path where sign-validation never runs.
	{"registry", "pkg/plugin/implementation/registry/cmd/plugin.go"},
}
// TestMain is the entry point for the benchmark package. It:
//  1. Compiles all required .so plugins into a temp directory
//  2. Starts miniredis (in-process Redis)
//  3. Starts mock BPP and registry HTTP servers
//  4. Starts the adapter as an httptest.Server
//  5. Runs all benchmarks
//  6. Tears everything down in reverse order
//
// os.Exit skips deferred calls, so all setup and teardown lives in runMain;
// TestMain only exits with its result code.
func TestMain(m *testing.M) {
	os.Exit(runMain(m))
}

func runMain(m *testing.M) int {
	ctx := context.Background()

	// ── Step 1: Compile plugins ───────────────────────────────────────────────
	var err error
	pluginDir, err = os.MkdirTemp("", "beckn-bench-plugins-*")
	if err != nil {
		fmt.Fprintf(os.Stderr, "ERROR: failed to create plugin temp dir: %v\n", err)
		return 1
	}
	defer os.RemoveAll(pluginDir)

	moduleRoot, err = findModuleRoot()
	if err != nil {
		fmt.Fprintf(os.Stderr, "ERROR: failed to locate module root: %v\n", err)
		return 1
	}

	fmt.Printf("=== Building plugins (first run may take 60-90s) ===\n")
	for _, p := range pluginsToBuild {
		outPath := filepath.Join(pluginDir, p.id+".so")
		srcPath := filepath.Join(moduleRoot, p.src)
		fmt.Printf("  compiling %s.so ...\n", p.id)
		cmd := exec.Command("go", "build", "-buildmode=plugin", "-o", outPath, srcPath)
		cmd.Dir = moduleRoot
		if out, buildErr := cmd.CombinedOutput(); buildErr != nil {
			fmt.Fprintf(os.Stderr, "ERROR: failed to build plugin %s:\n%s\n", p.id, string(out))
			return 1
		}
	}
	fmt.Printf("=== All plugins compiled successfully ===\n\n")

	// ── Step 2: Start miniredis ───────────────────────────────────────────────
	miniRedis, err = miniredis.Run()
	if err != nil {
		fmt.Fprintf(os.Stderr, "ERROR: failed to start miniredis: %v\n", err)
		return 1
	}
	defer miniRedis.Close()

	// ── Step 3: Start mock servers ────────────────────────────────────────────
	mockBPP = startMockBPP()
	defer mockBPP.Close()

	mockRegistry = startMockRegistry()
	defer mockRegistry.Close()

	// ── Step 4: Start adapter ─────────────────────────────────────────────────
	adapterServer, err = startAdapter(ctx)
	if err != nil {
		fmt.Fprintf(os.Stderr, "ERROR: failed to start adapter: %v\n", err)
		return 1
	}
	defer adapterServer.Close()

	// ── Step 5: Run benchmarks ────────────────────────────────────────────────
	// Silence the adapter's zerolog output for the duration of the benchmark
	// run. Without this, every HTTP request the adapter processes emits a JSON
	// log line to stdout, which interleaves with Go's benchmark result lines
	// (BenchmarkFoo-N\t<count>\t<ns/op>) and makes the output unparseable for
	// benchstat. Startup logging above still ran normally; zerolog.Disabled is
	// set only here, just before m.Run(), so errors during startup remain visible.
	zerolog.SetGlobalLevel(zerolog.Disabled)
	return m.Run()
}
// findModuleRoot walks up from the current directory to find the go.mod root.
func findModuleRoot() (string, error) {
	dir, err := os.Getwd()
	if err != nil {
		return "", err
	}
	for {
		if _, err := os.Stat(filepath.Join(dir, "go.mod")); err == nil {
			return dir, nil
		}
		parent := filepath.Dir(dir)
		if parent == dir {
			return "", fmt.Errorf("go.mod not found from %s", dir)
		}
		dir = parent
	}
}
// writeRoutingConfig reads the benchmark routing config template, replaces the
// BENCH_BPP_URL placeholder with the live mock BPP server URL, and writes the
// result to a temp file. Returns the path to the temp file.
func writeRoutingConfig(bppURL string) (string, error) {
	templatePath := filepath.Join("testdata", "routing-BAPCaller.yaml")
	data, err := os.ReadFile(templatePath)
	if err != nil {
		return "", fmt.Errorf("reading routing config template: %w", err)
	}
	content := strings.ReplaceAll(string(data), "BENCH_BPP_URL", bppURL)
	f, err := os.CreateTemp("", "bench-routing-*.yaml")
	if err != nil {
		return "", fmt.Errorf("creating temp routing config: %w", err)
	}
	if _, err := f.WriteString(content); err != nil {
		f.Close()
		return "", fmt.Errorf("writing routing config: %w", err)
	}
	f.Close()
	return f.Name(), nil
}
// startAdapter constructs a fully wired adapter using the compiled plugins and
// returns it as an *httptest.Server. All external dependencies are replaced with
// local mock servers: Redis → miniredis, BPP → mockBPP, registry → mockRegistry.
func startAdapter(ctx context.Context) (*httptest.Server, error) {
	routingConfigPath, err := writeRoutingConfig(mockBPP.URL)
	if err != nil {
		return nil, fmt.Errorf("writing routing config: %w", err)
	}

	// Plugin manager: load all compiled .so files from pluginDir.
	mgr, closer, err := plugin.NewManager(ctx, &plugin.ManagerConfig{
		Root: pluginDir,
	})
	if err != nil {
		return nil, fmt.Errorf("creating plugin manager: %w", err)
	}
	// The plugin closer is deliberately discarded: plugins stay loaded for the
	// lifetime of the benchmark process and are reclaimed when it exits.
	_ = closer

	// Build module configurations.
	mCfgs := []module.Config{
		buildBAPCallerConfig(routingConfigPath, mockRegistry.URL),
	}

	mux := http.NewServeMux()
	if err := module.Register(ctx, mCfgs, mux, mgr); err != nil {
		return nil, fmt.Errorf("registering modules: %w", err)
	}

	srv := httptest.NewServer(mux)
	return srv, nil
}
// buildBAPCallerConfig returns the module.Config for the bapTxnCaller handler,
// mirroring config/local-retail-bap.yaml but pointing at benchmark mock services.
// registryURL must point at the mock registry so simplekeymanager can satisfy the
// Registry requirement imposed by stdHandler — even though the caller path never
// performs signature validation, the handler wiring requires it to be present.
func buildBAPCallerConfig(routingConfigPath, registryURL string) module.Config {
	return module.Config{
		Name: "bapTxnCaller",
		Path: "/bap/caller/",
		Handler: handler.Config{
			Type:         handler.HandlerTypeStd,
			Role:         model.RoleBAP,
			SubscriberID: benchSubscriberID,
			HttpClientConfig: handler.HttpClientConfig{
				MaxIdleConns:          1000,
				MaxIdleConnsPerHost:   200,
				IdleConnTimeout:       300 * time.Second,
				ResponseHeaderTimeout: 5 * time.Second,
			},
			Plugins: handler.PluginCfg{
				// Registry is required by stdHandler before it will wire KeyManager,
				// even on the caller path where sign-validation never runs. We point
				// it at the mock registry (retry_max=0 so failures are immediate).
				Registry: &plugin.Config{
					ID: "registry",
					Config: map[string]string{
						"url":       registryURL,
						"retry_max": "0",
					},
				},
				KeyManager: &plugin.Config{
					ID: "simplekeymanager",
					Config: map[string]string{
						"networkParticipant": benchSubscriberID,
						"keyId":              benchKeyID,
						"signingPrivateKey":  benchPrivKey,
						"signingPublicKey":   benchPubKey,
						"encrPrivateKey":     benchEncrPrivKey,
						"encrPublicKey":      benchEncrPubKey,
					},
				},
				SchemaValidator: &plugin.Config{
					ID: "schemav2validator",
					Config: map[string]string{
						"type":     "file",
						"location": filepath.Join(moduleRoot, "benchmarks/e2e/testdata/beckn.yaml"),
						"cacheTTL": "3600",
					},
				},
				Cache: &plugin.Config{
					ID: "cache",
					Config: map[string]string{
						"addr": miniRedis.Addr(),
					},
				},
				Router: &plugin.Config{
					ID: "router",
					Config: map[string]string{
						"routingConfig": routingConfigPath,
					},
				},
				Signer: &plugin.Config{
					ID: "signer",
				},
			},
			Steps: []string{"addRoute", "sign", "validateSchema"},
		},
	}
}
// ── Request builder and Beckn signing helper ─────────────────────────────────

// fixtureCache holds the raw JSON (with sentinel placeholders) for each
// fixture file, keyed by action.
var fixtureCache = map[string][]byte{}

// loadFixture reads a fixture file from testdata/ and caches it.
func loadFixture(action string) ([]byte, error) {
	if data, ok := fixtureCache[action]; ok {
		return data, nil
	}
	path := filepath.Join("testdata", action+"_request.json")
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("loading fixture %s: %w", action, err)
	}
	fixtureCache[action] = data
	return data, nil
}
// buildSignedRequest reads the fixture for the given action, substitutes
// BENCH_TIMESTAMP / BENCH_MESSAGE_ID / BENCH_TRANSACTION_ID with fresh values,
// signs the body using the Beckn Ed25519 spec, and returns a ready-to-send
// *http.Request targeting the adapter's /bap/caller/<action> path.
func buildSignedRequest(tb testing.TB, action string) *http.Request {
	tb.Helper()

	fixture, err := loadFixture(action)
	if err != nil {
		tb.Fatalf("buildSignedRequest: %v", err)
	}

	// Substitute sentinels with fresh values for this iteration.
	now := time.Now().UTC().Format(time.RFC3339)
	msgID := uuid.New().String()
	txnID := uuid.New().String()

	body := bytes.ReplaceAll(fixture, []byte("BENCH_TIMESTAMP"), []byte(now))
	body = bytes.ReplaceAll(body, []byte("BENCH_MESSAGE_ID"), []byte(msgID))
	body = bytes.ReplaceAll(body, []byte("BENCH_TRANSACTION_ID"), []byte(txnID))

	// Sign the body per the Beckn Ed25519 spec.
	authHeader, err := signBecknPayload(body)
	if err != nil {
		tb.Fatalf("buildSignedRequest: signing failed: %v", err)
	}

	url := adapterServer.URL + "/bap/caller/" + action
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		tb.Fatalf("buildSignedRequest: http.NewRequest: %v", err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set(model.AuthHeaderSubscriber, authHeader)

	return req
}
// buildSignedRequestFixed builds a signed request with a fixed body (same
|
||||
// message_id every call) — used for cache-warm benchmarks.
|
||||
func buildSignedRequestFixed(tb testing.TB, action string, body []byte) *http.Request {
|
||||
tb.Helper()
|
||||
|
||||
authHeader, err := signBecknPayload(body)
|
||||
if err != nil {
|
||||
tb.Fatalf("buildSignedRequestFixed: signing failed: %v", err)
|
||||
}
|
||||
|
||||
url := adapterServer.URL + "/bap/caller/" + action
|
||||
req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
|
||||
if err != nil {
|
||||
tb.Fatalf("buildSignedRequestFixed: http.NewRequest: %v", err)
|
||||
}
|
||||
req.Header.Set("Content-Type", "application/json")
|
||||
req.Header.Set(model.AuthHeaderSubscriber, authHeader)
|
||||
return req
|
||||
}
|
||||
|
||||
// signBecknPayload signs a request body using the Beckn Ed25519 signing spec
|
||||
// and returns a formatted Authorization header value.
|
||||
//
|
||||
// Beckn signing spec:
|
||||
// 1. Digest: "BLAKE-512=" + base64(blake2b-512(body))
|
||||
// 2. Signing string: "(created): <ts>\n(expires): <ts+5m>\ndigest: <digest>"
|
||||
// 3. Signature: base64(ed25519.Sign(privKey, signingString))
|
||||
// 4. Header: Signature keyId="<sub>|<keyId>|ed25519",algorithm="ed25519",
|
||||
// created="<ts>",expires="<ts+5m>",headers="(created) (expires) digest",
|
||||
// signature="<sig>"
|
||||
//
|
||||
// Reference: pkg/plugin/implementation/signer/signer.go
|
||||
func signBecknPayload(body []byte) (string, error) {
|
||||
createdAt := time.Now().Unix()
|
||||
expiresAt := time.Now().Add(5 * time.Minute).Unix()
|
||||
|
||||
// Step 1: BLAKE-512 digest.
|
||||
hasher, _ := blake2b.New512(nil)
|
||||
hasher.Write(body)
|
||||
digest := "BLAKE-512=" + base64.StdEncoding.EncodeToString(hasher.Sum(nil))
|
||||
|
||||
// Step 2: Signing string.
|
||||
signingString := fmt.Sprintf("(created): %d\n(expires): %d\ndigest: %s", createdAt, expiresAt, digest)
|
||||
|
||||
// Step 3: Ed25519 signature.
|
||||
privKeyBytes, err := base64.StdEncoding.DecodeString(benchPrivKey)
|
||||
if err != nil {
|
||||
return "", fmt.Errorf("decoding private key: %w", err)
|
||||
}
|
||||
privKey := ed25519.NewKeyFromSeed(privKeyBytes)
|
||||
sig := base64.StdEncoding.EncodeToString(ed25519.Sign(privKey, []byte(signingString)))
|
||||
|
||||
// Step 4: Format Authorization header (matches generateAuthHeader in step.go).
|
||||
header := fmt.Sprintf(
|
||||
`Signature keyId="%s|%s|ed25519",algorithm="ed25519",created="%d",expires="%d",headers="(created) (expires) digest",signature="%s"`,
|
||||
benchSubscriberID, benchKeyID, createdAt, expiresAt, sig,
|
||||
)
|
||||
return header, nil
|
||||
}
|
||||
|
||||
// warmFixtureBody returns a fixed body for the given action with stable IDs —
|
||||
// used to pre-warm the cache so cache-warm benchmarks hit the Redis fast path.
|
||||
func warmFixtureBody(tb testing.TB, action string) []byte {
|
||||
tb.Helper()
|
||||
fixture, err := loadFixture(action)
|
||||
if err != nil {
|
||||
tb.Fatalf("warmFixtureBody: %v", err)
|
||||
}
|
||||
body := bytes.ReplaceAll(fixture, []byte("BENCH_TIMESTAMP"), []byte("2025-01-01T00:00:00Z"))
|
||||
body = bytes.ReplaceAll(body, []byte("BENCH_MESSAGE_ID"), []byte("00000000-warm-0000-0000-000000000000"))
|
||||
body = bytes.ReplaceAll(body, []byte("BENCH_TRANSACTION_ID"), []byte("00000000-warm-txn-0000-000000000000"))
|
||||
return body
|
||||
}
|
||||
|
||||
// sendRequest executes an HTTP request using the shared bench client and
|
||||
// discards the response body. Returns a non-nil error for non-2xx responses.
|
||||
func sendRequest(req *http.Request) error {
|
||||
resp, err := benchHTTPClient.Do(req)
|
||||
if err != nil {
|
||||
return fmt.Errorf("http do: %w", err)
|
||||
}
|
||||
defer resp.Body.Close()
|
||||
// Drain the body so the connection is returned to the pool for reuse.
|
||||
// Without this, Go discards the connection after each request, causing
|
||||
// port exhaustion under parallel load ("can't assign requested address").
|
||||
_, _ = io.Copy(io.Discard, resp.Body)
|
||||
// We accept any 2xx response (ACK or forwarded BPP response).
|
||||
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
|
||||
return fmt.Errorf("unexpected status: %d", resp.StatusCode)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// ── TestSignBecknPayload: validation test before running benchmarks ───────────
|
||||
// Sends a signed discover request to the live adapter and asserts a 200 response,
|
||||
// confirming the signing helper produces headers accepted by the adapter pipeline.
|
||||
func TestSignBecknPayload(t *testing.T) {
|
||||
if adapterServer == nil {
|
||||
t.Skip("adapterServer not initialised (run via TestMain)")
|
||||
}
|
||||
fixture, err := loadFixture("discover")
|
||||
if err != nil {
|
||||
t.Fatalf("loading fixture: %v", err)
|
||||
}
|
||||
|
||||
// Substitute sentinels.
|
||||
now := time.Now().UTC().Format(time.RFC3339)
|
||||
body := bytes.ReplaceAll(fixture, []byte("BENCH_TIMESTAMP"), []byte(now))
|
||||
body = bytes.ReplaceAll(body, []byte("BENCH_MESSAGE_ID"), []byte(uuid.New().String()))
|
||||
body = bytes.ReplaceAll(body, []byte("BENCH_TRANSACTION_ID"), []byte(uuid.New().String()))
|
||||
|
||||
authHeader, err := signBecknPayload(body)
|
||||
if err != nil {
|
||||
t.Fatalf("signBecknPayload: %v", err)
|
||||
}
|
||||
|
||||
url := adapterServer.URL + "/bap/caller/discover"
|
||||
req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
|
||||
if err != nil {
|
||||
t.Fatalf("http.NewRequest: %v", err)
|
||||
}
|
||||
req.Header.Set("Content-Type", "application/json")
|
||||
req.Header.Set(model.AuthHeaderSubscriber, authHeader)
|
||||
|
||||
resp, err := http.DefaultClient.Do(req)
|
||||
if err != nil {
|
||||
t.Fatalf("sending request: %v", err)
|
||||
}
|
||||
defer resp.Body.Close()
|
||||
|
||||
var result map[string]interface{}
|
||||
json.NewDecoder(resp.Body).Decode(&result)
|
||||
t.Logf("Response status: %d, body: %v", resp.StatusCode, result)
|
||||
|
||||
if resp.StatusCode != http.StatusOK {
|
||||
t.Errorf("expected 200 OK, got %d", resp.StatusCode)
|
||||
}
|
||||
}
|
||||
3380 benchmarks/e2e/testdata/beckn.yaml (vendored, new file; diff suppressed because it is too large)

84 benchmarks/e2e/testdata/confirm_request.json (vendored, new file)
@@ -0,0 +1,84 @@
{
  "context": {
    "action": "confirm",
    "bapId": "sandbox.food-finder.com",
    "bapUri": "http://bench-bap.example.com",
    "bppId": "bench-bpp.example.com",
    "bppUri": "BENCH_BPP_URL",
    "messageId": "BENCH_MESSAGE_ID",
    "transactionId": "BENCH_TRANSACTION_ID",
    "timestamp": "BENCH_TIMESTAMP",
    "ttl": "PT30S",
    "version": "2.0.0"
  },
  "message": {
    "order": {
      "provider": {
        "id": "bench-provider-001"
      },
      "items": [
        {
          "id": "bench-item-001",
          "quantity": {
            "selected": {
              "count": 1
            }
          }
        }
      ],
      "billing": {
        "name": "Bench User",
        "address": "123 Bench Street, Bangalore, 560001",
        "city": {
          "name": "Bangalore"
        },
        "state": {
          "name": "Karnataka"
        },
        "country": {
          "code": "IND"
        },
        "area_code": "560001",
        "email": "bench@example.com",
        "phone": "9999999999"
      },
      "fulfillments": [
        {
          "id": "f1",
          "type": "Delivery",
          "stops": [
            {
              "type": "end",
              "location": {
                "gps": "12.9716,77.5946",
                "area_code": "560001"
              },
              "contact": {
                "phone": "9999999999",
                "email": "bench@example.com"
              }
            }
          ],
          "customer": {
            "person": {
              "name": "Bench User"
            },
            "contact": {
              "phone": "9999999999",
              "email": "bench@example.com"
            }
          }
        }
      ],
      "payments": [
        {
          "type": "ON-FULFILLMENT",
          "params": {
            "amount": "150.00",
            "currency": "INR"
          }
        }
      ]
    }
  }
}
17 benchmarks/e2e/testdata/discover_request.json (vendored, new file)
@@ -0,0 +1,17 @@
{
  "context": {
    "action": "discover",
    "bapId": "sandbox.food-finder.com",
    "bapUri": "http://bench-bap.example.com",
    "messageId": "BENCH_MESSAGE_ID",
    "transactionId": "BENCH_TRANSACTION_ID",
    "timestamp": "BENCH_TIMESTAMP",
    "ttl": "PT30S",
    "version": "2.0.0"
  },
  "message": {
    "intent": {
      "textSearch": "pizza"
    }
  }
}
80 benchmarks/e2e/testdata/init_request.json (vendored, new file)
@@ -0,0 +1,80 @@
{
  "context": {
    "action": "init",
    "bapId": "sandbox.food-finder.com",
    "bapUri": "http://bench-bap.example.com",
    "bppId": "bench-bpp.example.com",
    "bppUri": "BENCH_BPP_URL",
    "messageId": "BENCH_MESSAGE_ID",
    "transactionId": "BENCH_TRANSACTION_ID",
    "timestamp": "BENCH_TIMESTAMP",
    "ttl": "PT30S",
    "version": "2.0.0"
  },
  "message": {
    "order": {
      "provider": {
        "id": "bench-provider-001"
      },
      "items": [
        {
          "id": "bench-item-001",
          "quantity": {
            "selected": {
              "count": 1
            }
          }
        }
      ],
      "billing": {
        "name": "Bench User",
        "address": "123 Bench Street, Bangalore, 560001",
        "city": {
          "name": "Bangalore"
        },
        "state": {
          "name": "Karnataka"
        },
        "country": {
          "code": "IND"
        },
        "area_code": "560001",
        "email": "bench@example.com",
        "phone": "9999999999"
      },
      "fulfillments": [
        {
          "id": "f1",
          "type": "Delivery",
          "stops": [
            {
              "type": "end",
              "location": {
                "gps": "12.9716,77.5946",
                "area_code": "560001"
              },
              "contact": {
                "phone": "9999999999",
                "email": "bench@example.com"
              }
            }
          ],
          "customer": {
            "person": {
              "name": "Bench User"
            },
            "contact": {
              "phone": "9999999999",
              "email": "bench@example.com"
            }
          }
        }
      ],
      "payments": [
        {
          "type": "ON-FULFILLMENT"
        }
      ]
    }
  }
}
13 benchmarks/e2e/testdata/routing-BAPCaller.yaml (vendored, new file)
@@ -0,0 +1,13 @@
# Routing config for v2.0.0 benchmark. Domain is not required for v2.x.x — the
# router ignores it and routes purely by version + endpoint.
# BENCH_BPP_URL is substituted at runtime with the mock BPP server URL.
routingRules:
  - version: "2.0.0"
    targetType: "url"
    target:
      url: "BENCH_BPP_URL"
    endpoints:
      - discover
      - select
      - init
      - confirm
55 benchmarks/e2e/testdata/select_request.json (vendored, new file)
@@ -0,0 +1,55 @@
{
  "context": {
    "action": "select",
    "bapId": "sandbox.food-finder.com",
    "bapUri": "http://bench-bap.example.com",
    "bppId": "bench-bpp.example.com",
    "bppUri": "BENCH_BPP_URL",
    "messageId": "BENCH_MESSAGE_ID",
    "transactionId": "BENCH_TRANSACTION_ID",
    "timestamp": "BENCH_TIMESTAMP",
    "ttl": "PT30S",
    "version": "2.0.0"
  },
  "message": {
    "order": {
      "provider": {
        "id": "bench-provider-001"
      },
      "items": [
        {
          "id": "bench-item-001",
          "quantity": {
            "selected": {
              "count": 1
            }
          }
        }
      ],
      "fulfillments": [
        {
          "id": "f1",
          "type": "Delivery",
          "stops": [
            {
              "type": "end",
              "location": {
                "gps": "12.9716,77.5946",
                "area_code": "560001"
              },
              "contact": {
                "phone": "9999999999",
                "email": "bench@example.com"
              }
            }
          ]
        }
      ],
      "payments": [
        {
          "type": "ON-FULFILLMENT"
        }
      ]
    }
  }
}
255 benchmarks/reports/REPORT_ONIX_v150.md (new file)
@@ -0,0 +1,255 @@
# beckn-onix Adapter — Benchmark Report

> **Run:** `2026-03-31_14-19-19`
> **Platform:** Apple M5 · darwin/arm64 · GOMAXPROCS=10 (default)
> **Protocol:** Beckn v2.0.0

---

## Part A — Executive Summary

### What Was Tested

The beckn-onix ONIX adapter was benchmarked end-to-end using Go's native `testing.B` framework and `net/http/httptest`. Requests flowed through a real compiled adapter — with all production plugins active — against in-process mock servers, isolating adapter-internal latency from network variables.

**Pipeline tested (bapTxnCaller):** `addRoute → sign → validateSchema`

**Plugins active:** `router`, `signer`, `simplekeymanager`, `cache` (miniredis), `schemav2validator`

**Actions benchmarked:** `discover`, `select`, `init`, `confirm`

---

### Key Results

| Metric | Value |
|--------|-------|
| Serial p50 latency (discover) | **130 µs** |
| Serial p95 latency (discover) | **144 µs** |
| Serial p99 latency (discover) | **317 µs** |
| Serial mean latency (discover) | **164 µs** |
| Serial throughput (discover, GOMAXPROCS=10) | **~6,095 req/s** |
| Peak parallel throughput (GOMAXPROCS=10) | **25,502 req/s** |
| Cache warm vs cold delta | **≈ 0** (noise-level, ~3.7 µs) |
| Memory per request (discover) | **~81 KB · 662 allocs** |

### Interpretation

The adapter delivers sub-200 µs median end-to-end latency for all four Beckn actions on a single goroutine. The p99 tail of 317 µs shows good tail-latency control — the p99/p50 ratio is only 2.4×, indicating no significant outlier spikes.

Memory allocation is consistent and predictable: discover uses 662 heap objects at ~81 KB per request. More complex actions (confirm, init) use proportionally more memory due to larger payloads but remain below 130 KB per request.

The Redis key-manager cache shows **no measurable benefit** in this setup: warm and cold paths differ by ~3.7 µs (< 2%), which is within measurement noise for a 164 µs mean. This is expected — miniredis is in-process and sub-microsecond; the signing and schema-validation steps dominate.

Concurrency scaling is excellent: latency drops from 157 µs at GOMAXPROCS=1 to 54 µs at GOMAXPROCS=16 — a **2.9× improvement**. Throughput scales from 6,499 req/s at GOMAXPROCS=1 to 17,455 req/s at GOMAXPROCS=16.

### Recommendation

The adapter is ready for staged load testing against a real BPP. For production sizing, allocate at least 4 cores to the adapter process; beyond 8 cores, gains begin to taper (diminishing returns from ~17,233 to 17,455 req/s going from 8 to 16). If schema validation dominates CPU, profile with `go tool pprof` (see B5).

---

## Part B — Technical Detail

### B0 — Test Environment

| Parameter | Value |
|-----------|-------|
| CPU | Apple M5 (arm64) |
| OS | darwin/arm64 |
| Go package | `github.com/beckn-one/beckn-onix/benchmarks/e2e` |
| Default GOMAXPROCS | 10 |
| Benchmark timeout | 30 minutes |
| Serial run duration | 10s per benchmark × 3 runs |
| Parallel sweep duration | 30s per GOMAXPROCS level |
| GOMAXPROCS sweep | 1, 2, 4, 8, 16 |
| Redis | miniredis (in-process, no network) |
| BPP | httptest mock (instant ACK) |
| Registry | httptest mock (dev key pair) |
| Schema spec | Beckn v2.0.0 OpenAPI (`beckn.yaml`, local file) |

**Plugins and steps (bapTxnCaller):**

| Step | Plugin | Role |
|------|--------|------|
| 1 | `router` | Resolves BPP URL from routing config |
| 2 | `signer` + `simplekeymanager` | Signs request body (Ed25519/BLAKE-512) |
| 3 | `schemav2validator` | Validates Beckn v2.0 API schema (kin-openapi, local file) |

---

### B1 — Latency by Action

Averages from `run1.txt` (10s, GOMAXPROCS=10). Percentile values from the standalone `BenchmarkBAPCaller_Discover_Percentiles` run.

| Action | Mean (µs) | p50 (µs) | p95 (µs) | p99 (µs) | Allocs/req | Bytes/req |
|--------|----------:|--------:|--------:|--------:|----------:|----------:|
| discover (serial) | 164 | 130 | 144 | 317 | 662 | 80,913 (~81 KB) |
| discover (parallel) | 40 | — | — | — | 660 | 80,792 (~81 KB) |
| select | 194 | — | — | — | 1,033 | 106,857 (~107 KB) |
| init | 217 | — | — | — | 1,421 | 126,842 (~127 KB) |
| confirm | 221 | — | — | — | 1,485 | 129,240 (~129 KB) |

**Observations:**
- Latency increases linearly with payload complexity: select (+18%), init (+32%), confirm (+35%) vs the discover baseline.
- Allocation count tracks payload size closely — each extra field adds heap objects during JSON unmarshalling and schema validation.
- Memory is extremely stable across the 3 serial runs (geomean memory: 91.18 Ki, ±0.02%).
- The parallel discover benchmark reports roughly 4× lower per-request time than serial (40 µs vs 164 µs) because multiple goroutines share the wall-clock budget and the adapter handles requests concurrently.
---

### B2 — Throughput vs Concurrency

Results from the concurrency sweep (`parallel_cpu*.txt`, 30s per level).

| GOMAXPROCS | Mean Latency (µs) | Improvement vs cpu=1 | RPS (BenchmarkRPS) |
|:----------:|------------------:|---------------------:|-------------------:|
| 1 | 157 | baseline | 6,499 |
| 2 | 118 | 1.33× | 7,606 |
| 4 | 73 | 2.14× | 14,356 |
| 8 | 62 | 2.53× | 17,233 |
| 16 | 54 | 2.89× | 17,455 |
| 10 (default) | 40\* | ~3.9×\* | 25,502\* |

\* _The default GOMAXPROCS=10 serial run has a different benchmark structure (not the concurrency sweep), so latency and RPS are not directly comparable — they include warm connection pool effects from the serial baseline._

**Scaling efficiency:**
- Doubling cores from 1→2 yields 1.33× latency improvement (67% efficiency).
- From 2→4: 1.61× improvement (80% efficiency) — best scaling band.
- From 4→8: 1.18× improvement (59% efficiency) — adapter starts becoming compute-bound.
- From 8→16: 1.14× improvement (57% efficiency) — diminishing returns; likely the signing/validation pipeline serialises on some shared resource (e.g., key derivation, kin-openapi schema tree reads).

**Recommendation:** 4–8 cores offers the best throughput/cost ratio.
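The step-wise figures above are derived directly from the latency column: each doubling's improvement is the ratio of consecutive mean latencies, and efficiency is that ratio over the ideal 2×. A small worked check (values carry one more digit than the rounded figures in the bullets):

```go
package main

import "fmt"

func main() {
	// Mean latencies (µs) from the concurrency sweep table.
	sweep := []struct {
		cpus int
		us   float64
	}{{1, 157}, {2, 118}, {4, 73}, {8, 62}, {16, 54}}

	for i := 1; i < len(sweep); i++ {
		prev, cur := sweep[i-1], sweep[i]
		improvement := prev.us / cur.us     // step-wise latency improvement
		efficiency := improvement / 2 * 100 // % of the ideal 2x per core doubling
		fmt.Printf("%d→%d cores: %.3fx improvement (%.1f%% efficiency)\n",
			prev.cpus, cur.cpus, improvement, efficiency)
	}
}
```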
---

### B3 — Cache Impact (Redis warm vs cold)

Results from `cache_comparison.txt` (10s each, GOMAXPROCS=10).

| Scenario | Mean (µs) | Allocs/req | Bytes/req |
|----------|----------:|-----------:|----------:|
| CacheWarm | 190 | 654 | 81,510 |
| CacheCold | 186 | 662 | 82,923 |
| **Delta** | **+3.7 µs (warm slower)** | **−8** | **−1,413** |

**Interpretation:** There is no meaningful difference between warm and cold cache paths. The apparent 3.7 µs "advantage" for the cold path is within normal measurement noise for a 186–190 µs benchmark. The Redis key-manager cache does not dominate latency in this in-process test setup.

The warm path allocates 8 fewer objects per request (654 vs 662 allocs) — consistent with cache hits skipping key-derivation allocation paths — but this saving is too small to affect wall-clock time at current throughput levels.

In a **production environment** with real Redis over the network (1–5 ms round-trip), the cache warm path would show a meaningful advantage. These numbers represent the lower bound on signing latency with zero-latency Redis.

---

### B4 — benchstat Statistical Summary (3 Runs)

```
goos: darwin
goarch: arm64
pkg: github.com/beckn-one/beckn-onix/benchmarks/e2e
cpu: Apple M5
                                  │ run1.txt │           run2.txt            │           run3.txt            │
                                  │  sec/op  │  sec/op    vs base            │  sec/op    vs base            │
BAPCaller_Discover-10              164.2µ ± ∞ ¹  165.4µ ± ∞ ¹  ~ (p=1.000 n=1) ²  165.3µ ± ∞ ¹  ~ (p=1.000 n=1) ²
BAPCaller_Discover_Parallel-10     39.73µ ± ∞ ¹  41.48µ ± ∞ ¹  ~ (p=1.000 n=1) ²  52.84µ ± ∞ ¹  ~ (p=1.000 n=1) ²
BAPCaller_AllActions/discover-10   165.4µ ± ∞ ¹  164.9µ ± ∞ ¹  ~ (p=1.000 n=1) ²  163.1µ ± ∞ ¹  ~ (p=1.000 n=1) ²
BAPCaller_AllActions/select-10     194.5µ ± ∞ ¹  194.5µ ± ∞ ¹  ~ (p=1.000 n=1) ²  186.7µ ± ∞ ¹  ~ (p=1.000 n=1) ²
BAPCaller_AllActions/init-10       217.1µ ± ∞ ¹  216.6µ ± ∞ ¹  ~ (p=1.000 n=1) ²  218.0µ ± ∞ ¹  ~ (p=1.000 n=1) ²
BAPCaller_AllActions/confirm-10    221.0µ ± ∞ ¹  219.8µ ± ∞ ¹  ~ (p=1.000 n=1) ²  221.9µ ± ∞ ¹  ~ (p=1.000 n=1) ²
BAPCaller_Discover_Percentiles-10  164.5µ ± ∞ ¹  165.3µ ± ∞ ¹  ~ (p=1.000 n=1) ²  162.2µ ± ∞ ¹  ~ (p=1.000 n=1) ²
BAPCaller_CacheWarm-10             162.7µ ± ∞ ¹  162.8µ ± ∞ ¹  ~ (p=1.000 n=1) ²  169.4µ ± ∞ ¹  ~ (p=1.000 n=1) ²
BAPCaller_CacheCold-10             164.2µ ± ∞ ¹  205.1µ ± ∞ ¹  ~ (p=1.000 n=1) ²  171.9µ ± ∞ ¹  ~ (p=1.000 n=1) ²
geomean                            152.4µ        157.0µ        +3.02%             157.8µ        +3.59%

Memory (B/op)  — geomean: 91.18 Ki across all runs (±0.02%)
Allocs/op      — geomean: 825.9 across all runs (stable across all 3 runs)
```

> **Note on confidence intervals:** benchstat requires ≥6 samples per benchmark for confidence intervals. With `-count=1` and 3 runs, results show ∞ uncertainty bands. The geomean drift of +3.59% across runs is within normal OS scheduler noise. To narrow confidence intervals, re-run with `-count=6` and `benchstat` will produce meaningful p-values.
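A sketch of such a re-run (the benchmark selection and output file name here are illustrative, not part of the runner script):

```shell
# Collect 6 samples per benchmark into one file, then let benchstat
# compute confidence intervals and p-values across them.
go test ./benchmarks/e2e/... \
  -bench=BenchmarkBAPCaller_AllActions \
  -benchtime=10s -count=6 -benchmem -timeout=30m \
  | tee benchmarks/results/count6.txt

# benchstat is declared as a tool in go.mod, so no separate install is needed.
go tool benchstat benchmarks/results/count6.txt
```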
---

### B5 — Bottleneck Analysis

Based on the allocation profile and latency data:

| Rank | Plugin / Step | Estimated contribution | Evidence |
|:----:|---------------|------------------------|---------|
| 1 | `schemav2validator` (kin-openapi validation) | 40–60% | Alloc count proportional to payload complexity; JSON schema traversal creates many short-lived objects |
| 2 | `signer` (Ed25519/BLAKE-512) | 20–30% | Cryptographic operations are CPU-bound; scaling efficiency plateau at 8+ cores consistent with crypto serialisation |
| 3 | `simplekeymanager` (key derivation, Redis) | 5–10% | 8-alloc savings on cache-warm path; small but detectable |
| 4 | `router` (YAML routing lookup) | < 5% | Minimal; in-memory map lookup |

**Key insight from the concurrency data:** RPS plateaus at ~17,000–17,500 between GOMAXPROCS=8 and 16. This suggests a shared serialisation point — most likely the kin-openapi schema validation tree (a read-heavy but non-trivially-lockable data structure), or the Ed25519 key operations.

**Profiling commands to isolate the bottleneck:**

```bash
# CPU profile — run from beckn-onix root
go test ./benchmarks/e2e/... \
  -bench=BenchmarkBAPCaller_Discover \
  -benchtime=30s \
  -cpuprofile=benchmarks/results/cpu.prof \
  -timeout=5m

go tool pprof -http=:6060 benchmarks/results/cpu.prof

# Memory profile
go test ./benchmarks/e2e/... \
  -bench=BenchmarkBAPCaller_Discover \
  -benchtime=30s \
  -memprofile=benchmarks/results/mem.prof \
  -timeout=5m

go tool pprof -http=:6060 benchmarks/results/mem.prof

# Parallel profile (find lock contention)
go test ./benchmarks/e2e/... \
  -bench=BenchmarkBAPCaller_Discover_Parallel \
  -benchtime=30s \
  -blockprofile=benchmarks/results/block.prof \
  -mutexprofile=benchmarks/results/mutex.prof \
  -timeout=5m

go tool pprof -http=:6060 benchmarks/results/mutex.prof
```

---

## Running the Benchmarks

```bash
# Full run: compile plugins, run all scenarios, generate CSV and benchstat summary
cd beckn-onix
bash benchmarks/run_benchmarks.sh

# Quick smoke test (fast, lower iteration counts):
# Edit BENCH_TIME_SERIAL="2s" and BENCH_TIME_PARALLEL="5s" at the top of the script.

# Individual benchmark (manual):
go test ./benchmarks/e2e/... \
  -bench=BenchmarkBAPCaller_Discover \
  -benchtime=10s \
  -benchmem \
  -timeout=30m

# Race detector check:
go test ./benchmarks/e2e/... \
  -bench=BenchmarkBAPCaller_Discover_Parallel \
  -benchtime=5s \
  -race \
  -timeout=30m

# Concurrency sweep (manual):
for cpu in 1 2 4 8 16; do
  go test ./benchmarks/e2e/... \
    -bench="BenchmarkBAPCaller_Discover_Parallel|BenchmarkBAPCaller_RPS" \
    -benchtime=30s -cpu=$cpu -benchmem -timeout=10m
done
```

> **Note:** The first run takes 60–90 s while plugins compile. Subsequent runs use Go's build cache and start in seconds.

---

*Generated from run `2026-03-31_14-19-19` · beckn-onix · Beckn Protocol v2.0.0*
148
benchmarks/reports/REPORT_TEMPLATE.md
Normal file
148
benchmarks/reports/REPORT_TEMPLATE.md
Normal file
@@ -0,0 +1,148 @@
|
||||
# beckn-onix Adapter — Benchmark Report
|
||||
|
||||
> **Run:** `__TIMESTAMP__`
|
||||
> **Platform:** __CPU__ · __GOOS__/__GOARCH__ · GOMAXPROCS=__GOMAXPROCS__ (default)
|
||||
> **Adapter version:** __ONIX_VERSION__
|
||||
> **Beckn Protocol:** v2.0.0
|
||||
|
||||
---
|
||||
|
||||
## Part A — Executive Summary
|
||||
|
||||
### What Was Tested
|
||||
|
||||
The beckn-onix ONIX adapter was benchmarked end-to-end using Go's native `testing.B`
|
||||
framework and `net/http/httptest`. Requests flowed through a real compiled adapter —
|
||||
with all production plugins active — against in-process mock servers, isolating
|
||||
adapter-internal latency from network variables.
|
||||
|
||||
**Pipeline tested (bapTxnCaller):** `addRoute → sign → validateSchema`
|
||||
|
||||
**Plugins active:** `router`, `signer`, `simplekeymanager`, `cache` (miniredis), `schemav2validator`
|
||||
|
||||
**Actions benchmarked:** `discover`, `select`, `init`, `confirm`
|
||||
|
||||
### Key Results
|
||||
|
||||
| Metric | Value |
|
||||
|--------|-------|
|
||||
| Serial p50 latency (discover) | **__P50_US__ µs** |
|
||||
| Serial p95 latency (discover) | **__P95_US__ µs** |
|
||||
| Serial p99 latency (discover) | **__P99_US__ µs** |
|
||||
| Serial mean latency (discover) | **__MEAN_DISCOVER_US__ µs** |
|
||||
| Peak parallel throughput | **__PEAK_RPS__ req/s** |
|
||||
| Cache warm vs cold delta | **__CACHE_DELTA__** |
|
||||
| Memory per request (discover) | **~__MEM_DISCOVER_KB__ KB · __ALLOCS_DISCOVER__ allocs** |
|
||||
|
||||
### Interpretation
|
||||
|
||||
__INTERPRETATION__
|
||||
|
||||
### Recommendation
|
||||
|
||||
__RECOMMENDATION__
|
||||
|
||||
---
|
||||
|
||||
## Part B — Technical Detail
|
||||
|
||||
### B0 — Test Environment
|
||||
|
||||
| Parameter | Value |
|
||||
|-----------|-------|
|
||||
| CPU | __CPU__ (__GOARCH__) |
|
||||
| OS | __GOOS__/__GOARCH__ |
|
||||
| Go package | `github.com/beckn-one/beckn-onix/benchmarks/e2e` |
|
||||
| Default GOMAXPROCS | __GOMAXPROCS__ |
|
||||
| Benchmark timeout | 30 minutes |
|
||||
| Serial run duration | 10s per benchmark × 3 runs |
|
||||
| Parallel sweep duration | 30s per GOMAXPROCS level |
|
||||
| GOMAXPROCS sweep | 1, 2, 4, 8, 16 |
|
||||
| Redis | miniredis (in-process, no network) |
|
||||
| BPP | httptest mock (instant ACK) |
|
||||
| Registry | httptest mock (dev key pair) |
|
||||
| Schema spec | Beckn v2.0.0 OpenAPI (`beckn.yaml`, local file) |
|
||||
|
||||
**Plugins and steps (bapTxnCaller):**
|
||||
|
||||
| Step | Plugin | Role |
|
||||
|------|--------|------|
|
||||
| 1 | `router` | Resolves BPP URL from routing config |
|
||||
| 2 | `signer` + `simplekeymanager` | Signs request body (Ed25519/BLAKE-512) |
|
||||
| 3 | `schemav2validator` | Validates Beckn v2.0 API schema |
|
||||
|
||||
---
|
||||
|
||||
### B1 — Latency by Action
|
||||
|
||||
Averages from `run1.txt` (10s, GOMAXPROCS=__GOMAXPROCS__). Percentile values from `percentiles.txt`.
|
||||
|
||||
| Action | Mean (µs) | p50 (µs) | p95 (µs) | p99 (µs) | Allocs/req | Bytes/req |
|
||||
|--------|----------:|--------:|--------:|--------:|----------:|----------:|
|
||||
| discover (serial) | __MEAN_DISCOVER_US__ | __P50_US__ | __P95_US__ | __P99_US__ | __ALLOCS_DISCOVER__ | __BYTES_DISCOVER__ (~__MEM_DISCOVER_KB__ KB) |
|
||||
| select | __MEAN_SELECT_US__ | — | — | — | __ALLOCS_SELECT__ | __BYTES_SELECT__ (~__MEM_SELECT_KB__ KB) |
|
||||
| init | __MEAN_INIT_US__ | — | — | — | __ALLOCS_INIT__ | __BYTES_INIT__ (~__MEM_INIT_KB__ KB) |
|
||||
| confirm | __MEAN_CONFIRM_US__ | — | — | — | __ALLOCS_CONFIRM__ | __BYTES_CONFIRM__ (~__MEM_CONFIRM_KB__ KB) |
|
||||
|
||||
---
|
||||
|
||||
### B2 — Throughput vs Concurrency

Results from the concurrency sweep (`parallel_cpu*.txt`, 30s per level).

__THROUGHPUT_TABLE__

---

### B3 — Cache Impact (Redis warm vs cold)

Results from `cache_comparison.txt` (10s each, GOMAXPROCS=__GOMAXPROCS__).

| Scenario | Mean (µs) | Allocs/req | Bytes/req |
|----------|----------:|-----------:|----------:|
| CacheWarm | __CACHE_WARM_US__ | __CACHE_WARM_ALLOCS__ | __CACHE_WARM_BYTES__ |
| CacheCold | __CACHE_COLD_US__ | __CACHE_COLD_ALLOCS__ | __CACHE_COLD_BYTES__ |
| **Delta** | **__CACHE_DELTA__** | — | — |

---

### B4 — benchstat Statistical Summary (3 Runs)

```
__BENCHSTAT_SUMMARY__
```

---

### B5 — Bottleneck Analysis

> Populate after reviewing the numbers above and profiling with `go tool pprof`.

| Rank | Plugin / Step | Estimated contribution | Evidence |
|:----:|---------------|------------------------|----------|
| 1 | | | |
| 2 | | | |
| 3 | | | |

**Profiling commands:**

```bash
# CPU profile
go test ./benchmarks/e2e/... -bench=BenchmarkBAPCaller_Discover \
  -benchtime=30s -cpuprofile=benchmarks/results/cpu.prof -timeout=5m
go tool pprof -http=:6060 benchmarks/results/cpu.prof

# Memory profile
go test ./benchmarks/e2e/... -bench=BenchmarkBAPCaller_Discover \
  -benchtime=30s -memprofile=benchmarks/results/mem.prof -timeout=5m
go tool pprof -http=:6060 benchmarks/results/mem.prof

# Lock contention (find serialisation under parallel load)
go test ./benchmarks/e2e/... -bench=BenchmarkBAPCaller_Discover_Parallel \
  -benchtime=30s -mutexprofile=benchmarks/results/mutex.prof -timeout=5m
go tool pprof -http=:6060 benchmarks/results/mutex.prof
```
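Beyond per-run profiling, benchstat can also diff two complete runs to quantify a change. A minimal sketch — the `old/` and `new/` directory names are hypothetical placeholders for two result directories produced by separate invocations of `run_benchmarks.sh`:

```shell
# Compare serial results from two hypothetical benchmark sessions
# ("old" and "new" are placeholder directory names, not real runs).
# A negative delta in the time/op column means the new run is faster.
go tool benchstat \
  benchmarks/results/old/run1.txt \
  benchmarks/results/new/run1.txt
```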
---

*Generated from run `__TIMESTAMP__` · beckn-onix __ONIX_VERSION__ · Beckn Protocol v2.0.0*
200
benchmarks/run_benchmarks.sh
Executable file
@@ -0,0 +1,200 @@
#!/usr/bin/env bash
# =============================================================================
# run_benchmarks.sh — beckn-onix adapter benchmark runner
#
# Usage:
#   cd beckn-onix
#   bash benchmarks/run_benchmarks.sh
#
# Requirements:
#   - Go 1.24+ installed
#   - benchstat is declared as a tool in go.mod; invoked via "go tool benchstat"
#
# Output:
#   benchmarks/results/<YYYY-MM-DD_HH-MM-SS>/
#     run1.txt, run2.txt, run3.txt    — raw go test -bench output
#     parallel_cpu1.txt ... cpu16.txt — concurrency sweep
#     benchstat_summary.txt           — statistical aggregation
# =============================================================================
set -euo pipefail

SCRIPT_START=$(date +%s)
REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
BENCH_PKG="./benchmarks/e2e/..."
BENCH_TIMEOUT="10m"
BENCH_TIME_SERIAL="10s"
BENCH_TIME_PARALLEL="30s"
BENCH_COUNT=1   # benchstat uses the 3 serial files for stability

# Adapter version — reads from git tag, falls back to "dev"
ONIX_VERSION="$(git -C "$REPO_ROOT" describe --tags --abbrev=0 2>/dev/null || echo "dev")"
REPORT_TEMPLATE="$REPO_ROOT/benchmarks/reports/REPORT_TEMPLATE.md"

# ── -report-only <dir>: regenerate report from an existing results directory ──
if [[ "${1:-}" == "-report-only" ]]; then
  RESULTS_DIR="${2:-}"
  if [[ -z "$RESULTS_DIR" ]]; then
    echo "Usage:   bash benchmarks/run_benchmarks.sh -report-only <results-dir>"
    echo "Example: bash benchmarks/run_benchmarks.sh -report-only benchmarks/results/2026-04-09_10-30-00"
    exit 1
  fi
  if [[ ! -d "$RESULTS_DIR" ]]; then
    echo "ERROR: results directory not found: $RESULTS_DIR"
    exit 1
  fi
  echo "=== Regenerating report from existing results ==="
  echo "Results dir : $RESULTS_DIR"
  echo ""
  cd "$REPO_ROOT"
  echo "Parsing results to CSV..."
  go run "$REPO_ROOT/benchmarks/tools/parse_results.go" \
    -dir="$RESULTS_DIR" -out="$RESULTS_DIR" 2>&1 || true
  echo ""
  echo "Generating benchmark report..."
  go run "$REPO_ROOT/benchmarks/tools/generate_report.go" \
    -dir="$RESULTS_DIR" \
    -template="$REPORT_TEMPLATE" \
    -version="$ONIX_VERSION"
  echo ""
  echo "Done. Report written to: $RESULTS_DIR/BENCHMARK_REPORT.md"
  exit 0
fi

RESULTS_DIR="$REPO_ROOT/benchmarks/results/$(date +%Y-%m-%d_%H-%M-%S)"

cd "$REPO_ROOT"

# ── benchstat is declared as a go tool in go.mod; no separate install needed ──
# Use: go tool benchstat (works anywhere without PATH changes)

# bench_filter: tee full output to the .log file for debugging, and write a
# clean copy (only benchstat-parseable lines) to the .txt file.
# The adapter logger is silenced via zerolog.SetGlobalLevel(zerolog.Disabled)
# in TestMain, so stdout should already be clean; the grep is a safety net for
# any stray lines from go test itself (build output, redis warnings, etc.).
bench_filter() {
  local txt="$1" log="$2"
  tee "$log" | grep -E "^(Benchmark|goos:|goarch:|pkg:|cpu:|ok |PASS|FAIL|--- )" > "$txt" || true
}

# ── Create results directory ──────────────────────────────────────────────────
mkdir -p "$RESULTS_DIR"
echo "=== beckn-onix Benchmark Runner ==="
echo "Results dir : $RESULTS_DIR"
echo "Package     : $BENCH_PKG"
echo ""

# ── Serial runs (3x for benchstat stability) ──────────────────────────────────
echo "Running serial benchmarks (3 runs × ${BENCH_TIME_SERIAL})..."
for run in 1 2 3; do
  echo "  Run $run/3..."
  go test \
    -timeout="$BENCH_TIMEOUT" \
    -run=^$ \
    -bench="." \
    -benchtime="$BENCH_TIME_SERIAL" \
    -benchmem \
    -count="$BENCH_COUNT" \
    "$BENCH_PKG" 2>&1 | bench_filter "$RESULTS_DIR/run${run}.txt" "$RESULTS_DIR/run${run}.log"
  echo "  Saved → $RESULTS_DIR/run${run}.txt (full log → run${run}.log)"
done
echo ""

# ── Concurrency sweep ─────────────────────────────────────────────────────────
echo "Running parallel concurrency sweep (cpu=1,2,4,8,16; ${BENCH_TIME_PARALLEL} each)..."
for cpu in 1 2 4 8 16; do
  echo "  GOMAXPROCS=$cpu..."
  go test \
    -timeout="$BENCH_TIMEOUT" \
    -run=^$ \
    -bench="BenchmarkBAPCaller_Discover_Parallel|BenchmarkBAPCaller_RPS" \
    -benchtime="$BENCH_TIME_PARALLEL" \
    -benchmem \
    -cpu="$cpu" \
    -count=1 \
    "$BENCH_PKG" 2>&1 | bench_filter "$RESULTS_DIR/parallel_cpu${cpu}.txt" "$RESULTS_DIR/parallel_cpu${cpu}.log"
  echo "  Saved → $RESULTS_DIR/parallel_cpu${cpu}.txt (full log → parallel_cpu${cpu}.log)"
done
echo ""

# ── Percentile benchmark ──────────────────────────────────────────────────────
echo "Running percentile benchmark (${BENCH_TIME_SERIAL})..."
go test \
  -timeout="$BENCH_TIMEOUT" \
  -run=^$ \
  -bench="BenchmarkBAPCaller_Discover_Percentiles" \
  -benchtime="$BENCH_TIME_SERIAL" \
  -benchmem \
  -count=1 \
  "$BENCH_PKG" 2>&1 | bench_filter "$RESULTS_DIR/percentiles.txt" "$RESULTS_DIR/percentiles.log"
echo "  Saved → $RESULTS_DIR/percentiles.txt (full log → percentiles.log)"
echo ""

# ── Cache comparison ──────────────────────────────────────────────────────────
echo "Running cache warm vs cold comparison..."
go test \
  -timeout="$BENCH_TIMEOUT" \
  -run=^$ \
  -bench="BenchmarkBAPCaller_Cache" \
  -benchtime="$BENCH_TIME_SERIAL" \
  -benchmem \
  -count=1 \
  "$BENCH_PKG" 2>&1 | bench_filter "$RESULTS_DIR/cache_comparison.txt" "$RESULTS_DIR/cache_comparison.log"
echo "  Saved → $RESULTS_DIR/cache_comparison.txt (full log → cache_comparison.log)"
echo ""

# ── benchstat statistical summary ─────────────────────────────────────────────
echo "Running benchstat statistical analysis..."
go tool benchstat \
  "$RESULTS_DIR/run1.txt" \
  "$RESULTS_DIR/run2.txt" \
  "$RESULTS_DIR/run3.txt" \
  > "$RESULTS_DIR/benchstat_summary.txt" 2>&1
echo "  Saved → $RESULTS_DIR/benchstat_summary.txt"
echo ""

# ── Parse results to CSV ──────────────────────────────────────────────────────
echo "Parsing results to CSV..."
go run "$REPO_ROOT/benchmarks/tools/parse_results.go" \
  -dir="$RESULTS_DIR" \
  -out="$RESULTS_DIR" 2>&1 || echo "  (parse_results.go: skipping on error)"
echo ""

# ── Generate human-readable report ────────────────────────────────────────────
echo "Generating benchmark report..."
if [[ -f "$REPORT_TEMPLATE" ]]; then
  go run "$REPO_ROOT/benchmarks/tools/generate_report.go" \
    -dir="$RESULTS_DIR" \
    -template="$REPORT_TEMPLATE" \
    -version="$ONIX_VERSION" 2>&1 || echo "  (generate_report.go: skipping on error)"
else
  echo "  WARNING: template not found at $REPORT_TEMPLATE — skipping report generation"
fi

# ── Summary ───────────────────────────────────────────────────────────────────
SCRIPT_END=$(date +%s)
ELAPSED_SECS=$(( SCRIPT_END - SCRIPT_START ))
ELAPSED_MIN=$(( ELAPSED_SECS / 60 ))
ELAPSED_SEC_REM=$(( ELAPSED_SECS % 60 ))

echo ""
echo "========================================"
echo "✅ Benchmark run complete!"
echo ""
echo "Total runtime : ${ELAPSED_MIN}m ${ELAPSED_SEC_REM}s"
echo ""
echo "Results written to:"
echo "  $RESULTS_DIR"
echo ""
echo "Key files:"
echo "  BENCHMARK_REPORT.md   — generated human-readable report"
echo "  benchstat_summary.txt — statistical analysis of 3 serial runs"
echo "  latency_report.csv    — per-benchmark latency and allocation data"
echo "  throughput_report.csv — RPS and latency by GOMAXPROCS level"
echo "  parallel_cpu*.txt     — concurrency sweep raw output"
echo "  percentiles.txt       — p50/p95/p99 latency data"
echo "  cache_comparison.txt  — warm vs cold Redis cache comparison"
echo ""
echo "To review the report:"
echo "  open $RESULTS_DIR/BENCHMARK_REPORT.md"
echo "========================================"
595
benchmarks/tools/generate_report.go
Normal file
@@ -0,0 +1,595 @@
// generate_report.go — Fills REPORT_TEMPLATE.md with data from a completed
// benchmark run and writes BENCHMARK_REPORT.md to the results directory.
//
// Usage:
//
//	go run benchmarks/tools/generate_report.go \
//	  -dir=benchmarks/results/<timestamp>/ \
//	  -template=benchmarks/reports/REPORT_TEMPLATE.md \
//	  -version=<onix-version>
//
// The generator reads:
//   - latency_report.csv    — per-benchmark latency and allocation data
//   - throughput_report.csv — RPS and latency by GOMAXPROCS level
//   - benchstat_summary.txt — raw benchstat output block
//   - run1.txt              — goos / goarch / cpu metadata
//
// Placeholders filled in the template:
//
//	__TIMESTAMP__           results dir basename (YYYY-MM-DD_HH-MM-SS)
//	__ONIX_VERSION__        -version flag value
//	__GOOS__                from run1.txt header
//	__GOARCH__              from run1.txt header
//	__CPU__                 from run1.txt header
//	__GOMAXPROCS__          derived from the benchmark name suffix in run1.txt
//	__P50_US__              p50 latency in µs (from Discover_Percentiles row)
//	__P95_US__              p95 latency in µs
//	__P99_US__              p99 latency in µs
//	__MEAN_DISCOVER_US__    mean latency in µs for discover
//	__MEAN_SELECT_US__      mean latency in µs for select
//	__MEAN_INIT_US__        mean latency in µs for init
//	__MEAN_CONFIRM_US__     mean latency in µs for confirm
//	__ALLOCS_DISCOVER__     allocs/req for discover
//	__ALLOCS_SELECT__       allocs/req for select
//	__ALLOCS_INIT__         allocs/req for init
//	__ALLOCS_CONFIRM__      allocs/req for confirm
//	__BYTES_DISCOVER__      bytes/req for discover
//	__BYTES_SELECT__        bytes/req for select
//	__BYTES_INIT__          bytes/req for init
//	__BYTES_CONFIRM__       bytes/req for confirm
//	__MEM_DISCOVER_KB__     bytes/req converted to KB for discover
//	__MEM_SELECT_KB__       bytes/req converted to KB for select
//	__MEM_INIT_KB__         bytes/req converted to KB for init
//	__MEM_CONFIRM_KB__      bytes/req converted to KB for confirm
//	__PEAK_RPS__            highest RPS across all GOMAXPROCS levels
//	__CACHE_WARM_US__       mean latency in µs for CacheWarm
//	__CACHE_COLD_US__       mean latency in µs for CacheCold
//	__CACHE_WARM_ALLOCS__   allocs/req for CacheWarm
//	__CACHE_COLD_ALLOCS__   allocs/req for CacheCold
//	__CACHE_WARM_BYTES__    bytes/req for CacheWarm
//	__CACHE_COLD_BYTES__    bytes/req for CacheCold
//	__CACHE_DELTA__         formatted warm-vs-cold delta string
//	__THROUGHPUT_TABLE__    generated markdown table from throughput_report.csv
//	__BENCHSTAT_SUMMARY__   raw contents of benchstat_summary.txt
package main

import (
	"bufio"
	"encoding/csv"
	"flag"
	"fmt"
	"io"
	"math"
	"os"
	"path/filepath"
	"regexp"
	"strconv"
	"strings"
)

func main() {
	dir := flag.String("dir", "", "Results directory (required)")
	tmplPath := flag.String("template", "benchmarks/reports/REPORT_TEMPLATE.md", "Path to report template")
	version := flag.String("version", "unknown", "Adapter version (e.g. v1.5.0)")
	flag.Parse()

	if *dir == "" {
		fmt.Fprintln(os.Stderr, "ERROR: -dir is required")
		os.Exit(1)
	}

	// Derive timestamp from the directory basename.
	timestamp := filepath.Base(*dir)

	// ── Read template ──────────────────────────────────────────────────────────
	tmplBytes, err := os.ReadFile(*tmplPath)
	if err != nil {
		fmt.Fprintf(os.Stderr, "ERROR: reading template %s: %v\n", *tmplPath, err)
		os.Exit(1)
	}
	report := string(tmplBytes)

	// ── Parse run1.txt for environment metadata ────────────────────────────────
	env := parseEnv(filepath.Join(*dir, "run1.txt"))

	// ── Parse latency_report.csv ──────────────────────────────────────────────
	latency, err := parseLatencyCSV(filepath.Join(*dir, "latency_report.csv"))
	if err != nil {
		fmt.Fprintf(os.Stderr, "WARNING: could not parse latency_report.csv: %v\n", err)
	}

	// ── Parse throughput_report.csv ───────────────────────────────────────────
	throughput, err := parseThroughputCSV(filepath.Join(*dir, "throughput_report.csv"))
	if err != nil {
		fmt.Fprintf(os.Stderr, "WARNING: could not parse throughput_report.csv: %v\n", err)
	}

	// ── Read benchstat_summary.txt ────────────────────────────────────────────
	benchstat := readFileOrDefault(filepath.Join(*dir, "benchstat_summary.txt"),
		"(benchstat output not available)")

	// ── Compute derived values ─────────────────────────────────────────────────

	// Mean latency: convert ms → µs, round to integer.
	meanDiscoverUS := msToUS(latency["BenchmarkBAPCaller_Discover"]["mean_ms"])
	meanSelectUS := msToUS(latency["BenchmarkBAPCaller_AllActions/select"]["mean_ms"])
	meanInitUS := msToUS(latency["BenchmarkBAPCaller_AllActions/init"]["mean_ms"])
	meanConfirmUS := msToUS(latency["BenchmarkBAPCaller_AllActions/confirm"]["mean_ms"])

	// Percentiles come from the Discover_Percentiles row.
	perc := latency["BenchmarkBAPCaller_Discover_Percentiles"]
	p50 := fmtMetric(perc["p50_µs"], "µs")
	p95 := fmtMetric(perc["p95_µs"], "µs")
	p99 := fmtMetric(perc["p99_µs"], "µs")

	// Memory: bytes → KB (1 decimal place).
	memDiscoverKB := bytesToKB(latency["BenchmarkBAPCaller_Discover"]["bytes_op"])
	memSelectKB := bytesToKB(latency["BenchmarkBAPCaller_AllActions/select"]["bytes_op"])
	memInitKB := bytesToKB(latency["BenchmarkBAPCaller_AllActions/init"]["bytes_op"])
	memConfirmKB := bytesToKB(latency["BenchmarkBAPCaller_AllActions/confirm"]["bytes_op"])

	// Cache delta.
	warmUS := msToUS(latency["BenchmarkBAPCaller_CacheWarm"]["mean_ms"])
	coldUS := msToUS(latency["BenchmarkBAPCaller_CacheCold"]["mean_ms"])
	cacheDelta := formatCacheDelta(warmUS, coldUS)

	// Peak RPS across all concurrency levels.
	peakRPS := "—"
	var peakRPSVal float64
	for _, row := range throughput {
		if v := parseFloatOrZero(row["rps"]); v > peakRPSVal {
			peakRPSVal = v
			peakRPS = fmt.Sprintf("%.0f", peakRPSVal)
		}
	}

	// ── Build throughput table ─────────────────────────────────────────────────
	throughputTable := buildThroughputTable(throughput)

	// ── Generate interpretation and recommendation ─────────────────────────────
	interpretation := buildInterpretation(perc, latency, throughput, warmUS, coldUS)
	recommendation := buildRecommendation(throughput)

	// ── Apply substitutions ────────────────────────────────────────────────────
	replacements := map[string]string{
		"__TIMESTAMP__":         timestamp,
		"__ONIX_VERSION__":      *version,
		"__GOOS__":              env["goos"],
		"__GOARCH__":            env["goarch"],
		"__CPU__":               env["cpu"],
		"__GOMAXPROCS__":        env["gomaxprocs"],
		"__P50_US__":            p50,
		"__P95_US__":            p95,
		"__P99_US__":            p99,
		"__MEAN_DISCOVER_US__":  meanDiscoverUS,
		"__MEAN_SELECT_US__":    meanSelectUS,
		"__MEAN_INIT_US__":      meanInitUS,
		"__MEAN_CONFIRM_US__":   meanConfirmUS,
		"__ALLOCS_DISCOVER__":   fmtInt(latency["BenchmarkBAPCaller_Discover"]["allocs_op"]),
		"__ALLOCS_SELECT__":     fmtInt(latency["BenchmarkBAPCaller_AllActions/select"]["allocs_op"]),
		"__ALLOCS_INIT__":       fmtInt(latency["BenchmarkBAPCaller_AllActions/init"]["allocs_op"]),
		"__ALLOCS_CONFIRM__":    fmtInt(latency["BenchmarkBAPCaller_AllActions/confirm"]["allocs_op"]),
		"__BYTES_DISCOVER__":    fmtInt(latency["BenchmarkBAPCaller_Discover"]["bytes_op"]),
		"__BYTES_SELECT__":      fmtInt(latency["BenchmarkBAPCaller_AllActions/select"]["bytes_op"]),
		"__BYTES_INIT__":        fmtInt(latency["BenchmarkBAPCaller_AllActions/init"]["bytes_op"]),
		"__BYTES_CONFIRM__":     fmtInt(latency["BenchmarkBAPCaller_AllActions/confirm"]["bytes_op"]),
		"__MEM_DISCOVER_KB__":   memDiscoverKB,
		"__MEM_SELECT_KB__":     memSelectKB,
		"__MEM_INIT_KB__":       memInitKB,
		"__MEM_CONFIRM_KB__":    memConfirmKB,
		"__PEAK_RPS__":          peakRPS,
		"__CACHE_WARM_US__":     warmUS,
		"__CACHE_COLD_US__":     coldUS,
		"__CACHE_WARM_ALLOCS__": fmtInt(latency["BenchmarkBAPCaller_CacheWarm"]["allocs_op"]),
		"__CACHE_COLD_ALLOCS__": fmtInt(latency["BenchmarkBAPCaller_CacheCold"]["allocs_op"]),
		"__CACHE_WARM_BYTES__":  fmtInt(latency["BenchmarkBAPCaller_CacheWarm"]["bytes_op"]),
		"__CACHE_COLD_BYTES__":  fmtInt(latency["BenchmarkBAPCaller_CacheCold"]["bytes_op"]),
		"__CACHE_DELTA__":       cacheDelta,
		"__THROUGHPUT_TABLE__":  throughputTable,
		"__BENCHSTAT_SUMMARY__": benchstat,
		"__INTERPRETATION__":    interpretation,
		"__RECOMMENDATION__":    recommendation,
	}

	for placeholder, value := range replacements {
		report = strings.ReplaceAll(report, placeholder, value)
	}

	// ── Write output ───────────────────────────────────────────────────────────
	outPath := filepath.Join(*dir, "BENCHMARK_REPORT.md")
	if err := os.WriteFile(outPath, []byte(report), 0o644); err != nil {
		fmt.Fprintf(os.Stderr, "ERROR: writing report: %v\n", err)
		os.Exit(1)
	}
	fmt.Printf("  Written → %s\n", outPath)
}

// ── Parsers ────────────────────────────────────────────────────────────────────

var gomaxprocsRe = regexp.MustCompile(`-(\d+)$`)

// parseEnv reads goos, goarch, cpu, and GOMAXPROCS from a run*.txt file header.
func parseEnv(path string) map[string]string {
	env := map[string]string{
		"goos": "unknown", "goarch": "unknown",
		"cpu": "unknown", "gomaxprocs": "unknown",
	}
	f, err := os.Open(path)
	if err != nil {
		return env
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		switch {
		case strings.HasPrefix(line, "goos:"):
			env["goos"] = strings.TrimSpace(strings.TrimPrefix(line, "goos:"))
		case strings.HasPrefix(line, "goarch:"):
			env["goarch"] = strings.TrimSpace(strings.TrimPrefix(line, "goarch:"))
		case strings.HasPrefix(line, "cpu:"):
			env["cpu"] = strings.TrimSpace(strings.TrimPrefix(line, "cpu:"))
		case strings.HasPrefix(line, "Benchmark"):
			// Extract GOMAXPROCS from the first benchmark line suffix (e.g. "-10").
			if m := gomaxprocsRe.FindStringSubmatch(strings.Fields(line)[0]); m != nil {
				env["gomaxprocs"] = m[1]
			}
		}
	}
	return env
}

// parseLatencyCSV returns a map of benchmark name → field name → raw string value.
// When multiple rows exist for the same benchmark (3 serial runs), values from
// the first non-empty occurrence are used.
func parseLatencyCSV(path string) (map[string]map[string]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	r := csv.NewReader(f)
	header, err := r.Read()
	if err != nil {
		return nil, err
	}

	result := map[string]map[string]string{}
	for {
		row, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil || len(row) == 0 {
			continue
		}
		name := row[0]
		if _, exists := result[name]; !exists {
			result[name] = map[string]string{}
		}
		for i, col := range header[1:] {
			idx := i + 1
			if idx < len(row) && row[idx] != "" && result[name][col] == "" {
				result[name][col] = row[idx]
			}
		}
	}
	return result, nil
}

// parseThroughputCSV returns rows as a slice of field maps.
func parseThroughputCSV(path string) ([]map[string]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	r := csv.NewReader(f)
	header, err := r.Read()
	if err != nil {
		return nil, err
	}

	var rows []map[string]string
	for {
		row, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil || len(row) == 0 {
			continue
		}
		m := map[string]string{}
		for i, col := range header {
			if i < len(row) {
				m[col] = row[i]
			}
		}
		rows = append(rows, m)
	}
	return rows, nil
}

// buildThroughputTable renders the throughput CSV as a markdown table.
func buildThroughputTable(rows []map[string]string) string {
	if len(rows) == 0 {
		return "_No concurrency sweep data available._"
	}
	var sb strings.Builder
	sb.WriteString("| GOMAXPROCS | Mean Latency (µs) | RPS |\n")
	sb.WriteString("|:----------:|------------------:|----:|\n")
	for _, row := range rows {
		cpu := orDash(row["gomaxprocs"])
		latUS := "—"
		if v := parseFloatOrZero(row["mean_latency_ms"]); v > 0 {
			latUS = fmt.Sprintf("%.0f", v*1000)
		}
		rps := orDash(row["rps"])
		sb.WriteString(fmt.Sprintf("| %s | %s | %s |\n", cpu, latUS, rps))
	}
	return sb.String()
}

// ── Formatters ─────────────────────────────────────────────────────────────────

// msToUS converts a ms string to a rounded µs string ("—" when missing or zero).
func msToUS(ms string) string {
	v := parseFloatOrZero(ms)
	if v == 0 {
		return "—"
	}
	return fmt.Sprintf("%.0f", v*1000)
}

// bytesToKB converts a bytes string to a KB string with 1 decimal place.
func bytesToKB(bytes string) string {
	v := parseFloatOrZero(bytes)
	if v == 0 {
		return "—"
	}
	return fmt.Sprintf("%.1f", v/1024)
}

// fmtInt formats a float string as a rounded integer string ("—" when missing or zero).
func fmtInt(s string) string {
	v := parseFloatOrZero(s)
	if v == 0 {
		return "—"
	}
	return fmt.Sprintf("%.0f", math.Round(v))
}

// fmtMetric formats a metric value with the given unit, or returns "—".
func fmtMetric(s, unit string) string {
	v := parseFloatOrZero(s)
	if v == 0 {
		return "—"
	}
	return fmt.Sprintf("%.0f %s", v, unit)
}

// formatCacheDelta produces a human-readable warm-vs-cold delta string.
func formatCacheDelta(warmUS, coldUS string) string {
	w := parseFloatOrZero(warmUS)
	c := parseFloatOrZero(coldUS)
	if w == 0 || c == 0 {
		return "—"
	}
	delta := w - c
	sign := "+"
	if delta < 0 {
		sign = ""
	}
	return fmt.Sprintf("%s%.0f µs (warm vs cold)", sign, delta)
}

func orDash(s string) string {
	if s == "" {
		return "—"
	}
	return s
}

func parseFloatOrZero(s string) float64 {
	v, _ := strconv.ParseFloat(strings.TrimSpace(s), 64)
	return v
}

func readFileOrDefault(path, def string) string {
	b, err := os.ReadFile(path)
	if err != nil {
		return def
	}
	return strings.TrimRight(string(b), "\n")
}

// ── Narrative generators ───────────────────────────────────────────────────────

// buildInterpretation generates a data-driven interpretation paragraph from the
// benchmark results. It covers tail-latency control, action complexity trend,
// concurrency scaling efficiency, and cache impact.
func buildInterpretation(
	perc map[string]string,
	latency map[string]map[string]string,
	throughput []map[string]string,
	warmUS, coldUS string,
) string {
	var sb strings.Builder

	p50 := parseFloatOrZero(perc["p50_µs"])
	p99 := parseFloatOrZero(perc["p99_µs"])
	meanDiscoverUS := parseFloatOrZero(latency["BenchmarkBAPCaller_Discover"]["mean_ms"]) * 1000

	// Tail-latency control.
	if p50 > 0 && p99 > 0 {
		ratio := p99 / p50
		quality := "good"
		if ratio > 5 {
			quality = "poor"
		} else if ratio > 3 {
			quality = "moderate"
		}
		sb.WriteString(fmt.Sprintf(
			"The adapter delivers a p50 latency of **%.0f µs** for the discover action. "+
				"The p99/p50 ratio is **%.1f×**, indicating %s tail-latency control — "+
				"spikes are %s relative to the median.\n\n",
			p50, ratio, quality, tailDescription(ratio),
		))
	} else if meanDiscoverUS > 0 {
		sb.WriteString(fmt.Sprintf(
			"The adapter delivers a mean latency of **%.0f µs** for the discover action. "+
				"Run with `-bench=BenchmarkBAPCaller_Discover_Percentiles` to obtain p50/p95/p99 data.\n\n",
			meanDiscoverUS,
		))
	}

	// Action complexity trend (all values converted to µs).
	selectUS := parseFloatOrZero(latency["BenchmarkBAPCaller_AllActions/select"]["mean_ms"]) * 1000
	initUS := parseFloatOrZero(latency["BenchmarkBAPCaller_AllActions/init"]["mean_ms"]) * 1000
	confirmUS := parseFloatOrZero(latency["BenchmarkBAPCaller_AllActions/confirm"]["mean_ms"]) * 1000
	if meanDiscoverUS > 0 && selectUS > 0 && initUS > 0 && confirmUS > 0 {
		sb.WriteString(fmt.Sprintf(
			"Latency scales with payload complexity: select (+%.0f%%), init (+%.0f%%), confirm (+%.0f%%) "+
				"vs the discover baseline. Allocation counts track proportionally, driven by JSON "+
				"unmarshalling and schema validation of larger payloads.\n\n",
			pctChange(meanDiscoverUS, selectUS),
			pctChange(meanDiscoverUS, initUS),
			pctChange(meanDiscoverUS, confirmUS),
		))
	}

	// Concurrency scaling (latencyAtCPU returns ms).
	lat1 := latencyAtCPU(throughput, "1")
	lat16 := latencyAtCPU(throughput, "16")
	if lat1 > 0 && lat16 > 0 {
		improvement := lat1 / lat16
		sb.WriteString(fmt.Sprintf(
			"Concurrency scaling is effective: mean latency drops from **%.0f µs** at GOMAXPROCS=1 "+
				"to **%.0f µs** at GOMAXPROCS=16 — a **%.1f× improvement**.",
			lat1*1000, lat16*1000, improvement,
		))
		if improvement < 4 {
			sb.WriteString(" Gains taper beyond 8 cores, suggesting a shared serialisation point " +
				"(likely schema validation or key derivation).")
		}
		sb.WriteString("\n\n")
	}

	// Cache impact.
	w := parseFloatOrZero(warmUS)
	c := parseFloatOrZero(coldUS)
	if w > 0 && c > 0 {
		delta := math.Abs(w-c) / w * 100
		if delta < 5 {
			sb.WriteString(fmt.Sprintf(
				"The Redis key-manager cache shows **no measurable impact** in this setup "+
					"(warm vs cold delta: %.0f µs, %.1f%% of mean). "+
					"miniredis is in-process; signing and schema validation dominate. "+
					"Cache benefit would be visible with real Redis over a network.",
				math.Abs(w-c), delta,
			))
		} else {
			sb.WriteString(fmt.Sprintf(
				"The Redis key-manager cache provides a **%.0f µs improvement** (%.1f%%) "+
					"on the warm path vs cold.",
				math.Abs(w-c), delta,
			))
		}
		sb.WriteString("\n")
	}

	if sb.Len() == 0 {
		return "_Insufficient data to generate interpretation. Ensure all benchmark scenarios completed successfully._"
	}
	return strings.TrimRight(sb.String(), "\n")
}

// buildRecommendation generates a sizing and tuning recommendation based on the
// concurrency sweep results.
func buildRecommendation(throughput []map[string]string) string {
	if len(throughput) == 0 {
		return "_Run the concurrency sweep to generate sizing recommendations._"
	}

	// Collect each GOMAXPROCS level with its RPS and mean latency.
	type cpuPoint struct {
		cpu int
		rps float64
		lat float64
	}
	var points []cpuPoint
	for _, row := range throughput {
		cpu := int(parseFloatOrZero(row["gomaxprocs"]))
		rps := parseFloatOrZero(row["rps"])
		lat := parseFloatOrZero(row["mean_latency_ms"]) * 1000
		if cpu > 0 && lat > 0 {
			points = append(points, cpuPoint{cpu, rps, lat})
		}
	}

	if len(points) == 0 {
		return "_Run the concurrency sweep (parallel_cpu*.txt) to generate sizing recommendations._"
	}

	// Find the sweet spot: largest relative latency improvement per step up in cores.
	bestEffCPU := points[0].cpu
	bestEff := 0.0
	for i := 1; i < len(points); i++ {
		if points[i-1].lat > 0 {
			eff := (points[i-1].lat - points[i].lat) / points[i-1].lat
			if eff > bestEff {
				bestEff = eff
				bestEffCPU = points[i].cpu
			}
		}
	}

	var sb strings.Builder
	sb.WriteString(fmt.Sprintf(
		"**%d cores** offers the best throughput/cost ratio based on the concurrency sweep — "+
			"scaling efficiency begins to taper beyond this point.\n\n",
		bestEffCPU,
	))
	sb.WriteString("The adapter is ready for staged load testing against a real BPP. " +
		"For production sizing, start with the recommended core count above and adjust based " +
		"on observed throughput targets. If schema validation dominates CPU (likely at high " +
		"concurrency), profile with `go tool pprof` using the commands in B5 to isolate the bottleneck.")

	return sb.String()
}

// ── Narrative helpers ──────────────────────────────────────────────────────────

func tailDescription(ratio float64) string {
	switch {
	case ratio <= 2:
		return "minimal"
	case ratio <= 3:
		return "modest"
	case ratio <= 5:
		return "noticeable"
	default:
		return "significant"
	}
}

func pctChange(base, val float64) float64 {
	if base == 0 {
		return 0
	}
	return (val - base) / base * 100
}

func latencyAtCPU(throughput []map[string]string, cpu string) float64 {
	for _, row := range throughput {
		if row["gomaxprocs"] == cpu {
			if v := parseFloatOrZero(row["mean_latency_ms"]); v > 0 {
				return v
			}
		}
	}
	return 0
}
256
benchmarks/tools/parse_results.go
Normal file
@@ -0,0 +1,256 @@
// parse_results.go — Parses raw go test -bench output from the benchmark results
// directory and produces two CSV files for analysis and reporting.
//
// Usage:
//
//	go run benchmarks/tools/parse_results.go \
//	    -dir=benchmarks/results/<timestamp>/ \
//	    -out=benchmarks/results/<timestamp>/
//
// Output files:
//
//	latency_report.csv    — per-benchmark mean, p50, p95, p99 latency, allocs
//	throughput_report.csv — RPS and mean latency at each GOMAXPROCS level from the parallel sweep
package main

import (
	"bufio"
	"encoding/csv"
	"flag"
	"fmt"
	"os"
	"path/filepath"
	"regexp"
	"strconv"
	"strings"
)

var (
	// Matches the benchmark name and ns/op from a standard go test -bench output line.
	// Go outputs custom metrics (p50_µs, req/s, …) BEFORE B/op and allocs/op, so we
	// extract those fields with dedicated regexps rather than relying on positional groups.
	//
	// Example lines:
	//   BenchmarkBAPCaller_Discover-10              73542  164193 ns/op  82913 B/op  662 allocs/op
	//   BenchmarkBAPCaller_Discover_Percentiles-10  72849  164518 ns/op  130.0 p50_µs  144.0 p95_µs  317.0 p99_µs  82528 B/op  660 allocs/op
	//   BenchmarkBAPCaller_RPS-4                   700465   73466 ns/op  14356.0 req/s  80375 B/op  660 allocs/op
	benchLineRe = regexp.MustCompile(`^(Benchmark\S+)\s+\d+\s+([\d.]+)\s+ns/op`)
	bytesRe     = regexp.MustCompile(`([\d.]+)\s+B/op`)
	allocsRe    = regexp.MustCompile(`([\d.]+)\s+allocs/op`)

	// Extracts any custom metric value from a benchmark line.
	metricRe = regexp.MustCompile(`([\d.]+)\s+(p50_µs|p95_µs|p99_µs|req/s)`)
)

type benchResult struct {
	name     string
	nsPerOp  float64
	bytesOp  float64
	allocsOp float64
	p50      float64
	p95      float64
	p99      float64
	rps      float64
}

// cpuResult pairs a GOMAXPROCS value with a benchmark result from the parallel sweep.
type cpuResult struct {
	cpu int
	res benchResult
}

func main() {
	dir := flag.String("dir", ".", "Directory containing benchmark result files")
	out := flag.String("out", ".", "Output directory for CSV files")
	flag.Parse()

	if err := os.MkdirAll(*out, 0o755); err != nil {
		fmt.Fprintf(os.Stderr, "ERROR creating output dir: %v\n", err)
		os.Exit(1)
	}

	// ── Parse serial runs (run1.txt, run2.txt, run3.txt) ─────────────────────
	var latencyResults []benchResult
	for _, runFile := range []string{"run1.txt", "run2.txt", "run3.txt"} {
		path := filepath.Join(*dir, runFile)
		results, err := parseRunFile(path)
		if err != nil {
			fmt.Fprintf(os.Stderr, "WARNING: could not parse %s: %v\n", runFile, err)
			continue
		}
		latencyResults = append(latencyResults, results...)
	}

	// Also parse percentiles file for p50/p95/p99.
	percPath := filepath.Join(*dir, "percentiles.txt")
	if percResults, err := parseRunFile(percPath); err == nil {
		latencyResults = append(latencyResults, percResults...)
	}

	if err := writeLatencyCSV(filepath.Join(*out, "latency_report.csv"), latencyResults); err != nil {
		fmt.Fprintf(os.Stderr, "ERROR writing latency CSV: %v\n", err)
		os.Exit(1)
	}
	fmt.Printf("Written: %s\n", filepath.Join(*out, "latency_report.csv"))

	// ── Parse parallel sweep (parallel_cpu*.txt) ──────────────────────────────
	var throughputRows []cpuResult

	for _, cpu := range []int{1, 2, 4, 8, 16} {
		path := filepath.Join(*dir, fmt.Sprintf("parallel_cpu%d.txt", cpu))
		results, err := parseRunFile(path)
		if err != nil {
			fmt.Fprintf(os.Stderr, "WARNING: could not parse parallel_cpu%d.txt: %v\n", cpu, err)
			continue
		}
		for _, r := range results {
			throughputRows = append(throughputRows, cpuResult{cpu: cpu, res: r})
		}
	}

	if err := writeThroughputCSV(filepath.Join(*out, "throughput_report.csv"), throughputRows); err != nil {
		fmt.Fprintf(os.Stderr, "ERROR writing throughput CSV: %v\n", err)
		os.Exit(1)
	}
	fmt.Printf("Written: %s\n", filepath.Join(*out, "throughput_report.csv"))
}

// parseRunFile reads a go test -bench output file and returns all benchmark results.
func parseRunFile(path string) ([]benchResult, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var results []benchResult

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())

		m := benchLineRe.FindStringSubmatch(line)
		if m == nil {
			continue
		}

		r := benchResult{name: stripCPUSuffix(m[1])}
		r.nsPerOp = parseFloat(m[2])

		// B/op and allocs/op — extracted independently because Go places custom
		// metrics (p50_µs, req/s, …) between ns/op and B/op on the same line.
		if bm := bytesRe.FindStringSubmatch(line); bm != nil {
			r.bytesOp = parseFloat(bm[1])
		}
		if am := allocsRe.FindStringSubmatch(line); am != nil {
			r.allocsOp = parseFloat(am[1])
		}

		// Custom metrics — scan the whole line regardless of position.
		for _, mm := range metricRe.FindAllStringSubmatch(line, -1) {
			switch mm[2] {
			case "p50_µs":
				r.p50 = parseFloat(mm[1])
			case "p95_µs":
				r.p95 = parseFloat(mm[1])
			case "p99_µs":
				r.p99 = parseFloat(mm[1])
			case "req/s":
				r.rps = parseFloat(mm[1])
			}
		}

		results = append(results, r)
	}
	return results, scanner.Err()
}

func writeLatencyCSV(path string, results []benchResult) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	w := csv.NewWriter(f)
	defer w.Flush()

	header := []string{"benchmark", "mean_ms", "p50_µs", "p95_µs", "p99_µs", "allocs_op", "bytes_op"}
	if err := w.Write(header); err != nil {
		return err
	}

	for _, r := range results {
		row := []string{
			r.name,
			fmtFloat(r.nsPerOp / 1e6), // ns/op → ms
			fmtFloat(r.p50),
			fmtFloat(r.p95),
			fmtFloat(r.p99),
			fmtFloat(r.allocsOp),
			fmtFloat(r.bytesOp),
		}
		if err := w.Write(row); err != nil {
			return err
		}
	}
	return nil
}

func writeThroughputCSV(path string, rows []cpuResult) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	w := csv.NewWriter(f)
	defer w.Flush()

	// p95 latency is not available from the parallel sweep files — those benchmarks
	// only emit ns/op and req/s. p95 data comes exclusively from
	// BenchmarkBAPCaller_Discover_Percentiles, which runs at a single GOMAXPROCS
	// setting and is not part of the concurrency sweep.
	header := []string{"gomaxprocs", "benchmark", "rps", "mean_latency_ms"}
	if err := w.Write(header); err != nil {
		return err
	}

	for _, row := range rows {
		r := []string{
			strconv.Itoa(row.cpu),
			row.res.name,
			fmtFloat(row.res.rps),
			fmtFloat(row.res.nsPerOp / 1e6),
		}
		if err := w.Write(r); err != nil {
			return err
		}
	}
	return nil
}

// stripCPUSuffix removes trailing "-N" goroutine count suffixes from benchmark names.
func stripCPUSuffix(name string) string {
	if idx := strings.LastIndex(name, "-"); idx > 0 {
		if _, err := strconv.Atoi(name[idx+1:]); err == nil {
			return name[:idx]
		}
	}
	return name
}

func parseFloat(s string) float64 {
	if s == "" {
		return 0
	}
	v, _ := strconv.ParseFloat(s, 64)
	return v
}

func fmtFloat(v float64) string {
	if v == 0 {
		return ""
	}
	return strconv.FormatFloat(v, 'f', 3, 64)
}
14
go.mod
@@ -4,7 +4,7 @@ go 1.24.6
 require (
 	github.com/santhosh-tekuri/jsonschema/v6 v6.0.1
-	golang.org/x/crypto v0.47.0
+	golang.org/x/crypto v0.49.0
 )
 
 require github.com/stretchr/testify v1.11.1
@@ -19,9 +19,12 @@ require (
 
 require github.com/zenazn/pkcs7pad v0.0.0-20170308005700-253a5b1f0e03
 
-require golang.org/x/text v0.33.0 // indirect
+tool golang.org/x/perf/cmd/benchstat
+
+require golang.org/x/text v0.35.0 // indirect
 
 require (
+	github.com/aclements/go-moremath v0.0.0-20210112150236-f10218a38794 // indirect
 	github.com/agnivade/levenshtein v1.2.1 // indirect
 	github.com/beorn7/perks v1.0.1 // indirect
 	github.com/cenkalti/backoff/v4 v4.3.0 // indirect
@@ -82,9 +85,10 @@ require (
 	go.opentelemetry.io/proto/otlp v1.9.0 // indirect
 	go.yaml.in/yaml/v2 v2.4.2 // indirect
 	go.yaml.in/yaml/v3 v3.0.4 // indirect
-	golang.org/x/net v0.49.0 // indirect
-	golang.org/x/sync v0.19.0 // indirect
-	golang.org/x/sys v0.40.0 // indirect
+	golang.org/x/net v0.52.0 // indirect
+	golang.org/x/perf v0.0.0-20260312031701-16a31bc5fbd0 // indirect
+	golang.org/x/sync v0.20.0 // indirect
+	golang.org/x/sys v0.42.0 // indirect
 	golang.org/x/time v0.14.0 // indirect
 	google.golang.org/genproto/googleapis/api v0.0.0-20260128011058-8636f8732409 // indirect
 	google.golang.org/genproto/googleapis/rpc v0.0.0-20260128011058-8636f8732409 // indirect
||||
32
go.sum
32
go.sum
@@ -1,3 +1,5 @@
+github.com/aclements/go-moremath v0.0.0-20210112150236-f10218a38794 h1:xlwdaKcTNVW4PtpQb8aKA4Pjy0CdJHEqvFbAnvR5m2g=
+github.com/aclements/go-moremath v0.0.0-20210112150236-f10218a38794/go.mod h1:7e+I0LQFUI9AXWxOfsQROs9xPhoJtbsyWcjJqDd4KPY=
 github.com/agnivade/levenshtein v1.2.1 h1:EHBY3UOn1gwdy/VbFwgo4cxecRznFk7fKWN1KOX7eoM=
 github.com/agnivade/levenshtein v1.2.1/go.mod h1:QVVI16kDrtSuwcpd0p1+xMC6Z/VfhtCyDIjcwga4/DU=
 github.com/andreyvit/diff v0.0.0-20170406064948-c7f18ee00883 h1:bvNMNQO63//z+xNgfBlViaCIJKLlCJ6/fmUseuG0wVQ=
@@ -274,26 +276,28 @@ go.yaml.in/yaml/v2 v2.4.2 h1:DzmwEr2rDGHl7lsFgAHxmNz/1NlQ7xLIrlN2h5d1eGI=
 go.yaml.in/yaml/v2 v2.4.2/go.mod h1:081UH+NErpNdqlCXm3TtEran0rJZGxAYx9hb/ELlsPU=
 go.yaml.in/yaml/v3 v3.0.4 h1:tfq32ie2Jv2UxXFdLJdh3jXuOzWiL1fo0bu/FbuKpbc=
 go.yaml.in/yaml/v3 v3.0.4/go.mod h1:DhzuOOF2ATzADvBadXxruRBLzYTpT36CKvDb3+aBEFg=
-golang.org/x/crypto v0.47.0 h1:V6e3FRj+n4dbpw86FJ8Fv7XVOql7TEwpHapKoMJ/GO8=
-golang.org/x/crypto v0.47.0/go.mod h1:ff3Y9VzzKbwSSEzWqJsJVBnWmRwRSHt/6Op5n9bQc4A=
-golang.org/x/mod v0.31.0 h1:HaW9xtz0+kOcWKwli0ZXy79Ix+UW/vOfmWI5QVd2tgI=
-golang.org/x/mod v0.31.0/go.mod h1:43JraMp9cGx1Rx3AqioxrbrhNsLl2l/iNAvuBkrezpg=
-golang.org/x/net v0.49.0 h1:eeHFmOGUTtaaPSGNmjBKpbng9MulQsJURQUAfUwY++o=
-golang.org/x/net v0.49.0/go.mod h1:/ysNB2EvaqvesRkuLAyjI1ycPZlQHM3q01F02UY/MV8=
-golang.org/x/sync v0.19.0 h1:vV+1eWNmZ5geRlYjzm2adRgW2/mcpevXNg50YZtPCE4=
-golang.org/x/sync v0.19.0/go.mod h1:9KTHXmSnoGruLpwFjVSX0lNNA75CykiMECbovNTZqGI=
+golang.org/x/crypto v0.49.0 h1:+Ng2ULVvLHnJ/ZFEq4KdcDd/cfjrrjjNSXNzxg0Y4U4=
+golang.org/x/crypto v0.49.0/go.mod h1:ErX4dUh2UM+CFYiXZRTcMpEcN8b/1gxEuv3nODoYtCA=
+golang.org/x/mod v0.33.0 h1:tHFzIWbBifEmbwtGz65eaWyGiGZatSrT9prnU8DbVL8=
+golang.org/x/mod v0.33.0/go.mod h1:swjeQEj+6r7fODbD2cqrnje9PnziFuw4bmLbBZFrQ5w=
+golang.org/x/net v0.52.0 h1:He/TN1l0e4mmR3QqHMT2Xab3Aj3L9qjbhRm78/6jrW0=
+golang.org/x/net v0.52.0/go.mod h1:R1MAz7uMZxVMualyPXb+VaqGSa3LIaUqk0eEt3w36Sw=
+golang.org/x/perf v0.0.0-20260312031701-16a31bc5fbd0 h1:VgUwdbeBqkERh4BX46p4O2fSng7duMS+0V01EEAt2Vk=
+golang.org/x/perf v0.0.0-20260312031701-16a31bc5fbd0/go.mod h1:UWOuhEKaiVtLW8tca1eEwpuNy4tzUubUXNAnA51k48o=
+golang.org/x/sync v0.20.0 h1:e0PTpb7pjO8GAtTs2dQ6jYa5BWYlMuX047Dco/pItO4=
+golang.org/x/sync v0.20.0/go.mod h1:9xrNwdLfx4jkKbNva9FpL6vEN7evnE43NNNJQ2LF3+0=
 golang.org/x/sys v0.0.0-20180823144017-11551d06cbcc/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
 golang.org/x/sys v0.0.0-20220811171246-fbc7d0a398ab/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/sys v0.12.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
-golang.org/x/sys v0.40.0 h1:DBZZqJ2Rkml6QMQsZywtnjnnGvHza6BTfYFWY9kjEWQ=
-golang.org/x/sys v0.40.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
-golang.org/x/text v0.33.0 h1:B3njUFyqtHDUI5jMn1YIr5B0IE2U0qck04r6d4KPAxE=
-golang.org/x/text v0.33.0/go.mod h1:LuMebE6+rBincTi9+xWTY8TztLzKHc/9C1uBCG27+q8=
+golang.org/x/sys v0.42.0 h1:omrd2nAlyT5ESRdCLYdm3+fMfNFE/+Rf4bDIQImRJeo=
+golang.org/x/sys v0.42.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
+golang.org/x/text v0.35.0 h1:JOVx6vVDFokkpaq1AEptVzLTpDe9KGpj5tR4/X+ybL8=
+golang.org/x/text v0.35.0/go.mod h1:khi/HExzZJ2pGnjenulevKNX1W67CUy0AsXcNubPGCA=
 golang.org/x/time v0.14.0 h1:MRx4UaLrDotUKUdCIqzPC48t1Y9hANFKIRpNx+Te8PI=
 golang.org/x/time v0.14.0/go.mod h1:eL/Oa2bBBK0TkX57Fyni+NgnyQQN4LitPmob2Hjnqw4=
-golang.org/x/tools v0.40.0 h1:yLkxfA+Qnul4cs9QA3KnlFu0lVmd8JJfoq+E41uSutA=
-golang.org/x/tools v0.40.0/go.mod h1:Ik/tzLRlbscWpqqMRjyWYDisX8bG13FrdXp3o4Sr9lc=
+golang.org/x/tools v0.42.0 h1:uNgphsn75Tdz5Ji2q36v/nsFSfR/9BRFvqhGBaJGd5k=
+golang.org/x/tools v0.42.0/go.mod h1:Ma6lCIwGZvHK6XtgbswSoWroEkhugApmsXyrUmBhfr0=
 gonum.org/v1/gonum v0.16.0 h1:5+ul4Swaf3ESvrOnidPp4GZbzf0mxVQpDCYUQE7OJfk=
 gonum.org/v1/gonum v0.16.0/go.mod h1:fef3am4MQ93R2HHpKnLk4/Tbh/s0+wqD5nfa6Pnwy4E=
 google.golang.org/genproto/googleapis/api v0.0.0-20260128011058-8636f8732409 h1:merA0rdPeUV3YIIfHHcH4qBkiQAc1nfCKSI7lB4cV2M=