MatMul Proof-of-Work
Specification of the BTX MatMul proof-of-work consensus algorithm: finite-field matrix multiplication, transcript compression, Freivalds verification, and DoS mitigation.
Overview
BTX replaces traditional hash-based proof-of-work with MatMul PoW, an AI-native proof-of-work system based on matrix multiplication over a finite field. The protocol, derived from the cuPOW construction (Komargodski, Schen, Weinstein 2025), achieves a multiplicative overhead of only ~16.5% above a bare matrix multiply at the production parameters. Miners perform real linear-algebra computation rather than brute-force hashing, making the work inherently useful as a GPU compute benchmark and a stepping stone toward externally useful matrix workloads in a future v2.
Consensus Parameters
| Parameter | Symbol | Value | Role |
|---|---|---|---|
| Matrix dimension | n | 512 | Compute cost O(n³); ~134M field multiplications per attempt |
| Transcript block size | b | 16 | Hashing granularity; (n/b)³ = 32,768 intermediates per attempt |
| Noise rank | r | 8 | Security parameter; denoising cost O(n²r) |
| Field modulus | q | 2³¹ − 1 (2,147,483,647) | Mersenne prime M31; GPU-friendly int32 arithmetic |
| Pre-hash epsilon bits | ε | 10 (18 after height 55,000) | Sigma gate: pre-hash must satisfy H < (target << ε) before the expensive MatMul path |
| Freivalds rounds | k | 2 | False-positive probability < 2⁻⁶² |
| Validation window | — | 1,000 blocks | Depth beyond which full Phase 2 revalidation is not required |
| Block time | — | 90 s (steady-state); 250 ms (fast-mine, h < 50,000) | Target spacing for difficulty adjustment |
Note: b (transcript block size) and r (noise rank) are independent parameters. Changing b affects transcript size and cache behavior; changing r affects security conjecture strength and denoising overhead.
Finite-Field Arithmetic (FM31)
All arithmetic operates in the Mersenne prime field Fq where q = 2³¹ − 1. This prime was chosen for GPU friendliness: reduction is a single add-and-mask on int32 hardware, and the field supports native Apple Metal and CUDA int32 pipelines.
Reduction
The reduce64 function uses a double Mersenne fold, proven safe for all uint64 inputs (the single-fold variant is buggy for x ≥ 2⁶²). The dot() accumulator is the only sanctioned path, applying per-step reduction with a length-independent safety proof.
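The double fold and the per-step-reducing accumulator can be sketched in Python (illustrative only; function names follow the text, but this is not the consensus implementation):

```python
M31 = (1 << 31) - 1  # q = 2^31 - 1, the Mersenne prime M31

def reduce64(x: int) -> int:
    """Reduce a uint64 value mod M31 via a double Mersenne fold.

    Since 2^31 ≡ 1 (mod M31), x = hi·2^31 + lo ≡ hi + lo. One fold brings
    any 64-bit input below 2^34; a second brings it to at most 2^31 + 6,
    so a single conditional subtract finishes. (A single fold alone is
    insufficient for x ≥ 2^62, as noted above.)
    """
    x = (x & M31) + (x >> 31)   # first fold: result < 2^34
    x = (x & M31) + (x >> 31)   # second fold: result ≤ 2^31 + 6
    if x >= M31:
        x -= M31
    return x

def dot(a, b):
    """Inner product in F_q with per-step reduction.

    With acc < 2^31 and each product below 2^62, the pre-reduction sum
    stays below 2^63, inside reduce64's proven uint64 input range, for
    any vector length.
    """
    acc = 0
    for ai, bi in zip(a, b):
        acc = reduce64(acc + ai * bi)
    return acc
```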
Matrix Generation from Seeds
Matrices A and B are deterministically expanded from 32-byte seeds carried in the block header. The expansion uses SHA-256 as a PRF with LE32 counters and rejection sampling for uniform distribution mod M31. Domain separation tags prevent cross-component collisions:
- matmul_noise_EL_v1 — left noise factor EL
- matmul_noise_ER_v1 — right noise factor ER
- matmul_noise_FL_v1 — left noise factor FL
- matmul_noise_FR_v1 — right noise factor FR
- matmul_compress_v1 — compression vector σ
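A minimal sketch of the seed expansion, assuming a SHA-256(tag || seed || LE32 counter) PRF layout and a 31-bit-mask rejection rule; the exact byte layout here is an assumption, not the normative wire format:

```python
import hashlib
import struct

M31 = (1 << 31) - 1

def expand_elements(seed: bytes, tag: bytes, count: int):
    """Deterministically expand `count` uniform F_q elements from a seed.

    Each SHA-256 call over tag || seed || LE32 counter yields eight 4-byte
    words. Masking to 31 bits gives a value in [0, M31]; rejecting the
    single value M31 leaves a uniform sample of [0, M31), i.e. mod-q
    without bias (assumed rejection rule for illustration).
    """
    out = []
    ctr = 0
    while len(out) < count:
        block = hashlib.sha256(tag + seed + struct.pack("<I", ctr)).digest()
        ctr += 1
        for i in range(0, 32, 4):
            v = int.from_bytes(block[i:i + 4], "little") & 0x7FFFFFFF
            if v == M31:          # rejection sampling: discard the one biased value
                continue
            out.append(v)
            if len(out) == count:
                break
    return out
```

An n×n matrix is then just `expand_elements(seed, tag, n * n)` reshaped row-major.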
Mining Algorithm (Solve)
The mining loop for a single nonce attempt proceeds as follows:
- Sigma gate (pre-hash): Compute a cheap SHA-256 pre-hash from the candidate header. If the pre-hash does not satisfy H < (target << ε), skip the expensive MatMul entirely. This filters out ~99.9% of nonces at ε=10.
- Noise generation: Derive four low-rank factor matrices from the oracle seed σ: EL ∈ Fq^(n×r), ER ∈ Fq^(r×n), FL ∈ Fq^(n×r), FR ∈ Fq^(r×n). Form E = EL·ER and F = FL·FR (each rank ≤ r).
- Noisy MatMul: Compute C' = (A + E)·(B + F) using the canonical block decomposition with block size b=16. The computation iterates over (n/b)³ = 32,768 intermediate b×b partial-sum blocks in a fixed order (i, j, l).
- Transcript compression: Each intermediate block is compressed to a single field element via a random inner-product derived from σ. The compressed elements are fed into a rolling SHA-256d hasher, reducing hashed data from ~33.5 MB (naive) to ~131 KB.
- Difficulty check: If H(compressed transcript) < target, the nonce is valid.
- Denoise and emit: Compute C = C' − A·F − E·(B + F). Emit the block with matmul_digest, seed_a, seed_b, and the product matrix C'.
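The sigma-gate predicate from the first step can be sketched as follows (function name and header encoding are illustrative assumptions):

```python
import hashlib

def sigma_gate_passes(header: bytes, target: int, epsilon_bits: int = 10) -> bool:
    """Cheap pre-hash filter applied before the O(n^3) MatMul path.

    The SHA-256 pre-hash of the candidate header, read as a big-endian
    256-bit integer, must fall below target << epsilon_bits; otherwise
    the nonce is skipped without touching the matrix pipeline.
    """
    pre = int.from_bytes(hashlib.sha256(header).digest(), "big")
    return pre < (target << epsilon_bits)
```

At ε=10 this relaxes the comparison by a factor of 2¹⁰ relative to the final difficulty check, so only a small fraction of nonces reach the MatMul.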
Transcript Compression
Naive full-block hashing would require SHA-256 over ~33.5 MB per nonce attempt, adding 30–100% overhead on GPU miners. BTX uses field-algebraic inner-product compression to reduce this by ~256×:
| Approach | Bytes Hashed | SHA-256 Time (SHA-NI) | Overhead vs MatMul |
|---|---|---|---|
| Naive full-block | 33.5 MB | ~67 ms | 30–100% |
| Field-algebraic compression | ~131 KB | ~0.26 ms | < 1% |
Security is preserved: the random inner-product family is pairwise independent, so forging a different b×b block that produces the same compressed element requires guessing a field element (probability 1/q ≈ 2⁻³¹ per intermediate). A single σ-derived compression vector is reused across all 32,768 intermediates; Carter-Wegman per-intermediate binding plus σ-unpredictability makes per-intermediate vectors unnecessary.
On GPU architectures, the compression dot-product is fused into the matmul kernel to avoid the ~33.5 MB data transfer penalty. Compressed elements (~128 KiB total) are written to a ring buffer and consumed by a CPU SHA-256 thread.
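The per-block compression and the resulting data reduction can be illustrated directly from the parameters (the function shape is a sketch; the σ-derived coefficient vector would come from the matmul_compress_v1 expansion):

```python
M31 = (1 << 31) - 1

def compress_block(block, sigma_vec):
    """Compress one flattened b×b partial-sum block to a single field
    element via a random inner product with σ-derived coefficients.
    Pairwise independence of the family bounds forgery at 1/q per block.
    """
    acc = 0
    for entry, coeff in zip(block, sigma_vec):
        acc = (acc + entry * coeff) % M31
    return acc

# Transcript size arithmetic at production parameters (n=512, b=16):
intermediates = (512 // 16) ** 3            # 32,768 b×b partial-sum blocks
naive_bytes = intermediates * 16 * 16 * 4   # full-block hashing: 33,554,432 B (~33.5 MB)
compressed_bytes = intermediates * 4        # one field element each: 131,072 B (~131 KB)
```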
Verification (Freivalds' Algorithm)
Full verification requires recomputing the entire O(n³) transcript — the dominant engineering constraint. BTX uses a two-phase validation model:
Phase 1: Header Checks (O(1))
- Validate header format, timestamp, and nBits target.
- Verify matmul_digest matches the claimed SHA-256d of the compressed transcript.
- Check proof-of-work: H(header) < target.
Phase 2: Transcript Recomputation (O(n³))
- Reconstruct A, B from seeds. Regenerate noise. Recompute the canonical block MatMul.
- Verify the compressed transcript matches matmul_digest.
- Apply Freivalds' algorithm as a fast product check: verify (A+E)·((B+F)·r) = C'·r for k=2 random vectors r. Each round has false-positive probability 1/(2³¹−1); with k=2, error < 2⁻⁶².
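The Freivalds check can be sketched in plain Python. The consensus check is over (A+E), (B+F), and C'; the sketch below shows the generic A·B == C form with dense list-of-list matrices:

```python
import random

M31 = (1 << 31) - 1  # field modulus q

def freivalds_check(A, B, C, k=2, rng=None):
    """Probabilistically verify A·B == C over F_q without the O(n^3) product.

    Each round draws a random vector r and compares A·(B·r) against C·r,
    three matrix-vector products costing O(n^2). A wrong C survives one
    round with probability 1/q, so k=2 rounds give error below 2^-62.
    """
    rng = rng or random.Random()
    n = len(A)
    for _ in range(k):
        r = [rng.randrange(M31) for _ in range(n)]
        Br = [sum(B[i][j] * r[j] for j in range(n)) % M31 for i in range(n)]
        ABr = [sum(A[i][j] * Br[j] for j in range(n)) % M31 for i in range(n)]
        Cr = [sum(C[i][j] * r[j] for j in range(n)) % M31 for i in range(n)]
        if ABr != Cr:
            return False
    return True
```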
Graduated Peer Punishment
Phase 1 pass followed by Phase 2 fail triggers graduated response: disconnect →
discourage → ban (after 3rd offense in 24h). The fMatMulStrictPunishment
flag enables immediate bans for post-stabilization networks.
DoS Mitigation
Phase 2 verification at n=512 costs approximately 5 ms per block. Rate limits bound CPU exposure:
- Per-peer budget: 32 Phase 2 verifications per minute per peer.
- Global budget: 512 Phase 2 verifications per minute across all peers (~2.56 s CPU/min).
- Validation window: Blocks deeper than 1,000 from the tip skip Phase 2 (assumevalid model for IBD).
- Fast-phase scheduling: During the 250 ms fast-mine phase, Phase 2 is deferred to a bounded queue that must drain completely after the transition to 90 s blocks.
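The per-peer and global budgets above amount to a sliding-window rate limiter; a minimal sketch (class name and window mechanics are assumptions, not the node's actual accounting):

```python
import time
from collections import defaultdict, deque

class Phase2Budget:
    """Sliding-window limiter: 32 Phase 2 verifications per minute per
    peer and 512 per minute globally, matching the stated budgets."""

    def __init__(self, per_peer=32, global_cap=512, window=60.0):
        self.per_peer = per_peer
        self.global_cap = global_cap
        self.window = window
        self.peer_hits = defaultdict(deque)   # peer_id -> timestamps
        self.global_hits = deque()

    def _prune(self, dq, now):
        # Drop timestamps that have aged out of the window.
        while dq and now - dq[0] >= self.window:
            dq.popleft()

    def try_acquire(self, peer_id, now=None):
        """Return True and record the hit if both budgets allow it."""
        now = time.monotonic() if now is None else now
        dq = self.peer_hits[peer_id]
        self._prune(dq, now)
        self._prune(self.global_hits, now)
        if len(dq) >= self.per_peer or len(self.global_hits) >= self.global_cap:
            return False
        dq.append(now)
        self.global_hits.append(now)
        return True
```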
Block Header Fields
The MatMul block header extends the standard Bitcoin header with:
| Field | Size | Description |
|---|---|---|
| nNonce64 | 8 bytes | 64-bit mining nonce |
| matmul_digest | 32 bytes | SHA-256d of compressed transcript |
| matmul_dim | 2 bytes | Matrix dimension (512) |
| seed_a | 32 bytes | Deterministic seed for matrix A |
| seed_b | 32 bytes | Deterministic seed for matrix B |
Block size with seed-derived matrices is approximately 200 KiB, compared to ~2.2 MiB if full matrices were included. All serialization uses little-endian byte order; big-endian architectures are unsupported.
Overhead Analysis
At production parameters (n=512, b=16, r=8), total protocol overhead is ~16.5% above a bare n×n matrix multiply:
| Component | Cost | % of Baseline |
|---|---|---|
| Noisy MatMul (A+E)·(B+F) | 134M muls | baseline |
| Transcript compression dot-products | 8.4M muls | 6.3% |
| Denoising (rank-r factor products) | 8.4M muls | 6.3% |
| Noise generation + matrix additions | 4.7M ops | 3.5% |
| Rolling SHA-256d on compressed elements | 131 KB hashed | < 0.1% |
| Total overhead | ~22M muls | ~16.5% |
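The mul counts in the table follow directly from the parameters. The per-component accounting below is a plausible reconstruction that reproduces the table's figures (e.g. denoising as four n²r factor products: A·FL, (A·FL)·FR, ER·(B+F), EL·(ER·(B+F))), not the normative cost model:

```python
n, b, r = 512, 16, 8

baseline = n ** 3                  # noisy MatMul (A+E)·(B+F): 134,217,728 muls
compress = (n // b) ** 3 * b * b   # one b·b inner product per intermediate block
denoise = 4 * n * n * r            # four rank-r factor products, exploiting low rank
noise = 2 * n * n * r + 2 * n * n  # EL·ER, FL·FR products plus A+E, B+F additions
```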
On GPUs where the compression dot-product is fused into the matmul kernel, measured total overhead drops to ~10–12%.
Difficulty Adjustment Interaction
MatMul PoW difficulty is governed by the ASERT algorithm (see the Difficulty Adjustment specification). The sigma pre-hash gate interacts with the difficulty target: at higher difficulty, the pre-hash filter rejects more nonces before the expensive MatMul path, keeping average GPU utilization efficient. The pre-hash epsilon upgrade at height 55,000 (from 10 to 18 bits) further reduces wasted compute as the chain matures.
GPU Optimization
The field choice (M31 = 2³¹ − 1) was selected specifically for GPU friendliness. Key optimization points:
- M31 reduction is a single add-and-mask on int32 ALUs, native on Apple Metal and CUDA.
- The compression dot-product must be fused into the matmul kernel to avoid a ~33.5 MB device-to-host transfer per attempt.
- Compressed elements (~128 KiB) are written to a ring buffer consumed asynchronously by a CPU SHA-256 thread.
- The 512×512 working set (~2 MiB) fits comfortably in GPU L2 cache on modern hardware.
- Denoising runs only once per found block (not per attempt), so its O(n²r) cost is negligible in practice.
Security Properties
- Completeness: An honest miner performing the full MatMul has each attempt accepted with the probability set by the difficulty target.
- Hardness: Under the direct-product conjecture for random rank-r matrix products, no adversary gains meaningful advantage with less than a full matrix multiplication.
- Collision resistance: Per-intermediate collision probability is 2⁻³¹; the per-block union bound gives ≤ 2⁻¹⁶. A dual-projection upgrade path exists for 2⁻⁶² per intermediate if future risk assessment warrants.
- ASIC resistance: ASICs are not cryptographically impossible but are economically misaligned with commodity AI/GPU hardware. The MatMul workload is identical to standard GEMM operations that AI accelerators are optimized for.
Monetary Policy
| Parameter | Value |
|---|---|
| Total supply | 21,000,000 BTX |
| Initial block subsidy | 20 BTX |
| Halving interval | 525,000 blocks |
| Premine | None |
| Fast-mine emission (blocks 0–49,999) | 1,000,000 BTX (4.76% of supply) |
The issuance follows an infinite geometric series whose sum is the hard cap: 20 BTX × 525,000 blocks × (1 + 1/2 + 1/4 + …) = 20 × 525,000 × 2 = 21M BTX.
MoneyRange() and GetBlockSubsidy() enforce the hard cap at consensus level.
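A subsidy schedule matching these parameters can be sketched as follows, modeled on Bitcoin's GetBlockSubsidy. The base-unit constant COIN and the right-shift halving are assumptions for illustration:

```python
COIN = 100_000_000            # assumed satoshi-style base units per BTX
HALVING_INTERVAL = 525_000
INITIAL_SUBSIDY = 20 * COIN

def get_block_subsidy(height: int) -> int:
    """Subsidy starts at 20 BTX and halves every 525,000 blocks.

    The right shift eventually reaches zero, so total emission is the
    (truncated) geometric series bounded by the 21M BTX hard cap.
    """
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:        # shifting past 63 bits is undefined in C++; clamp
        return 0
    return INITIAL_SUBSIDY >> halvings
```

Summing one full interval per halving epoch shows the emission never exceeds 21M BTX, with integer truncation leaving it slightly under the cap, as in Bitcoin.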