# l402-train — Full Content

> This file contains the complete text of all documents on l402-train.ai for agent consumption.
> For the summary index, see llms.txt

---

## Whitepaper

# Lightning-Coordinated Decentralized AI Training

**A protocol for permissionless machine learning using Bitcoin's Lightning Network as the coordination, payment, and incentive layer.**

---

## Abstract

We propose a protocol for decentralized AI model training that replaces token-based coordination (Bittensor TAO, custom governance tokens) with Bitcoin Lightning Network micropayments. Participants contribute GPU compute to distributed training runs and receive payment proportional to their measured contribution quality — settled in seconds, denominated in sats or USDT via Taproot Assets, with no staking requirements, no token governance, and no identity verification.

The protocol combines three proven components: (1) SparseLoCo gradient compression, enabling training over commodity internet at 146x compression; (2) L402 HTTP payment gating for gradient exchange; and (3) hold-invoice conditional payments that release funds only upon validator attestation of gradient quality.

We address the semi-centralized coordinator problem — the primary trust assumption — through a layered mitigation strategy combining deterministic validation replay, DLC-bound payment settlement, and federated multi-validator consensus that reduces the trust requirement to "at least one of N validators is honest." We extend the protocol to decentralized autoresearch — autonomous optimization bounties where AI agents compete to improve quantifiable metrics, paid per validated improvement.

---

## 1. Introduction

### 1.1 The Coordination Problem

Training large language models requires coordinating hundreds of GPUs for weeks. This has historically required centralized clusters with high-bandwidth interconnects (400 Gb/s InfiniBand).
Recent work (DiLoCo, SparseLoCo, Covenant-72B) has proven that training can occur over commodity internet with dramatically compressed communication — but coordination and incentive design remain unsolved.

### 1.2 The Token Problem

Current decentralized training systems (Bittensor, PrimeIntellect) use custom tokens for incentives. This introduces:

- **Price volatility** affecting miner economics and participation stability
- **Governance overhead** — token holders control subnet parameters, creating political dynamics
- **Wealth concentration** — Bittensor analysis shows the top 1% of wallets control a median of 89.8% of stake, with a miner performance-to-reward correlation of only r=0.10-0.30
- **Regulatory uncertainty** — utility token classification varies by jurisdiction
- **Barrier to entry** — staking requirements exclude casual participants

### 1.3 The Lightning Alternative

Bitcoin's Lightning Network provides:

- **Instant settlement** (~182ms average, <500ms multi-hop)
- **Negligible fees** (median 63 ppm, ~$0.001 for sub-$100 payments)
- **No identity requirements** — permissionless participation
- **Programmable payments** — hold invoices, PTLCs, DLCs for conditional settlement
- **Proven scale** — 5,606 BTC capacity ($490M), $1.17B monthly volume, 99.7% success rate
- **Denomination stability** — USDT on Lightning (Taproot Assets) available since Jan 2025 for those preferring fiat-denominated payments

---

## 2. Background

### 2.1 Decentralized Training Methods

Survey of the progression from DiLoCo (2023, 500x communication reduction) through INTELLECT-1 (2024, 10B, whitelisted) to Covenant-72B (2026, 72B, permissionless). Key technical components: local SGD, gradient compression, error feedback, asynchronous aggregation.

### 2.2 SparseLoCo

Detailed description of the compression pipeline: 30 local steps → top-k sparsification (1.56% density) → 2-bit quantization → error feedback (decay=0.95).
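The pipeline stages can be sketched concretely. This is an illustrative NumPy sketch, not the reference implementation: the chunk size, k, and error-feedback decay follow the figures quoted here, while the function name `compress_step` and the simplified sign-plus-magnitude 2-bit codebook are assumptions for illustration.

```python
import numpy as np

CHUNK = 4096   # elements per chunk (reference value)
K = 64         # values kept per chunk -> 1.56% density
DECAY = 0.95   # error-feedback decay

def compress_step(delta, error_buffer):
    """One SparseLoCo-style round: error feedback, per-chunk top-k
    sparsification, then a crude 2-bit quantization."""
    combined = DECAY * error_buffer + delta          # fold in last round's residual
    chunks = combined.reshape(-1, CHUNK)

    # Keep the K largest-magnitude values within each chunk.
    idx = np.argpartition(np.abs(chunks), -K, axis=1)[:, -K:]
    values = np.take_along_axis(chunks, idx, axis=1)

    # Simplified 2-bit codebook: sign plus one magnitude bit,
    # scaled by the per-chunk mean |value| (illustrative choice).
    scale = np.abs(values).mean(axis=1, keepdims=True)
    quantized = np.sign(values) * np.where(np.abs(values) > scale, 1.5, 0.5) * scale

    # Everything not transmitted returns to the error buffer:
    # dropped values carry over whole; kept values leave their
    # quantization residual behind.
    new_error = chunks.copy()
    np.put_along_axis(new_error, idx, values - quantized, axis=1)

    return (idx, quantized), new_error.reshape(error_buffer.shape)
```

The `(idx, quantized)` payload is what a peer would upload; the caller keeps `new_error` for the next round.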
Result: 146x compression, 94.5% compute utilization, 70-second communication rounds over 500 Mbps internet.

### 2.3 Lightning Network & L402

Overview of Lightning payment channels, HTLCs, the L402 HTTP authentication protocol, and hold invoices. How L402 turns any HTTP endpoint into a paid API with sub-second settlement.

### 2.4 Limitations of Token-Based Coordination

Analysis of Bittensor's incentive failures: stake-to-reward correlation (r=0.80-0.95) vs performance-to-reward (r=0.10-0.30), 51% attack vulnerability in a majority of subnets, and emission-based rewards disconnected from the market value of compute.

---

## 3. Protocol Design

### 3.1 Architecture Overview

```
┌─────────────────────────────────────────────────────────┐
│                      COORDINATOR                        │
│                                                         │
│  ┌──────────┐  ┌──────────┐  ┌────────────────────┐     │
│  │ Aperture │  │ Gradient │  │ Validation Oracle  │     │
│  │  (L402   │  │  Store   │  │  (Gauntlet-style   │     │
│  │  Proxy)  │  │ (R2/S3)  │  │   loss scoring)    │     │
│  └────┬─────┘  └────┬─────┘  └────────┬───────────┘     │
│       │             │                 │                 │
│  ┌────┴─────────────┴─────────────────┴─────────────┐   │
│  │            Lightning Node (LND)                  │   │
│  │       Hold invoices + payment routing            │   │
│  └───────────────────┬──────────────────────────────┘   │
└──────────────────────┼──────────────────────────────────┘
                       │ Lightning Network
          ┌────────────┼────────────────────┐
          │            │                    │
     ┌────┴────┐  ┌────┴────┐          ┌────┴────┐
     │ Peer 1  │  │ Peer 2  │   ...    │ Peer N  │
     │ 8xGPU   │  │ 4xGPU   │          │ 8xGPU   │
     │ LN Node │  │ LN Node │          │ LN Node │
     └─────────┘  └─────────┘          └─────────┘
```

### 3.2 Gradient Exchange Protocol

1. **Peer trains locally** for K steps (K=30 in reference implementation)
2. **Peer compresses** pseudo-gradient via SparseLoCo
3. **Peer uploads** compressed gradient to coordinator via L402-gated HTTP PUT
   - Peer pays small submission fee (anti-spam, covers storage)
   - Coordinator issues hold invoice (payment held pending validation)
4. **Coordinator validates** gradient quality:
   - Forward pass on validation batch before and after applying gradient
   - Loss reduction score computed
   - Assigned-vs-unassigned data check (catches plagiarism)
   - Norm calibration
5. **On validation pass**: coordinator settles hold invoice — peer receives reward proportional to quality score
6. **On validation fail**: hold invoice expires — funds return to peer automatically
7. **Coordinator aggregates** validated gradients and publishes updated model checkpoint
8. **Peers download** new checkpoint and resume local training

### 3.3 Payment Mechanics

#### Submission Fee (Peer → Coordinator)

- Small fixed fee per gradient submission (anti-spam, covers validation compute + storage)
- Paid via standard L402 on the upload endpoint
- Suggested: 100-1,000 sats (~$0.10-$1.00)

#### Quality Reward (Coordinator → Peer)

- Hold invoice issued at gradient upload time
- Amount determined by coordinator's posted reward schedule
- Settlement conditional on validation oracle attestation
- Reward proportional to measured loss reduction (not flat rate)
- Formula: `reward = base_rate × quality_score × normalization_factor`

#### Channel Design for 70 Peers

- Pre-opened direct channels between coordinator and each peer
- Bidirectional flow (fees in, rewards out) naturally rebalances
- At 1,000 sats/submission + 10,000 sats average reward per round:
  - ~9M sats/week capacity per channel
  - All 70 payments settle in ~13 seconds (well within the 70-second round window)

### 3.4 Coordinator Trust Model

The coordinator is the primary trust assumption in this protocol. Rather than claiming full trustlessness — which remains an open problem for gradient validation at scale — we constrain the coordinator's power through layered mitigations that make misbehavior detectable, unprofitable, and recoverable.
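The reward formula and the settle-or-expire decision above can be sketched as pure functions. All names and parameter values here (`quality_reward`, `base_rate`, the normalization choice) are illustrative assumptions, not the protocol's reference implementation.

```python
def quality_reward(loss_before, loss_after, base_rate=10_000, target_reduction=0.01):
    """Reward in sats, proportional to measured loss reduction (§3.3).

    quality_score = relative loss reduction on the validation batch.
    The normalization factor is chosen (illustratively) so that hitting
    `target_reduction` pays exactly `base_rate` sats.
    """
    quality_score = (loss_before - loss_after) / loss_before
    if quality_score <= 0:
        return 0  # no improvement: hold invoice expires, no payout
    normalization = 1.0 / target_reduction
    return int(base_rate * quality_score * normalization)

def settle_or_expire(reward_sats):
    """Map the computed reward to a hold-invoice action (§3.2 steps 5-6)."""
    return "settle" if reward_sats > 0 else "let-expire"
```

A gradient that cuts validation loss from 2.00 to 1.98 (a 1% reduction) would earn the base rate; a gradient that raises loss earns nothing and the hold invoice is left to expire.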
#### 3.4.1 Attack Surface Analysis

The coordinator performs four trusted functions: gradient validation, payment settlement, checkpoint publication, and data partitioning. A malicious coordinator could attempt:

| Attack | Impact | Detectability |
|--------|--------|---------------|
| Selective censorship (reject valid gradients) | Suppresses specific peers | **High** — deterministic replay proves the gradient was valid |
| Front-running (copy technique, reject original) | Steals intellectual contribution | **Medium** — requires semantic analysis of gradient similarity |
| Checkpoint poisoning (publish degraded model) | Degrades training for all peers | **High** — any peer can verify checkpoint quality on public eval |
| Payment withholding (let hold invoices expire) | Steals labor (but not funds) | **High** — hold invoices auto-refund; peers see non-settlement |
| Validation set leakage (share with favored peers) | Unfair advantage | **Low** — hard to detect without canary probes |

Critically, the coordinator **cannot steal funds** — hold invoices auto-refund on timeout. The worst-case attack is labor theft (accept gradient, refuse payment), which is immediately visible to the affected peer and destroys coordinator reputation.

#### 3.4.2 Layered Mitigation Strategy

**Layer 1 — Deterministic Replay (Accountability)**

Loss evaluation is a pure function: `f(model_checkpoint, gradient, validation_data) → loss_score`. If the model checkpoint, validation data, and evaluation code are published, any party can independently replay the computation and verify the coordinator's scoring. A coordinator that rejects a valid gradient or accepts an invalid one is provably dishonest.

This does not prevent misbehavior, but it makes misbehavior **cryptographically provable** — a property that token-based systems like Bittensor lack, where subnet owner scoring is opaque.
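A minimal sketch of the replay check, assuming a published deterministic scoring function. The hash-based `loss_score` here is a stand-in for a real fixed-seed forward pass; only its purity and reproducibility matter for the argument.

```python
import hashlib
import json

def loss_score(checkpoint, gradient, validation_data):
    """Stand-in for the published evaluation function
    f(model_checkpoint, gradient, validation_data) -> loss_score.
    A real deployment runs a fixed-seed forward pass; this sketch
    only needs determinism, so it derives a pseudo-score from a hash."""
    digest = hashlib.sha256(
        json.dumps([checkpoint, gradient, validation_data], sort_keys=True).encode()
    ).digest()
    return int.from_bytes(digest[:4], "big") / 2**32

def replay_verify(published_score, checkpoint, gradient, validation_data, tolerance=1e-9):
    """Any third party replays the scoring and checks the coordinator's claim."""
    return abs(loss_score(checkpoint, gradient, validation_data) - published_score) <= tolerance
```

Because the function is pure, a rejected-but-valid gradient yields a replayable contradiction: the coordinator's published score and the replayed score cannot both be honest.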
**Layer 2 — DLC-Bound Payment Settlement (Cryptographic Constraint)**

Discreet Log Contracts (DLCs) bind payment settlement to an oracle-signed attestation of the loss score. The coordinator cannot settle a hold invoice without producing a valid oracle signature over the actual computed loss. This removes the coordinator's ability to lie about validation results — the payment math is enforced by Bitcoin script, not coordinator honesty.

DLCs are production-ready Bitcoin primitives (not experimental). Combined with deterministic replay, they ensure that: (a) the coordinator must publish a loss score to settle payment, and (b) anyone can verify that score is correct.

**Layer 3 — Federated Multi-Validator Consensus (Decentralized Trust)**

Multiple independent validators evaluate each gradient submission. Payment requires majority attestation (e.g., 3-of-5 validators agree on the loss score within tolerance). Validators are:

- Selected per-round from a pool (rotation prevents collusion)
- Paid via Lightning for their validation compute
- Subject to the same deterministic replay accountability

This reduces the trust assumption from "trust the coordinator" to "trust that at least one of N validators is honest" — the same security model used by most blockchain systems.

**Layer 4 — Market Competition (Economic Constraint)**

The gradient exchange protocol is open. If a coordinator misbehaves, peers can migrate to a competing coordinator running the same protocol. The switching cost is low: download the latest checkpoint (public), open channels to the new coordinator, resume training. This creates economic pressure for honest behavior — a coordinator that censors or cheats loses its peer network and revenue.
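The Layer 3 attestation rule can be sketched as an agreement check over sorted scores. The function name `majority_attestation` and its parameter values (3-of-5, 1% tolerance) are illustrative assumptions.

```python
def majority_attestation(scores, quorum=3, tolerance=0.01):
    """Layer 3 sketch: settlement requires `quorum` validators to agree
    on the loss score within `tolerance`. Returns the agreed score, or
    None when no quorum forms (the hold invoice then expires)."""
    scores = sorted(scores)
    # Slide a quorum-sized window over the sorted scores; any window
    # whose spread fits inside `tolerance` is an agreeing majority.
    for i in range(len(scores) - quorum + 1):
        window = scores[i:i + quorum]
        if window[-1] - window[0] <= tolerance:
            return sum(window) / quorum  # settle at the agreed average
    return None
```

A single colluding outlier cannot move the settled score outside the tolerance band, and if validators disagree wildly, no payment happens at all.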
#### 3.4.3 Trust Comparison

| System | Trust Assumption | Transparency | Switching Cost |
|--------|-----------------|--------------|----------------|
| Centralized (AWS/Azure) | Full trust in employer | None | Employment contract |
| Bittensor | Trust stake-weighted validators | Opaque scoring | Token lock-in + staking |
| **Lightning Protocol** | **≥1 of N validators honest** | **Full deterministic replay** | **Low (open protocol)** |

The coordinator role in this protocol is constrained, auditable, and replaceable — not an unchecked central authority. This is a meaningfully different trust model than both centralized training and stake-weighted token systems.

#### 3.4.4 Remaining Open Problems

Fully trustless gradient validation — where no trusted party is required at all — remains unsolved. Potential future approaches include zero-knowledge proofs for forward-pass computation (currently prohibitive at 72B scale) and trusted execution environments (TEEs) for validation. We consider the federated validator model sufficient for practical deployment while these research directions mature.

### 3.5 Heterogeneous Participation

Following Heterogeneous SparseLoCo, peers with fewer GPUs can participate:

- Peers grouped into "virtual replicas" via pipeline parallelism
- Activation compression between pipeline stages
- Payment distributed proportionally within the group
- Lowers hardware barrier from 8xB200 (~$200K) to potentially 1-2 GPUs

---

## 4. Decentralized Autoresearch

### 4.1 The Autoresearch Pattern

The autoresearch pattern (Karpathy, March 2026): an AI agent iteratively edits code, evaluates against a metric, keeps improvements, discards regressions. We extend this to a decentralized bounty market.

### 4.2 Bounty Protocol

1. **Sponsor** publishes:
   - Target: mutable file(s) to optimize
   - Metric: quantifiable score + eval command
   - Bounty: total sats available + payment schedule
   - Rules: constraints, held-out eval set hash, deadline
2. **Agents** (running on any hardware) compete:
   - Download baseline + eval framework
   - Run autonomous experiments
   - Submit improvements: code diff + claimed score
3. **Validation**:
   - Coordinator runs submitted diff against held-out eval set (not the public one)
   - Checks for metric gaming (canary inputs, distribution shift, temporal stability)
   - 80% payment on primary eval, 20% holdback on held-out eval
4. **Payment via Lightning**:
   - Hold invoice created at submission
   - Settled proportional to improvement magnitude
   - Formula: `payment = base_bounty × (score_improvement / target_improvement)`
   - Bonus for exceeding target; minimum threshold to qualify

### 4.3 Anti-Gaming Measures

- **Held-out evaluation**: Sponsor evaluates on secret data not available to agents
- **Multi-metric composite**: Require improvement across multiple metrics simultaneously
- **Canary detection**: Embed known-answer probes in eval set
- **Temporal stability**: Re-evaluate after delay to catch overfitting
- **Semantic review**: Top-N improvements reviewed by sponsor before final payment release
- **Proportional payment**: Goodhart gaming pays less than genuine improvement

---

## 5. Economics

### 5.1 Cost Comparison

| Model | Coordinator Cost | Peer Incentive | Settlement | Identity |
|-------|-----------------|----------------|------------|----------|
| Centralized (Azure/AWS) | $5-10M for 70B | Salary/contract | Net-30+ | Full KYC |
| Bittensor | TAO emissions | TAO tokens | ~12s consensus | Wallet only |
| **Lightning Protocol** | Sats per gradient | Sats per quality | **<500ms** | **None** |

### 5.2 Pricing Model

Market-driven pricing:

- Coordinator posts reward schedule (sats per unit of loss reduction)
- Peers self-select based on their hardware costs and expected rewards
- Natural equilibrium: reward > cost of compute → more peers join → competition increases → quality improves
- No token issuance, no inflation schedule, no governance votes

### 5.3 Validation Economics

Validation is the coordinator's largest operational cost. Each peer submission requires a forward pass on the validation batch (before and after applying the gradient) to compute loss reduction. With 70 peers submitting every 70 seconds, this is substantial.

**Scaling strategies:**

- **Sampling-based validation**: Evaluate a random subset of the validation batch per round rather than the full set. Reduces compute proportionally while maintaining statistical confidence. A 10% sample with 70 peers still provides robust scoring with 95% confidence intervals.
- **Amortized forward passes**: The "before" forward pass is shared across all submissions in a round — only compute it once per round, not once per peer. This cuts validation cost nearly in half.
- **Staggered validation**: Not all 70 peers submit simultaneously. Spread validation across the 70-second window, smoothing GPU utilization.
- **Validator pools**: Distribute validation across multiple paid validators (Layer 3 in §3.4.2). Each validator handles a subset of submissions. Validators are paid from submission fees, creating a self-sustaining market for validation compute.
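The first two strategies can be quantified with simple arithmetic, counting validation cost in full-batch-equivalent forward passes per round. The function name and parameters are illustrative, not part of the protocol.

```python
def forward_passes_per_round(peers=70, sample_frac=1.0, amortize_before=False):
    """Validation cost in batch-equivalent forward passes per round (§5.3).

    Naive: each of `peers` submissions needs a 'before' and an 'after'
    pass over the full validation batch. Amortizing shares the 'before'
    pass across the round; sampling shrinks the batch itself.
    """
    before = 1 if amortize_before else peers
    after = peers
    return (before + after) * sample_frac

naive = forward_passes_per_round()                                        # 140 passes
optimized = forward_passes_per_round(sample_frac=0.10, amortize_before=True)  # ~7 passes
```

Amortization alone takes 140 passes down to 71 (the "nearly in half" above), and 10% sampling scales that to about 7, roughly a 20x reduction over naive validation.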
**Cost estimate**: At 70 peers with amortized forward passes and 10% sampling, coordinator validation requires approximately 1-2 GPUs dedicated to scoring — roughly 2-3% of total network compute, well within the 5-15% overhead budget funded by submission fees.

### 5.4 Channel Liquidity

The protocol requires the coordinator to maintain payment channels with each peer. At 10,000 sats average reward per round with 70-second rounds, each channel needs approximately 9M sats/week of throughput capacity.

**Liquidity optimizations:**

- **Bidirectional flow**: Submission fees flow peer→coordinator while rewards flow coordinator→peer. This naturally rebalances channels, reducing the locked capital requirement.
- **Just-in-time channels**: Lightning Service Providers (LSPs) can open channels on demand when new peers join, eliminating the need for the coordinator to pre-fund channels with every potential peer.
- **Epoch-batched settlement**: Aggregate rewards across multiple rounds and settle once per epoch (e.g., every 10 rounds / ~12 minutes). This reduces per-round channel throughput requirements by 10x while maintaining sub-hour settlement — still orders of magnitude faster than Bittensor's ~12-second consensus or centralized Net-30 contracts.
- **Submarine swaps**: Peers can receive on-chain payments for large accumulated rewards, freeing channel capacity for ongoing micropayments.

### 5.5 Coordinator Funding

Who pays the coordinator?

- **Corporate R&D**: Company funds training bounty, pays in sats, receives trained model
- **DAO/community**: Pooled funding via multisig, model released open-source
- **Self-funding**: Coordinator sells API access to trained model via L402, reinvests revenue into training bounties
- **Autoresearch sponsors**: Companies pay to optimize their prompts, configs, or models

---

## 6. Security Analysis

### 6.1 Threat Model

| Threat | Defense |
|--------|---------|
| Free-riding (copying gradients) | Assigned-vs-unassigned data scoring |
| Gradient poisoning | Byzantine-robust aggregation (trimmed mean) + loss-reduction scoring |
| Sybil attack | Submission fee + quality-weighted rewards (no benefit from multiple low-quality identities) |
| Coordinator censorship | Multiple competing coordinators; gradient exchange protocol is open |
| Payment fraud | Hold invoices — funds locked until validation, auto-refund on timeout |
| Validator collusion | Multi-validator consensus + validator rotation + public reproducibility |
| Data poisoning | Curator-verified training data sources; peers validate on shared dataset |

### 6.2 Coordinator Threat Mitigation

The coordinator is the primary attack target. See §3.4 for the full trust model. Summary of defense layers:

1. **Can't steal funds** — hold invoices auto-refund on timeout (Lightning protocol guarantee)
2. **Can't lie about scores** — DLC-bound settlement requires oracle signature over computed loss (§3.4.2, Layer 2)
3. **Can't censor undetected** — deterministic replay proves valid gradients were rejected (§3.4.2, Layer 1)
4. **Can't act unilaterally** — federated validators require majority consensus (§3.4.2, Layer 3)
5. **Can't hold peers captive** — open protocol, low switching cost to competing coordinator (§3.4.2, Layer 4)

The residual risk is a coordinator that subtly degrades checkpoint quality over time — difficult to detect in any decentralized training system. Public eval benchmarks on each checkpoint provide the best available defense.

### 6.3 Comparison to Bittensor Security

Bittensor's stake-weighted consensus means wealthy actors control the network — the top 1% of wallets control ~89.8% of stake, and performance-to-reward correlation is only r=0.10-0.30 vs a stake-to-reward correlation of r=0.80-0.95. Validators score opaquely, with no deterministic replay mechanism.
The Lightning protocol inverts this: payment is proportional to measured contribution quality, not capital staked. Validation is deterministic and publicly replayable. There is no governance token, no staking requirement, and no emission schedule disconnected from compute value. The attack surface is narrower and more auditable.

---

## 7. Limitations

1. **Coordinator is semi-centralized**: Despite the layered mitigations in §3.4 (deterministic replay, DLCs, federated validators), fully trustless gradient validation at scale remains an open problem. The trust requirement is reduced to "at least one of N validators is honest," but not eliminated entirely.
2. **GPU requirements still high**: Even with Heterogeneous SparseLoCo, meaningful participation in 72B training requires significant hardware. Consumer GPUs are insufficient. The protocol lowers the *financial* barrier (no staking) but not the *hardware* barrier. Fine-tuning and smaller-scale training (7B-13B) are more accessible entry points.
3. **Denomination volatility**: Rewards denominated in sats fluctuate with BTC price. USDT on Lightning (Taproot Assets, production since Jan 2025) provides fiat-stable denomination for peers who prefer predictable economics. The protocol is denomination-agnostic — coordinators can post bounties in either currency.
4. **Model quality gap**: Decentralized training currently produces models ~3 years behind frontier (Covenant-72B matches LLaMA-2-70B, not LLaMA-3.1). This is a limitation of decentralized training methods in general, not of the Lightning coordination layer specifically. The protocol's value proposition is a superior *coordination mechanism* — it makes decentralized training economically viable and fairly incentivized, which accelerates closing the gap as methods improve. Near-term applications favor domain-specific fine-tuning, where the base quality gap matters less.
5. **Operational complexity**: Peers must manage a Lightning node and payment channels in addition to GPU infrastructure. Lightning agent tooling (Lightning Labs, 2026) reduces this burden, but it remains non-trivial compared to simply joining a managed training cluster. LSP integration and automated channel management are required for consumer-grade UX.
6. **Validation compute**: Gauntlet-style validation adds compute overhead, funded by submission fees. Scaling strategies (sampling, amortized passes, validator pools — see §5.3) keep this manageable, but validator economics require careful tuning to avoid creating a centralized compute bottleneck.

---

## 8. Related Work

- DiLoCo (Douillard et al., 2023) — Local SGD foundations
- INTELLECT-1 (PrimeIntellect, 2024) — 10B decentralized training
- Covenant-72B (Templar, 2026) — 72B permissionless training on Bittensor
- SparseLoCo (Sarfi et al., 2025) — 146x gradient compression
- Heterogeneous SparseLoCo (2026) — Multi-tier peer participation
- DeMo (Lambda, 2024) — Decoupled momentum, extreme compression
- FEDSTR (2024) — Federated learning marketplace on Nostr + Lightning. The closest prior art: FEDSTR uses Lightning payments and Nostr relays for peer discovery in a federated learning marketplace. Key differences: FEDSTR targets federated learning (data stays local, model aggregation) rather than distributed pre-training (shared data, gradient exchange); it does not address gradient compression for large-scale models; and it lacks the hold-invoice conditional payment mechanism for trustless quality validation.
  Our protocol extends FEDSTR's core insight (Lightning as an AI coordination payment rail) to the harder problem of permissionless pre-training at 72B scale, with SparseLoCo compression and DLC-bound validation.
- Lightning Agent Tools (Lightning Labs, 2026) — AI agent payment infrastructure
- L402 Protocol — HTTP 402 payment authentication
- Bittensor (OpenTensor Foundation) — Token-based decentralized AI network
- x402 (Coinbase, 2025) — Stablecoin alternative to L402

---

## 9. Conclusion

Bitcoin's Lightning Network is a superior coordination layer for decentralized AI training compared to custom tokens. It provides instant settlement, negligible fees, conditional payments via hold invoices, and permissionless participation — without the governance overhead, wealth concentration, and incentive misalignment of token-based systems.

The semi-centralized coordinator — the primary trust assumption — is constrained through four independent layers: deterministic validation replay (accountability), DLC-bound payment settlement (cryptographic constraint), federated multi-validator consensus (decentralized trust), and open-protocol market competition (economic constraint). This reduces the trust requirement to "at least one of N validators is honest" — a well-understood security model, and a meaningful improvement over both centralized training (full trust in employer) and token-based systems (trust in stake-weighted, opaque validators).

Combined with SparseLoCo gradient compression, Gauntlet-style validation, and practical solutions for validation scaling and channel liquidity, Lightning enables a viable protocol for permissionless large-scale model training. The extension to decentralized autoresearch — autonomous optimization bounties paid in sats — opens a new paradigm for AI improvement that is market-driven, permissionless, and aligned with contributor quality rather than capital concentration.
---

## Research Files

| File | Contents | Size |
|------|----------|------|
| [research/covenant-72b-analysis.md](research/covenant-72b-analysis.md) | SparseLoCo, Gauntlet, benchmarks, team assessment | 21 KB |
| [research/incentive-mechanisms.md](research/incentive-mechanisms.md) | Game theory, validation methods, Bitcoin primitives, bounty design | 54 KB |
| [research/lightning-ml-coordination.md](research/lightning-ml-coordination.md) | L402, channel math, Lightning vs alternatives, architecture | 34 KB |
| [research/federated-vs-decentralized.md](research/federated-vs-decentralized.md) | FL vs decentralized, DiLoCo, gradient privacy, trust spectrum | 15 KB |

## Next Steps

- [ ] Draft Section 3 (Protocol Design) in full technical detail with sequence diagrams
- [ ] Build proof-of-concept: L402-gated gradient exchange between 2 peers
- [ ] Benchmark hold invoice latency for gradient-sized payloads
- [ ] Consult with Lightning Labs on agent payment flows for training (Jim is advisor/collaborator)
- [ ] Model the economics: what reward schedule attracts sufficient peers for a 7B training run?
- [ ] Write formal security analysis (game-theoretic equilibrium proofs)
- [ ] Identify target venue (Bitcoin conference paper? AI conference? Independent publication?)

---

## Covenant-72B Analysis

# Covenant-72B: Deep Technical Analysis

**Date:** 2026-03-12
**Paper:** [arxiv.org/abs/2603.08163](https://arxiv.org/abs/2603.08163)
**Model:** [huggingface.co/1Covenant/Covenant-72B](https://huggingface.co/1Covenant/Covenant-72B)
**License:** Apache 2.0

---

## Executive Summary

Covenant-72B is a 72-billion-parameter LLM pre-trained over the internet by 70+ permissionless peers using the Bittensor blockchain (Subnet 3) for coordination. It is the largest model ever trained in a fully decentralized, non-whitelisted manner.
The core technical innovation is SparseLoCo, a communication-efficient optimizer achieving 146x gradient compression, which makes synchronization over commodity internet feasible. The model was trained on ~1.1T tokens of DCLM web text and achieves results competitive with LLaMA-2-70B (trained on 2T tokens in a centralized datacenter).

**Bottom line:** A legitimate and impressive systems/coordination achievement. The model itself is mediocre by 2026 standards — roughly LLaMA-2-70B quality, which is 2-3 generations behind current frontier open models. The significance is entirely in proving the method works at scale, not in producing a useful model.

---

## 1. Technical Architecture

### 1.1 SparseLoCo Algorithm

SparseLoCo (Sparse Low-Communication) is a variant of DiLoCo (Distributed Low-Communication) from DeepMind. The key idea: instead of synchronizing gradients every step (which requires enormous bandwidth), each peer trains locally for many steps, then shares a heavily compressed summary of what it learned.

**Algorithm steps per round:**

1. **Local training (H=30 inner steps):** Each peer runs AdamW on its own data shard for 30 optimization steps, accumulating ~192 sequences per batch (seq_len=2048).
2. **Pseudo-gradient computation:** After 30 steps, compute the difference between the current weights and the weights at the start of the round:
   ```
   Delta_r = theta_current - theta_start_of_round
   ```
   This "pseudo-gradient" captures what the peer learned.
3. **Error-feedback accumulation:** Combine the new pseudo-gradient with previously discarded information:
   ```
   combined = 0.95 * error_buffer + Delta_r
   ```
   The error buffer remembers what got thrown away last round (critical for convergence).
4. **Top-k sparsification:** Divide the combined tensor into chunks of 4096 elements. Within each chunk, keep only the k=64 largest-magnitude values (1.56% density). Everything else goes back into the error buffer.
5. **2-bit quantization:** Quantize the surviving values to 2 bits.
6. **Index encoding:** Each selected value needs its position encoded — 12 bits per value (the information-theoretic minimum is ~7.36 bits for choosing 64 from 4096).
7. **All-gather:** Peers upload compressed pseudo-gradients to Cloudflare R2 object storage. Other peers download and average them.
8. **Global update:** Apply the averaged compressed pseudo-gradient with outer learning rate alpha. Since `Delta_r = theta_current - theta_start` already points in the learned direction, the averaged delta is added:
   ```
   theta_new = theta_old + alpha * (1/R) * sum(compressed_deltas)
   ```

### 1.2 The 146x Compression Ratio

This ratio comes from compressing each synchronized message:

- **Spatial compression (top-k):** Keep 64 out of 4096 elements per chunk → ~64x
- **Bit compression:** 2-bit quantization + 12-bit indices instead of 32-bit floats → additional ~2.3x
- **Temporal compression:** Synchronize every 30 steps instead of every step → a further ~30x reduction in communication frequency

Combined: each synchronized message is ~146x smaller than a dense FP32 pseudo-gradient (64 × ~2.3 ≈ 146), and the 30x temporal factor multiplies on top of this, since messages are exchanged only once per 30 steps.

**Practical impact:** A 72B model's full gradient is ~290 GB in FP32. With 146x compression, each peer sends/receives ~2 GB per round. At 110 Mb/s uplink, that takes ~150 seconds. Actual measured communication time: 70 seconds per round (likely due to overlap and pipeline optimization). Computation per round: 20 minutes. Resulting compute utilization: ~94.5%.

### 1.3 Gauntlet Validator

The Gauntlet is the system that prevents free-riders, lazy peers, and adversarial attacks in a permissionless network. It runs on Bittensor's blockchain infrastructure.

**Scoring mechanisms:**

- **LossScore:** The primary signal. The validator takes a peer's submitted pseudo-gradient and measures the loss improvement on a held-out batch before vs. after applying it. If your gradient doesn't help, you score poorly.
- **Assigned vs. unassigned data check:** Each peer is assigned specific data shards. The validator checks whether the gradient helps more on the peer's assigned data than on random data — this catches peers who copy gradients from others rather than doing their own training.
- **Norm calibration:** Pseudo-gradients are scaled relative to the median norm across all submissions. This prevents a peer from submitting outsized or undersized updates.
- **OpenSkill ranking:** Scores are accumulated over time using an ELO-like system (OpenSkill), creating a reputation that's hard to game with a single round.
- **Liveness and sync checks:** Validators verify that peers are actually synchronized with the current model state — you can't submit stale gradients from an old checkpoint.

**Key design point:** Not every peer is evaluated every round. A random subset of peers is scored on a random subset of data, keeping validation costs manageable while maintaining statistical deterrence.

### 1.4 Blockchain Component (Bittensor Subnet 3)

Bittensor is a decentralized network where "subnets" run specific AI tasks. Each subnet has:

- **Miners:** Do the actual work (in this case, training the model)
- **Validators:** Score the miners' contributions (run the Gauntlet)
- **TAO token:** The native cryptocurrency. Validators stake TAO and set weights on miners. The Bittensor consensus mechanism (Yuma Consensus) translates these weights into TAO emissions — miners who contribute better gradients earn more TAO.

Covenant runs on **Subnet 3** (also called "Templar" or "tplr"). The team behind it is Covenant AI (formerly Templar), led by Samuel Dare, with researchers including Joel Lidin, Amir Sarfi, and Eugene Belilovsky (Mila/Concordia).

**The incentive loop:** Peers invest ~8x B200 GPUs → train honestly → Gauntlet scores them highly → they earn TAO emissions → TAO has market value → covers GPU costs (ideally). This creates an economic flywheel where the better the model gets, the more valuable participation becomes (in theory).
**Practical note:** TAO is a real cryptocurrency trading on exchanges. As of early 2026, Bittensor's total market cap fluctuates significantly. The crypto incentive is both the project's key enabler (it pays for compute without a central funder) and its biggest credibility risk (crypto projects carry baggage). ### 1.5 Peer Coordination - **No central cluster.** Peers discover each other through the Bittensor blockchain. - **Dynamic participation.** Peers join and leave freely. Average active peers: 24.4 per round. Average actually contributing to aggregation: 16.9 (capped at 20). Over the full run: 70+ unique participants. - **Asynchronous communication.** Compressed pseudo-gradients are uploaded to Cloudflare R2 (object storage). Other peers download asynchronously. This avoids the need for all peers to be online simultaneously. - **Fault tolerance.** If a peer drops out, the round continues without it. The outer learning rate was adjusted during training (reduced from 1.0 to 0.65 at step 110K) based on training dynamics, but this was a manual intervention, not an automatic mechanism. ### 1.6 Hardware Requirements **Per peer minimum:** 8x NVIDIA B200 (or equivalent, e.g., 8x H200) This is NOT consumer hardware. An 8x B200 node costs roughly $200-300K+ to purchase, or $15-25/hour to rent. The "commodity internet" claim refers to bandwidth (commodity 500 Mb/s down / 110 Mb/s up), not to commodity hardware. You need serious GPUs. **Parallelism:** Dynamic FSDP (Fully Sharded Data Parallel) across the 8 GPUs within each peer. The error-feedback buffer is sharded using the same FSDP strategy to avoid doubling memory. **Network:** 500 Mb/s downlink, 110 Mb/s uplink — this is the key claim. Regular internet, not InfiniBand. 
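The compression budget and error-feedback loop described in §1.1–1.2 can be sketched in a few lines. This is a toy illustration using the constants quoted above, not the Covenant implementation (which also quantizes the kept values to 2 bits and shards the buffer under FSDP):

```python
# Per-chunk compression budget, using the constants quoted in §1.2.
CHUNK = 4096     # elements per chunk
KEPT = 64        # top-k survivors (64/4096 = 1.56% density)
VALUE_BITS = 2   # quantized magnitude bits
INDEX_BITS = 12  # position bits (2^12 = 4096)

dense_bits = CHUNK * 32                        # dense FP32 baseline
sparse_bits = KEPT * (VALUE_BITS + INDEX_BITS)
per_message_ratio = dense_bits / sparse_bits   # ~146x per sync message
total_vs_per_step = 30 * per_message_ratio     # syncing every 30 steps vs. every step

def sparsify_with_feedback(grad, err, k=KEPT, decay=0.95):
    """Top-k sparsification with a decayed error-feedback buffer:
    the unsent mass is carried into the next round instead of being lost."""
    acc = [g + decay * e for g, e in zip(grad, err)]
    kept = set(sorted(range(len(acc)), key=lambda i: abs(acc[i]), reverse=True)[:k])
    sparse = [a if i in kept else 0.0 for i, a in enumerate(acc)]
    residual = [0.0 if i in kept else a for i, a in enumerate(acc)]
    return sparse, residual
```

Calling `sparsify_with_feedback([1.0, -3.0, 0.5, 2.0], [0.0]*4, k=2)` keeps the two largest-magnitude entries and returns the rest as residual for the next round.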
### 1.7 Data Pipeline

**Pre-training (main phase, ~1.09T tokens):**

- Dataset: DCLM-baseline-1.0 (DataComp for Language Models) — curated web text
- English only
- Pre-tokenized and hosted on object storage
- Each peer gets distinct (potentially overlapping) data shards
- Background shard downloading for seamless replacement

**Annealing phase (~14.2B tokens):**

- Higher-quality data mixture:
  - Instruction data: ~27%
  - Synthetic web: ~20%
  - Code: ~15%
  - Math: ~13%
  - Pre-training replay: ~25%
- Outer learning rate reduced + rapid inner LR decay

**Post-training (Covenant-72B-Chat):**

- Stage 1: 36,500 SFT steps at 4K context, batch size 256
- Stage 2: 20,500 SFT steps at 8K context + 20% pre-training replay
- Both stages: AdamW, weight decay 0.01, gradient clipping 1.0

---

## 2. Model Architecture

| Parameter | Value |
|---|---|
| Parameters | 72,747,327,488 (72.7B) |
| Layers | 80 |
| Hidden size | 8192 |
| Intermediate size | 28672 |
| Query heads | 64 |
| KV heads | 8 (GQA) |
| Head dimension | 128 |
| RoPE base frequency | 500,000 |
| Vocabulary size | 262,208 |
| Tokenizer | Gemma 3 SentencePiece |
| Pre-training context length | 2048 |
| Post-training context length | 8192 |

Architecture is LLaMA-style (LlamaForCausalLM) with Grouped Query Attention (8 KV heads for 64 query heads). Nothing novel about the architecture itself — the innovation is entirely in the training method.

**Context length note:** 2048 during pre-training is very short by 2026 standards. The SFT stage extends to 8K, which is still short. Modern models support 128K+ (LLaMA 3.1, Qwen 2.5).

---

## 3.
Benchmarks & Quality Assessment ### 3.1 Pre-Training Benchmarks (0-shot) | Benchmark | Covenant-72B | LLM360 K2 (65B, 1.4T tok) | LLaMA-2-70B (2T tok) | |---|---|---|---| | ARC-Challenge | 56.8 | 53.8 | 57.4 | | ARC-Easy | 80.9 | 76.0 | 79.6 | | PIQA | 81.6 | 82.5 | 82.6 | | OpenBookQA | 44.0 | 48.0 | 49.4 | | HellaSwag | 80.6 | 82.9 | 84.3 | | WinoGrande | 75.9 | 76.4 | 80.4 | | MMLU | **67.1** | 65.5 | 65.6 | **Interpretation:** Covenant-72B wins on MMLU (+1.5 over LLaMA-2-70B) and ARC-Easy (+1.3), but loses on HellaSwag (-3.7), WinoGrande (-4.5), OpenBookQA (-5.4), and PIQA (-1.0). Overall it's roughly competitive with LLaMA-2-70B — a fair characterization given the 1.1T vs 2T token disadvantage. ### 3.2 Decentralized Baselines | Model | Size | Tokens | MMLU | ARC-C | Whitelisted? | |---|---|---|---|---|---| | INTELLECT-1 | 10B | 1T | 32.7 | 44.8 | Yes (curated participants) | | Psyche Consilience | 40B | 1.2T | 24.2 | 31.1 | Yes | | **Covenant-72B** | **72B** | **1.1T** | **67.1** | **56.8** | **No (permissionless)** | Covenant-72B dramatically outperforms prior decentralized efforts. Psyche Consilience at 40B with 1.2T tokens getting only 24.2 MMLU is bizarre — it suggests that project had serious training instability issues. INTELLECT-1 at 10B/32.7 MMLU is more reasonable for its scale. ### 3.3 Chat Model (5-shot) | Benchmark | Covenant-72B-Chat | K2-Chat (65B) | LLaMA-2-70B-Chat | |---|---|---|---| | ARC-Challenge | 64.2 | 62.0 | 65.4 | | GSM8K | 63.9 | 79.0 | 52.2 | | MMLU | 67.4 | 67.9 | 63.1 | | IFEval | **64.7** | 45.5 | 40.7 | | MATH | **26.3** | 19.1 | 10.7 | | MMLU-Pro | 40.9 | 45.4 | 35.2 | The chat model shows clear IFEval and MATH advantages over LLaMA-2-70B-Chat, but falls short of K2-Chat on GSM8K and MMLU-Pro. ### 3.4 Honest Comparison to Modern 70B Models (2026 Context) This is where the picture gets sobering. 
Here's how Covenant-72B stacks up against current-generation models: | Metric | Covenant-72B | LLaMA 3.1 70B | Qwen 2.5 72B | |---|---|---|---| | MMLU | 67.1 | **79.3** | ~85 | | ARC-Challenge | 56.8 | **92.9** | ~90+ | | Training tokens | 1.1T | 15T+ | 18T | | Context length | 2K (8K chat) | 128K | 128K | **The gap is enormous.** LLaMA 3.1 70B scores 79.3 on MMLU vs. Covenant's 67.1. On ARC-Challenge, it's 92.9 vs. 56.8 — a 36-point gap. Qwen 2.5 72B is even further ahead. This isn't surprising: those models were trained on 14-16x more data with state-of-the-art data curation. As @WillSpagnoli noted on X: "Comparing to LLAMA 2 in 2026 is wild." The Covenant team's response (@DistStateAndMe): "Fair point on Llama 2, we own that one. But you've nailed exactly why this matters. 70 contributors is just the proof of concept." --- ## 4. Heterogeneous SparseLoCo (Follow-Up Paper) **Paper:** [arxiv.org/abs/2601.02360](https://arxiv.org/abs/2601.02360) **Authors:** Yazan Obeidi, Amir Sarfi, Joel Lidin (Covenant AI), Paul Janson, Eugene Belilovsky (Mila/Concordia) This paper addresses the biggest practical limitation of Covenant-72B: every peer needs identical high-end hardware (8x B200). Heterogeneous SparseLoCo allows peers with different hardware to participate. 
### How it works: - Peers with enough GPU memory host a full model replica (standard SparseLoCo) - Peers with less memory split the model across GPUs using pipeline parallelism - Inter-stage activations (which normally require high bandwidth within a pipeline) are compressed using **subspace projection** — project activations onto a low-rank subspace via random orthonormal matrix U ### Key results (tested at 178M - 1B scale): - At 87.5% activation compression: 3.3-3.8% loss degradation - At 99% compression: 7.4-8.1% degradation - Heterogeneous setups (mix of compressed and uncompressed) outperform uniform compression - The advantage grows with more aggressive compression (2.6 percentage points at 99.9%) ### Practical implication: This could eventually allow peers with 4x or even 2x GPUs to participate alongside 8x GPU peers, dramatically lowering the barrier to entry. But it's only been validated at small scale (up to 1B parameters). Whether it works at 72B is unproven. --- ## 5. Significance & Critical Assessment ### 5.1 What's genuinely impressive 1. **First permissionless large-scale training.** INTELLECT-1 and Psyche both whitelisted participants. Covenant let anyone join. Making this work with Byzantine fault tolerance is a real systems achievement. 2. **94.5% compute utilization.** Despite training a model 7.2x larger than INTELLECT-1, Covenant achieved higher utilization (94.5% vs. 82.1%) with lower per-round communication overhead (70 seconds vs. 8.3 minutes). The compression engineering is excellent. 3. **Convergence despite extreme compression.** 146x compression with error feedback actually works — the model converges to competitive performance. This is a non-obvious result. 4. **Scale milestone.** 72B is the largest model ever trained this way, by a factor of ~2x over the next largest (Psyche at 40B, which didn't even work well). ### 5.2 What's not impressive / limitations 1. 
**The model is weak by 2026 standards.** LLaMA-2-70B quality puts it roughly 2-3 years behind the frontier. You would never choose Covenant-72B for any practical application when LLaMA 3.1, Qwen 2.5, or DeepSeek V3 exist. 2. **"Commodity internet" is misleading — the hardware isn't commodity at all.** 8x B200 GPUs is a ~$200-300K investment per peer. The 70+ "unique participants" likely includes many cloud instances rented by a small number of entities, not 70 different organizations. 3. **Only 1.1T tokens.** Modern models train on 15-18T tokens. The team could argue "same compute budget comparison is fair," but the result is a model that's not useful in practice. 4. **2048 context length** is absurdly short. Even the chat model only extends to 8K. This alone makes it impractical. 5. **Benchmark cherry-picking.** Comparing against LLaMA-2-70B (July 2023) in March 2026 lets you claim "competitive" while avoiding embarrassing comparisons to current models. The paper doesn't include any comparison to LLaMA 3, Qwen 2, Mistral, or DeepSeek. 6. **Psyche Consilience at 24.2 MMLU** looks like a broken baseline rather than a genuine comparison point. Including it flatters Covenant's results. 7. **The crypto angle.** Bittensor's TAO token creates real incentives but also real conflicts of interest. The project announcement is structured to pump the token ("largest decentralized training run in history!") as much as to advance science. The GitHub org has essentially no public code (one TypeScript repo with 0 stars). ### 5.3 Comparison to prior decentralized training | Project | Date | Scale | Permissionless? 
| Algorithm | Quality | |---|---|---|---|---|---| | DiLoCo (DeepMind) | 2023 | Research paper | N/A (internal) | DiLoCo | Proof of concept | | INTELLECT-1 (PrimeIntellect) | 2024 | 10B, 1T tokens | Whitelisted | DiLoCo + int8 | Weak (32.7 MMLU) | | Psyche Consilience | 2025 | 40B, 1.2T tokens | Whitelisted | DiLoCo variant | Broken (24.2 MMLU) | | **Covenant-72B** | **2026** | **72B, 1.1T tokens** | **Permissionless** | **SparseLoCo** | **Decent (67.1 MMLU)** | Covenant is clearly the best result in this lineage. The jump from 10B to 72B with better quality is meaningful. The permissionless aspect is a genuine advance. --- ## 6. Bittensor / Token Economics - **TAO** is Bittensor's native token with a fixed supply cap of 21 million (modeled after Bitcoin) - Subnets receive TAO emissions based on their "weight" in the network (set by validators and root network) - Within Subnet 3 (Covenant/Templar), miners earn TAO proportional to their Gauntlet scores - Validators stake TAO and set weights on miners — their staking weight determines how much influence their scoring has - The economic proposition: spend $X on GPU rental → earn $Y in TAO → if Y > X, mining is profitable - This creates a market-driven compute allocation — if TAO price rises, more miners join, more compute is available - **Criticism:** This is ultimately a proof-of-stake system where the wealthy (large TAO holders) control which miners get rewarded. The "decentralization" is real but not as egalitarian as it sounds. --- ## 7. Relevance Assessment ### Is this relevant for running local models? **No.** This is about training, not inference. The resulting model (72B, 2048 context, LLaMA-2-tier quality) is not interesting for local inference — there are dramatically better options at every size. ### Is this relevant for decentralized AI infrastructure? 
**Yes, significantly.** If you believe that compute concentration is a problem (a few companies control frontier training), this is the most credible demonstration that decentralized training can work at meaningful scale. The SparseLoCo algorithm and Gauntlet validator are genuine contributions. ### Could this pattern be used for fine-tuning? **Likely yes, and more practically.** Fine-tuning requires far less compute and communication than pre-training. SparseLoCo's communication efficiency would be even more beneficial for fine-tuning, where you could have many more participants with smaller hardware. This could be a more practical near-term application. ### Could this catch up to frontier models? **Theoretically, if scaled.** The team's response to criticism acknowledges the LLaMA-2 comparison is dated. Their argument: "70 contributors is just the proof of concept. The whole bet is that the approach scales to thousands." If they could aggregate 10x more peers and train on 15T+ tokens, the quality gap could close. Whether that's economically viable via TAO emissions is the real question. ### What's the practical takeaway? **Watch the method, ignore the model.** The SparseLoCo algorithm and the proof that permissionless Byzantine-tolerant training works at 72B scale are the contributions. The model itself is an artifact of the proof, not a useful product. If this team (or someone using their methods) can scale to modern data volumes, it becomes much more interesting. --- ## 8. Team & Organization **Authors (paper):** Joel Lidin, Amir Sarfi, Erfan Miahi, Quentin Anthony, Shivam Chauhan, Evangelos Pappas, Benjamin Thérien, Eugene Belilovsky, Samuel Dare **Affiliations:** - Covenant AI (formerly Templar project) - Mila / Concordia University (Eugene Belilovsky, academic advisor) **Entity:** @tplr_ai on X (Templar). @covenant_ai is the project account. The team is relatively small, with academic connections to Mila (Montreal's AI institute). 
**Announced via:** @tplr_ai X thread on March 10, 2026. 5,628 likes on the lead tweet, ~1.3M views. Strong engagement from the Bittensor community but limited pickup from the broader ML community (only 2 upvotes on HuggingFace papers page, zero community comments). --- ## References 1. Covenant-72B paper: https://arxiv.org/abs/2603.08163 2. Heterogeneous SparseLoCo: https://arxiv.org/abs/2601.02360 3. Model weights: https://huggingface.co/1Covenant/Covenant-72B 4. Chat model: https://huggingface.co/1Covenant/Covenant-72B-Chat 5. DCLM dataset: https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0-parquet 6. INTELLECT-1: https://www.primeintellect.ai/blog/intellect-1 7. DiLoCo (DeepMind): Douillard et al., 2023 --- ## Incentive Mechanisms # Incentive Mechanism Design for Decentralized AI Training **Date:** 2026-03-12 **Scope:** Mechanism design, validation without trust, Bitcoin/Lightning conditional payments, existing literature, autoresearch bounty design --- ## Table of Contents 1. [Mechanism Design for Distributed Computation](#1-mechanism-design-for-distributed-computation) 2. [Validation Without Trust](#2-validation-without-trust) 3. [Bitcoin Script / Lightning for Conditional Payments](#3-bitcoin-script--lightning-for-conditional-payments) 4. [Existing Literature & Projects](#4-existing-literature--projects) 5. [Autoresearch Bounty Design](#5-autoresearch-bounty-design) 6. [Synthesis: A Practical Lightning-Native Design](#6-synthesis-a-practical-lightning-native-design) --- ## 1. Mechanism Design for Distributed Computation ### 1.1 The Core Problem You want rational, self-interested actors to contribute honest computation (gradient computation for model training) to a shared objective (reducing model loss). 
The participants are:

- **Workers (miners):** Compute gradients on assigned data shards
- **Validators:** Evaluate gradient quality and authorize payments
- **Coordinator:** Aggregates validated gradients into global model updates

The mechanism must ensure that honest participation is a dominant strategy — no participant should be able to increase their payoff by deviating from the protocol.

### 1.2 Game-Theoretic Frameworks

#### Shapley Values

The Shapley value from cooperative game theory gives each player their **marginal contribution** averaged over all possible orderings of players. For distributed training:

- **Player i's Shapley value** = average improvement in model loss when player i's gradient is added, computed over all possible subsets of other players' gradients
- **Properties:** Efficiency (total value is distributed), symmetry (equal contributors get equal pay), null player (zero-value contributions get zero pay), additivity

**Application to FL:** Multiple papers (MDPI Axioms 2023, Wireless Communications 2022) design federated learning incentive mechanisms using Shapley values. The Shapley value is computed as:

```
phi_i = (1/|N|!) * sum over all orderings pi of [v(S_i(pi) ∪ {i}) - v(S_i(pi))]
```

where S_i(pi) is the set of players preceding i in ordering pi, and v(S) is the model performance using only the gradients from players in set S.

**Practical problem:** Computing exact Shapley values requires evaluating 2^N subsets — exponential in the number of participants. Approximation methods include:

- **Monte Carlo Shapley:** Random sampling of permutations (~100-1000 samples)
- **Truncated Shapley:** Only evaluate first k players in each permutation
- **Fuzzy Shapley:** Extend to uncertain participation attitudes (Axioms 2024)

**Key insight from recent work (arXiv 2504.05563, "Do Data Valuations Make Good Data Prices?"):** Popular valuation methods like Leave-One-Out and Data Shapley **make poor payment rules**. They fail to ensure truthful reporting of costs, leading to inefficient market outcomes.
The authors recommend adapting VCG and Myerson payments from mechanism design literature. #### VCG (Vickrey-Clarke-Groves) Mechanism VCG achieves **truthful revelation** as a dominant strategy. Applied to federated learning (arXiv 2008.06680, FVCG mechanism): - Each worker reports their cost of participation - The mechanism selects participants to maximize social surplus (total value minus total cost) - Worker i's payment = value they provide to others (the marginal externality) - **Properties:** Dominant-strategy incentive compatible (DSIC), individually rational (IR), Pareto efficient, weakly budget-balanced **FVCG payment formula:** ``` Payment_i = V(S*) - V(S*\{i}) + c_i = (social welfare with i) - (social welfare without i) + i's reported cost ``` This makes each worker's utility equal to their marginal social contribution, aligning individual and collective incentives. **Practical limitation:** VCG requires the coordinator to know or estimate the value function V(S), which means evaluating model performance with and without each participant — similar overhead to Shapley computation. #### Scoring Rules (Proper Scoring Rules) A **strictly proper scoring rule** incentivizes an agent to truthfully report their probability distribution (or in this case, their gradient estimate). Examples: - **Logarithmic scoring rule:** Pays log(p) where p is the probability assigned to the realized outcome - **Brier score:** Pays 2p - sum(p_i^2) — bounded, simple to compute - **Peer prediction:** Score worker i's gradient against worker j's gradient on the same data — doesn't require ground truth For gradient validation, a natural scoring rule: **pay proportional to the correlation between the submitted gradient and the true gradient** (estimated by the validator through sampling). #### Stackelberg Games Model the coordinator as a **leader** and workers as **followers**. The coordinator announces a pricing/reward scheme, workers decide whether and how much to participate. 
Recent papers (IEEE TII 2023) use Stackelberg equilibrium for industrial IoT federated learning: - Leader: sets reward per unit of quality and minimum quality threshold - Followers: choose effort level to maximize (reward - cost) - Equilibrium: coordinator finds the reward schedule that maximizes their objective subject to workers' participation constraints #### Auction Mechanisms Workers bid their cost of participation, coordinator runs an auction: - **Reverse auction:** Workers bid to provide computation, lowest bidders win - **Combinatorial auctions:** Account for complementarities between workers (e.g., workers with diverse data are jointly more valuable) ### 1.3 Pricing Heterogeneous Contributions Different GPU types produce gradients of different quality and at different speeds. Pricing must account for: **Hardware heterogeneity:** - H100 vs A100 vs consumer GPUs: FLOPS/$ varies by 1.2-3x depending on task - Memory bandwidth matters more for some operations: workstation GPUs (A40, A6000, L40) offer 1.2x higher memory bandwidth and 1.8x greater memory capacity per unit price than datacenter GPUs (arXiv 2502.00722) - Communication bandwidth determines synchronization frequency **Gradient quality heterogeneity:** - Workers with larger local batch sizes produce lower-variance gradients - Workers training on higher-quality data produce more informative gradients - Slower workers may produce stale gradients if synchronization is asynchronous **Practical pricing approaches:** 1. **Output-based pricing (Templar/Covenant approach):** Price purely by measured loss reduction. Workers who produce better gradients — regardless of hardware — earn more. Simple and incentive-compatible but ignores cost differences. 2. **Cost-adjusted pricing:** Pay = quality_score / reported_cost. Workers with cheaper hardware earn more per dollar if they produce equal-quality gradients. 3. 
**Benchmark-normalized pricing (BOINC approach):** Normalize contributions by a hardware benchmark (e.g., Whetstone FLOPS). BOINC's "cobblestone" credit is calibrated so that one day of 1 GFLOPS (Whetstone) compute earns 200 cobblestones. This measures effort rather than outcome.
4. **Market-based pricing:** Let workers set their own prices (reverse auction). The coordinator selects the cheapest workers that meet quality thresholds. Markets naturally discover the right price for heterogeneous hardware.

### 1.4 Preventing Free-Riding

Free-riders submit no work (or minimal work) but claim rewards. Attack variants:

- **Zero-gradient attack:** Submit zero or random gradients
- **Replay attack:** Resubmit another worker's gradient
- **Partial work:** Train for fewer steps than required
- **Data-skipping:** Process only easy examples

**Defenses:**

1. **Loss-reduction scoring (Templar):** Directly measures whether a gradient improves the model. Zero or random gradients score poorly. Replay attacks are caught by checking whether the gradient helps more on the worker's assigned data than on random data.
2. **STD-DAGMM detection (FRAD, IEEE 2023):** Detects free-riders by analyzing the statistical properties of submitted gradients — variance, norm, direction relative to other submissions.
3. **FRIDA (arXiv 2410.05020):** Uses membership and property inference attacks to detect whether a worker actually trained on data — if their gradient doesn't encode properties of the assigned data, they're flagged.
4. **Weight evolution frequency (ScienceDirect 2024):** Track how each worker's model weights evolve over rounds. Free-riders show anomalous evolution patterns.
5.
**Contribution measurement:** Estimate each worker's marginal contribution via: - Gradient direction similarity to aggregated gradient - Loss reduction on held-out validation set - Correlation with independently computed reference gradients ### 1.5 Gradient Poisoning Defense Malicious workers submit gradients designed to degrade the model (backdoor insertion, convergence disruption). This is a Byzantine fault tolerance problem. **Robust aggregation methods:** | Method | Mechanism | Tolerance | |--------|-----------|-----------| | **Krum** | Select gradient closest (in Euclidean distance) to most others | f < n/2 - 1 | | **Trimmed Mean** | Remove top and bottom k% values per coordinate, average rest | k < 25% | | **Geometric Median** | Minimize sum of distances to all gradients | f < n/2 | | **SignGuard** | Use element-wise sign of gradients for anomaly detection | Collaborative filtering | | **FLRAM** | Isolation forests + DBSCAN on gradient magnitudes/signs | Adaptive | **State of the art (2025):** - Dynamic gradient filtering (ScienceDirect 2024): Adapts filtering threshold based on observed gradient distributions - Trajectory anomaly detection (Nature Scientific Reports 2024): Uses singular values of gradient matrices as features, processed by improved Isolation Forest - Adaptive adversaries survey (ePrint 2025/510): Shows that adaptive adversaries who observe the defense mechanism can circumvent most static defenses — defense must be randomized or adaptive **Key insight:** No robust aggregation method is free. They all reduce the effective contribution of honest workers (you're throwing away some good data). The overhead is typically 10-30% slower convergence versus naive averaging with no adversaries. ### 1.6 Sybil Attack Prevention A Sybil attacker creates many fake identities to gain disproportionate influence (e.g., outvote honest validators, dilute rewards). **Defense layers:** 1. **Stake-based (proof of stake):** Each identity must lock up capital. 
Creating 100 fake identities requires 100x the stake. Bittensor requires staking TAO to participate.
2. **Hardware commitment:** Require each identity to demonstrate unique hardware (proof of GPU). NVIDIA's confidential computing features could attest to specific GPU serial numbers.
3. **Proof of personhood:** Verify that each participant is a unique human (World/Worldcoin approach). Not practical for compute-heavy tasks where one person may legitimately run multiple machines.
4. **Reputation systems:** New identities start with low reputation and earn it over time. Combined with stake, this makes Sybil attacks expensive over the long term.
5. **Performance-based filtering:** If each identity must independently produce useful work, the cost of a Sybil attack scales linearly with the number of identities — there's no advantage to splitting one worker into 10 fake workers unless they can share work.

**Best practice for decentralized training:** Combine stake (economic barrier) with performance scoring (each identity must independently demonstrate useful computation). This makes Sybil attacks purely wasteful — 10 fake identities do 10x the work of 1 real identity, for no additional reward.

---

## 2. Validation Without Trust

### 2.1 Covenant's Gauntlet Validator (Detailed)

The Gauntlet is the validation system used in Covenant/Templar (Bittensor Subnet 3). Based on our [covenant-72b-analysis.md](covenant-72b-analysis.md):

**Architecture:** The Gauntlet runs as a set of validator nodes on the Bittensor blockchain. Validators must stake TAO to participate. Their scoring of miners determines TAO emission distribution.

**Scoring Pipeline:**

1. **Loss Score (primary signal):**
   - Validator takes miner's compressed pseudo-gradient
   - Computes model loss on held-out batch BEFORE applying gradient: `L_before`
   - Applies the gradient to the model
   - Computes loss AFTER: `L_after`
   - Score = `L_before - L_after` (positive = gradient helped)
2. **Assigned vs.
unassigned data check:** - Each miner is assigned specific data shards - Validator checks: does the gradient help MORE on the miner's assigned data than on random data? - If not, the miner likely copied someone else's gradient or used unauthorized data 3. **Norm calibration:** - Pseudo-gradients are scaled relative to the median norm across all submissions - Prevents miners from submitting outsized updates (could destabilize training) or undersized updates (minimal contribution) 4. **OpenSkill ranking:** - Scores are accumulated over time using OpenSkill (a Bayesian rating system similar to TrueSkill/Elo) - Uses the Plackett-Luce model to rank miners within each evaluation window - Reputation is hard to game with a single round — you need sustained quality 5. **Liveness and sync checks:** - Verify miners are synchronized with the current model state - Stale gradients (from old checkpoints) are rejected **Validation overhead:** - Not every miner is evaluated every round — random subsets - Validator computes two forward passes per miner per evaluation (before/after) - For a 72B model, each forward pass on a validation batch takes ~30-60 seconds on validator hardware - Total validation compute is ~5-15% of total training compute (estimated) **Key design insight:** The loss-reduction scoring creates a **directly measurable, objective metric** that doesn't require re-running the full training. You're checking the gradient's effect on a small held-out batch, not reproducing the entire training run. 
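The before/after loss check can be illustrated with a toy one-parameter model. The model, data, and names below are illustrative stand-ins, not Covenant's code:

```python
# Toy illustration of loss-reduction scoring: score a submitted pseudo-gradient
# by the loss change it induces on a held-out batch. A 1-D least-squares model
# stands in for the real network.
def loss(w, batch):
    # mean squared error of the prediction y_hat = w * x
    return sum((w * x - y) ** 2 for x, y in batch) / len(batch)

def loss_score(w, pseudo_grad, batch, lr=0.1):
    """L_before - L_after; positive means the submitted gradient helped."""
    return loss(w, batch) - loss(w - lr * pseudo_grad, batch)

held_out = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # held-out batch, true relation y = 2x
w = 1.0                                          # current model underestimates

# An honest worker's gradient points downhill; a zero-gradient free-rider scores 0.
honest_grad = sum(2 * (w * x - y) * x for x, y in held_out) / len(held_out)
assert loss_score(w, honest_grad, held_out) > 0
assert loss_score(w, 0.0, held_out) == 0.0
```

The same check penalizes poisoning: flipping the gradient's sign (`-honest_grad`) raises the held-out loss and produces a negative score.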
### 2.2 Proof of Learning (PoL)

**Paper:** Jia et al., IEEE S&P 2021 (arXiv 2103.05633)

**Protocol:**

- During training, the prover logs a **training transcript**: intermediate model checkpoints (weights at intervals), training data ordering, hyperparameters, random seeds
- The proof P(T, f_W) = (W, I, H, A) where:
  - W = model weights at checkpoints
  - I = data point ordering information
  - H = signatures of training data points
  - A = auxiliary info (hyperparameters, architecture)
- A verifier replays segments of the training transcript (random subset of checkpoint intervals) and checks that the weight changes are consistent with gradient descent on the claimed data

**Verification overhead:**

- Complexity: O(E * Q * k * C_{|W|}) where E = number of training epochs, Q = fraction of intervals verified, k = steps per interval, C_{|W|} = cost of one update step
- At Q = 10% (verify 10% of intervals), overhead is ~10% of training compute
- The key finding: an adversary seeking to manufacture a fake PoL must perform **at least as much work** as genuine training

**Spoofing attacks and defenses:**

- **Directed retraining:** Adversary knows final weights W_T, tries to reconstruct a plausible training path. Defense: verification checks statistical properties of the trajectory, not just endpoints.
- **Inverse gradient attack:** Given W_t, solve for W_{t-1} that would lead to it. Defense: this is computationally hard and introduces detectable artifacts.
- **Limitations (arXiv 2208.03567, "Proof-of-Learning is Currently More Broken Than You Think"):** Shows that PoL is vulnerable to spoofing attacks that manipulate tolerance parameters. The verification's reliance on approximate matching (gradients are stochastic) creates a window for adversaries.

**Enhancement:** Watermarking-enhanced PoL requires attackers to reproduce both authentic training logs AND watermark-consistent ownership signals, increasing attack cost by >10x.
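The replay-style verification can be sketched as follows. This is a minimal stand-in — a 1-D model, an interval length of one step, and illustrative names — whereas the real protocol checkpoints every k steps and compares full weight tensors within a tolerance:

```python
# Sketch of PoL spot-checking: the prover logs a per-step weight transcript;
# the verifier replays a random fraction q of the intervals and checks that
# each logged transition matches a recomputed update step.
import random

def train_step(w, point, lr=0.01):
    x, y = point
    return w - lr * 2 * (w * x - y) * x  # one SGD step on a 1-D model

def make_transcript(w0, data, lr=0.01):
    """Prover side: log the weight after every step."""
    transcript, w = [w0], w0
    for d in data:
        w = train_step(w, d, lr)
        transcript.append(w)
    return transcript

def verify(transcript, data, q=0.1, tol=1e-9, lr=0.01, seed=0):
    """Verifier side: replay a random q-fraction of the logged intervals."""
    rng = random.Random(seed)
    checked = rng.sample(range(len(data)), max(1, int(q * len(data))))
    return all(abs(train_step(transcript[i], data[i], lr) - transcript[i + 1]) <= tol
               for i in checked)
```

An honest transcript passes `verify`; tampering with any logged checkpoint makes full verification (q=1.0) fail, and partial verification catches it with probability roughly proportional to q.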
### 2.3 Zero-Knowledge Machine Learning (ZKML) **Concept:** Use zero-knowledge proofs to verify that a claimed computation (training step, inference) was performed correctly, without revealing the model weights or data. **Survey:** arXiv 2502.18535 (Feb 2025) — comprehensive survey of ZK-based verifiable ML. **Three categories:** 1. **Verifiable training:** Prove that model was trained correctly on claimed data 2. **Verifiable inference:** Prove that output came from a specific model on specific input 3. **Verifiable testing:** Prove model achieves claimed accuracy on a benchmark **Overhead numbers (from the survey):** | System | Task | Proof Generation Time | Verification Time | Memory | |--------|------|----------------------|-------------------|--------| | zkCNN | VGG16 inference | 88.3 seconds | ~seconds | Moderate | | zkDT | Decision tree (23 levels) | 250 seconds | ~seconds | Moderate | | zkDL | 10M parameter NN training | ~10s (parallelized) | <1s | High | | MobileNet v2 | Inference verification | N/A | 10.27 seconds | Moderate | | Transformer | General | N/A | N/A | **148 GB** (!) | **Constraint-to-parameter ratio for transformers: 58-85x** — the ZK circuit is 58-85 times larger than the model computation itself. **Optimization strategies:** - **Quantization (ZEN):** 5.4-22x constraint reduction through neural network quantization - **Commitment optimization (Artemis):** 7.3x improvement in prover time - **Lookup tables:** Precomputed values reduce division overhead **Bottom line:** ZKML is currently impractical for large models. Verifying a single forward pass through a 72B model would require astronomical compute and memory. Useful today only for small models (<1M parameters) or specific inference verification. May become practical in 3-5 years with hardware acceleration and algorithmic improvements. ### 2.4 Other Validation Approaches **Proof of Training (Springer 2025):** Blockchain network trains models and proves the training was performed correctly. 
Workers are rewarded with cryptocurrency proportional to computational contributions. Different from PoL in that the blockchain coordinates training rather than just verifying it.

**Redundant computation (BOINC approach):** Multiple workers compute the same task. Results must agree within tolerance. A quorum of agreement triggers credit. Overhead: at least 2-3x computation, but very simple and robust.

**Statistical verification:** For gradient computation specifically, you can verify a gradient by:

1. Sampling a small random subset of the training batch
2. Computing the gradient on that subset independently
3. Checking that the submitted gradient is statistically consistent (cosine similarity, norm ratio)

This gives partial verification at 1-5% overhead rather than full recomputation.

**Commitment schemes:** The worker commits to their gradient (hash or Merkle root) before seeing others' gradients. After all commitments, gradients are revealed. This prevents copying attacks. Cheap (<1% overhead) but doesn't prevent garbage gradients.

### 2.5 Verification Overhead Summary

| Method | Overhead (% of training compute) | What it verifies | Adversary model |
|--------|----------------------------------|------------------|-----------------|
| Loss-reduction (Gauntlet) | 5-15% | Gradient quality | Lazy/free-rider |
| Proof of Learning | ~10% (at 10% sampling) | Training integrity | Spoofing |
| ZKML | 58-85x (!) | Computational correctness | Any |
| Redundant computation | 100-200% | Exact correctness | Byzantine |
| Statistical sampling | 1-5% | Approximate correctness | Lazy/noisy |
| Commitment scheme | <1% | Non-copying | Plagiarism |

**Practical recommendation:** Layer multiple cheap methods rather than using one expensive method. A commitment scheme (prevent copying) + statistical sampling (catch garbage) + loss-reduction scoring (measure quality) gives strong guarantees at ~10-20% total overhead.

---

## 3. Bitcoin Script / Lightning for Conditional Payments

### 3.1 What Bitcoin Script Can Express

Bitcoin Script is deliberately limited — it's a stack-based, non-Turing-complete language. It can express:

- **Hash locks:** "Payment unlockable by revealing the preimage of hash H"
- **Time locks:** "Payment unlockable only after block height N / time T"
- **Signature checks:** "Payment requires a valid signature from key K"
- **Multi-signature:** "Payment requires M-of-N signatures"
- **Conditional branches:** `OP_IF / OP_ELSE / OP_ENDIF`

It **cannot** express:

- Arbitrary computation (no loops, no state)
- Floating-point arithmetic
- Complex data structures
- Direct verification of ML computations

### 3.2 HTLCs (Hash Time-Locked Contracts)

The building block of Lightning payments:

```
OP_IF
    OP_HASH160 <payment_hash> OP_EQUALVERIFY
    <recipient_pubkey> OP_CHECKSIG
OP_ELSE
    <timeout> OP_CHECKLOCKTIMEVERIFY OP_DROP
    <sender_pubkey> OP_CHECKSIG
OP_ENDIF
```

**Semantics:** The recipient can claim by revealing the preimage of the hash. If they don't claim before the timeout, the sender gets their money back.

**For gradient payments:** The preimage could encode gradient metadata, but HTLCs alone can't verify gradient quality. You need an external oracle to determine whether the gradient was good, then release the preimage.
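The two spending paths can be modeled in a few lines of Python — a toy model of the script's semantics (hash lock vs. timeout refund), not actual Bitcoin Script execution; all names are illustrative:

```python
import hashlib
import os

def make_htlc(preimage: bytes, timeout_height: int):
    """Toy HTLC: claimable with the preimage, refundable after the timeout."""
    payment_hash = hashlib.sha256(preimage).digest()

    def claim(candidate: bytes) -> bool:
        # Recipient path: reveal the preimage of the payment hash.
        return hashlib.sha256(candidate).digest() == payment_hash

    def refund(current_height: int) -> bool:
        # Sender path: only spendable once the locktime has passed.
        return current_height >= timeout_height

    return claim, refund

preimage = os.urandom(32)
claim, refund = make_htlc(preimage, timeout_height=850_000)
assert claim(preimage)            # correct preimage settles
assert not claim(os.urandom(32))  # wrong preimage fails
assert not refund(849_999)        # refund locked before timeout
assert refund(850_000)            # refund available after timeout
```

The same structure carries over to hold invoices below: settlement is gated entirely on who learns the preimage, which is why an external oracle can control payment release.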
### 3.3 PTLCs (Point Time-Locked Contracts)

PTLCs replace hash locks with **adaptor signatures** on elliptic curve points:

- Instead of sharing hash H and requiring preimage r, use the point P = r*G on secp256k1
- Each hop in a multi-hop payment uses a different point (better privacy than HTLCs, which share the same hash across all hops)
- Require Schnorr signatures (available since Bitcoin Taproot, Nov 2021)

**Advantages over HTLCs:**

- Privacy: different adaptor per hop, no wormhole attacks
- Efficiency: smaller on-chain footprint with Taproot
- Composability: can combine with other signature conditions

**For conditional computation payments:** PTLCs enable **privately-conditional payments** — the condition (adaptor point) is hidden from the blockchain. An oracle could produce a signature adaptor that corresponds to a specific computation result.

### 3.4 Hold Invoices (Lightning Escrow)

A **hold invoice** (hodl invoice) in Lightning allows the receiver to delay settlement:

1. Sender pays the invoice, locking funds in an HTLC
2. Receiver sees the payment but doesn't immediately settle (doesn't reveal the preimage)
3. An external condition is checked (oracle attestation, computation verification)
4. If the condition is met: receiver settles (reveals the preimage, claims funds)
5. If the condition is not met: the payment times out, sender gets a refund

**This is the key primitive for "pay only if gradient improves loss":**

```
1. Worker generates hold invoice for gradient payment
2. Coordinator pays the hold invoice (funds locked)
3. Worker submits gradient
4. Coordinator/validator evaluates gradient quality
5. If quality >= threshold:
       Coordinator releases preimage to worker (or oracle does)
       Worker settles the invoice, receives sats
6. If quality < threshold:
       Invoice times out (CLTV expiry)
       Coordinator's funds return automatically
```

**Current implementations:**

- LND supports hold invoices natively
- CLN (Core Lightning) supports them via plugins
- Supertestnet's [hodlcontracts](https://github.com/supertestnet/hodlcontracts) — oracle + escrow system for Lightning with three contract templates (trading, lending, betting)

**Limitations:**

- Hold invoices lock liquidity in the payment channel for the entire hold period
- Long hold times (hours) can strain channel capacity
- The HTLC timeout must be set conservatively (computation time + validation time + buffer)
- The maximum HTLC count per channel is limited (483 in the spec) — you can't have thousands of outstanding hold invoices

### 3.5 Can You Do "Pay Only If Gradient Improves Loss" on Lightning?

**Yes, with an oracle pattern.** Here's the specific mechanism:

**Design: Oracle-Attested Gradient Payment**

```
Actors:
- Worker: computes gradient
- Coordinator: aggregates gradients, updates model
- Validator Oracle: evaluates gradient quality, attests result

Protocol:
1. Coordinator creates a hold invoice: "Pay W sats to Worker, locked by preimage P"
2. Coordinator locks funds via the hold invoice HTLC
3. Worker computes gradient G on assigned data shard
4. Worker submits G to Coordinator
5. Coordinator sends G to Validator Oracle
6. Validator runs:
       L_before = loss(model, validation_batch)
       model'   = apply(model, G)
       L_after  = loss(model', validation_batch)
       quality  = L_before - L_after
7. If quality > threshold:
       Oracle reveals preimage P (or signs an adaptor)
       Worker claims payment
8. If quality <= threshold:
       Oracle withholds preimage
       HTLC times out, funds return to Coordinator
```

**Trust assumptions:** The validator oracle must be honest.
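The oracle's check (step 6 of the protocol) is just a deterministic loss comparison. A minimal sketch, using a toy one-dimensional linear model — the model, data, and threshold here are illustrative assumptions, not part of the protocol:

```python
# Toy version of the validator oracle's deterministic quality check.

def loss(weights, batch):
    # Mean squared error of y = w0*x + w1 over the validation batch.
    w0, w1 = weights
    return sum((w0 * x + w1 - y) ** 2 for x, y in batch) / len(batch)

def apply_gradient(weights, grad, lr=0.1):
    # One SGD step: w <- w - lr * g.
    return [w - lr * g for w, g in zip(weights, grad)]

def attest(weights, grad, validation_batch, threshold=0.0):
    """Return (settle?, quality). settle=True means the oracle releases the preimage."""
    l_before = loss(weights, validation_batch)
    l_after = loss(apply_gradient(weights, grad), validation_batch)
    quality = l_before - l_after
    return quality > threshold, quality

# A gradient pointing roughly toward y = 2x + 1 reduces loss -> settle.
batch = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
ok, quality = attest([0.0, 0.0], grad=[-1.0, -1.0], validation_batch=batch)
assert ok and quality > 0
```

Because every step is deterministic given the model snapshot, gradient, and validation batch, anyone can replay this computation and audit the oracle's verdict.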
**Mitigation:**

- Use multiple independent validators (majority agreement)
- Rotate validators randomly per round
- Validators stake collateral (slashable if caught cheating)
- Computation is deterministic — anyone can verify the oracle's claim by replaying the loss evaluation

### 3.6 DLCs (Discreet Log Contracts) for Training Outcome Bets

DLCs enable **oracle-dependent conditional payments** that are private and efficient.

**How DLCs work:**

1. Alice and Bob deposit funds into a 2-of-2 multisig (the "funding transaction")
2. They pre-sign a set of **Contract Execution Transactions (CETs)**, one for each possible outcome
3. Each CET distributes the locked funds according to the outcome it represents
4. The oracle commits to a public nonce R before the event
5. When the event occurs, the oracle attests the outcome by publishing the signature s = k - hash(R, outcome) * x
6. The winning party uses the oracle's signature to complete their CET's signature and broadcast it

**For training outcome bets:** Scenario: Alice bets that a decentralized training run will achieve MMLU > 70 within 30 days. Bob bets it won't.

```
Funding: Alice deposits 0.5 BTC, Bob deposits 0.5 BTC into 2-of-2 multisig

CETs:
- Outcome "MMLU > 70":  Alice gets 0.9 BTC, Bob gets 0.1 BTC
- Outcome "MMLU <= 70": Alice gets 0.1 BTC, Bob gets 0.9 BTC

Oracle: Evaluates model on MMLU benchmark at deadline
- Publishes signature attesting to the actual MMLU score
- Winning party uses oracle signature to complete and broadcast winning CET
```

**Numeric outcome DLCs:** For continuous outcomes (exact MMLU score, loss value), DLCs can encode ranges using binary decomposition. The oracle attests to each digit of the outcome independently, and the CET distribution is a function of the numeric value.
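A sketch of the binary decomposition for the MMLU bet above. This models only the outcome encoding and payout mapping — the helper names are illustrative, and none of the signature cryptography is implemented:

```python
# Toy numeric-outcome DLC: the oracle attests each binary digit of the
# score separately; the settled CET is a function of the decoded value.

def to_digits(value: int, n_bits: int):
    # Most-significant-first binary decomposition (one oracle attestation per digit).
    return [(value >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]

def from_digits(digits):
    return int("".join(map(str, digits)), 2)

def payout(mmlu_score: int):
    # CET table from the example: Alice wins the bet if MMLU > 70.
    # Returns (alice_sats, bob_sats) out of a 100_000-sat pot.
    alice = 90_000 if mmlu_score > 70 else 10_000
    return alice, 100_000 - alice

digits = to_digits(72, n_bits=7)       # MMLU scores (0-100) fit in 7 bits
assert digits == [1, 0, 0, 1, 0, 0, 0]
assert from_digits(digits) == 72
assert payout(72) == (90_000, 10_000)  # Alice's CET settles
assert payout(68) == (10_000, 90_000)  # Bob's CET settles
```

With per-digit attestation, 2^n outcomes need only n oracle signatures, and CETs for outcome ranges can share digit prefixes — which is what makes continuous payout curves tractable.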
**DLCs on Lightning:**

- DLC channels can be routed through Lightning payment channels
- CETs function as off-chain commitments within the channel
- Settlement is instant (no on-chain transaction needed unless disputed)
- Paper: "Discreet Log Contract Channels and Integration in the Lightning Network" (Kuwahara)
- Implementation: [Suredbits](https://suredbits.com/discreet-log-contracts-on-lightning-network/)

**Practical DLC applications for decentralized training:**

1. **Quality bounties:** "I'll pay X sats if the next training round reduces loss by > Y"
2. **Milestone contracts:** "Pay on each of: 50% training complete, 75% complete, benchmark target hit"
3. **Performance insurance:** "If the model regresses (loss increases), the coordinator pays workers a penalty"
4. **Compute futures:** "Lock in a price now for GPU time delivered over the next week"

### 3.7 What Lightning Can and Cannot Express

**CAN express (with oracles):**

| Payment Type | Mechanism | Practical? |
|--------------|-----------|------------|
| Pay per validated gradient | Hold invoice + oracle attestation | Yes |
| Pay proportional to quality | Multiple hold invoices of varying amounts + oracle selects which to settle | Clunky but possible |
| Escrow with refund | Hold invoice with CLTV timeout | Yes, native |
| Milestone payments | DLC with multiple outcome ranges | Yes |
| Training outcome bets | DLC with numeric oracle | Yes |
| Atomic multi-party payments | Multi-hop HTLCs | Yes |

**CANNOT express (fundamental limitations):**

| Payment Type | Why Not | Workaround |
|--------------|---------|------------|
| On-chain gradient verification | Script can't do ML math | Oracle attestation |
| Continuous payment streams | Lightning is discrete payments | Frequent micropayments |
| Proportional payment (exact ratio) | Script can't compute ratios | Pre-define a set of amounts |
| Multi-round commitments | HTLCs are single-use | New invoice per round |
| Slashing (take FROM a participant) | Lightning is push-only | Pre-locked collateral in DLC |

**Key constraint:** Lightning cannot verify computation. All gradient quality assessment must happen off-chain, with the result attested by an oracle. The trust model shifts from "trust the computation" to "trust the oracle" — but the oracle's job (loss evaluation) is deterministic and publicly verifiable by anyone who has the model and validation data.

---

## 4. Existing Literature & Projects

### 4.1 Bittensor Incentive Mechanism — Formal Analysis

**Yuma Consensus** is Bittensor's on-chain mechanism for computing validator and miner emissions:

- Validators submit a weight matrix (their scores of each miner)
- Yuma Consensus applies stake-weighted median clipping to resist outlier validators
- Exponentially smoothed bonds reward validators for consensus alignment
- Emissions are split: 41% miners, 41% validators, 18% subnet creator

**Critical empirical analysis** (arXiv 2507.02951, peer-reviewed):

**Stake concentration:**

- Top 1% of wallets control a median 89.8% of stake across 64 subnets
- Gini coefficient: 0.9825 (extreme inequality)
- In over half of subnets, fewer than 1% of wallets are needed for a 51% attack

**Performance-reward correlation:**

- Validator stake→reward: r = 0.80-0.95 (dominant)
- Validator performance→reward: r = 0.50 (moderate)
- Miner stake→reward: r = 0.50-0.80
- **Miner performance→reward: r = 0.10-0.30 (very weak!)**

**Translation:** "Economic power translates directly into earnings regardless of actual contribution quality." The system pays the wealthy, not the productive.

**Proposed fixes:**

1. Performance-weighted emission split: +0.032 performance→reward, only -0.018 stake→reward
2. Composite scoring: +0.36 performance→reward but a catastrophic -0.91 stake→reward
3. Performance bonus multiplier: conservative +0.009 improvement, minimal disruption
4.
**88th percentile stake cap:** 20x improvement in the coalition size needed for a 51% attack

**Takeaway for mechanism design:** Bittensor demonstrates that **stake-weighted consensus is fundamentally at odds with quality-weighted rewards**. Any system using staking for security will tend toward plutocracy unless explicitly corrected by performance metrics. The Gauntlet (loss-reduction scoring) is Covenant's attempt to solve this, but it operates within Yuma Consensus, which still overweights stake.

### 4.2 Hivemind / Learning@home

**Project:** [github.com/learning-at-home/hivemind](https://github.com/learning-at-home/hivemind) (NeurIPS 2020)

**Approach:** Decentralized deep learning in PyTorch, designed for training on thousands of volunteers with unreliable connections. Uses Decentralized Mixture-of-Experts (DMoE) — different peers specialize in different parts of the model.

**Coordination:** Kademlia-based Distributed Hash Table (DHT) for peer discovery. Scales to tens of thousands of peers with logarithmic search complexity.

**Incentive design:** Hivemind notably **did NOT implement monetary incentives**. It relied on:

- Volunteer altruism (like BOINC/Folding@Home)
- Academic credit and community recognition
- Shared access to the resulting model

**Result:** The project demonstrated the technical feasibility of decentralized training but failed to attract large-scale sustained participation without monetary incentives. This is the core lesson — volunteer computing works for science (protein folding has intrinsic appeal) but struggles for general ML training, where the output model is the only incentive.

**Technical legacy:** Hivemind's DHT and fault-tolerant aggregation code is used by subsequent projects including INTELLECT-1 and (indirectly) Bittensor subnets.
### 4.3 BOINC — Lessons from Volunteer Computing

**BOINC credit system** ([boinc.berkeley.edu](https://boinc.berkeley.edu/boinc_papers/credit/text.php)):

**Credit design:**

- 1 cobblestone = 1/200 of a day's work on a 1 GFLOPS machine
- Credit has no monetary value — it's a reputation/competition metric
- Used for: individual progress tracking, inter-volunteer competition, per-project throughput metrics

**Validation:**

- **Redundant computing:** Each work unit is sent to 2+ volunteers. Results must agree within tolerance.
- **Quorum:** Minimum number of agreeing results before credit is granted
- **Canonical result:** Once a quorum agrees, one of the agreeing results is designated canonical
- **Credit granted only on validated work:** No validation = no credit

**Cheating prevention:**

- Homogeneous redundancy: send identical work to similar platforms to enable comparison
- Result validation via quorum agreement
- Project-specific validators that check result plausibility
- Volunteer reputation (running average of validation success rate)

**Key lessons for decentralized training:**

1. **Non-monetary incentives have a ceiling.** BOINC attracted millions of volunteers, but peak participation was driven by SETI@home's unique appeal. Most BOINC projects struggle for volunteers.
2. **Redundant computation is expensive but robust.** 2-3x overhead is the price of trustless validation. For ML training, this is too expensive — you'd rather do 3x more training than validate 3x.
3. **Credit gaming is real.** BOINC had persistent problems with volunteers gaming the credit system (overclocking, reporting inflated FLOPS, running on faster hardware than reported).
4. **Competition works.** Teams and leaderboards drove significant participation. Folding@home's points system + team competition sustained engagement for decades.
5. **Validation must be cheap relative to computation.** BOINC's approach (re-compute and compare) only works when the work units are small. For large neural network training, alternative validation methods are needed.

### 4.4 Folding@Home

**Points system:**

- Points awarded based on work unit difficulty and completion time
- Bonus points for completing work units quickly (before the deadline)
- Individual and team leaderboards
- No monetary value — purely competitive

**Scale:** At its peak (COVID-19, 2020), Folding@Home exceeded 2.4 exaFLOPS — more than the top 500 supercomputers combined. This was driven by viral social media and the concrete goal of COVID drug discovery.

**Lesson:** A compelling narrative (cure diseases!) can substitute for monetary incentives, but only temporarily. Participation declined 90%+ after COVID interest waned.

### 4.5 Gridcoin — Bridging BOINC and Cryptocurrency

**Gridcoin** adds cryptocurrency rewards to BOINC contributions:

- Miners earn GRC tokens by contributing to whitelisted BOINC projects
- Token reward is proportional to BOINC credit earned ("Proof of BOINC")
- The whitelist prevents gaming with self-created BOINC projects

**Lesson:** Gridcoin proved that cryptocurrency can incentivize volunteer computing, but the token's low market value ($0.01-0.05/GRC) meant the economics rarely covered electricity costs. The incentive only works when the token has sufficient market value.
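BOINC's quorum validation (section 4.3) — redundant computing plus agreement within tolerance — is simple enough to sketch. A toy model with illustrative names, not BOINC's actual validator code:

```python
# Toy BOINC-style validation: the same work unit goes to several volunteers;
# credit is granted only when a quorum of results agree within tolerance.

def validate(results, quorum=2, tolerance=1e-6):
    """Return the canonical result if a quorum agrees, else None."""
    buckets = {}
    for r in results:
        # Coarse bucketing: results within ~tolerance share a bucket key.
        key = round(r / tolerance)
        buckets.setdefault(key, []).append(r)
    best = max(buckets.values(), key=len)
    # Canonical result = a representative of the largest agreeing group.
    return best[0] if len(best) >= quorum else None

assert validate([3.14, 3.14, 2.71]) == 3.14  # 2-of-3 agree -> credit granted
assert validate([1.0, 2.0, 3.0]) is None     # no quorum -> no credit
```

The 2-3x overhead mentioned above is visible directly: every accepted result here cost at least `quorum` redundant computations.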
### 4.6 Academic Literature Summary

**Key papers on incentive-compatible distributed learning:**

| Paper | Year | Key Contribution |
|-------|------|------------------|
| Jia et al., "Proof of Learning" | 2021 | Training transcript verification, ~10% overhead |
| Cong et al., "FVCG" | 2020 | VCG mechanism for FL, truthful cost reporting |
| "Gradient-Driven Rewards" (NeurIPS) | 2021 | Guarantees fairness via measured gradient contribution |
| "Federated Learning Incentive via Shapley" (MDPI) | 2023 | Pareto-optimal payoff allocation |
| "Incentive-Based FL" (arXiv 2510.14208) | 2025 | Survey of architectural elements and future directions |
| ICLR 2025 conference paper | 2025 | Fine-grained influence propagation in decentralized networks |
| "Coin.AI" (MDPI Entropy) | 2019 | Proof-of-useful-work for blockchain-based distributed deep learning |
| DLchain (SERVICES 2020) | 2020 | Blockchain with deep learning as PoUW |
| "Proof of Training" (Springer 2025) | 2025 | Verifiable model training via blockchain delegation |

**Survey taxonomy (arXiv 2510.14208):** FL incentive mechanisms fall into four technical approaches:

1. **Shapley values:** Fair but computationally expensive
2. **Stackelberg games:** Leader-follower optimal pricing
3. **Auctions:** Market-based resource allocation
4. **Contracts:** Principal-agent with screening/signaling

---

## 5. Autoresearch Bounty Design

### 5.1 Structuring a Bounty: "Improve This Metric by X%"

The autoresearch pattern (mutate file, evaluate, keep improvements, discard regressions) naturally suggests a bounty structure.
Here's how to design one for decentralized workers.

**Basic bounty structure:**

```
Bounty: Improve email classification accuracy from 87% to 92%+
Mutable: system_prompt.txt
Eval: python eval.py --corpus labeled_emails.jsonl
Metric: accuracy (higher is better)
Payment: 50,000 sats per percentage point improvement
Verification: coordinator runs eval.py independently
Duration: 72 hours
Holdback: 20% of payment released after 7-day holdout eval
```

**Key design parameters:**

1. **Fixed vs. proportional payment:**
   - **Fixed bounty:** "Pay 100,000 sats for reaching 92% accuracy." Simple, clear. Risk: pays the same for 92.01% and 99%.
   - **Proportional to improvement:** "Pay 10,000 sats per 0.1% above baseline." Better incentive alignment. Risk: small improvements earn small amounts (and may not motivate).
   - **Recommended hybrid:** Fixed base payment for reaching the threshold + proportional bonus above it. Example: 50,000 sats for reaching 92%, plus 5,000 sats per 0.1% above 92%.
2. **First-past-the-post vs. tournament:**
   - **First-past-the-post:** The first worker to submit an improvement above threshold wins. Simple, but discourages incremental improvement and rewards speed over quality.
   - **Tournament:** All submissions within the deadline are evaluated; the best wins. Better for quality, but workers may withhold improvements until the deadline (strategic delay).
   - **Rolling tournament:** Evaluate each submission as it arrives. If it improves on the current best, lock payment via hold invoice. If a better submission arrives before settlement, cancel the previous hold and lock for the new best. Workers have an incentive to submit early (time value of money).
3.
**Multiple concurrent workers:**
   - Allow parallel exploration with different strategies
   - Only the best result gets paid (tournament)
   - Optionally: pay the top N (e.g., top 3 get 50%, 30%, 20% of the pool)
   - Shapley-value based: pay each worker proportional to their marginal contribution to the final result

### 5.2 Verification: Confirming Genuine Improvement

**The core challenge:** How does the coordinator know the improvement is real and generalizes, rather than just overfitting to the eval set?

**Multi-layer verification:**

1. **Reproduction check:** The coordinator runs the exact same eval.py on the exact same corpus. Deterministic evaluation (temperature=0, fixed seeds) should produce identical scores.
2. **Holdout evaluation:** Run the mutated file against a **held-out test set** that the worker never sees. This catches overfitting to the eval corpus.

   ```
   Payment structure:
   80% released on eval set improvement
   20% released after holdout set evaluation (24-48 hours later)
   ```

3. **Temporal stability:** Run the eval at multiple time points (same prompt, different API calls if LLM-based). Average over 3-5 runs. This catches non-deterministic gaming.
4. **Human review (spot check):** For prompt optimization bounties, a human reviews the top-scoring submission to verify it's not gaming the metric. This is the Goodhart defense of last resort.
5. **Canary detection:** Include a few "canary" examples in the eval set that are deliberately tricky. If a submission gets 100% on canaries (which are designed to be hard even for a perfect system), it's likely gaming.

**Lightning implementation:**

```
1. Worker submits improved prompt
2. Coordinator locks 80% payment via hold invoice (HTLC, 24h timeout)
3. Coordinator runs eval.py → if score > threshold, settles 80% immediately
4. Coordinator runs holdout eval (next day) → if holdout score > threshold,
   pays remaining 20% via new invoice
5. If holdout eval fails → 20% is not paid (worker keeps 80%)
```

### 5.3 Payment Sizing

**Cost-based pricing:**

- Estimate the compute cost for a reasonable number of experiments
- Set the bounty to cover costs + margin (1.5-3x compute cost)
- Example: an email classification bounty requires ~50 experiments, each costing ~$0.30 in API calls = $15 compute. The bounty should be $25-50 to be attractive.

**Value-based pricing:**

- Estimate the value of the improvement to the bounty poster
- Set the bounty as a fraction of that value (10-50%)
- Example: better email classification saves 10 minutes/day of manual triage = ~$50/month. A 5% improvement saves $2.50/month. Over 2 years = $60 value. Bounty: $15-30.

**Market-based pricing:**

- Post the bounty and let workers decide if it's worth their time
- If no takers at the current price, increase it
- If many takers, decrease it (or add more bounties)

### 5.4 Preventing Metric Gaming (Goodhart's Law)

**The fundamental tension:** Any single metric, when optimized aggressively enough, will be gamed. "When a measure becomes a target, it ceases to be a good measure."

**Historical examples:**

- Karpathy caught autoresearch agents changing random seeds on the first experiment
- AI leaderboards (Arena) gamed by selectively showcasing the strongest model variants
- BLEU scores in machine translation over-optimized at the expense of readability
- Delhi's cobra bounties bred snakes for profit (the canonical Goodhart example)

**Defenses specific to autoresearch bounties:**

1. **Multi-metric scoring:** Don't optimize a single number. Use a weighted composite:

   ```
   score = 0.6 * accuracy
         + 0.2 * (1 - false_positive_rate)
         + 0.1 * brevity
         + 0.1 * holdout_accuracy
   ```

   Gaming requires simultaneously improving all components, which is much harder.
2. **Held-out evaluation set:** The worker never sees the holdout set. Payment is partially contingent on holdout performance. If the eval score is 95% but the holdout score is 82%, the submission is rejected or penalized.
3.
**Anti-memorization:** Hash the eval examples and check that they don't appear verbatim in the mutated file (prompt). This prevents the obvious attack of embedding the answers.
4. **Semantic review:** For prompt optimization, require that the prompt is human-readable and doesn't contain encoded information. Impose a maximum prompt length constraint.
5. **Diverse eval sets:** Rotate the eval set between rounds. Workers can't overfit to a single fixed set if the set changes.
6. **Red-team evaluation:** Include adversarial examples designed to exploit common gaming strategies. Score these separately.
7. **Budget cap on Goodharting:** Accept that some metric gaming is inevitable. Set a ceiling: "maximum payment is 3x the baseline value" — this limits the reward for extreme gaming while still rewarding genuine improvement.

**The Karpathy insight:** "The human's job is program.md." The quality of the bounty specification — the eval set, the metric, the constraints — determines the quality of the resulting optimization. A poorly specified bounty will be gamed. A well-specified bounty channels gaming into genuine improvement.

---

## 6. Synthesis: A Practical Lightning-Native Design

### 6.1 Architecture: Lightning Payment Channel for Gradient Exchange

Combining the research above into a concrete, implementable system:

```
┌──────────────────────────────────────────────────────────┐
│                    COORDINATOR NODE                      │
│                                                          │
│  - Maintains current model state                         │
│  - Assigns data shards to workers                        │
│  - Aggregates validated gradients                        │
│  - LND node with payment channels to workers/validators  │
│                                                          │
│  Channels:                                               │
│   ←→ Worker 1    (capacity: 500K sats)                   │
│   ←→ Worker 2    (capacity: 500K sats)                   │
│   ←→ Validator 1 (capacity: 100K sats)                   │
│   ←→ Validator 2 (capacity: 100K sats)                   │
└──────────────────────────────────────────────────────────┘

Payment flow per training round:
1. Coordinator creates hold invoices for each active worker
2. Workers compute gradients on assigned data
3. Workers submit gradients + commitment hashes
4. Validators randomly selected to evaluate a subset of gradients
5. Validators compute loss-reduction scores
6. Coordinator pays validators a flat fee (settled immediately)
7. Workers with quality > threshold: hold invoices settled (payment released)
8. Workers with quality ≤ threshold: hold invoices time out (no payment)
9. Coordinator aggregates validated gradients into the model update
```

### 6.2 Payment Structure

**Per-round worker payment:**

```
base_payment  = 1000 sats (covers electricity for one round of computation)
quality_bonus = max(0, quality_score - threshold) * 500 sats per unit
total_payment = base_payment + quality_bonus
```

Where `quality_score = L_before - L_after` on the validator's held-out batch.

**Validator payment:**

```
validator_fee  = 200 sats per gradient evaluated (flat fee)
accuracy_bonus = 100 sats if the validator's score agrees with the majority of other validators
```

**Anti-Sybil:**

- Workers must open a payment channel with minimum capacity (100K sats = ~$50)
- This serves as implicit stake — creating many fake identities requires proportional capital
- Channels can be reused across rounds (amortizing the opening cost)

### 6.3 DLC Layer for Milestone Contracts

On top of the per-round Lightning payments, use DLCs for longer-term commitments:

```
DLC: Training Milestone Contract

Parties: Sponsor (wants model trained) + Worker Pool
Oracle:  Independent evaluator who runs benchmark at milestones
Funding: Sponsor deposits 0.1 BTC

CETs:
- "Model reaches 60 MMLU by week 2": Pool gets 0.03 BTC
- "Model reaches 65 MMLU by week 4": Pool gets 0.03 BTC
- "Model reaches 70 MMLU by week 8": Pool gets 0.04 BTC
- "No milestone reached": Sponsor gets 0.1 BTC back

Oracle attestation:
- Oracle runs lm-eval at each deadline
- Signs the MMLU score (numeric outcome)
- Winning CET is constructed from oracle's signature
```

### 6.4 What This Design Achieves

**Incentive compatibility:**

- Workers are paid for quality (loss reduction), not just participation
- Free-riders earn nothing (zero quality → threshold not met → timeout)
- Gradient poisoners lose their opportunity cost (they computed bad gradients and got no payment)
- Validators are paid for honest evaluation (agreement with the majority)

**Trustlessness:**

- Payments are conditional on measurable, reproducible metrics
- Hold invoices provide an automatic refund if conditions aren't met
- DLCs provide private, oracle-attested milestone payments
- No party can unilaterally seize funds

**Limitations:**

- Requires an honest majority among validators (same as any BFT system)
- Payment channels require upfront capital
- The HTLC timeout constrains the maximum round duration
- Scalability is limited by Lightning channel capacity and HTLC count

### 6.5 Open Questions for Implementation

1. **Oracle trust model:** Who are the validators, and why should they be trusted? Options: staked validators (Bittensor-style), a rotating committee, anyone who can reproduce the loss evaluation.
2. **Round timing:** How long should each round last? Too short → communication overhead dominates. Too long → hold invoices lock liquidity for too long.
3. **Worker discovery:** How do workers find the coordinator and open channels? Options: Nostr relay announcements, a DHT (Hivemind-style), a centralized coordinator.
4. **Gradient privacy:** Workers may want to keep their gradients private until paid. Commitment schemes help but add protocol complexity.
5. **Heterogeneous hardware pricing:** Should workers with expensive GPUs get paid more? Pure output-based pricing (the Templar approach) is simpler but may exclude workers with weaker hardware even if their contributions are valuable.
6. **Regulatory considerations:** Paying for computation with Bitcoin may have tax and regulatory implications depending on jurisdiction.
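Putting the numbers from section 6.2 together, the per-round payout logic is small enough to state directly. A sketch using the illustrative values above (1000-sat base, 500 sats per unit of quality, 200-sat validator fee):

```python
# Illustrative per-round payout logic from section 6.2. The constants are
# the example values from that section, not protocol-mandated amounts.

def worker_payment(quality_score: float, threshold: float = 0.0,
                   base: int = 1000, bonus_rate: int = 500) -> int:
    # At or below threshold: the hold invoice times out and nothing is paid.
    if quality_score <= threshold:
        return 0
    return base + int(max(0.0, quality_score - threshold) * bonus_rate)

def validator_payment(agrees_with_majority: bool,
                      fee: int = 200, bonus: int = 100) -> int:
    # Flat evaluation fee plus an accuracy bonus for majority agreement.
    return fee + (bonus if agrees_with_majority else 0)

assert worker_payment(-0.5) == 0      # free-rider: invoice times out
assert worker_payment(0.2) == 1100    # base + 0.2 * 500 quality bonus
assert validator_payment(True) == 300
assert validator_payment(False) == 200
```

Note how the free-rider case pays exactly zero without any explicit slashing — the hold-invoice timeout is what enforces it, consistent with Lightning being push-only.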
--- ## References ### Mechanism Design & Federated Learning - [Federated Learning Incentive Mechanism Design via Shapley Value and Pareto Optimality](https://www.mdpi.com/2075-1680/12/7/636) (MDPI Axioms 2023) - [A VCG-based Fair Incentive Mechanism for Federated Learning](https://arxiv.org/abs/2008.06680) (arXiv 2020) - [Incentive-Based Federated Learning: Architectural Elements and Future](https://arxiv.org/pdf/2510.14208) (arXiv 2025) - [Do Data Valuations Make Good Data Prices?](https://arxiv.org/html/2504.05563) (arXiv 2025) - [ICLR 2025 — Influence propagation in decentralized networks](https://proceedings.iclr.cc/paper_files/paper/2025/file/ac8fbba029dadca99d6b8c3f913d3ed6-Paper-Conference.pdf) - [Gradient-Driven Rewards to Guarantee Fairness in Collaborative ML](https://proceedings.neurips.cc/paper/2021/file/8682cc30db9c025ecd3fee433f8ab54c-Paper.pdf) (NeurIPS 2021) ### Bittensor & Decentralized Training - [Bittensor Protocol: Critical and Empirical Analysis](https://arxiv.org/html/2507.02951v1) (arXiv, peer-reviewed) - [Yuma Consensus documentation](https://docs.learnbittensor.org/learn/yuma-consensus) - [Bittensor whitepaper](https://uploads-ssl.webflow.com/5cfe9427d35b15fd0afc4687/6021920718efe27873351f68_bittensor.pdf) - [Templar incentive design](https://docs.tplr.ai/incentive-design/) - [Templar overview (DeepWiki)](https://deepwiki.com/tplr-ai/templar/1-overview) - [Covenant-72B analysis](covenant-72b-analysis.md) (local) ### Validation & Verification - [Proof-of-Learning: Definitions and Practice](https://arxiv.org/abs/2103.05633) (IEEE S&P 2021) - [Proof-of-Learning is Currently More Broken Than You Think](https://arxiv.org/abs/2208.03567) (arXiv 2022) - [A Survey of Zero-Knowledge Proof Based Verifiable Machine Learning](https://arxiv.org/abs/2502.18535) (arXiv 2025) - [Proof of Training via Blockchain](https://link.springer.com/chapter/10.1007/978-3-031-97629-2_15) (Springer 2025) - [Coin.AI: Proof-of-Useful-Work for Distributed Deep 
Learning](https://pmc.ncbi.nlm.nih.gov/articles/PMC7515252/) (MDPI 2019) ### Byzantine Fault Tolerance - [FLRAM: Robust Aggregation Technique](https://www.mdpi.com/2079-9292/12/21/4463) (Electronics 2023) - [Adaptive Adversaries in Byzantine-Robust FL](https://eprint.iacr.org/2025/510.pdf) (ePrint 2025) - [Dynamic Gradient Filtering with Byzantine Robustness](https://www.sciencedirect.com/science/article/pii/S0167739X24003443) (FGCS 2024) - [SignGuard: Byzantine-robust FL through Collaborative Gradient Filtering](https://arxiv.org/abs/2109.05872) ### Free-Rider Detection - [Free-riders in Federated Learning: Attacks and Defenses](https://arxiv.org/abs/1911.12560) - [FRIDA: Free-Rider Detection using Privacy Attacks](https://arxiv.org/abs/2410.05020) - [Contributions Estimation in Federated Learning](https://www.vldb.org/pvldb/vol17/p2077-li.pdf) (VLDB 2024) ### Bitcoin / Lightning - [Point Time Locked Contracts (PTLCs)](https://bitcoinops.org/en/topics/ptlc/) (Bitcoin Optech) - [Hash Time Locked Contracts (HTLCs)](https://bitcoinops.org/en/topics/htlc/) (Bitcoin Optech) - [Discreet Log Contracts](https://bitcoinops.org/en/topics/discreet-log-contracts/) (Bitcoin Optech) - [DLCs on Lightning Network](https://suredbits.com/discreet-log-contracts-on-lightning-network/) (Suredbits) - [Lightning DLC Channels](https://hackmd.io/@lpQxZaCeTG6OJZI3awxQPQ/LN-DLC) (HackMD) - [Oracle-based Conditional Payments on Bitcoin](https://blog.dlcmarkets.com/oracle-based-conditional-payments-on-bitcoin/) (DLC Markets) - [Hodl Contracts — Oracle and Escrow for Lightning](https://github.com/supertestnet/hodlcontracts) - [Hold Invoices on Lightning](https://voltage.cloud/blog/understanding-hold-invoices-on-the-lightning-network/) (Voltage) ### Volunteer Computing - [BOINC Incentive System Design](https://boinc.berkeley.edu/boinc_papers/credit/text.php) - [BOINC Credit System](https://en.wikipedia.org/wiki/BOINC_Credit_System) (Wikipedia) - 
[Folding@home](https://en.wikipedia.org/wiki/Folding@home) (Wikipedia) - [Hivemind: Decentralized Deep Learning](https://github.com/learning-at-home/hivemind) (GitHub) ### Goodhart's Law & Metric Gaming - [Goodhart's Law in Reinforcement Learning](https://arxiv.org/abs/2310.09144) (ICLR 2024) - [Measuring Goodhart's Law](https://openai.com/index/measuring-goodharts-law/) (OpenAI) - [Reliance on Metrics is a Fundamental Challenge for AI](https://www.sciencedirect.com/science/article/pii/S2666389922000563) (Patterns 2022) ### GPU Heterogeneity - [Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs](https://arxiv.org/html/2502.00722v1) (arXiv 2025) - [Fast and Fair Training in Heterogeneous GPU Clusters](https://dl.acm.org/doi/10.1145/3721145.3728488) (ICS 2025) ### Sybil Resistance - [Zero-Knowledge Proof-of-Identity](https://arxiv.org/abs/1905.09093) (arXiv) - [DCI Discreet Log Contracts](https://www.dci.mit.edu/projects/discreet-log-contracts) (MIT DCI) --- ## Lightning ML Coordination # Lightning Network for ML Coordination Payments — Technical Research > Compiled 2026-03-12. Sources: Lightning Labs documentation, GitHub repositories, > Lightning Labs blog posts (Feb-Mar 2026), CoinLaw statistics, Bitcoin Magazine, > academic papers, protocol specifications, web research. > For whitepaper comparing Lightning as a coordination layer for decentralized AI training. --- ## 1. L402 Protocol Deep Dive ### 1.1 How L402 Works L402 (formerly LSAT — Lightning Service Authentication Token) activates HTTP's long-dormant `402 Payment Required` status code by combining it with Lightning Network payments and Macaroon-based authentication. **Request flow:** ``` 1. Client → GET /api/data → Server 2. Server → 402 Payment Required WWW-Authenticate: L402 token=<macaroon>, invoice=<bolt11-invoice>, version=0 3. Client parses challenge: a. Extracts macaroon (contains payment_hash as caveat) b. Extracts BOLT 11 invoice c. Pays invoice via Lightning → receives preimage 4. 
Client → GET /api/data Authorization: L402 <macaroon>:<preimage> 5. Server verifies: a. Validates macaroon signature chain (local computation, root key only) b. Checks sha256(preimage) == payment_hash (single hash check) c. Evaluates all caveats (expiry, service tier, spending limits) → No database lookup. No RPC to blockchain node. No external service. 6. Server → 200 OK + response data 7. Client caches macaroon:preimage pair for subsequent requests ``` **Key insight:** After initial payment, verification is a **local computation** — one SHA-256 hash plus macaroon signature validation. No database, no RPC, no external dependency. This is what makes L402 viable for high-frequency agent interactions. ### 1.2 Macaroon Structure ``` Macaroon { version: 0 // Protocol version (currently 0) user_id: <user-id> // Per-user/agent identifier payment_hash: <32-byte-hash> // Links to Lightning invoice location: "service.example.com" // Issuing service (optional) caveats: [ // Restriction chain "services=data:0,compute:1" // Service access + tier "capabilities=read,write" // Allowed operations "valid_until=2026-04-01T00:00:00Z" // Time-based expiry "spend_limit=500" // Spending cap (sats) // Any custom key=value pair ] signature: <hmac-chain> // Chained HMAC signatures } ``` **Caveat mechanics:** Each caveat is a `key=value` pair. Caveats are cryptographically chained — a holder can **attenuate** (add restrictions) without contacting the issuer, but cannot remove existing caveats. This enables: - Parent agent creates `pay-only, 500-sat cap` macaroon - Passes to worker agent, which appends `valid_until=+1h` - Result: cryptographically valid credential, tighter permissions, zero round trips **Authentication header format:** ``` Authorization: L402 <macaroon>[,<macaroon>...]:<preimage> ``` Multiple macaroons can be comma-separated before the colon (for multi-service auth). 
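The caveat chaining and server-side verification described above can be sketched in a few lines. This is a simplified model, not the real macaroon wire format or the lnd implementation: `mint`, `attenuate`, and `verify` are hypothetical names, the payment hash doubles as the macaroon identifier for brevity, and a real server would also evaluate each caveat's semantics (expiry, spend limits) after checking the signature.

```python
import hashlib
import hmac


def mint(root_key: bytes, identifier: bytes) -> tuple[list[str], bytes]:
    # Issuer derives the initial signature from its secret root key.
    sig = hmac.new(root_key, identifier, hashlib.sha256).digest()
    return [], sig


def attenuate(caveats: list[str], sig: bytes, caveat: str) -> tuple[list[str], bytes]:
    # Any holder can ADD a caveat: the new signature is an HMAC of the old
    # signature, so earlier caveats cannot be removed without the root key.
    new_sig = hmac.new(sig, caveat.encode(), hashlib.sha256).digest()
    return caveats + [caveat], new_sig


def verify(root_key: bytes, identifier: bytes, caveats: list[str], sig: bytes,
           preimage: bytes, payment_hash: bytes) -> bool:
    # Server-side L402 check: replay the HMAC chain locally, then one SHA-256
    # to confirm the Lightning payment. No database, no RPC.
    expected = hmac.new(root_key, identifier, hashlib.sha256).digest()
    for c in caveats:
        expected = hmac.new(expected, c.encode(), hashlib.sha256).digest()
    return (hmac.compare_digest(expected, sig)
            and hashlib.sha256(preimage).digest() == payment_hash)


root = b"server-root-key"
preimage = b"\x01" * 32
pay_hash = hashlib.sha256(preimage).digest()

caveats, sig = mint(root, pay_hash)                        # issuer mints
caveats, sig = attenuate(caveats, sig, "spend_limit=500")  # parent restricts
caveats, sig = attenuate(caveats, sig, "valid_until=+1h")  # worker restricts further
print(verify(root, pay_hash, caveats, sig, preimage, pay_hash))  # → True
```

Dropping a caveat while keeping the final signature fails verification, which is exactly why a delegated credential can only ever get tighter.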
### 1.3 L402 Request Latency Breakdown | Phase | Latency | Notes | |-------|---------|-------| | Initial HTTP request | ~50ms | Standard HTTPS round trip | | 402 response parsing | <1ms | Local string parsing | | Lightning invoice payment | 100-500ms | Route finding + HTLC settlement | | Preimage extraction | <1ms | Returned with payment confirmation | | Retry with auth header | ~50ms | Standard HTTPS round trip | | Server-side verification | <1ms | Single SHA-256 + HMAC chain | | **Total first request** | **~200-600ms** | Dominated by Lightning payment | | **Subsequent requests** | **~50ms** | Cached token, no payment needed | Measured data points: - Lightning payment settlement: **182ms average** for local/1-hop payments (academic measurement) - Record $1M payment: settled in **0.43 seconds** (Secure Digital Markets → Kraken, Jan 2026) - Optimal routing conditions: **< 500ms** end-to-end - L402 verification: **local computation only** — "the difference between 'verify locally with math' and 'call an external service' compounds fast for agents making thousands of API calls per minute" (Lightning Labs) ### 1.4 Streaming Payments L402 is primarily **request-response**, but the Lightning Network supports several streaming/recurring patterns: | Pattern | How It Works | L402 Compatible | |---------|-------------|-----------------| | **Keysend** | Push payment, no invoice needed. Sender knows recipient pubkey. | Parallel to L402, not via 402 flow | | **HODL Invoices** | Invoice held open, settled later. Enables escrow/conditional payment. | Yes — for deferred settlement | | **AMP (Atomic Multi-Path)** | Spontaneous payments, no prior invoice. Multiple paths. 
| Parallel to L402 | | **Subscription macaroons** | Single payment, time-bounded access via caveat | Yes — native L402 pattern | | **"upto" pricing** | Consumption-based: pay per token during LLM inference | Proposed for L402 | | **"deferred" batching** | High-frequency micropayments batched, settled periodically | Proposed for L402 | | **Offers (BOLT 12)** | Reusable payment endpoints, persistent merchant identity | Complementary — maturing | **For ML coordination:** The most relevant patterns are: - **Per-contribution payments** via standard L402 (gradient submission → 402 → pay → submit) - **Keysend** for coordinator → worker reward pushes (no invoice round trip) - **HODL invoices** for escrow-style gradient bounties (pay on verified improvement) - **Deferred batching** for high-frequency micro-rewards (batch 70 payments/round) ### 1.5 Existing L402 Services & Marketplaces **Production deployments:** - **Lightning Loop** — Non-custodial on/off ramp, L402 via Aperture since initial release (~5 years) - **Lightning Pool** — Channel marketplace, L402 authentication - **Aperture** — Open-source L402 reverse proxy (Go), production-grade, supports gRPC + REST - **Fewsats** — L402 tools directory, agent-01 dynamic tool discovery, L402-python SDK - **Fewsats Amazon MCP** — L402-gated product search/purchase for AI agents - **tools.l402.org** — L402 service directory for agent discovery - **LightningProx** — "The Payment Layer for AI" — L402 proxy service - **SatGate** — L402 gateway with spending limit enforcement (HTTP 402 when budget exceeded) **L402 SDKs/Libraries:** - `aperture` (Go) — reverse proxy - `L402-python` (Python, Fewsats) — client SDK - `lightning-agent-tools` (Lightning Labs) — full agent toolkit - `lightning-wallet-mcp` — MCP server with wallet - `lightning-enable-mcp` — MCP server for LN payments - `lightning-tools-mcp-server` (Alby) — Lightning address tools ### 1.6 Specification Status L402 is **not** an IETF RFC. 
Current status: - **Specification repository:** `github.com/lightninglabs/L402` (formerly LSAT) - **Format:** "intended to be along the lines of the document we would submit to a standards committee" - **bLIP submission:** bLIP-0026 (Bitcoin Lightning Improvement Proposal) — pull request at `github.com/lightning/blips/pull/26` - **RFC 7235 compliance:** L402 builds on the existing IETF Authentication Framework (RFC 7235) - **Version field:** Recently added (`version=0`) for forward compatibility - **HTTP 402:** The status code itself is in RFC 7231 but was "reserved for future use" — L402/x402 are the first serious activations **Competing standard:** Coinbase's **x402** protocol (Oct 2025) — same HTTP 402 activation but stablecoin-based (USDC on Base/Polygon/Solana), backed by the x402 Foundation + Cloudflare integration. Free tier: 1,000 txns/month, then $0.001/txn. The x402 whitepaper is published at `x402.org/x402-whitepaper.pdf`. --- ## 2. Lightning Payment Channels for Recurring Peers ### 2.1 Channel Capacity for 70 Peers at 70-Second Intervals **Scenario:** 70 peers each pay a coordinator every ~70 seconds. One payment per peer per round. 
**Assumptions for calculation:** - Payment size: 100-10,000 sats per gradient contribution (~$0.10-$10 at ~$100K BTC) - Round interval: 70 seconds - Payments per channel per day: 86,400/70 ≈ 1,234 payments - All payments flow peer → coordinator (unidirectional drain) **Channel capacity requirements:** | Payment Size | Daily Volume/Channel | Rebalance Frequency | Min Channel Capacity | |-------------|---------------------|---------------------|---------------------| | 100 sats | 123,400 sats/day | Weekly | ~900,000 sats (~$900) | | 1,000 sats | 1,234,000 sats/day | Weekly | ~9,000,000 sats (~$9,000) | | 10,000 sats | 12,340,000 sats/day | Weekly | ~90,000,000 sats (~$90,000) | **Key constraint:** A Lightning channel's total capacity is fixed, but its balance shifts with payment flow — as a peer pays the coordinator, the peer's local balance decreases and the coordinator's increases. Sustained one-way payments drain the channel from the peer's side. Options: 1. **Circular rebalancing:** Coordinator routes payments back to peers (via other channels) 2. **Periodic on-chain settlement:** Close and reopen channels when drained 3. **Bidirectional flow:** If coordinator also pays peers (reward distribution), capacity naturally rebalances 4. **Channel splicing (2025+):** Add/remove funds without closing channel For a decentralized training system where the coordinator both collects gradient fees and distributes rewards, **bidirectional flow naturally maintains balance** — this is the ideal topology. ### 2.2 Pre-Opening Channels to Known Peers **Yes, absolutely.** This is the recommended pattern for known, recurring peers. Benefits: - **Eliminates on-chain confirmation delay** (10+ minutes per channel open, ~$2-20 in fees) - **Removes routing hops** — direct channel = 1 hop, ~0ms routing overhead - **Lower fees** — no intermediary routing fees (median: 63 ppm / 0.0063%) - **Higher reliability** — 99.7%+ success rate on direct channels vs. 
multi-hop - **Deterministic latency** — no pathfinding needed **Channel opening costs:** - On-chain transaction: depends on mempool congestion - Typical: 2,000-10,000 sats ($2-10) per channel open - Time: 1-6 Bitcoin block confirmations (10-60 minutes) - For 70 peers: ~70 channel opens, which can be batched into a single on-chain transaction (lnd supports batched channel opens) **Wumbo channels:** No protocol-level size limit (removed 2020). Practical limits are operator-configured. Exchanges typically cap at 0.1-0.2 BTC per channel for routing reliability, but direct peer channels can be larger. **With channel splicing (production 2025):** Can add/remove capacity from existing channels without closing them, significantly reducing on-chain overhead for long-running peer relationships. ### 2.3 Multi-Path Payments (MPP) for Larger Gradient Bounties MPP splits a single payment into smaller parts, routes them through multiple channels, and lets the recipient atomically reconstruct the full payment. **Relevance to gradient bounties:** - A 1,000,000 sat ($1,000) bounty can split across 10 channels of 100,000 sats each - Reduces individual channel capacity requirements - Improves success rate for large payments (no single channel needs full amount) - Privacy benefit: harder to correlate split payment parts **Two MPP variants:** 1. **Basic MPP (BOLT):** Receiver generates invoice, sender splits across paths. Atomic — all parts settle or none do. 2. **AMP (Atomic Multi-Path, lnd):** Sender-initiated, no invoice needed. Better for push-style reward distribution. **For ML coordination:** AMP is preferable for coordinator → worker reward payments (push model), while basic MPP suits worker → coordinator gradient submissions (invoice model with L402). 
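The shard-splitting step an MPP sender performs can be sketched as a greedy allocator over outbound channel balances. This is an illustrative sketch, not lnd's actual pathfinding: `split_payment` and the channel names are hypothetical, and a real sender also accounts for channel reserves, in-flight HTLC limits, and per-hop fees.

```python
def split_payment(amount_sat: int, channel_balances: dict[str, int]) -> dict[str, int]:
    """Greedily split one logical payment into per-channel HTLC shards.

    Mirrors the MPP property discussed above: no single channel needs the
    full amount, but the shards must sum to it exactly (atomic delivery).
    """
    shards: dict[str, int] = {}
    remaining = amount_sat
    # Prefer the deepest channels to minimize shard count.
    for chan_id, balance in sorted(channel_balances.items(), key=lambda kv: -kv[1]):
        if remaining == 0:
            break
        shard = min(balance, remaining)
        if shard > 0:
            shards[chan_id] = shard
            remaining -= shard
    if remaining > 0:
        raise ValueError(f"insufficient outbound capacity: {remaining} sats short")
    return shards


# A 1,000,000-sat bounty over channels that each hold far less than that:
channels = {"peer-a": 400_000, "peer-b": 350_000, "peer-c": 300_000}
print(split_payment(1_000_000, channels))
# → {'peer-a': 400000, 'peer-b': 350000, 'peer-c': 250000}
```

Atomicity is enforced by the Lightning protocol, not the splitter: all shards share one payment hash, so either every shard settles or none does.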
### 2.4 Channel Rebalancing Overhead For sustained bidirectional payments (gradient fees + reward distribution): | Rebalancing Method | Cost | Latency | Automation | |-------------------|------|---------|------------| | Circular rebalance | Routing fees only (~63 ppm) | Seconds | Fully automated (lnd auto-rebalance) | | Loop Out (on-chain) | On-chain fee + service fee | 10+ min | Automated via Lightning Loop | | Channel splice | On-chain fee | 1 confirmation | Semi-automated (2025+) | | Submarine swap | On-chain fee + swap fee | 10+ min | Automated | **Natural rebalancing advantage:** If the coordinator: 1. Collects gradient contribution fees from workers (inbound flow) 2. Distributes rewards back to workers (outbound flow) Then channels naturally rebalance without any explicit rebalancing operations. The ratio of collection to distribution determines drift. If symmetric, channels self-maintain. Design the fee/reward structure to approximate symmetry. **Auto-rebalancing tools:** lnd supports automated channel management, and tools like Lightning Loop handle automated rebalancing. In 2025, channel splicing and asynchronous payments further reduced the need for manual intervention. --- ## 3. Lightning + Automation (Agent Payments) ### 3.1 Lightning Labs `lightning-agent-tools` Released February 11, 2026. MIT license. Official toolkit for AI agent payments. 
**7 composable skills + MCP server:** | Skill | Function | |-------|----------| | `lnd` | Run Lightning node (Neutrino light client, SQLite, Docker) | | `lightning-security-module` | Remote signer (keys on separate hardware) | | `macaroon-bakery` | Bake scoped credentials (5 preset roles) | | `lnget` | CLI HTTP client with auto L402 payment | | `aperture` | L402 reverse proxy to monetize API endpoints | | `commerce` | Meta-skill orchestrating buyer/seller workflows | | `lightning-mcp-server` | 18 read-only tools via MCP over Lightning Node Connect | **MCP server tools (18 read-only):** Query balances, channels, invoices, payments, network graph. Connected via LNC (encrypted WebSocket tunnels, pairing-phrase auth, no credentials stored on disk). **Installation paths:** ```bash # Zero-install MCP (read-only node access): claude mcp add --transport stdio lnc -- npx -y @lightninglabs/lightning-mcp-server # Full plugin (all 7 skills): claude plugin marketplace add lightninglabs/lightning-agent-tools # Full stack (Docker node + commerce): # Clone repo → install scripts → Docker Compose ``` ### 3.2 Autonomous Agent Capabilities **What agents can do today:** - Open channels (via `lnd` skill) - Manage liquidity (query balances, rebalance) - Make payments (via `lnget` — auto L402) - Host paid endpoints (via `aperture`) - Scope sub-agent permissions (via `macaroon-bakery`) - Query node state (via MCP — 18 tools) **What requires human oversight:** - Initial node funding (on-chain BTC deposit) - Remote signer setup (Tier 1 — keys on separate hardware) - Channel capacity decisions (how much to allocate) **Three Lightning backends for agents:** | Backend | Setup | Use Case | |---------|-------|----------| | Direct gRPC to local lnd | Full node | Production | | Lightning Node Connect (LNC) | Pairing phrase, WebSocket | Remote access, no direct network | | Embedded Neutrino light wallet | Zero external deps | Quick experiments, regtest | ### 3.3 `lnget` Payment Flow Latency ``` 
lnget --max-cost 500 https://api.example.com/data.json 1. HTTP GET → api.example.com ~50ms 2. Receive 402 + WWW-Authenticate header ~50ms 3. Parse macaroon + BOLT 11 invoice <1ms 4. Check --max-cost against invoice amount <1ms 5. Pay Lightning invoice via configured backend 100-500ms 6. Cache macaroon:preimage pair <1ms 7. Retry GET with Authorization: L402 header ~50ms 8. Receive 200 OK ~50ms ───────────────────────────────────────────────────────── First request total: ~300-700ms Subsequent requests (cached token): ~50ms ``` The `--max-cost` flag enforces a per-request spending ceiling. The macaroon-bakery can additionally enforce node-level spending caps. ### 3.4 Security Model for Autonomous Payments **Defense in depth via macaroon scoping:** | Tier | Macaroon Role | Permissions | Use Case | |------|--------------|-------------|----------| | pay-only | Can send payments | No channel mgmt, no fund extraction | Buyer agents | | invoice-only | Can create invoices | No spending, no channel access | Seller agents | | read-only | Can query state | No payments, no modifications | Monitoring | | channel-admin | Can manage channels | No fund extraction | Node management | | signer-only | Can sign transactions | Isolated on separate hardware | Production (Tier 1) | **Security tiers:** | Tier | Mode | Key Location | Risk Level | |------|------|-------------|------------| | 1 (production) | Watch-only node + remote signer | Separate hardware | Lowest | | 2 (testing) | Local keys, restricted perms | Agent machine | Medium | | 3 (observation) | Read-only MCP via LNC | Ephemeral keypairs | Minimal | **Spending controls:** - `--max-cost` on `lnget`: per-request ceiling - Macaroon caveats: total spending caps, time-based expiry - Node-level budget: channel capacity = hard ceiling - SatGate: HTTP 402 when agent hits spend limit (before reaching upstream) **Delegation pattern for multi-agent systems:** ``` Human → creates 10,000 sat budget macaroon └→ Coordinator agent → 
attenuates to 500 sat cap, +1h expiry └→ Worker agent → can spend up to 500 sats in next hour ``` Each level can only restrict further, never expand permissions. --- ## 4. Comparison with Alternatives ### 4.1 Lightning vs. Ethereum L2s | Property | Lightning (Bitcoin) | Base/Arbitrum/Optimism (Ethereum L2) | |----------|-------------------|--------------------------------------| | **Settlement speed** | < 1 second (direct channel) | ~2 seconds (L2 block) | | **Hard finality** | Instant (channel update is final) | ~13 minutes (L1 finalization) | | **Typical fee** | < 1 sat (~$0.001) routing | $0.15-0.50 per transaction | | **Minimum viable payment** | 1 sat (~$0.001) | ~$0.01 (gas floor) | | **Channel/account setup** | On-chain tx (10+ min, $2-20) | Account abstraction (free) | | **Auth protocol** | L402 (macaroons, mature) | x402 (stablecoins, new - Oct 2025) | | **Privacy** | Onion routing, no public ledger | All transactions on public chain | | **Programmability** | Limited (HTLCs, scripts) | Full EVM (smart contracts) | | **Agent tooling** | lightning-agent-tools (Feb 2026) | x402 SDK, CDP (Coinbase) | | **Denomination** | BTC (volatile) | USDC/USDT (stable) | | **Network capacity** | ~5,600 BTC (~$490M, Dec 2025) | $20B+ TVL across L2s | **Lightning advantages for ML coordination:** - Sub-second finality (critical for 70-second training rounds) - Sub-cent fees (viable for per-gradient micropayments) - Privacy (competitors can't see gradient marketplace activity) - No account abstraction overhead - Macaroon-based auth enables hierarchical agent delegation **L2 advantages:** - Stablecoin denomination (no BTC volatility risk for compute pricing) - Smart contract programmability (on-chain verification of gradient quality) - Larger ecosystem liquidity - No channel capacity management - x402 has Cloudflare + Coinbase backing ### 4.2 Lightning vs. 
Solana | Property | Lightning | Solana | |----------|-----------|--------| | **TPS** | Millions (off-chain, no global consensus) | 600-700 real TPS | | **Latency** | < 500ms | ~400ms (block time) | | **Fees** | < $0.001 | $0.001-0.01 | | **Uptime** | 99.9%+ (no shared state) | ~99% (multiple outages historically) | | **Privacy** | High (onion routing) | None (public ledger) | | **Programmability** | Limited | Full (Rust programs) | | **Agent infra** | lightning-agent-tools | Solana Agent Kit | **For ML coordination:** Lightning and Solana are comparable on speed and fees. Lightning wins on privacy and fault isolation (no global consensus needed). Solana wins on programmability and stablecoin availability. ### 4.3 Lightning vs. Bittensor TAO | Property | Lightning L402 | Bittensor TAO | |----------|---------------|---------------| | **Payment model** | Per-contribution micropayment | Block emission rewards (41/41/18 split) | | **Settlement** | Sub-second, per gradient | Per-block (~12 seconds) | | **Consensus** | None needed (bilateral channels) | Yuma Consensus (stake-weighted) | | **Entry barrier** | Open channels (~$10 setup) | Stake TAO tokens (thousands of $) | | **Denomination** | BTC (or future Taproot Assets) | TAO token | | **Incentive alignment** | Direct market pricing (pay for value) | Emission-based (inflationary) | | **Validator overhead** | None (coordinator verifies directly) | High (scoring, weight setting) | | **Subnet flexibility** | N/A (any topology) | Subnet registration required | | **Privacy** | High | Low (on-chain weights visible) | | **Network effects** | Bitcoin (largest, most liquid) | TAO ecosystem only | | **Sybil resistance** | Channel capacity as stake | TAO staking | **Key technical tradeoffs:** 1. **Direct pricing vs. emission model:** Lightning enables **per-gradient market pricing** — a coordinator posts a bounty, contributors bid, payment is proportional to measured improvement. 
Bittensor uses fixed block emissions split by validator scoring, which creates indirection between value created and payment received. 2. **Consensus overhead:** Bittensor's Yuma Consensus requires validators to score miners, compute stake-weighted medians, clip outliers, and settle on-chain every block. Lightning has **zero consensus overhead** — payment is a bilateral channel update. For 70 peers at 70-second intervals, this difference is substantial. 3. **Capital requirements:** Bittensor requires TAO token staking (validators need significant stake for meaningful weight). Lightning requires channel capacity (fundable with BTC, reusable across many payments). Lightning's capital is **productive** (earns routing fees); TAO staking is **passive** (earns emissions). 4. **Latency:** Bittensor's consensus loop adds ~12 seconds per block plus validator scoring overhead. Lightning is sub-second. For iterative ML training where each round depends on the previous, this 10x+ latency difference matters. 5. **Flexibility:** Lightning imposes no subnet structure or registration requirement. Any topology works. Bittensor requires subnet registration ($$$) and conforming to the validator/miner/owner emission split. ### 4.4 Lightning vs. 
Stripe/PayPal | Property | Lightning | Stripe | PayPal | |----------|-----------|--------|--------| | **Min viable payment** | ~$0.001 (1 sat) | $0.50 (fee floor) | $0.05 (micropayment rate) | | **Fee structure** | ~0.006% (63 ppm median) | 2.9% + $0.30 | 5% + $0.05 (micropay) | | **Settlement** | Instant (seconds) | 2 business days | Instant (PayPal balance) | | **Global access** | Permissionless | 47 countries, KYC | 200+ countries, KYC | | **Agent autonomy** | Full (macaroon-scoped) | Requires API key + human entity | Requires account + human entity | | **Programmable auth** | L402 macaroons (hierarchical) | API keys (flat) | OAuth (flat) | | **$0.01 payment cost** | ~$0.000001 fee | $0.30 fee (3,000% of value) | $0.05 fee (500% of value) | | **Identity required** | None | Business entity | Business entity | | **Chargeback risk** | None (final settlement) | Yes | Yes | **What Lightning uniquely enables for compute marketplaces:** 1. **Sub-cent micropayments are economically viable.** At 63 ppm routing fees, a 100-sat (~$0.10) payment costs 0.0063 sats in fees. Stripe would charge $0.30. This is the difference between "per-gradient payment" being feasible or not. 2. **No identity requirement.** Anonymous contributors can participate. Compute marketplace doesn't need KYC/AML for every GPU provider. 3. **No chargeback risk.** Lightning payments are final. A coordinator can't reverse payment after receiving a valid gradient. 4. **Hierarchical agent delegation.** Macaroon attenuation enables a coordinator to issue scoped spending credentials to sub-agents — impossible with Stripe API keys. 5. **No business entity overhead.** An autonomous agent can transact without incorporating a legal entity. Critical for permissionless compute marketplaces. --- ## 5. 
Real-World Lightning Stats (2025-2026) ### 5.1 Network Capacity & Infrastructure | Metric | Value | Date | Source | |--------|-------|------|--------| | Public capacity | 5,606 BTC (~$490M) | Dec 2025 (ATH) | Bitcoin Visuals / Bitcoin Magazine | | Public nodes | ~12,600-16,300 | 2025 | 1ML / CoinLaw | | Public channels | ~41,000-44,000 | 2025 | 1ML / CoinLaw | | Avg channel capacity | Grown 214% over 4 years | 2025 | CoinLaw | | Capacity Gini coefficient | ~0.97 | 2025 | CoinLaw (high inequality) | | Private channels | Unknown (significant additional capacity) | — | By design | **Note on capacity decline concern:** Public capacity declined ~20% in mid-2025 (from ~5,300 to ~4,100 BTC) before recovering to ATH. Analysis suggests this reflects consolidation into fewer, larger, better-managed channels rather than abandonment. Average channel size increased proportionally. ### 5.2 Transaction Volume & Performance | Metric | Value | Date | Source | |--------|-------|------|--------| | Monthly volume | $1.17B (ATH) | Nov 2025 | Bitcoin Magazine | | Monthly transactions | ~5.2M | Nov 2025 | CoinLaw | | YoY volume growth | +266% | 2025 | CoinLaw | | Average transaction size | $223 | 2025 | CoinLaw (2x YoY increase) | | Monthly transactions (early 2025) | 8M+ | Q1 2025 | CoinLaw | | Payment success rate | 99.7% | Aug 2023 (308K txns) | CoinLaw | | Success rate (exchanges, < $10K) | > 99% | 2025 | Exchange reports | | Settlement time (optimal) | < 500ms | 2025 | Lightning Labs | | Settlement time (measured avg) | 182ms (1-hop) | Academic study | Suredbits | | Record single payment | $1M in 0.43s | Jan 28, 2026 | Cryptopolitan / Bitcoin Magazine | ### 5.3 Fee Structure | Metric | Value | Source | |--------|-------|--------| | Median base fee | 0.999839 sats | Glassnode | | Median fee rate | 63 ppm (0.0063%) | Glassnode | | 1-hop average fee | ~0.15% | CoinLaw | | 5+ hop average fee | ~6.90% | CoinLaw | | Fee for 1M sat payment (~$1K) | $0.39-1.27 | Exchange data | | Cash App 
fee (anomalous) | 2,147,483,647 ppm | Block's nodes (not representative) |

### 5.4 Largest Known Automated Payment Systems

| System | Description | Scale |
|--------|-------------|-------|
| **Cash App** | Square/Block, 1 in 4 BTC payments via LN | Millions of users |
| **Binance** | LN deposits/withdrawals | Institutional channel capacity |
| **OKX** | LN integration | Institutional capacity |
| **Kraken** | Received $1M single LN payment | Institutional grade |
| **Lightning Loop** | Automated liquidity management (L402) | 5 years production |
| **Bitfinex** | LN for settlements | Enterprise volumes |
| **CoinGate** | Merchant payment processing via LN | Thousands of merchants |

**Estimated Lightning wallet users:** 1.8-3.7M wallet downloads (2023 data), with significant growth since the Cash App and exchange integrations. One enterprise wallet reported 1.8M users with 100% of BTC transactions on Lightning.

### 5.5 Network Trends Relevant to ML Coordination

1. **Institutional channels dominate.** Capacity is concentrating in large, reliable nodes (Gini ~0.97). This is good for ML coordination: a coordinator node would be a large, well-connected hub by design.
2. **Channel splicing is production-ready (2025).** Add or remove capacity without closing channels — critical for long-running training peer relationships.
3. **Multi-path payments are standard.** Large gradient bounties can split across channels without requiring massive individual channel capacity.
4. **Routing algorithm improvements in 2025** reduced payment failures. Combined with direct channels to known peers, expect 99.9%+ reliability.
5. **USDT on Lightning (Jan 2025, Tether).** Stablecoin-denominated payments on Lightning, implemented via the Taproot Assets protocol (Lightning Labs, formerly Taro) — potentially addresses the BTC volatility concern for compute pricing.

---

## 6. Synthesis: Lightning for Decentralized ML Coordination

### 6.1 Viability Assessment

| Requirement | Lightning Capability | Rating |
|------------|---------------------|--------|
| Sub-second payment (70s rounds) | 182ms avg, <500ms | Excellent |
| Sub-cent micropayments | <1 sat fee at 63 ppm | Excellent |
| 70 concurrent peers | Direct channels, MPP | Good (requires upfront channel setup) |
| Autonomous agent operation | lightning-agent-tools (Feb 2026) | Good (maturing) |
| Privacy | Onion routing, no public ledger | Excellent |
| Hierarchical delegation | Macaroon attenuation | Excellent |
| Stablecoin option | USDT on LN (Taproot Assets, Jan 2025) | Emerging |
| Smart contract verification | Limited (HTLC scripts only) | Weak (needs off-chain verifier) |
| Permissionless entry | Channel open (~$10) | Good |
| Fault isolation | Bilateral channels, no global state | Excellent |

### 6.2 Recommended Architecture

```
Coordinator Node (LND, Neutrino or full node)
├── Aperture reverse proxy (L402-gated gradient submission endpoint)
├── 70 direct channels to known peers (pre-opened, wumbo if needed)
├── Macaroon bakery (scoped credentials per worker tier)
├── Auto-rebalancer (circular via peer channels)
└── MCP server (status monitoring)

Worker Nodes (lnd light client via Neutrino)
├── lnget client (auto-pays L402 for gradient submission)
├── Direct channel to coordinator (pre-opened)
├── Pay-only macaroon (scoped, capped)
└── Receive rewards via keysend (push from coordinator)
```

### 6.3 Open Questions for Whitepaper

1. **BTC volatility for compute pricing.** USDT on Lightning (Taproot Assets) is the answer, but adoption is early. Alternative: price in USD, settle in BTC at the market rate per round.
2. **Gradient verification on-chain.** Lightning cannot verify gradient quality on-chain (no smart contracts). This must be handled by the coordinator off-chain, with economic incentives (reputation, staking via channel capacity) for honesty.
3. **Channel capacity bootstrapping.** 70 channels at 1M sats each = 70M sats (~$70K) of locked capital. An LSP (Lightning Service Provider) could provide initial inbound liquidity; the Lightning Pool marketplace offers channel leasing.
4. **Comparison with x402.** Coinbase's x402 + Cloudflare is a serious competitor: stablecoin-native, EVM-programmable, institutionally backed. The tradeoff is privacy (L402) vs. stability and programmability (x402).

---

## Sources

- [L402 Protocol Specification — Lightning Labs](https://docs.lightning.engineering/the-lightning-network/l402/protocol-specification)
- [L402 Builder's Guide — Lightning Labs](https://docs.lightning.engineering/the-lightning-network/l402)
- [L402 GitHub Specification — Lightning Labs](https://github.com/lightninglabs/L402)
- [L402 for Agents Blog Post — Lightning Labs (Mar 2026)](https://lightning.engineering/posts/2026-03-11-L402-for-agents/)
- [Lightning Agent Tools Announcement — Lightning Labs (Feb 2026)](https://lightning.engineering/posts/2026-02-11-ln-agent-tools/)
- [Lightning Agent Tools GitHub — Lightning Labs](https://github.com/lightninglabs/lightning-agent-tools)
- [Bitcoin Lightning Network Usage Statistics 2026 — CoinLaw](https://coinlaw.io/bitcoin-lightning-network-usage-statistics/)
- [Lightning Network Statistics — 1ML](https://1ml.com/statistics)
- [Lightning Network Statistics — Bitcoin Visuals](https://bitcoinvisuals.com/lightning)
- [Lightning Network Capacity ATH — Bitcoin Magazine](https://bitcoinmagazine.com/markets/bitcoins-lightning-network-capacity-hits-new-all-time-high)
- [$1B Monthly Volume — Bitcoin Magazine](https://bitcoinmagazine.com/news/bitcoins-lightning-network-surpasses)
- [$1M Payment Record — Cryptopolitan](https://www.cryptopolitan.com/lightning-network-payment-record/)
- [Multi-Path Payments — Lightning Labs](https://docs.lightning.engineering/the-lightning-network/pathfinding/multipath-payments-mpp)
- [Channel Rebalancing — Lightspark](https://www.lightspark.com/glossary/channel-rebalancing)
- [Aperture L402 Reverse Proxy — GitHub](https://github.com/lightninglabs/aperture)
- [Awesome L402 — Fewsats](https://github.com/Fewsats/awesome-L402)
- [L402 Protocol Docs — l402.org](https://docs.l402.org/)
- [x402 Protocol — Coinbase](https://www.x402.org/)
- [x402 Whitepaper](https://www.x402.org/x402-whitepaper.pdf)
- [x402 Cloudflare Integration](https://blog.cloudflare.com/x402/)
- [Bittensor Documentation](https://docs.learnbittensor.org/)
- [Bittensor TAO Economy — Opentensor Foundation](https://blog.bittensor.com/tao-token-economy-explained-17a3a90cd44e)
- [Lightning Routing Yields — Atlas21](https://atlas21.com/lightning-routing-yields-10-annually-blocks-announcement/)
- [Macaroons on Lightning — Voltage](https://voltage.cloud/blog/what-are-macaroons-and-how-do-they-work-on-lightning-network)
- [Wumbo Channels — CoinDesk](https://www.coindesk.com/tech/2020/08/20/ready-to-wumbo-lnd-enables-more-larger-bitcoin-transactions-on-lightning)
- [Channel Splicing — Fidelity Digital Assets](https://www.fidelitydigitalassets.com/research-and-insights/introduction-channel-splicing-bitcoins-lightning-network)
- [Lightning Network Throughput — Voltage](https://voltage.cloud/blog/how-many-transactions-can-the-lightning-network-handle)
- [Google DeepMind Macaroon Agent Delegation — Dev Journal](https://earezki.com/ai-news/2026-03-11-what-google-deepmind-gets-right-about-agent-delegation-and-what-satgate-already-built/)
- [Blockchain Transaction Speed & Costs 2026 — DollarPocket](https://www.dollarpocket.com/blockchain-transaction-speed-costs-2026)
- [Ethereum L2 Fee Comparison — L2fees.info](https://l2fees.info/)

---

## Implementation Roadmap

# l402-train — Implementation Plan

## Project Philosophy

This is a research prototype, not a production system.
The goal is to prove the core thesis — "Lightning micropayments can coordinate gradient quality in decentralized training" — with real code, real payments, and real numbers. Start small, validate incrementally, publish results.

Related: [Whitepaper](whitepaper/Lightning-Coordinated%20Decentralized%20AI%20Training.md) | [Autoresearch project](../autoresearch/)

---

## Phase 0: Local End-to-End Loop (2 weeks)

**Goal:** Single-machine simulation running the complete protocol loop: local training → gradient compression → validation scoring → payment settlement. All on the MacBook with regtest Lightning.

**Why this first:** Before involving any networking, peers, or real money, prove the software architecture works end-to-end. Get a tight eval loop running fast.

### Components

1. **`sparseloco.py`** — SparseLoCo compression in MLX
   - Top-k sparsification (k=64 per chunk of 4096)
   - 2-bit quantization of selected values
   - Index encoding
   - Error feedback buffer (decay=0.95)
   - Test on Qwen2.5-0.5B: train locally for 30 steps, compute the pseudo-gradient (weight diff), compress, decompress, verify round-trip fidelity
   - Metric: compression ratio achieved, plus loss degradation from the compress/decompress round trip vs. the dense gradient
2. **`validator.py`** — Gauntlet-style loss scoring
   - Take a compressed gradient, decompress, apply it to the model checkpoint
   - Measure loss on a held-out validation batch before and after
   - Output: quality score (loss delta) normalized against a baseline
   - Pure function: `f(checkpoint, gradient, val_data) → score`. Deterministic replay is free
3. **Regtest Lightning** — Two LND nodes in Docker via `lightning-agent-tools`
   - Coordinator node + simulated peer node (Tier 2 security — local keys, restricted perms)
   - Create a payment channel between them
   - Test: issue hold invoice → pay → settle on validation pass / expire on fail
4. **`protocol_sim.py`** — Single-machine protocol loop

   ```
   for round in range(N):
       1. Peer trains locally for 30 steps (MLX)
       2. Peer compresses pseudo-gradient (sparseloco.py)
       3. Coordinator issues hold invoice for reward
       4. Peer "submits" gradient (local function call, no HTTP)
       5. Coordinator validates (validator.py) → quality_score
       6. If quality_score > threshold: settle hold invoice
       7. Else: let hold invoice expire
       8. Log: round, quality_score, payment_settled, compression_ratio
   ```

### Validates

- SparseLoCo compression works on MLX (not just PyTorch/CUDA)
- The validation oracle produces meaningful quality scores
- Hold-invoice conditional settlement works mechanically
- Real numbers for: compression ratio, validation compute cost, payment latency

### Dependencies

- Docker (for regtest LND nodes)
- `lightning-agent-tools` repo (Docker Compose stack)
- MLX + mlx-lm (already available)

---

## Phase 1: L402-Gated HTTP Exchange (2 weeks)

**Goal:** Split coordinator and peer into separate processes communicating over HTTP with L402 payment gating. Still on one machine, but real HTTP and real L402 flows.

### Components

1. **`coordinator.py`** — FastAPI service behind the Aperture proxy
   - `PUT /gradient` — L402-gated gradient submission (peer pays a submission fee)
   - `GET /checkpoint` — L402-gated checkpoint download
   - `GET /reward-schedule` — public endpoint showing current bounty rates
   - Validation runs server-side after gradient upload
   - Hold invoice issued at upload time, settled or expired based on the validation score
2. **`peer.py`** — Client using `lnget` for automatic L402 payment
   - Training loop → compress → `lnget PUT /gradient` → receive payment (or not)
   - `--max-cost` flag enforces per-request spending caps
3. **Aperture configuration**
   - Pricing: ~100 sats submission fee for `PUT /gradient`, ~50 sats for `GET /checkpoint`
   - Macaroon caveats: per-peer spending limits, time-bounded sessions

### Validates

- L402 works for gradient exchange (the core protocol interaction)
- Payment latency is acceptable within the 70-second training round window
- The `lnget` + Aperture stack works as described in the whitepaper architecture

---

## Phase 2: Two-Machine Proof of Concept (4 weeks)

**Goal:** Run the protocol across two separate machines over the real internet with real (small) Lightning payments.

### Components

1. **Coordinator on Hetzner VPS**
   - Deploy coordinator service + LND (Neutrino light client) + Aperture
   - Use existing VPS infrastructure for domain/TLS
   - Channel capacity: minimal for testing (100K-1M sats, ~$100-$1000)
   - Deploy via existing deployment tooling
2. **Peer on MacBook**
   - MLX training, LND light client, direct payment channel to the coordinator
   - Real Lightning payments: submit gradients, receive rewards
3. **Testnet → Mainnet**
   - Start on Bitcoin testnet (free, no real money)
   - Move to mainnet when stable (budget: ~$100-500)

### Validates

- The protocol works over the real internet
- Real Lightning payment latency over real network hops
- Gradient upload/download times at realistic bandwidth
- Channel management and rebalancing with real channels

**Deliverable: conference demo**

---

## Phase 3: Multi-Peer Simulation + Byzantine Testing (4 weeks)

**Goal:** Simulate 3-5 peers submitting varying-quality gradients, plus 1 real peer on the MacBook. Test incentive mechanics and Byzantine resistance.
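The quality-scoring mechanic Phase 3 stresses can be sketched with a toy validator. This is a hypothetical pure-Python stand-in for `validator.py`, using a 2-parameter linear model in place of the real MLX checkpoint; all names, data, and the learning rate are illustrative, not the actual implementation.

```python
# Toy stand-in for validator.py's pure scoring function
# f(checkpoint, gradient, val_data) -> score: the held-out loss
# improvement after applying a submitted pseudo-gradient.
import random

def val_loss(w, data):
    """Mean squared error of the linear model y = w.x on held-out pairs."""
    return sum(
        (sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2 for x, y in data
    ) / len(data)

def score(w, gradient, data):
    """Quality score: validation loss before minus loss after the update."""
    w_after = [wi - gi for wi, gi in zip(w, gradient)]
    return val_loss(w, data) - val_loss(w_after, data)

# Held-out validation batch for a 2-parameter model (perfectly fit by w*=[2,-1]).
val_data = [([1, 0], 2.0), ([0, 1], -1.0), ([1, 1], 1.0), ([2, 1], 3.0)]
w = [0.0, 0.0]  # current checkpoint

# Honest peer: a real (learning-rate-scaled) gradient of the objective.
residual = lambda x, y: sum(wi * xi for wi, xi in zip(w, x)) - y
honest = [
    0.1 * (2 / len(val_data)) * sum(residual(x, y) * x[j] for x, y in val_data)
    for j in range(2)
]

random.seed(0)
free_rider = [random.gauss(0, 0.3) for _ in range(2)]  # noise, zero compute
poisoner = [-g for g in honest]                        # pushes the model backwards

print(score(w, honest, val_data) > 0)    # True: honest work is rewarded
print(score(w, poisoner, val_data) < 0)  # True: poisoning degrades held-out loss
```

The payment rule then reduces to a comparison of this score against the settlement threshold; the same function replayed deterministically by any validator yields the same verdict.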
### Simulated Peer Profiles

- **Honest peer** — real gradients from actual training
- **Free-rider** — random/noise gradients (zero compute)
- **Plagiarist** — copies another peer's gradient
- **Poisoner** — adversarial gradients designed to degrade the model
- **Mediocre** — real gradients from an undertrained model (low quality but honest)

### Test Questions

- Does quality-proportional payment correctly reward good and reject bad?
- Do submission fees effectively prevent spam?
- Does validation catch free-riders and poisoning?
- What is the validation compute overhead relative to training?

**Deliverable: technical paper with empirical results** — real Lightning payments + real gradient validation + Byzantine resistance is novel. Nobody has demonstrated this.

---

## Track B: Autoresearch Bounties (parallel with Phases 1-3)

The protocol has two modes sharing the same L402 infrastructure. Track B — autoresearch bounties — can start as soon as Phase 1's L402 infrastructure is working. No GPU required.

### Phase B0: Bounty Runner Framework (2 weeks, parallel with Phase 1)

- `bounty_coordinator.py` — FastAPI endpoints: `GET /bounties`, `GET /bounty/{id}`, `POST /bounty/{id}/submit`
- `bounty_agent.py` — Reference agent client (downloads baseline, runs autoresearch loop, submits improvements)
- Same hold invoice escrow as training, but for code diffs instead of gradients

### Phase B1: First Live Bounties (2 weeks, parallel with Phase 2)

- Post real bounties with real sats (prompt optimization, regex improvement)
- 80/20 public/held-out eval split, canary probes, temporal stability checks
- Validate anti-gaming measures catch metric hacking in practice

### Phase B2: Multi-Sponsor Marketplace (4 weeks)

- External sponsors post their own bounties
- Public bounty board with leaderboards
- Coordinator takes 5-10% fee on payouts (self-sustaining business model)

**Deliverable: standalone open-source product** — doesn't require GPU clusters, works with any quantifiable metric.
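The hold-invoice escrow both tracks share can be sketched as a small state machine: payment is locked to `sha256(preimage)`, and funds move only if the preimage holder reveals it before the timeout. This is a simulation of the mechanic only, not the LND API (in LND it maps to the `invoicesrpc` calls `AddHoldInvoice`, `SettleInvoice`, and `CancelInvoice`); the class and function names here are invented for illustration.

```python
# Minimal simulation of hold-invoice conditional settlement.
import hashlib
import os

class HoldInvoice:
    """Funds are locked to sha256(preimage); they release only if the
    issuer reveals the preimage (settle) before the timeout (cancel)."""

    def __init__(self, amount_sats: int):
        self.amount_sats = amount_sats
        self._preimage = os.urandom(32)          # kept secret by the issuer
        self.payment_hash = hashlib.sha256(self._preimage).hexdigest()
        self.state = "open"                      # open -> accepted -> settled/canceled

    def accept_payment(self):
        self.state = "accepted"                  # HTLC locked in, not yet settled

    def settle(self) -> bytes:
        assert self.state == "accepted"
        self.state = "settled"
        return self._preimage                    # revealing this releases funds

    def cancel(self):
        assert self.state == "accepted"
        self.state = "canceled"                  # funds return to the payer

def settle_if_valid(invoice: HoldInvoice, quality_score: float, threshold: float) -> str:
    """Coordinator-side decision: release the reward only on a passing score."""
    invoice.accept_payment()
    if quality_score > threshold:
        preimage = invoice.settle()
        assert hashlib.sha256(preimage).hexdigest() == invoice.payment_hash
    else:
        invoice.cancel()
    return invoice.state

print(settle_if_valid(HoldInvoice(1000), quality_score=0.8, threshold=0.5))  # settled
print(settle_if_valid(HoldInvoice(1000), quality_score=0.1, threshold=0.5))  # canceled
```

The same decision function serves both modes: `quality_score` is a gradient's validation loss delta in the training track and a metric improvement over the baseline in the bounty track.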
---

## What to Skip for Prototype

| Whitepaper Feature | Skip? | Why |
|---|---|---|
| DLC-bound settlement | Yes | Hold invoices are sufficient for the PoC |
| Federated multi-validator | Yes | A single coordinator is fine; deterministic replay is what matters |
| 72B scale | Yes | 0.5B-3B on MLX — proving the mechanism, not training a model |
| Heterogeneous SparseLoCo | Yes | Single-tier peers only |
| USDT (Taproot Assets) | Yes | Sats-only for the prototype |

## Key Risks

1. **SparseLoCo on MLX** — No existing implementation. Top-k + quantization is straightforward; error feedback buffer management is the hard part
2. **Aperture custom validation** — L402 gating is supported, but "validate before settling the hold invoice" may need to be handled outside Aperture
3. **LND on CPX21** — 4GB RAM may be tight alongside existing services. May need to run LND on the MacBook instead
4. **MLX scale gap** — A 0.5B proof of concept is fine, but the gap to publishable 7B+ results requires renting GPU time

## Lightning Labs Conversation Starters

1. Fastest path to regtest with two nodes + Aperture for ML gradient bounties
2. Hold invoice timeout tuning — training rounds are 70 seconds; can timeouts be set to 2 minutes?
3. Aperture callback mechanism for custom validation before hold invoice settlement
4. Maturity of the L402-python SDK (Fewsats) for programmatic peer clients
5. Taproot Assets maturity for programmatic USDT compute pricing

## Deliverables Summary

| Phase | Deliverable | Publishable? |
|---|---|---|
| 0 | Single-machine simulation with economics data | No — but provides all the numbers |
| 1 | L402-gated gradient exchange | Blog post / tweet thread |
| 2 | Two-machine PoC over real internet | Conference demo |
| 3 | Multi-peer + Byzantine resistance | Technical paper with empirical results |
| 4 | Decentralized autoresearch bounties | Open-source product |
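To make Key Risk 1 concrete, here is a pure-Python sketch of the compression pipeline it refers to: top-k sparsification, 2-bit quantization (one sign bit plus one of two magnitude levels), and a decayed error-feedback buffer. This is a deliberately simplified illustration, not the SparseLoCo or MLX implementation: it operates on one small vector rather than per-4096-element chunks, omits index encoding, and uses a crude two-level codebook.

```python
# Sketch of top-k + 2-bit quantization + error feedback (simplified).

def compress(grad, buffer, k, decay=0.95):
    """Return (sparse {index: quantized value}, new error-feedback buffer)."""
    g = [gi + bi for gi, bi in zip(grad, buffer)]       # fold in past error
    top = sorted(range(len(g)), key=lambda i: abs(g[i]), reverse=True)[:k]
    mags = sorted(abs(g[i]) for i in top)
    lo = sum(mags[: k // 2]) / max(k // 2, 1)           # level for smaller values
    hi = sum(mags[k // 2 :]) / max(k - k // 2, 1)       # level for larger values
    sparse = {}
    for i in top:
        level = hi if abs(g[i]) >= mags[k // 2] else lo
        sparse[i] = level if g[i] >= 0 else -level      # 2 bits: sign + level
    # Everything not transmitted (plus quantization error) feeds back, decayed.
    new_buffer = [decay * (gi - sparse.get(i, 0.0)) for i, gi in enumerate(g)]
    return sparse, new_buffer

def decompress(sparse, n):
    dense = [0.0] * n
    for i, v in sparse.items():
        dense[i] = v
    return dense

grad = [0.9, -0.05, 0.02, -1.2, 0.4, 0.01, -0.3, 0.08]
sparse, buf = compress(grad, [0.0] * len(grad), k=4)
recon = decompress(sparse, len(grad))
print(sorted(sparse))            # indices of the 4 largest-magnitude entries
print(len(sparse) / len(grad))   # density: 0.5 here; 1.56% at real scale
```

The error-feedback buffer is the part the roadmap flags as hard: it must persist across rounds, match the parameter layout exactly, and decay correctly, or small untransmitted gradients are silently lost.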