Daily Feed - 2026-02-12
Scaled-Dot-Product Attention as One-Sided Entropic Optimal Transport
Domain: ML / Information Geometry / Optimal Transport | Time cost: ~40min read
Intuition: The softmax attention mechanism—normally motivated by hand-wavy “key-query similarity” heuristics—is shown to be the exact solution to a degenerate, one-sided Entropic Optimal Transport (EOT) problem. The forward pass finds a distribution over values that maximizes similarity to queries while being maximally entropic; the backward pass turns out to be an advantage-based policy gradient from RL.
Concrete punch: The attention output Attn(Q,K,V) = softmax(QKᵀ/√d) V is the unique optimizer of a one-sided EOT problem: maximize ⟨C, P⟩ + ε·H(P) subject to a single row-marginal constraint (the degenerate, "one-sided" case of full EOT), where C is the similarity matrix and H is Shannon entropy. The learning gradient ∂L/∂logits is mathematically identical to the advantage function A(a|s) = Q(s,a) − V(s) from policy-gradient RL, and the Fisher Information Matrix of the EOT solution dictates the geometry of this update — making it a natural gradient on the attention manifold.
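The row-wise claim is easy to check numerically: softmax(c) is the exact maximizer of ⟨c, p⟩ + ε·H(p) over the probability simplex (here ε = 1 and logits c = QKᵀ/√d), so it should beat any other feasible distribution. A minimal NumPy sketch with toy sizes, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def entropic_objective(p, c, eps=1.0):
    # <c, p> + eps * H(p), with H(p) = -sum p log p
    p = np.clip(p, 1e-12, None)
    return p @ c + eps * (-(p * np.log(p)).sum())

# One query row against 8 keys (toy dimensions).
d = 16
q = rng.normal(size=d)
K = rng.normal(size=(8, d))
c = K @ q / np.sqrt(d)          # scaled dot-product logits

p_star = np.exp(c - c.max())
p_star /= p_star.sum()          # softmax row = the claimed EOT optimizer (eps = 1)

# Compare against random points on the simplex: none should do better.
best_random = max(
    entropic_objective(rng.dirichlet(np.ones(8)), c) for _ in range(10_000)
)
assert entropic_objective(p_star, c) >= best_random
```

The same check works per row of a full softmax(QKᵀ/√d) matrix, since the one-sided problem decouples across queries.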
Significance: This reframes transformer attention as principled optimal inference rather than an ad hoc design, and explains why it works so well: the forward pass solves an optimization problem with built-in regularization, and the backward pass implements manifold-aware learning. The same author extends this in You Need Better Attention Priors (arXiv:2601.15380, Jan 2026), introducing GOAT — learnable attention priors that replace the implicit uniform prior, providing an EOT-based explanation of attention sinks and enabling length extrapolation.
Why it matches: DPO-level cognitive novelty: collapses the heuristic attention mechanism into principled OT + RL geometry using Legendre-Fenchel duality, information geometry, and variational principles — three of Nakamoto’s core physics lenses. Novel relative to known work on attention (no existing seed covers this OT derivation).
Tilt Matching for Scalable Sampling and Fine-Tuning
Domain: ML / Generative Models / Stochastic Control | Time cost: ~35min read
Intuition: Given a pre-trained flow matching model (which transports noise → data), how do you steer it toward a reward function without backpropagating through trajectories? Tilt Matching derives a dynamical equation relating the original velocity field to one targeting a reward-tilted distribution, implicitly solving a stochastic optimal control problem. The correction is expressible in closed form as a cumulant expansion.
Concrete punch: Let vₜ be the pre-trained flow velocity and r(x) the reward. The tilted velocity is:

  v̂ₜ(xₜ) = vₜ(xₜ) + Σ_{n≥1} (1/n!) κₙ(xₜ, r),

where κₙ(xₜ, r) denotes the n-th joint cumulant of the stochastic interpolant with n copies of the reward. To first order, the correction reduces to Cov(x₁, r | xₜ) — the covariance between the data endpoint and the reward, conditioned on the current interpolant state. The resulting objective has strictly lower variance than standard flow matching.
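A minimal sanity check of the first-order correction in a linear-Gaussian toy (not the paper's estimator): with interpolant xₜ = (1−t)x₀ + t·x₁, x₀, x₁ ~ N(0,1) iid, and linear reward r(x) = x, the correction Cov(x₁, r | xₜ) equals the analytic conditional variance Var(x₁ | xₜ), which a Monte Carlo estimate recovers:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stochastic interpolant x_t = (1 - t) x_0 + t x_1 with x_0, x_1 ~ N(0, 1) iid
# and a linear reward r(x) = x, so Cov(x_1, r | x_t) = Var(x_1 | x_t) exactly.
t = 0.7
s2 = (1 - t) ** 2 + t ** 2           # Var(x_t)
var_cond = 1.0 - t ** 2 / s2         # analytic Var(x_1 | x_t) (joint Gaussian)

# Monte Carlo: sample x_1 | x_t = 0.5 from the exact Gaussian posterior.
x_t = 0.5
mean_cond = t * x_t / s2
x1 = rng.normal(mean_cond, np.sqrt(var_cond), size=200_000)
r = x1                                # reward evaluated at the data endpoint

cov_mc = np.cov(x1, r)[0, 1]          # first-order tilt correction Cov(x_1, r | x_t)
assert abs(cov_mc - var_cond) < 0.02
```

In the linear-Gaussian case the posterior over x₁ is available in closed form; the paper's setting needs the pre-trained model to approximate this conditional.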
Significance: Provides a reward-steering mechanism for flow/diffusion models that requires no reward gradients, no backprop through ODE trajectories, and no reward multipliers. State-of-the-art on Lennard-Jones molecular sampling; competitive on Stable Diffusion fine-tuning. The cumulant expansion perspective is a clean physics-style perturbation theory applied to generative modeling.
Why it matches: Bridges stochastic optimal control and generative modeling via cumulant/perturbation expansions — a statistical mechanics lens. Principled derivation from variational principles. The “no gradient of reward” constraint echoes the information-geometric structure of DPO (operating on the model’s own probability space). Novel approach not covered by seed papers.
High-dimensional Mean-Field Games by Particle-based Flow Matching
Domain: ML / Optimal Control / Game Theory | Time cost: ~45min read
Intuition: Mean-field games (MFGs) describe Nash equilibria of systems with many interacting agents — a framework that unifies optimal transport, generative modeling, and multi-agent control. This paper uses flow matching to solve MFGs in high dimensions, bridging the Eulerian (density-based PDE) and Lagrangian (particle-trajectory) formulations.
Concrete punch: The proximal fixed-point scheme iterates:

  ρ^{k+1} = argmin_ρ [ J(ρ) + (1/2τ) W₂²(ρ, ρᵏ) ],

where J(ρ) is the MFG cost, W₂ is the 2-Wasserstein distance, and τ is a step size. Particles are updated via first-order optimality conditions, then a flow neural network is trained simulation-free to match the particle velocities (flow matching). The paper proves: (1) sublinear convergence to stationary points in general; (2) linear (exponential) convergence under displacement convexity; (3) equivalence between Eulerian and Lagrangian MFG solutions when the density is sufficiently regular.
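A minimal Lagrangian sketch of the proximal iteration, under strong simplifying assumptions (potential-only cost J(ρ) = E_ρ[V(x)], identity coupling between successive particle clouds, no interaction term, no neural flow): the Wasserstein proximal step then decouples into one ordinary proximal step per particle, with a closed form for quadratic V.

```python
import numpy as np

rng = np.random.default_rng(2)

# Potential-only toy cost J(rho) = E_rho[V(x)] with V(x) = x^2 / 2 (minimizer: mass at 0).
# With the identity coupling, W_2^2 between successive particle clouds is the mean
# squared particle displacement, so
#   rho^{k+1} = argmin_rho [ J(rho) + W_2^2(rho, rho^k) / (2 tau) ]
# decouples into one scalar problem per particle:
#   x <- argmin_x [ V(x) + (x - x_k)^2 / (2 tau) ]  =>  x = x_k / (1 + tau) for this V.
tau = 0.5
x = rng.normal(loc=3.0, scale=1.0, size=5_000)   # initial particle cloud

for _ in range(20):                              # proximal fixed-point iterations
    x = x / (1.0 + tau)                          # closed-form prox of V(x) = x^2 / 2

assert abs(x.mean()) < 1e-2                      # cloud has collapsed toward argmin V
```

The paper's setting replaces the closed-form prox with first-order optimality conditions on the full (possibly non-potential) MFG cost, and fits a flow network to the resulting particle velocities.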
Significance: Demonstrates that flow matching is not just a generative modeling trick — it’s a computational tool for solving high-dimensional optimal control and game-theoretic problems. The Eulerian↔Lagrangian equivalence result is particularly elegant: flow matching provides the bridge between the two fundamental coordinate systems of continuum mechanics. Works on non-potential MFGs where previous methods fail.
Why it matches: Deep cross-domain unification: optimal transport, game theory, PDE control, and generative models connected through shared mathematical structure (Wasserstein geometry, flow matching, proximal operators). The Lagrangian↔Eulerian duality resonates with physics-informed thinking. Novel application of flow matching beyond image generation.
An Impulse Control Approach to Market Making in a Hawkes LOB Market
Domain: Quant Finance / Market Microstructure / Stochastic Control | Time cost: ~40min read
Intuition: Market makers can’t update quotes at every LOB event — they act discretely. This paper formulates market making in a Hawkes-driven LOB (where order arrivals exhibit self-exciting clustering) as an impulse control problem, leading to a Hamilton-Jacobi-Bellman Quasi-Variational Inequality (HJB-QVI). It then solves this both analytically (deep PDE methods) and via RL (PPO with self-imitation learning), comparing the two.
Concrete punch: The HJB-QVI characterizes the value function V:

  max{ ∂ₜV + ℒV + f, 𝓜V − V } = 0,  with 𝓜V(x) = sup_ξ [ V(Γ(x, ξ)) − c(ξ) ],

where ℒ is the infinitesimal generator of the Hawkes-driven LOB state, f is the running payoff, ξ is the impulse (limit/cancel/market order) with post-impulse state Γ(x, ξ), and c(ξ) is the intervention cost. The Hawkes kernel captures endogenous feedback: each trade excites future arrivals with intensity λ(t) = μ + ∫ α·e^{−β(t−s)} dN(s), creating realistic clustering. The PPO agent with self-imitation learning achieves Sharpe ratios >30 in the simulated LOB.
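The self-exciting intensity λ(t) = μ + ∫ α·e^{−β(t−s)} dN(s) is straightforward to simulate with Ogata thinning; a toy O(n²) sketch (illustrative only, not the paper's simulator):

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_hawkes(mu, alpha, beta, T, rng):
    """Ogata thinning for a Hawkes process with intensity
    lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))."""
    events, t = [], 0.0
    while t < T:
        # Intensity just after t upper-bounds the intensity until the next event,
        # because the exponential kernel only decays between events.
        lam_bar = mu + sum(alpha * np.exp(-beta * (t - s)) for s in events)
        t = t + rng.exponential(1.0 / lam_bar)
        if t > T:
            break
        lam = mu + sum(alpha * np.exp(-beta * (t - s)) for s in events)
        if rng.random() <= lam / lam_bar:      # accept with probability lam / lam_bar
            events.append(t)
    return np.array(events)

# Stationarity requires branching ratio alpha / beta < 1; the expected count on
# [0, T] is then mu * T / (1 - alpha / beta) -- here about 100 events.
events = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, T=50.0, rng=rng)
assert np.all(np.diff(events) > 0) and 0 < len(events) < 1000
```

Each accepted event bumps the intensity by α, so subsequent inter-arrival proposals shorten — this is exactly the clustering the paper exploits against Brownian mid-price baselines.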
Significance: Unlike Brownian mid-price models (Avellaneda-Stoikov lineage), this captures the key microstructural reality that order flow is self-exciting and market impact is endogenous. The impulse control formulation is more realistic than continuous control for actual market making. The dual approach (solve the QVI + train RL, compare) provides principled validation. Directly relevant to anyone building execution or market-making systems.
Why it matches: Combines three of Nakamoto’s interests: market microstructure, stochastic control (HJB-QVI), and RL (PPO). The Hawkes process captures real LOB dynamics rather than toy Brownian assumptions. The QVI framework is the impulse-control analog of the HJB equation — a natural extension of the dynamic programming principle (DPP). Novel treatment not in seed papers.
Video Supplement (added 07:34 ET)
Paper Talk
Michael Albergo — Non-equilibrium transport and tilt matching for sampling
Source: Monte Carlo Seminar | Duration: 38 min | Date: Oct 2025
Paper: Tilt Matching (Paper 2 above). Author talk covering the cumulant expansion, the non-equilibrium stat mech connection, and the Lennard-Jones experiments.
Other papers: No author talks found for Papers 1, 3, or 4.
Standalone Video Recommendations
Gabriel Peyré — Diffusion Flows and Optimal Transport in Machine Learning
Source: Centre de Recerca Matemàtica | Duration: 43 min | Date: Jan 2026
Peyré (CNRS & ENS) reviews how OT concepts — Wasserstein distances, Brenier maps, entropic regularization — underpin diffusion models and flow matching. Direct complement to Papers 1-3.
Martin Hairer — Yang-Mills and the Mass Gap
Source: Clay Mathematics Institute | Duration: 1 hr | Date: Nov 2025
Fields medalist on the Millennium Prize Problem: prove that Yang-Mills gauge theory has a mass gap. Covers non-perturbative existence, singular SPDEs, and renormalization.
Feedback
Content
Positive feedback on all 4 picks (themes, interestingness, explanations). Issues: (1) Telegram/email diverged in content — should be identical up to formatting. (2) Missing YouTube video recommendations — should include 1-2 per day.
Extrapolated content
- YouTube videos are a mandatory component, not optional.
- Paper author talks should be searched and linked for each recommended paper.
- Both delivery channels must push identical content.