Daily Feed - 2026-03-09
1. Gradient Flow Polarizes Softmax Outputs towards Low-Entropy Solutions
Varre, Rofin, Flammarion · Mar 6, 2026
Analyzes the gradient flow dynamics of the value-softmax parameterization, showing that the outputs polarize toward low-entropy (near one-hot) solutions.
Why it matters: Direct theoretical insight into why transformers develop the features they do during training. Connects optimization dynamics to empirical phenomena (attention sinks) that are otherwise hand-waved.
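The low-entropy polarization is easy to see in a toy setting (this is an illustrative sketch, not the paper's value-softmax analysis): plain gradient descent on a cross-entropy objective drives the softmax output toward a near one-hot distribution, so its entropy collapses.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

# Toy illustration: gradient descent on -log p[target] polarizes the
# softmax output toward a low-entropy, near one-hot solution.
rng = np.random.default_rng(0)
z = rng.normal(size=5)
target = 2
lr = 0.5
entropies = []
for _ in range(200):
    p = softmax(z)
    entropies.append(entropy(p))
    grad = p.copy()
    grad[target] -= 1.0          # gradient of -log p[target] w.r.t. z
    z -= lr * grad

# Entropy starts near log(5) and collapses toward zero.
```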
2. Beyond Softmax and Entropy: f-SoftArgmax Policy Gradients with Coupled Regularization
Labbi, Tiapkin, Mangold, Moulines · Jan 18, 2026
Replaces softmax policy parameterization with a generalized f-softargmax family, coupled with an f-divergence regularizer. The coupling creates a Polyak-Łojasiewicz landscape, yielding the first explicit non-asymptotic last-iterate convergence for stochastic policy gradient without any preconditioning. Key result: with Tsallis divergences, f-PG achieves polynomial sample complexity — in contrast to the exponential blow-up that softmax + entropy regularization suffers from.
Why it matters: The softmax parameterization is everywhere in RL, and its exponential convergence pathology is well-known. This paper shows the fix isn’t just “use natural gradient” — it’s to change the parameterization itself. The f-divergence lens connects policy optimization to information geometry in a very clean way.
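One concrete member of the Tsallis family is familiar: for Tsallis index 2, the regularized argmax is sparsemax (Euclidean projection onto the simplex), which, unlike softmax, produces exactly sparse policies. A minimal sketch of that projection (not the paper's implementation):

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of logits z onto the probability simplex.

    Equivalent to the Tsallis-index-2 instance of a generalized
    softargmax; returns exact zeros for sufficiently small logits.
    """
    z_sorted = np.sort(z)[::-1]
    cssv = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    # largest k with 1 + k * z_sorted[k-1] > cumulative sum
    support = k[1 + k * z_sorted > cssv][-1]
    tau = (cssv[support - 1] - 1) / support
    return np.maximum(z - tau, 0.0)
```

Note the contrast with softmax: sparsemax([2.0, 1.0, 0.1]) puts all mass on the first coordinate, while softmax always keeps every probability strictly positive.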
3. TradeFM: A Generative Foundation Model for Trade-flow and Market Microstructure
Kawawa-Beaudan, Sood, Papasotiriou, Borrajo, Veloso · Feb 27, 2026
524M-parameter generative Transformer trained on billions of trade events across 9K+ equities. The core innovation: scale-invariant features and a universal tokenization scheme that maps heterogeneous order flow into discrete sequences, eliminating per-asset calibration. Generated rollouts reproduce heavy tails, volatility clustering, and the absence of return autocorrelation.
Why it matters: The “foundation model for microstructure” idea has been floating around, and this is the most serious attempt so far. The scale-invariant tokenization is the key insight — it’s essentially asking “what’s the right embedding for order flow?” and answering with something that transfers across markets. Opens paths to synthetic data generation and learning-based trading agents.
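To make the idea concrete, here is a hypothetical scale-invariant tokenizer in the same spirit (the scheme, bucket counts, and rolling-median normalization are my assumptions, not TradeFM's actual tokenization): each trade becomes a discrete token combining side, signed price move, and trade size measured relative to a rolling median, so the vocabulary transfers across assets with very different size scales.

```python
import math
from collections import deque

def tokenize_trades(trades, window=50, size_buckets=4):
    """Illustrative scale-invariant trade tokenizer (not TradeFM's).

    trades: iterable of (side, price, size), side = +1 buy / -1 sell.
    Token = side x price-move-sign x size bucket, where size is
    bucketed by its log-ratio to a rolling median -- a scale-free
    feature, so no per-asset calibration is needed.
    """
    recent = deque(maxlen=window)
    tokens, prev_price = [], None
    for side, price, size in trades:
        med = sorted(recent)[len(recent) // 2] if recent else size
        log_ratio = math.log(size / med) if med > 0 else 0.0
        bucket = min(size_buckets - 1,
                     max(0, int(log_ratio) + size_buckets // 2))
        move = 0 if prev_price is None else (price > prev_price) - (price < prev_price)
        # vocabulary size: 2 sides * 3 moves * size_buckets
        tokens.append(((side + 1) // 2) * (3 * size_buckets)
                      + (move + 1) * size_buckets + bucket)
        recent.append(size)
        prev_price = price
    return tokens
```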
4. Riemannian Geometry of Optimal Rebalancing in Dynamic Weight AMMs
Willetts · Mar 5, 2026
Shows that in dynamic-weight AMMs (TFMMs), the per-step arbitrage loss from rebalancing is exactly the KL divergence between weight vectors — so the Fisher-Rao metric is the natural Riemannian metric on the weight simplex. The loss-minimizing trajectory is SLERP (spherical linear interpolation) in Hellinger coordinates.
Why it matters: A beautiful collision of information geometry and DeFi mechanism design. The fact that the “right” rebalancing path is a Fisher-Rao geodesic means all the machinery of information geometry (exponential families, natural parameters, divergence duality) becomes available for AMM design. Concise paper, clean math.
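The Hellinger-coordinate SLERP is simple to sketch: mapping weights w to sqrt(w) embeds the simplex on the unit sphere, where the Fisher-Rao geodesic is the great-circle arc; squaring the interpolant maps back. A minimal illustration (not the paper's code):

```python
import numpy as np

def fisher_rao_geodesic(w0, w1, t):
    """Point at time t in [0, 1] on the Fisher-Rao geodesic between
    two weight vectors on the simplex: SLERP between sqrt(w) points
    on the unit sphere (Hellinger embedding), squared back."""
    a, b = np.sqrt(w0), np.sqrt(w1)
    theta = np.arccos(np.clip(a @ b, -1.0, 1.0))
    if theta < 1e-12:
        return np.asarray(w0, dtype=float)
    s = (np.sin((1 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)
    return s ** 2   # squaring a unit vector's entries lands back on the simplex

w0 = np.array([0.7, 0.2, 0.1])
w1 = np.array([0.2, 0.5, 0.3])
mid = fisher_rao_geodesic(w0, w1, 0.5)   # geodesic midpoint, sums to 1
```

Because SLERP keeps the interpolant on the unit sphere, every intermediate point squares back to a valid weight vector — no projection step needed.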