
Daily Feed - 2026-02-20


3 paper picks + 2 video picks (same bundle for Telegram/email).

Author-talk check: I searched YouTube with exact paper titles for today’s paper picks and did not find clear author/conference talks yet, so I included two high-signal topic-adjacent lectures.


SMAC: Score-Matched Actor-Critics for Robust Offline-to-Online Transfer

Domain: RL Theory / Algorithms | Time cost: ~20min abstract+method skim, ~70min full read

Intuition: Offline-to-online transfer often fails because gradient updates must cross low-reward valleys between the offline optimum and better online solutions. SMAC reshapes the offline objective so actor and critic gradients are locally aligned before online fine-tuning begins.

Concrete punch: The key regularization enforces a first-order compatibility between the policy score and the action-gradient of the Q-function,

$$\nabla_a \log \pi_\theta(a \mid s) \;\approx\; \tfrac{1}{\alpha}\, \nabla_a Q_\phi(s, a),$$

implemented via a penalty like

$$\mathcal{L}_{\text{compat}} = \mathbb{E}_{(s,a) \sim \mathcal{D}}\Big[\big\| \alpha\, \nabla_a \log \pi_\theta(a \mid s) - \nabla_a Q_\phi(s, a) \big\|^2\Big].$$

The paper reports smooth offline-to-online transfer with both Soft Actor-Critic (SAC) and Twin Delayed DDPG (TD3) backbones in 6/6 D4RL tasks, with a 34–58% regret reduction in 4/6 settings.

Significance: This gives a mechanistic criterion (gradient-field compatibility) for whether offline pretraining is likely to survive online adaptation.

Why it matches: Strong mechanism-first RL theory, explicit geometric/optimization structure, and practical transfer consequences beyond benchmark-only framing.
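The compatibility penalty above can be made concrete with a toy example. This is a hedged sketch, not the paper's code: all names (`q_grad`, `policy_score`, `compat_penalty`, `alpha`) are illustrative, and the quadratic critic and Gaussian policy are stand-ins chosen so both gradient fields are analytic.

```python
# Toy sketch of a gradient-compatibility penalty between a policy score
# and a critic's action-gradient (illustrative, not the paper's code).
import numpy as np

def q_grad(s, a):
    # Toy quadratic critic Q(s, a) = -(a - s)^2, so grad_a Q = -2 (a - s).
    return -2.0 * (a - s)

def policy_score(a, mu, sigma):
    # Score of a Gaussian policy pi(a|s) = N(mu, sigma^2):
    # grad_a log pi(a|s) = -(a - mu) / sigma^2.
    return -(a - mu) / sigma ** 2

def compat_penalty(s, a, mu, sigma, alpha):
    # Mean squared gap between the scaled policy score and the critic's
    # action-gradient, averaged over a batch of (s, a) pairs.
    gap = alpha * policy_score(a, mu, sigma) - q_grad(s, a)
    return float(np.mean(gap ** 2))

states = np.zeros(4)
actions = np.array([0.5, -0.5, 1.0, -1.0])
# With mu = s and alpha = 2 * sigma^2, the two gradient fields coincide
# exactly for this toy critic, so the penalty vanishes.
aligned = compat_penalty(states, actions, mu=0.0, sigma=1.0, alpha=2.0)
misaligned = compat_penalty(states, actions, mu=0.0, sigma=1.0, alpha=1.0)
```

The zero-penalty case mirrors the mechanism claimed in the paper: when the scaled policy score matches the critic's action-gradient, actor and critic updates point the same way locally.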


One-step Language Modeling via Continuous Denoising

Domain: ML / Generative Modeling | Time cost: ~20min abstract+figures, ~75min full read

Intuition: The paper challenges the assumption that discrete diffusion is necessary for text. It builds a flow-based language model that denoises continuous one-hot embeddings, then distills the flow into a few-step (even one-step) generator.

Concrete punch: Training is framed as clean-token prediction from noisy states with cross-entropy,

$$\mathcal{L}(\theta) = \mathbb{E}_{t,\, x_0,\, x_t}\big[-\log p_\theta(x_0 \mid x_t, t)\big],$$

where $x_t$ is a continuous-noised version of the one-hot token encodings under a time reparameterization. Distillation then learns a flow map enabling one-step generation that reportedly exceeds prior 8-step quality on LM1B/OWT.

Significance: If this scaling behavior holds, it changes the speed/quality frontier for non-autoregressive text generation and narrows the practical gap to autoregressive systems.

Why it matches: Directly on your VAE↔diffusion↔flow unification thread, with concrete algorithmic novelty and explicit challenge to a prevailing assumption.
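The clean-token objective above reduces to a standard cross-entropy once the noising is written out. The following is a toy sketch under assumed details (linear interpolation to Gaussian noise, a single linear layer as the "denoiser"); none of it is the paper's actual parameterization.

```python
# Toy sketch: clean-token cross-entropy on continuously noised one-hot
# embeddings (illustrative stand-ins, not the paper's model).
import numpy as np

rng = np.random.default_rng(0)
V = 5  # toy vocabulary size

def noise_tokens(tokens, t):
    # Interpolate one-hot encodings toward Gaussian noise at level t in [0, 1].
    one_hot = np.eye(V)[tokens]
    eps = rng.standard_normal(one_hot.shape)
    return (1.0 - t) * one_hot + t * eps

def clean_token_xent(tokens, x_t, W):
    # Stand-in "denoiser": a single linear map from noisy embeddings to logits,
    # scored with cross-entropy against the clean tokens.
    logits = x_t @ W
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(tokens)), tokens].mean())

tokens = np.array([0, 2, 4])
x_t = noise_tokens(tokens, t=0.3)
loss = clean_token_xent(tokens, x_t, W=np.eye(V))
```

The point of the sketch is that the training signal is ordinary classification of the clean token from a corrupted continuous state; the flow/distillation machinery sits on top of this loss.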


Autodeleveraging as Online Learning

Domain: Blockchain / Quant Finance / Market Microstructure | Time cost: ~18min abstract+setup, ~65min full read

Intuition: Auto-deleveraging (ADL) on perpetual venues is usually treated as exchange ops policy; this paper formalizes it as a sequential online learning/control problem over positive-profit haircuts and solvency recovery.

Concrete punch: At round $t$, choose an action $a_t = (b_t, S_t)$ (solvency budget $b_t$, selected profitable accounts $S_t$) and recover value toward the insolvency deficit. Performance is measured by regret against the best fixed action in hindsight,

$$\mathrm{Reg}_T = \sum_{t=1}^{T} \ell_t(a_t) \;-\; \min_{a^*} \sum_{t=1}^{T} \ell_t(a^*).$$

In the Hyperliquid stress-event case study, the production queue is estimated at roughly 50% of an upper regret bound, while their optimized algorithm achieves ~2.6% of that bound (a large reduction in overshoot).

Significance: This turns ADL from ad hoc “risk ops” into auditable mechanism design with measurable worst-case guarantees.

Why it matches: High signal for your market microstructure + control-theory interests, with concrete objective design and policy-level implications.
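The external-regret benchmark used above is simple to compute once per-round losses are defined. Here is a minimal sketch with toy numbers and an assumed squared-overshoot loss; the paper's actual loss and action space are richer.

```python
# Illustrative external-regret accounting for a sequential ADL policy.
# Toy loss: squared gap between recovered value and the round's deficit
# (an assumption for illustration, not the paper's objective).
def round_loss(recovered, deficit):
    return (recovered - deficit) ** 2

def regret(policy_recoveries, deficits, candidate_fixed_actions):
    # Cumulative loss of the played policy minus the best fixed action
    # in hindsight -- the standard external-regret benchmark.
    played = sum(round_loss(r, d) for r, d in zip(policy_recoveries, deficits))
    best_fixed = min(
        sum(round_loss(a, d) for d in deficits) for a in candidate_fixed_actions
    )
    return played - best_fixed

deficits = [1.0, 2.0, 3.0]        # per-round solvency deficits
played = [1.5, 1.5, 1.5]          # a crude constant-recovery policy
grid = [0.0, 1.0, 2.0, 3.0]       # candidate fixed recovery levels
reg = regret(played, deficits, grid)
```

Framing ADL this way is what makes the "~50% vs ~2.6% of the bound" comparison meaningful: both the production queue and the optimized algorithm are scored against the same hindsight benchmark.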


Stanford CS224R Deep Reinforcement Learning | Spring 2025 | Lecture 5: Off-Policy Actor Critic

Domain: RL (Video) | Time cost: 1h 9m

Intuition: A clean bridge from policy-gradient basics to modern off-policy actor-critic machinery (experience replay, bootstrapping, stability tricks), which is exactly the substrate SMAC-type methods modify.

Concrete punch: The canonical Soft Actor-Critic objective appears in entropy-regularized form,

$$J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\big[r(s_t, a_t) + \alpha\, \mathcal{H}(\pi(\cdot \mid s_t))\big],$$

with soft Bellman backup

$$\mathcal{T}^{\pi} Q(s, a) = r(s, a) + \gamma\, \mathbb{E}_{s'}\Big[\mathbb{E}_{a' \sim \pi}\big[Q(s', a') - \alpha \log \pi(a' \mid s')\big]\Big].$$
Significance: Useful for debugging the exact place where offline-to-online transfer can break (critic landscape vs policy update geometry).

Why it matches: High-production lecture, first-principles derivations, and immediate transfer value to today’s SMAC pick.
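The soft Bellman backup from the lecture is a one-liner for a discrete-action toy MDP. This sketch assumes a known next-state policy distribution and tabular Q-values, purely for illustration.

```python
# Minimal sketch of the entropy-regularized (SAC-style) Bellman backup:
# T Q(s, a) = r + gamma * E_{a'}[ Q(s', a') - alpha * log pi(a'|s') ].
import numpy as np

def soft_backup(r, gamma, alpha, next_q, next_log_pi, pi_probs):
    # One soft backup for a discrete-action toy MDP; next_q, next_log_pi,
    # and pi_probs are arrays over the next state's actions.
    soft_v = np.sum(pi_probs * (next_q - alpha * next_log_pi))
    return r + gamma * soft_v

pi = np.array([0.5, 0.5])          # uniform next-state policy
target = soft_backup(
    r=1.0, gamma=0.9, alpha=0.2,
    next_q=np.array([2.0, 2.0]),
    next_log_pi=np.log(pi),
    pi_probs=pi,
)
```

Note the entropy bonus inflates the target relative to the plain backup (here by `gamma * alpha * log 2`) — exactly the term whose scale offline-to-online methods must keep compatible between critic and policy.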


Advancing Diffusion Models for Text Generation

Domain: ML / Generative Modeling (Video) | Time cost: 1h 1m

Intuition: Research talk focused on pushing text diffusion quality while preserving the parallel-generation upside. It complements today’s continuous-denoising paper by emphasizing practical bottlenecks and algorithmic improvements.

Concrete punch: The core denoising factorization uses a time-indexed objective over partially corrupted sequences,

$$\mathcal{L}(\theta) = \mathbb{E}_{t,\, x_0,\, x_t}\Big[\sum_{i \in M_t} -\log p_\theta\big(x_0^{\,i} \mid x_t\big)\Big],$$

where $M_t$ is the set of positions corrupted at time $t$, then studies scheduler/parameterization choices that improve the quality-speed Pareto frontier in few-step decoding.

Significance: Gives a concrete map of where diffusion text models still lose to autoregressive language models and which interventions actually move the boundary.

Why it matches: Directly aligned with your deep generative modeling theory focus; mathematically grounded and implementation-relevant.
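The masked-position objective above is easy to sketch: cross-entropy is charged only where the sequence was corrupted. Everything below (random logits standing in for a model, a hand-picked corruption mask) is an illustrative assumption, not the talk's implementation.

```python
# Toy sketch of a time-indexed denoising loss for text diffusion:
# cross-entropy over corrupted positions only (illustrative stand-ins).
import numpy as np

def masked_xent(logits, targets, corrupted):
    # Average -log p(x0^i | x_t) over corrupted positions i only.
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    token_nll = -log_probs[np.arange(len(targets)), targets]
    return float(token_nll[corrupted].mean())

V, L = 4, 6  # toy vocabulary and sequence length
rng = np.random.default_rng(1)
logits = rng.standard_normal((L, V))          # stand-in model outputs
targets = rng.integers(0, V, size=L)          # clean tokens x0
corrupted = np.array([True, False, True, True, False, True])
loss = masked_xent(logits, targets, corrupted)
```

Restricting the loss to corrupted positions is what lets the scheduler (how many positions are corrupted at each $t$) trade quality against the number of decoding steps.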


Source-discovery note

  • ArXiv: scanned recent (6–12 month eligible, prioritizing newest) candidates in offline-to-online RL, diffusion/flow language modeling, and market-microstructure control.
  • YouTube: searched exact paper-title talks first; no clear author/conference videos found yet for these new papers, so selected topic-adjacent high-signal lectures.
  • Hacker News / Lobsters: scanned recent results; signal was mostly tool/showcase noise, so none cleared today’s mechanism-first + concrete-punch bar.
