
Daily Feed - 2026-02-23

Schedule and sources:

  • Cron: 06:45 ET daily
  • Source window: arXiv papers from the past 6-12 months, plus videos/threads from recent high-signal channels

Training-Free Adaptation of Diffusion Models via Doob’s h-Transform

Domain: ML / Diffusion | Time cost: 18 min read

Intuition: This paper reframes diffusion adaptation as a probability-transport problem: instead of fine-tuning a pretrained diffusion model toward a reward, it transforms the sampling dynamics so trajectories drift toward high-reward regions while the network weights stay fixed.

Concrete punch: They instantiate adaptation through a Doob-style dynamic correction to the sampling SDE (a multiplicative/additive correction in the reverse-time law, defined by a function h). The algorithm is training-free, works with non-differentiable rewards, and comes with a high-probability convergence guarantee to a target reward-biased measure.
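To fix the idea, here is the textbook form of such a correction (a sketch in standard notation, assuming a reward-tilted h; the paper's exact h and correction may differ): the h-transform adds a score-like guidance term to the reverse-time drift, with h defined as a conditional expectation of the exponentiated reward r at terminal time, tilted by a strength λ.

```latex
% Reverse-time sampling SDE with a Doob h-transform drift correction
% (standard form, not necessarily the paper's notation):
\mathrm{d}X_t = \bigl[\, b(X_t, t) + \sigma(t)^2 \,\nabla_x \log h(X_t, t) \,\bigr]\,\mathrm{d}t
  + \sigma(t)\,\mathrm{d}W_t,
\qquad
h(x, t) = \mathbb{E}\!\left[\, e^{\lambda\, r(X_T)} \,\middle|\, X_t = x \,\right].
```

Because the correction enters only through the drift at sampling time, the pretrained network that parameterizes b is untouched, which is what makes the approach training-free.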

Significance: If the claims hold up, this is useful when you need fast domain specialization (e.g., different downstream constraints) without an optimizer loop per task. The key design move is shifting effort from fitting parameters to correcting the stochastic evolution at generation time.

Why it matches: Directly aligns with the information/control lens (distribution transport + measure correction), has a crisp mathematical object (h-transform), and avoids heuristic fine-tuning pipelines.

Flow Matching with Injected Noise for Offline-to-Online Reinforcement Learning

Domain: RL / ML | Time cost: 22 min read

Intuition: Offline RL confines the policy to a conservative behavioral support; this work adds exploration control by injecting noise into flow-matching policy training and then using entropy-guided sampling, so the agent can leave the behavior dataset's support when online signals warrant it.

Concrete punch: The method couples a flow-matching policy objective with scheduled noise injection and entropy-regularized sampling, reporting strong performance under small online rollout budgets across challenging tasks in the paper's experiments.
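A minimal sketch of the two ingredients, assuming a linear-interpolant conditional flow-matching target and a simple linear annealing schedule (the function names, the toy linear model theta, and the schedule shape are all illustrative, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_scale(step, total_steps, sigma0=0.5):
    """Scheduled noise injection: linearly anneal exploration noise
    over training (illustrative schedule, not the paper's)."""
    return sigma0 * (1.0 - step / total_steps)

def flow_matching_loss(theta, states, actions, t, sigma):
    """Conditional flow-matching loss for a linear interpolant between
    base noise a0 and dataset action a1: the target velocity is a1 - a0.
    Extra noise of scale sigma is injected into the interpolant to widen
    the support the policy sees. theta is a toy linear map
    [state, a_t, t] -> velocity standing in for a neural network."""
    a0 = rng.normal(size=actions.shape)             # base noise sample
    a_t = (1 - t) * a0 + t * actions                # interpolant at time t
    a_t = a_t + sigma * rng.normal(size=a_t.shape)  # injected exploration noise
    feats = np.concatenate(
        [states, a_t, np.full((len(states), 1), t)], axis=1)
    v_pred = feats @ theta
    v_target = actions - a0
    return np.mean((v_pred - v_target) ** 2)
```

The design point is that the noise enters the training interpolant (not just action sampling), so the learned velocity field remains well-defined slightly off the behavior-data manifold.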

Significance: This gives a principled route to bridging offline stability and online adaptation, a recurring deployment bottleneck: policies that are too conservative after offline training, or adaptation that is too noisy and destroys sample efficiency.

Why it matches: It connects flow-based generative control, exploration-exploitation scheduling, and practical online data efficiency—exactly the cross-domain RL mechanics you value.

Autodeleveraging as Online Learning

Domain: Finance / Online Learning | Time cost: 16 min read

Intuition: The paper models perpetual-futures autodeleveraging as an adversarial/online decision problem over solvency budgets and liquidation choices, rather than a fixed-risk-engineering heuristic.

Concrete punch: In their calibration on Hyperliquid’s Oct 10, 2025 stress episode, a production ADL queue incurred about 50% of an upper regret bound, while an algorithm from their framework reduced that gap to ~2.6% and cut profit overshoot (estimated at roughly 3M) in counterfactual evaluations.
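To make the "regret gap" reading concrete, here is a generic multiplicative-weights (Hedge) learner over K candidate liquidation policies, with regret measured against the best fixed policy in hindsight. This is a standard online-learning primitive, not the paper's algorithm; the loss model and bound are textbook:

```python
import numpy as np

def hedge(losses, eta=0.1):
    """Multiplicative-weights (Hedge) over K candidate liquidation
    policies. losses: (T, K) array of per-round losses in [0, 1].
    Returns (algorithm's cumulative loss, regret vs. best fixed policy)."""
    T, K = losses.shape
    w = np.ones(K)                       # uniform prior over policies
    alg_loss = 0.0
    for t in range(T):
        p = w / w.sum()                  # current mixing distribution
        alg_loss += p @ losses[t]        # expected loss this round
        w *= np.exp(-eta * losses[t])    # exponential down-weighting
    best_fixed = losses.sum(axis=0).min()
    return alg_loss, alg_loss - best_fixed
```

The paper's "~50% of an upper regret bound" framing then reads as: the production queue's realized loss sits halfway to the worst-case guarantee, and a better-designed update rule closes most of that distance.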

Significance: Gives explicit decision-theoretic structure for stress-management mechanisms and a benchmarked gap interpretation: when your mechanism is close to regret-optimal, you have a principled lever for exchange robustness rather than ad-hoc tweaks.

Why it matches: Directly mixes finance microstructure + online learning + explicit quantitative guarantees—matching your preferred way of separating mechanism claims from benchmark noise.

Flow Matching: Simplifying and Generalizing Diffusion Models

Domain: ML / Diffusion | Time cost: ~1h watch

Intuition: Explanatory lecture from Yaron Lipman that treats flow matching as learning dynamics (velocity fields) instead of score-noise iteration alone, with a cleaner bridge from transport ODE/PDE intuition to implementation.

Concrete punch: The talk gives a concrete pipeline where learning a time-indexed vector field v_θ(x,t) induces a generative trajectory; objective structure is framed as minimizing mismatch between induced transport and target flow, which is a reusable lens for both generation and RL policy trajectories.
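The "vector field induces a trajectory" step can be sketched in a few lines: given any learned v(x, t), sampling is just numerical integration of the ODE dx/dt = v(x, t) from t=0 to t=1. The example below uses the straight-line interpolant, for which the exact target velocity is the constant x1 - x0 (a minimal illustration, not Lipman's implementation):

```python
import numpy as np

def integrate_flow(v, x0, n_steps=100):
    """Euler integration of dx/dt = v(x, t) from t=0 to t=1,
    turning a learned velocity field into a generative trajectory."""
    x, dt = x0.copy(), 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * v(x, k * dt)
    return x

# For the straight-line interpolant x_t = (1 - t) x0 + t x1, the target
# velocity field is the constant x1 - x0, so integration recovers x1.
x0, x1 = np.zeros(2), np.array([3.0, -1.0])
x_end = integrate_flow(lambda x, t: x1 - x0, x0)
```

Training reduces to regressing v_θ(x_t, t) onto such per-sample target velocities; the sampler above is unchanged whatever field is plugged in, which is why the lens transfers to RL policy trajectories.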

Significance: Strong refresher that makes current flow/trajectory papers much easier to read, especially when you need to compare “diffusion-like” papers by the transport object they constrain, not just benchmark tables.

Why it matches: Clear first-principles derivation style, high signal-to-noise derivational pace, and strong connection to your preferred variational/transport framing.

Optimal Transport for Machine Learning

Domain: ML / OT / Variational methods | Time cost: ~42 min watch

Intuition: Top-level OT refresher: many modern ML objectives are transport or divergence-minimization statements in disguise, and this talk makes those correspondences explicit.

Concrete punch: The key takeaway is the transport variational form: empirical matching via cost-minimizing couplings, with regularized variants showing how entropic/geometry choices change optimization landscape and computational tractability.
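The entropic-regularized variant mentioned above has an especially compact algorithmic form, Sinkhorn's alternating scaling, which is why the regularization changes computational tractability. A minimal sketch (standard algorithm; the marginals a, b and cost matrix C are whatever your matching problem supplies):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iters=500):
    """Entropic optimal transport via Sinkhorn iterations: finds the
    coupling P = diag(u) K diag(v) with K = exp(-C / eps) whose
    marginals match a (rows) and b (columns). Larger eps means more
    entropic smoothing and faster, better-conditioned convergence."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)   # scale columns to hit marginal b
        u = a / (K @ v)     # scale rows to hit marginal a
    return u[:, None] * K * v[None, :]
```

The same primal object (a cost-minimizing coupling, smoothed by entropy) is what reappears inside flow-matching couplings and regularized control objectives, which is the talk's unifying point.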

Significance: Gives reusable templates for evaluating papers like dual-formulation OT, flow-matching, and regularized control objectives in one mental frame.

Why it matches: Reinforces the cross-domain mathematics you repeatedly prioritize (duality/variational structure), and directly informs how to compare new OT-heavy model papers.

Source notes

  • For each paper above, no direct author-side talks were publicly surfaced in the immediate scan window; the selected videos are high-quality explainers that preserve pedagogical value and derivational clarity.
  • No high-signal HN/Lobsters items met the <1-week + discussion quality gate this cycle.
