What Is a Diffusion Model?

2026-04-26T09:00:00+00:00

One-Sentence Summary

A diffusion model is a generative model that learns how to create data by reversing a gradual noising process.

Why It Matters

Diffusion models are widely used for image, audio, video, and 3D generation because they produce high-quality samples and train with a relatively stable objective.

Core Ideas

Forward process: gradually add noise to clean data.
Reverse process: train a model to remove noise step by step.
Conditioning: guide generation with text, images, labels, or other signals.
Sampling: start from noise and repeatedly denoise until a sample appears.
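
The forward process above can be sketched in a few lines. This is a minimal illustration, assuming the DDPM-style formulation with a linear variance schedule; the function names, schedule endpoints, and the three-number "data point" are hypothetical choices made for this example, not part of any particular library.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linearly increasing beta schedule."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Jump straight to step t: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    a_bar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e for x, e in zip(x0, eps)]
    return x_t, eps


rng = random.Random(0)
alpha_bars = linear_alpha_bars(1000)
x0 = [1.0, -1.0, 0.5]  # stand-in for a clean data point (an assumption for this demo)
for t in (0, 499, 999):
    x_t, _ = add_noise(x0, t, alpha_bars, rng)
    print(t, round(alpha_bars[t], 5), [round(v, 3) for v in x_t])
```

Early steps leave the data almost untouched, while the last step is close to pure noise. In noise-prediction training, the returned eps is the regression target the model learns to predict from x_t and t.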

Placeholder Example

For image generation, the model starts with random noise and gradually turns it into a coherent image according to the prompt or condition.
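
The same noise-to-sample loop can be tried on a toy problem. The sketch below assumes the data is a one-dimensional Gaussian (mean 3.0, standard deviation 0.5), so the ideal noise prediction has a closed form and no training is needed; `oracle_noise` is a hypothetical stand-in for a trained network, which in a real system would also receive the prompt or other condition. The update rule is the DDPM-style ancestral sampling step, one common choice among several.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-3, beta_end=0.1):
    """Linear betas and their cumulative alpha-bar products (endpoints are assumptions)."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1) for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def oracle_noise(x_t, alpha_bar, data_mean, data_std):
    """Expected noise given x_t when the clean data is N(data_mean, data_std^2)."""
    var_t = alpha_bar * data_std ** 2 + (1.0 - alpha_bar)
    return math.sqrt(1.0 - alpha_bar) * (x_t - math.sqrt(alpha_bar) * data_mean) / var_t


def sample(num_steps=100, data_mean=3.0, data_std=0.5, rng=None):
    """Start from pure noise and denoise step by step."""
    rng = rng or random.Random()
    betas, alpha_bars = make_schedule(num_steps)
    x = rng.gauss(0.0, 1.0)
    for t in range(num_steps - 1, -1, -1):
        eps_hat = oracle_noise(x, alpha_bars[t], data_mean, data_std)
        # Remove the predicted noise, then rescale (posterior mean of the previous step).
        mean = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps_hat) / math.sqrt(1.0 - betas[t])
        # Add fresh noise at every step except the last one.
        noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0
        x = mean + math.sqrt(betas[t]) * noise
    return x


rng = random.Random(0)
samples = [sample(rng=rng) for _ in range(500)]
print("sample mean:", round(sum(samples) / len(samples), 3))
```

If the loop works as intended, the samples should cluster around the data mean even though every run starts from noise centered at zero.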

Notes to Expand Later

Add a simple noise-to-image diagram.
Explain the difference between DDPM and latent diffusion.
Add a short section on why denoising is easier than direct generation.

What Is a Transformer?

2026-04-26T08:00:00+00:00

One-Sentence Summary

A Transformer is a neural network architecture that uses attention to decide which parts of the input are most relevant to each other.

Why It Matters

Before Transformers, many sequence models, such as recurrent neural networks (RNNs), processed text one token at a time. Transformers compare all tokens in a sequence at once, which makes long-range relationships easier to learn.

Core Ideas

Tokenization: split text or other data into small units.
Embedding: turn each token into a vector.
Self-attention: let each token look at other tokens and decide what matters.
Feed-forward layers: transform the attended information into richer features.
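
The first three ideas can be sketched together. This is a minimal illustration under stated assumptions: tokenization is a plain whitespace split, the two-dimensional embedding table is made up by hand, and the attention uses a single head with no learned projections (queries, keys, and values are all the raw embeddings). A real Transformer learns the embeddings and separate projection matrices.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention with queries = keys = values = embeddings."""
    dim = len(embeddings[0])
    outputs, all_weights = [], []
    for query in embeddings:
        # Score this token against every token in the sequence, itself included.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in embeddings]
        weights = softmax(scores)
        # Each output is a weighted average of all token vectors.
        mixed = [sum(w * value[j] for w, value in zip(weights, embeddings)) for j in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Tokenization: whitespace split (an assumption; real tokenizers use subword units).
tokens = "the robot picked up the cup".split()

# Embedding: a hypothetical hand-written lookup table.
table = {
    "the": [0.1, 0.0],
    "robot": [1.0, 0.2],
    "picked": [0.0, 1.0],
    "up": [0.2, 0.8],
    "cup": [0.9, 0.4],
}
embeddings = [table[token] for token in tokens]

outputs, weights = self_attention(embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each printed row shows how much one token attends to every token in the sentence. Because this sketch carries no position information, the two occurrences of "the" receive identical outputs.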

Placeholder Example

In the sentence “the robot picked up the cup because it was light,” attention helps the model connect “it” with “the cup,” the object most likely being described as light.

Notes to Expand Later

Add diagrams for query, key, and value.
Explain positional encoding.
Compare encoder-only, decoder-only, and encoder-decoder Transformers.

Prominent Johnson’s Blog
