We use the marginal probability to add noise in a single step:

$$p(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ x_0,\ \sigma_t^2 I\right), \qquad x_t = x_0 + \sigma_t\,\epsilon, \quad \epsilon \sim \mathcal{N}(0, I),$$

which corresponds to the discrete form of the forward process

$$x_i = x_{i-1} + \sqrt{\sigma_i^2 - \sigma_{i-1}^2}\; z_{i-1}, \qquad z_{i-1} \sim \mathcal{N}(0, I).$$
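A minimal sketch of this one-step noising, assuming a `sigmas` tensor holding the noise schedule $\sigma_1 < \cdots < \sigma_N$ (the names here are illustrative, not from the original):

```python
import torch

def perturb(x0, sigmas, t):
    """Sample x_t ~ N(x_0, sigma_t^2 I) in a single step."""
    sigma_t = sigmas[t].view(-1, 1, 1, 1)  # broadcast over (B, C, H, W)
    noise = torch.randn_like(x0)           # epsilon ~ N(0, I)
    return x0 + sigma_t * noise, noise     # keep the noise: it defines the training target
```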
Since the marginal is Gaussian, its score has a closed form:

$$\nabla_{x_t} \log p(x_t \mid x_0) = -\frac{x_t - x_0}{\sigma_t^2} = -\frac{\epsilon}{\sigma_t}.$$

The only unknown variable in the reverse process is the score $\nabla_x \log p(x)$, so the model's output should be an estimate of it, i.e. $s_\theta(x_t, t) \approx -\frac{\epsilon}{\sigma_t}$.
In short: the marginal probability is used for adding noise (training), and the discrete reverse process is used for sampling:

$$x_{i-1} = x_i + \left(\sigma_i^2 - \sigma_{i-1}^2\right) s_\theta(x_i, \sigma_i) + \sqrt{\sigma_i^2 - \sigma_{i-1}^2}\; z_i, \qquad z_i \sim \mathcal{N}(0, I).$$
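A hedged sketch of this sampling loop (the `score_model` interface and schedule handling are my assumptions, not from the original):

```python
import torch

@torch.no_grad()
def sample(score_model, sigmas, shape):
    """Run the discrete reverse process from pure noise down to data."""
    x = torch.randn(shape) * sigmas[-1]              # start at the largest noise scale
    for i in range(len(sigmas) - 1, 0, -1):
        step = sigmas[i] ** 2 - sigmas[i - 1] ** 2   # sigma_i^2 - sigma_{i-1}^2
        score = score_model(x, sigmas[i])            # estimate of grad_x log p(x_i)
        x = x + step * score + step.sqrt() * torch.randn_like(x)
    return x.clamp(-1.0, 1.0)                        # clip the output to the data range
```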
- In sampling, remember to clip the output to the valid data range (as in the `clamp` call in the sketch above).
- In the loss function, the model's output should be $-\frac{\epsilon}{\sigma_t}$. With the usual $\sigma_t^2$ loss weighting, the target for `pred` becomes $-\epsilon$, so we use

```python
# sum over the C, H, W dims; versus a per-element mean, this only rescales the learning rate
train_loss = torch.mean(torch.sum((pred + noise) ** 2, dim=[1, 2, 3]))
```

instead of the standard MSE loss. They are actually the same except for a constant scaling factor, which only changes the effective learning rate (see the check below).
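To make the learning-rate-scaling point concrete, here is a quick numerical check (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

pred = torch.randn(8, 3, 32, 32)   # stand-in for the model output
noise = torch.randn(8, 3, 32, 32)  # stand-in for epsilon

sum_loss = torch.mean(torch.sum((pred + noise) ** 2, dim=[1, 2, 3]))
mse_loss = F.mse_loss(pred, -noise)

# the sum runs over C*H*W = 3*32*32 elements, so the two losses differ by exactly
# that factor, which the optimizer absorbs into the learning rate
assert torch.allclose(sum_loss, mse_loss * 3 * 32 * 32)
```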