Generative Modelling

A generative model is defined as a stochastic function $G: (z, y) \mapsto x$ (in contrast to a discriminative model such as a classifier, $F: x \mapsto y$), where $z$ is the latent variable and $y$ is the label/description. The output of the generator can be varied by randomising the latent variable.

The latent variable $z$ is a multi-dimensional vector that describes the features of the output that are not specified by the label given to the model. In an image generation model, these could be the setting of the environment, colour depth, pose, or background.

Process:

  • Learner: Given a dataset $\{x_i\}_{i=1}^{N}$ (the kind of data a discriminative model would take as input), the learner is fed this data and outputs a generator function $G$.
  • Generator: Sampling the generator with a randomised vector $z$ produces an output $x = G(z)$.
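
The learner/generator split can be sketched with a deliberately simple toy model. This is an illustration under assumptions, not a method from these notes: the learner here just fits a Gaussian to the data, and the names `learner`, `G`, and the synthetic dataset are hypothetical.

```python
import numpy as np

def learner(data):
    """Toy learner (assumption): fit a Gaussian and return a generator G."""
    mu = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    # Cholesky factor A with A @ A.T == cov, so mu + A z ~ N(mu, cov)
    # when z ~ N(0, I). A small jitter keeps the factorisation stable.
    A = np.linalg.cholesky(cov + 1e-9 * np.eye(data.shape[1]))

    def G(z):
        # Map each latent row z_i to x_i = mu + A z_i.
        return mu + z @ A.T

    return G

rng = np.random.default_rng(0)

# Synthetic stand-in for the real dataset {x_i}.
true_mean = np.array([2.0, -1.0])
true_cov = np.array([[1.0, 0.6], [0.6, 2.0]])
data = rng.multivariate_normal(true_mean, true_cov, size=5000)

G = learner(data)                      # learner: data -> generator
z = rng.standard_normal((5000, 2))     # randomised latent vectors
samples = G(z)                         # generator: z -> x
```

Each call to `G` with a fresh `z` yields a different output, which is the sense in which the latent variable controls variation.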

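Unconditional vs conditional generative models: An unconditional model generates from the latent variable alone, $x = G(z)$. A conditional model is additionally conditioned on inputs or some other variables $c$, taking the form $x = G(z, c)$.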
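
A minimal sketch of a conditional generator, assuming a toy class-conditional Gaussian where the conditioning variable `c` is an integer class label. The function `fit_conditional` and the two-cluster dataset are hypothetical and only meant to show the $G(z, c)$ signature.

```python
import numpy as np

def fit_conditional(data, labels):
    """Toy conditional learner (assumption): one diagonal Gaussian per class."""
    classes = np.unique(labels)
    mu = {int(c): data[labels == c].mean(axis=0) for c in classes}
    sigma = {int(c): data[labels == c].std(axis=0) for c in classes}

    def G(z, c):
        # The same latent z gives a different output for each condition c.
        return mu[c] + sigma[c] * z

    return G

rng = np.random.default_rng(0)

# Synthetic two-class data: class 0 centred at (-3, 0), class 1 at (3, 1).
labels = rng.integers(0, 2, size=4000)
centres = np.array([[-3.0, 0.0], [3.0, 1.0]])
data = centres[labels] + rng.standard_normal((4000, 2))

G = fit_conditional(data, labels)
z = rng.standard_normal((1000, 2))
x0 = G(z, 0)   # samples conditioned on class 0
x1 = G(z, 1)   # same latents, conditioned on class 1
```

Dropping the second argument and fitting a single Gaussian to all of `data` would give the unconditional counterpart.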

Objective: How to measure the quality of the output by a generative model?

  • Output synthetic data that matches the original data on certain marginal statistics, for example the same mean colour or the same colour variance as real photos.

    How should such statistics be chosen for other modalities like text or molecules? The number of words or sentences?

  • Output synthetic data that has high probability under a density model fit to the real data, i.e. $p_{\text{data}}(G(z))$ is high on average over $z$, where $p_{\text{data}}$ is the true process that produces the original data.
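
Both objectives can be sketched as simple evaluation functions. This is a toy illustration under assumptions: each sample is a 3-vector standing in for a photo's mean RGB colour, the density model is a diagonal Gaussian fit to the real data (a stand-in for the unknown true process), and the names `marginal_stats_gap` and `avg_log_density` are hypothetical.

```python
import numpy as np

def marginal_stats_gap(real, fake):
    """Largest per-channel gap in mean and in variance between two datasets."""
    mean_gap = np.abs(real.mean(axis=0) - fake.mean(axis=0)).max()
    var_gap = np.abs(real.var(axis=0) - fake.var(axis=0)).max()
    return mean_gap, var_gap

def avg_log_density(real, fake):
    """Average log density of fake samples under a diagonal Gaussian fit to real."""
    mu = real.mean(axis=0)
    var = real.var(axis=0)
    log_p = -0.5 * (np.log(2 * np.pi * var) + (fake - mu) ** 2 / var)
    return log_p.sum(axis=1).mean()

rng = np.random.default_rng(0)

# Toy "mean colour" vectors: real data, a well-matched generator's output,
# and the output of a generator with the wrong colour balance.
real = rng.normal(loc=[0.5, 0.4, 0.3], scale=0.1, size=(2000, 3))
good = rng.normal(loc=[0.5, 0.4, 0.3], scale=0.1, size=(2000, 3))
bad = rng.normal(loc=[0.8, 0.1, 0.3], scale=0.1, size=(2000, 3))

good_gap = marginal_stats_gap(real, good)
bad_gap = marginal_stats_gap(real, bad)
good_ll = avg_log_density(real, good)
bad_ll = avg_log_density(real, bad)
```

Under both measures the well-matched samples should look better: a smaller statistics gap and a higher average log density.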

Approaches: How to form data generators?

  • Direct approach: learn the generator function directly. GANs and diffusion models are based on the direct approach.
  • Indirect approach: learn a score function and generate samples that score highly under this function. Density models and energy-based models follow the indirect approach.
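
A minimal sketch of the indirect approach, assuming the scoring function has already been learned. Here it is a hand-written stand-in: the negative energy of a Gaussian centred at a hypothetical `MU`. Sampling uses unadjusted Langevin dynamics, one common choice for turning a scoring function into samples; it is swapped in for illustration and is not prescribed by these notes.

```python
import numpy as np

MU = np.array([1.0, -2.0])  # hypothetical mode of the "learned" function

def score(x):
    """Stand-in scoring function: unnormalised log density -E(x)."""
    return -0.5 * np.sum((x - MU) ** 2, axis=-1)

def grad_score(x):
    """Gradient of score(x) with respect to x."""
    return MU - x

def langevin_sample(n, steps=500, step_size=0.05, seed=0):
    """Follow the gradient of the score with injected noise."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, 2)) * 3.0  # arbitrary broad initialisation
    for _ in range(steps):
        noise = rng.standard_normal(x.shape)
        x = x + 0.5 * step_size * grad_score(x) + np.sqrt(step_size) * noise
    return x

# Same broad initial distribution, kept for comparison.
start = np.random.default_rng(0).standard_normal((2000, 2)) * 3.0
samples = langevin_sample(2000)

start_score = score(start).mean()
final_score = score(samples).mean()
```

The final samples concentrate where the scoring function is high, so `final_score` should exceed `start_score`. A direct approach would instead return a function like `G(z)` that produces samples in a single pass, with no iterative search.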

ebm

autoregressive

gan

vae

vdm

score-matching

flow-matching

discrete-diffusion