Model the probability of each subsequent element given the previous observations; graphically, this can be interpreted as a fully connected DAG. By the chain rule of probability:

$$p(x_{1:T}) = \prod_{t=1}^{T} p(x_t \mid x_{1:t-1})$$

The equation above can be relaxed in several ways to deal with the intractability of conditioning on the entire history:

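  • Markov assumption: each element depends only on its immediate predecessor, $p(x_t \mid x_{1:t-1}) = p(x_t \mid x_{t-1})$
  • N-gram model: each element depends only on the previous $n-1$ elements, $p(x_t \mid x_{1:t-1}) = p(x_t \mid x_{t-n+1:t-1})$
  • Hidden state: compress the past into a hidden state $h_t$, so that $x_t$ depends on the past only through $h_t$
    • when $h_t$ is a deterministic function of past states, the resulting model is an RNN
    • when $h_t$ is a stochastic function of past states, the resulting model is a hidden Markov model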
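
To make the n-gram relaxation concrete, here is a minimal sketch that estimates the conditionals from counts and scores a sequence as a product of truncated conditionals. The function names `fit_ngram` and `log_prob`, the `"<s>"` padding token, and the use of unsmoothed maximum-likelihood counts are all assumptions made for illustration, not part of the notes above.

```python
from collections import Counter
import math


def fit_ngram(tokens, n):
    """Count n-grams and their (n-1)-token contexts (hypothetical helper)."""
    # Pad the start so the first tokens also have a full-length context.
    padded = ["<s>"] * (n - 1) + list(tokens)
    ngrams = Counter(tuple(padded[i:i + n]) for i in range(len(padded) - n + 1))
    contexts = Counter(tuple(padded[i:i + n - 1]) for i in range(len(padded) - n + 1))
    return ngrams, contexts


def log_prob(tokens, ngrams, contexts, n):
    """Sum of log p(x_t | previous n-1 tokens), using unsmoothed count ratios."""
    padded = ["<s>"] * (n - 1) + list(tokens)
    total = 0.0
    for i in range(n - 1, len(padded)):
        context = tuple(padded[i - n + 1:i])
        gram = context + (padded[i],)
        if ngrams[gram] == 0:
            # Unseen n-gram: probability zero without smoothing.
            return float("-inf")
        total += math.log(ngrams[gram] / contexts[context])
    return total


corpus = "a b a b a c".split()
ngrams, contexts = fit_ngram(corpus, n=2)  # n=2 is the Markov assumption
print(math.exp(log_prob(["a", "b"], ngrams, contexts, n=2)))  # about 0.667 = 1/1 * 2/3
```

With `n=2` this reduces to the Markov assumption; larger `n` conditions on more history, at the cost of many more contexts to count. A practical model would add smoothing so that unseen n-grams do not get probability zero.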