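Model the probability of each element conditioned on the previous observations. Graphically, this can be interpreted as a fully connected DAG, in which every earlier element is a parent of every later one. By the chain rule of probability:

$$p(x_{1:T}) = \prod_{t=1}^{T} p(x_t \mid x_{1:t-1})$$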
This factorization can be relaxed in various ways to address the intractability of conditioning on the full history:
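- Markov assumption: each element depends only on its immediate predecessor, $p(x_t \mid x_{1:t-1}) = p(x_t \mid x_{t-1})$
- N-gram model: each element depends only on the previous $n-1$ elements, $p(x_t \mid x_{1:t-1}) = p(x_t \mid x_{t-n+1:t-1})$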
- Hidden state: compress the past into a hidden state $h_t$
  - when $h_t$ is a deterministic function of the past states, the resulting model is an RNN
  - when $h_t$ is a stochastic function of the past states, the resulting model is a hidden Markov model
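
As a rough illustration of the N-gram relaxation above, the sketch below estimates the conditionals from raw counts and scores a sequence by summing log-conditionals, as in the chain rule. The names `fit_ngram` and `log_prob`, the `<s>` padding symbol, and the lack of smoothing are all assumptions made for this example, not part of the notes.

```python
import math
from collections import Counter, defaultdict


def fit_ngram(sequences, n=2):
    """Count how often each symbol follows each context of n-1 symbols."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pad the start so the first symbols also have a full context.
        padded = ["<s>"] * (n - 1) + list(seq)
        for t in range(n - 1, len(padded)):
            context = tuple(padded[t - n + 1:t])
            counts[context][padded[t]] += 1
    return counts


def log_prob(seq, counts, n=2):
    """Sum of log p(x_t | previous n-1 symbols), using unsmoothed counts."""
    padded = ["<s>"] * (n - 1) + list(seq)
    total = 0.0
    for t in range(n - 1, len(padded)):
        context = tuple(padded[t - n + 1:t])
        ctx_counts = counts.get(context)
        # Unseen context or unseen continuation: probability zero (no smoothing).
        if not ctx_counts or ctx_counts[padded[t]] == 0:
            return float("-inf")
        total += math.log(ctx_counts[padded[t]] / sum(ctx_counts.values()))
    return total


# Toy corpus (hypothetical data, for illustration only).
model = fit_ngram(["abab", "abb"], n=2)
print(math.exp(log_prob("aba", model, n=2)))
```

With `n=2` this is the first-order Markov case: each factor looks only at the immediately preceding symbol. Larger `n` widens the context at the cost of sparser counts.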