Let us assume that there are two types of people: smart people and dumb people. Smart people’s test scores are normally distributed about 80% and dumb people’s test scores are normally distributed about 40%. If we observe the test score of one person, how do we know if they are smart or dumb? If we see a score of 85%, we are pretty sure they are smart. A dumb person might have had a good day, but this would be a low-probability event. Similarly, if we saw a score of 35%, we would be fairly certain that the person is dumb, even though there is a small probability that a smart person may have had a bad day. If we see a score of 62%, however, then it is very difficult to distinguish whether the person is smart or dumb. But how can we quantify the probability that a person is of a certain type?

One way of doing this is with finite mixture models. Jim Hamilton’s *Time Series Analysis* book has a good explanation of this topic, and I will review this material here.

Each type (e.g., how smart the person is) will be designated as *s_{t}=1, 2,…, or N*. Let us assume that there is an observed variable *y_{t}* (e.g., the test score) which is distributed N(μ_{j}, σ_{j}^{2}) when *s_{t}=j*. What researchers want to know is: given that we observe *y_{t}*, what is the probability that the observation is from a person of type *s_{t}=j*? Let us assume that we know the density of *y_{t}* is:

- f(y_{t}|s_{t}=j; **θ**)=(2πσ_{j}^{2})^{-1/2}* exp{-(y_{t}– μ_{j})^{2}/2σ_{j}^{2}}

There is also some underlying distribution of types.

- P(s_{t}=j; **θ**)=λ_{j}
- **θ**=(μ_{1},…,μ_{N}, σ_{1},…,σ_{N}, λ_{1},…,λ_{N})

From Bayes Rule, we know that:

- P(A and B)=P(A|B)*P(B), which implies
- f(y_{t}, s_{t}=j; **θ**)=λ_{j}*(2πσ_{j}^{2})^{-1/2}* exp{-(y_{t}– μ_{j})^{2}/2σ_{j}^{2}}

The unconditional density can be found as follows:

- f(y_{t}; **θ**)=Σ_{j=1 to N} f(y_{t}, s_{t}=j; **θ**)
- f(y_{t}; **θ**)=λ_{1}*(2πσ_{1}^{2})^{-1/2}* exp{-(y_{t}– μ_{1})^{2}/2σ_{1}^{2}} +…+ λ_{N}*(2πσ_{N}^{2})^{-1/2}* exp{-(y_{t}– μ_{N})^{2}/2σ_{N}^{2}}
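The unconditional density is straightforward to evaluate numerically. Here is a minimal sketch in Python; the function name and the smart/dumb parameter values (weights 0.5/0.5, means 80/40, common standard deviation 10) are illustrative assumptions, not estimates:

```python
import math

def mixture_density(y, lam, mu, sigma):
    """Unconditional density f(y; theta) = sum_j lam_j * N(y; mu_j, sigma_j^2)."""
    return sum(l * (2 * math.pi * s**2) ** -0.5 * math.exp(-(y - m)**2 / (2 * s**2))
               for l, m, s in zip(lam, mu, sigma))

# Illustrative two-type example: half smart (mean 80), half dumb (mean 40), sd 10.
f62 = mixture_density(62, lam=[0.5, 0.5], mu=[80.0, 40.0], sigma=[10.0, 10.0])
```

With N=1 the mixture collapses to an ordinary normal density, which is a handy sanity check.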

Now we can use maximum likelihood estimation techniques to find the **θ** which will maximize:

- max_{**θ**} *L*(**θ**)=Σ_{t=1 to T} log f(y_{t}; **θ**)
- s.t.: λ_{1}+ λ_{2}+…+ λ_{N}=1
- s.t.: λ_{j}≥0
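In practice this constrained maximization is usually carried out with the EM algorithm rather than direct numerical optimization, since for Gaussian mixtures both steps have closed forms and the λ constraints hold by construction. A minimal sketch on simulated test-score data (the sample size, true mixing weight, and starting values are assumptions for illustration):

```python
import numpy as np

def em_gaussian_mixture(y, n_types=2, n_iter=200):
    """Fit lam_j, mu_j, sigma_j by EM, which climbs the mixture log-likelihood."""
    # crude but workable starting values spread across the sample
    mu = np.percentile(y, np.linspace(90, 10, n_types))
    sigma = np.full(n_types, y.std())
    lam = np.full(n_types, 1.0 / n_types)
    for _ in range(n_iter):
        # E-step: posterior P(s_t = j | y_t; theta) for every observation
        dens = lam * (2 * np.pi * sigma**2) ** -0.5 * \
            np.exp(-(y[:, None] - mu)**2 / (2 * sigma**2))
        post = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted updates; lam sums to 1 automatically
        nk = post.sum(axis=0)
        lam = nk / len(y)
        mu = (post * y[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((post * (y[:, None] - mu)**2).sum(axis=0) / nk)
    return lam, mu, sigma

# Simulated scores: 60% smart (mean 80), 40% dumb (mean 40), sd 10 (assumed values).
rng = np.random.default_rng(0)
smart = rng.random(2000) < 0.6
y = np.where(smart, rng.normal(80, 10, 2000), rng.normal(40, 10, 2000))
lam_hat, mu_hat, sigma_hat = em_gaussian_mixture(y)
```

With well-separated components like these, the estimated means land close to 80 and 40 and the estimated weights close to 0.6/0.4.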

Once we have the MLE estimate of **θ**, we can figure out the probability that observation *y_{t}* came from a person of type *s_{t}=j*. Using Bayes’ theorem again, we know that:

- P(s_{t}=j|y_{t}; **θ**)=f(y_{t}, s_{t}=j; **θ**)/f(y_{t}; **θ**)=λ_{j}*f(y_{t}|s_{t}=j; **θ**)/f(y_{t}; **θ**)

This value represents the probability, given the observed data, that the unobserved type responsible for observation *t* was type *j*. For example, “…if an observation *y_{t}=0*, one could be virtually certain that the observation had come from a N(0,1) distribution rather than a N(4,1) distribution, so that *P(s_{t}=1|y_{t}; **θ**)* for that date would be near unity. If instead *y_{t}* were around 2.3, it is equally likely that the observation might have come from either regime, so that *P(s_{t}=1|y_{t}; **θ**)* for such an observation would be close to 0.5.”

*Most of the above content is from:*

- James D. Hamilton (1994) Time Series Analysis, Princeton University Press, Princeton, NJ; pp. 685-689.