The method of least squares rests on the following assumptions:
Assumption 1: Linear regression model
The regression model is linear in the parameters, i.e.,
\(Y_i \ =\ \beta_1 +\ \beta_2X_i\ +\ u_i\)
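As a minimal sketch of what "linear in the parameters" allows, the snippet below fits the two-variable model by least squares using the usual closed-form expressions for the slope and intercept. The helper name `ols_fit` and the data are hypothetical, chosen only for illustration. The second fit uses \(X^2\) as the regressor: the model is then non-linear in the variable but still linear in \(\beta_1\) and \(\beta_2\), so the same formulas apply.

```python
def ols_fit(x, y):
    """Return (b1, b2): least-squares intercept and slope for Y = b1 + b2*X + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares in deviation form.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2

# Illustrative data generated exactly from Y = 1 + 2X (no disturbance).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0 + 2.0 * xi for xi in x]
b1, b2 = ols_fit(x, y)

# Non-linear in the variable, linear in the parameters: Y = 1 + 2*X^2.
z = [xi ** 2 for xi in x]
y_sq = [1.0 + 2.0 * zi for zi in z]
c1, c2 = ols_fit(z, y_sq)

print(b1, b2)
print(c1, c2)
```

In both cases the fit recovers an intercept of 1 and a slope of 2, because the data contain no disturbance term.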
Assumption 2: \(X\) values are fixed in repeated sampling
Values taken by the regressor \(X\) are considered fixed in repeated samples. More technically, \(X\) is assumed to be non-stochastic.
Assumption 3: Zero mean value of disturbance \(u_i\)
Given the value of \(X\), the mean, or expected, value of the random disturbance term \(u_i\) is zero. Technically, the conditional mean value of \(u_i\) is zero. Symbolically, we have
\(E(u_i|X_i) = 0\)
Assumption 4: Homoscedasticity or equal variance of \(u_i\)
Given the value of \(X\), the variance of \(u_i\) is the same for all observations, i.e., the conditional variances of \(u_i\) are identical. Symbolically, we have
\(Var (u_i|X_i) = E[u_i\ -\ E(u_i|X_i)]^2\)
\(=E(u_i^2\ |X_i)\) (because of Assumption 3)
\(= \sigma^2\)
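The difference between equal and unequal variances can be illustrated with a small simulation. This is only a sketch under stated assumptions: the \(X\) values, the seed, \(\sigma = 2\), and the rule "standard deviation equals \(X\)" for the heteroscedastic case are all hypothetical choices. Since \(E(u_i|X_i)=0\), the mean of \(u_i^2\) within a group of \(X\) values estimates the conditional variance.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Fixed X values 1..10, each observed 1000 times.
x = [float(k) for k in range(1, 11) for _ in range(1000)]

# Homoscedastic disturbances: the same sigma = 2 at every X.
u_homo = [random.gauss(0.0, 2.0) for _ in x]
# Heteroscedastic disturbances: sigma equals X, so the spread grows with X.
u_hetero = [random.gauss(0.0, xi) for xi in x]

def mean_square(values):
    """Estimate E(u^2), which equals Var(u) when E(u) = 0."""
    return sum(v * v for v in values) / len(values)

def variance_ratio(x, u, cutoff=5.0):
    """Ratio of the estimated variance for X > cutoff to that for X <= cutoff."""
    low = [ui for xi, ui in zip(x, u) if xi <= cutoff]
    high = [ui for xi, ui in zip(x, u) if xi > cutoff]
    return mean_square(high) / mean_square(low)

ratio_homo = variance_ratio(x, u_homo)
ratio_hetero = variance_ratio(x, u_hetero)
print(round(ratio_homo, 2), round(ratio_hetero, 2))
```

Under homoscedasticity the ratio should be close to 1. In the heteroscedastic case it should be far above 1, since the disturbances at large \(X\) are much more dispersed.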
Assumption 5: No autocorrelation between the disturbances.
Given any two \(X\) values, \(X_i\) and \(X_j(i\neq j)\), the correlation between any two \(u_i\) and \(u_j\ (i\neq j)\) is zero. Symbolically,
\(cov(u_i,\ u_j\ |\ X_i,\ X_j) = E\{[u_i-E(u_i)]\ |\ X_i\}\{[u_j-E(u_j)]\ |\ X_j\}\)
\(= E(u_i|X_i)(u_j|X_j)\)
\(=0\)
where \(i\) and \(j\) are two different observations and \(cov\) means covariance.
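One way to see what this assumption rules out is to compare independent disturbances with disturbances that follow a first-order autoregressive scheme, \(u_t = \rho u_{t-1} + \varepsilon_t\). The sketch below is illustrative only: the AR(1) scheme, the value \(\rho = 0.8\), the seed, and the helper `lag1_autocorr` are assumptions introduced here, not part of the text above.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable
n = 5000
rho = 0.8  # hypothetical autocorrelation coefficient

# Disturbances drawn independently: no autocorrelation.
u_indep = [random.gauss(0.0, 1.0) for _ in range(n)]

# Disturbances where each value depends on the previous one (AR(1)).
u_ar1 = [random.gauss(0.0, 1.0)]
for _ in range(n - 1):
    u_ar1.append(rho * u_ar1[-1] + random.gauss(0.0, 1.0))

def lag1_autocorr(u):
    """Sample correlation between u_t and u_{t-1}."""
    m = len(u)
    mean = sum(u) / m
    num = sum((u[t] - mean) * (u[t - 1] - mean) for t in range(1, m))
    den = sum((v - mean) ** 2 for v in u)
    return num / den

r_indep = lag1_autocorr(u_indep)
r_ar1 = lag1_autocorr(u_ar1)
print(round(r_indep, 2), round(r_ar1, 2))
```

The independent series should give a correlation near zero, consistent with Assumption 5, while the AR(1) series should give a value near \(\rho\).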
Assumption 6: Zero covariance between \(u_i\) and \(X_i\), or \(E(u_iX_i)=0\). Formally,
\(cov(u_i,X_i)=\) \(E[u_i-E(u_i)][X_i-E(X_i)]\)
\(=E[u_i(X_i-E(X_i))]\) since \(E(u_i)=0\)
\(=E(u_iX_i)\ -\ E(X_i)E(u_i)\) since \(E(X_i)\) is non-stochastic
\(=E(u_iX_i)\) since \(E(u_i)=0\)
\(=0\) by assumption
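The covariance above can be approximated by its sample counterpart. The sketch below contrasts disturbances generated independently of \(X\) with disturbances that, by construction, move with \(X\). Everything in it is hypothetical and for illustration only: the data, the seed, the coefficient 0.5 in the second case, and the helper `sample_cov`.

```python
import random

random.seed(2)  # fixed seed so the sketch is repeatable

# Fixed X values 1..10, each observed 1000 times.
x = [float(k) for k in range(1, 11) for _ in range(1000)]
x_bar = sum(x) / len(x)

def sample_cov(a, b):
    """Sample covariance between two equally long lists (divisor n)."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / n

# Case 1: disturbances unrelated to X, as Assumption 6 requires.
u_ok = [random.gauss(0.0, 1.0) for _ in x]
# Case 2: disturbances that contain a piece of X, violating Assumption 6.
u_bad = [0.5 * (xi - x_bar) + random.gauss(0.0, 1.0) for xi in x]

cov_ok = sample_cov(u_ok, x)
cov_bad = sample_cov(u_bad, x)
print(round(cov_ok, 2), round(cov_bad, 2))
```

In the first case the sample covariance should be close to zero. In the second it should be clearly positive, which means the separate effects of \(X\) and \(u\) on \(Y\) can no longer be told apart.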
Assumption 7: The number of observations \(n\) must be greater than the number of parameters to be estimated.
Alternatively, the number of observations n must be greater than the number of explanatory variables.
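A small example may show why this matters in the two-variable model, which has two parameters. With only two observations the fitted line passes exactly through both points, so the residuals carry no information about the disturbances. The sketch relies on the standard estimator \(\hat{\sigma}^2 = \sum \hat{u}_i^2/(n-2)\), which is not stated in the text above. The data and the helper `ols_fit` are hypothetical.

```python
def ols_fit(x, y):
    """Return (b1, b2): least-squares intercept and slope for Y = b1 + b2*X + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2

# Only two observations for a model with two parameters.
x = [1.0, 3.0]
y = [2.0, 8.0]
b1, b2 = ols_fit(x, y)

residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]
rss = sum(e * e for e in residuals)  # residual sum of squares
dof = len(x) - 2                     # degrees of freedom, n minus 2 parameters

print(b1, b2, rss, dof)
```

Here the residual sum of squares and the degrees of freedom are both zero, so \(\hat{\sigma}^2\) would be 0/0 and cannot be computed. At least one more observation is needed.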
Assumption 8: Variability in \(X\) values.
The \(X\) values in a given sample must not all be the same. Technically, \(var(X)\) must be a finite positive number.
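The reason can be read off the least-squares slope formula, \(\hat{\beta}_2 = \sum (X_i-\bar{X})(Y_i-\bar{Y}) / \sum (X_i-\bar{X})^2\): if every \(X_i\) equals \(\bar{X}\), the denominator is zero. The sketch below makes that check explicit. The helper `ols_slope`, its error message, and the data are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope for Y = b1 + b2*X + u; fails if X does not vary."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0.0:
        # No variation in X: the slope formula would divide by zero.
        raise ValueError("all X values are identical; the slope cannot be estimated")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

# Every X is the same, so Assumption 8 fails.
try:
    ols_slope([4.0, 4.0, 4.0, 4.0], [1.0, 2.0, 3.0, 4.0])
    failed = False
except ValueError as exc:
    failed = True
    print(exc)

# With variation in X the slope is well defined.
slope = ols_slope([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
print(slope)
```

The first call raises the error because the sample contains no variation in \(X\). The second returns a slope of 1.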
Assumption 9: The regression model is correctly specified
Alternatively, there is no specification bias or error in the model used in empirical analysis.