Statistical Inference I

Random Sampling

The collection of random variables $X_1, X_2, X_3, \dots, X_n$ is said to be a random sample of size $n$ if they are independent and identically distributed (i.i.d.), i.e.,

  1. $X_1, X_2, X_3, \dots, X_n$ are independent random variables, and
  2. they have the same distribution, i.e.,
$$F_{X_1}(x) = F_{X_2}(x) = \cdots = F_{X_n}(x), \quad \text{for all } x \in \mathbb{R}.$$

Evaluating Estimators

Let $\hat{\Theta} = h(X_1, X_2, \cdots, X_n)$ be a point estimator for $\theta$. The bias of the point estimator $\hat{\Theta}$ is defined by

$$B(\hat{\Theta}) = E[\hat{\Theta}] - \theta.$$

Let $\hat{\Theta} = h(X_1, X_2, \cdots, X_n)$ be a point estimator for a parameter $\theta$. We say that $\hat{\Theta}$ is an unbiased estimator of $\theta$ if

$$B(\hat{\Theta}) = 0, \quad \text{for all possible values of } \theta.$$

Point Estimators for Mean and Variance

Let $X_1, X_2, X_3, \dots, X_n$ be a random sample with mean $E[X_i] = \mu < \infty$ and variance $0 < \text{Var}(X_i) = \sigma^2 < \infty$. The sample variance of this random sample is defined as

$$S^2 = \frac{1}{n-1}\sum_{k=1}^n (X_k - \overline{X})^2 = \frac{1}{n-1}\left(\sum_{k=1}^n X_k^2 - n\overline{X}^2\right),$$

where $\overline{X} = \frac{1}{n}\sum_{k=1}^n X_k$ is the sample mean. The sample variance is an unbiased estimator of $\sigma^2$.

The sample standard deviation is defined as

$$S = \sqrt{S^2},$$

and is commonly used as an estimator for $\sigma$. Nevertheless, $S$ is a biased estimator of $\sigma$.
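As a quick sanity check, here is a minimal simulation sketch (assuming NumPy is available; the distribution, sample size, and variable names are illustrative only) contrasting the unbiased $\frac{1}{n-1}$ estimator $S^2$ with the biased $\frac{1}{n}$ version by averaging both over many simulated samples.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0           # true variance of the sampled distribution
n, trials = 10, 100_000

s2_unbiased = np.empty(trials)
s2_biased = np.empty(trials)
for k in range(trials):
    x = rng.normal(loc=1.0, scale=np.sqrt(sigma2), size=n)
    s2_unbiased[k] = x.var(ddof=1)   # divides by n-1 (the sample variance S^2)
    s2_biased[k] = x.var(ddof=0)     # divides by n

print(s2_unbiased.mean())  # close to 4.0
print(s2_biased.mean())    # close to (n-1)/n * 4.0 = 3.6
```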

Maximum Likelihood Estimation

Let $X_1, X_2, X_3, \dots, X_n$ be a random sample from a distribution with a parameter $\theta$. Suppose that we have observed $X_1=x_1, X_2=x_2, \dots, X_n=x_n$.

  1. If the $X_i$'s are discrete, then the likelihood function is defined as
$$L(x_1,x_2,\cdots,x_n;\theta) = P_{X_1 X_2 \cdots X_n}(x_1,x_2,\cdots,x_n;\theta).$$
  2. If the $X_i$'s are jointly continuous, then the likelihood function is defined as
$$L(x_1,x_2,\cdots,x_n;\theta) = f_{X_1 X_2 \cdots X_n}(x_1,x_2,\cdots,x_n;\theta).$$

The maximum likelihood estimate of $\theta$ is the value of $\theta$ that maximizes the likelihood function. In some problems, it is easier to work with the log likelihood function given by

$$\ln L(x_1,x_2,\cdots,x_n;\theta),$$

which is maximized at the same value of $\theta$.
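As an illustration (not from the text), here is a minimal sketch that finds the MLE of the rate $\lambda$ of an exponential sample by numerically maximizing the log likelihood; the closed-form answer $\hat{\lambda} = 1/\overline{x}$ is printed for comparison. The data, bounds, and names are my own assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.exponential(scale=1 / 2.5, size=200)   # simulated sample, true rate lambda = 2.5

def neg_log_likelihood(lam):
    # -ln L(x_1,...,x_n; lambda) for the Exponential(lambda) model
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
print(res.x)           # numerical MLE of lambda
print(1 / x.mean())    # closed-form MLE, should agree
```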

Asymptotic Properties of MLEs

Let $X_1, X_2, X_3, \dots, X_n$ be a random sample from a distribution with a parameter $\theta$. Let $\hat{\Theta}_{\text{ML}}$ denote the maximum likelihood estimator (MLE) of $\theta$. Then, under some mild regularity conditions,

  1. $\hat{\Theta}_{\text{ML}}$ is asymptotically consistent, i.e., for every $\epsilon > 0$,
$$\lim_{n\to\infty} P\left(\left|\hat{\Theta}_{\text{ML}} - \theta\right| > \epsilon\right) = 0.$$
  2. $\hat{\Theta}_{\text{ML}}$ is asymptotically unbiased, i.e.,
$$\lim_{n\to\infty} E\left[\hat{\Theta}_{\text{ML}}\right] = \theta.$$
  3. As $n$ becomes large, $\hat{\Theta}_{\text{ML}}$ is approximately a normal random variable. More precisely, the random variable
$$\frac{\hat{\Theta}_{\text{ML}} - \theta}{\sqrt{\text{Var}\left(\hat{\Theta}_{\text{ML}}\right)}}$$

converges in distribution to $N(0,1)$.
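A small simulation (my own sketch, assuming NumPy) can illustrate the third property. For a Bernoulli($p$) sample the MLE is $\hat{p} = \overline{X}$ with $\text{Var}(\hat{p}) = p(1-p)/n$, so the standardized estimator should look approximately standard normal for large $n$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, trials = 0.3, 500, 50_000

samples = rng.binomial(1, p, size=(trials, n))
p_hat = samples.mean(axis=1)                     # MLE of p for each replication
z = (p_hat - p) / np.sqrt(p * (1 - p) / n)       # standardized MLE

# Fraction of standardized values below 1.96; roughly 0.975 if approximately N(0,1)
print((z <= 1.96).mean())
```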

Interval Estimation

Let $X_1, X_2, X_3, \dots, X_n$ be a random sample from a distribution with a parameter $\theta$ that is to be estimated. An interval estimator with confidence level $1-\alpha$ consists of two estimators $\hat{\Theta}_l(X_1,X_2,\cdots,X_n)$ and $\hat{\Theta}_h(X_1,X_2,\cdots,X_n)$ such that

$$P\left(\hat{\Theta}_l \leq \theta \text{ and } \hat{\Theta}_h \geq \theta\right) \geq 1-\alpha,$$

for every possible value of $\theta$. Equivalently, we say that $\left[\hat{\Theta}_l, \hat{\Theta}_h\right]$ is a $(1-\alpha)100\%$ confidence interval for $\theta$.

Pivotal Quantity

Let $X_1, X_2, X_3, \dots, X_n$ be a random sample from a distribution with a parameter $\theta$ that is to be estimated. The random variable $Q$ is said to be a pivot or a pivotal quantity, if it has the following properties:

  1. It is a function of the observed data $X_1, X_2, X_3, \dots, X_n$ and the unknown parameter $\theta$, but it does not depend on any other unknown parameters:
$$Q = Q(X_1, X_2, \cdots, X_n, \theta).$$
  2. The probability distribution of $Q$ does not depend on $\theta$ or any other unknown parameters.
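A standard example: if $X_1, X_2, \dots, X_n$ is a random sample from a $N(\mu, \sigma^2)$ distribution with $\sigma$ known and $\mu$ unknown, then
$$Q = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}}$$
is a pivotal quantity for $\mu$: it is a function of the data and of $\mu$ only, and $Q \sim N(0,1)$ regardless of the value of $\mu$. This is the pivot behind the confidence interval for the mean given below.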

The Chi-Squared Distribution

Definition If $Z_1, Z_2, \cdots, Z_n$ are independent standard normal random variables, the random variable $Y$ defined as

$$Y = Z_1^2 + Z_2^2 + \cdots + Z_n^2$$

is said to have a chi-squared distribution with $n$ degrees of freedom, shown by

$$Y \sim \chi^2(n).$$

Properties:

  1. The chi-squared distribution is a special case of the gamma distribution. More specifically,
$$Y \sim \text{Gamma}\left(\frac{n}{2}, \frac{1}{2}\right).$$

Thus,

$$f_Y(y) = \frac{1}{2^{\frac{n}{2}}\,\Gamma\left(\frac{n}{2}\right)}\, y^{\frac{n}{2}-1} e^{-\frac{y}{2}}, \quad \text{for } y > 0.$$

  2. $E[Y] = n$ and $\text{Var}(Y) = 2n$.

  3. For any $p \in [0,1]$ and $n \in \mathbb{N}$, we define $\chi^2_{p,n}$ as the real value for which

$$P\left(Y > \chi^2_{p,n}\right) = p,$$

where $Y \sim \chi^2(n)$.
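In practice $\chi^2_{p,n}$ is read from a table or computed numerically; by the definition above it is the $(1-p)$-quantile of the $\chi^2(n)$ distribution. A quick sketch using SciPy (assuming `scipy.stats` is available; values are illustrative):

```python
from scipy.stats import chi2

p, n = 0.05, 10
chi2_p_n = chi2.ppf(1 - p, df=n)     # value with P(Y > chi2_p_n) = p
print(chi2_p_n)                      # about 18.307
print(1 - chi2.cdf(chi2_p_n, df=n))  # recovers p = 0.05
```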

Theorem Let $X_1, X_2, \cdots, X_n$ be i.i.d. $N(\mu, \sigma^2)$ random variables. Also, let $S^2$ be the sample variance for this random sample. Then, the random variable $Y$ defined as

$$Y = \frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2} \sum_{i=1}^n (X_i - \overline{X})^2$$

has a chi-squared distribution with $n-1$ degrees of freedom, i.e., $Y \sim \chi^2(n-1)$. Moreover, $\overline{X}$ and $S^2$ are independent random variables.

The t-Distribution

Let $Z \sim N(0,1)$ and $Y \sim \chi^2(n)$, where $n \in \mathbb{N}$. Also assume that $Z$ and $Y$ are independent. The random variable $T$ defined as

$$T = \frac{Z}{\sqrt{Y/n}}$$

is said to have a $t$-distribution with $n$ degrees of freedom, shown by

$$T \sim T(n).$$

Properties:

  1. The $t$-distribution has a bell-shaped PDF centered at $0$, but its PDF is more spread out than the normal PDF.

  2. $E[T] = 0$, for $n > 1$. But $E[T]$ is undefined for $n = 1$.

  3. $\text{Var}(T) = \frac{n}{n-2}$, for $n > 2$. But $\text{Var}(T)$ is undefined for $n = 1, 2$.

  4. As $n$ becomes large, the $t$ density approaches the standard normal PDF. More formally, we can write

$$T(n) \xrightarrow{d} N(0,1).$$

  5. For any $p \in [0,1]$ and $n \in \mathbb{N}$, we define $t_{p,n}$ as the real value for which
$$P(T > t_{p,n}) = p.$$

Since the $t$-distribution has a symmetric PDF, we have

$$t_{1-p,n} = -t_{p,n}.$$
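As with the chi-squared quantiles, $t_{p,n}$ can be computed numerically; a short sketch with SciPy (illustrative values), also checking the symmetry relation:

```python
from scipy.stats import t

p, n = 0.025, 15
t_p_n = t.ppf(1 - p, df=n)       # value with P(T > t_p_n) = p
print(t_p_n)                     # about 2.131
print(t.ppf(p, df=n))            # t_{1-p,n} = -t_{p,n}, about -2.131
```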

Theorem Let $X_1, X_2, \cdots, X_n$ be i.i.d. $N(\mu, \sigma^2)$ random variables. Also, let $S^2$ be the sample variance for this random sample. Then, the random variable $T$ defined as

$$T = \frac{\overline{X} - \mu}{S/\sqrt{n}}$$

has a $t$-distribution with $n-1$ degrees of freedom, i.e., $T \sim T(n-1)$.

Confidence Intervals for the Mean of Normal Random Variables

$(1-\alpha)100\%$ confidence interval

Assumptions: A random sample $X_1, X_2, X_3, \ldots, X_n$ is given from a $N(\mu,\sigma^2)$ distribution, where $\text{Var}(X_i) = \sigma^2$ is known.

Parameter to be Estimated: $\mu = E[X_i]$.

Confidence Interval: $\left[\overline{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \overline{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$ is a $(1-\alpha)100\%$ confidence interval for $\mu$, where $z_{\alpha/2}$ is the real value for which $P(Z > z_{\alpha/2}) = \alpha/2$ with $Z \sim N(0,1)$.
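A minimal sketch of this interval (assuming NumPy/SciPy; the data are simulated purely for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
sigma = 2.0                                   # known standard deviation
x = rng.normal(loc=5.0, scale=sigma, size=40)

alpha = 0.05
z = norm.ppf(1 - alpha / 2)                   # z_{alpha/2}
half_width = z * sigma / np.sqrt(len(x))
print(x.mean() - half_width, x.mean() + half_width)   # 95% CI for mu
```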

$(1-\alpha)$ confidence interval

Assumptions: A random sample $X_1, X_2, \ldots, X_n$ is given from a $N(\mu,\sigma^2)$ distribution, where $\mu = E[X_i]$ and $\text{Var}(X_i) = \sigma^2$ are unknown.

Parameter to be Estimated: $\mu = E[X_i]$.

Confidence Interval: $\left[\overline{X} - t_{\alpha/2,n-1}\frac{S}{\sqrt{n}},\ \overline{X} + t_{\alpha/2,n-1}\frac{S}{\sqrt{n}}\right]$ is a $(1-\alpha)$ confidence interval for $\mu$.
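When $\sigma$ is unknown we replace it with $S$ and use the $t$ quantile with $n-1$ degrees of freedom; a sketch under the same illustrative assumptions as above:

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(4)
x = rng.normal(loc=5.0, scale=2.0, size=40)

alpha = 0.05
s = x.std(ddof=1)                                    # sample standard deviation S
t_crit = t.ppf(1 - alpha / 2, df=len(x) - 1)         # t_{alpha/2, n-1}
half_width = t_crit * s / np.sqrt(len(x))
print(x.mean() - half_width, x.mean() + half_width)  # 95% CI for mu
```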

Confidence Intervals for the Variance of Normal Random Variables

Assumptions: A random sample $X_1, X_2, \ldots, X_n$ is given from a $N(\mu,\sigma^2)$ distribution, where $\mu = E[X_i]$ and $\text{Var}(X_i) = \sigma^2$ are unknown.

Parameter to be Estimated: $\text{Var}(X_i) = \sigma^2$.

Confidence Interval: $\left[\frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}},\ \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}}\right]$ is a $(1-\alpha)100\%$ confidence interval for $\sigma^2$.
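A sketch of the variance interval, again with simulated data (illustrative only):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
x = rng.normal(loc=5.0, scale=2.0, size=40)   # true sigma^2 = 4

alpha = 0.05
n = len(x)
s2 = x.var(ddof=1)                            # sample variance S^2
lower = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, df=n - 1)   # divide by chi2_{alpha/2, n-1}
upper = (n - 1) * s2 / chi2.ppf(alpha / 2, df=n - 1)       # divide by chi2_{1-alpha/2, n-1}
print(lower, upper)                           # 95% CI for sigma^2
```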

Statistic

Let $X_1, X_2, \cdots, X_n$ be a random sample of interest. A statistic is a real-valued function of the data. For example, the sample mean, defined as

$$W(X_1,X_2,\cdots,X_n) = \frac{X_1 + X_2 + \cdots + X_n}{n},$$

is a statistic.

A test statistic is a statistic based on which we build our statistical test.

P-values

The P-value is the lowest significance level $\alpha$ that results in rejecting the null hypothesis.
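For instance (my own illustration, not from the text): in a $Z$-test of $H_0: \mu = \mu_0$ against $H_1: \mu \neq \mu_0$ with known $\sigma$, the P-value is the probability, under $H_0$, of a test statistic at least as extreme as the one observed. The data below are made up.

```python
import numpy as np
from scipy.stats import norm

x = np.array([5.2, 4.9, 5.8, 6.1, 5.4, 5.6])   # observed sample
mu0, sigma = 5.0, 0.5                          # hypothesized mean and known sigma

z_obs = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))
p_value = 2 * (1 - norm.cdf(abs(z_obs)))       # two-sided P-value
print(z_obs, p_value)                          # reject H0 at any alpha >= p_value
```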

Likelihood Ratio Test

Let $X_1, X_2, X_3, \dots, X_n$ be a random sample from a distribution with a parameter $\theta$. Suppose that we have observed $X_1=x_1, X_2=x_2, \dots, X_n=x_n$. To decide between two simple hypotheses

\begin{align*}
H_0: \theta &= \theta_0, \\
H_1: \theta &= \theta_1,
\end{align*}

we define

$$\lambda(x_1,x_2,\cdots,x_n) = \frac{L(x_1,x_2,\cdots,x_n;\theta_0)}{L(x_1,x_2,\cdots,x_n;\theta_1)}.$$

To perform a likelihood ratio test (LRT), we choose a constant $c$. We reject $H_0$ if $\lambda < c$ and accept it if $\lambda \geq c$. The value of $c$ can be chosen based on the desired significance level $\alpha$.
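A minimal sketch of an LRT for two simple hypotheses about a normal mean with $\sigma$ known (my own example; the threshold $c$ here is arbitrary and would normally be chosen to achieve a desired $\alpha$):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
sigma = 1.0
x = rng.normal(loc=0.4, scale=sigma, size=25)   # observed data

theta0, theta1 = 0.0, 0.5                       # H0: theta = theta0, H1: theta = theta1
# Likelihood ratio lambda = L(x; theta0) / L(x; theta1)
lam = np.prod(norm.pdf(x, loc=theta0, scale=sigma)) / \
      np.prod(norm.pdf(x, loc=theta1, scale=sigma))

c = 1.0                                         # illustrative threshold
print(lam, "reject H0" if lam < c else "accept H0")
```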

Simple Linear Regression

Given the observations $(x_1,y_1), (x_2,y_2), \dots, (x_n,y_n)$, we can write the regression line as

$$\hat{y} = \beta_0 + \beta_1 x.$$

We can estimate $\beta_0$ and $\beta_1$ as

\begin{align*}
\hat{\beta}_1 &= \frac{s_{xy}}{s_{xx}}, \\
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1\bar{x},
\end{align*}

where

\begin{align*}
s_{xx} &= \sum_{i=1}^n (x_i - \bar{x})^2, \\
s_{xy} &= \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}).
\end{align*}

For each $x_i$, the fitted value $\hat{y}_i$ is obtained by

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i.$$

The quantities

$$e_i = y_i - \hat{y}_i$$

are called the residuals.
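The estimates, fitted values, and residuals follow directly from these formulas; a short sketch with made-up data (assuming NumPy):

```python
import numpy as np

# Illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

sxx = np.sum((x - x.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))

beta1 = sxy / sxx                      # slope estimate
beta0 = y.mean() - beta1 * x.mean()    # intercept estimate

y_hat = beta0 + beta1 * x              # fitted values
residuals = y - y_hat
print(beta0, beta1)
print(residuals)
```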

Coefficient of Determination

For the observed data pairs $(x_1,y_1), (x_2,y_2), \dots, (x_n,y_n)$, we define the coefficient of determination, $r^2$, as

$$r^2 = \frac{s_{xy}^2}{s_{xx}s_{yy}},$$

where

$$s_{xx} = \sum_{i=1}^n (x_i - \bar{x})^2, \quad
s_{yy} = \sum_{i=1}^n (y_i - \bar{y})^2, \quad
s_{xy} = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}).$$

We have $0 \leq r^2 \leq 1$. Larger values of $r^2$ generally suggest that our linear model

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$

is a good fit for the data.
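Using the same made-up data as in the regression sketch above, $r^2$ follows directly from the three sums; it also equals the square of the sample correlation coefficient between $x$ and $y$.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

sxx = np.sum((x - x.mean()) ** 2)
syy = np.sum((y - y.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))

r2 = sxy ** 2 / (sxx * syy)
print(r2)                              # close to 1 for this nearly linear data
print(np.corrcoef(x, y)[0, 1] ** 2)    # same value via the correlation coefficient
```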

Reference(s)

  • H. Pishro-Nik, Introduction to probability, statistics, and random processes. Kappa Research LLC, 2014.