The collection of random variables $X_1, X_2, \ldots, X_n$ is said to be a random sample of size $n$ if they are independent and identically distributed (i.i.d.), i.e.,
- $X_1, X_2, \ldots, X_n$ are independent random variables, and
- they have the same distribution, i.e.,
$$F_{X_1}(x) = F_{X_2}(x) = \cdots = F_{X_n}(x), \quad \text{for all } x \in \mathbb{R}.$$
Let $\hat{\Theta}$ be a point estimator for $\theta$. The bias of point estimator $\hat{\Theta}$ is defined by
$$B(\hat{\Theta}) = E[\hat{\Theta}] - \theta.$$
Let $\hat{\Theta}$ be a point estimator for a parameter $\theta$. We say that $\hat{\Theta}$ is an unbiased estimator of $\theta$ if
$$B(\hat{\Theta}) = 0, \quad \text{for all possible values of } \theta.$$
Let $X_1, X_2, \ldots, X_n$ be a random sample with mean $E[X_i] = \mu$ and variance $\mathrm{Var}(X_i) = \sigma^2$. The sample variance of this random sample is defined as
$$S^2 = \frac{1}{n-1} \sum_{k=1}^{n} \left(X_k - \overline{X}\right)^2.$$
The sample variance $S^2$ is an unbiased estimator of $\sigma^2$.
The sample standard deviation is defined as
$$S = \sqrt{S^2},$$
and is commonly used as an estimator for $\sigma$. Nevertheless, $S$ is a biased estimator of $\sigma$.
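As a quick numerical check, here is a minimal NumPy sketch (the sample and seed are hypothetical) computing $S^2$ and $S$; `ddof=1` selects the $n-1$ denominator used in the definition above:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=10)  # hypothetical sample; true sigma = 3

# ddof=1 divides by n-1, giving the unbiased sample variance S^2;
# ddof=0 divides by n and is a biased estimator of sigma^2.
s2 = np.var(x, ddof=1)
s = np.std(x, ddof=1)  # sample standard deviation S = sqrt(S^2)
print(s2, s)
```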
Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with a parameter $\theta$. Suppose that we have observed $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$.
- If the $X_i$'s are discrete, then the likelihood function is defined as
$$L(x_1, x_2, \ldots, x_n; \theta) = P_{X_1 X_2 \cdots X_n}(x_1, x_2, \ldots, x_n; \theta).$$
- If the $X_i$'s are jointly continuous, then the likelihood function is defined as
$$L(x_1, x_2, \ldots, x_n; \theta) = f_{X_1 X_2 \cdots X_n}(x_1, x_2, \ldots, x_n; \theta).$$
In some problems, it is easier to work with the log likelihood function, given by
$$\ln L(x_1, x_2, \ldots, x_n; \theta).$$
Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with a parameter $\theta$. Let $\hat{\Theta}_{ML}$ denote the maximum likelihood estimator (MLE) of $\theta$. Then, under some mild regularity conditions,
- $\hat{\Theta}_{ML}$ is asymptotically consistent, i.e.,
$$\lim_{n \to \infty} P\left(|\hat{\Theta}_{ML} - \theta| > \epsilon\right) = 0;$$
- $\hat{\Theta}_{ML}$ is asymptotically unbiased, i.e.,
$$\lim_{n \to \infty} E[\hat{\Theta}_{ML}] = \theta;$$
- as $n$ becomes large, $\hat{\Theta}_{ML}$ is approximately a normal random variable. More precisely, the random variable
$$\frac{\hat{\Theta}_{ML} - \theta}{\sqrt{\mathrm{Var}(\hat{\Theta}_{ML})}}$$
converges in distribution to $N(0, 1)$.
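To make this concrete, here is a small sketch (assuming SciPy, with hypothetical exponential data) that finds the MLE of the rate $\lambda$ of an $\mathrm{Exponential}(\lambda)$ sample by maximizing the log likelihood numerically, and compares it with the closed form $\hat{\lambda}_{ML} = n / \sum_k x_k$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.exponential(scale=1 / 2.0, size=500)  # hypothetical data; true lambda = 2

# Log likelihood of Exponential(lambda): n*ln(lambda) - lambda*sum(x).
# We minimize its negative to maximize the likelihood.
def neg_log_likelihood(lam):
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
print(res.x, 1 / x.mean())  # numerical MLE vs. closed-form n / sum(x)
```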
Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with a parameter $\theta$ that is to be estimated. An interval estimator with confidence level $1 - \alpha$ consists of two estimators $\hat{\Theta}_l(X_1, X_2, \ldots, X_n)$ and $\hat{\Theta}_h(X_1, X_2, \ldots, X_n)$ such that
$$P\left(\hat{\Theta}_l \leq \theta \leq \hat{\Theta}_h\right) \geq 1 - \alpha,$$
for every possible value of $\theta$. Equivalently, we say that $\left[\hat{\Theta}_l, \hat{\Theta}_h\right]$ is a $(1 - \alpha)100\%$ confidence interval for $\theta$.
Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with a parameter $\theta$ that is to be estimated. The random variable $Q$ is said to be a pivot or a pivotal quantity if it has the following properties:
- It is a function of the observed data $X_1, X_2, \ldots, X_n$ and the unknown parameter $\theta$, but it does not depend on any other unknown parameters:
$$Q = Q(X_1, X_2, \ldots, X_n, \theta).$$
- The probability distribution of $Q$ does not depend on $\theta$ or any other unknown parameters.
Definition If $Z_1, Z_2, \ldots, Z_n$ are independent standard normal random variables, the random variable $Y$ defined as
$$Y = Z_1^2 + Z_2^2 + \cdots + Z_n^2$$
is said to have a chi-squared distribution with $n$ degrees of freedom, shown by
$$Y \sim \chi^2(n).$$
Properties:
- The chi-squared distribution is a special case of the gamma distribution. More specifically,
$$\chi^2(n) = \mathrm{Gamma}\left(\frac{n}{2}, \frac{1}{2}\right).$$
Thus,
$$E[Y] = n \quad \text{and} \quad \mathrm{Var}(Y) = 2n.$$
- For any $p \in [0, 1]$ and $n \in \mathbb{N}$, we define $\chi^2_{p,n}$ as the real value for which
$$P\left(Y > \chi^2_{p,n}\right) = p,$$
where $Y \sim \chi^2(n)$.
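In practice, $\chi^2_{p,n}$ can be read from a table or computed numerically; a minimal SciPy sketch (values are illustrative):

```python
from scipy.stats import chi2

# chi^2_{p,n} satisfies P(Y > chi^2_{p,n}) = p, i.e., it is the (1-p) quantile.
p, n = 0.05, 10
q = chi2.ppf(1 - p, df=n)
print(q)                      # about 18.31
print(1 - chi2.cdf(q, df=n))  # recovers p
```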
Theorem Let $X_1, X_2, \ldots, X_n$ be i.i.d. $N(\mu, \sigma^2)$ random variables. Also, let $S^2$ be the sample variance for this random sample. Then, the random variable $Y$ defined as
$$Y = \frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2} \sum_{k=1}^{n} \left(X_k - \overline{X}\right)^2$$
has a chi-squared distribution with $n-1$ degrees of freedom, i.e., $Y \sim \chi^2(n-1)$. Moreover, $\overline{X}$ and $S^2$ are independent random variables.
Let $Z \sim N(0, 1)$ and $Y \sim \chi^2(n)$, where $n \in \mathbb{N}$. Also assume that $Z$ and $Y$ are independent. The random variable $T$ defined as
$$T = \frac{Z}{\sqrt{Y/n}}$$
is said to have a $t$-distribution with $n$ degrees of freedom, shown by
$$T \sim T(n).$$
Properties:
- The $t$-distribution has a bell-shaped PDF centered at $0$, but its PDF is more spread out than the normal PDF.
- $E[T] = 0$ for $n > 1$, but $E[T]$ is undefined for $n = 1$.
- $\mathrm{Var}(T) = \frac{n}{n-2}$ for $n > 2$, but $\mathrm{Var}(T)$ is undefined for $n = 1, 2$.
- As $n$ becomes large, the $t$ density approaches the standard normal PDF. More formally, we can write
$$T(n) \xrightarrow{d} N(0, 1).$$
- For any $p \in [0, 1]$ and $n \in \mathbb{N}$, we define $t_{p,n}$ as the real value for which
$$P(T > t_{p,n}) = p.$$
Since the $t$-distribution has a symmetric PDF, we have
$$t_{1-p,n} = -t_{p,n}.$$
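As with $\chi^2_{p,n}$, the values $t_{p,n}$ can be computed numerically; a minimal SciPy sketch (illustrative values) that also checks the symmetry relation above:

```python
from scipy.stats import t

p, n = 0.05, 15
print(t.ppf(1 - p, df=n))  # t_{p,n}, about 1.75
print(t.ppf(p, df=n))      # equals t_{1-p,n} = -t_{p,n} by symmetry
```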
Theorem Let $X_1, X_2, \ldots, X_n$ be i.i.d. $N(\mu, \sigma^2)$ random variables. Also, let $S^2$ be the sample variance for this random sample. Then, the random variable $T$ defined as
$$T = \frac{\overline{X} - \mu}{S / \sqrt{n}}$$
has a $t$-distribution with $n-1$ degrees of freedom, i.e., $T \sim T(n-1)$.
$(1 - \alpha)100\%$ confidence interval for $\mu$ ($\sigma$ known)
Assumptions: A random sample $X_1, X_2, \ldots, X_n$ is given from a $N(\mu, \sigma^2)$ distribution, where $\sigma$ is known.
Parameter to be Estimated: $\mu$.
Confidence Interval: $\left[\overline{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \overline{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right]$ is a $(1 - \alpha)100\%$ confidence interval for $\mu$.
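A minimal sketch of this interval (the observations and the known $\sigma$ are hypothetical), using SciPy for $z_{\alpha/2}$:

```python
import numpy as np
from scipy.stats import norm

x = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4])  # hypothetical observations
sigma, alpha = 1.0, 0.05                      # sigma assumed known

z = norm.ppf(1 - alpha / 2)              # z_{alpha/2}
half = z * sigma / np.sqrt(len(x))
print(x.mean() - half, x.mean() + half)  # 95% confidence interval for mu
```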
$(1 - \alpha)100\%$ confidence interval for $\mu$ ($\sigma$ unknown)
Assumptions: A random sample $X_1, X_2, \ldots, X_n$ is given from a $N(\mu, \sigma^2)$ distribution, where $\mu$ and $\sigma$ are unknown.
Parameter to be Estimated: $\mu$.
Confidence Interval: $\left[\overline{X} - t_{\alpha/2, n-1} \frac{S}{\sqrt{n}},\ \overline{X} + t_{\alpha/2, n-1} \frac{S}{\sqrt{n}}\right]$ is a $(1 - \alpha)100\%$ confidence interval for $\mu$.
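The same computation with $\sigma$ unknown, replacing $z_{\alpha/2}$ and $\sigma$ with $t_{\alpha/2, n-1}$ and $S$ (hypothetical data again):

```python
import numpy as np
from scipy.stats import t

x = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4])  # hypothetical observations
alpha, n = 0.05, len(x)

half = t.ppf(1 - alpha / 2, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
print(x.mean() - half, x.mean() + half)  # 95% confidence interval for mu
```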
$(1 - \alpha)100\%$ confidence interval for $\sigma^2$
Assumptions: A random sample $X_1, X_2, \ldots, X_n$ is given from a $N(\mu, \sigma^2)$ distribution, where $\mu$ and $\sigma$ are unknown.
Parameter to be Estimated: $\sigma^2$.
Confidence Interval: $\left[\frac{(n-1)S^2}{\chi^2_{\alpha/2, n-1}},\ \frac{(n-1)S^2}{\chi^2_{1-\alpha/2, n-1}}\right]$ is a $(1 - \alpha)100\%$ confidence interval for $\sigma^2$.
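A matching sketch for the variance interval (same hypothetical data); note that the lower endpoint uses the upper $\chi^2$ quantile and vice versa:

```python
import numpy as np
from scipy.stats import chi2

x = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4])  # hypothetical observations
alpha, n = 0.05, len(x)
s2 = np.var(x, ddof=1)

lo = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, df=n - 1)  # chi^2_{alpha/2, n-1}
hi = (n - 1) * s2 / chi2.ppf(alpha / 2, df=n - 1)      # chi^2_{1-alpha/2, n-1}
print(lo, hi)  # 95% confidence interval for sigma^2
```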
Let $X_1, X_2, \ldots, X_n$ be a random sample of interest. A statistic is a real-valued function of the data. For example, the sample mean, defined as
$$\overline{X} = \frac{X_1 + X_2 + \cdots + X_n}{n},$$
is a statistic.
A test statistic is a statistic on which we base our statistical test.
The P-value is the lowest significance level $\alpha$ that results in rejecting the null hypothesis.
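For example, a two-sided $z$ test of $H_0: \mu = \mu_0$ with known $\sigma$ yields the P-value below (all numbers hypothetical); $H_0$ is rejected at any significance level $\alpha$ at or above this value:

```python
import numpy as np
from scipy.stats import norm

x = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4])  # hypothetical observations
mu0, sigma = 5.0, 1.0                          # hypothetical H0 mean, known sigma

z = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))  # test statistic
p_value = 2 * (1 - norm.cdf(abs(z)))              # two-sided P-value
print(p_value)
```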
Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with a parameter $\theta$. Suppose that we have observed $x_1, x_2, \ldots, x_n$. To decide between two simple hypotheses
$$H_0: \theta = \theta_0, \qquad H_1: \theta = \theta_1,$$
we define the likelihood ratio
$$\lambda(x_1, x_2, \ldots, x_n) = \frac{L(x_1, x_2, \ldots, x_n; \theta_0)}{L(x_1, x_2, \ldots, x_n; \theta_1)}.$$
To perform a likelihood ratio test (LRT), we choose a constant $c$. We reject $H_0$ if $\lambda < c$ and accept it if $\lambda \geq c$. The value of $c$ can be chosen based on the desired significance level $\alpha$.
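A minimal sketch of an LRT between two simple hypotheses for $N(\mu, 1)$ data (the sample, the hypothesized means, and the threshold $c$ are all hypothetical):

```python
import numpy as np
from scipy.stats import norm

x = np.array([0.3, 1.2, -0.4, 0.9, 1.5])  # hypothetical observations

# lambda = L(x; theta_0) / L(x; theta_1) for H0: mu = 0 vs. H1: mu = 1.
lam = np.prod(norm.pdf(x, loc=0.0)) / np.prod(norm.pdf(x, loc=1.0))

c = 1.0  # hypothetical threshold; in practice chosen for a desired alpha
print("reject H0" if lam < c else "accept H0", lam)
```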
Given the observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, we can write the regression line as
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x.$$
We can estimate $\beta_0$ and $\beta_1$ as
$$\hat{\beta}_1 = \frac{s_{xy}}{s_{xx}}, \qquad \hat{\beta}_0 = \overline{y} - \hat{\beta}_1 \overline{x},$$
where
$$s_{xx} = \sum_{i=1}^{n} (x_i - \overline{x})^2, \qquad s_{xy} = \sum_{i=1}^{n} (x_i - \overline{x})(y_i - \overline{y}).$$
For each $x_i$, the fitted value $\hat{y}_i$ is obtained by
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i.$$
The quantities
$$e_i = y_i - \hat{y}_i, \qquad i = 1, 2, \ldots, n,$$
are called the residuals.
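A minimal NumPy sketch of these formulas on hypothetical data, computing $\hat{\beta}_0$, $\hat{\beta}_1$, the fitted values, and the residuals:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([1.9, 4.1, 6.2, 7.8, 10.1])

s_xx = np.sum((x - x.mean()) ** 2)
s_xy = np.sum((x - x.mean()) * (y - y.mean()))

b1 = s_xy / s_xx               # slope estimate beta_1 hat
b0 = y.mean() - b1 * x.mean()  # intercept estimate beta_0 hat
y_hat = b0 + b1 * x            # fitted values
e = y - y_hat                  # residuals
print(b0, b1)
```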
For the observed data pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, we define the coefficient of determination, $r^2$, as
$$r^2 = \frac{s_{xy}^2}{s_{xx} s_{yy}},$$
where
$$s_{yy} = \sum_{i=1}^{n} (y_i - \overline{y})^2,$$
and $s_{xx}$ and $s_{xy}$ are as defined above. We have $0 \leq r^2 \leq 1$. Larger values of $r^2$ generally suggest that our linear model
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
is a good fit for the data.
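Continuing the sketch above with the same hypothetical data, $r^2$ is computed directly from $s_{xx}$, $s_{yy}$, and $s_{xy}$:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # same hypothetical data
y = np.array([1.9, 4.1, 6.2, 7.8, 10.1])

s_xx = np.sum((x - x.mean()) ** 2)
s_yy = np.sum((y - y.mean()) ** 2)
s_xy = np.sum((x - x.mean()) * (y - y.mean()))

r2 = s_xy ** 2 / (s_xx * s_yy)  # coefficient of determination
print(r2)                       # close to 1 here, suggesting a good linear fit
```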
- H. Pishro-Nik, Introduction to probability, statistics, and random processes. Kappa Research LLC, 2014.