The Normal Distribution

The normal distribution is the most commonly used continuous probability distribution. It is used to describe random variables whose outcomes cluster around a value $\mu$, with outcomes far from $\mu$ being exponentially less probable. Because of the shape of its pdf, the normal distribution is often called a "bell curve". Normal random variables are also called "Gaussian random variables."

Definition: A continuous random variable is normally distributed with parameters $\mu \in \mathbb{R}$ and $\sigma > 0$ if it has the pdf $$f(x) = \frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$ for all $x\in\mathbb{R}$.

The factor of $\frac{1}{\sqrt{2\pi \sigma^2}}$ is the normalization constant, which guarantees that $$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx = 1.$$

The expected value of the normal distribution can be calculated directly from the definition: \begin{align} \mathbb{E}(X) = \int_{-\infty}^{\infty} \frac{x}{\sqrt{2\pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx. \end{align} We can make the substitution $z = \frac{x-\mu}{\sigma}$, so that $x = \sigma z + \mu$ and $dz = \frac{1}{\sigma}dx$, to simplify the calculation, giving \begin{align} \mathbb{E}(X) &= \int_{-\infty}^{\infty}\frac{\sigma z + \mu}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}dz\\ &= \int_{-\infty}^{\infty} \frac{\sigma z}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}dz + \mu\int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}dz . \end{align} The first term is the integral of an odd function (one satisfying $f(-x) = -f(x)$) over a symmetric domain, and is therefore equal to zero. In the second term, the remaining integral is that of the normal distribution pdf with $\mu = 0$ and $\sigma = 1$, which is equal to 1, so we find $$\mathbb{E}(X) = \mu.$$ A more involved calculation shows that the variance is $\mathbb{V}(X) = \sigma^2$.
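These identities can be checked numerically. The following sketch (a minimal illustration, assuming SciPy is available; the values $\mu = 2$ and $\sigma = 1.5$ are arbitrary) integrates the pdf with scipy.integrate.quad and confirms that the total probability is 1, the expected value is $\mu$, and the variance is $\sigma^2$.

```python
# Numerical check of the normalization, mean, and variance of the normal pdf.
# mu = 2.0 and sigma = 1.5 are arbitrary illustrative values.
import numpy as np
from scipy.integrate import quad

mu, sigma = 2.0, 1.5

def pdf(x):
    """Normal pdf with mean mu and standard deviation sigma."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

total, _ = quad(pdf, -np.inf, np.inf)                             # ~ 1
mean, _ = quad(lambda x: x * pdf(x), -np.inf, np.inf)             # ~ mu
var, _ = quad(lambda x: (x - mu) ** 2 * pdf(x), -np.inf, np.inf)  # ~ sigma**2

print(total, mean, var)  # approximately 1.0, 2.0, 2.25
```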

In finding the expected value, we made the substitution $z = \frac{x-\mu}{\sigma}$, which essentially converted the pdf to a simpler pdf with $\mu = 0$ and $\sigma = 1$. A normal distribution with these specific values for the mean and standard deviation is called a standard normal distribution. Often, when we are doing probability calculations, we will want to make this exact conversion. If $X$ is a normally distributed random variable with mean $\mu$ and standard deviation $\sigma$, then $Z = (X-\mu)/\sigma$ is standard normally distributed. The pdf for a standard normal distribution is $$f(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}.$$
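This conversion is easy to check numerically. The sketch below (assuming SciPy; the values $\mu = 10$, $\sigma = 3$, and $x = 12$ are arbitrary illustrations) computes $P(X < x)$ both with the cdf of $N(\mu, \sigma^2)$ and with the standard normal cdf evaluated at $z = (x-\mu)/\sigma$, and the two agree.

```python
# Standardization: P(X < x) equals P(Z < z) with z = (x - mu) / sigma.
# mu = 10, sigma = 3, x = 12 are arbitrary illustrative values.
from scipy.stats import norm

mu, sigma, x = 10.0, 3.0, 12.0
z = (x - mu) / sigma

p_direct = norm.cdf(x, loc=mu, scale=sigma)  # cdf of N(mu, sigma^2) at x
p_standard = norm.cdf(z)                     # standard normal cdf at z

print(p_direct, p_standard)  # the two probabilities agree
```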

Example: The annual rainfall in Kingston is normally distributed with $\mu = 815\,\text{mm}$ and $\sigma = 155\,\text{mm}$. What is the probability that Kingston experiences less than $500\,\text{mm}$ of rain in a given year?

Solution: Let $X$ be the amount of rainfall, then $$P(X< 500) = \int_{-\infty}^{500} \frac{1}{155\sqrt{2\pi} }e^{-\frac{(x-815)^2}{2\cdot 155^2}}dx.$$ Unfortunately, this integral cannot be evaluated exactly: the integrand has no antiderivative in terms of elementary functions. What we can do, however, is approximate it numerically. Most scientific calculators and programming languages like Python have built-in methods to calculate the cdf of the normal distribution. Python's scipy.stats.norm.cdf function gives $$P(X< 500) \approx 0.02106.$$
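For reference, here is how the calculation above might be reproduced in Python with scipy.stats.norm.cdf (the only assumption is that SciPy is installed):

```python
# P(X < 500) for X ~ N(815, 155^2), reproducing the rainfall example.
from scipy.stats import norm

p = norm.cdf(500, loc=815, scale=155)  # direct call with mean and standard deviation
p_std = norm.cdf((500 - 815) / 155)    # same probability after standardizing
print(round(p, 5))                     # approximately 0.02106
```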

To calculate this using the Casio 991-X calculator, find the normal cdf by pressing mode, then down, then 3 to select DIST, then 2 to select Normal CD. Different versions of the calculator may have different procedures, so refer to the manual that came with the calculator.

Exercises:

  1. Let $X$ be a standard normally distributed random variable. Find the following probabilities:
    1. $$P(X>0)$$
    2. $$P(-1< X < 1)$$
    3. $$P(-2< X < 2)$$
  2. Let $X$ be a standard normal random variable. Use either the inverse normal function on your calculator or the scipy.stats.norm.ppf function in Python to calculate the following values (a usage sketch follows the exercises):
    1. $a$ so that $P(X < a) = 0.625$
    2. $b$ so that $P(b < X) = 0.125$
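
For Exercise 2, the sketch below shows the general calling pattern of scipy.stats.norm.ppf, which inverts the standard normal cdf; the probability 0.975 used here is only an illustration, not one of the exercise values.

```python
# Inverse cdf (percent-point function) of the standard normal distribution.
# The probability 0.975 is an arbitrary illustration, not an exercise answer.
from scipy.stats import norm

a = norm.ppf(0.975)  # the value a with P(X < a) = 0.975
check = norm.cdf(a)  # round trip: should return 0.975
print(a, check)
```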