Continuous Random Variables

Recall that discrete random variables were defined as ones whose range is countable. We defined continuous random variables as random variables that are not discrete, and therefore ones whose range is uncountable. At first it sounds like continuous random variables will be harder to deal with, but in fact we have an excellent tool for them that we didn't have for discrete random variables: calculus.

Definition: The cumulative distribution function (cdf) for a continuous random variable $X$ is $$P(X \le x) = F(x) = \int_{-\infty}^x f(t)dt. $$ The function $f$ is called a probability density function, or pdf for short.

Unlike the probability mass function of a discrete random variable, the probability density function does not directly give the probability of an event. The only way to relate the pdf to probabilities is to integrate it over some interval. Any function can serve as a pdf for some distribution as long as it satisfies the following two properties (a quick numerical check of them is sketched after the list):

  1. $f(t) \ge 0$ for all $t\in\mathbb{R}$, otherwise we could have negative probabilities.
  2. $\int_{-\infty}^\infty f(t)dt = 1$, so that $P(X\in \mathbb{R}) = 1$.
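A quick way to check these properties for a candidate function is numerical integration. The following Python sketch (not part of the development above) does this for the hypothetical density $f(t)=\frac{1}{2}e^{-|t|}$, chosen purely for illustration.

```python
# Sketch: numerically checking the two pdf properties for a candidate density.
# The density f(t) = exp(-|t|)/2 is a hypothetical example, not from the text.
import numpy as np
from scipy.integrate import quad

def f(t):
    return 0.5 * np.exp(-np.abs(t))

# Property 1: f(t) >= 0 on a grid of test points.
ts = np.linspace(-50, 50, 10_001)
assert np.all(f(ts) >= 0)

# Property 2: the density integrates to 1 over the whole real line.
total, _ = quad(f, -np.inf, np.inf)
print(total)  # approximately 1.0
```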

Some properties of the cdf are fairly easy to derive; a numerical illustration of the second one appears after the list.

  1. $P(X>x) = 1-F(x),$ since $P(X>x)=1-P(X\le x) = 1-F(x)$.
  2. $P(a < X \le b) = F(b) - F(a),$ since \begin{align*} P(a < X \le b) &= P((a < X)\cap (X\le b))\\ & = P(a < X ) + P(X \le b) - P((a < X)\cup (X\le b)). \end{align*} This last term is just 1, since every point is either greater than $a$ or less than or equal to $b$, so we have $$P(a < X \le b) = 1 - F(a) + F(b) - 1 = F(b) - F(a).$$
  3. $P(X=a) = 0,$ since for every $h>0$ we have $P(X=a) \le P(a-h < X \le a) = F(a) - F(a-h),$ and this difference shrinks to $0$ as $h \to 0$ because $F$ is continuous.
We'll note that this last property means that we don't have to be careful about whether the inequalities we use are inclusive or not, since \begin{align} P(a \le X) &= P((a < X)\cup(X=a)) \\ &= P(a < X) + P(X = a) \\ &= P( a < X). \end{align}
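As a sanity check of the second property, the following Python sketch compares $F(b)-F(a)$ with a Monte Carlo estimate of $P(a < X \le b)$; the standard normal distribution and the endpoints $a=-1$, $b=2$ are arbitrary illustrative choices.

```python
# Sketch: for a continuous random variable, P(a < X <= b) = F(b) - F(a).
# The standard normal and the endpoints a, b are arbitrary illustrative choices.
import numpy as np
from scipy.stats import norm

a, b = -1.0, 2.0
exact = norm.cdf(b) - norm.cdf(a)

rng = np.random.default_rng(0)
samples = rng.standard_normal(1_000_000)
estimate = np.mean((samples > a) & (samples <= b))

print(exact, estimate)  # the two values agree to a few decimal places
```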

It will be useful to talk about some commonly found continuous probability distributions, their properties, and some of their applications.

The Uniform Distribution: This distribution is for random variables whose range is some bounded interval $[a,b]$ and whose probability density is the same for every value in that interval. The probability density function is $$ f(x) = \begin{cases} C & x \in [a,b]\\ 0 & x \notin [a,b], \end{cases}$$ for some value of $C$.

To determine the appropriate value of $C$, we look to the properties a probability density function must satisfy. By property 2, we must have $\int_{-\infty}^{\infty}f(t)dt = 1$, and from the piecewise definition of $f,$ we can split the domain into three pieces, $(-\infty,a)$, $[a,b]$, and $(b,\infty)$. Then \begin{align} 1 &= \int_{-\infty}^\infty f(t) dt \\ &= \int_{-\infty}^a f(t) dt + \int_{a}^b f(t) dt + \int_b^{\infty}f(t) dt. \end{align} Since $f(t) = 0$ for $t\notin [a,b]$, the first and last terms are both zero. Hence, $$1 = \int_a^b C dt = Ct \vert_a^b = C(b-a),$$ and solving for $C$ gives $C = \frac{1}{b-a}$.
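As a quick check of this computation, the Python sketch below verifies numerically that the density with $C = \frac{1}{b-a}$ integrates to 1; the endpoints $a=2$, $b=5$ are arbitrary illustrative choices.

```python
# Sketch: with C = 1/(b - a), the uniform density on [a, b] integrates to 1.
# The endpoints a = 2, b = 5 are arbitrary illustrative choices.
from scipy.integrate import quad

a, b = 2.0, 5.0
C = 1.0 / (b - a)

def f(t):
    # the density is C on [a, b] and 0 elsewhere
    return C if a <= t <= b else 0.0

total, _ = quad(f, a, b)  # the pieces outside [a, b] contribute nothing
print(total)  # approximately 1.0
```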

The Exponential Distribution: This distribution is for random variables whose range is the non-negative real numbers and whose probability density decreases as the value of $X$ gets larger. The probability density function is given by $$ f(x) = \begin{cases} C e^{-x/\lambda} & x \ge 0\\ 0 & x < 0, \end{cases}$$ where $\lambda$ is a positive parameter. As an exercise, show that the appropriate constant is $C=\frac{1}{\lambda}$.

The exponential distribution is often used to determine waiting times, in which case the parameter $\lambda$ is called the mean waiting time.
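The following simulation sketch illustrates the "mean waiting time" interpretation: samples drawn from an exponential distribution with parameter $\lambda$ have a sample mean close to $\lambda$ (here $\lambda = 5$ is an arbitrary choice).

```python
# Sketch: simulated exponential waiting times with parameter lambda
# have a sample mean close to lambda; lambda = 5 is an arbitrary choice.
import numpy as np

lam = 5.0
rng = np.random.default_rng(1)
waits = rng.exponential(scale=lam, size=1_000_000)
print(waits.mean())  # approximately 5.0
```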

Example: The lifetime of an LED light can be modeled as an exponentially distributed random variable with a mean of 1000 hours.

  1. What is the probability that a lightbulb dies within its first 100 hours of use?
  2. What is the median lifetime of these lightbulbs?

Solution: Let $X$ be the time at which a randomly selected lightbulb dies. Then $X$ is exponentially distributed with $\lambda = 1000.$ The probability that it dies within the first 100 hours is \begin{align} P(X \le 100) &= F(100) \\ & = \int_{-\infty}^{100} f(t)dt\\ &= \int_{-\infty}^0 f(t) dt + \int_0^{100}f(t)dt\\ &= \int_{0}^{100}\frac{1}{1000} e^{-t/1000}dt \\ & = -e^{-t/1000}\vert_{0}^{100}\\ & = 1 - e^{-1/10}\\ & \approx 0.095. \end{align}

The median lifetime is the value of $x$ at which $F(x) = \frac{1}{2}.$ We first compute \begin{align} F(x) &= \int_{-\infty}^x f(t)dt \\ & = \int_0^x \frac{1}{1000}e^{-t/1000}dt \\ &= -e^{-t/1000}\vert_0^x \\ & = 1-e^{-x/1000}. \end{align} We can then solve for the median by isolating $x$ in $$ 1-e^{-x/1000} = \frac{1}{2},$$ which leads to $x = 1000\ln(2) \approx 693$ hours.
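Both answers can be double-checked numerically. The sketch below uses scipy's exponential distribution, whose `scale` parameter corresponds to $\lambda$ in the parameterization above.

```python
# Sketch: checking the example with scipy's exponential distribution.
# scipy's `scale` parameter corresponds to lambda = 1000 in this example.
import numpy as np
from scipy.stats import expon

X = expon(scale=1000)

print(X.cdf(100))        # approximately 0.095, matching 1 - e^(-1/10)
print(X.ppf(0.5))        # approximately 693.1, the median lifetime
print(1000 * np.log(2))  # 1000 ln(2), for comparison
```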

Exercises:

  1. For each of the following, find a value for the constant $C$ so that the function is a pdf.
    1. $f(x) = Ce^{-|x|}$ for all $x\in\mathbb{R}$, where $|x|$ is the absolute value of $x.$
    2. $f(x) = Cx(1-x)$ if $x\in(0,1)$ and $f(x) = 0$ if $x\notin(0,1).$
    3. $f(x) = Cx\ln(x)$ if $x \in (0,1)$ and $f(x) = 0$ if $x\notin(0,1).$
  2. Suppose that the average lifetime of a certain type of battery is 10 hours, but every battery is guaranteed to fail within 20 hours. The pdf can be described by $$ f(t) = \begin{cases} C e^{-t/10} & t \in (0,20) \\ 0 & t \notin (0,20). \end{cases}$$
    1. Find the value of $C$ that makes $f$ a pdf.
    2. What is the probability that the battery lasts longer than 10 hours?