What is Statistics?

Statistics is the systematic study of how to analyze data. As civil engineers, you may need to collect data on material strengths, water flow rates, wind speeds at a build site, loads on a structure such as a bridge, and so on. The data you collect will always vary to some degree. For something like material strengths the variations may be small, while for something like wind speeds they can be very large. You need to understand the data you collect well enough to determine what to expect from your projects.

Statistics can be broadly split into two parts: descriptive statistics and inferential statistics. As the name suggests, descriptive statistics is concerned with how to describe data. This includes summarizing a data set with a few helpful measures, such as its center and spread, and choosing an appropriate way to present it. We will spend some time on descriptive statistics in the lab portion of the course, but not much in lectures beyond today.

Inferential statistics is more involved, and potentially more interesting. The central idea is to learn about a complete data set by looking at only a portion (a sample) of the data. Inevitably, this relies heavily on probability theory, which we will spend much of the next 5-6 weeks learning.
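To make the idea of a sample concrete, here is a minimal sketch in Python. The "complete data set" is simulated: the 10,000 wind speeds, their units, and the sample size of 30 are all assumptions invented for illustration, not measured data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical "complete data set": 10,000 simulated wind speeds (km/h).
# These values are invented for illustration, not measured data.
population = [random.gauss(40, 10) for _ in range(10_000)]

# In practice we usually only get to observe a small sample.
sample = random.sample(population, 30)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"mean of full data set: {population_mean:.1f}")
print(f"mean of sample:        {sample_mean:.1f}")
```

The two means will generally be close but not equal. How close we should expect them to be is the kind of question that probability theory lets us answer.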

Example:

Consider the following set of data. For each of the 6 questions below, determine whether it falls into the category of "descriptive statistics" or "inferential statistics". Try to come up with an answer for the descriptive statistics questions.

140 139 123 116 242
226 230 258 561 116
  • What would you describe as the center of the data set?
  • How would you describe the "spread" of the data set?
  • How would you present the data set?
  • What other numbers could be added to this data set and still make sense?
  • Suppose this is a subset of a larger data set. How likely is it that 102 is in the full data set?
  • Suppose this is a subset of a larger data set. What is the center of the full data set?
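
For the questions about center and spread, one possible approach is sketched below in Python using the standard `statistics` module. The choice of measures (mean and median for center, range and sample standard deviation for spread) is an assumption made for illustration; other reasonable answers exist.

```python
import statistics

# The ten values from the example above.
data = [140, 139, 123, 116, 242, 226, 230, 258, 561, 116]

# Two common measures of center.
mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the sorted data

# Two common measures of spread.
data_range = max(data) - min(data)  # largest minus smallest value
stdev = statistics.stdev(data)      # sample standard deviation

print(f"mean   = {mean:.1f}")    # mean   = 215.1
print(f"median = {median:.1f}")  # median = 183.0
print(f"range  = {data_range}")  # range  = 445
print(f"stdev  = {stdev:.1f}")
```

Notice that the mean is well above the median. The single large value 561 pulls the mean upward, while the median is unaffected by how large that value is. Which measure better describes the "center" depends on what you want to say about the data.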