Lecture Presentations

Part I – Descriptive Statistics

Course Description

This is an introductory Statistics course, which surveys basic statistical techniques with particular emphasis on business and economic applications. The learning objective of this course is to improve students’ analytical skills in understanding and employing descriptive and inferential statistics within the classical tradition. We begin this course by learning how to describe the data in use. Then, we focus on probability theory, which enables us to understand the essence of statistical inference. For the rest of the course, we explore multiple inference tools such as confidence interval estimation, hypothesis testing, and the analysis of variance. These tools help us make use of sample data to reach conclusions about population parameters.

In this lecture series, I rely heavily on Basic Business Statistics by Berenson, Levine, and Szabat. I also make use of MyStatLab, developed by Pearson. As for software, we use MS Excel when necessary. OU students have free access to this software.

If you have any questions about the materials shared below, you may contact me via: tabrizy@ou.edu. You may also contact my graduate assistants.

Getting Started

  • Lecture presentation (PDF)
  • Data is everywhere:
    • Video: Roisin Donnelly (P&G) on using data to win business
      • Let’s think about this for a minute: Every single transaction that relates to P&G products (its own product, its complements, or its substitutes) becomes a DATA POINT (oh, learn this term… we will use it frequently) for the marketing team at P&G. They first have access to their own sales (and inventory) data, and they will do their best to collect data about other related products. Why? What are they going to do with all these data?
    • Video: WITF’s report on the ORION system at UPS
      • Let’s think about this for a minute: The above video illustrates how applied mathematicians DEFINE the data that can be used to drive smarter, COLLECT the data from their customers, vehicles, and drivers, ORGANIZE and VISUALIZE the data, and eventually ANALYZE the data points to make routes more efficient.
  • Required reading: GS1 and GS2 (pp. 2-4)
  • Wait a minute… why did we start with data, again!? Well, because we wanted to get to know what STATISTICS is all about:

STATISTICS helps us transform data into useful information for decision making.

DESCRIPTIVE STATISTICS provides some information about the data at hand and their variation (e.g., the mean and standard deviation of the annual earnings of OU alumni), and INFERENTIAL STATISTICS provides some information about the population using sample observations (e.g., testing whether the choice of major has any impact on the lifetime earnings of OU alumni, using only a sample of alumni).

 

  • Let us look into a dataset that we put together in class using the News Outlet Survey. This survey shows the news outlets that a randomly selected group of your classmates rely on. It also shows the outlets that they anticipate they will rely on seven years from now. There are also a few other questions answered by your classmates.
    • A quick look at the results of the survey for:
    • If you are interested, you may also download the dataset here. To download this file on your computer, go on File, then select Download as, and select Microsoft Excel (.xlsx).

The DCOVA framework:

Those who work with data are typically involved in one or more of these activities: Defining the data, Collecting the defined data, Organizing the collected data, Visualizing the organized data, and Analyzing the data using the tools that are developed in Inferential Statistics.

Chapter 1. Defining and Collecting Data

  • Lecture Presentation (PDF)
  • Highlights:
    • The difference between CATEGORICAL (qualitative) and NUMERICAL (quantitative) variables
    • The difference between DISCRETE and CONTINUOUS numerical variables
      • Note: Given their decimal precision, measures of income and expenditure are often considered Continuous though they appear to be the result of counting
    • Example: Soda Consumption Survey Results. To download this file on your computer, go on File, then select Download as, and select Microsoft Excel (.xlsx).
      • Variables include: ID, Name, Gender, Weight, Number of Soda Last Week, Regular vs Diet, Coke vs Pepsi, Other Brands, 5 Cents Price Increase, and 95 Cents Price Increase.
      • The above variables help us understand:
        • Some general characteristics about the observations; e.g., their gender, their weight, etc.
        • Some general information about their consumption of soda drinks; e.g., number of cans of soda that they drank last week and regular vs. diet.
        • Some general information about their preferences for brands of soda drinks; e.g, Coke vs. Pepsi, other brands, etc.
        • Some general information about the sensitivity of their demand for soda drinks (a.k.a. price elasticity of their demand for soda drinks); e.g., 5 cents price increase and 95 cents price increase.
      • Among the above variables, some are categorical and some are numerical.
        • Categorical variables:
          • Nominal: Name, Gender, Regular vs. Diet, Coke vs Pepsi, Other Brands
          • Ordinal: 5 Cents Price Increase and 95 Cents Price Increase
        • Numerical variables:
          • Discrete: Number of Soda Last Week
          • Continuous: Weight
    • The difference between POPULATION and SAMPLE.
      • Note: We are only interested in the populations (I cannot put enough emphasis on this!). We employ samples along with Inferential Statistics techniques to understand the populations better.
    • The difference between Non-probability and Probability Sampling
    • Probability Sampling:
      • SIMPLE RANDOM SAMPLING and SYSTEMATIC SAMPLING: In these two methods, we neglect the characteristics of the items in the population when we draw a random sample. Items are nothing to us but a bunch of IDs. (A minimal sketch of these two methods appears after this list.)
      • STRATIFICATION and CLUSTERING: In these two methods, we consider the characteristics of the items in the population. Taking the Gender Composition of the population of voters into account, for instance, a random sample can be drawn from each gender group so that the sample represents the voters in the US. This is called Stratification. Alternatively, the population of voters can be divided into naturally occurring groups, such as States, and a few of those groups can be randomly selected and sampled. This is called Clustering.
  • A class activity on Probability Sampling (PDF)
  • An Excel file for Sampling and Recoding. To download this file on your computer, go on File, then select Download as, and select Microsoft Excel (.xlsx)
    • Start from a tab called Note. Then, go to dataset.
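
A minimal Python sketch of Simple Random and Systematic Sampling, as mentioned in the list above. The list of IDs below is made up; it simply stands in for the IDs of the items in a population (e.g., the rows of the Sampling and Recoding file).

    # Hypothetical population of 100 item IDs; in practice these would come from your data set
    import random

    population_ids = list(range(1, 101))
    n = 10                                            # desired sample size

    # Simple Random Sampling: every item has the same chance of being drawn
    simple_random_sample = random.sample(population_ids, n)

    # Systematic Sampling: pick a random starting point, then take every k-th item
    k = len(population_ids) // n
    start = random.randrange(k)
    systematic_sample = population_ids[start::k][:n]

    print(simple_random_sample)
    print(systematic_sample)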

Chapter 2. Organizing and Visualizing Data

  • Lecture Presentation (PDF)
  • Highlights:
    • The FREQUENCY DISTRIBUTION and HISTOGRAM are the most important tools that we have learned about in this chapter. These tools are widely used for organizing and visualizing data. Make sure that you know how to construct the frequency, relative frequency, cumulative, and relative cumulative distributions. Also, make sure that you are able to read and understand histograms, polygons, and cumulative polygons.
      • Start with this class activity (PDF) to construct a frequency distribution
      • Then, take a look at this example (XLSX) to learn how the Frequency Function works in Excel. To download this file on your computer, go on File, then select Download as, and select Microsoft Excel (.xlsx).
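
If you would like to experiment outside Excel, here is a minimal Python sketch of a frequency distribution. The grades and class boundaries are made up; numpy.histogram counts how many observations fall in each class, which plays the same role as the Frequency Function in Excel.

    import numpy as np

    grades = [62, 71, 74, 78, 81, 83, 85, 88, 92, 97]    # hypothetical data
    bins = [60, 70, 80, 90, 100]                          # class boundaries

    frequency, edges = np.histogram(grades, bins=bins)    # counts per class
    relative_frequency = frequency / len(grades)
    cumulative = np.cumsum(frequency)

    for i in range(len(frequency)):
        print(f"{edges[i]}-{edges[i+1]}: frequency={frequency[i]}, "
              f"relative={relative_frequency[i]:.2f}, cumulative={cumulative[i]}")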

The Road Ahead:

In the near future, you will see that PROBABILITY DISTRIBUTION and the graphical illustration of the DENSITY FUNCTION are closely related to FREQUENCY DISTRIBUTION and HISTOGRAM, respectively.

  • Highlights (continued):
    • The SUMMARY TABLE (tabulation) and CONTINGENCY TABLE (cross tabulation) are also among the important tools that are introduced in this chapter. Make sure that you know how to construct, read, and understand these tables.
      • Make use of PivotTable tools in Excel to construct summary and contingency tables using this data set (XLSX). To download this file on your computer, go on File, then select Download as, and select Microsoft Excel (.xlsx).
      • Also, make sure that you know how to read and understand bar charts, pie charts, Pareto charts, and side-by-side bar charts.
    • HISTOGRAMS are useful in visualizing Frequency Distributions.
      • How to draw a Histogram using Excel?
        • First, you need to load the Analysis ToolPak. Click here for loading instruction for Windows. Click here for loading instruction for Mac. (If you are unable to load this for your Mac computer, you may use any on campus Windows computers.)
        • Once the Analysis ToolPak is loaded, go to MyStatLab, choose the Econ 2843 course, and go to the Multimedia Library. Set “Chapter” to “All Chapters” and “Section” to “All Sections.” Then, choose video, and click on “Find Now.” Once all the items are loaded, look for the following video: Excel 2013 with Data Analysis ToolPak: Histogram (3:12).
      • In case you do not want to use Excel to draw a Histogram, you may use this online platform. It also generates a Frequency Table for you.
    • There are also two types of graphs that can be used to visualize the variations in two numerical variables: SCATTER PLOT and TIME SERIES. It is important that you can read and interpret both of them.

Chapter 3. Numerical Descriptive Measures

  • Lecture Presentation (PDF)
  • Highlights:
    • The idea behind measures of central tendency and measures of dispersion:
      • A sample may include multiple observations with a particular characteristic that can take different numerical values; e.g., a randomly selected group of baseball players hit different numbers of home runs in a given season. (Click here for a small data set containing the number of home runs for 20 randomly selected MLB players in the 2014 season.)
      • As discussed in Chapter 2, we are able to construct a frequency distribution, which can also be illustrated via a histogram, using numerical variations in a sample. This would be the best way for us to understand how data are distributed.
      • A frequency distribution, however, often provides too much information. Alternatively, we can make use of only two measures to understand how data are distributed. For instance:
        • We can make use of MEAN and STANDARD DEVIATION to measure central tendency and dispersion, respectively.
        • Alternatively, we can make use of MEDIAN and INTERQUARTILE RANGE for central tendency and dispersion, respectively.
      • You need to make sure that you know how these measures are defined, what the differences are between them, and how Excel can be used to compute these measures.
    • Mean and Median are both measures of central tendency. Mean is very useful in making decisions. It is also a very useful measure in inferential statistics. However, it is sensitive to outliers (i.e., the observations that take extreme numerical values).  Median is very useful in describing the data, and it is not sensitive to outliers.
      • StatTalk Videos: Go on MyStatLab, and watch the following video clips:
        • What is an Average? (3:41)
        • When Should You Use a Mean and When Should You Use a Median? (3:42)
        • Variation 1: Introduction and Quartiles (4:57)
        • Variation 2: Standard Deviation (With a Digression on Eggroulette) (4:55)
    • Standard Deviation measures the dispersion around mean, and Inter-Quartile Range measures the dispersion around median. They are both very useful in describing the data: the greater the Standard Deviation or Inter-Quartile Range, the greater the dispersion.
    • A class activity on Summary Statistics (PDF)
    • How to compute the Geometric Mean Rate of Return?
      • Enter the rates of return into Excel. Do not forget to include the signs; e.g., a 50% loss should be entered as -0.5, and a 100% recovery should be entered as 1.
      • Then, add 1 to the rates that you entered; e.g., for the above loss it would be 0.5, and for the above recovery it would be 2.
      • Then, use the geometric mean function and deduct 1 (=GEOMEAN(0.5,2)-1) in order to obtain the geometric mean rate of return. (A short sketch of this computation, along with Covariance and Correlation, appears after this list.)
    • Make use of this data set and compute the Covariance between the Revenue generated by NBA teams and their Value. Also, compute the Correlation Coefficient. To download this file on your computer, go on File, then select Download as, and select Microsoft Excel (.xlsx).
      • To compute the Covariance, you may make use of the Covariance Function in Excel (=COVARIANCE.S). You may also make use of this file (multiple tabs). The latter is more instructive.
      • To compute the Correlation Coefficient, you may make use of the Correlation Function in Excel (=CORREL). You may also make use of this file (multiple tabs). The latter is again more instructive.
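
For those who would like to check these computations outside Excel, here is a minimal Python sketch covering the geometric mean rate of return, the sample covariance, and the correlation coefficient. The revenue and value numbers are made up; they only stand in for the columns in the NBA data set.

    import numpy as np

    # Geometric mean rate of return for a 50% loss followed by a 100% recovery
    # (equivalent to the Excel formula =GEOMEAN(0.5,2)-1)
    rates = [-0.5, 1.0]                               # rates of return, with their signs
    growth = np.prod([1 + r for r in rates])          # product of (1 + rate) factors
    geo_mean_rate = growth ** (1 / len(rates)) - 1
    print(geo_mean_rate)                              # 0.0: the investment only breaks even

    # Sample covariance and correlation (like =COVARIANCE.S and =CORREL), hypothetical data
    revenue = np.array([150, 160, 170, 200, 250], dtype=float)
    value = np.array([500, 520, 610, 700, 900], dtype=float)
    print(np.cov(revenue, value, ddof=1)[0, 1])       # sample covariance
    print(np.corrcoef(revenue, value)[0, 1])          # correlation coefficient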

Part II – Probability

Chapter 4. Probability Theory

  • Lecture Presentation (PDF)
  • Highlights:
    • Using Probability Theory, we can identify the chance that an uncertain event will occur.
      • We examine SIMPLE events, JOINT events, UNION of events, and CONDITIONAL events.
      • In this chapter, we typically examine EMPIRICAL probability, which can be identified using data (e.g., weather data, survey data, financial data, etc.). We also study A PRIORI probability, which can be identified using the prior knowledge of the process.
    • To examine probabilities, we must first identify the SAMPLE SPACE, which is the collection of all possible events. We must also identify the EVENT OF INTEREST. Probability is, then, defined as the number of ways in which the event of interest occurs divided (a.k.a. adjusted) by the total number of possible outcomes in the sample space.
      • I encourage my students to make use of Venn Diagram, Contingency Table, or Decision Tree to identify the sample space and the event of interest.

In this chapter, we make use of five simple rules to identify probabilities: Marginal Probability, General Addition, Conditional Probability, Multiplication Rule, and Bayes Rule. The last three rules are closely related to each other, and can easily be derived from the Conditional Probability rule.

    • Rule No. 1.) Marginal Probability: Consider two MUTUALLY EXCLUSIVE and COLLECTIVELY EXHAUSTIVE events. Let’s call them B1 and B2. The probability of a SIMPLE event like A can be written as the sum of the JOINT probabilities that A and B1 happen and that A and B2 happen.
      • Thus: P(A)=P(A and B1)+P(A and B2)
      • Reminder: This rule applies to more than two events, as long as all of them are mutually exclusive and collectively exhaustive.
      • Reminder: Mutual exclusivity implies that B1 and B2 cannot happen at the same time (e.g., A college football team either wins a game (B1) or loses (B2). They cannot win and lose a game at the same time). Collective exhaustiveness implies that either B1 or B2 happens. No other outcome is possible (e.g., A college football team may either win (B1) or lose (B2) a game in NCAA Tournaments. No tie is allowed since 1995).
    • Rule No. 2.) General Addition: Consider two simple events, A and B. The probability of the union of A and B (i.e., the probability that A or B happens) can be identified by the probability that A happens plus the probability that B happens, minus the probability that A and B happen (since the latter is counted twice).
      • Thus: P(A or B)=P(A)+P(B)-P(A and B)
      • Refer to slides 33-37 for an example.
    • Rule No. 3.) Conditional Probability: Consider two simple events, A and B. The probability that A happens, knowing that B has already happened (e.g., the probability that FC Barcelona wins a game (A), knowing that it is snowing (B)) can be identified using the joint probability that A and B happen at the same time, divided (a.k.a. adjusted) by the simple probability of B.
      • Thus: P(A|B)=P(A and B)/P(B)
      • Refer to slides 39-44 for an example.
      • Reminder: Let A and B be two independent events (e.g., that FC Barcelona wins a game at home (A) and that Microsoft gains value in the stock market (B)). The conditional probability of A given B is simply the simple probability of A (e.g., the probability that FC Barcelona wins a game at home (A) given that Microsoft gained some value in the stock market last week (B) is simply the probability that Barcelona wins a game at home, since these two have nothing to do with each other).
      • Thus: P(A|B)=P(A) when A and B are independent
    • Rule No. 4.) Multiplication Rule: Consider two simple events, A and B. Given conditional probability rule, one can identify the joint probability that A and B happen at the same time using the product of the conditional probability of A given B and the simple probability of B.
      • Thus: P(A and B)=P(A|B).P(B)
      • Refer to slides 49-53 for an example.
    • Rule No. 5.) Bayes Rule: Consider two simple events, A and B. Given conditional probability and multiplication rules, one can identify the conditional probability of A given B as the product of the conditional probability of B given A and the simple probability of A, divided (a.k.a. adjusted) by the simple probability of B.
      • Thus: P(A|B)=[P(B|A).P(A)]/P(B)
      • Refer to slides 57-59 for derivation.
      • Reminder: Often we make use of the combination of marginal probability and multiplication rule to identify the simple probability of B. Refer to slides 60-80 for derivation and two examples.
      • Reminder: It is common to think of Bayes rule in the following fashion: having some PRIOR information (e.g., P(B|A), P(A), and P(B)), we are able to compute the POSTERIOR probability (e.g., P(A|B)) by employing some NEW information (e.g., B). This probability could be computed via: P(A|B)=[P(B|A).P(A)]/P(B). Refer to slides 83-92 for an example.
      • Note: In applying Bayes rule, it is quite important to prepare a summary of the prior information that is explicitly and implicitly given. It is also quite important to understand the conditional probability of interest, given the new information. This is well illustrated in this class activity. (A small numerical sketch of Bayes rule is given after this list.)
    • This is an example of how basic probability rules may be applied in real life to address interesting questions. This lecture was offered by Dr. Michael Patten (the University of Oklahoma) on March 8, 2017.
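
To see the last three rules working together, here is a small numerical sketch of Bayes rule in Python. All of the probabilities are made up for illustration only.

    # Suppose (hypothetically) P(A) = 0.30, P(B|A) = 0.80, and P(B|not A) = 0.20
    p_A = 0.30
    p_B_given_A = 0.80
    p_B_given_notA = 0.20

    # Marginal probability plus the multiplication rule give the simple probability of B
    p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

    # Bayes rule: the posterior probability of A given B
    p_A_given_B = (p_B_given_A * p_A) / p_B
    print(p_B, p_A_given_B)    # 0.38 and roughly 0.63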

Chapters 5 and 6. Probability Distribution

  • Lecture Presentation (PDF)
  • Highlights:
    • PROBABILITY DISTRIBUTION is a listing of random events and their associated probabilities.
      • Often, we make use of a function to describe how the events and their probabilities relate to each other.
      • Like other distributions, probability distributions also have a measure of central tendency, known as Expected Value, and some measures for dispersion, such as Standard Deviation.
        • The EXPECTED VALUE is simply a weighted mean. For the formula refer to the lecture presentations (Slide 10).
        • The STANDARD DEVIATION is the square root of the mean scatter from the Expected Value. For the formula refer to the lecture presentations (Slide 12).
      • The linear relationship between two random variables can also be measured using COVARIANCE. For the formula refer to the lecture presentation (Slide 15).
        • A RANDOM VARIABLE is a function that can take on either a finite or infinite number of random values.
          • For example, the number of wins for a given baseball team during regular season is a DISCRETE random variable (with finite number of random values, which could be counted).
          • The time required to download a music file is also another random variable. This one, however, is a CONTINUOUS random variable (with infinite number of random values, which could be measured).
      • The number of WiFi outages per day being the random variable, this Excel file illustrates how you can compute the Expected Value and Standard Deviation of a probability distribution. It also shows how you can measure the Covariance between two random variables. To download this file on your computer, go on File, then select Download as, and select Microsoft Excel (.xlsx).
      • Also, refer to Slide 20 (and 23) to see how you could compute the Expected Value and the Standard Deviation of the (weighted) sum of two random variables. Make sure that you review the application for portfolio performance measurement.
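
The following is a short Python sketch of the Expected Value and Standard Deviation of a discrete probability distribution, using a made-up distribution for the number of WiFi outages per day (the numbers in the Excel file may differ).

    import math

    outages = [0, 1, 2, 3]                   # possible values of the random variable
    probabilities = [0.4, 0.3, 0.2, 0.1]     # hypothetical probabilities (they sum to 1)

    # Expected Value: a weighted mean of the values, weighted by their probabilities
    expected_value = sum(x * p for x, p in zip(outages, probabilities))

    # Standard Deviation: square root of the probability-weighted squared deviations
    variance = sum((x - expected_value) ** 2 * p for x, p in zip(outages, probabilities))
    std_dev = math.sqrt(variance)

    print(expected_value, std_dev)           # 1.0 and 1.0 for these hypothetical numbers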

 

The probability distributions of interest:

In this lecture series, we examine two discrete probability distributions and one important continuous probability distribution.

 

  1. Discrete Probability Distributions:
    • BINOMIAL Probability Distribution, which measures the probability of a given number of successes in a given number of trials when the probability of success is known and remains constant for all the trials.
      • Example: We can use this function to compute the probability of getting 6 (i.e., the successful outcome) 10 times (i.e., the number of successes) when we throw a fair die 50 times (i.e., the number of trials).
    • POISSON Probability Distribution, which measures the probability of the number of times that an event happens in an area of opportunity, given the historical average number of events.
      • Example: We can use this function to compute the probability that 15 fish are caught by a group of students (i.e., the number of times that an event happens) in a day of camping (i.e., the area of opportunity), knowing that on average students catch 10 fish per day of camping (i.e., the historical average).
  2. Continuous Probability Distribution:
    • NORMAL Probability Density Function, which can be used to compute the probability for an interval over a continuum when the continuous random variable of interest is distributed symmetrically, like a bell.
      • Example: We can use this function to compute the probability that a flight from MKE to OKC takes more than 3 hours and less than 4 hours.

Binomial and Poisson Probability Distributions

  • The intuition behind Binomial Probability Distribution:
    • Think of an event with two outcomes: Success and Failure. For example, let’s say that you may succeed with 50% probability and that you may fail with 50% probability. Let’s repeat this trial, say, ten times. Let’s assume that the probability of success and failure in each trial remains the same, i.e., it does not change even after multiple trials – no learning is involved. Under these assumptions, you may compute the probability of a given number of successes within those ten trials using the Binomial Probability Distribution. For instance, you may compute the probability that you are successful exactly 2 times during 10 trials, or that you are successful more than 2 times during 10 trials, or that you are successful less than 2 times during 10 trials.
    • Why is this a “probability distribution”? Well, because it provides you with a list of events (e.g., more than 2 successes in 10 trials) and their associated probabilities.
    • How could we compute those probabilities? In this course, we make use of its Excel function. For more, refer to Slide 42.
      • Tip: When computing “the probability of exactly 2 successes,” put FALSE for the last argument in the Excel function (i.e., make use of the Mass Function). To compute “the probability of 2 or less successes,” put TRUE as the last argument (i.e., make use of the Cumulative Function).
      • This Excel file provides a playground. To download this file on your computer, go on File, then select Download as, and select Microsoft Excel (.xlsx).
  • The intuition behind Poisson Probability Distribution:
    • Think of an event that may occur repeatedly, like car accidents. Imagine you have some historical information about that event. For instance, you know the average number of car accidents per day in OKC. You may make use of the Poisson Probability Distribution to compute the probability that such an event happens a given number of times. Given the average number of car accidents per day in OKC, for example, you may use the Poisson Probability Distribution to compute the probability that exactly 10 accidents, or more than 10 accidents, or less than 10 accidents happen in a given day in OKC.
    • Why is this a “probability distribution”? Well, because it provides you with a list of events (e.g., more than 10 accidents per day) and their associated probabilities.
    • How could we compute those probabilities? In this course, we make use of its Excel function. For more, refer to Slide 59.
      • Tip: When computing “the probability of exactly 10 accidents per day,” put FALSE for the last argument in the Excel function (i.e., make use of the Mass Function). To compute “the probability of 10 or less accidents per day,” put TRUE as the last argument (i.e., make use of the Cumulative Function).
      • This Excel file provides a playground. To download this file on your computer, go on File, then select Download as, and select Microsoft Excel (.xlsx).
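
If you would like to check the Excel results independently, the scipy library offers the same computations. In the sketch below, pmf() plays the role of the FALSE (mass) argument and cdf() plays the role of the TRUE (cumulative) argument; the Poisson average of 8 accidents per day is a made-up number.

    from scipy.stats import binom, poisson

    # Binomial: probability of exactly 2 successes, and of 2 or fewer, in 10 trials with p = 0.5
    p_exactly_2 = binom.pmf(2, n=10, p=0.5)
    p_at_most_2 = binom.cdf(2, n=10, p=0.5)

    # Poisson: probability of exactly 10 accidents, and of 10 or fewer, with a mean of 8 per day
    p_exactly_10 = poisson.pmf(10, mu=8)
    p_at_most_10 = poisson.cdf(10, mu=8)

    print(p_exactly_2, p_at_most_2, p_exactly_10, p_at_most_10)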

A note on Mass versus Cumulative probability distribution functions:

Probability Distribution Functions “describe” how events and their associated probabilities relate to each other. This can be done in two different ways. Mass functions provide us with the probability for a precise outcome; e.g., the probability that the annual income of a randomly selected household is exactly equal to $100,000. Cumulative functions provide us with the probability for the outcome being less than or equal to a precise value; e.g., the probability that the annual income of a randomly selected household is equal to or less than $100,000.

Normal Probability Distribution

  • The intuition:
    • Think of a game with two outcomes. Let’s say that with the probability of 50% you will win the game and with the probability of 50% you will lose. Let’s also say that we would like to play this game repeatedly, over and over again. When we keep playing this game for 1,000 times, for instance, we can ask ourselves: what is the probability that we win the game, say, 250 times or less? Keep in mind that winning this game 250 times or less is the result of a large sum of random events. The Normal Probability Distribution is what you get when you add up a large number of random events together.
      • StatTalk Video: To understand this intuition better, I recommend watching the following video on MyStatLab:
        • The Normal Distribution (4:30)
        • Not the Normal Distribution (3:55)
      • To generate a Normal Probability Distribution and a Non-normal Probability Distribution, we may conduct an experiment. Let us first generate three random events: X1 is a random number between 10 and 20, X2 is a random number generated by a binomial distribution with 100 trials and 50% probability of success, and X3 is a random number generated by a Poisson distribution with a historical average of 10. Using the above numbers, we generate two random variables: Y is the sum of X1, X2, and X3; Z is the product of X1-squared, X2, and X3. We may repeat this process 10,000 times, generating 10,000 Ys and Zs. Given the intuition behind the Normal Probability Distribution, we expect the Ys to be normally distributed and the Zs to be non-normal. This difference is well reflected in the histograms below. (A simulation sketch of this experiment is given after this list.)
  • What is the bell-shaped curve that we always see for the Normal Probability Distribution? That is called the DENSITY, which is basically a smoothed histogram for the associated probabilities. Imagine that you draw the histogram for the probability of winning the game X number of times, where X takes different values (like what they do in the StatTalk video). For this graph, the horizontal axis is the number of wins and the vertical axis is the associated probabilities. You can connect the tops of the histogram bars to each other in a smooth fashion (again, like what they do in the StatTalk video). The resulting curve is called the Density.
  • What is the Normal Density used for? It is used to compute the probability that you win the game X number of times or less. The area under the Normal Density and to the left of X is equal to the probability that you win the game X number of times or less. Thus, the Normal Density is used to identify Cumulative Normal Probabilities. And keep in mind that the area under the Normal Density for the entire range of X is always equal to 1.
  • As illustrated in Slide 75, the Normal Probability Distribution is known by:
    1. its symmetric bell-shaped density
    2. its mean
    3. its standard deviation
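
Here is a minimal Python sketch of the simulation described above: Y, the sum of the three random draws, should produce a roughly bell-shaped histogram, while Z, the product, should not.

    import numpy as np

    rng = np.random.default_rng(0)
    reps = 10_000

    x1 = rng.uniform(10, 20, reps)         # random number between 10 and 20
    x2 = rng.binomial(100, 0.5, reps)      # binomial: 100 trials, 50% probability of success
    x3 = rng.poisson(10, reps)             # Poisson with a historical average of 10

    y = x1 + x2 + x3                       # expected to look roughly normal
    z = (x1 ** 2) * x2 * x3                # expected to look non-normal (skewed)

    # Crude text "histograms": the counts of observations falling in 10 equal-width classes
    print(np.histogram(y, bins=10)[0])
    print(np.histogram(z, bins=10)[0])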

The Road Ahead:

Because of a theorem called the Central Limit Theorem, we will make use of the Normal Probability Distribution frequently.

  • STANDARDIZED Normal Probability Distribution:
    • In practice, there are a lot of different random variables that are “normally distributed.” You are introduced to some of them in the above-mentioned StatTalk video. These variables all have a symmetric bell-shaped density. However, depending on their units and scale, they are going to have different means and standard deviations.
      • Think of these two variables: height and weight. Let’s assume that these variables are normally distributed, which is likely to be the case. These variables are measured using different units (e.g., feet vs. lbs), and they also have different scales (e.g., different ranges).
    • To get rid of the differences in means and standard deviations, we may transform the data by using the Z-score for each observation rather than using the observation itself. This is known as “standardization.” For the observations that are normally distributed, the Z-scores are also normally distributed. Unlike the original distribution, however, the mean and standard deviation of the standardized distribution are always equal to zero and one, respectively.
      • Reminder: What is the Z-score? The Z-scores are computed by taking the difference between the value of each observation and the mean, divided (a.k.a. adjusted) by the standard deviation. They measure the deviation of each observation from the mean in terms of standard deviations; e.g., an observation with a Z-score equal to +2 is two standard deviations greater than the mean, while an observation with a Z-score equal to -2 is two standard deviations less than the mean.
  • As illustrated in Slide 80, the Standardized Normal Probability Distribution is known by:
    1. its symmetric bell-shaped density
    2. its mean, which is always equal to zero
    3. its standard deviation, which is always equal to one

The most important application of Normal Probability Distribution in this course is finding the normal probabilities. It is very important to keep in mind that we can make use of Normal Density to compute the cumulative normal probabilities.

Let X be a normally distributed random variable (e.g., weight). Let a and b be some constants (e.g., a=125 lbs and b=175 lbs). We can make use of Normal Density to compute the probability that X is less or equal to a; that X is greater or equal to a and it is at the same time less or equal to b (e.g., Slide 88); and that X is greater or equal to b.  Identifying these probabilities can either be done using Cumulative Standardized Normal Probability Distribution (which requires standardization) or using the functions that are built in Excel (which does not require standardization).

  • Click here to obtain the Cumulative Standardized Normal Probability Distribution Table. (The table is taken out of Statistical Tables for Economists, prepared by the Department of Economics at the University of Warwick.) To access the table that we used in class, you may click here. This table is a bit harder to read, but it is more detailed.
  • Also, use this Excel file as a playground. It helps you identify normal probabilities faster. To download this file on your computer, go on File, then select Download as, and select Microsoft Excel (.xlsx).
  • Click here for a class activity. In this activity, you first learn how to think about normal probabilities. You also learn how to use the Cumulative Standardized Normal Probability Distribution Table to identify the probability of interest. Rather than the full table, I put only a small block of the Cumulative Standardized Normal Probability Distribution Table on this exercise. This is what we usually do for the exams too.
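
As a complement to the table and the Excel playground, here is a short Python sketch of the weight example above. The mean of 150 lbs and standard deviation of 20 lbs are made-up values for X; the last two lines show that standardization (Z-scores) gives the same answer.

    from scipy.stats import norm

    mu, sigma = 150, 20        # hypothetical population mean and standard deviation
    a, b = 125, 175            # the two constants from the example above

    p_below_a = norm.cdf(a, loc=mu, scale=sigma)                # P(X <= a)
    p_between = norm.cdf(b, loc=mu, scale=sigma) - p_below_a    # P(a <= X <= b)
    p_above_b = 1 - norm.cdf(b, loc=mu, scale=sigma)            # P(X >= b)
    print(p_below_a, p_between, p_above_b)

    # The same middle probability via standardization and the standard normal distribution
    z_a, z_b = (a - mu) / sigma, (b - mu) / sigma
    print(norm.cdf(z_b) - norm.cdf(z_a))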

Chapter 7. Sampling Distribution

  • Lecture Presentation (PDF)
  • Highlights:
    • The Sampling Distribution of Means is among the key concepts in Statistical Inference. An intuitive understanding of this concept will help you a lot in understanding how inference is conducted.
    • Imagine a “population” in which the items are not all the same, like Oddland (Slide 9). You are asked to choose a randomly selected sample and compute a mean for that very sample (e.g., average age). Let’s call the computed mean: X-bar-one. You are, then, asked to repeat the same exercise one more time: choose another random sample, compute a mean, and call the computed mean X-bar-two. In fact, you are asked to repeat this for, say, a hundred times, obtaining X-bar-one, X-bar-two, X-bar-three, …, X-bar-hundred. Depending upon the items that are included in the randomly selected samples, the computed means may be different from one another. Some are, in fact, equal. But you will obtain other values for sample means. Slides 14 and 17 illustrate this quite well.
      • Since sample means vary with the random sample drawn, the sample mean becomes a random variable: a variable with a hundred values that may or may not be equal to one another.
      • THE SAMPLING DISTRIBUTION OF MEANS is the distribution of sample means that are obtained from repeated sampling. Like other distributions, one may identify:
        • The MEAN of the sampling distribution of means
        • The STANDARD DEVIATION of the sampling distribution of means (a.k.a. Standard Error)
      • What type of distribution will the sampling distribution of means, then, follow? This is an important question that is addressed in this course under two sets of assumptions.
        • When the population, from which random samples are drawn, has a normal distribution, the sampling distribution of means is also normal. The mean of the sampling distribution of means in this case is equal to the population mean. The standard deviation of the sampling distribution of means in this case is equal to the population standard deviation divided by the square root of the sample size. For the formula, please refer to Slide 22.
        • When the population, from which random samples are drawn, is not normal, the sampling distribution of means is approximately normal provided that the sample size is large enough. The mean of the sampling distribution of means in this case is again equal to the population mean. The standard deviation of the sampling distribution of means in this case is again equal to the population standard deviation divided by the square root of the sample size. For the formula, please refer to Slide 34.

The Central Limit Theorem implies that, as the sample size gets large enough, the Sampling Distribution of Means is normally distributed. This is true regardless of the shape of the population distribution. But how large is “large enough”? As a general rule, when the sample size is larger than 30, the sampling distribution of means is approximately normal (Slide 38).

  • To illustrate the implications of Central Limit Theorem, we may conduct a short exercise using real data:
    • Let us begin with this histogram, which shows the distribution of 750 midterm grades in Elements of Statistics. Since each question is worth five points in the exams, the width of each class in this histogram is set to be equal to five points.
    • From the population of midterm grades (N=750):
      • we choose a random sample of 50 observations with replacement,
      • we compute the mean grade for the chosen sample,
      • we record the obtained mean in a new data set
      • and we repeat the three steps above 500,000 times, which in turn yields 500,000 mean grades that are each coming from a sample of 50 observations
    • Given these recorded sample means, we can now plot the histogram for the sampling distribution of means with 500,000 observations, where each observation is the mean grade coming from a randomly selected sample of 50 grades. In this histogram, the width of each class is set to be equal to half a point.
    • Despite the fact that the Population Grade Distribution does not look like a normal distribution, the Sampling Distribution of Grade Means, for which n=50, looks very much like a normal distribution. Plus, we observe that:
      • the mean of grade means (=76.97) is an unbiased estimator for the population mean grade (=76.97)
      • the standard deviation of grade means (=2.40) is also much smaller than the population grade standard deviation (=16.99). In fact, the ratio of population grade standard deviation divided by the square root of sample size (=50) is equal to the standard deviation of grade means. To confirm this, just type in =16.99/SQRT(50) into a random cell in Excel.
    • In case you are interested, the code for the above exercise is written in Stata. If you do not have Stata installed on your computer, you may use the computer lab at the Department of Economics. Download the code (.DO) and grades (.DTA). For confidentiality reasons, no name or identification number is included in the grades data set. Thus, you should not be worried about using this data set. (A Python sketch of the same resampling exercise, using made-up data, is given after this list.)
  • Go to MyStatLab and watch the following videos:
    • Two types of variations (4:17)
    • Standard error and standard deviation (5:18)
  • NOTE: I cover the Sampling Distribution of Proportions in my lectures only if time allows. Nevertheless, I left the slides at the end of lecture presentations for Chapter 7.
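
If you would rather not use Stata, here is a Python sketch of the same resampling exercise. The population below is made up (and clearly non-normal), but the two comparisons at the end mirror the ones above: the mean of the sample means is close to the population mean, and the standard deviation of the sample means is close to the population standard deviation divided by the square root of the sample size.

    import numpy as np

    rng = np.random.default_rng(0)
    population = rng.exponential(scale=20, size=750) + 50    # hypothetical skewed "grades"

    n_samples, n = 100_000, 50          # fewer repetitions than the 500,000 used in class
    samples = rng.choice(population, size=(n_samples, n), replace=True)
    sample_means = samples.mean(axis=1)

    print(population.mean(), sample_means.mean())               # nearly identical
    print(population.std() / np.sqrt(n), sample_means.std())    # nearly identical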

Part III – Statistical Inference

Let’s play a game, called Deadly Distribution. You will soon realize why this game has been incorporated in the lecture series. To access Deadly Distribution, go to Canvas, click on Assignments, and you will find the game under the Extra Credits. You may obtain 10 points, which may improve your midterm grades, if you accomplish all the missions. (This game is designed and developed by the OU K20 Center.)

Chapter 8. Confidence Interval Estimation

  • Lecture Presentation (PDF)
  • Confidence Interval Estimation is a direct application of what you learned about Normal Density (Ch. 6) and Sampling Distribution of Means (Ch. 7).
  • But what is Confidence Interval Estimation about? Often, we do not know much about POPULATION PARAMETERS, like population mean (e.g., the average number of newly hired employees among all firms in the US over the last year). However, we are able to select a random sample from the population (e.g., a random sample of American firms), and compute SAMPLE STATISTICS, like sample mean (e.g., the average number of newly hired employees among the selected sample of firms over the last year). Employing Confidence Interval Estimation techniques, we are able to ESTIMATE the population mean using the information obtained from the sample.
    • StatTalk Video: Go to MyStatLab, and watch the following video:
      • Sampling and Parameters (4:18)

Population mean and population standard deviation are often unknown PARAMETERS. Using STATISTICS such as sample mean and sample standard deviation, we are able to estimate the above-mentioned PARAMETERS. Drawing conclusions about the properties of a population using sample information is known as STATISTICAL INFERENCE.

  • Confidence Interval Estimation is conducted under two sets of assumptions:
    • The not-so-realistic assumption: population standard deviation is known to us.
      • In this case, we make use of Normal Probability Distribution.
    • The realistic assumption: population standard deviation is unknown to us.
      • In this case, we make use of another probability distribution, called: Student’s t Distribution.
  • Confidence Interval Estimation provides us with two limits:
    • The Upper Confidence Limit, which is the point estimate (e.g., sample mean) plus the product of the critical value (determined by the level of confidence and the above assumption) and the standard error of the sampling distribution (e.g., the sampling distribution of means)
    • The Lower Confidence Limit, which is the point estimate (e.g., sample mean) minus the product of the critical value (determined by the level of confidence and the above assumption) and the standard error of the sampling distribution (e.g., the sampling distribution of means)
      • For more on the limits, refer to Slide 50 (in which we assume that the population standard deviation is known to us) and Slide 78 (in which we assume that the population standard deviation is unknown to us).
      • For more on the critical values, refer to Slide 52 (in which we assume that the population standard deviation is known to us) and Slide 75 (in which we assume that the population standard deviation is unknown to us).
    • To estimate the above limits, under the realistic assumption of unknown population standard deviation, you may make use of this Excel file. Again, think of it as a playground. To download this file on your computer, go on File, then select Download as, and select Microsoft Excel (.xlsx).
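
Here is a minimal Python sketch of a confidence interval for the mean under the realistic assumption that the population standard deviation is unknown. The sample values are made up; the structure (point estimate plus or minus the critical value times the standard error) is exactly the one described above.

    import numpy as np
    from scipy.stats import t

    sample = np.array([2.1, 3.4, 2.8, 1.9, 3.0, 2.5, 2.2, 3.1])   # hypothetical data
    n = len(sample)
    x_bar = sample.mean()                          # point estimate (sample mean)
    std_error = sample.std(ddof=1) / np.sqrt(n)    # standard error of the sample mean

    confidence = 0.95
    critical_value = t.ppf(1 - (1 - confidence) / 2, df=n - 1)    # t critical value, df = n-1

    lower = x_bar - critical_value * std_error    # Lower Confidence Limit
    upper = x_bar + critical_value * std_error    # Upper Confidence Limit
    print(lower, upper)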

STUDENT’S t DISTRIBUTION is a probability distribution. It looks almost like a Normal Probability Distribution when the sample size is large (say, more than 120). As the sample size gets smaller, the right and left tails of the distribution become a bit fatter and the peakedness of the distribution declines. (Slides 69 and 71 offer detailed illustrations.)

Sample size, in fact, determines the DEGREES OF FREEDOM of the Student’s t Distribution. The degrees of freedom reflect the number of observations that can freely vary while a sample statistic (e.g., the sample mean) is kept constant. To obtain a predetermined sample mean, for instance, we may only change n-1 observations in a sample of n observations. Given the predetermined mean, the nth observation depends on the other observations and cannot freely vary. In this case, therefore, the degrees of freedom are equal to: n-1.

  • To understand the intuition behind the degree of freedom better, you may refer to the explanation provided by Ding, Jin, and Shuai (2017) in Teaching Statistics:

  • Example: The Average Amount of Time Spent on Social Media
    • We surveyed 70 students in class on how much time they spent on social media over a non-exam weekend in April. The social media platforms of interest include Facebook, Youtube, Twitter, Instagram, and LinkedIn. The result of the survey is given in this file, and the histogram looks like this.
    • Given the dataset collected, we may compute a 95% confidence interval for the average time spent on social media over a weekend among the population of students in my class. Relying on sample evidence, we could be 95% confident that the average time spent on social media over a weekend is greater than 2 hours, yet it is less than 3 hours. Click here for more details.
    • To estimate this interval with greater confidence, you may change the probability used for the critical value. Go ahead and plug in 1-0.99 rather than 1-0.95 to obtain the critical value associated with the 99% confidence level. What would happen to the confidence interval estimation? Why? Make sure that you use a drawing to justify your answer.
  • NOTE: I cover the Confidence Interval Estimation of Proportion in my lectures only if time allows. Nevertheless, I left the slides at the end of lecture presentations for Chapter 8.

Chapter 9. Hypothesis Testing: One-sample Tests

  • Lecture Presentation (PDF)
  • In statistical inference, a HYPOTHESIS is always about a POPULATION PARAMETER; e.g.: population mean. Assuming that the hypothesis of interest is true (e.g., H0: population mean weight = 185 pounds), one gathers some evidence from a randomly selected sample of observations, trying to REJECT the null hypothesis (H0) and ACCEPT an alternative hypothesis (e.g., H1: population mean weight > 185 pounds or H1: population mean weight < 185 pounds).
    • In Chapter 9, you learn how to test a hypothesis about one population using One-sample Tests. In Chapter 10, you learn how to test a hypothesis about two populations using Two-sample Tests.
    • Keep in mind that:
      • You may never accept the null hypothesis (H0), as you always begin by assuming that the null hypothesis is true.
      • You may only accept the alternative hypothesis (H1) when you find enough evidence suggesting that your assumption about the null hypothesis was implausible.
      • You may never be 100% sure about your conclusion. At best, you may say that, given the sample evidence, the alternative hypothesis (H1) is more likely than the null hypothesis.
      • You may commit a Type I Error should you reject a true null hypothesis. It is like concluding that someone is guilty (rejecting the assumption of innocence), while she is, in fact, innocent.
        • The probability of Type I Error determines the confidence that you may have in your test. The greater this probability, the lower the confidence.
      • You may commit a Type II Error should you fail to reject a false null hypothesis. It is like concluding that someone is innocent (not being able to reject the assumption of innocence), while she is, in fact, guilty.
        • The probability of Type II Error determines the power of your test. The greater this probability, the lower the power.
      • Type I and Type II errors cannot happen at the same time. The former requires the null hypothesis to be true, while the latter requires the null hypothesis to be false. We cannot have a null hypothesis that is true and false at the same time. That is why these two errors cannot happen at the same time. What we focus on is the Type I error and the Confidence in the test.
  • Here is the recipe for One-sample Hypothesis Testing:
    1. State the null hypothesis (H0) and the alternative (H1).
    2. Choose the probability of committing type I error. It is conventional to choose 1%, 5%, or sometimes even 10%. Also, choose a sample size: n.
      • Keep in mind that one minus the probability of committing type I error will determine the confidence that you have in your test; e.g., if the probability of type I error is equal to 5%, you have 95% confidence in your test.
      • The sample size will affect the standard deviation of the sampling distribution. Remember that the greater the sample size, the lower the standard deviation of sampling distribution of means.
    3. Determine the TEST STATISTIC.
      • If you know the population standard deviation (which is quite unlikely), then use the Z-stat. (The formula is given in Slide 68)
      • If you don’t know the population standard deviation (which is very likely), then use the t-stat. (The formula is given in Slide 114)
    4. Collect the data and compute the value of the chosen test statistic, given the formula. In the formula:
      • X-bar is the sample mean, Mu is the population mean under the null hypothesis (the hypothesized population mean if you wish), and n is the sample size.
      • If you have access to population standard deviation (Sigma), you may use that value in the Z-stat.
      • If you do not have access to the population standard deviation, which is often the case, then use sample standard deviation (S) in the t-stat.
    5. Given the probability of committing type I error, either use CRITICAL VALUES or P-VALUE to draw a conclusion.
      • Critical Value Approach:
        • Given the null hypothesis, critical values associated with the probability of committing type I error will divide the sampling distribution of means into two areas: Rejection and No-rejection. Take a look at two examples:
          • Slide 70: Rejection and No-rejection area in two-tails test, where strict equality is used in the null hypothesis (e.g., H0: population mean weight is equal to 185 pounds)
          • Slide 133: Rejection and No-rejection area in one-tail test, where an inequality is used in the null hypothesis (e.g., H0: population mean weight is greater or equal to 185 pounds)
        • Reject the null hypothesis (H0) if the test statistic is in the Rejection area (e.g., Slide 80 for two-tails test and Slide 136 for one-tail test).
        • Do not reject the null hypothesis (H0) if the test statistic is in the No-rejection area (e.g., Slide 122).
      • P-value Approach:
        • Given the sampling distribution, the p-value is the probability of obtaining the test statistic, or anything more extreme, assuming that the null hypothesis is true; e.g., Slide 90.
        • If the p-value is lower than the probability of committing type I error (e.g., p-value < 5%), then you may safely reject the null hypothesis: When the p-value is low, the null must go.
        • If the p-value is greater than the probability of committing type I error (e.g., p-value > 5%), then you may not reject the null hypothesis.
    6. Don’t forget to explain the conclusion in the context of the problem. Use plain English!
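
Here is a short Python sketch of the recipe above, applied to a two-tail, one-sample t test. The weights and the hypothesized mean of 185 pounds are made-up values; for a one-tail test, scipy's alternative argument can be set to 'greater' or 'less'.

    import numpy as np
    from scipy.stats import ttest_1samp

    weights = np.array([182, 190, 175, 201, 188, 169, 195, 183, 178, 192])  # hypothetical sample
    hypothesized_mean = 185                                                  # H0: population mean = 185

    t_stat, p_value = ttest_1samp(weights, popmean=hypothesized_mean)       # two-tail test by default
    print(t_stat, p_value)

    alpha = 0.05                          # chosen probability of committing a Type I error
    if p_value < alpha:
        print("Reject H0: when the p-value is low, the null must go.")
    else:
        print("Do not reject H0.")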

The p-value is a key concept in hypothesis testing, and it is quite easy to work with. The p-value is the probability of the evidence obtained (or anything more extreme), assuming that the null is true. If the evidence obtained is highly unlikely (e.g., p-value < 5%), then there should be something wrong with our assumption that the null hypothesis is true. This may lead us to reject the null hypothesis.

Statistics computer packages often report the p-value associated with a hypothesis test. When you come across them, you should always keep in mind that: when the p-value is low, the null must go.

  • Go to MyStatLab and watch the following videos:
    • What does a p-value mean? (3:36)
    • What is statistical significance? (5:05)
    • Basketball players won’t accept the null hypothesis. (2:39) Also, The chip and fish guy won’t accept the null hypothesis. (4:15)
  • You may use Excel to conduct one-sample hypothesis testing.
    • If you know the population standard deviation (which is unlikely), then go to Slides 104 – 107 to learn how to make use of Z-stat, critical values, and p-value.
    • If you do not know the population standard deviation (which is more likely), then go to Slides 123 – 126 to learn how to make use of t-stat, critical values, and p-values.
    • Also, download this Excel file for a two-tail, one-sample test. Think of it as a playground. To download this file on your computer, go on File, then select Download as, and select Microsoft Excel (.xlsx).
    • There are three useful Excel functions to identify the p-value associated with a null hypothesis. To illustrate them, we are going to use a random sample of the grand total of the home game attendance for a given MLB team during a given season between 1990 and 2010. We test for three sets of hypotheses using Excel. The hypotheses are listed below, the p-value functions are given, and detailed computations are conducted in this file. To download this file on your computer, go on File, then select Download as, and select Microsoft Excel (.xlsx).
      • H0: The Average Attendance Per Season = 2 million fans v.s. H1: The Average Attendance Per Season ~= 2 million fans (Note: ~= is used for not equal to)
        • The p-value function: =T.DIST.2T(t-stat,df). Note: for this case, the absolute value of t-stat should be reported; e.g.: for t-stat of -1.18, you must enter 1.18.
      • H0: The Average Attendance Per Season <= 2 million fans v.s. H1: The Average Attendance Per Season > 2 million fans (Note: <= is used for less or equal to)
        • The p-value function: =T.DIST.RT(t-stat,df). Reminder: The rejection area of this test is on the right tail, which is why we compute the p-value on the right tail (RT).
      • H0: The Average Attendance Per Season >= 2 million fans v.s. H1: The Average Attendance Per Season < 2 million fans (Note: >= is used for greater or equal to)
        • The p-value function: =T.DIST(t-stat,df,true). Reminder: The rejection area of this test is on the left tail, which is why we compute the p-value on the left tail.
    • The resulting p-values, computed in the first three tabs of this file, suggest that the average attendance per season is greater than 2 million fans. To estimate the magnitude, we may employ a Confidence Interval Estimation (CIE), which suggests that we could be 95% confident that between 1990 and 2010, on average, more than 2.16 million fans but fewer than 2.59 million fans attended MLB games each season.
  • Hypothesis Testing Summary (pages 1 and 2 are relevant for this chapter).
  • NOTE: I cover the Hypothesis Test of Proportion in my lectures only if time allows. Nevertheless, I left the slides at the end of lecture presentations for Chapter 9.

Chapter 10. Hypothesis Testing: Two-sample Tests

  • Lecture Presentation (PDF)
  • Using Two-sample Tests, one may test:
    • How the MEANS of two INDEPENDENT populations relate to each other; e.g.: H0: Average Productivity of Exporting Firms = Average Productivity of Domestic Firms
    • How the MEANS of two RELATED populations relate to each other; e.g.: H0: Average Productivity of Exporters Who Randomly Receive Subsidies = Average Productivity of Similar Exporting Firms With No Subsidy
    • How the VARIANCES of two INDEPENDENT populations relate to each other; e.g.: H0: Variance of Sales Among Exporting Firms = Variance of Sales Among Domestic Firms
  • Note: It is conventional to test the difference between the means in the null hypothesis; e.g., Ho: Average Productivity of Exporting Firms – Average Productivity of Domestic Firms = 0 (For more, see Slide 10). It is also conventional to test the ratio of the variances in the null hypothesis: H0: Variance of Sales Among Exporting Firms divided by the Variance of Sales Among Domestic Firms = 1 (For more, see Slide 51).
  • Like One-sample Tests, we intend to reject the null hypothesis (H0) using the appropriate test statistic by:
    • Comparing the value of the test statistic to the critical value(s), as given by the sampling distribution
    • Comparing the p-value associated with the test statistic (i.e., the probability of the obtained test statistic or anything more extreme) to conventional values such as 1%, 5%, or even 10%.
  • The decision upon rejecting the null hypothesis (H0) in a Two-sample Test is quite similar to the decision upon rejection in a One-sample Test. (A code sketch covering the tests in this chapter is given at the end of this list.)
  • Testing the means of two independent populations is done under two sets of assumptions:
    • Assumption 1: The unknown variances of the two independent populations are equal
      • Under this assumption one may employ a pooled-variance t test. (For the formula for this particular variance, refer to Slide 13)
      • The t-stat in this case is the difference between the difference in sample means and the difference in hypothesized population means, divided by the square root of the pooled-variance. (For the formula for this particular t-stat, refer to Slide 14)
      • You may easily derive the confidence interval for the difference in population means using the pooled-variance. Refer to Slide 16 for more details.
    • Assumption 2: The unknown variances of the two independent populations are not equal
      • Under this assumption one may employ a separate-variance t test, in which the standard error is the square root of the sum of the sample variances, each divided by its own sample size.
      • The t-stat in this case is the difference between the difference in sample means and the difference in hypothesized population means, divided by the square root of the separate-variance. (For the formula for this particular t-stat, refer to Slide 26)
      • The above t-stat has a particular degree of freedom, as given in Slide 27.
  • Testing the means of two related populations is based on the difference between the paired values (Slides 35 and 36), which is why this is called a Paired Difference Test.
    • The following offers a step-by-step instruction.
      • Step 1.) Using the two samples, compute the difference between the paired values for each observation. These paired differences become your new sample.
      • Step 2.) Compute the sample mean for the difference between the paired values.
      • Step 3.) Compute the sample standard deviation for the difference between the paired values.
      • Step 4.) Form the t-test as given by Slide 40 (or alternatively form the confidence interval as given by Slide 42).
      • Step 5.) Compare the t-stat to the critical values, given the significance of your test. Alternatively, compare the associated p-value to the conventional probabilities of type I error.
  • Testing the variance of two independent populations is done using:
    • F-stat
      • The F-stat is simply the ratio of sample variances, putting the larger sample variance in the numerator and the smaller one in the denominator
      • The F-stat has two degrees of freedom. The first one is the sample size minus one, for the sample with the larger variance. The second one is the sample size minus one, for the sample with the smaller variance.
    • F-distribution
      • Assuming that the populations of interest are normally distributed, the sampling distribution of the ratio of variances follows an F-distribution
      • Given the probability of type I error, one may identify the critical value for Rejection and No-rejection areas.
        • See Slide 58 for an illustration of the above areas.
        • Use Excel if you would like to identify the critical value (Slide 55)
  • Hypothesis Testing Summary (pages 3 and 4 are relevant for this chapter).
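
As referenced above, here is a Python sketch of the two-sample tests in this chapter, using two made-up samples. ttest_ind with equal_var=True is the pooled-variance t test, equal_var=False is the separate-variance t test, ttest_rel is the paired difference test, and the last block forms the F-stat for comparing two variances.

    import numpy as np
    from scipy.stats import ttest_ind, ttest_rel, f

    x = np.array([12.1, 14.3, 11.8, 15.0, 13.2, 12.7])   # hypothetical sample 1
    y = np.array([10.9, 12.5, 11.1, 13.4, 12.0, 11.6])   # hypothetical sample 2

    pooled = ttest_ind(x, y, equal_var=True)     # means of two independent populations (Assumption 1)
    welch = ttest_ind(x, y, equal_var=False)     # means of two independent populations (Assumption 2)
    paired = ttest_rel(x, y)                     # means of two related populations (paired values)

    # Variances of two independent populations: larger sample variance in the numerator
    s2_x, s2_y = x.var(ddof=1), y.var(ddof=1)
    f_stat = max(s2_x, s2_y) / min(s2_x, s2_y)
    df1 = (len(x) if s2_x >= s2_y else len(y)) - 1
    df2 = (len(y) if s2_x >= s2_y else len(x)) - 1
    p_value_f = 1 - f.cdf(f_stat, df1, df2)      # one-tail p-value for the F test

    print(pooled.pvalue, welch.pvalue, paired.pvalue, f_stat, p_value_f)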

Chapter 11. Analysis of Variance

  • Lecture Presentation (PDF)
  • In one-sample tests, we focus on only one population parameter. In two-sample tests, we focus on two population parameters. In the Analysis of Variance (ANOVA), we focus on three or more population parameters.
  • Given the scope of this lecture series, we only examine One-way ANOVA, which relates to completely randomized designs that incorporate only one factor into the analysis.
    • Example: Ceteris paribus (i.e., all else unchanged), how much of a factor is the golf club brand in determining the distance traveled?
  • Running an experiment, one observes the total variation, which could be measured by Total Sum of Squares (SST). This measure is defined as the sum of the squared differences between each observation and the grand mean (i.e., the mean of all data values). Refer to slide 23 for an illustration and to Slide 27 for the formula. The total variation could, then, be partitioned into two sets of variations:
    • The variations that are due to differences among groups: Sum of Squares Among Groups (SSA)
      • The SSA variations are generated by the factor of analysis. (Illustration: Slide 24; Formula: Slide 29)
      • Example: differences in distance traveled, caused only by the choice of golf club brand.
    • The variations that are due to differences within groups: Sum of Squares Within Groups (SSW)
      • The SSW variations are generated by some random things that could potentially affect the outcome but we have no control over. (Illustration: Slide 25; Formula: Slide 32)
      • Example: difference in distance traveled, caused by a sudden change in wind’s direction.
  • The above measures of variation could then be divided by their degree of freedom to obtain something like variance.
    • Degrees of Freedom and Mean Squares:
      • For SSA, the degree of freedom is the number of groups, determined by the factor of interest, minus one. For instance, if we study three different brands of golf club, then the degree of freedom for SSA is two.
        • The Mean Squares for SSA is, therefore, equal to: SSA divided by the above degree of freedom (Slide 34). We call this MSA, which measures the average of variations caused by the factor of interest.
      • For SSW, the degree of freedom is the number of observations minus the number of groups. (The reasoning behind this degree of freedom is described fully in Slide 37)
        • The Mean Square for SSW is, therefore, equal to: SSW divided by the above degree of freedom (Slide 34). We call this MSW, which measures the average of variations caused by random things that we have no control over.
  • To conduct One-way ANOVA, we perform an F test, where the F-statistic is the ratio of MSA to MSW and the degrees of freedom are as described above. Refer to Slide 41 for the formal set-up.
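
Here is a short Python sketch of One-way ANOVA with three made-up groups (think of three golf club brands). scipy reports the F-statistic and p-value directly; the second half rebuilds the same F-statistic from SSA, SSW, MSA, and MSW as described above.

    import numpy as np
    from scipy.stats import f_oneway

    brand_a = np.array([250, 262, 258, 245, 255], dtype=float)   # hypothetical distances traveled
    brand_b = np.array([240, 238, 249, 242, 236], dtype=float)
    brand_c = np.array([260, 271, 265, 268, 259], dtype=float)

    f_stat, p_value = f_oneway(brand_a, brand_b, brand_c)         # F = MSA / MSW and its p-value

    # The same F-statistic built from SSA and SSW
    groups = [brand_a, brand_b, brand_c]
    all_obs = np.concatenate(groups)
    grand_mean = all_obs.mean()
    ssa = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)   # among-group variation
    ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)             # within-group variation
    msa = ssa / (len(groups) - 1)                  # df = number of groups - 1
    msw = ssw / (len(all_obs) - len(groups))       # df = number of observations - number of groups
    print(f_stat, msa / msw, p_value)              # the two F values match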

Though One-way ANOVA examines variations by employing the mean of squares among groups (MSA) and the mean of squares within groups (MSW), the purpose of One-way ANOVA is to reach conclusions about possible differences among the means of the groups. In a sense, we use a ratio of two measures of sample variance to say something about population means.