# Lecture Presentations

## Course Description

Welcome to ECON 2843: Elements of Statistics. This course is offered by Saleh S. Tabrizy (Assistant Professor of Economics at the University of Oklahoma). This is an introductory Statistics course, which surveys basic statistical techniques with particular emphasis on business and economic applications. The learning objective of this course is to improve students’ analytical skills in understanding and employing descriptive and inferential statistics under classical tradition. We begin this course by learning how to describe the data in use. We, then, survey selected topics in probability theory that enable us to understand the essence of statistical inference. And for the rest of the course, we explore multiple inference tools such as confidence interval estimation, hypothesis testing, and the analysis of variance. These tools help us make use of sample information to reach conclusions about unknown population parameters.

In this lecture series, I rely heavily on the latest edition of Basic Business Statistics by Berenson, Levine, Szabat, and Stephan. I also make use of MyStatLab, developed and maintained by Pearson. As for software, we use MS Excel in class. Students at the University of Oklahoma have free access to this software. They are asked to install and maintain an updated version of the software on their laptop devices. They must also load the Analysis ToolPak, which will be used frequently in class.

For more information about course requirements, please review this short slide set. If you have any questions, you may contact me via: tabrizy@ou.edu.

## Part I – Descriptive Statistics

### Introduction

• Data is everywhere. Let us look into two examples:
• Begin with this radio report on an effort to identify tourism strength and gaps in Oklahoma City.
• Using a survey that has already been conducted in about 133 destinations in 11 countries, the OKC Convention and Visitors Bureau tried to identify what works well in the \$2.2 billion tourism industry in Oklahoma City. They also tried to understand what is missing. Ultimately, the results of this survey may help the bureau learn how Oklahoma City can attract more visitors. This is a recent effort, which is still ongoing.
• Let us then look into a dataset that we put together in the class of Fall 2017 using an in-class survey. This survey shows the news outlets that a randomly selected group of your peers rely on. It also shows the outlets that they anticipate they will rely on seven years from now. There are also a few other questions that were answered by your peers.
• A quick look at the results of the survey for:

STATISTICS helps us transform data into useful information for decision making.

DESCRIPTIVE STATISTICS provides some information about the variations (e.g., the mean and standard deviation of the annual earnings of the OU alumni), and INFERENTIAL STATISTICS provides some information about the population using sample observations (e.g., testing whether the choice of major has any impact on the lifetime earnings of the OU alumni, using only a sample of alumni).

• Statistics is a branch of mathematics. Our focus in this course, however, is on applied statistics. Even when we survey the underlying mathematical models, the emphasis is on the practical implications. Advanced courses explore the theoretical issues in more detail.

The DCOVA framework:

Those who work with data are typically involved in one or more of these activities: Defining data, Collecting the defined data, Organizing the collected data, Visualizing the organized data, or Analyzing them using the tools that are developed in Inferential Statistics.

• This course is organized around the DCOVA framework, with particular emphasis on the Analysis part. We begin with a survey of methods that are used for collecting the defined data (Ch. 1). We then examine some organization and visualization tools (Ch. 2). For the remainder of the course, we explore descriptive and inferential tools that are widely used in data analysis.

### Chapter 1. Defining and Collecting Data

• Lecture Presentation (PDF)
• Highlights:
• The difference between CATEGORICAL (qualitative) and NUMERICAL (quantitative) variables
• The difference between DISCRETE and CONTINUOUS numerical variables
• Note: Given their decimal precision, measures of income and expenditure are often considered Continuous though they appear to be the result of counting
• Variables include: ID, Name, Gender, Weight, Number of Soda Last Week, Regular vs Diet, Coke vs Pepsi, Other Brands, 5 Cents Price Increase, and 95 Cents Price Increase.
• The above variables help us understand:
• Some general characteristics about the observations; e.g., their gender, their weight, etc.
• Some general information about their consumption of soda drinks; e.g., number of cans of soda that they drank last week and regular vs. diet.
• Some general information about their preferences for brands of soda drinks; e.g., Coke vs. Pepsi, other brands, etc.
• Some general information about the sensitivity of their demand for soda drinks (a.k.a. price elasticity of their demand for soda drinks); e.g., 5 cents price increase and 95 cents price increase.
• Among the above variables, some are categorical and some are numerical.
• Categorical variables:
• Nominal: Name, Gender, Regular vs. Diet, Coke vs Pepsi, Other Brands
• Ordinal: 5 Cents Price Increase and 95 Cents Price Increase
• Numerical variables:
• Discrete: Number of Soda Last Week
• Continuous: Weight
• The difference between POPULATION and SAMPLE.
• Note: We are only interested in the populations (I cannot put enough emphasis on this!). We employ samples along with Inferential Statistics techniques to understand the populations better.
• The difference between Non-probability and Probability Sampling
• Probability Sampling:
• SIMPLE RANDOM SAMPLING and SYSTEMATIC SAMPLING: In these two methods, we neglect the characteristics of the items in the population when we draw a random sample. Items are nothing to us but a bunch of IDs.
• STRATIFICATION and CLUSTERING: In these two methods, we consider the characteristics of the items in the population. Taking the Gender Composition of the population of voters into account, for instance, a random sample can be chosen to represent the voters in the US. This is called Stratification. Taking the share of each State in the population of voters into account, another random sample can be chosen to represent the voters in the US. This is called Clustering.
• Reading: Selecting a Stratified Sample (PDF)
• A class activity and an Excel exercise on sampling and recoding:

### Chapter 2. Organizing and Visualizing Data

• Lecture Presentation (PDF)
• Highlights:
• The FREQUENCY DISTRIBUTION and HISTOGRAM are the most important tools for organizing and visualizing numerical data. Make sure that you know how to construct the frequency, relative frequency, cumulative, and relative cumulative distribution tables using Excel. For this, you may use the Frequency function to put together a frequency distribution. Using the frequency distribution, you can then put together the relative frequency, cumulative, and relative cumulative distribution tables.
• Start with this dataset and the video below to construct a frequency distribution using the Frequency function in Excel. To download the Excel file on your computer, go to File, then select Download as, and select Microsoft Excel (.xlsx).
• Keep in mind that unlike, say, the Randbetween function, Frequency is an array function. To execute this command, therefore, you must select an array, enter the command, and then press Ctrl+Shift+Enter to ask Excel to execute it.
• How to draw a histogram? There are two ways.
• You may simply use the frequency distribution along with the graphing tools in Excel to draw a histogram.
• Alternatively, you may use the Analysis ToolPak in Excel.
• Once the Analysis ToolPak is loaded, go to MyStatLab, choose the Econ 2843 course, and go to Multimedia Library. Set “Chapter” to “All Chapters” and “Section” to “All Section.” Then, choose video, and click on “Find Now.” Once all the items are loaded, look for the following video: Excel 2016 with Data Analysis Toolpak – Histogram (3:08). Notice that you may use the Histogram option in the Analysis ToolPak to put together a frequency distribution table, just as you could with the Frequency function.

The Road Ahead:

In the near future, you will see that PROBABILITY DISTRIBUTION and the graphical illustration of DENSITY FUNCTION are closely related to FREQUENCY DISTRIBUTION and HISTOGRAM, respectively.

• Highlights (continued):
• The SUMMARY TABLE (tabulation) and CONTINGENCY TABLE (cross tabulation) are also among the important tools that are introduced in this chapter. Make sure that you know how to construct, read, and understand these tables.
• Make use of PivotTable tools in Excel to construct summary and contingency tables using this data set (XLSX). To download this file on your computer, go to File, then select Download as, and select Microsoft Excel (.xlsx).
• Also, make sure that you know how to read and understand bar charts, pie charts, Pareto charts, and side-by-side bar charts.
• There are also two types of graphs that can be used to visualize the variations in two numerical variables: SCATTER PLOT and TIME SERIES. It is important that you can read and interpret both of them.
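
The frequency, relative frequency, cumulative, and relative cumulative distribution tables described in the highlights above can also be sketched outside Excel. The following is a minimal Python sketch, assuming made-up exam scores and bin limits (neither comes from the course dataset). The helper `frequency_table` is a hypothetical name, written to mimic how Excel's Frequency function counts values up to each upper bin limit and adds one extra bin for anything larger.

```python
from itertools import accumulate

def frequency_table(values, upper_limits):
    """Count values into bins defined by their upper limits.

    A value goes into the first bin whose upper limit is >= the value;
    values above the last limit go into one final overflow bin.
    """
    counts = [0] * (len(upper_limits) + 1)
    for v in values:
        for i, limit in enumerate(upper_limits):
            if v <= limit:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # larger than every upper limit
    n = len(values)
    relative = [c / n for c in counts]
    cumulative = list(accumulate(counts))
    cumulative_relative = [c / n for c in cumulative]
    return counts, relative, cumulative, cumulative_relative

# Hypothetical data (not the course dataset): ten exam scores.
scores = [55, 62, 68, 71, 74, 78, 81, 85, 90, 97]
limits = [59, 69, 79, 89]  # bins: <=59, 60-69, 70-79, 80-89, >=90

counts, rel, cum, cum_rel = frequency_table(scores, limits)
for row in zip(limits + ["more"], counts, rel, cum, cum_rel):
    print(row)
```

Plotting `counts` against the bins as adjacent bars gives the histogram; the last entry of the relative cumulative column should always equal 1.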

### Chapter 3. Numerical Descriptive Measures

• Lecture Presentation (PDF)
• Highlights:
• The idea behind measures of central tendency and measures of dispersion for a sample:
• A sample may include multiple observations with a particular characteristic that can take different numerical values; e.g., as seen in this small dataset, a randomly selected group of baseball players hit different numbers of home runs over a given season. To download this file on your computer, go to File, then select Download as, and select Microsoft Excel (.xlsx).
• As discussed in Chapter 2, we are able to construct a frequency distribution, which can also be illustrated via a histogram, using numerical variations in a sample; e.g., numerical variation in the number of home runs. This would be the best way for us to understand how data are distributed.
• A frequency distribution, however, often provides too much information. Alternatively, we can make use of only two measures to understand how data are distributed. For instance:
• We can make use of MEAN and STANDARD DEVIATION to measure central tendency and dispersion, respectively. For example, the mean number of home runs for the above sample is 4.10 home runs (central tendency), and the standard deviation is 6.99 home runs (dispersion).
• Alternatively, we can make use of MEDIAN and INTERQUARTILE RANGE for central tendency and dispersion, respectively. For example, the median number of home runs for the above sample is 0 home runs (central tendency), and the interquartile range is 7.5 home runs (dispersion).
• Mean and Median are both measures of central tendency. Mean is very useful in making decisions. It is also a very useful measure in inferential statistics. However, it is sensitive to outliers (i.e., the observations that take extreme numerical values).  Median is very useful in describing the data, and it is not sensitive to outliers. This is, in fact, evident in the small sample above. Though the majority of players hit no home runs, a few hit many. Those few players are the outliers. They push the mean towards themselves, which is why the mean is quite high (4.1 home runs) despite the fact that the majority do not even hit one home run. However, the median remains at the center (0 home runs) despite the fact that there are some outliers. The mean is useful in decision making, while the median is useful in description.
• Standard deviation (SD) measures the dispersion around the mean, and inter-quartile range (IQR) measures the dispersion around the median. They are both very useful in describing the data: the greater the SD or IQR, the greater the dispersion. Mean, median, standard deviation, and inter-quartile range have the same unit as the variable in use. For instance, if the variable in use is measured in volts, its mean, median, standard deviation, and inter-quartile range are also measured in volts. Variance, which is the base for computing standard deviation and is also used as another measure for dispersion, does not have the same unit as the variable in use. For example, if the variable in use is measured in volts, its variance is not measured in volts (it is measured in volts squared).
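
To make the contrast between the two pairs of measures concrete, here is a minimal Python sketch using the standard `statistics` module. The ten values are made up for illustration (they are not the course's home-run dataset), and the choice of `method="inclusive"` for the quartiles is an assumption intended to interpolate the way Excel's Quartile function does.

```python
import statistics

# Hypothetical sample (not the course dataset): home runs hit by ten players.
home_runs = [0, 0, 0, 0, 0, 1, 2, 5, 12, 20]

# Mean and standard deviation: sensitive to the few large values.
mean = statistics.mean(home_runs)
sample_var = statistics.variance(home_runs)   # divides by n - 1
sample_sd = statistics.stdev(home_runs)       # square root of the sample variance

# Median and inter-quartile range: not sensitive to the few large values.
median = statistics.median(home_runs)
q1, q2, q3 = statistics.quantiles(home_runs, n=4, method="inclusive")
iqr = q3 - q1

print("mean:", mean, "sample SD:", round(sample_sd, 2))
print("median:", median, "IQR:", iqr)
```

In this made-up sample the two players with 12 and 20 home runs pull the mean well above the median, which is the same pattern the highlights describe.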

• A class activity on Summary Statistics (PDF)
• Computing numerical descriptive measures using Excel:
• How to compute the sample and population summary statistics?
• Sample Summary Statistics: We are going to make use of this dataset for the number of cars owned by a sample of households (with only seven observations; n=7) in a small neighborhood in OKC. To download this file on your computer, go to File, then select Download as, and select Microsoft Excel (.xlsx).
• Central tendency: Excel function for sample mean: =AVERAGE; Excel function for sample median: =MEDIAN
• Dispersion: Excel function for sample variance: =VAR.S; Excel function for sample standard deviation: =STDEV.S (pay attention to the usage of “.S” in these two functions).
• Population Summary Statistics: We are going to make use of this dataset for the number of cars owned by the population of households (with seventy-five observations; N=75) in a small neighborhood in OKC. To download this file on your computer, go to File, then select Download as, and select Microsoft Excel (.xlsx).
• Central tendency: Excel function for population mean: =AVERAGE; Excel function for population median: =MEDIAN (the functions for the population central tendency are the same as for the sample).
• Dispersion: Excel function for population variance: =VAR.P; Excel function for population standard deviation: =STDEV.P (unlike the functions for the sample, we are now using “.P” in these two functions).
• How to use the Analysis ToolPak to compute the summary statistics for a sample?
• Make sure that the Analysis ToolPak is loaded on your device. For instructions, see the highlights for Ch. 2. Then, go to MyStatLab, choose the Econ 2843 course, and go to Multimedia Library. Set “Chapter” to “All Chapters” and “Section” to “All Section.” Then, choose video, and click on “Find Now.” Once all the items are loaded, look for the following video: Excel 2016 with Data Analysis Toolpak – Descriptive Statistics and Confidence Intervals for a Mean (3:02).
• Note 1: At this point, we are mainly interested in descriptive statistics, not confidence interval.
• Note 2: You will find the mean, median, mode (if there is any), sample standard deviation, sample variance, and range listed in the generated table. Keep in mind that there is a big difference between “Standard Error” and “Standard Deviation”. These two are two different (but related!) animals. At this point, we are only interested in standard deviation. We will discuss standard error later.
• Note 3: You will also find measures of kurtosis and skewness in the generated table.
• How to compute the Inter-Quartile Range using Excel?
• Use the Quartile function to identify the first and the third quartiles. To compute the first quartile, use: =QUARTILE(array,1). To compute the third quartile, use: =QUARTILE(array,3). In these cases, array is basically your data array.
• Then, subtract the value of the first quartile from the third quartile in order to obtain the Inter-Quartile Range.
• The above is illustrated using a sample of calories per one cup of breakfast cereal. To download this file on your computer, go to File, then select Download as, and select Microsoft Excel (.xlsx).
• How to compute the Geometric Mean Rate of Return using Excel?
• Enter the rates of return into Excel, including their signs; e.g., a 50% loss should be entered as -0.5, and a 100% recovery should be entered as 1.
• Then, add 1 to the rates that you entered; e.g., the above loss becomes 0.5, and the above recovery becomes 2.
• Then, use the geometric mean function and deduct 1 (=GEOMEAN(0.5,2)-1) in order to obtain the geometric mean rate of return. In this example the result is 0: a 50% loss followed by a 100% recovery leaves the investment exactly where it started.
• How to compute sample covariance and coefficient of correlation using Excel?
• Make use of this data set. Looking at 22 manufacturing industries in Korea during 2014, this data set shows the variations in industrial R&D expenditures and industrial exporting activities (both measured in Korean won). To download this file on your computer, go to File, then select Download as, and select Microsoft Excel (.xlsx).
• To compute the covariance between R&D expenditures and exports, you may make use of the Sample Covariance function in Excel: =COVARIANCE.S.
• To compute the coefficient of correlation between R&D expenditures and exports, you may make use of the Correlation function in Excel: =CORREL.

CORRELATION DOES NOT NECESSARILY IMPLY CAUSATION!

Let’s reflect on the above example again. The coefficient of correlation between industrial R&D expenditures and industrial exports in Korea is quite high (about 0.85), implying that those industries that are R&D-intensive are also export-intensive and that those industries that are relatively less involved in R&D activities are also less involved in exporting activities. However, this does not mean that the R&D activities of those industries have led them to engage more in exporting activities. Also, it does not mean that the exporting activities of those industries have led them to engage more in R&D activities. To figure out the direction of causation, we need to run an experiment, something like a randomized controlled trial (RCT). We can also rely on some natural experiments, which may occur in real life. Or, alternatively, we may make use of some advanced applied statistics techniques to examine the direction of causality. But this should be done carefully.
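
Returning to the Excel computations in this chapter, the geometric mean rate of return, the sample covariance, and the coefficient of correlation can be sketched in Python as follows. This is only an illustration: the R&D and export figures are made up (they are not the Korean industry dataset), and `sample_covariance` and `correlation` are hypothetical helper names written out by hand so that each step of the formula is visible.

```python
import statistics

# Geometric mean rate of return: a 50% loss followed by a 100% recovery.
rates = [-0.5, 1.0]
growth_factors = [1 + r for r in rates]                 # 0.5 and 2.0
geo_mean_return = statistics.geometric_mean(growth_factors) - 1
arith_mean_return = statistics.mean(rates)              # 0.25, which overstates performance

def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Coefficient of correlation: covariance scaled by both standard deviations."""
    return sample_covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))

# Hypothetical figures for five industries (not the course dataset).
rd_spending = [1.0, 2.0, 3.0, 4.0, 5.0]
exports = [2.1, 3.9, 6.2, 7.8, 10.1]

cov = sample_covariance(rd_spending, exports)
r = correlation(rd_spending, exports)
print("geometric mean return:", geo_mean_return)
print("covariance:", cov, "correlation:", r)
```

Unlike the covariance, the coefficient of correlation has no unit and always lies between -1 and +1, which is why it is easier to compare across datasets.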

## Part II – Probability

### Chapters 5 and 6. Probability Distribution

• Lecture Presentation (PDF)
• Highlights:
• PROBABILITY DISTRIBUTION is a listing of random events and their associated probabilities.
• Often, we make use of a function to describe how the events and their probabilities relate to each other.
• Like other distributions, probability distributions also have a measure of central tendency, known as Expected Value, and some measures for dispersion, such as Standard Deviation.
• The EXPECTED VALUE is simply a weighted mean. For the formula refer to the lecture presentations (slide 11).
• The STANDARD DEVIATION is the square root of the probability-weighted mean of the squared deviations from the Expected Value. For the formula refer to the lecture presentations (slide 13).
• The linear relationship between two random variables can also be measured using COVARIANCE. For the formula refer to the lecture presentation (slide 16).
• NOTE: A RANDOM VARIABLE is a variable that can take on either a finite or infinite number of random values.
• For example, the number of wins for a given baseball team during the regular season is a DISCRETE random variable (with a finite number of random values, which could be counted).
• The time required to download a music file is another random variable. This one, however, is a CONTINUOUS random variable (with an infinite number of random values, which could be measured).
• The number of WiFi outages per day being a random variable, this Excel file illustrates how you can compute the Expected Value and Standard Deviation of a probability distribution. It also shows how you can measure the Covariance between two random variables. To download this file on your computer, go to File, then select Download as, and select Microsoft Excel (.xlsx).
• Also, refer to Slide 22 (and 24) to see how you could compute the Expected Value and the Standard Deviation of the (weighted) sum of two random variables. Make sure that you review the application for portfolio performance measurement.

The probability distributions of interest in this course:

In this lecture series, we examine two discrete probability distributions and one important continuous probability distribution.

1. Discrete Probability Distributions:
• BINOMIAL Probability Distribution, which measures the probability of a given number of successes in a given number of trials when the probability of success is known and remains constant for all the trials.
• Example: We can use this function to compute the probability of getting 6 (i.e., the successful outcome) 10 times (i.e., the number of successes) when we throw a fair die 50 times (i.e., the number of trials).
• POISSON Probability Distribution, which measures the probability of the number of times that an event happens in an area of opportunity, given the historical average number of events.
• Example: We can use this function to compute the probability that 15 fish are caught by a group of students (i.e., the number of times that an event happens) in a day of camping (i.e., the area of opportunity), knowing that on average students catch 10 fish per day of camping (i.e., the historical average).
2. Continuous Probability Distribution:
• NORMAL Probability Density Function, which can be used to compute the probability for an interval over a continuum when the continuous random variable of interest is distributed symmetrically, like a bell.
• Example: We can use this function to compute the probability that a flight from MKE to OKC takes more than 3 hours and less than 3.5 hours.
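
Before turning to these specific distributions, the Expected Value and Standard Deviation defined in the highlights above can be computed directly from any listing of outcomes and their probabilities. The sketch below is a minimal Python illustration; the outcomes and probabilities for WiFi outages per day are made up and are not taken from the course's Excel file.

```python
import math

# Hypothetical probability distribution (not the course file):
# number of WiFi outages per day and the probability of each count.
outcomes = [0, 1, 2, 3]
probabilities = [0.50, 0.30, 0.15, 0.05]

# A valid probability distribution must have probabilities that sum to 1.
assert abs(sum(probabilities) - 1.0) < 1e-12

# Expected Value: the mean of the outcomes, weighted by their probabilities.
expected_value = sum(x * p for x, p in zip(outcomes, probabilities))

# Variance: the probability-weighted mean of squared deviations from the Expected Value.
variance = sum((x - expected_value) ** 2 * p for x, p in zip(outcomes, probabilities))

# Standard Deviation: the square root of the variance.
std_dev = math.sqrt(variance)

print("expected value:", expected_value)
print("standard deviation:", round(std_dev, 4))
```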

#### Binomial and Poisson Probability Distributions

• The intuition behind Binomial Probability Distribution:
• Think of an event with two outcomes: Success and Failure. For example, let’s say that you may succeed with 50% probability and that you may fail with 50% probability. Let’s try this event, say, ten times. Let’s assume that the probability of success and failure in each trial remains the same; i.e., it does not change even after multiple trials (no learning is involved). Under these assumptions, you may compute the probability of a given number of successes within those ten trials using the Binomial Probability Distribution. For instance, you may compute the probability that you are successful exactly 2 times during 10 trials, or that you are successful more than 2 times during 10 trials, or that you are successful fewer than 2 times during 10 trials.
• Why is this a “probability distribution”? Well, because it provides you with a list of events (e.g., more than 2 successes in 10 trials) and their associated probabilities.
• How could we compute those probabilities? In this course, we make use of its Excel function. For a detailed description, refer to slide 44.
• Tip: When computing “the probability of exactly 2 successes,” put FALSE for the last argument in the Excel function (i.e., make use of the Mass Function). To compute “the probability of 2 or fewer successes,” put TRUE as the last argument (i.e., make use of the Cumulative Function).
• The intuition behind Poisson Probability Distribution:
• Think of an event that may occur repeatedly, like car accidents. Imagine you have some historical information about that event. For instance, you know the average number of car accidents per day on I-35 between Dallas and OKC. You may make use of the Poisson Probability Distribution to compute the probability that such an event happens a particular number of times. Given the average number of car accidents per day on I-35 between Dallas and OKC, for example, you may use the Poisson Probability Distribution to compute the probability that exactly 10 accidents, or more than 10 accidents, or fewer than 10 accidents happen in a given day on this segment of interstate highway.
• Why is this a “probability distribution”? Well, because it provides you with a list of events (e.g., more than 10 accidents per day) and their associated probabilities.
• How could we compute those probabilities? In this course, we make use of its Excel function. For more, refer to slide 61.
• Tip: When computing “the probability of exactly 10 accidents per day,” put FALSE for the last argument in the Excel function (i.e., make use of the Mass Function). To compute “the probability of 10 or fewer accidents per day,” put TRUE as the last argument (i.e., make use of the Cumulative Function).

A note on Mass versus Cumulative probability distribution functions:

Probability Distribution Functions “describe” how events and their associated probabilities relate to each other. This can be done in two different ways. Mass functions provide us with the probability for a precise outcome; e.g., the probability that the annual income of a randomly selected household is exactly equal to \$100,000. Cumulative functions provide us with the probability for the outcome being less than or equal to a precise value; e.g., the probability that the annual income of a randomly selected household is less than or equal to \$100,000.

#### Excel Examples for Binomial and Poisson Probability Distributions

Excel Examples for Binomial Prob. Distribution:

A laboratory is planning to test the quality of 500 newly developed transmitters. From experience, they know that 85% of transmitters pass the quality control test.

• What is the probability that out of 500 newly developed transmitters exactly 430 transmitters pass the quality control test?
• Prob(X=430|500, 0.85), a mass probability (as opposed to cumulative probability), could be computed in Excel using: =BINOM.DIST(430,500,0.85,FALSE), which yields 0.0420 (4.2%). Note that the last argument in the above command (i.e., FALSE) implies that we employ the mass probability function (again, as opposed to the cumulative probability function).
• What is, then, the probability that out of 500 newly developed transmitters 430 or less than 430 transmitters pass the quality control test?
• Prob(X<=430|500, 0.85), a cumulative probability (as opposed to mass probability), could be computed in Excel using: =BINOM.DIST(430,500,0.85,TRUE), which yields 0.7521 (75.2%). Note that the last argument in the above command (i.e., TRUE) implies that we employ the cumulative probability function (again, as opposed to the mass probability function).
• What is the probability that out of 500 newly developed transmitters less than 430 transmitters pass the quality control test?
• Prob(X<430|500, 0.85) is equal to Prob(X<=430|500, 0.85) minus Prob(X=430|500, 0.85). And we know them both. Thus: Prob(X<430|500, 0.85)=0.7521-0.0420=0.7101 (about 71%)
• What is the probability that out of 500 newly developed transmitters 430 or more than 430 transmitters pass the quality control test?
• Prob(X>=430|500, 0.85) is equal to the sum of Prob(X=430|500, 0.85) and Prob(X>430|500, 0.85). We have already computed Prob(X=430|500, 0.85). We only need to compute Prob(X>430|500, 0.85). It is quite easy. This latter probability could be written as: 1-Prob(X<=430|500, 0.85). And, fortunately, we have already computed Prob(X<=430|500, 0.85). In short:
Prob(X>=430|.)=Prob(X=430|.)+Prob(X>430|.)
Also,
Prob(X>430|.)=1-Prob(X<=430|.)
Thus:
Prob(X>=430|.)=Prob(X=430|.)+1-Prob(X<=430|.)=0.0420+1-0.7521=0.2899 (about 29%)
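
The four transmitter probabilities above can be cross-checked outside Excel. The following is a minimal Python sketch that writes the binomial mass and cumulative functions out by hand; `binom_pmf` and `binom_cdf` are hypothetical helper names chosen for this illustration, intended to play the roles of BINOM.DIST with FALSE and TRUE as the last argument.

```python
from math import comb

def binom_pmf(k, n, p):
    """Mass function: probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """Cumulative function: probability of k or fewer successes in n trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 500, 0.85  # 500 transmitters, each passing with probability 85%

p_exactly_430 = binom_pmf(430, n, p)             # Prob(X = 430)
p_at_most_430 = binom_cdf(430, n, p)             # Prob(X <= 430)
p_fewer_than_430 = p_at_most_430 - p_exactly_430 # Prob(X < 430)
p_at_least_430 = 1 - p_fewer_than_430            # Prob(X >= 430)

for label, value in [("X = 430", p_exactly_430), ("X <= 430", p_at_most_430),
                     ("X < 430", p_fewer_than_430), ("X >= 430", p_at_least_430)]:
    print(f"Prob({label}) = {value:.4f}")
```

Note that "fewer than 430" and "430 or more" are complements, so their probabilities must add up to 1; this is a quick sanity check on any such computation.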

Excel Examples for Poisson Prob. Distribution:

Consider a rivalry between two European soccer teams. Historical data suggests that on average 1.05 goals are scored per game.

• What is the probability that in the next game exactly 2 goals are scored?
• Prob(X=2|1.05), a mass probability (as opposed to cumulative probability), could be computed in Excel using: =POISSON.DIST(2,1.05,FALSE), which yields 0.1929 (19.3%). Note that the last argument in the above command (i.e., FALSE) implies that we employ the mass probability function (again, as opposed to the cumulative probability function).
• What is, then, the probability that in the next game 2 goals or less than 2 goals are scored?
• Prob(X<=2|1.05), a cumulative probability (as opposed to mass probability), could be computed in Excel using: =POISSON.DIST(2,1.05,TRUE), which yields 0.9103 (about 91%). Note that the last argument in the above command (i.e., TRUE) implies that we employ the cumulative probability function (again, as opposed to the mass probability function).
• Given the last two examples for binomial distribution, it would be easy for you to compute the following probabilities (The idea is the same; the commands are different):
• Prob(X<2|1.05)
• Prob(X>=2|1.05)
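
The same logic carries over to the soccer example. As a rough cross-check outside Excel, the sketch below writes the Poisson mass and cumulative functions by hand in Python; `poisson_pmf` and `poisson_cdf` are hypothetical helper names, standing in for POISSON.DIST with FALSE and TRUE as the last argument.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """Mass function: probability that the event happens exactly k times."""
    return exp(-mean) * mean**k / factorial(k)

def poisson_cdf(k, mean):
    """Cumulative function: probability that the event happens k times or fewer."""
    return sum(poisson_pmf(i, mean) for i in range(k + 1))

mean_goals = 1.05  # historical average number of goals per game

p_exactly_2 = poisson_pmf(2, mean_goals)        # Prob(X = 2)
p_at_most_2 = poisson_cdf(2, mean_goals)        # Prob(X <= 2)
p_fewer_than_2 = p_at_most_2 - p_exactly_2      # Prob(X < 2)
p_at_least_2 = 1 - p_fewer_than_2               # Prob(X >= 2)

print(f"Prob(X = 2)  = {p_exactly_2:.4f}")
print(f"Prob(X <= 2) = {p_at_most_2:.4f}")
print(f"Prob(X < 2)  = {p_fewer_than_2:.4f}")
print(f"Prob(X >= 2) = {p_at_least_2:.4f}")
```

The first two printed values should agree with the Excel results quoted above; the last two follow from the same subtraction and complement steps used in the binomial example.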

#### Normal Probability Distribution

• The intuition:
• Think of a game with two outcomes. Let’s say that with the probability of 50% you will win the game and with the probability of 50% you will lose. Let’s also say that we would like to play this game repeatedly, over and over again. When we play this game 1,000 times, for instance, we can ask ourselves: what is the probability that we win the game, say, 250 times or less? Keep in mind that winning this game 250 times or less is the result of a large sum of random events. The Normal Probability Distribution is what you get when you add up a large number of random events together.
• To generate a Normal Probability Distribution and a Non-normal Probability Distribution, we may conduct an experiment.
• Let us first generate three random events: X1 is a random number between 10 and 20, X2 is a random number generated by a binomial distribution with 100 trials and 50% probability of success, and X3 is a random number generated by a Poisson distribution with historical average of 10.
• Using the above numbers, we generate two random variables: Y is the sum of X1, X2, and X3; Z is the product of X1-squared, X2, and X3.
• We may repeat this process for 10,000 times, generating 10,000 Ys and Zs.
• Given the intuition behind Normal Probability Distribution, we expect Ys to be normally distributed and Zs to be non-normal. This difference is well reflected in the histograms below.
• What is the bell-shaped curve that we always see for Normal Probability Distribution? That is called NORMAL DENSITY, which is basically a smoothed histogram for the associated probabilities. Imagine that you draw the histogram for the probability of winning the game X number of times, where X takes different values. For this graph the horizontal axis is the number of wins and the vertical axis is the associated probabilities. You can connect the top of the histograms to each other in a smooth fashion. The resulting curve is called the Normal Density.
• What is the Normal Density used for? It is used to compute the probability that you win the game X number of times or less. The area under the Normal Density and to the left of X is equal to the probability that you win the game X number of times or less. Thus, the Normal Density is used to identify Cumulative Normal Probabilities. And keep in mind that the area under the Normal Density for the entire range of X is always equal to 1.
• As illustrated in slide 78, the Normal Probability Distribution is known by:
1. its symmetric bell-shaped density
2. its mean
3. its standard deviation

The Road Ahead:

Because of a theorem called the Central Limit Theorem, we will make use of the Normal Probability Distribution frequently.
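
The experiment described earlier (adding versus multiplying X1, X2, and X3) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the in-class procedure: X1 is assumed to be a whole number between 10 and 20, the seed is arbitrary, and `draw_binomial`, `draw_poisson`, and `skewness` are hypothetical helpers written for this sketch. Skewness is used here as a simple numerical stand-in for inspecting the histograms: it is close to zero for a symmetric, bell-like distribution.

```python
import math
import random
import statistics

random.seed(2843)  # arbitrary seed so that the run is repeatable

def draw_binomial(trials, p):
    """Number of successes in `trials` independent tries."""
    return sum(1 for _ in range(trials) if random.random() < p)

def draw_poisson(mean):
    """Poisson draw via Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, random.random()
    while product > threshold:
        count += 1
        product *= random.random()
    return count

def skewness(data):
    """Average cubed Z-score: near 0 for a symmetric distribution."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return sum(((x - m) / s) ** 3 for x in data) / len(data)

ys, zs = [], []
for _ in range(10_000):
    x1 = random.randint(10, 20)     # assumed whole number between 10 and 20
    x2 = draw_binomial(100, 0.5)    # 100 trials, 50% probability of success
    x3 = draw_poisson(10)           # historical average of 10
    ys.append(x1 + x2 + x3)         # Y: a sum of random events
    zs.append(x1 ** 2 * x2 * x3)    # Z: a product of random events

skew_y = skewness(ys)
skew_z = skewness(zs)
print("skewness of Y:", round(skew_y, 3))
print("skewness of Z:", round(skew_z, 3))
```

With only three terms in the sum, Y is at best approximately normal; the point of the sketch is the contrast with Z, whose distribution has a long right tail.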

• STANDARDIZED Normal Probability Distribution:
• In practice, there are a lot of different random variables that are “normally distributed.” These variables all have a symmetric bell-shaped density. However, depending on their units and scale, they are going to have different means and standard deviations.
• Think of these two variables: height and weight. Let’s assume that these variables are normally distributed, which is likely to be the case. These variables are measured using different units (e.g., feet vs. lbs), and they also have different scales (e.g., different ranges).
• To get rid of the differences in means and standard deviations, we may transform the data by using the Z-score for each observation rather than using the observation itself. This is known as “standardization.” For the observations that are normally distributed, the Z-scores are also normally distributed. Unlike the original distribution, however, the mean and standard deviation of the standardized distribution are always equal to zero and one, respectively.
• Reminder: What is the Z-score? The Z-scores are computed by taking the difference between the value of each observation and the mean, divided (a.k.a. adjusted) by the standard deviation. They measure the deviation of each observation from the mean in terms of standard deviations; e.g., an observation with a Z-score equal to +2 is two standard deviations greater than the mean, while an observation with a Z-score equal to -2 is two standard deviations less than the mean.
• As illustrated in slide 83, the Standardized Normal Probability Distribution is known by:
1. its symmetric bell-shaped density
2. its mean, which is always equal to zero
3. its standard deviation, which is always equal to one

The most important application of Normal Probability Distribution in this course is finding the normal probabilities. It is very important to keep in mind that we can make use of Normal Density to compute the cumulative normal probabilities.

Let X be a normally distributed random variable (e.g., weight). Let a and b be some constants (e.g., a=125 lbs and b=175 lbs). We can make use of Normal Density to compute the probability that X is less than or equal to a; that X is greater than or equal to a and, at the same time, less than or equal to b (e.g., Slide 88); and that X is greater than or equal to b. Identifying these probabilities can be done either by using the Cumulative Standardized Normal Probability Distribution (which requires standardization) or by using the functions that are built into Excel (which do not require standardization).

• Click here to obtain the Cumulative Standardized Normal Probability Distribution Table. (The table is taken out of Statistical Tables for Economists, prepared by the Department of Economics at the University of Warwick.) To access the table that we used in class, you may click here. This table is a bit harder to read, but it is more detailed.
• Also, use this Excel file as a playground. It helps you identify normal probabilities faster. To download this file on your computer, go to File, then select Download as, and select Microsoft Excel (.xlsx).
• Click here for a class activity. In this activity, you first learn how to think about normal probabilities. You also learn how to use the Cumulative Standardized Normal Probability Distribution Table to identify the probability of interest. Rather than the full table, I put only a small block of the Cumulative Standardized Normal Probability Distribution Table on this exercise. This is what we usually do for the exams too.
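The three normal probabilities described above can also be sketched in a few lines of Python, using only the standard library. This is an illustration rather than the method used in class: the mean (150 lbs) and standard deviation (25 lbs) are made-up values, while a=125 and b=175 come from the text. The sketch shows both routes: working with X directly, and standardizing first.

```python
from statistics import NormalDist

# Hypothetical population of weights: mean 150 lbs, standard deviation 25 lbs.
mu, sigma = 150.0, 25.0
a, b = 125.0, 175.0  # the constants used in the text

weight = NormalDist(mu, sigma)

# Route 1: work with X directly (no standardization needed).
p_below_a = weight.cdf(a)                  # P(X <= a)
p_between = weight.cdf(b) - weight.cdf(a)  # P(a <= X <= b)
p_above_b = 1 - weight.cdf(b)              # P(X >= b)

# Route 2: standardize first, then use the standardized normal
# (mean 0, standard deviation 1), which is what the printed table reports.
standard = NormalDist()
z_a = (a - mu) / sigma  # Z-score of a
z_b = (b - mu) / sigma  # Z-score of b
p_between_via_z = standard.cdf(z_b) - standard.cdf(z_a)

print(f"Z-scores: {z_a:+.2f} and {z_b:+.2f}")
print(f"P(X <= a)      = {p_below_a:.4f}")
print(f"P(a <= X <= b) = {p_between:.4f} (via Z: {p_between_via_z:.4f})")
print(f"P(X >= b)      = {p_above_b:.4f}")
```

Both routes give the same answer, and the three probabilities add up to one, because together they cover the entire area under the Normal Density.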

### Chapter 7. Sampling Distribution

• Lecture Presentation (PDF)
• Highlights:
• The Sampling Distribution of Means is among the key concepts in Statistical Inference. An intuitive understanding of this concept will help you a lot in understanding how inference is conducted.
• Imagine a “population” in which the items are not all the same, like Oddland (slide 8). You are asked to choose a randomly selected sample and compute a mean for that very sample (e.g., average age). Let’s call the computed mean: X-bar-one. You are, then, asked to repeat the same exercise one more time: choose another random sample, compute a mean, and call the computed mean X-bar-two. In fact, you are asked to repeat this, say, a hundred times, obtaining X-bar-one, X-bar-two, X-bar-three, …, X-bar-hundred. Depending upon the items that are included in the randomly selected samples, the computed means may differ from one another. Some may, in fact, be equal, but you will generally obtain a range of values for the sample means. Slides 10 to 13 illustrate this quite well.
• Since the sample means vary with the random sample drawn, the sample mean is itself a random variable: here, a variable with a hundred values that may or may not be equal to one another.
• THE SAMPLING DISTRIBUTION OF MEANS is the distribution of sample means that are obtained from repeated sampling. Like other distributions, one may identify:
• The MEAN of the sampling distribution of means
• The STANDARD DEVIATION of the sampling distribution of means (a.k.a. Standard Error)
• What type of distribution will the sampling distribution of means, then, follow? This is an important question that is addressed in this course under two sets of assumptions.
• When the population, from which random samples are drawn, has a normal distribution, the sampling distribution of means is also normal. The mean of the sampling distribution of means in this case is equal to the population mean. The standard deviation of the sampling distribution of means in this case is equal to the population standard deviation divided by the square root of the sample size. For the formula, please refer to slide 21.
• When the population, from which random samples are drawn, is not normal, the sampling distribution of means is approximately normal provided that the sample size is large enough. The mean of the sampling distribution of means in this case is again equal to the population mean. The standard deviation of the sampling distribution of means in this case is again equal to the population standard deviation divided by the square root of the sample size. For the formula, please refer to slide 37. The CENTRAL LIMIT THEOREM implies that, as the sample size gets large enough, the Sampling Distribution of Means is approximately normally distributed. This is true regardless of the shape of the population distribution. But how large is “large enough”? As a general rule, when the sample size is larger than 30, the sampling distribution of means is approximately normal (slide 41).
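In symbols, the two properties described above can be written as follows. The notation is my own shorthand and may differ from the slides: μ and σ are the population mean and standard deviation, n is the sample size, and the X-bar subscript marks the sampling distribution of means.

```latex
\mu_{\bar{X}} = \mu,
\qquad
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}
```

The second expression is the Standard Error. Because n sits under a square root, the sample size must be four times as large to cut the Standard Error in half.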

• To illustrate the implications of Central Limit Theorem, we may conduct a short exercise using real data:
• Let us begin with this histogram, which shows the distribution of 750 midterm grades in Elements of Statistics. Since each question is worth five points in the exams, the width of each class in this histogram is set to be equal to five points.
• From the population of midterm grades (N=750):
• we choose a random sample of 50 observations with replacement,
• we compute the mean grade for the chosen sample,
• we record the obtained mean in a new data set
• and we repeat the three steps above 500,000 times, which in turn yields 500,000 mean grades that are each coming from a sample of 50 observations
• Given these recorded sample means, we can now plot the histogram for the sampling distribution of means with 500,000 observations, where each observation is the mean grade coming from a randomly selected sample of 50 grades. In this histogram, the width of each class is set to be equal to half a point.
• Despite the fact that the Population Grade Distribution does not look like a normal distribution, the Sampling Distribution of Grade Means, for which n=50, looks very much like a normal distribution. Plus, we observe that:
• the mean of grade means (=76.97) matches the population mean grade (=76.97), which illustrates that the sample mean is an unbiased estimator for the population mean
• the standard deviation of grade means (=2.40) is also much smaller than the population grade standard deviation (=16.99). In fact, the population grade standard deviation divided by the square root of the sample size (=50) is equal to the standard deviation of grade means. To confirm this, just type =16.99/SQRT(50) into any cell in Excel.
• In case you are interested, the code for the above exercise is written in Stata. If you do not have Stata installed on your computer, you may use the computer lab at the Department of Economics. Download the code (.DO) and grades (.DTA). For confidentiality reasons, no name or identification number is included in the grades data set. Thus, you should not be worried about using this data set.
• NOTE: I cover the Sampling Distribution of Proportions in my lectures only if time allows. Nevertheless, I left the slides at the end of lecture presentations for Chapter 7.
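For those who prefer Python to Stata, the repeated-sampling exercise above can be sketched as follows. This is not a translation of the Stata code and it does not use the real grades file: the population below is a made-up, deliberately skewed set of 750 grades, and the number of repetitions is cut to 10,000 to keep the run short.

```python
import random
import statistics

random.seed(2843)  # fixed seed so that the run is reproducible

# Hypothetical, non-normal "population" of 750 grades in 5-point steps.
# Higher grades get larger weights, so the distribution is skewed.
grades = list(range(0, 101, 5))
weights = [g + 5 for g in grades]
population = random.choices(grades, weights=weights, k=750)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)  # population standard deviation

n = 50                # sample size, as in the exercise
repetitions = 10_000  # the exercise in the text uses 500,000

# Draw a sample with replacement, record its mean, and repeat.
sample_means = [
    statistics.fmean(random.choices(population, k=n))
    for _ in range(repetitions)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
standard_error = pop_sd / n ** 0.5  # what the theory predicts

print(f"population mean {pop_mean:.2f} vs. mean of sample means {mean_of_means:.2f}")
print(f"sigma/sqrt(n)   {standard_error:.2f} vs. sd of sample means {sd_of_means:.2f}")
```

With a different seed or population the printed numbers change, but the two pairs should remain close to each other, which is the point of the exercise.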

## Part III – Statistical Inference

Let’s play a game, called Deadly Distribution. You will soon realize why this game has been incorporated in the lecture series. To access Deadly Distribution, go to Canvas, click on Assignments, and you will find the game under the Extra Credits. You may obtain 10 points, which may improve your midterm grades, if you accomplish all the missions. (This game is designed and developed by the OU K20 Center.)

### Chapter 8. Confidence Interval Estimation

• Lecture Presentation (PDF)
• Highlights:
• Confidence Interval Estimation is a direct application of what you learned about Normal Density (Ch. 6) and Sampling Distribution of Means (Ch. 7).
• What is Confidence Interval Estimation about? Often, we do not know much about POPULATION PARAMETERS, like population mean (e.g., the average number of newly hired employees among all firms in the US over the last year). However, we are able to select a random sample from the population (e.g., a random sample of American firms), and compute SAMPLE STATISTICS, like sample mean (e.g., the average number of newly hired employees among the selected sample of firms over the last year). Employing Confidence Interval Estimation techniques, we are able to ESTIMATE the population mean using the information obtained from the sample. Population mean and population standard deviation are often unknown PARAMETERS. Using STATISTICS such as sample mean and sample standard deviation, we are able to estimate the above-mentioned PARAMETERS. Drawing conclusions about the properties of a population using sample information is known as STATISTICAL INFERENCE.

• Confidence Interval Estimation is conducted under two sets of assumptions:
• The not-so-realistic assumption: population standard deviation is known to us.
• In this case, we make use of Normal Probability Distribution.
• The realistic assumption: population standard deviation is unknown to us.
• In this case, we make use of another probability distribution, called: Student’s t Distribution.
• Confidence Interval Estimation provides us with two, so called, limits:
• The Upper Confidence Limit, which is the point estimate (e.g., sample mean) plus the product of the critical value (determined by the level of confidence and the above assumption) and the standard error of the sampling distribution (e.g., sampling distribution of means)
• The Lower Confidence Limit, which is the point estimate (e.g., sample mean) minus the product of the critical value (determined by the level of confidence and the above assumption) and the standard error of the sampling distribution (e.g., sampling distribution of means)
• For more on the limits, refer to Slide 52 (in which we assume that the population standard deviation is known to us) and Slide 81 (in which we assume that the population standard deviation is unknown to us).
• For more on the critical values, refer to Slide 55 (in which we assume that the population standard deviation is known to us) and Slide 78 (in which we assume that the population standard deviation is unknown to us).
• To estimate the above limits, under the realistic assumption of unknown population standard deviation, you may make use of this Excel file. Again, think of it as a playground. To download this file on your computer, go to File, then select Download as, and select Microsoft Excel (.xlsx).

STUDENT’S t DISTRIBUTION is a probability distribution. It looks almost like a Normal Probability Distribution when the sample size is large (say, more than 120). As the sample size gets smaller, the right and left tails of the distribution become fatter and the peak of the distribution becomes lower. (Slide 74 offers a detailed illustration.)
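The Upper and Lower Confidence Limits described above can be written compactly as follows. The notation is my own shorthand and may differ from the slides: X-bar is the sample mean, σ is the population standard deviation, S is the sample standard deviation, n is the sample size, and 1 − α is the level of confidence.

```latex
\text{Known } \sigma: \quad \bar{X} \pm Z_{\alpha/2} \, \frac{\sigma}{\sqrt{n}}
\qquad\qquad
\text{Unknown } \sigma: \quad \bar{X} \pm t_{\alpha/2,\; n-1} \, \frac{S}{\sqrt{n}}
```

The plus sign gives the Upper Confidence Limit and the minus sign gives the Lower Confidence Limit. The only differences between the two cases are the critical value (Z versus t with n-1 degrees of freedom) and the standard deviation that enters the standard error (σ versus S).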

Sample size determines the DEGREES OF FREEDOM of the Student’s t Distribution. The degrees of freedom reflect the number of observations that can freely vary while a sample statistic (e.g., sample mean) is kept constant. To obtain a predetermined sample mean, for instance, we may only change n-1 observations in a sample of n observations. Given the predetermined mean, the nth observation depends on the other observations and cannot freely vary. In this case, therefore, the degrees of freedom are equal to n-1.
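A tiny numerical illustration of this idea, using made-up numbers: with a sample of five observations and a predetermined mean of 10, we are free to pick four values, but the fifth one is then forced.

```python
# Hypothetical sample of n = 5 observations that must average exactly 10.
target_mean = 10.0
n = 5

# We may choose any n - 1 = 4 values freely ...
free_values = [7.0, 12.5, 9.0, 11.0]

# ... but the last value is then pinned down by the mean:
# the n values must add up to n * target_mean.
last_value = n * target_mean - sum(free_values)

sample = free_values + [last_value]
degrees_of_freedom = n - 1

print(last_value)         # 10.5
print(sum(sample) / n)    # 10.0
print(degrees_of_freedom) # 4
```

Changing any of the four free values changes the forced fifth value, but never the number of values that can vary freely.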

• To understand the intuition behind the degrees of freedom better, you may refer to the explanation provided by Ding, Jin, and Shuai (2017) in the journal Teaching Statistics.
• Example: The Average Amount of Time Spent on Social Media
• We surveyed 70 students in class on how much time they spent on social media over a non-exam weekend in April. The social media platforms of interest include Facebook, YouTube, Twitter, Instagram, and LinkedIn. The result of the survey is given in this file, and the histogram looks like this.
• Given the dataset collected, we may compute a 95% confidence interval for the average time spent on social media over a weekend among the population of students in my class. Relying on sample evidence, we could be 95% confident that the average time spent on social media over a weekend is greater than 2 hours, yet it is less than 3 hours. Click here for more details.
• To estimate this interval with greater confidence, you may change the probability used for the critical value. Go ahead and plug in 1-0.99 rather than 1-0.95 to obtain the critical value associated with 99% confidence level. What would happen to the confidence interval estimation? Why? Make sure that you use a drawing to justify your answer.
• NOTE: I cover the Confidence Interval Estimation of Proportion in my lectures only if time allows. Nevertheless, I left the slides at the end of lecture presentations for Chapter 8.
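To see the mechanics of the two limits in code, here is a minimal Python sketch. Note the assumptions: it covers the not-so-realistic case in which the population standard deviation is known (Python’s standard library offers the Normal Distribution but not Student’s t), and both the ten observations and the value of sigma are made up. It is not the computation behind the class survey.

```python
from statistics import NormalDist, fmean

# Hypothetical sample: hours spent on social media over a weekend.
hours = [1.5, 3.0, 2.0, 4.5, 2.5, 1.0, 3.5, 2.0, 2.5, 3.0]
sigma = 1.0        # population standard deviation, ASSUMED to be known
confidence = 0.95  # level of confidence

n = len(hours)
point_estimate = fmean(hours)  # sample mean
alpha = 1 - confidence

# Critical value: the Z that leaves alpha/2 in each tail (about 1.96 for 95%).
critical_value = NormalDist().inv_cdf(1 - alpha / 2)

# Standard error of the sampling distribution of means.
standard_error = sigma / n ** 0.5

lower_limit = point_estimate - critical_value * standard_error
upper_limit = point_estimate + critical_value * standard_error

print(f"point estimate: {point_estimate:.2f}")
print(f"{confidence:.0%} confidence interval: {lower_limit:.2f} to {upper_limit:.2f}")
```

Under the realistic assumption of unknown population standard deviation, the structure stays the same: sigma is replaced by the sample standard deviation, and the critical value comes from Student’s t Distribution with n-1 degrees of freedom.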

### Chapter 9. Hypothesis Testing: One-sample Tests

• Lecture Presentation (PDF)
• In statistical inference, a HYPOTHESIS is always about a POPULATION PARAMETER; e.g.: population mean. Assuming that the hypothesis of interest is true (e.g., H0: population mean weight = 185 pounds), one gathers some evidence from a randomly selected sample of observations, trying to REJECT the null hypothesis (H0) and ACCEPT an alternative hypothesis (e.g., H1: population mean weight > 185 pounds or H1: population mean weight < 185 pounds).
• In Chapter 9, you learn how to test a hypothesis about one population using One-sample Tests. In Chapter 10, you learn how to test a hypothesis about two populations using Two-samples Tests.
• Keep in mind that:
• You may never accept the null hypothesis (H0), as you always begin by assuming that the null hypothesis is true.
• You may only accept the alternative hypothesis (H1) when you find enough evidence suggesting that your assumption about the null hypothesis does not hold.
• You may never be 100% sure about your conclusion. At best, you may say that, given the sample evidence, the alternative hypothesis (H1) is more likely than the null hypothesis.
• You may commit a Type I Error should you reject a true null hypothesis. It is like concluding that someone is guilty (rejecting the assumption of innocence), while she is, in fact, innocent.
• The probability of Type I Error determines the confidence that you may have in your test. The greater this probability, the lower the confidence.
• You may commit a Type II Error should you fail to reject a false null hypothesis. It is like concluding that someone is innocent (not being able to reject the assumption of innocence), while she is, in fact, guilty.
• The probability of Type II Error determines the power of your test. The greater this probability, the lower the power.
• Type I and type II errors cannot happen at the same time. The former requires the null hypothesis to be true, while the latter requires the null hypothesis to be false. We cannot have a null hypothesis which is true and false at the same time. What we focus on is the type I error and the Confidence in the test.
• Here is the recipe for One-sample Hypothesis Testing:
1. State the null hypothesis (H0) and the alternative (H1).
2. Choose the probability of committing type I error. It is conventional to choose 1%, 5%, or sometimes even 10%. Also, choose a sample size: n.
• Keep in mind that one minus the probability of committing type I error will determine the confidence that you have in your test; e.g., if the probability of type I error is equal to 5%, you have 95% confidence in your test.
• The sample size will affect the standard deviation of the sampling distribution. Remember that the greater the sample size, the lower the standard deviation of sampling distribution of means.
3. Determine the TEST STATISTIC.
• If you know the population standard deviation (which is quite unlikely), then use the Z-stat. (The formula is given in Slide 68)
• If you don’t know the population standard deviation (which is very likely), then use the t-stat. (The formula is given in Slide 114)
4. Collect the data and compute the value of the chosen test statistic, given the formula. In the formula:
• X-bar is the sample mean, Mu is the population mean under the null hypothesis (the hypothesized population mean if you wish), and n is the sample size.
• If you have access to population standard deviation (Sigma), you may use that value in the Z-stat.
• If you do not have access to the population standard deviation, which is often the case, then use sample standard deviation (S) in the t-stat.
5. Given the probability of committing type I error, either use CRITICAL VALUES or P-VALUE to draw a conclusion.
• Critical Value Approach:
• Given the null hypothesis, critical values associated with the probability of committing type I error will divide the sampling distribution of means into two areas: Rejection and No-rejection. Take a look at two examples:
• Slide 70: Rejection and No-rejection area in two-tails test, where strict equality is used in the null hypothesis (e.g., H0: population mean weight is equal to 185 pounds)
• Slide 133: Rejection and No-rejection area in one-tail test, where an inequality is used in the null hypothesis (e.g., H0: population mean weight is greater than or equal to 185 pounds)
• Reject the null hypothesis (H0) if the test statistic is in the Rejection area (e.g., Slide 81 for two-tails test and Slide 136 for one-tail test).
• Do not reject the null hypothesis (H0) if the test statistic is in the No-rejection area (e.g., Slide 122).
• P-value Approach:
• Given the sampling distribution, the p-value is the probability of obtaining the observed test statistic or anything more extreme; e.g., Slide 90.
• If the p-value is lower than the probability of committing type I error (e.g., p-value < 5%), then you may safely reject the null hypothesis: When the p-value is low, the null must go.
• If the p-value is greater than the probability of committing type I error (e.g., p-value > 5%), then you may not reject the null hypothesis.
6. Don’t forget to explain the conclusion in the context of the problem. Use plain English!

The p-value is a key concept in hypothesis testing, and it is quite easy to work with. The p-value is the probability of the evidence obtained (or anything more extreme), assuming that the null is true. If the evidence obtained is highly unlikely (e.g., p-value < 5%), then there should be something wrong with our assumption that the null hypothesis is true. This may lead us to reject the null hypothesis.

Statistics computer packages often report the p-value associated with a hypothesis test. When you come across them, you should always keep in mind that: when the p-value is low, the null must go.
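The recipe above can be traced in a short Python sketch. To stay within the standard library, it uses the Z-stat, i.e., the unlikely case of a known population standard deviation. All the numbers are made up for illustration; only the hypothesized mean of 185 pounds comes from the earlier example.

```python
from statistics import NormalDist

# Step 1: H0: population mean weight = 185 vs. H1: population mean weight != 185.
mu_0 = 185.0   # hypothesized population mean

# Step 2: probability of type I error and sample size.
alpha = 0.05
n = 64

# Steps 3 and 4: Z-stat, given hypothetical data.
sigma = 20.0   # population standard deviation, ASSUMED to be known
x_bar = 190.5  # hypothetical sample mean
z_stat = (x_bar - mu_0) / (sigma / n ** 0.5)

# Step 5: two-tail p-value, i.e., the probability of a Z-stat at least
# this extreme (in either direction) if the null hypothesis is true.
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
reject_null = p_value < alpha

# Step 6: state the conclusion in plain English.
print(f"Z-stat = {z_stat:.2f}, p-value = {p_value:.4f}")
if reject_null:
    print("The p-value is low, so the null must go: the mean weight differs from 185.")
else:
    print("Not enough evidence to reject the claim that the mean weight is 185.")
```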

• You may use Excel to conduct one-sample hypothesis testing.
• If you know the population standard deviation (which is unlikely), then go to Slides 103 – 106 to learn how to make use of Z-stat, critical values, and p-value.
• If you do not know the population standard deviation (which is more likely), then go to Slides 123 – 126 to learn how to make use of t-stat, critical values, and p-values.
• There are three useful Excel functions to identify the p-value associated with a null hypothesis. To illustrate them, we are going to use a random sample of the grand total of the home game attendance for a given MLB team during a given season between 1990 and 2010. We test for three sets of hypotheses using Excel. The hypotheses are listed below, the p-value functions are given, and detailed computations are conducted in this file. To download this file on your computer, go to File, then select Download as, and select Microsoft Excel (.xlsx).
• H0: The Average Attendance Per Season = 2 million fans vs. H1: The Average Attendance Per Season ~= 2 million fans (Note: ~= is used for not equal to)
• The p-value function: =T.DIST.2T(t-stat,df). Note: for this case, the absolute value of the t-stat should be entered; e.g.: for t-stat of -1.18, you must enter 1.18.
• H0: The Average Attendance Per Season <= 2 million fans vs. H1: The Average Attendance Per Season > 2 million fans (Note: <= is used for less than or equal to)
• The p-value function: =T.DIST.RT(t-stat,df). Reminder: The rejection area of this test is on the right tail, which is why we compute the p-value on the right tail (RT).
• H0: The Average Attendance Per Season >= 2 million fans vs. H1: The Average Attendance Per Season < 2 million fans (Note: >= is used for greater than or equal to)
• The p-value function: =T.DIST(t-stat,df,true). Reminder: The rejection area of this test is on the left tail, which is why we compute the p-value on the left tail.
• The resulting p-values, computed in the first three tabs of this file, suggest that the average attendance per season is greater than 2 million fans. To estimate the magnitude, we may employ a Confidence Interval Estimation (CIE), which suggests that we could be 95% confident that between 1990 and 2010 on average more than 2.16 million fans but less than 2.59 million fans attended the MLB games each season.
• Hypothesis Testing Summary (pages 1 and 2 are relevant for this chapter).
• NOTE: I cover the Hypothesis Test of Proportion in my lectures only if time allows. Nevertheless, I left the slides at the end of lecture presentations for Chapter 9.
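For the curious, the three Excel p-value functions above can be mimicked in plain Python. This is a rough sketch and not how Excel computes them: since the standard library has no Student’s t Distribution, the area under the t density is approximated here with Simpson’s rule. The function names are my own, and df = 20 is a made-up value; only the t-stat of -1.18 comes from the text.

```python
import math

def t_pdf(x, df):
    """Density of Student's t Distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t_stat, df, steps=2000):
    """P(T <= t_stat), using Simpson's rule between 0 and |t_stat|."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t_stat|)
    return 0.5 + area if t_stat >= 0 else 0.5 - area

def p_two_tail(t_stat, df):
    """Counterpart of =T.DIST.2T(ABS(t-stat), df)."""
    return 2 * (1 - t_cdf(abs(t_stat), df))

def p_right_tail(t_stat, df):
    """Counterpart of =T.DIST.RT(t-stat, df)."""
    return 1 - t_cdf(t_stat, df)

def p_left_tail(t_stat, df):
    """Counterpart of =T.DIST(t-stat, df, TRUE)."""
    return t_cdf(t_stat, df)

t_stat, df = -1.18, 20  # df = 20 is hypothetical
print(f"two-tail p-value:   {p_two_tail(t_stat, df):.4f}")
print(f"right-tail p-value: {p_right_tail(t_stat, df):.4f}")
print(f"left-tail p-value:  {p_left_tail(t_stat, df):.4f}")
```

Notice that the left-tail and right-tail p-values always add up to one, and that the two-tail p-value is twice the smaller of the two. This is why the same t-stat can lead to different conclusions under different pairs of hypotheses.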

### Chapter 10. Hypothesis Testing: Two-sample Tests

• Lecture Presentation (PDF)
• Using Two-samples Tests, one may test:
• How the MEANS of two INDEPENDENT populations relate to each other; e.g.: H0: Average Productivity of Exporting Firms = Average Productivity of Domestic Firms
• How the MEANS of two RELATED populations relate to each other; e.g.: H0: Average Productivity of Exporters Who Randomly Receive Subsidies = Average Productivity of Similar Exporting Firms With No Subsidy
• How the VARIANCES of two INDEPENDENT populations relate to each other; e.g.: H0: Variance of Sales Among Exporting Firms = Variance of Sales Among Domestic Firms
• Note: It is conventional to test the difference between the means in the null hypothesis; e.g., H0: Average Productivity of Exporting Firms – Average Productivity of Domestic Firms = 0 (For more, see Slide 10). It is also conventional to test the ratio of the variances in the null hypothesis: H0: Variance of Sales Among Exporting Firms divided by the Variance of Sales Among Domestic Firms = 1 (For more, see Slide 51).
• Like One-sample Tests, we intend to reject the null hypothesis (H0) using the appropriate test statistics by:
• Comparing the value of test statistics to the critical value(s), as given by the sampling distribution
• Comparing the p-value associated with the test statistics (i.e., the probability of the obtained test statistics or anything more extreme) to conventional values such as 1%, 5%, or even 10%.
• The decision upon rejecting the null hypothesis (H0) in a Two-samples Test is quite similar to the decision upon rejection in a One-sample Test.
• Testing the means of two independent populations is done under two sets of assumptions:
• Assumption 1: The unknown variances of the independent populations are equal
• Under this assumption one may employ a pooled-variance t test. (For the formula for this particular variance, refer to Slide 13)
• The t-stat in this case is the difference between the difference in sample means and the difference in hypothesized population means, divided by the standard error of the difference in means: the square root of the pooled variance multiplied by the sum of the reciprocals of the two sample sizes. (For the formula for this particular t-stat, refer to Slide 14)
• You may easily derive the confidence interval for the difference in population means using the pooled-variance. Refer to Slide 16 for more details.
• Assumption 2: The unknown variances of the independent populations are not equal
• Under this assumption one may employ a separate-variance t test. The separate variance is the sum of the sample variances, each divided by its sample size.
• The t-stat in this case is the difference between the difference in sample means and the difference in hypothesized population means, divided by the square root of the separate-variance. (For the formula for this particular t-stat, refer to Slide 26)
• The above t-stat has a particular degree of freedom, as given in Slide 27.
• Testing the means of two related populations is based on the difference between the paired values (Slides 35 and 36), which is why this is called a Paired Difference Test.
• The following offers a step-by-step instruction.
• Step 1.) Using the two samples, compute the difference between the paired values for each observation. The paired differences become your new sample.
• Step 2.) Compute the sample mean for the difference between the paired values.
• Step 3.) Compute the sample standard deviation for the difference between the paired values.
• Step 4.) Form the t-test as given by Slide 40 (or alternatively form the confidence interval as given by Slide 42).
• Step 5.) Compare the t-stat to the critical values, given the significance of your test. Alternatively, compare the associated p-value to the conventional probabilities of type I error.
• Testing the variance of two independent populations is done using:
• F-stat
• The F-stat is simply the ratio of sample variances, putting the larger sample variance in the numerator and the smaller one in the denominator
• The F-stat has two degrees of freedom. The first one is the sample size minus one, for the sample with the larger variance. The second one is the sample size minus one, for the sample with the smaller variance.
• F-distribution
• Assuming that the populations of interest are normally distributed, the sampling distribution of the ratio of variances follows an F-distribution
• Given the probability of type I error, one may identify the critical value for Rejection and No-rejection areas.
• See Slide 58 for an illustration of the above areas.
• Use Excel if you would like to identify the critical value (Slide 55)
• Hypothesis Testing Summary (pages 3 and 4 are relevant for this chapter).
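The test statistics discussed in this chapter can be sketched in Python as follows. This is only an illustration: the function names and all the data are made up, and the first function covers the equal-variance case (Assumption 1) only. Each function returns the test statistic together with its degree(s) of freedom; critical values and p-values would still have to be looked up.

```python
from statistics import fmean, stdev, variance

def pooled_t_stat(sample_1, sample_2, hypothesized_diff=0.0):
    """Pooled-variance t-stat for two independent samples; df = n1 + n2 - 2."""
    n1, n2 = len(sample_1), len(sample_2)
    pooled_var = ((n1 - 1) * variance(sample_1)
                  + (n2 - 1) * variance(sample_2)) / (n1 + n2 - 2)
    std_error = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    diff = fmean(sample_1) - fmean(sample_2)
    return (diff - hypothesized_diff) / std_error, n1 + n2 - 2

def paired_t_stat(first, second, hypothesized_diff=0.0):
    """Paired Difference t-stat (Steps 1 to 4 above); df = n - 1."""
    diffs = [b - a for a, b in zip(first, second)]  # Step 1
    n = len(diffs)
    mean_diff = fmean(diffs)                        # Step 2
    sd_diff = stdev(diffs)                          # Step 3
    return (mean_diff - hypothesized_diff) / (sd_diff / n ** 0.5), n - 1

def f_stat(sample_1, sample_2):
    """Ratio of sample variances, larger over smaller, with both df."""
    pairs = sorted(
        [(variance(sample_1), len(sample_1)), (variance(sample_2), len(sample_2))],
        reverse=True,
    )
    (larger, n_larger), (smaller, n_smaller) = pairs
    return larger / smaller, n_larger - 1, n_smaller - 1

# Hypothetical productivity figures for two independent samples of firms.
exporters = [12, 15, 11, 14, 13]
domestic = [10, 8, 13, 11, 8]
print(pooled_t_stat(exporters, domestic))
print(f_stat(exporters, domestic))

# Hypothetical paired values: the same four firms, without and with a subsidy.
without_subsidy = [10, 12, 9, 11]
with_subsidy = [12, 13, 12, 12]
print(paired_t_stat(without_subsidy, with_subsidy))
```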

### Chapter 11. Analysis of Variance

• Lecture Presentation (PDF)
• In one sample tests, we focus only on one population parameter. In two sample tests, we focus on two population parameters. In Analysis of Variance (ANOVA), we focus on three or more population parameters.
• Given the scope of this lecture series, we only examine One-way ANOVA, which relates to completely randomized designs that incorporate only one factor into the analysis.
• Example: Ceteris paribus (i.e., all else unchanged), how much of a factor is the golf club brand in determining the distance traveled?
• Running an experiment, one observes the total variation, which could be measured by Total Sum of Squares (SST). This measure is defined as the sum of the squared differences between each observation and the grand mean (i.e., the mean of all data values). Refer to slide 23 for an illustration and to Slide 27 for the formula. The total variation could, then, be partitioned into two sets of variations:
• The variations that are due to differences among groups: Sum of Squares Among Groups (SSA)
• The SSA variations are generated by the factor of analysis. (Illustration: Slide 24; Formula: Slide 29)
• Example: differences in distance traveled, caused only by the choice of golf club brand.
• The variations that are due to differences within groups: Sum of Squares Within Groups (SSW)
• The SSW variations are generated by random factors that could potentially affect the outcome but over which we have no control. (Illustration: Slide 25; Formula: Slide 32)
• Example: difference in distance traveled, caused by a sudden change in wind’s direction.
• The above measures of variation could then be divided by their degree of freedom to obtain something like variance.
• Degrees of Freedom and Mean Squares:
• For SSA, the degree of freedom is the number of groups, determined by the factor of interest, minus one. For instance, if we study three different brands of golf club, then the degree of freedom for SSA is two.
• The Mean Squares for SSA is, therefore, equal to: SSA divided by the above degree of freedom (Slide 34). We call this MSA, which measures the average of variations caused by the factor of interest.
• For SSW, the degree of freedom is the number of observations minus the number of groups. (The reasoning behind this degree of freedom is described fully in Slide 37)
• The Mean Square for SSW is, therefore, equal to: SSW divided by the above degree of freedom (Slide 34). We call this MSW, which measures the average of variations caused by random things that we have no control over.
• To conduct One-way ANOVA, we perform an F test, where the F-statistic is the ratio of MSA over MSW and the degrees of freedom are as described above. Refer to Slide 41 for the formal set-up. Though One-way ANOVA examines variations by employing the mean of squares among groups (MSA) and the mean of squares within groups (MSW), the purpose of One-way ANOVA is to reach conclusions about possible differences among the group means. In a sense, we use a ratio of two measures of sample variance to say something about population means.
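The partitioning of the total variation and the resulting F-statistic can be traced with a small Python sketch. The distances below are made up for three hypothetical golf club brands, and the variable names are my own; the sketch stops at the F-statistic and does not look up the critical value or the p-value.

```python
from statistics import fmean

# Hypothetical distances traveled (in yards) for three golf club brands.
groups = {
    "Brand 1": [250, 255, 260],
    "Brand 2": [240, 245, 250],
    "Brand 3": [260, 265, 270],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = fmean(all_values)  # the mean of all data values
n = len(all_values)             # number of observations
c = len(groups)                 # number of groups

# Total variation: squared differences between each observation and the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_values)

# Among-group variation: squared differences between group means and the grand mean,
# weighted by the group sizes.
ssa = sum(len(v) * (fmean(v) - grand_mean) ** 2 for v in groups.values())

# Within-group variation: squared differences between observations and their group mean.
ssw = sum((x - fmean(v)) ** 2 for v in groups.values() for x in v)

msa = ssa / (c - 1)  # Mean Squares Among Groups
msw = ssw / (n - c)  # Mean Squares Within Groups
f_statistic = msa / msw

print(f"SST = {sst:.0f}, SSA = {ssa:.0f}, SSW = {ssw:.0f}")
print(f"MSA = {msa:.0f}, MSW = {msw:.0f}, F = {f_statistic:.0f}")
```

In this made-up example SSA and SSW add up to SST, as they always should, and the among-group variation is large relative to the within-group variation, which is what a large F-statistic reflects.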