## Learning Outcomes

If you work through this section you should be able to:

• Understand the key definition and usefulness of statistics.
• Understand the different types of data.
• Understand the two different statistical hypotheses.
• Understand statistical significance.

Statistics is the discipline concerned with applying scientific methods to the collection, analysis, interpretation and presentation of numeric data.

Statistics can be used to interpret large amounts of data, especially when such data tend to behave in a regular, predictable manner.

The rest of the section will look at different types of statistics and variables, statistical hypotheses and significance, and basic descriptive statistics, particularly measuring the centre and dispersion of a distribution.


There are two types of statistical techniques: descriptive statistics and inferential statistics.

Descriptive statistics are used to describe the basic features of the data in a study. They simply describe what the data is or what the data shows, and often involve methods of organising and summarising information from data.

For instance, descriptive statistics can be used to present a summary of the frequency of individual values or ranges of values for a variable, such as a distribution of school students by year, or a distribution of income values of a company. Descriptive statistics can also be used to display the central tendency and dispersion of a data set.
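As a minimal sketch of these ideas, the Python snippet below builds a frequency distribution and computes simple measures of central tendency and dispersion using only the standard library. The data values and variable names are hypothetical and chosen purely for illustration.

```python
from collections import Counter
import statistics

# Hypothetical data: the school year of 10 students and their test scores.
years = [7, 7, 8, 8, 8, 9, 9, 9, 9, 10]
scores = [41, 52, 57, 60, 63, 65, 70, 72, 78, 92]

# Frequency distribution: how many students are in each school year.
year_counts = Counter(years)

# Central tendency: where the "middle" of the scores lies.
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)

# Dispersion: how spread out the scores are.
score_range = max(scores) - min(scores)
stdev_score = statistics.stdev(scores)  # sample standard deviation

print("Students per year:", dict(year_counts))
print("Mean:", mean_score, "Median:", median_score)
print("Range:", score_range, "Standard deviation:", round(stdev_score, 2))
```

Nothing here goes beyond the ten students themselves: the figures summarise the data at hand and make no claim about any wider population.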

Inferential statistics are used to reach conclusions that extend beyond the immediate data alone. They involve methods of using information from a sample to draw conclusions about the population.

For instance, inferential statistics can be used to compare the average academic performance of children in two or more schools. Inferential statistics can also be used to estimate the proportion of defective items from a production line based on the proportion of faulty items in a sample taken from the line.
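The production-line example can be sketched in Python as follows. The function name `proportion_estimate`, the sample figures, and the choice of a normal-approximation (Wald) 95% confidence interval are all assumptions made for illustration; other interval methods exist and may be preferable for small samples or very small proportions.

```python
import math

def proportion_estimate(defective, sample_size, z=1.96):
    """Estimate a population proportion from a sample.

    Returns the sample proportion and an approximate confidence interval
    based on the normal approximation (z=1.96 corresponds to about 95%).
    """
    p_hat = defective / sample_size
    standard_error = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    lower = max(0.0, p_hat - z * standard_error)
    upper = min(1.0, p_hat + z * standard_error)
    return p_hat, lower, upper

# Hypothetical sample: 12 faulty items found among 200 inspected items.
p_hat, lower, upper = proportion_estimate(12, 200)
print(f"Estimated defective proportion: {p_hat:.3f}")
print(f"Approximate 95% confidence interval: ({lower:.3f}, {upper:.3f})")
```

The interval expresses the uncertainty that comes from using a sample to say something about the whole production line.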

In research, a variable is a characteristic or attribute of an object that can take different values, called sub-values, from one object to another.

The colour of a car could be a variable, and its sub-values could be red, blue, green, etc. Students’ results could be a variable, and its sub-values could be 41, 52, 57, etc.

There are three main types of variables: nominal, ordinal, and scale variables.

Nominal variables are variables which have two or more categories with no intrinsic order, such as gender, nationality, ethnicity, language and types of property.

Numbers may be used to represent these categories, but it would be meaningless to use these numbers in any arithmetic way.

Ordinal variables are variables that have two or more ordered or ranked categories. For example, in terms of people’s satisfaction level with a product, the categories could be ‘very satisfied’, ‘satisfied’, ‘no opinion’, ‘dissatisfied’, and ‘very dissatisfied’. In terms of age groups, the categories could be ‘<18’, ‘18-25’, ‘26-35’, ‘36-45’, and ‘>45’.

Scale variables are variables that take a numerical value over a continuous range, such as height, body mass, blood pressure and temperature.

If you use SPSS to analyse your data, you need to ensure that you select the right type of variable so that you can run the appropriate analysis in SPSS.
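The practical consequence of the three variable types can be illustrated outside SPSS as well. The Python sketch below uses hypothetical data to show which kind of summary is meaningful for each type: a mode for a nominal variable, a median category for an ordinal variable, and a mean for a scale variable.

```python
import statistics

# Hypothetical responses for each variable type.
car_colours = ["red", "blue", "red", "green", "red"]      # nominal
satisfaction = ["satisfied", "very satisfied", "no opinion",
                "satisfied", "dissatisfied"]                # ordinal
heights_cm = [162.5, 170.0, 158.2, 181.4, 175.3]           # scale

# Nominal: categories have no order, so only counting (e.g. the mode) makes sense.
most_common_colour = statistics.mode(car_colours)

# Ordinal: categories are ranked, so a median category is meaningful.
order = ["very dissatisfied", "dissatisfied", "no opinion",
         "satisfied", "very satisfied"]
ranks = sorted(order.index(answer) for answer in satisfaction)
median_category = order[ranks[len(ranks) // 2]]  # middle of an odd-length list

# Scale: values are numeric, so arithmetic such as the mean is meaningful.
mean_height = statistics.mean(heights_cm)

print("Most common colour:", most_common_colour)
print("Median satisfaction:", median_category)
print("Mean height (cm):", round(mean_height, 2))
```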

There are two types of statistical hypotheses.

The null hypothesis (H0) is usually the hypothesis that the observed relationships between two or more variables (e.g. association or difference) in a sample result purely from chance.

The alternative hypothesis (H1) states the opposite, which is the hypothesis that the observed relationships in a sample are influenced by some non-random factors.

#### Examples

1. To determine whether a new maths teaching method would improve students’ academic performance:
• H0: The new teaching method would have no impact on students’ maths results;
• H1: The new teaching method would improve students’ maths results significantly.
2. To test the association between qualification level and salary level:
• H0: There is no association between qualification level and salary level;
• H1: There is an association between qualification level and salary level.

The statistical significance of a result indicates how likely it is that the observed relationship between two or more variables (e.g. association or difference) in a sample occurred by pure chance rather than because of some non-random factors.

The p-value is used to determine the statistical significance of a result. It is a number between 0 and 1: the probability of obtaining a result at least as extreme as the one observed if the null hypothesis (H0) were true.

Different studies may use different p-value cutoff points for a decision. In the majority of studies, an alpha of 0.05 is used as the cutoff for significance. Other cutoff points such as 0.01 or 0.001 may also be used.

When using the standard 0.05 cutoff, a p-value less than or equal to 0.05 indicates sufficiently strong evidence against the null hypothesis (H0), so we reject the null hypothesis (H0) in favour of the alternative hypothesis (H1). A p-value larger than 0.05 indicates weak evidence against the null hypothesis (H0), so we fail to reject the null hypothesis (H0). Failing to reject H0 does not prove that H0 is true; it only means the sample does not provide enough evidence against it.
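The decision rule can be written as a small Python function. The function name `decide` and its return labels are hypothetical; the default cutoff of 0.05 follows the text, and the label "fail to reject H0" is used for the case where the p-value is larger than the cutoff.

```python
def decide(p_value, alpha=0.05):
    """Apply the significance cutoff (alpha) to a p-value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

print(decide(0.02))              # uses the standard 0.05 cutoff
print(decide(0.25))
print(decide(0.02, alpha=0.01))  # a stricter cutoff changes the decision
```

The last call shows that the same p-value can lead to a different decision under a stricter cutoff, which is why the cutoff should be chosen before the data are analysed.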

#### Examples

1. To determine whether a new maths teaching method would improve students’ academic performance, you randomly select some students and divide them into two groups. You apply the original maths teaching method to one group of students and the new maths teaching method to the other group. After one month you ask all students to take the same test and record their test scores.
2. The statistical hypotheses are:
• H0: There is no statistically significant difference between the means of the two groups’ scores;
• H1: There is a statistically significant difference between the means of the two groups’ scores.

If you run the test scores through the hypothesis test and your p-value turns out to be 0.02, it means that, if the null hypothesis were true, there would be a probability of 0.02 of observing a difference at least as large as the one in your sample. Using 0.05 as the cutoff point, you reject the null hypothesis and conclude that there is a statistically significant difference between the means of the two groups’ scores; if the group taught with the new method scored higher, this suggests that the new maths teaching method could help improve students’ maths scores.

If your p-value turns out to be 0.25, which is larger than 0.05, you fail to reject the null hypothesis and conclude that there is no statistically significant difference between the means of the two groups’ scores. In other words, the data do not provide sufficient evidence that the new maths teaching method improves students’ maths scores.
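To make the teaching-method example concrete, the sketch below computes a p-value for the difference between two group means using a permutation test written with the Python standard library. This is one possible technique chosen for illustration, not necessarily the test intended by this section (an independent-samples t-test is a common alternative). The scores, the function name `permutation_p_value` and its parameters are all hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Repeatedly shuffles the pooled scores into two new groups and returns
    the fraction of shuffles whose mean difference is at least as large
    as the observed one, i.e. an estimate of the p-value under H0.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        difference = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if difference >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations

# Hypothetical test scores after one month.
original_method = [55, 61, 58, 64, 60, 57, 63, 59]
new_method = [62, 66, 70, 65, 68, 61, 72, 67]

p_value = permutation_p_value(original_method, new_method)
decision = "reject H0" if p_value <= 0.05 else "fail to reject H0"
print("p-value:", p_value, "->", decision)
```

The p-value here is estimated by asking how often a random regrouping of the same scores produces a difference as large as the observed one, which mirrors the idea of a result occurring by pure chance.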

## Activity

You can download a version of this Introduction to statistics activity in Word format: