Statistics - Foundation - Basic Terminology
This article is part of the Statistics - 101 series. You can access the full series here:
- Foundation
  - Basic terminology (you are here!)
- Descriptive Statistics
- Inferential Statistics
  - Probability and distribution
  - Hypothesis testing
  - Estimation
  - Regression
Welcome to the first article of the Statistics 101 series. After reading this article, you will know:
- What exactly is statistics?
- How many types of data are there, and what are their levels of measurement?
- How a sample differs from a population
- Variables and types of variables
This is the start of the 101 series on Statistics, a very important field in Data Science. Statistics is a very broad field and, as you may know, it has proven so useful and widely applicable that it is treated as a discipline in its own right, separate from Mathematics.
Learning Statistics is not easy. Not at all! But once you grasp its ideas, your perception of the world and of every problem will be sharpened, allowing you to draw more concrete conclusions. So prepare yourself for that.
There will be times when you get lost or bored as you build up your knowledge. But if you truly understand why you should learn it and what it can bring you, I can guarantee one thing: you will fall in love with Statistics.
- 1. What is statistics?
- 2. Different types of data and their level of measurement
- 3. Sample vs Population
- 4. Variable and different types of variable
- 5. Last words
1. What is statistics?
First let’s take the image below as an example.
Looking at that picture, can you answer the following questions?
- How many people are there in that picture?
- How many are adults and how many are teenagers?
These questions lie in the field of descriptive statistics.
Descriptive statistics provides us with tools (tables, graphs, averages, ranges, correlations) for organizing and summarizing the inevitable variability in collections of actual observations or scores.
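As a minimal sketch of these tools, the snippet below answers the two questions above for a made-up list of ages. The ages and the teenager cutoff (13 to 19) are assumptions chosen purely for illustration; they do not come from the picture.

```python
from collections import Counter
from statistics import mean

# Hypothetical ages of the people in a picture (made-up values for illustration).
ages = [15, 16, 17, 19, 21, 22, 25, 30]

# Count: how many people are there?
n_people = len(ages)

# Frequency table: how many are teenagers (assumed here to be 13 to 19)
# and how many are adults?
groups = Counter("teenager" if 13 <= age <= 19 else "adult" for age in ages)

# An average and a range summarize the variability in the observations.
average_age = mean(ages)
age_range = max(ages) - min(ages)

print(n_people)      # 8
print(dict(groups))  # {'teenager': 4, 'adult': 4}
print(average_age)   # 20.625
print(age_range)     # 15
```

Everything computed here describes only the observations at hand; nothing is claimed about people who are not in the list.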
There is also another field of statistics, known as inferential statistics, which helps generalize insights extracted from a set of actual observations.
Inferential statistics provides tools (a variety of tests and estimates) for generalizing beyond collections of actual observations.
Notice the terms actual observations and generalize. They will be discussed further in part 3.
To take an example of inferential statistics from the same picture, assume that these students come from the same university and all have a GPA of 3.6 or higher. A question that would lie in the field of inferential statistics would be: does every student in that university achieve a GPA of at least 3.6?
2. Different types of data and their level of measurement
Any statistical analysis is performed on data, a collection of actual observations or scores in a survey or an experiment. In statistics, there are 3 types of data and 3 corresponding levels of measurement, which are:
- Qualitative data - Nominal measurement
Qualitative data is a set of observations where any single observation is a word, letter, or numerical code that represents a class or category. For example, Yes/No or 1/0.
Level of measurement: Nominal. Property: classification.
Since the code is just a representation of the underlying attribute, arithmetic operations would be inappropriate here. If male is encoded as 2 and female as 1, we cannot say that males have twice as much gender as females.
For qualitative data, a number or word only represents the class that value belongs to. No arithmetic operators can be applied here.
- Ranked data - Ordinal measurement
Ranked data is a set of observations where any single observation is a number that indicates relative standing. For example, 1st or 2nd place.
Level of measurement: Ordinal. Property: order.
Since ranks do not reflect the actual distance between adjacent ranks, arithmetic operations on ranks are also inappropriate. For example, we cannot say that the speed of the horse ranked 2nd is the average of the speeds of the horses ranked 1st and 3rd.
For ranked data, a number only represents the rank of that value in a set. Actual distance between adjacent ranks cannot be known. No arithmetic operators can be applied here.
- Quantitative data - Interval/Ratio measurement
Quantitative data is a set of observations where any single observation is a number that represents an amount or a count. For example, a weight of 60kg or a height of 170cm.
Level of measurement: Interval/Ratio. Property: Equal interval & a true zero
Equal interval means that equal steps between values represent equal amounts anywhere on the scale. For example, the difference between 150cm and 160cm is the same as the difference between 90cm and 100cm. Thus, it is appropriate to describe a number as a certain amount greater than another.
For quantitative data, we can describe a number as a certain amount greater than another.
A true zero signifies that, for example, the bathroom scale registers 0 when not in use—that is, when weight is completely absent. Since the bathroom scale possesses a true zero, numerical readings reflect the total amount of a person’s weight, and it’s appropriate to describe one person’s weight as a certain ratio of another’s. It can be said that the weight of a 140-lb person is twice that of a 70-lb person.
For quantitative data, if there exists a true zero, we can describe a number as a certain ratio of another.
NOTE: a true zero may be absent in some cases, for example, temperature on the Fahrenheit scale. Specifically, a reading of 0 on this scale does not reflect the complete absence of heat. In fact, true zero (absolute zero) equals −459.67°F on this scale.
In such cases, we cannot conclude that 80°F is twice as hot as 40°F. Measured from true zero, 80°F is 539.67 degrees (80 + 459.67) and 40°F is 499.67 degrees (40 + 459.67), and 539.67 is clearly not twice 499.67.
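The three levels of measurement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the survey answers, horse names, and speeds are made up, and the temperature part uses the standard value of absolute zero, −459.67°F.

```python
from collections import Counter

# Nominal: codes are labels only, so count categories instead of averaging them.
answers = ["Yes", "No", "Yes", "Yes", "No"]  # made-up survey answers
counts = Counter(answers)
most_common_answer = counts.most_common(1)[0][0]

# Ordinal: ranks keep the order but hide the distance between adjacent ranks.
# Made-up horse speeds in km/h; the ranks 1, 2, 3 say nothing about the gaps.
speeds = {"Comet": 62.0, "Blaze": 61.5, "Dusty": 50.0}
ranking = sorted(speeds, key=speeds.get, reverse=True)
gap_1st_2nd = speeds[ranking[0]] - speeds[ranking[1]]
gap_2nd_3rd = speeds[ranking[1]] - speeds[ranking[2]]

# Interval: equal steps represent equal amounts, so differences are meaningful.
same_difference = (160 - 150) == (100 - 90)

# Ratio: weight has a true zero, so ratios are meaningful.
weight_ratio = 140 / 70

# Fahrenheit has no true zero, so the naive ratio 80 / 40 = 2 is misleading.
# Measuring from absolute zero gives a meaningful ratio instead.
ABSOLUTE_ZERO_F = -459.67


def above_absolute_zero(temp_f):
    """Degrees above absolute zero for a Fahrenheit reading."""
    return temp_f - ABSOLUTE_ZERO_F


temperature_ratio = above_absolute_zero(80) / above_absolute_zero(40)

print(most_common_answer)           # Yes
print(ranking)                      # ['Comet', 'Blaze', 'Dusty']
print(gap_1st_2nd, gap_2nd_3rd)     # 0.5 11.5
print(same_difference)              # True
print(weight_ratio)                 # 2.0
print(round(temperature_ratio, 2))  # 1.08
```

In this toy data the horse ranked 2nd is only 0.5 km/h behind the winner but 11.5 km/h ahead of the horse ranked 3rd, which is why averaging ranks would mislead. Likewise, 80°F turns out to be only about 1.08 times as far from true zero as 40°F, not twice.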
3. Sample vs Population
The terms generalize and collections of actual observations pretty much sum up the definition of sample and population in statistics.
In statistics:
- A population = complete collection of observations or potential observations
- A sample = smaller collection of actual observations drawn from a population.
Whether a collection of observations is a population or a sample needs to be assessed on a case-by-case basis.
For example, the weights reported by 53 male students in a class can be seen as a population when you are concerned about exceeding the load-bearing capacity of an excursion boat (chartered by the 53 students to celebrate successfully completing their stat class!). The same weights can be seen as a sample from a population when you wish to generalize to the weights of all male statistics students or all male college students.
One important feature of a good sample is that it must represent the population; otherwise, any generalization might be erroneous.
A sample must represent the population
How to obtain a sample that represents the population will be discussed in later articles.
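To see why representativeness matters, here is a small sketch that compares a simple random sample with a deliberately unrepresentative one. The population of 53 weights is generated artificially, and simple random sampling is just one common way to aim for a representative sample; it is an assumption of this sketch, not necessarily the method the later articles will use.

```python
import random
from statistics import mean

# Hypothetical population: weights (kg) of 53 students, generated for illustration.
rng = random.Random(42)
population = [round(rng.gauss(70, 10), 1) for _ in range(53)]

# A simple random sample: every student has the same chance of being picked.
random_sample = rng.sample(population, 10)

# A non-representative sample: only the 10 heaviest students.
biased_sample = sorted(population)[-10:]

print(f"population mean:    {mean(population):.1f}")
print(f"random sample mean: {mean(random_sample):.1f}")
print(f"biased sample mean: {mean(biased_sample):.1f}")
```

The biased sample always overestimates the population mean here, because it was built from the heaviest students only. The random sample can still miss by chance, but it has no built-in tendency to miss in one direction.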
4. Variable and different types of variable
A variable is any characteristic, number, or quantity that can be measured or counted. Age and gender are examples of variables. It is called a variable because its value may vary between data units in a population and may change over time.
There are 3 types of variables:
- Independent variable
The name says it all: a variable (often denoted by x) whose variation does not depend on that of another. For example, whether or not a person takes up training.
- Dependent variable
When a variable is believed to have been influenced by the independent variable, it is called a dependent variable. For example, the life span of a person.
- Confounding variable
Whenever groups differ not just because of the independent variable but also because some uncontrolled variable co-varies with the independent variable, that uncontrolled variable is called a confounding variable.
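The toy example below sketches how a confounding variable can create an apparent effect. The records are entirely made up: training is the independent variable, life span is the dependent variable, and smoking is a hypothetical confounder. By construction, life span in this data depends only on smoking, yet most people who train happen to be non-smokers.

```python
from statistics import mean

# Made-up records: (takes_training, smokes, life_span_years).
# In this toy data, life span depends only on smoking, not on training.
people = [
    (True, False, 80), (True, False, 80), (True, False, 80), (True, True, 70),
    (False, True, 70), (False, True, 70), (False, True, 70), (False, False, 80),
]


def mean_life_span(records, training, smokes=None):
    """Average life span for a training group, optionally within one smoking group."""
    return mean(
        life
        for trains, smoker, life in records
        if trains == training and (smokes is None or smoker == smokes)
    )


# Naive comparison: the groups differ, which looks like a training effect.
naive_gap = mean_life_span(people, True) - mean_life_span(people, False)

# Holding the confounder fixed: the gap disappears in both smoking groups.
gap_nonsmokers = (
    mean_life_span(people, True, smokes=False)
    - mean_life_span(people, False, smokes=False)
)
gap_smokers = (
    mean_life_span(people, True, smokes=True)
    - mean_life_span(people, False, smokes=True)
)

print(naive_gap)       # 5.0
print(gap_nonsmokers)  # 0
print(gap_smokers)     # 0
```

The 5-year gap between the training groups comes entirely from smoking co-varying with training. Comparing like with like (smokers with smokers, non-smokers with non-smokers) removes it.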
As an exercise, try to identify each type of variable in the following situation: weight loss among obese males who choose to participate either in a weight-loss program or a self-esteem enhancement program.
5. Last words
Now that we have gone through the basic terminology of statistics, you will be more comfortable on the road ahead towards in-depth statistics.
In the next article, we will talk about different types of study, and how to efficiently conduct a statistical study in real life. Stay tuned!
REFERENCE: Robert S. Witte, John S. Witte. Statistics. Wiley, 2016.