# Introduction

Let’s start from a basic definition: ANOVA stands for “Analysis of Variance”, and it’s a test used to assess the statistical equality (or lack thereof) of the means of different groups.

Note as well that this test is typically used when you have at least 3 different groups (the means of which you intend to compare); with only 2 groups, a t-test does the same job.

As the name itself partly suggests, it does so by analysing the relationship between the VARIANCE BETWEEN GROUPS (the larger it is, the more likely it is that the group means are different) and the VARIANCE WITHIN GROUPS (the larger it is, the less likely it is that the group means are different).

This is clearly visible in the formula used to run our ANOVA:

F = (variance between groups) / (variance within groups)

The larger the value of F, the more likely it is that the groups have different means (which typically results in rejecting our null hypothesis, H0).
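Written out in full, the two “variances” in this ratio are mean squares, that is, sums of squares divided by their degrees of freedom. The notation below is introduced here only for illustration: k is the number of groups, N the total number of observations, n_i the size of group i, x̄_i the mean of group i and x̄ the overall mean.

```latex
F = \frac{\mathrm{MS}_{\text{between}}}{\mathrm{MS}_{\text{within}}}
  = \frac{\displaystyle \sum_{i=1}^{k} n_i \,(\bar{x}_i - \bar{x})^2 \,/\, (k - 1)}
         {\displaystyle \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 \,/\, (N - k)}
```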

This can be intuitively understood with a simple practical example.

Imagine having 3 groups (A, B and C) of 10 people each, sampled from 3 different university classes, and measuring everyone’s height.

Imagine that the mean heights are, respectively:
– 187 cm (A)
– 180 cm (B)
– 173 cm (C)

You would naturally be inclined to think that the means really are different, and are not just the product of some sampling randomness.

Why so? Because the differences among the sample means (A, B and C) are quite large in relative terms.
In other words: the variance between the groups is large.

At the same time, before stating that such means are indeed different, you might want to assess what goes on within each of the groups.
You probably want to see how the individual heights are distributed.
The more tightly they are concentrated around some value, the more you’ll agree that each sample is representative of the population it comes from (in this case, each university class).

If instead you notice that group A is skewed because it includes a semi-professional basketball player, and group C is skewed because it includes a successful jockey, you might be tempted to say that, after all, the means of the populations are not necessarily so different.

In other words, you would be more cautious about judging the group means as different, because of the larger variance within (certain) groups.

This logic is summarized in the measure used in an ANOVA test to determine whether the means of different groups are indeed different or not, which is (again):

F = (variance between groups) / (variance within groups)

This ratio makes sense of such differences using a statistical approach, which also takes into account how many groups there are and how big they are (through the use of the “degrees of freedom”).
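To make the ratio concrete, here is a minimal sketch that computes F by hand and checks it against R’s built-in aov(). The data are simulated: the group means come from the height example, while the standard deviation of 7 cm and all variable names are assumptions made purely for illustration.

```r
set.seed(42)  # simulated data, for illustration only

# Hypothetical heights (cm): 3 groups of 10 people, with the means used in the
# example above and an assumed standard deviation of 7 cm within each group.
heights <- data.frame(
  group  = factor(rep(c("A", "B", "C"), each = 10)),
  height = c(rnorm(10, 187, 7), rnorm(10, 180, 7), rnorm(10, 173, 7))
)

k <- nlevels(heights$group)  # number of groups
N <- nrow(heights)           # total number of observations

grand_mean  <- mean(heights$height)
group_means <- tapply(heights$height, heights$group, mean)
group_sizes <- tapply(heights$height, heights$group, length)

# Between-group sum of squares: how far the group means sit from the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)

# Within-group sum of squares: how far each person sits from their own group mean
ss_within <- sum((heights$height - ave(heights$height, heights$group))^2)

# Each sum of squares is divided by its degrees of freedom before taking the ratio
f_by_hand <- (ss_between / (k - 1)) / (ss_within / (N - k))
p_by_hand <- pf(f_by_hand, k - 1, N - k, lower.tail = FALSE)

# Cross-check against R's built-in one-way ANOVA
f_from_aov <- summary(aov(height ~ group, data = heights))[[1]][["F value"]][1]

print(c(by_hand = f_by_hand, from_aov = f_from_aov, p_value = p_by_hand))
```

Increasing the assumed standard deviation inflates the denominator and shrinks F, which is exactly the basketball-player-and-jockey effect described above.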

#### One-way ANOVA

The most basic form of this test is the one we are addressing in this post: the one-way ANOVA.

Using the term “one-way” we simply mean that the test is run on a single independent variable, or factor (in the height example seen before, that would be the university class, while the height is the measured outcome).

Let’s run a simple example using the R statistical software, to see how it works.

First, we load the data from a webpage (note that this uses the “data.table” library).

Then we assign names to the 2 columns of the dataset, for clarity:

Having run the View() function, we can see that “Location” takes 6 distinct values, and that there are 12,858 data points.
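As a rough sketch of the load-and-rename step described above (not the exact code behind this post): the inline text below stands in for the remote file, the rows are made up, and the column name “Vertebrae” is an assumption. Base R’s read.table() is used instead of data.table so that the sketch runs without extra packages.

```r
# Made-up rows standing in for the remote file: a location code, then a count.
raw_text <- "
1 52
1 53
2 51
2 54
3 52
"

# read.table() also accepts a URL or file path in place of the text argument
vertebrae <- read.table(text = raw_text, header = FALSE)

# Assign readable names to the 2 columns ("Vertebrae" is an assumed name)
names(vertebrae) <- c("Location", "Vertebrae")

# Treat the location code as a grouping factor rather than as a number
vertebrae$Location <- factor(vertebrae$Location)

str(vertebrae)
```

Converting the location code to a factor matters later on: the ANOVA functions need to treat it as a set of groups, not as a numeric predictor.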

For general interest, we know from the references of the dataset that the “Location” values indicate the following:

We can then run a boxplot() function to assess how the number of vertebrae is distributed, in each location:

Which gives us the following plot:

The boxplot allows us to see at a glance details such as:
– 54 vertebrae is a value found only in locations 2, 3 and 5
– 49 vertebrae is a value found only in locations 2 and 4
– There is a prevalence of 52 in all locations, except location 1

In any case, this plot doesn’t show us the number of samples for each location, nor does it allow us to decide whether the means can be considered equal from a statistical point of view.
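The group sizes that the boxplot leaves out are easy to get with table() and tapply(). The sketch below uses simulated data as a stand-in for the real dataset: the sample sizes, the distribution and the column name “Vertebrae” are all assumptions for illustration.

```r
set.seed(1)  # simulated data, for illustration only

# Hypothetical stand-in for the dataset: 6 locations with unequal sample sizes
sizes <- c(50, 120, 80, 200, 60, 90)
sim <- data.frame(
  Location  = factor(rep(1:6, times = sizes)),
  Vertebrae = round(rnorm(sum(sizes), mean = 52, sd = 1))
)

# Number of observations per location (what the boxplot does not show)
n_per_location <- table(sim$Location)

# Mean and standard deviation per location
mean_per_location <- tapply(sim$Vertebrae, sim$Location, mean)
sd_per_location   <- tapply(sim$Vertebrae, sim$Location, sd)

print(n_per_location)
print(round(mean_per_location, 2))
print(round(sd_per_location, 2))

# The boxplot itself; with varwidth = TRUE the box widths are proportional
# to the square root of each group's sample size
boxplot(Vertebrae ~ Location, data = sim, varwidth = TRUE,
        xlab = "Location", ylab = "Number of vertebrae")
```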

#### ANOVA test

Now that we have a fair idea about the nature of our data, we can finally run our ANOVA test.

To do so, we have 2 options:

1. Take a relaxed approach to the equality of variances among groups. In this case, you can use the oneway.test() function, which by default does not assume equal variances (it applies Welch’s correction).

2. First verify whether the variances are equal among the groups using the Levene Test, and only then use the aov() function, if the test gives no evidence that the variances differ.
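The two options can be sketched side by side as follows. The data are simulated, with deliberately unequal spreads, and the names d, g and y are made up for illustration.

```r
set.seed(7)  # simulated data, for illustration only

# Hypothetical data: 3 groups with different means and clearly unequal spread
d <- data.frame(
  g = factor(rep(c("a", "b", "c"), each = 30)),
  y = c(rnorm(30, 10, 1), rnorm(30, 11, 3), rnorm(30, 12, 6))
)

# Option 1: Welch's one-way test. var.equal = FALSE is the default,
# so equal variances are not assumed.
welch <- oneway.test(y ~ g, data = d, var.equal = FALSE)

# Option 2: classical one-way ANOVA, which assumes equal variances
classic <- aov(y ~ g, data = d)

print(welch)
print(summary(classic))

# With var.equal = TRUE, oneway.test() reproduces the classical F statistic
pooled    <- oneway.test(y ~ g, data = d, var.equal = TRUE)
classic_f <- summary(classic)[[1]][["F value"]][1]
print(c(pooled = unname(pooled$statistic), classic = classic_f))
```

The last lines show that the two functions differ only in the equal-variance assumption: once that assumption is switched on, they agree.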

Let’s try this 2nd, stricter approach:

Levene Test:

Which gives us the following result:

The p-value of the Levene Test tells us that the variances of the 6 groups definitely cannot be considered equal.
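In R this test is often run through a package function such as leveneTest() from the “car” package. To show what the test actually does without extra packages, here is a hand-rolled sketch on simulated data: the median-centred Levene Test is simply a one-way ANOVA on each observation’s absolute deviation from its own group median. The data and variable names are assumptions for illustration, not the dataset used in this post.

```r
set.seed(3)  # simulated data, for illustration only

# Hypothetical data: 3 groups with the same mean but different spread
d <- data.frame(
  g = factor(rep(c("a", "b", "c"), each = 40)),
  y = c(rnorm(40, 52, 0.5), rnorm(40, 52, 1.5), rnorm(40, 52, 3))
)

# Absolute deviation of each observation from its own group's median
abs_dev <- abs(d$y - ave(d$y, d$g, FUN = median))

# Levene's test (median-centred variant) is a one-way ANOVA on those deviations
levene <- anova(lm(abs_dev ~ d$g))

levene_f <- levene[["F value"]][1]
levene_p <- levene[["Pr(>F)"]][1]

# A small p-value is evidence against equal variances across the groups
print(levene)
```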

We then need to proceed with the more lenient version of the ANOVA test, using the oneway.test() function:

Which gives us the following:

Given the minuscule p-value (0.0003717), the means of the different groups CAN be considered NOT EQUAL.

We definitely have enough support to reject the null hypothesis.

As a last word, please note that the ANOVA test only tells us whether the means of at least 2 groups are different, but falls short of indicating which groups actually have different means.

There are several post-hoc tests able to assess which pairs of groups have different means, a common one being the “Tukey test”.
Unfortunately, this test assumes equality of variance among groups, hence it cannot be used in our case, given our previous findings.
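For reference, this is roughly what the Tukey test looks like in R when equal variances are plausible. The sketch uses simulated groups with the same spread (so it does not apply to the vertebrae data above), and all names are made up for illustration.

```r
set.seed(11)  # simulated data, for illustration only

# Hypothetical data: 3 groups with equal spread; only group "c" has a shifted mean
d <- data.frame(
  g = factor(rep(c("a", "b", "c"), each = 30)),
  y = c(rnorm(30, 10, 1), rnorm(30, 10, 1), rnorm(30, 13, 1))
)

# TukeyHSD() works on a fitted aov model
fit   <- aov(y ~ g, data = d)
tukey <- TukeyHSD(fit, conf.level = 0.95)

# One row per pair of groups: difference in means, confidence bounds, adjusted p-value
print(tukey$g)
```

Each row compares one pair of groups, which is the information the overall ANOVA result cannot give.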