On 7th September, during a phenomenal fall of migrants on the North Norfolk coast, my dad encountered a very monochromatic “yellow wagtail”* *taxon out in Burnham Overy Dunes. While he didn’t hear it call, his impressions in the field (and indeed the photos) were highly suggestive of Eastern Yellow Wagtail *Motacilla tschutschensis*.

However, the previous earliest accepted record of Eastern Yellow Wagtail is from Cemlyn Bay, Anglesey, on 25th September 2019, with an additional record from Salthouse, Norfolk, on 23rd September 2023 reported last year and awaiting acceptance by BBRC. If this bird were an Eastern Yellow Wagtail, it would be a good 16 days since the earliest record on the books.

However, just because it was very early, it doesn’t necessarily mean we should discount Eastern Yellow Wagtail immediately. Recent August records of conventional Siberian autumn vagrants, such as Steppe Grey Shrike *Lanius excubitor pallidirostris* and Siberian Stonechat *Saxicola maurus* have shown that the unexpected can and does happen on occasion.

Therefore, I wanted to figure out just how unexpected it would be for an Eastern Yellow Wagtail to turn up on 7th September. Fortunately, using some university-level statistics easily explained with a quick Google, I was able to answer this very question! Read on for a rundown of how I got there…

### Probability Distributions

Here is a graph of Eastern Yellow Wagtail records by the day of the year that they were first found. The dataset used comprises 31 records accepted by BBRC, as well as 12 records substantiated by photos or sound recordings that have not yet been accepted by BBRC, mostly from 2023.

For the non-mathematically inclined amongst you, a given dependent variable (i.e. number of Eastern Yellow Wagtails recorded) will always follow some form of pattern when plotted against an independent variable (i.e. day of the year). This pattern is called a **distribution**. In essence, it’s how the records of Eastern Yellow Wagtail are distributed across the days of the year.

One major grouping of distributions that we have to ascertain pretty quickly is whether it is parametric or non-parametric. A parametric distribution is also known as the **normal **distribution and is the famous “bell-shaped curve” that governs many continuous patterns in humans (e.g. height). The reason we need to determine if the distribution is normal or not is because many of the calculations we need to do rely on an assumption that a distribution is either normal or not.

While the Central Limit Theroem dictates that normality can be assumed for sample sizes over 30 (and ours is 43), you can see that the Eastern Yellow Wagtail records do not match the normal distribution whatsoever just by looking at it. There is also an empirical test we can do called the **Shapiro-Wilk test**. Upon executing it, lo and behold it fails (W_{43} = 0.94286, p = 0.03288). The executive (and slightly oversimplified) summary of this test is that there is a ~3.3% chance that the data comes from a normal distribution. Most statisticians set a threshold of 5% to consider a distribution to be normal, so we must use tests that assume a non-normal distribution.

Incidentally, the main reason for this is almost certainly due to a phenomenon called **skewness **which is essentially a fancy way of saying the data is concentrated at one end of the X axis or the other. Carrying out a test for this confirms this statement (skewness = ~0.50, normal distribution has skewness of 0). There is also some **kurtosis** in the data; this is a metric which essentially measures the amount and extremity of any outliers in the data. A normal distribution has a kurtosis of 3, and the Eastern Yellow Wagtail records have a kurtosis of 2.372, so the outliers are slightly less extreme and/or fewer in number than a normal distribution.

So, if we don’t have a normal distribution, what distribution do we have? Well, we can get a computer to figure that out for us!

**Discrete Distributions**

Modelling count data like this is typically done with one of two discrete distribution models: the **Poisson distribution** and the **negative binomial distribution**. Both work better in different contexts, but given we’re unsure which one will work best, we’re going to fit both models to the data and compare their performance.

The key metric we use for this is a statistic called **Akaike’s Information Criterion (AIC)**. This metric essentially measures how well a model fits the data compared to other models fit on the same dataset. A smaller AIC score means the model performed better on the data. In this instance, the AIC score for the Poisson model was **224.28**, while the score for the negative binomial model was slightly lower at **219.8** so we want to use the negative binomial distribution model in our future analysis.

Given we are looking for the probability that an Eastern Yellow Wagtail will occur on a given day, we need to use a function called the **probability mass function**. However, as the data involves counts, we need to find the probability mass function values for the counts + 1 in order to find the probability of a new bird occurring on a given day.

After calculating the probability mass function values, we find that the probability that the count of Eastern Yellow Wagtails on 7th September increases by 1 is **8.11% **(3 significant figures). A preliminary check of the model fit through a **Chi-squared goodness of fit test** also indicated it fit the data very well (X^{2}_{365} = 269.45, p = 0.99; i.e. in 1% of cases the observed counts of Eastern Yellow Wagtail would not follow a negative binomial distribution).

So, we have a distribution that seems to fit the data very well, and we’ve computed a probability based on a good-fitting model. Case closed, right?

Wrong. A quick look at the graph showing the actual counts of Eastern Yellow Wagtails versus the predictions of the model calls the extent of the goodness-of-fit into question.

It seems that during the key period we’re examining, the autumn, the model fits the data far worse than the Chi-squared test suggests! Indeed, carrying out the Chi-squared test solely on dates within the range of Eastern Yellow Wagtail records (i.e. 23rd September onwards) does indeed indicate that the model doesn’t fit whatsoever (X^{2}_{101} = 220, p = 7.92 x 10^{11}). Therefore, the predicted probabilities and counts from the model are unreliable and probably inaccurate.

So, is there a way to improve the reliablity of our model?

### Continuous Distributions

Previously we’ve assumed that each day of the year is a discrete point. However, we can also model the days of the year as a continuous variable. When modelling in this way, it turns out we have the very aptly named **skew-normal distribution** for this data – a broadly normal-shaped graph but with an additional skewness parameter (also known as the gamma parameter). Fitting the distribution to the data therefore gives us the **probability density function**.

### Probability Density vs Absolute Probability

You’ll note the y axis above uses the term **probability density**. This is because the graph *doesn’t *represent **absolute probabilities** (i.e. There is an X% chance for Eastern Yellow Wagtail to occur on this date, like in the discrete distributions), but it represents the probabilities of an Eastern Yellow Wagtail occurring on a given day *relative to the other dates* *in the function* (i.e. An Eastern Yellow Wagtail is X% more likely to occur on Y date than Z date).

When computing this, the number is astronomically low – **1 x 10 ^{-16}**!

But, how do we interpret this? The best way is to compare probability density values across different days of the year. To illustrate this, using an optimization algorithm, I calculated the maximum probability density of the distribution, and its corresponding day of the year. Following this, we can state that the maximum probability density is **0.0172** (3 significant figures, occurring around the 22nd or 23rd September); therefore, the relative likelihood of an Eastern Yellow Wagtail occurring on the 7th September vs the 22nd September is simply the two values divided by each other. In this instance the number is enormous (into the many billions), but in other cases it won’t be anywhere near as large as that.

What if we want to work with absolute probabilities? Given the probability density function is continuous, this is impossible, as all single days technically have a probability of zero due to the inherent inability to pinpoint a precise value in a continuous space; however, its integral, the **cumulative density function**, allows us to *estimate* absolute probabilities. The sheer mention of calculus will no doubt have prompted some readers to click off this article, but for those of you who are still here, the essential summary of the process is to calculate the definite integral value between increments of 0.5 surrounding the specific day of interest – in our case, 249.5 and 250.5. This is why I used “estimate” – this is a workaround of assuming the increments of 0.5 comprise an entire “day” in linear time. While an obviously flimsy premise, it is a useful approximation with some mathematical merit.

Doing this for day 250 again yields a probability **less than 1 x 10 ^{16}**, meaning with this methodology we can say that the estimated probability of Eastern Yellow Wagtail occurring on the 7th September in the British Isles is

**less than 1 x 10**, or about 1 in 100 trillion,

^{14}%**based on prior records**.

Incidentally, we can use the same method to calculate the most likely day for an Eastern Yellow Wagtail to occur in the British Isles, again using the optimisation algorithm from earlier. The most likely day for an Eastern Yellow Wagtail to occur based on previous records is the 23rd September at **1.72%***, or about 1 in 58.

**While the values may look the same as the one accrued from using the probability density function, they are actually subtly different at >5 or 6 significant figures. Because the number of records involved in this case study is so small, the values appear the same due to the low probability density values; however, this does not apply for larger sample sizes, hence the explanation of the full methodology here.*

We do also need to check the goodness-of-fit of this distribution model. While visually it looks quite close (even on the days not shown on the graph, due to the rapid deterioration to 0 on the left tail), we can check this empirically using a test called a **two-sample Anderson-Darling test**. This test allows us to compare whether the distribution of the Eastern Yellow Wagtail records matches the distribution of a sample of numbers of the same length predicted using the model. Carrying out this test showed that the distribution matched the data very closely (A^{2 }= 68.434, p = 0.996; i.e. there is a **99.6% probability** that the Eastern Yellow Wagtail records follow the same distribution as the one computed by the model), so we have relatively strong confidence that the model is reflective of Eastern Yellow Wagtail records in the UK.

So now we have two competing approaches: a discrete approach, which has a more accurate interpretation but poor predictive power; and a continuous approach, with very high predictive power but disingenuous interpretations. The logical next step is to ask if there are any methods that can reconcile the flaws of either of these models.

### Mitigations of Continuous Models

**Definite Integrals of Multiple Days**

While ascertaining the probability of a bird occurring on a specific day is not possible with continuous probability density functions, we can ascertain the probability of it occurring over a **range of days** and remove a significant amount of the dissonance between the mathematical properties of the model and its biological applications.

Let’s say for instance we wanted to find the probability of an Eastern Yellow Wagtail occurring **before the 20th September**. For this, all we would need to do is calculate the definite integral of the probability density function between days 1 and 263. When doing this, we return a value of around **1.12 x 10 ^{12}%**, or about 1 in a trillion – not quite as ridiculous as day 250 on its own, but still very low nonetheless.

We can also figure out which week of the year is best for certain species by adapting the optimisation algorithm from earlier. For Eastern Yellow Wagtail, this is the week between the 21st and 27th September, at a whopping **9.14%**.

#### Chebyshev’s Inequality

A safer approach may perhaps be to assume that the distribution is unknown, in which instance we can use a proof called **Chebyshev’s inequality**. This theorem dictates that a certain percentage of values in a distribution, regardless of its shape, will fall within certain bounds. In practical terms, this means that the upper bound of the probability of an Eastern Yellow Wagtail occurring before or after a certain date can be found.

When applying Chebyshev’s inequality to the **20th September**, we can state that an Eastern Yellow Wagtail occurring before this date is no likelier than **46.3%**. Not a particularly useful conclusion, but certainly the most accurate and reliable.

**C**onclusions and Evaluations

While some issues have been discussed, there are several others which affect the reliability of this methodology:

**The scalar quality of “days of the year” (i.e. day 367 doesn’t equal day 1 in our data).**Processing of circular data is a common problem for researchers in the sciences. Methods have been developed to create an assumption of circularity in data used in models, but the trouble with the scenario here is that not all years are created equal (as discussed below), so many methods will be inappropriate. In most cases, setting a different date to January 1st will mitigate most issues, but some cases will still see some model distortion, especially with birds that occur year-round.**Leap years.**An enormous headache. While we could express days as a proportion of 365.2422 (the actual length of a solar year, and thus the time scale which governs our seasons), we would likely need to pick an hour point within said day with which to calculate the proportions involved (e.g. 12:00). Consequently we would need to set the value of all records for that day to that hour. While we lose biological realism plus or minus a few hours, we eliminate the issue of different day–of-year numbers referring to different dates across different years. It also provides a convenient way of mitigating a third problem, the**intra-daily detection assumption**(i.e. the hour of detection is assumed to be the hour of arrival, when clearly it often isn’t). However, there is no way to get around…**Inter-daily detection****assumption**(i.e. the day of detection is assumed to be the day of arrival). Any attempt to extrapolate arrival dates from detection dates is going to be pure guesswork, and so we can only go with the data that we have, and accept that the flaw will impact our studies in some cases. However, it stands to reason that the margins involved in well-watched hotspots, where the majority of records come from, are likely to not be especially large in some cases, and the larger variations between detection dates and arrival dates should only affect a small proportion of birds in less well-watched areas.**Sampling bias.**The BBRC’s acceptance criteria of Eastern Yellow Wagtail are strict, and birds can (currently) only be accepted with a sonogram or DNA analysis as supporting evidence. This means that we are likely not going to be recording the full spectrum of Eastern Yellow Wagtails, especially given the identification criteria have only recently become known to British birders. While obviously impacting the sample size (and thus the model’s reliability), it also reduces the realism of the model that’s built. This limits the extent to which we can generalise the interpretations of this model.

It goes without saying that using probabilities **should not replace field identification**. Nothing is a certainty in statistics, and this method is nowhere near as reliable at identifying birds as the old-fashioned way (i.e. field observations, DNA). However, it absolutely can be used to help narrow down likely possibilities.

In any case, these methods provide more advanced ways of “guessing” (for want of a better phrase) the probability of a certain species occurring in a certain time window. Because fitting any of these models is reliant on high sample sizes, and many species do not occur in sufficient volumes to maximise the efficiency of the models, a lot of the exact interpretation of the results garnered will not be reflective of the “real” answers. However, they do let us answer the original question: how weird is this record **compared to previous records** (as long as the distribution fit to the record data in question is good).

The applications of this methodology are broad. Some ideas that will be explored on the Species Pages of OrniStats at later dates will include:

- Concerning birds that couldn’t be aged in the field (e.g. waders), assigning probabilities of them being certain age groups using binomial distributions.
- Ascertaining the likelihood of which of two closely related species is occurring, in cases where the identity of a bird in the field cannot be narrowed down to two taxa (e.g. Bonelli’s Warblers, Subalpine Warblers)

While not explored in this post, we can also refine the likelihood estimations by incorporating covariates such as weather, location and age of the birds involved into the models. A future project outlining this will be published here.

I hope you all enjoyed the first installation of the OrniStats articles. Stay tuned for more explorations like this – there’s lots of great ideas to come!

Pingback: Case Study 1: Applying probability distribution modelling to two early vagrants to the British Isles in autumn 2024 - OrniStats