the Berlin Marathon

Last Sunday (16 September 2018), Eliud Kipchoge shattered the marathon world record in Berlin with a time of 2:01:39, more than a minute faster than the previous record (Dennis Kimetto's 2:02:57). Every news article I read about Eliud's record mentioned that he ran a negative split, meaning that the second half of his race was faster than his first. I remember reading once that "world records are run with negative splits". Let's see if we can check this claim!

I started by looking for some useful data and quickly found the results archive page of the Berlin Marathon. Apart from each runner's time at the finish line, this website also lists the times at 5 km intervals (starting from the 1999 edition), which is exactly what I was looking for. Since 1999, seven of the nine men's marathon world records have been set in Berlin ^[1], making this dataset even more interesting. I put together a text file of the winners' times and splits, as well as some basic information on each winner.

The figure below shows the 5 km splits for all the Berlin marathon winners between 1999 and 2018. Hover over or click on the little information boxes below to highlight that particular run in the figure. All the code for the visualizations can be found on GitHub.

Looking at this figure, I was surprised to see how evenly paced most runs are, although it does seem like the world record runs might indeed be run with a faster second half. They definitely don't show the slowing down at the end that you can see in, for example, Gebrselassie's 2009 victory.

To check the world-records-and-negative-splits claim, I created a slope chart (below left) and a dot plot (below right). The slope chart shows the time of the first half of the race compared to the second half. It's clear that most (but not all) world record runs have a faster second half, whereas most (but, again, not all) non-world-record runs have a slower second half. The dot plot shows the difference between the first and second half of each race, and comparing the two groups confirms that world records do indeed tend to have a negative split (Welch's t-test, p = 0.0262)^[2].
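
As a rough illustration of the quantity plotted in the dot plot, here is a minimal sketch of how the difference between the two halves could be computed from a halfway time and a finish time. The function names toSeconds and splitDifference are hypothetical and are not taken from the code on GitHub. The halfway time of 1:01:06 used in the example is the widely reported split for Kipchoge's 2018 run, not a value taken from the text file mentioned above.

```javascript
// Hypothetical helper: convert an "h:mm:ss" string to a number of seconds.
function toSeconds(time) {
	var parts = time.split(":").map(Number);
	return parts[0] * 3600 + parts[1] * 60 + parts[2];
}

// Hypothetical helper: second half minus first half, in seconds.
// A negative result means the second half was faster (a negative split).
function splitDifference(halfTime, finishTime) {
	var firstHalf = toSeconds(halfTime);
	var secondHalf = toSeconds(finishTime) - firstHalf;
	return secondHalf - firstHalf;
}

// Kipchoge, Berlin 2018: assumed halfway time 1:01:06, finish time 2:01:39.
var diff = splitDifference("1:01:06", "2:01:39");
console.log(diff); // -33
```

With these inputs the second half comes out 33 seconds faster than the first, which is what makes the run a negative split.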

So if you're a middle- or long-distance runner and you're looking to improve your personal record, aim for those negative splits!

[1] This Wikipedia article gives a nice historical overview of the marathon world record.

[2] For another project I'm working on (mexpress.be) I wrote a bunch of statistical functions in JavaScript, including one to perform Welch's t-test. To keep things (relatively) simple, I calculate the p value using a numerical approximation developed by Abramowitz and Stegun (Handbook of Mathematical Functions With Formulas, Graphs, and Mathematical Tables, 1970). All the functions can be found on GitHub, but here are the two most relevant ones:

						
var tTest = function(x, y) {
	// Perform a Welch's t-test.

	// Remove missing values.
	x = x.filter(function(a) {
		return a !== null && a !== undefined;
	});
	y = y.filter(function(a) {
		return a !== null && a !== undefined;
	});
	var nx = x.length;
	var ny = y.length;
	if (nx >= 3 && ny >= 3) {
		var xMean = mean(x);
		var yMean = mean(y);
		var xVar = variance(x);
		var yVar = variance(y);
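		// Welch's t statistic: difference in means over the unpooled standard error.
		var t = (xMean - yMean) / (Math.sqrt(xVar / nx + yVar / ny));
		// Use |t| because tDistribution returns a two-sided p value.
		t = Math.abs(t);
		var df = degreesOfFreedom(x, y);
		var p = tDistribution(df, t);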
		return p;
	} else {
		return NaN;
	}
};

var tDistribution = function(df, t) {
	// Calculate a p value based on:
	// - a number of degrees of freedom
	// - a t value
	// - the t distribution.
	// The p value is calculated using a numerical approximation:
	// Abramowitz, M and Stegun, I. A. (1970), Handbook of Mathematical
	// Functions With Formulas, Graphs, and Mathematical Tables, NBS Applied
	// Mathematics Series 55, National Bureau of Standards, Washington, DC.
	// p 932: function 26.2.19
	// p 949: function 26.7.8
	var a1 = 0.049867347;
	var a2 = 0.0211410061;
	var a3 = 0.0032776263;
	var a4 = 0.0000380036;
	var a5 = 0.0000488906;
	var a6 = 0.000005383;
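	// Map t to an approximately standard normal deviate x (function 26.7.8).
	var x = t * (1 - 1 / (4 * df)) / Math.sqrt(1 + Math.pow(t, 2) / (2 * df));
	// Two-sided p value, 2 * (1 - P(x)), with P(x) approximated by function 26.2.19.
	var p = 2 * (1 / (2 * Math.pow(1 + a1 * x + a2 * Math.pow(x, 2) + a3 * Math.pow(x, 3) +
		a4 * Math.pow(x, 4) + a5 * Math.pow(x, 5) + a6 * Math.pow(x, 6), 16)));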
	return p;
};
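
The tTest function above relies on three helpers (mean, variance and degreesOfFreedom) that are not shown here. The sketch below is my guess at what they might look like, not a copy of the versions on GitHub: it assumes the unbiased sample variance (dividing by n - 1) and the Welch-Satterthwaite approximation for the degrees of freedom, which is the usual choice for Welch's t-test.

```javascript
// Assumed helper: arithmetic mean of an array of numbers.
var mean = function(x) {
	var sum = 0;
	for (var i = 0; i < x.length; i++) {
		sum += x[i];
	}
	return sum / x.length;
};

// Assumed helper: unbiased sample variance (divides by n - 1).
var variance = function(x) {
	var m = mean(x);
	var sumOfSquares = 0;
	for (var i = 0; i < x.length; i++) {
		sumOfSquares += Math.pow(x[i] - m, 2);
	}
	return sumOfSquares / (x.length - 1);
};

// Assumed helper: Welch-Satterthwaite approximation of the degrees of freedom.
var degreesOfFreedom = function(x, y) {
	var nx = x.length;
	var ny = y.length;
	var vx = variance(x) / nx;
	var vy = variance(y) / ny;
	return Math.pow(vx + vy, 2) /
		(Math.pow(vx, 2) / (nx - 1) + Math.pow(vy, 2) / (ny - 1));
};
```

For two samples of equal size and equal variance, this approximation reduces to the familiar nx + ny - 2 degrees of freedom of the classic Student's t-test.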
