UK political donations

data analysis
Exploring how donations to the UK’s political parties compare
Published

2024-10-14

The data

I’m interested to know more about the donations made to the UK’s political parties. More specifically, I’m curious to know how the donations are spread among donors. The information is readily available from the electoral comission. I’ve downloaded everything from the start of 2001 to the end of 2023. I omitted donations from the parties themselves, which I think is just them moving money from the central office into constituencies.

Show the code
import polars as pl
import altair as alt
import vegafusion
alt.data_transformers.enable("vegafusion")
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import display

Let’s get an overview of the data.

Show the code
donations = pl.read_csv(
  "./data/uk_pol_donations.csv"
).select(
  pl.col("*").exclude(
    "ECRef", "Postcode", "CompanyRegistrationNumber", "PurposeOfVisit", "DonationAction", "ReceivedDate", "ReportedDate", "ReportingPeriodName", "AccountingUnitId", "RegulatedEntityType"
  )
)
donations.describe()
shape: (9, 20)
statistic RegulatedEntityName Value AcceptedDate AccountingUnitName DonorName AccountingUnitsAsCentralParty IsSponsorship DonorStatus RegulatedDoneeType DonationType NatureOfDonation IsReportedPrePoll IsBequest IsAggregation RegulatedEntityId DonorId CampaigningName RegisterName IsIrishSource
str str str str str str f64 f64 str str str str str f64 f64 f64 f64 str str f64
"count" "70669" "70669" "70669" "70669" "70669" 70669.0 70669.0 "70669" "0" "70669" "11386" "42917" 70669.0 70669.0 70669.0 70669.0 "0" "70669" 70116.0
"null_count" "0" "0" "0" "0" "0" 0.0 0.0 "0" "70669" "0" "59283" "27752" 0.0 0.0 0.0 0.0 "70669" "0" 553.0
"mean" null null null null null 0.049527 0.010146 null null null null null 0.012735 0.246898 116.895555 41802.893136 null null 0.000242
"std" null null null null null null null null null null null null null null 635.27884 28796.123603 null null null
"min" "Advance Together [De-registere… "£0.00" "01/01/2002" "ALDC" " A D Gatling" 0.0 0.0 "Company" null "Cash" "Administration services" "False" 0.0 0.0 17.0 1.0 null "Great Britain" 0.0
"25%" null null null null null null null null null null null null null null 52.0 18697.0 null null null
"50%" null null null null null null null null null null null null null null 53.0 34448.0 null null null
"75%" null null null null null null null null null null null null null null 90.0 67845.0 null null null
"max" "women4theworld [De-registered … "£999.92" "31/12/2023" "kingston upon Hull" "the capital dairy Company ltd" 1.0 1.0 "Unincorporated Association" null "Visit" "Travel" "False" 1.0 1.0 12922.0 102542.0 null "Northern Ireland" 1.0

There’s some cleaning required here. The dates and donation value are strings, but we want to convert them to dates and numbers.

Show the code
donations = donations.with_columns(
  (pl.col("Value").str.replace_all(r",|£", "").cast(pl.Float32)),
  (pl.col("AcceptedDate").str.to_date("%d/%m/%Y")),
)
donations.select(
  ["Value", "AcceptedDate"]
).describe()
shape: (9, 3)
statistic Value AcceptedDate
str f64 str
"count" 70669.0 "70669"
"null_count" 0.0 "0"
"mean" 17936.255859 "2013-01-10 17:57:47.042000"
"std" 101902.71875 null
"min" 0.0 "2001-02-16"
"25%" 1500.0 "2007-11-09"
"50%" 3000.0 "2013-05-25"
"75%" 9545.0 "2018-06-01"
"max" 1e7 "2023-12-31"

Plots

Totals over time

I’m going to mostly work with the values of donations aggregated by year. A year is long enough to get a good number of donations to look at, but granular enough that we can see some patterns, like the big spikes for the 2010, 2019, and 2024 elections.

Show the code
donations.select(
  ["Value", "AcceptedDate"]
).sort(
  "AcceptedDate"
).group_by_dynamic(
  "AcceptedDate", every="1y"
).agg(
  pl.col("Value").sum().alias("Total Donations (£)")
).plot.bar(
  x="AcceptedDate",
  y="Total Donations (£)"
).properties(width="container")

I’m going to concentrate on the three main parties: the Conservatives, Labour, and Liberal Democrats. I’ve chucked in Reform UK, simply because the ~£11 million they were donated in 2019 put them, briefly, in the same league, financially.

Show the code
main_parties = ["Labour Party", "Conservative and Unionist Party", "Liberal Democrats", "Reform UK"]
party_colours = ["#d50000", "#0087dc", "#FDBB30", "cyan"]

donations.select(
  ["Value", "RegulatedEntityName", "AcceptedDate"]
).filter(
  pl.col("RegulatedEntityName").is_in(main_parties)
).sort(
  "AcceptedDate"
).group_by_dynamic(
  "AcceptedDate", every="1y", group_by="RegulatedEntityName"
).agg(
  pl.col("Value").sum().alias("Total Donations (£)")
).plot.bar(
  x="AcceptedDate",
  y="Total Donations (£)",
  color=alt.Color("RegulatedEntityName", scale=alt.Scale(domain=main_parties, range=party_colours)),
).properties(width="container")

Median donation value

The total donations aren’t the only interesting metric here. I want to know what the average value of a donation is here. I won’t use the mean, as the values are very skewed1, but the more robust median. I’ve included the interquartile range as bands either side of the line.

Show the code
donation_four_parties = donations.with_columns(
  pl.col("AcceptedDate").dt.year().alias("Year")
).filter(
  pl.col("RegulatedEntityName").is_in(main_parties)
)

line_four_parties = alt.Chart(donation_four_parties).mark_line().encode(
  x='Year',
  y='median(Value)',
  color=alt.Color("RegulatedEntityName", scale=alt.Scale(domain=main_parties, range=party_colours))
).properties(width="container")

band_four_parties = alt.Chart(donation_four_parties).mark_errorband(extent='iqr').encode(
  x='Year',
  y=alt.Y('Value').title('Median Value (£)'),
  color=alt.Color("RegulatedEntityName", scale=alt.Scale(domain=main_parties, range=party_colours))
)

band_four_parties + line_four_parties

I’ll admit, I included Reform UK for humour here. Let’s just look at the three major parties.

Show the code
donation_three_parties = donations.with_columns(
  pl.col("AcceptedDate").dt.year().alias("Year")
).filter(
  pl.col("RegulatedEntityName").is_in(main_parties[:-1])
)

line_three_parties = alt.Chart(donation_three_parties).mark_line().encode(
  x='Year',
  y='median(Value)',
  color=alt.Color("RegulatedEntityName", scale=alt.Scale(domain=main_parties[:-1], range=party_colours[:-1]))
).properties(width="container")

band_three_parties = alt.Chart(donation_three_parties).mark_errorband(extent='iqr').encode(
  x='Year',
  y=alt.Y('Value').title('Median Value (£)'),
  color=alt.Color("RegulatedEntityName", scale=alt.Scale(domain=main_parties[:-1], range=party_colours[:-1]))
)

band_three_parties + line_three_parties

Remarkably, between 2006 and 2012, the median value of donations to the Conservatives was greater than the 75th centile of Labour. Labour and the Liberal Democrats had similar distributions of donations until 2020. The post-coalition elections were not kind to the Lib Dems.

Show the code
donations.with_columns(
  pl.col("AcceptedDate").dt.year().alias("Year")
).select(
  ["Value", "Year", "NatureOfDonation"]
).sort(
  pl.col("Year")
).group_by(
  ["Year", "NatureOfDonation"]
).agg(
  pl.col("Value").sum()
).plot.line(
  x="Year",
  y="Value",
  color="NatureOfDonation"
).properties(width="container")

There’s a field “NatureOfDonation” in the dataset. There are a few kinds of state funding in here that don’t tell us anything about who is donating to the parties, so we can ignore those.

  • Short money: money given to opposition parties in the House of Commons
  • Cranbourne money: money given to opposition parties in the House of Lords
  • Assistance for Parties (Scottish Parliament): money given to parties in the Scottish Parliament
  • Financial Assistance for Parties in NI Assembly
  • Policy development grant

The main thing to take away is that most of the money doesn’t have a “NatureOfDonation”. There’s also “DonationType”

Show the code
donations.filter(
  pl.col("NatureOfDonation").is_null()
).select(
  ["DonationType", "Value"]
).group_by(
  "DonationType"
).agg(
  count=pl.len(),
  TotalValue=pl.sum("Value")
)
shape: (4, 3)
DonationType count TotalValue
str u32 f32
"Permissible Donor Exempt Trust" 51 447430.65625
"Cash" 58778 9.73968896e8
"Exempt Trust" 435 3.6146e6
"Visit" 19 276699.15625

That seems more important to us. I’m unclear on what “Visit” is, but the value is low, so I’ll leave it in. “Cash” is obvious (and something we care about). The other two are types of bequest left in a will.

Donations look broadly the same if we filter out the state funding.

Show the code
selected_donations = donations.with_columns(
  pl.col("AcceptedDate").dt.year().alias("Year"),
  pl.col("NatureOfDonation").fill_null(strategy="zero")
).select(
  ["DonationType", "NatureOfDonation", "Value", "Year", "RegulatedEntityName"]
).filter(
  [
    pl.col("NatureOfDonation").is_in(
      [
        "Assistance for Parties (Scottish Parliament)",
        "Cranborne Money (House of Lords)",
        "Financial Assistance for Parties in NI Assembly",
        "Policy Development Grant",
        "Short Money (House of Commons)",
        "Start Up Grant (Discontinued)"
      ]
    ).not_(),
    pl.col("RegulatedEntityName").is_in(main_parties)
  ]
)
Show the code
selected_donations_3 = selected_donations.filter(pl.col("RegulatedEntityName") != "Reform UK")
line_selected_donations_3 = alt.Chart(selected_donations_3).mark_line().encode(
  x='Year',
  y='median(Value)',
  color=alt.Color("RegulatedEntityName", scale=alt.Scale(domain=main_parties[:-1], range=party_colours[:-1]))
).properties(width="container")

band_selected_donations_3 = alt.Chart(selected_donations_3).mark_errorband(extent='iqr').encode(
  x='Year',
  y=alt.Y('Value').title('Median Value (£)'),
  color=alt.Color("RegulatedEntityName", scale=alt.Scale(domain=main_parties[:-1], range=party_colours[:-1]))
)

band_selected_donations_3 + line_selected_donations_3

OK, so much for the how the value of donations is distributed. It’s quite interesting that the Conservatives have received a fairly similar distribution of donations, except in election years, where the median value (and total) spikes. The exception is the last election cycle.

Meanwhile, Labour and the Liberal Democrats have had a similar distribution (though different total value) to one another until the last election cycle, where Labour’s donations have steadily increased, and the Liberal Democrats’ have dropped.

The Donors

Now, who is giving all this money? Here are the top 10 donors:

Show the code
who_donations = donations.select(
  pl.exclude(["AccountingUnitId",
              "AccountingUnitName",
              "AccountingUnitsAsCentralParty",
              "IsSponsorship",
              "RegulatedDoneeType",
              "IsReportedPrePoll",
              "IsBequest",
              "IsAggregation",
              "RegulatedEntityId",
              "CampaigningName",
              "IsIrishSource"])
).with_columns(
  pl.col("NatureOfDonation").fill_null(strategy="zero")
).filter(
  pl.col("NatureOfDonation").is_in(
      [
        "Assistance for Parties (Scottish Parliament)",
        "Cranborne Money (House of Lords)",
        "Financial Assistance for Parties in NI Assembly",
        "Policy Development Grant",
        "Short Money (House of Commons)",
        "Start Up Grant (Discontinued)"
      ]
  ).not_()
).sort("Value", descending=True)

biggest_donors = who_donations.select(
  ["DonorName", "RegulatedEntityName", "Value"]
).group_by(["DonorName", "RegulatedEntityName"]).sum().sort(
  "Value", descending=True
).pivot(
  index="DonorName",
  on="RegulatedEntityName"
).head(10)

biggest_donors[[s.name for s in biggest_donors if not (s.null_count() == biggest_donors.height)]]
shape: (10, 6)
DonorName Labour Party Conservative and Unionist Party Reform UK Liberal Democrats Co-operative Party
str f32 f32 f32 f32 f32
"Unite the Union" 4.8891412e7 null null null null
"UNISON" 3.7628308e7 null null null 10000.0
"GMB" 3.6300552e7 null null null null
"National Conservative Draws So… null 1.3818294e7 null null null
"Union of Shop Distributive and… 1.3506607e7 null null null null
"Union of Shop, Distributive an… 1.2255292e7 null null null null
"Lord John Sainsbury" null 1.1257435e7 null null null
"Mr Christopher Harborne" null null 1.019e7 null null
"Lord David Sainsbury" 5.071808e6 null null 1.0125e7 null
"Communication Workers Union" 9.473529e6 null null null null

So there’s a lot of money coming to Labour from the unions. This makes sense with the history of the Labour party.

There’s also the “National Conservative Draws Society”, which is a pretty sketchy way of concealing donations to political parties.

Then, there are the private donors. Journalists sometimes make a noise about how the unions and the Labour party are tied together, but there are three guys there who have donated about £35 million between them.

I spotted something here. Christopher Harborne made the news donating £1 million to Boris Johnson, which should have been recorded. There’s a “DonorId” field in the dataset. Maybe we’ll see the donation if we look at DonorId instead.

Show the code
biggest_donors_by_id = who_donations.select(
  ["DonorId", "RegulatedEntityName", "Value"]
).group_by(["DonorId", "RegulatedEntityName"]).sum().sort(
  "Value", descending=True
).pivot(
  index="DonorId",
  on="RegulatedEntityName"
).head(10)

biggest_donors_by_id[[s.name for s in biggest_donors_by_id if not (s.null_count() == biggest_donors_by_id.height)]]
shape: (10, 6)
DonorId Labour Party Conservative and Unionist Party Reform UK Liberal Democrats Co-operative Party
i64 f32 f32 f32 f32 f32
55387 2.5267816e7 null null null null
34448 1.8958894e7 null null null null
34447 1.5817447e7 null null null null
34446 1.2251777e7 null null null null
38732 null 1.0902435e7 null null null
45958 1.0799636e7 null null null null
87126 null null 1.019e7 null null
74671 null null null 1.0125e7 null
34196 null null null null 8.69897e6
36362 null 7824772.5 null null null

Still not there. If we look at all the Christophers Harborne in the data, we can see something I find very annoying.

Show the code
with pl.Config(tbl_rows=-1):
  display(who_donations.select(
    ["DonorId", "DonorName", "Value"]
  ).filter(
    pl.col("DonorName").str.contains("Harborne"),
    pl.col("DonorName").str.contains("Christopher")
  ).group_by(
    ["DonorName", "DonorId"]
  ).sum().sort("Value", descending=True))
shape: (12, 3)
DonorName DonorId Value
str i64 f32
"Mr Christopher Harborne" 87126 1.019e7
" Christopher Charles Sherriff … 95060 1.515e6
"Mr Christopher C S Harborne" 84977 62500.0
" Christopher C Harborne" 8987 50000.0
" Christopher C S Harborne" 84275 50000.0
" Christopher C Harborne" 12223 50000.0
" Christopher C Harborne" 4527 19994.0
"Mr Christopher C Harborne" 6441 19994.0
"Mr Christopher C Harborne" 997 10000.0
"Mr Christopher C Harborne" 12780 8000.0
"Mr Christopher C Harborne" 18353 8000.0
" Christopher C Harborne" 11283 400.0

Mr Christopher Charles Sherriff Harborne has donated with a bunch of variations on his name. What’s the point of a DonorId that doesn’t track this?! A couple of these might be other Christophers, but I’m willing to bet a lot of these are the same guy. It might not be deliberate obfuscation, but it should be clearer who is the same person donating multiple times.

Anyway, let’s look at a breakdown of donor types by party:

Show the code
who_donations.select(
  ["RegulatedEntityName", "DonorStatus", "Value"]
).filter(
  pl.col("RegulatedEntityName").is_in(main_parties)
).group_by(
  ["RegulatedEntityName", "DonorStatus"]
).sum().plot.bar(
  x=alt.X("RegulatedEntityName", axis=alt.Axis(labelAngle=-45)),
  y="Value",
  color="DonorStatus"
).properties(width="container")

It’s quite astonishing how much of their money Labour gets from the unions! If they didn’t get their union money, their donations would be pretty similar to the Liberal Democrats.

If you look at the top 20 individual donors, you can see another instance of someone donating twice with different names. You can also see that political donations at the top haven’t even heard of gender equality.

Show the code
with pl.Config(tbl_rows=20):
  biggest_donors_individual = who_donations.select(
    ["DonorName", "DonorStatus", "RegulatedEntityName", "Value"]
  ).filter(
    pl.col("DonorStatus") == "Individual"
  ).select(
    pl.exclude("DonorStatus")
  ).group_by(["DonorName", "RegulatedEntityName"]).sum().sort(
    "Value", descending=True
  ).pivot(
    index="DonorName",
    on="RegulatedEntityName"
  ).head(20)

  display(biggest_donors_individual[[s.name for s in biggest_donors_individual if not (s.null_count() == biggest_donors_individual.height)]])
shape: (20, 8)
DonorName Conservative and Unionist Party Reform UK Liberal Democrats Labour Party The Reclaim Party Sinn Féin Co-operative Party
str f32 f32 f32 f32 f32 f32 f32
"Lord John Sainsbury" 1.1257435e7 null null null null null null
"Mr Christopher Harborne" null 1.019e7 null null null null null
"Lord David Sainsbury" null null 1.0125e7 5.071808e6 null null null
"Lord David Sainsbury of Turvil… null null null 9.316e6 null null null
"Sir Michael L Davis" 5524949.5 null null null null null null
" Graham H Edwards" 5.450804e6 null null null null null null
" Francis X Hester" 5e6 null null null null null null
" Mohamed Younes Mansour Lotfy … 5e6 null null null null null null
"Sir Paul Getty" 5e6 null null null null null null
"Lord Michael S Farmer" 4.99453e6 null null null null null null
"Mr Gary Lubner" null null null 4.6695e6 null null 50000.0
"Mr JEREMY HOSKING" null null null null 3.623949e6 null null
"Mr John S Wheeler" 3.563301e6 null null null null null null
"Mr William E Hampton" null null null null null 3.462866e6 null
" Nigel Doughty" null null null 3.305201e6 null null null
"Lord James Roger Lupton" 3.2999e6 null null null null null null
"Mr John E Gore" 3.241499e6 null null null null null null
" Lakshmi Mittal" 75000.0 null null 3.125e6 null null null
"Mr John Griffin" 3.1085e6 null null null null null null
"Mr Malcolm S Healey" 3.01e6 null null null null null null
Show the code
labour_donations = [np.array(donations.filter(
  pl.col("RegulatedEntityName") == "Labour Party"
).with_columns(
  pl.col("AcceptedDate").dt.year().alias("Year")
).filter(
  pl.col("Year") == year
).select(
  "Value"
)).reshape(-1) for year in range(2001, 2024)]

con_donations = [np.array(donations.filter(
  pl.col("RegulatedEntityName") == "Conservative and Unionist Party"
).with_columns(
  pl.col("AcceptedDate").dt.year().alias("Year")
).filter(
  pl.col("Year") == year
).select(
  "Value"
)).reshape(-1) for year in range(2001, 2024)]

lib_dem_donations = [np.array(donations.filter(
  pl.col("RegulatedEntityName") == "Liberal Democrats"
).with_columns(
  pl.col("AcceptedDate").dt.year().alias("Year")
).filter(
  pl.col("Year") == year
).select(
  "Value"
)).reshape(-1) for year in range(2001, 2024)]

I don’t really have any conclusions here. I hadn’t realised quite how much money was coming in to political parties. This isn’t everything; if you donate below certain thresholds you don’t have to report anything. Given the eye-watering sums at the top though, you do get a good sense of how much money everyone is getting.

I also hadn’t realised how dependent the Labour party is on unions. Another time, I might look into why the median value of their donations has increased recently, but this is already long enough.

Bonus

I love a violin plot, or a box plot to compare distributions. However, this time, the values are so skewed that when I tried it I got this:

Show the code
fig, ax = plt.subplots()

lab_violins = ax.violinplot(labour_donations, [x-0.2 for x in range(2001,2024)], widths=0.5)
con_violins = ax.violinplot(con_donations, [x for x in range(2001,2024)], widths=0.5)
lib_dem_violins = ax.violinplot(lib_dem_donations, [x+0.2 for x in range(2001,2024)], widths=0.5)

for pc in lab_violins['bodies']:
  pc.set_facecolor(party_colours[0])
for pc in con_violins['bodies']:
  pc.set_facecolor(party_colours[1])
for pc in lib_dem_violins['bodies']:
  pc.set_facecolor(party_colours[2])

ax.set_xlabel("Accepted Year")
ax.set_ylabel("Value (£)")

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.show()

The maximum values are so high that you can’t even see the distribution. In any case, they’re kind of boring, and just showing the median and interquartile ranges is more informative.

Show the code
fig, ax = plt.subplots()

lab_violins = ax.violinplot(labour_donations, [x-0.2 for x in range(2001,2024)], widths=0.5, showextrema=False)
con_violins = ax.violinplot(con_donations, [x for x in range(2001,2024)], widths=0.5, showextrema=False)
lib_dem_violins = ax.violinplot(lib_dem_donations, [x+0.2 for x in range(2001,2024)], widths=0.5, showextrema=False)

for pc in lab_violins['bodies']:
  pc.set_facecolor(party_colours[0])
for pc in con_violins['bodies']:
  pc.set_facecolor(party_colours[1])
for pc in lib_dem_violins['bodies']:
  pc.set_facecolor(party_colours[2])

ax.set_xlabel("Accepted Year")
ax.set_ylabel("Value (£)")
ax.set_ylim((0,2.5e5))

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.show()

Footnotes

  1. breaking the illusion of me doing this “as live”, there’s a bit on how hilariously skewed the distribution is below↩︎