There and back again: a spreadsheets’ tale

data analysis
prison map
In which I complain about spreadsheets, normalize a database, then make another spreadsheet
Published

2024-03-14

This post might count as a confession. I always complain about spreadsheets, and that the more ornate you make them the more brittle they are. Here’s how I’ve made a pretty complicated spreadsheet, and why I did it. This is quite a long one, and if you aren’t interested in hearing why I don’t think spreadsheets are the best solution to the problems they are applied to at great length, maybe give this one a miss. If you want to read some of my thought process approaching a problem, and about my gifts as a cheapskate, then please continue.

The Data

I’m working with a friend who’s a sociologist on an interactive map of UK prisons. In a previous step, I scraped Wikipedia for the coordinates of each listed prison and put a dot on the map for each one. My friend also sent me a spreadsheet with a lot more information about prisons, including some that are closed. He’s also interested in immigration detention, so there’s information about removal centres etc. in there.

Let’s look at the spreadsheet:

Show the code
import pandas as pd
tom_df = pd.read_excel("data/Prison database.xlsb.xlsx", sheet_name="eprison")
tom_df.dropna(axis=1, how='all', inplace=True)
tom_df.drop(['Column1', *['Column'+str(i) for i in range(3,10)]], axis = 1, inplace=True)
tom_df['Name'] = tom_df['Name'].str.title().str.replace(' Sch', ' SCH')
tom_df.head()
Prison Name Name Area Type Operator Opened Operational Capacity Closed Notes
0 ALDINE SCH Aldine NaN Secure Children's Home NaN NaN NaN NaN NaN
1 ATKINSON UNIT SCH Atkinson Unit SCH NaN Secure Children's Home NaN NaN NaN NaN NaN
2 AYCLIFFE SCH Aycliffe SCH NaN Secure Children's Home NaN NaN NaN NaN NaN
3 BARTON MOSS SCH Barton Moss SCH NaN Secure Children's Home NaN NaN NaN NaN NaN
4 BEECHFIELD SCH Beechfield SCH NaN Secure Children's Home NaN NaN NaN NaN NaN

So there are some more columns here, like the prison type and capacity. I’ll merge this with my coordinate data. Then I’ll make the prison names a bit more pleasant to read by first making them title case, rather than ALL CAPS, and then make the initialisms (“SCH” for Secure Children’s Home etc.) uppercase.

Show the code
coordinate_df = pd.read_csv("data/prison_coordinates.csv")
coordinate_df.rename(columns = {'Unnamed: 0': 'Name'}, inplace=True)
coordinate_df.drop("Coordinates", axis = 1, inplace=True)

prison_with_coord_df = pd.merge(left=coordinate_df.drop(columns = 'Operator'),
                  right=tom_df,
                  on='Name',
                  how='outer')

prison_with_coord_df['Prison Name'] = prison_with_coord_df['Prison Name'].str.title()

prison_with_coord_df['Prison Name'] = prison_with_coord_df['Prison Name'].replace({'Hmp': 'HMP',
                                                                ' Sch': ' SCH',
                                                                'HMPYOI': 'HMPYOI',
                                                                'Yoi ': 'YOI ',
                                                                'Hmyoi': 'HMYOI',
                                                                'Stc': 'STC',
                                                                'Irc': 'IRC',
                                                                "'S": "'s"
                                                                }, regex=True)

prison_with_coord_df.loc[~prison_with_coord_df['latitude'].isnull()].head()
Name Capacity latitude longitude Prison Name Area Type Operator Opened Operational Capacity Closed Notes
1 Addiewell 700.0 55.846667 -3.599167 NaN NaN NaN NaN NaN NaN NaN NaN
4 Altcourse 1324.0 53.461944 -2.935556 HMP Altcourse 31.83 hectares Local adult male prison Private - G4S 1997 1,184 (Jan 2021) NaN NaN
5 Ashfield 400.0 51.481389 -2.439722 HMP Ashfield 6.29 hectares Category C adult male prison Private - Serco&nbsp 1999 412 (Jan 2021) NaN NaN
7 Askham Grange 128.0 53.925833 -1.184444 HMP/ YOI Askham Grange 4.08 hectares Open adult female prison Public - Ministry of Justice 1947 128 (Jan 2021) NaN NaN
10 Aylesbury 443.0 51.821944 -0.799722 HMYOI Aylesbury 8.89 hectares Male closed Young Offender Institution Public - Ministry of Justice 1847 229 (April 2021) NaN NaN

That’s a bit nicer, isn’t it? There are still some things that aren’t ideal about this dataset.

Improving the data model

Room for improvement

Show the code
show_multiple = prison_with_coord_df[['Prison Name', 'Operational Capacity']]
show_multiple['cap_len'] = show_multiple['Operational Capacity'].apply(lambda x: len(x.split(' ')) if type(x) == str else 0)
show_multiple.sort_values(by='cap_len', ascending=False).drop(columns=['cap_len']).head(10)
Prison Name Operational Capacity
132 The Marshalsea 34 (1802) (with 8 wives and seven children)
24 Brook House Immigration Removal Centre 429 (June 2010), 448 (2013), 450 (2022)
199 HMP Wellingborough 525 (Aug 2003), 548 (August 2010)
30 Campsfield House Immigration Removal Centre 199 (1997), 216 (2008), 257 (2018)
31 HMP Canterbury 314 (as of August 2008)
72 HMP Fosse Way 1930 (planned in 2023)
157 HMP/ YOI Prescoed 252 (Oct 2017)
158 HMP Preston 680 (Jan 2021)
160 HMP Ranby 1032 (Jan 2021)
97 HMP Holme House 1159 (Jan 2021)

This is the first issue. Some columns, including the Operational Capacity, contain multiple pieces of data. The problem, of course, is that the prisons can change over time. I did try to parse this1 and keep the structure, but the underlying problem is that a single table is insufficient to deal with the nature of prisons over time.

A better data model

To get things in a more reasonable format, let’s use the concept of a data model. This will let us think about what aspects of a prison we really want to capture. Here are the facts:

  1. More than one prison can have existed on one site (e.g. HMP Fosse Way was built on the site of HMP Glen Parva)
  2. A prison can change capacity over time. A new wing can be added, or another closed.
  3. A prison can change operator over time. Some have been turned over to private operators, others go from being a prison to some kind of immigration detention centre and back again.
  4. A prison can change type over time. The classification of prisoner security can change, or the age or gender of inmates can change

This means we can represent our data like this:

erDiagram
  SITES ||--o{ PRISONS : contains
  SITES {
    int site_id
    string site_name
    float latitude
    float longitude
    float area
  }
  PRISONS {
    int prison_id
    string prison_name
    int site_id
  }
  PRISONS ||--o{ TYPES : has
  TYPES {
    int prison_id
    string type
    date start_date
    date end_date
  }
  OPERATOR_DATES }o--|| PRISONS : operates
  OPERATOR_DATES {
    int operation_id
    string operator
    int prison_id
    date start_date
    date end_date
  }
  PRISONS ||--o{ CAPACITIES : has
  CAPACITIES {
    int capacity_id
    int prison_id
    int operational_capacity
    date start_date
    date end_date
  }

Here we have split one table into multiple tables. This allows each prison to be represented once in the prisons table, and the ways each prison changes over time in different ways to be represented separately. In the “Types”, “Operator Dates”, and “Capacities” tables, the prison_id field means you can relate each entry to a prison.

This way of representing the data scratches an itch in my brain. In data terms, this is called a “data schema”. The original data was sort of in one too, if not consistently. This particular schema is quite simple, and known as a star schema. In this terminology, the prisons table is a fact table, and the others are dimension tables. I could further split these up so the dimension tables had further dimensions off these, but I think for this relatively small dataset, this is sufficient.

Using the star schema we can capture the ways a prison can change over time independently (even if changes often occur together) in a reliable way.

Data wrangling

So how do we move from the “flat” table in the spreadsheet to the star schema?

Sites table

The first thing to do is to make an entry for each site. In the current data, there are only two prisons that share a site, but I suspect as data is added we may well find more.

Show the code
site_df = pd.DataFrame({
    'site_name': prison_with_coord_df['Name'].unique()
})

site_df = site_df.merge(prison_with_coord_df[['Name', 'Area', 'latitude', 'longitude']], left_on='site_name', right_on='Name').drop(columns='Name')
site_df['area'] = site_df['Area'].apply(lambda x: float(x.split(' ')[0]) if type(x) == str else x)
site_df.drop(columns='Area', inplace=True)
site_df.drop_duplicates(inplace=True)
site_df['site_id'] = site_df.index + 1
site_df.head()
site_name latitude longitude area site_id
0 Abingdon NaN NaN NaN 1
1 Addiewell 55.846667 -3.599167 NaN 2
2 Aldine NaN NaN NaN 3
3 Aldington NaN NaN NaN 4
4 Altcourse 53.461944 -2.935556 31.83 5

I also did a simple bit of data cleaning here. The area column was given as a string with “X hectares”. As each entry was given in hectares, I’ve removed that and made it a number.

Prisons table

The central table of the schema is pretty simple. All I’ve done here is take the prison name, added a column to generate a prison ID for the rest of the tables, then joined it with the sites table so we can find the site it was built on.

Show the code
prisons_df = pd.DataFrame({
    'prison_name': prison_with_coord_df['Prison Name'],
    'site': prison_with_coord_df['Name']
})
prisons_df = prisons_df.merge(site_df[['site_name', 'site_id']], left_on = 'site', right_on = 'site_name').drop(columns=['site', 'site_name'])
prisons_df['prison_id'] = prisons_df.index+1
prisons_df.dropna(inplace=True)

prisons_df.head()
prison_name site_id prison_id
0 HMP Abingdon 1 1
2 Aldine SCH 3 3
3 HMP Aldington 4 4
4 HMP Altcourse 5 5
5 HMP Ashfield 6 6

Types table

Making this dimension table is pretty simple too. This is only true because the complexity prison histories isn’t held in this dataset!

Show the code
types_df = prison_with_coord_df[['Prison Name', 'Type', 'Opened', 'Closed']]
types_df.columns = ['prison_name', 'prison_type', 'start_date', 'end_date']

types_df = types_df.merge(prisons_df, on = 'prison_name').drop(columns=['prison_name', 'site_id'])

types_df
prison_type start_date end_date prison_id
0 NaN 1811 1868.0 1
1 Secure Children's Home NaN NaN 3
2 Category C adult male prison 1947 1999.0 4
3 Local adult male prison 1997 NaN 5
4 Category C adult male prison 1999 NaN 6
... ... ... ... ...
176 Local adult male prison 1849 NaN 205
177 Core local adult male prison 1992 NaN 206
178 Local adult male prison 1891 NaN 208
179 Category C adult male prison 1979 NaN 209
180 Immigration Removal Centre NaN NaN 210

181 rows × 4 columns

There’s no data in the original about how the types changed over time, exept sporadically in notes. For example, HMP Morton Hall has changed type at least five times since 1985, but this isn’t captured. For now, I’m just taking the type provided and pretending that has been the type for the history of the prison. In a later stage, the histories will be captured more fully. I’ll also restrict the possible values for type, as there are defined types a prison can have, and the type column in the spreadsheet isn’t consistent with this. For now, this will do.

Operator table

This one isn’t hard either, and just needs a bit of cleaning

Show the code
operator_df = prison_with_coord_df[['Prison Name', 'Operator', 'Opened', 'Closed']]

operator_df.columns = ['prison_name', 'operator', 'start_date', 'end_date']
operator_df['operator'] = operator_df['operator'].apply(lambda x: x.split('- ')[1] if type(x) == str and len(x.split('- ')) > 1 else x)

operator_df['operator'] = operator_df['operator'].replace({'Ministry of Justice': 'His Majesty\'s Prison Service', 'Serco&nbsp':'Serco'})

operator_df = operator_df.merge(prisons_df[['prison_name', 'prison_id']], on='prison_name', how='left')
operator_df['operation_id'] = operator_df.index+1
operator_df = operator_df.loc[~operator_df['prison_name'].isnull()].drop(columns=['prison_name'])

operator_df
operator start_date end_date prison_id operation_id
0 NaN 1811 1868.0 1.0 1
2 NaN NaN NaN 3.0 3
3 His Majesty's Prison Service 1947 1999.0 4.0 4
4 G4S 1997 NaN 5.0 5
5 Serco 1999 NaN 6.0 6
... ... ... ... ... ...
204 His Majesty's Prison Service 1849 NaN 205.0 205
205 His Majesty's Prison Service 1992 NaN 206.0 206
207 His Majesty's Prison Service 1891 NaN 208.0 208
208 His Majesty's Prison Service 1979 NaN 209.0 209
209 NaN NaN NaN 210.0 210

181 rows × 5 columns

The main thing here is that whether the prison was privately or publicly owned was included in the column. I’ve removed this, then have done some tidying. I later had to go through manually and correct some spelling mistakes. Some stuff is easier not to bother programming in.

This presents an opportunity to show the strengths of making a more complex schema. If we wanted to keep track whether a prison was privately or publicly operated, we could add another table to the model to represent this.

erDiagram
  SITES ||--o{ PRISONS : contains
  SITES {
    int site_id
    string site_name
    float latitude
    float longitude
    float area
  }
  PRISONS {
    int prison_id
    string prison_name
    int site_id
  }
  PRISONS ||--o{ TYPES : has
  TYPES {
    int prison_id
    string type
    date start_date
    date end_date
  }
  OPERATOR_DATES }o--|| PRISONS : operates
  OPERATOR_DATES {
    int operation_id
    string operator
    int prison_id
    date start_date
    date end_date
  }
  PRISONS ||--o{ CAPACITIES : has
  CAPACITIES {
    int capacity_id
    int prison_id
    int operational_capacity
    date start_date
    date end_date
  }
  OPERATOR_TYPE ||--o{ OPERATOR_DATES : public
  OPERATOR_TYPE {
    string operator
    bool public
  }

I’m not doing this for now, as there are not many operators and the purpose of the model doesn’t require it.

Capacity table

This might be the messiest one. We do have multiple pieces of data, as we’ve seen above. What I’m doing for now is pretending each prison hasn’t changed over time, and this is mostly because the dates provided are mostly when a capacity was recorded, not when the change happened. Unfortunately for my sociologist friend, this is going to mean a lot of digging and manual data entry, unless we find a new data source!

Show the code
capacities_df = prison_with_coord_df[['Prison Name', 'Operational Capacity', 'Opened', 'Closed']]
capacities_df.columns = ['prison_name', 'operational_capacity', 'start_date', 'end_date']

capacities_df = capacities_df.merge(prisons_df, on='prison_name', how='left').drop(columns=['prison_name', 'site_id'])

capacities_df['operational_capacity'] = capacities_df['operational_capacity'].apply(lambda x: x.split(' ')[0] if type(x) == str else x)
capacities_df['operational_capacity'] = capacities_df['operational_capacity'].str.replace(r'[^1-9]','', regex=True)
capacities_df['operational_capacity'] = capacities_df['operational_capacity'].apply(lambda x: int(x) if type(x) == str and len(x) > 0 else None)
capacities_df['capacity_id'] = capacities_df.index+1

capacities_df
operational_capacity start_date end_date prison_id capacity_id
0 NaN 1811 1868.0 1.0 1
1 NaN NaN NaN NaN 2
2 NaN NaN NaN 3.0 3
3 NaN 1947 1999.0 4.0 4
4 1184.0 1997 NaN 5.0 5
... ... ... ... ... ...
205 535.0 1992 NaN 206.0 206
206 NaN NaN NaN NaN 207
207 115.0 1891 NaN 208.0 208
208 12.0 1979 NaN 209.0 209
209 NaN NaN NaN 210.0 210

210 rows × 5 columns

What now?

We have the data in the tables as required, with the relationships between tables established.

The ideal world

What we have here is a great case for using a relational database. Relational Database Management Systems are well-established pieces of software that make it easy for you to add entries to a system like this while maintaining data integrity, and have guard rails in place to stop you from doing something that breaks your system.

The real world

More important than protecting users from themselves, and providing a convenient interface is the ability to share it with my collaborator. There are free database hosting services, but these are mostly trials, and often involve learning a proprietary system. Then I have to set up administration so that my friend can edit it and protect it from vandals. Because I am both stupid and lazy, I’m going to use a stupid and lazy solution: another spreadsheet.

Back to a spreadsheet

Google sheets has the features I want - I can share it with my friend and Google handles authentication for me. I can mitigate some of the risks of using a spreadsheet by applying data validation rules. For example, as there are a limited set of options, I can restrict that column of the spreadsheet to only accept those options. Sheets also makes these into a snazzy drop-down menu:

I have also made it so it shouts at you if you try to duplicate entries by adding a prison twice or trying to reuse a prison ID, for example.

The difficulty comes with adding entries. Human beings aren’t made to remember that prison 157 is Hassockfield STC. This means that when you’re adding information about a prison, you would need to switch between sheets and check IDs all over the place.

Apps Script

Another feature Google Sheets offers is Apps Script. This is basically the possibility of using javascript to build your own extensions. The way this works is that you provide some code that runs in your sheet, and they provide an API for accessing the data in your sheet. Very clever people can use this to link together Google’s services and even build full web apps. This isn’t something I need to scale, so I’ve made a fairly minimal one, which is just a little sidebar that pops up and tells you about a prison you care about:

This is something of a work in progress. If the requirements of the project change, I can make a full user interface, as the Sheets API supports modifying the spreadsheet.

Here’s the code if you want a look. It’s very much a hack, so I wouldn’t recommend it2.

Show the code
function onOpen() {
  var ui = SpreadsheetApp.getUi();
  var menu = ui.createMenu('Prison Data');
  menu.addItem('Select Prison', 'showSidebar');
  menu.addToUi();
}

function showSidebar() {
  var html = HtmlService.createHtmlOutputFromFile('Sidebar')
      .setTitle('Prison Information');
  SpreadsheetApp.getUi().showSidebar(html);
}

function prisonOptions() {
  var prisonsSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Prisons');
  var prisonData = prisonsSheet.getDataRange().getValues();
  return prisonData.map((row) => row[1])
}

function getPrisonID(prisonName) {
  var prisonsSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Prisons');
  var prisonData = prisonsSheet.getDataRange().getValues();
  var prisonRow = prisonData.find(row => row[1] === prisonName);
  
  if (!prisonRow) {
    return 'Prison not found';
  }
  
  return prisonRow[0];
}

function getSiteInfoForPrison(prisonName) {
  var prisonsSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Prisons');
  var prisonData = prisonsSheet.getDataRange().getValues();
  var prisonRow = prisonData.find(row => row[1] == prisonName);

  var siteID = prisonRow[2];
  var siteSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Sites');
  
  var siteData = siteSheet.getDataRange().getValues();
  var siteRow = siteData.find(row => row[0] === siteID);
  
  if (!siteRow) {
    return 'Site not found';
  }
  //Return an object of site information
  return {
    siteName: siteRow[1],
    latitude: siteRow[2],
    longitude: siteRow[3],
    area: siteRow[4],
  }
}

function getTypeInfoForPrison(prisonID) {
  var typeSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Types');
  
  var typeData = typeSheet.getDataRange().getValues();
  var typeRows = typeData.filter(row => row[0] === prisonID);
  return typeRows.map(row => {return {
    type: row[1],
    start_date: /[\d]{4}/m.test(String(row[2])) ? String(row[2]).match(/[\d]{4}/m)[0] : "",
    end_date: /[\d]{4}/m.test(String(row[3])) ? String(row[3]).match(/[\d]{4}/m)[0] : ""
  }});
}

function getOperatorDatesForPrison(prisonID) {
  var opDatesSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Operator dates');
  var opDatesData = opDatesSheet.getDataRange().getValues();
  var opDatesRows = opDatesData.filter(row => row[2] === prisonID);
  
  var datesobj = opDatesRows.map(row => {return {
    operator: row[1],
    start_date: /[\d]{4}/m.test(String(row[3])) ? String(row[3]).match(/[\d]{4}/m)[0] : "",
    end_date: /[\d]{4}/m.test(String(row[4])) ? String(row[4]).match(/[\d]{4}/m)[0] : ""
  }});

  Logger.log(datesobj);
  return datesobj
}

function getCapacitiesForPrison(prisonID) {
  var capacitiesSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Capacities');
  var capacitiesData = capacitiesSheet.getDataRange().getValues();
  var capacityRows = capacitiesData.filter(row => row[1] === prisonID);
  
  return capacityRows.map(row => {return {
    operationalCapacity: row[2],
    start_date: /[\d]{4}/m.test(String(row[3])) ? String(row[3]).match(/[\d]{4}/m)[0] : "",
    end_date: /[\d]{4}/m.test(String(row[4])) ? String(row[4]).match(/[\d]{4}/m)[0] : ""
  }});
}
And the html
<!DOCTYPE html>
<html>
  <head>
    <style>
      table {
        border-collapse: collapse;
        border: 2 px solid black;
      }
      table td, table th {
        border: 1px solid black;
      }
    </style>
  </head>
  <body>
    <select id="prisonDropdown" onchange="changeID()">
      <option value="">Select a Prison</option>
    </select>
    <div id="prisonID"></div>
    <div id="siteInfo"></div>
    <div id="typeInfo"></div>
    <br>
    <div id="operatorDatesInfo"></div>
    <br>
    <div id="capacitiesInfo"></div>

    <script>
      // Function to populate the dropdown with prison options
      function populatePrisons(prisons) {
        var dropdown = document.getElementById("prisonDropdown");

        // Clear existing options
        dropdown.innerHTML = "";

        // Add options for each prison
        prisons.forEach(function(prison) {
          var option = document.createElement("option");
          option.value = prison;
          option.text = prison;
          dropdown.appendChild(option);
        });
      }

      // Function to handle error
      function showError(error) {
        console.error("Error:", error);
      }

      // Function to show site info for the selected prison
      function changeID() {
        var selectedPrison = document.getElementById("prisonDropdown").value;

        google.script.run.withSuccessHandler(displayPrisonID)
                          .withFailureHandler(showError)
                          .getPrisonID(selectedPrison);
        google.script.run.withSuccessHandler(displaySiteInfo)
                          .withFailureHandler(showError)
                          .getSiteInfoForPrison(selectedPrison);

      }

      function showInfo() {
        var prisonID = parseInt(document.getElementById("prisonID").innerText.slice(11));
        
        google.script.run.withSuccessHandler(displayTypeInfo)
                          .withFailureHandler(showError)
                          .getTypeInfoForPrison(prisonID);
        google.script.run.withSuccessHandler(displayOperatorDates)
                          .withFailureHandler(showError)
                          .getOperatorDatesForPrison(prisonID);
        google.script.run.withSuccessHandler(displayCapacities)
                          .withFailureHandler(showError)
                          .getCapacitiesForPrison(prisonID);

      }

      function displayPrisonID(prisonID) {
        var prisonIDDiv = document.getElementById("prisonID");
        prisonIDDiv.innerHTML = `<p>Prison ID: ${prisonID}</p>`;
        // Trigger the showInfo function after updating the prisonID div
        showInfo();
      }

      // Function to display site info in the "siteInfo" div
      function displaySiteInfo(siteInfo) {
        var siteInfoDiv = document.getElementById("siteInfo");
        siteInfoDiv.innerHTML = `
          <p>Site Name: ${siteInfo.siteName}</p>
          <p>Latitude: ${siteInfo.latitude}</p>
          <p>Longitude: ${siteInfo.longitude}</p>
          <p>Area: ${siteInfo.area}</p>
        `;
      }


      // Function to show type info in the "typeInfo" div
      function displayTypeInfo(typeInfo) {
        var typeInfoDiv = document.getElementById("typeInfo");
        var tableHTML = "<table style=\"border: 1px solid black\"><tr><th>Type</th><th>Start Date</th><th>End Date</th></tr>";

        typeInfo.forEach(function(type) {
          tableHTML += `<tr><td>${type.type}</td><td>${type.start_date}</td><td>${type.end_date}</td></tr>`;
        });
        
        tableHTML += "</table>";
        typeInfoDiv.innerHTML = tableHTML;
      }

      function displayOperatorDates(operatorDates) {
        var operatorDatesInfoDiv = document.getElementById("operatorDatesInfo");
        var tableHTML = "<table><tr><th>Operator</th><th>Start Date</th><th>End Date</th></tr>";
        operatorDates.forEach(function(dateInfo) {
          tableHTML += `<tr><td>${dateInfo.operator}</td><td>${dateInfo.start_date}</td><td>${dateInfo.end_date}</td></tr>`;
          });
          tableHTML += "</table>";
          operatorDatesInfoDiv.innerHTML = tableHTML;
        console.log(operatorDates);
      }

      function displayCapacities(capacites) {
        var capacitesDiv = document.getElementById("capacitiesInfo");
        var tableHTML = "<table><tr><th>Capacity</th><th>Start Date</th><th>End Date</th></tr>";
        capacites.forEach(function(dateInfo) {
          tableHTML += `<tr><td>${dateInfo.operationalCapacity}</td><td>${dateInfo.start_date}</td><td>${dateInfo.end_date}</td></tr>`;
          });
          tableHTML += "</table>";
          capacitesDiv.innerHTML = tableHTML;
      }
      // Call the prisonOptions function in the Google Apps Script
      // and populate the dropdown with the retrieved prison data
      google.script.run.withSuccessHandler(populatePrisons)
                        .withFailureHandler(showError)
                        .prisonOptions();
      
    </script>
  </body>
</html>

Conclusion

So what is the takeaway here? Just use spreadsheets? No. I’m doing my best not to use the spreadsheet as a spreadsheet. I guess my conclusion here is that it’s hard to argue with free hosting where someone else handles security for you!

Footnotes

  1. I used a horrible regular expression: ((,)?(>)?|(?:(,)? (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) )|(?:(,)? )) then used the dataframe’s explode method↩︎

  2. particularly the bit where I don’t bother parsing the datetime format to read the year from cells, and use a regular expression instead↩︎