Analysing Parliamentary Questions Answered at Accountability Hack 2015

22 Nov 2015 | by Florian Rathgeber

A very frosty November weekend marked the end of Parliament Week and the fifth anniversary of the Accountability Hack (originally named UK Parliament Hack). The event was organised by Tracy Green from the Parliament Digital Service, Nick Halliday from the National Audit Office and Terry Makewell from the Office for National Statistics, with very active support from the RebelUncut crew.

Hack #Parliament's data and show us what you can do with it, sign up to #AccHack15 https://t.co/mcN02u4NbV
— UK Parliament (@UKParliament) November 13, 2015

Hackers and “armchair auditors” were invited to tackle four different challenges using a diverse set of open data sources:

NAO: Use spend data and any other data set to improve accountability.
Parliament: Best use of linked data to improve accountability.
ONS: Use the ONS OpenAPI to improve accountability.
Wildcard: Use any three open data sets to improve accountability.

And we're LIVE! #AccHack15 is go pic.twitter.com/QHjV7QWOUd
— RebelUncut (@RebelUncut) November 21, 2015

Many prospective participants were deterred either by the freezing cold or by problems with public transport, so not many heard Meg Hillier MP give the introductory address. After that, ideas were thrown around and teams started forming. I joined Natalia, Mina and Emma, a brilliant trio who were working on a visualisation of Parliamentary Questions Answered and were looking for help with crunching the data and classifying the quality of answers.

Wonderful to have @Meg_HillierMP sharing her passion for holding "people like me" accountable #AccHack15 pic.twitter.com/q98Q85HkPq
— RebelUncut (@RebelUncut) November 21, 2015

First of all, we needed to pull all the data from Parliament’s Linked Data API in JSON format. Downloading all 63,000 questions by hand in batches of 500 (unfortunately the maximum batch size the API allows) was of course not an option, so I started by writing a download script in Python. Pulling down all questions took several hours due to the rather poor performance of the API.
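For anyone wanting to reproduce the bulk download, here is a minimal Python sketch of paging through the API with the standard library. It assumes the `answeredquestions.json` endpoint on `lda.data.parliament.uk`, the `_page`/`_pageSize` paging parameters and a response that lists items under `result.items`; all of these are assumptions rather than details given in this post. The fetch function is a parameter so the paging logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable, Iterator

# Assumed endpoint and paging parameters; not taken from the post.
BASE_URL = "http://lda.data.parliament.uk/answeredquestions.json"
PAGE_SIZE = 500  # the largest batch size the API allowed, per the post


def fetch_json(url: str) -> dict:
    """Fetch one URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=120) as response:
        return json.load(response)


def page_url(page: int, page_size: int = PAGE_SIZE) -> str:
    # _pageSize / _page are assumed Linked Data API paging parameters.
    return f"{BASE_URL}?_pageSize={page_size}&_page={page}"


def iter_items(fetch: Callable[[str], dict] = fetch_json,
               page_size: int = PAGE_SIZE) -> Iterator[dict]:
    """Yield every item, page by page, stopping after a short or empty page."""
    page = 0
    while True:
        # Assumed response layout: {"result": {"items": [...]}}
        items = fetch(page_url(page, page_size))["result"]["items"]
        yield from items
        if len(items) < page_size:  # last page reached
            return
        page += 1
```

Calling `list(iter_items())` and dumping the result with `json.dump` would give one local copy of the corpus, so the slow API only has to be hit once.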

In the evening, Kevin ran a very entertaining round of the MLH !LIGHT challenge, where each contestant has 15 minutes to re-create a given website (in our case the Bootstrap front page) using a very bare-bones, browser-based editor with no syntax highlighting or autocompletion. You may not navigate away from the tab to bring up help, and you don’t get to see a rendered preview of your creation until after you have submitted it.

The overnight stay at the NAO was again quite comfortable, and we could use a shower in the morning. Sadly, my team from Saturday did not come back; however, John Sandall, a good friend and brilliant data scientist, arrived in time for breakfast. We discussed classifying answer quality with an N-gram analysis, using a list of previously identified phrases commonly used to defer questions. An alternative would be to train a text classification model on the entire corpus, based on a training set of manually classified answers.
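The phrase-matching idea can be sketched in a few lines of Python: tokenise each answer, build its word N-grams and check which deferral phrases appear among them. The phrases below are hypothetical examples chosen for illustration; the list we actually discussed is not reproduced here.

```python
import re
from typing import Iterable

# Hypothetical deferral phrases, for illustration only.
DEFERRAL_PHRASES = [
    "information is not held centrally",
    "disproportionate cost",
    "will write to the honourable member",
    "in due course",
]


def tokens(text: str) -> list[str]:
    """Lower-case word tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(words: list[str], n: int) -> set[tuple[str, ...]]:
    """All contiguous word n-grams of a token list."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def deferral_hits(answer: str,
                  phrases: Iterable[str] = DEFERRAL_PHRASES) -> list[str]:
    """Return the phrases that occur as a word n-gram in the answer."""
    words = tokens(answer)
    hits = []
    for phrase in phrases:
        target = tuple(tokens(phrase))
        if target in ngrams(words, len(target)):
            hits.append(phrase)
    return hits
```

Matching on whole-word N-grams rather than raw substrings avoids accidental partial-word hits, at the price that variants such as "disproportionate costs" need their own entry in the list.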

@emjan29 @minaorangina @NataliaLKB I’m working with @frathgeber at #AccHack15 we have a big flat CSV of MPs Q&A, you coming back for today?
— John Sandall (@John_Sandall) November 22, 2015

I’m at Accountability Hack at the @NAOorguk today #AccHack15 using @ukparlidata #data to hold lawmakers to account! pic.twitter.com/JKCQgzN33C
— John Sandall (@John_Sandall) November 22, 2015

Before getting to work on that, we needed to transform the raw data into a suitable form and identify which attributes were relevant for our analysis. Halfway through, I realised the answer text was missing from the data. It turned out this was caused by passing the query parameter _view=all to the API, which included extra fields but left out the actual answer data. By that point, rerunning the entire download in time for show & tell was unrealistic.
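In hindsight, a cheap sanity check on the first page would have caught the missing answers before a multi-hour download. The Python sketch below flattens nested JSON items into single rows with dotted column names (the shape of a flat CSV) and reports which required columns no sampled item provides. The column names in `REQUIRED` are hypothetical placeholders, not the API's real field names.

```python
from typing import Any

# Hypothetical column names for illustration; adjust to the real fields.
REQUIRED = {"answer.text", "date_tabled", "answering_body"}


def flatten(item: dict, prefix: str = "") -> dict[str, Any]:
    """Turn nested dicts into one flat row with dotted column names."""
    row: dict[str, Any] = {}
    for key, value in item.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse so {"answer": {"text": ...}} becomes {"answer.text": ...}
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value  # lists and scalars are kept unchanged
    return row


def missing_columns(items: list[dict],
                    required: set[str] = REQUIRED) -> set[str]:
    """Required columns that none of the sampled items provide."""
    seen: set[str] = set()
    for item in items:
        seen.update(flatten(item))
    return required - seen
```

A non-empty result from `missing_columns` on the first page is the signal to stop and fix the query; the flattened rows can then be written out with `csv.DictWriter(..., extrasaction="ignore")` to produce a flat CSV.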

Need to pull all data from @UKParliData again, because ?_view=all contrary to what you might think does *not* give you everything #AccHack15
— Florian Rathgeber (@frathgeber) November 22, 2015

Full MPs answered questions #OpenData from @ukparlidata as 62k row CSV (7MB compressed) -> https://t.co/JmuqMdl3k9 #AccHack15
— John Sandall (@John_Sandall) November 22, 2015

John did, however, still manage to run some statistical analysis on the data and answer many interesting questions. Meanwhile, I turned my download script into a “proper” Python package using PyScaffold and uploaded the package to PyPI and the documentation to Read the Docs, just in time for going up on stage!

Here we go!! @gabysslave kicking off #AccHack15 show & tell pic.twitter.com/kkTCNOZ5jG
— RebelUncut (@RebelUncut) November 22, 2015

Sounds like we're going to get some robust feedback on @UKParliData now... #acchack15
— Dan Barrett (@dasbarrett) November 22, 2015

Quite a few extra spectators came along to the show & tell, which featured a rather impressive lineup of 16 projects! I was on stage twice. First, I presented the results of our analysis, “Any Questions Answered?”, which had revealed some interesting insights: Jim Shannon MP asked the most questions and the Department of Health had to answer the most. The Foreign & Commonwealth Office was the slowest to respond, Nick Clegg’s questions were ignored the longest and the Prime Minister referred the highest proportion of questions. With more time to build up a training set by manually categorising some questions (e.g. by quality or difficulty), we could have trained a Bayes classifier for the entire corpus.
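To illustrate what such a classifier could look like, here is a minimal multinomial naive Bayes with add-one (Laplace) smoothing in plain Python. Naive Bayes is one common choice of Bayes classifier for text, not something we built at the hack, and the labels "deferred" and "substantive" as well as the four training answers are invented for the example.

```python
import math
import re
from collections import Counter


def words(text: str) -> list[str]:
    """Lower-case word tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(words(text))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_score(self, text: str, label: str) -> float:
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)  # log prior
        denom = self.totals[label] + len(self.vocab)
        for w in words(text):
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        """Label with the highest log posterior (up to a constant)."""
        return max(self.class_counts, key=lambda c: self.log_score(text, c))


# Invented, hand-labelled examples purely for illustration.
texts = [
    "I will write to the honourable member in due course",
    "The information is not held centrally",
    "Funding increased by 12 million pounds this year",
    "The department has recruited 300 additional nurses",
]
labels = ["deferred", "deferred", "substantive", "substantive"]
model = NaiveBayes().fit(texts, labels)
```

With a realistic hand-labelled sample, `model.predict(answer_text)` could then be applied to every answer in the corpus; working in log space avoids underflow when multiplying many small word probabilities.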

#acchack15 next up any questions answered... pic.twitter.com/4KaEJHr64m
— Ranjan Balakumaran (@financialeyes) November 22, 2015

Fun fact: @nick_clegg ranks higher than any other MP for how long it takes to get questions answered (49 days on avg) #PoorNick #AccHack15
— John Sandall (@John_Sandall) November 22, 2015
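Statistics like the average wait per MP boil down to a group-by over the flat table. The sketch below computes the mean number of days between tabling and answering per group and ranks the groups slowest first, using only the standard library. The column names `asking_member`, `date_tabled` and `date_answered` and the ISO date format are assumptions; this is not John’s actual analysis code.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def mean_days_to_answer(rows: list[dict],
                        key: str = "asking_member") -> list[tuple[str, float]]:
    """Average days from tabling to answer per group, slowest first.

    Assumes hypothetical columns date_tabled / date_answered holding
    ISO dates (YYYY-MM-DD); rows without an answer date are skipped.
    """
    delays: dict[str, list[int]] = defaultdict(list)
    for row in rows:
        if not row.get("date_answered"):
            continue  # unanswered question: no delay to measure yet
        tabled = date.fromisoformat(row["date_tabled"])
        answered = date.fromisoformat(row["date_answered"])
        delays[row[key]].append((answered - tabled).days)
    averages = {group: mean(days) for group, days in delays.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)
```

Passing a department column as `key` ranks answering bodies instead of MPs, and `collections.Counter(row["asking_member"] for row in rows).most_common(1)` gives the MP who asked the most questions.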

"Have all these MPs always been so annoying?!" @John_Sandall Love it #AccHack15
— RebelUncut (@RebelUncut) November 22, 2015

Later I went on stage again to present DDPy, a command line interface to the Parliament Linked Data API, which had evolved out of my script for downloading the Parliamentary Questions Answered. I decided to solve this once and for all, wrote a generic downloader in Python and put it on PyPI. Now anyone can easily download any data set after a simple

pip install ddpkuk

Ok so *now* @frathgeber is going to give us some robust feedback on @ukparlidata #acchack15
— Dan Barrett (@dasbarrett) November 22, 2015

DDPy from @frathgeber #AccHack15 https://t.co/YeBULBhRrt
— RebelUncut (@RebelUncut) November 22, 2015

Read how to use DDPy to access @UKParliData #data at https://t.co/iJTmrdEv89 and get the code on @GitHub https://t.co/ufLZdFP0pk #AccHack15
— Florian Rathgeber (@frathgeber) November 22, 2015

The judges apparently came away quite impressed with both our presentations, since we received honourable mentions for “Best Analysis of Parliamentary Data” for Any Questions Answered? and “Best Tool for the Community” for DDPy. As if that wasn’t enough, I was quite touched to also be awarded a “Community Spirit Prize”. It was (and continues to be) an honour and a pleasure to serve the community!

Got an honourable mention at #AccHack15 for "Any Question Answered?" our analysis of Parliamentary Questions Answered /w @John_Sandall
— Florian Rathgeber (@frathgeber) November 22, 2015

Got another honourable mention at #AccHack15 for DDPy as a useful tool for the #hackathon community. So go & use it! https://t.co/80K4puh9Wh
— Florian Rathgeber (@frathgeber) November 22, 2015

No end of awards today it seems ;) Many thanks to @RebelUncut @nickmhalliday and @greentrac for running #AccHack15 https://t.co/9hvSrkubyY
— Florian Rathgeber (@frathgeber) November 22, 2015

Florian Rathgeber's site

Not sorry for the inconvenience

Analysing Parliamentary Questions Answered at Accountability Hack 2015

Any Questions Answered?

DDPy - data.parliament.uk for Humans

Other resources

