- Contributing author: Nicholette Pollard-Odle, Evaluation Officer
Imagine you want to understand how taking different education pathways affects a student’s future employability and earnings.
You could send a survey 10 years after students take their GCSEs. On a good day, you might get a few thousand responses. However, the data collected would probably have two key issues: selection bias and recall bias. First, the people who choose to respond to a survey about their employment status are unlikely to be representative of the population as a whole. For example, individuals who are out of work and job searching may be more likely to be actively checking their emails and to respond. Second, respondents may not be accurate in their responses. Some may not remember their starting salary, or that they received an end-of-year bonus.
The good news is that the UK routinely collects data, called administrative data, which can be made available for research purposes. By making a sufficient case for the public benefit and following procedures that protect individuals’ privacy and rights, you can gain access to population-level datasets. These datasets allow you to answer questions such as ‘what are the returns to higher education in the UK?’ without relying on individual recall.
New TASO guidance
To help make sense of how to use these datasets, TASO has released the ‘Administrative data guide’. It offers practical advice on using data to answer research questions such as the education pathways question outlined above.
The guide covers administrative data sources relevant to higher education, including the National Pupil Database, the Higher Education Statistics Agency, Longitudinal Education Outcomes (LEO) data, the Universities and Colleges Admissions Service (UCAS), and the Higher Education Access Tracker. We will update the guide as more relevant datasets become available.
The guide provides an overview of the key variables, populations, and time ranges covered by each dataset to help you narrow down which ones are most appropriate for your research question. It walks you through the access procedures and, where possible, provides concrete examples of how each dataset can be applied to answer research questions about inequalities in higher education. In addition, it includes case studies demonstrating how recent TASO projects have utilised administrative data – one of which is summarised below.
TASO case study: Using Longitudinal Education Outcomes (LEO) data to assess long-term outcomes
In a recent TASO-funded project, we commissioned State of Life and Mime to use the LEO dataset to understand the extent to which taking part in higher education addresses existing equality gaps between advantaged and disadvantaged students. The study examined earnings and employment status among young people who pursued different educational pathways. Pathways were defined by the highest level and type of qualification that students had obtained nine years after completing Key Stage 4.
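To make the pathway definition concrete, here is a minimal sketch of how such a rule could be applied to one learner's records. It is an illustration only, not the method used by State of Life and Mime: the field names (`level`, `type`, `year`), the numeric level coding and the example records are all hypothetical and do not come from the LEO dataset.

```python
def pathway(ks4_year, qualifications, window_years=9):
    """Return (level, type) of the highest qualification obtained within
    `window_years` of completing Key Stage 4, or None if there is none.

    `qualifications` is a list of dicts with hypothetical keys:
    'level' (int, higher means more advanced), 'type' (str), 'year' (int).
    """
    cutoff = ks4_year + window_years
    # Keep only qualifications obtained by the end of the window.
    eligible = [q for q in qualifications if q["year"] <= cutoff]
    if not eligible:
        return None
    # If two qualifications share the top level, the first one listed wins.
    best = max(eligible, key=lambda q: q["level"])
    return best["level"], best["type"]


# Synthetic example: a learner who completed Key Stage 4 in 2002.
example_quals = [
    {"level": 3, "type": "academic", "year": 2004},
    {"level": 6, "type": "academic", "year": 2007},
    {"level": 7, "type": "academic", "year": 2013},  # outside the 9-year window
]

example_pathway = pathway(2002, example_quals)
print(example_pathway)
```

In this synthetic case the Level 7 qualification falls outside the nine-year window, so the learner's pathway is taken from the Level 6 record.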
The application process for accessing the LEO dataset is managed by the Office for National Statistics. To access the LEO standard extract, State of Life and Mime completed a full project application. The application covered the following stages:
- Describing the project purpose, research questions, proposed methods and the expected public benefit
- Identifying the funders and commissioners of the project and the research team, confirming that all members requiring access were accredited researchers, and specifying the project timeline
- Providing a description of the data sources that will be linked and submitting the LEO Iteration 1 Standard Extract variable request form specifying the required datasets and variables
- Outlining the planned publication and dissemination strategy
- Submitting the UK Statistics Authority data ethics self-assessment form
State of Life and Mime requested linkage of the four individual-level datasets that make up the LEO Iteration 1 Standard Extract (I1SE) for two cohorts of pupils, who completed their GCSEs in 2002 and 2003:
- The National Pupil Database for school related data
- Higher Education Statistics Agency for higher education data
- Individual Learner Record for further education and apprenticeship data
- HMRC and DWP for employment and earnings data
These records comprised over one million learners, of whom over 95% were matched across the four datasets using characteristics such as gender and attainment.
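As a rough illustration of what a match rate means, the sketch below computes the share of records in one source that can be found in every other source when matching on a set of shared characteristics. It is a toy example under stated assumptions: the matching fields, the record layout and the data are invented, and the actual LEO linkage process is not described in this post and is likely to be more sophisticated.

```python
def match_key(record, fields):
    """Build a comparison key from the chosen characteristics."""
    return tuple(record[f] for f in fields)


def match_rate(base, others, fields):
    """Share of distinct base keys that also appear in every other source."""
    base_keys = {match_key(r, fields) for r in base}
    if not base_keys:
        return 0.0
    matched = set(base_keys)
    for source in others:
        # Keep only keys that are also present in this source.
        matched &= {match_key(r, fields) for r in source}
    return len(matched) / len(base_keys)


# Hypothetical matching fields and synthetic records (not real LEO variables).
FIELDS = ("gender", "birth_month", "attainment_band")

school = [
    {"gender": "F", "birth_month": "1986-03", "attainment_band": "A"},
    {"gender": "M", "birth_month": "1986-07", "attainment_band": "B"},
    {"gender": "F", "birth_month": "1986-11", "attainment_band": "C"},
    {"gender": "M", "birth_month": "1987-01", "attainment_band": "A"},
]
higher_ed = school[:3]       # the fourth learner is missing from this source
further_ed = list(school)
earnings = list(school)

rate = match_rate(school, [higher_ed, further_ed, earnings], FIELDS)
print(f"{rate:.0%} of records matched across all sources")
```

Matching on characteristics rather than a unique identifier means that two different people with identical values would be treated as one, which is one reason real linkage exercises tend to use more fields than this sketch does.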
While this process required some up-front work to put together a strong application, the benefit is clear – you can access huge datasets with very high population and longitudinal coverage, without the added time, costs, and risks associated with survey data.