Imperial College Healthcare NHS Trust and Imperial College London are part of a new programme to pool anonymised data from NHS trusts for research.

The National Institute for Health Research Health Informatics Collaborative (NIHR-HIC) is working to make anonymised NHS clinical data more readily available to researchers. It does this by putting the data into a standard format and following strict procedures to ensure that the data cannot be used to identify patients. This will enable researchers to gain new insights into areas such as the effectiveness of different treatments and the factors that influence patient outcomes and recovery.

The collaboration is between five leading NHS trusts, each of which has a strong relationship with a partner university.

Ben Glampson, NIHR-HIC programme manager at Imperial College Healthcare NHS Trust, answers questions about the challenges, potential and progress of the collaborative’s work.

What are the goals of the NIHR-HIC?

A huge amount of data is routinely captured as part of the direct clinical care of patients. This information has great potential for investigating the effectiveness of different treatments and the factors that influence patient outcomes. However, it’s currently difficult for researchers to access and analyse the data. The NIHR-HIC is pooling this data from five NHS trusts to maximise its use for research that will benefit patients, whilst providing assurance that the data is de-identified and that its use complies with the law, such as the Data Protection Act.

How might research using ‘big data’ affect patients’ treatments and clinical procedures?

Our goal is to improve patient outcomes and patient experience, and the only way to do that is to understand what’s happening to patients. That means assessing the effectiveness of current treatments and their outcomes across larger groups of patients.

All the research projects that get approved will focus in some way on improving outcomes, improving treatment, or saving money for the NHS through shorter treatment times or a lower likelihood of readmission. The goal really is translational research: changing the way healthcare is delivered based on an understanding of what has happened, and the only way to do this is with good data. As such, we need data in the best possible form so that it is easily accessible to researchers. This will reduce the time researchers spend getting hold of data and increase the time they can spend actually analysing and reviewing it, and eventually translating the results into new treatments, procedures and changes to practice.

It’s a long-term goal, so we won’t see changes happening overnight, but by telling people what’s happening now and what has happened previously, we can inform decisions that are made in the future.

And can it be used to inform care pathways and new models?

Potentially, yes. The other side of ‘big data’ is understanding and refining the day-to-day processes that happen within hospitals. For example, we are looking at patient care pathways and the data collected at each stage of the patient journey. Clinicians can then use that analysis to work out whether there are ways to improve the patient’s journey and whether there are any delays or bottlenecks that could be addressed.
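As a rough illustration of this kind of pathway analysis, the Python sketch below computes the median delay between consecutive stages of a patient journey. The stage names, the event format and the `stage_delays` function are hypothetical, and the records are made up rather than taken from any NIHR-HIC dataset. A long median gap between two stages is the sort of signal that might point clinicians towards a bottleneck.

```python
from datetime import datetime
from statistics import median

# Hypothetical pathway events: (patient_id, stage, timestamp).
# Stage names and values are illustrative only, not real NIHR-HIC data.
EVENTS = [
    ("p1", "referral", "2016-01-04 09:00"),
    ("p1", "assessment", "2016-01-06 14:00"),
    ("p1", "treatment", "2016-01-20 10:00"),
    ("p2", "referral", "2016-01-05 11:00"),
    ("p2", "assessment", "2016-01-05 16:00"),
    ("p2", "treatment", "2016-01-12 09:00"),
]


def stage_delays(events):
    """Return the median delay in hours for each consecutive pair of stages."""
    by_patient = {}
    for patient, stage, ts in events:
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        by_patient.setdefault(patient, []).append((when, stage))

    gaps = {}
    for steps in by_patient.values():
        steps.sort()  # order each patient's journey chronologically
        for (t0, s0), (t1, s1) in zip(steps, steps[1:]):
            hours = (t1 - t0).total_seconds() / 3600
            gaps.setdefault((s0, s1), []).append(hours)

    # The median is less distorted by a single unusually slow patient than the mean.
    return {pair: median(values) for pair, values in gaps.items()}


for (start, end), hours in stage_delays(EVENTS).items():
    print(f"{start} -> {end}: median {hours:.1f} h")
```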

Why is the collaboration between five hospital trusts so important?

It provides us with much larger sample sizes for research, so we can be more confident that the sample represents the populations we are researching and covers all the different groups. For example, the acute coronary syndrome database holds about a quarter of a million records.

The data has already been collected, so it would be a shame not to use such a valuable source of information to help improve healthcare. Reusing it also makes research more cost-effective, as we do not have to invest in resources to collect data from scratch.

What have the challenges been so far in setting up the NIHR-HIC?

Probably one of the major challenges is getting the data into a standard format so it can be pooled. The five partner trusts do not collect data in the same way: practice varies not only between trusts but also within them. Data is recorded in different units and for slightly different pathways and processes, and it is stored in different ways across multiple, differing systems. To overcome this, we have created standardised data models and asked the partner trusts to map their existing data onto them.
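A minimal sketch of what mapping local data onto a shared model can look like, assuming a hypothetical standard field for troponin (a blood test commonly used in acute coronary syndrome) stored in ng/L. The trust names, local field names, `TRUST_MAPPINGS` and `to_standard` are invented for illustration; the real NIHR-HIC data models are not shown here. Each trust supplies its own mapping, and unit conversion happens once, at the point of mapping.

```python
from dataclasses import dataclass


@dataclass
class StandardTroponin:
    """One row of a hypothetical standard data model: troponin in ng/L."""
    patient_ref: str
    troponin_ng_per_l: float


# Factors converting each local unit to the standard unit (1 ug/L = 1000 ng/L).
TO_NG_PER_L = {"ng/L": 1.0, "ug/L": 1000.0}

# Hypothetical per-trust mappings: local field names and units differ.
TRUST_MAPPINGS = {
    "trust_a": {"id_field": "pid", "value_field": "trop", "unit": "ng/L"},
    "trust_b": {"id_field": "patient", "value_field": "troponin_t", "unit": "ug/L"},
}


def to_standard(trust, record):
    """Map one local record onto the standard model, converting units."""
    mapping = TRUST_MAPPINGS[trust]
    value = float(record[mapping["value_field"]]) * TO_NG_PER_L[mapping["unit"]]
    return StandardTroponin(str(record[mapping["id_field"]]), value)


# Two differently shaped local records end up in one comparable, pooled list.
pooled = [
    to_standard("trust_a", {"pid": "A17", "trop": 45}),
    to_standard("trust_b", {"patient": "B03", "troponin_t": 0.045}),
]
for row in pooled:
    print(row)
```

Keeping the mappings as data rather than code means a new trust can be added by describing its fields, without changing the pooling logic.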

How do you ensure the data remain anonymous?

We never make identifiable data available to researchers. There are two main types of ‘identifiers’: direct identifiers, such as name, date of birth and NHS number, and indirect identifiers, such as date of admission and date of operation.

For example, in the acute coronary syndrome dataset, researchers won’t get dates; instead they get numbers that correspond to dates. We also strip out NHS numbers, names and addresses, and follow a procedure called pseudonymisation. This means identifiable data is replaced with an encrypted code and, unless you know the ‘hash key’ used to create it, you won’t be able to get back to the original information. Because that key exists, in cases where there may be benefits to the patient, we can re-identify them.
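One common way to implement this kind of pseudonymisation is a keyed hash such as HMAC-SHA256, combined with day numbers in place of calendar dates. The sketch below uses that approach as an assumption; it is not necessarily the procedure NIHR-HIC follows. The key, field names, the `pseudonym` and `release_record` functions and the sample record are all hypothetical. Strictly speaking a hash cannot be "unlocked", so in this sketch re-identification works because the trust keeps the key and a lookup table from pseudonym back to NHS number.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical secret held only by the trust; never shared with researchers.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonym(nhs_number: str) -> str:
    """Keyed hash (HMAC-SHA256): the same input and key always give the same
    token, but the token cannot be recomputed or matched without the key."""
    return hmac.new(SECRET_KEY, nhs_number.encode(), hashlib.sha256).hexdigest()


def release_record(record, lookup):
    """Return a researcher-facing copy with direct identifiers removed and
    dates replaced by day numbers relative to admission (day 0)."""
    token = pseudonym(record["nhs_number"])
    lookup[token] = record["nhs_number"]  # re-identification table, kept by the trust
    admitted = record["admission_date"]
    return {
        "pseudonym": token,
        "admission_day": 0,
        "operation_day": (record["operation_date"] - admitted).days,
        "diagnosis": record["diagnosis"],
    }  # name, address and NHS number are simply not copied across


trust_lookup = {}
raw = {
    "nhs_number": "1234567890",  # made-up number
    "name": "Jane Example",
    "address": "1 Example Street",
    "admission_date": date(2016, 3, 1),
    "operation_date": date(2016, 3, 4),
    "diagnosis": "acute coronary syndrome",
}
released = release_record(raw, trust_lookup)

# Only the trust, holding the lookup table (or the key), can map back.
original_nhs_number = trust_lookup[released["pseudonym"]]
```

The key matters: an unkeyed hash of a ten-digit NHS number could be reversed by hashing every possible number and comparing the results, whereas without the key those tokens cannot be recomputed.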

And when might it be useful to re-identify data in this way?

It’s rare but, for example, research might find that a patient has another condition that could be life-threatening. That could be fed back to the patient’s clinician, who could assess how to contact the patient. Or, if research revealed a risk factor that makes a group of patients more likely to develop certain treatable conditions, we could re-identify those patients and feed this back to their direct care team. If these patients could benefit, it would then be up to their direct care team to contact them.