Research in the spotlight

Federated learning in the healthcare setting

Interview with Sebastian van der Voort, researcher on Quality of care (IT systems), methods in medical informatics and Reusable Health Data.

In healthcare, many data scientists, clinicians and researchers are using machine learning techniques to develop prediction models. However, developing a model with data from multiple institutes comes with many privacy and security issues. This limits the available data and delays the development process. Sebastian van der Voort conducts research on federated learning - a solution to this problem.

What is federated learning?

Federated learning is a new way to train machine learning models without having to exchange sensitive patient data. Currently, when multiple institutes are involved, patient data is transferred to a single location where it is then used to train a model. This means that agreements need to be made to allow for this data transfer, and that there needs to be trust, since a single person or entity can see and use the data from all the institutes. With federated learning, rather than bringing the data to the model, we bring the model to the data. A blueprint of the model is sent to all participating institutes, which locally fit the model on their own data. The model parameters are then sent back to the initiating institute, which combines them into a global model. This cycle of sending the global model parameters to the participating institutes, fitting locally, sending back the local parameters, and combining them into a new global model is repeated to optimize model performance. Since the model parameters are anonymous, it is much easier to set up this exchange than to exchange the sensitive patient data.
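
The cycle described above can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not the method used in the research discussed here: it assumes a simple linear model, synthetic data for three hypothetical institutes, and a sample-size-weighted average as the combining step (in the style of federated averaging). All function names and settings are made up for this example.

```python
import random

def predict(weights, row):
    """Linear model: weighted sum of the features."""
    return sum(w * x for w, x in zip(weights, row))

def local_fit(global_weights, X, y, lr=0.1, epochs=5):
    """Runs at one institute: start from the global parameters and take a few
    gradient-descent steps on the local data. Only parameters are returned."""
    w = list(global_weights)
    n = len(y)
    for _ in range(epochs):
        errors = [predict(w, row) - target for row, target in zip(X, y)]
        grad = [2.0 * sum(e * row[j] for e, row in zip(errors, X)) / n
                for j in range(len(w))]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def federated_average(local_weights, sample_counts):
    """Runs at the initiating institute: combine the local parameters,
    weighting each institute by its number of records (an assumption)."""
    total = sum(sample_counts)
    n_params = len(local_weights[0])
    return [sum(w[j] * c for w, c in zip(local_weights, sample_counts)) / total
            for j in range(n_params)]

def run_federated_training(institutes, n_features, rounds=50):
    """Repeat: send global parameters out, fit locally, send back, combine."""
    global_w = [0.0] * n_features
    counts = [len(y) for _, y in institutes]
    for _ in range(rounds):
        local_ws = [local_fit(global_w, X, y) for X, y in institutes]
        global_w = federated_average(local_ws, counts)
    return global_w

def make_institute(rng, n, true_w, noise=0.1):
    """Synthetic stand-in for one institute's data (no real patient data)."""
    X = [[rng.gauss(0, 1) for _ in true_w] for _ in range(n)]
    y = [predict(true_w, row) + rng.gauss(0, noise) for row in X]
    return X, y

rng = random.Random(0)
true_w = [1.5, -2.0, 0.5]  # hypothetical "true" coefficients
institutes = [make_institute(rng, n, true_w) for n in (200, 80, 120)]
global_w = run_federated_training(institutes, n_features=3)
print([round(w, 2) for w in global_w])
```

In this sketch the raw rows in `institutes` are only ever touched inside `local_fit`; the combining step sees nothing but parameter lists and record counts, which mirrors the idea of bringing the model to the data.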

Why do you study federated learning for the healthcare setting?

Healthcare is a perfect use case for federated learning: data is distributed among different healthcare organizations and is in almost all cases (highly) sensitive. There is a strong push, including from the Dutch government, to integrate AI in healthcare in order to lighten the workload of healthcare personnel. However, this integration is often hindered by the difficulty of sharing data between institutes, due to the fragmentation and sensitivity of the data. Federated learning solves this problem by enabling the construction of machine learning models without the need to share the patient data, removing a big barrier for both researchers and clinical practitioners.

Is a guideline available for conducting a federated learning project?

So far there are no unified guidelines available. As federated learning is a (relatively) new field, much is still unclear. First, there is a great diversity in workflows and federated learning tools, which complicates implementation and collaboration. Second, most agreements and collaboration contracts, such as data transfer agreements, are currently based on the idea that patient data needs to be exchanged in a project. The legal side of projects is not yet tuned to the new way of working that federated learning brings, which also complicates things.

What are your future plans?

Currently I’m working on a few things. We’re trying to set up a standardized approach for federated learning projects, both within Amsterdam UMC and more broadly within the Netherlands. This should make it easier to set up new federated learning projects, with fewer legal barriers and a more streamlined process. We hope this gives a boost to federated learning by making it easier to set up new projects in this way than in the traditional way.

I am also looking at specific federated learning projects, for example with the NICE registry, which collects data from Dutch ICUs. This is a perfect example of the potential of federated learning. Currently, the data from all ICUs in the Netherlands is collected at ‘NICE Research and Support’, a team based in my department (Medical Informatics). With stricter requirements regarding data sharing, however, federated learning might be a valid future option. I’m therefore comparing models based on the currently centrally collected data with models based on simulations of a federated setting, to evaluate the effect on model performance.

Curious and want to read more?

Yordanov TR, Ravelli ACJ, Amiri S, Vis M, Houterman S, Van der Voort SR and Abu-Hanna A. Performance of federated learning-based models in the Dutch TAVI population was comparable to central strategies and outperformed local strategies. Front. Cardiovasc. Med. 11:1399138 (2024). doi: 10.3389/fcvm.2024.1399138

Pati, S., Baid, U., Edwards, B. et al. Federated learning enables big data for rare cancer boundary detection. Nat Commun 13, 7346 (2022). doi: 10.1038/s41467-022-33407-5