One sentence description:
- Collaborative training of public machine learning models on private data, using MPC (multi-party computation) and ZK (zero-knowledge) techniques.
Example use case:
- As a user, I go to a hospital and get an MRI scan. The hospital checks the scan and tells me everything is fine.
- Researchers want this data to train their models, but by law hospitals aren’t allowed to share it.
- The hospitals want to collaborate on training these models to help identify problems in MRI scans, but aren’t allowed to share the data.
- Solution: use ZK Federated Learning tools to collaborate on training models.
Data Sets:
There are two data sets on which each actor would train the model (a binary classification setup):
- A set where the phenomenon is not present, i.e. normal MRI scans where all is good
- A set where the phenomenon is present, i.e. MRI scans with malignant formations
Note: we might want to prove that the two data sets are correctly labeled. Proving that the classification is correct might require statistics (a rough sketch of the data layout and a placeholder check follows below).
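As a rough illustration, a contributor's local data might be organized like the sketch below. All names here (MriRecord, split_by_label, the minimum-per-class threshold) are hypothetical, and the sanity check is only a stand-in for the stronger statistical proof mentioned in the note:

```python
# Hypothetical sketch of the two labeled sets each contributor holds.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MriRecord:
    pixels: bytes   # raw scan data
    label: int      # 0 = normal, 1 = malignant formation present

def split_by_label(records: List[MriRecord]) -> Tuple[list, list]:
    """Partition a contributor's records into the two training sets."""
    normal = [r for r in records if r.label == 0]
    malignant = [r for r in records if r.label == 1]
    return normal, malignant

def sanity_check(normal: list, malignant: list, min_per_class: int = 50) -> bool:
    """Crude check that both classes are represented; a real protocol
    would need a stronger statistical proof of correct labeling."""
    return len(normal) >= min_per_class and len(malignant) >= min_per_class
```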
Deeper Description:
- In order for this protocol to work, we need to prove two things:
- Proof of data source: we need to check that the data used to train these models actually comes from valid and reputable actors and sources. This involves doing signature verifications. Some ways to obtain these proofs: signatures from emails, exporting via TLS Notary, or even better, a signature from the MRI machine over the readings (see the first sketch after this list).
- Proof of correct training: as part of the training process, I need to prove that my inputs helped train the model by increasing its accuracy. For example, if I train the model with 10,000 MRI images, everyone in this protocol would expect the model to improve with their data sets too. The goal is to reduce the average error (see the second sketch after this list).
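A minimal sketch of the data-source check, assuming the MRI machine (or hospital) signs each reading with an Ed25519 key whose public half is known to the protocol. It uses the `cryptography` package; the key setup and on-device signing shown here are assumptions, since in practice the proof might instead come from an email signature or a TLS Notary export:

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey, Ed25519PublicKey,
)
from cryptography.exceptions import InvalidSignature

def verify_scan_signature(public_key: Ed25519PublicKey,
                          scan_bytes: bytes,
                          signature: bytes) -> bool:
    """Accept a scan only if a trusted source signed it."""
    try:
        public_key.verify(signature, scan_bytes)
        return True
    except InvalidSignature:
        return False

# Demo: the "MRI machine" signs a reading, the protocol verifies it.
machine_key = Ed25519PrivateKey.generate()
scan = b"...raw MRI reading..."
sig = machine_key.sign(scan)
assert verify_scan_signature(machine_key.public_key(), scan, sig)
```

In the actual protocol this verification would run inside a ZK circuit so the scan itself never leaves the hospital; the plain-Python version above only shows the statement being proven.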
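And a sketch of the correct-training acceptance rule: a contribution counts only if it reduces the model's average error on an agreed-upon evaluation set. All names are illustrative, and in the full protocol this comparison would be proven in ZK rather than computed in the clear:

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[object, int]      # (scan, label)
Model = Callable[[object], int]   # scan -> predicted label

def average_error(model: Model, eval_set: Sequence[Example]) -> float:
    """Fraction of evaluation examples the model gets wrong."""
    wrong = sum(1 for x, y in eval_set if model(x) != y)
    return wrong / len(eval_set)

def accept_contribution(old_model: Model, new_model: Model,
                        eval_set: Sequence[Example]) -> bool:
    """Everyone expects the model to improve after each contribution,
    i.e. the average error must strictly decrease."""
    return average_error(new_model, eval_set) < average_error(old_model, eval_set)
```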
Malicious Actors:
- There is no concern about malicious actors leaking data, as no data sets are shared in this protocol.
- Also, if someone tries to damage the model, the accuracy results will drop after this party inserts their data sets.
- The group of collaborators will exclude an actor if it is worsening the model (one possible mechanism is sketched below).
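One way the exclusion rule could look in practice, building on the acceptance check above. This is a hypothetical sketch; the strike threshold and registry are assumptions, not part of the protocol as written:

```python
from collections import defaultdict

class ContributorRegistry:
    """Track how often each contributor worsens the model and exclude
    repeat offenders from further rounds."""

    def __init__(self, max_strikes: int = 3):
        self.strikes = defaultdict(int)
        self.excluded = set()
        self.max_strikes = max_strikes

    def record(self, contributor_id: str, error_delta: float) -> None:
        """error_delta > 0 means the contribution increased average error."""
        if error_delta > 0:
            self.strikes[contributor_id] += 1
            if self.strikes[contributor_id] >= self.max_strikes:
                self.excluded.add(contributor_id)

    def is_excluded(self, contributor_id: str) -> bool:
        return contributor_id in self.excluded
```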
Security Properties: