The PAPAYA project provides tools that support a wide range of computations, from simple statistics to sophisticated machine learning algorithms, as efficiently as possible while meeting the functional requirements of a set of realistic scenarios against which we propose to validate the platform.
This scenario defines a setting where the data owner applies data analytics primitives to her sensitive data. Because these operations are computationally demanding, the data analytics tasks are offloaded to a third-party data processor, such as a cloud server.
The PAPAYA platform will provide the data owner with the necessary privacy-preserving data analytics primitives.
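The document does not say which primitives PAPAYA uses for this setting. As one illustration of how a cloud server can compute on data it cannot read, the sketch below uses textbook Paillier encryption, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of their plaintexts. The owner keeps the private key, the server only aggregates ciphertexts, and the owner decrypts the result. The primes are tiny, so this is insecure and for illustration only; all function names and the salary values are hypothetical.

```python
import math
import secrets
from functools import reduce

# Toy Paillier key built from two small known primes (insecure; real keys
# use primes of 1024 bits or more). The generator is fixed to g = N + 1.
P, Q = 999_983, 1_000_003
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)  # private key part: Carmichael function of N
MU = pow(LAM, -1, N)          # private key part: valid because g = N + 1

def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N with the public key N (randomised)."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    # g^m mod N^2 equals 1 + m*N when g = N + 1
    return (1 + m * N) * pow(r, N, N2) % N2

def decrypt(c: int) -> int:
    """Data owner only: recover the plaintext with the private key."""
    u = pow(c, LAM, N2)
    return (u - 1) // N * MU % N

def add_encrypted(c1: int, c2: int) -> int:
    """Cloud server: multiplying ciphertexts adds the hidden plaintexts."""
    return c1 * c2 % N2

def encrypted_sum(ciphertexts: list[int]) -> int:
    """Cloud server: aggregate without ever decrypting.
    The value 1 is a (non-randomised) encryption of 0."""
    return reduce(add_encrypted, ciphertexts, 1)

# Usage: the owner encrypts and uploads, the server sums, the owner decrypts.
salaries = [3200, 4100, 2900]
uploaded = [encrypt(s) for s in salaries]
aggregate = encrypted_sum(uploaded)
print(decrypt(aggregate))  # 10200
```

Paillier only supports additions (and multiplication by public constants) on ciphertexts; richer analytics such as model training typically need other primitives, for example fully homomorphic encryption or secure multi-party computation.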
This scenario covers the case where multiple data owners collaboratively process a large dataset made up of all their data and derive relevant information from it, such as a global machine learning model.
In particular, this setting protects each data owner's privacy against the other data owners: only the outcome of the privacy-preserving analytics is revealed to them. This will help small entities augment their datasets and hence obtain more accurate results while remaining compliant with the GDPR.
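The document does not name the protocol PAPAYA uses here. One common building block for this kind of collaboration is additive secret sharing, sketched below under the assumptions of semi-honest owners and a hypothetical public modulus larger than any possible total. Each owner splits its private value into random shares, one per owner; every owner adds up the shares it receives and publishes only that partial sum. Any single share or partial sum looks uniformly random, yet together the partial sums reveal the global total and nothing more. The whole exchange is simulated in one process, and the names are illustrative.

```python
import secrets

# Hypothetical public modulus; must exceed the largest possible total.
MODULUS = 2**61 - 1

def share(value: int, n_parties: int) -> list[int]:
    """Split value into n_parties random shares summing to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_inputs: list[int]) -> int:
    """Simulate n data owners computing the sum of their private inputs."""
    n = len(private_inputs)
    # Owner i splits its input; share j would be sent to owner j.
    all_shares = [share(x, n) for x in private_inputs]
    # Owner j adds the shares it received and publishes only this partial sum.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS
                    for j in range(n)]
    # Combining the published partial sums reveals only the global total.
    return sum(partial_sums) % MODULUS

# Usage: three organisations pool local record counts without revealing them.
print(secure_sum([120, 75, 310]))  # 505
```

Summing is enough for many collaborative tasks: averaging locally computed model updates, for instance, reduces to a secure sum followed by a division by the number of owners.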
In this scenario, the data comes from a single source that protects it from being read by a third party.
However, the data owner allows a third party to perform analytical tasks over its encrypted data, provided that the third party learns only the result of the analytics. In this scenario, the data source can be a single user or a privacy-preserving aggregator of data coming from different users. The latter case relates to Article 89 of the GDPR, applicable since 2018, which requires that processing for statistical purposes be subject to appropriate safeguards and, whenever those purposes can be fulfilled by processing that does not permit the identification of data subjects, be carried out that way.
In this scenario, special care will be taken to limit the information leaked by the analytical process. For example, issuing multiple queries should not allow the querier to infer sensitive information from the database.
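A classic example of such leakage is a differencing attack, sketched below on a small hypothetical database. The query interface returns only aggregate sums, never individual rows, yet two overlapping queries whose answers differ by exactly one record reveal that record's value. The records, names and query function are illustrative assumptions, not part of the PAPAYA platform.

```python
# Hypothetical database: name -> (department, salary).
records = {
    "alice": ("sales", 4100),
    "bob": ("sales", 3900),
    "carol": ("research", 5200),
}

def sum_salaries(predicate) -> int:
    """Aggregate-only query: returns a sum of salaries, never a row."""
    return sum(salary for name, (dept, salary) in records.items()
               if predicate(name, dept))

# Each query on its own looks harmless ...
q_all = sum_salaries(lambda name, dept: True)
q_all_but_carol = sum_salaries(lambda name, dept: name != "carol")

# ... but their difference isolates a single individual.
print(q_all - q_all_but_carol)  # 5200, i.e. carol's salary
```

Countermeasures usually either restrict which combinations of queries may be answered (query auditing) or perturb each answer with calibrated noise, as in differential privacy; the text above does not specify which approach PAPAYA adopts.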
In this setting, the data to be analysed comes from different sources and is queried by a third party.
Neither the server that collects the data nor the querier ever sees it in the clear; both handle only its encrypted form, thus achieving end-to-end privacy.