Project 2021 - 2024
Secure Analysis Environment
Privacy, copyright, and competition barriers limit the sharing of sensitive data for scientific purposes. We developed the Secure Analysis Environment (SANE): a virtual container in which the researcher can analyse sensitive data while the data provider remains in complete control. By following the Five Safes principles, SANE will enable researchers to conduct research on data that until now have been hardly available to them.
Although non-academic parties hold an increasing number of interesting datasets, there is currently no infrastructure that allows researchers to analyse sensitive data while data providers remain in control. As a result, most potential data providers (such as governments, heritage institutions or commercial parties like the Chamber of Commerce or Funda) are reluctant to share their datasets, which therefore remain unused. Yet scientific breakthroughs would be possible if these datasets were available.
The Secure ANalysis Environment (SANE) is a virtual, fully shielded computer containing pre-approved analysis software (such as R and Jupyter notebooks) and access to the sensitive data. It allows the data provider to maintain complete control while still allowing the researcher to study the data in a convenient manner.
Researchers can analyse the data within the SANE environment after the data provider has granted access. Results of the analyses can only be exported to the researcher's own computer after verification by the data provider. The data provider can even prevent the researcher from seeing the data. All actions of the researcher are monitored. Uploading additional data can also be prevented, as combining datasets may result in de-anonymization.
SANE comes in two variants. Tinker SANE allows the researcher to see and manipulate the data. In Blind SANE, the researcher submits an algorithm without being able to see the data, and the data provider approves both the algorithm and the output. Interest in SANE is high: even before the project team started, six parties had already expressed interest.
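The Blind SANE workflow described above (submit an algorithm, have it approved, run it on data the researcher never sees, then have the output approved before export) can be pictured as a small state machine. The sketch below is purely illustrative and is not SANE's actual implementation: the names `BlindJob`, `Stage`, `review_algorithm`, `review_output` and `export` are hypothetical, and the two-row dataset is made up for the example.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List, Optional


class Stage(Enum):
    """Hypothetical stages of a Blind SANE-style submission."""
    SUBMITTED = auto()
    ALGORITHM_APPROVED = auto()
    EXECUTED = auto()
    OUTPUT_RELEASED = auto()
    REJECTED = auto()


@dataclass
class BlindJob:
    """Illustrative record of one submission; not part of any real SANE API."""
    algorithm: Callable[[List[dict]], object]
    stage: Stage = Stage.SUBMITTED
    _output: Optional[object] = field(default=None, init=False, repr=False)

    def review_algorithm(self, approved: bool) -> None:
        # The data provider inspects the submitted algorithm first.
        if self.stage is not Stage.SUBMITTED:
            raise RuntimeError("algorithm was already reviewed")
        self.stage = Stage.ALGORITHM_APPROVED if approved else Stage.REJECTED

    def run(self, sensitive_rows: List[dict]) -> None:
        # Runs inside the shielded environment; the researcher never sees the rows.
        if self.stage is not Stage.ALGORITHM_APPROVED:
            raise PermissionError("algorithm has not been approved")
        self._output = self.algorithm(sensitive_rows)
        self.stage = Stage.EXECUTED

    def review_output(self, approved: bool) -> None:
        # The data provider checks the result before anything leaves the environment.
        if self.stage is not Stage.EXECUTED:
            raise RuntimeError("there is no pending output to review")
        self.stage = Stage.OUTPUT_RELEASED if approved else Stage.REJECTED

    def export(self) -> object:
        # Only released output can reach the researcher's own computer.
        if self.stage is not Stage.OUTPUT_RELEASED:
            raise PermissionError("output has not been released by the data provider")
        return self._output


def mean_age(rows: List[dict]) -> float:
    """Example algorithm a researcher might submit (toy, made-up schema)."""
    return sum(row["age"] for row in rows) / len(rows)


job = BlindJob(algorithm=mean_age)
job.review_algorithm(approved=True)
job.run([{"age": 30}, {"age": 50}])  # toy data standing in for the sensitive dataset
job.review_output(approved=True)
print(job.export())  # 40.0
```

In this picture, Tinker SANE would correspond to skipping the algorithm review and letting the researcher work with the data directly, while still gating `export` behind the data provider's verification.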
PDI-SSH (Platform Digital Infrastructure Social Sciences & Humanities) granted a funding request of nearly one million euros in December for the development of this secure data environment. SANE is being developed by the Erasmus School of Social and Behavioural Sciences, ODISSEI (Open Data Infrastructure for Social Science and Economic Innovations), the Netherlands Institute for Sound & Vision, CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities), SURF, and the KB (National Library of the Netherlands).
SANE builds on previous initiatives of the project partners, such as the CBS Remote Access Environment, ODISSEI Secure Supercomputer, SURF Data Exchange, SURF Research Cloud and CLariah-as-a-Service (CLaaS). We are building a generic off-the-shelf solution that can be applied by any provider of sensitive data and by any researcher. SANE can be used by researchers in all disciplines, as illustrated by the involvement of consortia in both the social sciences (ODISSEI) and humanities (CLARIAH). The platform is expected to go into production within 3 years.