03/12/2021

Privacy, Data Quality & More in Data Spaces

The European H2020 projects MOSAICrOWN and TRAPEZE joined forces to organise a workshop entitled “Privacy, data quality & more in Data Spaces” during the European Big Data Value Forum 2021 held on 1st December 2021. The workshop attracted 72 participants.

Rigo Wenning from ERCIM/W3C introduced the speakers and the objectives of the workshop, which addressed leading-edge research on data markets, data spaces and privacy-related issues. In the European Union, data processing is subject to rules such as the GDPR as well as to constraints arising from business imperatives. The workshop presented solutions for issues that arise when data is shared or monetized, along with a possible architecture for interoperability and data management in data markets and data spaces. As an example of the research advances made in this area, a use case on intelligent connected vehicles was presented. The workshop also covered advances in policy management, in protection techniques, and in the standardisation of linked data to overcome interoperability issues.

Pierangela Samarati from Università degli Studi di Milano, coordinator of the MOSAICrOWN project (Multi-Owner data Sharing for Analytics and Integration respecting Confidentiality and OWNer control), gave an overview of the architecture developed by the project. Data wrapping and security are an important issue in this architecture. She explained how data wrapping provides protection by hiding the data during storage and collaborative computations, and how this is achieved through intelligent indexing and an authorisation model.

Data sanitization and anonymization form another important part of the MOSAICrOWN architecture, presented by Stefano Paraboschi from the University of Bergamo. He explained that privacy metrics can be based on different privacy definitions and outlined the difficulties that arise when data needs to be anonymised. The presented solution applies Mondrian, a multidimensional anonymization algorithm, within Apache Spark, an engine for large-scale data analytics.
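To give a feel for how Mondrian-style multidimensional anonymization works, here is a minimal single-machine sketch in plain Python. It is not the MOSAICrOWN implementation and does not use Apache Spark. It assumes k-anonymity as the privacy definition and numeric quasi-identifiers, and it ranks columns by raw value range rather than by the normalized range of the original algorithm. The column names and sample records are hypothetical.

```python
from statistics import median


def mondrian(records, quasi_ids, k):
    """Recursively split records into partitions of at least k rows each."""

    def spread(part, col):
        values = [r[col] for r in part]
        return max(values) - min(values)

    def split(part):
        # Try the quasi-identifiers from widest to narrowest value range.
        for col in sorted(quasi_ids, key=lambda c: spread(part, c), reverse=True):
            cut = median(r[col] for r in part)
            left = [r for r in part if r[col] <= cut]
            right = [r for r in part if r[col] > cut]
            # Accept the cut only if both halves still hold at least k rows.
            if len(left) >= k and len(right) >= k:
                return split(left) + split(right)
        return [part]  # no allowable cut: this partition is final

    return split(records)


def generalize(partitions, quasi_ids):
    """Replace each quasi-identifier value by the (min, max) range of its partition."""
    released = []
    for part in partitions:
        ranges = {c: (min(r[c] for r in part), max(r[c] for r in part))
                  for c in quasi_ids}
        for r in part:
            row = dict(r)
            row.update(ranges)
            released.append(row)
    return released


# Hypothetical sample data: "age" and "zip" are the quasi-identifiers.
records = [
    {"age": 25, "zip": 24100, "diagnosis": "A"},
    {"age": 27, "zip": 24121, "diagnosis": "B"},
    {"age": 34, "zip": 24100, "diagnosis": "A"},
    {"age": 36, "zip": 24129, "diagnosis": "C"},
    {"age": 52, "zip": 20100, "diagnosis": "B"},
    {"age": 58, "zip": 20133, "diagnosis": "A"},
]
partitions = mondrian(records, ["age", "zip"], k=2)
released = generalize(partitions, ["age", "zip"])
```

After generalization, every combination of age range and zip range in `released` is shared by at least k records, which is the property k-anonymity asks for. A distributed version would have to coordinate the partitioning across workers, which this sketch does not attempt.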

Piero Bonatti, from CINI and the University of Naples Federico II, gave a presentation on data usage policies developed in the frame of the TRAPEZE project (Transparency, Privacy and Security for European Citizens). He first explained what data usage policies are and how they are used, for instance in the context of the European General Data Protection Regulation (GDPR). He presented use cases related to policies and compliance, such as validation, audit/monitoring, actors and access control. The solution applied in the TRAPEZE project is one simple language that expresses all policies in a uniform way. He demonstrated this with two examples: (1) how a privacy policy is expressed in the JSON format and (2) how the objective part of the GDPR is modelled. He then explained why TRAPEZE’s policy language is vocabulary-neutral: the property names and classes used in the policies are not hardwired in the policy language but are defined in an ontology, and TRAPEZE is adopting the vocabularies developed by the W3C DPVCG (Data Privacy Vocabularies and Control Community Group). Piero concluded by explaining the advantages of applying formal semantics and how privacy policies can currently be assessed.

Pierre-Antoine Champin from ERCIM/W3C presented how the RDF-star draft standard is bridging the gap between linked data and property graphs in the frame of the MOSAICrOWN project. He introduced the concepts of Linked Data and Property Graphs and demonstrated with examples how RDF-star reduces the impedance mismatch between the two.
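The idea can be illustrated with a small, library-free Python sketch. It is not taken from the talk. A property-graph edge carries its own key/value properties, and RDF-star can express them as statements whose subject is the quoted edge triple. The `Triple` type, the helper functions and the `:vehicle42` example data are all hypothetical, and terms are kept as plain strings rather than validated IRIs.

```python
from typing import NamedTuple, Union


class Triple(NamedTuple):
    # In RDF-star, the subject (or object) of a triple may itself be a triple.
    subject: Union[str, "Triple"]
    predicate: str
    obj: Union[str, int, float, "Triple"]


def edge_to_rdf_star(source, label, target, properties):
    """Turn one property-graph edge into RDF-star style statements."""
    edge = Triple(source, label, target)
    statements = [edge]
    for key, value in properties.items():
        # Each edge property becomes a statement about the quoted edge triple.
        statements.append(Triple(edge, key, value))
    return statements


def to_turtle_star(triple):
    """Render a triple using the << s p o >> quoted-triple notation."""

    def term(x):
        if isinstance(x, Triple):
            return f"<< {term(x.subject)} {x.predicate} {term(x.obj)} >>"
        return str(x)

    return f"{term(triple.subject)} {triple.predicate} {term(triple.obj)} ."


# Hypothetical edge: a vehicle reporting to a data market since 2021.
statements = edge_to_rdf_star(":vehicle42", ":reportedTo", ":dataMarket",
                              {":since": 2021})
lines = [to_turtle_star(t) for t in statements]
```

Plain RDF would need reification or an intermediate node to attach `:since` to the edge. With a quoted triple the annotation sits directly on the statement, which is closer to how a property graph models it.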

Aidan O’Mahony from the OCTO Research Office at Dell Technologies concluded the workshop with a presentation of the use case “Intelligent Connected Vehicles” developed in MOSAICrOWN. He provided insight into the architecture for an automotive scenario that involves data owners (drivers) ingesting their data into the data market, consumers accessing data in the data market, and the data market provider offering storage and computation services to data owners and consumers. In this scenario, RDF-star is applied to intelligently connect vehicles.

At the end of the workshop, the presenters had the opportunity to answer questions raised during the online sessions.

The speakers at the EBDVF workshop. From top left: Pierangela Samarati, Stefano Paraboschi, Rigo Wenning, Pierre-Antoine Champin, Aidan O’Mahony and Piero Bonatti.
