Responsible Data Integration: Next-generation Challenges

conference contribution
posted on 21.08.2022, 00:36 authored by Fatemeh Nargesian, Abolfazl Asudeh, HV Jagadish

Data integration has been extensively studied by the data management community and is a core task in the data pre-processing step of ML pipelines. When the integrated data is used for analysis and model training, responsible data science requires addressing concerns about data quality and bias. We present a tutorial on data integration and responsibility, highlighting the existing efforts in responsible data integration along with research opportunities and challenges. In this tutorial, we encourage the community to audit data integration tasks with responsibility measures and to develop integration techniques that optimize for the requirements of responsible data science. We focus on three critical aspects: (1) the requirements to be considered when evaluating and auditing data integration tasks for quality and bias; (2) the data integration tasks that call for attention to data responsibility measures, and the methods to satisfy these requirements; and (3) techniques, tasks, and open problems in data integration that help achieve data responsibility.

Funding

III: Medium: Collaborative Research: Fairness in Web Database Applications | Funder: Directorate for Computer & Information Science & Engineering | Grant ID: 2107290

Citation

Nargesian, F., Asudeh, A., & Jagadish, H. V. (2022, June). Responsible Data Integration: Next-generation Challenges. Proceedings of the 2022 International Conference on Management of Data (pp. 2458-2464). ACM. https://doi.org/10.1145/3514221.3522567

Publisher

ACM
