Posted on 2020-05-01, 00:00; authored by Mirko Mantovani
Similarity detection seeks to identify items that resemble other items without being identical to them, sometimes over relatively large collections of multivariate items. Often, similarity cannot be defined computationally over a dataset, which creates a need for visual analysis. Such situations commonly arise in the analysis of ensemble simulations, multiple computational models, patient data repositories, and geospatial data. In this research, we examine the effectiveness of several visual encodings for multivariate data in the context of similarity detection. We conducted a user study with 40 participants to measure similarity detection accuracy and response time under two conditions: moderate-scale (16 items) and large-scale (36 items). Our statistical analysis shows significant differences in encoding performance, especially in the large-scale setting. In all settings, we found that plain parallel coordinate plots are slower to read and lead to lower accuracy than juxtaposed star glyph approaches. As the number of items grows, the contour star plot (Kiviat diagram) outperforms the other variations, including data lines star plots, and is therefore suitable for similarity identification in relatively large multivariate datasets.