Posted on 2016-05-02, 00:00. Authored by S. Derrible, N. Ahmad
We introduce and develop a new network-based and binless methodology to perform frequency
analyses and produce histograms. In contrast with traditional frequency analysis
techniques that use fixed intervals to bin values, we place a range ±ζ around each individual
value in a data set and count the number of values within that range, which allows us to
compare every value in a data set with every other value. In essence, the methodology is
identical to the construction of a network, where two values are connected if they lie within a
given range (±ζ) of each other. The value with the highest degree (i.e., the most connections) is therefore
taken as the mode of the distribution. To select an optimal range, we look at the stability
of the proportion of nodes in the largest cluster. The methodology is validated by sampling
12 typical distributions, and it is applied to a number of real-world data sets with both
spatial and temporal components. The methodology can be applied to any data set and provides
a robust means to uncover meaningful patterns and trends. A free Python script and a
tutorial are also made available to facilitate the application of the method.
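The two quantities described above, the degree of each value and the proportion of values in the largest cluster, can be illustrated with a minimal sketch. This is not the authors' script: the function names `degrees` and `largest_cluster_fraction`, the inclusive treatment of the ±ζ boundary, and the sample data are assumptions made for illustration only. The sketch is limited to one-dimensional data, where a cluster reduces to a run of sorted values whose consecutive gaps do not exceed ζ.

```python
from bisect import bisect_left, bisect_right


def degrees(values, zeta):
    """For each value, count how many other values lie within +/- zeta of it.

    Assumption: the range is inclusive at both ends.
    """
    ordered = sorted(values)
    result = []
    for v in values:
        lo = bisect_left(ordered, v - zeta)
        hi = bisect_right(ordered, v + zeta)
        result.append(hi - lo - 1)  # subtract 1 to exclude the value itself
    return result


def largest_cluster_fraction(values, zeta):
    """Fraction of values that belong to the largest connected cluster.

    In one dimension, two sorted neighbours are linked when their gap is
    at most zeta, so a cluster is a maximal run of such neighbours.
    """
    if not values:
        return 0.0
    ordered = sorted(values)
    best = run = 1
    for prev, cur in zip(ordered, ordered[1:]):
        run = run + 1 if cur - prev <= zeta else 1
        best = max(best, run)
    return best / len(ordered)


# Hypothetical sample data, chosen only to keep the arithmetic easy to follow.
data = [1, 2, 4, 5, 5, 6, 9, 20]

deg = degrees(data, zeta=1)
mode_estimate = data[deg.index(max(deg))]  # value with the highest degree

# Sweep a few ranges to see how the largest-cluster proportion evolves.
fractions = {z: largest_cluster_fraction(data, z) for z in (1, 2, 3, 11)}
```

With this sample, the value 5 has the highest degree at ζ = 1, and the largest-cluster proportion grows from 0.5 at ζ = 1 to 1.0 at ζ = 11. How the stability of that proportion is turned into a single optimal ζ is not specified in this summary, so the sketch only computes the curve and leaves the selection rule open.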
Funding
This research was supported, in part, by NSF Award CCF-1331800, by the University of Illinois at Chicago Institute for Environmental Science and Policy (IESP) Pre-Doctoral Fellowship, and by the Department of Civil and Materials Engineering at the University of Illinois at Chicago.