Analysis of Algorithms in Learning Theory and Network Analysis of Knowledge Bases
thesisposted on 01.11.2015 by Dimitrios Diochnos
In order to distinguish essays and pre-prints from academic theses, we have a separate category. These are often much longer text based documents than a paper.
This thesis is concerned with problems that arise in learning theory as well as with an investigation, using the tools of network analysis, of a popular commonsense knowledge base that is publicly available online and reasoning with the knowledge that is available in the database. We study the evolvability of monotone conjunctions through an intuitive neighborhood under the uniform distribution, give a structure theorem of best approximations, and improve the previous known results. We show the existence of local optima under $\mu$-nondegenerate product distributions for the same algorithm. Using covariance as the fitness metric we prove a similar structure theorem as well as the evolvability of short monotone conjunctions under these distributions. Using points from the moment curve we show that the VC dimension of $d$-dimensional halfspaces is $\Omega(d\lg r)$ under the multiple-instance learning framework, matching the previous known upper bound. This result is also shown to hold over any large point set in general position. Further, it is shown that the hypothesis finding problem is NP-complete. The disagreement coefficient of a concept class for a fixed distribution can be used to give bounds for the label complexity of active learning. We study the disagreement coefficient of monotone conjunctions under the uniform distribution and compute tight, or almost tight, asymptotic lower and upper bounds for every target. This is the first time that the disagreement coefficient is studied for Boolean concept classes. We are investigating the commonsense knowledge base ConceptNet 4 using the tools of network analysis. We identify missing links, spurious links, as well as provide additional knowledge to be added in ConceptNet. Using variants of spreading activation techniques we use the database for question answering and explain, as well as improve, candidate answers for the same questions from a previous study that relied in a low-rank approximation of the adjacency matrix. Aiming to add general rules for further reasoning we mine frequent triples from ConceptNet. This mining approach also identifies incorrect assertions already in ConceptNet. Finally, we also identify rules that may make sense as factual statements about the world but not in terms of natural language usage.