Extracting Concepts and Constructing Ontology from Software Engineering Research Papers
Thesis posted on 2019-08-06, 00:00, authored by Gurpreet Kaur Chabada
Global research output has been increasing steadily over the years. A recent study estimated that 50 million scholarly articles were published between 1965 and 2009, with 3% annual growth in global research article output. The International Association of Scientific, Technical and Medical Publishers (STM) report for 2018 estimates that there were about 33,100 active scholarly peer-reviewed English-language journals in mid-2018. It also states that the number of articles published each year and the number of journals have both grown steadily for over two centuries, by about 3% and 3.5% per year respectively. With most of this research output published as text, we need means to perform operations such as summarisation, analysis and search on these research papers. This work focuses on papers published in the Software Engineering field and on the extraction of knowledge from them. Working with a corpus of 200+ research papers, we use Natural Language Processing and Machine Learning based methods to construct an ontology. The ontology will be represented as a knowledge graph, with concepts as nodes and the relationships between these concepts as edges. In this work, we will look beyond the metadata of these papers and extract concepts and relationships from their contents, thus in a way summarising both the individual paper and the corpus. With nodes and relationships linked to the papers they were extracted from, we will essentially have a knowledge network on which a multitude of operations can be performed. I discuss and describe multiple tools and solutions for extracting an ontology from text. Each solution differs from the others, but all are capable of working independently in an unsupervised manner. I also discuss possible future applications of unsupervised knowledge extraction from research papers.
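As an illustration of the knowledge graph described above, the following is a minimal sketch in Python, not the implementation used in the thesis. It assumes that extraction has already produced (concept, relationship, concept) triples. The class name `KnowledgeGraph`, its methods, the relation labels and the paper identifiers are all hypothetical and chosen only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Hypothetical container: concepts as nodes, relationships as edges."""

    # concept -> set of paper ids the concept was extracted from
    concept_papers: dict = field(default_factory=lambda: defaultdict(set))
    # (source concept, relation label, target concept) -> set of paper ids
    edge_papers: dict = field(default_factory=lambda: defaultdict(set))

    def add_relation(self, source, relation, target, paper_id):
        """Record one extracted triple and link it to its source paper."""
        self.concept_papers[source].add(paper_id)
        self.concept_papers[target].add(paper_id)
        self.edge_papers[(source, relation, target)].add(paper_id)

    def papers_for(self, concept):
        """Return the sorted ids of papers in which a concept was found."""
        return sorted(self.concept_papers.get(concept, set()))

    def neighbours(self, concept):
        """Return sorted (relation, target) pairs for edges leaving a concept."""
        return sorted((r, t) for (s, r, t) in self.edge_papers if s == concept)


# Made-up triples and paper ids, purely for illustration.
graph = KnowledgeGraph()
graph.add_relation("unit testing", "is_a", "software testing", "paper-001")
graph.add_relation("unit testing", "uses", "test case", "paper-002")

print(graph.papers_for("unit testing"))
print(graph.neighbours("unit testing"))
```

Because every node and edge keeps the set of papers it came from, operations such as search ("which papers discuss this concept?") reduce to simple lookups on the graph.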