Posted on 2019-08-06, 00:00. Authored by Gurpreet Kaur Chabada.
Global research output has been increasing steadily over the years. A recent study estimated
that 50 million scholarly articles were published between 1965 and 2009, with 3% annual
growth in global research article output. The 2018 report of the International Association of
Scientific, Technical and Medical Publishers (STM) estimates that there were about 33,100 active
scholarly peer-reviewed English-language journals in mid-2018. It also states that the number of articles
published each year and the number of journals have both grown steadily for over two centuries,
by about 3% and 3.5% per year respectively.
With most of this research output published as text, we need means to perform operations
such as summarisation, analysis and search on these research papers.
This work focuses on papers published in the Software Engineering field and on the extraction of
knowledge from them.
Working with a corpus of 200+ research papers, we use Natural Language Processing and Machine
Learning based methods to construct an ontology. The ontology will be represented as a
knowledge graph, with concepts as nodes and the relationships between these concepts as edges.
In this work, we will look beyond the metadata of these papers and extract concepts and
relationships from their contents, thereby summarising, in a way, both each paper and the
corpus as a whole. With nodes and relationships linked to the papers they were extracted from, we will
essentially have a knowledge network on which a multitude of operations can be performed.
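
As an illustration of the structure described above, the following is a minimal sketch of a knowledge graph in which every concept and relationship keeps a link to the papers it was extracted from. It is not the implementation used in this work: the class name `KnowledgeGraph`, its methods, the relation labels and the paper identifiers are all hypothetical, and the triples are entered by hand rather than extracted from text.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Hypothetical container: concepts are nodes, relationships are edges."""

    # concept -> ids of the papers in which the concept was found
    concept_sources: dict = field(default_factory=lambda: defaultdict(set))
    # (subject, relation, object) -> ids of the papers stating the relationship
    edge_sources: dict = field(default_factory=lambda: defaultdict(set))

    def add_relation(self, subject, relation, obj, paper_id):
        """Record one extracted triple together with its source paper."""
        self.concept_sources[subject].add(paper_id)
        self.concept_sources[obj].add(paper_id)
        self.edge_sources[(subject, relation, obj)].add(paper_id)

    def papers_for(self, concept):
        """Return the sorted ids of papers that mention the concept."""
        return sorted(self.concept_sources.get(concept, set()))

    def neighbours(self, concept):
        """Return sorted (relation, object) pairs leaving the concept."""
        return sorted(
            (rel, obj)
            for (subj, rel, obj) in self.edge_sources
            if subj == concept
        )


# Hand-written example triples (assumed, not taken from the actual corpus).
graph = KnowledgeGraph()
graph.add_relation("unit testing", "is_a", "software testing", "paper_001")
graph.add_relation("unit testing", "detects", "regression", "paper_002")
graph.add_relation("code review", "detects", "regression", "paper_002")

print(graph.papers_for("unit testing"))
print(graph.neighbours("unit testing"))
```

Because provenance is stored on both nodes and edges, operations such as searching for the papers behind a concept, or listing what a concept is related to, reduce to simple lookups over the graph.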
I discuss and describe multiple tools and solutions for extracting an ontology from text. Each
solution differs from the others, but all are capable of working independently in an unsupervised
manner. I also discuss possible future applications of unsupervised knowledge extraction from
research papers.