File(s) under embargo: 1 year, 8 months, 4 days until file(s) become available

Neural Keyphrase Generation: Looking Inside the Decoder

thesis
posted on 01.05.2021, 00:00 by Tuhin Kundu
Neural keyphrase generation aims to generate keyphrases that encapsulate the core ideas of a given document. Keyphrases are either copied from the input text (present keyphrases) or newly generated to capture the topicality of the document (absent keyphrases). Various recurrent neural networks (RNNs) with encoder-decoder architectures have been used for this task and have achieved state-of-the-art performance. However, little is known about the decoder's behavior during the sequence generation process. We contrast the decoder behavior of multiple strong RNN-based seq2seq models (catSeq and ExHiRD) and a transformer model (T5). We train the T5 model for the keyphrase generation task and propose keyphrase perplexity (KPP) as a metric for gauging decoder performance. Through our experiments on several benchmark datasets, we conclude that (1) all seq2seq models are more confident in predicting present keyphrases than absent ones, and T5 predicts present keyphrases with the highest degree of certainty; (2) seq2seq models are uncertain around keyphrase boundaries; (3) RNN-based models are more biased toward extracting keyphrases from the beginning of the document; and (4) T5 achieves state-of-the-art performance for present and absent keyphrases on several datasets.
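The abstract does not give the KPP formula; a minimal sketch, assuming it follows standard per-token perplexity computed over the decoder's probabilities for the tokens of a single keyphrase (the function name and inputs are illustrative, not from the thesis):

```python
import math

def keyphrase_perplexity(token_probs):
    """Perplexity of one keyphrase: exp of the mean negative
    log-probability the decoder assigned to its tokens.
    Lower values indicate a more confident decoder."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)

# Hypothetical decoder probabilities for a 3-token present keyphrase
# vs. a 3-token absent keyphrase of the same length.
present_ppl = keyphrase_perplexity([0.9, 0.8, 0.7])
absent_ppl = keyphrase_perplexity([0.4, 0.3, 0.2])
print(present_ppl < absent_ppl)  # the confident keyphrase scores lower
```

Under this reading, comparing KPP between present and absent keyphrases quantifies the confidence gap the abstract describes.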

History

Advisor

Caragea, Cornelia

Chair

Caragea, Cornelia

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

Masters

Degree name

MS, Master of Science

Committee Member

Parde, Natalie; Zheleva, Elena

Submitted date

May 2021

Thesis type

application/pdf

Language

en
