Analysis of Cognitive and Keyword-based Approaches for Searching Judicial Documents

Marziali, Matteo

doi:10.25417/uic.13475190.v1

MARZIALI-THESIS-2020.pdf (5.06 MB)

thesis

Posted on 2020-05-01, 00:00; authored by Matteo Marziali

Searching and retrieving information efficiently is an urgent need in both everyday and business tasks. With the introduction of high-performance storage systems and cloud technologies, reporting on paper has become an obsolete practice. Hence, companies have aimed to turn such operations into digital processes, employing digital repositories to store files in order to gain availability and resilience. In this setting, software that provides direct and valid access to stored data is referred to as a search engine. Aiming at analyzing cognitive and keyword-based search algorithms, the goal of this dissertation is to develop a domain-specific search engine capable of combining the two approaches. The rationale for adopting these two techniques together lies in the need to overcome the lack of 'query context' and 'intent understanding', along with the inefficiency of common procedures in handling identical words that carry different meanings. To meet these objectives, Text Retrieval was carried out in the juridical domain by performing the search on a corpus of authentic legal documents from the Italian Court of Cassation. We organized our work into two core activities, Document Processing and Text Retrieval, both integrated into the Search Engine pipeline. In particular, Text Retrieval was performed on top of processing units expressly built to provide proper answers to literal and non-literal queries. During the Document Processing phase, significant effort was devoted to extracting text from actual judicial documents, initially available only as images of scanned documents. Text Retrieval, instead, concerned the realization of a search engine pipeline featuring diverse Deep Learning approaches. Such techniques involved encoding text portions into a more refined representation that captures both syntactic and semantic word features.
Finally, the considered embedding approaches are compared by collecting answers to specific questionnaires given to a random sample of people, with the purpose of validating our approaches in a concrete use-case scenario.

History

Advisor

Caragea, Cornelia

Chair

Caragea, Cornelia

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

Masters

Degree name

MS, Master of Science

Committee Member

Koyuncu, Erdem; Lanzi, Pier Luca

Submitted date

May 2020

Thesis type

application/pdf

Language

en

Keywords

information retrieval; search engine; NLP; Deep Learning; judicial documents

Licence

In Copyright
