Opening talk 2024 (in English)
Title: « An overview of automatic term extraction »
Prof. Antoine Doucet, Université de La Rochelle (France)
Abstract: Automatic term extraction (ATE) is a natural language processing (NLP) task meant to ease the effort of manually identifying terms in domain-specific corpora by providing a list of candidate terms. As units of knowledge in a specific field of expertise, extracted terms are not only beneficial for several terminographical tasks, but also support and improve several complex downstream tasks, e.g., information retrieval, machine translation, topic detection, and sentiment analysis. ATE systems, along with annotated datasets, have been widely studied and developed for decades, but recently we have observed a surge in novel neural systems addressing this task. The talk will present an overview of recent ATE approaches, notably deep learning-based approaches, with a focus on Transformer-based neural models. We will also compare them to previous ATE approaches, which were mainly based on feature engineering and non-neural supervised learning algorithms.
Biography: Antoine Doucet (https://pageperso.univ-lr.fr/antoine.doucet/) has been a Professor of computer science at the University of La Rochelle since 2014, where he leads the research group in document analysis, digital contents and images (about 50 people). His main research interests are information retrieval, natural language processing, (text) data mining and artificial intelligence. The central focus of his work is the development of methods that scale to very large document collections and apply to documents of any type written in any language, from news articles to social networks, and from digitized manuscripts to born-digital documents. Until 2022, he was the PI of H2020 NewsEye (a digital investigator for historical newspapers), which led to state-of-the-art approaches for noise-robust and cross-lingual natural language processing. He also led the effort on semantic enrichment for low-resourced languages in the context of H2020 Embeddia.