Műegyetemi Digitális Archívum

Named Entity Recognition For Construction Documents Based on Fine-Tuning of Large Language Models

Date

Type

Book chapter

Language

en

Reading access rights:

Open access

Rights Holder

Author

Conference Date

2024.06.29 - 2024.07.02

Conference Place

Praha, Czech Republic

Conference Title

Creative Construction Conference 2024

ISBN, e-ISBN

978-615-5270-78-9

Container Title

Proceedings of the Creative Construction Conference 2024

Department

Department of Construction Technology and Management

Version

Online

Faculty

Faculty of Architecture

Subject Area

Engineering sciences

Subject Field

Architectural engineering sciences

Subject (OSZKAR)

construction documents
large language model
named entity recognition

Genre

Conference paper

University

Budapest University of Technology and Economics

OOC works

Abstract

Named Entity Recognition (NER) is a necessary task for the automatic processing of construction documents. Traditional methods use machine learning, but they rely on large, high-quality datasets that are built manually and are costly to obtain. This paper therefore proposes an NER method based on fine-tuning Large Language Models (LLMs) for information extraction from construction documents. First, low-quality datasets are semi-automatically generated from national standards, professional qualification textbooks, and input method editor lexicons; these comprise a generation-type dataset, a tagging-type dataset, and a question-answering dataset. Next, these datasets are used to fine-tune an LLM for NER of structural elements in order to determine the optimal fine-tuning parameter settings. Finally, the LLM is fine-tuned under the optimal settings and evaluated manually against an established dataset and evaluation rules. Compared with the LLM before fine-tuning, the accuracy and completeness of the method are significantly improved, indicating that the method is effective. The research contributes a more efficient method for the automatic processing of construction documents.
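
The abstract names three dataset types (generation, tagging, question answering) but does not give their formats. The following is a minimal sketch of what one record of each type might look like for the same sentence. Everything in it is an assumption made for illustration and is not taken from the paper: the example sentence, the `B-ELEM`/`I-ELEM`/`O` tag scheme, the prompt wording, and the helper names `bio_tags` and `build_records`.

```python
from typing import Dict, List, Tuple

# Token-index span [start, end) of one entity. Hypothetical convention.
Span = Tuple[int, int]


def bio_tags(tokens: List[str], spans: List[Span]) -> List[str]:
    """Label each token B-ELEM, I-ELEM, or O from entity token spans."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B-ELEM"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-ELEM"          # remaining tokens of the entity
    return tags


def build_records(tokens: List[str], spans: List[Span]) -> Dict[str, dict]:
    """Build one illustrative record per dataset type for a single sentence."""
    sentence = " ".join(tokens)
    entities = [" ".join(tokens[s:e]) for s, e in spans]
    return {
        # Generation type: free-text prompt, entity list as the completion.
        "generation": {
            "prompt": f"List the structural elements in: {sentence}",
            "completion": "; ".join(entities),
        },
        # Tagging type: one label per token.
        "tagging": {"tokens": tokens, "tags": bio_tags(tokens, spans)},
        # Question-answering type: question, context, and answer list.
        "qa": {
            "question": "Which structural elements are mentioned?",
            "context": sentence,
            "answer": entities,
        },
    }


# Hypothetical example sentence with two structural-element entities.
example_tokens = ["The", "shear", "wall", "rests", "on", "the", "foundation", "beam"]
example_spans = [(1, 3), (6, 8)]
records = build_records(example_tokens, example_spans)
```

Deriving all three records from one annotated sentence keeps them consistent with each other; whether the paper's semi-automatic generation works this way is not stated in the abstract.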

Description

Keywords