Műegyetemi Digitális Archívum

Named Entity Recognition For Construction Documents Based on Fine-Tuning of Large Language Models

Zhou, Junyu
Ma, Zhiliang
2024-10-07T09:13:01Z
2024

Abstract

Named Entity Recognition (NER) is a necessary task in the automatic processing of construction documents. Traditional methods use machine learning, but they rely on large, high-quality datasets that are built manually and are costly to obtain. This paper therefore proposes an NER method based on fine-tuning Large Language Models (LLMs) for information extraction from construction documents. First, low-quality datasets are generated semi-automatically from national standards, professional qualification textbooks, and input method editor lexicons; they comprise a generation-type dataset, a tagging-type dataset, and a question-answering dataset. Next, these datasets are used to fine-tune an LLM for NER of structural elements in order to determine the optimal parameter settings for fine-tuning. Finally, the LLM is fine-tuned under the optimal settings and evaluated manually against an established dataset and set of evaluation rules. The accuracy and completeness of the method improve significantly compared with the LLM before fine-tuning, indicating that the method works well. The research contributes a more efficient method for the automatic processing of construction documents.
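
The three dataset types named in the abstract can be pictured with a minimal sketch. The abstract does not give the actual prompt templates, label names, or record layout, so everything below is an assumption for illustration only: the `Entity` class, the `ELEMENT` label, the prompt wording, and the `prompt`/`response` keys are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    text: str
    label: str  # hypothetical label name, not from the paper


def generation_sample(sentence, entities):
    """Generation-type record: the model lists the entities as free text."""
    return {
        "prompt": f"List the structural elements mentioned in: {sentence}",
        "response": "; ".join(e.text for e in entities),
    }


def tagging_sample(sentence, entities):
    """Tagging-type record: the model returns the sentence with inline tags.

    Simplification: plain substring replacement, so overlapping or nested
    entity strings are not handled.
    """
    tagged = sentence
    for e in entities:
        tagged = tagged.replace(e.text, f"<{e.label}>{e.text}</{e.label}>")
    return {
        "prompt": f"Tag the structural elements in: {sentence}",
        "response": tagged,
    }


def qa_sample(term, is_element):
    """Question-answering record: a yes/no question about a single term."""
    return {
        "prompt": f"Is '{term}' a structural element?",
        "response": "Yes" if is_element else "No",
    }


# Hypothetical example sentence and annotations.
sentence = "The beam rests on the column."
entities = [Entity("beam", "ELEMENT"), Entity("column", "ELEMENT")]

gen = generation_sample(sentence, entities)
tag = tagging_sample(sentence, entities)
qa = qa_sample("beam", True)

for record in (gen, tag, qa):
    print(record)
```

Records of this shape could be written out one per line and passed to whichever fine-tuning toolkit is in use; the paper itself should be consulted for the formats the authors actually used.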

http://hdl.handle.net/10890/57803
en
Book chapter
Open access
Author
2024.06.29.-2024.07.02
Praha, Czech Republic
Creative Construction Conference 2024
2024.09.01
978-615-5270-78-9
Budapest University of Technology and Economics
Online
Proceedings of the Creative Construction Conference 2024
Department of Construction Technology and Management
Online
Faculty of Architecture
10.3311/CCC2024-175
Engineering sciences
Engineering sciences - architectural engineering sciences
architectural engineering sciences
construction documents
large language model
named entity recognition
Conference paper
Budapest University of Technology and Economics

Files

Original bundle

Name:
175.pdf
Size:
992 KB
Format:
Adobe Portable Document Format
Description:
175.pdf