Műegyetemi Digitális Archívum

Adaptive Temporal Convolutional Network for language modeling

Date

Type

Conference paper

Language

en

Reading access rights

Open access

Rights Holder

Author

Conference Date

2024-02-05

Conference Place

Budapest

Conference Title

2nd Workshop on Intelligent Infocommunication Networks, Systems and Services (WI2NS2)

ISBN, e-ISBN

978-963-421-944-6

Container Title

2nd Workshop on Intelligent Infocommunication Networks, Systems and Services

Version

Postprint

Faculty

Faculty of Electrical Engineering and Informatics

First Page

85

Subject (OSZKAR)

Temporal Convolutional Network
Adaptive model architecture
Deep learning
Neural network
Sequential data modeling
Natural language processing

Genre

Conference article

University

Budapest University of Technology and Economics

OOC works

Abstract

Temporal Convolutional Networks (TCNs) are one-dimensional convolutional neural networks for modelling sequential data. A key component of TCNs is dilation, which is used to enlarge the receptive field while keeping the number of parameters low. In a standard TCN the dilation rates are predetermined. In this paper, an adaptive method is introduced that learns the dilation rates through trainable binary masks with sparsity constraints (termed Adaptive TCN, AdaTCN). The binary masks are applied to the convolutional layers to select the connections deemed important, and structured sparsity is introduced into the masks via the Gumbel-Softmax trick to control the number of active connections. Four models are trained and evaluated: TCN, a randomly masked TCN, AdaTCN, and an AdaTCN initialized with a TCN-like mask. Experiments are conducted on word-level language modelling with the Penn TreeBank (PTB) and WikiText-2 (WT2) datasets.
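To make the masking idea in the abstract concrete, the sketch below shows a relaxed binary mask over the taps of a 1-D convolution kernel, sampled with the Gumbel-Softmax trick. This is an illustrative assumption, not the authors' implementation: the kernel width, the two-class logit layout, and the `gumbel_softmax_mask` helper are all hypothetical, and in AdaTCN the mask and temperature schedule would be trained jointly with the network.

```python
import numpy as np

def gumbel_softmax_mask(logits, tau=1.0, rng=None):
    """Sample a relaxed (soft) binary mask with the Gumbel-Softmax trick.

    logits : array of shape (n, 2), per-connection scores for [drop, keep].
    tau    : temperature; annealing tau toward 0 pushes the mask to hard 0/1.
    Returns an array of shape (n,) with values in (0, 1).
    """
    rng = np.random.default_rng(rng)
    # Gumbel(0, 1) noise via the inverse-CDF transform.
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = logits + g
    # Numerically stable softmax over the [drop, keep] axis at temperature tau.
    y = np.exp((y - y.max(axis=-1, keepdims=True)) / tau)
    y = y / y.sum(axis=-1, keepdims=True)
    return y[..., 1]  # probability mass on the "keep connection" class

# Hypothetical usage: mask the 7 taps of a dilated 1-D conv kernel.
logits = np.zeros((7, 2))            # uninformative start: each tap near 0.5
mask = gumbel_softmax_mask(logits, tau=0.5, rng=0)
active_fraction = mask.mean()        # a sparsity penalty on this controls
                                     # the number of active connections
```

Because the mask values stay in (0, 1), gradients flow through the sampling step during training; a hard 0/1 mask is recovered at low temperature or by thresholding at inference time.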

Description

Keywords