Műegyetemi Digitális Archívum

Using Dimension Reduction Methods on the Latent Space of Molecules

Date

Type

book chapter

Language

en

Reading access rights:

Open Access

Rights Holder

Budapest University of Technology and Economics, Department of Measurement and Information Systems

Conference Date

2022.02.07-2022.02.08.

Conference Place

Budapest, Hungary

Conference Title

29th Minisymposium of the Department of Measurement and Information Systems

ISBN, e-ISBN

978-963-421-872-2

Container Title

Proceedings of the 29th Minisymposium

Department

Department of Measurement and Information Systems

Version

Publisher's version

Faculty

Faculty of Electrical Engineering and Informatics

First Page

62

Subject (OSZKAR)

data science
data visualization
dimension reduction
drug discovery
machine learning

Genre

Conference paper

University

Budapest University of Technology and Economics

OOC works

Abstract

De novo molecule design is the process of generating novel chemicals based on a dataset of drug-like molecules, and it has gained popularity in recent decades. Developing drug-like molecules is both costly and time-consuming, so machine learning and deep neural networks have been used over the last three decades to speed the process up. A particularly popular method uses a variational autoencoder to generate a latent space of drug-like molecules suitable for targeted searching. Quantifying the quality of such a latent space is vital for using it effectively. This task is not trivial, however: the chemical structure of molecules cannot be easily quantified, and such latent spaces tend to be high-dimensional, so dimension-reducing visualization algorithms are needed. Many such algorithms have been developed in recent decades. In this paper, we evaluate five of them (PCA, t-SNE, UMAP, TriMAP and PaCMAP) to see how well they perform on a given dataset. We examine each algorithm's ability to transform a 64-dimensional latent space such that the resulting two-dimensional space is smooth over chemical structure. We optimize the hyperparameters of each algorithm to see how they affect the resulting embedding, and we perform a linear interpolation test to see how each algorithm maps the latent space into two dimensions. We also examine the invertibility and extensibility of each algorithm, as these properties can make targeted searching much easier to execute.
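
The linear interpolation test and the invertibility check mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the latent vectors are synthetic random data standing in for a VAE latent space, only PCA is shown (written with NumPy so the example stays self-contained), and the helper names `fit_pca`, `transform`, `inverse_transform` and `interpolate` are hypothetical.

```python
import numpy as np


def fit_pca(X, n_components=2):
    """Fit PCA via SVD. Returns the data mean and the top principal axes."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def transform(Z, mean, components):
    """Project latent vectors onto the retained principal axes."""
    return (Z - mean) @ components.T


def inverse_transform(Y, mean, components):
    """Map 2-D points back into the latent space (lossy for PCA)."""
    return Y @ components + mean


def interpolate(z_start, z_end, steps=10):
    """Evenly spaced points on the straight line from z_start to z_end."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * z_start + t * z_end


# Assumption: random 64-dimensional vectors stand in for encoded molecules.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 64))

mean, components = fit_pca(latent, n_components=2)

# Interpolation test: walk between two latent points, then project to 2-D.
path = interpolate(latent[0], latent[1], steps=10)
path_2d = transform(path, mean, components)

# PCA is linear, so the projected path is a straight, evenly spaced line.
steps_2d = np.diff(path_2d, axis=0)
is_straight = np.allclose(steps_2d, steps_2d[0])

# Invertibility check: distance between each point and its reconstruction.
recon_error = np.linalg.norm(
    inverse_transform(path_2d, mean, components) - path, axis=1
)
```

A nonlinear method such as t-SNE or UMAP would generally bend or break the projected path, which is what makes this kind of test informative when comparing algorithms. The reconstruction error here is nonzero because two components cannot capture all 64 dimensions.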
