Using Dimension Reduction Methods on the Latent Space of Molecules
Subject (OSZKAR)
data visualization
dimension reduction
drug discovery
machine learning
- https://doi.org/10.3311/MINISY2022-016
OOC works
Abstract
De novo molecule design is the process of generating novel chemicals based on a dataset of drug-like molecules, and it has gained popularity in recent decades. Because developing drug-like molecules is both costly and time-consuming, machine learning and deep neural networks have been used over the last three decades to speed up the process. A particularly popular approach uses a variational autoencoder to generate a latent space of drug-like molecules suitable for targeted searching. Quantifying the quality of such a latent space is vital for using it effectively. This task is not trivial, however: the chemical structure of molecules cannot be easily quantified, and such latent spaces tend to be high-dimensional, so dimension-reducing visualization algorithms are needed. Many such algorithms have been developed in recent decades. In this paper, we evaluate five of them (PCA, t-SNE, UMAP, TriMap and PaCMAP) to see how well they perform on a given dataset. We examine each algorithm's ability to transform a 64-dimensional latent space so that the resulting two-dimensional space is smooth over chemical structure. We optimize the hyperparameters of each algorithm to see how they affect the resulting embedding, and we perform a linear interpolation test to see how each algorithm maps the latent space into two dimensions. We also examine the invertibility and extensibility of each algorithm, as these properties can make targeted searching much easier to execute.
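
The linear interpolation test mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the latent vectors are random placeholders standing in for variational autoencoder encodings, the helper names (`fit_pca`, `project`, `interpolate`) are hypothetical, and only PCA is shown, computed directly with NumPy's SVD. The sketch interpolates between two 64-dimensional latent points and projects the path into two dimensions.

```python
import numpy as np


def fit_pca(X, n_components=2):
    """Fit PCA via SVD; return the data mean and the top principal axes."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(X, mean, components):
    """Map points from the latent space onto the principal axes."""
    return (X - mean) @ components.T


def interpolate(z_start, z_end, n_steps=10):
    """Return n_steps evenly spaced points on the segment z_start -> z_end."""
    t = np.linspace(0.0, 1.0, n_steps)[:, None]
    return (1.0 - t) * z_start + t * z_end


# Placeholder data (assumption): 500 random vectors in a 64-dimensional space.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 64))

mean, components = fit_pca(latent, n_components=2)

# Interpolate between two latent points and project the path to 2D.
path = interpolate(latent[0], latent[1], n_steps=10)
path_2d = project(path, mean, components)

# Differences between consecutive projected points along the path.
steps = np.diff(path_2d, axis=0)
```

Because PCA is a linear map, the projected path is again a straight line with evenly spaced points, so every row of `steps` is the same vector. Nonlinear methods such as t-SNE or UMAP need not preserve this, which is the kind of behaviour an interpolation test can expose.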