Understanding the mysteries of non-trivial topology in proteins and RNA based on a Deep Learning approach and structural biology methods

Joanna Sulkowska

Knotted proteins, although scarce, are crucial structural components of certain protein families, and their roles remain a topic of intense research. Capitalizing on the vast collection of protein structure predictions offered by AlphaFold, this study computationally examines the entire UniProt database to create a robust dataset of knotted and unknotted proteins. Using this dataset, we develop a machine learning model that accurately predicts the presence of knots in protein structures solely from their amino acid sequences. We tested the model on 100 proteins whose structures had not yet been predicted by AlphaFold and found agreement with our local predictions in 92% of cases. Moreover, we show that a neural network (NN) architecture based on Long Short-Term Memory (LSTM) can be applied to detect, classify, and predict entanglement not only in closed polymeric chains but also in polymers and protein-like structures with open knots, and in actual protein configurations, both those based on ML simulation and those predicted by AlphaFold (AF). The analysis revealed that the LSTM model, tested on hundreds of thousands of knotted and unknotted protein structures with different architectures predicted by AlphaFold 2, can distinguish between trivial and nontrivial topology of the protein's native state with an accuracy of 93%.

Last Modified: 31.01.2025