RNA plays central roles in gene expression as messenger RNA and in many biological processes as structured non-coding RNAs. Understanding these functions requires knowledge of RNA tertiary structure, but experimental structure determination is slow and expensive, leaving a large gap between the number of known RNA sequences and the number of solved structures.
Inspired by advances in protein structure prediction such as AlphaFold, RNA structure prediction has attracted growing interest. However, current deep learning models for RNA often report inflated accuracy due to structural overlap between training and test sets. Early benchmarks also indicate that AlphaFold3 remains inaccurate for RNA structures.
Because RNA structures are scarce, we focus on two essentials: carefully curated training data to prevent data leakage (RNA3DB; Szikszai et al., 2024), and high-quality sequence alignments (http://rnahub.org; Magnus et al., 2025). Using these datasets and alignments, we are currently testing our own AlphaFold-based RNA model that explicitly emphasizes base pairing and RNA geometry.
This work aims to improve RNA 3D structure prediction and will be evaluated in upcoming blind challenges such as CASP and RNA-Puzzles. It will also serve as a platform for further deep learning developments in RNA design and therapeutics in my Laboratory of RNA Design and Therapeutics at CeNT, University of Warsaw.