# SimpleFold: Folding Proteins is Simpler than You Think

This is the GitHub repository accompanying the research paper "SimpleFold: Folding Proteins is Simpler than You Think" (arXiv 2025), authored by Yuyang Wang, Jiarui Lu, Navdeep Jaitly, Joshua M. Susskind, and Miguel Angel Bautista.

---

## Introduction

SimpleFold is a flow-matching protein folding model built from general-purpose transformer layers. Unlike other folding models, it avoids expensive domain-specific modules such as triangle attention and pair-representation biases.

- Trained with a generative flow-matching objective (a minimal sampling sketch is given at the end of this README).
- Scaled to 3 billion parameters.
- Trained on over 8.6 million distilled protein structures in addition to experimental PDB data.
- Reported to be the largest folding model developed to date.
- Achieves competitive performance with state-of-the-art baselines on standard folding benchmarks.
- Exhibits strong ensemble prediction thanks to its generative training objective.
- Challenges the assumption that complex domain-specific architectures are required, offering an alternative direction for protein structure prediction.

---

## Installation

Install the package directly from the GitHub repository (a hedged example command is given at the end of this README):

---

## Example Usage

A Jupyter notebook, `sample.ipynb`, is provided to demonstrate predicting protein structures from example sequences.

---

## Inference

Once installed, protein structures can be predicted from FASTA files using the `simplefold` CLI. Both PyTorch and MLX backends are supported (MLX is recommended on Apple hardware). Example command: see the illustrative sketch at the end of this README.

---

## Evaluation

Predicted structures from models of varying sizes are available for download:

The evaluation scripts use the OpenStructure 2.9.1 Docker image.

To evaluate folding tasks (CASP14/CAMEO22):

For two-state predictions (Apo/CoDNaS) using TMscore:

---

## Training Data Preparation

SimpleFold is trained on both experimental PDB structures and distilled predictions from AFDB SwissProt and AFESM.

Target lists:
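
---

## Illustrative Sketches

The blocks below are hedged sketches referenced from the sections above; any command, flag, URL, or function name that does not appear in the original text is an assumption, not a confirmed part of the project.

For the Installation section, a typical pip install from a GitHub repository could look like the following; the repository path is a placeholder, not the confirmed project URL.

```bash
# Hedged sketch: install the package directly from its GitHub repository.
# The repository URL below is a placeholder assumption; substitute the actual SimpleFold repo path.
pip install git+https://github.com/<org>/simplefold.git
```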
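
For the Inference section, a `simplefold` invocation might look like the sketch below. The flag names are illustrative assumptions; consult `simplefold --help` for the actual interface.

```bash
# Hedged sketch of folding from a FASTA file; flag names are assumptions, not confirmed by the source.
simplefold \
  --fasta_path example.fasta \
  --output_dir predictions/ \
  --backend mlx   # or torch; MLX is recommended on Apple hardware
```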
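
For the Introduction, the generative flow-matching objective can be illustrated with a minimal Euler-integration sampler. This is a conceptual sketch under stated assumptions, not the SimpleFold implementation; `model`, its signature, and all shapes are hypothetical.

```python
# Minimal conceptual sketch of flow-matching sampling (not the SimpleFold implementation).
# Assumes a trained velocity network `model(x_t, t, seq_emb)` that predicts the velocity
# of the probability-flow ODE, conditioned on a sequence embedding `seq_emb`.
import torch

@torch.no_grad()
def sample_structure(model, seq_emb, n_atoms, n_steps=200, device="cpu"):
    """Integrate the learned flow from Gaussian noise to 3D coordinates with Euler steps."""
    x = torch.randn(n_atoms, 3, device=device)       # x_0 ~ N(0, I): random initial coordinates
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((1,), i * dt, device=device)  # current flow time in [0, 1)
        v = model(x, t, seq_emb)                     # predicted velocity dx_t/dt
        x = x + v * dt                               # Euler step along the flow
    return x                                         # x_1: one sampled structure

# Because sampling starts from random noise, repeated calls yield an ensemble of structures,
# which is the behaviour the Introduction's ensemble-prediction point refers to.
```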