SeqTrainer: Encoding Synthetic Biology Data for Machine Learning

Aug 7, 2025
Sai Wong, Chris Myers, Gonzalo Vidal
Image credit: ACS Synthetic Biology
Abstract
SeqTrainer aims to help researchers efficiently collect the data they need to train machine learning (ML) models by preprocessing data stored in SynBioHub. We developed a Python package that streamlines querying SynBioHub and turns the retrieved data into features for ML models. By integrating SBOL data querying, feature engineering (including k-mers, position weight matrices (PWM), and GC skew), and graph neural network (GNN) modeling, the package helps researchers efficiently analyze synthetic constructs and generate predictions from their data.
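The three sequence features named in the abstract can be illustrated with a minimal sketch in plain Python. This is not SeqTrainer's API: the function names `kmer_counts`, `gc_skew`, and `pwm`, along with the pseudocount and uniform background of 0.25, are assumptions chosen for illustration, and the package's own implementation may differ.

```python
from collections import Counter
from itertools import product
import math


def kmer_counts(seq, k=3):
    """Count overlapping k-mers and return a fixed-length vector
    ordered over all 4**k possible DNA k-mers (hypothetical helper)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts.get("".join(p), 0) for p in product("ACGT", repeat=k)]


def gc_skew(seq):
    """GC skew = (G - C) / (G + C); returns 0.0 when there is no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def pwm(aligned, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from equal-length aligned sequences.
    The pseudocount and uniform background are illustrative assumptions."""
    total = len(aligned) + 4 * pseudocount
    matrix = []
    for pos in range(len(aligned[0])):
        column = Counter(s[pos].upper() for s in aligned)
        matrix.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


if __name__ == "__main__":
    print(kmer_counts("ACGT", k=2))   # 16-element count vector
    print(gc_skew("GGGC"))            # positive skew: more G than C
    print(pwm(["ACGT", "ACGT"])[0])   # log-odds scores at position 0
```

Each function returns plain Python lists or dictionaries, so the outputs can be concatenated into a feature vector or attached to graph nodes before being passed to a model.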
Type
Publication
SeqTrainer: Encoding Synthetic Biology Data for Machine Learning