SeqTrainer: Encoding Synthetic Biology Data for Machine Learning
Aug 7, 2025
Sai Wong
Chris Myers
Gonzalo Vidal
Image credit: ACS Synthetic Biology
Abstract
SeqTrainer aims to help researchers efficiently collect the data they need to train machine learning (ML) models by preprocessing data stored in SynBioHub. We developed a Python package that streamlines querying and preprocessing SynBioHub data and generates features for ML models. By integrating SBOL data querying, feature engineering (including k-mers, position weight matrices (PWMs), and GC skew), and graph neural network (GNN) modeling, the package is intended to help researchers efficiently analyze synthetic constructs and generate predictions for their data.
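To make the feature-engineering step concrete, here is a minimal sketch of two of the features named above, k-mer counts and GC skew, computed from a plain DNA string. It is not the SeqTrainer API: the function names `kmer_counts` and `gc_skew` are hypothetical, the GC skew definition assumed is the common (G - C) / (G + C), and PWM scoring and SynBioHub querying are left out.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k=3):
    """Return a fixed-length count vector over all 4**k DNA k-mers.

    Hypothetical helper for illustration; k-mers containing characters
    other than A, C, G, T are simply not counted.
    """
    seq = seq.upper()
    # Count every overlapping window of length k in the sequence.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Emit counts in lexicographic A, C, G, T order so vectors are comparable.
    return [counts.get("".join(p), 0) for p in product("ACGT", repeat=k)]


def gc_skew(seq):
    """Return (G - C) / (G + C), or 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0
```

A feature vector for one sequence could then be assembled as `kmer_counts(seq, k=3) + [gc_skew(seq)]`, which gives 65 values per construct under these assumptions.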
Type
Publication
SeqTrainer: Encoding Synthetic Biology Data for Machine Learning