SeqTrainer: Encoding Synthetic Biology Data for Machine Learning

Abstract
SeqTrainer is a Python package that streamlines querying data from SynBioHub and preprocessing it into features for machine learning (ML) models, so that researchers can efficiently collect the data they need for training. It integrates SBOL data querying, feature engineering (including k-mers, position weight matrices (PWMs), and GC skew), and graph neural network (GNN) modeling to help researchers analyze synthetic constructs and generate predictions for their data.
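As an illustration of two of the sequence features named in the abstract, the following minimal sketch computes overlapping k-mer counts and GC skew for a DNA string. The function names `kmer_counts` and `gc_skew` are hypothetical and are not taken from the SeqTrainer API; the skew definition assumed here is the common (G - C) / (G + C) over the whole sequence, and PWM scoring is left out for brevity.

```python
from collections import Counter


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in a DNA sequence (case-insensitive).

    Hypothetical helper for illustration; not part of SeqTrainer.
    """
    seq = seq.upper()
    # Slide a window of width k across the sequence, one base at a time.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C), or 0.0 if the sequence has no G or C.

    Assumed definition: skew over the whole sequence, not per window.
    """
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


if __name__ == "__main__":
    example = "ATGATGGC"  # made-up sequence, for demonstration only
    print(kmer_counts(example, k=3))
    print(gc_skew(example))
```

Counts of this kind can be flattened into a fixed-length vector over all 4^k possible k-mers before being passed to a model, which is one common way to turn variable-length sequences into ML features.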
Type
Publication
SeqTrainer: Encoding Synthetic Biology Data for Machine Learning