A FASTA file can tell us what the bases are. A GenBank record can tell us more, including annotations and sequence features. But engineered biological systems are usually more than a sequence record.
We often need to represent:
what the parts are
how they are composed
what roles they play
what interactions they participate in
what constraints or assumptions define the design
how a design moves between tools in the design-build-test-learn cycle
That is where SBOL, the Synthetic Biology Open Language, becomes important.
For synthetic biology, SBOL should be the default exchange format whenever we care about both structure and function, not just raw sequence. FASTA and GenBank are still useful, but they are best treated as sequence-oriented formats, not as the canonical representation of an engineered design.
In this chapter, we will do four things:
understand why SBOL matters
learn the core object model of pySBOL3
use SBOL-utilities to reduce boilerplate and bridge older sequence formats into SBOL
visualize designs with VisBOL and DNAplotlib
By the end, you should see SBOL not as an abstract standard, but as a practical data structure for Python-based synthetic biology and a natural foundation for standardized design visualization.
11.1 Why not stop at FASTA or GenBank?
FASTA is extremely simple.
That simplicity is also its limitation.
A FASTA record usually gives us an identifier and a sequence. That is enough for alignment, primer design, BLAST searches, and many sequence-processing tasks. It is not enough to represent a complete design intent.
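To make that limitation concrete, here is a minimal sketch of a FASTA reader that uses only the Python standard library. The function names `parse_fasta` and `_make_record` and the example record are illustrative assumptions, not part of any package used in this chapter. Notice what comes back: an identifier, some free text, and a string of bases, and nothing else.

```python
from io import StringIO


def parse_fasta(handle):
    """Yield (identifier, description, sequence) tuples from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)


def _make_record(header, chunks):
    # The first whitespace-delimited token is the identifier; the rest is free text.
    identifier, _, description = header.partition(" ")
    return identifier, description, "".join(chunks)


# Hypothetical record with an arbitrary sequence, for illustration only.
example = StringIO(">demo_part arbitrary example record\nACGTACGTACGT\nTTGGCCAATTGG\n")
records = list(parse_fasta(example))
print(records)
```

There is no field in this structure where a role, a composition, or an interaction could live, which is exactly the gap the rest of the chapter addresses.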
GenBank is richer.
A GenBank file can include annotated features such as promoters, coding sequences, and terminators. That makes it much more informative than FASTA for sequence-centric work. But GenBank is still centered on the idea of a sequence record with annotations layered on top.
Synthetic biology often needs more than that.
We may want to represent that a promoter regulates a coding sequence, that a protein inhibits another component, that a construct is part of a larger system, or that a design is intended to move into another software tool for simulation, build planning, repository upload, or metadata capture.
SBOL was developed for exactly this problem.
SBOL is meant to standardize how designs are represented so that tools can exchange information without inventing a new private format for each project.
A useful practical rule is this:
use FASTA when you only need sequence strings
use GenBank when you need sequence plus familiar annotations
use SBOL when you need a standardized representation of engineered biological design, especially when structure, function, composition, and interoperability all matter
11.2 SBOL as a design language
One helpful way to think about SBOL is that it gives us a common language for design objects.
Instead of passing around only strings, we can pass around structured objects such as:
a DNA component
a protein component
a promoter role
a transcriptional unit as an engineered region
interactions such as repression or production
documents that collect many related design objects together
This matters because software becomes much easier to connect when tools agree on the meaning of those objects.
That is why SBOL shows up naturally in standards-driven workflows, repositories such as SynBioHub, and design automation tools.
11.3 The tooling layers we will use
In this chapter we will use two related Python packages for building SBOL, plus two tools for visualizing designs.
11.3.1 pySBOL3
This is the core Python interface to SBOL version 3.
It gives us the low-level object model directly:
Document
Component
Sequence
SubComponent
Interaction
Participation
Constraint
When you want precise control, pySBOL3 is the main tool.
11.3.2 SBOL-utilities
This package sits one layer higher.
It provides helper functions for common operations, including:
creating standard biological parts
assembling simple engineered regions
converting between formats such as FASTA, GenBank, and SBOL
supporting common synthetic biology workflows
In practice, many projects use both layers together.
Use pySBOL3 when you want explicit control over the data model. Use SBOL-utilities when you want convenience and interoperability helpers.
11.3.3 VisBOL
VisBOL is useful when your source of truth is already an SBOL design and you want to inspect or communicate that design visually using standardized glyphs.
It is especially good for:
quickly checking whether a design looks structurally right
rendering SBOL-native diagrams without manually defining shapes
exporting clean figures for notes, talks, or manuscripts
11.3.4 DNAplotlib
DNAplotlib is useful when your source of truth is a Python analysis workflow and you want fine-grained control over how the diagram is drawn.
It is especially good for:
building publication-style figures programmatically
overlaying or aligning design diagrams with analysis outputs
customizing colors, labels, part geometry, and regulation arcs
A good working habit is to think of these tools as complementary rather than competing.
SBOL + VisBOL is a strong path for standardized design exchange and standardized visualization
Python objects + DNAplotlib is a strong path for flexible figure generation inside analysis notebooks and scripts
11.4 Installing the packages
A minimal installation looks like this:
pip install sbol3 sbol-utilities biopython
If you also want to reproduce the DNAplotlib examples later in the chapter, install it as well:
pip install dnaplotlib
VisBOL is typically used as a separate visualization tool rather than as a Python package inside the same notebook workflow, so we will treat it as an external viewer for SBOL files.
11.5 A first SBOL document with pySBOL3
We will start with a constitutive GFP transcriptional unit.
The goal is not biological realism in every nucleotide. The goal is to understand the data model.
One pattern emphasized in introductory SBOL tutorials is that a Document is not a black box. You should get used to inspecting its contents early and often.
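Before inspecting anything, we need a document to inspect. The following is a minimal sketch of how such a transcriptional unit could be assembled with pySBOL3, not a definitive implementation. The namespace `https://example.org/sbol-sketch`, the names `PART_SPECS` and `build_tu_document`, and the part sequences are assumptions made for illustration; the sequences in particular are short made-up strings rather than real parts.

```python
# (display_id, kind, placeholder sequence). The sequences are short made-up
# strings chosen for illustration; they are NOT real part sequences.
PART_SPECS = [
    ("pConst", "promoter", "acgtacgtacgtacgtacgt"),
    ("BCD2", "rbs", "ggaggaggaggagg"),
    ("GFP", "cds", "atgaaaggtgaataa"),
    ("T1", "terminator", "ccccgggggttttttt"),
]


def build_tu_document(namespace="https://example.org/sbol-sketch"):
    """Return an sbol3.Document with four parts and one transcriptional unit."""
    # Imported inside the function so PART_SPECS can be inspected without pySBOL3.
    import sbol3

    roles = {
        "promoter": sbol3.SO_PROMOTER,
        "rbs": sbol3.SO_RBS,
        "cds": sbol3.SO_CDS,
        "terminator": sbol3.SO_TERMINATOR,
    }

    sbol3.set_namespace(namespace)  # hypothetical namespace
    doc = sbol3.Document()

    parts = []
    for display_id, kind, elements in PART_SPECS:
        # The Sequence holds the bases; the Component holds type and role.
        seq = sbol3.Sequence(f"{display_id}_seq", elements=elements,
                             encoding=sbol3.IUPAC_DNA_ENCODING)
        part = sbol3.Component(display_id, sbol3.SBO_DNA,
                               roles=[roles[kind]], sequences=[seq])
        doc.add(seq)
        doc.add(part)
        parts.append(part)

    tu = sbol3.Component("TU_constitutive_gfp", sbol3.SBO_DNA,
                         roles=[sbol3.SO_ENGINEERED_REGION])
    # Attach the features first so they receive their final identities,
    # then constrain neighbouring features to appear in order.
    features = [sbol3.SubComponent(part) for part in parts]
    tu.features.extend(features)
    for upstream, downstream in zip(features, features[1:]):
        tu.constraints.append(
            sbol3.Constraint(sbol3.SBOL_PRECEDES, upstream, downstream))
    doc.add(tu)
    return doc
```

With pySBOL3 installed, `doc = build_tu_document()` gives a document whose top-level objects are the four sequences, the four parts, and the transcriptional unit.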
import pandas as pd

# One row per top-level object in the document
top_level_inventory = pd.DataFrame(
    {
        "display_id": [getattr(obj, "display_id", None) for obj in doc.objects],
        "python_class": [type(obj).__name__ for obj in doc.objects],
        "identity": [obj.identity for obj in doc.objects],
    }
)
top_level_inventory
   display_id           python_class  identity
0  pConst_seq           Sequence      https://example.org/python-for-synthetic-biolo...
1  pConst               Component     https://example.org/python-for-synthetic-biolo...
2  BCD2_seq             Sequence      https://example.org/python-for-synthetic-biolo...
3  BCD2                 Component     https://example.org/python-for-synthetic-biolo...
4  GFP_seq              Sequence      https://example.org/python-for-synthetic-biolo...
5  GFP                  Component     https://example.org/python-for-synthetic-biolo...
6  T1_seq               Sequence      https://example.org/python-for-synthetic-biolo...
7  T1                   Component     https://example.org/python-for-synthetic-biolo...
8  TU_constitutive_gfp  Component     https://example.org/python-for-synthetic-biolo...
That inspection step is extremely useful when you are learning the model or debugging a larger document exported from another tool.
That table is a tidy inventory of the main design objects: one row per object, one column per variable.
The important point is not the exact nucleotide sequence. It is the fact that the design is now represented by structured SBOL objects instead of a single anonymous string.
11.6 What just happened?
Several SBOL ideas appeared in a compact example.
The coding style above is close to the pattern used in many introductory pySBOL3 tutorials:
set a namespace
create a Document
build top-level objects like Component and Sequence
connect them through features, references, and constraints
inspect the resulting document before writing it to disk
That sequence of steps is worth internalizing because it scales from toy examples to larger design libraries.
11.6.1 Document
A Document is the container for SBOL objects.
It is the thing you read from disk, write to disk, and pass between tools.
11.6.2 Component
A Component represents a biological design object.
Here, our promoter, RBS, CDS, terminator, and complete transcriptional unit are all components.
11.6.3 Sequence
A Sequence stores the actual sequence text.
This matters conceptually. The sequence is not the same thing as the design object. A component may refer to a sequence, but the component also carries type and role information.
11.6.4 SubComponent
A SubComponent says that one component occurs inside another.
That is how the transcriptional unit contains the promoter, RBS, CDS, and terminator.
11.6.5 Constraint
A Constraint states a relationship between two features of a component. Here we use it to say that one part precedes another.
That is how we capture order in the engineered region.
This is already a major step beyond FASTA. We are not only storing bases. We are storing a design structure.
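One way to see what pairwise "precedes" constraints buy us is to recover the full part order from them. The sketch below does this with the standard-library `graphlib` module (Python 3.9 or later), using plain strings in place of SBOL features. The `precedes` list and the function name `linear_order` are assumptions for illustration; pySBOL3 does not require you to do this yourself.

```python
from graphlib import TopologicalSorter

# (subject, object) pairs meaning "subject precedes object".
# They are deliberately listed out of order.
precedes = [("GFP", "T1"), ("pConst", "BCD2"), ("BCD2", "GFP")]


def linear_order(constraints):
    """Return part names ordered so that every 'precedes' pair is respected."""
    sorter = TopologicalSorter()
    for upstream, downstream in constraints:
        # The downstream part can only be placed after the upstream part.
        sorter.add(downstream, upstream)
    return list(sorter.static_order())


order = linear_order(precedes)
print(order)
```

Contradictory constraints, such as two parts that each precede the other, make `static_order()` raise `graphlib.CycleError`, which is a useful sanity check on a design.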
11.7 Representing function, not only sequence
SBOL is especially valuable when we go beyond sequence layout and start representing function.
Here is a minimal example of a sensor-like design where:
a protein LacI is represented explicitly
a promoter and coding region are placed inside a system
interactions are added to state repression and genetic production
This is not yet a full mechanistic model. It is a structured functional description.
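# LacI and the GFP protein are modeled as separate protein components
laci = sbol3.Component("LacI", sbol3.SBO_PROTEIN)
gfp_protein = sbol3.Component("GFP_protein", sbol3.SBO_PROTEIN)
doc.add(laci)
doc.add(gfp_protein)

sensor = sbol3.Component("lac_sensor", sbol3.SBO_FUNCTIONAL_ENTITY)

# Participants must be features of the system, so every player gets a SubComponent
sensor_promoter = sbol3.SubComponent(promoter)
sensor_output = sbol3.SubComponent(gfp)
sensor_laci = sbol3.SubComponent(laci)
sensor_gfp_protein = sbol3.SubComponent(gfp_protein)
sensor.features.extend(
    [sensor_promoter, sensor_output, sensor_laci, sensor_gfp_protein]
)
sensor.constraints.append(
    sbol3.Constraint(sbol3.SBOL_PRECEDES, sensor_promoter, sensor_output)
)

# LacI inhibits the promoter
repression = sbol3.Interaction(
    sbol3.SBO_INHIBITION,
    participations=[
        sbol3.Participation([sbol3.SBO_INHIBITOR], sensor_laci),
        sbol3.Participation([sbol3.SBO_INHIBITED], sensor_promoter),
    ],
)

# The GFP coding sequence is the template for the GFP protein
production = sbol3.Interaction(
    sbol3.SBO_GENETIC_PRODUCTION,
    participations=[
        sbol3.Participation([sbol3.SBO_TEMPLATE], sensor_output),
        sbol3.Participation([sbol3.SBO_PRODUCT], sensor_gfp_protein),
    ],
)
sensor.interactions.extend([repression, production])
doc.add(sensor)

# Summarize each interaction by the numeric part of its SBO term
interaction_table = pd.DataFrame(
    {
        "interaction_type": [
            i.types[0].split(":")[-1] if ":" in i.types[0] else i.types[0]
            for i in sensor.interactions
        ],
        "n_participants": [len(i.participations) for i in sensor.interactions],
    }
)
interaction_table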
   interaction_type  n_participants
0  0000169           2
1  0000589           2
Now we have crossed the line from annotated sequence into design semantics.
That is the key educational leap of SBOL.
You are no longer asking only, “what is this sequence?” You are also asking, “what role does this object play?” and “how does it relate to other objects in the design?”
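The numeric codes in the interaction table above are the tail ends of Systems Biology Ontology (SBO) identifiers, which are hard to read at a glance. A small lookup, sketched below, turns them back into words. The dictionary `SBO_LABELS` and the function `describe_interaction_type` are hypothetical helpers, and the mapping covers only three interaction terms.

```python
# Partial, hand-written mapping from SBO numeric codes to readable labels.
SBO_LABELS = {
    "0000169": "inhibition",
    "0000170": "stimulation",
    "0000589": "genetic production",
}


def describe_interaction_type(type_uri):
    """Return a readable label for an SBO term given as a URI or a bare code."""
    code = type_uri.rsplit(":", 1)[-1]
    # Fall back to the raw term when the code is not in the mapping.
    return SBO_LABELS.get(code, f"SBO:{code}")


for term in ("https://identifiers.org/SBO:0000169", "0000589"):
    print(term, "->", describe_interaction_type(term))
```

Applied to the table, the two rows read as inhibition and genetic production.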
11.8 Writing the design to disk
An SBOL document can be serialized to disk in RDF-based formats such as N-Triples, Turtle, RDF/XML, and JSON-LD.
The exact serialization format is less important than the principle.
Once a design is encoded as SBOL, it can be:
stored in a repository
exchanged across tools
inspected programmatically
enriched with more structure or metadata later
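As a rough sketch of the writing step, the helper below picks a serialization from the file suffix and then calls the document's `write` method. The names `FORMAT_BY_SUFFIX`, `format_name_for`, and `write_design` are assumptions for illustration, as is the choice of suffixes; the format constants are looked up on the `sbol3` module by name.

```python
from pathlib import Path

# File suffix -> name of the corresponding format constant in the sbol3 module.
FORMAT_BY_SUFFIX = {
    ".nt": "SORTED_NTRIPLES",
    ".ttl": "TURTLE",
    ".xml": "RDF_XML",
    ".jsonld": "JSONLD",
}


def format_name_for(path):
    """Return the sbol3 constant name matching the suffix of `path`."""
    suffix = Path(path).suffix.lower()
    try:
        return FORMAT_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no SBOL serialization known for suffix {suffix!r}") from None


def write_design(doc, path):
    """Write an sbol3.Document to `path` in the format implied by its suffix."""
    # Imported here so the suffix logic above works without pySBOL3 installed.
    import sbol3

    file_format = getattr(sbol3, format_name_for(path))
    doc.write(str(path), file_format)
    return path
```

For example, `write_design(doc, "tu_constitutive_gfp.nt")` would produce sorted N-Triples, a line-oriented format that tends to diff cleanly under version control.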
11.9 Visualizing the design with VisBOL
Once a design exists as an SBOL document, the simplest visualization workflow is often to open that file in a tool that already understands SBOL semantics.
That is the role of VisBOL.
A practical workflow looks like this:
build or export an SBOL document from Python
write it to disk in an SBOL serialization format
load that file into VisBOL
inspect whether the structure, orientation, and composition match your intent
export a figure when you want a quick standards-oriented diagram
In other words, VisBOL is best thought of as a viewer and renderer for SBOL-native designs.
If the design file is already the source of truth, this is often the fastest path from model to figure.
The key idea is that VisBOL does not require us to redraw the design by hand. It consumes the standardized SBOL representation directly.
11.10 Programmable visualization with DNAplotlib
Sometimes standardized viewing is not enough.
You may want to:
match a figure style used in a paper
control colors and labels precisely
line up a design diagram with experimental plots
render many design variants inside the same Python workflow
That is where DNAplotlib becomes useful.
Where VisBOL starts from an SBOL file, DNAplotlib usually starts from a Python description of the design to be drawn. The common pattern is to define a list of part dictionaries and then render them with a DNARenderer.
The example below is marked as not executed because DNAplotlib is an optional dependency and may not be installed in every environment. The important thing is to see the workflow.
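Here is a minimal sketch of that pattern for the constitutive GFP unit. The part names, colors, figure size, and output filename are illustrative assumptions, and the imports sit inside `render_design` so that the part list can be examined even where DNAplotlib and matplotlib are not installed.

```python
# Each part is a dictionary: glyph type, name, orientation, and drawing options.
design = [
    {"type": "Promoter", "name": "pConst", "fwd": True,
     "opts": {"color": (0.2, 0.2, 0.2)}},
    {"type": "RBS", "name": "BCD2", "fwd": True,
     "opts": {"color": (0.5, 0.5, 0.5)}},
    {"type": "CDS", "name": "GFP", "fwd": True,
     "opts": {"color": (0.3, 0.7, 0.3), "label": "GFP"}},
    {"type": "Terminator", "name": "T1", "fwd": True,
     "opts": {"color": (0.2, 0.2, 0.2)}},
]


def render_design(parts, out_path="tu_constitutive_gfp.png"):
    """Draw `parts` left to right with DNAplotlib and save the figure."""
    # Optional dependencies, imported only when a figure is requested.
    import dnaplotlib as dpl
    import matplotlib.pyplot as plt

    renderer = dpl.DNARenderer()
    fig, ax = plt.subplots(figsize=(4, 1.2))
    start, end = renderer.renderDNA(ax, parts, renderer.SBOL_part_renderers())

    # Fit the axes to the drawn construct and hide the frame.
    ax.set_xlim(start, end)
    ax.set_ylim(-15, 15)
    ax.set_aspect("equal")
    ax.axis("off")

    fig.savefig(out_path, dpi=300)
    plt.close(fig)
    return out_path
```

Calling `render_design(design)` in an environment with DNAplotlib installed should write a small PNG of the four-part construct.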
This representation is more manual than VisBOL, but it is also more flexible.
You can think of the difference like this:
VisBOL is excellent when you want a standards-aware rendering of the SBOL design itself
DNAplotlib is excellent when you want a programmable publication figure inside a Python workflow
One very effective pattern is to keep SBOL as the canonical design representation, then derive a smaller plotting-oriented representation from it for custom figures.
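A sketch of that derivation step follows. It assumes you have already pulled `(display_id, role URI)` pairs out of the SBOL document in 5' to 3' order, and it maps Sequence Ontology role URIs to DNAplotlib glyph names. The names `SO_TO_GLYPH` and `to_plot_parts` are hypothetical, and the mapping covers only the four roles used in this chapter.

```python
# Sequence Ontology role URI -> DNAplotlib part type (four roles only).
SO_TO_GLYPH = {
    "https://identifiers.org/SO:0000167": "Promoter",
    "https://identifiers.org/SO:0000139": "RBS",
    "https://identifiers.org/SO:0000316": "CDS",
    "https://identifiers.org/SO:0000141": "Terminator",
}


def to_plot_parts(ordered_parts):
    """Convert (display_id, role_uri) pairs into DNAplotlib part dictionaries."""
    parts = []
    for display_id, role_uri in ordered_parts:
        if role_uri not in SO_TO_GLYPH:
            # Fail loudly rather than silently dropping a part from the figure.
            raise ValueError(f"no glyph mapped for role {role_uri!r}")
        parts.append({
            "type": SO_TO_GLYPH[role_uri],
            "name": display_id,
            "fwd": True,   # assumes every part is on the forward strand
            "opts": {},
        })
    return parts


plot_parts = to_plot_parts([
    ("pConst", "https://identifiers.org/SO:0000167"),
    ("GFP", "https://identifiers.org/SO:0000316"),
])
print(plot_parts)
```

Keeping the mapping in one small dictionary means the SBOL document stays authoritative while the figure code only ever sees the few fields it needs.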
11.11 Using SBOL-utilities to reduce boilerplate
Writing pySBOL3 objects directly is powerful, but it can feel verbose.
That is where SBOL-utilities helps.
The package provides helper constructors for common biological parts and common workflows.
Here we will rebuild a transcriptional unit using helper functions rather than writing each piece by hand.
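The sketch below shows roughly what that could look like. It assumes that `sbol_utilities.component` exposes `promoter`, `rbs`, `cds`, and `terminator` helpers that each take an identity and a sequence string and return a `(Component, Sequence)` pair, and an `engineered_region` helper that takes an identity and a list of components; check these assumed signatures against the version you have installed. The namespace, the names `HELPER_SPECS` and `build_tu_with_helpers`, and the made-up sequences are also assumptions.

```python
# (helper name, display_id, placeholder sequence). The sequences are short
# made-up strings for illustration; they are NOT real part sequences.
HELPER_SPECS = [
    ("promoter", "pConst", "acgtacgtacgtacgtacgt"),
    ("rbs", "BCD2", "ggaggaggaggagg"),
    ("cds", "GFP", "atgaaaggtgaataa"),
    ("terminator", "T1", "ccccgggggttttttt"),
]


def build_tu_with_helpers(namespace="https://example.org/sbol-sketch"):
    """Build the transcriptional unit using SBOL-utilities helper functions."""
    # Imported here so HELPER_SPECS can be inspected without the packages.
    import sbol3
    from sbol_utilities.component import (
        cds, engineered_region, promoter, rbs, terminator)

    helpers = {"promoter": promoter, "rbs": rbs,
               "cds": cds, "terminator": terminator}

    sbol3.set_namespace(namespace)  # hypothetical namespace
    doc = sbol3.Document()

    parts = []
    for kind, display_id, elements in HELPER_SPECS:
        # Assumed return value: the part component and its sequence object.
        part, seq = helpers[kind](display_id, elements)
        doc.add(seq)
        doc.add(part)
        parts.append(part)

    # Assumed behavior: the helper wraps the parts as ordered subcomponents.
    tu = engineered_region("TU_constitutive_gfp", parts)
    doc.add(tu)
    return doc
```

Compared with the hand-written pySBOL3 version, the roles, sequence encoding, subcomponents, and ordering constraints are all supplied by the helpers rather than spelled out.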
This is a good example of how to think about the formats together.
FASTA and GenBank do not need to disappear.
But if your workflow is moving toward standardization, automation, and interoperability, they should usually become boundary formats, while SBOL becomes the canonical internal representation of the design.
11.13 A practical mindset for using SBOL
At first, SBOL can feel like extra work.
Why not just keep using strings and GenBank files?
The answer is that standards pay off when projects become larger, more collaborative, or more automated.
SBOL becomes especially valuable when you want to:
move designs between tools without hand-written adapters
keep structure and function together in one representation
connect sequence design to metadata, repositories, simulation, or build planning
represent systems, not only isolated records
write reusable code that operates on standardized design objects
If you are working alone on one plasmid, FASTA or GenBank may feel like enough.
If you want reproducible, standards-driven synthetic biology software, SBOL is the better long-term choice.
11.14 Recommended workflow
A practical educational workflow looks like this:
start with simple sequence manipulation when needed
convert important designs into SBOL early
use pySBOL3 for explicit modeling of components and interactions
use SBOL-utilities to reduce repetitive code and bridge formats
treat FASTA and GenBank as import/export formats, not as the richest source of design truth
This mirrors a broader pattern in computational biology.
Raw strings are convenient. Structured objects scale better.
11.15 Exercises
Create an SBOL document for a transcriptional unit containing a promoter, RBS, coding sequence, and terminator for a reporter of your choice.
Add a protein regulator and encode a repression interaction in SBOL.
Convert a small FASTA file into SBOL and inspect the generated top-level objects.
Convert a simple GenBank record into SBOL and then export it back to GenBank.
Extend one of the examples so that the resulting SBOL document contains two transcriptional units rather than one.
11.16 Recap
In this chapter, we moved from sequence-centric thinking to design-centric thinking.
The main ideas are:
SBOL is the right format when we need standardized representations of both structure and function
pySBOL3 exposes the SBOL 3 data model directly in Python
SBOL-utilities makes common tasks easier and helps bridge older sequence formats into SBOL
VisBOL gives us a standards-aware way to inspect and communicate SBOL-native designs
DNAplotlib gives us a programmable way to build highly customized design figures inside Python workflows
FASTA and GenBank remain useful, but SBOL is the better canonical format for interoperable synthetic biology tooling
This chapter also changes the mental model we will use in the rest of the book.
When a design matters as an engineered object rather than just a nucleotide string, we should now think first in terms of SBOL documents, components, features, interactions, and standardized exchange.