By the end of the previous chapter, you could already write small Python programs.
That is an important milestone, but it is not enough for real scientific work.
In research, code lives inside a larger workflow. You write in a notebook, save files, clean data, rerun analyses, revise plots, and send results to collaborators. A script that works once on your laptop is useful. A workflow that another person can rerun next month is much more useful.
This chapter is about that second level of practice.
We will introduce four ideas that belong together:
Jupyter notebooks for exploration and explanation
files and folders as the raw material of analysis
environments for keeping software dependencies under control
reproducibility as a habit of working, not only a technical trick
These topics may look less glamorous than modeling circuits or designing constructs, but they make the difference between fragile code and trustworthy research.
6.1 Code in science has more than one form
A beginner often imagines that programming means writing one kind of thing: a program.
In practice, scientific Python work usually appears in three complementary forms.
6.1.1 Scripts
A script is a file that runs a sequence of steps from top to bottom.
Scripts are good when you want to:
clean raw data the same way every time
rename or reorganize files
process a batch of sequences
turn a manual analysis into a repeatable workflow
Scripts are especially useful once a task stops being exploratory and starts becoming routine.
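As a sketch, a tiny cleanup script might look like the following. The filename and renaming rules are illustrative assumptions, not a prescribed tool; the point is that the steps run top to bottom and can be rerun identically.

```python
# rename_exports.py - a small, rerunnable cleanup script (illustrative example).
# Run with: python rename_exports.py
import tempfile
from pathlib import Path

export_dir = Path(tempfile.mkdtemp())  # in practice: your folder of raw exports
(export_dir / "Plate Reader Export (1).csv").touch()

# Normalize every export's filename the same way, every time.
for path in sorted(export_dir.glob("*.csv")):
    clean = path.name.lower().replace(" ", "_").replace("(", "").replace(")", "")
    path.rename(path.with_name(clean))

print([p.name for p in export_dir.iterdir()])
```

Because the script takes no interactive input, running it twice on the same folder produces the same result.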
6.1.2 Notebooks
A notebook mixes code, output, equations, and prose. It is ideal for:
trying ideas quickly
inspecting intermediate results
teaching or documenting a workflow
producing a shareable computational narrative
That is why notebooks became so popular in biology, data science, and quantitative research. They let you think with code in public.
6.1.3 Rendered documents
A rendered document, such as a Quarto chapter or report, takes reproducibility one step further. Instead of a notebook that only runs interactively, you can create a document that executes code and produces a polished output in HTML or PDF.
That is one reason Quarto is a good fit for this book. It encourages a workflow in which explanation and computation live together.
A healthy research project often uses all three:
notebooks for exploration
scripts for reusable tasks
rendered reports or book chapters for communication
6.2 Why notebooks became so influential
Jupyter notebooks are not just popular because they are convenient. They match the real rhythm of experimental reasoning.
When you are characterizing a promoter library, you rarely know the entire analysis in advance. You want to:
load the data
inspect a few rows
notice a suspicious value
clean the data
recompute a summary
try a different normalization
make a quick plot
write down what you learned
That loop of inspect, modify, rerun, interpret is exactly what notebooks are good at.
They are especially valuable for beginners because they reduce the distance between writing code and seeing what it does.
6.3 Your computational environment matters
When scientists say, “the code works on my machine,” they often mean something narrower than they realize. The code works with:
a particular Python version
a particular set of installed packages
a particular folder layout
a particular set of input files
a particular order of execution
Reproducibility means making those assumptions visible and manageable.
This kind of information is boring until something breaks. Then it becomes extremely valuable.
If a collaborator cannot run your notebook, one of the first questions is: Are we even using the same Python environment?
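One low-effort habit is to record the environment directly in the notebook. A minimal sketch using only the standard library (the package names queried here are illustrative placeholders; list your project's real dependencies):

```python
import platform
import sys
from importlib import metadata

# Capture the interpreter version and platform so they appear in the notebook output.
python_version = sys.version.split()[0]
print("Python:", python_version)
print("Platform:", platform.platform())

# Package names below are placeholders; substitute your project's dependencies.
for package in ["pip", "setuptools"]:
    try:
        print(package, metadata.version(package))
    except metadata.PackageNotFoundError:
        print(package, "not installed")
```

Running a cell like this near the top of a notebook answers the "same environment?" question before it is asked.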
6.4 Virtual environments: keeping projects separate
A virtual environment is an isolated Python installation for a specific project.
Why bother with that?
Because scientific projects accumulate dependencies. One analysis may need a recent version of pandas. Another may depend on an older version of a modeling package. If everything is installed globally, projects start interfering with each other.
A virtual environment gives each project its own small software world.
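The standard library can create such an environment itself. A minimal sketch using the venv module (here writing into a temporary directory purely for demonstration; in a real project you would typically run `python -m venv .venv` in the project folder):

```python
import tempfile
import venv
from pathlib import Path

# Create a demonstration environment in a throwaway directory.
env_dir = Path(tempfile.mkdtemp()) / "demo-env"
venv.create(env_dir, with_pip=False)  # with_pip=True also bootstraps pip

# The pyvenv.cfg file is what marks a directory as a virtual environment.
print((env_dir / "pyvenv.cfg").exists())
```

Activating the environment (via its scripts in `bin/` or `Scripts/`) then makes its interpreter and packages the ones your project sees.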
6.5 Project folders: structure as workflow
You do not need to be rigid about any particular layout. What matters is the principle:
raw data should stay separate from modified data
scripts should be kept under version control
results should be regenerable from code
README files should explain what the project is and how to run it
A folder structure is not just organizational. It is a model of how work flows through the project.
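Those principles can be sketched as a small scaffold built with pathlib. The folder names below are an illustrative convention, not a mandated standard:

```python
import tempfile
from pathlib import Path

# An illustrative project skeleton; adapt the names to your own conventions.
project = Path(tempfile.mkdtemp()) / "promoter-analysis"
for subdir in ["data/raw", "data/processed", "scripts", "results"]:
    (project / subdir).mkdir(parents=True)

# A README explaining how to rerun the project is part of the structure.
(project / "README.md").write_text("Promoter analysis. To rerun: see scripts/.\n")

print(sorted(entry.name for entry in project.iterdir()))
```

Reading the tree tells a newcomer where raw data enters, where code lives, and where results come out.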
6.6 Paths are part of scientific thinking
Beginners often treat file paths as annoying details. In reality, paths are how your code locates the world.
Python’s pathlib module makes path handling much clearer than manual string concatenation.
```python
# chapter3_demo_dir is a pathlib.Path defined earlier in the chapter
raw_dir = chapter3_demo_dir / "data" / "raw"
processed_dir = chapter3_demo_dir / "data" / "processed"
results_dir = chapter3_demo_dir / "results"

for directory in [raw_dir, processed_dir, results_dir]:
    directory.mkdir(parents=True, exist_ok=True)

sorted(str(path.relative_to(chapter3_demo_dir)) for path in chapter3_demo_dir.iterdir())
```

```
['data', 'results']
```
The / operator in pathlib joins path components in a readable way. This is much safer than manually building strings like "data/raw/file.csv", especially if you want your code to work across operating systems.
Let us create a file path for a plate-reader export.
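A minimal sketch of such an export, written with the standard csv module. The measurements below are assumed placeholder values, not the chapter's real data, and the file is written to a temporary directory here rather than the chapter's data/raw folder:

```python
import csv
import tempfile
from pathlib import Path

# Stand-in for the chapter's data/raw directory.
raw_dir = Path(tempfile.mkdtemp())
plate_reader_file = raw_dir / "plate_reader_export.csv"

# Illustrative plate-reader measurements (assumed values, not real data).
rows = [
    {"sample": "A1", "od600": 0.82, "fluorescence": 15418.1},
    {"sample": "A2", "od600": 0.77, "fluorescence": 14120.0},
    {"sample": "B1", "od600": 0.79, "fluorescence": 11340.0},
    {"sample": "B2", "od600": 0.63, "fluorescence": 10727.0},
]

with plate_reader_file.open("w", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=["sample", "od600", "fluorescence"])
    writer.writeheader()
    writer.writerows(rows)

print(plate_reader_file.read_text().splitlines()[0])
```

DictWriter keeps the column order explicit, which makes the file self-describing when it is read back later.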
Now the dataset lives in a file, not only in the notebook state.
That distinction matters. A notebook variable disappears when the kernel restarts. A file can be rerun, versioned, shared, and inspected independently.
6.8 Reading data back in
A reproducible workflow should be able to reconstruct its analysis from saved inputs.
```python
import csv

with plate_reader_file.open() as handle:
    measurements = list(csv.DictReader(handle))

# DictReader yields every field as a string, so standardize the numeric columns
for row in measurements:
    row["od600"] = float(row["od600"])
    row["fluorescence"] = float(row["fluorescence"])

measurements[:2]
```
Already we can see a core pattern of computational biology:
load raw data
standardize types
derive new quantities
save or report the result
6.9 Notebook state is helpful and dangerous
A notebook remembers what you have already run.
That is incredibly helpful during exploration, but it also creates one of the most common sources of confusion for beginners: hidden state.
Suppose you define a threshold.
```python
qc_threshold = 0.76
[row["sample"] for row in measurements if row["od600"] >= qc_threshold]
```

```
['A1', 'A2', 'B1']
```
Now imagine that, twenty minutes later, you redefine that threshold in another cell.
```python
qc_threshold = 0.80
[row["sample"] for row in measurements if row["od600"] >= qc_threshold]
```

```
['A1']
```
Nothing about the data changed. Only the notebook state changed.
This is one reason people get different answers from the “same notebook.” They are not always running the same sequence of cells.
Two habits reduce this problem dramatically:
restart the kernel and run all cells from top to bottom
keep important parameters near the top of the notebook or report
A notebook becomes much more trustworthy when it can be executed cleanly from a fresh start.
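The second habit can be made concrete with a single "parameters" cell. A sketch, with illustrative names:

```python
# A parameters cell kept near the top of the notebook (names are illustrative).
QC_THRESHOLD = 0.76  # minimum OD600 for a sample to pass QC
RAW_DATA_FILE = "data/raw/plate_reader_export.csv"

# Later cells refer only to these names, so restarting the kernel and running
# top to bottom reproduces the analysis with its parameters in plain sight.
print(f"QC threshold: {QC_THRESHOLD}; input: {RAW_DATA_FILE}")
```

If a parameter is redefined anywhere else, the drift is easy to spot, because there is exactly one place it is supposed to live.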
6.10 Small functions make notebooks stronger
A notebook should not become a wall of ad hoc code. Even in exploratory work, small functions help isolate logic and reduce mistakes.
Let us define a reusable normalization function.
```python
def normalize_expression(row: dict) -> float:
    if row["od600"] <= 0:
        raise ValueError("OD600 must be positive for normalization")
    return row["fluorescence"] / row["od600"]

[round(normalize_expression(row), 1) for row in measurements]
```

```
[18802.5, 18337.7, 14354.4, 17027.0]
```
And let us use it to build a cleaner processed dataset.
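One way that processed dataset might be built is sketched below, using toy values and redefining the function so the sketch is self-contained; the chapter's actual construction may differ:

```python
# Toy sketch of building a processed dataset (not the chapter's real data).
def normalize_expression(row: dict) -> float:
    if row["od600"] <= 0:
        raise ValueError("OD600 must be positive for normalization")
    return row["fluorescence"] / row["od600"]

measurements = [
    {"sample": "A1", "od600": 0.5, "fluorescence": 1000.0},
    {"sample": "B2", "od600": 0.8, "fluorescence": 1200.0},
]

processed = [
    {"sample": row["sample"],
     "normalized_expression": round(normalize_expression(row), 1)}
    for row in measurements
]
print(processed)
```

Keeping the derived quantity in its own records, rather than mutating the raw rows, preserves a clear line from input to result.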