# Working with Phylogenetic Trees in Biopython

Phylogenetic trees help you reason about evolutionary relationships instead of looking at sequences in isolation. If you work with comparative genomics, proteins, or species-level data, tree workflows are a core bioinformatics skill.

In this tutorial, you will use `Bio.Phylo` to load, build, convert, root, prune, subset, and visualize phylogenetic trees.

## Loading and Inspecting a Tree

import requests
from Bio import Phylo

# Download the files used throughout this tutorial (re-running overwrites them)
urls = {
    "opuntia.dnd": "https://raw.githubusercontent.com/biopython/biopython/master/Doc/examples/opuntia.dnd",
    "hedgehog.aln": "https://raw.githubusercontent.com/biopython/biopython/master/Tests/Clustalw/hedgehog.aln",
    "apaf.xml": "https://raw.githubusercontent.com/biopython/biopython/master/Tests/PhyloXML/apaf.xml",
}

for filename, url in urls.items():
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    with open(filename, "w", encoding="utf-8") as f:
        f.write(response.text)

# Load a Newick tree file and inspect basic properties
tree = Phylo.read("opuntia.dnd", "newick")
print("Rooted:", tree.rooted)
print("Number of terminal clades:", len(tree.get_terminals()))
print("Terminal names:", [clade.name for clade in tree.get_terminals()])
Rooted: False
Number of terminal clades: 7
Terminal names: ['gi|6273291|gb|AF191665.1|AF191665', 'gi|6273290|gb|AF191664.1|AF191664', 'gi|6273289|gb|AF191663.1|AF191663', 'gi|6273287|gb|AF191661.1|AF191661', 'gi|6273286|gb|AF191660.1|AF191660', 'gi|6273285|gb|AF191659.1|AF191659', 'gi|6273284|gb|AF191658.1|AF191658']
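This block downloads the tree and alignment files used throughout the tutorial, then parses the Newick file with `Phylo.read`. The output shows seven terminal clades and `rooted` set to `False`, because Biopython treats Newick trees as unrooted by default. The parsed tree object is the starting point for traversal, analysis, and export in later sections.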
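
Beyond counting terminals, a parsed tree can answer simple questions directly. The following is a minimal sketch on a hypothetical four-taxon tree (the names `A` to `D` and the branch lengths are invented for illustration, not taken from `opuntia.dnd`), so it runs without any downloaded files:

```python
from io import StringIO

from Bio import Phylo

# Hypothetical four-taxon tree; names and branch lengths are made up for illustration
newick_text = "((A:0.1,B:0.2):0.3,(C:0.4,D:0.5):0.6);"
demo_tree = Phylo.read(StringIO(newick_text), "newick")

# Terminal names in traversal order
terminal_names = [clade.name for clade in demo_tree.get_terminals()]

# Sum of all branch lengths in the tree
total_length = demo_tree.total_branch_length()

# Path length between two terminals, measured through their common ancestor
distance_ab = demo_tree.distance("A", "B")

# Most recent common ancestor of two terminals, and the terminals below it
ancestor_ab = demo_tree.common_ancestor("A", "B")
ancestor_names = [clade.name for clade in ancestor_ab.get_terminals()]

print("Terminals:", terminal_names)
print("Total branch length:", round(total_length, 3))
print("Distance A-B:", round(distance_ab, 3))
print("Terminals under the A/B ancestor:", ancestor_names)
```

The same calls work on the tree loaded from `opuntia.dnd`; only the terminal names change.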

## Building a Phylogenetic Tree from Protein Sequences

from Bio import AlignIO
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor

# Read a protein multiple sequence alignment in Clustal format
alignment = AlignIO.read("hedgehog.aln", "clustal")

# Compute pairwise distances using a protein substitution model
calculator = DistanceCalculator("blosum62")
distance_matrix = calculator.get_distance(alignment)

# Build a Neighbor-Joining tree from the distance matrix
constructor = DistanceTreeConstructor()
protein_tree = constructor.nj(distance_matrix)

print("Terminal clades:", len(protein_tree.get_terminals()))
print("First five terminal names:", [c.name for c in protein_tree.get_terminals()[:5]])
Terminal clades: 5
First five terminal names: ['gi|13990994|dbj|BAA33523.2|', 'gi|167877390|gb|EDS40773.1|', 'gi|167234445|ref|NP_001107837.', 'gi|74100009|gb|AAZ99217.1|', 'gi|56122354|gb|AAV74328.1|']
Here you convert a protein alignment into a BLOSUM62-based distance matrix and then into a tree with Neighbor-Joining. This is a practical workflow when you already have aligned protein sequences and need a first-pass topology quickly. Neighbor-Joining produces an unrooted tree, so root it before reading ancestor-descendant direction from it.
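
To see what the distance step produces, it can help to run the same pipeline on an alignment small enough to check by hand. This sketch assumes a hypothetical four-sequence DNA alignment (the sequences and the ids `seq1` to `seq4` are invented) and uses the `identity` model, which scores distance as the fraction of mismatched columns. It also builds a UPGMA tree for comparison:

```python
from Bio.Align import MultipleSeqAlignment
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical toy alignment: four sequences of equal length
toy_alignment = MultipleSeqAlignment(
    [
        SeqRecord(Seq("ACGTACGT"), id="seq1"),
        SeqRecord(Seq("ACGTACGA"), id="seq2"),
        SeqRecord(Seq("ACGAACTA"), id="seq3"),
        SeqRecord(Seq("TCGAACTA"), id="seq4"),
    ]
)

# "identity" distance = 1 - (matching columns / alignment length)
toy_calculator = DistanceCalculator("identity")
toy_matrix = toy_calculator.get_distance(toy_alignment)
print(toy_matrix)

# seq1 and seq2 differ in one of eight columns
d_seq1_seq2 = toy_matrix["seq1", "seq2"]
print("seq1 vs seq2:", d_seq1_seq2)

# Build trees from the same matrix with two distance methods
toy_constructor = DistanceTreeConstructor()
nj_tree = toy_constructor.nj(toy_matrix)
upgma_tree = toy_constructor.upgma(toy_matrix)

nj_names = sorted(clade.name for clade in nj_tree.get_terminals())
upgma_names = sorted(clade.name for clade in upgma_tree.get_terminals())
print("NJ terminals:", nj_names)
print("UPGMA terminals:", upgma_names)
```

UPGMA assumes a constant evolutionary rate across lineages, while Neighbor-Joining does not, so the two methods can disagree on real data even when they agree on a toy example.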

## Reading and Writing Phylo Trees

from Bio import Phylo

# Read Newick and PhyloXML trees from local files
newick_tree = Phylo.read("opuntia.dnd", "newick")
phyloxml_tree = Phylo.read("apaf.xml", "phyloxml")

# Write trees to different output formats
Phylo.write(newick_tree, "opuntia_copy.xml", "phyloxml")
Phylo.write(phyloxml_tree, "apaf_copy.nwk", "newick")

print("Wrote opuntia_copy.xml and apaf_copy.nwk")
Wrote opuntia_copy.xml and apaf_copy.nwk
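This block converts trees between two common formats. Converting between Newick and PhyloXML is useful when tools in your pipeline expect different file formats. Keep in mind that Newick stores essentially only the topology, clade names, branch lengths, and support values, so richer PhyloXML annotations such as taxonomy or sequence data are dropped when you write to Newick.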
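
When you only need to change the format, `Phylo.convert` does the read and write in one call. The sketch below assumes a hypothetical three-taxon Newick file written to a temporary directory (the file names and tree are invented for illustration), then reads the PhyloXML result back to check that names and branch lengths survived:

```python
import os
import tempfile

from Bio import Phylo

with tempfile.TemporaryDirectory() as tmp_dir:
    newick_path = os.path.join(tmp_dir, "demo.nwk")
    xml_path = os.path.join(tmp_dir, "demo.xml")

    # Hypothetical input tree, written to disk so convert() has a file to read
    with open(newick_path, "w", encoding="utf-8") as handle:
        handle.write("((A:0.1,B:0.2):0.3,C:0.4);\n")

    # Convert Newick -> PhyloXML; the return value is the number of trees written
    n_converted = Phylo.convert(newick_path, "newick", xml_path, "phyloxml")

    # Read the converted file back and inspect what was preserved
    round_trip = Phylo.read(xml_path, "phyloxml")
    round_trip_names = [clade.name for clade in round_trip.get_terminals()]
    round_trip_lengths = [clade.branch_length for clade in round_trip.get_terminals()]

print("Trees converted:", n_converted)
print("Terminal names after round trip:", round_trip_names)
print("Terminal branch lengths after round trip:", round_trip_lengths)
```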

## Rooting and Re-rooting Trees

from Bio import Phylo

# Load the tree (Newick trees are parsed as unrooted by default)
tree = Phylo.read("opuntia.dnd", "newick")

# Midpoint root is a practical default when no outgroup is available
tree.root_at_midpoint()
print("Rooted after midpoint rooting:", tree.rooted)

# Re-root using an explicit outgroup terminal if present
terminals = tree.get_terminals()
if terminals:
    tree.root_with_outgroup(terminals[0])
    print("Re-rooted with outgroup:", terminals[0].name)
Rooted after midpoint rooting: True
Re-rooted with outgroup: gi|6273291|gb|AF191665.1|AF191665
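This section shows two common rooting workflows: midpoint rooting for exploratory analysis and outgroup rooting for biologically guided trees. The example uses the first terminal only as a stand-in; in a real analysis, pass a taxon that you know diverged before the rest of the group. Correct rooting is essential when you interpret ancestor-descendant direction.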
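
The effect of each rooting method is easier to verify on a tree with round numbers. This sketch assumes a hypothetical four-taxon tree in which `D` sits on a long branch (names and lengths are invented). It roots one copy on `D` by name and another copy at the midpoint, then reports where the root ended up:

```python
from io import StringIO

from Bio import Phylo

# Hypothetical tree: D sits on a long branch, so it dominates the longest path
NEWICK_TEXT = "((A:1,B:1):1,(C:1,D:5):1);"

# Outgroup rooting: look up the outgroup by name instead of by list position
outgroup_tree = Phylo.read(StringIO(NEWICK_TEXT), "newick")
outgroup_clade = next(outgroup_tree.find_clades(name="D"))
outgroup_tree.root_with_outgroup(outgroup_clade)
root_child_names = [child.name for child in outgroup_tree.root.clades]
print("Children of the new root:", root_child_names)
print("Rooted flag:", outgroup_tree.rooted)

# Midpoint rooting: the root is placed halfway along the longest tip-to-tip path
midpoint_tree = Phylo.read(StringIO(NEWICK_TEXT), "newick")
midpoint_tree.root_at_midpoint()
tip_depths = {
    tip.name: midpoint_tree.distance(tip) for tip in midpoint_tree.get_terminals()
}
print("Root-to-tip distances after midpoint rooting:", tip_depths)
```

In this toy tree the longest path runs from `D` to `A` (or `B`) and has length 8, so midpoint rooting should leave both ends 4 units from the root. Both methods modify the tree in place, which is why each one starts from a fresh copy.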

## Pruning to Taxa of Interest

from Bio import Phylo

# Load tree and get terminal labels
tree = Phylo.read("opuntia.dnd", "newick")
terminals = [clade.name for clade in tree.get_terminals()]

# Keep only a small panel of taxa by pruning the rest
keep = set(terminals[:4])
for clade in list(tree.get_terminals()):
    if clade.name not in keep:
        tree.prune(clade)

print("Remaining terminals after pruning:", [c.name for c in tree.get_terminals()])
Remaining terminals after pruning: ['gi|6273291|gb|AF191665.1|AF191665', 'gi|6273290|gb|AF191664.1|AF191664', 'gi|6273289|gb|AF191663.1|AF191663', 'gi|6273287|gb|AF191661.1|AF191661']
Pruning lets you focus on a biologically relevant subset, such as one genus or one set of samples from a larger tree. This is useful for cleaner figures and targeted interpretation.
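
Because `prune` modifies the tree in place, it is often convenient to wrap the loop in a function that works on a copy. The helper below, `prune_to`, is a hypothetical name introduced for this sketch and is not part of Biopython. It is shown on an invented four-taxon tree:

```python
from copy import deepcopy
from io import StringIO

from Bio import Phylo


def prune_to(tree, keep_names):
    """Return a pruned copy of `tree` that keeps only the named terminals.

    Hypothetical helper for illustration; the input tree is left unchanged.
    """
    keep_names = set(keep_names)
    pruned = deepcopy(tree)

    # Fail early on typos instead of silently returning a smaller tree
    present = {clade.name for clade in pruned.get_terminals()}
    missing = keep_names - present
    if missing:
        raise ValueError(f"Names not found in tree: {sorted(missing)}")

    # get_terminals() returns a list, so pruning inside the loop is safe
    for clade in pruned.get_terminals():
        if clade.name not in keep_names:
            pruned.prune(clade)
    return pruned


# Hypothetical tree; names and branch lengths are made up for illustration
original = Phylo.read(StringIO("((A:0.1,B:0.2):0.3,(C:0.4,D:0.5):0.6);"), "newick")
subset = prune_to(original, ["A", "C", "D"])

subset_names = [clade.name for clade in subset.get_terminals()]
depth_of_a = subset.distance("A")
print("Kept terminals:", subset_names)
print("Root-to-A distance after pruning:", round(depth_of_a, 3))
print("Original still has", len(original.get_terminals()), "terminals")
```

When pruning leaves an internal node with a single child, Biopython collapses that node and adds its branch length to the child, so root-to-tip distances of the remaining taxa are preserved.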

## Extracting and Exporting a Subtree

from Bio import Phylo
from copy import deepcopy

# Read the original tree and collect internal clades, skipping the root:
# the root clade spans the whole tree, so it is not a useful subtree
tree = Phylo.read("opuntia.dnd", "newick")
internal_clades = [c for c in tree.get_nonterminals() if c is not tree.root]

if not internal_clades:
    raise ValueError("No internal clades found for subtree extraction.")

# Copy the first internal clade below the root so the original tree stays untouched
target_clade = internal_clades[0]
subtree = deepcopy(target_clade)

# Wrap clade in a Tree object and export
subtree_tree = Phylo.BaseTree.Tree(root=subtree)
Phylo.write(subtree_tree, "opuntia_subtree.nwk", "newick")
print("Subtree terminals:", [c.name for c in subtree_tree.get_terminals()])
print("Wrote subtree to opuntia_subtree.nwk")
Subtree terminals: ['gi|6273291|gb|AF191665.1|AF191665', 'gi|6273290|gb|AF191664.1|AF191664', 'gi|6273289|gb|AF191663.1|AF191663']
Wrote subtree to opuntia_subtree.nwk
Subtree export is practical when you need to share one branch with collaborators or run downstream analyses on one lineage only.
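
In practice you usually know which taxa define the lineage you want, not its position in a list of clades. One way to select it, sketched here on a hypothetical four-taxon tree with invented names, is to take the most recent common ancestor of two named terminals and export everything below it:

```python
from copy import deepcopy
from io import StringIO

from Bio import Phylo
from Bio.Phylo.BaseTree import Tree

# Hypothetical tree; names and branch lengths are made up for illustration
full_tree = Phylo.read(StringIO("((A:0.1,B:0.2):0.3,(C:0.4,D:0.5):0.6);"), "newick")

# The common ancestor of C and D is the root of the lineage to extract
ancestor = full_tree.common_ancestor("C", "D")

# Copy the clade so edits to the subtree never touch the full tree
lineage_tree = Tree(root=deepcopy(ancestor))
lineage_names = [clade.name for clade in lineage_tree.get_terminals()]

# Write to an in-memory buffer here; a file path works the same way
buffer = StringIO()
Phylo.write(lineage_tree, buffer, "newick")
lineage_newick = buffer.getvalue().strip()

print("Lineage terminals:", lineage_names)
print("Lineage as Newick:", lineage_newick)
print("Full tree still has", len(full_tree.get_terminals()), "terminals")
```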

## Visualizing Phylogenetic Trees

import matplotlib.pyplot as plt
from Bio import Phylo

# Read the tree object
visual_tree = Phylo.read("opuntia.dnd", "newick")

# ASCII view is useful in terminal-only environments
Phylo.draw_ascii(visual_tree)

# Matplotlib rendering gives publication-style visualization.
# Pass explicit axes to Phylo.draw; otherwise it opens its own figure
# and the figsize set here is left on an empty one.
fig, ax = plt.subplots(figsize=(10, 6))
Phylo.draw(visual_tree, axes=ax, do_show=False)
ax.set_title("Opuntia Phylogenetic Tree")
fig.tight_layout()

# Save static files for reports and papers before showing the figure
fig.savefig("opuntia_tree.png", dpi=300)
fig.savefig("opuntia_tree.pdf")
print("Saved opuntia_tree.png and opuntia_tree.pdf")
plt.show()
                             _______________ gi|6273291|gb|AF191665.1|AF191665
  __________________________|
 |                          |   ______ gi|6273290|gb|AF191664.1|AF191664
 |                          |__|
 |                             |_____ gi|6273289|gb|AF191663.1|AF191663
 |
_|_________________ gi|6273287|gb|AF191661.1|AF191661
 |
 |__________ gi|6273286|gb|AF191660.1|AF191660
 |
 |    __ gi|6273285|gb|AF191659.1|AF191659
 |___|
     | gi|6273284|gb|AF191658.1|AF191658

Saved opuntia_tree.png and opuntia_tree.pdf
You get two complementary visual outputs: quick ASCII inspection for debugging and a full plotted tree for reports and presentations. Visualization is often the fastest way to detect unusual branch structure before deeper analysis.
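
Long sequence identifiers such as those in `opuntia.dnd` can crowd a plotted tree. `Phylo.draw` accepts a `label_func` argument, so one option is to shorten labels at draw time without renaming the clades. In this sketch, `short_label` is a hypothetical helper, the two-taxon tree and its branch lengths are invented, and the `Agg` backend is chosen so the example also runs without a display:

```python
import os
import tempfile
from io import StringIO

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
from Bio import Phylo


def short_label(clade):
    """Return the last '|'-separated field of a clade name, or None if unnamed."""
    if not clade.name:
        return None  # Phylo.draw draws no text for a None label
    return clade.name.split("|")[-1]


# Hypothetical two-taxon tree that reuses the label style of opuntia.dnd
newick_text = (
    "(gi|6273291|gb|AF191665.1|AF191665:0.004,"
    "gi|6273290|gb|AF191664.1|AF191664:0.002);"
)
label_tree = Phylo.read(StringIO(newick_text), "newick")

# Draw onto explicit axes so the figure size applies to the tree
fig, ax = plt.subplots(figsize=(6, 3))
Phylo.draw(label_tree, label_func=short_label, axes=ax, do_show=False)
fig.tight_layout()

# Save to a temporary location for this demonstration
output_path = os.path.join(tempfile.mkdtemp(), "short_labels.png")
fig.savefig(output_path, dpi=150)
plt.close(fig)

short_names = [short_label(clade) for clade in label_tree.get_terminals()]
print("Labels used in the figure:", short_names)
print("Saved figure to:", output_path)
```

The clade names on the tree object stay unchanged, so any later export still carries the full identifiers.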