Multiple sequence alignments (MSAs) are a backbone of evolutionary and comparative analysis. They let you compare homologous positions across sequences, which is essential for [tree building](/tutorials/biopython-phylogenetic-trees), motif discovery, and conservation analysis. In this tutorial, you will focus on reading and writing alignment files with Biopython. ## Reading and Writing Alignment Files
import requests
from Bio import AlignIO
# Download one example alignment file to reuse
url = "https://raw.githubusercontent.com/biopython/biopython/master/Doc/examples/opuntia.aln"
response = requests.get(url, timeout=30)
response.raise_for_status()
with open("opuntia.aln", "w", encoding="utf-8") as f:
f.write(response.text)
# Read alignment in Clustal format
alignment = AlignIO.read("opuntia.aln", "clustal")
print("Number of sequences:", len(alignment))
print("Alignment length:", alignment.get_alignment_length())
print("First three sequence IDs:", [record.id for record in alignment[:3]])
# Write the same alignment to FASTA and PHYLIP formats
AlignIO.write(alignment, "opuntia.fasta", "fasta")
AlignIO.write(alignment, "opuntia.phy", "phylip")
print("Wrote opuntia.fasta and opuntia.phy")This block shows the full I/O cycle for alignments: read once, inspect core metadata, and export to additional formats. Format conversion is common when chaining tools that each expect a different alignment format. ## Reading Back Converted Alignments
from Bio import AlignIO
# Read converted files back to verify they are usable
fasta_alignment = AlignIO.read("opuntia.fasta", "fasta")
phylip_alignment = AlignIO.read("opuntia.phy", "phylip")
print("FASTA alignment length:", fasta_alignment.get_alignment_length())
print("PHYLIP alignment length:", phylip_alignment.get_alignment_length())
print("Same number of sequences:", len(fasta_alignment) == len(phylip_alignment))Reading exported files back is a quick integrity check that your conversion worked as expected. This prevents subtle downstream errors when a tool fails due to malformed or incompatible alignment output. ## Computing a Consensus Sequence
from Bio import AlignIO
from Bio import motifs
# Read alignment from local file
alignment = AlignIO.read("opuntia.aln", "clustal")
# Build a motif from aligned sequences and use built-in consensus methods
motif = motifs.create([record.seq for record in alignment])
consensus = motif.consensus
degenerate = motif.degenerate_consensus
thresholded = motif.counts.calculate_consensus(identity=0.7)
print("Consensus sequence:")
print(consensus)
print("Degenerate consensus:")
print(degenerate)
print("Thresholded consensus (identity=0.7):")
print(thresholded)Consensus extraction is useful when you need one representative sequence for primer design, motif checks, or high-level reporting. Using `Bio.motifs` also keeps the tutorial aligned with current Biopython APIs instead of deprecated `SummaryInfo` helpers. ## Measuring Per-Column Conservation
from Bio import AlignIO
# Read alignment
alignment = AlignIO.read("opuntia.aln", "clustal")
length = alignment.get_alignment_length()
conservation_scores = []
for i in range(length):
column = alignment[:, i]
non_gap = [c for c in column if c != "-"]
if not non_gap:
conservation_scores.append(0.0)
continue
most_common = max(set(non_gap), key=non_gap.count)
score = non_gap.count(most_common) / len(non_gap)
conservation_scores.append(score)
print("First 20 conservation scores:")
print([round(s, 3) for s in conservation_scores[:20]])Per-column conservation helps you find stable versus variable regions, which is practical for marker selection and downstream phylogenetic feature design. ## Filtering High-Gap Columns
from Bio import AlignIO
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
# Read alignment
alignment = AlignIO.read("opuntia.aln", "clustal")
length = alignment.get_alignment_length()
# Keep columns where gap fraction is <= 0.3
keep_positions = []
for i in range(length):
column = alignment[:, i]
gap_fraction = column.count("-") / len(column)
if gap_fraction <= 0.3:
keep_positions.append(i)
# Rebuild filtered alignment from kept columns
filtered_records = []
for record in alignment:
filtered_seq = "".join(record.seq[i] for i in keep_positions)
filtered_records.append(SeqRecord(Seq(filtered_seq), id=record.id, description=""))
filtered_alignment = MultipleSeqAlignment(filtered_records)
AlignIO.write(filtered_alignment, "opuntia_filtered.aln", "clustal")
print("Original length:", length)
print("Filtered length:", filtered_alignment.get_alignment_length())
print("Wrote opuntia_filtered.aln")Gap filtering is a common preprocessing step before tree construction to reduce noise from poorly aligned positions. ## Visualizing Alignment Patterns
import matplotlib.pyplot as plt
from Bio import AlignIO
# Read alignment and compare each residue against first sequence as reference
alignment = AlignIO.read("opuntia.aln", "clustal")
reference = str(alignment[0].seq)
# Encode alignment: 0 = match to reference, 1 = mismatch, 2 = gap
matrix = []
for record in alignment:
row = []
seq = str(record.seq)
for ref_char, char in zip(reference, seq):
if char == "-":
row.append(2)
elif char == ref_char:
row.append(0)
else:
row.append(1)
matrix.append(row)
plt.figure(figsize=(12, 5))
plt.imshow(matrix, aspect="auto", interpolation="nearest", cmap="viridis")
plt.colorbar(label="0=match, 1=mismatch, 2=gap")
plt.title("Alignment Pattern Heatmap (relative to first sequence)")
plt.xlabel("Alignment position")
plt.ylabel("Sequence index")
plt.tight_layout()
plt.show()This heatmap gives a quick visual summary of conserved blocks, mismatch-rich regions, and gap-heavy segments. It is a practical diagnostic before selecting regions for phylogenetic or marker analyses. ## Exporting a Subset of Sequences
from Bio import AlignIO
from Bio.Align import MultipleSeqAlignment
# Read alignment and keep only first five sequences as an example subset
alignment = AlignIO.read("opuntia.aln", "clustal")
subset = MultipleSeqAlignment(alignment[:5])
AlignIO.write(subset, "opuntia_subset.fasta", "fasta")
print("Subset size:", len(subset))
print("Wrote opuntia_subset.fasta")Subsetting is useful when preparing focused comparisons for one clade, one sample group, or a small demo dataset for collaborators.