Tutorials

Downloading and Aligning Homologous Genes with Biopython

Homologous gene comparisons are a core task in comparative genomics. A practical workflow usually starts with downloading related sequences, checking quality, and aligning pairs before deeper analyses like tree building.

In this tutorial, you will download homologous [nucleotide sequences](/tutorials/biopython-nucleotide-sequences) and align them with Biopython.

## Downloading Homologous Gene Sequences

import requests
from Bio import SeqIO

# Download three homologous nucleotide sequences from related plant examples
urls = {
    "lupine.nu": "https://raw.githubusercontent.com/biopython/biopython/master/Tests/Fasta/lupine.nu",
    "lavender.nu": "https://raw.githubusercontent.com/biopython/biopython/master/Tests/Fasta/lavender.nu",
    "phlox.nu": "https://raw.githubusercontent.com/biopython/biopython/master/Tests/Fasta/phlox.nu",
}

records = []
for filename, url in urls.items():
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    with open(filename, "w", encoding="utf-8") as f:
        f.write(response.text)
    records.append(SeqIO.read(filename, "fasta"))

print("Downloaded records:", [r.id for r in records])
print("Sequence lengths:", [len(r.seq) for r in records])
Downloaded records: ['gi|5049839|gb|AI730987.1|AI730987', 'gi|5690369|gb|AF158246.1|AF158246', 'gi|5052071|gb|AF067555.1|AF067555']
Sequence lengths: [655, 550, 623]
This block downloads and stores homologous sequences locally so later alignment steps can be rerun without additional network calls.

## Pairwise Global Alignment of Homologs

from Bio import SeqIO
from Bio.Align import PairwiseAligner

# Read local FASTA records
lupine = SeqIO.read("lupine.nu", "fasta")
lavender = SeqIO.read("lavender.nu", "fasta")

# Configure a global aligner for homolog comparison
aligner = PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 2
aligner.mismatch_score = -1
aligner.open_gap_score = -3
aligner.extend_gap_score = -1

score = aligner.score(str(lupine.seq), str(lavender.seq))
alignment = aligner.align(str(lupine.seq), str(lavender.seq))[0]

print("Alignment score:", score)
print(alignment)
Alignment score: 185.0
target            0 GAAAATTCATTTTCTTTGGACTTTCTCTGAAATCCGAGTCCTAGGAAAG-ATGCGTGAGA
                  0 -----------------||-----||||.||.|-|..||-|||||.|.|-.|||...||.
query             0 -----------------GG-----CTCTTAAGT-CATGT-CTAGGCAGGTGTGCACAAGT

target           59 TT-CTTCATATTCAGGCAGGGCA-GTGCGGCAACCAGATCGGTGCCAATTTTTGGGAGGT
                 60 ||-..||.|.|||.|.||--|||-.||.||-----||||.||-|-------||||||..|
query            36 TTAGGTCTTGTTCCGCCA--GCATCTGAGG-----AGATTGG-G-------TTGGGAACT

target          117 GGTTTGC--GCCGAACA-CGGCATAAATTC---AACGGGA---CGCTATCAAGGGGACAA
                120 |||.|||--|.|-|.||-|.|..|..|..|---|.|||||---|.||-|||...|--||.
query            81 GGTATGCATGTC-ACCATCTGAGTTTAGCCCAGATCGGGAGTCCTCT-TCATCAG--CAT

target          168 TGATTTGCAACTGGAGCGCGTTAACGTTTACTACAATGAAGCGAGT--TGC-GGGAGGTT
                180 |||---|.||.|.|||...|..|..|.|--|-|.|||||.|.|.||--.||-.|||..|.
query           137 TGA---GAAAGTAGAGACTGGCACAGGT--C-ATAATGAGGTGGGTCAAGCATGGATATG

target          225 CGTTCCACGAGCCGTCCTTATGGACTTGG---AGCCTGGCACTATGGAT-AGTGTTAGAT
                240 .|..||..|..|.|.|.|.|..||...||---||.||.|||-||.|.||-||..||...|
query           191 AGGCCCTGGTTCTGCCATCACCGAGACGGCAAAGGCTAGCA-TAGGTATGAGACTTCCTT

target          281 CCGGGCCGTAC-GGGCAGATTTTCAGGCCTGACAACTTTGTTTTCGGCCAGTCCGG--TG
                300 ||.|.|.||.|-|.|.|-|.|...|-||||-|||--...||||..||||--|..||--||
query           250 CCTGCCAGTGCTGAGGA-ACTGGGA-GCCT-ACA--GAGGTTTGAGGCC--TTGGGTCTG

target          338 CAGGGAATAACTGGGCCAAGGGTCATTACACGGAAGGGGCCGAGTTGATCG--ACTCANT
                360 |-..|||.||--.|||||--|.|------.|||.|---.|....||||.||--.||||-.
query           303 C-CAGAAGAA--AGGCCA--GCT------GCGGTA---ACATCTTTGACCGTCCCTCA-C

target          396 GCTAG-ATGTGGTG-AGGAAAGAAGCCGAAAACTGTGACTGCTTGCAAGGATTTCAAGTA
                420 |-|||-.||...||-.||..||.|--.|||.|--||.|||..||-|..||||.|...||.
query           348 G-TAGTCTGCCTTGCTGGCCAGCA--TGAAGA--GTCACTAGTT-CCTGGATCTGCTGTC

target          454 TGCCATTCATTGGGAAGAGGAACTGGATCGGGCATGGGAACACTTTTGATTTCCAAGATC
                480 ||--|---|.|||||||||--------|||||-.||||..|||---------||----||
query           402 TG--A---ACTGGGAAGAG--------TCGGG-TTGGGTCCAC---------CC----TC

target          514 CGAGAAGAATACCCAGACCGAATGATGCTCACTTTCTCTGTGTTC---CCTTCCCCTAAG
                540 |-||.....||-||.|-..|||-|||--|..|||--||||.|..|---..|..|...|.|
query           435 C-AGGCTGTTA-CCTG-TGGAA-GAT--TGGCTT--TCTGGGGACAGGTGTGGCAGGAGG

target          571 GTGTCTGACACTGTTGTCGAACCTTACAATGCTACTCTCTCTGTTCACCANCTTGTTGAG
                600 |.|...|..|-.|.|||-|--||-|.|..|.||.|..|..|..|||-----|||.|----
query           487 GAGGGGGGTA-GGGTGT-G--CC-TGCCCTCCTCCAGTGCCGTTTC-----CTTCT----

target          631 AATGCTGATGAATGTATGGTTTTT 655
                660 -.|.||..||--.|||-||---|. 684
query           533 -CTCCTAGTG--GGTA-GG---TG 550

Pairwise global alignment is a practical first comparison when you want to quantify overall similarity between two homologs.

## Ranking Homolog Similarity Across Multiple Pairs

from Bio import SeqIO
from Bio.Align import PairwiseAligner
from itertools import combinations

records = [
    SeqIO.read("lupine.nu", "fasta"),
    SeqIO.read("lavender.nu", "fasta"),
    SeqIO.read("phlox.nu", "fasta"),
]

aligner = PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 2
aligner.mismatch_score = -1
aligner.open_gap_score = -3
aligner.extend_gap_score = -1

pair_scores = []
for r1, r2 in combinations(records, 2):
    score = aligner.score(str(r1.seq), str(r2.seq))
    pair_scores.append((r1.id, r2.id, score))

pair_scores.sort(key=lambda x: x[2], reverse=True)
print("Pairwise homolog ranking:")
for item in pair_scores:
    print(item)
Pairwise homolog ranking:
('gi|5049839|gb|AI730987.1|AI730987', 'gi|5052071|gb|AF067555.1|AF067555', 228.0)
('gi|5690369|gb|AF158246.1|AF158246', 'gi|5052071|gb|AF067555.1|AF067555', 195.0)
('gi|5049839|gb|AI730987.1|AI730987', 'gi|5690369|gb|AF158246.1|AF158246', 185.0)
Ranking pairwise scores gives a practical similarity overview before selecting candidates for [phylogenetic tree construction](/tutorials/biopython-phylogenetic-trees).

## Preparing a Combined [FASTA](/tutorials/biopython-fasta-files) for Downstream MSA Tools

from Bio import SeqIO

records = [
    SeqIO.read("lupine.nu", "fasta"),
    SeqIO.read("lavender.nu", "fasta"),
    SeqIO.read("phlox.nu", "fasta"),
]

SeqIO.write(records, "homologs_combined.fasta", "fasta")
print("Wrote homologs_combined.fasta")
Wrote homologs_combined.fasta
Exporting a combined FASTA is a practical handoff step to external MSA tools or collaborative workflows where one shared input file is needed.