ABI files (`.ab1`) are produced by **Sanger DNA sequencing instruments** from companies such as Applied Biosystems. Unlike [FASTA](/tutorials/biopython-fasta-files) or [FASTQ files](/tutorials/biopython-fastq-files), ABI files contain much more information:

- the DNA sequence
- base quality scores
- raw chromatogram trace data
- sequencing metadata (instrument, run parameters, etc.)

These files are commonly used when analyzing **Sanger sequencing results**, verifying cloned DNA sequences, or checking PCR products.

Biopython provides built-in support for reading ABI files through the `Bio.SeqIO` module.

In this tutorial, you'll learn how to:

- download an example ABI file
- read ABI sequencing data
- access sequence and quality scores
- examine metadata stored in the file
- extract chromatogram trace data
- convert ABI data to FASTA or FASTQ

These skills are useful for building automated Sanger sequencing analysis pipelines.

---

## Downloading an Example ABI File

First, let's download a small example ABI file that we can analyze.
```python
import requests

url = "https://raw.githubusercontent.com/biopython/biopython/master/Tests/Abi/3100.ab1"

# Fetch the file and fail loudly on HTTP errors.
response = requests.get(url, timeout=30)
response.raise_for_status()

# ABI files are binary, so write the raw bytes.
with open("example.ab1", "wb") as f:
    f.write(response.content)

print("Downloaded example.ab1")
```

This code downloads a real ABI sequencing file from the Biopython repository and saves it locally as `example.ab1`. The file contains Sanger sequencing data including the base calls and chromatogram traces.

---

## Reading an ABI File

Biopython reads ABI files using `SeqIO.read()` with the `"abi"` format.
```python
from Bio import SeqIO

record = SeqIO.read("example.ab1", "abi")

print("Sequence ID:", record.id)
print("Sequence length:", len(record.seq))
print("First 50 bases:", record.seq[:50])
```

ABI files contain a **single sequencing read**, so `SeqIO.read()` is used instead of `SeqIO.parse()`. The returned object is a `SeqRecord` containing the base-called sequence from the chromatogram.

---

## Accessing Base Quality Scores

Sanger sequencing also produces quality scores that estimate the confidence of each base call.
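For context, a Phred score Q is conventionally defined through the estimated error probability P of a base call, with Q = -10 · log10(P). The short sketch below converts between the two; the helper names `phred_to_error_prob` and `error_prob_to_phred` are illustrative and not part of Biopython.

```python
import math


def phred_to_error_prob(q):
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an estimated error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


# Print the error probability for a few commonly quoted scores.
for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
```

On this scale, a score of 20 corresponds to roughly a 1 in 100 chance of a wrong call, and 30 to roughly 1 in 1000. Reading the scores from the ABI file itself looks like this: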
```python
from Bio import SeqIO

record = SeqIO.read("example.ab1", "abi")

# One Phred score per base, in the same order as record.seq.
qualities = record.letter_annotations["phred_quality"]

print("Number of quality scores:", len(qualities))
print("First 20 quality scores:", qualities[:20])
```

The quality scores are stored in `record.letter_annotations["phred_quality"]`. Each number corresponds to the confidence of the base call at the same position in the sequence. Higher scores indicate more reliable base calls.

---

## Calculating Average Read Quality

You can quickly estimate overall sequencing quality by computing the average PHRED score.
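One caveat: Phred scores are logarithmic, so a plain arithmetic mean can look better than the read really is when a few bases are very poor. A possible alternative, sketched below, averages the per-base error probabilities and converts the result back to a score. The function name `mean_error_quality` and the example scores are made up for illustration.

```python
import math


def mean_error_quality(qualities):
    """Phred score equivalent to the mean per-base error probability."""
    if not qualities:
        raise ValueError("qualities must not be empty")
    # Convert each score to an error probability, then average those.
    mean_error = sum(10 ** (-q / 10) for q in qualities) / len(qualities)
    return -10 * math.log10(mean_error)


# Hypothetical scores: three confident calls and one poor one.
example_scores = [40, 40, 40, 10]
print("Arithmetic mean:", sum(example_scores) / len(example_scores))
print("Error-based mean:", round(mean_error_quality(example_scores), 2))
```

For a quick sanity check of a read, the simple arithmetic mean is still the figure most often reported, and it is computed as follows: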
```python
from Bio import SeqIO

record = SeqIO.read("example.ab1", "abi")
qualities = record.letter_annotations["phred_quality"]

# Arithmetic mean of the per-base Phred scores.
average_quality = sum(qualities) / len(qualities)

print("Average read quality:", round(average_quality, 2))
```

This calculation can help determine whether the sequencing run produced reliable results or whether trimming might be necessary.

---

## Exploring ABI Metadata

ABI files contain many additional metadata fields describing the sequencing run. Biopython stores these in the `record.annotations["abif_raw"]` dictionary.
```python
from Bio import SeqIO

record = SeqIO.read("example.ab1", "abi")
metadata = record.annotations["abif_raw"]

print("Number of metadata entries:", len(metadata))

# Show the first ten tag names.
for key in list(metadata.keys())[:10]:
    print(key)
```

The `abif_raw` dictionary stores low-level data extracted from the ABI file structure. These entries include information such as:

- instrument name
- run parameters
- base call data
- trace intensities

Exploring these values can help you understand how the sequencing run was performed.

---

## Accessing Chromatogram Trace Data

ABI files store the fluorescence signal for each of the four dye channels, one per nucleotide (A, C, G, T). These traces create the familiar chromatogram peaks used to determine base calls.
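```python
from Bio import SeqIO

record = SeqIO.read("example.ab1", "abi")
raw_data = record.annotations["abif_raw"]

# FWO_1 gives the base order of the four channels (commonly "GATC").
base_order = raw_data["FWO_1"]
if isinstance(base_order, bytes):
    base_order = base_order.decode()

# DATA9-DATA12 hold the processed traces, in the order given by FWO_1.
channels = ["DATA9", "DATA10", "DATA11", "DATA12"]
traces = {base: raw_data[ch] for base, ch in zip(base_order, channels)}

print("Channel order:", base_order)
print("Trace length:", len(traces["A"]))
print("First 10 A-channel values:", traces["A"][:10])
```

Each trace holds the fluorescence intensity detected for one nucleotide over the course of the run. The tags `DATA9` through `DATA12` contain the processed (analyzed) traces. Which base each channel represents is not fixed: it is given by the `FWO_1` tag. A common order is `GATC`, which would mean:

- `DATA9` → G channel
- `DATA10` → A channel
- `DATA11` → T channel
- `DATA12` → C channel

Check `FWO_1` in your own files rather than assuming this order. These signals are used by base-calling software to determine the DNA sequence.

---

## Converting ABI Files to FASTA

Sometimes you want to extract just the sequence and store it in FASTA format.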
```python
from Bio import SeqIO

record = SeqIO.read("example.ab1", "abi")

# Write only the base-called sequence; quality scores are not stored in FASTA.
SeqIO.write(record, "sequence.fasta", "fasta")
```

This writes the base-called DNA sequence into a FASTA file. This is useful when preparing sequences for [alignment](/tutorials/biopython-pairwise-sequence-alignment) or [BLAST searches](/tutorials/biopython-blast).

---

## Converting ABI Files to FASTQ

You can also convert ABI files to FASTQ format, which includes both the sequence and the quality scores.
```python
from Bio import SeqIO

record = SeqIO.read("example.ab1", "abi")

# FASTQ keeps the per-base Phred scores alongside the sequence.
SeqIO.write(record, "sequence.fastq", "fastq")
```

The resulting FASTQ file preserves both the sequence and PHRED quality values, which can be useful when integrating Sanger reads with next-generation sequencing workflows.

---

## Inspecting All Available ABI Tags

If you want to see all the available ABI data fields, you can list them.
```python
from Bio import SeqIO

record = SeqIO.read("example.ab1", "abi")

# Iterating over the dictionary yields the tag names.
for tag in record.annotations["abif_raw"]:
    print(tag)
```

This will display all ABI tags stored in the file. Different sequencing instruments may include different tags.

---

## Conclusion

ABI files contain rich sequencing information including the base-called DNA sequence, quality scores, chromatogram traces, and instrument metadata. Biopython makes it easy to access and analyze all of this information directly from Python.

In this tutorial you learned how to:

- read `.ab1` ABI sequencing files
- access sequences and PHRED quality scores
- inspect metadata from the sequencing run
- extract chromatogram trace data
- convert ABI files to FASTA or FASTQ

These techniques are useful when analyzing **Sanger sequencing data**, verifying DNA constructs, or building automated analysis tools for sequencing workflows.

For more advanced use cases, you can combine these techniques with plotting libraries to visualize chromatograms or integrate the sequences into alignment and variant detection pipelines.
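
As one small step toward such a pipeline, the sketch below trims low-quality ends from a read using a sliding-window mean over the Phred scores. This is a simple heuristic chosen for illustration, not the method Biopython or any particular base caller uses; the function name `trim_low_quality_ends`, the threshold of 20, the window of 10, and the example scores are all assumptions.

```python
def trim_low_quality_ends(qualities, threshold=20, window=10):
    """Return (start, end) indices of the region kept after trimming.

    The kept region starts at the first window whose mean quality reaches
    the threshold and ends at the last such window. Returns (0, 0) when no
    window qualifies.
    """
    n = len(qualities)
    if n < window:
        return 0, 0

    # Scan from the left for the first acceptable window.
    start = None
    for i in range(n - window + 1):
        if sum(qualities[i:i + window]) / window >= threshold:
            start = i
            break
    if start is None:
        return 0, 0

    # Scan from the right for the last acceptable window.
    end = start + window
    for j in range(n, start + window - 1, -1):
        if sum(qualities[j - window:j]) / window >= threshold:
            end = j
            break
    return start, end


# Hypothetical read: poor start, good middle, poor tail.
example_scores = [5] * 8 + [35] * 30 + [8] * 6
start, end = trim_low_quality_ends(example_scores)
print("Keep positions", start, "to", end)
```

With a `SeqRecord` loaded from an ABI file, the same indices can be applied as `record[start:end]`, which slices the sequence and its per-letter quality scores together. Because the boundaries are based on window means, a few low-scoring bases can remain at either edge, so treat this as a rough first pass.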