The Python SDK provides a comprehensive set of parser classes for reading and processing mass spectrometry data files in the Aird format. These classes are located in the Parser module and support the common acquisition modes, including DDA, DIA, and PRM.
BaseParser is the base class for all Parser classes, providing common file operations and data parsing functionality.
Main Features:
- File path management and validation
- AirdInfo metadata loading
- Compressor configuration and management
- Random access file reading
Constructor:
```python
# Create parser with index file path
parser = BaseParser("/path/to/index.json")
```

Main Methods:
- `getSpectrum()` - Get a single spectrum
- `getSpectra()` - Get multiple spectra
- `getSpectrumByIndex()` - Get a spectrum by index
- `getSpectrumByRt()` - Get a spectrum by retention time
- `getMzs()` - Get the m/z array
- `getInts()` - Get the intensity array
- `getMobilities()` - Get the mobility array (PASEF data)
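The random-access reading listed above boils down to seeking to a block's start pointer and reading a fixed byte range. A minimal stand-in sketch (plain Python, no SDK required; `io.BytesIO` substitutes for the `.aird` binary file):

```python
import io

# Stand-in for the .aird binary file
aird_file = io.BytesIO(bytes(range(256)))
start_ptr, end_ptr = 16, 32  # pointers as stored in the index

# Seek to the block start and read exactly (endPtr - startPtr) bytes
aird_file.seek(start_ptr)
block = aird_file.read(end_ptr - start_ptr)
print(len(block))  # 16
print(block[0])    # 16
```

The same seek/read pattern underlies all the `getSpectrum*` methods; only the pointer arithmetic differs.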
DDAParser is specifically designed for parsing Data-Dependent Acquisition (DDA) mode mass spectrometry data.
Main Methods:
- `getMs1Index()` - Get the MS1 index
- `getAllMs2Index()` - Get all MS2 indices
- `getMs2IndexMap()` - Get the MS2 index map keyed by parentNum
- `readAllToMemory()` - Load all DDA data into memory at once
- `getMs1SpectraMap()` - Return the MS1 spectra RT mapping
- `getSpectraByRtRange()` - Get spectra within a retention time range
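The parent/child relationship that `getMs2IndexMap()` models can be sketched without the SDK: each MS2 index carries a `parentNum` pointing at its precursor MS1 scan, and the map groups MS2 indices under that key (the dict literals below are illustrative stand-ins for real index objects):

```python
# Stand-in MS2 index entries; real entries are SDK index objects
ms2_indices = [
    {"num": 2, "parentNum": 1, "rt": 10.2},
    {"num": 3, "parentNum": 1, "rt": 10.4},
    {"num": 5, "parentNum": 4, "rt": 11.0},
]

# Group MS2 indices by their parent MS1 scan number
ms2_index_map = {}
for idx in ms2_indices:
    ms2_index_map.setdefault(idx["parentNum"], []).append(idx)

print(sorted(ms2_index_map))   # [1, 4]
print(len(ms2_index_map[1]))   # 2 MS2 scans under parent 1
```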
DIAParser is used for parsing Data-Independent Acquisition (DIA) mode mass spectrometry data.
PRMParser is used for parsing Parallel Reaction Monitoring (PRM) mode data.
```python
# 1. Import necessary modules
from Parser.DDAParser import DDAParser
from Beans.Common.Spectrum import Spectrum

# 2. Create parser instance
parser = DDAParser("/path/to/dda_data.json")

# 3. Get file information
aird_info = parser.airdInfo
print(f"File type: {aird_info.type}")

# 4. Read data into memory
dda_data = parser.readAllToMemory()

# 5. Process spectrum data
for dda_ms in dda_data:
    rt = dda_ms.rt
    spectrum = dda_ms.spectrum
    # Process each MS1 spectrum
    mz_array = spectrum.mzs
    intensity_array = spectrum.ints

# 6. Release resources
parser.airdFile.close()
```

```python
# Query spectra with retention time between 10-20 minutes
spectra = parser.getSpectraByRtRange(10.0, 20.0)
for spectrum in spectra:
    # Process each spectrum
    mz_array = spectrum.mzs
    intensity_array = spectrum.ints
```
```python
# Get a specific spectrum by index
spectrum = parser.getSpectrumByIndex(block_index, 5)  # Get the 6th spectrum (zero-based)

# Get spectrum by retention time
spectrum = parser.getSpectrumByRt(block_index, rt_list, mz_offsets, int_offsets, 15.5)
```

```python
# Get all spectra from a data block
spectra_map = parser.getSpectra(block_index.startPtr, block_index.endPtr,
                                block_index.rts, block_index.mzs, block_index.ints)
for rt, spectrum in spectra_map.items():
    print(f"Retention time: {rt}")
    print(f"Number of m/z values: {len(spectrum.mzs)}")
```

Spectrum - Represents a single mass spectrometry spectrum.
Main Properties:
- `mzs` - m/z array
- `ints` - Intensity array
- `rt` - Retention time
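Since `mzs` and `ints` are parallel arrays, common per-spectrum metrics are a few lines of plain Python. A sketch computing the total ion current (TIC) and the base peak from hand-written sample values:

```python
# Parallel m/z and intensity arrays, as exposed by Spectrum.mzs / Spectrum.ints
mzs = [100.1, 250.5, 500.9]
ints = [300.0, 1200.0, 450.0]

# Total ion current: sum of all intensities
tic = sum(ints)
# Base peak: the m/z of the most intense signal
base_peak_mz = mzs[max(range(len(ints)), key=ints.__getitem__)]

print(tic)           # 1950.0
print(base_peak_mz)  # 250.5
```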
Constructor:
```python
spectrum = Spectrum(mz_array, intensity_array, retention_time)
```

AirdInfo - Represents Aird file metadata information.
Main Properties:
- `type` - File type (DDA, DIA, PRM, etc.)
- `compressors` - Compressor list
- `indexList` - Index list
BlockIndex - Represents data block index information.
Main Properties:
- `startPtr` - Data block start position
- `endPtr` - Data block end position
- `rts` - Retention time array
- `mzs` - m/z offset array
- `ints` - Intensity offset array
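A sketch of how these offset arrays are typically consumed (assumption, not the SDK's actual implementation: `mzs[i]` / `ints[i]` hold the compressed byte length of spectrum *i*'s arrays, so a spectrum's position is found by accumulating the sizes of everything before it):

```python
start_ptr = 1000         # BlockIndex.startPtr: block start in the .aird file
mz_sizes = [40, 52, 36]  # BlockIndex.mzs: compressed m/z bytes per spectrum
int_sizes = [20, 24, 18] # BlockIndex.ints: compressed intensity bytes per spectrum

def spectrum_offset(i):
    """Byte offset of spectrum i's data, relative to the file start."""
    return start_ptr + sum(mz_sizes[:i]) + sum(int_sizes[:i])

print(spectrum_offset(0))  # 1000
print(spectrum_offset(2))  # 1136 = 1000 + (40 + 52) + (20 + 24)
```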
```python
# Create the parser outside the try block so that `parser` is always
# defined when the finally clause runs
parser = DDAParser(file_path)
try:
    # Use parser
    data = parser.readAllToMemory()
    # Process data
finally:
    if parser.airdFile:
        parser.airdFile.close()
```

For large files, avoid loading all data at once:
```python
# Process data in batches
spectra = parser.getSpectraByRtRange(start_rt, end_rt)
```

```python
try:
    parser = DDAParser(file_path)
    if parser.airdInfo is None:
        raise ValueError("Invalid Aird file")
except Exception as e:
    print(f"File reading error: {e}")
```

```python
# Custom context manager
class AirdParser:
    def __init__(self, file_path):
        self.parser = DDAParser(file_path)

    def __enter__(self):
        return self.parser

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.parser.airdFile:
            self.parser.airdFile.close()

# Usage
with AirdParser(file_path) as parser:
    data = parser.readAllToMemory()
    # Process data
```

Q: How do I choose the right parser for a file?
A: Check the `parser.airdInfo.type` property and select the appropriate parser based on file type.
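That selection step can be sketched as a small dispatch table (a hypothetical helper; real code would map the type strings to the actual parser classes rather than to their names):

```python
def select_parser_class(aird_type):
    """Map an airdInfo.type string to a parser class name."""
    mapping = {"DDA": "DDAParser", "DIA": "DIAParser", "PRM": "PRMParser"}
    # Fall back to the generic base parser for unrecognized types
    return mapping.get(aird_type, "BaseParser")

print(select_parser_class("DDA"))  # DDAParser
print(select_parser_class("???"))  # BaseParser
```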
Q: Do I need to decompress the data manually?
A: The parser automatically handles data decompression; no manual intervention is required.
Q: How do I access the file metadata?
A: Get complete file metadata through the `parser.airdInfo` property.
Q: Which compression algorithms are supported?
A: Various compression algorithms are supported, including Zstd, Brotli, Snappy, Zlib, etc.
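To illustrate what the parser does internally when it decompresses a block, here is a stand-in round trip using stdlib `zlib` in place of the file's configured compressor:

```python
import zlib

# Stand-in for a compressed byte block read from the .aird file
raw = bytes(range(32)) * 4
compressed = zlib.compress(raw)

# The parser performs the equivalent of this step for you
restored = zlib.decompress(compressed)
print(restored == raw)  # True
```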
- Batch Processing: Use batch operation methods like `getSpectra()` whenever possible
- Memory Management: Use streaming processing for large files to avoid memory overflow
- Caching Strategy: Cache frequently accessed data appropriately
- Parallel Processing: Consider using multiprocessing for different data blocks in multi-core environments
```python
from Compressor.ByteComp.ZstdWrapper import ZstdWrapper
from Compressor.IntComp.VarByteWrapper import VarByteWrapper

# Create custom compressor
custom_compressor = Compressor()
custom_compressor.methods = ['VB', 'Zstd']
custom_compressor.precision = 0.001
```

```python
def process_spectrum(spectrum):
    """Custom spectrum processing function"""
    # Filter low intensity signals
    threshold = 100
    filtered_mzs = []
    filtered_ints = []
    for mz, intensity in zip(spectrum.mzs, spectrum.ints):
        if intensity > threshold:
            filtered_mzs.append(mz)
            filtered_ints.append(intensity)
    return Spectrum(filtered_mzs, filtered_ints, spectrum.rt)

# Apply processing pipeline
processed_spectra = [process_spectrum(s) for s in spectra]
```

This document is based on Python SDK version: 1.0.0