
Python SDK Parser Usage Guide

Overview

The Python SDK provides a comprehensive set of Parser classes for reading and processing mass spectrometry data files in the Aird format. These classes live in the Parser module and support the common acquisition modes, including DDA, DIA, and PRM.

Core Parser Classes

1. BaseParser (Base Parser)

BaseParser is the base class for all Parser classes, providing common file operations and data parsing functionality.

Main Features:

  • File path management and validation
  • AirdInfo metadata loading
  • Compressor configuration and management
  • Random access file reading

Constructor:

# Create parser with index file path
parser = BaseParser("/path/to/index.json")

Main Methods:

  • getSpectrum() - Get single spectrum
  • getSpectra() - Get multiple spectra
  • getSpectrumByIndex() - Get spectrum by index
  • getSpectrumByRt() - Get spectrum by retention time
  • getMzs() - Get m/z array
  • getInts() - Get intensity array
  • getMobilities() - Get mobility array (PASEF data)

2. DDAParser (DDA Mode Parser)

DDAParser is specifically designed for parsing Data-Dependent Acquisition (DDA) mode mass spectrometry data.

Main Methods:

  • getMs1Index() - Get MS1 index
  • getAllMs2Index() - Get all MS2 indices
  • getMs2IndexMap() - Get MS2 index map keyed by parentNum
  • readAllToMemory() - Load all DDA data into memory at once
  • getMs1SpectraMap() - Return MS1 spectra RT mapping
  • getSpectraByRtRange() - Get spectra by retention time range
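As a sketch of how the MS2 index map can be used (assuming, as the method name suggests, that getMs2IndexMap() returns a plain dict keyed by the precursor's parentNum), a small helper can look up the MS2 entries recorded for one precursor:

```python
def ms2_entries_for_precursor(ms2_index_map, parent_num):
    """Return the MS2 index entries recorded for one MS1 precursor.

    ms2_index_map is assumed to be a dict keyed by parentNum, as
    returned by DDAParser.getMs2IndexMap(); an empty list means the
    precursor triggered no MS2 scans.
    """
    return ms2_index_map.get(parent_num, [])

# Hypothetical usage against a parser instance:
# ms2_map = parser.getMs2IndexMap()
# entries = ms2_entries_for_precursor(ms2_map, 42)
```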

3. DIAParser (DIA Mode Parser)

DIAParser is used for parsing Data-Independent Acquisition (DIA) mode mass spectrometry data.

4. PRMParser (PRM Mode Parser)

PRMParser is used for parsing Parallel Reaction Monitoring (PRM) mode data.

Usage Examples

Basic Workflow

# 1. Import necessary modules
from Parser.DDAParser import DDAParser
from Beans.Common.Spectrum import Spectrum

# 2. Create parser instance
parser = DDAParser("/path/to/dda_data.json")

# 3. Get file information
aird_info = parser.airdInfo
print(f"File type: {aird_info.type}")

# 4. Read data into memory
dda_data = parser.readAllToMemory()

# 5. Process spectrum data
for dda_ms in dda_data:
    rt = dda_ms.rt
    spectrum = dda_ms.spectrum
    # Process each MS1 spectrum
    mz_array = spectrum.mzs
    intensity_array = spectrum.ints

# 6. Release resources
parser.airdFile.close()

Query by Retention Time Range

# Query spectra with retention time between 10-20 minutes
spectra = parser.getSpectraByRtRange(10.0, 20.0)
for spectrum in spectra:
    # Process each spectrum
    mz_array = spectrum.mzs
    intensity_array = spectrum.ints

Get Specific Spectrum

# Get specific spectrum by index
spectrum = parser.getSpectrumByIndex(block_index, 5)  # zero-based index 5, i.e. the 6th spectrum

# Get spectrum by retention time
spectrum = parser.getSpectrumByRt(block_index, rt_list, mz_offsets, int_offsets, 15.5)

Batch Spectrum Retrieval

# Get all spectra from a data block
spectra_map = parser.getSpectra(block_index.startPtr, block_index.endPtr, 
                                block_index.rts, block_index.mzs, block_index.ints)

for rt, spectrum in spectra_map.items():
    print(f"Retention time: {rt}")
    print(f"Number of m/z values: {len(spectrum.mzs)}")

Data Models

Spectrum Class

Represents a single mass spectrometry spectrum.

Main Properties:

  • mzs - m/z array
  • ints - Intensity array
  • rt - Retention time

Constructor:

spectrum = Spectrum(mz_array, intensity_array, retention_time)
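Because a Spectrum exposes parallel mzs/ints arrays, common per-spectrum summaries fall out directly. A minimal sketch operating on plain lists, so it works with any Spectrum-like object:

```python
def total_ion_current(ints):
    """Sum of all intensities in one spectrum (TIC)."""
    return sum(ints)

def base_peak(mzs, ints):
    """Return (mz, intensity) of the most intense signal."""
    i = max(range(len(ints)), key=ints.__getitem__)
    return mzs[i], ints[i]

# Hypothetical usage with a Spectrum instance:
# tic = total_ion_current(spectrum.ints)
# bp_mz, bp_int = base_peak(spectrum.mzs, spectrum.ints)
```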

AirdInfo Class

Represents Aird file metadata information.

Main Properties:

  • type - File type (DDA, DIA, PRM, etc.)
  • compressors - Compressor list
  • indexList - Index list

BlockIndex Class

Represents data block index information.

Main Properties:

  • startPtr - Data block start position
  • endPtr - Data block end position
  • rts - Retention time array
  • mzs - m/z offset array
  • ints - Intensity offset array
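If, as the property names suggest, mzs and ints hold the per-spectrum byte sizes of the compressed arrays (an assumption here, not stated explicitly above), absolute file positions can be recovered by accumulating the sizes from startPtr. A sketch:

```python
def byte_ranges(start_ptr, sizes):
    """Turn a block's start pointer plus per-spectrum byte sizes
    into absolute (offset, size) pairs, one per spectrum."""
    ranges = []
    pos = start_ptr
    for size in sizes:
        ranges.append((pos, size))
        pos += size
    return ranges

# Hypothetical usage with a BlockIndex:
# mz_ranges = byte_ranges(block_index.startPtr, block_index.mzs)
```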

Best Practices

1. Resource Management

parser = None
try:
    parser = DDAParser(file_path)
    # Use parser
    data = parser.readAllToMemory()
    # Process data
finally:
    # parser stays None if the constructor raised, so guard before closing
    if parser is not None and parser.airdFile:
        parser.airdFile.close()

2. Memory Optimization

For large files, avoid loading all data at once:

# Process data in batches
spectra = parser.getSpectraByRtRange(start_rt, end_rt)
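One way to batch is to split the run's retention-time span into fixed-width windows and query them one at a time. getSpectraByRtRange() is the documented method; the window helper below is our own sketch:

```python
def rt_windows(start_rt, end_rt, width):
    """Yield (lo, hi) retention-time windows covering [start_rt, end_rt]."""
    lo = start_rt
    while lo < end_rt:
        hi = min(lo + width, end_rt)
        yield lo, hi
        lo = hi

# Hypothetical usage, processing one batch at a time:
# for lo, hi in rt_windows(0.0, 60.0, 5.0):
#     for spectrum in parser.getSpectraByRtRange(lo, hi):
#         ...  # process, then let the batch be garbage-collected
```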

3. Exception Handling

try:
    parser = DDAParser(file_path)
    if parser.airdInfo is None:
        raise ValueError("Invalid Aird file")
except Exception as e:
    print(f"File reading error: {e}")

4. Using Context Manager

# Custom context manager
class AirdParser:
    def __init__(self, file_path):
        self.parser = DDAParser(file_path)
    
    def __enter__(self):
        return self.parser
    
    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.parser.airdFile:
            self.parser.airdFile.close()

# Usage
with AirdParser(file_path) as parser:
    data = parser.readAllToMemory()
    # Process data

Frequently Asked Questions

Q: How to determine if a file supports a specific parser?

A: Check the parser.airdInfo.type property and select the appropriate parser based on file type.
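This check can be wrapped in a small factory; a sketch, assuming the type strings match the parser names (the registry contents are illustrative):

```python
def select_parser(aird_type, registry):
    """Look up the parser class registered for a given Aird file type."""
    try:
        return registry[aird_type]
    except KeyError:
        raise ValueError(f"No parser registered for type {aird_type!r}")

# In an SDK environment the registry would hold the real classes, e.g.:
# registry = {"DDA": DDAParser, "DIA": DIAParser, "PRM": PRMParser}
# parser_cls = select_parser(parser.airdInfo.type, registry)
```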

Q: How to handle compressed data?

A: The parser automatically handles data decompression, no manual intervention required.

Q: How to get file metadata information?

A: Get complete file metadata through the parser.airdInfo property.

Q: What compression algorithms does Python SDK support?

A: Several compression algorithms are supported, including Zstd, Brotli, Snappy, and Zlib.

Performance Recommendations

  1. Batch Processing: Use batch operation methods like getSpectra() whenever possible
  2. Memory Management: Use streaming processing for large files to avoid memory overflow
  3. Caching Strategy: Cache frequently accessed data appropriately
  4. Parallel Processing: Consider using multiprocessing for different data blocks in multi-core environments
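For recommendation 4, one simple split is to partition the data blocks across workers and let each process open its own parser (a parser holds an open file handle, so instances should not be shared between processes). The partition helper below is a sketch:

```python
def partition(items, n_workers):
    """Split items into at most n_workers nearly equal contiguous chunks."""
    n = max(1, n_workers)
    size, rem = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rem else 0)
        if start < end:
            chunks.append(items[start:end])
        start = end
    return chunks

# Hypothetical multiprocessing dispatch over block-index chunks:
# from multiprocessing import Pool
# with Pool(4) as pool:
#     results = pool.map(process_chunk, partition(block_indexes, 4))
```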

Extended Features

Custom Compressor

from Compressor.ByteComp.ZstdWrapper import ZstdWrapper
from Compressor.IntComp.VarByteWrapper import VarByteWrapper

# Create a custom compressor configuration
# (Compressor is the SDK bean describing the compression chain;
# the wrappers imported above implement the 'Zstd' and 'VB' methods)
custom_compressor = Compressor()
custom_compressor.methods = ['VB', 'Zstd']  # VarByte integer coding, then Zstd
custom_compressor.precision = 0.001  # precision used when integerizing values

Data Processing Pipeline

def process_spectrum(spectrum):
    """Custom spectrum processing function"""
    # Filter low intensity signals
    threshold = 100
    filtered_mzs = []
    filtered_ints = []
    
    for mz, intensity in zip(spectrum.mzs, spectrum.ints):
        if intensity > threshold:
            filtered_mzs.append(mz)
            filtered_ints.append(intensity)
    
    return Spectrum(filtered_mzs, filtered_ints, spectrum.rt)

# Apply processing pipeline
processed_spectra = [process_spectrum(s) for s in spectra]
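Several such per-spectrum steps compose naturally into a single pipeline function; a small sketch (the additional stage names in the comment are illustrative):

```python
def compose(*stages):
    """Chain per-spectrum processing functions, applied left to right."""
    def pipeline(value):
        for stage in stages:
            value = stage(value)
        return value
    return pipeline

# Hypothetical chain reusing process_spectrum from above:
# pipeline = compose(process_spectrum, normalize_intensities)
# processed_spectra = [pipeline(s) for s in spectra]
```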

This document is based on Python SDK version: 1.0.0