
Trouble with using 'circ_quant' function (CLEAR with STAR Alignment) #23

@jennynuyirs

Hello! I'm having trouble getting circ_quant to work. The command I'm running is:

circ_quant -c "$name/circRNA_out/circularRNA_known.txt" -b "$name/Aligned.sortedByCoord.out.bam" -r "$ref_genome.ref.txt" -o "$name.circRNA_quant.txt"

It fails with AttributeError: 'list' object has no attribute 'split' (raised at line 83 of circ_quant.py). The error means .split() is being called on a Python list rather than a string. My guess is that the BAM file input is the value that isn't a string, but I'm skeptical that this is really the cause, because fixing it would require changing the source code (probably not a good idea).

I am fairly new to bioinformatics and only somewhat experienced with coding, so I'm unsure how to proceed from here. Any potential solutions or suggestions for debugging would be immensely helpful.
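Before touching any source code, one hedged first check is to confirm that every file passed to circ_quant exists, is non-empty, and looks roughly as expected. The sketch below is a hypothetical helper (`check_circ_quant_inputs` is an illustrative name, not part of CLEAR). It uses only standard shell tools, reports the number of tab-separated columns on the first line of the circRNA file, and checks for the `.bai` index that `samtools index` writes next to the BAM. It does not validate the column count against what circ_quant actually requires, which is not confirmed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CLEAR): sanity-check circ_quant inputs.
# Usage: check_circ_quant_inputs CIRC_FILE BAM_FILE REF_FILE
# Returns 0 if all checks pass, 1 otherwise; problems are printed to stderr.
check_circ_quant_inputs() {
    local status=0
    local f
    for f in "$@"; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$f" ]; then
            echo "MISSING_OR_EMPTY: $f" >&2
            status=1
        fi
    done
    # Report how many tab-separated columns the first line of the circRNA file has
    if [ -s "$1" ]; then
        awk -F'\t' 'NR == 1 { print "circ columns: " NF; exit }' "$1"
    fi
    # "samtools index in.bam" writes its index to in.bam.bai by default
    if [ ! -s "$2.bai" ]; then
        echo "NO_INDEX: $2.bai" >&2
        status=1
    fi
    return "$status"
}
```

Inside the loop it would be called as `check_circ_quant_inputs "$name/circRNA_out/circularRNA_known.txt" "$name/Aligned.sortedByCoord.out.bam" "$ref_genome.ref.txt"` just before circ_quant; `samtools view -H` on the BAM is a natural next step if the files themselves look fine.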

I've included the full pipeline below; it is a slightly modified version of @bounlu's CLEAR with STAR Alignment pipeline. I've tested each step separately, and all of them work as expected except the final circ_quant step.

# define parameters
file_extension="_R1_001.fastq.gz"
read_length=100
ref_genome="hg38"

# make output directories
mkdir -p "STAR_$ref_genome/$read_length"

# download reference files
fetch_ucsc.py "$ref_genome" fa "$ref_genome.fa"
fetch_ucsc.py "$ref_genome" ref "$ref_genome.ref.txt"
cut -f2-11 "$ref_genome.ref.txt" | genePredToGtf file stdin "$ref_genome.ref.gtf"

# generate genome index file
STAR --runMode genomeGenerate --genomeDir "STAR_$ref_genome/$read_length" --limitIObufferSize 1000000000 --runThreadN 16 --genomeFastaFiles "$ref_genome.fa" --outFileNamePrefix ./ --sjdbGTFfile "$ref_genome.ref.gtf" --sjdbOverhang "$(($read_length-1))"

# run pipeline
for read1 in *"$file_extension"
do
        name="${read1%"$file_extension"}"
        read2="${name}_R2_001.fastq.gz"
        mkdir -p "$name"
        STAR --chimSegmentMin 20 --runThreadN 16 --genomeLoad LoadAndRemove --limitBAMsortRAM 50000000000 --limitIObufferSize 1000000000 --outSAMtype BAM SortedByCoordinate --readFilesCommand zcat --outFileNamePrefix "$name/" --genomeDir "STAR_$ref_genome/$read_length" --readFilesIn "$read1" "$read2" > "$name/$name.circRNA_alignment.log" 2>&1
        samtools index "$name/Aligned.sortedByCoord.out.bam"
        fast_circ.py parse -r "$ref_genome.ref.txt" -g "$ref_genome.fa" -t STAR -o "$name/circRNA_out" "$name/Chimeric.out.junction" > "$name/$name.circRNA_parse.log" 2>&1
        circ_quant -c "$name/circRNA_out/circularRNA_known.txt" -b "$name/Aligned.sortedByCoord.out.bam" -r "$ref_genome.ref.txt" -o "$name.circRNA_quant.txt" > "$name/$name.circRNA_quant.log" 2>&1
done
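Because every step in the loop redirects its output to a log file, a failure such as the circ_quant traceback only appears inside `$name/$name.circRNA_quant.log`, and the loop simply moves on to the next sample. A minimal sketch of a wrapper that surfaces the end of the failing step's log is below; `run_step` is a hypothetical helper name, and the 20-line tail is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with stdout and stderr sent to a log file.
# On failure, print the last lines of the log (e.g. a Python traceback)
# to stderr and return the command's exit status.
# Usage: run_step LOG_FILE command [args...]
run_step() {
    local log="$1"
    shift
    local status=0
    # "|| status=$?" records the failure without tripping "set -e" in the caller
    "$@" > "$log" 2>&1 || status=$?
    if [ "$status" -ne 0 ]; then
        echo "Step failed (exit $status): $*" >&2
        echo "--- last 20 lines of $log ---" >&2
        tail -n 20 "$log" >&2
    fi
    return "$status"
}

# Example use inside the loop (replaces the original circ_quant line):
# run_step "$name/$name.circRNA_quant.log" circ_quant -c "$name/circRNA_out/circularRNA_known.txt" -b "$name/Aligned.sortedByCoord.out.bam" -r "$ref_genome.ref.txt" -o "$name.circRNA_quant.txt" || break
```

The function has to be defined before the `for` loop; appending `|| break` stops at the first failing sample so its traceback is the last thing printed.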
