Added fastq_filtrator.py#2

Open
ibragimovaamina wants to merge 6 commits into main from homework_2
Conversation

@ibragimovaamina

Owner

Added fastq_filtrator.py (homework 2)

@krglkvrmn krglkvrmn left a comment

Excellent work, a few minor comments below)

Comment thread hw2/fastq_filtrator.py
@@ -0,0 +1,96 @@
# This function returns GC-content of sequence in percents
def gc_content(read_seq):
    return sum(map(read_seq.upper().count, ['G','C'])) / len(read_seq.strip()) * 100

Cool, of course, but for a task as simple as computing GC content this is overkill.

+ it is also not optimal performance-wise. count makes a full pass over the string internally, and we call it twice, although GC can be counted in a single pass. But these are minor details)
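
A minimal sketch of the single-pass idea from the comment above, not the author's code. The empty-read guard is an assumption added here to avoid a division by zero. In CPython the two `count` calls run in C and may well be faster on short reads, so this illustrates the idea rather than promising a speed-up.

```python
def gc_content(read_seq):
    """Return the GC content of a sequence as a percentage (0-100)."""
    seq = read_seq.strip().upper()
    if not seq:
        # Assumption: an empty read is reported as 0% GC instead of raising.
        return 0.0
    # Count G and C together in one loop over the bases.
    gc_count = sum(1 for base in seq if base in "GC")
    return gc_count / len(seq) * 100
```

For example, `gc_content("ATGC\n")` gives `50.0`, because the trailing newline is stripped before counting.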

Comment thread hw2/fastq_filtrator.py

# This function checks whether GC-content of sequence is in bounds
def gc_filter(read_seq, gc_bounds):
    if isinstance(gc_bounds, (tuple, list)):

👍

Comment thread hw2/fastq_filtrator.py
Comment on lines +12 to +15
if gc_content(read_seq) >= min_gc and gc_content(read_seq) <= max_gc: # GC-content is in bounds
    return 'Pass'
else:
    return 'Fail'

In cases like this you should not return strings; True and False are ideal for this. The code can then also be simplified considerably.

Suggested change
-if gc_content(read_seq) >= min_gc and gc_content(read_seq) <= max_gc: # GC-content is in bounds
-    return 'Pass'
-else:
-    return 'Fail'
+return min_gc <= gc_content(read_seq) <= max_gc  # GC-content is in bounds
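
A rough sketch of how the whole filter might look once it returns a boolean. The `isinstance` check comes from the PR. Treating a single number as an upper bound with a lower bound of 0 is an assumption, since the PR's else-branch is not shown here. `gc_content` is a simplified stand-in.

```python
def gc_content(read_seq):
    """Simplified stand-in: GC percentage of a sequence."""
    seq = read_seq.strip().upper()
    return (seq.count("G") + seq.count("C")) / len(seq) * 100 if seq else 0.0


def gc_filter(read_seq, gc_bounds):
    """Return True if the GC content of read_seq lies within gc_bounds."""
    if isinstance(gc_bounds, (tuple, list)):
        min_gc, max_gc = gc_bounds
    else:
        # Assumption: a single number is interpreted as the upper bound.
        min_gc, max_gc = 0, gc_bounds
    # Chained comparison calls gc_content once and yields a bool directly.
    return min_gc <= gc_content(read_seq) <= max_gc
```

With boolean filters the caller no longer compares against `'Pass'`; a condition such as `if gc_filter(read_seq, gc_bounds) and len_filter(read_seq, length_bounds):` reads directly.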

Comment thread hw2/fastq_filtrator.py
@@ -0,0 +1,96 @@
# This function returns GC-content of sequence in percents
def gc_content(read_seq):
    return sum(map(read_seq.upper().count, ['G','C'])) / len(read_seq.strip()) * 100

You can call strip right when the read is read in, so you don't have to write it in every function

Comment thread hw2/fastq_filtrator.py
Comment on lines +77 to +80
read_header = file.readline() ## I'm not stripping '\n' here consciously
read_seq = file.readline() ## for convenient appending to output files.
read_plus_str = file.readline() ## I'm going to take '\n' into account
read_qual = file.readline() ## when I'll be counting length of these strings.

This is exactly the place where strip can be inserted

Suggested change
-read_header = file.readline() ## I'm not stripping '\n' here consciously
-read_seq = file.readline() ## for convenient appending to output files.
-read_plus_str = file.readline() ## I'm going to take '\n' into account
-read_qual = file.readline() ## when I'll be counting length of these strings.
+read_header = file.readline().strip() ## I'm not stripping '\n' here consciously
+read_seq = file.readline().strip() ## for convenient appending to output files.
+read_plus_str = file.readline().strip() ## I'm going to take '\n' into account
+read_qual = file.readline().strip() ## when I'll be counting length of these strings.
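
One consequence of stripping at read time is that the newline has to be added back when a read is written out. A minimal sketch under that assumption follows. `read_record` and `write_record` are hypothetical helper names, not functions from the PR, and `io.StringIO` stands in for real files.

```python
import io


def read_record(handle):
    """Read one four-line FASTQ record with newlines stripped.

    Returns None at end of file (an empty header line is treated as the end).
    """
    lines = [handle.readline().strip() for _ in range(4)]
    if not lines[0]:
        return None
    return tuple(lines)


def write_record(handle, record):
    """Write a record back, restoring the newline after each of its lines."""
    handle.write("\n".join(record) + "\n")


# Round trip through in-memory files.
source = io.StringIO("@r1\nATGC\n+\nIIII\n")
record = read_record(source)
output = io.StringIO()
write_record(output, record)
```

Once the lines are stripped, `len(read_seq)` is the true read length, so the length and quality filters no longer need to account for `'\n'`.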

Comment thread hw2/fastq_filtrator.py
Comment on lines +84 to +95
if (gc_filter(read_seq, gc_bounds) == 'Pass' and
        len_filter(read_seq, length_bounds) == 'Pass' and
        qual_filter(read_qual, quality_threshold) == 'Pass'):
    # Appending read to output files
    add_read_to_passed(read_header, read_seq,
                       read_plus_str, read_qual,
                       output_file_prefix)
else:
    if save_filtered == True:
        add_read_to_failed(read_header, read_seq,
                           read_plus_str, read_qual,
                           output_file_prefix)

Very convenient, you separated the logic nicely

2 participants