Review CEP170B by nvaulin · Pull Request #40 · Python-BI-2023/Peer_review

nvaulin · 2024-02-26T17:57:48Z

Review CEP170B

zmitserbio

Overall, the work leaves a positive impression. The code is readable and mostly works.
However, some of the declared functionality is not implemented, and it would be worth using a linter in the future.

zmitserbio · 2024-03-09T21:42:20Z

+from abc import ABC, abstractmethod
+from Bio import SeqIO
+from Bio.SeqUtils import gc_fraction
+from typing import Tuple


As far as I know, since Python 3.9 typing.Tuple is discouraged (the built-in tuple can be used in annotations instead), although this does not affect how the code runs.
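A minimal sketch of the same kind of annotation written with the built-in tuple, assuming Python 3.9 or newer. The function name describe_bounds is hypothetical and only illustrates the annotation style; it is not part of the reviewed code.

```python
# Since Python 3.9 the built-in tuple can be subscripted directly,
# so `from typing import Tuple` is not needed for this annotation.
def describe_bounds(gc_bounds: tuple[int, int] = (0, 100)) -> str:
    """Hypothetical helper: format a (lower, upper) pair as text."""
    lower, upper = gc_bounds
    return f"{lower}-{upper}"
```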

zmitserbio · 2024-03-09T21:44:00Z

+def filter_fastq(input_path: str, 
+                 output_filename: str = None, 
+                 gc_bounds: Tuple[int, int] = (0, 100), 
+                 length_bounds: Tuple[int, int] = (0, 2**32), 


There is a minor PEP 8 violation: lines should not end with trailing spaces.

zmitserbio · 2024-03-09T21:44:56Z

+                 quality_threshold: int = 0) -> None:
+    '''
+    Filter FASTQ-sequences based on entered requirements.
+


The \t (tab character) should be removed.

zmitserbio · 2024-03-09T22:00:51Z

+    Arguments:
+        - input_path (str): path to the file with FASTQ-sequences
+        - output_filename (str): name of the output file with 
+        filtered FASTQ-sequences
+        - gc_bounds (tuple or int, default = (0, 100)): GC-content
+        interval (percentage) for filtering. Tuple if contains 
+        lower and upper bounds, int if only contains an upper bound.
+        - length_bounds (tuple or int, default = (0, 2**32)): length 
+        interval for filtering. Tuple if contains lower and upper 
+        bounds, int if only contains an upper bound.
+        - quality_threshold (int, default = 0): threshold value of average 
+        read quality for filtering.
+
+    Note: the output file is saved to the /fastq_filtrator_results 
+    directory. The default output file name is the name of the input file.


Some lines have trailing spaces.
More importantly, the function does not match its declared functionality. If an int is passed to gc_bounds or length_bounds, the function fails with an error, because converting that int into the corresponding tuple is not implemented. This could have been done, for example, like this:

    if isinstance(gc_bounds, int):
        gc_bounds = tuple([0, gc_bounds])
    if isinstance(length_bounds, int):
        length_bounds = tuple([0, length_bounds])

It may be worth leaving yourself a pass or a comment in the future so that such things are not forgotten.
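The same conversion could also live in one small helper that both arguments pass through. This is only a sketch: normalize_bounds is a hypothetical name, and treating a bare int as the upper bound with a lower bound of 0 follows the docstring quoted above.

```python
from typing import Union


def normalize_bounds(bounds: Union[int, tuple[int, int]]) -> tuple[int, int]:
    """Return (lower, upper); a bare int is treated as the upper bound."""
    if isinstance(bounds, int):
        return (0, bounds)
    lower, upper = bounds  # also fails early if the tuple has the wrong length
    return (lower, upper)
```

Inside filter_fastq this would be called once per argument, for example gc_bounds = normalize_bounds(gc_bounds), before any filtering starts.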

zmitserbio · 2024-03-09T22:05:38Z

+        records = [record for record in SeqIO.parse(handle, "fastq")]
+
+    filtered_records = []
+    for record in records:


SeqIO could have been used directly here:

Suggested change

for record in records:

for i, record in enumerate(SeqIO.parse(input_path, "fastq")):

zmitserbio · 2024-03-09T22:10:30Z

+    print(f"Filtered sequences saved to {output_path}")
+
+
+class BiologicalSequence(ABC):


As far as I know, it is more customary to place classes before functions.

That's right.

zmitserbio · 2024-03-09T22:13:22Z

+
+
+class BiologicalSequence(ABC):
+    def __init__(self, seq: str = None):


Although it is not forbidden, it was mentioned in the lecture/consultation that it is better not to write code in an abstract class.

zmitserbio · 2024-03-09T22:17:45Z

+        return self.seq
+
+    def check_alphabet(self):
+        return set(self.seq.upper()).issubset(self.ALPHABET)


I find this a rather elegant solution.

zmitserbio · 2024-03-09T22:35:03Z

+    def __repr__(self):
+        return self.seq
+
+    def check_alphabet(self):


I think it would be sensible to check the alphabet in __init__, as a mandatory step.
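A rough sketch of what mandatory validation in the constructor might look like. The class name NucleicAcidSketch and its four-letter ALPHABET are assumptions made for illustration; the reviewed classes have their own alphabets and hierarchy.

```python
class NucleicAcidSketch:
    """Hypothetical class: validates its sequence at construction time."""

    ALPHABET = {"A", "C", "G", "T"}  # assumed alphabet, for illustration only

    def __init__(self, seq: str):
        # Same set-based check as check_alphabet, but run unconditionally.
        invalid = set(seq.upper()) - self.ALPHABET
        if invalid:
            raise ValueError(f"Unexpected symbols: {sorted(invalid)}")
        self.seq = seq
```

With this arrangement an object with an invalid sequence can never exist, so later methods do not have to re-check the alphabet.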

zmitserbio · 2024-03-09T22:42:07Z

+    ALPHABET = {"A", "C", "D", "E", "F", "G", "H", "I","K", "L", 
+                "M", "N","P", "Q", "R", "S", "T", "V", "W", "Y"}


Suggested change

ALPHABET = {"A", "C", "D", "E", "F", "G", "H", "I","K", "L",
            "M", "N","P", "Q", "R", "S", "T", "V", "W", "Y"}

ALPHABET = {"A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
            "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y"}

Spaces should be added after the commas. Besides this, there are a number of other trailing spaces at line ends and \t in blank lines, but I don't think it makes sense to dwell on each one. It is worth checking the code with a linter, which can reveal flaws that are hard to spot with the naked eye.
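To illustrate the kind of check a linter automates, here is a small stand-alone sketch that reports lines ending in spaces or tabs. The function find_trailing_whitespace is hypothetical and is not a replacement for a real linter such as flake8, which reports these cases as W291 and W293.

```python
def find_trailing_whitespace(source: str) -> list[int]:
    """Return 1-based numbers of lines that end with spaces or tabs."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]
```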

anisssum

It is especially good that you used abstract classes and followed OOP principles. Your code is fairly concise and easy to read, which is a plus.
Overall, the code handles its tasks very well, and the remarks concern minor details and style. Great work!

anisssum · 2024-03-10T06:17:55Z

+
+    def gc_content(self):
+        gc_count = self.seq.count('G') + self.seq.count('C')
+        return gc_count / len(self.seq) * 100


Division by zero could be handled here.

Good catch.
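One possible way to guard the calculation, written as a free function for brevity. Returning 0.0 for an empty sequence is an assumption made here; raising a ValueError would be an equally reasonable choice.

```python
def gc_content(seq: str) -> float:
    """GC percentage of seq; an empty sequence yields 0.0 (assumed policy)."""
    if not seq:
        return 0.0  # avoids ZeroDivisionError on len(seq) == 0
    gc_count = seq.count("G") + seq.count("C")
    return gc_count / len(seq) * 100
```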

anisssum · 2024-03-10T06:19:10Z

+    if output_filename is None:
+        output_filename = input_path.split("/")[-1].split(".")[0] + "_filtered.fastq"
+
+    output_path = "fastq_filtrator_results/" + output_filename


Adding a folder for the results is a great idea. But its existence could be checked before creating it.

This is a very, very good remark. The folder could also be created separately, somewhere at the start, by a dedicated folder-creation function.
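A sketch of such a dedicated function, assuming the results directory name from the diff above. The name build_output_path is hypothetical. os.makedirs with exist_ok=True creates the folder when it is missing and does nothing when it already exists, so no separate existence check is needed.

```python
import os


def build_output_path(output_filename: str,
                      results_dir: str = "fastq_filtrator_results") -> str:
    """Create results_dir if needed and return the full output path."""
    os.makedirs(results_dir, exist_ok=True)  # no error if it already exists
    return os.path.join(results_dir, output_filename)
```

Using os.path.join instead of string concatenation also keeps the path separator correct on platforms other than Linux.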

Add CEP170B.py

6992e38

