Review TRAF4 by nvaulin · Pull Request #32 · Python-BI-2023/Peer_review

nvaulin · 2024-02-26T17:57:27Z

Review TRAF4

nastasia-iv

Good, working code 😊
There are a few minor deviations from the assignment requirements; I've made a couple of corrections and suggestions.
But overall everything is good, well done!)

nastasia-iv · 2024-02-29T18:26:37Z

+    pass
+
+
+class BiologicalSequence(str):


This seems to have been allowed in this homework, but in general it is better not to inherit from built-in data types. More on this in the consultation of February 28 (around the 28-minute mark).

Suggested change

-class BiologicalSequence(str):
+class BiologicalSequence():

nastasia-iv · 2024-02-29T18:29:29Z

+            'g': 'c', 'G': 'C',
+            'c': 'g', 'C': 'G'
+        }
+        if 'U' in self.sequence.upper():


Great that this biological detail is accounted for!

nastasia-iv · 2024-02-29T18:32:39Z

+        super().__init__(sequence)
+
+    def complement(self):
+        if self.complement_dict == None:


With None, it is better to use is rather than ==:

Suggested change

-        if self.complement_dict == None:
+        if self.complement_dict is None:
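A minimal sketch of why the identity check is safer: `==` dispatches to `__eq__`, which a class is free to override, while `is` compares object identity and cannot be overridden. The `AlwaysEqual` class is hypothetical and exists only for this illustration; it is not part of the reviewed code.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with anything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

equals_none = (obj == None)  # noqa: E711 - goes through AlwaysEqual.__eq__
is_none = (obj is None)      # identity check, unaffected by __eq__
```

Here `equals_none` ends up true even though `obj` is clearly not None, while `is_none` stays false.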

nastasia-iv · 2024-02-29T18:38:04Z

+
+    def check_seq(self):
+        valid_nucleotide_symbols = {'A', 'C', 'G', 'T', 'U'}
+        valid_prot_symbols = {'A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y'}


Per PEP 8, there should be spaces after the commas here :) The symbols could also be spread over several lines, which would make the code easier to read.

nastasia-iv · 2024-02-29T18:47:36Z

+        't': 'u', 'T': 'U',
+        'g': 'g', 'G': 'G',
+        'c': 'c', 'C': 'C'
+    }


Nice that readability was considered here and the dictionary items are laid out on separate lines :)
The indentation got lost, so I'm restoring it:

Suggested change

-    }
+        transcription_dict = {
+            'a': 'a', 'A': 'A',
+            't': 'u', 'T': 'U',
+            'g': 'g', 'G': 'G',
+            'c': 'c', 'C': 'C'
+        }

(per PEP 8, the closing brace is aligned with the variable name)

nastasia-iv · 2024-02-29T19:18:39Z

+        return f'AminoAcidSequence("{self.sequence}")'
+
+
+


An extra blank line slipped in

Suggested change

nastasia-iv · 2024-03-03T13:32:03Z

+        elif seq_symbols.issubset(valid_prot_symbols):
+            print('AA sequence')
+        else:
+            raise ValueError('Incorrect sequence input!')


I think it would be good to raise a custom error here as well, especially since you already define one at the top of the script

nastasia-iv · 2024-03-05T12:40:00Z

+    def __init__(self, sequence):
+        self.sequence = sequence
+
+    def check_seq(self):


As far as I remember, BiologicalSequence was supposed to contain only abstract methods, which would then be overridden in the child classes

That's right
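A minimal sketch of that layout, using the standard `abc` module. The particular abstract methods (`__len__`, `check_seq`) and the `valid_symbols` attribute are assumptions chosen for illustration; the actual assignment may require a different set.

```python
from abc import ABC, abstractmethod


class BiologicalSequence(ABC):
    """Interface only: every method is abstract."""

    @abstractmethod
    def __len__(self):
        ...

    @abstractmethod
    def check_seq(self):
        ...


class DNASequence(BiologicalSequence):
    valid_symbols = frozenset("ACGT")  # assumed alphabet for this sketch

    def __init__(self, sequence):
        self.sequence = sequence

    def __len__(self):
        return len(self.sequence)

    def check_seq(self):
        # True when every symbol belongs to the allowed alphabet.
        return set(self.sequence.upper()) <= self.valid_symbols
```

With this structure, trying to instantiate `BiologicalSequence` directly raises `TypeError`, so the validation logic has to live in the child classes.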

nastasia-iv · 2024-03-05T13:02:58Z

+        't': 'u', 'T': 'U',
+        'g': 'g', 'G': 'G',
+        'c': 'c', 'C': 'C'
+    }


As written, the transcribe method creates the transcription_dict dictionary every time it is called. To avoid this, the dictionary could be made a class attribute, i.e. simply placed before __init__
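A sketch of the class-attribute variant. The class is reduced to the bare minimum, and `transcribe` returns a plain string here, which is an assumption; the reviewed code may wrap the result in an RNA class.

```python
class DNASequence:
    # Built once, when the class body is executed, and shared by all instances.
    transcription_dict = {
        'a': 'a', 'A': 'A',
        't': 'u', 'T': 'U',
        'g': 'g', 'G': 'G',
        'c': 'c', 'C': 'C'
    }

    def __init__(self, sequence):
        self.sequence = sequence

    def transcribe(self):
        # Looks the attribute up on the class; no dictionary is rebuilt per call.
        return ''.join(self.transcription_dict[nuc] for nuc in self.sequence)
```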

nastasia-iv · 2024-03-05T13:16:09Z

+        - average phred scores, not less than specified
+        Default output file name 'filtered.fastq'
+    """
+    records = list(SeqIO.parse(input_path, 'fastq'))


In cases like this it is better to process the records in a loop rather than load everything at once (large files may not fit in memory):

Suggested change

-    records = list(SeqIO.parse(input_path, 'fastq'))
+    for record in SeqIO.parse(input_path, "fastq"):
+        ...
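The streaming idea can be sketched without Biopython. The `read_fastq` generator below is a simplified, hypothetical stand-in for `SeqIO.parse` that assumes strict 4-line FASTQ records; `filter_by_length` is likewise an illustrative name. The point is only that each record is read, tested, and passed on before the next one is loaded.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples one record at a time.

    Simplified stand-in for SeqIO.parse: assumes 4-line FASTQ records.
    """
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def filter_by_length(records, min_len, max_len):
    """Lazily pass through records whose length is within the bounds."""
    for name, seq, qual in records:
        if min_len <= len(seq) <= max_len:
            yield name, seq, qual


fastq_text = "@r1\nACGT\n+\nIIII\n@r2\nAC\n+\nII\n"
kept = [name for name, _, _ in
        filter_by_length(read_fastq(io.StringIO(fastq_text)), 3, 10)]
```

Only one record is held in memory at any moment until the final list of names is built.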

iliapopov17

Overall I liked everything!
Very good code, everything is great.
Some of my comments may have come across as nitpicky, but I'm still a newbie in Python myself, and for this review I deliberately sat down and read into everything I could...

iliapopov17 · 2024-03-10T09:18:59Z

+        super().__init__(sequence)
+
+    def complement(self):
+        if self.complement_dict == None:


Good:

if self.complement_dict == None:

Better:

if self.complement_dict is None:

Suggested change

-        if self.complement_dict == None:
+        if self.complement_dict is None:

iliapopov17 · 2024-03-10T09:22:36Z

+from Bio.SeqUtils import gc_fraction
+
+
+class NuclAcidnucleotideError(ValueError):


The NuclAcidnucleotideError class:
This class is a custom error for the classes that deal with nucleic acids. Very good! However, the class name is a bit confusing and does not follow CamelCase. It could be made clearer, for example NucleotideError or InvalidNucleotideError.

Suggested change

-class NuclAcidnucleotideError(ValueError):
+class InvalidNucleotideError(ValueError):

iliapopov17 · 2024-03-10T09:28:33Z

+    def gravy(self):
+        """Calculate GRAVY (grand average of hydropathy) value"""
+        gravy_aa_values = {'L': 3.8,
+                           'K': -3.9,
+                           'M': 1.9,
+                           'F': 2.8,
+                           'P': -1.6,
+                           'S': -0.8,
+                           'T': -0.7,
+                           'W': -0.9,
+                           'Y': -1.3,
+                           'V': 4.2,
+                           'A': 1.8,
+                           'R': -4.5,
+                           'N': -3.5,
+                           'D': -3.5,
+                           'C': 2.5,
+                           'Q': -3.5,
+                           'E': -3.5,
+                           'G': -0.4,
+                           'H': -3.2,
+                           'I': 4.5}
+        gravy_aa_sum = 0
+        for amino_ac in self.sequence.upper():
+            gravy_aa_sum += gravy_aa_values[amino_ac]
+        return round(gravy_aa_sum / len(self.sequence), 3)


The sum function with a generator can be used to simplify the code.

Suggested change

-    def gravy(self):
-        """Calculate GRAVY (grand average of hydropathy) value"""
-        gravy_aa_values = {'L': 3.8,
-                           'K': -3.9,
-                           'M': 1.9,
-                           'F': 2.8,
-                           'P': -1.6,
-                           'S': -0.8,
-                           'T': -0.7,
-                           'W': -0.9,
-                           'Y': -1.3,
-                           'V': 4.2,
-                           'A': 1.8,
-                           'R': -4.5,
-                           'N': -3.5,
-                           'D': -3.5,
-                           'C': 2.5,
-                           'Q': -3.5,
-                           'E': -3.5,
-                           'G': -0.4,
-                           'H': -3.2,
-                           'I': 4.5}
-        gravy_aa_sum = 0
-        for amino_ac in self.sequence.upper():
-            gravy_aa_sum += gravy_aa_values[amino_ac]
-        return round(gravy_aa_sum / len(self.sequence), 3)
+    def gravy(self):
+        gravy_aa_values = {'L': 3.8, 'K': -3.9, 'M': 1.9, ...}
+        return round(sum(gravy_aa_values[amino_ac] for amino_ac in self.sequence.upper()) / len(self.sequence), 3)

iliapopov17 · 2024-03-10T09:30:01Z

+    for count, record in enumerate(records):
+        gc_percent = gc_fraction(record.seq) * 100
+        if min_gc <= gc_percent <= max_gc:
+            filtered_1_gc_idxs.append(count)


The more readable construct for record in records can be used instead of for count, record in enumerate(records).

Suggested change

-    for count, record in enumerate(records):
-        gc_percent = gc_fraction(record.seq) * 100
-        if min_gc <= gc_percent <= max_gc:
-            filtered_1_gc_idxs.append(count)
+    for record in records:
+        gc_percent = gc_fraction(record.seq) * 100
+        if min_gc <= gc_percent <= max_gc:
+            filtered_1_gc_idxs.append(record)
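One thing to watch with this change: the list then holds records rather than indices, so later steps that do `records[idx]` would need to iterate over the items instead. A small sketch of the two consistent options, on toy strings standing in for `record.seq`; `gc_percent` is a hypothetical plain-Python helper used here in place of `gc_fraction(...) * 100`.

```python
def gc_percent(seq):
    """GC content in percent; plain-Python stand-in for gc_fraction(seq) * 100."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count('G') + seq.count('C')) / len(seq) * 100


sequences = ['GGCC', 'ATAT', 'ATGC']  # toy data standing in for record.seq
min_gc, max_gc = 40, 100

# Option 1: keep indices, as the reviewed code does.
passed_idxs = [i for i, seq in enumerate(sequences)
               if min_gc <= gc_percent(seq) <= max_gc]

# Option 2: keep the items themselves; later steps must then iterate
# over the items instead of indexing into the original list.
passed_seqs = [seq for seq in sequences
               if min_gc <= gc_percent(seq) <= max_gc]
```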

iliapopov17 · 2024-03-10T09:31:44Z

+def make_thresholds(threshold: int | float | tuple) -> tuple:
+    """Check thresholds input and convert single value to tuple"""
+    if isinstance(threshold, int) or isinstance(threshold, float):
+        lower = 0
+        upper = threshold
+    else:
+        lower = threshold[0]
+        upper = threshold[1]
+    return lower, upper


make_thresholds could be changed to make its purpose more explicit.

Suggested change

-def make_thresholds(threshold: int | float | tuple) -> tuple:
-    """Check thresholds input and convert single value to tuple"""
-    if isinstance(threshold, int) or isinstance(threshold, float):
-        lower = 0
-        upper = threshold
-    else:
-        lower = threshold[0]
-        upper = threshold[1]
-    return lower, upper
+def make_thresholds(threshold: int | float | tuple) -> tuple:
+    if isinstance(threshold, (int, float)):
+        return 0, threshold
+    elif isinstance(threshold, tuple):
+        return threshold
+    else:
+        raise ValueError('Invalid threshold input')
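A slightly stricter sketch along the same lines. The extra checks (rejecting `bool`, requiring exactly two elements, requiring lower <= upper) are assumptions added for illustration and go beyond what either version in the thread does.

```python
def make_thresholds(threshold):
    """Return (lower, upper); a single number means (0, number)."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(threshold, bool):
        raise ValueError('Invalid threshold input')
    if isinstance(threshold, (int, float)):
        return 0, threshold
    if isinstance(threshold, tuple) and len(threshold) == 2:
        lower, upper = threshold
        if lower > upper:
            raise ValueError('Lower bound exceeds upper bound')
        return lower, upper
    raise ValueError('Invalid threshold input')
```

For example, `make_thresholds(80)` gives `(0, 80)`, while a string or a three-element tuple raises `ValueError` instead of failing later with a less obvious error.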

iliapopov17 · 2024-03-10T09:33:00Z

+        return f'AminoAcidSequence("{self.sequence}")'
+
+
+


Extra blank line

Suggested change

iliapopov17 · 2024-03-10T09:36:28Z

+    for idx in filtered_3_phred_idxs:
+        filtered_results.append(records[idx])


A list comprehension can be used to build the filtered_results list.

Suggested change

-    for idx in filtered_3_phred_idxs:
-        filtered_results.append(records[idx])
+    filtered_results = [records[idx] for idx in filtered_3_phred_idxs]

iliapopov17 · 2024-03-10T09:39:14Z

+    def complement(self):
+        if self.complement_dict == None:
+            raise NotImplementedError('It is a basic NA class. You should implement it for descendant class: DNASequence or RNASequence.')
+        result = type(self)(''.join([self.complement_dict[nuc] for nuc in self.sequence]))
+        return result


In the complement method, the .get method can be used to fetch values from the dictionary. This avoids errors if a nucleotide is missing from the dictionary.

Suggested change

-    def complement(self):
-        if self.complement_dict == None:
-            raise NotImplementedError('It is a basic NA class. You should implement it for descendant class: DNASequence or RNASequence.')
-        result = type(self)(''.join([self.complement_dict[nuc] for nuc in self.sequence]))
-        return result
+    def complement(self):
+        if not all(nuc in self.complement_dict for nuc in self.sequence):
+            raise NuclAcidnucleotideError('Invalid nucleotide in the sequence')
+        result = type(self)(''.join([self.complement_dict.get(nuc, '') for nuc in self.sequence]))
+        return result
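A sketch that combines both checks: it keeps the guard for the base class (where complement_dict is None) and validates the symbols before building the result. Defining `complement_dict` as a class attribute and shortening the error messages are assumptions made to keep the example small; the reviewed code sets the dictionary in `__init__`.

```python
class NuclAcidnucleotideError(ValueError):
    pass


class NucleicAcidSequence:
    complement_dict = None  # specified in the child classes

    def __init__(self, sequence):
        self.sequence = sequence

    def complement(self):
        if self.complement_dict is None:
            raise NotImplementedError('Use DNASequence or RNASequence.')
        # Collect every symbol that has no complement defined.
        invalid = set(self.sequence) - set(self.complement_dict)
        if invalid:
            raise NuclAcidnucleotideError(
                f'Invalid nucleotide(s): {sorted(invalid)}')
        return type(self)(
            ''.join(self.complement_dict[nuc] for nuc in self.sequence))


class DNASequence(NucleicAcidSequence):
    complement_dict = {
        'a': 't', 'A': 'T', 't': 'a', 'T': 'A',
        'g': 'c', 'G': 'C', 'c': 'g', 'C': 'G',
    }
```

Once the symbols are validated up front, plain indexing is enough, and an unknown nucleotide produces an explicit error rather than being silently dropped by `.get(nuc, '')`.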

iliapopov17 · 2024-03-10T09:42:05Z

+class AminoAcidSequence(BiologicalSequence):
+    def gravy(self):
+        """Calculate GRAVY (grand average of hydropathy) value"""
+        gravy_aa_values = {'L': 3.8,
+                           'K': -3.9,
+                           'M': 1.9,
+                           'F': 2.8,
+                           'P': -1.6,
+                           'S': -0.8,
+                           'T': -0.7,
+                           'W': -0.9,
+                           'Y': -1.3,
+                           'V': 4.2,
+                           'A': 1.8,
+                           'R': -4.5,
+                           'N': -3.5,
+                           'D': -3.5,
+                           'C': 2.5,
+                           'Q': -3.5,
+                           'E': -3.5,
+                           'G': -0.4,
+                           'H': -3.2,
+                           'I': 4.5}
+        gravy_aa_sum = 0
+        for amino_ac in self.sequence.upper():
+            gravy_aa_sum += gravy_aa_values[amino_ac]
+        return round(gravy_aa_sum / len(self.sequence), 3)


The constant dictionary gravy_aa_values can be defined at the top of the class, which makes it easier to find and maintain.

Suggested change

-class AminoAcidSequence(BiologicalSequence):
-    def gravy(self):
-        """Calculate GRAVY (grand average of hydropathy) value"""
-        gravy_aa_values = {'L': 3.8,
-                           'K': -3.9,
-                           'M': 1.9,
-                           'F': 2.8,
-                           'P': -1.6,
-                           'S': -0.8,
-                           'T': -0.7,
-                           'W': -0.9,
-                           'Y': -1.3,
-                           'V': 4.2,
-                           'A': 1.8,
-                           'R': -4.5,
-                           'N': -3.5,
-                           'D': -3.5,
-                           'C': 2.5,
-                           'Q': -3.5,
-                           'E': -3.5,
-                           'G': -0.4,
-                           'H': -3.2,
-                           'I': 4.5}
-        gravy_aa_sum = 0
-        for amino_ac in self.sequence.upper():
-            gravy_aa_sum += gravy_aa_values[amino_ac]
-        return round(gravy_aa_sum / len(self.sequence), 3)
+class AminoAcidSequence(BiologicalSequence):
+    GRAVY_AA_VALUES = {'L': 3.8, 'K': -3.9, ...}
+    def gravy(self):
+        return round(sum(AminoAcidSequence.GRAVY_AA_VALUES[amino_ac] for amino_ac in self.sequence.upper()) / len(self.sequence), 3)
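Written out in full, the idea might look like the sketch below. The hydropathy values are copied from the reviewed method; the class is shown without its parent and with a minimal `__init__`, which is a simplification for the sake of a self-contained example.

```python
class AminoAcidSequence:
    # Hydropathy values copied from the reviewed gravy() method.
    GRAVY_AA_VALUES = {
        'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
        'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2,
        'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
        'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
    }

    def __init__(self, sequence):
        self.sequence = sequence

    def gravy(self):
        """Calculate GRAVY (grand average of hydropathy) value"""
        total = sum(self.GRAVY_AA_VALUES[aa] for aa in self.sequence.upper())
        return round(total / len(self.sequence), 3)
```

Accessing the table through `self.GRAVY_AA_VALUES` rather than the hard-coded class name lets a subclass substitute a different scale if it ever needs to.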

icalledmyselfmoon · 2024-03-10T12:27:47Z

+        self.sequence = sequence
+
+    def check_seq(self):
+        valid_nucleotide_symbols = {'A', 'C', 'G', 'T', 'U'}


As I understood from the assignment, the checks should be done in the child classes; this is an abstract class

icalledmyselfmoon · 2024-03-10T12:29:00Z

+        else:
+            raise ValueError('Incorrect sequence input!')
+
+    def __str__(self):


First you inherit from str, and then you write a def __str__ method. As I understand it, inheriting from str is redundant in that case

Not really: __str__ defines how the object is displayed, and we may well want to override it

icalledmyselfmoon · 2024-03-10T12:32:04Z

+            'c': 'g', 'C': 'G'
+        }
+        if 'T' in self.sequence.upper():
+            raise NuclAcidnucleotideError('T-contain sequence is not proper RNA sequence')


Well done for writing custom errors both here and earlier. It turned out very well

icalledmyselfmoon · 2024-03-10T12:33:16Z

+            raise NuclAcidnucleotideError('U-contain sequence is not proper DNA sequence')
+
+    def transcribe(self):
+        transcription_dict = {


It would be better to move the dictionary out of the function, to the top of the class

icalledmyselfmoon · 2024-03-10T12:35:12Z

+
+class NucleicAcidSequence(BiologicalSequence):
+    def __init__(self, sequence):
+        self.complement_dict = None


Great approach: complement_dict = None in the parent class, then specified in the child classes

icalledmyselfmoon · 2024-03-10T12:37:10Z

+            raise NuclAcidnucleotideError('U-contain sequence is not proper DNA sequence')
+
+    def transcribe(self):
+        transcription_dict = {


Perhaps in this case there was no need to create a dictionary at all; replacing a single letter would do. That seems more optimal here
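A sketch of the single-letter approach on plain strings. The function names and the `T_TO_U` constant are illustrative; in the reviewed code this logic would sit inside the `transcribe` method.

```python
def transcribe(dna):
    """Return the RNA version of a DNA string, preserving letter case."""
    return dna.replace('T', 'U').replace('t', 'u')


# Equivalent single-pass variant using a translation table.
T_TO_U = str.maketrans('Tt', 'Uu')


def transcribe_translate(dna):
    return dna.translate(T_TO_U)
```

Both variants leave every character other than T/t untouched, so sequence validation still has to happen elsewhere.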

icalledmyselfmoon · 2024-03-10T12:38:40Z

+def make_thresholds(threshold: int | float | tuple) -> tuple:
+    """Check thresholds input and convert single value to tuple"""
+    if isinstance(threshold, int) or isinstance(threshold, float):
+        lower = 0
+        upper = threshold
+    else:
+        lower = threshold[0]
+        upper = threshold[1]
+    return lower, upper


Good that you found such a convenient, universal way, via a function, to handle bounds that can be specified in different forms

icalledmyselfmoon · 2024-03-10T12:44:32Z

+    for count, record in enumerate(records):
+        gc_percent = gc_fraction(record.seq) * 100
+        if min_gc <= gc_percent <= max_gc:
+            filtered_1_gc_idxs.append(count)
+
+    for idx in filtered_1_gc_idxs:
+        if min_len <= len(records[idx].seq) <= max_len:
+            filtered_2_len_idxs.append(idx)
+
+    for idx in filtered_2_len_idxs:
+        phred_values = records[idx].letter_annotations['phred_quality']
+        if sum(phred_values) / len(phred_values) >= quality_threshold:
+            filtered_3_phred_idxs.append(idx)
+
+    for idx in filtered_3_phred_idxs:
+        filtered_results.append(records[idx])


It is all logical, but very bulky. Simpler, shorter code is easier to read and often runs faster. Each filter could be written as its own function returning True/False, and then every sequence could be checked against all three conditions.

But your version works too, and that is the main thing
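A sketch of that predicate-based layout. Records are modelled as toy `(sequence, phred_values)` tuples and GC content is computed by hand, both assumptions made so the example runs without Biopython; all function names are hypothetical, and sequences are assumed to be non-empty.

```python
def gc_ok(seq, bounds):
    """True if GC content (percent) lies within bounds."""
    upper_seq = seq.upper()
    gc = (upper_seq.count('G') + upper_seq.count('C')) / len(seq) * 100
    return bounds[0] <= gc <= bounds[1]


def length_ok(seq, bounds):
    """True if sequence length lies within bounds."""
    return bounds[0] <= len(seq) <= bounds[1]


def quality_ok(phred_values, threshold):
    """True if the mean phred score is at least threshold."""
    return sum(phred_values) / len(phred_values) >= threshold


def passes_all(record, gc_bounds, length_bounds, quality_threshold):
    seq, phred_values = record
    return (gc_ok(seq, gc_bounds)
            and length_ok(seq, length_bounds)
            and quality_ok(phred_values, quality_threshold))


records = [('GGCCAT', [30] * 6), ('ATATAT', [30] * 6), ('GC', [10, 10])]
filtered_results = [rec for rec in records
                    if passes_all(rec, (20, 80), (3, 10), 20)]
```

Each record is visited once, and the three intermediate index lists are no longer needed.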

nvaulin added 2 commits February 26, 2024 20:50

Add TRAF4.py

e89aa04

Add TRAF4.py

5ef053d

nastasia-iv reviewed Mar 8, 2024

iliapopov17 reviewed Mar 10, 2024

icalledmyselfmoon reviewed Mar 10, 2024

4 participants