
Added homework on pandas#5

Open
ibragimovaamina wants to merge 1 commit into main from homework_pandas

Conversation

@ibragimovaamina

Owner

No description provided.

@krglkvrmn left a comment


Everything is great; it's a pity the second part is missing.

Comment on lines +16 to +17
rrna_gff_df = read_gff('data/rrna_annotation.gff')
alignment_bed_df = read_bed('data/alignment.bed')


If the code works with data from files, those files must be committed to the repository and referenced by a relative path. The only exception is files that are too large; in that case, you can commit a small piece of them instead.
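A minimal sketch of this advice, assuming the script sits in the repository root next to the data/ folder that the paths data/rrna_annotation.gff and data/alignment.bed point to. Resolving paths against the script's own location keeps them valid regardless of the working directory the script is launched from; write_sample is a hypothetical helper (not part of this PR) for committing only the first lines of a file that is too large.

```python
from itertools import islice
from pathlib import Path

# Resolve paths against the script's own folder so the relative paths keep working
# no matter which directory the script is launched from.
# Assumption: the script lives in the repository root, next to a data/ folder.
try:
    BASE_DIR = Path(__file__).resolve().parent
except NameError:
    # __file__ is not defined inside Jupyter notebooks; use the working directory there
    BASE_DIR = Path.cwd()

DATA_DIR = BASE_DIR / 'data'
gff_path = DATA_DIR / 'rrna_annotation.gff'
bed_path = DATA_DIR / 'alignment.bed'


def write_sample(src, dst, n_lines=1000):
    """Hypothetical helper: copy the first n_lines of a large text file into a small sample."""
    with open(src) as fin, open(dst, 'w') as fout:
        # islice stops after n_lines without reading the rest of the big file
        fout.writelines(islice(fin, n_lines))
```

For a large alignment file, something like write_sample(path_to_big_bed, bed_path) would produce a small file that can be committed in place of the full one (path_to_big_bed is a placeholder).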

# Function for reading gff files
def read_gff(path_to_gff):
    gff_header = ['chromosome', 'source', 'type', 'start', 'end', 'score', 'strand', 'phase', 'attributes']
    return pd.read_csv(path_to_gff, sep='\t', names=gff_header, comment = '#')


Suggested change
-    return pd.read_csv(path_to_gff, sep='\t', names=gff_header, comment = '#')
+    return pd.read_csv(path_to_gff, sep='\t', names=gff_header, comment='#')
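For reference, a self-contained sketch of both readers with the suggested spacing applied. read_gff mirrors the code under review; read_bed is not shown in this PR, so the version below is hypothetical and assumes a headerless 6-column BED file (the BED_COLUMNS names are likewise assumptions). Note that comment='#' in pandas.read_csv ignores lines that start with '#' and also discards everything after a '#' that appears later in a line, which is worth keeping in mind for free-text columns such as attributes.

```python
import io

import pandas as pd

GFF_COLUMNS = ['chromosome', 'source', 'type', 'start', 'end',
               'score', 'strand', 'phase', 'attributes']
# Assumption: a 6-column BED file; 'chromosome', 'start' and 'end' reuse the GFF names
# so that the tables can later be merged and compared column by column.
BED_COLUMNS = ['chromosome', 'start', 'end', 'name', 'score', 'strand']


def read_gff(path_to_gff):
    """Read a tab-separated GFF file; lines starting with '#' are skipped."""
    return pd.read_csv(path_to_gff, sep='\t', names=GFF_COLUMNS, comment='#')


def read_bed(path_to_bed):
    """Hypothetical BED reader: expects 6 tab-separated columns and no header line."""
    return pd.read_csv(path_to_bed, sep='\t', names=BED_COLUMNS)


# Made-up records for a quick check (read_csv also accepts file-like objects).
gff_text = ('##gff-version 3\n'
            'chr1\texample\trRNA\t100\t200\t.\t+\t.\tName=16S_rRNA\n'
            'chr1\texample\trRNA\t300\t400\t.\t-\t.\tName=23S_rRNA\n')
bed_text = 'chr1\t50\t250\tread1\t60\t+\n'

gff_df = read_gff(io.StringIO(gff_text))
bed_df = read_bed(io.StringIO(bed_text))
print(gff_df.shape, bed_df.shape)  # (2, 9) (1, 6)
```

Reading from io.StringIO with made-up records makes it easy to check column names and dtypes without the real data files.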

rrnas_by_types = pd.DataFrame({'count' : rrna_gff_df.groupby(['chromosome','attributes']).size()}).reset_index()

# Merging gff and bed files
merged_df = pd.merge(rrna_gff_df, alignment_bed_df, how='outer', left_on=['chromosome'], right_on=['chromosome'])


This can be made a little simpler:

Suggested change
-merged_df = pd.merge(rrna_gff_df, alignment_bed_df, how='outer', left_on=['chromosome'], right_on=['chromosome'])
+merged_df = pd.merge(rrna_gff_df, alignment_bed_df, how='outer', on='chromosome')
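A small check on made-up toy tables that on='chromosome' gives the same result as the left_on/right_on form, and that the overlapping start/end columns receive the default _x/_y suffixes used by the filtering step later on. The frames gff and bed here are illustrative stand-ins, not the homework data.

```python
import pandas as pd

# Toy stand-ins for the GFF and BED tables (made-up coordinates).
gff = pd.DataFrame({'chromosome': ['chr1', 'chr1', 'chr2'],
                    'start': [100, 300, 10],
                    'end': [200, 400, 50]})
bed = pd.DataFrame({'chromosome': ['chr1', 'chr3'],
                    'start': [50, 5],
                    'end': [250, 60]})

# on='chromosome' is shorthand for left_on=right_on='chromosome' when the key
# column has the same name in both tables.
verbose = pd.merge(gff, bed, how='outer', left_on=['chromosome'], right_on=['chromosome'])
short = pd.merge(gff, bed, how='outer', on='chromosome')
pd.testing.assert_frame_equal(verbose, short)

# Overlapping non-key columns get the default suffixes: _x (left) and _y (right).
print(short.columns.tolist())
# ['chromosome', 'start_x', 'end_x', 'start_y', 'end_y']
```

Because of how='outer', chromosomes present in only one table yield rows with NaN, and the coordinate columns become floats. The later filter drops those rows anyway (comparisons with NaN are False), so how='inner' would likely keep the same matching pairs while preserving integer dtypes.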

plt.xticks(rotation=90, size=10);

# Extracting rRNAs which intersect with alignment
rrnas_align_intersect = merged_df[(merged_df['start_x'] >= merged_df['start_y']) & (merged_df['end_x'] <= merged_df['end_y'])]


This works. You could also use query; IMHO it's a bit more concise that way.

Suggested change
-rrnas_align_intersect = merged_df[(merged_df['start_x'] >= merged_df['start_y']) & (merged_df['end_x'] <= merged_df['end_y'])]
+rrnas_align_intersect = merged_df.query('start_x >= start_y and end_x <= end_y')
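A sketch comparing the two filters on a made-up merged table. Both select rows where the rRNA interval lies entirely within the alignment interval, i.e. containment rather than partial overlap. If the comment "Extracting rRNAs which intersect with alignment" is meant as any overlap, the second query below (a standard interval-overlap test) may be closer to the intent.

```python
import pandas as pd

# Made-up merged table using the _x (GFF) / _y (BED) suffixes from the merge step.
merged_df = pd.DataFrame({'start_x': [100, 200, 500],
                          'end_x': [200, 300, 600],
                          'start_y': [50, 50, 20],
                          'end_y': [250, 250, 60]})

# Boolean-mask version from the original code and the query() version.
mask_version = merged_df[(merged_df['start_x'] >= merged_df['start_y'])
                         & (merged_df['end_x'] <= merged_df['end_y'])]
query_version = merged_df.query('start_x >= start_y and end_x <= end_y')
pd.testing.assert_frame_equal(mask_version, query_version)

# Both keep only rRNAs lying entirely inside an alignment (row 0 here).
# A general overlap test keeps partially overlapping rRNAs as well (rows 0 and 1).
overlap = merged_df.query('start_x <= end_y and end_x >= start_y')
print(query_version.index.tolist(), overlap.index.tolist())  # [0] [0, 1]
```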



2 participants