data sources are sometimes lost when annotating glycosylations #279

@mtiberti

For example, with KCNK1:

from cancermuts.datasources import UniProt

# create the corresponding uniprot object
up = UniProt()

# fetch the sequence, explicitly specifying the UniProt ID
seq = up.get_sequence('KCNK1', upid='KCNK1_HUMAN')

from cancermuts.datasources import PhosphoSite, dbPTM, GlyGen, MobiDB, NetPhos

# add annotations from PhosphoSite
ps = PhosphoSite('/data/databases/phosphosite/')
ps.add_position_properties(seq)

print(seq.positions[4].properties)
print(seq.positions[28].properties)

# add annotations from dbPTM
db = dbPTM('/data/databases/dbPTM/')
db.add_position_properties(seq)

# add annotations from GlyGen
gg = GlyGen(
    '/data/databases/GlyGen/',
    database_file='human_proteoform_glycosylation_sites_uniprotkb_filtered.csv'
)
gg.add_position_properties(seq)

# add annotations from NetPhos
# (named `netphos` rather than `np` to avoid clashing with the usual numpy alias)
netphos = NetPhos('/data/databases/netphos_human_proteome/netphos_human_isoforms/raw/')
netphos.add_position_properties(seq)

# build the annotation table as a dataframe
from cancermuts.table import Table

tbl = Table()

df = tbl.to_dataframe(seq)

We have entries for this protein (O00180) in both dbPTM and GlyGen:

$ grep O00180 /data/databases/dbPTM/N-linked_Glycosylation
KCNK1_HUMAN	O00180	95	N-linked Glycosylation	11053038;8978667	ASNYGVSVLSNASGNWNWDFT
$ grep O00180 /data/databases/GlyGen/human_proteoform_glycosylation_sites_uniprotkb_filtered.csv
O00180-1,95,Asn,,N-linked,protein_xref_uniprotkb_gly,O00180,protein_xref_uniprotkb_gly,O00180,N-linked (GlcNAc...) asparagine,PubMed,ECO_0000269,,GlcNAc...,,NAS,NXS,95,95,Asn,Asn,N
O00180-1,95,Asn,,N-linked,protein_xref_pubmed,8978667,protein_xref_uniprotkb_gly,O00180,N-linked (GlcNAc...) asparagine,PubMed,ECO_0000269,,GlcNAc...,,NAS,NXS,95,95,Asn,Asn,N

However, in the standard output we get:

Wed May 27 10:37:42 2026 INFO added property <PositionProperty Glycosylation Site from dbPTM>
Wed May 27 10:37:42 2026 INFO property <PositionProperty Glycosylation Site from dbPTM> was replaced with <PositionProperty Glycosylation Site from GlyGen>
Wed May 27 10:37:42 2026 INFO property <PositionProperty Glycosylation Site from GlyGen> was replaced with <PositionProperty Glycosylation Site from GlyGen>
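The replacement chain in these log lines can be checked mechanically. The following is a minimal sketch, not part of cancermuts: `lost_sources` is a hypothetical helper, the regular expressions are written against the three log lines quoted above, and it assumes that all lines refer to the same sequence position (the log itself does not state the position).

```python
import re

# Log lines copied from the standard output quoted in this issue.
LOG = """\
Wed May 27 10:37:42 2026 INFO added property <PositionProperty Glycosylation Site from dbPTM>
Wed May 27 10:37:42 2026 INFO property <PositionProperty Glycosylation Site from dbPTM> was replaced with <PositionProperty Glycosylation Site from GlyGen>
Wed May 27 10:37:42 2026 INFO property <PositionProperty Glycosylation Site from GlyGen> was replaced with <PositionProperty Glycosylation Site from GlyGen>
"""

ADDED = re.compile(r"added property <PositionProperty .+? from (?P<src>\w+)>")
REPLACED = re.compile(
    r"property <PositionProperty .+? from (?P<old>\w+)> "
    r"was replaced with <PositionProperty .+? from (?P<new>\w+)>"
)


def lost_sources(log_text):
    """Return sources that appeared in the log but are not the final one.

    Assumption: every line concerns the same position and property type.
    """
    seen = set()
    current = None
    for line in log_text.splitlines():
        match = REPLACED.search(line)
        if match:
            seen.update((match["old"], match["new"]))
            current = match["new"]
            continue
        match = ADDED.search(line)
        if match:
            seen.add(match["src"])
            current = match["src"]
    return seen - {current}


print(lost_sources(LOG))
```

For the log above this reports dbPTM as the source that was added and then overwritten.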

In the final CSV file:

94,95,N,,,,,,,,,Gly,N-GlcNAc,,GlyGen,,,,,,,,,,,,,,,,,,,,,,,,

So the output is missing at least the dbPTM source: the GlyGen property replaces the dbPTM one instead of being merged with it.
N95 is the only glycosylation site for this protein.
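One possible direction for a fix is to merge sources when a property of the same type already exists at a position, rather than replacing it. The sketch below only illustrates that idea with stand-in classes; `Annotation`, `Position`, and `add_property` are hypothetical names and do not reflect how cancermuts actually stores position properties.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical stand-in for a position property with several sources."""
    category: str
    sources: list = field(default_factory=list)


class Position:
    """Hypothetical stand-in for a sequence position."""

    def __init__(self):
        self.properties = {}

    def add_property(self, category, source):
        existing = self.properties.get(category)
        if existing is None:
            # First annotation of this type: create it.
            self.properties[category] = Annotation(category, [source])
        elif source not in existing.sources:
            # Same type already present: keep it and record the new source.
            existing.sources.append(source)
        return self.properties[category]


# Replay the order of events seen in the log: dbPTM once, then GlyGen twice.
pos = Position()
pos.add_property("glycosylation_site", "dbPTM")
pos.add_property("glycosylation_site", "GlyGen")
pos.add_property("glycosylation_site", "GlyGen")

print(pos.properties["glycosylation_site"].sources)
```

With this behaviour both dbPTM and GlyGen would be kept for N95, and the duplicate GlyGen row would not add a second entry.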
