Description
From OSL page:
I performed a search, pretty much any search, and went to the Browse window. In the "H Bond" column, way over on the right, the value is always listed as "--". If I dump the hits and look at the raw numbers I see that the value for "H Bond" is always equal to "0.0". I've never used "H Bond" but the documentation on the PGD site says that this is the energy of the hydrogen bond calculated by DSSP. I'm guessing that the parser for DSSP in Splicer is not picking up the value, or our DSSP is always writing out "0.0" for some reason.
If DSSP no longer produces a hydrogen bond energy, the value should not be reported in the Browser table nor dumped. If it does produce a value it should be loaded into the database.
This is not a huge problem. If there is a very cheap solution we should go for it.
History
#1 Updated by Jack Twilley 10 months ago
My primary concern about removing the entire column lies in breaking any tools that someone has made to process the data.
If that's not a problem, then I'm happy to make a quick change that removes the column from the database, data dumps, and statistics.
Are there any other columns that should be removed while we're at it?
#2 Updated by Jack Twilley 10 months ago
If you want to keep the values and make them real, then that's a different story.
The current version of BioPython returns the following information from the DSSP program's output:
dssp[(chainid, res_id)] = (aa, ss, acc, phi, psi, dssp_index, NH_O_1_relidx, NH_O_1_energy, O_NH_1_relidx, O_NH_1_energy, NH_O_2_relidx, NH_O_2_energy, O_NH_2_relidx, O_NH_2_energy)
The first five elements are used by ProcessPDBTask.py but none of the relidx or energy values are used. Instead, the code explicitly sets res_dict['h_bond_energy'] to 0.00 -- as if someone added the column, but never got around to adding the equation to fill it with data.
In terms of technical work, it's easier to make the h_bond_energy value real than to remove it entirely, especially if the equation for the value is as simple as summing the energy values or something to that effect.
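To make the "summing" idea concrete, here is a minimal sketch, assuming the tuple layout quoted above is accurate for the installed BioPython version (it should be checked) and assuming that a plain sum of the four energies is the wanted value, which nobody has confirmed. `DSSP_FIELDS` and `summed_h_bond_energy` are hypothetical names, not part of ProcessPDBTask.py or BioPython, and the example entry is made-up data.

```python
# Field names in the order quoted in this comment; an assumption to verify
# against the BioPython version actually in use.
DSSP_FIELDS = (
    "aa", "ss", "acc", "phi", "psi", "dssp_index",
    "NH_O_1_relidx", "NH_O_1_energy", "O_NH_1_relidx", "O_NH_1_energy",
    "NH_O_2_relidx", "NH_O_2_energy", "O_NH_2_relidx", "O_NH_2_energy",
)


def summed_h_bond_energy(dssp_entry):
    """Hypothetical helper: add up the four H-bond energies of one residue."""
    named = dict(zip(DSSP_FIELDS, dssp_entry))
    return sum(value for key, value in named.items() if key.endswith("_energy"))


# Made-up entry for illustration only; real values come from dssp[(chainid, res_id)].
example_entry = ("A", "H", 42, -60.0, -45.0, 1, -4, -2.0, 4, -1.5, 0, 0.0, 0, 0.0)

res_dict = {}
# Replaces the hard-coded 0.00 with a value derived from the DSSP output.
res_dict["h_bond_energy"] = summed_h_bond_energy(example_entry)
```

Naming the fields rather than indexing by position keeps the code readable if the tuple layout turns out to differ between BioPython releases; only `DSSP_FIELDS` would need to change.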
#3 Updated by Dale Tronrud 10 months ago
It turns out the original plan for "H Bond" is no longer relevant. Instead of removing this column from the database we would like to re-purpose it.
The "H Bond" column should be renamed "Accessibility", which is short for "solvent accessibility". The name would have to be changed in the database schema as well as on the "Browse" page and in the "Dump" output. Splicer should populate this column using the data in the "ACC" column of the DSSP output. DSSP writes this value as an integer, but its units are square angstroms and the PGD should store it as a Real.
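A rough sketch of the Splicer side of this change, assuming the DSSP tuple layout quoted in comment #2 (with `acc` as the third element) still holds. The `accessibility` key, `ACC_INDEX`, and `accessibility_from_dssp` are hypothetical names chosen for illustration; the real field name would follow whatever the renamed schema column is called. The example entry is made-up data.

```python
# Position of "acc" in the tuple layout quoted in comment #2; an assumption
# that should be verified against the BioPython version in use.
ACC_INDEX = 2


def accessibility_from_dssp(dssp_entry):
    """Hypothetical helper: return DSSP's ACC value as a float.

    DSSP writes ACC as an integer, but the quantity is an area in square
    angstroms, so it is converted to a float before being stored.
    """
    return float(dssp_entry[ACC_INDEX])


# Made-up entry for illustration only; real values come from dssp[(chainid, res_id)].
example_entry = ("A", "H", 42, -60.0, -45.0, 1, -4, -2.0, 4, -1.5, 0, 0.0, 0, 0.0)

res_dict = {}
res_dict["accessibility"] = accessibility_from_dssp(example_entry)
```

The schema rename, the "Browse" page label, and the "Dump" header are separate changes not covered by this sketch.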
If there is documentation somewhere that describes "H Bond" it should be updated to describe "Accessibility" instead.
While it would be nice to filter and plot based on accessibility, these functions are not required at this time.