matthiasblum
left a comment
The converted XML fails validation against InterProScan XSD:
InterProScan 6 command:
nextflow run ebi-pf-team/interproscan6 \
-r 6.0.0 \
-profile docker,test \
--datadir data \
--interpro 108.0 \
--no-matches-api \
--formats xml
test.faa.xml is created in the working directory.
Convert to IPS5:
python scripts/convert_output.py -i test.faa.xml -o test.faa.i5.xml -xml
Get InterProScan 5 XSD:
wget https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/schemas/interproscan-model-4.7.xsd
Validate:
xmllint --noout --schema interproscan-model-4.7.xsd test.faa.i5.xml 2> xmllint.log
the defaults that need adding are for CDD matches, score and evalue are mandatory in ips5 also for sites, hmmer3-site tags require children group, hmmEnd and hmmStart, which we don't have in ips6. I believe in both cases perhaps dropping the requirement constraint is the way to go, ips6 should not have additional not required data...
matthiasblum
left a comment
the defaults that need adding are for CDD matches, score and evalue are mandatory in ips5
This was a bug, fixed with ebi-pf-team/interproscan6#304.
also for sites, hmmer3-site tags require children group, hmmEnd and hmmStart, which we don't have in ips6.
I believe in both cases perhaps dropping the requirement constraint is the way to go, ips6 should not have additional not required data...
Then please update the Hmmer3Site model so hmmStart and hmmEnd are still reported in the XML/JSON outputs, but are no longer required, then generate the XSD and share it here. group can still be reported as it's basically a 1-based index so it's easy to infer.
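As a rough illustration of the request above, the sketch below numbers the sites of a match with a 1-based group and leaves hmmStart and hmmEnd out when they are unknown. It assumes group simply follows the order of the sites. The `with_groups` helper and the dict layout are hypothetical and are not taken from the Hmmer3Site model.

```python
from typing import Any, Dict, List


def with_groups(sites: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the sites with a 1-based `group` filled in.

    Assumption: `group` is just the position of the site within its match.
    hmmStart/hmmEnd are treated as optional and are never invented here.
    """
    numbered = []
    for index, site in enumerate(sites, start=1):
        entry = dict(site)                 # do not mutate the caller's data
        entry.setdefault("group", index)   # keep an explicit group if present
        numbered.append(entry)
    return numbered


# Hypothetical IPS6-style sites without group, hmmStart or hmmEnd.
sites = [{"description": "site A"}, {"description": "site B"}]
print(with_groups(sites))
```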
The new XSD can be found at https://drive.google.com/file/d/1cI2a67zGPZn3U6J-8frASTxXcptei25a/view
matthiasblum
left a comment
The XML looks good.
Now, for the JSON output, I created a JSON schema for the InterProScan 6 output: https://github.com/ebi-pf-team/interproscan6/blob/schemas/utilities/schemas/schema.json. We need to generate one for InterProScan 5 as well. Ideally, we would have a Maven plugin generating the JSON schema, like the one that generates the XSD, but it may be difficult to add one since many of our dependencies are old. It might be faster to adapt my schema for InterProScan 5.
I made a JSON schema for InterProScan 5, based on the existing one:
With an InterProScan 6 test.faa.json output file, convert to IPS5:
With the IPS5 schema.json and the Python check-jsonschema tool installed, validate:
should output:
I tried the
https://embl.atlassian.net/browse/IBU-11047
scripts/convert_output.py added to convert IPS6 XML or JSON output to IPS5 format.
Usage is: python convert_output.py [-h] -i INPUT -o OUTPUT (-xml | -json)
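For readers unfamiliar with this style of command line, the usage string above corresponds to a parser along the following lines. This is a minimal sketch reconstructed from the usage line only, not the actual contents of convert_output.py. The `build_parser` name, the `dest` names and the help texts are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser matching: [-h] -i INPUT -o OUTPUT (-xml | -json)."""
    parser = argparse.ArgumentParser(prog="convert_output.py")
    parser.add_argument("-i", dest="input", metavar="INPUT", required=True,
                        help="InterProScan 6 output file")
    parser.add_argument("-o", dest="output", metavar="OUTPUT", required=True,
                        help="path for the converted InterProScan 5 file")
    # Exactly one of -xml / -json must be supplied.
    fmt = parser.add_mutually_exclusive_group(required=True)
    fmt.add_argument("-xml", action="store_true", help="convert XML output")
    fmt.add_argument("-json", action="store_true", help="convert JSON output")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(
    ["-i", "test.faa.xml", "-o", "test.faa.i5.xml", "-xml"]
)
print(args.input, args.output, args.xml, args.json)
```

Using a required mutually exclusive group makes argparse reject a call that passes both flags or neither.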
The JSON formats are quite similar. One difference of note is that the converted JSON contains cigarAlignment on locations while the original IPS5 JSON does not (it was left in, but can be removed if required).
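If cigarAlignment does need to be removed, a post-processing step could look like the sketch below. It is not part of convert_output.py: the `drop_key` helper is hypothetical, and the sample document is a heavily trimmed, assumed shape rather than real output. Note that it removes the key wherever it appears, not only on locations.

```python
import json
from typing import Any


def drop_key(node: Any, key: str = "cigarAlignment") -> Any:
    """Return a copy of a parsed JSON value with every `key` entry removed."""
    if isinstance(node, dict):
        return {k: drop_key(v, key) for k, v in node.items() if k != key}
    if isinstance(node, list):
        return [drop_key(item, key) for item in node]
    return node  # scalars are returned unchanged


# Hypothetical fragment of a converted file; real files carry many more fields.
converted = {
    "results": [
        {"matches": [{"locations": [
            {"start": 1, "end": 50, "cigarAlignment": "50M"}
        ]}]}
    ]
}
print(json.dumps(drop_key(converted), indent=2))
```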
The XML formats are quite different, requiring the match, location, fragment and site tags to be renamed depending on the application.
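The renaming step can be pictured with the following sketch, which uses only the standard library. The generic input tag names, the `application` attribute and both lookup tables are assumptions made for illustration. The actual mapping used by convert_output.py is not shown in this conversation.

```python
import xml.etree.ElementTree as ET

# Hypothetical application -> tag prefix table (illustrative entries only).
PREFIX_BY_APPLICATION = {"PFAM": "hmmer3", "SFLD": "hmmer3"}
# Hypothetical generic tag -> suffix of the application-specific tag.
SUFFIX_BY_TAG = {
    "match": "match",
    "location": "location",
    "fragment": "location-fragment",
    "site": "site",
}


def rename_tags(match: ET.Element, application: str) -> None:
    """Rename a match element and its descendants in place."""
    prefix = PREFIX_BY_APPLICATION[application]
    for element in match.iter():  # iter() also yields `match` itself
        suffix = SUFFIX_BY_TAG.get(element.tag)
        if suffix is not None:
            element.tag = f"{prefix}-{suffix}"


# Assumed, simplified IPS6-style fragment without namespaces.
sample = ET.fromstring(
    '<match application="PFAM"><location><fragment/></location><site/></match>'
)
rename_tags(sample, sample.get("application"))
print(ET.tostring(sample, encoding="unicode"))
```

Real InterProScan XML is namespaced, so a production version would need to handle the `{namespace}tag` form that ElementTree uses for element names.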