IPS6 to IPS5 output converter#432

Open
tgrego wants to merge 9 commits into develop from ips6converter

Contributor

@tgrego tgrego commented Feb 24, 2026

https://embl.atlassian.net/browse/IBU-11047

Adds scripts/convert_output.py, which converts IPS6 XML or JSON output to the IPS5 format.

Usage is: python convert_output.py [-h] -i INPUT -o OUTPUT (-xml | -json)
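
For reference, a usage line of that shape can be produced with argparse along the following lines. This is only a sketch, not the script's actual code: the `build_parser` helper, the `dest` names and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser matching: [-h] -i INPUT -o OUTPUT (-xml | -json)."""
    parser = argparse.ArgumentParser(
        prog="convert_output.py",
        description="Convert IPS6 output to the IPS5 format (illustrative sketch).",
    )
    parser.add_argument("-i", dest="input", metavar="INPUT", required=True,
                        help="IPS6 output file to convert")
    parser.add_argument("-o", dest="output", metavar="OUTPUT", required=True,
                        help="path of the converted IPS5-style file")
    # Exactly one of the two format flags must be given.
    fmt = parser.add_mutually_exclusive_group(required=True)
    fmt.add_argument("-xml", action="store_true", help="input is XML")
    fmt.add_argument("-json", action="store_true", help="input is JSON")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input, args.output, "xml" if args.xml else "json")
```

The required mutually exclusive group is what makes argparse reject a call that passes both `-xml` and `-json`, or neither.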

The JSON formats are quite similar. One difference of note is that the converted JSON contains cigarAlignment on locations, while the original IPS5 JSON does not (I left it in, but it can be removed if required).
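
If cigarAlignment does need to go, it could be dropped in a post-processing pass such as the sketch below. It assumes nothing about the layout beyond the key name mentioned above: it walks the whole document and removes that key wherever it occurs. The `strip_key` helper and the sample data are hypothetical.

```python
import json


def strip_key(node, key="cigarAlignment"):
    """Return a copy of a parsed JSON value with every occurrence of `key` removed."""
    if isinstance(node, dict):
        return {k: strip_key(v, key) for k, v in node.items() if k != key}
    if isinstance(node, list):
        return [strip_key(item, key) for item in node]
    return node  # scalars are returned unchanged


if __name__ == "__main__":
    # Hypothetical, heavily simplified document for illustration only.
    sample = {"results": [{"matches": [{"locations": [
        {"start": 1, "end": 50, "cigarAlignment": "50M"}]}]}]}
    print(json.dumps(strip_key(sample), indent=2))
```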

The XML formats are quite different: the match, location, fragment and site tags need to be renamed depending on the application.
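
As a rough illustration of that kind of renaming, the sketch below rewrites generic tags into application-prefixed ones with ElementTree. It is not the logic in convert_output.py: the input element names, the prefix table and the `rename_match_tags` helper are all assumptions, chosen only to show the pattern.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: application name -> tag prefix used on the IPS5 side.
PREFIX_BY_APPLICATION = {"PFAM": "hmmer3"}

# Hypothetical generic tag names and the suffix each one maps to.
SUFFIX_BY_TAG = {
    "match": "match",
    "location": "location",
    "fragment": "location-fragment",
    "site": "site",
}


def rename_match_tags(match, application):
    """Rename generic tags under `match` (in place) using the application's prefix."""
    prefix = PREFIX_BY_APPLICATION[application]
    for elem in match.iter():  # iter() also yields `match` itself
        suffix = SUFFIX_BY_TAG.get(elem.tag)
        if suffix is not None:
            elem.tag = f"{prefix}-{suffix}"
    return match


if __name__ == "__main__":
    doc = ET.fromstring(
        "<match><locations><location><fragment/></location></locations></match>"
    )
    print(ET.tostring(rename_match_tags(doc, "PFAM"), encoding="unicode"))
```

Container elements that are not in the table (here `locations`) are left untouched.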

@tgrego tgrego requested a review from matthiasblum February 27, 2026 09:50
Contributor

@matthiasblum matthiasblum left a comment

The converted XML fails validation against the InterProScan 5 XSD:

InterProScan 6 command:

nextflow run ebi-pf-team/interproscan6 \
    -r 6.0.0 \
    -profile docker,test \
    --datadir data \
    --interpro 108.0 \
    --no-matches-api \
    --formats xml

test.faa.xml is created in the working directory.

Convert to IPS5:

python scripts/convert_output.py -i test.faa.xml -o test.faa.i5.xml -xml

Get InterProScan 5 XSD:

wget https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/schemas/interproscan-model-4.7.xsd

Validate:

xmllint --noout --schema interproscan-model-4.7.xsd test.faa.i5.xml 2> xmllint.log

xmllint.log

Contributor Author

tgrego commented Mar 13, 2026

The defaults that need adding are for CDD matches: score and evalue are mandatory in IPS5.

Also, for sites, hmmer3-site tags require the children group, hmmEnd and hmmStart, which we don't have in IPS6.

I believe that in both cases dropping the requirement constraint is the way to go; IPS6 should not have to output additional data that is not required...

@tgrego tgrego requested a review from matthiasblum March 16, 2026 16:05
Contributor

@matthiasblum matthiasblum left a comment

> The defaults that need adding are for CDD matches: score and evalue are mandatory in IPS5.

This was a bug, fixed with ebi-pf-team/interproscan6#304.

> Also, for sites, hmmer3-site tags require the children group, hmmEnd and hmmStart, which we don't have in IPS6.
>
> I believe that in both cases dropping the requirement constraint is the way to go; IPS6 should not have to output additional data that is not required...

Then please update the Hmmer3Site model so hmmStart and hmmEnd are still reported in the XML/JSON outputs, but are no longer required, then generate the XSD and share it here. group can still be reported as it's basically a 1-based index so it's easy to infer.
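
To illustrate the point about group, here is a minimal sketch of inferring it as a 1-based index. It assumes the sites of a match are available as an ordered list of dicts and that group is simply each site's position in that list; both the data layout and the `with_group_index` helper are hypothetical, and the converter may derive the value differently.

```python
def with_group_index(sites):
    """Return copies of `sites` with a 1-based `group` index added to each one."""
    return [{**site, "group": index} for index, site in enumerate(sites, start=1)]


if __name__ == "__main__":
    # Hypothetical site records for illustration only.
    sites = [{"description": "first site"}, {"description": "second site"}]
    for site in with_group_index(sites):
        print(site)
```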

Contributor Author

tgrego commented Mar 23, 2026

The new XSD can be found at https://drive.google.com/file/d/1cI2a67zGPZn3U6J-8frASTxXcptei25a/view
The only change is that the hmmStart, hmmEnd and group elements are no longer required in Hmmer3Site elements.

@tgrego tgrego requested a review from matthiasblum March 23, 2026 11:21
Contributor

@matthiasblum matthiasblum left a comment

The XML looks good.

Now, for the JSON output, I created a JSON schema for the InterProScan 6 output: https://github.com/ebi-pf-team/interproscan6/blob/schemas/utilities/schemas/schema.json. We need to generate one for InterProScan 5 as well. Ideally, we would have a Maven plugin generating the JSON schema, like the one generating the XSD, but it may be difficult to add one since many of our dependencies are old. It might be faster to adapt my schema for InterProScan 5.

Contributor Author

tgrego commented Mar 24, 2026

I made a JSON schema for InterProScan 5, based on the existing one:
https://drive.google.com/file/d/1CbNptvcPPJd-pqVYd7uwNH7kNoLHpJpV/view

Contributor Author

tgrego commented Mar 24, 2026

Starting from an InterProScan 6 test.faa.json output file:

Convert to IPS5:

python scripts/convert_output.py -i test.faa.json -o test.faa.i5.json -json

With the IPS5 schema.json and the Python check-jsonschema package installed, validate:

check-jsonschema --schemafile schema.json test.faa.i5.json

This should output: ok -- validation done

I tried test_all_appl.fasta with both the output produced directly by InterProScan 5 and the converted InterProScan 6 output. Both pass validation against that slightly modified schema.json.

@tgrego tgrego requested a review from matthiasblum March 27, 2026 08:29