# xCurator - Semi-Structured Data to Linked Data
xCurator transforms semi-structured data into linked data by leveraging information from both the structure and the data of the input sources. It supports both XML and JSON data sources from a file, a set of files, or URLs. It generates a mapping file as well as RDF files stored in TDB format.
- Java 1.7 (or newer)
1- Clone the repository (or download the zip file)

    git clone --recursive https://github.com/Aleyasen/xcurator.git

If you don't need to download the datasets, clone without the --recursive parameter.

2- Run xcurator.sh (xcurator.bat on Windows) in the bin directory.
| Parameter | Description |
|---|---|
| -d,--dir | Input directory path |
| -f,--file | Input file (xml/json) path |
| -h,--domain | The generated RDFs will have this domain name in their URIs. |
| -m,--mapping-file | The output mapping file. If omitted, no mapping file is written. |
| -o,--output | Directory of the output TDB |
| -t,--type | Type of the input (xml or json). (default: xml) |
| -u,--url | The URL of the source XML |
| -s,--steps | The curation steps (default: DIOFK) |
| -eval,--evaluation | Evaluate the generated mapping file using ground-truth entity and attribute files |
| -e,--ent-file | Ground-truth entity file for evaluation, use only with -eval option |
| -a,--attr-file | Ground-truth attribute file for evaluation, use only with -eval option |
| -v,--verbose | Verbose output |
The curation steps identifier is a string that specifies which steps run on the input data and in what order. For example, DOF performs Duplicate Removal, Intra Linking, and Schema Flatting, in that order.
| Step ID | Description |
|---|---|
| D | Duplicate Removal |
| I | Inter Linking |
| O | Intra Linking |
| F | Schema Flatting |
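
To make the ordering explicit, the following is a minimal sketch of how a steps string could be read character by character. It is not xCurator's own parser: `describe_steps` is a hypothetical helper, and it only knows the four step IDs listed in the table, so any other letter (including the `K` in the default `DIOFK`, which the table does not describe) is reported as unknown.

```shell
# Hypothetical helper: print the name of each step in a steps string,
# in the order the characters appear. Only the documented IDs are mapped.
describe_steps() (
  steps="$1"
  while [ -n "$steps" ]; do
    rest="${steps#?}"        # everything after the first character
    id="${steps%"$rest"}"    # the first character itself
    steps="$rest"
    case "$id" in
      D) echo "Duplicate Removal" ;;
      I) echo "Inter Linking" ;;
      O) echo "Intra Linking" ;;
      F) echo "Schema Flatting" ;;
      *) echo "Unknown step: $id" ;;
    esac
  done
)

describe_steps DOF
```

Running it with `DOF` prints Duplicate Removal, Intra Linking, and Schema Flatting on separate lines, matching the example in the text.
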

    xcurator.bat -d data/dir -m mapping.xml -h http://xyz.com

Generate mapping.xml for the set of XML files in the data/dir directory.


    xcurator.bat -f sample.json -m mapping.xml -h http://xyz.com -t json -o tdb

Generate mapping.xml and the tdb directory for the sample.json input file.


    xcurator.bat -f sample.json -m mapping.xml -h http://xyz.com -s DF

Generate mapping.xml for the sample.json input file by running only the Duplicate Removal and Schema Flatting steps.


    xcurator.bat -eval -m mapping.xml -e gEntities.txt -a gAttributes.txt -v

Evaluate the mapping.xml file using the provided ground-truth entity and attribute files (gEntities.txt and gAttributes.txt). The -v option enables verbose output.
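
The examples use xcurator.bat; on Linux or macOS the launcher is xcurator.sh. The sketch below builds the matching command line for the current platform and prints it without running anything. It assumes that xcurator.sh accepts the same options as xcurator.bat, which the usage notes suggest but do not state outright. `XCURATOR_BIN` and `xcurator_cmd` are hypothetical names introduced only for this illustration.

```shell
# Hypothetical variable: location of the bin directory in your clone.
XCURATOR_BIN="${XCURATOR_BIN:-./bin}"

# Hypothetical helper: print (do not execute) the xcurator command line
# for this platform, passing the given options through unchanged.
xcurator_cmd() (
  case "$(uname -s)" in
    CYGWIN*|MINGW*|MSYS*) launcher="xcurator.bat" ;;  # Windows shells
    *)                    launcher="xcurator.sh"  ;;  # Linux, macOS, others
  esac
  echo "$XCURATOR_BIN/$launcher $*"
)

xcurator_cmd -f sample.json -t json -m mapping.xml -h http://xyz.com -o tdb
```

Printing the command first makes it easy to check the options before running the launcher itself.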