-
Notifications
You must be signed in to change notification settings - Fork 2
ConfigurationFile
The iBench configuration file controls the metadata and data generation of iBench. A documented configuration file template can be found in exampleConfigurations/configTemplate.txt. Here we just give a brief overview of the different types of parameters.
These options govern what type of metadata / data is generated, where the output is stored and so on. Typically, iBench generates multiple files to store the generated metadata and data. These options control naming of these files, what files to output, and where to store them. The following files may be produced.
- Mapping Scenario Metadata XML File - This file stores all the generated metadata: schemas, constraints, correspondences, mappings, and information about the location of data files. This is the main output file generated by iBench and can also be used to load the generated schema and data into a database supported by iBench (PostgreSQL)
- Source and Target Schema XML Schemas - XMLSchemas modelling the source and target schema
- ToXgene configuration file - Configuration file for the ToXgene data generator
- Source Instance XML file - XML file storing data generated for the source schema using ToXgene
- HTML mapping and schema files - HTML files describing the generated schemas and mappings
- Clio files (.mapjob, .xsml) - files for generating schema mappings from the schemas and schema matches generated by iBench using the IBM Clio data exchange system
The following options control where metadata and data files are stored:
# relative path for storing metadata related files
SchemaPathPrefix=metadataOut
# relative path for storing data files
InstancePathPrefix=dataOut
The following options control what files are produced.
- Generate HTML files describing the generated schema and mappings.
OutputOption.HTMLSchemas = false
OutputOption.HTMLMapping = false
- Generate CSV files storing the source data.
OutputOption.Data = true
- Generate CSV files storing target data (independent of the source instance).
OutputOption.EnableTargetData = true
- Generate XMLSchema files modelling the source and target schema
OutputOption.XMLSchemas = true
- Generate the main mapping scenario metadata XML file
OutputOption.TrampXML = true
- Generate the
.xsmland.mapjobfiles used by Clio
OutputOption.Clio = true
The following options covern which types of information are store in the generated mapping scenario metadata XML file.
- Generate correspondences aka schema matches
TrampXMLOutput.Correspondences = true
- Generate SQL code implementing the schema mapping generated by iBench
TrampXMLOutput.Transformations = true
- Generate a section storing where the generated data is located
TrampXMLOutput.Data = true
- Generate a section storing database connection information (that section is used when loading the generated schema and data into a database).
TrampXMLOutput.ConnectionInfo = true
- Generate functional dependencies
TrampXMLOutput.FDs = true
These options control the names of files generated by iBench.
- The file name of the mapping scenario metadata XML file
FileNames.Schemas=metadata.xml
- Names of the XML Schema files modelling the source and target schema
FileNames.SourceSchema = sourceSchema.xsd
FileNames.TargetSchema = targetSchema.xsd
- Name of the configuration file for the ToXgene data generator.
FileNames.SourceInstance=dataGen.tsl
- Name of the XML file storing the generated source data (omit the
.xmlprefix)
FileNames.SourceDocumentName=sourceInst
Below is a copy of the template configuration file exampleConfigurations/configTemplate.txt
################################################################################
# IBENCH CONFIGURATION FILE #
################################################################################
# This is a Java properties file, i.e., options are represented as key-value
# pairs separated by "=". Comment lines start with a "hash" character (like
# this line).
################################################################################
################################################################################
# Output path prefixes and file names
################################################################################
# relative path for storing metadata related files
SchemaPathPrefix=templateOut
# relative path for storing data files
InstancePathPrefix=templateOut
# Source and target schema xml file name
# FileNames.SourceSchema = sourceSchema.xml
# FileNames.TargetSchema = targetSchema.xml
# Name for the data generators configuration file
# FileNames.SourceInstance=dataGen.tsl
# Name for the XML file storing source instance data
# FileNames.SourceDocumentName=sourceInst
# Name of the metadata XML file storing schemas, constraints, and mappings
# FileNames.Schemas=metadata.xml
################################################################################
# Number of instances for each primitive type. A primitive is a mapping template
# consisting of source and target schema elements, mappings between these schema
# fragments, and other metadata.
################################################################################
# Copies a source relation to the target
Scenarios.COPY = 1
# Horizontally partition a source relation into multiple target relations
Scenarios.HORIZPARTITION = 0
# Copy a relation and create a surrogate key
Scenarios.SURROGATEKEY = 0
# Inverse of vertical partitioning, join multiple vertical fragments from the source to create a target relation
Scenarios.MERGING = 0
# Inverse of horizontal partitioning, i.e., a union of multiple source fragments
Scenarios.FUSION = 0
# Copy a source relation that has a foreign key to itself (e.g., employee with a foreign key mapping each employee to their manager)
Scenarios.SELFJOINS = 0
# Vertically partition a source relation
Scenarios.VERTPARTITION = 0
# Copies a source relation to the target and adds one or more new attributes without correspondence to the source
Scenarios.ADDATTRIBUTE = 0
# Copies a source relation to the target removing one or more new attributes
Scenarios.DELATTRIBUTE = 0
# Combination of ADDATTRIBUTE and DELATTRIBUTE
Scenarios.ADDDELATTRIBUTE = 0
# Vertically partitions a source relation that corresponds to an IsA relationship
Scenarios.VERTPARTITIONISA = 0
# Vertically partitions a source relation that corresponds to an HasA relationship
Scenarios.VERTPARTITIONHASA = 0
# Vertically partitions a source relation that corresponds to an N-to-M relationship
Scenarios.VERTPARTITIONNTOM = 0
# Inverse of vertical partitioning combined wth ADDATTRIBUTE
Scenarios.MERGEADD = 0
# Vertically partitions a source relation that corresponds to several independent star-shaped relationships
Scenarios.VERTPARTITIONISAAUTHORITY = 0
################################################################################
# ConfigOptions, configuration options that control the metadata and data
# generation. ConfigOptions.X sets the mean and ConfigOptionsDeviation.X
# the variance of a standard distribution. Whenever an option has to be applied
# a new value is sampled from this normal distribution.
################################################################################
# Number of attributes per generated relation
ConfigOptions.NumOfSubElements = 5
# Number of attributes added by primitives that delete attributes
ConfigOptions.NumOfNewAttributes = 1
# Number of attributes deleted by primitives that delete source attributes
ConfigOptions.NumOfAttributesToDelete = 1
# Number of relations joined by a mapping (this is also used for other subdivisions by mappings that do not join, e.g., number of horizontal partitions)
ConfigOptions.JoinSize = 2
# Number of parameters to functions
ConfigOptions.NumOfParamsInFunctions = 1
# Number of attributes in the primary key of a relation
ConfigOptions.PrimaryKeySize = 1
# Number of attributes from each input compared by join conditions, e.g., for 2 the condition may be A=C AND B=D
ConfigOptions.NumOfJoinAttributes = 2
# Star is 0, chain is 1, random is 2
ConfigOptions.JoinKind = 0
########################################
# Controls the amount of sharing of
# relations across primitives
# After NoReuseScenPerc, each primitive has ReuseSourcePerc probability of reusing source relations part of already created primitives
#ConfigOptions.ReuseSourcePerc = 0
# After NoReuseScenPerc, each primitive has ReuseSourcePerc probability of reusing target relations part of already created primitives
#ConfigOptions.ReuseTargetPerc = 0
# Percentage of primitives that are created without any reuse, for the remaining primitives ReuseSourcePerc is applied
#ConfigOptions.NoReuseScenPerc = 100
########################################
#
# Determines how parameters of skolem functions are chosen:
#ConfigOptions.SkolemKind = 1
#
#ConfigOptions.SourceSkolemPerc = 0
#
#ConfigOptions.SourceFDPerc = 0
########################################
# Generate random inclusion dependencies
#ConfigOptions.SourceInclusionDependencyPerc = 0
#ConfigOptions.SourceInclusionDependencyFKPerc = 100
#ConfigOptions.TargetInclusionDependencyPerc = 0
#ConfigOptions.TargetInclusionDependencyFKPerc = 100
# exists is 1 and not exists is 0
#ConfigOptions.SourceCircularInclusionDependency = 0
#ConfigOptions.SourceCircularFK = 0
#ConfigOptions.TargetCircularInclusionDependency = 0
#ConfigOptions.TargetCircularFK = 0
########################################
# Complexity of VP authority primitives
# ConfigOptions.VPAuthorityComplexity = 2
########################################
# Controls whether source data types are propagated to the target based on correspondences
# activate 1 / deactivate 0
# ConfigOptions.PropagateDTsToTarget = 0
# chance (%) of a propagated DT to deviate from the source DT
# ConfigOptions.PropagateDTsChanceOfDeviation = 0
# how much a deviated datatype differs from the original one (max 100)
# ConfigOptions.PropagateDTsDegreeOfDeviation = 10
########################################
# Variance for each of the above parameters
#ConfigOptionsDeviation.NumOfSubElements = 0
#ConfigOptionsDeviation.NumOfNewAttributes = 0
#ConfigOptionsDeviation.NumOfAttributesToDelete = 0
#ConfigOptionsDeviation.JoinSize = 0
#ConfigOptionsDeviation.NumOfParamsInFunctions = 0
#ConfigOptionsDeviation.PrimaryKeySize = 0
#ConfigOptionsDeviation.NumOfJoinAttributes = 0
#ConfigOptionsDeviation.JoinKind = 0
#ConfigOptionsDeviation.ReuseSourcePerc = 0
#ConfigOptionsDeviation.ReuseTargetPerc = 0
#ConfigOptionsDeviation.NoReuseScenPerc = 0
#ConfigOptionsDeviation.SkolemKind = 0
#ConfigOptionsDeviation.SourceSkolemPerc = 0
#ConfigOptionsDeviation.SourceFDPerc = 0
#ConfigOptionsDeviation.SourceInclusionDependencyPerc = 0
#ConfigOptionsDeviation.SourceInclusionDependencyFKPerc = 0
#ConfigOptionsDeviation.TargetInclusionDependencyPerc = 0
#ConfigOptionsDeviation.TargetInclusionDependencyFKPerc = 0
#ConfigOptionsDeviation.SourceCircularInclusionDependency = 0
#ConfigOptionsDeviation.SourceCircularFK = 0
#ConfigOptionsDeviation.TargetCircularInclusionDependency = 0
#ConfigOptionsDeviation.TargetCircularFK = 0
#ConfigOptionsDeviation.PropagateDTsToTarget = 0
#ConfigOptionsDeviation.PropagateDTsChanceOfDeviation = 0
# how much a deviated datatype differs from the original one (max 100)
# ConfigOptions.PropagateDTsDegreeOfDeviation = 10
################################################################################
# User defined primitives (UDP) specification.
################################################################################
# The number of user defined primitives to be loaded
#LoadScenarios.NumScenarios = 1
########################################
# For each user defined primitive, three parameters prefixed with LoadScenarios.i for the ith UDP
# starting from 0. See exampleScenarios/fh.xml for an example of a UDP
########################################
# the TrampXML format file storing the schema, correspondences, and mappings defining the UDP
#LoadScenarios.0.File = exampleScenarios/fh.xml
# A user-defined name for the UDP
#LoadScenarios.0.Name = simpleTest
# how many instances should be created
#LoadScenarios.0.Inst = 10
################################################################################
# User defined data types, i.e., value generator
#
# The user needs to specify the number of such data types.
# For each data type specify its name (any name would do), the fully classified
# class implementing the value generator (a subclass of ToxGene's ToXgeneCdataGenerator),
# and the chance of the value generator being picked for any given attribute in the
# generated source schema
################################################################################
# number of user-defined datatypes to be used
#DataType.NumDataType = 1
########################################
# For each user defined data type (UDT), four parameters need to be provided.
# A user-defined ToXgeneCdataGenerator needs to implement interface ToXgeneCdataGenerator and
# has to have a constructor without parameters.
########################################
# name for the datatype, choosen by user
#DataType.0.Name = myemail
# class implementing a value generator for this datatype
#DataType.0.ClassPath = toxgene.util.cdata.xmark.Emails
# the percentage of attributes that should use this DT (this is interpreted as a probability)
#DataType.0.Percentage = 60.0
# when loading generated data into a database, use this SQL datatype for the column of this UDT
#DataType.0.DBType = TEXT
# if the datatype implementation is provided as a separate jar file, the path to this jar file
#DataType.0.JarFile = exampleDataTypes/mydt.jar
#DataType.1.Name = mybirds
#DataType.1.ClassPath = toxgene.util.cdata.xmark.Birds
#DataType.1.Percentage = 20.0
#DataType.1.DBType = INT8
################################################################################
# CSV Data Types
#
# iBench also supports UDTs which are based on data given a CSV file. The
# user specifies a list of CSV files. The values in each column of a file
# in this list can be turned in to UDT. iBench reads the bag of values from
# the column and interprets it as a probability distribution.
#
# For instance, if the values in a column are ( 0, 1, 1, 2, 2 ) then values
# for the corresponding data type are chosen as follows:
# 0 => with probability 0.2
# 1 => with probability 0.4
# 2 => with probability 0.4
#
################################################################################
# number of CSV files to load
# CSVDataType.NumFiles = 1
# the path to the ith CSV file
# CSVDataType.0.File = zip_codes_states.csv
# number of datatypes to define based on the CSV files
# CSVDataType.NumDataType = 2
########################################
# CSV file this UDT is coming from
# CSVDataType.0.File = zip_codes_states.csv
# Name of the attribute from which the values are read
# CSVDataType.0.AttrName = state
# the percentage of attributes that should use this DT (this is interpreted as a probability)
# CSVDataType.0.Percentage = 20.0
# when loading generated data into a database, use this SQL datatype for the column of this UDT
# CSVDataType.0.DBType = TEXT
################################################################################
# Various additional options
# Random number generator and max values, DataGenerator and MappingLang
################################################################################
# Seed for the random number generator, used for repeatability
RandomSeed = 2
# Number of tuples per relation, if data is generated
RepElementCount = 100
# Maximum length of strings created by the data generator
MaxStringLength = 100
# Maximum numerical value created by the data generator
MaxNumValue = 1000
# Type of data generator (currently only TrampCSV supported)
DataGenerator = TrampCSV
# Type of query generator to use (Postgres, Perm)
QueryGenerator = Postgres
# Mapping language used (currently FO tgds or SO tgds are supported)
# MappingLanguage = FOtgds
# Specify a renamer, a class that renames all generated attribute values (None, AllLowerCase)
# AttrRenamer = None
# Number of independent tuples (not created by data exchange) generated for each target relation
# TargetTableNumRows = 50
# Generate target data as source data exchanged by the generated mappings
# ExchangeTargetData = true
################################################################################
# Optional activation/deactivation of output options
################################################################################
# Generate HTML schema
OutputOption.HTMLSchemas = false
# Generate source data
OutputOption.Data = true
# Generate target data that is independent of the source data
OutputOption.EnableTargetData = false
# Generate XMLSchema schemas for the source and target schemas
OutputOption.XMLSchemas = true
# Generate an HTML description of the source to target mappings
OutputOption.HTMLMapping = false
# Generate TrampXML file, an XML based metadata format storing the generated schemas, mappings, constraints, etc.
OutputOption.TrampXML = true
# Generate a Clio conformant mapping file
OutputOption.Clio = true
################################################################################
# Optional activation/deactivation of parts of the generated Tramp XML document
################################################################################
# Generate correspondences aka schema matches
TrampXMLOutput.Correspondences = true
# Generate transformations implementing the mappings (currently only SQL)
TrampXMLOutput.Transformations = true
# Generate data
TrampXMLOutput.Data = true
# Generate a connection info (allows Tramp tools to connect to a database, e.g., to load a schema)
TrampXMLOutput.ConnectionInfo = false
# Generate functional dependencies
TrampXMLOutput.FDs = false