2. Getting Started

Maxime Costalonga edited this page Oct 21, 2025 · 16 revisions

Marketplace

The marketplace serves as a centralized platform where users can share their datasets and models with others in the SEDIMARK community. The present documentation walks users through the process of creating an identity, browsing the catalogue, and managing offerings within the SEDIMARK Marketplace.

Creating an identity

The first step to interact with the SEDIMARK Marketplace is to create an identity. This identity will be used to publish and consume offerings within the marketplace. A wizard is available to guide users through the creation of an identity, and is accessible via the Register button on the right of the marketplace navigation bar.


This wizard consists of three steps:

  1. an introduction to the prerequisites for creating an identity (no user action required),
  2. a form to fill in the identity details (name, description, URLs to public services, etc.),
  3. a summary of the created identity, featuring the user's Decentralized Identifier (DID) and Verifiable Credential (VC).

The form in step 2 requires the following fields to be filled in:

  • a username: this will be the public name of the identity in the marketplace.
  • the self-listing URL: this URL should point to the list of published offerings by the user, i.e. the offerings endpoint of the user's offering manager instance. It is used by catalogue coordinators to index the user's offerings. It must be publicly accessible.
  • the connector data space protocol URL: this URL should point to the user's connector Data Space Protocol endpoint. It is used by connectors to exchange information and proceed to transactions. It must be publicly accessible.
  • the profile server URL: this URL should point to the user's profile server endpoint. It is used by other users to fetch information about the user. It must be publicly accessible.

This information will be stored in the user's DID document. All other information (names, website, profile picture, etc.) is optional and stored only in the user's profile server, which is hosted on the user's premises.

Once completed and submitted, the form triggers the creation of a DID and a VC for the user. These credentials are stored in the user's DLT booth on the user's premises, so no action is required from the user to manage them. For this reason, the DLT booth must NOT be exposed to the public internet. In the final step of the wizard, the user is presented with their DID and VC.
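As an illustration of where the three URLs from step 2 end up, a DID document might carry them as service entries roughly like this (field names, service types, and the DID itself are placeholders for this sketch, not the exact SEDIMARK schema):

```python
# Illustrative sketch of a DID document carrying the three public
# service endpoints collected in step 2 of the wizard. Field names,
# service types, and the DID are hypothetical placeholders.
did_document = {
    "id": "did:example:123456789abcdef",  # the user's DID (placeholder)
    "service": [
        {
            "id": "did:example:123456789abcdef#offerings",
            "type": "SelfListing",          # assumed type name
            "serviceEndpoint": "https://example.org/offerings",
        },
        {
            "id": "did:example:123456789abcdef#dsp",
            "type": "DataSpaceProtocol",    # assumed type name
            "serviceEndpoint": "https://example.org/dsp",
        },
        {
            "id": "did:example:123456789abcdef#profile",
            "type": "ProfileServer",        # assumed type name
            "serviceEndpoint": "https://example.org/profile",
        },
    ],
}

# All three endpoints must be publicly accessible.
endpoints = [s["serviceEndpoint"] for s in did_document["service"]]
assert all(url.startswith("https://") for url in endpoints)
```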

Offering Catalogue

Browsing the catalogue

The SEDIMARK Marketplace catalogue is the central place to discover datasets and models shared by the SEDIMARK community. It can be accessed in two ways:

  • via the Catalogue button in the marketplace navigation bar, from any page of the marketplace,
  • via the search bar in the home page.

The catalogue features a search bar to filter offerings by their title or description. Searches can be further refined by filtering by keywords/tags or providers. Each offering is presented as a card, featuring its name, description, creation date, tags, and the provider's username. At the end of the search results, users can review a list of recommended offerings based on their search.

Consuming an offering

Selecting an offering card will redirect the user to the offering details page, where more information about the offering is presented, including:

  • the offering's metadata (name, description, creation date, tags, provider, etc.),
  • the offering's provider information (username, profile picture, description, website, etc.),
  • a button to initiate the negotiation process to consume the offering.

At this stage, clicking the Negotiate button automatically creates a contract agreement with the offering's provider.


Managing offerings and contracts

Any participant in the SEDIMARK Marketplace can be both a provider and a consumer of offerings. As a provider, users can publish their datasets and models to share them with the community. As a consumer, users can browse the catalogue and consume offerings published by others.

Publishing an offering

The offering publication form can be accessed via the Publish button in the marketplace navigation bar. Upon opening the form, users are presented with two options:

  • creating an offering based on an existing asset (dataset or model): this option allows users who have already created datasets or models (for instance in Mage AI) to prefill the offering publication form with the asset's metadata.
  • creating a new offering from scratch.

To create an offering, users must provide the following information:

  • the metadata of the offering: this includes the offering's name, description, tags/keywords, etc.
  • the location of the asset: this is the URL from which the dataset or model can be fetched. It should be accessible via an HTTP GET request; however, it does not need to be publicly accessible, as the SEDIMARK Connector handles secure data transfer to consumers. It only needs to be accessible to the user's connector.
  • the license and usage policy: this section allows users to specify the terms under which the offering can be used by others. It can include a link to a license document and/or a time-limited usage policy.
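Since the asset location only needs to be reachable by the user's own connector, a quick sanity check before publishing is to issue a plain GET from the connector's host. A minimal sketch (the URL in the comment is a hypothetical internal endpoint):

```python
import urllib.request
import urllib.error

def asset_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the asset URL answers an HTTP GET.

    Run this from the machine (or network) hosting your connector:
    the URL does not need to be public, only reachable by the connector.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, ValueError):
        return False

# Example (hypothetical internal URL):
# asset_reachable("http://data.internal.local/datasets/temperature.csv")
```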

Once submitted, the offering will be published in the user's offering manager instance. It may take some time (~5 min) for catalogue coordinators to index the new offering and make it visible in the marketplace catalogue.

Managing provided offerings and contracts

Users can manage their published offerings via the Dashboard button in the marketplace navigation bar. The dashboard features an Offerings section, where users can review their published offerings and delete them if needed, effectively removing them from the marketplace catalogue.


Similarly, the Contracts section of the dashboard allows users to review their active contract agreements, both as providers and consumers. Selecting an offering expands it to show the recent data transfers associated with the contract, along with their status (in progress, completed, failed, etc.). A more general overview of all the transfers the user is involved in can be found in the Overview section of the dashboard.


Acquiring data from consumed offerings


Users can access the data asset from an offering they have consumed by selecting it in the Consumed section of the Contracts Dashboard. Selecting an offering expands it to reveal a Start Transfer button. Clicking this button opens a dialog, where the user is presented with two methods to acquire the data:

  • the push method: the user provides a URL to which the data will be pushed. This URL should point to an endpoint on the user's premises that is accessible by their connector; it does not have to be publicly accessible.
  • the pull method: the user requests access to the data asset directly from the provider's connector. The provider's connector then makes the data available for download at a secure URL and issues a token to the user, which they can use to fetch the data.
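The pull method boils down to an authenticated GET against the URL exposed by the provider's connector. A minimal sketch, assuming the token travels as a bearer token in the Authorization header (the header name and the values below are assumptions; check your connector's documentation for the exact convention):

```python
import urllib.request

def build_pull_request(data_url: str, token: str) -> urllib.request.Request:
    """Build the authenticated GET used by the pull method.

    Assumption: the connector expects the token as a bearer token in
    the Authorization header.
    """
    return urllib.request.Request(
        data_url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )

# Hypothetical URL and token issued by the provider's connector:
req = build_pull_request("https://provider.example/transfer/abc", "eyJhbGciOi...")
assert req.get_header("Authorization") == "Bearer eyJhbGciOi..."
```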

Toolbox


Figure 1. SEDIMARK Toolbox architecture

The toolbox is the intelligence part of the SEDIMARK platform, containing all the tools and components used to create, manipulate, and work with standardized datasets, as illustrated in the figure above. Besides datasets, the tools inside the Toolbox can be used to create, train, and make predictions with AI models that are stored and managed using a model registry (MLflow).

Pipeline Management: The Toolbox uses automated pipelines to orchestrate workflows across all components:

  • Data Pipelines: Handle data ingestion, preprocessing, validation, and standardization
  • ML Pipelines: Manage feature engineering, model training, validation, and deployment
  • Inference Pipelines: Execute real-time and batch predictions

These pipelines ensure seamless integration between tools, datasets, and models, providing automated dependency management, version control, and reproducible workflows throughout the entire process lifecycle.

Creating pipelines

From Mage AI

The documentation on how to create and interact with a pipeline from Mage AI can be found at https://docs.mage.ai/design/data-pipeline-management.

From the Orchestrator UI

Configuring a pipeline for the Orchestrator UI

To provide a better and more intuitive user experience in the Orchestrator UI, MageAI pipelines are designed with a simpler single-flow structure. Additional complexity can instead be introduced through chained subsequent pipelines, as described in the SEDIMARK Generic pipeline section below.

Below are two examples of pipeline flows: a compliant single-flow pipeline (Figure 2) and a non-compliant multi-flow pipeline (Figure 3).


Figure 2. ✅ Pipeline compliant with Orchestrator UI


Figure 3. ❌ Pipeline not compliant with Orchestrator UI

Pipelines tagging

To achieve compliance between MageAI pipelines and Orchestrator UI workflows, a set of specifications was defined. One important specification concerns the tagging of MageAI pipelines, which makes workflows visible in the Orchestrator UI and groups them into categories based on their scope and purpose.

The available identification tags for MageAI pipelines that are compatible with Orchestrator are presented in the following table:

Tag Name            Description
data_preprocessing  Tag for subsequent (child) preprocessing pipelines of the generic pipeline
data_manipulation   Tag for subsequent (child) manipulation pipelines of the generic pipeline
train               Training pipelines
predict             Inference pipelines
processing          Processing pipelines
streaming           Pipelines that run continuously and stream data, used by federated learning pipelines

Disclaimer! MageAI pipelines that do not contain any of the identification tags will not be shown in the Orchestrator UI. The process of creating and tagging a pipeline is illustrated in the following MageAI pop-up screenshot.

Figure 4. Pipeline tagging at the creation stage
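The visibility rule above can be sketched as a simple tag filter; the tag set comes from the table, while the pipeline records below are simplified stand-ins for MageAI pipeline metadata:

```python
# Sketch of the tag-based filtering applied by the Orchestrator UI.
# RECOGNIZED_TAGS reproduces the identification tags from the table.
RECOGNIZED_TAGS = {
    "data_preprocessing", "data_manipulation",
    "train", "predict", "processing", "streaming",
}

def visible_in_orchestrator(pipelines):
    """Keep only pipelines carrying at least one recognized tag."""
    return [p for p in pipelines if RECOGNIZED_TAGS & set(p.get("tags", []))]

pipelines = [
    {"name": "anomaly_annotator", "tags": ["data_preprocessing"]},
    {"name": "scratchpad", "tags": []},          # hidden: no tags
    {"name": "demo", "tags": ["experimental"]},  # hidden: unknown tag
]
assert [p["name"] for p in visible_in_orchestrator(pipelines)] == ["anomaly_annotator"]
```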

Configuration variables

The configuration of Orchestrator UI workflow runs is done through variables. Variable definitions help users configure and control the execution of their workflows at the block level. In Mage, variables for a pipeline are defined in the pipeline's metadata.yaml file, which is found in the file explorer under the pipelines directory, as shown in the following screenshot:


Figure 5. metadata.yaml file location for pipeline anomaly_annotator

The metadata.yaml configuration file is automatically created by Mage after a new pipeline is instantiated and contains information that describes the pipeline structure. The anomaly_annotator pipeline example comprises three blocks (Data loader, Transformer, and Data exporter), where each block contains the custom variable definitions under the configuration attribute, as shown in the following code snippet:

blocks:
  - all_upstream_blocks_executed: true
    color: null
    configuration:
      attrs:
        default: https://vocab.sedimark.io/temperature
        description: Filtering attributes to filter timeseries for the selected entity
        type: string
      end_time:
        default: null
        description: The end date of the time interval.
        format: YYYY-MM-DDThh:mm:ssZ
        type: date
      entity_id:
        default: urn:ngsi-ld:Sedimark:Temperature:123456789
        description: This is the ID of the entity that is stored in the NGSI-LD Broker
        type: string
      start_time:
        default: '2022-11-16T07:00:00Z'
        description: The start date of the time interval.
        format: YYYY-MM-DDThh:mm:ssZ
        type: date
      get_data_from_broker:
        default: true
        description: If true, the data will be fetched from the NGSI-LD Broker.
        type: boolean
    downstream_blocks:
      - anomaly_detection
      - histogram_for_broker_loader_1707813944696
    executor_config: null
    executor_type: local_python
    has_callback: false
    language: python
    name: broker_loader
    retry_config: {}
    status: executed
    timeout: null
    type: data_loader
    upstream_blocks: []
    uuid: broker_loader
  - all_upstream_blocks_executed: true
    color: null
    configuration:
      threshold_type:
        default: AUCP
        description: This is the threshold type for the anomaly detection algorithm.
        type: drop_down
    downstream_blocks:
      - export_anomalies
    executor_config: null
    executor_type: local_python
    has_callback: false
    language: python
    name: anomaly_detection
    retry_config: {}
    status: executed
    timeout: null
    type: transformer
    upstream_blocks:
      - broker_loader
    uuid: anomaly_detection
  - all_upstream_blocks_executed: true
    color: null
    configuration: {}
    downstream_blocks: []
    executor_config: null
    executor_type: local_python
    has_callback: false
    language: python
    name: export_anomalies
    retry_config: {}
    status: failed
    timeout: null
    type: data_exporter
    upstream_blocks:
      - anomaly_detection
    uuid: export_anomalies
cache_block_output_in_memory: false
callbacks: []
concurrency_config: {}
conditionals: []
created_at: '2023-11-14 11:26:30.357670+00:00'
data_integration: null
description: data_preprocessing
executor_config: {}
executor_count: 1
executor_type: null
extensions: {}
name: anomaly_annotator
notification_config: {}
remote_variables_dir: null
retry_config: {}
run_pipeline_in_one_process: false
settings:
  triggers:
    save_in_code_automatically: true
spark_config: {}
tags:
  - data_preprocessing
type: python
uuid: anomaly_annotator
variables_dir: /home/src/mage_data/default_repo
widgets: []

Variable definitions are represented as mappings in the Mage configuration file, using attributes such as default, description, type, and format. Furthermore, there is currently support for 10 variable types, from which users can choose:

Variable type       Description
string              Simple text input for general-purpose string input
secret              Password input
number              Number input
multiple_selection  Drop-down allowing multiple selections
drop_down           Drop-down for a single selection
date                Date input
boolean             True or False value
array               A list of values
trigger             Child pipeline trigger reference
dictionary          Dictionary of key-value pairs
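As an illustration, a small validator over such variable mappings might check the declared type and required attributes. The rules below are inferred from the examples in this section (every variable carries a type and a description, and selection types list their values); they are a sketch, not an official schema:

```python
# Sketch of a validator for the variable mappings described above.
# SUPPORTED_TYPES reproduces the 10 types from the table; the
# per-type rules are assumptions based on the examples below.
SUPPORTED_TYPES = {
    "string", "secret", "number", "multiple_selection", "drop_down",
    "date", "boolean", "array", "trigger", "dictionary",
}

def validate_variable(name: str, spec: dict) -> list[str]:
    """Return a list of problems found in one variable definition."""
    problems = []
    if spec.get("type") not in SUPPORTED_TYPES:
        problems.append(f"{name}: unsupported type {spec.get('type')!r}")
    if "description" not in spec:
        problems.append(f"{name}: missing description")
    if spec.get("type") in {"drop_down", "multiple_selection"} and "values" not in spec:
        problems.append(f"{name}: selection types need a 'values' list")
    return problems

spec = {"type": "drop_down", "description": "Threshold type", "default": "AUCP"}
assert validate_variable("threshold_type", spec) == [
    "threshold_type: selection types need a 'values' list"
]
```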

Examples of configuration variable definitions

  1. string

    The string type needs to contain a description of what the variable represents. It may also contain a regex entry specifying the pattern the variable's value must match, which is used to validate user input in the Orchestrator UI.

    Example:

    string_name:
      type: string
      description: About the variable
      default: ''
      regex: '^.*$'
  2. secret

    The secret type needs to contain the variable description; it renders a password input on the user interface.

    Example:

    secret_name:
      type: secret
      description: What this secret is about
  3. number

    The number type must specify a description and may optionally include a default value and a range interval for the input.

    Example:

    number_name:
      type: number
      range: [0, 10]
      description: The description
      default: 0
  4. drop_down

    The drop_down type specifies a list of values that can be selected from.

    Example:

    drop_down_name:
      type: drop_down
      description: The description
      default: value1
      values:
        - value1
        - value2
  5. multiple_selection

    The multiple_selection type is similar to the drop_down type, with additional support for selecting multiple values.

    multiple_selection_name:
      type: multiple_selection
      description: The description
      default: value1
      values:
        - value1
        - value2
  6. date

    The date type is used to specify a date following the given format.

    date_name:
      type: date
      description: The description
      default: 2025-01-02
      format: "YYYY-MM-DD"
    
  7. array

    The array type is used to enumerate multiple values.

    array_name:
      type: array
      description: The description.
  8. trigger

    The trigger type enables selecting a child Mage pipeline for execution; the available pipelines are filtered using the tag attribute.

    trigger_name:
      default: data_preprocessing_test # The actual pipeline_id to run by default.
      description: Trigger for the data preprocessing pipeline
      tag: data_preprocessing # The tag to specify the type of pipeline
      type: trigger
  9. dictionary

    The dictionary type is used to input key-value pairs.

    dictionary_name:
      type: dictionary
      description: The description.

SEDIMARK Generic pipeline

To ensure faster development and seamless compatibility across multiple use cases, a generic Mage pipeline was developed. The Generic pipeline enables interoperability between various data preprocessing, data manipulation, and data postprocessing tasks, allowing for a configurable execution process inside the Orchestrator UI application. Moreover, the generic pipeline ensures compatibility with the SEDIMARK ecosystem by providing the necessary tools for handling NGSI-LD assets, supporting both consumers and producers in leveraging their own resources. The architecture of the Generic pipeline is depicted in the following figure:

Figure 6. Generic pipeline architecture

To ensure compatibility between the generic pipeline (parent) and the subsequently executed pipelines (children), the architecture is centered around a standardized data flow that uses pandas DataFrame objects, with specialized Data Interoperability blocks serving as conversion layers between the NGSI-LD format and DataFrame structures, enabling bidirectional data transformation. The system supports data processing child pipelines in three primary categories:

  • Data Preprocessing - which includes data cleaning, transformation, anonymization, feature engineering, time series preprocessing, and data validation operations
  • Data Manipulation - encompassing AI model training, inference, data aggregation and summarization, and KPI computation capabilities
  • Data Postprocessing - includes all the operations necessary to either apply the inverse of the processing stage or prepare the data for export back to the broker.

Furthermore, another important feature is the external data source integration which enables users to enrich their datasets and metadata information beyond marketplace offerings, while the integrated MLOps component provides essential model provisioning, storage, versioning, and lifecycle management throughout the pipeline execution. The output DataFrame maintains detailed variable information that serves as the foundation for generating comprehensive data asset metadata, facilitating the creation of new marketplace offerings based on processed results. This architecture ensures that regardless of pipeline complexity, users can leverage both the advanced capabilities of MageAI and the simplified interface of the Orchestrator UI according to their technical requirements.

To support all of this, the SEDIMARK Toolbox includes a built-in generic execution pipeline together with pipelines for data preprocessing, data manipulation, and data postprocessing operations; some of the techniques used by the child pipelines are shown on the right side of the architecture diagram.
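The DataFrame-centric flow can be illustrated with a minimal conversion sketch. The real Data Interoperability blocks produce pandas DataFrames; plain dicts are used here to keep the sketch dependency-free, and the entity below is a simplified NGSI-LD shape:

```python
# Minimal sketch of the NGSI-LD -> tabular conversion performed by the
# Data Interoperability blocks. The real blocks produce pandas
# DataFrames; rows-as-dicts stand in for them here, and the entity is
# a simplified NGSI-LD Property/observedAt shape.
entity = {
    "id": "urn:ngsi-ld:Sedimark:Temperature:123456789",
    "type": "Temperature",
    "temperature": [
        {"type": "Property", "value": 21.5, "observedAt": "2022-11-16T07:00:00Z"},
        {"type": "Property", "value": 22.1, "observedAt": "2022-11-16T08:00:00Z"},
    ],
}

def entity_to_rows(entity: dict, attr: str) -> list[dict]:
    """Flatten one NGSI-LD attribute into rows (one per observation)."""
    return [
        {"entity_id": entity["id"], "time": obs["observedAt"], attr: obs["value"]}
        for obs in entity[attr]
    ]

rows = entity_to_rows(entity, "temperature")
assert rows[0] == {
    "entity_id": "urn:ngsi-ld:Sedimark:Temperature:123456789",
    "time": "2022-11-16T07:00:00Z",
    "temperature": 21.5,
}
```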

Running pipelines

Using the API

To run a pipeline directly from Mage AI using the API, you first need to create a trigger for the desired pipeline, which can then be called with the specified configuration for that pipeline run.

To do this, go to the pipeline and, in the left menu, click on Triggers:


Figure 7. Mage AI pipeline view

After that, create the trigger using the New Trigger button:


Figure 8. Creating a trigger

For a new trigger, the type must be set to API. The name and description can be specified, and the API endpoint that starts the pipeline is shown as in the figure below:


Figure 9. Configuring the trigger

Enabling the trigger:


Figure 10. Activating the trigger for request

Making an API call to start the pipeline:


Figure 11. API call to the trigger to start a new run for the pipeline
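The call in Figure 11 can be sketched as follows. The endpoint shape follows Mage's API-trigger convention; the host, schedule id, and token below are placeholders to be replaced with the values shown in your trigger's API endpoint field:

```python
import json
import urllib.request

# Placeholders: copy the real values from the trigger's "API endpoint"
# field in the Mage UI. The port is Mage's default and may differ.
MAGE_HOST = "http://localhost:6789"
TRIGGER_URL = f"{MAGE_HOST}/api/pipeline_schedules/1/pipeline_runs/<token>"

payload = {
    "pipeline_run": {
        # Runtime variables override the defaults from metadata.yaml.
        "variables": {"entity_id": "urn:ngsi-ld:Sedimark:Temperature:123456789"}
    }
}
body = json.dumps(payload).encode()
req = urllib.request.Request(
    TRIGGER_URL, data=body,
    headers={"Content-Type": "application/json"}, method="POST",
)

# Sending the request requires a live Mage instance and an enabled trigger:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```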

Checking the status in Mage AI:


Figure 12. Checking the status in the UI

From the Orchestrator UI

Exporting/Importing pipelines

Interacting with MLFlow

MLflow serves as the primary model registry within the SEDIMARK Toolbox. It stores all models created through the Toolbox, along with their associated metadata—such as performance metrics, plots, and the actual model files.

Once deployed, MLflow can be accessed at http://localhost:5000/, using the default credentials specified in the .env file (or custom ones if they were updated).

The MLflow UI provides two key sections:

  • Experiment page – Displays all model runs. A run corresponds to the process of creating a model through MLflow after training.

Figure 13. MLflow runs page

By clicking on a run, we can see the metrics and metadata saved for a specific training epoch of the model.


Figure 14. Showing the metadata saved for an epoch in the UI

  • Registered models page – Shows the verified models that are recognized as the most accurate and reliable for their intended tasks.

Figure 15. All the registered models and their current version


Figure 16. Showing the information of a registered model version
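As a sketch of how the registry can also be queried programmatically, the snippet below uses MLflow's Python client to look up the newest version of a registered model. The tracking URI matches the default deployment above; the model name is whatever appears on the Registered models page, and running it requires `mlflow` installed and a reachable server:

```python
TRACKING_URI = "http://localhost:5000"  # default from the .env file

def latest_registered_version(model_name: str):
    """Fetch the newest version of a registered model.

    Requires `pip install mlflow` and a reachable tracking server;
    `model_name` must match a name on the Registered models page.
    """
    import mlflow
    from mlflow.tracking import MlflowClient

    mlflow.set_tracking_uri(TRACKING_URI)
    client = MlflowClient()
    versions = client.search_model_versions(f"name = '{model_name}'")
    return max(versions, key=lambda v: int(v.version)) if versions else None

# Example (hypothetical model name):
# latest_registered_version("anomaly_detector")
```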