
Add Apache Spark converter module #93

Open: jbonofre wants to merge 1 commit into main from jbonofre/spark-converter


@jbonofre
Collaborator

Summary

  • Adds a new converters/spark/ Java/Maven module that reads OSI semantic model YAML files and generates PySpark code
  • Generated code includes dataset loaders (spark.table() + computed columns via F.expr()), join helpers from relationship definitions, metric aggregation functions via spark.sql(), and a convenience load_all_datasets() function
  • Supports multi-dialect expression selection (ANSI_SQL, SNOWFLAKE, DATABRICKS) with fallback
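A minimal sketch of the multi-dialect selection described above. It assumes that per-dialect expressions are held in a map and that ANSI_SQL is the fallback when the requested dialect has no entry; the `Dialect` enum and `DialectSelector` class are hypothetical, not the PR's actual types.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical dialect names, mirroring the ones listed in the PR summary.
enum Dialect { ANSI_SQL, SNOWFLAKE, DATABRICKS }

// Hypothetical helper: the fallback to ANSI_SQL is an assumption, not the PR's code.
final class DialectSelector {
    private DialectSelector() {}

    /**
     * Returns the expression written for {@code requested}; if there is none,
     * falls back to the ANSI_SQL expression; empty if neither exists.
     */
    static Optional<String> select(Map<Dialect, String> expressions, Dialect requested) {
        String exact = expressions.get(requested);
        if (exact != null) {
            return Optional.of(exact);
        }
        // Fallback: the portable ANSI_SQL variant, if the model provides one.
        return Optional.ofNullable(expressions.get(Dialect.ANSI_SQL));
    }
}
```

Returning `Optional` makes the "no usable expression" case explicit, so the caller can decide whether to skip the metric or fail the conversion.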

Test plan

  • Unit tests pass (mvn clean test — 4 tests covering parsing, field extraction, code generation, and multi-dialect support)
  • End-to-end validation against examples/tpcds_semantic_model.yaml produces valid PySpark output
  • Review generated PySpark code for correctness on an actual Spark cluster

Introduces a Java/Maven converter under converters/spark/ that parses
OSI semantic model YAML files and generates PySpark code including
dataset loaders, join helpers, and metric functions.
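To illustrate the kind of dataset loader described here (`spark.table()` plus computed columns via `F.expr()`), the following is a hypothetical emitter. `LoaderEmitter`, `emitLoader`, and the `load_<dataset>` function naming are assumptions for illustration. The emitted Python also assumes the generated module imports `from pyspark.sql import functions as F` and that dataset names are valid Python identifiers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a PySpark loader emitter; not the PR's actual generator.
final class LoaderEmitter {
    private LoaderEmitter() {}

    /** Wraps a value in a Python double-quoted literal, escaping backslashes and quotes. */
    static String pyString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * Emits a Python function that loads {@code table} with spark.table()
     * and adds each computed column with df.withColumn(name, F.expr(sql)).
     */
    static String emitLoader(String datasetName, String table, Map<String, String> computedColumns) {
        StringBuilder sb = new StringBuilder();
        sb.append("def load_").append(datasetName).append("(spark):\n");
        sb.append("    df = spark.table(").append(pyString(table)).append(")\n");
        for (Map.Entry<String, String> col : computedColumns.entrySet()) {
            sb.append("    df = df.withColumn(").append(pyString(col.getKey()))
              .append(", F.expr(").append(pyString(col.getValue())).append("))\n");
        }
        sb.append("    return df\n");
        return sb.toString();
    }
}
```

Passing a `LinkedHashMap` keeps the computed columns in declaration order, which matters if one computed column refers to another.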
@jbonofre
Collaborator Author

This is a very simple converter for Apache Spark, showing the kind of code we can have in the repo.

@jbonofre requested a review from khush-bhatia on March 23, 2026 12:43
Comment on lines +7 to +10
```java
/**
 * Java representation of an OSI semantic model parsed from YAML.
 */
public class OsiModel {
```
@khush-bhatia (Member), Mar 24, 2026


The Java representation code seems to be duplicated in both the spark and polaris modules. How about we follow this directory structure instead:

```
core-spec/
├── spec.yaml
├── spec.json
└── spec.proto
```

And then use the proto version of the spec to dynamically generate the Java/Python/Go bindings for it?

I think dbt was also suggesting doing that. Many applications can also directly consume the proto version of the spec.

For the purpose of this PR, I think it is fine to add the Java bindings, but I would still keep them in one folder.

@jbonofre
Collaborator Author


I did it this way because the other converters (the Python-based ones, for instance) live directly in the converters module.

I'm happy to open a preliminary PR to reorganize the modules with multi-language bindings.

```java
public class OsiModel {

    private String version;
    private List<SemanticModel> semanticModels = new ArrayList<>();
```
@khush-bhatia
Member


Also, there is no parent OSI model that holds a list of semantic_models.

The OSI spec describes a single semantic_model.
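A sketch of what a single-model root could look like with Java records (Java 16+). The record and field names are assumptions drawn from the concepts in the PR summary (datasets, computed fields, relationships, metrics), not from the OSI spec itself.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: the root object is ONE semantic model, so there is no
// wrapper class holding a list of semantic models. All names are assumptions.
record DatasetField(String name, String expression) {}

record Dataset(String name, String source, List<DatasetField> fields) {
    Dataset {
        fields = List.copyOf(fields); // immutable copy once parsed
    }
}

record Relationship(String name, String fromDataset, String toDataset) {}

record Metric(String name, String expression) {}

record SemanticModel(String name,
                     List<Dataset> datasets,
                     List<Relationship> relationships,
                     List<Metric> metrics) {
    SemanticModel {
        // Defensive copies keep the parsed model immutable.
        datasets = List.copyOf(datasets);
        relationships = List.copyOf(relationships);
        metrics = List.copyOf(metrics);
    }

    /** Finds a dataset by name; empty if the model does not define it. */
    Optional<Dataset> dataset(String datasetName) {
        return datasets.stream()
                .filter(d -> d.name().equals(datasetName))
                .findFirst();
    }
}
```

A YAML parser would then map the document root directly onto `SemanticModel` rather than onto an `OsiModel` holding a list.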

Comment on lines +194 to +195
```java
sb.append(" return spark.sql(\"SELECT ").append(escapeString(expr))
        .append(" AS ").append(metric.getName()).append("\")\n\n");
```
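For reference, here is a minimal reproduction of what these two lines emit. `escapeString` is reimplemented under the assumption that it escapes backslashes and double quotes, the four-space indentation before `return` is assumed (the excerpt shows a single space), and the metric name and expression are hypothetical TPC-DS-style values.

```java
// Hypothetical reproduction of the reviewed string-building code.
final class MetricSnippet {
    private MetricSnippet() {}

    // Assumed behavior of the PR's escapeString: escape backslashes, then double quotes.
    static String escapeString(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Same concatenation as the reviewed lines, with the metric name passed in directly.
    static String emit(String metricName, String expr) {
        StringBuilder sb = new StringBuilder();
        sb.append("    return spark.sql(\"SELECT ").append(escapeString(expr))
          .append(" AS ").append(metricName).append("\")\n\n");
        return sb.toString();
    }
}
```

With `total_sales` and `SUM(ss_net_paid)`, the emitted line is `return spark.sql("SELECT SUM(ss_net_paid) AS total_sales")`. The statement has no FROM clause, so Spark has no relation from which to resolve `ss_net_paid`.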
@khush-bhatia (Member), Mar 24, 2026


Hey @jbonofre, as far as I know, Spark does not have native support for querying a semantic model.

This code generates some SQL, but I don't think that query will just work in Spark. We need to align on the query semantics in OSI first, and then implement a query rewriter in Spark.

I suggest we keep code generation out of scope for now. Instead, could we implement the logic to load the semantic model into the Spark catalog?
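One possible reading of "load the semantic model into the Spark catalog" is to register each dataset, with its computed columns, as a view. The sketch below only builds the DDL string; a real integration could pass it to `SparkSession.sql(...)`. `CatalogViewDdl` and the view naming are hypothetical. Metrics are deliberately left out because their query semantics are the open question raised above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: one way to expose a dataset and its computed columns as a
// catalog view. This is an assumption, not an agreed OSI-to-Spark mapping.
final class CatalogViewDdl {
    private CatalogViewDdl() {}

    /** Backtick-quotes a single-part identifier, doubling embedded backticks. */
    static String quoteIdent(String ident) {
        return "`" + ident.replace("`", "``") + "`";
    }

    /**
     * Builds CREATE OR REPLACE VIEW DDL selecting all source columns plus each
     * computed column. sourceTable is inserted verbatim (it may be multi-part),
     * and expressions are inserted verbatim, which assumes a trusted model file.
     */
    static String createView(String viewName, String sourceTable, Map<String, String> computedColumns) {
        StringJoiner select = new StringJoiner(", ");
        select.add("*");
        for (Map.Entry<String, String> col : computedColumns.entrySet()) {
            select.add(col.getValue() + " AS " + quoteIdent(col.getKey()));
        }
        return "CREATE OR REPLACE VIEW " + quoteIdent(viewName)
                + " AS SELECT " + select + " FROM " + sourceTable;
    }
}
```

Registering views keeps the existing Spark SQL engine in charge of query planning. The semantic layer only adds named, reusable column definitions until the query semantics are agreed.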

@jbonofre
Collaborator Author


That's reasonable. I will update accordingly.

@khush-bhatia (Member) left a comment


Left some comments, thanks.


2 participants