
Add enterprise analytics tutorial#44

Open
smithjilks wants to merge 3 commits into ultravioletrs:main from smithjilks:feat-enterprise-tutorial

Conversation

@smithjilks
Contributor

What type of PR is this?

What does this do?

Which issue(s) does this PR fix/relate to?

Have you included tests for your changes?

Did you document any new/modified features?

Notes

@smithjilks requested a review from SammyOina March 26, 2026 09:00
@smithjilks self-assigned this Mar 26, 2026
@SammyOina requested a review from fbugarski March 27, 2026 11:20
Contributor

@SammyOina left a comment


Include Prism screenshots and CLI output as well.

- **Secure Computation (aTLS)** — Attested TLS verifies the TEE hardware and software stack before any data is uploaded
- **Multi-Party Computation** — Three independent data providers each upload proprietary datasets into the same encrypted enclave
- **Real-World Data** — Uses the [UCI Online Retail II](https://www.kaggle.com/datasets/mashlyn/online-retail-ii-uci) dataset (real European e-commerce transactions) split across simulated companies
- **Enterprise Value** — Benchmark proves the consortium model outperforms any single-company model

In practice this is not guaranteed. In my local run, the consortium model achieved lower R² than at least one of the individual models. The claim should be softened.
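For reference, R² here is the coefficient of determination used to compare the models. A minimal, dependency-light sketch of the metric itself (toy values, not from the tutorial) shows why a pooled model can legitimately score below a solo model on this scale:

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Perfect predictions score 1.0; always predicting the mean scores 0.0.
y = [3.0, 1.0, 4.0, 1.0, 5.0]
print(r2(y, y))                      # 1.0
print(r2(y, [np.mean(y)] * len(y)))  # 0.0
```

Since R² is relative to the variance of each evaluation set, rankings between consortium and single-company models can flip across runs, which is why the "proves ... outperforms" wording should be hedged.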

To train the consortium model locally:

```bash
python train.py
```

This step is sensitive to the current working directory: running train.py from the repository root fails because the script expects datasets/ relative to the CWD. This is not clearly documented and makes the workflow unintuitive.
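One way to make the script CWD-independent (a sketch; the tutorial's actual path handling may differ, and `dataset_path` is a hypothetical helper) is to resolve datasets/ relative to the script file:

```python
from pathlib import Path

# Resolve datasets/ next to the script itself rather than the caller's CWD,
# so running the script from the repository root works too.
SCRIPT_DIR = Path(__file__).resolve().parent
DATASETS_DIR = SCRIPT_DIR / "datasets"

def dataset_path(name: str) -> Path:
    """Absolute path to a dataset file, independent of the CWD."""
    return DATASETS_DIR / name
```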


```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

The example references a requirements.txt, but no such file is present in the repository. This makes the setup incomplete.
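Judging only from the libraries visible in the quoted snippets (pandas, scikit-learn) and the Kaggle download step, a minimal requirements.txt might look like the following (the exact set is a guess and should be confirmed against train.py's imports):

```
pandas
scikit-learn
kaggle
```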


## Install

Fetch the data from Kaggle — [Online Retail II UCI](https://www.kaggle.com/datasets/mashlyn/online-retail-ii-uci) dataset:

The setup assumes Kaggle CLI + credentials, but does not clearly explain the need for a legacy API key (kaggle.json). This can be confusing with the newer Kaggle token system.
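For anyone following along, the legacy-key flow looks roughly like this (the token location is the kaggle CLI's documented default; the source path of kaggle.json is a placeholder, and the download ref matches the dataset linked above):

```bash
# Generate a legacy API token (kaggle.json) from kaggle.com -> Settings -> API,
# then place it where the kaggle CLI expects credentials:
mkdir -p ~/.kaggle
cp /path/to/kaggle.json ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json

# Download and extract the dataset into datasets/:
kaggle datasets download -d mashlyn/online-retail-ii-uci
unzip online-retail-ii-uci.zip -d datasets/
```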

```python
X = combined[FEATURE_COLS].values
y = combined[TARGET_COL].values

X_train, X_test, y_train, y_test = train_test_split(
```

Since this is presented as a demand forecasting example, a time-based split would likely be more appropriate than a random train/test split for evaluation. That would better reflect how the model performs on future periods rather than shuffled observations from the same overall time range.
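A chronological hold-out is easy to drop in. A sketch, assuming `combined` is a pandas DataFrame with the InvoiceDate column seen elsewhere in the diff (toy rows for illustration; the 80/20 cutoff is arbitrary):

```python
import pandas as pd

# Toy stand-in for `combined` (hypothetical values).
combined = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(
        ["2011-03-10", "2010-01-15", "2011-11-20", "2010-06-01"]),
    "Quantity": [8, 5, 2, 3],
})

# Sort chronologically and hold out the most recent ~20% of rows,
# so the test set reflects future periods rather than shuffled samples.
combined = combined.sort_values("InvoiceDate").reset_index(drop=True)
split = int(len(combined) * 0.8)
train, test = combined.iloc[:split], combined.iloc[split:]

print(train["InvoiceDate"].max() < test["InvoiceDate"].min())  # True
```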


```python
df["WeekOfYear"] = df["InvoiceDate"].dt.isocalendar().week.astype(int)

# Aggregate to monthly product-level demand
monthly = (
```

The code comment mentions "monthly product-level demand", but including WeekOfYear in the grouping means the aggregation is no longer purely monthly. This effectively splits data within the same month, so it may be worth either adjusting the grouping or clarifying the description.
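If a purely monthly aggregation is what's intended, dropping WeekOfYear from the grouping keys would fix it. A sketch with hypothetical rows (column names follow the Online Retail II schema; the tutorial's grouping keys may differ):

```python
import pandas as pd

# Hypothetical transactions spanning two calendar months.
df = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(["2010-12-01", "2010-12-09", "2011-01-04"]),
    "StockCode": ["85123A", "85123A", "85123A"],
    "Quantity": [10, 5, 7],
})

# Group on product and calendar month only (no WeekOfYear key),
# so each (product, month) pair collapses to a single row.
df["Month"] = df["InvoiceDate"].dt.to_period("M")
monthly = df.groupby(["StockCode", "Month"], as_index=False)["Quantity"].sum()

print(monthly["Quantity"].tolist())  # [15, 7]
```

With a WeekOfYear key in the groupby, the December rows above would stay split across two rows, which is what the current code does despite the "monthly" comment.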

```python
"""

import os
import sys
```

The sys import appears to be unused and can likely be removed.


@fbugarski left a comment


Thanks for the detailed example — I was able to run the full pipeline locally end-to-end.

Overall, the example works well, but there are a few areas that would benefit from clarification or adjustment:

  • Some claims around consortium performance are stronger than what is observed in practice.
  • The local workflow is sensitive to the current working directory (datasets path).
  • Setup steps (requirements and Kaggle credentials) could be made more explicit.
  • There are a few minor inconsistencies between code and documentation (e.g. the aggregation wording).

