Back Data with DataRow object #4773

mpolson64 · 2026-01-15T18:19:43Z

Summary:
NOTE: This is much slower than the dataframe-backed implementation. For clarity, I've put this naive implementation up as its own diff; the next diff hunts for speedups.

Creates a new source of truth for Data: the DataRow. The df is now a cached property that is generated on demand from these rows.
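A minimal sketch of this shape, not the actual Ax code. The `DataRow` fields and the `Data` constructor shown here are assumptions, and a list of dicts stands in for the pandas DataFrame so the example stays dependency-free:

```python
from dataclasses import asdict, dataclass
from functools import cached_property


@dataclass(frozen=True)
class DataRow:
    # Hypothetical fields; the real DataRow may carry more or different ones.
    trial_index: int
    arm_name: str
    metric_name: str
    mean: float
    se: float


class Data:
    def __init__(self, rows: list[DataRow]) -> None:
        # The rows are the source of truth.
        self._rows = tuple(rows)

    @cached_property
    def df(self) -> list[dict]:
        # Stand-in for building a DataFrame: computed from the rows on first
        # access, then cached on the instance for later accesses.
        return [asdict(row) for row in self._rows]
```

Because the view is derived rather than stored, every first access pays the full cost of converting all rows, which is one plausible source of the slowdown noted above.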

In the future, DataRow will become a Base object in SQLAlchemy, such that Data will have a SQLAlchemy relationship to a list of DataRows which live in their own table.

RFC:

1. I'm renaming sem -> se here (but keeping sem in the df for now, since this could be an incredibly involved cleanup). Do we have alignment that this is a positive change? If so, I can either start or backlog the cleanup across the codebase. cc Balandat, who I've talked about this with a while back.
2. This removes the ability for Data to contain arbitrary columns, which was added in D83682740 and is, afaik, unused. Arbitrary new columns would not be compatible with the new storage setup (it was easy in the old setup, which is why we added it), and I think we should take a careful look at how to store contextual data in the future in a structured way.
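To make the two RFC points concrete, here is a hedged sketch rather than the code in this diff. The field set, the helper names `row_to_record` and `record_to_row`, and the choice to raise `ValueError` on unknown columns are all assumptions:

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DataRow:
    # Hypothetical fields for illustration only.
    arm_name: str
    metric_name: str
    mean: float
    se: float  # stored as "se" on the row


# Row attribute -> df column name, for the names that differ.
ROW_TO_DF_COLUMN = {"se": "sem"}
DF_COLUMN_TO_ROW = {v: k for k, v in ROW_TO_DF_COLUMN.items()}


def row_to_record(row: DataRow) -> dict:
    """Convert a row to a df-style record, exposing se under the legacy name sem."""
    return {
        ROW_TO_DF_COLUMN.get(f.name, f.name): getattr(row, f.name)
        for f in fields(row)
    }


def record_to_row(record: dict) -> DataRow:
    """Build a row from a df-style record, rejecting columns with no matching field."""
    kwargs = {DF_COLUMN_TO_ROW.get(k, k): v for k, v in record.items()}
    allowed = {f.name for f in fields(DataRow)}
    unknown = set(kwargs) - allowed
    if unknown:
        # Assumed behavior: a fixed row schema has nowhere to put extra columns.
        raise ValueError(f"Unsupported columns: {sorted(unknown)}")
    return DataRow(**kwargs)
```

Keeping the rename in a single mapping would let callers that read `sem` from the df keep working while the row-level attribute is already `se`.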

Differential Revision: D90605846

meta-codesync · 2026-01-15T18:20:18Z

@mpolson64 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D90605846.


codecov-commenter · 2026-01-15T19:02:49Z

Codecov Report

❌ Patch coverage is 94.64286% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.71%. Comparing base (bf0c104) to head (6633bbd).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ax/core/data.py | 92.10% | 3 Missing ⚠️ |

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #4773   +/-   ##
=======================================
  Coverage   96.71%   96.71%           
=======================================
  Files         587      587           
  Lines       61311    61272   -39     
=======================================
- Hits        59295    59261   -34     
+ Misses       2016     2011    -5



meta-cla bot added the CLA Signed label Jan 15, 2026

meta-codesync bot added the fb-exported and meta-exported labels Jan 15, 2026

mpolson64 force-pushed the export-D90605846 branch from a2b1788 to ba5c887 on January 15, 2026 18:25

mpolson64 force-pushed the export-D90605846 branch from ba5c887 to 4d8f78e on January 16, 2026 20:14

mpolson64 force-pushed the export-D90605846 branch 2 times, most recently from 55564ea to 8b28f97 on January 20, 2026 17:47

mpolson64 added 3 commits January 20, 2026 11:46

Remove TData (facebook#4771)

abeaf58

Summary: TData was necessary when we had multiple different Data classes, but recent developments have made it unnecessary.

Reviewed By: esantorella, saitcakmak

Differential Revision: D90596942

Remove TestDataBase now that DataBase is gone (facebook#4772)

3370e68

Summary: Moved these tests into TestData, since Data is the only data-related class in Ax.

Reviewed By: saitcakmak

Differential Revision: D90605845

mpolson64 force-pushed the export-D90605846 branch from 8b28f97 to 6633bbd on January 20, 2026 19:46
