Conversation
I think this is fine if you want to do that, but I'd ask that it not be committed to the repo. I removed it because it added extra time to rebuilds, and running PySpark within the main container should set up the jars/installation at runtime. Also, there are other Spark containers using Dockerfile.spark that include those setup steps running in the background.
It's unclear what you mean here. The resulting data in the model should be the same (i.e., the same logic for transformations, cleaning the data, etc.), but it shouldn't be Postgres-based, no.
That's a good catch ahead of time. I can go over this in a call/chat, but I updated the data model.xlsx, Figma, and ticket such that the data models should be
Over time we're trying to figure out the requirements for separating CSVModel vs. DeltaModel. So far I'm basing it on size (&lt;500k), so this would apply, but that can change/evolve.
| """Simply converts datetime's to datetime64[us] for dataframes""" | ||
| return np.datetime64(dt).astype("datetime64[us]") | ||
| """Simply converts datetime's to datetime64[ns] for dataframes""" | ||
| return np.datetime64(dt).astype("datetime64[ns]") |
It seems that nanoseconds is the default precision when pandas parses datetime columns. The microsecond precision was producing a cryptic error when performing a pandas.concat: `ValueError: Shape of passed values is (1, 7), indices imply (2, 7)`.
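A minimal sketch of the conversion described above (the helper name `to_datetime64` is hypothetical; the real function lives in this repo):

```python
from datetime import datetime

import numpy as np


def to_datetime64(dt: datetime) -> np.datetime64:
    """Convert a datetime to nanosecond precision, matching pandas' default."""
    return np.datetime64(dt).astype("datetime64[ns]")


# Nanosecond values line up with pandas' own datetime64[ns] columns,
# so frames built from these values concatenate cleanly.
value = to_datetime64(datetime(2024, 1, 2, 3, 4, 5))
print(value.dtype)  # datetime64[ns]
```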
```python
class DeltaModel(LakeHouseModel):
    FORMAT = LakeHouseModelFormat.DELTA
    STRUCTURE: StructType
```
The STRUCTURE property (type StructType) is really only used by the DeltaModel, so I moved it from LakeHouseModel to this model. I added a DTYPES property to the CSVModel that serves a similar purpose but uses the more pandas-like paradigm of column names and dtypes.
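A hedged sketch of what the CSVModel side of that split could look like; the class and column names here are illustrative only, not the repo's actual definitions:

```python
from datetime import datetime


class CSVModel:
    # Column names mapped to dtypes the way pandas expects them.
    # `datetime` is used as a sentinel for columns that should be
    # parsed as dates rather than passed to the dtype argument.
    DTYPES: dict[str, type] = {
        "id": int,
        "name": str,
        "created_at": datetime,
    }


# Example: split the mapping the same way the loader does.
date_cols = [k for k, v in CSVModel.DTYPES.items() if v == datetime]
print(date_cols)  # ['created_at']
```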
```python
params = {
    "dtype": {k: v for k, v in self.DTYPES.items() if v != datetime},
    "parse_dates": [k for k, v in self.DTYPES.items() if v == datetime],
    "usecols": cols,
}
# Ensure that any passed-in kwargs take precedence over the default params
params.update(kwargs)
```
I added these params to ensure that pandas dataframes are read from CSV with consistent data types. This uses the new DTYPES property of the CSVModel.
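To illustrate how those params behave, here is a self-contained sketch using an in-memory CSV (the DTYPES mapping and column names are made up for the example):

```python
import io
from datetime import datetime

import pandas as pd

# Hypothetical DTYPES mapping, standing in for CSVModel.DTYPES.
DTYPES = {"id": int, "name": str, "created_at": datetime}

csv_data = io.StringIO("id,name,created_at\n1,alpha,2024-01-02\n")

params = {
    # Non-datetime columns get explicit dtypes...
    "dtype": {k: v for k, v in DTYPES.items() if v != datetime},
    # ...while datetime columns go through parse_dates, which yields
    # datetime64[ns] by default.
    "parse_dates": [k for k, v in DTYPES.items() if v == datetime],
}

df = pd.read_csv(csv_data, **params)
print(df.dtypes["created_at"])  # datetime64[ns]
```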
Added a load_park.py script. I made changes to the Dockerfile in order to run Spark commands locally; I'm not sure what the implications of uncommenting these lines are, or how others have their local setup.
Additional questions: