Anomaly tools changes by kiihne-noaa · Pull Request #133 · NOAA-GFDL/epmt

kiihne-noaa · 2026-03-10T17:39:16Z

Describe your changes

Work-in-progress changes to epmt_query designed to enable anomaly detection. Focus on allowing saving of models separate from creation, active editing of already created models, and management of models in DB.

Issue ticket number, link (if applicable)

Checklist

A loose guide to provide structure for contributions

the code runs
the code is readable
the code is commented
there are no additional failures in GUARDED pipeline tasks
a new test was written (if applicable)
new instructions/doc was written (if applicable)
I ran pylint and attempted to implement some of it's feedback

Copilot

Pull request overview

Work-in-progress updates to epmt_query to support anomaly detection workflows by enabling saving reference models separately from creation and improving DB model management.

Changes:

Added a new save_refmodel() helper for persisting reference models.
Updated create_refmodel() to call save_refmodel() instead of directly creating ORM objects.
Added a new test_save_refmodel.py script intended to exercise refmodel creation/saving.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File	Description
test_save_refmodel.py	Adds a script-like “test” that creates/deletes a refmodel and calls `save_refmodel()`.
src/epmt/epmt_query.py	Alters refmodel creation flow and introduces `save_refmodel()` for DB persistence.

Copilot · 2026-03-12T16:02:45Z

@@ -1357,9 +1361,18 @@ def create_refmodel(jobs, name=None, tag={}, op_tags=[],
        return r.id
    r_dict = orm_to_dict(r, with_collections=True)
    return pd.Series(r_dict) if fmt == 'pandas' else r_dict
-
+'''


create_refmodel() no longer returns the created reference model (the prior return logic is now inside a triple-quoted string). This is a behavior-breaking change: callers will now receive None instead of the created model/id/dict. Make create_refmodel() return the result of save_refmodel(...) (and remove the triple-quoted block rather than leaving it as a stray string literal statement).

Copilot · 2026-03-12T16:02:45Z

+def save_refmodel(ReferenceModel, jobs, computed, info_dict, enabled, name=None, tags={}, op_tags=[], fmt='dict'):
+    r = orm_create(ReferenceModel, jobs=jobs, computed=computed, info_dict = info_dict, enabled=enabled, name=name, tags=tags, op_tags=op_tags, fmt=fmt)
+    orm_commit()
+    if fmt == 'orm':
+        return r
+    elif fmt == 'terse':
+        return r.id
+    r_dict = orm_to_dict(r, with_collections=True)
+    return pd.Series(r_dict) if fmt == 'pandas' else r_dict


A few API/implementation issues in save_refmodel():\n- It uses mutable default arguments (tags={}, op_tags=[]), which can leak state between calls. Use None defaults and assign inside the function.\n- The first parameter is named ReferenceModel, which reads like the ORM class rather than an argument; it’s clearer as model_cls (or omit it and reference ReferenceModel directly if it’s always the same).\n- Consider decorating with @db_session (like create_refmodel) so save_refmodel() is safe to call directly outside an existing session.\n- orm_create(..., fmt=fmt) is potentially problematic if orm_create doesn’t accept fmt (the prior code handled formatting after orm_create). If orm_create doesn’t support it, remove fmt=fmt and keep formatting in save_refmodel().

Copilot · 2026-03-12T16:02:46Z

+
+feature_list = [ 'duration', 'rchar', 'syscr', 'syscw', 'wchar', 'cstime', 'cutime', 'majflt', 'cpu_time', 'minflt', 'rssmax', 'cmajflt','cminflt', 'inblock', 'outblock', 'usertime', 'num_procs', 'starttime', 'vol_ctxsw', 'read_bytes', 'systemtime', 'time_oncpu', 'timeslices', 'invol_ctxsw', 'write_bytes', 'time_waiting', 'cancelled_write_bytes']
+random_jobs =  eq.get_jobs(limit = 100, fmt = 'dict', trigger_post_process=False)
+
+try:
+    r = eq.get_refmodels("bronx_test_model")
+    eq.delete_refmodels(r[0]['id'])
+finally:
+    print("ready for new bronx model")
+r = eq.create_refmodel(random_jobs, methods=es.mvod_classifiers(), features = feature_list, name = "bronx_test_model")
+
+eq.save_refmodel(ReferenceModel, r['jobs'], r['computed'], r['info_dict'], r['enabled'], name='test_name', tag={}, op_tags=[])


This file is named like a unit test but behaves like a destructive integration script: it performs DB reads/writes, deletes existing models, and has no assertions. This will be flaky and potentially harmful in CI/dev environments.\n\nSuggested direction (pick one):\n- Convert it to a real automated test (e.g., pytest) with assertions and isolated DB/fixtures (and avoid deleting arbitrary existing models), or\n- Move it to a scripts/ or examples/ location and keep it out of the test discovery path.\n\nAlso, the call uses tag={} but save_refmodel() expects tags=... (per its signature).

Suggested change

feature_list = [ 'duration', 'rchar', 'syscr', 'syscw', 'wchar', 'cstime', 'cutime', 'majflt', 'cpu_time', 'minflt', 'rssmax', 'cmajflt','cminflt', 'inblock', 'outblock', 'usertime', 'num_procs', 'starttime', 'vol_ctxsw', 'read_bytes', 'systemtime', 'time_oncpu', 'timeslices', 'invol_ctxsw', 'write_bytes', 'time_waiting', 'cancelled_write_bytes']

random_jobs = eq.get_jobs(limit = 100, fmt = 'dict', trigger_post_process=False)

try:

r = eq.get_refmodels("bronx_test_model")

eq.delete_refmodels(r[0]['id'])

finally:

print("ready for new bronx model")

r = eq.create_refmodel(random_jobs, methods=es.mvod_classifiers(), features = feature_list, name = "bronx_test_model")

eq.save_refmodel(ReferenceModel, r['jobs'], r['computed'], r['info_dict'], r['enabled'], name='test_name', tag={}, op_tags=[])

import uuid

feature_list = [ 'duration', 'rchar', 'syscr', 'syscw', 'wchar', 'cstime', 'cutime', 'majflt', 'cpu_time', 'minflt', 'rssmax', 'cmajflt','cminflt', 'inblock', 'outblock', 'usertime', 'num_procs', 'starttime', 'vol_ctxsw', 'read_bytes', 'systemtime', 'time_oncpu', 'timeslices', 'invol_ctxsw', 'write_bytes', 'time_waiting', 'cancelled_write_bytes']

def test_save_refmodel_persists_refmodel():

"""

Integration-style test that creates a reference model, saves it,

verifies it can be retrieved, and then cleans it up.

"""

model_name = "bronx_test_model_" + uuid.uuid4().hex

random_jobs = eq.get_jobs(limit=100, fmt='dict', trigger_post_process=False)

r = eq.create_refmodel(

random_jobs,

methods=es.mvod_classifiers(),

features=feature_list,

name=model_name,

)

try:

eq.save_refmodel(

ReferenceModel,

r['jobs'],

r['computed'],

r['info_dict'],

r['enabled'],

name=model_name,

tags={},

op_tags=[],

)

saved_models = eq.get_refmodels(model_name)

assert saved_models, "Expected at least one saved reference model"

finally:

# Clean up only the models created for this test

try:

models_to_delete = eq.get_refmodels(model_name)

except Exception:

models_to_delete = []

for m in models_to_delete:

if 'id' in m:

eq.delete_refmodels(m['id'])

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

+# epmt_stat contains statistical functions
+from epmt import epmt_stat as es
+from epmt import epmt_query as eq
+from epmt.orm import *


@@ -0,0 +1,23 @@
+import numpy as np


@@ -0,0 +1,23 @@
+import numpy as np
+import epmt


@@ -0,0 +1,23 @@
+import numpy as np
+import epmt
+from epmt import epmt_query


+import epmt
+from epmt import epmt_query
+# epmt_outliers contains the EPMT Outlier Detection API
+from epmt import epmt_outliers as eod


+from epmt import epmt_stat as es
+from epmt import epmt_query as eq
+from epmt.orm import *
+import pandas


+from epmt import epmt_query as eq
+from epmt.orm import *
+import pandas
+from pandas import DataFrame


@@ -0,0 +1,23 @@
+import numpy as np
+import epmt


codecov · 2026-04-02T13:24:58Z

Codecov Report

❌ Patch coverage is 29.23077% with 46 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.19%. Comparing base (9c80daf) to head (5ec82d4).

Files with missing lines	Patch %	Lines
src/epmt/epmt_query.py	29.23%	46 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #133      +/-   ##
==========================================
- Coverage   76.67%   76.19%   -0.48%     
==========================================
  Files          34       34              
  Lines        6636     6696      +60     
==========================================
+ Hits         5088     5102      +14     
- Misses       1548     1594      +46

Flag	Coverage Δ
unittests	`76.19% <29.23%> (-0.48%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

+    name = name if name is not None else None  # create with None to avoid unique-name conflicts
+    tag = tag if tag is not None else (old.tags or {})
+    op_tags = op_tags if op_tags is not None else (old.op_tags or [])
+    methods = methods  # leave None so create_refmodel uses default classifiers if desired


ilaflott

some of the bot suggestions / crits are a bit spammy, sorry about that, i'm configuring them still

this is fairly close to ready, pretty sure the one workflow failed because of a connection latency thing.

primary feedback has to do with readability and neatness, try to excise unnecessary operations

ilaflott · 2026-04-16T18:10:40Z

looks like this file can be removed

ilaflott · 2026-04-16T18:12:20Z

 @db_session
-def ops_costs(jobs=[], tags=['op:hsmput', 'op:dmget', 'op:untar', 'op:mv',
-              'op:dmput', 'op:hsmget', 'op:rm', 'op:cp'], metric='cpu_time'):
+def ops_costs(jobs=[], tag=None, tags=None, metric='cpu_time'):


why tag AND tags ? can't one specify tags as a list of length 1?

ilaflott · 2026-04-16T18:13:09Z

+        tags = tag
+    # if neither provided, use default op-list
+    if tags is None:
+        tags = ['op:hsmput', 'op:dmget', 'op:untar', 'op:mv', 'op:dmput', 'op:hsmget', 'op:rm', 'op:cp']


prefer the default value for tags in the function argument

ilaflott · 2026-04-16T18:14:30Z

+    if tag is not None:
+        tags = tag
+    if tags is None:
+        tags = {}
+    if op_tags is None:
+        op_tags = []


tag v tags again, why does a one-entry tags list not cut it?

whats the difference between tags and ops_tags ?

ilaflott · 2026-04-16T18:16:09Z

+    r_dict = orm_to_dict(r, with_collections=True)
+    return pd.Series(r_dict) if fmt == 'pandas' else r_dict


combine this into one line to avoid allocating another object to memory transiently:

return pd.Series(r_dict) if fmt == 'pandas' else orm_to_dict(r, with_collection = True)

ilaflott · 2026-04-16T18:16:42Z

+    methods = methods  # leave None so create_refmodel uses default classifiers if desired
+    features = features


remove, redundant

ilaflott · 2026-04-16T18:17:07Z

+    tag = tag if tag is not None else (old.tags or {})
+    op_tags = op_tags if op_tags is not None else (old.op_tags or [])


we're gonna wanna clean this up

ilaflott · 2026-04-16T18:18:00Z

+from epmt.orm import ReferenceModel
+
+
+def test_save_refmodel_create_and_cleanup():


ilaflott · 2026-04-16T18:18:16Z

+    if not jobs or len(jobs) < 3:
+        pytest.skip("Not enough jobs available to create a reference model")
+
+    jobs_sample = jobs[:3]


can change to jobs_sample = jobs[:]

ilaflott

oh i caught a bigger issue- you need to add an explicit step to the workflows calling the new test file, or add the test to an existing test file.

ilaflott · 2026-04-16T18:29:32Z

will close #217

kiihne-noaa linked an issue Mar 10, 2026 that may be closed by this pull request

bugs/fragility: IOPub data rate exceeded in Jupyter #65

Closed

ilaflott force-pushed the anomaly_tools_changes branch from 67e059c to 8442a35 Compare March 12, 2026 15:45

ilaflott requested a review from Copilot March 12, 2026 15:56

Copilot started reviewing on behalf of ilaflott March 12, 2026 16:02 View session

Copilot AI reviewed Mar 12, 2026

View reviewed changes

kiihne-noaa and others added 19 commits March 16, 2026 17:02

creating save_refmodel

f052068

Update epmt_query.py

eb5cba7

Update epmt_query.py

eab48d5

Update epmt_query.py

ef473e9

Update epmt_query.py

ff70178

Update epmt_query.py

d4c5eaf

Update epmt_query.py

39de787

Create test_save_refmodel.py

3135e6d

fixed orm import

b9fb9d1

Update test_save_refmodel.py

aca0802

Update epmt_query.py

2855601

Update epmt_query.py

d211bf0

changing tag variable to account for new settup

74d6a20

Update epmt_query.py

6e8276f

adding fmt to save_refmodel default

011ea21

Update epmt_query.py

7695e43

tag to {} if =None

9578eb4

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update src/epmt/epmt_query.py

e968ace

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update src/epmt/epmt_query.py

1281d9f

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

ilaflott force-pushed the anomaly_tools_changes branch from 26df9c3 to 1281d9f Compare March 16, 2026 21:02

ilaflott removed a link to an issue Mar 24, 2026

bugs/fragility: IOPub data rate exceeded in Jupyter #65

Closed

ilaflott assigned kiihne-noaa Mar 24, 2026

ilaflott self-requested a review March 24, 2026 22:06

added test and additonal parameters for save_refmodel

824b69b

github-code-quality Bot found potential problems Mar 31, 2026

View reviewed changes

kiihne-noaa force-pushed the anomaly_tools_changes branch from 035b6ea to 824b69b Compare April 2, 2026 13:00

resolving merge conflicts

d6e0eae

added extend_refmodel, changed tags vs tag

5ec82d4

github-code-quality Bot found potential problems Apr 8, 2026

View reviewed changes

ilaflott requested changes Apr 16, 2026

View reviewed changes

ilaflott linked an issue Apr 16, 2026 that may be closed by this pull request

feature: new function(s) to explicitly save, and update, ReferenceModels #217

Open

		r_dict = orm_to_dict(r, with_collections=True)
		return pd.Series(r_dict) if fmt == 'pandas' else r_dict

		methods = methods # leave None so create_refmodel uses default classifiers if desired
		features = features

		tag = tag if tag is not None else (old.tags or {})
		op_tags = op_tags if op_tags is not None else (old.op_tags or [])

		from epmt.orm import ReferenceModel


		def test_save_refmodel_create_and_cleanup():

Conversation

kiihne-noaa commented Mar 10, 2026

Describe your changes

Issue ticket number, link (if applicable)

Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

ilaflott left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ilaflott left a comment

Choose a reason for hiding this comment

Uh oh!

ilaflott commented Apr 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov Bot commented Apr 2, 2026 •

edited

Loading