Anomaly tools changes#133
Conversation
67e059c to
8442a35
Compare
There was a problem hiding this comment.
Pull request overview
Work-in-progress updates to epmt_query to support anomaly detection workflows by enabling saving reference models separately from creation and improving DB model management.
Changes:
- Added a new
save_refmodel()helper for persisting reference models. - Updated
create_refmodel()to callsave_refmodel()instead of directly creating ORM objects. - Added a new
test_save_refmodel.pyscript intended to exercise refmodel creation/saving.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| test_save_refmodel.py | Adds a script-like “test” that creates/deletes a refmodel and calls save_refmodel(). |
| src/epmt/epmt_query.py | Alters refmodel creation flow and introduces save_refmodel() for DB persistence. |
| @@ -1357,9 +1361,18 @@ def create_refmodel(jobs, name=None, tag={}, op_tags=[], | |||
| return r.id | |||
| r_dict = orm_to_dict(r, with_collections=True) | |||
| return pd.Series(r_dict) if fmt == 'pandas' else r_dict | |||
|
|
|||
| ''' | |||
There was a problem hiding this comment.
create_refmodel() no longer returns the created reference model (the prior return logic is now inside a triple-quoted string). This is a behavior-breaking change: callers will now receive None instead of the created model/id/dict. Make create_refmodel() return the result of save_refmodel(...) (and remove the triple-quoted block rather than leaving it as a stray string literal statement).
| def save_refmodel(ReferenceModel, jobs, computed, info_dict, enabled, name=None, tags={}, op_tags=[], fmt='dict'): | ||
| r = orm_create(ReferenceModel, jobs=jobs, computed=computed, info_dict = info_dict, enabled=enabled, name=name, tags=tags, op_tags=op_tags, fmt=fmt) | ||
| orm_commit() | ||
| if fmt == 'orm': | ||
| return r | ||
| elif fmt == 'terse': | ||
| return r.id | ||
| r_dict = orm_to_dict(r, with_collections=True) | ||
| return pd.Series(r_dict) if fmt == 'pandas' else r_dict |
There was a problem hiding this comment.
A few API/implementation issues in save_refmodel():\n- It uses mutable default arguments (tags={}, op_tags=[]), which can leak state between calls. Use None defaults and assign inside the function.\n- The first parameter is named ReferenceModel, which reads like the ORM class rather than an argument; it’s clearer as model_cls (or omit it and reference ReferenceModel directly if it’s always the same).\n- Consider decorating with @db_session (like create_refmodel) so save_refmodel() is safe to call directly outside an existing session.\n- orm_create(..., fmt=fmt) is potentially problematic if orm_create doesn’t accept fmt (the prior code handled formatting after orm_create). If orm_create doesn’t support it, remove fmt=fmt and keep formatting in save_refmodel().
|
|
||
| feature_list = [ 'duration', 'rchar', 'syscr', 'syscw', 'wchar', 'cstime', 'cutime', 'majflt', 'cpu_time', 'minflt', 'rssmax', 'cmajflt','cminflt', 'inblock', 'outblock', 'usertime', 'num_procs', 'starttime', 'vol_ctxsw', 'read_bytes', 'systemtime', 'time_oncpu', 'timeslices', 'invol_ctxsw', 'write_bytes', 'time_waiting', 'cancelled_write_bytes'] | ||
| random_jobs = eq.get_jobs(limit = 100, fmt = 'dict', trigger_post_process=False) | ||
|
|
||
| try: | ||
| r = eq.get_refmodels("bronx_test_model") | ||
| eq.delete_refmodels(r[0]['id']) | ||
| finally: | ||
| print("ready for new bronx model") | ||
| r = eq.create_refmodel(random_jobs, methods=es.mvod_classifiers(), features = feature_list, name = "bronx_test_model") | ||
|
|
||
| eq.save_refmodel(ReferenceModel, r['jobs'], r['computed'], r['info_dict'], r['enabled'], name='test_name', tag={}, op_tags=[]) |
There was a problem hiding this comment.
This file is named like a unit test but behaves like a destructive integration script: it performs DB reads/writes, deletes existing models, and has no assertions. This will be flaky and potentially harmful in CI/dev environments.\n\nSuggested direction (pick one):\n- Convert it to a real automated test (e.g., pytest) with assertions and isolated DB/fixtures (and avoid deleting arbitrary existing models), or\n- Move it to a scripts/ or examples/ location and keep it out of the test discovery path.\n\nAlso, the call uses tag={} but save_refmodel() expects tags=... (per its signature).
| feature_list = [ 'duration', 'rchar', 'syscr', 'syscw', 'wchar', 'cstime', 'cutime', 'majflt', 'cpu_time', 'minflt', 'rssmax', 'cmajflt','cminflt', 'inblock', 'outblock', 'usertime', 'num_procs', 'starttime', 'vol_ctxsw', 'read_bytes', 'systemtime', 'time_oncpu', 'timeslices', 'invol_ctxsw', 'write_bytes', 'time_waiting', 'cancelled_write_bytes'] | |
| random_jobs = eq.get_jobs(limit = 100, fmt = 'dict', trigger_post_process=False) | |
| try: | |
| r = eq.get_refmodels("bronx_test_model") | |
| eq.delete_refmodels(r[0]['id']) | |
| finally: | |
| print("ready for new bronx model") | |
| r = eq.create_refmodel(random_jobs, methods=es.mvod_classifiers(), features = feature_list, name = "bronx_test_model") | |
| eq.save_refmodel(ReferenceModel, r['jobs'], r['computed'], r['info_dict'], r['enabled'], name='test_name', tag={}, op_tags=[]) | |
| import uuid | |
| feature_list = [ 'duration', 'rchar', 'syscr', 'syscw', 'wchar', 'cstime', 'cutime', 'majflt', 'cpu_time', 'minflt', 'rssmax', 'cmajflt','cminflt', 'inblock', 'outblock', 'usertime', 'num_procs', 'starttime', 'vol_ctxsw', 'read_bytes', 'systemtime', 'time_oncpu', 'timeslices', 'invol_ctxsw', 'write_bytes', 'time_waiting', 'cancelled_write_bytes'] | |
| def test_save_refmodel_persists_refmodel(): | |
| """ | |
| Integration-style test that creates a reference model, saves it, | |
| verifies it can be retrieved, and then cleans it up. | |
| """ | |
| model_name = "bronx_test_model_" + uuid.uuid4().hex | |
| random_jobs = eq.get_jobs(limit=100, fmt='dict', trigger_post_process=False) | |
| r = eq.create_refmodel( | |
| random_jobs, | |
| methods=es.mvod_classifiers(), | |
| features=feature_list, | |
| name=model_name, | |
| ) | |
| try: | |
| eq.save_refmodel( | |
| ReferenceModel, | |
| r['jobs'], | |
| r['computed'], | |
| r['info_dict'], | |
| r['enabled'], | |
| name=model_name, | |
| tags={}, | |
| op_tags=[], | |
| ) | |
| saved_models = eq.get_refmodels(model_name) | |
| assert saved_models, "Expected at least one saved reference model" | |
| finally: | |
| # Clean up only the models created for this test | |
| try: | |
| models_to_delete = eq.get_refmodels(model_name) | |
| except Exception: | |
| models_to_delete = [] | |
| for m in models_to_delete: | |
| if 'id' in m: | |
| eq.delete_refmodels(m['id']) |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
26df9c3 to
1281d9f
Compare
| # epmt_stat contains statistical functions | ||
| from epmt import epmt_stat as es | ||
| from epmt import epmt_query as eq | ||
| from epmt.orm import * |
| @@ -0,0 +1,23 @@ | |||
| import numpy as np | |||
| @@ -0,0 +1,23 @@ | |||
| import numpy as np | |||
| import epmt | |||
| @@ -0,0 +1,23 @@ | |||
| import numpy as np | |||
| import epmt | |||
| from epmt import epmt_query | |||
| import epmt | ||
| from epmt import epmt_query | ||
| # epmt_outliers contains the EPMT Outlier Detection API | ||
| from epmt import epmt_outliers as eod |
| from epmt import epmt_stat as es | ||
| from epmt import epmt_query as eq | ||
| from epmt.orm import * | ||
| import pandas |
| from epmt import epmt_query as eq | ||
| from epmt.orm import * | ||
| import pandas | ||
| from pandas import DataFrame |
| @@ -0,0 +1,23 @@ | |||
| import numpy as np | |||
| import epmt | |||
035b6ea to
824b69b
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #133 +/- ##
==========================================
- Coverage 76.67% 76.19% -0.48%
==========================================
Files 34 34
Lines 6636 6696 +60
==========================================
+ Hits 5088 5102 +14
- Misses 1548 1594 +46
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| name = name if name is not None else None # create with None to avoid unique-name conflicts | ||
| tag = tag if tag is not None else (old.tags or {}) | ||
| op_tags = op_tags if op_tags is not None else (old.op_tags or []) | ||
| methods = methods # leave None so create_refmodel uses default classifiers if desired |
ilaflott
left a comment
There was a problem hiding this comment.
some of the bot suggestions / crits are a bit spammy, sorry about that, i'm configuring them still
this is fairly close to ready, pretty sure the one workflow failed because of a connection latency thing.
primary feedback has to do with readability and neatness, try to excise unnecessary operations
There was a problem hiding this comment.
looks like this file can be removed
| @db_session | ||
| def ops_costs(jobs=[], tags=['op:hsmput', 'op:dmget', 'op:untar', 'op:mv', | ||
| 'op:dmput', 'op:hsmget', 'op:rm', 'op:cp'], metric='cpu_time'): | ||
| def ops_costs(jobs=[], tag=None, tags=None, metric='cpu_time'): |
There was a problem hiding this comment.
why tag AND tags ? can't one specify tags as a list of length 1?
| tags = tag | ||
| # if neither provided, use default op-list | ||
| if tags is None: | ||
| tags = ['op:hsmput', 'op:dmget', 'op:untar', 'op:mv', 'op:dmput', 'op:hsmget', 'op:rm', 'op:cp'] |
There was a problem hiding this comment.
prefer the default value for tags in the function argument
| if tag is not None: | ||
| tags = tag | ||
| if tags is None: | ||
| tags = {} | ||
| if op_tags is None: | ||
| op_tags = [] |
There was a problem hiding this comment.
tag v tags again, why does a one-entry tags list not cut it?
whats the difference between tags and ops_tags ?
| r_dict = orm_to_dict(r, with_collections=True) | ||
| return pd.Series(r_dict) if fmt == 'pandas' else r_dict |
There was a problem hiding this comment.
combine this into one line to avoid allocating another object to memory transiently:
return pd.Series(r_dict) if fmt == 'pandas' else orm_to_dict(r, with_collection = True)
| methods = methods # leave None so create_refmodel uses default classifiers if desired | ||
| features = features |
| tag = tag if tag is not None else (old.tags or {}) | ||
| op_tags = op_tags if op_tags is not None else (old.op_tags or []) |
There was a problem hiding this comment.
we're gonna wanna clean this up
| from epmt.orm import ReferenceModel | ||
|
|
||
|
|
||
| def test_save_refmodel_create_and_cleanup(): |
| if not jobs or len(jobs) < 3: | ||
| pytest.skip("Not enough jobs available to create a reference model") | ||
|
|
||
| jobs_sample = jobs[:3] |
There was a problem hiding this comment.
can change to jobs_sample = jobs[:]
ilaflott
left a comment
There was a problem hiding this comment.
oh i caught a bigger issue- you need to add an explicit step to the workflows calling the new test file, or add the test to an existing test file.
|
will close #217 |
Describe your changes
Work-in-progress changes to epmt_query designed to enable anomaly detection. Focus on allowing saving of models separate from creation, active editing of already created models, and management of models in DB.
Issue ticket number, link (if applicable)
Checklist
A loose guide to provide structure for contributions