Implement Tenable Data Pipeline Integration Tests#5
Conversation
kennethnym
left a comment
There was a problem hiding this comment.
this is going in the right direction! here are my thoughts:
- avoid mocking threads if possible because it is an implementation detail that your tests should not worry about
- there is
requests-mockthat lets you mock responses from urls. you can set them up in a module specificconftest.py(src/tenable/test/conftest.pyin this case) - as for the file system, i see that you are mocking constants from the constants file. mocking exports should generally be avoided. instead, there are two options i can think of:
- allow customization of the output directory through the
pipelinefunction. this is more work, but might be useful in the future - use
pyfakefsto mock the filesystem. this introduces no side effects to the actual file system and is also easier to set up
- allow customization of the output directory through the
- the mock data file is too big. consider reducing the amount of data in the JSON, and fake the data using something like mockaroo to avoid leaking internal data
|
Applied requested changes, please review when you get chance |
|
Implemented test_pipeline.py and refactored the other tests, please review when you get chance |
kennethnym
left a comment
There was a problem hiding this comment.
i see that you are now using separate database connections for loading assets/vulnerabilities which is good because it fixes the segmentation fault issue. however, instead of making that connection inside the function, i suggest creating a ConnectionPool and pass it to each function. the function can then obtain a connection from the pool. this avoids the need to patch CONN_PARAMS, as patching module exports directly is generally avoided.
|
|
||
|
|
||
| def test_tenable_success(fs, mocker): | ||
| mocker.patch("src.tenable.load_vulnerabilities.CONN_PARAMS", TEST_CONN_PARAMS_DB) |
There was a problem hiding this comment.
avoid monkey patching imports - i see that you removed the db param in the functions that require the database. you can use that parameter to pass the test db instance.
| def load_tenable_assets(export: AssetExportStatus, to: mariadb.Connection): | ||
| conn = to | ||
| def load_tenable_assets(export: AssetExportStatus): | ||
| conn = mariadb.connect(**CONN_PARAMS) |
There was a problem hiding this comment.
Instead of removing the Connection parameter, pass in a ConnectionPool which this function can obtain a connection from.
| """ | ||
|
|
||
|
|
||
| def load_tenable_vulnerabilities(export: VulnExportStatus, to: mariadb.Connection): |
There was a problem hiding this comment.
Pass a ConnectionPool here instead of removing the parameter altogether - this avoids having to monkey patch CONN_PARAMS, which is generally a bad idea.
| """Loads all vulnerabilities within the given export to the database""" | ||
|
|
||
| conn = to | ||
| conn = mariadb.connect(**CONN_PARAMS) |
There was a problem hiding this comment.
with the ConnectionPool parameter, you can obtain a Connection from the pool
There was a problem hiding this comment.
I see that you are using testdb as the name of the test database. I think you can avoid creating a copy of the migration file by simply using a separate instance of postgres for testing, that way you can reuse the existing migration file.
This PR implements multiple integration tests for the Tenable data pipeline. Tests related to data loading setup and teardown a temporary database using the database migration files. A dedicated test_data folder for Tenable test data has also been created.