Skip to content

Implement Tenable Data Pipeline Integration Tests#5

Open
jcornall wants to merge 33 commits into
mainfrom
tenable-test
Open

Implement Tenable Data Pipeline Integration Tests#5
jcornall wants to merge 33 commits into
mainfrom
tenable-test

Conversation

@jcornall

Copy link
Copy Markdown
Owner

This PR implements multiple integration tests for the Tenable data pipeline. Tests related to data loading setup and teardown a temporary database using the database migration files. A dedicated test_data folder for Tenable test data has also been created.

@jcornall jcornall requested a review from kennethnym October 11, 2024 13:51

@kennethnym kennethnym left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is going in the right direction! here are my thoughts:

  • avoid mocking threads if possible because it is an implementation detail that your tests should not worry about
  • there is requests-mock that lets you mock responses from urls. you can set them up in a module specific conftest.py (src/tenable/test/conftest.py in this case)
  • as for the file system, i see that you are mocking constants from the constants file. mocking exports should generally be avoided. instead, there are two options i can think of:
    • allow customization of the output directory through the pipeline function. this is more work, but might be useful in the future
    • use pyfakefs to mock the filesystem. this introduces no side effects to the actual file system and is also easier to set up
  • the mock data file is too big. consider reducing the amount of data in the JSON, and fake the data using something like mockaroo to avoid leaking internal data

Comment thread src/tenable/test/conftest.py
Comment thread src/tenable/test/test_export_assets.py Outdated
Comment thread src/tenable/test/test_export_assets.py Outdated
Comment thread src/tenable/test/test_export_assets.py Outdated
Comment thread src/tenable/test/test_export_assets.py Outdated
Comment thread src/tenable/test/test_export_assets.py Outdated
@jcornall

Copy link
Copy Markdown
Owner Author

Applied requested changes, please review when you get chance

@jcornall jcornall requested a review from kennethnym October 18, 2024 15:30
@jcornall

Copy link
Copy Markdown
Owner Author

Implemented test_pipeline.py and refactored the other tests, please review when you get chance

@kennethnym kennethnym left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see that you are now using separate database connections for loading assets/vulnerabilities which is good because it fixes the segmentation fault issue. however, instead of making that connection inside the function, i suggest creating a ConnectionPool and pass it to each function. the function can then obtain a connection from the pool. this avoids the need to patch CONN_PARAMS, as patching module exports directly is generally avoided.

reference to ConnectionPool

Comment thread src/tenable/test/test_pipeline.py Outdated


def test_tenable_success(fs, mocker):
mocker.patch("src.tenable.load_vulnerabilities.CONN_PARAMS", TEST_CONN_PARAMS_DB)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid monkey patching imports - i see that you removed the db param in the functions that require the database. you can use that parameter to pass the test db instance.

Comment thread src/tenable/load_assets.py Outdated
def load_tenable_assets(export: AssetExportStatus, to: mariadb.Connection):
conn = to
def load_tenable_assets(export: AssetExportStatus):
conn = mariadb.connect(**CONN_PARAMS)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of removing the Connection parameter, pass in a ConnectionPool which this function can obtain a connection from.

"""


def load_tenable_vulnerabilities(export: VulnExportStatus, to: mariadb.Connection):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass a ConnectionPool here instead of removing the parameter altogether - this avoids having to monkey patch CONN_PARAMS, which is generally a bad idea.

Comment thread src/tenable/load_vulnerabilities.py Outdated
"""Loads all vulnerabilities within the given export to the database"""

conn = to
conn = mariadb.connect(**CONN_PARAMS)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the ConnectionPool parameter, you can obtain a Connection from the pool

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you are using testdb as the name of the test database. I think you can avoid creating a copy of the migration file by simply using a separate instance of postgres for testing, that way you can reuse the existing migration file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants