Generic data upload #51
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,114 @@ | ||
| # Generic File Upload Script | ||
|
|
||
| This script provides a lightweight, generalized CSV upload solution for Civis Platform. It handles uploading CSV files to database tables based on a user's primary group ID and a metadata table that defines schema mappings. | ||
|
|
||
| ## How It Works | ||
|
|
||
| The script performs the following operations: | ||
|
|
||
| 1. **Schema Determination**: Looks up the user's schema from a metadata table based on their `primary_group_id` | ||
| 2. **Table Creation**: Drops the existing table if present and imports the CSV data | ||
| 3. **Email Notification**: Sends an email notification upon successful completion | ||
|
|
||
| ## Setup Requirements | ||
|
|
||
| ### 1. Create a Metadata Table | ||
|
|
||
| Create a metadata table in your Civis database with the following structure: | ||
|
|
||
| ```sql | ||
| CREATE TABLE your_database.your_schema.metadata_data_upload( | ||
| primary_group_id INTEGER, | ||
| schema_name VARCHAR | ||
| ); | ||
| ``` | ||
|
|
||
| Populate this table with mappings of primary group IDs to their corresponding schemas: | ||
|
|
||
| ```sql | ||
| INSERT INTO your_database.your_schema.metadata_data_upload | ||
| VALUES | ||
| (123, 'team_a_schema'), | ||
| (456, 'team_b_schema'), | ||
| (789, 'team_c_schema'); | ||
| ``` | ||
|
|
||
| ### 2. Create a Notification Script | ||
|
|
||
| Create a blank Python script in Civis Platform that will be used to send notification emails. This script doesn't need to contain any code - it's only used to trigger the email notification system. Note the script ID for use in the configuration. | ||
|
|
||
| ### 3. Set Up the Container Script in Platform | ||
|
|
||
| When setting up this script in Civis Platform, create a container script with the following configuration: | ||
|
|
||
| ```bash | ||
| cd /app; | ||
| export DATABASE='redshift-general' | ||
| export TESTING=0 | ||
| export EMAIL="" | ||
| export METADATA_TABLE="your_schema.metadata_data_upload" | ||
| export EMAIL_SCRIPT_ID="340695845" | ||
| python python/custom_file_uploads/generic_upload.py | ||
| ``` | ||
|
Comment on lines +44 to +52
|
||
|
|
||
| **Configuration Variables:** | ||
|
|
||
| - `DATABASE`: The name of your Civis database (e.g., 'redshift-general') | ||
| - `TESTING`: Set to `1` to skip email notifications (for testing), `0` for production | ||
| - `METADATA_TABLE`: The schema-qualified name of your metadata table (e.g., `your_schema.metadata_data_upload`) | ||
| - `EMAIL_SCRIPT_ID`: The ID of the blank notification script you created in step 2 | ||
|
|
||
| ### 4. Configure Script Parameters | ||
|
|
||
| In the Civis Platform script configuration, add the following parameters: | ||
|
|
||
| - **FILE**: File parameter (required) - Users will upload their CSV file here | ||
| - **TABLE_NAME**: Dropdown or text parameter (required) - The name of the target table | ||
| - **EMAIL**: Text parameter (optional) - Email address for notification | ||
| - **TESTING**: Numeric 0/1 parameter (optional) - Set to `1` to skip sending emails | ||
|
|
||
| ## Usage | ||
|
|
||
| Once configured, users can run the script by: | ||
|
|
||
| 1. Uploading a CSV file via the FILE parameter | ||
| 2. Selecting or entering the target table name via the TABLE_NAME parameter | ||
| 3. Optionally providing an email address for notification | ||
| 4. Running the script | ||
|
|
||
| The script will: | ||
| - Automatically determine the correct schema based on the user's primary group | ||
| - Drop and recreate the table with the uploaded CSV data | ||
| - Send a notification email upon completion | ||
|
|
||
| ## Example Metadata Table Setup | ||
|
|
||
| Here's a complete example for setting up your metadata table: | ||
|
|
||
| ```sql | ||
| -- Create the metadata table | ||
| CREATE TABLE your_database.your_schema.metadata_data_upload ( | ||
| primary_group_id INTEGER, | ||
| schema_name VARCHAR | ||
| ); | ||
|
|
||
| -- Add your group mappings | ||
| INSERT INTO your_database.your_schema.metadata_data_upload | ||
| (primary_group_id, schema_name) | ||
| VALUES | ||
| (123, 'analytics_team'), | ||
| (456, 'marketing_team'), | ||
| (789, 'operations_team'); | ||
| ``` | ||
|
|
||
| ## Troubleshooting | ||
|
|
||
| - **"No schema mapping found"**: Ensure your primary group ID is added to the metadata table | ||
| - **Schema creation errors**: You may need database permissions to create schemas; otherwise, make sure the schema already exists | ||
| - **Email not received**: Check that the EMAIL_SCRIPT_ID points to a valid script and TESTING is set to 0 | ||
|
|
||
| ## Notes | ||
|
|
||
| - The script will drop and recreate the table on each run, so existing data will be replaced | ||
| - Users must have appropriate database permissions to write to their assigned schema | ||
| - The metadata table must be readable by all users who will run this script | ||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,227 @@ | ||||||||||||||||||||||||||||||
| """Lightweight generalized file upload script for Civis Platform. | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| Uploads a CSV file to a database table based on user's group and | ||||||||||||||||||||||||||||||
| dropdown selection. | ||||||||||||||||||||||||||||||
| The script: | ||||||||||||||||||||||||||||||
| 1. Determines user's schema from metadata table based on primary_group_id | ||||||||||||||||||||||||||||||
| 2. Drops existing table if present and imports CSV | ||||||||||||||||||||||||||||||
| 3. Sends email notification on completion | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| In order to function correctly, the following environment variables must be | ||||||||||||||||||||||||||||||
| set: | ||||||||||||||||||||||||||||||
| - TABLE_NAME: Platform dropdown selection that maps to the table name | ||||||||||||||||||||||||||||||
| - DATABASE: Target Civis database name | ||||||||||||||||||||||||||||||
| - METADATA_TABLE: Civis table containing mapping of primary_group_id | ||||||||||||||||||||||||||||||
| to schema_name | ||||||||||||||||||||||||||||||
| - SCRIPT_ID: Civis script ID used to send notification emails | ||||||||||||||||||||||||||||||
| - this should just be a blank script configured to send success emails | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| The template script will also need to accept the following parameters: | ||||||||||||||||||||||||||||||
| - FILE: Civis file parameter | ||||||||||||||||||||||||||||||
|
Comment on lines +16 to +20
|
||||||||||||||||||||||||||||||
| - SCRIPT_ID: Civis script ID used to send notification emails | |
| - this should just be a blank script configured to send success emails | |
| The template script will also need to accept the following parameters: | |
| - FILE: Civis file parameter | |
| - EMAIL_SCRIPT_ID: Civis script ID used to send notification emails | |
| - this should just be a blank script configured to send success emails | |
| The template script will also need to accept the following parameters: | |
| - FILE: Civis file parameter (exposed to this script as environment variable | |
| FILE_ID) |
Copilot
AI
Mar 3, 2026
`tmp_path` is built using `file_obj["name"]` directly. Since file names can be user-controlled, it's safer to sanitize it (e.g., `os.path.basename(...)`) to avoid unexpected path separators or traversal-like behavior when writing to disk.
| tmp_path = os.path.join(tmpdir, file_obj["name"]) | |
| safe_name = os.path.basename(file_obj["name"]) | |
| tmp_path = os.path.join(tmpdir, safe_name) |
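A minimal sketch of the sanitization suggested above, with a little extra hardening. The helper name `safe_tmp_path` is hypothetical and not part of the PR; it assumes the file name is the only untrusted input and that rejecting unusable names is acceptable.

```python
import os


def safe_tmp_path(tmpdir: str, raw_name: str) -> str:
    """Return a path directly inside tmpdir for an untrusted file name.

    Hypothetical helper for illustration; not taken from the PR.
    """
    # On POSIX, os.path.basename only splits on "/", so normalise
    # Windows-style separators before taking the last component.
    name = os.path.basename(raw_name.replace("\\", "/"))
    # basename() returns "" for names ending in a separator and passes
    # "." and ".." through unchanged, so reject those explicitly.
    if name in ("", ".", ".."):
        raise ValueError(f"Unusable file name: {raw_name!r}")
    return os.path.join(tmpdir, name)
```

With this helper, a name such as `../../etc/passwd` resolves to `passwd` inside the temporary directory instead of escaping it.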
Copilot
AI
Mar 3, 2026
`download_data_create_table` catches upload exceptions, logs an error, and then returns without re-raising. This means `main()` will still send a success email and log "Upload process completed successfully" even when the upload failed. Re-raise the exception (or return a success flag that `main()` checks) so failures stop the workflow and don't trigger success notifications.
| raise |
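The control flow this comment asks for can be sketched as follows. The function names and the callables passed in are stand-ins for illustration, not the PR's actual signatures; the point is only that a bare `raise` after logging lets the failure reach the caller before any success notification is sent.

```python
import logging

logger = logging.getLogger("generic_upload_sketch")


def upload_table(do_upload) -> None:
    """Run the upload callable, log any failure, then re-raise it."""
    try:
        do_upload()
    except Exception:
        logger.exception("Upload failed")
        raise  # without this, the caller cannot tell the upload failed


def main(do_upload, send_email) -> None:
    """Send the success email only if the upload did not raise."""
    upload_table(do_upload)  # an exception here skips the lines below
    send_email()
    logger.info("Upload process completed successfully")
```

Because the exception propagates out of `main()`, the Platform job would also end in a failed state rather than reporting success.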
Copilot
AI
Mar 3, 2026
`email_script_id = int(os.getenv("EMAIL_SCRIPT_ID"))` will raise a non-obvious `TypeError` if the env var is missing (or a `ValueError` if it is set but empty). Prefer reading via `os.environ[...]` (so the `KeyError` names the variable) or add an explicit check that raises a clear `ValueError` explaining how to configure `EMAIL_SCRIPT_ID`.
| email_script_id = int(os.getenv("EMAIL_SCRIPT_ID")) | |
| email_script_id_str = os.getenv("EMAIL_SCRIPT_ID") | |
| if not email_script_id_str: | |
| raise ValueError( | |
| "EMAIL_SCRIPT_ID environment variable must be set to a valid " | |
| "Civis Platform script ID in order to send notification emails." | |
| ) | |
| try: | |
| email_script_id = int(email_script_id_str) | |
| except (TypeError, ValueError): | |
| raise ValueError( | |
| f"EMAIL_SCRIPT_ID must be an integer Civis Platform script ID, " | |
| f"got {email_script_id_str!r}." | |
| ) |
Copilot
AI
Mar 3, 2026
Patching a shared Platform script's notifications (`client.scripts.patch_python3(...)`) right before running it is race-prone: concurrent uploads can overwrite each other's notification settings, potentially emailing the wrong recipient/body. Consider a per-run notification mechanism that doesn't mutate shared script state (e.g., separate notification scripts per tenant, or avoid patching and instead configure notifications on the upload script itself).
Copilot
AI
Mar 3, 2026
`full_table` is built directly from `schema` (read from a table) and `table_name` (user-controlled parameter) and then interpolated into SQL (DROP/CREATE/INSERT paths). This allows SQL injection or accidental writes/drops outside the intended target if either value contains characters like `.`, `;`, or quotes. Validate both as safe SQL identifiers (e.g., strict regex + disallow dots) and/or map dropdown values to a fixed allowlist of table names instead of using raw input.
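One way to apply the strict-regex option is sketched below. The helper names, the exact pattern, and the 127-character cap (chosen to match Redshift's documented identifier limit) are assumptions for illustration, not code from the PR.

```python
import re

# Assumed policy: unquoted ASCII identifiers only, with no dots,
# quotes, semicolons, or whitespace.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
MAX_IDENTIFIER_LENGTH = 127  # assumption: Redshift's identifier limit


def validate_identifier(value: str, kind: str) -> str:
    """Return value unchanged if it is a safe SQL identifier, else raise."""
    if len(value) > MAX_IDENTIFIER_LENGTH or not _IDENTIFIER_RE.fullmatch(value):
        raise ValueError(f"Invalid {kind} name: {value!r}")
    return value


def build_full_table(schema: str, table_name: str) -> str:
    """Build schema.table from two independently validated parts."""
    return (
        f"{validate_identifier(schema, 'schema')}."
        f"{validate_identifier(table_name, 'table')}"
    )
```

Using `fullmatch` rather than `match` matters here: it rejects trailing characters such as a newline or `; DROP TABLE ...` that a prefix match would let through. An allowlist of known table names remains the stricter choice when the dropdown values are fixed.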
Copilot
AI
Mar 3, 2026
`testing = int(os.getenv("TESTING", 0)) == 1` only works for numeric values; if the Platform boolean parameter or env var is set to "true"/"false" (a common pattern elsewhere in this repo), this will crash. Parse booleans more defensively (e.g., accept `1`/`0` and `true`/`false` strings).
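A defensive parser along these lines might look like the sketch below. The helper name `env_flag` and the exact sets of accepted strings are assumptions, not part of the PR; unset or empty variables fall back to the default, and unrecognised values raise instead of being guessed at.

```python
import os

_TRUE_VALUES = {"1", "true", "t", "yes", "y", "on"}
_FALSE_VALUES = {"0", "false", "f", "no", "n", "off"}


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean environment variable, accepting 1/0 and true/false."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default  # unset or empty: use the default
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    raise ValueError(f"{name} must be a boolean-like value, got {raw!r}")
```

The script's flag could then be read as `testing = env_flag("TESTING")`.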
The SQL examples qualify the metadata table as `your_database.your_schema.metadata_data_upload_mmk`, but the script already selects the database via the `DATABASE` setting and then interpolates `METADATA_TABLE` into `FROM ...`. In Redshift/Postgres-style SQL you typically want `schema.table` here; consider updating the examples (and the `METADATA_TABLE` guidance) to avoid suggesting an invalid 3-part identifier.