114 changes: 114 additions & 0 deletions python/custom_file_uploads/README.md
# Generic File Upload Script

This script provides a lightweight, generalized CSV upload solution for Civis Platform. It handles uploading CSV files to database tables based on a user's primary group ID and a metadata table that defines schema mappings.

## How It Works

The script performs the following operations:

1. **Schema Determination**: Looks up the user's schema from a metadata table based on their `primary_group_id`
2. **Table Creation**: Drops the existing table if present and imports the CSV data
3. **Email Notification**: Sends an email notification upon successful completion
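The schema lookup in step 1 amounts to building and running a query like the following (a minimal sketch; the table and column names match the metadata table described below):

```python
def build_schema_query(metadata_table: str, primary_group_id: int) -> str:
    """Build the lookup that maps a user's primary group to a schema."""
    return (
        "SELECT schema_name "
        f"FROM {metadata_table} "
        f"WHERE primary_group_id = {int(primary_group_id)}"
    )
```

The script runs this query against the configured database and fails with a clear error if no mapping row comes back.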

## Setup Requirements

### 1. Create a Metadata Table

Create a metadata table in your Civis database with the following structure:

```sql
CREATE TABLE your_schema.metadata_data_upload (
    primary_group_id INTEGER,
    schema_name VARCHAR
);
```

Populate this table with mappings of primary group IDs to their corresponding schemas:

```sql
INSERT INTO your_schema.metadata_data_upload
VALUES
    (123, 'team_a_schema'),
    (456, 'team_b_schema'),
    (789, 'team_c_schema');
```

### 2. Create a Notification Script

Create a blank Python script in Civis Platform that will be used to send notification emails. This script doesn't need to contain any code; it exists only to trigger the email notification system. Note the script ID for use in the configuration.
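Under the hood, the uploader patches this blank script's success-notification settings and then runs it. The payload it patches in looks roughly like the following (a sketch mirroring the field names used in `generic_upload.py`):

```python
def build_notification_payload(subject: str, body: str, recipient: str) -> dict:
    """Notification settings patched onto the blank script before each run."""
    return {
        "success_email_subject": subject,
        "success_email_body": body,
        "success_email_addresses": [recipient],
    }
```

Because the blank script's settings are overwritten before each run, concurrent uploads sharing one notification script can overwrite each other's recipients; a per-team notification script avoids this.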

### 3. Set Up the Container Script in Platform

When setting up this script in Civis Platform, create a container script with the following configuration:

```bash
cd /app;
export DATABASE='redshift-general'
export TESTING=0
export EMAIL=""
export METADATA_TABLE="metadata_data_upload"
export EMAIL_SCRIPT_ID="340695845"
python python/custom_file_uploads/generic_upload.py
```

**Configuration Variables:**

- `DATABASE`: The name of your Civis database (e.g., 'redshift-general')
- `TESTING`: Set to `1` to skip email notifications (for testing), `0` for production
- `EMAIL`: Default notification recipient (optional); if blank, the uploading user's email is used
- `METADATA_TABLE`: The schema-qualified name (`schema.table`) of your metadata table
- `EMAIL_SCRIPT_ID`: The ID of the blank notification script you created in step 2
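A sketch of how the script consumes these settings (the variable names match the exports above; the lenient boolean handling for `TESTING` is an illustration, not the only valid form):

```python
import os


def read_config(env=None):
    """Collect the container-script configuration from environment variables."""
    env = os.environ if env is None else env
    return {
        "database": env["DATABASE"],
        "metadata_table": env["METADATA_TABLE"],
        # An empty EMAIL falls back to the uploading user's address
        "email": env.get("EMAIL") or None,
        "email_script_id": int(env["EMAIL_SCRIPT_ID"]),
        # Accept 1/0 as well as true/false-style strings
        "testing": env.get("TESTING", "0").strip().lower() in ("1", "true", "yes"),
    }
```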

### 4. Configure Script Parameters

In the Civis Platform script configuration, add the following parameters:

- **FILE**: File parameter (required) - Users will upload their CSV file here
- **TABLE_NAME**: Dropdown or text parameter (required) - The name of the target table
- **EMAIL**: Text parameter (optional) - Email address for notification
- **TESTING**: Numeric 0/1 (optional) - Set to `1` to skip sending emails

## Usage

Once configured, users can run the script by:

1. Uploading a CSV file via the FILE parameter
2. Selecting or entering the target table name via the TABLE_NAME parameter
3. Optionally providing an email address for notification
4. Running the script

The script will:
- Automatically determine the correct schema based on the user's primary group
- Drop and recreate the table with the uploaded CSV data
- Send a notification email upon completion
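Because the schema and table name are interpolated directly into SQL statements like `DROP TABLE`, it is worth validating them as plain identifiers before running anything. A minimal guard might look like this (a sketch, not necessarily what the shipped script does):

```python
import re


def safe_identifier(name: str) -> str:
    """Accept only simple unquoted SQL identifiers (letters, digits, underscores)."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"Invalid SQL identifier: {name!r}")
    return name
```

For example, `f"{safe_identifier(schema)}.{safe_identifier(table_name)}"` builds a target table reference that cannot smuggle in semicolons, quotes, or extra dots.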

## Example Metadata Table Setup

Here's a complete example for setting up your metadata table:

```sql
-- Create the metadata table
CREATE TABLE your_schema.metadata_data_upload (
    primary_group_id INTEGER,
    schema_name VARCHAR
);

-- Add your group mappings
INSERT INTO your_schema.metadata_data_upload
    (primary_group_id, schema_name)
VALUES
    (123, 'analytics_team'),
    (456, 'marketing_team'),
    (789, 'operations_team');
```

## Troubleshooting

- **"No schema mapping found"**: Ensure your primary group ID is added to the metadata table
- **Schema creation errors**: You may need database permissions to create schemas, or ensure the schema already exists
- **Email not received**: Check that the EMAIL_SCRIPT_ID points to a valid script and TESTING is set to 0

## Notes

- The script will drop and recreate the table on each run, so existing data will be replaced
- Users must have appropriate database permissions to write to their assigned schema
- The metadata table must be readable by all users who will run this script
227 changes: 227 additions & 0 deletions python/custom_file_uploads/generic_upload.py
"""Lightweight generalized file upload script for Civis Platform.

Uploads a CSV file to a database table based on user's group and
dropdown selection.
The script:
1. Determines user's schema from metadata table based on primary_group_id
2. Drops existing table if present and imports CSV
3. Sends email notification on completion

In order to function correctly, the following environment variables must be
set:
- TABLE_NAME: Platform dropdown selection that maps to the table name
- DATABASE: Target Civis database name
- METADATA_TABLE: Civis table containing mapping of primary_group_id
to schema_name
- EMAIL_SCRIPT_ID: Civis script ID used to send notification emails
    - this should just be a blank script configured to send success emails

The template script will also need to accept the following parameters:
- FILE: Civis file parameter (exposed to this script as the FILE_ID
    environment variable)
- EMAIL: Optional email address for notification
- TABLE_NAME: Platform dropdown selection that maps to the table name
- TESTING: Optional boolean to indicate testing mode (skip email)
"""

import os
import tempfile

import civis
import pandas as pd

LOG = civis.loggers.civis_logger()


def get_schema(metadata_table: str, database: str, client=None) -> tuple[str, str]:
    client = client or civis.APIClient()

    # Get user info
    user = client.users.list_me()
    user_id = user["id"]
    user_email = user["email"]
    primary_group_id = client.users.get(user_id)["primary_group_id"]

    # Look up schema from metadata table
    LOG.info(f"Looking up schema for primary_group_id {primary_group_id}")
    schema_query = f"""
        SELECT schema_name
        FROM {metadata_table}
        WHERE primary_group_id = {primary_group_id}
    """
    schema_result = civis.io.read_civis_sql(
        schema_query, database=database, use_pandas=True
    )

    if schema_result.empty:
        raise ValueError(
            f"No schema mapping found for primary_group_id {primary_group_id}. "
            f"Please add a row to {metadata_table}."
        )

    schema = schema_result["schema_name"].iloc[0]
    LOG.info(f"Using schema: {schema}")

    # Create schema if needed
    LOG.info(f"Creating schema {schema} if needed")
    try:
        civis.io.query_civis(
            f"CREATE SCHEMA IF NOT EXISTS {schema};",
            database=database,
        ).result()
    except Exception:
        LOG.warning(
            "You do not have permissions to create schemas. "
            "The script will continue and raise an error later "
            "if the schema doesn't exist."
        )

    return schema, user_email


def download_data_create_table(
    file_id: int, full_table: str, database: str, client=None
):
    client = client or civis.APIClient()
    file_obj = client.files.get(file_id)
    LOG.info(f"Downloading file: {file_obj['name']}")

    with tempfile.TemporaryDirectory() as tmpdir:
        # File names are user-controlled; strip any path components before
        # writing to disk
        safe_name = os.path.basename(file_obj["name"])
        tmp_path = os.path.join(tmpdir, safe_name)
        civis.io.civis_to_file(file_id, tmp_path)
        df = pd.read_csv(tmp_path)

    LOG.info(f"Read CSV with {len(df)} rows and {len(df.columns)} columns")

    # Drop existing table
    LOG.info(f"Dropping table if exists: {full_table}")
    civis.io.query_civis(
        f"DROP TABLE IF EXISTS {full_table};",
        database=database,
    ).result()

    # Import data to table
    LOG.info(f"Importing data to {full_table}")
    try:
        civis.io.dataframe_to_civis(
            df=df,
            database=database,
            table=full_table,
            existing_table_rows="fail",
        )
        LOG.info(f"Successfully uploaded {len(df)} rows to {full_table}")
        return "success"
    except Exception as e:
        LOG.error(
            f"Failed to upload data to {full_table}: {e} "
            "Please check that the schema exists and you have "
            "permissions to write to it."
        )
        # Re-raise so failures stop the workflow instead of triggering a
        # success notification downstream
        raise


def send_email_notification(
    email_address: str,
    table_name: str,
    schema: str,
    full_table: str,
    database: str,
    user_email: str,
    file_obj: dict,
    testing: bool = False,
    client=None,
):
    client = client or civis.APIClient()
    recipient_email = email_address if email_address else user_email
    email_subject = "Data Upload Complete"
    email_body = f"""Your data upload has been completed successfully.

File: {file_obj['name']}
Database: {database}
Schema: {schema}
Table: {table_name}
User: {user_email}

The data is now available at: {database}.{full_table}
"""

    if not testing:
        LOG.info(f"Sending notification email to {recipient_email}")
        # Use a blank script on Platform to trigger the email notification.
        # NOTE: This requires an existing script ID that you can configure
        # to send success notification emails. Because the shared script's
        # notification settings are patched before each run, concurrent
        # uploads can overwrite each other's recipients.
        email_script_id_str = os.getenv("EMAIL_SCRIPT_ID")
        if not email_script_id_str:
            raise ValueError(
                "EMAIL_SCRIPT_ID must be set to the ID of a Civis Platform "
                "script in order to send notification emails."
            )
        email_script_id = int(email_script_id_str)

        client.scripts.patch_python3(
            id=email_script_id,
            name=f"Upload notification for {user_email}",
            notifications={
                "success_email_subject": email_subject,
                "success_email_body": email_body,
                "success_email_addresses": [recipient_email],
            },
        )
        civis.utils.run_job(email_script_id, client=client).result()
    else:
        LOG.info(f"Testing mode: skipping email to {recipient_email}")


def main(
    file_id: int,
    table_name: str,
    database: str,
    metadata_table: str,
    email_address: str | None = None,
    testing: bool = False,
):
    """Main function to upload CSV file to Civis database table."""

    client = civis.APIClient()

    schema, user_email = get_schema(
        metadata_table=metadata_table,
        database=database,
        client=client,
    )

    # Both values are interpolated into SQL, so allow only simple unquoted
    # identifiers (letters, digits, underscores) to prevent SQL injection
    for identifier in (schema, table_name):
        if not identifier or not identifier.replace("_", "").isalnum():
            raise ValueError(f"Invalid SQL identifier: {identifier!r}")

    full_table = f"{schema}.{table_name}"
    LOG.info(f"Target table: {full_table}")

    status = download_data_create_table(
        file_id=file_id,
        full_table=full_table,
        database=database,
        client=client,
    )

    if status == "success":
        send_email_notification(
            email_address=email_address,
            table_name=table_name,
            schema=schema,
            full_table=full_table,
            database=database,
            user_email=user_email,
            file_obj=client.files.get(file_id),
            testing=testing,
            client=client,
        )

    LOG.info("Upload process completed successfully")


if __name__ == "__main__":
    # Get environment variables
    file_id = int(os.environ["FILE_ID"])
    table_name = os.environ["TABLE_NAME"]
    database = os.environ["DATABASE"]
    metadata_table = os.environ["METADATA_TABLE"]
    email_address = os.getenv("EMAIL")
    # Accept 1/0 as well as true/false-style strings for TESTING
    testing = os.getenv("TESTING", "0").strip().lower() in ("1", "true", "yes")
    main(
        file_id=file_id,
        table_name=table_name,
        database=database,
        metadata_table=metadata_table,
        email_address=email_address,
        testing=testing,
    )