Add mongosync_insights to repository #24
    Returns the full snapshot dict (including template_data) or None.
    """
    path = _snapshot_path(snapshot_id)
    if not os.path.exists(path):

    return None

    try:
        with open(path, 'r', encoding='utf-8') as f:

    # Refresh mtime on snapshot file
    try:
        os.utime(path, None)

    pass

    meta_path = _snapshot_meta_path(snapshot_id)
    if os.path.exists(meta_path):
        try:
            os.utime(meta_path, None)

    if os.path.exists(meta_path):
        try:
            os.remove(meta_path)

    response.set_cookie(
        SESSION_COOKIE_NAME,
        session_id,

    response.set_cookie(
        SESSION_COOKIE_NAME,
        session_id,

    ), 400

    db_name = (session_data or {}).get("verifier_db_name", "migration_verification_metadata")
    return jsonify(gatherVerifierMetrics(connection_string, db_name))

        return jsonify(result)
    except Exception as e:
        logger.error("Log search error: %s", e)
        return jsonify({"error": "Search failed", "detail": str(e)}), 500

    # Make HTTP GET request to the endpoint
    url = f"http://{endpoint_url}"
    logger.info(f"Fetching data from endpoint: {url}")
    response = requests.get(url, timeout=10)
Semgrep identified an issue in your code:
requests.get fetches url over plain HTTP, so anyone on the network can read or alter the endpoint response.
More details about this
requests.get(url, timeout=10) sends data to the url built as f"http://{endpoint_url}", so this call always uses plain HTTP instead of encrypted HTTPS.
A plausible attack is:
- A user runs this code on a shared office, cloud, or coffee-shop network, and `endpoint_url` points to a real service such as `api.internal.example.com/status`.
- Because `url` starts with `http://`, the request from `requests.get(...)` is sent in cleartext. An attacker on the same network can intercept it with tools like `tcpdump` or `mitmproxy`.
- The attacker can read the full request and response, including any headers, tokens, cookies, or operational data this endpoint returns in `data = response.json()`.
- The attacker can also tamper with the response before your code parses it. That lets them inject fake JSON into `data`, which then flows into `progress`, `warnings_list`, and the dashboard values shown to users.
Example interception on the same network:
    sudo tcpdump -A -i wlan0 'tcp port 80 and host api.internal.example.com'
If this request includes sensitive headers or the endpoint returns internal replication state, those values are exposed because the URL is fetched over HTTP.
Dataflow graph
flowchart LR
classDef invis fill:white, stroke: none
classDef default fill:#e7f5ff, color:#1c7fd6, stroke: none
subgraph File0["<b>mongosync_insights/lib/live_migration_metrics.py</b>"]
direction LR
%% Source
subgraph Source
direction LR
v0["<a href=https://github.com/mongodb/kb-assets/blob/b185be682d0b9442bd10bf49cedcee7d23ed754f/mongosync_insights/lib/live_migration_metrics.py#L561 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 561] http://</a>"]
end
%% Intermediate
subgraph Traces0[Traces]
direction TB
v2["<a href=https://github.com/mongodb/kb-assets/blob/b185be682d0b9442bd10bf49cedcee7d23ed754f/mongosync_insights/lib/live_migration_metrics.py#L561 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 561] url</a>"]
end
%% Sink
subgraph Sink
direction LR
v1["<a href=https://github.com/mongodb/kb-assets/blob/b185be682d0b9442bd10bf49cedcee7d23ed754f/mongosync_insights/lib/live_migration_metrics.py#L563 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 563] url</a>"]
end
end
%% Class Assignment
Source:::invis
Sink:::invis
Traces0:::invis
File0:::invis
%% Connections
Source --> Traces0
Traces0 --> Sink
To resolve this comment:
✨ Commit fix suggestion
- Change the URL construction to use `https://` instead of `http://`, for example `url = f"https://{endpoint_url}"`.
- Keep the `requests.get` call the same, but point it at the new HTTPS URL: `response = requests.get(url, timeout=10)`.
- If `endpoint_url` can already include a scheme, normalize it before building the request so you do not accidentally keep an insecure URL. For example, replace a leading `http://` with `https://`, or parse it and rebuild it with the `https` scheme.
- If the endpoint does not support HTTPS yet, update the service configuration or caller input so `endpoint_url` refers to an HTTPS-enabled endpoint such as `example.com:443` instead of a plain HTTP listener.
- Manually verify that the request still succeeds against the target endpoint and that certificate errors do not occur. If you see TLS verification failures, fix the server certificate or trust configuration rather than disabling verification with `verify=False`.
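The scheme-normalization step above could be sketched roughly as follows. This is only an illustration, not the project's code: `to_https_url` is a hypothetical helper name, and it assumes `endpoint_url` is either a bare `host[:port][/path]` string or a full `http://` / `https://` URL.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https_url(endpoint_url: str) -> str:
    """Return endpoint_url as an https:// URL (hypothetical helper).

    Upgrades http:// to https:// and adds the scheme when it is missing.
    """
    candidate = endpoint_url.strip()
    # urlsplit only recognises the host part when a scheme or a leading
    # "//" is present, so add "//" for bare "host:port/path" input.
    if "://" not in candidate:
        candidate = "//" + candidate
    parts = urlsplit(candidate)
    if parts.scheme not in ("", "http", "https"):
        raise ValueError(f"Unsupported scheme: {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError("endpoint_url has no host")
    # Rebuild the URL with the https scheme, keeping host, path and query.
    return urlunsplit(("https", parts.netloc, parts.path, parts.query, parts.fragment))
```

The call site would then become something like `requests.get(to_https_url(endpoint_url), timeout=10)`. Note that the sketch keeps any explicit port unchanged, so an input such as `host:80` would still need the caller to point at the endpoint's TLS port.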
💬 Ignore this finding
Reply with Semgrep commands to ignore this finding.
- `/fp <comment>` for false positive
- `/ar <comment>` for acceptable risk
- `/other <comment>` for all other reasons
Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by request-with-http.
🛟 Help? Slack #semgrep-help or go/semgrep-help.
Resolution Options:
- Fix the code
- Reply `/fp $reason` (if security gap doesn’t exist)
- Reply `/ar $reason` (if gap is valid but intentional; add mitigations/monitoring)
- Reply `/other $reason` (e.g., test-only)
You can view more details about this finding in the Semgrep AppSec Platform.
    )
    response.headers["X-XSS-Protection"] = "1; mode=block"
    response.headers["Permissions-Policy"] = "geolocation=(), microphone=(), camera=()"
    return response
Semgrep identified an issue in your code:
add_security_headers sets several security headers but omits X-Permitted-Cross-Domain-Policies before returning the Flask response. This can leave older Flash/Silverlight cross-domain access to your origin less restricted than intended.
More details about this
add_security_headers(response) adds several browser security headers to every Flask response, but it returns response without setting response.headers["X-Permitted-Cross-Domain-Policies"].
A plausible misuse is: 1) an attacker hosts a malicious Flash or Silverlight file on another site, 2) that content tries to load data from this app's origin, such as pages returned by hub() or files served by mi_static_js(filename), 3) because add_security_headers does not send X-Permitted-Cross-Domain-Policies, older Adobe/Microsoft cross-domain policy behavior is not explicitly locked down, so the plugin may rely on permissive defaults or other policy files, 4) the attacker then reads or embeds information from your domain through that plugin content.
The issue is specifically at the end of add_security_headers: the function sets Strict-Transport-Security, X-Content-Type-Options, X-Frame-Options, Referrer-Policy, Content-Security-Policy, and Permissions-Policy, then return response without adding X-Permitted-Cross-Domain-Policies.
Dataflow graph
flowchart LR
classDef invis fill:white, stroke: none
classDef default fill:#e7f5ff, color:#1c7fd6, stroke: none
subgraph File0["<b>mongosync_insights/mongosync_insights.py</b>"]
direction LR
%% Source
subgraph Source
direction LR
v0["<a href=https://github.com/mongodb/kb-assets/blob/b185be682d0b9442bd10bf49cedcee7d23ed754f/mongosync_insights/mongosync_insights.py#L64 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 64] response.headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"</a>"]
end
%% Intermediate
%% Sink
subgraph Sink
direction LR
v1["<a href=https://github.com/mongodb/kb-assets/blob/b185be682d0b9442bd10bf49cedcee7d23ed754f/mongosync_insights/mongosync_insights.py#L78 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 78] return response</a>"]
end
end
%% Class Assignment
Source:::invis
Sink:::invis
File0:::invis
%% Connections
Source --> Sink
To resolve this comment:
✨ Commit fix suggestion
-    return response
+    response.headers["X-Permitted-Cross-Domain-Policies"] = "none"
+    return response
View step-by-step instructions
- Update the `add_security_headers` function to set the missing `X-Permitted-Cross-Domain-Policies` header on every response.
- Add a header assignment before `return response`, for example: `response.headers["X-Permitted-Cross-Domain-Policies"] = "none"`.
- Use `"none"` unless the application must explicitly allow Adobe cross-domain policy files. This tells clients not to load cross-domain policies from your site.
- Keep this header in the existing `@app.after_request` handler so it is applied consistently to all routes.
Alternatively, if you intentionally serve a valid cross-domain policy file for legacy Flash or Silverlight integrations, set the header to the narrowest value that still works, such as response.headers["X-Permitted-Cross-Domain-Policies"] = "master-only" instead of a broader policy.
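As a rough illustration of choosing between `"none"` and a narrower legacy value, the header assignment could be wrapped in a small helper that rejects unknown values. The helper name `set_cross_domain_policy` is hypothetical, and the sketch assumes the headers object behaves like a mutable mapping, as `response.headers` does in Flask.

```python
from typing import MutableMapping

# Values defined for the X-Permitted-Cross-Domain-Policies header.
ALLOWED_POLICIES = ("none", "master-only", "by-content-type", "by-ftp-filename", "all")


def set_cross_domain_policy(headers: MutableMapping[str, str], value: str = "none") -> None:
    """Set X-Permitted-Cross-Domain-Policies, defaulting to the strictest value."""
    if value not in ALLOWED_POLICIES:
        raise ValueError(f"Unknown cross-domain policy: {value!r}")
    headers["X-Permitted-Cross-Domain-Policies"] = value
```

Inside the existing `@app.after_request` handler this would presumably be called as `set_cross_domain_policy(response.headers)` just before `return response`.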
💬 Ignore this finding
Reply with Semgrep commands to ignore this finding.
- `/fp <comment>` for false positive
- `/ar <comment>` for acceptable risk
- `/other <comment>` for all other reasons
Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by after-request-permitted-cross-domain-policies.
    )
    response.headers["X-XSS-Protection"] = "1; mode=block"
    response.headers["Permissions-Policy"] = "geolocation=(), microphone=(), camera=()"
    return response
Semgrep identified an issue in your code:
add_security_headers(response) sets several security headers but leaves caching unspecified, so authenticated responses may be stored and later exposed from browser or proxy caches.
More details about this
add_security_headers(response) adds headers like Strict-Transport-Security, X-Frame-Options, and Content-Security-Policy, but it returns the same response without setting any Cache-Control header. If /, /logout, or an error page later includes user-specific data, a shared browser, disk cache, or proxy can store that authenticated response and show it to the next person who uses the machine or intermediary.
Plausible exploit path:
- A victim signs in and requests a page that returns sensitive content through Flask, and `add_security_headers(response)` runs on that `response`.
- Because `response.headers` never gets a `Cache-Control` value, the browser or an intermediate cache is free to keep a copy of that page.
- On a shared kiosk, VDI, or corporate proxy, an attacker opens the same URL after the victim logs out and the cached `response` is reused.
- The attacker can read whatever the original `response` contained, such as account details, internal app data, or tokens embedded in the page, without needing the victim’s current session.
The risky part here is not the existing security headers themselves; it is that this global @app.after_request handler applies to every returned response but leaves caching behavior unspecified.
Dataflow graph
flowchart LR
classDef invis fill:white, stroke: none
classDef default fill:#e7f5ff, color:#1c7fd6, stroke: none
subgraph File0["<b>mongosync_insights/mongosync_insights.py</b>"]
direction LR
%% Source
subgraph Source
direction LR
v0["<a href=https://github.com/mongodb/kb-assets/blob/b185be682d0b9442bd10bf49cedcee7d23ed754f/mongosync_insights/mongosync_insights.py#L64 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 64] response.headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"</a>"]
end
%% Intermediate
%% Sink
subgraph Sink
direction LR
v1["<a href=https://github.com/mongodb/kb-assets/blob/b185be682d0b9442bd10bf49cedcee7d23ed754f/mongosync_insights/mongosync_insights.py#L78 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 78] return response</a>"]
end
end
%% Class Assignment
Source:::invis
Sink:::invis
File0:::invis
%% Connections
Source --> Sink
To resolve this comment:
✨ Commit fix suggestion
-    return response
+@app.after_request
+def add_security_headers(response):
+    response.headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"
+    response.headers["X-Content-Type-Options"] = "nosniff"
+    response.headers["X-Frame-Options"] = "DENY"
+    response.headers["Referrer-Policy"] = "no-referrer"
+    response.headers["Content-Security-Policy"] = (
+        "default-src 'self'; "
+        "script-src 'self' 'unsafe-inline' 'unsafe-eval' https://cdn.plot.ly; "
+        "style-src 'self' 'unsafe-inline'; "
+        "img-src 'self' data: blob: https:; "
+        "font-src 'self' data:; "
+        "connect-src 'self' blob:;"
+    )
+    response.headers["X-XSS-Protection"] = "1; mode=block"
+    response.headers["Permissions-Policy"] = "geolocation=(), microphone=(), camera=()"
+    # Keep static/public assets cacheable while preventing caching of pages that may
+    # contain authenticated, session-bound, or user-specific content.
+    if request.path.startswith("/static/") or request.path.startswith("/images/"):
+        response.headers["Cache-Control"] = "public, max-age=31536000, immutable"
+    else:
+        response.headers["Cache-Control"] = "no-store, no-cache, must-revalidate, private"
+        response.headers["Pragma"] = "no-cache"
+        response.headers["Expires"] = "0"
+    return response
View step-by-step instructions
- Update the `add_security_headers` `@app.after_request` handler to also set a `Cache-Control` header on responses.
- Use a restrictive value for authenticated or user-specific pages, for example `response.headers["Cache-Control"] = "no-store, no-cache, must-revalidate, private"`.
- Add a legacy compatibility header if you need to support older HTTP/1.0 caches: `response.headers["Pragma"] = "no-cache"`.
- Add an expiration header to prevent stale cached copies, for example `response.headers["Expires"] = "0"`. This makes browsers and proxies avoid storing sensitive responses.
- Apply the strict cache headers at least to routes that return authenticated content or anything tied to a session cookie, such as pages reached after login, logout responses, and any page that shows user-specific data.
- Alternatively, if some routes serve static or fully public content and you want them to remain cacheable, set `Cache-Control` conditionally in `add_security_headers`, such as checking `request.path` and only using `no-store, no-cache, must-revalidate, private` for sensitive routes while leaving public assets with a separate cache policy.
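The conditional policy described in the last step can be kept testable by separating the path decision from Flask itself. The sketch below is an assumption-laden illustration: `cache_headers_for` and `PUBLIC_PREFIXES` are hypothetical names, and the `/static/` and `/images/` prefixes are taken from the suggested fix rather than verified against the app's actual routes.

```python
# Path prefixes assumed to serve public, non-user-specific assets.
PUBLIC_PREFIXES = ("/static/", "/images/")


def cache_headers_for(path: str) -> dict:
    """Return the cache-related headers to apply for a request path."""
    if path.startswith(PUBLIC_PREFIXES):
        # Long-lived caching for public assets.
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    # Everything else may be session-bound, so forbid storing it.
    return {
        "Cache-Control": "no-store, no-cache, must-revalidate, private",
        "Pragma": "no-cache",
        "Expires": "0",
    }
```

In the `@app.after_request` handler, each returned pair would be copied into `response.headers` based on `request.path`. One design caveat: a year-long `immutable` policy is only safe when asset URLs change whenever the file content changes (for example, fingerprinted filenames); otherwise a shorter `max-age` is probably the better default.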
💬 Ignore this finding
Reply with Semgrep commands to ignore this finding.
- `/fp <comment>` for false positive
- `/ar <comment>` for acceptable risk
- `/other <comment>` for all other reasons
Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by after-request-cache-control.
Semgrep found 1 finding:
Detected a logger that logs user input without properly neutralizing the output. The log message could contain characters like
View Dataflow Graph
flowchart LR
classDef invis fill:white, stroke: none
classDef default fill:#e7f5ff, color:#1c7fd6, stroke: none
subgraph File0["<b>mongosync_insights/lib/logs_metrics.py</b>"]
direction LR
%% Source
subgraph Source
direction LR
v0["<a href=https://github.com/mongodb/kb-assets/blob/b185be682d0b9442bd10bf49cedcee7d23ed754f/mongosync_insights/lib/logs_metrics.py#L64 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 64] request.files</a>"]
end
%% Intermediate
subgraph Traces0[Traces]
direction TB
v2["<a href=https://github.com/mongodb/kb-assets/blob/b185be682d0b9442bd10bf49cedcee7d23ed754f/mongosync_insights/lib/logs_metrics.py#L64 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 64] file</a>"]
v3["<a href=https://github.com/mongodb/kb-assets/blob/b185be682d0b9442bd10bf49cedcee7d23ed754f/mongosync_insights/lib/logs_metrics.py#L76 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 76] filename</a>"]
v4["<a href=https://github.com/mongodb/kb-assets/blob/b185be682d0b9442bd10bf49cedcee7d23ed754f/mongosync_insights/lib/logs_metrics.py#L107 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 107] detect_mime_type</a>"]
v5["<a href=https://github.com/mongodb/kb-assets/blob/b185be682d0b9442bd10bf49cedcee7d23ed754f/mongosync_insights/lib/logs_metrics.py#L29 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 29] filename</a>"]
v6["<a href=https://github.com/mongodb/kb-assets/blob/b185be682d0b9442bd10bf49cedcee7d23ed754f/mongosync_insights/lib/logs_metrics.py#L43 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 43] mime_type</a>"]
v7["<a href=https://github.com/mongodb/kb-assets/blob/b185be682d0b9442bd10bf49cedcee7d23ed754f/mongosync_insights/lib/logs_metrics.py#L107 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 107] file_mime_type</a>"]
end
v2 --> v3
v3 --> v4
v4 --> v5
v5 --> v6
v6 --> v7
%% Sink
subgraph Sink
direction LR
v1["<a href=https://github.com/mongodb/kb-assets/blob/b185be682d0b9442bd10bf49cedcee7d23ed754f/mongosync_insights/lib/logs_metrics.py#L113 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 113] f"Invalid MIME type: {file_mime_type}. Allowed: {ALLOWED_MIME_TYPES}"</a>"]
end
end
%% Class Assignment
Source:::invis
Sink:::invis
Traces0:::invis
File0:::invis
%% Connections
Source --> Traces0
Traces0 --> Sink
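One common way to address this kind of finding is to escape control characters in user-derived values before they reach the logger. The following is a minimal sketch under that assumption: `neutralize_for_log` is a hypothetical helper, not something present in `logs_metrics.py`, and the 200-character cap is an arbitrary illustrative limit.

```python
def neutralize_for_log(value: object, max_len: int = 200) -> str:
    """Return a single-line, length-limited rendering of value for log output."""
    text = str(value)
    # Escape every non-printable character (CR, LF, tabs, ANSI escapes, ...)
    # so that user input cannot start a forged log line.
    cleaned = "".join(
        ch if ch.isprintable() else ch.encode("unicode_escape").decode("ascii")
        for ch in text
    )
    # Cap the length so one oversized value cannot flood the log.
    if len(cleaned) > max_len:
        cleaned = cleaned[:max_len] + "...[truncated]"
    return cleaned
```

The flagged message could then be logged along the lines of `logger.warning("Invalid MIME type: %s. Allowed: %s", neutralize_for_log(file_mime_type), ALLOWED_MIME_TYPES)`, which also defers string formatting to the logging module instead of building an f-string.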
    --license "Apache-2.0" \
    --vendor "MongoDB Support" \
    --description "Mongosync Insights — MongoDB migration monitoring dashboard" \
    --url "https://github.com/mongodb/support-tools" \
This build script will pull the code from the support-tools repo.
No description provided.