Provides tooling to explore and analyze ionos-loop data using DuckDB.
This project downloads WordPress instance data from S3, converts it to DuckDB format, and generates comprehensive reports with analytics and visualizations.
- `bash` - Shell scripting environment
- `docker` - For running DuckDB and the AWS CLI (no local installation needed)
- `pnpm` - Package manager for Node.js dependencies
- `uv` (optional) - Needed only for DuckDB MCP server integration with AI assistants
DuckDB runs in Docker, so you don't need to install it locally.
- Install dependencies:

  ```bash
  pnpm install
  ```

- Configure AWS credentials:

  - Create a `.secrets` file (see the `*.example` templates for the format); a rough sketch of the expected contents follows this list
  - Obtain credentials from https://dcd.ionos.com/latest/#/key-management
    - Log in using the "Loop User"
    - Create a key

- Download Loop data from S3:

  ```bash
  pnpm download-loop-data-s3
  ```

- Generate the report and database:

  ```bash
  pnpm generate-report
  ```

- (Optional) Install MCP server support:

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```
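The exact `.secrets` format is defined by the `*.example` templates in the repository. As a rough, non-authoritative sketch, it typically holds standard AWS-style credentials (the variable names below are assumptions, not taken from this repo):

```bash
# .secrets (illustrative sketch; follow the *.example templates for the real format)
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
```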
Generate fresh reports:

```bash
pnpm download-loop-data-s3   # Get latest data from S3
pnpm generate-report         # Generate new report
```

Explore data interactively:

```bash
pnpm start-report-ui         # Opens browser-based DuckDB UI
```

This project provides the following npm scripts (defined in package.json):
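For orientation, the mapping between the npm script names and the shell scripts they wrap looks roughly like this (a sketch reconstructed from the descriptions below; `package.json` in the repository is authoritative):

```json
{
  "scripts": {
    "download-loop-data-s3": "./scripts/download-loop-data-s3.sh",
    "generate-report": "./scripts/generate-report.sh",
    "start-report-ui": "./scripts/start-report-ui.sh",
    "detect-plugins-using-htaccess-phpini-or-dropins": "./snippets/plugins_using_htaccess_phpini_or_dropins.sh snippets/500-most-active-plugins.csv"
  }
}
```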
Purpose: Downloads Loop data from the S3 bucket to the local `./s3` directory

Script: `./scripts/download-loop-data-s3.sh`

Prerequisites:

- Docker installed and running
- AWS credentials configured in the `.secrets` file

What it does:

- Syncs all JSON files from the S3 `loop` bucket to `./s3/`
- Excludes the `duckdb/*` directory from the sync
- Adjusts file permissions for local access

Dependencies: None (first step in the workflow)

Output: JSON files in the `./s3/` directory
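For orientation only, the sync performed by the script is roughly equivalent to the following sketch (it assumes the official AWS CLI Docker image, standard AWS environment variable names in `.secrets`, and possibly an `--endpoint-url` for IONOS S3-compatible storage; the actual script is authoritative):

```bash
# Illustrative equivalent of the S3 sync (not the actual script)
docker run --rm \
  --env-file .secrets \
  -v "$(pwd)/s3:/s3" \
  amazon/aws-cli s3 sync s3://loop /s3 --exclude "duckdb/*"
```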
Example:

```bash
pnpm download-loop-data-s3
```

Purpose: Generates the report with database and markdown output (PDF optional)
Script: `./scripts/generate-report.sh`

Prerequisites:

- Loop data must exist in the `./s3` directory (run `pnpm download-loop-data-s3` first)
- Docker installed and running

What it does:

- Validates that the `./s3` directory contains JSON files
- Converts the JSON data to Parquet format (`generate-report.parquet`)
- Creates the DuckDB database (`generate-report.db`)
- Sets up tables: `loop_items`, `plugins`, `themes`, `events`
- Creates views: `recent_loops` (latest loop per instance)
- Executes all report parts in `./scripts/generate-report-parts/` (alphabetical order)
- Generates the markdown report (`generate-report.md`)
- Formats the markdown with Prettier
- (Optional) Converts to PDF format (`generate-report.pdf`) if the `--pdf` flag is used

Dependencies: Requires data from `pnpm download-loop-data-s3`

Output: 3-4 files in the `./generate-report/` directory:

- `generate-report.parquet` - Parquet file with all loop data
- `generate-report.db` - DuckDB database
- `generate-report.md` - Markdown report with mermaid charts
- `generate-report.pdf` - PDF version of the report (only with the `--pdf` flag)
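As a quick sanity check of these outputs, something along these lines works (a sketch assuming a locally installed `duckdb` CLI; the project itself runs DuckDB via Docker):

```bash
# List the generated artifacts
ls -lh generate-report/

# Count rows in the Parquet file and list the tables in the database
duckdb -c "SELECT count(*) FROM 'generate-report/generate-report.parquet';"
duckdb generate-report/generate-report.db -c "SHOW TABLES;"
```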
Command-line options:

- `--verbose` - Enable verbose output showing which files are being processed (disabled by default)
- `--dry-run` - Show which scripts would run without executing them
- `--pdf` - Generate PDF output in addition to markdown (disabled by default)
- `--help` - Show help message with usage examples
Examples:

```bash
# Generate report (markdown only, no verbose output)
pnpm generate-report

# Generate report with PDF output
pnpm generate-report --pdf

# Generate report with verbose output
pnpm generate-report --verbose

# Generate report with both PDF and verbose output
pnpm generate-report --verbose --pdf

# Dry run to see which scripts would execute
pnpm generate-report --dry-run

# Generate only specific report parts (see Advanced Usage section)
pnpm generate-report '050*' '060*'

# Generate specific parts with PDF output
pnpm generate-report --pdf '050*' '060*'
```

Purpose: Starts the browser-based DuckDB UI for interactive data exploration
Script: `./scripts/start-report-ui.sh`

Prerequisites:

- Docker installed and running
- Optionally, run `pnpm generate-report` first to create the database

What it does:

- Checks if `./generate-report/generate-report.db` exists
- If the database doesn't exist, automatically runs `pnpm generate-report`
- Starts DuckDB with a web UI on http://localhost (default port)
- Mounts the database in read-write mode for exploration

Dependencies:

- If the database is missing: auto-generates it (calls `pnpm generate-report`)
- If the database exists: none (can run independently)

Output: Interactive browser UI at http://localhost

Example:

```bash
pnpm start-report-ui
```

Purpose: Analyzes plugins that use .htaccess, php.ini, or WordPress drop-ins
Script: `./snippets/plugins_using_htaccess_phpini_or_dropins.sh snippets/500-most-active-plugins.csv`

Prerequisites:

- CSV file with the plugin list (`snippets/500-most-active-plugins.csv`)

What it does:

- Analyzes plugins for specific file usage patterns
- Useful for identifying plugins with special server requirements

Dependencies: None (standalone utility script)

Example:

```bash
pnpm detect-plugins-using-htaccess-phpini-or-dropins
```

Initial Setup:

1. `pnpm install`
2. Configure the `.secrets` file
3. `pnpm download-loop-data-s3` → downloads JSON files to `./s3`
4. `pnpm generate-report` → requires `./s3` data → generates `./generate-report/*` files

Interactive Exploration:

- `pnpm start-report-ui` → requires `./generate-report/generate-report.db`; auto-runs `pnpm generate-report` if needed

Utility Scripts:

- `pnpm detect-plugins-using-htaccess-phpini-or-dropins` → standalone
```text
S3 Bucket (loop)
        ↓
[pnpm download-loop-data-s3]
        ↓
./s3/*.json  (Raw Loop Data)
        ↓
[pnpm generate-report]
        ↓
./generate-report/
├── generate-report.parquet  (Converted data)
├── generate-report.db       (DuckDB database)
├── generate-report.md       (Markdown report)
└── generate-report.pdf      (PDF report)
        ↓
[pnpm start-report-ui]  →  Interactive UI
```
The report parts are located in `./scripts/generate-report-parts/`. They are executed in alphabetical order by the generate-report script.

Each script in `./scripts/generate-report-parts` must be executable. Make it executable with `chmod +x ...` (or copy an existing executable one).

The output of all scripts is collected into `./generate-report/generate-report.md` (and displayed in the terminal).
- Choose where the new insight should appear (remember: alphabetical order).

  Example: To insert between `050-php_versions.sh` and `060-most-active-plugins.sh`, name your script `055-my-new-insight.sh`. To append it as the last part: `<last number + 1>-my-new-insight.sh`. Position your report part after `030-generate-report-header.sh`, which creates the markdown report header.

- Make the script executable:

  ```bash
  chmod +x ./scripts/generate-report-parts/055-my-new-insight.sh
  ```

- Start with this skeleton:

  ```bash
  #!/usr/bin/env bash
  #
  # generates markdown output for my new insight
  #
  readonly SQL="
  -- select the first 3 rows of the loop_items table
  SELECT * FROM loop_items LIMIT 3;
  "
  readonly TITLE="My new insight"

  cat <<EOF

  # $TITLE

  $(ionos.loop-duckdb.exec_duckdb "$SQL" '-markdown')
  EOF
  ```
  This skeleton queries the first 3 rows of the `loop_items` table and outputs the results as a markdown table. The `loop_items` table contains all collected loop items.

- Test the report generation:

  ```bash
  pnpm generate-report
  ```

  The output should contain your script's output at the desired location.
- Develop your SQL query:

  - Look at the loop JSON data files to understand the data structure
  - Open the DuckDB UI using `pnpm start-report-ui` and explore the prepopulated tables
  - Pro tip: Copy the table structure and sample data into an AI assistant and ask it to generate the query (see the queries after this list for pulling structure and samples)

  Example prompt for the AI:

  ```text
  You are a DuckDB expert. I have a table `[paste table name here]` with the following structure:

  [paste table structure here]

  Here is some sample data in CSV format:

  [paste the CSV sample table data here]

  [your question here]
  ```

  - Check if the generated statement looks correct
  - Test it in the DuckDB UI
  - If it works, add it to your script's SQL variable and regenerate the report
  - If not, ask the AI to improve the query
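To pull the table structure and a data sample mentioned in the pro tip above out of the DuckDB UI, queries along these lines work (the table name comes from this project; the export path is illustrative):

```sql
-- Inspect the schema of loop_items
DESCRIBE loop_items;

-- Grab a small sample to paste into the AI prompt
SELECT * FROM loop_items LIMIT 20;

-- Or export a CSV sample (path is illustrative)
COPY (SELECT * FROM loop_items LIMIT 20) TO '/tmp/loop_items_sample.csv' (HEADER, DELIMITER ',');
```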
Have a look at the existing report parts to see how to output tables and charts.
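As a further, purely illustrative variation on the skeleton above (not an existing part), a grouped query can be rendered as a markdown table through the same helper; the column name `php_version` here is hypothetical and must be replaced with a real column from `loop_items`:

```bash
#!/usr/bin/env bash
#
# illustrative report part: counts loop items per value of one column
# NOTE: "php_version" is a hypothetical column name; check the real schema first
#
readonly SQL="
SELECT php_version AS version, count(*) AS instances
FROM loop_items
GROUP BY version
ORDER BY instances DESC;
"
readonly TITLE="PHP versions (illustrative)"

cat <<EOF

# $TITLE

$(ionos.loop-duckdb.exec_duckdb "$SQL" '-markdown')
EOF
```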
To test only your specific report part during development:
```bash
# Test with dry-run to see what would execute (without verbose output)
pnpm generate-report --dry-run '055-my-new-insight.sh'

# Test with verbose and dry-run to see detailed execution info
pnpm generate-report --verbose --dry-run '055-my-new-insight.sh'

# Run only your report part
pnpm generate-report '055-my-new-insight.sh'

# Run your report part with verbose output to see processing details
pnpm generate-report --verbose '055-my-new-insight.sh'

# Run your report part and generate PDF
pnpm generate-report --pdf '055-my-new-insight.sh'
```

The generate-report script supports wildcard filtering to run only specific report parts. You can pass one or more wildcard patterns as arguments.
Examples:
Run only your own report script called `200-my-own-insight.sh`:

```bash
./scripts/generate-report.sh 200-my-own-insight.sh
```

Run all scripts starting with `200-`:

```bash
./scripts/generate-report.sh '200*'
```

Run multiple specific parts using wildcards:

```bash
pnpm generate-report 030* 050* '*nba*' 140-security-settings
```

This will execute only:

- Scripts matching `030*` (e.g., `030-generate-report-header.sh`)
- Scripts matching `050*` (e.g., `050-php_versions.sh`)
- Scripts containing `nba` anywhere in the name (e.g., `110-nbas-dismissed.sh`)
- The exact script `140-security-settings.sh`

You can combine options with filters:

```bash
# Dry run with verbose output
./scripts/generate-report.sh --verbose --dry-run '200*'

# Generate with PDF output
./scripts/generate-report.sh --pdf '200*'

# All options together
./scripts/generate-report.sh --verbose --pdf '200*'
```

Option 1: Using wildcard filtering (recommended)
Run only the parts you want by specifying them explicitly. For example, to skip `050-php_versions.sh`, run all other parts:

```bash
# First, check what would run (use --verbose to see the detailed list)
pnpm generate-report --dry-run

# With verbose output to see which scripts match
pnpm generate-report --verbose --dry-run

# Run everything except 050* by listing all the parts you want
pnpm generate-report 010* 020* 030* 040* 060* 070* 080* 090* 1*
```

Option 2: Temporarily disable execution

Make the script non-executable:

```bash
chmod -x ./scripts/generate-report-parts/050-php_versions.sh
```

To re-enable it later:

```bash
chmod +x ./scripts/generate-report-parts/050-php_versions.sh
```

Option 3: Early exit in the script

Insert `exit 0` at the beginning of the script (after the shebang):

```bash
#!/usr/bin/env bash
exit 0
# Rest of the script...
```

The `pnpm generate-report` command generates a DuckDB database.
Use `pnpm start-report-ui` to start DuckDB with the `./generate-report/generate-report.db` database generated by `pnpm generate-report`. You can then open the browser-based user interface to explore and query the collected loop data.
The UI allows you to:
- Browse tables and views
- Execute SQL queries interactively
- Export results in various formats
- Inspect table schemas and data
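Some starter queries for the UI, using only the tables and views created by the report (column-level details depend on the actual schema):

```sql
-- List the tables and views in the database
SHOW TABLES;

-- Row counts of the main tables
SELECT count(*) FROM loop_items;
SELECT count(*) FROM plugins;

-- Peek at the latest loop per instance
SELECT * FROM recent_loops LIMIT 10;
```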
The project provides comprehensive AI integration through MCP servers and skills, enabling AI assistants to query data and create report insights.
See docs/ai-integration.md for the complete AI integration guide, including:
- Available MCP servers and how to use them
- Skills for querying data and creating insights
- Example prompts for common tasks
- Workflow examples and troubleshooting
The project includes MCP (Model Context Protocol) server integration for DuckDB, allowing AI assistants to query the report database directly.
Install uv (required for the MCP server):

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

The MCP server is configured in `.mcp.json` and uses the read-only database at `generate-report/generate-report.db`.
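The committed `.mcp.json` is authoritative. Purely as an illustration of the shape such a configuration can take, a uv-based DuckDB MCP server entry might look something like this (the server package name and flags shown here are assumptions, not taken from this repository):

```json
{
  "mcpServers": {
    "duckdb": {
      "command": "uvx",
      "args": [
        "mcp-server-duckdb",
        "--db-path", "generate-report/generate-report.db",
        "--readonly"
      ]
    }
  }
}
```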
The DuckDB skill is documented in docs/skills/duckdb:
- SKILL.md - Overview and database schema
- examples.md - Query examples from the project
- reference.md - DuckDB SQL reference and documentation links
AI assistants (Claude Code, Gemini CLI, VS Code Copilot) can use the MCP server to execute queries against the report database.
```text
loop-duckdb/
├── .mcp.json                    # MCP server configuration
├── .secrets                     # AWS credentials (create this, not in git)
├── package.json                 # npm scripts and dependencies
├── README.md                    # This file
├── docs/
│   ├── ai-integration.md        # AI integration guide (MCP + skills)
│   └── skills/                  # AI assistant skills
│       ├── duckdb/              # DuckDB skill documentation
│       │   ├── SKILL.md         # Overview and schema
│       │   ├── examples.md      # Query examples
│       │   └── reference.md     # DuckDB reference
│       └── report-insight/      # Report insight skill documentation
│           ├── SKILL.md         # Creating insights guide
│           ├── examples.md      # Working examples
│           └── reference.md     # Pattern library
├── scripts/
│   ├── download-loop-data-s3.sh # Downloads JSON data from S3
│   ├── generate-report.sh       # Main report generation script
│   ├── start-report-ui.sh       # Starts DuckDB UI
│   └── generate-report-parts/   # Individual report sections
│       ├── 010-generate-parquet-file.sh
│       ├── 020-create-duckdb-database.sh
│       ├── 030-generate-report-header.sh
│       ├── 040-wordpress_versions.sh
│       ├── 050-php_versions.sh
│       ├── 060-most-active-plugins.sh
│       ├── 070-most-active-themes.sh
│       ├── 080-logins.sh
│       ├── 090-extendify.sh
│       ├── 100-getting-started-finished.sh
│       ├── 110-nbas-dismissed.sh
│       ├── 120-nbas-completed-dismissed.sh
│       ├── 130-quicklinks-usage.sh
│       ├── 140-security-settings.sh
│       ├── 150-extendify-onboarding-status.sh
│       ├── 160-maintenance-status.sh
│       └── 170-mcp-enabled.sh
├── snippets/
│   └── plugins_using_htaccess_phpini_or_dropins.sh # Plugin analysis utility
├── s3/                          # Downloaded Loop data (git-ignored)
│   └── *.json
└── generate-report/             # Generated output (git-ignored)
    ├── generate-report.parquet
    ├── generate-report.db
    ├── generate-report.md
    └── generate-report.pdf
```
This section explains the purpose of each top-level directory in the project.
Purpose: Configuration for Claude Code AI assistant
Contents:

- `CLAUDE.md` - Project instructions for Claude
- `settings.json` - Claude-specific settings
- `settings.local.json` - Local overrides (not in git)
When to modify: When customizing Claude AI assistant behavior for this project
Purpose: Configuration for Google Gemini CLI AI assistant
Contents:

- `GEMINI.md` - Project instructions for Gemini
- `settings.json` - Gemini-specific settings
- Skills are symlinked from `docs/skills/`
When to modify: When customizing Gemini AI assistant behavior for this project
Purpose: VS Code workspace settings
Contents: Editor-specific configuration and preferences
When to modify: When customizing VS Code behavior for this project
Purpose: Project documentation and AI assistant skills
Contents:

- `ai-integration.md` - Comprehensive AI integration guide (MCP servers + skills)
- `skills/` - AI assistant skills for querying and creating insights
  - `duckdb/` - DuckDB skill documentation for AI assistants
    - `SKILL.md` - Overview and database schema
    - `examples.md` - Query examples from the project
    - `reference.md` - DuckDB SQL reference
  - `report-insight/` - Report insight skill for creating analytics
    - `SKILL.md` - Comprehensive guide for creating insights
    - `examples.md` - Working examples from the project
    - `reference.md` - Pattern library and techniques
When to modify: When adding new documentation or AI skills
Purpose: Main automation scripts for the project
Contents:

- `download-loop-data-s3.sh` - Downloads JSON data from S3
- `generate-report.sh` - Main report generation script
- `start-report-ui.sh` - Starts the DuckDB interactive UI
- `generate-report-parts/` - Modular report section generators
  - `010-generate-parquet-file.sh` - Converts JSON to Parquet
  - `020-create-duckdb-database.sh` - Creates the database schema
  - `030-generate-report-header.sh` - Generates the report header
  - `040-*.sh` through `170-*.sh` - Various report sections (WordPress versions, PHP versions, plugins, themes, logins, etc.)
When to modify:
- When adding new report sections (add scripts to `generate-report-parts/`)
- When modifying report generation logic
- When adding new utility scripts
Purpose: Utility scripts and analysis tools
Contents:

- `plugins_using_htaccess_phpini_or_dropins.sh` - Plugin analysis utility
- `500-most-active-plugins.csv` - Sample data for plugin analysis
- `*.md` - Analysis documentation
When to modify: When adding one-off analysis scripts or utilities
Purpose: Local copy of Loop data downloaded from S3
Contents: JSON files containing WordPress instance data
Populated by: pnpm download-loop-data-s3
Size: Can be large (hundreds of MB to GBs)
When to clean: Can be removed and re-downloaded anytime
Purpose: Generated report output directory
Contents:

- `generate-report.parquet` - Converted data in columnar format
- `generate-report.db` - DuckDB database
- `generate-report.md` - Markdown report
- `generate-report.pdf` - PDF report
Populated by: pnpm generate-report
Size: Several hundred MB (database can be large)
When to clean: Can be removed and regenerated anytime
Purpose: Legacy DuckDB files and notebooks (for reference)
Contents:

- `ui/` - Exported notebook files
- `*.db` - Old database files
- Documentation from previous versions
Status: Legacy/reference material
When to modify: Generally not modified (use generate-report/ instead)
Purpose: Temporary directory for plugin analysis
Contents: Downloaded plugin files for analysis
Created by: Plugin analysis utilities in snippets/
When to clean: Can be removed anytime (will be recreated as needed)
Purpose: Logs from plugin analysis runs
Contents: Log files from plugin analysis scripts
Created by: snippets/plugins_using_htaccess_phpini_or_dropins.sh
When to clean: Can be removed anytime
Purpose: Installed npm dependencies
Contents: Node.js packages (prettier, md-to-pdf, mermaid)
Populated by: pnpm install
When to clean: Run pnpm install to recreate
| Directory | Purpose | Git Tracked | Size | Can Delete? |
|---|---|---|---|---|
| `.claude/` | Claude AI config | Yes | Small | No |
| `.gemini/` | Gemini AI config | Yes | Small | No |
| `.vscode/` | VS Code settings | Yes | Small | No |
| `docs/` | Documentation | Yes | Small | No |
| `scripts/` | Automation scripts | Yes | Small | No |
| `snippets/` | Utility scripts | Yes | Small | No |
| `s3/` | Downloaded data | No | Large | Yes* |
| `generate-report/` | Generated output | No | Large | Yes* |
| `duckdb/` | Legacy files | Partial | Medium | Use with caution |
| `plugin_temp/` | Plugin analysis temp | No | Large | Yes |
| `plugins_using_.../` | Analysis logs | No | Small | Yes |
| `node_modules/` | npm packages | No | Medium | Yes* |
* Can be regenerated by running the appropriate command
Solution: Run `pnpm download-loop-data-s3` to download the Loop data from S3 first.

Solution: Create a `.secrets` file with AWS credentials (see the setup section).
Solution:

- Ensure Docker is running
- Check that the `./s3` directory contains JSON files
- Run with the `--verbose` flag for detailed output to see which scripts are executing: `pnpm generate-report --verbose`
- Use `--dry-run` to see what would execute without running: `pnpm generate-report --dry-run`
Solution:
- Ensure Docker is running
- Check if the port is already in use
- If the database doesn't exist, it will be auto-generated (requires `./s3` data)
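If the port is already taken, a quick way to find and stop the conflicting container (standard Docker commands; the container ID depends on your setup) is:

```bash
# See which containers are running and which ports they publish
docker ps

# Stop the container that occupies the port (replace <container-id>)
docker stop <container-id>
```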
Solution: The scripts handle permissions automatically, but if issues persist:
```bash
# Fix permissions on the s3 directory
docker run -v "$(pwd)/s3:/local" --rm library/bash -c "chmod -R a+rw /local"

# Fix permissions on the generate-report directory
docker run -v "$(pwd)/generate-report:/local" --rm library/bash -c "chmod -R a+rw /local"
```