diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index 00fd628b..02f17f66 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -31,7 +31,7 @@ jobs:
           - 5432:5432
     strategy:
       matrix:
-        python-version: ['3.9', '3.10', '3.11', '3.12', '3.13', '3.14']
+        python-version: ['3.10', '3.11', '3.12', '3.13', '3.14']
     steps:
       - uses: actions/checkout@v4
         with:
@@ -123,5 +123,5 @@ jobs:
         uses: coverallsapp/github-action@v2
         with:
           parallel-finished: true
-          carryforward: "run-3.9,run-3.10,run-3.11,run-3.12,run-3.13"
+          carryforward: "run-3.10,run-3.11,run-3.12,run-3.13,run-3.14"
           github-token: ${{ secrets.GITHUB_TOKEN }}
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 809ed509..7794748e 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,6 +1,54 @@
-# Contributing to CommCare Export
+Contributing to the CommCare Data Export Tool
+=============================================
 
-## Coding style
+Thank you for your interest in contributing! This document covers the
+contribution process, coding standards, and release procedures.
+
+
+Getting Started
+---------------
+
+1. Sign up for [GitHub](https://github.com) if you haven't already
+2. Fork the repository at https://github.com/dimagi/commcare-export
+3. Clone your fork, install into a virtualenv, and start a feature
+   branch:
+
+```shell
+git clone git@github.com:your-username/commcare-export.git
+cd commcare-export
+uv venv
+source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+uv pip install -e ".[test]"
+git checkout -b my-feature-branch
+```
+
+
+Making Changes
+--------------
+
+1. Create a feature branch from `master`
+2. Make your changes following the coding style below
+3. Make sure the tests pass:
+   ```shell
+   pytest
+   ```
+4. Check type hints (if modifying typed modules):
+   ```shell
+   mypy --install-types commcare_export/ tests/ migrations/
+   ```
+5. Push and submit a pull request:
+   ```shell
+   git push -u origin my-feature-branch
+   ```
+6. Visit https://github.com/dimagi/commcare-export and submit a pull
+   request.
+
+For detailed testing instructions, including database setup and
+troubleshooting, see the [Testing Guide](docs/testing.md).
+
+
+Coding Style
+------------
 
 > Perfection is achieved, not when there is nothing more to add, but
 > when there is nothing left to take away.
@@ -76,3 +124,43 @@ def test_doctests():
     results = doctest.testmod(module, optionflags=doctest.ELLIPSIS)
     assert results.failed == 0
 ```
+
+
+Release Process
+---------------
+
+For maintainers only.
+
+1. **Create a tag** for the release:
+   ```shell
+   git tag -a "X.YY.0" -m "Release X.YY.0"
+   git push --tags
+   ```
+
+2. **Create the distribution**:
+   ```shell
+   uv build
+   ```
+   Ensure that the archives in `dist/` have the correct version number
+   (matching the tag name).
+
+3. **Upload to PyPI**:
+   ```shell
+   uv publish
+   ```
+
+4. **Verify the upload** at https://pypi.python.org/pypi/commcare-export
+
+5. **Create a release on GitHub** at
+   https://github.com/dimagi/commcare-export/releases
+
+   Once the release is published, a GitHub workflow compiles executables
+   of the DET for Linux and Windows, adding them to the release as
+   assets.
+
+For Linux-based users: If you download and use the executable file, make
+sure the file has the executable permission enabled:
+
+```shell
+chmod +x commcare-export
+```
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 00000000..29f307c7
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,23 @@
+MIT License
+===========
+
+Copyright (c) 2013-2026 Dimagi Inc.
+
+Permission is hereby granted, free of charge, to any person obtaining a
+copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be included
+in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
diff --git a/README.md b/README.md
index d031d5b3..ba95ba95 100644
--- a/README.md
+++ b/README.md
@@ -1,791 +1,87 @@
-CommCare Export
-===============
+CommCare Data Export Tool
+========================
 
-https://github.com/dimagi/commcare-export
+https://github.com/dimagi/commcare-export
 
-[![Build Status](https://app.travis-ci.com/dimagi/commcare-export.svg?branch=master)](https://app.travis-ci.com/dimagi/commcare-export)
+[![Build Status](https://github.com/dimagi/commcare-export/actions/workflows/test.yml/badge.svg)](https://github.com/dimagi/commcare-export/actions)
 [![Test coverage](https://coveralls.io/repos/dimagi/commcare-export/badge.png?branch=master)](https://coveralls.io/r/dimagi/commcare-export)
 [![PyPI version](https://badge.fury.io/py/commcare-export.svg)](https://badge.fury.io/py/commcare-export)
 
-A command-line tool (and Python library) to generate customized exports from the [CommCare HQ](https://www.commcarehq.org) [REST API](https://wiki.commcarehq.org/display/commcarepublic/Data+APIs).
+A command-line tool (and Python library) to generate customized exports
+from the [CommCare HQ](https://www.commcarehq.org)
+[REST API](https://wiki.commcarehq.org/display/commcarepublic/Data+APIs).
 
-* [User documentation](https://wiki.commcarehq.org/display/commcarepublic/CommCare+Data+Export+Tool)
-* [Changelog](https://github.com/dimagi/commcare-export/releases)
 
-Installation & Quick Start
---------------------------
+Quick Start
+-----------
 
-Following commands are to be run on a terminal or a command line.
-
-Once on a terminal window or command line, for simplicity, run commands from the home directory.
-
-### Python
-
-Check which Python version is installed.
-
-This tool is tested with Python versions from 3.9 to 3.13.
-
-```shell
-$ python3 --version
-```
-If Python is installed, its version will be shown.
-
-If Python isn't installed, [download and install](https://www.python.org/downloads/)
-a version of Python from 3.9 to 3.13.
-
-## Virtualenv (Optional)
-
-It is recommended to set up a virtual environment for CommCare Export
-to avoid conflicts with other Python applications.
-
-More about virtualenvs on https://docs.python.org/3/tutorial/venv.html
-
-Setup a virtual environment using:
-
-```shell
-$ python3 -m venv venv
-```
-
-Activate virtual environment by running:
-
-```shell
-$ source venv/bin/activate
-```
-
-**Note**: virtualenv needs to be activated each time you start a new terminal session or command line prompt.
-
-For convenience, to avoid doing that, you can create an alias to activate virtual environments in
-"venv" directory by adding the following to your
-`.bashrc` or `.zshrc` file:
-
-```shell
-$ alias venv='if [[ -d venv ]] ; then source venv/bin/activate ; fi'
-```
-
-Then you can activate virtual environments with simply typing
-```shell
-$ venv
-```
-
-## Install CommCare Export
-
-[uv](https://docs.astral.sh/uv/) is a fast Python package installer and resolver.
+### Installation
 
 ```shell
-$ uv pip install commcare-export
+uv pip install commcare-export
 ```
 
-## CommCare HQ
-
-1. Sign up for [CommCare HQ](https://www.commcarehq.org/) if you have not already.
-
-2. Create a project space and application.
-
-3. Visit the Release Manager, make a build, click the star to release it.
-
-4. Use Web Apps and fill out some forms.
-
-5. Modify one of example queries in the `examples/` directory, modifying the "Filter Value" column
-   to match your form XMLNS / case type.
-   See [this page](https://confluence.dimagi.com/display/commcarepublic/Finding+a+Form%27s+XMLNS) to
-   determine the XMLNS for your form.
-
-Now you can run the following examples:
+### Basic Usage
 
 ```shell
-$ commcare-export \
-  --query examples/demo-registration.xlsx \
-  --project YOUR_PROJECT \
-  --output-format markdown
-
-$ commcare-export \
-  --query examples/demo-registration.json \
-  --project YOUR_PROJECT \
-  --output-format markdown
+# Export forms to Markdown (useful for testing)
+commcare-export \
+  --query examples/demo-registration.xlsx \
+  --project YOUR_PROJECT \
+  --output-format markdown
 
-$ commcare-export \
-  --query examples/demo-deliveries.xlsx \
-  --project YOUR_PROJECT \
-  --output-format markdown
-
-$ commcare-export \
-  --query examples/demo-deliveries.json \
-  --project YOUR_PROJECT \
-  --output-format markdown
+# Export to a SQL database with incremental updates
+commcare-export \
+  --query examples/demo-registration.xlsx \
+  --project YOUR_PROJECT \
+  --output-format sql \
+  --output postgresql://user:pass@localhost/dbname
 ```
 
-You'll see the tables printed out. Change to `--output-format sql --output URL_TO_YOUR_DB --since DATE` to
-sync all forms submitted since that date.
-
-Example query files are provided in both Excel and JSON format. It is recommended
-to use the Excel format as the JSON format may change upon future library releases.
-
-Command-line Usage
-------------------
-
-The basic usage of the command-line tool is with a saved Excel or JSON query (see how to write these, below)
-
-```shell
-$ commcare-export --commcare-hq \
-  --username \
-  --project \
-  --api-version \
-  --version \
-  --query \
-  --output-format \
-  --output \
-  --users \
-  --locations \
-  --with-organization
-```
-
-See `commcare-export --help` for the full list of options.
-
-### Logging
-
-By default, commcare-export writes logs to a file named
-`commcare_export.log` in the current working directory. Log entries are
-appended to this file across multiple runs to preserve history.
-
-You can customize the log directory:
-
-```shell
-$ commcare-export --log-dir /path/to/logs \
-  --query my-query.xlsx \
-  --project myproject
-```
-
-To disable file logging and show all output in the console only:
-
-```shell
-$ commcare-export --no-logfile \
-  --query my-query.xlsx \
-  --project myproject
-```
+Example query files are provided in the [examples/](examples/) directory
+for both Excel and JSON formats.
 
-> [!NOTE]
-> The log directory will be created automatically if it doesn't exist.
-> If the specified directory cannot be created or written to,
-> commcare-export will fall back to console-only logging with a warning
-> message.
 
-There are example query files for the CommCare Demo App (available on the CommCare HQ Exchange) in the `examples/`
-directory.
-
-`--output`
-
-CommCare Export uses SQLAlachemy's [create_engine](http://docs.sqlalchemy.org/en/latest/core/engines.html) to establish a database connection. This is based off of the [RFC-1738](https://www.ietf.org/rfc/rfc1738.txt) protocol. Some common examples:
-
-```
-# Postgres
-postgresql+psycopg2://scott:tiger@localhost/mydatabase
-
-# MySQL
-mysql+pymysql://scott:tiger@localhost/mydatabase
-
-# MSSQL
-mssql+pyodbc://scott:tiger@localhost/mydatabases?driver=ODBC+Driver+17+for+SQL+Server
-```
-
-
-Excel Queries
+Documentation
 -------------
 
-An Excel query is any `.xlsx` workbook. Each sheet in the workbook represents one table you wish
-to create. There are two grouping of columns to configure the table:
-
- - **Data Source**: Set this to `form` to export form data, or `case` for case data.
- - **Filter Name** / *Filter Value*: These columns are paired up to filter the input cases or forms.
- - **Field**: The destination in your SQL database for the value.
- - **Source Field**: The particular field from the form you wish to extract. This can be any JSON path.
-
-
-JSON Queries
-------------
-
-JSON queries are a described in the table below. You build a JSON object that represents the query you have in mind.
-A good way to get started is to work from the examples, or you could make an Excel query and run the tool
-with `--dump-query` to see the resulting JSON query.
-
-
-User and Location Data
-----------------------
-
-The --users and --locations options export data from a CommCare project that
-can be joined with form and case data. The --with-organization option does all
-of that and adds a field to Excel query specifications to be joined on.
-
-Specifying the --users option or --with-organization option will export an
-additional table named 'commcare_users' containing the following columns:
-
-| Column | Type | Note |
-|----------------------------------|------|-------------------------------------|
-| id | Text | Primary key |
-| default_phone_number | Text | |
-| email | Text | |
-| first_name | Text | |
-| groups | Text | |
-| last_name | Text | |
-| phone_numbers | Text | |
-| resource_uri | Text | |
-| commcare_location_id | Text | Foreign key to `commcare_locations` |
-| commcare_location_ids | Text | |
-| commcare_primary_case_sharing_id | Text | |
-| commcare_project | Text | |
-| username | Text | |
-
-The data in the 'commcare_users' table comes from the [List Mobile Workers
-API endpoint](https://confluence.dimagi.com/display/commcarepublic/List+Mobile+Workers).
-
-Specifying the --locations option or --with-organization options will export
-an additional table named 'commcare_locations' containing the following columns:
-
-| Column | Type | Note |
-|------------------------------|------|-----------------------------------------------|
-| id | Text | |
-| created_at | Date | |
-| domain | Text | |
-| external_id | Text | |
-| last_modified | Date | |
-| latitude | Text | |
-| location_data | Text | |
-| location_id | Text | Primary key |
-| location_type | Text | |
-| longitude | Text | |
-| name | Text | |
-| parent | Text | Resource URI of parent location |
-| resource_uri | Text | |
-| site_code | Text | |
-| location_type_administrative | Text | |
-| location_type_code | Text | |
-| location_type_name | Text | |
-| location_type_parent | Text | |
-| *location level code* | Text | Column name depends on project's organization |
-| *location level code* | Text | Column name depends on project's organization |
-
-The data in the 'commcare_locations' table comes from the Location API
-endpoint along with some additional columns from the Location Type API
-endpoint. The last columns in the table exist if you have set up
-organization levels for your projects. One column is created for each
-organization level. The column name is derived from the Location Type
-that you specified. The column value is the location_id of the containing
-location at that level of your organization. Consider the example organization
-from the [CommCare help page](https://confluence.dimagi.com/display/commcarepublic/Setting+up+Organization+Levels+and+Structure).
-A piece of the 'commcare_locations' table could look like this:
-
-| location_id | location_type_name | chw | supervisor | clinic | district |
-|-------------|--------------------|--------|------------|--------|----------|
-| 939fa8 | District | NULL | NULL | NULL | 939fa8 |
-| c4cbef | Clinic | NULL | NULL | c4cbef | 939fa8 |
-| a9ca40 | Supervisor | NULL | a9ca40 | c4cbef | 939fa8 |
-| 4545b9 | CHW | 4545b9 | a9ca40 | c4cbef | 939fa8 |
-
-In order to join form or case data to 'commcare_users' and 'commcare_locations'
-the exported forms and cases need to contain a field identifying which user
-submitted them. The --with-organization option automatically adds a field
-called 'commcare_userid' to each query in an Excel specification for this
-purpose. Using that field, you can use a SQL query with a join to report
-data about any level of you organization. For example, to count the number
-of forms submitted by all workers in each clinic:
-
-```sql
-SELECT l.clinic,
-       COUNT(*)
-FROM form_table t
-LEFT JOIN (commcare_users u
-           LEFT JOIN commcare_locations l
-           ON u.commcare_location_id = l.location_id)
-ON t.commcare_userid = u.id
-GROUP BY l.clinic;
-```
+### For Users
 
-Note that the table names 'commcare_users' and 'commcare_locations' are
-treated as reserved names and the export tool will produce an error if
-given a query specification that writes to either of them.
+See the [User Documentation](https://dimagi.atlassian.net/wiki/spaces/commcarepublic/pages/2143955952/CommCare+Data+Export+Tool+DET)
+for installation, creating queries, command-line usage, scheduling, and
+common use cases.
 
-The export tool will write all users to 'commcare_users' and all locations to
-'commcare_locations', overwriting existing rows with current data and adding
-rows for new users and locations. If you want to remove obsolete users or
-locations from your tables, drop them and the next export will leave only
-the current ones. If you modify your organization to add or delete levels,
-you will change the columns of the 'commcare_locations' table and it is
-very likely you will want to drop the table before exporting with the new
-organization.
+### For Developers
 
-Scheduling the DET
-------------------
-Scheduling the DET to run at regular intervals is a useful tactic to keep your
-database up to date with CommCare HQ.
+See the [Technical Documentation](docs/index.md) for:
 
-A common approach to scheduling DET runs is making use of the operating systems' scheduling
-libraries to invoke a script to execute the `commcare-export` command. Sample scripts can be
-found in the `examples/` directory for both Windows and Linux.
+- [Python Library Usage](docs/library-usage.md) - Using `commcare-export` as a Python library
+- [MiniLinq Reference](docs/minilinq-reference.md) - Query language documentation
+- [Query Formats](docs/query-formats.md) - Excel and JSON query specifications
+- [Output Formats](docs/output-formats.md) - Available output formats and dependencies
+- [User and Location Data](docs/user-location-data.md) - Exporting organization data
+- [Command-line Usage](docs/cli-usage.md) - CLI reference
+- [Scheduling](docs/scheduling.md) - Running DET on a schedule
 
-### Windows
-On Windows systems you can make use of the [task scheduler](https://sqlbackupandftp.com/blog/how-to-schedule-a-script-via-windows-task-scheduler/)
-to run scheduled scripts for you.
-
-The `examples/` directory contains a sample script file, `scheduled_run_windows.bat`, which can be used by the
-task scheduler to invoke the `commcare-export` command.
-
-To set up the scheduled task you can follow the steps below.
-1. Copy the file `scheduled_run_windows.bat` to any desired location on your system (e.g. `Documents`)
-2. Edit the copied `.bat` file and populate your own details
-3. Follow the steps outlined [here](https://sqlbackupandftp.com/blog/how-to-schedule-a-script-via-windows-task-scheduler/),
-using the .bat file when prompted for the `Program/script`.
-
-
-### Linux
-On a Linux system you can make use of the [crontab](https://www.techtarget.com/searchdatacenter/definition/crontab)
-command to create scheduled actions (cron jobs) in the system.
-
-The `examples/` directory contains a sample script file, `scheduled_run_linux.sh`, which can be used by the cron job.
-To set up the cron job you can follow the steps below.
-1. Copy the example file to the home directory
-> cp ./examples/scheduled_run_linux.sh ~/scheduled_run_linux.sh
-2. Edit the file to populate your own details
-> nano ~/scheduled_run_linux.sh
-3. Create a cron job by appending to the crontab file
-> crontab -e
-
-Make an entry below any existing cron jobs. The example below executes the script file at the top of
-every 12th hour of every day
-> 0 12 * * * bash ~/scheduled_run_linux.sh
-
-You can consult the [crontab.guru](https://crontab.guru/) tool which is very useful to generate and interpret
-any custom cron schedules.
-
-Python Library Usage
---------------------
-
-As a library, the various `commcare_export` modules make it easy to
-
- - Interact with the CommCare HQ REST API
- - Execute "Minilinq" queries against the API (a very simple query language, described below)
- - Load and save JSON representations of Minilinq queries
- - Compile Excel configurations to Minilinq queries
-
-To directly access the CommCare HQ REST API:
-
-```python
-from commcare_export.checkpoint import CheckpointManagerWithDetails
-from commcare_export.commcare_hq_client import CommCareHqClient, AUTH_MODE_APIKEY
-from commcare_export.commcare_minilinq import get_paginator, PaginationMode
-
-username = 'some@username.com'
-domain = 'your-awesome-domain'
-hq_host = 'https://commcarehq.org'
-API_KEY= 'your_secret_api_key'
-
-api_client = CommCareHqClient(hq_host, domain, username, API_KEY, AUTH_MODE_APIKEY)
-case_paginator=get_paginator(resource='case', pagination_mode=PaginationMode.date_modified)
-case_paginator.init()
-checkpoint_manager=CheckpointManagerWithDetails(None, None, PaginationMode.date_modified)
-
-cases = api_client.iterate('case', case_paginator, checkpoint_manager=checkpoint_manager)
-
-for case in cases:
-    print(case['case_id'])
-
-```
-
-To issue a `minilinq` query against it, and then print out that query in a JSON serialization:
-
-```python
-import json
-import sys
-from commcare_export.minilinq import *
-from commcare_export.commcare_hq_client import CommCareHqClient
-from commcare_export.commcare_minilinq import CommCareHqEnv
-from commcare_export.env import BuiltInEnv, JsonPathEnv
-from commcare_export.writers import StreamingMarkdownTableWriter
-
-api_client = CommCareHqClient(
-    url="http://www.commcarehq.org",
-    project='your_project',
-    username='your_username',
-    password='password',
-    version='0.5'
-)
-
-source = Map(
-    source=Apply(
-        Reference("api_data"),
-        Literal("form"),
-        Literal({"filter": {"term": {"app_id": "whatever"}}})
-    ),
-    body=List([
-        Reference("received_on"),
-        Reference("form.gender"),
-    ])
-)
-
-query = Emit(
-    'demo-table',
-    [
-        Literal('Received On'),
-        Literal('Gender')
-    ],
-    source
-)
-
-print(json.dumps(query.to_jvalue(), indent=2))
-
-results = query.eval(BuiltInEnv() | CommCareHqEnv(api_client) | JsonPathEnv())
-
-if len(list(env.emitted_tables())) > 0:
-    with StreamingMarkdownTableWriter(sys.stdout) as writer:
-        for table in env.emitted_tables():
-            writer.write_table(table)
-```
-
-Which will output JSON equivalent to this:
-
-```json
-{
-  "Emit": {
-    "headings": [
-      {
-        "Lit": "Received On"
-      },
-      {
-        "Lit": "Gender"
-      }
-    ],
-    "source": {
-      "Map": {
-        "body": {
-          "List": [
-            {
-              "Ref": "received_on"
-            },
-            {
-              "Ref": "form.gender"
-            }
-          ]
-        },
-        "name": null,
-        "source": {
-          "Apply": {
-            "args": [
-              {
-                "Lit": "form"
-              },
-              {
-                "Lit": {
-                  "filter": {
-                    "term": {
-                      "app_id": "whatever"
-                    }
-                  }
-                }
-              }
-            ],
-            "fn": {
-              "Ref": "api_data"
-            }
-          }
-        }
-      }
-    },
-    "table": "demo-table"
-  }
-}
-```
-
-
-MiniLinq Reference
-------------------
-
-The abstract syntax can be directly inspected in the `commcare_export.minilinq` module. Note that the choice between functions and primitives is deliberately chosen
-to expose the structure of the MiniLinq for possible optimization, and to restrict the overall language.
-
-Here is a description of the astract syntax and semantics
-
-| Python | JSON | Which is evaluates to |
-|-------------------------------|-----------------------------------------------------|----------------------------------|
-| `Literal(v)` | `{"Lit": v}` | Just `v` |
-| `Reference(x)` | `{"Ref": x}` | Whatever `x` resolves to in the environment |
-| `List([a, b, c, ...])` | `{"List": [a, b, c, ...}` | The list of what `a`, `b`, `c` evaluate to |
-| `Map(source, name, body)` | `{"Map": {"source": ..., "name": ..., "body": ...}` | Evals `body` for each elem in `source`. If `name` is provided, the elem will be bound to it, otherwise it will replace the whole env. |
-| `FlatMap(source, name, body)` | `{"FlatMap": {"source" ... etc}}` | Flattens after mapping, like nested list comprehensions |
-| `Filter(source, name, body)` | etc | |
-| `Bind(value, name, body)` | etc | Binds the result of `value` to `name` when evaluating `body` |
-| `Emit(table, headings, rows)` | etc | Emits `table` with `headings` and `rows`. Note that `table` is a string, `headings` is a list of expressions, and `rows` is a list of lists of expressions. See explanation below for emitted output. |
-| `Apply(fn, args)` | etc | Evaluates `fn` to a function, and all of `args`, then applies the function to the args. |
-
-Built in functions like `api_data` and basic arithmetic and comparison are provided via the environment,
-referred to be name using `Ref`, and utilized via `Apply`.
-
-List of builtin functions:
-
-| Function | Description | Example Usage |
-|--------------------------------|--------------------------------------------------------------------------------|----------------------------------|
-| `+, -, *, //, /, >, <, >=, <=` | Standard Math | |
-| len | Length | |
-| bool | Bool | |
-| str2bool | Convert string to boolean. True values are 'true', 't', '1' (case insensitive) | |
-| str2date | Convert string to date | |
-| bool2int | Convert boolean to integer (0, 1) | |
-| str2num | Parse string as a number | |
-| format-uuid | Parse a hex UUID, and format it into hyphen-separated groups | |
-| substr | Returns substring indexed by [first arg, second arg), zero-indexed. | substr(2, 5) of 'abcdef' = 'cde' |
-| selected-at | Returns the Nth word in a string. N is zero-indexed. | selected-at(3) - return 4th word |
-| selected | Returns True if the given word is in the value. | selected(fever) |
-| count-selected | Count the number of words | |
-| json2str | Convert a JSON object to a string | |
-| template | Render a string template (not robust) | template({} on {}, state, date) |
-| attachment_url | Convert an attachment name into it's download URL | |
-| form_url | Output the URL to the form view on CommCare HQ | |
-| case_url | Output the URL to the case view on CommCare HQ | |
-| unique | Ouptut only unique values in a list | |
-
-Output Formats
---------------
-
-Your MiniLinq may define multiple tables with headings in addition to their body rows by using `Emit`
-expressions, or may simply return the results of a single query.
-
-If your MiniLinq does not contain any `Emit` expressions, then the results of the expression will be
-printed to standard output as pretty-printed JSON.
-
-If your MiniLinq _does_ contain `Emit` expressions, then there are many formats available, selected
-via the `--output-format ` option, and it can be directed to a file with the `--output ` command-line option.
-
- - `csv`: Each table will be a CSV file within a Zip archive.
- - `xls`: Each table will be a sheet in an old-format Excel spreadsheet.
- - `xlsx`: Each table will be a sheet in a new-format Excel spreadsheet.
- - `json`: The tables will each be a member of a JSON dictionary, printed to standard output
- - `markdown`: The tables will be streamed to standard output in Markdown format (very handy for debugging your queries)
- - `sql`: All data will be idempotently "upserted" into the SQL database you specify, including creating the needed tables and columns.
-
-
-Dependencies
-------------
-
-Required dependencies will be automatically installed. Optional dependencies
-for specific export formats can be installed as extras:
-
-```shell
-# To export "xlsx"
-$ uv pip install "commcare-export[xlsx]"
-
-# To export "xls"
-$ uv pip install "commcare-export[xls]"
-
-# To sync with a Postgres database
-$ uv pip install "commcare-export[postgres]"
-
-# To sync with a mysql database
-$ uv pip install "commcare-export[mysql]"
-
-# To sync with a database which uses odbc (e.g. mssql)
-$ uv pip install "commcare-export[odbc]"
-
-# To sync with another SQL database supported by SQLAlchemy
-$ uv pip install "commcare-export[base_sql]"
-# Then install the Python package for your database
-```
 
 Contributing
 ------------
 
-0\. Sign up for GitHub, if you have not already, at https://github.com.
-
-1\. Fork the repository at https://github.com/dimagi/commcare-export.
-
-2\. Clone your fork, install into a virtualenv, and start a feature branch
-
-```shell
-$ git clone git@github.com:your-username/commcare-export.git
-$ cd commcare-export
-$ uv venv
-$ source .venv/bin/activate  # On Windows: .venv\Scripts\activate
-$ uv pip install -e ".[test]"
-$ git checkout -b my-super-duper-feature
-```
-
-3\. Make your edits.
-
-4\. Make sure the tests pass. The best way to test for all versions is to sign up for https://travis-ci.org and turn on automatic continuous testing for your fork.
-
-```shell
-$ py.test
-=============== test session starts ===============
-platform darwin -- Python 2.7.3 -- pytest-2.3.4
-collected 17 items
-
-tests/test_commcare_minilinq.py .
-tests/test_excel_query.py ....
-tests/test_minilinq.py ........
-tests/test_repeatable_iterator.py .
-tests/test_writers.py ...
-
-============ 17 passed in 2.09 seconds ============
-```
-
-5\. Type hints are used in the `env` and `minilinq` modules. Check that any changes in those modules adhere to those types:
+We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for
+how to set up your development environment, coding style guidelines,
+testing, and the release process.
 
-```shell
-$ mypy --install-types @mypy_typed_modules.txt
-```
+- [Testing Guide](docs/testing.md)
+- [Changelog](https://github.com/dimagi/commcare-export/releases)
 
-6\. Push the feature branch up
-
-```shell
-$ git push -u origin my-super-duper-feature
-```
-7\. Visit https://github.com/dimagi/commcare-export and submit a pull request.
-
-8\. Accept our gratitude for contributing: Thanks!
-
-Release process
+Python Versions
 ---------------
 
-1\. Create a tag for the release
-
-```shell
-$ git tag -a "X.YY.0" -m "Release X.YY.0"
-$ git push --tags
-```
-
-2\. Create the distribution
+Tested with Python 3.10, 3.11, 3.12, and 3.13.
 
-```shell
-$ uv build
-```
-
-Ensure that the archives in `dist/` have the correct version number (matching the tag name).
-
-3\. Upload to pypi
-
-```shell
-$ uv publish
-```
-
-4\. Verify upload
-
-https://pypi.python.org/pypi/commcare-export
-
-5\. Create a release on github
-
-https://github.com/dimagi/commcare-export/releases
-
-Once the release is published a GitHub workflow is kicked off that compiles executables of the DET compatible with
-Linux and Windows machines, adding it to the release as assets.
-
-[For Linux-based users] If you decide to download and use the executable file, please make sure the file has the executable permission enabled,
-after which it can be invoked like any other executable though the command line.
-
-
-Testing and Test Databases
---------------------------
-
-The following command will run the entire test suite (requires DB environment variables to be set as per below):
-
-```shell
-$ py.test
-```
-
-To run an individual test class or method you can run, e.g.:
-
-```shell
-$ py.test -k "TestExcelQuery"
-$ py.test -k "test_get_queries_from_excel"
-```
-
-To exclude the database tests you can run:
-
-```shell
-$ py.test -m "not dbtest"
-```
-When running database tests, supported databases are PostgreSQL, MySQL, MSSQL.
-
-To run tests against selected databases can be done using test marks as follows:
-```shell
-$ py.test -m [postgres,mysql,mssql]
-```
-
-Use Docker and docker-compose to start database services for tests:
-
-1. Start the services:
-   ```shell
-   docker-compose up -d
-   ```
-
-2. Wait for services to be healthy:
-   ```shell
-   docker-compose ps
-   ```
-
-3. Run your tests. The default environment variables in
-   `tests/conftest.py` work automatically:
-   - PostgreSQL: `postgresql://postgres@localhost/`
-   - MySQL: `mysql+pymysql://travis@/`
-   - MS SQL Server: `mssql+pyodbc://SA:Password-123@localhost/`
-
-   If needed, you can override with environment variables:
-   ```shell
-   export POSTGRES_URL='postgresql://postgres@localhost/'
-   export MYSQL_URL='mysql+pymysql://root@localhost/'
-   export MSSQL_URL='mssql+pyodbc://SA:Password-123@localhost/'
-   ```
-4. Stop the services when done:
-   ```shell
-   docker-compose down
-   ```
-   To also remove the data volumes:
-   ```shell
-   docker-compose down -v
-   ```
-
-> [!NOTE]
-> For MS SQL Server tests, you'll need the ODBC Driver for SQL Server
-> installed on your host system for the `pyodbc` connection to work.
- -From [learn.microsoft.com](https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server) -([source](https://github.com/MicrosoftDocs/sql-docs/blob/live/docs/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server.md)) - -#### Debian/Ubuntu - -```shell -# Download the package to configure the Microsoft repo -curl -sSL -O https://packages.microsoft.com/config/debian/$(grep VERSION_ID /etc/os-release | cut -d '"' -f 2 | cut -d '.' -f 1)/packages-microsoft-prod.deb -# Install the package -sudo dpkg -i packages-microsoft-prod.deb -# Delete the file -rm packages-microsoft-prod.deb - -sudo apt-get update -sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18 - -odbcinst -q -d -``` - -#### Mac OS - -```shell -/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)" -brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release -brew update -HOMEBREW_ACCEPT_EULA=Y brew install msodbcsql18 -``` - - -Integration Tests ------------------ -Running the integration tests requires API credentials from CommCare HQ -that have access to the `corpora` domain. This user should only have -access to the corpora domain. - -These need to be set as environment variables as follows: - -```shell -$ export HQ_USERNAME= -$ export HQ_API_KEY= -``` +License +------- -For Travis builds these are included as encrypted vars in the travis -config. +MIT License - see [LICENSE](LICENSE) for details. diff --git a/docs/cli-usage.md b/docs/cli-usage.md new file mode 100644 index 00000000..a081b888 --- /dev/null +++ b/docs/cli-usage.md @@ -0,0 +1,57 @@ +Command-line Usage +================== + +For comprehensive command-line instructions, see the +[User Documentation](https://dimagi.atlassian.net/wiki/spaces/commcarepublic/pages/2143955952/CommCare+Data+Export+Tool+DET#Run-a-CommCare-Export). 
+ + +Basic Usage +----------- + +```shell +commcare-export \ + --username \ + --project \ + --query \ + --output-format \ + --output +``` + +See `commcare-export --help` for the full list of options. + + +Logging +------- + +By default, `commcare-export` writes logs to `commcare_export.log` in +the current working directory. Log entries are appended across runs. + +```shell +# Custom log directory +commcare-export --log-dir /path/to/logs --query my-query.xlsx --project myproject + +# Disable file logging (console only) +commcare-export --no-logfile --query my-query.xlsx --project myproject +``` + +> [!NOTE] +> The log directory will be created automatically if it doesn't exist. +> If the directory cannot be created or written to, `commcare-export` +> will fall back to console-only logging with a warning. + + +Database Output +--------------- + +The `--output` option accepts a SQLAlchemy +[connection string](http://docs.sqlalchemy.org/en/latest/core/engines.html) +following [RFC-1738](https://www.ietf.org/rfc/rfc1738.txt): + +``` +postgresql+psycopg2://user:password@localhost/mydatabase +mysql+pymysql://user:password@localhost/mydatabase +mssql+pyodbc://user:password@localhost/mydatabase?driver=ODBC+Driver+17+for+SQL+Server +``` + +For more connection string examples, see the +[User Documentation](https://dimagi.atlassian.net/wiki/spaces/commcarepublic/pages/2143955952/CommCare+Data+Export+Tool+DET#Generating-Database-Connection-Strings). diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 00000000..7136e022 --- /dev/null +++ b/docs/index.md @@ -0,0 +1,41 @@ +CommCare Data Export Tool - Technical Documentation +=================================================== + +This documentation is for developers who want to use `commcare-export` +as a Python library, contribute to the project, or understand its +internals. 
+ +For end-user documentation about installing and using the command-line +tool, see the +[User Documentation](https://dimagi.atlassian.net/wiki/spaces/commcarepublic/pages/2143955952/CommCare+Data+Export+Tool+DET). + + +For Library Users +----------------- + +- [Python Library Usage](library-usage.md) - Using `commcare-export` as + a Python library +- [MiniLinq Reference](minilinq-reference.md) - Query language syntax + and built-in functions + + +Query and Output +---------------- + +- [Query Formats](query-formats.md) - Excel and JSON query formats +- [Output Formats](output-formats.md) - CSV, Excel, JSON, SQL, and + Markdown outputs +- [User and Location Data](user-location-data.md) - Exporting + organization data +- [Command-line Usage](cli-usage.md) - CLI reference and logging +- [Scheduling](scheduling.md) - Running the DET on a schedule + + +Development +----------- + +- [Contributing Guide](../CONTRIBUTING.md) - How to contribute, + coding style, and release process +- [Testing Guide](testing.md) - Running tests with multiple databases +- [Database Migrations](../migrations/README.md) - Using Alembic + migrations diff --git a/docs/library-usage.md b/docs/library-usage.md new file mode 100644 index 00000000..6b0ba7c9 --- /dev/null +++ b/docs/library-usage.md @@ -0,0 +1,119 @@ +Python Library Usage +==================== + +As a library, the various `commcare_export` modules make it easy to: + +- Interact with the CommCare HQ REST API +- Execute [MiniLinq](minilinq-reference.md) queries against the API +- Load and save JSON representations of MiniLinq queries +- Compile Excel configurations to MiniLinq queries + + +CommCare HQ API Client +---------------------- + +To directly access the CommCare HQ REST API: + +```python +from commcare_export.checkpoint import CheckpointManagerWithDetails +from commcare_export.commcare_hq_client import CommCareHqClient, AUTH_MODE_APIKEY +from commcare_export.commcare_minilinq import get_paginator, PaginationMode + +username = 
'some@username.com' +domain = 'your-awesome-domain' +hq_host = 'https://www.commcarehq.org' +API_KEY = 'your_secret_api_key' + +api_client = CommCareHqClient(hq_host, domain, username, API_KEY, AUTH_MODE_APIKEY) +case_paginator = get_paginator(resource='case', pagination_mode=PaginationMode.date_modified) +case_paginator.init() +checkpoint_manager = CheckpointManagerWithDetails(None, None, PaginationMode.date_modified) + +cases = api_client.iterate('case', case_paginator, checkpoint_manager=checkpoint_manager) + +for case in cases: + print(case['case_id']) + +``` + +The `CommCareHqClient` supports two authentication modes: + +- `AUTH_MODE_PASSWORD` - Username and password authentication +- `AUTH_MODE_APIKEY` - API key authentication (recommended) + + +Executing MiniLinq Queries +-------------------------- + +To print a MiniLinq query as JSON, then evaluate it against the API and +write any emitted tables as Markdown: + +```python +import json +import sys +from commcare_export.minilinq import * +from commcare_export.commcare_hq_client import CommCareHqClient +from commcare_export.commcare_minilinq import CommCareHqEnv +from commcare_export.env import BuiltInEnv, JsonPathEnv +from commcare_export.writers import StreamingMarkdownTableWriter + +api_client = CommCareHqClient( + url="https://www.commcarehq.org", + project='your_project', + username='your_username', + password='password', + version='0.5' +) + +source = Map( + source=Apply( + Reference("api_data"), + Literal("form"), + Literal({"filter": {"term": {"app_id": "whatever"}}}) + ), + body=List([ + Reference("received_on"), + Reference("form.gender"), + ]) +) + +query = Emit( + 'demo-table', + [ + Literal('Received On'), + Literal('Gender') + ], + source +) + +print(json.dumps(query.to_jvalue(), indent=2)) + +# Keep a reference to the composed environment: emitted tables are read from it below +env = BuiltInEnv() | CommCareHqEnv(api_client) | JsonPathEnv() +results = query.eval(env) + +if len(list(env.emitted_tables())) > 0: + with StreamingMarkdownTableWriter(sys.stdout) as writer: + for table in env.emitted_tables(): + writer.write_table(table) +```
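How such a query evaluates can be sketched without the library. The following is a toy evaluator over a few of the JSON forms listed in the MiniLinq Reference (`Lit`, `Ref`, `List`, `Map`). It is an illustration only, not the `commcare_export` implementation: `evaluate`, the dict-based environment, and the sample rows are hypothetical, evaluation is eager, and a `Ref` here is a plain dictionary key rather than a JSON path as in `JsonPathEnv`.

```python
# Toy evaluator for a subset of the MiniLinq JSON forms.
# Not the commcare_export implementation; all names are hypothetical.

def evaluate(expr, env):
    """Evaluate a JSON-form expression against a dict environment."""
    if "Lit" in expr:
        return expr["Lit"]
    if "Ref" in expr:
        # Simplification: plain key lookup, not JSON path navigation.
        return env[expr["Ref"]]
    if "List" in expr:
        return [evaluate(item, env) for item in expr["List"]]
    if "Map" in expr:
        node = expr["Map"]
        rows = []
        for elem in evaluate(node["source"], env):
            name = node.get("name")
            # With a name, bind the element to it; otherwise the
            # element replaces the whole environment.
            inner = {**env, name: elem} if name else elem
            rows.append(evaluate(node["body"], inner))
        return rows
    raise ValueError(f"unsupported expression: {expr!r}")


# Made-up sample data standing in for API results.
forms = [
    {"received_on": "2024-01-01", "gender": "F"},
    {"received_on": "2024-01-02", "gender": "M"},
]
query = {
    "Map": {
        "source": {"Ref": "forms"},
        "name": None,
        "body": {"List": [{"Ref": "received_on"}, {"Ref": "gender"}]},
    }
}
rows = evaluate(query, {"forms": forms})
print(rows)
```

Because `name` is left empty, each form replaces the environment while `body` is evaluated, which is why the body can refer to `received_on` and `gender` directly.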
+ +### Environment Composition + +MiniLinq query evaluation relies on composing multiple environments: + +- `BuiltInEnv()` - Built-in functions (math, string operations, etc.) +- `CommCareHqEnv(api_client)` - The `api_data` function for fetching + from CommCare HQ +- `JsonPathEnv()` - JSON path navigation (e.g., `form.gender`) + +These are composed using the `|` operator: + +```python +env = BuiltInEnv() | CommCareHqEnv(api_client) | JsonPathEnv() +``` + + +See Also +-------- + +- [MiniLinq Reference](minilinq-reference.md) +- [Query Formats](query-formats.md) +- [Output Formats](output-formats.md) diff --git a/docs/minilinq-reference.md b/docs/minilinq-reference.md new file mode 100644 index 00000000..cbf10287 --- /dev/null +++ b/docs/minilinq-reference.md @@ -0,0 +1,99 @@ +MiniLinq Reference +================== + +MiniLinq is a simple query language for extracting and transforming data +from CommCare HQ. It can be expressed in both Python (for library +users) and JSON (for serialization and Excel compilation). + +The abstract syntax can be directly inspected in the +`commcare_export.minilinq` module. The split between functions and +primitives is deliberate: it exposes the structure of a MiniLinq query +for possible optimization, and it restricts the overall language. + + +Abstract Syntax +--------------- + +| Python | JSON | Evaluates to | +|-------------------------------|----------------------------------------------------------|------------------------------------------------------------------| +| `Literal(v)` | `{"Lit": v}` | Just `v` | +| `Reference(x)` | `{"Ref": x}` | Whatever `x` resolves to in the environment | +| `List([a, b, c, ...])` | `{"List": [a, b, c, ...]}` | The list of what `a`, `b`, `c` evaluate to | +| `Map(source, name, body)` | `{"Map": {"source": ..., "name": ..., "body": ...}}` | Evaluates `body` for each element in `source`. If `name` is provided, the element is bound to it; otherwise it replaces the whole environment.
| +| `FlatMap(source, name, body)` | `{"FlatMap": {"source": ..., "name": ..., "body": ...}}` | Flattens after mapping, like nested list comprehensions | +| `Filter(source, name, body)` | `{"Filter": {"source": ..., "name": ..., "body": ...}}` | Filters `source` keeping elements where `body` evaluates to true | +| `Bind(value, name, body)` | `{"Bind": {"value": ..., "name": ..., "body": ...}}` | Binds the result of `value` to `name` when evaluating `body` | +| `Emit(table, headings, rows)` | `{"Emit": {"table": ..., "headings": ..., "rows": ...}}` | Emits `table` with `headings` and `rows`. `table` is a string, `headings` is a list of expressions, and `rows` is a list of lists of expressions. See [Output Formats](output-formats.md). | +| `Apply(fn, args)` | `{"Apply": {"fn": ..., "args": [...]}}` | Evaluates `fn` to a function, and all of `args`, then applies the function to the args. | + + +Built-in Functions +------------------ + +Built-in functions like `api_data` and basic arithmetic and comparison +are provided via the environment, referred to by name using `Ref`, and +utilized via `Apply`. + +### Arithmetic and Comparison + +| Function | Description | +|--------------------------------|----------------| +| `+, -, *, //, /, >, <, >=, <=` | Standard math | + +### Type Conversions + +| Function | Description | +|------------|-------------------------------------------------------------------------| +| `len` | Length of a string or list | +| `bool` | Convert to boolean | +| `str2bool` | Convert string to boolean. 
True values are 'true', 't', '1' (case insensitive) | +| `str2date` | Convert string to date | +| `bool2int` | Convert boolean to integer (0, 1) | +| `str2num` | Parse string as a number | + +### String Operations + +| Function | Description | Example | +|---------------|----------------------------------------------------------------------|--------------------------------------| +| `substr` | Returns substring indexed by [first arg, second arg), zero-indexed | `substr(2, 5)` of 'abcdef' = 'cde' | +| `template` | Render a string template (not robust) | `template("{} on {}", state, date)` | +| `format-uuid` | Parse a hex UUID, and format it into hyphen-separated groups | | +| `json2str` | Convert a JSON object to a string | | + +### Multi-select Operations + +Useful for working with CommCare multi-select questions: + +| Function | Description | Example | +|------------------|-----------------------------------------------------|------------------------------------| +| `selected-at` | Returns the Nth word in a string. 
N is zero-indexed | `selected-at(3)` - return 4th word | +| `selected` | Returns True if the given word is in the value | `selected("fever")` | +| `count-selected` | Count the number of words | | + +### CommCare-specific Functions + +| Function | Description | +|------------------|--------------------------------------------------| +| `attachment_url` | Convert an attachment name into its download URL | +| `form_url` | Output the URL to the form view on CommCare HQ | +| `case_url` | Output the URL to the case view on CommCare HQ | +| `unique` | Output only unique values in a list | + + +Converting Excel to JSON +------------------------- + +To see the MiniLinq JSON generated from an Excel query, use the +`--dump-query` option: + +```shell +commcare-export --query my-query.xlsx --dump-query +``` + + +See Also +-------- + +- [Python Library Usage](library-usage.md) - Using MiniLinq from Python +- [Query Formats](query-formats.md) - Excel and JSON query + specifications diff --git a/docs/output-formats.md b/docs/output-formats.md new file mode 100644 index 00000000..14cd0ca9 --- /dev/null +++ b/docs/output-formats.md @@ -0,0 +1,54 @@ +Output Formats +============== + +For end-user documentation on exporting data (including database +connection strings, checkpoints, and detailed usage), see the +[User Documentation](https://dimagi.atlassian.net/wiki/spaces/commcarepublic/pages/2143955952/CommCare+Data+Export+Tool+DET#Exporting-Data). + + +Format Summary +-------------- + +If your query does not contain any `Emit` expressions, results are +printed to standard output as pretty-printed JSON. 
+ +If your query _does_ contain `Emit` expressions, the format is selected +via `--output-format ` and the destination via `--output `: + +| Format | Description | +|------------|------------------------------------------------------------------| +| `csv` | Each table as a CSV file within a Zip archive | +| `xls` | Each table as a sheet in an old-format Excel spreadsheet | +| `xlsx` | Each table as a sheet in a new-format Excel spreadsheet | +| `json` | Tables as members of a JSON dictionary, printed to stdout | +| `markdown` | Tables streamed to stdout in Markdown format (handy for debugging) | +| `sql` | Idempotent "upsert" into a SQL database, creating tables and columns as needed | + + +Optional Dependencies +--------------------- + +Required dependencies are installed automatically. Install extras for +specific output formats: + +```shell +# Excel formats +uv pip install "commcare-export[xlsx]" +uv pip install "commcare-export[xls]" + +# Database backends +uv pip install "commcare-export[postgres]" +uv pip install "commcare-export[mysql]" +uv pip install "commcare-export[odbc]" # MS SQL Server +uv pip install "commcare-export[base_sql]" # Other SQLAlchemy databases +``` + +For database connection string formats, see the +[User Documentation](https://dimagi.atlassian.net/wiki/spaces/commcarepublic/pages/2143955952/CommCare+Data+Export+Tool+DET#Generating-Database-Connection-Strings). + + +See Also +-------- + +- [Query Formats](query-formats.md) - Creating queries +- [MiniLinq Reference](minilinq-reference.md) - The `Emit` expression diff --git a/docs/query-formats.md b/docs/query-formats.md new file mode 100644 index 00000000..aa904729 --- /dev/null +++ b/docs/query-formats.md @@ -0,0 +1,51 @@ +Query Formats +============= + +The Data Export Tool supports two query formats: Excel and JSON. Both +are compiled to [MiniLinq](minilinq-reference.md) for execution. 
+ +For detailed guidance on creating queries, including field mappings, +filter examples, and tips, see the +[User Documentation](https://dimagi.atlassian.net/wiki/spaces/commcarepublic/pages/2143955952/CommCare+Data+Export+Tool+DET#Creating-an-Excel-Query-File-in-CommCare-HQ). + + +Excel Queries +------------- + +An Excel query is any `.xlsx` workbook. Each sheet represents one output +table. Columns are grouped as follows: + +- **Data Source**: `form` for form data, or `case` for case data +- **Filter Name** / **Filter Value**: Paired columns to filter the data +- **Field**: The destination column name in your output +- **Source Field**: The JSON path to extract from the form or case + +It is recommended to use the Excel format as it is more user-friendly +and stable across library versions. + + +JSON Queries +------------ + +JSON queries represent [MiniLinq](minilinq-reference.md) expressions +directly. To get started with JSON, create an Excel query and convert +it: + +```shell +commcare-export --query my-query.xlsx --dump-query +``` + + +Examples +-------- + +Example query files in both formats are provided in the +[examples/](../examples/) directory. + + +See Also +-------- + +- [MiniLinq Reference](minilinq-reference.md) - Query language + documentation +- [Output Formats](output-formats.md) - Available output formats diff --git a/docs/scheduling.md b/docs/scheduling.md new file mode 100644 index 00000000..166b6e2d --- /dev/null +++ b/docs/scheduling.md @@ -0,0 +1,52 @@ +Scheduling DET Runs +=================== + +Scheduling the Data Export Tool to run at regular intervals keeps your +database up to date with CommCare HQ. + +For detailed scheduling instructions (including Windows Task Scheduler +setup), see the +[User Documentation](https://dimagi.atlassian.net/wiki/spaces/commcarepublic/pages/2143955952/CommCare+Data+Export+Tool+DET#Configuring-DET-to-Run-as-a-Scheduled-Task-on-Windows). 
+ + +Quick Reference +--------------- + +Sample scripts are provided in the `examples/` directory: + +- **Windows**: `examples/scheduled_run_windows.bat` -- use with + [Task Scheduler](https://sqlbackupandftp.com/blog/how-to-schedule-a-script-via-windows-task-scheduler/) +- **Linux/Mac**: `examples/scheduled_run_linux.sh` -- use with + [cron](https://www.techtarget.com/searchdatacenter/definition/crontab) + +### Linux/Mac Setup + +1. Copy the example script: + ```shell + cp ./examples/scheduled_run_linux.sh ~/scheduled_run_linux.sh + ``` + +2. Edit with your project details: + ```shell + nano ~/scheduled_run_linux.sh + ``` + +3. Add a cron job (runs every 12 hours in this example): + ```shell + crontab -e + ``` + ``` + 0 */12 * * * bash ~/scheduled_run_linux.sh + ``` + +Use [crontab.guru](https://crontab.guru/) to generate custom cron +schedules. + + +Best Practices +-------------- + +- Use API keys instead of passwords in scheduled scripts +- Use SQL output format to leverage automatic checkpoints +- Use `--log-dir` to specify a log directory for troubleshooting +- Test manually before scheduling diff --git a/docs/testing.md b/docs/testing.md new file mode 100644 index 00000000..85237604 --- /dev/null +++ b/docs/testing.md @@ -0,0 +1,123 @@ +Testing Guide +============= + +Running Tests +------------- + +Run the full test suite: + +```shell +pytest +``` + +Run individual test classes or methods: + +```shell +pytest -k "TestExcelQuery" +pytest -k "test_get_queries_from_excel" +``` + +Exclude database tests: + +```shell +pytest -m "not dbtest" +``` + +Run tests against specific databases: + +```shell +pytest -m postgres +pytest -m mysql +pytest -m mssql +``` + + +Database Setup with Docker +-------------------------- + +Use Docker Compose to start database services for tests: + +1. Start the services: + ```shell + docker-compose up -d + ``` + +2. Wait for services to be healthy: + ```shell + docker-compose ps + ``` + +3. Run your tests. 
The default environment variables in + `tests/conftest.py` work automatically: + - PostgreSQL: `postgresql://postgres@localhost/` + - MySQL: `mysql+pymysql://travis@/` + - MS SQL Server: `mssql+pyodbc://SA:Password-123@localhost/` + + If needed, you can override with environment variables: + ```shell + export POSTGRES_URL='postgresql://postgres@localhost/' + export MYSQL_URL='mysql+pymysql://root@localhost/' + export MSSQL_URL='mssql+pyodbc://SA:Password-123@localhost/' + ``` + +4. Stop the services when done: + ```shell + docker-compose down + ``` + To also remove the data volumes: + ```shell + docker-compose down -v + ``` + + +ODBC Driver Installation +------------------------- + +For MS SQL Server tests, you need the ODBC Driver for SQL Server +installed on your host system. + +From [learn.microsoft.com](https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server) +([source](https://github.com/MicrosoftDocs/sql-docs/blob/live/docs/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server.md)) + +### Debian/Ubuntu + +```shell +# Download the package to configure the Microsoft repo +curl -sSL -O https://packages.microsoft.com/config/debian/$(grep VERSION_ID /etc/os-release | cut -d '"' -f 2 | cut -d '.' -f 1)/packages-microsoft-prod.deb +# Install the package +sudo dpkg -i packages-microsoft-prod.deb +# Delete the file +rm packages-microsoft-prod.deb + +sudo apt-get update +sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18 + +odbcinst -q -d +``` + +### macOS + +```shell +/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)" +brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release +brew update +HOMEBREW_ACCEPT_EULA=Y brew install msodbcsql18 +``` + + +Integration Tests +----------------- + +Running the integration tests requires API credentials from CommCare HQ +that have access to the `corpora` domain. 
The API key should only have +access to the `corpora` domain. + +Set the credentials as environment variables: + +```shell +export HQ_USERNAME= +export HQ_API_KEY= +``` + +These are included as encrypted variables in the GitHub Actions +configuration. diff --git a/docs/user-location-data.md b/docs/user-location-data.md new file mode 100644 index 00000000..a61a7111 --- /dev/null +++ b/docs/user-location-data.md @@ -0,0 +1,109 @@ +User and Location Data +====================== + +The Data Export Tool can export user and location data from your +CommCare project, which can be joined with form and case data for +organizational reporting. + +For detailed usage instructions and examples, see the +[User Documentation](https://dimagi.atlassian.net/wiki/spaces/commcarepublic/pages/2143955952/CommCare+Data+Export+Tool+DET#Exporting-User-and-Location-Data). + + +Overview +-------- + +- `--users` exports a `commcare_users` table +- `--locations` exports a `commcare_locations` table +- `--with-organization` exports both tables and adds a + `commcare_userid` field to each query for joining + + +User Table Schema +----------------- + +The `commcare_users` table contains data from the +[List Mobile Workers API endpoint](https://confluence.dimagi.com/display/commcarepublic/List+Mobile+Workers): + +| Column | Type | Note | +|----------------------------------|------|-------------------------------------| +| id | Text | Primary key | +| default_phone_number | Text | | +| email | Text | | +| first_name | Text | | +| groups | Text | | +| last_name | Text | | +| phone_numbers | Text | | +| resource_uri | Text | | +| commcare_location_id | Text | Foreign key to `commcare_locations` | +| commcare_location_ids | Text | | +| commcare_primary_case_sharing_id | Text | | +| commcare_project | Text | | +| username | Text | | + + +Location Table Schema +--------------------- + +The `commcare_locations` table contains data from the Location API and +Location Type API endpoints: + +| Column | Type 
| Note | +|------------------------------|------|-----------------------------------------------| +| id | Text | | +| created_at | Date | | +| domain | Text | | +| external_id | Text | | +| last_modified | Date | | +| latitude | Text | | +| location_data | Text | | +| location_id | Text | Primary key | +| location_type | Text | | +| longitude | Text | | +| name | Text | | +| parent | Text | Resource URI of parent location | +| resource_uri | Text | | +| site_code | Text | | +| location_type_administrative | Text | | +| location_type_code | Text | | +| location_type_name | Text | | +| location_type_parent | Text | | +| *location level code* | Text | Column name depends on project's organization | + +If you have set up +[organization levels](https://confluence.dimagi.com/display/commcarepublic/Setting+up+Organization+Levels+and+Structure), +one additional column is created for each level. The column name is +derived from the Location Type, and the value is the `location_id` of +the containing location at that level. + + +Joining Data +------------ + +The `--with-organization` option adds a `commcare_userid` field to each +Excel query. Use this field to join form or case data with user and +location data: + +```sql +SELECT l.clinic, + COUNT(*) +FROM form_table t +LEFT JOIN (commcare_users u + LEFT JOIN commcare_locations l + ON u.commcare_location_id = l.location_id) +ON t.commcare_userid = u.id +GROUP BY l.clinic; +``` + +> [!NOTE] +> The table names `commcare_users` and `commcare_locations` are reserved. +> The export tool will produce an error if given a query specification +> that writes to either of them. + + +Data Refresh Behavior +--------------------- + +The export tool overwrites existing rows with current data and adds rows +for new users and locations. To remove obsolete entries, drop the table +and re-export. If you modify your organization levels, drop the +`commcare_locations` table before re-exporting. 
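The refresh behavior described above can be sketched with an in-memory SQLite table. This is a minimal illustration under stated assumptions, not the tool's actual SQL: SQLite and `INSERT OR REPLACE` are stand-ins chosen to keep the example self-contained, the two-column table is a cut-down version of `commcare_users`, and `export_users` and the sample rows are hypothetical.

```python
import sqlite3

# Cut-down stand-in for the commcare_users table (the real one has more columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commcare_users (id TEXT PRIMARY KEY, username TEXT)")


def export_users(users):
    """Write rows keyed by id: existing ids are overwritten, new ids are added."""
    conn.executemany(
        "INSERT OR REPLACE INTO commcare_users (id, username) VALUES (?, ?)",
        users,
    )
    conn.commit()


# First export: two users.
export_users([("u1", "alice"), ("u2", "bob")])
# Second export: u1 renamed, u3 added, u2 no longer present upstream.
export_users([("u1", "alice.k"), ("u3", "carol")])

remaining = conn.execute(
    "SELECT id, username FROM commcare_users ORDER BY id"
).fetchall()
print(remaining)
```

In this sketch the row for `u2` is still present after the second export, because nothing deletes rows that are missing from the new data. That is the reason for dropping the table before re-exporting when obsolete entries need to go.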
diff --git a/pyproject.toml b/pyproject.toml index 5c5152aa..406097b8 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -11,7 +11,7 @@ license = {text = "MIT"} authors = [ {name = "Dimagi", email = "information@dimagi.com"} ] -requires-python = ">=3.9,<3.15" +requires-python = ">=3.10,<3.15" classifiers = [ "Development Status :: 4 - Beta", "Environment :: Console", @@ -22,7 +22,6 @@ classifiers = [ "Intended Audience :: End Users/Desktop", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", - "Programming Language :: Python :: 3.9", "Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.11", "Programming Language :: Python :: 3.12",