Merged
1 change: 1 addition & 0 deletions README.md
@@ -25,6 +25,7 @@ repo to install.
- [**Skill Registry API on Agent Platform**](./skills/cloud/agent-platform-skill-registry)
- [**AlloyDB Basics**](./skills/cloud/alloydb-basics)
- [**BigQuery Basics**](./skills/cloud/bigquery-basics)
- [**Bigtable Basics**](./skills/cloud/bigtable-basics)
- [**Cloud Run Basics**](./skills/cloud/cloud-run-basics)
- [**Cloud SQL Basics**](./skills/cloud/cloud-sql-basics)
- [**Firebase Basics**](./skills/cloud/firebase-basics)
115 changes: 115 additions & 0 deletions skills/cloud/bigtable-basics/SKILL.md
@@ -0,0 +1,115 @@
---
name: bigtable-basics
description: >-
Assists in provisioning instances/tables, designing performant schemas, and querying data in Bigtable. Use when designing Bigtable row keys, configuring column families, writing SQL queries or client library code (Java, Go, Python) for Bigtable, or diagnosing performance/hotspotting issues. Also use when provisioning Bigtable clusters using gcloud or cbt CLIs. Don't use for generic Cloud SQL administration.
---

# Bigtable Basics

This skill provides core workflows and guidance for administering and developing
with Google Bigtable.

## Core Principles

- **Control Plane vs. Data Plane:**
    - Use **`gcloud`** for Control Plane operations: managing Instances,
      Clusters, App Profiles, Backups, and IAM, and creating Tables, Logical
      Views, Materialized Views, and Authorized Views.
    - Use **`cbt`** for Data Plane operations: updating Tables and Column
      Families, and reading and writing data.
- **Performance First:** Bigtable is a NoSQL database. Efficiency is tied to
Row Key design. Always warn about Full Table Scans.
- **Client Selection:** For production use cases, prefer **Java** or **Go**
for their superior performance and feature coverage compared to other
languages.
- **Observability:** When diagnosing performance or hotspotting, **always**
mention **Key Visualizer** (via Cloud Console) as the primary diagnostic
tool because it provides the most granular view of access patterns across
row keys. Follow up with the hot-tablets tool and table stats in the gcloud
CLI, and with the `include-stats=full` option of `cbt read`, to diagnose
slow queries.

> [!IMPORTANT] **Safety Rule:** You MUST obtain explicit user confirmation before
> making non-emulator database changes. You MUST mention this safety requirement
> when providing commands or instructions that modify the database structure or
> data.

## Quick Recipes

### 1. Querying Data

Use SQL for complex transforms or aggregations and key-value APIs for simpler
query patterns. *Note: Use exact match, prefix (`_key LIKE 'myprefix%'`), or
range predicates on `_key` to avoid expensive unbounded scans. Recommend
explicit row ranges (`_key BETWEEN 'start' AND 'end'`) as a more performant
alternative to prefix matches where possible.*
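
A prefix match covers the same rows as a half-open key range, so a prefix can be turned into explicit bounds before querying. The sketch below is a minimal illustration in plain Python, not part of any Bigtable library; `prefix_to_range` is a hypothetical helper name. Its end bound is exclusive, so it corresponds to a `_key >= start AND _key < end` style predicate rather than an inclusive `BETWEEN`.

```python
from typing import Optional, Tuple


def prefix_to_range(prefix: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Return (start, end) so that start <= key < end matches every key with the prefix.

    end is None when no finite upper bound exists (empty prefix or only 0xFF bytes).
    """
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return prefix, None
    # Incrementing the last remaining byte gives the smallest key past the prefix.
    end = trimmed[:-1] + bytes([trimmed[-1] + 1])
    return prefix, end


start, end = prefix_to_range(b"user#123#")
print(start, end)
```

Row keys are compared byte by byte, which is why incrementing the final byte is enough to bound the range.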

If expensive scans (unbounded scans, or prefix or range queries over a large
range) are unavoidable because multiple access patterns can’t all be
accommodated in a single schema, consider one of these two options:

- If the query will be used in user-facing or latency-sensitive
applications, use continuous materialized views with keys optimized for the
additional access patterns.
- If the secondary access patterns are infrequent or batch-oriented (such as
ETL, ML model training, or read-only analytical tasks), use Bigtable Data
Boost instead.

### 2. Manipulating Data

Use key-value APIs for insert, update, increment, and delete operations. The
SQL API is read-only.

### 3. Data Model Definition (DDL)

The SQL API doesn't support DDL operations. Create, delete, and update tables
using the gcloud CLI. Logical Views and Continuous Materialized Views are
defined as SQL queries, but they must be created using the gcloud CLI.

## Reference Guides

- **CLI Operations**:
    - [infrastructure_management.md](references/infrastructure_management.md):
      Provisioning instances, clusters, and table schemas.
    - [cli_data_access.md](references/cli_data_access.md): Reading and writing
      data via the `cbt` CLI.
- **Design & Discovery**:
    - [schema_design.md](references/schema_design.md): Best practices for row
      keys and performance with tables and continuous materialized views.
    - [dataplex.md](references/dataplex.md): Data catalog search for Bigtable
      assets.
- **Querying & Code**:
    - [sql_guide.md](references/sql_guide.md): Querying structured row keys
      via SQL and CLI.
    - [client_libraries.md](references/client_libraries.md): Patterns for
      high-performance Go/Java/Python code.

## Common Workflows

### Schema Evolution (DevOps)

1. **Prefer Terraform** for production schema changes to prevent accidental
data loss.
2. For manual `cbt` changes, first check the existing state by listing the table's column families and GC policies before proposing any modifications:

```bash
cbt ls {table}
```

If modifications are needed, create the family or update the GC policy:

```bash
cbt createfamily {table} {family}
cbt setgcpolicy {table} {family} "maxversions=5 and maxage=30d"
```

3. Reference
[infrastructure_management.md](references/infrastructure_management.md) for
full syntax.

## External Resources

* [Cloud Bigtable Documentation](https://cloud.google.com/bigtable/docs)
* [Bigtable SQL Reference](https://cloud.google.com/bigtable/docs/reference/sql)
* [cbt CLI Reference](https://cloud.google.com/bigtable/docs/cbt-reference)
* [gcloud bigtable Reference](https://cloud.google.com/sdk/gcloud/reference/bigtable)
14 changes: 14 additions & 0 deletions skills/cloud/bigtable-basics/assets/row_key_schema.yaml
@@ -0,0 +1,14 @@
encoding:
delimitedBytes:
delimiter: '#'
fields:
- fieldName: field1
type:
bytesType:
encoding:
raw: {}
- fieldName: field2
type:
bytesType:
encoding:
raw: {}
79 changes: 79 additions & 0 deletions skills/cloud/bigtable-basics/references/cli_data_access.md
@@ -0,0 +1,79 @@
# Bigtable CLI Data Access

This document provides patterns for reading and writing data in Bigtable using
the `cbt` CLI. This is primarily used for debugging and quick data validation.

## Configuring cbt for Data Access

```bash
echo project = ${BIGTABLE_PROJECT} > ~/.cbtrc
echo instance = ${BIGTABLE_INSTANCE} >> ~/.cbtrc
```

## Reading Data

### Read Single Row (Lookup)

Reads all columns and versions for a specific row.

```bash
cbt lookup {table_name} {row_key}
```

*Note: `cbt lookup` is optimized for point reads and is significantly more
efficient than using `cbt read` with a count or filter for retrieving a single
known row.*

### Read N Rows

Reads the first `N` rows from the table.

```bash
cbt read {table_name} count={n}
```

### Read Range

Reads rows between `START_KEY` (inclusive) and `END_KEY` (exclusive).

```bash
cbt read {table_name} start={start_key} end={end_key}
```
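
The inclusive start and exclusive end can be illustrated without touching a real table. The sketch below is an in-memory simulation over a sorted list of byte-string keys, not a `cbt` or client library call; `read_range` is a hypothetical helper name.

```python
from bisect import bisect_left


def read_range(sorted_keys, start, end):
    """Return keys k with start <= k < end from an already sorted list."""
    lo = bisect_left(sorted_keys, start)  # first key >= start (inclusive bound)
    hi = bisect_left(sorted_keys, end)    # first key >= end (exclusive bound)
    return sorted_keys[lo:hi]


keys = sorted([b"user#100", b"user#123", b"user#200", b"user#300"])
print(read_range(keys, b"user#123", b"user#300"))
```

Here `user#123` is returned because the start key is inclusive, while `user#300` is left out because the end key is exclusive.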

### Read using SQL

For complex queries and aggregations, use SQL via the `cbt sql` command:

```bash
cbt sql "SELECT * FROM my_table WHERE _key = 'user#123'"
```

### Row Count (Estimate)

Provides an estimate of the number of rows in the table.

```bash
gcloud bigtable instances tables describe {table_id} --instance={instance_id} --view stats
```

**Note:** `cbt count {table_name}` performs a full table scan.

## Writing Data

### Write Cell (Set)

Writes a value to a specific cell (row, family, and column).

```bash
cbt set {table_name} {row_key} {family}:{column}={value}
```

*Example:* `cbt set my-table user123 profile:email=user@example.com`

## Deleting Data

### Delete Row

```bash
cbt deleterow {table_name} {row_key}
```
48 changes: 48 additions & 0 deletions skills/cloud/bigtable-basics/references/client_libraries.md
@@ -0,0 +1,48 @@
# Bigtable Client Library User Guide

This document outlines critical technical details about the Bigtable data model
and client libraries.

## Language Recommendations

For production use cases requiring the **best performance and feature
coverage**, **Java** or **Go** are highly recommended. These libraries are
mature, highly optimized, and typically receive new features first. Python is
suitable for scripting and data science but may have lower throughput for
high-concurrency production workloads.

- [Go Example](https://docs.cloud.google.com/bigtable/docs/samples-go-hello)
- [Java Example](https://docs.cloud.google.com/bigtable/docs/samples-java-hello-world)
- [Python Example](https://docs.cloud.google.com/bigtable/docs/samples-python-hello)
- [Node Example](https://docs.cloud.google.com/bigtable/docs/samples-nodejs-hello)

## Timestamp Precision & Granularity

Bigtable stores timestamps as **64-bit integers** representing **microseconds**
since the Unix epoch. However, Bigtable’s internal garbage collection and
versioning operate at **millisecond granularity**.
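
One way to respect the millisecond granularity described above is to truncate the clock reading to whole milliseconds before converting to microseconds. The sketch below uses only the Python standard library; `bigtable_timestamp_micros` is a hypothetical helper name, and its `now_ns` parameter exists only so that the result can be checked deterministically.

```python
import time


def bigtable_timestamp_micros(now_ns=None):
    """Microseconds since the Unix epoch, truncated to millisecond granularity."""
    if now_ns is None:
        now_ns = time.time_ns()
    millis = now_ns // 1_000_000  # drop everything below one millisecond
    return millis * 1000          # express the millisecond value in microseconds


print(bigtable_timestamp_micros())
```

The returned value is always a multiple of 1,000, so it carries no sub-millisecond component.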

> [!IMPORTANT] **Implementation Rule:** When generating code to store data,
> calculate the timestamp in milliseconds and multiply by 1,000.
>
> * **Correct:** `timestamp_micros = time_ms() * 1000`
> * **Incorrect:** Using raw microsecond precision (e.g., `time_micros()`), as
> this can lead to unexpected behavior with cell versioning and TTL.

## Replication & Atomic Operations

Bigtable’s replication model impacts the availability of certain "atomicity"
features. These atomic operations are generally less efficient than standard
writes.

* **The Conflict:** **ReadModifyWrite** (increments/appends) and
**CheckAndMutateRow** (conditional updates) require a single-point-of-truth
to maintain consistency. They also require a read before a write, making
them significantly slower and more resource-intensive than standard blind
writes.
* **The Constraint:** These operations **will not work** with multi-cluster
routing (App Profiles set to Multi-cluster).
* **Agent Action:** If a user’s code contains these methods, proactively warn
them that these operations are inefficient and that they must use a
**Single-cluster routing** App Profile or accept that these operations will
fail in a multi-cluster configuration.
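
The agent action above amounts to spotting these two operations in user code. The sketch below is one rough way to do that with a regular expression. The pattern and the sample snippet are assumptions for illustration only: method spellings differ between client libraries, and the sample lines are not meant to show real signatures.

```python
import re

# Hypothetical pattern: common spellings of the two operation names, in any case.
ATOMIC_OP_PATTERN = re.compile(r"read_?modify_?write|check_?and_?mutate", re.IGNORECASE)


def find_atomic_ops(source: str):
    """Return (line_number, line) pairs that appear to use an atomic operation."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if ATOMIC_OP_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


# Illustrative snippet only; the method names are placeholders.
sample = """row.set_cell("cf", b"col", b"v")
table.read_modify_write_row(key, rules)
client.checkAndMutateRow(request)
"""
for number, line in find_atomic_ops(sample):
    print(f"line {number}: {line} (requires single-cluster routing)")
```

A text match like this can produce false positives in comments or strings, so it is a prompt for a warning rather than proof that the operation is used.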
27 changes: 27 additions & 0 deletions skills/cloud/bigtable-basics/references/dataplex.md
@@ -0,0 +1,27 @@
# Dataplex Catalog Search for Bigtable

This document provides patterns for searching Bigtable data assets in the
Dataplex Universal Catalog.

## Searching Entries

Searches for entries matching a query in a specific Google Cloud project and
location.

```bash
curl -X POST \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json" \
"https://dataplex.googleapis.com/v1/projects/${BIGTABLE_PROJECT}/locations/{location}:searchEntries" \
-d '{"query": "{search_term} system=Bigtable"}'
```

*Example:* Search for "customer list" in `us-east1`:

```bash
curl -X POST \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json" \
"https://dataplex.googleapis.com/v1/projects/my-project/locations/us-east1:searchEntries" \
-d '{"query": "customer list system=Bigtable"}'
```
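
The same request can be assembled programmatically. The sketch below mirrors the curl command above using only the Python standard library and builds the request without sending it. `build_search_request` is a hypothetical helper name, and the access token is passed in as a plain string, assumed to be obtained separately (for example, from the `gcloud` command shown above).

```python
import json
import urllib.request


def build_search_request(project, location, search_term, access_token):
    """Build (but do not send) the searchEntries request from the curl example."""
    url = (
        f"https://dataplex.googleapis.com/v1/projects/{project}"
        f"/locations/{location}:searchEntries"
    )
    # Same JSON body as the curl example: the search term plus the Bigtable filter.
    body = json.dumps({"query": f"{search_term} system=Bigtable"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


request = build_search_request("my-project", "us-east1", "customer list", "TOKEN")
print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without credentials or network access.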