[dark-data-agent-chat] Unable to create the DataScan through curl #1995

@ikhwani

Running the script as described in step 6 produces the following error:

{
  "error": {
    "code": 400,
    "message": "Invalid JSON payload received. Unknown name \"entity_inference_enabled\" at 'data_scan.data_discovery_spec.storage_config.unstructured_data_options': Cannot find field.",
    "status": "INVALID_ARGUMENT",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.BadRequest",
        "fieldViolations": [
          {
            "field": "data_scan.data_discovery_spec.storage_config.unstructured_data_options",
            "description": "Invalid JSON payload received. Unknown name \"entity_inference_enabled\" at 'data_scan.data_discovery_spec.storage_config.unstructured_data_options': Cannot find field."
          }
        ]
      }
    ]
  }
}

There are two problems:

  1. The API expects camel case, so it should be "onDemand" instead of "on_demand".
  2. "entity_inference_enabled" is not camel case either, but replacing it with "entityInferenceEnabled" does not work, because the key has been renamed to "semanticInferenceEnabled".

So the script in step 6 should be:

# 1. Set your variables
PROJECT_ID="<PROJECT_ID>"
REGION="<REGION>"
ENV_SUFFIX="stg1"
DATASCAN_ID="froyo-data-${ENV_SUFFIX}"
BUCKET_NAME="<BUCKET_NAME>"

# 2. Set this to the Name of the connection you created in Step 7
CONNECTION_ID="<CONNECTION_ID_NAME>"

# 3. Define the API Endpoint
DATAPLEX_API="dataplex.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}"

# 4. Create the DataScan via CURL
echo "Creating Dataplex DataScan: ${DATASCAN_ID}..."

curl -X POST "https://$DATAPLEX_API/dataScans?dataScanId=${DATASCAN_ID}" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{
  "data": {
    "resource": "//storage.googleapis.com/projects/'"${PROJECT_ID}"'/buckets/'"${BUCKET_NAME}"'"
  },
  "executionSpec": {
    "trigger": {
      "onDemand": {}
    }
  },
  "dataDiscoverySpec": {
    "bigqueryPublishingConfig": {
      "tableType": "BIGLAKE",
      "connection": "projects/'"${PROJECT_ID}"'/locations/'"${REGION}"'/connections/'"${CONNECTION_ID}"'"
    },
    "storageConfig": {
      "unstructuredDataOptions": {
        "semanticInferenceEnabled": true
      }
    }
  }
}'
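
As an optional precaution, the request body can be built in a variable and checked for well-formed JSON before it is sent. The following is a minimal sketch, not part of the original tutorial: the `PAYLOAD` variable name and the placeholder values are illustrative assumptions, and it assumes `python3` is installed locally. It only catches syntax mistakes such as a missing brace or quote; it cannot detect unknown field names like the one reported above, which only the API can reject.

```shell
# Placeholder values for illustration only; substitute your own.
PROJECT_ID="my-project"
REGION="us-central1"
BUCKET_NAME="my-bucket"
CONNECTION_ID="my-connection"

# Build the request body with a heredoc so the shell expands the variables
# without the '"${VAR}"' quote juggling needed inside a single-quoted string.
PAYLOAD=$(cat <<EOF
{
  "data": {
    "resource": "//storage.googleapis.com/projects/${PROJECT_ID}/buckets/${BUCKET_NAME}"
  },
  "executionSpec": {
    "trigger": {
      "onDemand": {}
    }
  },
  "dataDiscoverySpec": {
    "bigqueryPublishingConfig": {
      "tableType": "BIGLAKE",
      "connection": "projects/${PROJECT_ID}/locations/${REGION}/connections/${CONNECTION_ID}"
    },
    "storageConfig": {
      "unstructuredDataOptions": {
        "semanticInferenceEnabled": true
      }
    }
  }
}
EOF
)

# Syntax check only (assumes python3 is available): json.tool exits non-zero
# if the text read from stdin is not well-formed JSON.
if printf '%s' "$PAYLOAD" | python3 -m json.tool > /dev/null; then
  echo "payload is well-formed JSON"
else
  echo "payload is not valid JSON" >&2
fi
```

If the check passes, the same body can be handed to the curl command above with `-d "$PAYLOAD"` in place of the inline single-quoted string.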
