@lgluszek commented Jan 31, 2026

Description

Text subclass prelabels are now supported for global classifications.

Fixes # (issue)

Type of change

How to test

The following script creates a prelabel with a nested text subclassification:

import json
import os
from uuid import uuid4

import labelbox as lb
import labelbox.types as lb_types
from labelbox.schema.tool_building.types import FeatureSchemaAttribute

client = lb.Client(
    api_key=os.environ["API_KEY"],
    endpoint=os.environ["ENDPOINT"],
)

# Step 1: Define the data row JSON
print("Defining data row...")
row_data = {
    "type": "application/vnd.labelbox.conversational.model-chat-evaluation",
    "version": 2,
    "actors": {
        "user1": {"role": "human", "metadata": {"name": "User"}},
        "model1": {"role": "model", "metadata": {"modelConfigName": "Assistant"}}
    },
    "messages": {
        "message1": {
            "actorId": "user1",
            "content": [
                {"type": "dataRowAttachment", "attachmentName": "image_0"},
                {"type": "text", "content": "**Question:**\n\nDescribe the visual elements and composition of this image."}
            ],
            "childMessageIds": ["message2"]
        },
        "message2": {
            "actorId": "model1",
            "content": [{"type": "text", "content": "The image displays an interior space, likely an art gallery, with a polished wooden floor and white walls. There are three large paintings prominently featured on the walls. The left painting has a predominantly turquoise color with white and black strokes, depicting abstract forms. The central painting is smaller and has a darker, muted color palette with indistinct forms. The right painting is similar in size to the left one and features abstract figures in blue and white tones.\n\nAbove the paintings, there are multiple spotlights affixed to the ceiling, providing illumination to the artworks. The ceiling is industrial in style, with visible ductwork and a series of recessed lights. The gallery space is well-lit, with the lighting creating a warm ambiance.\n\nThe text \"artasiapacific A Gift Like Kung Pao Chicken Xie Nanxing\" is overlaid on the image at the top, suggesting the title of the artwork or exhibition and the artist's name. The gallery's name, \"artasiapacific,\" is also visible, indicating the venue or the organizer of the exhibition. The overall composition of the image is balanced, with the artworks evenly spaced and the spotlights symmetrically arranged."}],
            "childMessageIds": []
        }
    },
    "rootMessageIds": ["message1"]
}

# Step 2: Create dataset and upload data row
print("Creating dataset...")
dataset = client.create_dataset(name="AI Automotive Assistant Tool Verification Dataset")
print(f"Created dataset: {dataset.name} (ID: {dataset.uid})")

global_key = f"automotive-tool-verification-{uuid4()}"
asset = {
    "global_key": global_key,
    "row_data": json.dumps(row_data),
}

print("Uploading data row...")
task = dataset.create_data_rows([asset])
task.wait_till_done()
print(f"Data row uploaded with global_key: {global_key}")

# Get the data row ID for prelabels
data_rows = list(dataset.data_rows())
data_row_id = data_rows[0].uid
print(f"Data row ID: {data_row_id}")

# Step 3: Create project
print("\nCreating offline MMC project...")
project = client.create_offline_model_evaluation_project(
    name="MMC Offline Prelabels Project 2",
    description="Offline MMC project with nested ontology and prelabels demo",
)
print(f"Created project: {project.name} (ID: {project.uid})")

# Step 4: Create ontology with nested text classifications
print("\nCreating ontology...")
ontology_builder = lb.OntologyBuilder(
    tools=[],
    classifications=[
        lb.Classification(
            class_type=lb.Classification.Type.TEXT,
            name="competitor_a_response_1",
            instructions="Competitor_A_Response_1",
            scope=lb.Classification.Scope.GLOBAL,
            required=True,
            options=[
                lb.Classification(
                    class_type=lb.Classification.Type.TEXT,
                    name="non_verifiable_rubric_criteria",
                    instructions="Non-Verifiable Rubric Criteria",
                    required=True,
                    attributes=[
                        FeatureSchemaAttribute(
                            attributeName="instructions",
                            attributeValue="For this competitor response please respond in a JSON/dictionary format that has the key-value pairs as criterion name (copied from above) and PASS/FAIL\n\ne.g.\n\n{\nwrites in childish tone: PASS,\noutput in haiku: FAIL,\nsummarises main points of conversation: PASS\n}"
                        )
                    ]
                )
            ]
        )
    ]
)

ontology = client.create_ontology(
    "AI Automotive Assistant Tool Verification Ontology",
    ontology_builder.asdict(),
    media_type=lb.MediaType.Conversational,
    ontology_kind=lb.OntologyKind.ModelEvaluation
)
print(f"Created ontology: {ontology.name} (ID: {ontology.uid})")

# Step 5: Connect ontology to project
print("\nConnecting ontology to project...")
project.connect_ontology(ontology)
print("Successfully connected ontology to project!")

# Step 6: Create batch to add data row to project
print("\nCreating batch...")
project.create_batch(
    name="prelabels-batch",
    global_keys=[global_key],
    priority=1,
)
print("Batch created successfully!")

# Step 7: Create prelabels
print("\nCreating prelabels...")

# Global prelabel for competitor_a_response_1 with nested non_verifiable_rubric_criteria
competitor_label = lb_types.ClassificationAnnotation(
    name="Competitor_A_Response_1",
    value=lb_types.Text(
        answer="Sample competitor response text",
        classifications=[
            lb_types.ClassificationAnnotation(
                name="Non-Verifiable Rubric Criteria",
                value=lb_types.Text(answer="{\nwrites in childish tone: PASS,\noutput in haiku: FAIL\n}")
            )
        ],
    ),
)

labels = [
    lb_types.Label(
        data={"uid": data_row_id},
        annotations=[competitor_label]
    )
]

upload_job = lb.MALPredictionImport.create_from_objects(
    client=client,
    project_id=str(project.uid),
    name="import-mmc-labels-" + str(uuid4()),
    predictions=labels,
)
upload_job.wait_till_done()

print("\nPrelabel upload complete!")
print(f"Errors: {upload_job.errors}")

assert len(upload_job.errors) == 0, f"Upload had errors: {upload_job.errors}"

print(f"Statuses: {upload_job.statuses}")

# Print summary
print("\n" + "=" * 50)
print("SUMMARY")
print("=" * 50)
print(f"Dataset ID: {dataset.uid}")
print(f"Data Row ID: {data_row_id}")
print(f"Project ID: {project.uid}")
print(f"Ontology ID: {ontology.uid}")

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Document change (fix typo or modifying any markdown files, code comments or anything in the examples folder only)

All Submissions

  • Have you followed the guidelines in our Contributing document?
  • Have you provided a description?
  • Are your changes properly formatted?

New Feature Submissions

  • Does your submission pass tests?
  • Have you added thorough tests for your new feature?
  • Have you commented your code, particularly in hard-to-understand areas?
  • Have you added a Docstring?

Changes to Core Features

  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?
  • Have you updated any code comments, as applicable?

Note

Medium Risk
Touches NDJSON (de)serialization for Text classifications by adding a new optional nested classifications field; incorrect mapping could break prelabel import/export for text classifications.

Overview
Adds an optional classifications field to the Text classification answer model to support nested subclassifications.

Updates NDJSON conversion for text classifications (NDTextSubclass.to_common/from_common and NDText.from_common) to round-trip these nested classifications via NDSubclassification, enabling prelabels for text subclasses.
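A minimal sketch of the round-trip this enables, using plain dicts and `json` rather than the SDK's actual Pydantic models; the field names here are illustrative (based on the script above), not the exact Labelbox wire format:

```python
import json

# Hypothetical NDJSON shape for a text classification prelabel carrying a
# nested text subclassification via the new optional "classifications" field.
prelabel_ndjson = {
    "name": "Competitor_A_Response_1",
    "answer": "Sample competitor response text",
    "classifications": [  # nested subclassifications added by this PR
        {
            "name": "Non-Verifiable Rubric Criteria",
            "answer": "{writes in childish tone: PASS}",
        }
    ],
    "dataRow": {"id": "<data-row-id>"},
}

# Round-trip through a JSON line, as the NDJSON import/export path would.
line = json.dumps(prelabel_ndjson)
restored = json.loads(line)
assert restored == prelabel_ndjson
print(restored["classifications"][0]["name"])  # Non-Verifiable Rubric Criteria
```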

Written by Cursor Bugbot for commit c9bfc39.

@cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


for annot in text.classifications
]
if text.classifications
else None,
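The quoted fragment follows a common conditional-mapping pattern: convert each nested annotation when any are present, otherwise emit `None` so the field can be omitted. Sketched here with hypothetical stand-in functions (`NDSubclassification.from_common` is the real converter in the SDK):

```python
def convert(annot):
    # Stand-in for NDSubclassification.from_common; just echoes the name.
    return {"name": annot["name"]}

def nested_field(text_classifications):
    # Map nested annotations when present; fall back to None for empty/missing.
    return (
        [convert(a) for a in text_classifications]
        if text_classifications
        else None
    )

print(nested_field([{"name": "rubric"}]))  # [{'name': 'rubric'}]
print(nested_field([]))                    # None
```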

Missing serializer to remove empty classifications in NDText

Low Severity

NDText is missing the serialize_model method that NDChecklist and NDRadio have to remove empty classifications from serialized output. When NDText has classifications=[] (e.g., from deserializing ndjson with an empty list), it will serialize with "classifications": [], while NDChecklist and NDRadio would omit the key entirely. This creates inconsistent serialization behavior across classification types.
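The behavior the reviewer suggests could look roughly like this, sketched with a plain class rather than the SDK's actual Pydantic model (`serialize_model` is the method name the comment references; the class name here is hypothetical):

```python
class NDTextSketch:
    """Illustrates dropping an empty classifications key on serialization,
    mirroring what NDChecklist and NDRadio reportedly do."""

    def __init__(self, name, answer, classifications=None):
        self.name = name
        self.answer = answer
        self.classifications = classifications

    def serialize_model(self):
        # Omit the "classifications" key entirely when it is None or empty,
        # so an empty list never reaches the serialized output.
        data = {"name": self.name, "answer": self.answer}
        if self.classifications:
            data["classifications"] = self.classifications
        return data

print(NDTextSketch("a", "b", []).serialize_model())  # {'name': 'a', 'answer': 'b'}
```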

Additional Locations (2)


@lgluszek lgluszek merged commit 4d8931d into develop Jan 31, 2026
25 of 27 checks passed
@lgluszek lgluszek deleted the lgluszek/PTDT-4832 branch January 31, 2026 03:29