text_to_dialogue.convert_with_timestamps returns duplicate timestamps after ~60 seconds #707

@gardner

Description

Steps to Reproduce

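  1. Create a dialogue input with 2674+ characters in total
  2. Call client.text_to_dialogue.convert_with_timestamps with that input
  3. Inspect the voice_segments timings in the response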
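
Step 1 can be sketched as follows. This is only an illustrative helper, not part of the original report: build_dialogue_inputs is a hypothetical name, the voice IDs are placeholders, and the "text" / "voice_id" keys are an assumption about the dialogue input shape. It repeats the given lines, alternating voices, until the total text length reaches the requested size, and wraps the result under an "inputs" key to match the inputs["inputs"] access in the code example.

```python
def build_dialogue_inputs(lines, voice_ids, min_chars=2674):
    """Build a dialogue payload whose total text length is >= min_chars.

    Hypothetical helper: cycles through `lines` and `voice_ids` until enough
    characters have been collected. The "text"/"voice_id" keys are assumed.
    """
    if not lines or not voice_ids or not any(lines):
        raise ValueError("need at least one non-empty line and one voice id")

    inputs = []
    total_chars = 0
    i = 0
    while total_chars < min_chars:
        text = lines[i % len(lines)]
        voice_id = voice_ids[i % len(voice_ids)]
        inputs.append({"text": text, "voice_id": voice_id})
        total_chars += len(text)
        i += 1
    return {"inputs": inputs}


if __name__ == "__main__":
    payload = build_dialogue_inputs(
        ["Hello there.", "How are you today?"],
        ["PLACEHOLDER_VOICE_A", "PLACEHOLDER_VOICE_B"],  # not real voice ids
    )
    print(len(payload["inputs"]), "dialogue lines")
```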

Expected Behavior

Each voice segment is returned with accurate timings: a non-empty segment should have its own start time and a non-zero duration.

Observed Behavior

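Many segments are returned with duplicate timings: character_start_index and character_end_index keep increasing while start_time_seconds remains exactly the same, and many of these segments have end_time_seconds equal to start_time_seconds (zero duration). In the example response the first such segment is dialogue_input_index 13, at about 76.138 seconds. (See additional context)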

Code example

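import json

# Assumes `client` is an already initialized API client and `inputs` is a
# dict holding the dialogue lines under the "inputs" key.
response = client.text_to_dialogue.convert_with_timestamps(
    inputs=inputs["inputs"],
    language_code="en",
    apply_text_normalization="off",
)
# Dump the full response so the voice_segments timings can be inspected.
print(json.dumps(response.model_dump(), indent=2))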

Additional context

Duplicate Timings Example Response

Please note the character_start_index and character_end_index increase while the start_time_seconds remains exactly the same:

"voice_segments": [
  {
    "start_time_seconds": 0.0,
    "end_time_seconds": 3.112,
    "character_start_index": 0,
    "character_end_index": 37,
    "dialogue_input_index": 0
  },
  {
    "start_time_seconds": 3.112,
    "end_time_seconds": 8.227,
    "character_start_index": 37,
    "character_end_index": 88,
    "dialogue_input_index": 1
  },
  {
    "start_time_seconds": 8.227,
    "end_time_seconds": 12.059000000000001,
    "character_start_index": 88,
    "character_end_index": 127,
    "dialogue_input_index": 2
  },
  {
    "start_time_seconds": 12.059000000000001,
    "end_time_seconds": 21.881,
    "character_start_index": 127,
    "character_end_index": 252,
    "dialogue_input_index": 3
  },
  {
    "start_time_seconds": 21.881,
    "end_time_seconds": 36.422999999999995,
    "character_start_index": 252,
    "character_end_index": 421,
    "dialogue_input_index": 4
  },
  {
    "start_time_seconds": 36.422999999999995,
    "end_time_seconds": 47.361999999999995,
    "character_start_index": 421,
    "character_end_index": 532,
    "dialogue_input_index": 5
  },
  {
    "start_time_seconds": 47.361999999999995,
    "end_time_seconds": 56.071,
    "character_start_index": 532,
    "character_end_index": 639,
    "dialogue_input_index": 6
  },
  {
    "start_time_seconds": 56.071,
    "end_time_seconds": 60.778999999999996,
    "character_start_index": 639,
    "character_end_index": 700,
    "dialogue_input_index": 7
  },
  {
    "start_time_seconds": 60.778999999999996,
    "end_time_seconds": 62.324999999999996,
    "character_start_index": 700,
    "character_end_index": 712,
    "dialogue_input_index": 8
  },
  {
    "start_time_seconds": 62.324999999999996,
    "end_time_seconds": 65.40899999999999,
    "character_start_index": 712,
    "character_end_index": 775,
    "dialogue_input_index": 9
  },
  {
    "start_time_seconds": 65.40899999999999,
    "end_time_seconds": 68.11399999999999,
    "character_start_index": 775,
    "character_end_index": 825,
    "dialogue_input_index": 10
  },
  {
    "start_time_seconds": 68.11399999999999,
    "end_time_seconds": 70.37799999999999,
    "character_start_index": 825,
    "character_end_index": 847,
    "dialogue_input_index": 11
  },
  {
    "start_time_seconds": 70.37799999999999,
    "end_time_seconds": 76.13799999999999,
    "character_start_index": 847,
    "character_end_index": 930,
    "dialogue_input_index": 12
  },
  {
    "start_time_seconds": 76.13799999999999,
    "end_time_seconds": 76.13799999999999,
    "character_start_index": 930,
    "character_end_index": 940,
    "dialogue_input_index": 13
  },
  {
    "start_time_seconds": 76.13799999999999,
    "end_time_seconds": 76.13799999999999,
    "character_start_index": 940,
    "character_end_index": 987,
    "dialogue_input_index": 14
  },
  {
    "start_time_seconds": 76.13799999999999,
    "end_time_seconds": 76.13799999999999,
    "character_start_index": 987,
    "character_end_index": 1042,
    "dialogue_input_index": 15
  },
  {
    "start_time_seconds": 76.13799999999999,
    "end_time_seconds": 76.41799999999999,
    "character_start_index": 1042,
    "character_end_index": 1079,
    "dialogue_input_index": 16
  },
  {
    "start_time_seconds": 76.41799999999999,
    "end_time_seconds": 76.41799999999999,
    "character_start_index": 1079,
    "character_end_index": 1114,
    "dialogue_input_index": 17
  },
  {
    "start_time_seconds": 76.41799999999999,
    "end_time_seconds": 76.41799999999999,
    "character_start_index": 1114,
    "character_end_index": 1161,
    "dialogue_input_index": 18
  },
  {
    "start_time_seconds": 76.41799999999999,
    "end_time_seconds": 76.41799999999999,
    "character_start_index": 1161,
    "character_end_index": 1171,
    "dialogue_input_index": 19
  },
  {
    "start_time_seconds": 76.41799999999999,
    "end_time_seconds": 81.13099999999999,
    "character_start_index": 1171,
    "character_end_index": 1254,
    "dialogue_input_index": 20
  },
  {
    "start_time_seconds": 81.13099999999999,
    "end_time_seconds": 81.13099999999999,
    "character_start_index": 1254,
    "character_end_index": 1314,
    "dialogue_input_index": 21
  },
  {
    "start_time_seconds": 81.13099999999999,
    "end_time_seconds": 82.77999999999999,
    "character_start_index": 1314,
    "character_end_index": 1410,
    "dialogue_input_index": 22
  },
  {
    "start_time_seconds": 82.77999999999999,
    "end_time_seconds": 82.77999999999999,
    "character_start_index": 1410,
    "character_end_index": 1474,
    "dialogue_input_index": 23
  },
  {
    "start_time_seconds": 82.77999999999999,
    "end_time_seconds": 83.60199999999999,
    "character_start_index": 1474,
    "character_end_index": 1570,
    "dialogue_input_index": 24
  },
  {
    "start_time_seconds": 83.60199999999999,
    "end_time_seconds": 83.60199999999999,
    "character_start_index": 1570,
    "character_end_index": 1690,
    "dialogue_input_index": 25
  },
  {
    "start_time_seconds": 83.60199999999999,
    "end_time_seconds": 85.39099999999999,
    "character_start_index": 1690,
    "character_end_index": 1711,
    "dialogue_input_index": 26
  },
  {
    "start_time_seconds": 85.39099999999999,
    "end_time_seconds": 85.39099999999999,
    "character_start_index": 1711,
    "character_end_index": 1732,
    "dialogue_input_index": 27
  },
  {
    "start_time_seconds": 85.39099999999999,
    "end_time_seconds": 85.39099999999999,
    "character_start_index": 1732,
    "character_end_index": 1751,
    "dialogue_input_index": 28
  },
  {
    "start_time_seconds": 85.39099999999999,
    "end_time_seconds": 85.39099999999999,
    "character_start_index": 1751,
    "character_end_index": 1761,
    "dialogue_input_index": 29
  },
  {
    "start_time_seconds": 85.39099999999999,
    "end_time_seconds": 85.39099999999999,
    "character_start_index": 1761,
    "character_end_index": 1827,
    "dialogue_input_index": 30
  },
  {
    "start_time_seconds": 85.39099999999999,
    "end_time_seconds": 85.39099999999999,
    "character_start_index": 1827,
    "character_end_index": 1952,
    "dialogue_input_index": 31
  },
  {
    "start_time_seconds": 85.39099999999999,
    "end_time_seconds": 85.39099999999999,
    "character_start_index": 1952,
    "character_end_index": 2100,
    "dialogue_input_index": 32
  },
  {
    "start_time_seconds": 85.39099999999999,
    "end_time_seconds": 85.39099999999999,
    "character_start_index": 2100,
    "character_end_index": 2123,
    "dialogue_input_index": 33
  },
  {
    "start_time_seconds": 85.39099999999999,
    "end_time_seconds": 85.39099999999999,
    "character_start_index": 2123,
    "character_end_index": 2133,
    "dialogue_input_index": 34
  },
  {
    "start_time_seconds": 85.39099999999999,
    "end_time_seconds": 85.39099999999999,
    "character_start_index": 2133,
    "character_end_index": 2164,
    "dialogue_input_index": 35
  },
  {
    "start_time_seconds": 85.39099999999999,
    "end_time_seconds": 85.39099999999999,
    "character_start_index": 2164,
    "character_end_index": 2181,
    "dialogue_input_index": 36
  },
  {
    "start_time_seconds": 85.39099999999999,
    "end_time_seconds": 85.39099999999999,
    "character_start_index": 2181,
    "character_end_index": 2383,
    "dialogue_input_index": 37
  },
  {
    "start_time_seconds": 85.39099999999999,
    "end_time_seconds": 85.39099999999999,
    "character_start_index": 2383,
    "character_end_index": 2386,
    "dialogue_input_index": 38
  },
  {
    "start_time_seconds": 85.39099999999999,
    "end_time_seconds": 85.39099999999999,
    "character_start_index": 2386,
    "character_end_index": 2399,
    "dialogue_input_index": 39
  },
  {
    "start_time_seconds": 85.39099999999999,
    "end_time_seconds": 85.39099999999999,
    "character_start_index": 2399,
    "character_end_index": 2428,
    "dialogue_input_index": 40
  },
  {
    "start_time_seconds": 85.39099999999999,
    "end_time_seconds": 85.39099999999999,
    "character_start_index": 2428,
    "character_end_index": 2437,
    "dialogue_input_index": 41
  },
  {
    "start_time_seconds": 85.39099999999999,
    "end_time_seconds": 86.07799999999999,
    "character_start_index": 2437,
    "character_end_index": 2552,
    "dialogue_input_index": 42
  },
  {
    "start_time_seconds": 86.07799999999999,
    "end_time_seconds": 86.07799999999999,
    "character_start_index": 2552,
    "character_end_index": 2555,
    "dialogue_input_index": 43
  },
  {
    "start_time_seconds": 86.07799999999999,
    "end_time_seconds": 86.07799999999999,
    "character_start_index": 2555,
    "character_end_index": 2616,
    "dialogue_input_index": 44
  },
  {
    "start_time_seconds": 86.07799999999999,
    "end_time_seconds": 86.07799999999999,
    "character_start_index": 2616,
    "character_end_index": 2645,
    "dialogue_input_index": 45
  },
  {
    "start_time_seconds": 86.07799999999999,
    "end_time_seconds": 88.23599999999999,
    "character_start_index": 2645,
    "character_end_index": 2660,
    "dialogue_input_index": 46
  },
  {
    "start_time_seconds": 88.23599999999999,
    "end_time_seconds": 98.05199999999998,
    "character_start_index": 2660,
    "character_end_index": 2674,
    "dialogue_input_index": 47
  }
]
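
The affected segments can be picked out of a response like the one above with a small check. This is a minimal sketch, not part of the SDK: find_zero_duration_segments is a hypothetical helper that assumes voice_segments is a list of dicts with the field names shown in the response, and it flags every segment that covers at least one character but has no duration.

```python
def find_zero_duration_segments(voice_segments):
    """Return dialogue_input_index values of segments with text but no duration.

    Hypothetical helper: a segment is flagged when its character range is
    non-empty while end_time_seconds does not exceed start_time_seconds.
    """
    flagged = []
    for seg in voice_segments:
        has_text = seg["character_end_index"] > seg["character_start_index"]
        no_duration = seg["end_time_seconds"] <= seg["start_time_seconds"]
        if has_text and no_duration:
            flagged.append(seg["dialogue_input_index"])
    return flagged


if __name__ == "__main__":
    # Segments 12, 13, 14 and 16 copied from the example response.
    sample = [
        {"start_time_seconds": 70.37799999999999, "end_time_seconds": 76.13799999999999,
         "character_start_index": 847, "character_end_index": 930, "dialogue_input_index": 12},
        {"start_time_seconds": 76.13799999999999, "end_time_seconds": 76.13799999999999,
         "character_start_index": 930, "character_end_index": 940, "dialogue_input_index": 13},
        {"start_time_seconds": 76.13799999999999, "end_time_seconds": 76.13799999999999,
         "character_start_index": 940, "character_end_index": 987, "dialogue_input_index": 14},
        {"start_time_seconds": 76.13799999999999, "end_time_seconds": 76.41799999999999,
         "character_start_index": 1042, "character_end_index": 1079, "dialogue_input_index": 16},
    ]
    print(find_zero_duration_segments(sample))
```

On the full response, the same check would list every dialogue_input_index whose timing collapsed to a single point, which makes it easier to compare runs with different input lengths.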

Related Issue

"return word level alignment when running text_to_speech.convert_with_timestamps" #556

Metadata

Labels

bug (Something isn't working), has reproduction (A reproduction case has been provided), upstream (Issue originates from upstream or generated code)
