Improve docx importing with recursive text splitting to avoid excessively large cells #782

When importing a .docx file, block styles (e.g., paragraphs) provide the main cell divisions. However, some paragraphs are very long, which makes validation and drafting unwieldy and unclear and dilutes our main value proposition.

Therefore, we can implement recursive text splitting that keeps subdividing a block until each piece is within a typical target cell length.

Concretely, split on paragraphs first. If a piece is still longer than the target cell length, split it by newlines, then by sentence terminators (`.`, `!`, `?`), then by minor stops (e.g., `,`, `;`, `:`), and finally by whitespace (or some variation of this pecking order).
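A minimal sketch of this pecking order in Python, assuming the importer already hands over one paragraph (one block-style unit) at a time and that cell length is measured in characters. `TARGET_CELL_LENGTH`, `SEPARATORS`, `split_recursively`, the value 200, and the exact set of minor stops are all hypothetical choices for illustration, not existing code. Each level is only tried on pieces that are still too long, and a hard slice is used as a last resort for a single overlong token.

```python
import re

# Hypothetical target length in characters; the issue does not fix a value.
TARGET_CELL_LENGTH = 200

# Separators in the suggested pecking order, coarsest first.
# Lookbehinds keep the punctuation attached to the preceding piece.
SEPARATORS = [
    r"\n",              # newlines (soft line breaks inside a paragraph)
    r"(?<=[.!?])\s+",   # sentence terminators
    r"(?<=[,;:])\s+",   # minor stops (assumed set: , ; :)
    r"\s+",             # any whitespace
]


def split_recursively(text: str, target: int = TARGET_CELL_LENGTH,
                      separators: list[str] = SEPARATORS) -> list[str]:
    """Split text until every piece is at most `target` characters."""
    text = text.strip()
    if not text:
        return []
    if len(text) <= target:
        return [text]
    if not separators:
        # No separators left (e.g. one very long token): hard-slice it.
        return [text[i:i + target] for i in range(0, len(text), target)]
    first, rest = separators[0], separators[1:]
    pieces = re.split(first, text)
    if len(pieces) == 1:
        # This separator does not occur here; fall through to a finer one.
        return split_recursively(text, target, rest)
    result: list[str] = []
    for piece in pieces:
        # Only pieces that are still too long get split further.
        result.extend(split_recursively(piece, target, rest))
    return result
```

For example, `split_recursively(paragraph_text, target=40)` returns sentence-sized pieces where sentences fit, and drops to word-level pieces only for sentences that exceed 40 characters.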

This will ensure we extract usefully sized validation units.
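One caveat: the finest levels of the pecking order (minor stops, whitespace) can leave very short fragments, down to single words. One possible remedy, sketched below as an assumption rather than a settled design, is to greedily re-join adjacent fragments as long as the joined text stays within the target length. `merge_small_pieces` and the single-space joiner are hypothetical; joining with a space discards any original line break between fragments.

```python
def merge_small_pieces(pieces: list[str], target: int = 200,
                       joiner: str = " ") -> list[str]:
    """Greedily join adjacent pieces while the result stays within `target` characters.

    Pieces that are already longer than `target` are passed through unchanged.
    """
    merged: list[str] = []
    current = ""
    for piece in pieces:
        if not current:
            current = piece
        elif len(current) + len(joiner) + len(piece) <= target:
            # Still fits: extend the current cell.
            current = current + joiner + piece
        else:
            # Would overflow: close the current cell and start a new one.
            merged.append(current)
            current = piece
    if current:
        merged.append(current)
    return merged
```

With `target=20`, the word fragments of "Lambda mu nu xi omicron pi rho sigma tau upsilon." are regrouped into `["Lambda mu nu xi", "omicron pi rho sigma", "tau upsilon."]`.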