WIP: Add tutorials about ragged tensors.#823

Open
csukuangfj wants to merge 7 commits into
k2-fsa:master from
csukuangfj:doc

Conversation

@csukuangfj

Collaborator

No description provided.

@csukuangfj

Collaborator Author

A preview can be found at

https://csukuangfj.github.io/k2/python_tutorials/ragged/basics.html#

@danpovey

danpovey commented Sep 11, 2021 via email

Collaborator

@GNroy

GNroy commented Sep 13, 2021


@csukuangfj Thanks for this tutorial!
Could you please clarify how ragged tensors relate to, say, PyTorch sparse matrices? They look quite similar.

@csukuangfj

Collaborator Author

> @csukuangfj Thanks for this tutorial!
> Could you please clarify how ragged tensors relate to, say, PyTorch sparse matrices? They look quite similar.

TensorFlow has sparse matrices and ragged tensors, see

PyTorch also has sparse matrices and nested tensors, see

We use the same terminology as tf.RaggedTensor, e.g., row splits and row ids, though ragged tensors in k2 were designed independently by @danpovey. We were told only later that TensorFlow uses the same ideas.
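
To make the terminology concrete, here is a minimal sketch in plain Python of how row splits and row ids describe the same ragged list. It does not use k2 or TensorFlow; the function names `row_splits_to_row_ids` and `row_ids_to_row_splits` are hypothetical helpers written only for this illustration.

```python
from typing import List


def row_splits_to_row_ids(row_splits: List[int]) -> List[int]:
    """Expand row boundaries into one row index per element."""
    row_ids: List[int] = []
    for row in range(len(row_splits) - 1):
        count = row_splits[row + 1] - row_splits[row]
        row_ids.extend([row] * count)
    return row_ids


def row_ids_to_row_splits(row_ids: List[int], num_rows: int) -> List[int]:
    """Rebuild row boundaries from per-element row indices."""
    counts = [0] * num_rows
    for r in row_ids:
        counts[r] += 1
    row_splits = [0]
    for c in counts:
        row_splits.append(row_splits[-1] + c)
    return row_splits


# The ragged list [[10, 20], [], [30, 40, 50]] stored as flat arrays.
values = [10, 20, 30, 40, 50]
row_splits = [0, 2, 2, 5]  # row i covers values[row_splits[i]:row_splits[i + 1]]
row_ids = row_splits_to_row_ids(row_splits)  # row index of each value
print(row_ids)
```

Note that the empty middle row is visible in row splits (two equal neighbours) but leaves no trace in row ids, which is why the number of rows has to be passed when converting back.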


A ragged tensor with 2 axes looks similar to a sparse matrix in CSR format, but they are different.

From https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_row_(CSR,_CRS_or_Yale_format) , a sparse matrix in CSR format has the following components:

  • ROW_INDEX
  • COL_INDEX
  • V

ROW_INDEX is called row_splits in k2, and V is called values. That's why I said a ragged tensor in k2 shares some similarities with sparse matrices.

However, there is no COL_INDEX in ragged tensors. We do not view a ragged tensor as a ragged matrix.
For a ragged tensor with 2 axes, what we care about is the number of elements in each row; we don't assign a column index to the entries in a row.
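
The difference can be sketched in plain Python. The snippet below builds the three CSR arrays for a small matrix and the two arrays for a ragged list holding the same numbers. It is only an illustration under my own naming: `dense_to_csr` and `lists_to_ragged` are hypothetical helpers, not functions from k2 or PyTorch.

```python
from typing import List, Tuple


def dense_to_csr(matrix: List[List[int]]) -> Tuple[List[int], List[int], List[int]]:
    """Return (ROW_INDEX, COL_INDEX, V) for the nonzero entries of a dense matrix."""
    row_index, col_index, v = [0], [], []
    for row in matrix:
        for col, x in enumerate(row):
            if x != 0:
                col_index.append(col)  # CSR remembers where in the row x sits
                v.append(x)
        row_index.append(len(v))
    return row_index, col_index, v


def lists_to_ragged(lists: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Return (row_splits, values) for a list of lists; no column positions are kept."""
    row_splits, values = [0], []
    for sub in lists:
        values.extend(sub)
        row_splits.append(len(values))
    return row_splits, values


matrix = [
    [5, 0, 0, 8],
    [0, 0, 0, 0],
    [0, 3, 6, 0],
]
row_index, col_index, v = dense_to_csr(matrix)

ragged = [[5, 8], [], [3, 6]]
row_splits, values = lists_to_ragged(ragged)

print(row_index, col_index, v)
print(row_splits, values)
```

Here `row_index` equals `row_splits` and `v` equals `values`, but only the CSR form carries `col_index`. From the ragged form alone there is no way to tell that 8 sat in column 3 of the matrix; the ragged list only says that row 0 has two elements.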

PyTorch's sparse matrices use COO format by default. Either way, they are still matrices with row indexes and column indexes.


Also, ragged tensors in k2 are not designed for linear algebra operations, i.e., there are no matrix-vector or matrix-matrix multiplications. Instead, they are designed for efficiently manipulating irregular data structures on the GPU.
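
As a rough illustration of what "manipulating irregular data structures" means, the following plain-Python sketch computes a result per sublist directly from row_splits and values, without ever referring to a column. It runs sequentially on the CPU and is not k2's API; `sum_per_sublist` and `sublist_lengths` are hypothetical names chosen for this example.

```python
from typing import List


def sublist_lengths(row_splits: List[int]) -> List[int]:
    """Number of elements in each sublist."""
    return [row_splits[i + 1] - row_splits[i] for i in range(len(row_splits) - 1)]


def sum_per_sublist(row_splits: List[int], values: List[int]) -> List[int]:
    """Sum of each sublist; an empty sublist sums to 0."""
    return [
        sum(values[row_splits[i]:row_splits[i + 1]])
        for i in range(len(row_splits) - 1)
    ]


# The ragged list [[1, 2, 3], [], [4], [5, 6]].
values = [1, 2, 3, 4, 5, 6]
row_splits = [0, 3, 3, 4, 6]

lengths = sublist_lengths(row_splits)
sums = sum_per_sublist(row_splits, values)
print(lengths, sums)
```

Each sublist is reduced independently of the others, which is the property that makes this kind of operation a good fit for parallel execution.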

@GNroy

GNroy commented Sep 14, 2021


Many thanks for the clarification!

A humble suggestion: you might consider including this information in the tutorial because I am hardly the last person to ask questions like this.

@csukuangfj csukuangfj force-pushed the doc branch 4 times, most recently from 2c20650 to 5fc2189 Compare September 15, 2021 04:44