Add table_histogram endpoint #677

Open
will-moore wants to merge 3 commits into ome:master from will-moore:table_histogram

@will-moore will-moore commented Jun 5, 2026

Member

We already have histogram functionality in omero-parade, and I now need it for iviewer as well (ome/omero-iviewer#532), so it makes sense for this to go into omero-web.

This endpoint behaves similarly to the existing OMERO.table slice endpoint, e.g. /webgateway/table/FILE_ID/slice/?columns=0&rows=0-100. It wraps table_slice() to load the data, then generates a histogram using numpy and returns the result.

By default, we use ALL the rows to generate the histogram.
Since we don't want to have to load the table twice (once to get the row count, then again to pass rows=0-(row_count-1) to table_slice()), I have updated table_slice() to allow rows=* (no change to the maximum amount of data permitted).

So you can now do /webgateway/table/FILE_ID/slice/?columns=0&rows=*
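To illustrate the idea, here is a minimal sketch of how a `rows` value could be expanded so that `*` selects every row. The helper name `parse_rows` and its signature are hypothetical and are not taken from omero-web. The sketch assumes ranges are inclusive, as the `0-(row_count-1)` form above implies, and it does not enforce the maximum-cells limit.

```python
def parse_rows(rows_param, row_count):
    """Hypothetical helper: expand a `rows` query value into row indices.

    Accepts "*" (all rows), single indices ("5"), inclusive ranges ("0-100"),
    or a comma-separated mix of indices and ranges ("0-2,5").
    """
    if rows_param.strip() == "*":
        # The row count is already known once the table is open,
        # so no second table load is needed to build the full range.
        return list(range(row_count))

    indices = []
    for part in rows_param.split(","):
        part = part.strip()
        if "-" in part:
            start, stop = (int(p) for p in part.split("-", 1))
            indices.extend(range(start, stop + 1))  # inclusive upper bound
        else:
            indices.append(int(part))
    return indices
```

With this sketch, `parse_rows("*", row_count)` and `parse_rows("0-%d" % (row_count - 1), row_count)` yield the same indices, but only the first can be written without knowing the row count in advance.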

The histogram endpoint supports the bins request parameter (int or string), which behaves as described at https://numpy.org/devdocs/reference/generated/numpy.histogram.html
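As a rough sketch of the per-column calculation, the snippet below calls `numpy.histogram` and shapes the result like one entry of the `histograms` list in the response. The function name `column_histogram`, the default of 10 bins, and the handling of the query-string value are assumptions for illustration, not the code in this PR.

```python
import numpy as np


def column_histogram(name, values, bins=10):
    """Hypothetical helper: histogram for one column as a JSON-ready dict."""
    # A query-string value arrives as text: "20" should become the int 20,
    # while names such as "auto" or "fd" are passed to numpy unchanged.
    if isinstance(bins, str) and bins.isdigit():
        bins = int(bins)

    counts, edges = np.histogram(np.asarray(values, dtype=float), bins=bins)

    # numpy returns arrays; convert to plain lists so they serialise to JSON.
    return {
        "column": name,
        "histogram": counts.tolist(),
        "bin_edges": edges.tolist(),
    }
```

Note that `bin_edges` always has one more element than `histogram`, since N bins are delimited by N + 1 edges.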

Sample response to /webgateway/table/15908/histogram/?columns=2,3 on merge-ci:

{
  "histograms": [
    {
      "column": "x_centroid",
      "histogram": [1449, 2750, 2982, 3161, 3393, 3455, 3012, 2643, 1161, 400],
      "bin_edges": [
        3.757423210144043, 766.061197490692, 1528.36497177124,
        2290.6687460517883, 3052.9725203323364, 3815.2762946128846,
        4577.580068893432, 5339.8838431739805, 6102.187617454529,
        6864.491391735077, 7626.795166015625
      ]
    },
    {
      "column": "y_centroid",
      "histogram": [52, 142, 32, 1388, 3627, 3905, 4269, 4326, 4111, 2554],
      "bin_edges": [
        39.39493064880371, 614.6632842636108, 1189.9316378784179,
        1765.199991493225, 2340.4683451080323, 2915.7366987228393,
        3491.0050523376467, 4066.2734059524537, 4641.541759567261,
        5216.810113182068, 5792.078466796875
      ]
    }
  ],
  "meta": {
    "columns": ["x_centroid", "y_centroid"],
    "rowCount": 24406,
    "columnCount": 13,
    "maxCells": 1000000
  }
}

knabar commented Jun 19, 2026

Member

The time to calculate a histogram on demand will directly depend on the number of rows in the table and likely won't be sustainable for tables with millions of rows, which we are seeing regularly now.

Our strategy is to calculate column statistics for most numeric columns at the time of table creation and store them in the table metadata (the roi column and a few others are excluded, as statistics are not meaningful there). The metadata fields created include:

  • <column name>.min
  • <column name>.max
  • <column name>.mean
  • <column name>.median
  • <column name>.std
  • <column name>.skew
  • <column name>.kurtosis
  • <column name>.histogram.count
  • <column name>.histogram.division

All custom metadata fields are already returned via the webgateway/table/<id>/metadata/ endpoint.
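For a client consuming these fields, a small sketch of grouping the flat `<column name>.<statistic>` keys by column might look like the following. It assumes the metadata has already been fetched and parsed into a flat dict; the function name `group_column_stats` is hypothetical, and the exact response shape of the metadata endpoint is not shown here.

```python
# Statistic suffixes taken from the list of metadata fields above.
STAT_SUFFIXES = (
    "histogram.count", "histogram.division",
    "min", "max", "mean", "median", "std", "skew", "kurtosis",
)


def group_column_stats(metadata):
    """Hypothetical helper: group '<column>.<stat>' keys by column name.

    Keys that do not end in a known statistic suffix are ignored, so other
    custom metadata fields pass through without causing errors.
    """
    grouped = {}
    for key, value in metadata.items():
        for suffix in STAT_SUFFIXES:
            if key.endswith("." + suffix):
                # Strip the suffix and its separating dot to recover the
                # column name, which may itself contain dots.
                column = key[: -(len(suffix) + 1)]
                grouped.setdefault(column, {})[suffix] = value
                break
    return grouped
```

Matching on known suffixes rather than splitting on the first dot keeps the grouping correct even when a column name contains a dot.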
