[FEATURE]: Thresholding of CFVariable values #25

@ErikKusch

Is your feature request related to a problem? Please describe:
I am in the process of developing ETCCDI calculation functionality based on ncdfCF and need to apply thresholding to the data in two ways:

  1. Each value not exceeding a given threshold is set to FALSE or 0. Every other value is set to TRUE or 1.
  2. Each value not exceeding a given threshold is set to NA. Every other value is untouched.
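
To make the two requirements concrete, here is a minimal sketch on a plain base R array. It does not use ncdfCF or CFVariable at all; the values and the names tas, threshold, mask and masked are made up for illustration.

```r
# Toy 2 x 3 array of temperatures in Kelvin (made-up values, not from the dataset)
tas <- array(c(270.0, 275.5, 273.15, 280.2, NA, 268.9), dim = c(2, 3))
threshold <- 273.15

# Requirement 1: logical mask, TRUE where the value exceeds the threshold.
# Cells that are already NA stay NA.
mask <- tas > threshold

# Requirement 2: values not exceeding the threshold become NA,
# every other value is left untouched.
masked <- tas
masked[which(tas <= threshold)] <- NA
```

Using which() keeps the assignment safe when the data already contains NA cells, because it returns only the positions where the comparison is TRUE.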

Describe the solution you'd like
The former is already supported by ncdfCF's CFVariable object:

CFVar <- ncdfCF::open_ncdf("https://thredds.met.no/thredds/dodsC/KSS/Klima_i_Norge/utgave2025/DailyTimeSeries/tas/eqm/ssp370/noresm-r1i1p1f1-hclim/noresm-r1i1p1f1-hclim_ssp370_eqm-sn2018v2005_rawbc_norway_1km_tas_daily_2100.nc4")[["tas"]]
CFVar <- CFVar$subset(lon = c(9, 11), lat = c(59, 61))
Thresh <- CFVar > 273.15
Thresh
<Variable> tas_273_15 

Values: [0 ... 1] 
    NA: 577065 (6.4%)

Axes:
 axis name length values                                        unit
 X    lon  111    [9.008932 ... 10.974082]                      degrees_east
 Y    lat  224    [59.004474 ... 61.000102]                     degrees_north
 T    time 365-U  [2100-01-01T12:00:00 ... 2100-12-31T12:00:00] hours since 1951-01-01 12:00:00

Attributes:
 name         type     length value
 actual_range NC_SHORT 2      0, 1

I am now wondering how I can make the second requirement happen with CFVariable objects. Am I missing an obvious way of selectively transforming values into NA? I know that I could simply multiply the original data with the Thresh object like so, but that sets the values below the threshold to 0 instead of introducing the required NAs (likewise, division of the original data by Thresh leads to unaltered data and Inf values):

Thresh * CFVar
<Variable> tas_273_15_tas 

Values: [0 ... 301.1784] 
    NA: 577065 (6.4%)

Axes:
 axis name length values                                        unit
 X    lon  111    [9.008932 ... 10.974082]                      degrees_east
 Y    lat  224    [59.004474 ... 61.000102]                     degrees_north
 T    time 365-U  [2100-01-01T12:00:00 ... 2100-12-31T12:00:00] hours since 1951-01-01 12:00:00

Attributes:
 name         type      length value
 actual_range NC_DOUBLE 2      0, 301.178406
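
The arithmetic behind these results can be sketched with plain R numeric vectors. This is base R only, with made-up values; whether the same expressions behave identically on CFVariable objects is an assumption that has not been checked here.

```r
# Toy values in Kelvin (made up for illustration)
tas <- c(270.0, 275.5, 280.2)
mask <- as.numeric(tas > 273.15)    # 0, 1, 1

prod_result <- tas * mask           # value below the threshold becomes 0, not NA
div_result  <- tas / mask           # value below the threshold becomes Inf (x / 0)
nan_result  <- tas * (mask / mask)  # 0 / 0 is NaN, which is.na() reports as missing
```

In base R the last form yields a missing value exactly where the mask is 0, which is the behaviour asked for in the second requirement.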

Are you willing and able to implement the solution?
I already attempted to implement this via the $raw() method and then iterating over the layers in the resulting array, but that ended up being very RAM-heavy and slow. Happy to help with an implementation wherever I can.
