
Refactor of hdfeos5_or_csv_2json_mbtiles.py #3

@falkamelung


Hi @stackTom,
As we now ingest multiple file types, we think that hdfeos5_or_csv_2json_mbtiles.py can be significantly simplified. Emirhan and I will do it, but I want to check whether you see any potential problems. This is what we want to do:

  • We can significantly reduce the number of required attributes. Only very few will be required; some of them can be inferred from the data, as outlined here:

    needed_attributes = {
    "prf", "first_date", "mission", "WIDTH", "X_STEP", "processing_software",
    "wavelength", "processing_type", "beam_swath", "Y_FIRST", "look_direction",
    "flight_direction", "last_frame", "post_processing_method", "min_baseline_perp",
    "unwrap_method", "relative_orbit", "beam_mode", "LENGTH", "max_baseline_perp",
    "X_FIRST", "atmos_correct_method", "last_date", "first_frame", "frame", "Y_STEP", "history",
    "scene_footprint", "data_footprint", "downloadUnavcoUrl", "referencePdfUrl", "areaName", "referenceText",
    "REF_LAT", "REF_LON", "CENTER_LINE_UTC", "insarmaps_download_flag", "mintpy.subset.lalo"
    }
    # FA 4/2025 suggestions:
    # required_attributes_in_data = {
    #     "mission",
    #     "beam_mode",
    #     "flight_direction",
    #     "relative_orbit",
    #     "processing_method",  # {MiaplPy, MintPy, Sarvey, TRE}
    # }
    # required_attributes_inferred = {
    #     "data_footprint",    # infer if not given
    #     "data_type",         # Default: LOS_TIMESERIES
    #     "look_direction",    # Default: R (L for mission=NISAR)
    #     "start_date",        # Always infer
    #     "end_date",          # Always infer
    #     "history",           # Always infer (processing day's date, i.e. today)
    # }
    # optional_attributes_in_data = {
    #     "REF_LAT",
    #     "REF_LON",
    #     "areaName",          # to be used for search
    #     "beamSwath",
    # }
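To make the inference concrete, here is a minimal sketch of how the inferred attributes could be derived. The function name `infer_attributes`, the WKT bounding-box footprint, and the `YYYYMMDD` date strings are assumptions for illustration, not existing code:

```python
from datetime import date

def infer_attributes(attributes, dates, lats, lons):
    """Hypothetical sketch: fill in attributes that need not be stored
    in the file. `dates` is a sorted list of "YYYYMMDD" strings;
    `lats`/`lons` are 1D sequences of point coordinates."""
    attributes.setdefault("data_type", "LOS_TIMESERIES")
    # Default look direction is right; NISAR is left-looking.
    default_look = "L" if attributes.get("mission") == "NISAR" else "R"
    attributes.setdefault("look_direction", default_look)
    attributes["start_date"] = dates[0]
    attributes["end_date"] = dates[-1]
    attributes["history"] = date.today().isoformat()
    if "data_footprint" not in attributes:
        # Simple bounding box as a WKT polygon; a convex hull would be tighter.
        w, e = min(lons), max(lons)
        s, n = min(lats), max(lats)
        attributes["data_footprint"] = (
            f"POLYGON(({w} {s},{e} {s},{e} {n},{w} {n},{w} {s}))"
        )
    return attributes
```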

  • We don't need to distinguish a high-res mode as long as we remove `X_STEP`, `Y_STEP`, `X_FIRST`, `Y_FIRST` as required attributes. We also don't need `WIDTH` and `LENGTH`. We read all data as a 1D list; there is no need to go to a grid. The following code can be removed:

    # get the attributes for calculating latitude and longitude
    x_step, y_step, x_first, y_first = 0, 0, 0, 0
    if high_res_mode(attributes):
        needed_attributes.remove("X_STEP")
        needed_attributes.remove("Y_STEP")
        needed_attributes.remove("X_FIRST")
        needed_attributes.remove("Y_FIRST")
    else:
        x_step = float(attributes["X_STEP"])
        y_step = float(attributes["Y_STEP"])
        x_first = float(attributes["X_FIRST"])
        y_first = float(attributes["Y_FIRST"])
    num_columns = int(attributes["WIDTH"])
    num_rows = int(attributes["LENGTH"])
    print("columns: %d" % num_columns)
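With the grid attributes gone, the remaining validation could be as simple as the following sketch. The `REQUIRED_ATTRIBUTES` constant and `validate_attributes` helper are hypothetical names, not code from the script:

```python
# Hypothetical sketch of the reduced validation: only a handful of
# attributes must be present in the data; everything else is either
# inferred or optional.
REQUIRED_ATTRIBUTES = {
    "mission",
    "beam_mode",
    "flight_direction",
    "relative_orbit",
    "processing_method",  # e.g. MiaplPy, MintPy, Sarvey, TRE
}

def validate_attributes(attributes):
    """Raise if any required attribute is missing from the input."""
    missing = REQUIRED_ATTRIBUTES - attributes.keys()
    if missing:
        raise ValueError(f"Missing required attributes: {sorted(missing)}")
```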

  • We don't need to consider a 2D grid with rows and columns. We just read the lat/lon values into 1D lists. This code creating `lats_grid` and `lons_grid` can be eliminated:

    padded_lats = np.full(num_cols * num_rows, np.nan)
    padded_lats[:num_points] = lats
    lats_grid = padded_lats.reshape((num_rows, num_cols))
    padded_lons = np.full(num_cols * num_rows, np.nan)
    padded_lons[:num_points] = lons
    lons_grid = padded_lons.reshape((num_rows, num_cols))
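A minimal sketch of the 1D replacement, assuming the coordinates arrive as flat sequences (the helper name `read_coordinates` is hypothetical):

```python
import numpy as np

def read_coordinates(lats, lons):
    """Hypothetical sketch: keep lat/lon as flat 1D arrays instead of
    padding with NaN and reshaping into a 2D grid."""
    lats = np.asarray(lats, dtype=np.float64).ravel()
    lons = np.asarray(lons, dtype=np.float64).ravel()
    if lats.shape != lons.shape:
        raise ValueError("lat/lon arrays must have the same length")
    return lats, lons
```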

  • Same in the create_json function. We don't need to loop over `[row][col]` if we deal with 1D lists. This code simplifies significantly:

    for (row, col), value in np.ndenumerate(timeseries_datasets[dates[0]]):
        cur_iter_point_num = row * num_columns + col
        if cur_iter_point_num < work_idxs[0]:
            continue
        elif cur_iter_point_num > work_idxs[1]:
            break
        longitude = float(lons[row][col])
        latitude = float(lats[row][col])
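With 1D arrays, the chunk handled by each worker becomes a plain slice. A hedged sketch of the simplified loop, assuming `work_idxs` as an inclusive-start, exclusive-end pair (the current code treats the upper bound as inclusive) and hypothetical names:

```python
def iter_points(displacement, lats, lons, work_idxs):
    """Hypothetical sketch: iterate over this worker's slice of 1D
    point arrays, yielding (index, longitude, latitude, value)."""
    start, end = work_idxs  # inclusive start, exclusive end assumed
    for i in range(start, end):
        yield i, float(lons[i]), float(lats[i]), float(displacement[i])
```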
