glamod · ludwiglierhammer · Jun 25, 2026 · May 29, 2026 · May 29, 2026 · May 29, 2026
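This change removes `DupDetect`, `duplicate_check` and the `DataBundle` duplicate methods. Downstream code that has to run against releases from both before and after this change could look up the removed names at runtime instead of importing them unconditionally. The helper below is a minimal sketch that uses only the standard library. `optional_attr` is a hypothetical name and is not part of `cdm_reader_mapper`. The replacement API in `marine_qc` is not shown because it is not described here.

```python
import importlib
from typing import Any, Optional


def optional_attr(module_name: str, attr: str) -> Optional[Any]:
    """Return ``module_name.attr`` if available, otherwise None.

    Covers two cases: the module cannot be imported at all, and the
    module imports fine but no longer exposes ``attr``.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module missing (or one of its own imports failed).
        return None
    return getattr(module, attr, None)


# Hypothetical usage: ``duplicate_check`` is exported by cdm_reader_mapper
# releases before this change and is absent afterwards.
duplicate_check = optional_attr("cdm_reader_mapper", "duplicate_check")
if duplicate_check is None:
    pass  # Fall back to the duplicate checks that now live in marine_qc.
```

If importing the module raises `ImportError` (for example, when it is not installed), the helper also returns `None`, so callers only need a single fallback branch.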
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -23,7 +23,6 @@ repos:
       - id: fix-byte-order-marker
       - id: name-tests-test
         args: [ '--pytest-test-first' ]
-        exclude: ^tests/_duplicates.py$
       - id: no-commit-to-branch
         args: [ '--branch', 'main' ]
       - id: trailing-whitespace

diff --git a/CHANGELOG.rst b/CHANGELOG.rst
@@ -20,6 +20,18 @@ Breaking changes
 ^^^^^^^^^^^^^^^^
 * Development dependencies ("dev", "docs") are now installed via the new `dependency-groups` conventions (`PEP 735 <https://peps.python.org/pep-0735/>`_) (:pull:`419`)
 * `prek` is now the suggested pre-commit runner (installed by default via `pip install --group dev`) (:pull:`419`)
+* The submodule ``src.cdm_reader_mapper.duplicates`` has been removed (:issue:`152`, :issue:`283`, :pull:`434`)
+
+  * ``cdm_reader_mapper.DupDetect`` can no longer be imported
+  * ``cdm_reader_mapper.duplicate_check`` can no longer be imported
+  * ``cdm_reader_mapper.DataBundle.duplicate_check`` has been removed
+  * ``cdm_reader_mapper.DataBundle.get_duplicates`` has been removed
+  * ``cdm_reader_mapper.DataBundle.flag_duplicates`` has been removed
+  * ``cdm_reader_mapper.DataBundle.remove_duplicates`` has been removed
+  * ``cdm_reader_mapper.DataBundle`` no longer has the attribute ``DupDetect``
+
+* The duplicate-detection code from ``src.cdm_reader_mapper.duplicates`` has been moved to `marine_qc <https://github.com/glamod/marine_qc/pull/207/>`_ (:issue:`283`, :pull:`434`)
+
 
 Internal changes
 ^^^^^^^^^^^^^^^^

diff --git a/docs/api.rst b/docs/api.rst
@@ -43,9 +43,6 @@ Useful functions
 .. autofunction:: cdm_reader_mapper.correct_pt
    :noindex:
 
-.. autofunction:: cdm_reader_mapper.duplicate_check
-   :noindex:
-
 .. autofunction:: cdm_reader_mapper.map_model
    :noindex:
 
@@ -84,12 +81,3 @@ Useful functions
 
 .. autofunction:: cdm_reader_mapper.write_tables
    :noindex:
-
-.. _dupdetect:
-
-DupDetect
-=========
-
-.. autoclass:: cdm_reader_mapper.DupDetect
-   :members:
-   :noindex:
diff --git a/docs/getting-started.rst b/docs/getting-started.rst
@@ -41,24 +41,7 @@ In this case deck 704: US Marine Meteorological Journal collection of data code:
 
     cdm_tables = db_cdm.data
 
-4. Detect duplicated observations
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Detect and flag duplicated observations without overwriting the original CDM tables:
-
-.. code-block:: console
-
-    db_dup = db.duplicate_check()
-
-     db_dup_f = db_dup.flag_duplicates()
-
-     flagged_tables = db_dup_f.data
-
-     db_dup_r = db_dup.remove_duplicates()
-
-     removed_tables = db_dup_r.data
-
-5. Write the output
+4. Write the output
 ~~~~~~~~~~~~~~~~~~~
 This writes the output to an ascii file with a pipe delimited format using the following function:
 

diff --git a/docs/hyperlinks.rst b/docs/hyperlinks.rst
@@ -8,8 +8,6 @@
 
 .. _CDM: https://github.com/glamod/common_data_model/blob/master/cdm_latest.pdf
 
-.. _CDM code tables for duplicate_status: https://glamod.github.io/cdm-obs-documentation/tables/code_tables/duplicate_status/duplicate_status.html
-
 .. _CDM code tables for report_quality: https://glamod.github.io/cdm-obs-documentation/tables/code_tables/quality_flag/quality_flag.html
 
 .. _conda: https://docs.conda.io/en/latest/

diff --git a/docs/index.rst b/docs/index.rst
@@ -7,7 +7,6 @@ The **cdm_reader_mapper** toolbox is a python3_ tool designed for
 
 * reading original marine-meteorological data files compliant with a user specified data model (:ref:`data-models`) into a Marine Data Format (MDF) file.
 * mapping observed meteorological variables and its associated metadata from a data model (:ref:`data-models`) to the C3S CDS Common Data Model (CDM_) format or **imodel** as called in this tool.
-* to detect and flag or remove duplicated observations
 
 It was developed with the initial idea of reading data from the International Comprehensive Ocean-Atmosphere Data Set (ICOADS_) stored in the International Maritime Meteorological Archive (IMMA_) data format. In the meanwhile, it can read data C-RAID_ Copernicus in situ project too.
 
@@ -34,9 +33,6 @@ The reader allows for basic transformations of the data. This feature includes `
 In addition, the **cdm_reader_mapper.DataBundle** object has several main method functions:
 
 * :py:func:`DataBundle.map_model`: map observed variables and its associated metadata from a data model or models combination to the standardized C3S CDS Common Data Model (CDM_) format.
-* :py:func:`DataBundle.duplicate_check`: detect duplicated observations
-* :py:func:`DataBundle.flag_duplicates`: flag detected duplicated observations
-* :py:func:`DataBundle.remove_duplicates`: remove detected duplicated observations
 * :py:func:`DataBundle.write`: save both observational MDF files as a coma-separated list and observational standardized CDM tables as pipe-seperated lists
 
 .. toctree::

diff --git a/docs/tool-overview-databundle.rst b/docs/tool-overview-databundle.rst
@@ -84,22 +84,4 @@ Now the meteorological data can be maqpped to the Common Data Model (CDM_) using
 
 For more information how the mapping is working, please see :ref:`tool-overview-mapper` and/or :ref:`how-to-register-a-new-data-model-mapping`.
 
-:ref:`dupdetect`
-^^^^^^^^^^^^^^^^
-
-After mapping to the CDM format it is useful to check if the CDM tables contain any duplicates. The duplicate checker included in the ``cdm_reader_mapper`` toolbox is based on python record linkage toolkit RecordLinkage_.
-
-The first step is to call the method function :func:`.DataBundle.duplicate_check`. This function scans the CDM tables for any duplicates.
-
-.. code-block:: console
-
-    db_dup = db.duplicate_check()
-
-Afterwards there are two options how to deal with the detected duplicates:
-
-1. :func:`.DataBundle.flag_duplicates`
-2. :func:`.DataBundle.remove_duplicates`
-
-The first function flags the detected duplicates. For more information about the flags see `CDM code tables for duplicate_status`_ and `CDM code tables for report_quality`_. The second function removes the detected duplicates.
-
 .. include:: hyperlinks.rst
diff --git a/environment-docs.yml b/environment-docs.yml
@@ -57,4 +57,3 @@ dependencies:
       - msgpack
       - requests
       - platformdirs >4.0.0
-      - recordlinkage >=0.15
diff --git a/pyproject.toml b/pyproject.toml
@@ -96,7 +96,6 @@ dependencies = [
   "pandas>=2.2.0",
   "platformdirs >4.0.0",
   "pyarrow >=15.0.0",
-  "recordlinkage >= 0.15",
   "requests",
   "timezonefinder >6.5.0,<9.0.0",
   "xarray >=2023.11.0,!=2024.10.0"

diff --git a/src/cdm_reader_mapper/__init__.py b/src/cdm_reader_mapper/__init__.py
@@ -19,10 +19,6 @@
 from .core.reader import read
 from .core.writer import write
 from .data import test_data
-from .duplicates.duplicates import (
-    DupDetect,
-    duplicate_check,
-)
 from .mdf_reader.reader import read_data, read_mdf
 from .mdf_reader.writer import write_data
 from .metmetpy import (
@@ -35,11 +31,9 @@
 
 __all__ = [
     "DataBundle",
-    "DupDetect",
     "cdm_tables",
     "correct_datetime",
     "correct_pt",
-    "duplicate_check",
     "map_model",
     "read",
     "read_data",

diff --git a/src/cdm_reader_mapper/core/databundle.py b/src/cdm_reader_mapper/core/databundle.py
@@ -17,7 +17,6 @@
     split_by_index,
 )
 from cdm_reader_mapper.common.iterators import ParquetStreamReader, is_valid_iterator
-from cdm_reader_mapper.duplicates.duplicates import DupDetect, duplicate_check
 from cdm_reader_mapper.metmetpy import (
     correct_datetime,
     correct_pt,
@@ -154,7 +153,6 @@ def __init__(
         self._mask: pd.DataFrame | ParquetStreamReader = mask
         self._imodel = imodel
         self._mode = mode
-        self.DupDetect: DupDetect | None = None
 
     def __len__(self) -> int:
         """
@@ -1414,208 +1412,3 @@ def write(
             mode=mode,
             **kwargs,
         )
-
-    def duplicate_check(self, inplace: bool = False, **kwargs: Any) -> DataBundle | None:
-        r"""
-        Duplicate check in :py:attr:`data`.
-
-        Parameters
-        ----------
-        inplace : bool, default: False
-            If True overwrite :py:attr:`data` in :py:class:`~DataBundle`
-            else return a copy of :py:class:`~DataBundle` with :py:attr:`data` as CDM tables.
-        \**kwargs : Any
-            Additional keyword-arguments for duplicate check.
-
-        Returns
-        -------
-        :py:class:`~DataBundle` or None
-            DataBundle containing new :py:class:`~DupDetect` class for further duplicate check methods or None if "inplace=True".
-
-        See Also
-        --------
-        DataBundle.get_duplicates : Get duplicate matches in `data`.
-        DataBundle.flag_duplicates : Flag detected duplicates in `data`.
-        DataBundle.remove_duplicates : Remove detected duplicates in `data`.
-
-        Notes
-        -----
-        Following columns have to be provided:
-
-          * `longitude`
-          * `latitude`
-          * `primary_station_id`
-          * `report_timestamp`
-          * `station_course`
-          * `station_speed`
-
-        This adds a new class :py:class:`~DupDetect` to :py:class:`~DataBundle`.
-        This class is necessary for further duplicate check methods.
-
-        For more information see :py:func:`duplicate_check`
-
-        Examples
-        --------
-        >>> db.duplicate_check()
-        """
-        db_ = self._get_db(inplace)
-        if db_ is None:
-            return None
-        if db_._mode == "tables" and "header" in db_._data:
-            data = db_._data["header"]
-        else:
-            data = db_._data
-        db_.DupDetect = duplicate_check(data, **kwargs)
-        return self._return_db(db_, inplace)
-
-    def flag_duplicates(self, inplace: bool = False, **kwargs: Any) -> DataBundle | None:
-        r"""
-        Flag detected duplicates in :py:attr:`data`.
-
-        Parameters
-        ----------
-        inplace : bool, default: False
-            If True overwrite :py:attr:`data` in :py:class:`~DataBundle`
-            else return a copy of :py:class:`~DataBundle` with :py:attr:`data` containing flagged duplicates.
-        \**kwargs : Any
-            Additional keyword-arguments for flagging duplicates.
-
-        Returns
-        -------
-        :py:class:`~DataBundle` or None
-            DataBundle containing duplicate flags in :py:attr:`data` or None if "inplace=True".
-
-        Raises
-        ------
-        RuntimeError
-            Before flagging duplicates, a duplictate check has to be done, :py:func:`DataBundle.duplicate_check`.
-
-        See Also
-        --------
-        DataBundle.remove_duplicates : Remove detected duplicates in `data`.
-        DataBundle.get_duplicates : Get duplicate matches in `data`.
-        DataBundle.duplicate_check : Duplicate check in `data`.
-
-        Notes
-        -----
-        For more information see :py:func:`DupDetect.flag_duplicates`
-
-        Examples
-        --------
-        Flag duplicates without overwriting :py:attr:`data`.
-
-        >>> flagged_tables = db.flag_duplicates()
-
-        Flag duplicates with overwriting :py:attr:`data`.
-
-        >>> db.flag_duplicates(inplace=True)
-        >>> flagged_tables = db.data
-        """
-        db_ = self._get_db(inplace)
-        if db_ is None:
-            return None
-
-        if db_.DupDetect is None:
-            raise RuntimeError("Before flagging duplicates, a duplictate check has to be done: 'db.duplicate_check()'")
-
-        db_.DupDetect.flag_duplicates(**kwargs)
-
-        if db_._mode == "tables" and "header" in db_._data:
-            db_._data["header"] = db_.DupDetect.result
-        else:
-            db_._data = db_.DupDetect.result
-        return self._return_db(db_, inplace)
-
-    def get_duplicates(self, **kwargs: Any) -> pd.DataFrame:
-        r"""
-        Get duplicate matches in :py:attr:`data`.
-
-        Parameters
-        ----------
-        \**kwargs : Any
-            Additional keyword-arguments used for getting duplicates.
-
-        Returns
-        -------
-        pd.DataFrame
-            DataFrame containing duplicate matches.
-
-        Raises
-        ------
-        RuntimeError
-            Before getting duplicates, a duplictate check has to be done, :py:func:`DataBundle.duplicate_check`.
-
-        See Also
-        --------
-        DataBundle.remove_duplicates : Remove detected duplicates in `data`.
-        DataBundle.flag_duplicates : Flag detected duplicates in `data`.
-        DataBundle.duplicate_check : Duplicate check in `data`.
-
-        Notes
-        -----
-        For more information see :py:func:`DupDetect.get_duplicates`
-
-        Examples
-        --------
-        >>> matches = db.get_duplicates()
-        """
-        if self.DupDetect is None:
-            raise RuntimeError("Before getting duplicates, a duplictate check has to be done: 'db.duplicate_check()'")
-        return self.DupDetect.get_duplicates(**kwargs)
-
-    def remove_duplicates(self, inplace: bool = False, **kwargs: Any) -> DataBundle | None:
-        r"""
-        Remove detected duplicates in :py:attr:`data`.
-
-        Parameters
-        ----------
-        inplace : bool, default: False
-            If True overwrite :py:attr:`data` in :py:class:`~DataBundle`
-            else return a copy of :py:class:`~DataBundle` with :py:attr:`data` containing no duplicates.
-        \**kwargs : Any
-            Additional keyword-arguments used to remove duplicates.
-
-        Returns
-        -------
-        :py:class:`~DataBundle` or None
-            DataBundle without duplicated rows or None if "inplace=True".
-
-        Raises
-        ------
-        RuntimeError
-            Before removing duplicates, a duplictate check has to be done, :py:func:`DataBundle.duplicate_check`.
-
-        See Also
-        --------
-        DataBundle.flag_duplicates : Flag detected duplicates in `data`.
-        DataBundle.get_duplicates : Get duplicate matches in `data`.
-        DataBundle.duplicate_check : Duplicate check in `data`.
-
-        Notes
-        -----
-        For more information see :py:func:`DupDetect.remove_duplicates`
-
-        Examples
-        --------
-        Remove duplicates without overwriting :py:attr:`data`.
-
-        >>> removed_tables = db.remove_duplicates()
-
-        Remove duplicates with overwriting :py:attr:`data`.
-
-        >>> db.remove_duplicates(inplace=True)
-        >>> removed_tables = db.data
-        """
-        db_ = self._get_db(inplace)
-        if db_ is None:
-            return None
-
-        if db_.DupDetect is None:
-            raise RuntimeError("Before removing duplicates, a duplictate check has to be done: 'db.duplicate_check()'")
-
-        db_.DupDetect.remove_duplicates(**kwargs)
-        header_ = db_.DupDetect.result
-        if not isinstance(db_._data, pd.DataFrame):
-            raise TypeError("data has unsupported type: {type(db_._data)}.")
-        db_._data = db_._data[db_._data.index.isin(header_.index)]
-        return self._return_db(db_, inplace)