See also:
rank_by. Thesubsetoperator's declarative top-k mode (subset(ranking_key=…, k=…)) is one of the canonical motivating cases for ordering. The newerrank_byoperator generalizes it: any top-k expressed viasubset(ranking_key=…, k=k)can equivalently be expressed asrank_by(...) | filter_keys(k < k_max). Both stay in the codebase —subsetfor the declarative one-shot form,rank_bywhen you want a ranked AF that you can compose further (e.g. for pagination, median, percentile lookups). See rank.md for the FDM-faithful framing.
output: AF = subset(input: AF)
Computes a subset of the items contained in the input AF and returns a new output AF.
In contrast to filter, the subset operator computes a condition based on a global condition.
In other words:
filter operator: uses a predicate phrased against the individual items in the input AF (or just the keys or just the values). Such predicate may be evaluated for each item independently. For instance, a condition like
foo==42may be computed for eachfooindividually without influencing the outcome of other predicates.vs
subset operator: uses a predicate phrased against all items in the input AF. Such predicate cannot be evaluated for each item independently. For instance, a condition like 'k-smallest items with respect to their foo value' cannot be evaluated independently: the outcome depends on the other items present in the input AF.
The user has to specify how the subset should be formed. This can be done based on keys, values, or any combination of both.
TODO
Select the attributes to work on based on a global condition.
output: TF = subset(input: TF)
Computes a subset of the items (e.g. key/value-mappings) mapped to in the input TF.
For instance, this could be used to compute a subset of a tuple, i.e. 'give me the k-smallest items present in that tuple w.r.t. the condition specified'.
Select the tuples to work on based on a global condition.
output: RF = subset(input: RF)
Computes a subset of the items (e.g. key/TF-mappings) mapped to in the input RF. Semantically equivalent to a
top-k operator in extended relational algebra or simulating the same thing in SQL using ORDER BY and LIMIT.
Note that for k=1, this operation is equivalent to a classical min or max-aggregation (but not mean, avg, median, count as they compute a new value that does not have to exist in the input RF, see aggregate).
For instance, this could be used to compute the subset of tuples of a given relation, i.e. 'give me the k-smallest tuples based on the condition specified'.
Select the relations to work on based on a global condition.
output: DBF = subset(input: DBF)
Computes a subset of the items (e.g. key/RF-mappings) mapped to in the input DBF.
For instance, this could be used to compute the subset of relations of a given database, i.e. 'give me the k-smallest relations based on the condition specified'.
Select the databases to work on based on a global condition.
output: SDBF = subset(input: SDBF)
Computes a subset of the items (e.g. key/DBF-mappings) mapped to in the input SDBF.
For instance, this could be used to compute the subset of databases of a given set of databases, i.e. 'give me the k-smallest databases based on the condition specified'.