weights.jl: A weighted sum cannot be implemented via a weights keyword argument like other functions, since sum lives in Base (RFC: Add weights argument to sum JuliaLang/julia#33310). We could either export wsum or keep it internal and not support it for now.
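To make the trade-off concrete, here is a minimal sketch of what a standalone weighted sum could look like while Base's sum has no weights keyword. The name wsum_sketch is hypothetical and this is not the StatsBase implementation.

```julia
# Hypothetical helper: a weighted sum written without touching Base.sum.
# Not the StatsBase `wsum` implementation, only an illustration.
function wsum_sketch(x::AbstractArray, w::AbstractArray)
    length(x) == length(w) ||
        throw(DimensionMismatch("x and w must have the same length"))
    # Pair each value with its weight and accumulate the products.
    return sum(xi * wi for (xi, wi) in zip(x, w))
end

total = wsum_sketch([1.0, 2.0, 3.0], [0.5, 0.25, 0.25])
```

Exporting something like this keeps the feature available; a weights keyword on sum itself would require the change to Base discussed in the RFC.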
counts.jl: counts is too generic a name for a function that only counts integer values. countmap is more general and its name is explicit. That said, counts could easily be extended to allow any type of levels; its limitation is just that it returns a vector without names, so the mapping to the levels has to be done by hand, which isn't user-friendly. The APIs provided by FreqTables.jl are nicer to use, but they need NamedArrays.jl (or a similar package). Then there's the issue that countmap uses radix sort for performance with some types, but this needs SortingAlgorithms.jl, which isn't a stdlib (yet?).
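The difference in return shape can be sketched as follows. Both functions are hypothetical stand-ins written for illustration (the _sketch names are not real APIs), and values outside the requested levels are simply skipped here by assumption.

```julia
# Hypothetical stand-in for a `counts`-style API: returns a bare vector.
function counts_sketch(x::AbstractVector{<:Integer}, levels::UnitRange{<:Integer})
    c = zeros(Int, length(levels))
    for v in x
        if v in levels                      # assumption: ignore out-of-range values
            c[v - first(levels) + 1] += 1
        end
    end
    return c
end

# Hypothetical stand-in for a `countmap`-style API: returns value => count.
function countmap_sketch(x)
    d = Dict{eltype(x),Int}()
    for v in x
        d[v] = get(d, v, 0) + 1
    end
    return d
end

x = [2, 3, 3, 5]
c = counts_sketch(x, 2:5)        # position i corresponds to level (2:5)[i]
m = countmap_sketch(x)           # levels are carried along as keys
by_hand = Dict(zip(2:5, c))      # the manual mapping the vector form requires
```

The last line is the "done by hand" step: the vector result only becomes readable once it is zipped back with the levels.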
deviation.jl: Do we really need all of these small convenience functions? counteq and countne don't really sound like statistical functions, and I'm not sure how commonly they are used. sqL2dist, L2dist, L1dist and Linfdist have an uppercase letter in their names; these and the remaining functions are redundant with functions provided in Distances.jl. That only leaves psnr.
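As a rough illustration of how thin these helpers are, each can be written as a one-liner over Base and the LinearAlgebra stdlib. The _sketch names below are hypothetical; Distances.jl, as far as I know, covers the same ground with sqeuclidean, euclidean, cityblock and chebyshev.

```julia
using LinearAlgebra

# Hypothetical one-line equivalents, for illustration only.
counteq_sketch(a, b)  = count(a .== b)        # number of equal pairs
countne_sketch(a, b)  = count(a .!= b)        # number of unequal pairs
sql2dist_sketch(a, b) = sum(abs2, a .- b)     # squared Euclidean distance
l2dist_sketch(a, b)   = norm(a .- b)          # Euclidean distance
l1dist_sketch(a, b)   = norm(a .- b, 1)       # sum of absolute differences
linfdist_sketch(a, b) = norm(a .- b, Inf)     # largest absolute difference

a = [1.0, 2.0, 3.0]
b = [1.0, 4.0, 0.0]
```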
misc.jl: indexmap is just indexin, so it should be removed. levelsmap and indicatormat sound a bit limited compared with what StatsModels provides. rle and inverse_rle are not really related to statistics.
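For reference, this is what Base's indexin returns, followed by a small sketch of run-length encoding to show what rle and inverse_rle are about. The rle_sketch and inverse_rle_sketch functions are hypothetical illustrations, not the StatsBase code.

```julia
# Base.indexin: for each element of the first argument, the first index
# at which it occurs in the second argument, or `nothing` if absent.
idx = indexin(['b', 'z', 'a'], ['a', 'b', 'c'])

# Hypothetical run-length encoding: collapse runs into (values, lengths).
function rle_sketch(v::AbstractVector{T}) where {T}
    vals = T[]
    lens = Int[]
    for x in v
        if !isempty(vals) && isequal(vals[end], x)
            lens[end] += 1          # extend the current run
        else
            push!(vals, x)          # start a new run
            push!(lens, 1)
        end
    end
    return vals, lens
end

# Hypothetical inverse: repeat each value by its run length.
inverse_rle_sketch(vals, lens) = [v for (v, n) in zip(vals, lens) for _ in 1:n]

vals, lens = rle_sketch([1, 1, 2, 2, 2, 3])
```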
scalarstats.jl: mean_and_var and mean_and_std have weird names, so I'm not sure whether we should keep them. zscore and zscore! are convenient but redundant with the (more general and more verbose) functions in transformations.jl.
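A minimal sketch of what these conveniences amount to on top of the Statistics stdlib, using the existing mean keyword of var and std so the mean is computed only once. The _sketch names are hypothetical.

```julia
using Statistics

# Hypothetical equivalent of a mean-and-variance helper: reuse the mean
# when computing the variance instead of recomputing it.
function mean_and_var_sketch(x)
    m = mean(x)
    return m, var(x; mean=m)
end

# Hypothetical z-score: center by the mean, scale by the standard deviation.
function zscore_sketch(x)
    m = mean(x)
    s = std(x; mean=m)
    return (x .- m) ./ s
end

x = [1.0, 2.0, 3.0, 4.0]
m, v = mean_and_var_sketch(x)
z = zscore_sketch(x)
```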
transformations.jl: transform and transform! are too generic as names; I propose overloading LinearAlgebra's normalize and normalize! instead, since that name is the commonly used term for such transformations. I wonder whether we really need reconstruct and reconstruct! (which could be called unnormalize if we keep them). I'm also not sure what the use is of allowing a separate fit operation before actually applying the transformation (I'd imagine one would always normalize the data immediately).
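To make the two-step versus one-step question concrete, here is a rough sketch of both shapes for a z-score transformation. Every name in it (ZScoreSketch, fit_sketch, normalize_sketch, unnormalize_sketch) is hypothetical; it does not overload LinearAlgebra.normalize and is not the StatsBase API.

```julia
using Statistics

# Hypothetical container for fitted parameters.
struct ZScoreSketch
    mean::Float64
    scale::Float64
end

# Two-step form: estimate parameters first, apply them later.
fit_sketch(x) = ZScoreSketch(mean(x), std(x))
normalize_sketch(t::ZScoreSketch, x) = (x .- t.mean) ./ t.scale

# What `reconstruct` would become if renamed: undo the transformation.
unnormalize_sketch(t::ZScoreSketch, y) = y .* t.scale .+ t.mean

# One-step form: fit and apply immediately on the same data.
normalize_sketch(x) = normalize_sketch(fit_sketch(x), x)

x = [1.0, 2.0, 3.0, 4.0]
t = fit_sketch(x)
y = normalize_sketch(t, x)
```

In this sketch the one-step method is just a thin wrapper over the two-step one, so keeping or dropping the separate fit step is mostly a question of API surface.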
moments.jl: moment is redundant with the more specific functions (variance, skewness, kurtosis), so I'd drop it.
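The redundancy can be illustrated with a generic central moment, assuming moment means the k-th central moment: the second one is the uncorrected variance, and skewness and excess kurtosis are ratios of such moments. The _sketch names are hypothetical.

```julia
using Statistics

# Hypothetical k-th central moment (assumed definition, for illustration).
central_moment_sketch(x, k::Integer) = mean((x .- mean(x)) .^ k)

# Specific statistics expressed through the generic moment.
skewness_sketch(x) = central_moment_sketch(x, 3) / central_moment_sketch(x, 2)^1.5
kurtosis_sketch(x) = central_moment_sketch(x, 4) / central_moment_sketch(x, 2)^2 - 3  # excess kurtosis

x = [1.0, 2.0, 3.0, 4.0]
m2 = central_moment_sketch(x, 2)
```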
robust.jl: trimvar(x) could be var(trim(x)) if trim(x) returned a special iterator type to dispatch on.
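A rough sketch of that dispatch idea follows. The wrapper type and function names are hypothetical, and the formula inside the specialized var method (winsorized variance divided by n(1 - 2p)^2, as an estimate of the variance of the trimmed mean) is my assumption about what trimvar computes, to be checked against StatsBase.

```julia
using Statistics

# Hypothetical wrapper returned by a `trim`-like function.
struct TrimmedSketch{T}
    sorted::Vector{T}   # all values, sorted
    count::Int          # number of values dropped at each end
end

trim_sketch(x; count::Integer=0) = TrimmedSketch(sort(collect(x)), Int(count))

# Iterating yields only the values kept after trimming.
kept(t::TrimmedSketch) = view(t.sorted, (t.count + 1):(length(t.sorted) - t.count))
Base.iterate(t::TrimmedSketch, state...) = iterate(kept(t), state...)
Base.length(t::TrimmedSketch) = length(kept(t))
Base.eltype(::Type{TrimmedSketch{T}}) where {T} = T

# Specialized method selected by dispatch on the wrapper type.
# Assumed formula: winsorized variance scaled by n * (1 - 2 * prop)^2.
function Statistics.var(t::TrimmedSketch)
    n = length(t.sorted)
    lo, hi = t.sorted[t.count + 1], t.sorted[n - t.count]
    winsorized = clamp.(t.sorted, lo, hi)   # replace tails by the nearest kept value
    prop = t.count / n
    return var(winsorized) / (n * (1 - 2prop)^2)
end

x = [10.0, 1.0, 2.0, 3.0, 4.0, 100.0]
t = trim_sketch(x; count=1)
```

Because the wrapper still holds the full sorted data, var(t) can use the discarded tails, which a plain collected vector of the kept values could not.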
This issue is to discuss which functions should be ported from StatsBase to Statistics (#2). Some functions would be better moved to a separate package:
Most APIs have stood the test of time, so they are probably good enough, but I find some of them not completely satisfying (see the per-file notes above).
See also my previous notes at JuliaLang/julia#27152 (comment).