forked from barcaroli/SamplingStrata
-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathNEWS
More file actions
163 lines (117 loc) · 7.13 KB
/
NEWS
File metadata and controls
163 lines (117 loc) · 7.13 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
Changes in Version 1.4-1
o A new 'summaryStrata' function has been added, enabling to get structured information regarding
the strata produced by the 'optimizeStrata2' function (operating on only continuous stratification
variables)
o A new 'assignStrataLabel' function has been added, enabling to assign the optimized strata label
to new units to be added in the sampling frame
o Fixed a bug in the optimization step for continuous variables
Changes in Version 1.4
o A new function 'optimizeStrata2' is available. This function performs the same task
than 'optimizeStrata', but with a different Genetic Algorithm, operating on real values
genome, instead of an integer one.
This pemits to operate directly on the boundaries of the strata, instead of aggregating
the initial atomic strata.
In some situations (limited size of sampling frame) this new function is much more efficient.
The limitation is in the nature of the stratification variables, that are required to be
all continuous (though categorical ordinal could be handled).
o A new function 'expected_CV' has been added to calculate CV's on target
variables in different domains that may be expected from a given solution, output
of the 'optimizeStrata' execution
o Fixed a bug in the optimization step when considering also take-all strata
Changes in Version 1.3
o The optimization of population frame is run in parallel if different
domains are considered. To this end, the parameter 'parallel' can be set
to TRUE in the 'optimizeStrata' function. If not specified, n-1 of total
available cores are used OR if number of domains < (n-1) cores, then
number of cores equal to number of domains are used.
o A new function 'KmeansSolution' produces an initial solution using the
kmeans algorithm by clustering atomic strata considering the values of
the means of target variables in them.
Also, if the parameter 'nstrata' is not indicated, the optimal number of
clusters is determined inside each domain, and the overall solution is
obtained by concatenating optimal clusters obtained in domains. By
indicating this solution as a suggestion to the
optimization step, this may greatly speed the convergence to the optimal
solution.
o A new function 'selectSampleSystematic' has been added. It allows to
select a stratified sample with the systematic method, that is a selection
that begins selecting the first unit by an initial randomly chosen
starting point, and proceeding in selecting other units by adding an
interval that is the inverse of the sampling rate in the stratum. This
selection method can be useful if associated to a particular ordering of
the selection frame, where the ordering variable(s) can be considered as
additional stratum variable(s).
o It is now possible to handle "anticipated variance" by introducing a
model linking a proxy variable whose values are available for all units in
the sampling frame, with the target variable whose values are not
available. In this implementation only linear model can be chosen. When
calling the 'buildStrataDF' function, a dataframe is given, containing
three parameters (beta, sigma2 and gamma) for each couple target / proxy.
On the basis of these parameters, means and standard deviations in
sampling strata are calculated accordingly to given formulas.
Changes in Version 1.2
o The crossover function in the genetic algorithm has been modified by
considering the "grouping" version of this algorithm: instead of mixing
chromosomes in an indifferentiate way, groups of them in one parent
(representing already aggregated strata) are attributed to the other
parent when generating a child, preserving their composition. Moreover,
parents are selected with a probability proportional to their fitness (in
previous version selection was completely at random). In many cases this
can greatly speed the convergence to an optimal solution.
o A new function 'adjustSize' has been added. It allows to adjust the
sample size and related allocation in strata on the basis of an
externally indicated overall sample size. The adjustment of the sample
size is perfomed by increasing or decreasing it proportionally in each
optimized stratum.
o A new function 'buildFrameDF' has been added. It allows to create a
'sampling frame' dataframe by indicating the dataset in which the
information on all the units are contained, the identifier, the X
variables, the Y variable and the variable that indicates the domains
of interest.
o In function 'optimizeStrata' now the 'initialStrata' parameter is a
vector, whose length is equal to the number of strata in the different
domains.
o All outputs are written to an .\output subdirectory.
Changes in Version 1.1
o Function 'memoise' from the same package (now required) is applied
before each evaluation in order to save processing time. This may largely
increase the efficiency of the algorithm
o A 'recode' function is applied on every generated solution in order to
recode a genotype of n genes with k<=n distinct alleles 1, 2, ..., k in
such a way that the distinct alleles of the recoded genotype appear in
the natural order 1, 2, ..., k. This avoids to consider as distinct two
solutions that are equivalent but make use of a different coding
Changes in Version 1.0-4
o Bug fix for old releases
Changes in Version 1.0-3
o Modified the output to the console or to the file of results: instead
of all the solutions, only the optimal value for each generation is
visualised
o Now the visualisation of the trend in the optimal and mean values is
optional: the plot can be avoided by setting showPlot = FALSE when
calling optimizeStrata. It can be advisable when the number of iterations
is very high
Changes in Version 1.0-2
o Bug fix for old releases.
Changes in Version 1.0-1
o Bug fix for old releases.
o The object returned by function "optimizeStrata" is no more a dataframe
but a list:
* the first element of the list is the solution vector (solution$indices)
* the second element of the list is the dataframe containing aggregated
strata (solution$aggr_strata)
o In all the functions that previously produced .csv files and .pdf plots
in the working directory, as a default this is no more the current
behaviour. To write these files, it is now necessary to set the
"writeFiles" flag to TRUE
Changes in Version 1.0
o Bug fix for old releases.
o Two new functions:
* "evalSolution", to evaluate the found solution in terms of expected
target variables precision and bias obtainable by samples drawn from
the otpimized frame;
* "tuneParameters", to determine the best combination of values to
assign to the parameters necessary for the execution of the genetic
algorithm used for the optimization of the frame stratification.
o A demonstration on the use of the "tuneParameters" function is in the
vignette "tuneParameters.pdf" contained in the \inst\doc folder.