Draft — changes from all 74 commits
9c516fd
Project.toml: support Symbolics v7 & Utils v4
Mar 21, 2026
6e1ffaa
prepare_start_params(): tighten type check
Mar 21, 2026
32068de
SemImplied/SemLossFun: drop meanstructure kwarg
alyst Mar 21, 2026
e81cec0
refactor Sem, SemEnsemble, SemLoss
alyst Mar 21, 2026
bab1317
params/param_labels(): use both as synonyms for now
Mar 21, 2026
f7f7452
check_same_semterm_type(): refactor check_single_lossfun()
Mar 21, 2026
961a3c8
update multi-group correction
Mar 21, 2026
a9ee00b
replace_observed(): simplify & refactor
Mar 21, 2026
84c6653
bootstrap: sync with Sem updates
Mar 22, 2026
24261d5
CFI: sync with Sem refactor
Mar 22, 2026
e4d38e5
test/build_models: remove redundant model
Mar 22, 2026
cb9b1e7
revert using
Mar 22, 2026
afac0b4
WLS: verbose option
Mar 22, 2026
53a615a
docs: sync with Sem refactor
Mar 22, 2026
240e3cd
test: fix formatting
Mar 22, 2026
a277cb0
fit_measures(): support vectors of funcs
Mar 23, 2026
60dbdc7
test_fitmeasures(): refactor/simplify
Mar 23, 2026
05abcd9
test/multigroup: small tweaks
Mar 23, 2026
91d6f47
finite_diff: replace_observed()
Mar 30, 2026
bfd32b4
replace_observed(): support kwargs
Mar 30, 2026
690d248
replace_observed(SemWLS, ...; update_internal_state)
Mar 30, 2026
b41e75b
tests/model: replace_observed() kwargs passing
Mar 30, 2026
b5e920a
replace_observed(...; recompute_obs_state=true)
Mar 30, 2026
293c88b
tests/model: test multi-group data ctor
Mar 31, 2026
7466a23
SemFiniteDiff constructor to keep same Syntax
Maximilian-Stefan-Ernst Mar 25, 2026
5cdcc63
Sem print methods
Maximilian-Stefan-Ernst Mar 25, 2026
6d803cb
add details method for AbstractSem
Maximilian-Stefan-Ernst Apr 10, 2026
1874351
shorten model
Maximilian-Stefan-Ernst Apr 11, 2026
7696e8f
Sem(): remove SemWLS kw check logic
Apr 13, 2026
9c5e446
Sem(): cleanup constructor
Apr 13, 2026
3aee9f4
show(::Sem): respect existing :compact key
Apr 13, 2026
0e948d7
add show method for AbstractSem
Maximilian-Stefan-Ernst Apr 25, 2026
d3ee1c9
simulation.md: whitespace fixes
alyst May 3, 2026
57ea987
replace_obs(sem): make sure Sem type is preserved
alyst May 3, 2026
5cd11cf
replace_obs(sem): update docstring
alyst May 3, 2026
42d4e64
replace_obs(loss): extract check_obs_vars() method
alyst May 3, 2026
7759721
test/multigroup: avoid clash with observed_vars() method
alyst May 4, 2026
13e118d
SemLoss(observed, implied, refloss; kwarg...) ctor
alyst May 4, 2026
d5ff15b
replace_observed(): use 3-arg SemLoss ctor
alyst May 4, 2026
bc6ef3b
boostrap!(): deepcopy the sem
alyst May 4, 2026
6f09822
unit_tests/model: more config-preserving tests
alyst May 4, 2026
32ad928
tests: replace_observed(UserSemML)
alyst May 4, 2026
25dd38c
show(ParTable): fix formatting
alyst May 4, 2026
ee91629
WIP SemImpliedState
May 4, 2026
0df5515
declare cov matrices symmetric
alyst May 4, 2026
85363fc
RAM: reuse sigma array
May 4, 2026
4c3d14b
RAM: optional sparse Sigma matrix
May 4, 2026
62a1a79
ML: refactor to minimize allocs
alyst May 4, 2026
1f225cc
add PackageExtensionCompat
May 4, 2026
708345f
variance_params(SEMSpec)
May 4, 2026
70ac792
predict_latent_vars()
alyst May 4, 2026
429723e
fixup docstring
May 4, 2026
255017e
lavaan_model()
May 4, 2026
29edb08
test_grad/hess(): check that alt calls give same results
May 4, 2026
3945afa
start_simple(): code cleanup
alyst May 4, 2026
f7b3176
start_simple(): start vals for lat and obs means
May 4, 2026
79f14e3
observed_vars(RAMMatrices; order): rows/cols order
alyst May 4, 2026
0191042
observed_var_indices(::RAMMatrices; order=:columns)
May 4, 2026
e08fc5e
move sparse mtx utils to new file
alyst May 4, 2026
b4d738c
reorder_observed_vars!(spec) method
alyst May 4, 2026
7b099fc
vech() and vechinds() functions
alyst May 4, 2026
7cc82bc
RAMMatrices(): ctor to replace params
May 4, 2026
3ae9f01
use `@printf` to limit signif digits printed
alyst May 4, 2026
f15d7fb
ML/FIML: workaround generic_matmul issue
alyst May 4, 2026
5dd1ae5
BlackBoxOptim.jl backend support
alyst May 4, 2026
20125ef
non_posdef_return(v) -> non_posdef_objective(v)
May 4, 2026
57fce33
MeanStruct(ram)
alyst May 4, 2026
c993486
SemObserved: fix mean_and_cov() call
May 4, 2026
55219e4
filter_used_params()
May 4, 2026
87796f7
param_indices(spec) method
May 4, 2026
8679990
SemNorm: generalize SemRidge
alyst May 4, 2026
bd0b62d
add SemHinge
May 4, 2026
6b3e69b
add SemSquaredHinge
May 4, 2026
7bc2c88
quad.jl: optimized methods for X*A*X', X*X' etc
Aug 31, 2024
9 changes: 7 additions & 2 deletions Project.toml
@@ -14,9 +14,11 @@ LineSearches = "d3d80556-e9d4-5f37-9878-2ab0fcc64255"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
NLSolversBase = "d41bc354-129a-5804-8e4c-c37616107c6c"
Optim = "429524aa-4258-5aef-a3af-852621145aeb"
PackageExtensionCompat = "65ce6f38-6b18-4e1d-a461-8949797d7930"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
PrettyTables = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d"
ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
@@ -40,8 +42,8 @@ Optim = "1"
PrettyTables = "3"
ProximalAlgorithms = "0.7"
StatsBase = "0.33, 0.34"
Symbolics = "4, 5, 6"
SymbolicUtils = "1.4 - 1.5, 1.7, 2, 3"
Symbolics = "4, 5, 6, 7"
SymbolicUtils = "1.4 - 1.5, 1.7, 2, 3, 4"
StatsAPI = "1"

[extras]
@@ -51,9 +53,12 @@ Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
test = ["Test"]

[weakdeps]
BlackBoxOptim = "a134a8b2-14d6-55f6-9291-3336d3ab0209"
NLopt = "76087f3c-5699-56af-9a33-bf431cd00edd"
Optimisers = "3bd65402-5787-11e9-1adc-39752487f4e2"
ProximalAlgorithms = "140ffc9f-1907-541a-a177-7475e0a401e9"

[extensions]
SEMNLOptExt = "NLopt"
SEMProximalOptExt = "ProximalAlgorithms"
SEMBlackBoxOptimExt = ["BlackBoxOptim", "Optimisers"]
30 changes: 11 additions & 19 deletions docs/src/developer/implied.md
@@ -13,9 +13,9 @@ end
and a method to update!:

```julia
import StructuralEquationModels: objective!
import StructuralEquationModels: update!

function update!(targets::EvaluationTargets, implied::MyImplied, model::AbstractSemSingle, params)
function update!(targets::EvaluationTargets, implied::MyImplied, params)

if is_objective_required(targets)
...
@@ -31,11 +31,9 @@ function update!(targets::EvaluationTargets, implied::MyImplied, model::Abstract
end
```

As you can see, `update` gets passed as a first argument `targets`, which is telling us whether the objective value, gradient, and/or hessian are needed.
As you can see, `update!` gets passed as a first argument `targets`, which is telling us whether the objective value, gradient, and/or hessian are needed.
We can then use the `is_..._required` functions and, conditional on what the optimizer needs, compute and store things we want to make available to the loss functions. For example, as we have seen in [Second example - maximum likelihood](@ref), the `RAM` implied type computes the model-implied covariance matrix and makes it available via `implied.Σ`.
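For intuition, the model-implied covariance of a RAM-style implied type follows the standard RAM formula `Σ = F(I - A)⁻¹ S (I - A)⁻ᵀ Fᵀ`. A self-contained sketch with made-up matrices (this is an illustration of what such an `update!` would store, not the package's internal code):

```julia
using LinearAlgebra

# Made-up RAM matrices for a 1-factor, 2-indicator model (purely illustrative):
A = [0.0 0.0 1.0;        # x1 ← f (loading fixed to 1)
     0.0 0.0 0.8;        # x2 ← f
     0.0 0.0 0.0]        # f has no incoming directed paths
S = Diagonal([0.25, 0.25, 1.0])   # residual variances and factor variance
F = [1.0 0.0 0.0;                 # filter matrix: keep the observed x1, x2
     0.0 1.0 0.0]

invIA = inv(I - A)                # (I - A)⁻¹, here I is the UniformScaling identity
Σ = F * invIA * S * invIA' * F'   # model-implied covariance of the observed vars
```

For these particular matrices, `Σ` works out to `[1.25 0.8; 0.8 0.89]`.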



Just as described in [Custom loss functions](@ref), you may define a constructor. Typically, this will depend on the `specification = ...` argument that can be a `ParameterTable` or a `RAMMatrices` object.

We implement an `ImpliedEmpty` type in our package that does nothing but serve as an `implied` field in case you are using a loss function that does not need any implied type at all. You may use it as a template for defining your own implied type, as it also shows how to handle the specification objects:
@@ -56,7 +54,7 @@ Empty placeholder for models that don't need an implied part.
- `specification`: either a `RAMMatrices` or `ParameterTable` object

# Examples
A multigroup model with ridge regularization could be specified as a `SemEnsemble` with one
A multigroup model with ridge regularization could be specified as a `Sem` with one
model per group and an additional model with `ImpliedEmpty` and `SemRidge` for the regularization part.

# Extended help
@@ -75,26 +73,20 @@ end
### Constructors
############################################################################################

function ImpliedEmpty(;
specification,
meanstruct = NoMeanStruct(),
hessianeval = ExactHessian(),
function ImpliedEmpty(
spec::SemSpecification;
hessianeval::HessianApprox = ExactHessian(),
kwargs...,
)
return ImpliedEmpty(hessianeval, meanstruct, convert(RAMMatrices, specification))
ram_matrices = convert(RAMMatrices, spec)
return ImpliedEmpty(hessianeval, MeanStruct(ram_matrices), ram_matrices)
end

############################################################################################
### methods
############################################################################################

update!(targets::EvaluationTargets, implied::ImpliedEmpty, par, model) = nothing

############################################################################################
### Recommended methods
############################################################################################

update_observed(implied::ImpliedEmpty, observed::SemObserved; kwargs...) = implied
update!(targets::EvaluationTargets, implied::ImpliedEmpty, par) = nothing
```

As you see, similar to [Custom loss functions](@ref) we implement a method for `update_observed`.
As you see, similar to [Custom loss functions](@ref) we implement a constructor.
59 changes: 21 additions & 38 deletions docs/src/developer/loss.md
@@ -11,9 +11,9 @@ Since we allow for the optimization of sums of loss functions, and the maximum l
using StructuralEquationModels
```

To define a new loss function, you have to define a new type that is a subtype of `SemLossFunction`:
To define a new loss function, you have to define a new type that is a subtype of `AbstractLoss`:
```@example loss
struct Ridge <: SemLossFunction
struct MyRidge <: AbstractLoss
α
I
end
@@ -25,8 +25,8 @@ Additionally, we need to define a *method* of the function `evaluate!` to compute
```@example loss
import StructuralEquationModels: evaluate!

evaluate!(objective::Number, gradient::Nothing, hessian::Nothing, ridge::Ridge, model::AbstractSem, par) =
ridge.α * sum(i -> par[i]^2, ridge.I)
evaluate!(objective::Number, gradient::Nothing, hessian::Nothing, ridge::MyRidge, par) =
ridge.α * sum(i -> abs2(par[i]), ridge.I)
```

The function `evaluate!` recognizes by the types of the arguments `objective`, `gradient` and `hessian` whether it should compute the objective value, gradient or hessian of the model w.r.t. the parameters.
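This dispatch pattern can be re-created standalone. The sketch below uses a stand-in function name and a flattened signature — it is not the package's actual `evaluate!` method table, just the idea that the types of the `objective`/`gradient` slots select what gets computed:

```julia
# objective requested, no gradient: return the ridge penalty value
ridge_eval(objective::Number, gradient::Nothing, α, idx, par) =
    α * sum(i -> abs2(par[i]), idx)

# gradient requested (objective optional): fill the gradient in place
function ridge_eval(objective, gradient::AbstractVector, α, idx, par)
    fill!(gradient, 0)
    gradient[idx] .= 2 .* α .* par[idx]   # analytic gradient of α * Σᵢ par[i]²
    return isnothing(objective) ? nothing : α * sum(i -> abs2(par[i]), idx)
end
```

Calling the same name with different slot types then picks the right method, exactly as the optimizer backend would.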
@@ -98,7 +98,7 @@ function evaluate!(objective, gradient, hessian::Nothing, ridge::Ridge, model::A
gradient[ridge.I] .= 2 * ridge.α * par[ridge.I]
end
# compute objective
if !isnothing(objective)
if !isnothing(objective)
return ridge.α * sum(i -> par[i]^2, ridge.I)
end
end
@@ -136,47 +136,30 @@ Additionally, you may provide analytic hessians by writing a respective method f

## Convenient

To be able to build the model with the [Outer Constructor](@ref), you need to add a constructor for your loss function that only takes keyword arguments and allows for passing optional additional keyword arguments. A constructor is just a function that creates a new instance of your type:
To be able to build the loss term, it needs a constructor.
Every `SemLoss` subtype should provide a constructor with 3 positional arguments:
* `observed::SemObserved`: the observed part of the model
* `implied::SemImplied`: the implied part of the model
* `refloss::Union{MyLoss, Nothing} = nothing`: optional loss term of the same type
to use as a reference for any loss-specific configuration.

```julia
function MyLoss(;arg1 = ..., arg2, kwargs...)
...
return MyLoss(...)
end
```

All keyword arguments that a user passes to the Sem constructor are passed to your loss function. In addition, all previously constructed parts of the model (implied and observed part) are passed as keyword arguments as well as the number of parameters `n_par = ...`, so your constructor may depend on those. For example, the constructor for `SemML` in our package depends on the additional argument `meanstructure` as well as the observed part of the model to pre-allocate arrays of the same size as the observed covariance matrix and the observed mean vector:
Any additional loss configuration details should be passed as optional keyword arguments.
If both `refloss` and the keyword arguments are provided, the keyword arguments take
precedence. This constructor is used internally by functions like [`replace_observed`](@ref)
to rebuild the loss term with new observed data while preserving the implied state.

```julia
function SemML(;observed, meanstructure = false, approx_H = false, kwargs...)

isnothing(obs_mean(observed)) ?
meandiff = nothing :
meandiff = copy(obs_mean(observed))

return SemML(
similar(obs_cov(observed)),
similar(obs_cov(observed)),
meandiff,
approx_H,
Val(meanstructure)
)
function MyLoss(
observed::SemObserved, implied::SemImplied, refloss::Union{MyLoss, Nothing} = nothing;
kwarg1 = ..., kwarg2 = ..., kwargs...
)
...
return MyLoss(...) # internal MyLoss constructor
end
```
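The keyword-over-`refloss` precedence described above can be sketched with toy stand-in types — none of these names come from the package, and the real constructors carry more state:

```julia
struct ToyLoss
    α::Float64
end

# observed/implied are accepted but unused in this toy; the point is the
# precedence chain: explicit keyword > refloss's stored value > default.
function make_toyloss(observed, implied, refloss::Union{ToyLoss, Nothing} = nothing;
                      α = nothing, kwargs...)
    α_eff = α !== nothing ? α :
            refloss !== nothing ? refloss.α : 1.0
    return ToyLoss(α_eff)
end
```

With this convention, rebuilding a loss from a reference preserves its configuration unless the caller explicitly overrides it.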

## Additional functionality

### Update observed data

If you are planning a simulation study where you have to fit the **same model** to many **different datasets**, it is computationally beneficial not to rebuild the whole model every time you change your data.
Therefore, we provide a function to update the data of your model, `replace_observed(model(semfit); data = new_data)`. However, we cannot know beforehand in what way your loss function depends on the specific datasets. The solution is to provide a method for `update_observed`. Since `Ridge` does not depend on the data at all, this is quite easy:

```julia
import StructuralEquationModels: update_observed

update_observed(ridge::Ridge, observed::SemObserved; kwargs...) = ridge
```

### Access additional information

If you want to provide a way to query information about loss functions of your type, you can provide functions for that:
8 changes: 0 additions & 8 deletions docs/src/developer/optimizer.md
@@ -25,12 +25,6 @@ struct MyoptResult{O <: SemOptimizerMyopt} <: SEM.SemOptimizerResult{O}
...
end

############################################################################################
### Recommended methods
############################################################################################

update_observed(optimizer::SemOptimizerMyopt, observed::SemObserved; kwargs...) = optimizer

############################################################################################
### additional methods
############################################################################################
@@ -43,8 +37,6 @@ and `SEM.sem_optimizer_subtype(::Val{:Myopt})` returns `SemOptimizerMyopt`.
This instructs *SEM.jl* to use `SemOptimizerMyopt` when `:Myopt` is specified as the engine for
model fitting: `fit(..., engine = :Myopt)`.

A method for `update_observed` and additional methods might be usefull, but are not necessary.

Now comes the essential part: we need to provide the [`fit`](@ref) method with `SemOptimizerMyopt`
as the first positional argument.

17 changes: 8 additions & 9 deletions docs/src/developer/sem.md
@@ -1,13 +1,14 @@
# Custom model types

The abstract supertype for all models is `AbstractSem`, which has two subtypes, `AbstractSemSingle{O, I, L}` and `AbstractSemCollection`. Currently, there are 2 subtypes of `AbstractSemSingle`: `Sem`, `SemFiniteDiff`. All subtypes of `AbstractSemSingle` should have at least observed, implied, loss and optimizer fields, and share their types (`{O, I, L}`) with the parametric abstract supertype. For example, the `SemFiniteDiff` type is implemented as
The abstract supertype for all models is [`AbstractSem`](@ref). Currently, there are 2 concrete subtypes:
`Sem{L <: Tuple}` and `SemFiniteDiff{S <: AbstractSem}`.
A `Sem` model holds a tuple of `LossTerm`s (each wrapping an `AbstractLoss`) and a vector of parameter labels. Both single-group and multigroup models are represented as `Sem`.

`SemFiniteDiff` wraps any `AbstractSem` and substitutes dedicated gradient/hessian evaluation with finite difference approximation:

```julia
struct SemFiniteDiff{O <: SemObserved, I <: SemImplied, L <: SemLoss} <:
AbstractSemSingle{O, I, L}
observed::O
implied::I
loss::L
struct SemFiniteDiff{S <: AbstractSem} <: AbstractSem
model::S
end
```
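Finite-difference approximation itself is easy to sketch. A generic central-difference helper — not the wrapper's actual implementation, which may delegate to a dedicated package — looks like:

```julia
# Central differences: grad[i] ≈ (f(x + h·eᵢ) - f(x - h·eᵢ)) / 2h,
# with h ≈ cbrt(eps) balancing truncation and rounding error.
function fd_gradient!(grad, f, x; h = cbrt(eps(Float64)))
    for i in eachindex(x)
        xp = copy(x); xm = copy(x)
        xp[i] += h
        xm[i] -= h
        grad[i] = (f(xp) - f(xm)) / (2h)
    end
    return grad
end
```

Each gradient entry costs two objective evaluations, which is why wrapping only the loss terms that truly lack analytic gradients (see the mixed-differentiation docs below in this PR) pays off.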

@@ -17,6 +18,4 @@ Additionally, you can change how objective/gradient/hessian values are computed
evaluate!(objective, gradient, hessian, model::SemFiniteDiff, params) = ...
```

Additionally, we can define constructors like the one in `"src/frontend/specification/Sem.jl"`.

It is also possible to add new subtypes for `AbstractSemCollection`.
Additionally, we can define constructors like the one in `"src/frontend/specification/Sem.jl"`.
20 changes: 12 additions & 8 deletions docs/src/internals/types.md
@@ -2,12 +2,16 @@

The type hierarchy is implemented in `"src/types.jl"`.

`AbstractSem`: the most abstract type in our package
- `AbstractSemSingle{O, I, L} <: AbstractSem` is an abstract parametric type that is a supertype of all single models
- `Sem`: models that do not need automatic differentiation or finite difference approximation
- `SemFiniteDiff`: models whose gradients and/or hessians should be computed via finite difference approximation
- `AbstractSemCollection <: AbstractSem` is an abstract supertype of all models that contain multiple `AbstractSem` submodels
[`AbstractLoss`](@ref) is the base abstract type for all loss functions:
- `SemLoss{O <: SemObserved, I <: SemImplied}`: the subtype of `AbstractLoss` that serves as the
  base for all SEM-specific loss functions ([`SemML`](@ref), [`SemWLS`](@ref), etc.), which
  evaluate how closely the implied covariance structure (represented by the object of type `I`)
  matches the observed one (contained in the object of type `O`);
- regularizing terms (e.g. [`SemRidge`](@ref)) are implemented as subtypes of `AbstractLoss`.

Every `AbstractSemSingle` has to have `SemObserved`, `SemImplied`, and `SemLoss` fields (and can have additional fields).

`SemLoss` is a container for multiple `SemLossFunctions`.
[`AbstractSem`](@ref) is the base abstract type for all SEM models. It has two concrete subtypes:
- `Sem{L <: Tuple} <: AbstractSem`: the main SEM model type that implements a list of weighted
loss terms (using [`LossTerm`](@ref) wrapper around `AbstractLoss`) and allows modeling both single
and multi-group SEMs and combining them with regularization terms.
- `SemFiniteDiff{S <: AbstractSem} <: AbstractSem`: a wrapper around any `AbstractSem` that
substitutes dedicated gradient/hessian evaluation with finite difference approximation.
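The hierarchy described above can be mirrored in a few stand-in definitions to make the composition explicit (toy names and fields, not the package's actual declarations):

```julia
abstract type AbstractLoss end
abstract type AbstractSem end

# a loss term pairs a loss with a weight, as described above
struct LossTerm{L <: AbstractLoss}
    loss::L
    weight::Float64
end

# `Sem` holds a tuple of loss terms; `SemFiniteDiff` wraps any model
struct Sem{T <: Tuple} <: AbstractSem
    loss_terms::T
end

struct SemFiniteDiff{S <: AbstractSem} <: AbstractSem
    model::S
end

struct DummyLoss <: AbstractLoss end   # placeholder loss for illustration
```

Because the wrapper is parameterized over `S <: AbstractSem`, finite differencing composes with any model, single- or multi-group.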
18 changes: 8 additions & 10 deletions docs/src/performance/mixed_differentiation.md
@@ -2,22 +2,20 @@

This way of specifying our model is not ideal, however, because now the maximum likelihood loss function also lives inside a `SemFiniteDiff` model, which means that even though we have defined analytical gradients for it, we do not make use of them.

A more efficient way is therefore to specify our model as an ensemble model:
A more efficient way is therefore to specify our model as a combined model with multiple loss terms:

```julia
model_ml = Sem(
specification = partable,
data = data,
loss = SemML
ml_term = SemML(
SemObservedData(data = data, specification = partable),
RAMSymbolic(partable)
)

model_ridge = SemFiniteDiff(
specification = partable,
data = data,
loss = myridge
ridge_term = SemRidge(
α_ridge = 0.01,
which_ridge = params(ml_term)
)

model_ml_ridge = SemEnsemble(model_ml, model_ridge)
model_ml_ridge = Sem(ml_term, ridge_term)

model_ml_ridge_fit = fit(model_ml_ridge)
```
34 changes: 13 additions & 21 deletions docs/src/performance/simulation.md
@@ -3,8 +3,13 @@
## Replace observed data
In simulation studies, a common task is fitting the same model to many different datasets.
It would be a waste of resources to reconstruct the complete model for each dataset.
We therefore provide the function `replace_observed` to change the `observed` part of a model,
without necessarily reconstructing the other parts.
We therefore provide the function [`replace_observed`](@ref) to change the `observed` part
of a model, without necessarily reconstructing the other parts.

For `SemLoss` terms, `replace_observed()` constructs the new loss by passing the new observed
data, the current implied state, and the current loss (as `refloss`) to the appropriate loss
constructor. The new loss term therefore shares the implied state with the original one, as well
as loss-specific settings and, potentially, the internal state.

For the [A first model](@ref), you would use it as

@@ -40,7 +45,7 @@ end

partable = ParameterTable(
graph,
latent_vars = latent_vars,
latent_vars = latent_vars,
observed_vars = observed_vars
)
```
@@ -57,28 +62,16 @@ model = Sem(
data = data_1
)

model_updated = replace_observed(model; data = data_2, specification = partable)
```

If you are building your models by parts, you can also update each part separately with the function `update_observed`.
For example,

```@example replace_observed

new_observed = SemObservedData(;data = data_2, specification = partable)

my_optimizer = SemOptimizer()

new_optimizer = update_observed(my_optimizer, new_observed)
model_updated = replace_observed(model, data_2)
```

## Multithreading
!!! danger "Thread safety"
*This is only relevant when you are planning to fit updated models in parallel*
Models generated by `replace_observed` may share the same objects in memory (e.g. some parts of

Models generated by `replace_observed` may share the same objects in memory (e.g. some parts of
`model` and `model_updated` are the same objects in memory.)
Therefore, fitting both of these models in parallel will lead to **race conditions**,
Therefore, fitting both of these models in parallel will lead to **race conditions**,
possibly crashing your computer.
To avoid these problems, you should copy `model` before updating it.

@@ -90,7 +83,7 @@ model1 = Sem(
data = data_1
)

model2 = deepcopy(replace_observed(model; data = data_2, specification = partable))
model2 = deepcopy(replace_observed(model, data_2))

models = [model1, model2]
fits = Vector{SemFit}(undef, 2)
@@ -104,5 +97,4 @@ end

```@docs
replace_observed
update_observed
```