diff --git a/README.md b/README.md index ac5e674..d222870 100644 --- a/README.md +++ b/README.md @@ -6,7 +6,9 @@ [![code style: runic](https://img.shields.io/badge/code_style-%E1%9A%B1%E1%9A%A2%E1%9A%BE%E1%9B%81%E1%9A%B2-black)](https://github.com/fredrikekre/Runic.jl) -This package implements special types of vectors and associated methods for hyperdimensional computing. Hyperdimensional computing (HDC) is a paragdigm to represent patterns by means of a high-dimensional vectors (typically 10,000 dimensions). Specific operations can be used to create new vectors by combining the information or encoding some kind of position. HDC is an alternative machine learning method that is extremely computationally efficient. It is inspired by the distributed, holographic representation of patterns in the brain. Typically, the high-dimensionality is more important than the nature of the operations. This package provides various types of vectors (binary, graded, bipolar...) with sensible operations for *aggragating*, *binding* and *permutation*. Basic functionality for fitting a k-NN like classifier is also supported. +This package implements special types of vectors and associated methods for hyperdimensional computing. Hyperdimensional computing (HDC) is a paradigm to represent patterns by means of high-dimensional vectors (typically 10,000 dimensions). Specific operations can be used to create new vectors by combining the information or encoding some kind of position. HDC is an alternative machine learning method that is extremely computationally efficient. It is inspired by the distributed, holographic representation of patterns in the brain. Typically, the high-dimensionality is more important than the nature of the operations. This package provides various types of vectors (binary, graded, bipolar...) with sensible operations for *aggregating*, *binding* and *permutation*. + +We provide a set of types of hypervectors (HVs), with the associated operations. 
## Basic use @@ -15,41 +17,44 @@ Several types of vectors are implemented. Random vectors can be initialized of d ```julia using HyperdimensionalComputing -x = BipolarHDV() # default length is 10,000 +x = BipolarHV() # default length is 10,000 -y = BinaryHDV(20) # different length +y = BinaryHV(20) # different length -z = RealHDV(Float32) # specify data type +z = RealHV(Float32) # specify data type ``` -The basic operations are `aggregate` (creating a vector that is similar to the provided vectors), `bind` (creating a vector that is dissimilar to the vectors) and `circshift` (shifting the vector inplace to create a new vector). For `aggregate` and `bind`, we overload `+` and `*` as binary operators, while `Π` is an alias for `circshift`. The latter is lazily implemented. All functions have an inplace version, using the `!` prefix. +The basic operations are `bundle` (creating a vector that is similar to the provided vectors), `bind` (creating a vector that is dissimilar to the vectors) and `shift` (circularly shifting the coordinates to create a new vector). For `bundle` and `bind`, we overload `+` and `*` as binary operators, while `ρ` is an alias for `shift`. ```julia -x, y, z = GradedHDV(10), GradedHDV(10), GradedHDV(10) +x, y, z = GradedHV(10), GradedHV(10), GradedHV(10) # aggregation -aggregate([x, y, z]) +bundle([x, y, z]) -x + y +x + y # binary operator for bundling # binding -bind([x, y, z]) -x * y +bind(x, y) + +x * y # binary operator for binding # permutation -circshift(x, 2) # shifts the coordinates -Π(x, 2) # same +shift(x, 2) # circularly shifts the coordinates +ρ(x, 2) # same -Π!(y, 1) # inplace +ρ!(y, 2) # inplace ``` -See the table for which operations are used for which type. +See the table below for which operations are used for which type. ## Embedding sequences +TODO: update! + HDC is particularly powerful for embedding sequences. This is done by creating embeddings for n-grams and aggregating the n-grams found in the sequence. 
```julia @@ -69,32 +74,13 @@ threegrams = compute_3_grams(basis) sequence_embedding(sequence, threegrams) ``` -## Training - -A model is basically trained by making an aggregation of all elements within a class. Training is simple. Prediction is done by nearest-neighbor search based on `similarity`. - -```julia -hdvs = [BipolarHDV() for i in 1:1000] # 1000 vectors -y = rand(Bool, 1000) # two labels - -centers = train(y, hdvs) - -predict(BipolarHDV(), centers) # predict for a random vector -predict(hdvs, centers) # repredict the labels -``` - -In practice, this leads to suboptimal performance. One can retrain the model by reaggregating the wrongly classified labels. - -```julia -retrain!(centers, y, hdvs, niters=10) -``` - ## Overview of operations | Vector | element domain | aggregate | binding | similarity | | ------ | --------------| ---------| ----------| --------| -| `BinaryHDV` | 0, 1 | majority | xor | Jaccard | -| `BipolarHDV` | -1, 0, 1 | sum and threshold | multiply | cosine | -| `GradedHDV` | [0, 1] | 3π | fuzzy xor | Jaccard | -| `GradedBipolarHDV` | [-1, 1] | 3π | fuzzy xor | cosine | -| `RealHDV` | real | sum weighted to keep vector norm | multiply | cosine | +| `BinaryHV` | {0, 1} | majority | xor | Jaccard | +| `BipolarHV` | {-1, 1} | sum and threshold | multiply | cosine | +| `TernaryHV` | {-1, 0, 1} | sum and threshold | multiply | cosine | +| `GradedHV` | [0, 1] | 3π | fuzzy xor | Jaccard | +| `GradedBipolarHV` | [-1, 1] | 3π | fuzzy xor | cosine | +| `RealHV` | real (normally distributed) | sum weighted to keep vector norm | multiply | cosine | diff --git a/src/HyperdimensionalComputing.jl b/src/HyperdimensionalComputing.jl index f01c8bd..99fd79c 100644 --- a/src/HyperdimensionalComputing.jl +++ b/src/HyperdimensionalComputing.jl @@ -4,6 +4,7 @@ using Distances, Random, Distributions, LinearAlgebra export AbstractHV, BinaryHV, BipolarHV, GradedBipolarHV, RealHV, GradedHV, TernaryHV +export normalize!, normalize export bundle, bind, shift!, shift, 
ρ, ρ!, perturbate, perturbate! export sequence_embedding, sequence_embedding! export compute_1_grams, compute_2_grams, compute_3_grams, compute_4_grams, compute_5_grams, diff --git a/src/types.jl b/src/types.jl index e05a1bd..daea843 100644 --- a/src/types.jl +++ b/src/types.jl @@ -1,57 +1,133 @@ -#= -types.jl - -Implements the basic types for the different hypervectors (wrappers for ordinary vectors) - -Contains: -- BinaryHV -- BipolarHV -- TernaryHV -- RealHV -- GradedHV -- GradedBipolarHV - -Every hypervector HV has the following basic functionality -- random generation using the Constructor () -- norm/sum/normalize... - -TODO: -- [ ] SparseHV -- [ ] support for different types -- [ ] complex HDC -=# - +# Types for Hyperdimensional Computing +# +# This file implements the core hypervector types and their interfaces. +# All types support the fundamental HDC operations: bundling, binding, and permutation. +# +# TODO: +# - [ ] SparseHV +# - [ ] ComplexHV +# - [ ] Better type parameter handling + +""" + AbstractHV{T} <: AbstractVector{T} + +Abstract supertype for all hyperdimensional vectors (hypervectors). + +Hyperdimensional vectors are high-dimensional vectors (typically 10,000+ dimensions) used in +hyperdimensional computing (HDC) for representing and manipulating symbolic information. 
All +concrete hypervector types support the fundamental HDC operations: + +- **Bundling/Aggregation**: Combining multiple vectors into a single similar vector +- **Binding**: Creating a vector dissimilar to its inputs that can be reversed +- **Permutation**: Reordering elements to encode position or sequence information + +# Interface +All subtypes have the following functionality: +- A default constructor taking optional dimension `n::Integer` +- Vector operations via `AbstractVector` interface +- Support for `bundle`, `bind`, and `shift` operations + +# See also +[`BinaryHV`](@ref), [`BipolarHV`](@ref), [`RealHV`](@ref), [`GradedHV`](@ref), [`TernaryHV`](@ref) +""" abstract type AbstractHV{T} <: AbstractVector{T} end -#Base.collect(hv::AbstractHV) = hv.v +# ============================================================================ +# AbstractHV Interface Implementation +# ============================================================================ + +# Standard AbstractVector interface Base.sum(hv::AbstractHV) = sum(hv.v) Base.size(hv::AbstractHV) = size(hv.v) Base.getindex(hv::AbstractHV, i) = hv.v[i] Base.similar(hv::T) where {T <: AbstractHV} = T(length(hv)) LinearAlgebra.norm(hv::AbstractHV) = norm(hv.v) -LinearAlgebra.normalize!(hv::AbstractHV) = hv +LinearAlgebra.normalize!(hv::AbstractHV) = hv # Default: no-op, overridden by subtypes Base.hash(hv::AbstractHV) = hash(hv.v) Base.copy(hv::HV) where {HV <: AbstractHV} = HV(copy(hv.v)) +# Utility functions + +""" + get_vector(hv::AbstractHV) + get_vector(v::AbstractVector) +Extract the underlying vector from a hypervector or pass through regular vectors. + +Utility function for generic code that needs to access the raw vector data. 
+ +# Examples +```julia +julia> hv = BipolarHV(5); +julia> get_vector(hv) isa BitVector +true + +julia> v = [1,2,3]; get_vector(v) === v +true +``` +""" get_vector(v::AbstractVector) = v get_vector(hv::AbstractHV) = hv.v -# Gives an empty Vector (filled with neutral elelment) that -# the `hv::AbstractHV` type uses. +""" + empty_vector(hv::AbstractHV) + +Create a zero-initialized vector suitable for aggregation operations. + +Returns a vector filled with the neutral element for the bundling operation +of the given hypervector type. + +# Examples +```julia +julia> empty_vector(BipolarHV(10)) +10-element Vector{Int64} filled with 0s + +julia> empty_vector(GradedHV(10)) +10-element Vector{Float64} filled with 0.5s +``` +""" empty_vector(hv::AbstractHV) = zero(hv.v) +""" + eldist(::Type{<:AbstractHV}) + eldist(hv::AbstractHV) + +Get the element distribution used for generating random vectors of the given type. + +Returns the probability distribution used to sample elements when creating new +random hypervectors of this type. +""" eldist(hv::AbstractHV) = eldist(typeof(hv)) -# trait for checking which vector is used internall +# ============================================================================ +# Concrete Hypervector Types +# ============================================================================ + +""" + BipolarHV <: AbstractHV{Int} + BipolarHV(n::Integer=10_000) + BipolarHV(v::AbstractVector) + BipolarHV(v::BitVector) + +Bipolar hyperdimensional vector with elements in {-1, +1}. + +Internally stored as a `BitVector` for memory efficiency, but presents elements as +{-1, +1} through specialized indexing. -# We always provide a constructor with optinal dimensionality (n=10,000 by default) and -# a method `similar`. 
+# Operations +- **Bundling**: Majority vote with tie-breaking +- **Binding**: Element-wise multiplication (equivalent to XOR on the underlying bits) +- **Similarity**: Cosine similarity -# BipolarHV -# --------- +# Arguments +- `n::Integer=10_000`: Vector dimension +- `v::AbstractVector`: Convert from another vector (v > 0 → +1, v ≤ 0 → -1) +- `v::BitVector`: Use existing BitVector directly +# See also +[`BinaryHV`](@ref), [`bundle`](@ref), [`bind`](@ref) +""" struct BipolarHV <: AbstractHV{Int} v::BitVector BipolarHV(v::BitVector) = new(v) @@ -69,9 +145,25 @@ empty_vector(hv::BipolarHV) = zeros(Int, length(hv)) eldist(::Type{BipolarHV}) = 2Bernoulli(0.5) - 1 -# TernaryHV -# --------- +""" + TernaryHV <: AbstractHV{Int} + TernaryHV(n::Int=10_000) +Ternary hyperdimensional vector with elements in {-1, 0, +1}. + +Currently samples only from {-1, +1} but supports zero values through operations. + +# Operations +- **Bundling**: Element-wise sum with optional clamping +- **Binding**: Element-wise multiplication +- **Similarity**: Cosine similarity + +# Arguments +- `n::Int=10_000`: Vector dimension + +# See also +[`BipolarHV`](@ref), [`bundle`](@ref), [`bind`](@ref) +""" struct TernaryHV <: AbstractHV{Int} v::Vector{Int} end @@ -88,9 +180,27 @@ LinearAlgebra.normalize(hv::TernaryHV) = TernaryHV(clamp.(hv, -1, 1)) eldist(::Type{TernaryHV}) = 2Bernoulli(0.5) - 1 -# `BinaryHV` contain binary vectors. -# --------- +""" + BinaryHV <: AbstractHV{Bool} + BinaryHV(n::Integer=10_000) + BinaryHV(v::AbstractVector{Bool}) + +Binary hyperdimensional vector with elements in {0, 1} (false, true). + +Stored as a `BitVector` for memory efficiency. 
+# Operations +- **Bundling**: Majority vote with tie-breaking +- **Binding**: Element-wise XOR +- **Similarity**: Jaccard similarity + +# Arguments +- `n::Integer=10_000`: Vector dimension +- `v::AbstractVector{Bool}`: Use existing boolean vector + +# See also +[`BipolarHV`](@ref), [`bundle`](@ref), [`bind`](@ref) +""" struct BinaryHV <: AbstractHV{Bool} v::BitVector end @@ -103,29 +213,73 @@ empty_vector(hv::BinaryHV) = zeros(Int, length(hv)) eldist(::Type{BinaryHV}) = Bernoulli(0.5) -# `RealHV` contain real numbers, drawn from a distribution -# -------- +""" + RealHV{T<:Real} <: AbstractHV{T} + RealHV(n::Integer=10_000, distr::Distribution=Normal()) + RealHV(T::Type{<:Real}, n::Integer=10_000, distr::Distribution=Normal()) + +Real-valued hyperdimensional vector with elements drawn from a continuous distribution. + +Elements are typically drawn from a standard normal distribution, providing the richest +representational capacity among all hypervector types. + +# Operations +- **Bundling**: Element-wise sum with normalization to preserve vector norm +- **Binding**: Element-wise multiplication +- **Similarity**: Cosine similarity + +# Arguments +- `n::Integer=10_000`: Vector dimension +- `T::Type{<:Real}`: Element type (default: Float64) +- `distr::Distribution`: Distribution to sample from (default: Normal()) +# See also +[`GradedHV`](@ref), [`bundle`](@ref), [`bind`](@ref) +""" struct RealHV{T <: Real} <: AbstractHV{T} v::Vector{T} end RealHV(n::Integer = 10_000, distr::Distribution = eldist(RealHV)) = RealHV(rand(distr, n)) +RealHV(T::Type{<:Real}, n::Integer = 10_000, distr::Distribution = eldist(RealHV)) = RealHV(T.(rand(distr, n))) + Base.similar(hv::RealHV) = RealHV(length(hv), eldist(RealHV)) -function normalize!(hv::RealHV) - hv.v .*= std(hv.distr) / std(hv.v) +function LinearAlgebra.normalize!(hv::RealHV) + target_std = std(eldist(RealHV)) + current_std = std(hv.v) + if current_std > 0 # Avoid division by zero + hv.v .*= target_std / current_std + end 
return hv end eldist(::Type{<:RealHV}) = Normal() -# GradedHV are vectors in $[0, 1]^n$, allowing for graded relations. -# ---------------- +""" + GradedHV{T<:Real} <: AbstractHV{T} + GradedHV(n::Int=10_000, distr=Beta(1,1)) + +Graded hyperdimensional vector with elements in [0, 1]. + +Allows for soft, graded relationships rather than hard binary associations. +Uses specialized "3π" operation for bundling and fuzzy XOR for binding. +# Operations +- **Bundling**: 3π operation (probabilistic bundling) +- **Binding**: Fuzzy XOR +- **Similarity**: Jaccard similarity + +# Arguments +- `n::Int=10_000`: Vector dimension +- `distr`: Distribution with support in [0,1] (default: uniform via Beta(1,1)) + +# See also +[`GradedBipolarHV`](@ref), [`RealHV`](@ref), [`bundle`](@ref), [`bind`](@ref) +""" struct GradedHV{T <: Real} <: AbstractHV{T} v::Vector{T} #GradedHV(v::AbstractVector{T}) where {T<:Real} = new{T}(clamp!(v,0,1)) @@ -151,10 +305,27 @@ function Base.zeros(hv::GradedHV) return fill!(v, one(eltype(v)) / 2) end -# GradedBipolarHV are vectors in $[-1, 1]^n$, allowing for graded relations. -# --------------- +""" + GradedBipolarHV{T<:Real} <: AbstractHV{T} + GradedBipolarHV(n::Int=10_000, distr::Distribution=...) + +Graded bipolar hyperdimensional vector with elements in [-1, 1]. +Similar to `GradedHV` but with bipolar range, allowing for both positive and +negative graded relationships. 
+# Operations +- **Bundling**: 3π operation adapted for bipolar range +- **Binding**: Fuzzy XOR adapted for bipolar range +- **Similarity**: Cosine similarity + +# Arguments +- `n::Int=10_000`: Vector dimension +- `distr`: Distribution with support in [-1,1] + +# See also +[`GradedHV`](@ref), [`RealHV`](@ref), [`bundle`](@ref), [`bind`](@ref) +""" struct GradedBipolarHV{T <: Real} <: AbstractHV{T} v::Vector{T} #GradedBipolarHV(v::AbstractVector{T}) where {T<:Real} = new{T}(clamp!(v,-1,1)) @@ -174,15 +345,53 @@ Base.similar(hv::GradedBipolarHV) = GradedBipolarHV(length(hv)) LinearAlgebra.normalize!(hv::GradedBipolarHV) = clamp!(hv.v, -1, 1) -# TRAITS -# ------ +# ============================================================================ +# Type Traits for Dispatch +# ============================================================================ + +""" + HVTraits +Abstract type for hypervector storage traits. + +Used for dispatch to select appropriate algorithms based on underlying storage. +""" abstract type HVTraits end +""" + HVByteVec <: HVTraits + +Trait for hypervectors stored as regular byte-based vectors. + +Used for most hypervector types that store elements as numeric values. +""" struct HVByteVec <: HVTraits end +""" + HVBitVec <: HVTraits + +Trait for hypervectors stored as bit vectors. + +Used for memory-efficient binary and bipolar hypervectors. +""" struct HVBitVec <: HVTraits end +""" + vectype(hv::AbstractHV) -> HVTraits + +Get the storage trait for a hypervector type. + +Returns either `HVByteVec` or `HVBitVec` depending on underlying storage. 
+ +# Examples +```julia +julia> vectype(BinaryHV(10)) +HVBitVec + +julia> vectype(RealHV(10)) +HVByteVec +``` +""" vectype(::AbstractHV) = HVByteVec vectype(::BinaryHV) = HVBitVec vectype(::BipolarHV) = HVBitVec diff --git a/src/vectors.jl b/src/vectors.jl deleted file mode 100644 index f235554..0000000 --- a/src/vectors.jl +++ /dev/null @@ -1,123 +0,0 @@ -#= -vectors.jl; Implements the interface for HDV -=# - -validindex(i, n) = i < 1 ? validindex(i + n, n) : (i > n ? validindex(i - n, n) : i) - -# random numbers in an interval [l, u] -@inline function randinterval(T::Type, n, l, u) - @assert l < u "The lower bound should be belowe the upper bound" - return rand(T, n) .* T(u - l) .+ T(l) -end - - -abstract type AbstractHDV{T} <: AbstractVector{T} end - -# taking the indices takes a long time=> remove! - -@inline Base.getindex(hdv::AbstractHDV, i) = @inbounds hdv.v[validindex(i - hdv.offset, length(hdv))] .|> normalizer(hdv) - -Base.size(hdv::AbstractHDV) = size(hdv.v) - -@inline Base.setindex!(hdv::AbstractHDV, val, i) = @inbounds (hdv.v[validindex(i - hdv.offset, length(hdv))] = val) - -Base.Vector(hdv::AbstractHDV) = collect(hdv) - -#= -Base.iterate(hdv::AbstractHDV, state=1) = state > length(hdv) ? - nothing : - (normalizer(hdv)(hdv.v[validindex(i-hdv.offset, length(hdv))]), state+1) -=# - -normalizer(::AbstractHDV) = identity # normalizer does nothing by default - -function normalize!(hdv::AbstractHDV) - hdv.v .= normalizer(hdv).(hdv.v) - hdv.m = 1 - return hdv -end - -getvector(hdv::AbstractHDV) = hdv.v - -Base.sum(hdv::AbstractHDV) = sum(hdv.v) - -# We always provide a constructor with optinal dimensionality (n=10,000 by default) and -# a method `similar`. 
- -mutable struct BipolarHDV <: AbstractHDV{Int} - v::Vector{Int} - offset::Int - m::Int - BipolarHDV(v::Vector, offset = 0, m = 1) = new(v, offset, m) -end - -BipolarHDV(n::Int = 10_000) = BipolarHDV(rand((-1, 1), n)) - -Base.similar(hdv::BipolarHDV) = BipolarHDV(similar(hdv.v), 0, 0) - -normalizer(::BipolarHDV) = vᵢ -> clamp(vᵢ, -1, 1) - -# `BinaryHDV` contain binary vectors. - -mutable struct BinaryHDV <: AbstractHDV{Bool} - v::Vector{Int} - offset::Int - m::Int - BinaryHDV(v::AbstractVector, offset = 0, m = 1) = new(v, offset, m) -end - - -BinaryHDV(n::Int = 10_000) = BinaryHDV(rand(0:1, n)) - -Base.similar(hdv::BinaryHDV) = BinaryHDV(similar(hdv.v), 0, 0) - -normalizer(hdv::BinaryHDV) = vᵢ -> 2vᵢ > hdv.m - - -# GradedBipolarHDV are vectors in $[-1, 1]^n$, allowing for graded relations. - -mutable struct GradedBipolarHDV{T <: Real} <: AbstractHDV{T} - v::Vector{T} - offset::Int - m::T - GradedBipolarHDV(v::AbstractVector, offset = 0, m = 1) = new{eltype(v)}(v, offset, m) -end - -GradedBipolarHDV(T::Type, n::Int = 10_000; l = -0.8, u = 0.8) = GradedBipolarHDV(randinterval(T, n, l, u)) -GradedBipolarHDV(n::Int = 10_00; l = -0.8, u = 0.8) = GradedBipolarHDV(Float32, n; l, u) - -Base.similar(hdv::GradedBipolarHDV) = GradedBipolarHDV(similar(hdv.v), 0, 0) - -#normalizer(hdv::GradedBipolarHDV) = vᵢ -> clamp(vᵢ, -1, 1) - -mutable struct GradedHDV{T <: Real} <: AbstractHDV{T} - v::Vector{T} - offset::Int - m::Int - GradedHDV(v::AbstractVector, offset = 0, m = 1) = new{eltype(v)}(v, offset, m) -end - -GradedHDV(T::Type, n::Int = 10_000; l = 0.2, u = 0.8) = GradedHDV(randinterval(T, n, l, u)) -GradedHDV(n::Int = 10_000; l = 0.2, u = 0.8) = GradedHDV(Float32, n; l, u) - -Base.similar(hdv::GradedHDV) = GradedHDV(similar(hdv.v), 0, 0) - -#normalizer(hdv::GradedHDV) = vᵢ -> clamp(vᵢ, -1, 1) - - -# Finally, `RealHDV` contain real values, drawn from a standard normal distribution -# by default. 
- -mutable struct RealHDV{T <: Real} <: AbstractHDV{T} - v::Vector{T} - offset::Int - m::Float64 - RealHDV(v::Vector{T}, offset = 0, m = 1) where {T} = new{T}(v, offset, m) -end - -RealHDV(n::Int = 10_000) = RealHDV(randn(n)) -RealHDV(T::Type{<:Real}, n::Int = 10_000) = RealHDV(T.(randn(n)), 0) - -normalizer(hdv::RealHDV) = vᵢ -> vᵢ / sqrt(hdv.m) - -Base.similar(hdv::RealHDV) = RealHDV(similar(hdv.v), 0, 0)
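The guarded rescaling that the new `LinearAlgebra.normalize!(hv::RealHV)` method performs in `src/types.jl` can be sketched in isolation as follows. This is a minimal sketch on a plain `Vector`, not the package's own code: the helper name `rescale_std!` and its `target_std` argument are hypothetical, and the target of `1.0` assumes the standard normal element distribution that `eldist(RealHV)` returns in this patch.

```julia
using Statistics

# Hypothetical standalone helper (not part of HyperdimensionalComputing.jl):
# rescale `v` in place so that its sample standard deviation equals `target_std`.
function rescale_std!(v::AbstractVector{<:AbstractFloat}, target_std::Real = 1.0)
    current_std = std(v)
    if current_std > 0  # a constant vector has zero spread; leave it untouched
        v .*= target_std / current_std
    end
    return v
end

v = [0.5, -1.2, 3.4, 2.0]
rescale_std!(v)
@assert std(v) ≈ 1.0

# The guard avoids a division by zero for constant input.
c = [2.0, 2.0]
@assert rescale_std!(c) == [2.0, 2.0]
```

Multiplying by a constant scales the standard deviation by the same factor, so a single broadcasted multiplication is enough; the mean is scaled too, which is harmless when the elements are centred around zero as they are for a standard normal draw.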