Enum for the different types of algorithms.
References
[1] R. M. Neal, Markov Chain Sampling Methods for Dirichlet Process Mixture Models. JCGS (2000)
[2] H. Ishwaran and L. F. James, Gibbs Sampling Methods for Stick-Breaking Priors. JASA (2001)
[3] S. Jain and R. M. Neal, A Split-Merge Markov Chain Monte Carlo Procedure for the Dirichlet Process Mixture Model. JCGS (2004)
[4] M. Kalli, J. Griffin and S. G. Walker, Slice sampling mixture models. Stat and Comp. (2011)
Name | Number | Description |
UNKNOWN_ALGORITHM | 0 | |
Neal2 | 1 | Neal's Algorithm 2, see [1] |
Neal3 | 2 | Neal's Algorithm 3, see [1] |
Neal8 | 3 | Neal's Algorithm 8, see [1] |
BlockedGibbs | 4 | Ishwaran and James Blocked Gibbs, see [2] |
SplitMerge | 5 | Jain and Neal's Split&Merge, see [3]. NOT IMPLEMENTED YET! |
Slice | 6 | Slice sampling, see [4]. NOT IMPLEMENTED YET! |
Parameters used in the BaseAlgorithm class and its child classes.
Field | Type | Label | Description |
algo_id | string | | Id of the Algorithm. Must match the ones in the AlgorithmId enum |
rng_seed | uint32 | | Seed for the random number generator |
iterations | uint32 | | Total number of iterations of the MCMC chain |
burnin | uint32 | | Number of iterations to discard as burn-in |
init_num_clusters | uint32 | | Number of clusters to initialize the algorithm. It may be overridden by conditional mixings for which the number of components is fixed (e.g. TruncatedSBMixing). In this case, this value is ignored. |
neal8_n_aux | uint32 | | Number of auxiliary unique values for the Neal8 algorithm |
splitmerge_n_restr_gs_updates | uint32 | | Number of restricted GS scans for each MH step. |
splitmerge_n_mh_updates | uint32 | | Number of MH updates for each iteration of Split and Merge algorithm. |
splitmerge_n_full_gs_updates | uint32 | | Number of full GS scans for each iteration of Split and Merge algorithm. |
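As an illustration, the algorithm parameters can be written in protobuf text format, one common way to store such settings on disk. The sketch below builds that text in plain Python. The helper name `algorithm_params_to_text` and all numeric values are hypothetical, and the check that `burnin` is smaller than `iterations` is an assumption about sensible usage, not something the message itself enforces.

```python
# Hypothetical helper: renders AlgorithmParams-like settings as protobuf text format.
ALGORITHM_IDS = ("Neal2", "Neal3", "Neal8", "BlockedGibbs", "SplitMerge", "Slice")


def algorithm_params_to_text(params):
    """Return a text-format string for a dict of AlgorithmParams fields."""
    if params["algo_id"] not in ALGORITHM_IDS:
        raise ValueError(f"unknown algo_id: {params['algo_id']}")
    # Assumed sanity check, not enforced by the message itself.
    if params["burnin"] >= params["iterations"]:
        raise ValueError("burnin must be smaller than iterations")
    lines = []
    for name, value in params.items():
        if isinstance(value, str):
            lines.append(f'{name}: "{value}"')  # strings are quoted
        else:
            lines.append(f"{name}: {value}")  # unsigned integers are written bare
    return "\n".join(lines)


# Illustrative values only.
example_params = {
    "algo_id": "Neal8",
    "rng_seed": 20201124,
    "iterations": 2000,
    "burnin": 1000,
    "init_num_clusters": 3,
    "neal8_n_aux": 3,
}
print(algorithm_params_to_text(example_params))
```

Fields that belong to another algorithm (for instance the `splitmerge_*` ones when running Neal8) are simply left out in this sketch.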
This message represents the state of a Gibbs sampler for
a mixture model. All algorithms must be able to handle this
message, by filling it with the current state of the sampler
in the `get_state_as_proto` method.
Field | Type | Label | Description |
cluster_states | AlgorithmState.ClusterState | repeated | The state of each cluster |
cluster_allocs | int32 | repeated | Vector of allocations into clusters, one for each observation |
mixing_state | MixingState | | The state of the `Mixing` |
iteration_num | int32 | | The iteration number |
hierarchy_hypers | AlgorithmState.HierarchyHypers | | The current values of the hyperparameters of the hierarchy |
Field | Type | Label | Description |
uni_ls_state | UniLSState | | State of a univariate location-scale family |
multi_ls_state | MultiLSState | | State of a multivariate location-scale family |
lin_reg_uni_ls_state | LinRegUniLSState | | State of a linear regression univariate location-scale family |
general_state | Vector | | Just a vector of doubles |
fa_state | FAState | | State of a Mixture of Factor Analysers |
cardinality | int32 | | How many observations are in this cluster |
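The `cardinality` of each cluster should agree with the `cluster_allocs` vector of the sampler state. A minimal sketch of that relationship in plain Python follows. It assumes that allocations are zero-based indices into `cluster_states`, which is not stated in the field descriptions, and the function name is hypothetical.

```python
from collections import Counter


def cardinalities_from_allocs(cluster_allocs, num_clusters):
    """Count how many observations are allocated to each cluster."""
    counts = Counter(cluster_allocs)
    # Clusters with no observations get cardinality 0.
    return [counts.get(k, 0) for k in range(num_clusters)]


allocs = [0, 2, 2, 1, 0, 2]
print(cardinalities_from_allocs(allocs, 3))  # [2, 1, 3]
```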
Field | Type | Label | Description |
general_state | Vector | | |
nnig_state | NIGDistribution | | |
nnw_state | NWDistribution | | |
lin_reg_uni_state | MultiNormalIGDistribution | | |
nnxig_state | NxIGDistribution | | |
fa_state | FAPriorDistribution | | |
Parameters defining a beta distribution
Field | Type | Label | Description |
shape_a | double | | |
shape_b | double | | |
Parameters defining a gamma distribution with density
f(x) = rate^shape * x^(shape-1) * exp(-rate * x) / Gamma(shape)
Field | Type | Label | Description |
shape | double | | |
rate | double | | |
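The gamma density in its shape-rate form, including the rate^shape normalising constant, can be evaluated as in the sketch below. It uses only the Python standard library; the function name `gamma_pdf` is hypothetical and not part of the documented messages.

```python
import math


def gamma_pdf(x, shape, rate):
    """Density of a gamma distribution in the shape-rate parameterization."""
    if x <= 0.0:
        return 0.0
    # Work on the log scale for numerical stability.
    log_pdf = (
        shape * math.log(rate)
        + (shape - 1.0) * math.log(x)
        - rate * x
        - math.lgamma(shape)
    )
    return math.exp(log_pdf)


# With shape = 1 the gamma reduces to an exponential with the same rate.
print(gamma_pdf(0.5, shape=1.0, rate=2.0))
```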
Parameters defining an Inverse Wishart distribution
Field | Type | Label | Description |
deg_free | double | | |
scale | Matrix | | |
Parameters defining a multivariate normal distribution
Field | Type | Label | Description |
mean | Vector | | |
var | Matrix | | |
Parameters for the Normal Inverse Gamma distribution commonly employed in
linear regression models, with density
f(beta, var) = N(beta | mean, var * var_scaling^{-1}) * IG(var | shape, scale)
Field | Type | Label | Description |
mean | Vector | | |
var_scaling | Matrix | | |
shape | double | | |
scale | double | | |
Parameters of a Normal Inverse-Gamma distribution
with density
f(x, y) = N(x | mean, y/var_scaling) * IG(y | shape, scale)
Field | Type | Label | Description |
mean | double | | |
var_scaling | double | | |
shape | double | | |
scale | double | | |
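To make the Normal Inverse-Gamma parameterization concrete, the sketch below evaluates the log of the joint density N(x | mean, y/var_scaling) * IG(y | shape, scale). It assumes the inverse-gamma density is IG(y | a, b) = b^a / Gamma(a) * y^(-a-1) * exp(-b/y), i.e. a shape-scale parameterization; that convention is an assumption here, and the function name is hypothetical.

```python
import math


def nig_log_pdf(x, y, mean, var_scaling, shape, scale):
    """Log density of N(x | mean, y / var_scaling) * IG(y | shape, scale)."""
    if y <= 0.0:
        return -math.inf
    var_x = y / var_scaling  # conditional variance of x given y
    log_normal = -0.5 * math.log(2.0 * math.pi * var_x) - 0.5 * (x - mean) ** 2 / var_x
    # Assumed shape-scale form of the inverse-gamma density.
    log_inv_gamma = (
        shape * math.log(scale)
        - math.lgamma(shape)
        - (shape + 1.0) * math.log(y)
        - scale / y
    )
    return log_normal + log_inv_gamma


print(nig_log_pdf(0.0, 1.0, mean=0.0, var_scaling=1.0, shape=1.0, scale=1.0))
```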
Parameters of a Normal Wishart distribution
with density
f(x, y) = N(x | mean, (y * var_scaling)^{-1}) * IW(y | deg_free, scale)
where x is a vector and y is a matrix (spd)
Field | Type | Label | Description |
mean | Vector | | |
var_scaling | double | | |
deg_free | double | | |
scale | Matrix | | |
scale_chol | Matrix | | |
Parameters of a Normal x Inverse-Gamma distribution
with density
f(x, y) = N(x | mean, var) * IG(y | shape, scale)
Field | Type | Label | Description |
mean | double | | |
var | double | | |
shape | double | | |
scale | double | | |
Parameters defining a univariate normal distribution
Field | Type | Label | Description |
mean | double | | |
var | double | | |
Enum for the different types of Hierarchy.
Name | Number | Description |
UNKNOWN_HIERARCHY | 0 | |
NNIG | 1 | Normal - Normal Inverse Gamma |
NNW | 2 | Normal - Normal Wishart |
LinRegUni | 3 | Linear Regression (univariate response) |
LapNIG | 4 | Laplace - Normal Inverse Gamma |
FA | 5 | Factor Analysers |
NNxIG | 6 | Normal - Normal x Inverse Gamma |
PythonHier | 7 | Generic python hierarchy |
Field | Type | Label | Description |
fake_field | double | | |
Field | Type | Label | Description |
fixed_values | FAPriorDistribution | | |
Field | Type | Label | Description |
mutilde | Vector | | |
beta | Vector | | |
phi | double | | |
alpha0 | double | | |
q | uint32 | | |
Field | Type | Label | Description |
fixed_values | LapNIGState | | |
Prior for the parameters of the base measure in a Laplace - Normal Inverse Gamma hierarchy
Field | Type | Label | Description |
mean | double | | |
var | double | | |
shape | double | | |
scale | double | | |
mh_mean_var | double | | |
mh_log_scale_var | double | | |
Prior for the parameters of the base measure in a Normal mixture model with a covariate-dependent
location.
Field | Type | Label | Description |
fixed_values | MultiNormalIGDistribution | | |
Prior for the parameters of the base measure in a Normal-Normal Inverse Gamma hierarchy
Field | Type | Label | Description |
fixed_values | NIGDistribution | | no prior, just fixed values |
normal_mean_prior | NNIGPrior.NormalMeanPrior | | prior on the mean |
ngg_prior | NNIGPrior.NGGPrior | | prior on the mean, var_scaling, and scale |
Field | Type | Label | Description |
mean_prior | UniNormalDistribution | | |
var_scaling_prior | GammaDistribution | | |
shape | double | | |
scale_prior | GammaDistribution | | |
Field | Type | Label | Description |
mean_prior | UniNormalDistribution | | |
var_scaling | double | | |
shape | double | | |
scale | double | | |
Prior for the parameters of the base measure in a Normal-Normal Wishart hierarchy
Field | Type | Label | Description |
fixed_values | NWDistribution | | no prior, just fixed values |
normal_mean_prior | NNWPrior.NormalMeanPrior | | prior on the mean |
ngiw_prior | NNWPrior.NGIWPrior | | prior on the mean, var_scaling, and scale |
Field | Type | Label | Description |
mean_prior | MultiNormalDistribution | | |
var_scaling_prior | GammaDistribution | | |
deg_free | double | | |
scale_prior | InvWishartDistribution | | |
Field | Type | Label | Description |
mean_prior | MultiNormalDistribution | | |
var_scaling | double | | |
deg_free | double | | |
scale | Matrix | | |
Prior for the parameters of the base measure in a Normal-Normal x Inverse Gamma hierarchy
Field | Type | Label | Description |
fixed_values | NxIGDistribution | | no prior, just fixed values |
Definition of a generic container for the prior parameters to be used in Python
Field | Type | Label | Description |
values | Vector | | values are modified from python |
Field | Type | Label | Description |
mu | Vector | | |
psi | Vector | | |
eta | Matrix | | |
lambda | Matrix | | |
Parameters of a univariate linear regression
Field | Type | Label | Description |
regression_coeffs | Vector | | regression coefficients |
var | double | | variance of the noise |
Parameters of a multivariate location-scale family of distributions,
parameterized by mean and precision (inverse of variance). For
convenience, we also store the Cholesky factor of the precision matrix.
Field | Type | Label | Description |
mean | Vector | | |
prec | Matrix | | |
prec_chol | Matrix | | |
Parameters of a univariate location-scale family of distributions.
Field | Type | Label | Description |
mean | double | | |
var | double | | |
Message representing a matrix of doubles.
Field | Type | Label | Description |
rows | int32 | | number of rows |
cols | int32 | | number of columns |
data | double | repeated | matrix elements |
rowmajor | bool | | if true, the data is read in row-major order |
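Because `data` is a flat list, a reader has to combine it with `rows`, `cols`, and `rowmajor` to recover the matrix. The sketch below shows one way to do that in plain Python. The function name is hypothetical, and it assumes that the data is stored in column-major order when `rowmajor` is false, which the field description implies but does not state.

```python
def matrix_to_rows(rows, cols, data, rowmajor):
    """Rebuild a list of rows from the flat `data` field of a Matrix message."""
    if len(data) != rows * cols:
        raise ValueError("data must contain rows * cols elements")
    if rowmajor:
        return [[data[i * cols + j] for j in range(cols)] for i in range(rows)]
    # Assumed: column-major storage when rowmajor is false.
    return [[data[j * rows + i] for j in range(cols)] for i in range(rows)]


flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(matrix_to_rows(2, 3, flat, rowmajor=True))   # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(matrix_to_rows(2, 3, flat, rowmajor=False))  # [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0]]
```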
Message representing a vector of doubles.
Field | Type | Label | Description |
size | int32 | | number of elements in the vector |
data | double | repeated | vector elements |
Enum for the different types of Mixing.
Name | Number | Description |
UNKNOWN_MIXING | 0 | |
DP | 1 | Dirichlet Process |
PY | 2 | Pitman-Yor Process |
LogSB | 3 | Logit Stick-Breaking Process |
TruncSB | 4 | Truncated Stick-Breaking Process |
MFM | 5 | Mixture of finite mixtures |
PythonMix | 6 | Generic python mixing |
Prior for the concentration parameter of a Dirichlet process
Field | Type | Label | Description |
fixed_value | DPState | | No prior, just a fixed value |
gamma_prior | DPPrior.GammaPrior | | Gamma prior on the total mass |
Field | Type | Label | Description |
totalmass_prior | GammaDistribution | | |
Definition of the parameters of a Logit-Stick Breaking process.
Field | Type | Label | Description |
normal_prior | MultiNormalDistribution | | Normal prior on the regression coefficients |
step_size | double | | Step size for the MALA algorithm used for posterior inference (TODO: move?) |
num_components | uint32 | | Number of components in the process |
Prior for the Poisson rate and Dirichlet parameters of a MFM (Finite Dirichlet) process.
For the moment, we only support fixed values
Field | Type | Label | Description |
fixed_value | MFMState | | No prior, just a fixed value |
Prior for the strength and discount parameters of a Pitman-Yor process.
For the moment, we only support fixed values
Field | Type | Label | Description |
fixed_values | PYState | | |
Definition of a generic container for the prior parameters to be used in Python
Field | Type | Label | Description |
values | Vector | | |
Definition of the parameters of a truncated Stick-Breaking process
Field | Type | Label | Description |
beta_priors | TruncSBPrior.BetaPriors | | General stick-breaking distributions |
dp_prior | TruncSBPrior.DPPrior | | Truncated Dirichlet process |
py_prior | TruncSBPrior.PYPrior | | Truncated Pitman-Yor process |
mfm_prior | TruncSBPrior.MFMPrior | | |
pm_prior | PythonMixPrior | | |
num_components | uint32 | | Number of components in the process |
Field | Type | Label | Description |
beta_distributions | BetaDistribution | repeated | General stick-breaking distributions |
Field | Type | Label | Description |
totalmass | double | | Truncated Dirichlet process |
Field | Type | Label | Description |
totalmass | double | | Truncated Dirichlet process |
Field | Type | Label | Description |
strength | double | | Truncated Pitman-Yor process |
discount | double | | |
State of a Dirichlet process
Field | Type | Label | Description |
totalmass | double | | the total mass of the DP |
State of a Logit-Stick Breaking process
Field | Type | Label | Description |
regression_coeffs | Matrix | | Num_Components x Num_Features matrix. Each row is the regression coefficients for a component. |
State of a MFM (Finite Dirichlet) process
Field | Type | Label | Description |
lambda | double | | rate parameter of the Poisson prior on the number of components of the MFM |
gamma | double | | parameter of the Dirichlet distribution for the mixing weights |
Wrapper of all possible mixing states into a single oneof
Field | Type | Label | Description |
dp_state | DPState | | |
py_state | PYState | | |
log_sb_state | LogSBState | | |
trunc_sb_state | TruncSBState | | |
mfm_state | MFMState | | |
general_state | Vector | | |
State of a Pitman-Yor process
Field | Type | Label | Description |
strength | double | | |
discount | double | | |
State of a truncated stick-breaking process. For convenience, we also store the logarithm of the weights.
Field | Type | Label | Description |
sticks | Vector | | |
logweights | Vector | | |
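The relation between `sticks` and `logweights` can be sketched as follows, under the usual stick-breaking construction w_k = v_k * prod_{j<k} (1 - v_j). That `sticks` holds the proportions v_k, each in (0, 1], is an assumption here, since the fields carry no description; the function name is hypothetical.

```python
import math


def log_weights_from_sticks(sticks):
    """Log weights w_k = v_k * prod_{j<k} (1 - v_j) for stick proportions v in (0, 1]."""
    log_weights = []
    log_remaining = 0.0  # log of the stick length not yet assigned
    for v in sticks:
        log_weights.append(math.log(v) + log_remaining)
        # log1p(-1.0) is a domain error, so handle a stick equal to 1 explicitly.
        log_remaining = log_remaining + math.log1p(-v) if v < 1.0 else -math.inf
    return log_weights


# In a truncated process the last stick is typically 1, so the weights sum to 1.
log_w = log_weights_from_sticks([0.5, 0.5, 1.0])
print([math.exp(lw) for lw in log_w])
```

Storing the logarithms avoids underflow when many small weights are multiplied or compared.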
Field | Type | Label | Description |
pseudo_prior | SemiHdpParams.PseudoPriorParams | | |
dirichlet_concentration | double | | |
rest_allocs_update | string | | Either "full", "metro_base", "metro_dist" |
totalmass_rest | double | | |
totalmass_hdp | double | | |
w_prior | SemiHdpParams.WPriorParams | | |
Field | Type | Label | Description |
card_weight | double | | |
mean_perturb_sd | double | | |
var_perturb_frac | double | | |
Field | Type | Label | Description |
shape1 | double | | |
shape2 | double | | |
Field | Type | Label | Description |
restaurants | SemiHdpState.RestaurantState | repeated | |
groups | SemiHdpState.GroupState | repeated | |
taus | SemiHdpState.ClusterState | repeated | |
c | int32 | repeated | |
w | double | | |
Field | Type | Label | Description |
uni_ls_state | UniLSState | | |
multi_ls_state | MultiLSState | | |
lin_reg_uni_ls_state | LinRegUniLSState | | |
general_state | Vector | | |
cardinality | int32 | | |
Field | Type | Label | Description |
cluster_allocs | int32 | repeated | |
Field | Type | Label | Description |
theta_stars | SemiHdpState.ClusterState | repeated | |
n_by_clus | int32 | repeated | |
table_to_shared | int32 | repeated | |
table_to_idio | int32 | repeated | |
.proto Type | Notes | C++ | Java | Python | Go | C# | PHP | Ruby |
double | | double | double | float | float64 | double | float | Float |
float | | float | float | float | float32 | float | float | Float |
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long | int64 | long | integer/string | Bignum |
uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 | uint | integer | Bignum or Fixnum (as required) |
uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum or Fixnum (as required) |
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long | int64 | long | integer/string | Bignum |
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 | uint | integer | Bignum or Fixnum (as required) |
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum |
sfixed32 | Always four bytes. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
sfixed64 | Always eight bytes. | int64 | long | int/long | int64 | long | integer/string | Bignum |
bool | | bool | boolean | boolean | bool | bool | boolean | TrueClass/FalseClass |
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode | string | string | string | String (UTF-8) |
bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str | []byte | ByteString | string | String (ASCII-8BIT) |