Protocol Documentation

algorithm_id.proto

AlgorithmId

Enum for the different types of algorithms.

References

[1] R. M. Neal, Markov Chain Sampling Methods for Dirichlet Process Mixture Models. JCGS (2000)

[2] H. Ishwaran and L. F. James, Gibbs Sampling Methods for Stick-Breaking Priors. JASA (2001)

[3] S. Jain and R. M. Neal, A Split-Merge Markov Chain Monte Carlo Procedure for the Dirichlet Process Mixture Model. JCGS (2004)

[4] M. Kalli, J. Griffin and S. G. Walker, Slice sampling mixture models. Stat and Comp. (2011)

Name Number Description
UNKNOWN_ALGORITHM 0

Neal2 1

Neal's Algorithm 2, see [1]

Neal3 2

Neal's Algorithm 3, see [1]

Neal8 3

Neal's Algorithm 8, see [1]

BlockedGibbs 4

Ishwaran and James Blocked Gibbs, see [2]

SplitMerge 5

Jain and Neal's Split&Merge, see [3]. NOT IMPLEMENTED YET!

Slice 6

Slice sampling, see [4]. NOT IMPLEMENTED YET!

algorithm_params.proto

AlgorithmParams

Parameters used in the BaseAlgorithm class and its derived classes.

Field Type Label Description
algo_id string

Id of the algorithm. Must match one of the names in the AlgorithmId enum

rng_seed uint32

Seed for the random number generator

iterations uint32

Total number of iterations of the MCMC chain

burnin uint32

Number of iterations to discard as burn-in

init_num_clusters uint32

Number of clusters used to initialize the algorithm. Conditional mixings with a fixed number of components (e.g. TruncatedSBMixing) override this value, in which case it is ignored.

neal8_n_aux uint32

Number of auxiliary unique values for the Neal8 algorithm

splitmerge_n_restr_gs_updates uint32

Number of restricted Gibbs sampling (GS) scans for each Metropolis-Hastings (MH) step of the Split and Merge algorithm.

splitmerge_n_mh_updates uint32

Number of MH updates for each iteration of the Split and Merge algorithm.

splitmerge_n_full_gs_updates uint32

Number of full GS scans for each iteration of the Split and Merge algorithm.
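
As an illustration of how these fields relate, the following Python sketch checks a plain dictionary of AlgorithmParams fields before a run. The helper `check_algorithm_params` and its rules (burn-in shorter than the chain, at least one auxiliary value for Neal8) are assumptions made for this example, not checks documented for the library.

```python
# Names copied from the AlgorithmId enum (UNKNOWN_ALGORITHM excluded).
ALGORITHM_IDS = {"Neal2", "Neal3", "Neal8", "BlockedGibbs", "SplitMerge", "Slice"}


def check_algorithm_params(params):
    """Return a list of problems found in a dict of AlgorithmParams fields.

    Hypothetical helper: the rules below are illustrative assumptions.
    """
    problems = []
    if params.get("algo_id") not in ALGORITHM_IDS:
        problems.append("algo_id must be a name from the AlgorithmId enum")
    # burnin iterations are discarded, so some iterations must remain.
    if params.get("burnin", 0) >= params.get("iterations", 0):
        problems.append("burnin must be smaller than iterations")
    # neal8_n_aux only matters when the Neal8 algorithm is selected.
    if params.get("algo_id") == "Neal8" and params.get("neal8_n_aux", 0) < 1:
        problems.append("Neal8 needs at least one auxiliary value")
    return problems


example_params = {
    "algo_id": "Neal8",
    "rng_seed": 20201124,
    "iterations": 2000,
    "burnin": 1000,
    "init_num_clusters": 3,
    "neal8_n_aux": 3,
}
print(check_algorithm_params(example_params))
```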

algorithm_state.proto

AlgorithmState

This message represents the state of a Gibbs sampler for a mixture model. All algorithms must be able to handle this message by filling it with the current state of the sampler in the `get_state_as_proto` method.

Field Type Label Description
cluster_states AlgorithmState.ClusterState repeated

The state of each cluster

cluster_allocs int32 repeated

Vector of allocations into clusters, one for each observation

mixing_state MixingState

The state of the `Mixing`

iteration_num int32

The iteration number

hierarchy_hypers AlgorithmState.HierarchyHypers

The current values of the hyperparameters of the hierarchy

AlgorithmState.ClusterState

Field Type Label Description
uni_ls_state UniLSState

State of a univariate location-scale family

multi_ls_state MultiLSState

State of a multivariate location-scale family

lin_reg_uni_ls_state LinRegUniLSState

State of a linear regression univariate location-scale family

general_state Vector

Just a vector of doubles

fa_state FAState

State of a Mixture of Factor Analysers

cardinality int32

How many observations are in this cluster
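
The `cardinality` of each cluster can be recovered from the `cluster_allocs` vector of the enclosing AlgorithmState. A minimal Python sketch follows, assuming (this is not stated in the documentation) that allocations are zero-based indices into `cluster_states`; the function name is hypothetical.

```python
from collections import Counter


def cluster_cardinalities(cluster_allocs, num_clusters):
    """Count how many observations are allocated to each cluster.

    Assumes allocations are zero-based cluster indices.
    """
    counts = Counter(cluster_allocs)
    # Clusters with no allocated observation get a count of zero.
    return [counts.get(h, 0) for h in range(num_clusters)]


allocs = [0, 1, 1, 2, 0, 1]
print(cluster_cardinalities(allocs, 3))
```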

AlgorithmState.HierarchyHypers

Field Type Label Description
general_state Vector

nnig_state NIGDistribution

nnw_state NWDistribution

lin_reg_uni_state MultiNormalIGDistribution

nnxig_state NxIGDistribution

fa_state FAPriorDistribution

distribution.proto

BetaDistribution

Parameters defining a beta distribution

Field Type Label Description
shape_a double

shape_b double

GammaDistribution

Parameters defining a gamma distribution with density

f(x) = rate^shape * x^(shape-1) * exp(-rate * x) / Gamma(shape)

Field Type Label Description
shape double

rate double
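
For reference, a minimal Python sketch of the shape-rate gamma log-density, including the normalizing constant rate^shape. The function name is hypothetical; only the standard `math` module is used.

```python
import math


def gamma_log_pdf(x, shape, rate):
    """Log-density of a gamma distribution in the shape-rate parameterization."""
    if x <= 0.0:
        return float("-inf")
    return (
        shape * math.log(rate)
        - math.lgamma(shape)
        + (shape - 1.0) * math.log(x)
        - rate * x
    )


# With shape = 1 the gamma reduces to an exponential with the same rate.
print(gamma_log_pdf(0.5, 1.0, 2.0))
```

Under this parameterization the mean is shape / rate and the variance is shape / rate^2.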

InvWishartDistribution

Parameters defining an Inverse Wishart distribution

Field Type Label Description
deg_free double

scale Matrix

MultiNormalDistribution

Parameters defining a multivariate normal distribution

Field Type Label Description
mean Vector

var Matrix

MultiNormalIGDistribution

Parameters for the Normal Inverse Gamma distribution commonly employed in linear regression models, with density

f(beta, var) = N(beta | mean, var * var_scaling^{-1}) * IG(var | shape, scale)

Field Type Label Description
mean Vector

var_scaling Matrix

shape double

scale double

NIGDistribution

Parameters of a Normal Inverse-Gamma distribution with density

f(x, y) = N(x | mean, y / var_scaling) * IG(y | shape, scale)

Field Type Label Description
mean double

var_scaling double

shape double

scale double
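
A minimal Python sketch of the Normal Inverse-Gamma log-density, written from the factorization N(x | mean, y / var_scaling) * IG(y | shape, scale). It assumes the usual shape-scale form of the inverse-gamma density, IG(y | a, b) = b^a / Gamma(a) * y^(-a-1) * exp(-b / y); the function name is hypothetical.

```python
import math


def nig_log_density(x, y, mean, var_scaling, shape, scale):
    """Log of N(x | mean, y / var_scaling) * IG(y | shape, scale)."""
    if y <= 0.0:
        return float("-inf")
    var = y / var_scaling
    log_normal = -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var
    log_inv_gamma = (
        shape * math.log(scale)
        - math.lgamma(shape)
        - (shape + 1.0) * math.log(y)
        - scale / y
    )
    return log_normal + log_inv_gamma


print(nig_log_density(0.0, 1.0, mean=0.0, var_scaling=1.0, shape=1.0, scale=1.0))
```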

NWDistribution

Parameters of a Normal Wishart distribution with density

f(x, y) = N(x | mean, (y * var_scaling)^{-1}) * W(y | deg_free, scale)

where x is a vector and y is a symmetric positive definite (spd) matrix

Field Type Label Description
mean Vector

var_scaling double

deg_free double

scale Matrix

scale_chol Matrix

NxIGDistribution

Parameters of a Normal x Inverse-Gamma distribution with density

f(x, y) = N(x | mean, var) * IG(y | shape, scale)

Field Type Label Description
mean double

var double

shape double

scale double

UniNormalDistribution

Parameters defining a univariate normal distribution

Field Type Label Description
mean double

var double

hierarchy_id.proto

HierarchyId

Enum for the different types of Hierarchy.

Name Number Description
UNKNOWN_HIERARCHY 0

NNIG 1

Normal - Normal Inverse Gamma

NNW 2

Normal - Normal Wishart

LinRegUni 3

Linear Regression (univariate response)

LapNIG 4

Laplace - Normal Inverse Gamma

FA 5

Factor Analysers

NNxIG 6

Normal - Normal x Inverse Gamma

PythonHier 7

Generic Python hierarchy

hierarchy_prior.proto

EmptyPrior

Field Type Label Description
fake_field double

FAPrior

Field Type Label Description
fixed_values FAPriorDistribution

FAPriorDistribution

Field Type Label Description
mutilde Vector

beta Vector

phi double

alpha0 double

q uint32

LapNIGPrior

Field Type Label Description
fixed_values LapNIGState

LapNIGState

Prior for the parameters of the base measure in a Laplace - Normal Inverse Gamma hierarchy

Field Type Label Description
mean double

var double

shape double

scale double

mh_mean_var double

mh_log_scale_var double

LinRegUniPrior

Prior for the parameters of the base measure in a Normal mixture model with a covariate-dependent location.

Field Type Label Description
fixed_values MultiNormalIGDistribution

NNIGPrior

Prior for the parameters of the base measure in a Normal-Normal Inverse Gamma hierarchy

Field Type Label Description
fixed_values NIGDistribution

no prior, just fixed values

normal_mean_prior NNIGPrior.NormalMeanPrior

prior on the mean

ngg_prior NNIGPrior.NGGPrior

prior on the mean, var_scaling, and scale

NNIGPrior.NGGPrior

Field Type Label Description
mean_prior UniNormalDistribution

var_scaling_prior GammaDistribution

shape double

scale_prior GammaDistribution

NNIGPrior.NormalMeanPrior

Field Type Label Description
mean_prior UniNormalDistribution

var_scaling double

shape double

scale double

NNWPrior

Prior for the parameters of the base measure in a Normal-Normal Wishart hierarchy

Field Type Label Description
fixed_values NWDistribution

no prior, just fixed values

normal_mean_prior NNWPrior.NormalMeanPrior

prior on the mean

ngiw_prior NNWPrior.NGIWPrior

prior on the mean, var_scaling, and scale

NNWPrior.NGIWPrior

Field Type Label Description
mean_prior MultiNormalDistribution

var_scaling_prior GammaDistribution

deg_free double

scale_prior InvWishartDistribution

NNWPrior.NormalMeanPrior

Field Type Label Description
mean_prior MultiNormalDistribution

var_scaling double

deg_free double

scale Matrix

NNxIGPrior

Prior for the parameters of the base measure in a Normal-Normal x Inverse Gamma hierarchy

Field Type Label Description
fixed_values NxIGDistribution

no prior, just fixed values

PythonHierPrior

Definition of a generic container for the prior parameters to be used in Python

Field Type Label Description
values Vector

Values are modified from Python

ls_state.proto

FAState

Field Type Label Description
mu Vector

psi Vector

eta Matrix

lambda Matrix

LinRegUniLSState

Parameters of a univariate linear regression

Field Type Label Description
regression_coeffs Vector

regression coefficients

var double

variance of the noise

MultiLSState

Parameters of a multivariate location-scale family of distributions, parameterized by mean and precision (inverse of variance). For convenience, we also store the Cholesky factor of the precision matrix.

Field Type Label Description
mean Vector

prec Matrix

prec_chol Matrix

UniLSState

Parameters of a univariate location-scale family of distributions.

Field Type Label Description
mean double

var double

matrix.proto

Matrix

Message representing a matrix of doubles.

Field Type Label Description
rows int32

number of rows

cols int32

number of columns

data double repeated

matrix elements

rowmajor bool

if true, the data is read in row-major order
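
The effect of the `rowmajor` flag can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that `rowmajor = false` means column-major storage, which the description implies but does not state.

```python
def matrix_to_rows(rows, cols, data, rowmajor):
    """Rebuild a list of rows from the flat `data` field of a Matrix message."""
    if len(data) != rows * cols:
        raise ValueError("data length must equal rows * cols")
    if rowmajor:
        # Consecutive elements fill one row at a time.
        return [list(data[r * cols:(r + 1) * cols]) for r in range(rows)]
    # Assumed column-major: consecutive elements fill one column at a time.
    return [[data[c * rows + r] for c in range(cols)] for r in range(rows)]


flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(matrix_to_rows(2, 3, flat, rowmajor=True))
print(matrix_to_rows(2, 3, flat, rowmajor=False))
```

The same six values therefore describe two different 2 x 3 matrices depending on the flag, so readers and writers of the message must agree on it.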

Vector

Message representing a vector of doubles.

Field Type Label Description
size int32

number of elements in the vector

data double repeated

vector elements

mixing_id.proto

MixingId

Enum for the different types of Mixing.

Name Number Description
UNKNOWN_MIXING 0

DP 1

Dirichlet Process

PY 2

Pitman-Yor Process

LogSB 3

Logit Stick-Breaking Process

TruncSB 4

Truncated Stick-Breaking Process

MFM 5

Mixture of finite mixtures

PythonMix 6

Generic Python mixing

mixing_prior.proto

DPPrior

Prior for the concentration parameter of a Dirichlet process

Field Type Label Description
fixed_value DPState

No prior, just a fixed value

gamma_prior DPPrior.GammaPrior

Gamma prior on the total mass

DPPrior.GammaPrior

Field Type Label Description
totalmass_prior GammaDistribution

LogSBPrior

Definition of the parameters of a Logit Stick-Breaking process.

Field Type Label Description
normal_prior MultiNormalDistribution

Normal prior on the regression coefficients

step_size double

Step size for the MALA algorithm used for posterior inference (TODO: move?)

num_components uint32

Number of components in the process

MFMPrior

Prior for the Poisson rate and Dirichlet parameters of a MFM (Finite Dirichlet) process. For the moment, we only support fixed values.

Field Type Label Description
fixed_value MFMState

No prior, just a fixed value

PYPrior

Prior for the strength and discount parameters of a Pitman-Yor process. For the moment, we only support fixed values.

Field Type Label Description
fixed_values PYState

PythonMixPrior

Definition of a generic container for the prior parameters to be used in Python

Field Type Label Description
values Vector

TruncSBPrior

Definition of the parameters of a truncated Stick-Breaking process

Field Type Label Description
beta_priors TruncSBPrior.BetaPriors

General stick-breaking distributions

dp_prior TruncSBPrior.DPPrior

Truncated Dirichlet process

py_prior TruncSBPrior.PYPrior

Truncated Pitman-Yor process

mfm_prior TruncSBPrior.MFMPrior

pm_prior PythonMixPrior

num_components uint32

Number of components in the process

TruncSBPrior.BetaPriors

Field Type Label Description
beta_distributions BetaDistribution repeated

General stick-breaking distributions

TruncSBPrior.DPPrior

Field Type Label Description
totalmass double

Truncated Dirichlet process

TruncSBPrior.MFMPrior

Field Type Label Description
totalmass double

Truncated Dirichlet process

TruncSBPrior.PYPrior

Field Type Label Description
strength double

Truncated Pitman-Yor process

discount double

mixing_state.proto

DPState

State of a Dirichlet process

Field Type Label Description
totalmass double

the total mass of the DP

LogSBState

State of a Logit Stick-Breaking process

Field Type Label Description
regression_coeffs Matrix

Num_Components x Num_Features matrix. Each row holds the regression coefficients of one component.

MFMState

State of a MFM (Finite Dirichlet) process

Field Type Label Description
lambda double

rate parameter of the Poisson prior on the number of components of the MFM

gamma double

parameter of the Dirichlet distribution for the mixing weights

MixingState

Wrapper of all possible mixing states into a single oneof

Field Type Label Description
dp_state DPState

py_state PYState

log_sb_state LogSBState

trunc_sb_state TruncSBState

mfm_state MFMState

general_state Vector

PYState

State of a Pitman-Yor process

Field Type Label Description
strength double

discount double

TruncSBState

State of a truncated stick-breaking process. For convenience, we also store the logarithm of the weights

Field Type Label Description
sticks Vector

logweights Vector
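
The relation between `sticks` and `logweights` can be sketched in Python under the standard stick-breaking construction, w_h = v_h * prod_{j<h} (1 - v_j). That the library uses exactly this construction is an assumption, and the function names are hypothetical.

```python
import math


def _safe_log(p):
    """Natural log that returns -inf at zero instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def sticks_to_logweights(sticks):
    """Log-weights from stick proportions, accumulated in log space."""
    logweights = []
    log_remaining = 0.0  # log of the stick length not yet broken off
    for v in sticks:
        logweights.append(_safe_log(v) + log_remaining)
        log_remaining += _safe_log(1.0 - v)
    return logweights


# A final stick equal to 1 assigns all remaining mass to the last component.
logw = sticks_to_logweights([0.5, 0.5, 1.0])
print([math.exp(lw) for lw in logw])
```

Working in log space avoids underflow when many sticks are multiplied together, which is one plausible reason for storing the logarithm of the weights.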

semihdp.proto

SemiHdpParams

Field Type Label Description
pseudo_prior SemiHdpParams.PseudoPriorParams

dirichlet_concentration double

rest_allocs_update string

One of "full", "metro_base", or "metro_dist"

totalmass_rest double

totalmass_hdp double

w_prior SemiHdpParams.WPriorParams

SemiHdpParams.PseudoPriorParams

Field Type Label Description
card_weight double

mean_perturb_sd double

var_perturb_frac double

SemiHdpParams.WPriorParams

Field Type Label Description
shape1 double

shape2 double

SemiHdpState

Field Type Label Description
restaurants SemiHdpState.RestaurantState repeated

groups SemiHdpState.GroupState repeated

taus SemiHdpState.ClusterState repeated

c int32 repeated

w double

SemiHdpState.ClusterState

Field Type Label Description
uni_ls_state UniLSState

multi_ls_state MultiLSState

lin_reg_uni_ls_state LinRegUniLSState

general_state Vector

cardinality int32

SemiHdpState.GroupState

Field Type Label Description
cluster_allocs int32 repeated

SemiHdpState.RestaurantState

Field Type Label Description
theta_stars SemiHdpState.ClusterState repeated

n_by_clus int32 repeated

table_to_shared int32 repeated

table_to_idio int32 repeated

Scalar Value Types

.proto Type | Notes | C++ | Java | Python | Go | C# | PHP | Ruby
double | | double | double | float | float64 | double | float | Float
float | | float | float | float | float32 | float | float | Float
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required)
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long | int64 | long | integer/string | Bignum
uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 | uint | integer | Bignum or Fixnum (as required)
uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum or Fixnum (as required)
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required)
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long | int64 | long | integer/string | Bignum
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 | uint | integer | Bignum or Fixnum (as required)
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum
sfixed32 | Always four bytes. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required)
sfixed64 | Always eight bytes. | int64 | long | int/long | int64 | long | integer/string | Bignum
bool | | bool | boolean | boolean | bool | bool | boolean | TrueClass/FalseClass
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode | string | string | string | String (UTF-8)
bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str | []byte | ByteString | string | String (ASCII-8BIT)
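
The notes on int32 and sint32 can be made concrete with a small Python sketch of the wire sizes. It follows the protobuf encoding rules: a negative int32 is sign-extended to 64 bits before varint encoding, while sint32 first applies the ZigZag mapping. The function names are hypothetical and are not part of any protobuf library.

```python
def varint_size(value):
    """Number of bytes needed to varint-encode a non-negative integer."""
    size = 1
    while value >= 0x80:
        value >>= 7  # each varint byte carries 7 payload bits
        size += 1
    return size


def int32_wire_size(n):
    """Encoded size of an int32: negatives are sign-extended to 64 bits."""
    return varint_size(n & 0xFFFFFFFFFFFFFFFF)


def sint32_wire_size(n):
    """Encoded size of a sint32: ZigZag maps 0, -1, 1, -2 to 0, 1, 2, 3."""
    zigzag = ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF
    return varint_size(zigzag)


for n in (1, -1, -1000):
    print(n, int32_wire_size(n), sint32_wire_size(n))
```

Every negative int32 takes ten bytes on the wire, whereas a sint32 of small magnitude takes one or two, which is why the table recommends sint32 for fields that are often negative.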