pymc3_models.models package

Submodules

pymc3_models.models.HierarchicalLogisticRegression module

class pymc3_models.models.HierarchicalLogisticRegression.HierarchicalLogisticRegression

Bases: pymc3_models.models.BayesianModel

Custom Hierarchical Logistic Regression built using PyMC3.
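The page does not spell out the model, so the following is only a minimal sketch of what a hierarchical logistic regression typically computes for one row, assuming a per-category intercept and a per-category coefficient vector. The names `alpha`, `beta` and `category_probability` are hypothetical and are not taken from the library.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probability(x, cat, alpha, beta):
    """P(y = 1 | x) for one row, using the parameters of its category.

    alpha[cat] is a per-category intercept and beta[cat] a per-category
    coefficient vector (assumed structure, hypothetical names).
    """
    linear = alpha[cat] + sum(b * v for b, v in zip(beta[cat], x))
    return sigmoid(linear)


# Two categories, two predictors; the values are made up for illustration.
alpha = [0.0, 1.0]
beta = [[1.0, -1.0], [0.5, 0.5]]

p_cat0 = category_probability([2.0, 2.0], 0, alpha, beta)  # linear term 0.0
p_cat1 = category_probability([2.0, 2.0], 1, alpha, beta)  # linear term 3.0
```

The same predictor values give different probabilities depending on the entry in cats, which is why cats is passed alongside X to fit, predict, predict_proba and score.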

Methods

`create_model`()	Creates and returns the PyMC3 model.
`fit`(X, y, cats[, inference_type, …])	Train the Hierarchical Logistic Regression model
`get_params`([deep])	Get parameters for this estimator.
`plot_elbo`()	Plot the ELBO values after running ADVI minibatch.
`predict`(X, cats[, num_ppc_samples])	Predicts labels of new data with a trained model
`predict_proba`(X, cats[, return_std, …])	Predicts probabilities of new data with a trained Hierarchical Logistic Regression
`score`(X, y, cats[, num_ppc_samples])	Scores new data with a trained model with sklearn’s accuracy_score.
`set_params`(**params)	Set the parameters of this estimator.

load
save

create_model()

Creates and returns the PyMC3 model.

Note: The size of the shared variables must match the size of the training data. Otherwise, setting the shared variables later will raise an error. See http://docs.pymc.io/advanced_theano.html

Returns:
Return type:	the PyMC3 model

fit(X, y, cats, inference_type='advi', num_advi_sample_draws=10000, minibatch_size=None, inference_args=None)

Train the Hierarchical Logistic Regression model

Parameters:

X (numpy array) – shape [num_training_samples, num_pred]
y (numpy array) – shape [num_training_samples, ]
cats (numpy array) – shape [num_training_samples, ]
inference_type (str (defaults to ‘advi’)) – specifies which inference method to call. Currently, only ‘advi’ and ‘nuts’ are supported.
num_advi_sample_draws (int (defaults to 10000)) – number of samples to draw from the ADVI approximation after it has been fit; not used if inference_type != ‘advi’
minibatch_size (int (defaults to None)) – number of samples to include in each minibatch for ADVI. If None, minibatch is not run.
inference_args (dict (defaults to None)) – arguments to be passed to the inference methods. Check the PyMC3 docs for permissible values. If None, default values will be set.

load(file_prefix)

Loads a saved version of the trace, and custom param files with the given file_prefix.

Parameters:

file_prefix (str) – path and prefix used to identify where to load the saved trace for this model, e.g. given file_prefix = ‘path/to/file/’, this will attempt to load ‘path/to/file/trace.pickle’.
load_custom_params (bool (defaults to False)) – flag to indicate whether custom parameters should be loaded

Returns:	custom_params
Return type:	Dictionary of custom parameters

predict(X, cats, num_ppc_samples=2000)

Predicts labels of new data with a trained model

Parameters:

X (numpy array) – shape [num_training_samples, num_pred]
cats (numpy array) – shape [num_training_samples, ]
num_ppc_samples (int (defaults to 2000)) – ‘samples’ parameter passed to pm.sample_ppc

predict_proba(X, cats, return_std=False, num_ppc_samples=2000)

Predicts probabilities of new data with a trained Hierarchical Logistic Regression

Parameters:

X (numpy array) – shape [num_training_samples, num_pred]
cats (numpy array) – shape [num_training_samples, ]
return_std (bool (defaults to False)) – flag of whether to return standard deviations with mean probabilities
num_ppc_samples (int (defaults to 2000)) – ‘samples’ parameter passed to pm.sample_ppc
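As an illustration of how posterior predictive draws can be turned into mean probabilities, standard deviations and labels, here is a minimal sketch in plain Python. It assumes each draw holds one 0/1 outcome per row of X and that labels come from a 0.5 threshold on the mean; `summarize_draws` is a hypothetical helper, and the library's own implementation may differ.

```python
from statistics import mean, pstdev


def summarize_draws(draws):
    """Return per-row (mean, std) over posterior predictive draws.

    draws is a list of samples; each sample holds one 0/1 outcome per row.
    """
    columns = list(zip(*draws))  # regroup the draws by row
    probs = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population standard deviation
    return probs, stds


# Four hypothetical draws for three rows of X.
draws = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 0, 1],
    [0, 0, 1],
]

probs, stds = summarize_draws(draws)
labels = [int(p > 0.5) for p in probs]  # assumed 0.5 decision threshold
```

With more draws (a larger num_ppc_samples) the means fluctuate less from one call to the next, at the cost of more sampling time.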

save(file_prefix)

Saves the trace and custom params to files with the given file_prefix.

Parameters:

file_prefix (str) – path and prefix used to identify where to save the trace for this model, e.g. given file_prefix = ‘path/to/file/’, this will attempt to save to ‘path/to/file/trace.pickle’.
custom_params (dict (defaults to None)) – custom parameters to save

score(X, y, cats, num_ppc_samples=2000)

Scores new data with a trained model with sklearn’s accuracy_score.

Parameters:

X (numpy array) – shape [num_training_samples, num_pred]
y (numpy array) – shape [num_training_samples, ]
cats (numpy array) – shape [num_training_samples, ]
num_ppc_samples (int (defaults to 2000)) – ‘samples’ parameter passed to pm.sample_ppc
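For reference, the accuracy score is the fraction of predicted labels that match the true labels. The sketch below reimplements that definition in plain Python for illustration only; `accuracy` is a hypothetical helper, not the sklearn function itself.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true one."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
acc = accuracy(y_true, y_pred)  # 3 of the 4 labels match
```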

pymc3_models.models.LinearRegression module

class pymc3_models.models.LinearRegression.LinearRegression

Bases: pymc3_models.models.BayesianModel

Linear Regression built using PyMC3.

Methods

`create_model`()	Creates and returns the PyMC3 model.
`fit`(X, y[, inference_type, …])	Train the Linear Regression model
`get_params`([deep])	Get parameters for this estimator.
`plot_elbo`()	Plot the ELBO values after running ADVI minibatch.
`predict`(X[, return_std, num_ppc_samples])	Predicts values of new data with a trained Linear Regression model
`score`(X, y[, num_ppc_samples])	Scores new data with a trained model with sklearn’s r2_score.
`set_params`(**params)	Set the parameters of this estimator.

load
save

create_model()

Creates and returns the PyMC3 model.

Note: The size of the shared variables must match the size of the training data. Otherwise, setting the shared variables later will raise an error. See http://docs.pymc.io/advanced_theano.html

Returns:
Return type:	the PyMC3 model

fit(X, y, inference_type='advi', num_advi_sample_draws=10000, minibatch_size=None, inference_args=None)

Train the Linear Regression model

Parameters:

X (numpy array) – shape [num_training_samples, num_pred]
y (numpy array) – shape [num_training_samples, ]
inference_type (str (defaults to ‘advi’)) – specifies which inference method to call. Currently, only ‘advi’ and ‘nuts’ are supported.
num_advi_sample_draws (int (defaults to 10000)) – number of samples to draw from the ADVI approximation after it has been fit; not used if inference_type != ‘advi’
minibatch_size (int (defaults to None)) – number of samples to include in each minibatch for ADVI. If None, minibatch is not run.
inference_args (dict (defaults to None)) – arguments to be passed to the inference methods. Check the PyMC3 docs for permissible values. If None, default values will be set.

load(file_prefix)

Loads a saved version of the trace, and custom param files with the given file_prefix.

Parameters:

file_prefix (str) – path and prefix used to identify where to load the saved trace for this model, e.g. given file_prefix = ‘path/to/file/’, this will attempt to load ‘path/to/file/trace.pickle’.
load_custom_params (bool (defaults to False)) – flag to indicate whether custom parameters should be loaded

Returns:	custom_params
Return type:	Dictionary of custom parameters

predict(X, return_std=False, num_ppc_samples=2000)

Predicts values of new data with a trained Linear Regression model

Parameters:

X (numpy array) – shape [num_training_samples, num_pred]
return_std (bool (defaults to False)) – flag of whether to return standard deviations with mean values
num_ppc_samples (int (defaults to 2000)) – ‘samples’ parameter passed to pm.sample_ppc

save(file_prefix)

Saves the trace and custom params to files with the given file_prefix.

Parameters:

file_prefix (str) – path and prefix used to identify where to save the trace for this model, e.g. given file_prefix = ‘path/to/file/’, this will attempt to save to ‘path/to/file/trace.pickle’.
custom_params (dict (defaults to None)) – custom parameters to save

score(X, y, num_ppc_samples=2000)

Scores new data with a trained model with sklearn’s r2_score.

Parameters:

X (numpy array) – shape [num_training_samples, num_pred]
y (numpy array) – shape [num_training_samples, ]
num_ppc_samples (int (defaults to 2000)) – ‘samples’ parameter passed to pm.sample_ppc
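For reference, the R² score compares the squared error of the predictions with the squared error of always predicting the mean of y. The sketch below reimplements that definition in plain Python for illustration only; `r2` is a hypothetical helper, not the sklearn function, and it does not handle the special case of a constant y.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual error
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # error of the mean predictor
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 3.0]
r2_value = r2(y_true, y_pred)  # SS_res = 1.0, SS_tot = 5.0
```

A score of 1 means perfect predictions, 0 means no better than predicting the mean, and negative values mean worse than that.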

pymc3_models.models.NaiveBayes module

class pymc3_models.models.NaiveBayes.GaussianNaiveBayes

Bases: pymc3_models.models.BayesianModel

Naive Bayes classification built using PyMC3.

The Gaussian Naive Bayes algorithm assumes that the random variables that describe each class and each feature are independent and distributed according to Normal distributions.

Example

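>>> import numpy as np
>>> import pymc3_models as pmo
>>>
>>> # Toy data: 100 points with 2 features, labelled by the sign of the first feature
>>> X = np.random.randn(100, 2)
>>> y = (X[:, 0] > 0).astype(int)
>>>
>>> model = pmo.GaussianNaiveBayes()
>>> model.fit(X, y)
>>> model.predict_proba(X)
>>> model.predict(X)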

See the documentation of the create_model method for details on the model itself.

Methods

`create_model`()	Creates and returns the PyMC3 model.
`fit`(X, y[, inference_type, …])	Train the Naive Bayes model.
`get_params`([deep])	Get parameters for this estimator.
`plot_elbo`()	Plot the ELBO values after running ADVI minibatch.
`predict`(X)	Classify new data with a trained Naive Bayes model.
`predict_proba`(X)	Predicts the probabilities that the data points belong to each category.
`score`(X, y)	Scores new data with a trained model with sklearn’s accuracy_score.
`set_params`(**params)	Set the parameters of this estimator.

load
save

create_model()

Creates and returns the PyMC3 model.

We denote by \(x_{jc}\) the value of the j-th element of the data vector \(x\), conditioned on \(x\) belonging to the class \(c\). The Gaussian Naive Bayes algorithm models \(x_{jc}\) as:

\[x_{jc} \sim Normal(\mu_{jc}, \sigma_{jc})\]

While the probability that \(x\) belongs to the class \(c\) is given by the categorical distribution:

\[P(y=c|x_i) = Cat(\pi_1, \dots, \pi_C)\]

where \(\pi_i\) is the probability that a vector belongs to category \(i\).

We assume that the \(\pi_i\) follow a Dirichlet distribution:

\[\pi \sim Dirichlet(\alpha)\]

with hyperparameter \(\alpha = [1, \dots, 1]\). The \(\mu_{jc}\) are sampled from a Normal distribution centred on \(0\) with variance \(100\), and the \(\sigma_{jc}\) are sampled from a HalfNormal distribution of variance \(100\):

\[\begin{aligned}\mu_{jc} &\sim Normal(0, 100)\\ \sigma_{jc} &\sim HalfNormal(100)\end{aligned}\]

Note that the Gaussian Naive Bayes model is equivalent to a Gaussian mixture with a diagonal covariance [1].
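The generative story above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PyMC3 model: `sample_prior` and `sample_point` are hypothetical helpers, and the value 100 is read as a variance (so a scale of 10 is used for both the Normal and the HalfNormal draws), following the wording of the text.

```python
import math
import random


def sample_prior(num_cats, num_pred, rng):
    """Draw pi, mu and sigma from the priors described above."""
    # A Dirichlet(1, ..., 1) draw is a set of normalised Gamma(1, 1) draws.
    gammas = [rng.gammavariate(1.0, 1.0) for _ in range(num_cats)]
    total = sum(gammas)
    pi = [g / total for g in gammas]
    scale = math.sqrt(100.0)  # assumption: 100 is a variance, so the scale is 10
    mu = [[rng.gauss(0.0, scale) for _ in range(num_pred)]
          for _ in range(num_cats)]
    # The absolute value of a zero-mean Normal draw is a HalfNormal draw.
    sigma = [[abs(rng.gauss(0.0, scale)) for _ in range(num_pred)]
             for _ in range(num_cats)]
    return pi, mu, sigma


def sample_point(pi, mu, sigma, rng):
    """Draw a class c from Cat(pi), then each feature from Normal(mu_jc, sigma_jc)."""
    c = rng.choices(range(len(pi)), weights=pi, k=1)[0]
    x = [rng.gauss(m, s) for m, s in zip(mu[c], sigma[c])]
    return c, x


rng = random.Random(0)  # fixed seed so the sketch is repeatable
pi, mu, sigma = sample_prior(num_cats=3, num_pred=2, rng=rng)
c, x = sample_point(pi, mu, sigma, rng)
```

Each feature is drawn independently given the class, which is the "naive" assumption and the reason the model corresponds to a mixture with diagonal covariance.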

Returns:
Return type:	A PyMC3 model

References

[1]	Murphy, K. P. (2012). Machine learning: a probabilistic perspective.

fit(X, y, inference_type='advi', num_advi_sample_draws=10000, minibatch_size=None, inference_args=None)

Train the Naive Bayes model.

Parameters:

X (numpy array) – shape [num_training_samples, num_pred]. Contains the data points
y (numpy array) – shape [num_training_samples,]. Contains the category of the data points
inference_type (str (defaults to ‘advi’)) – specifies which inference method to call. Currently, only ‘advi’ and ‘nuts’ are supported.
num_advi_sample_draws (int (defaults to 10000)) – number of samples to draw from the ADVI approximation after it has been fit; not used if inference_type != ‘advi’
minibatch_size (int (defaults to None)) – number of samples to include in each minibatch for ADVI. If None, minibatch is not run.
inference_args (dict (defaults to None)) – arguments to be passed to the inference methods. Check the PyMC3 docs for permissible values. If None, default values will be set.

Returns:
Return type:	The current instance of the GaussianNaiveBayes class.

load(file_profile)

Loads a saved version of the trace, and custom param files with the given file_prefix.

Parameters:

file_prefix (str) – path and prefix used to identify where to load the saved trace for this model, e.g. given file_prefix = ‘path/to/file/’, this will attempt to load ‘path/to/file/trace.pickle’.
load_custom_params (bool (defaults to False)) – flag to indicate whether custom parameters should be loaded

Returns:	custom_params
Return type:	Dictionary of custom parameters

predict(X)

Classify new data with a trained Naive Bayes model. The output is the point estimate of the posterior predictive distribution that corresponds to the zero-one loss function, i.e. the most probable class.

Parameters:	X (numpy array) – shape [num_training_samples, num_pred]. Contains the data to classify
Returns:	A numpy array of shape [num_training_samples,] that contains the predicted class to which the data points belong.

predict_proba(X)

Predicts the probabilities that the data points belong to each category.

Given a new data point \(\vec{x}\), we want to estimate the probability that it belongs to a category \(c\). Following the notations in [1], the probability reads:

\[P(y=c|\vec{x}, \mathcal{D}) = P(y=c|\mathcal{D}) \prod_{j=1}^{n_{dims}} \ P(x_j|y=c, \mathcal{D})\]

We previously used the data \(\mathcal{D}\) to estimate the distribution of the parameters \(\vec{\mu}\), \(\vec{\pi}\) and \(\vec{\sigma}\). To compute the above probability, we need to integrate over the values of these parameters:

\[P(y=c|\vec{x}, \mathcal{D}) = \left[\int Cat(y=c|\vec{\pi})P(\vec{\pi}|\ \mathcal{D})\mathrm{d}\vec{\pi}\right] \int P(\vec{x}|\vec{\mu}, \vec{\sigma})\ P(\vec{\mu}|\mathcal{D})\ P(\vec{\sigma}|\mathcal{D})\ \mathrm{d}\vec{\mu}\mathrm{d}\vec{\sigma}\]
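One common way to approximate such integrals is to average over posterior draws of the parameters. The sketch below does this for a single point in plain Python. It is an illustration, not the library's implementation: `predict_proba_one` and its arguments are hypothetical, and the final normalisation across classes is an assumption about how the product is turned into probabilities that sum to one.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma) at x, with sigma a standard deviation."""
    return (math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
            / (sigma * math.sqrt(2.0 * math.pi)))


def predict_proba_one(x, pi_draws, mu_draws, sigma_draws):
    """Monte Carlo estimate of P(y=c | x, D) for one point x.

    pi_draws[s][c], mu_draws[s][c][j] and sigma_draws[s][c][j] are
    posterior draws indexed by draw s, class c and feature j.
    """
    num_draws = len(pi_draws)
    scores = []
    for c in range(len(pi_draws[0])):
        # First integral: average of pi_c over the draws.
        prior = sum(pi[c] for pi in pi_draws) / num_draws
        # Second integral: average over the draws of the product of
        # per-feature Normal densities.
        likelihood = sum(
            math.prod(normal_pdf(xj, mu[c][j], sigma[c][j])
                      for j, xj in enumerate(x))
            for mu, sigma in zip(mu_draws, sigma_draws)
        ) / num_draws
        scores.append(prior * likelihood)
    total = sum(scores)
    return [s / total for s in scores]  # assumed normalisation over classes


# Two hypothetical posterior draws, two classes, one feature.
pi_draws = [[0.5, 0.5], [0.5, 0.5]]
mu_draws = [[[-0.5], [3.5]], [[0.5], [4.5]]]
sigma_draws = [[[1.0], [1.0]], [[1.0], [1.0]]]

probs_mid = predict_proba_one([2.0], pi_draws, mu_draws, sigma_draws)
probs_near0 = predict_proba_one([0.0], pi_draws, mu_draws, sigma_draws)
```

With many features the product of densities can underflow, so a practical version would work with log densities; the direct product is kept here to mirror the formula.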

Parameters:	X (numpy array) – shape [num_training_samples, num_pred]. Contains the points for which we want to predict the class
Returns:	A numpy array of shape [num_training_samples, num_cats] that contains the probabilities that each sample belongs to each category.

References

[1]	Murphy, K. P. (2012). Machine learning: a probabilistic perspective.

save(file_prefix)

Saves the trace and custom params to files with the given file_prefix.

Parameters:

file_prefix (str) – path and prefix used to identify where to save the trace for this model, e.g. given file_prefix = ‘path/to/file/’, this will attempt to save to ‘path/to/file/trace.pickle’.
custom_params (dict (defaults to None)) – custom parameters to save

score(X, y)

Scores new data with a trained model with sklearn’s accuracy_score.

Parameters:

X (numpy array) – shape [num_training_samples, num_pred]. Contains the data points
y (numpy array) – shape [num_training_samples,]. Contains the category of the data points

Returns:
Return type:	A float representing the accuracy score of the predictions.

Module contents

class pymc3_models.models.BayesianModel

Bases: sklearn.base.BaseEstimator

Bayesian model base class

Methods

`get_params`([deep])	Get parameters for this estimator.
`load`(file_prefix[, load_custom_params])	Loads a saved version of the trace, and custom param files with the given file_prefix.
`plot_elbo`()	Plot the ELBO values after running ADVI minibatch.
`save`(file_prefix[, custom_params])	Saves the trace and custom params to files with the given file_prefix.
`set_params`(**params)	Set the parameters of this estimator.

create_model
fit
predict
score

create_model()

fit()

load(file_prefix, load_custom_params=False)

Loads a saved version of the trace, and custom param files with the given file_prefix.

Parameters:

file_prefix (str) – path and prefix used to identify where to load the saved trace for this model, e.g. given file_prefix = ‘path/to/file/’, this will attempt to load ‘path/to/file/trace.pickle’.
load_custom_params (bool (defaults to False)) – flag to indicate whether custom parameters should be loaded

Returns:	custom_params
Return type:	Dictionary of custom parameters

plot_elbo()

Plot the ELBO values after running ADVI minibatch.

predict()

save(file_prefix, custom_params=None)

Saves the trace and custom params to files with the given file_prefix.

Parameters:

file_prefix (str) – path and prefix used to identify where to save the trace for this model, e.g. given file_prefix = ‘path/to/file/’, this will attempt to save to ‘path/to/file/trace.pickle’.
custom_params (dict (defaults to None)) – custom parameters to save
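To make the file_prefix convention concrete, here is a minimal sketch of a save and load round trip using the standard pickle module. It is an assumption-laden illustration, not the library's code: `save_trace` and `load_trace` are hypothetical helpers, the file name ‘params.pickle’ is invented for the sketch, and a plain dict stands in for a real trace. Unlike the documented load, the helper returns both the trace and the custom parameters so the round trip can be checked.

```python
import os
import pickle
import tempfile


def save_trace(file_prefix, trace, custom_params=None):
    """Pickle the trace, and optionally the custom params, under file_prefix."""
    with open(file_prefix + "trace.pickle", "wb") as f:
        pickle.dump(trace, f)
    if custom_params is not None:
        # 'params.pickle' is a hypothetical file name chosen for this sketch.
        with open(file_prefix + "params.pickle", "wb") as f:
            pickle.dump(custom_params, f)


def load_trace(file_prefix, load_custom_params=False):
    """Read back what save_trace wrote; custom params only on request."""
    with open(file_prefix + "trace.pickle", "rb") as f:
        trace = pickle.load(f)
    custom_params = None
    if load_custom_params:
        with open(file_prefix + "params.pickle", "rb") as f:
            custom_params = pickle.load(f)
    return trace, custom_params


with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "")  # ends with a separator, like 'path/to/file/'
    save_trace(prefix, {"alpha": [0.1, 0.2]}, custom_params={"num_cats": 2})
    trace, params = load_trace(prefix, load_custom_params=True)
```

Because the prefix is concatenated with the file name directly, a prefix without a trailing separator such as ‘path/to/run1_’ would produce ‘path/to/run1_trace.pickle’ in this sketch.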

score()