Documentation

bp.bayes

class BayesPowerlaw.bayes(data, gamma_range=[1.01, 7.99], xmin=None, xmax=None, discrete=True, niters=10000, sigma=[0.05, 0.05], sigma_burn=[1.0, 1.0], burn_in=1000, prior='jeffrey', mixed=1, fit=True)[source]

This function fits the data to powerlaw distribution and outputs the exponent using Bayesian inference (markov chain monte carlo metropolis-hastings algorithm). If the data consists of mixture more than 1 powerlaws, if specified the function will identify the mixture of exponents as well as the weights each exponent carries.

Parameters:
  • data ((list or np.array of numbers)) – An array of data from the powerlaw that is being fitted here y=x^(-gamma)/Z. All values must be integers or floats from 1 to infinity.
  • gamma_range (([float>1,float<10])) – The first value in the list indicates the lowest possible exponent, which will be used to start the algorithm. The second value is the largest possible exponent. The algorithm will reject any exponent values higher than that. Default is [1.01,6.0], since exponent of 1 and lower is mathematically invalid, and exponents above 6 are rare.
  • xmin ((int or float >=1)) – The lowest value from the data included in the powerlaw fit. Default value is “None”, in which case the minimum value observed in the data is used.
  • xmax ((int or float >= xmin)) – The highest value from the data included in the powerlaw fit. Default value is “None”, in which case the maximum value observed in the data is used. To set xmax to infinity use xmax=np.infty.
  • discrete ((bool)) – Whether or not the powerlaw is discrete or continuous. Default True, in which case powerlaw is assumed to be discrete. The distinction is important when estimating partition function Z.
  • niters ((int)) – Number of MCMC iterations performed. The default is 10000.
  • sigma (([float>0,float>0])) – Standard deviation of the sampling step size during MCMC for both gamma (first value) and weight (second value). Samples are taken from a gaussian distribution with a mean of zero. Default value is 0.05 for both, which is seen to perform with good efficiency.
  • sigma_burn (([float>0,float>0])) – Standard deviation of the sampling step size during MCMC burn in for both gamma (first value) and weight (second value). Samples are taken from a gaussian distribution with a mean of zero. Default value is 1.0 for both, which quickly travels to a correct value range.
  • burn_in (int) – Number of MCMC iterations performs during burn in. The results from burn in are discarded from a final posterior distribution. The default is 1000.
  • prior ((string)) – Prior type for performing Bayesian inference. Possible values: ‘jeffrey’ : derived Jeffrey’s prior for fitting powerlaw distributions (default). ‘flat’ : flat prior within the specified gamma range.
  • mixed ((int>0)) – Number of distinct powerlaws the dataset is thought to consist of. Default is 1.
  • fit ((bool)) – Whether or not to perform fitting while creating the object. Not necessary if only desired to plot a power law with now fit. Default True.
Variables:
  • n – Sample size of the data. (int>5)
  • params – Number of parameters fitted (mixed*2-1), that is number of exponents (mixed) + number of weights (mixed-1), where the last weight is 1-[weights]. (int)
  • niters – Number of iterations in the main MCMC algorithm. Scales with number of parameters (niters x params). (int)
  • burn – Number of iterations in the burn in MCMC algorithm. Scales with number of parameters (burn_in x params). (int)
  • gammas – An array of 10000 gamma values within the given gamma_range for plotting prior. (1D np.array)
  • weight – An array of 100 weight values from 0 to 1 for plotting prior. (1D np.array)
  • sigma_g – Standard deviation of the sampling step size during MCMC for gamma. (float>0)
  • sigma_w – Standard deviation of the sampling step size during MCMC for weight. (float>0)
  • sigma_burn_g – Standard deviation of the sampling step size during MCMC burn in for gamma. (float>0)
  • sigma_burn_w – Standard deviation of the sampling step size during MCMC burn in for weight. (float>0)
  • prior_gamma – An array of probabilities of each value in gammas for plotting prior. (1D np.array)
  • prior_weight – An array of probabilities of each value in weights for plotting prior. (1D np.array)
  • gamma_posterior – A 2D array of accepted exponents in each iteration after burn in. Rows - each powerlaw in the mixture. Columns - accepted exponent each iteration. (2D np.array)
  • weight_posterior – A 2D array of accepted weights in each iteration after burn in. Rows - each powerlaw in the mixture. Columns - accepted weight each iteration. (2D np.array)
L(gamma_params, weight_params)[source]

This function calculates the log likelihood given target exponent and weight values.

Parameters:
  • gamma_params ((1D np.array)) – Array of randomly sampled target exponents for each powerlaw in the data.
  • weight_params ((1D np.array)) – Array of randomly sampled target weights for each powerlaw in the data.
Returns:

Return type:

Calculated log likelihood value.

Z(gamma)[source]

Partition function Z for discrete and continuous powerlaw distributions.

Parameters:gamma ((float)) – Randomly sampled target exponent.
Returns:Partition value.
Return type:s
Z_jeffrey(gamma)[source]

This function calculates Jeffrey’s prior for a given exponent.

Parameters:gamma ((float)) – Randomly sampled target exponent.
Returns:
Return type:Calculated Jeffrey’s prior.
Z_prime(gamma)[source]

This function calculates first differential of partition function Z.

Parameters:gamma ((float)) – Randomly sampled target exponent.
Returns:
Return type:First differential of the Z function.
Z_prime2(gamma)[source]

This function calculates second differential of partition function Z.

Parameters:gamma ((float)) – Randomly sampled target exponent.
Returns:
Return type:Second differential of the Z function.
bic()[source]

This function calculates the Bayesian Information Criteria for determining the number of parameters most optimal for fitting the dataset.

Parameters:
  • samples_gamma ((2D np.array)) – accepted exponents in each iteration after burn in. Rows - each powerlaw in the mixture. Columns - accepted exponent each iteration.
  • samples_weight ((2D np.array)) – accepted weights in each iteration after burn in. Rows - each powerlaw in the mixture. Columns - accepted weight each iteration.
Returns:

Bayesian Information Criteria (BIC) value.

Return type:

b

burn_in(gamma_params, weight_params)[source]

This function preforms the burn in part of the MCMC algorithm that will get the initial values to the roughly correct range for fitting. The parameters accepted during burn in are not included in the final samples array.

Parameters:
  • gamma_params ((1D np.array)) – Array of randomly sampled target exponents for each powerlaw in the data.
  • weight_params ((1D np.array)) – Array of randomly sampled target weights for each powerlaw in the data.
Returns:

  • gamma_params – 1D array of accepted exponent values.
  • weight_params – 1D array of accepted weight values.

log_prior(gamma)[source]

This function calculates prior given target exponent and prior model.

Parameters:gamma ((float)) – Randomly sampled target exponent.
Returns:calculated log of prior.
Return type:prior_answer
monte_carlo(gamma_params, weight_params)[source]

This function preforms the full MCMC algorithm. The parameters accepted in this part of MCMC will be saved in the final samples array.

Parameters:
  • gamma_params ((1D np.array)) – Array of randomly sampled target exponents for each powerlaw in the data.
  • weight_params ((1D np.array)) – Array of randomly sampled target weights for each powerlaw in the data.
Returns:

  • gamma_params – 1D array of accepted exponent values.
  • weight_params – 1D array of accepted weight values.

plot_fit(gamma_mean, data_label=None, data_color=None, edge_color=None, fit_color=None, scatter_size=10, line_width=1, fit=True, log=True, xmin=None)[source]

Function for plotting the date as a power law distribution on a log log scale along with the best fit.

Parameters:
  • gamma_mean ((float)) – Final exponent used to generate the best fit curve. For best results use the mean of posterior samples.
  • data_label ((str)) – curve label.
  • data color ((str)) – color of the data scatter plot.
  • edge_color ((str)) – color of the scatter marker edge.
  • fit_color ((str)) – color of the best fit curve.
  • scatter_size ((int or float)) – scatter marker size.
  • line_width ((int or float)) – width of the best fit curve.
  • fit ((bool)) – Whether to plot the best fit curve or not (default True).
  • log ((bool)) – Whether to plot in the log scale or not (default True).
Returns:

Return type:

None.

plot_posterior(samples, bins=100, alpha=None, color=None, label=None, range=None, normed=True)[source]

Function for plotting posterior histogram.

Parameters:
  • samples ((1D array)) – an array of accepted exponents during the MCMC algorithm.
  • bins ((int)) – number of bins used for the histogram (default: 100).
  • alpha ((0<=float<=1)) – transparency of the histogram.
  • color ((str)) – color of the histogram.
  • label ((str)) – label of the histogram.
  • range ((tuple)) – range of histogram’s X axis.
  • normed ((bool)) – whether the histogram is normalized or not (default: True)
Returns:

Return type:

None.

plot_prior(color=None, label=None)[source]

Function for plotting prior for gammas.

Parameters:
  • color ((str)) – color of the curve
  • label ((str)) – label of the curve
Returns:

Return type:

None.

posterior()[source]

A master function that executes burn in and monte carlo algorithms while storing the accepted parameter values into the final samples array.

Parameters:None.
Returns:
  • samples_gamma – A 2D array of accepted exponents in each iteration after burn in. Rows - each powerlaw in the mixture. Columns - accepted exponent each iteration.
  • samples_weight – A 2D array of accepted weights in each iteration after burn in. Rows - each powerlaw in the mixture. Columns - accepted weight each iteration.
powerlawpdf(final_gamma, xmin=None)[source]

The power law probability function for generating the best fit curve.

Parameters:final_gamma ((float)) – Final exponent used to generate the best fit curve. For best results use the mean of posterior samples.
Returns:
  • xp ((1D array)) – array of X’s arranged from xmin to xmax of the data.
  • yp ((1D array)) – array of probabilities for each X given the final exponent.
random_walk(gamma_params, weight_params, sigma_g, sigma_w)[source]

This function samples new target exponent and weight values using sample_new function and calculates acceptance values to compare the initial parameter values to the newly sampled ones.

Parameters:
  • gamma_params ((1D np.array)) – Array of randomly sampled target exponents for each powerlaw in the data.
  • weight_params ((1D np.array)) – Array of randomly sampled target weights for each powerlaw in the data.
  • sigma_g ((float>0)) – Standard deviation of gamma step size during random sampling.
  • sigma_w ((float>0)) – Standard deviation of weight step size during random sampling.
Returns:

  • a – acceptance value to compare which parameter values are more likely to result in a more accurate fit.
  • gamma_params_p – 1D array of randomly sampled exponents.
  • weight_params_p – 1D array of randomly sampled weights.

sample_new(gamma_params, weight_params, sigma_g, sigma_w)[source]

This function performs random sampling of gammas and weights given the initial gamma and weight values.

Parameters:
  • gamma_params ((1D np.array)) – Array of randomly sampled target exponents for each powerlaw in the data.
  • weight_params ((1D np.array)) – Array of randomly sampled target weights for each powerlaw in the data.
  • sigma_g ((float>0)) – Standard deviation of the sampling step size during MCMC for gamma.
  • sigma_w ((float>0)) – Standard deviation of the sampling step size during MCMC for weights.
Returns:

  • gamma_params_p – a 1D array of newly sampled exponent values.
  • weight_params_p – a 1D array of newly sampled weight values.

target(gamma_params, weight_params)[source]

This function calculates target values for comparing existing exponents and weight to newly samples exponents and weights. Target for given parameters is equal to log of (likelihood x prior).

Parameters:
  • gamma_params ((1D np.array)) – Array of randomly sampled target exponents for each powerlaw in the data.
  • weight_params ((1D np.array)) – Array of randomly sampled target weights for each powerlaw in the data.
Returns:

calculated target value.

Return type:

p

weight_prior(weight)[source]

This function calculates prior given target weight and flat prior.

Parameters:gamma ((float)) – Randomly sampled target weight.
Returns:calculated log of prior.
Return type:prior_answer

bp.maxLikelihood

class BayesPowerlaw.maxLikelihood(data, initial_guess=[1, 6, 10], xmin=None, xmax=None, discrete=True)[source]

This function fits the data to powerlaw distribution and outputs the exponent using the maximum likelihood and Newton-Raphson approach.

Parameters:
  • data ((list or np.array of numbers)) – An array of data from the powerlaw that is being fitted here y=x^(-gamma)/Z. All values must be integers or floats from 1 to infinity.
  • initial_guess ((list [lower range>=1, upper range, number of initial guesses])) – a list for generating a 1D array containing a variation of initial guesses for Newton-Raphson algorithm.
  • min ((int or float >=1)) – The lowest value from the data included in the powerlaw fit. Default value is “None”, in which case the minimum value observed in the data is used.
  • xmax ((int or float >= xmin)) – The highest value from the data included in the powerlaw fit. Default value is “None”, in which case the maximum value observed in the data is used. To set xmax to infinity use xmax=np.infty.
  • discrete ((bool)) – Whether or not the powerlaw is discrete or continuous. Default True, in which case powerlaw is assumed to be discrete. The distinction is important when estimating partition function Z.
Variables:
  • n – Sample size of the data. (int>5)
  • constant – a constant calculated to be added to 1st order Z differential prior to preforming Newton-Raphson algorithm. (float)
  • initial_guess – an array of initial guesses used in Newton-Raphson algorithm. (1D array)
F(gamma)[source]

The optimization function.

Parameters:gamma ((float)) – exponent guess.
Returns:
Return type:First order Z differential plus constant attribute.
Guess()[source]

Master function that performs Newton-Raphson algorithm and determins best exponent guess via maximum likelihood.

Parameters:None.
Returns:best_guess – an array containing best exponent guesses given the maximum likelihood. Length of array will be more than 1 if same likelihood value is associated with more than one exponent guess.
Return type:(1D array)
Z(gamma)[source]

Partition function Z for discrete and continuous powerlaw distributions.

Parameters:gamma ((float)) – exponent guess.
Returns:Partition value.
Return type:s

bp.power_law

BayesPowerlaw.power_law(exponents, weights, xmax, sample_size, xmin=1, discrete=True)[source]

This function simulates a dataset that follows a powerlaw distribution with a given exponent and xmax.

Parameters:
  • exponents ((array of floats>1)) – array of exponents of the mixed powerlaw distribution equation. Array length is equal to the simulated mixture size. e.g. for single power law, array will contain only one value, mixture of 2 - two values, etc.
  • weights ((array of 0<floats<=1)) – array of weights of the mixed powerlaw distribution equation. Array length is equal to the simulated mixture size. e.g. for single power law, array will contain only one value that will be equal to 1, mixture of 2 - two values, etc. The sum of array must always be equal to 1.
  • xmax ((int or float > xmin)) – the maximum possible x value in the simulated dataset.
  • sample_size ((int>5)) – samples size of the simulated dataset.
  • xmin ((int or float >=1)) – the minimum possible x value in the simulated dataset (default: 1).
  • discrete ((bool)) – whether the simulated powerlaw contains discrete (True) or continuous (False) values (default: True)
Returns:

Return type:

1D array of X’es that has a size of sample_size parameter.