Classes
struct	npl::RegrResult

class	npl::StudentsT
	Student's T-distribution. A cache of the Probability Density Function and cumulative density function is created using the analytical PDF. More...

class	npl::Classifier
	Base class for all ND classifiers. More...

class	npl::KMeans
	K-means classifier. More...

class	npl::ExpMax
	Expectation Maximization With Gaussian Mixture Model. More...

Functions
double	npl::sample_var (const Ref< const VectorXd > vec)
	Computes the statistical variance of a column vector. More...

void	npl::regressOutLS (VectorXd &signal, const MatrixXd &X, const MatrixXd &covInv)
	Removes the effects of X from signal (y). Note that this takes both X and the pseudoinverse of X because the bulk of the computation will be on the pseudoinverse. More...

template<typename T >
void	npl::fillGaussian (Ref< T > m)

double	npl::gaussianPDF (double mean, double sd, double x)
	1D Gaussian distribution function More...

double	npl::gaussianCDF (double mean, double sd, double x)
	1D Gaussian cumulative distribution function More...

double	npl::gammaPDF_MS (double mean, double sd, double x)
	PDF for the gamma distribution, if mean is negative then it is assumed that x should be negated as well. Unlike gammaPDF, this takes the mean and standard deviation (_MS) More...

double	npl::mutualInformation (size_t len, double a, double b, size_t mbin)
	Computes mutual information between signal a and signal b which are of length len. Marginal bins used is mbin. More...

double	npl::correlation (size_t len, double a, double b)
	Computes correlation between signal a and signal b which are of length len. More...

double	npl::sample_var (int count, double sum, double sumsqr)
	Takes a count, sum and sumsqr and returns the sample variance. This is slightly different than the variance definition and I can never remember the exact formulation. More...

double	npl::sample_corr (int count, double sum1, double sum2, double sumsq1, double sumsq2, double s1s2)
	Computes the sample correlation. More...

void	npl::regress (RegrResult *out, const Ref< const VectorXd > y, const Ref< const MatrixXd > X, const Ref< const VectorXd > covInv, const Ref< const MatrixXd > Xinv, const StudentsT &distrib)
	Computes the Ordinary Least Square predictors, beta for. More...

void	npl::regress (RegrResult *out, const Ref< const VectorXd > y, const Ref< const MatrixXd > X)
	Computes the Ordinary Least Square predictors, beta for. More...

void	npl::randomizePowerIterationSVD (const Ref< const MatrixXd > A, size_t subsize, size_t poweriters, MatrixXd &U, VectorXd &E, MatrixXd &V)

void	npl::randomizePowerIterationSVD (const Ref< const MatrixXd > A, double tol, size_t startrank, size_t maxrank, size_t poweriters, MatrixXd &U, VectorXd &E, MatrixXd &V)

MatrixXd	npl::pca (const Ref< const MatrixXd > X, double varth=1, int odim=-1)
	Computes the Principal Components of input matrix X. More...

MatrixXd	npl::rpiPCA (const Ref< const MatrixXd > X, double varth, int odim)
	Computes the Principal Components of input matrix X using the randomized power iteration SVD algorithm. More...

MatrixXd	npl::symICA (const Ref< const MatrixXd > Xin, MatrixXd *unmix=NULL)
	Computes the Independent Components of input matrix X using symmetric decorrlation. Note that you should run PCA on X before running ICA. More...

MatrixXd	npl::asymICA (const Ref< const MatrixXd > Xin, MatrixXd *unmix=NULL)
	Computes the Independent Components of input matrix X using sequential component extraction. Note that you should run PCA on X before running ICA. More...

MatrixXd	npl::pseudoInverse (const Ref< const MatrixXd > X)
	Computes the pseudoinverse of the input matrix. More...

VectorXd	npl::shootingRegr (const Ref< const MatrixXd > X, const Ref< const VectorXd > y, double gamma)
	Performs LASSO regression using the 'shooting' algorithm of. More...

VectorXd	npl::activeShootingRegr (const Ref< const MatrixXd > X, const Ref< const VectorXd > y, double gamma)
	Performs LASSO regression using the 'activeShooting' algorithm of. More...

MatrixXd	npl::pcacov (const Ref< const MatrixXd > cov, double varth)
	Computes the Principal Components of input matrix X using the covariance matrix. More...

void	npl::expMax1D (const Ref< const VectorXd > data, vector< std::function< double(double, double, double)>> pdfs, Ref< VectorXd > mean, Ref< VectorXd > sd, Ref< VectorXd > prior, std::string plotfile="")
	Computes the mean and standard deviation of multiple distributions based on 1D data. The probability distribution functions should be passed in through a vector of function objects (pdfs) taking mu/sd/x. More...

void	npl::gaussGammaMixtureModel (const Ref< const VectorXd > data, Ref< VectorXd > mu, Ref< VectorXd > sd, Ref< VectorXd > prior, std::string plotfile)
	Computes the mean and standard deviation of multiple distributions based on 1D data. This is a special version in which a negative gamma, a gaussian and positive gamma. The means of the negative and positive gamma are relative to the center of the gaussian. More...

void	npl::approxKMeans (const Ref< const MatrixXd > samples, size_t nclass, MatrixXd &means)
	Approximates k-means using the algorithm of: More...

int	npl::fastSearchFindDP (const Eigen::MatrixXf &samples, double thresh, double outthresh, Eigen::VectorXi &classes, bool brute=false)
	Algorithm of unsupervised learning (clustering) based on density. More...

int	npl::findDensityPeaks (const Eigen::MatrixXf &samples, double thresh, Eigen::VectorXf &rho, Eigen::VectorXf &delta, Eigen::VectorXi &parent)
	Computes Density and Peak computation for Fast Search and Find of Density Peaks algorithm. More...

int	npl::findDensityPeaks_brute (const Eigen::MatrixXf &samples, double thresh, Eigen::VectorXf &rho, Eigen::VectorXf &delta, Eigen::VectorXi &parent)
	Computes Density and Peak computation for Fast Search and Find of Density Peaks algorithm. This is a slower, non-bin based version. More...

Detailed Description

Function Documentation

VectorXd npl::activeShootingRegr	(	const Ref< const MatrixXd >	X,
		const Ref< const VectorXd >	y,
		double	gamma
	)

Performs LASSO regression using the 'activeShooting' algorithm of.

Peng, J., Wang, P., Zhou, N., & Zhu, J. (2009). Partial Correlation Estimation by Joint Sparse Regression Models. Journal of the American Statistical Association, 104(486), 735–746. doi:10.1198/jasa.2009.0126

Essentially solves the equation:

y = X * beta

where beta is mostly 0's

Parameters

X	Design matrix
y	Measured value
gamma	Weight of regularization (larger values forces sparser model)

Returns: Beta vector

void npl::approxKMeans	(	const Ref< const MatrixXd >	samples,
		size_t	nclass,
		MatrixXd &	means
	)

Approximates k-means using the algorithm of:

'Fast Approximate k-Means via Cluster Closures' by Wang et al

This is not really meant to be used by outside users, but as an initializer

Parameters

samples	Matrix of samples, one sample per row,
nclass	Number of classes to break samples up into
means	Estimated mid-points/means

MatrixXd npl::asymICA	(	const Ref< const MatrixXd >	Xin,
		MatrixXd *	unmix = `NULL`
	)

Computes the Independent Components of input matrix X using sequential component extraction. Note that you should run PCA on X before running ICA.

Outputs reduced dimension (fewer cols) in output

Note that this whole problem is the transpose of the version listed on wikipedia.

In: I number of components In: X RxC matrix with rows representing C-D samples for p in 1 to I: wp = random weight while wp changes: wp = g(X wp)X/R - wp SUM(g'(X wp))/R wp = wp - SUM wp^T wj wj normalize(wp)

Output: W = [w0 w1 w2 ... ] Output: S = XW, where each column is a dimension, each row a sample

Parameters

Xin	RxC matrix where each row is a sample, each column a dimension (or feature). The number of columns in the output will be fewer because there will be fewer features. Columns should be zero-mean and uncorrelated with one another.
unmix	output unmixing matrix, can be null if you don't need it

Returns: RxP matrix, where P is the number of independent components

double npl::correlation	(	size_t	len,
		double *	a,
		double *	b
	)

Computes correlation between signal a and signal b which are of length len.

Parameters

len	Length of signal and and b
a	Signal a
b	Signal b

Returns

void npl::expMax1D	(	const Ref< const VectorXd >	data,
		vector< std::function< double(double, double, double)>>	pdfs,
		Ref< VectorXd >	mean,
		Ref< VectorXd >	sd,
		Ref< VectorXd >	prior,
		std::string	plotfile = `""`
	)

Computes the mean and standard deviation of multiple distributions based on 1D data. The probability distribution functions should be passed in through a vector of function objects (pdfs) taking mu/sd/x.

Parameters

data	Data points to fit
pdfs	Probability distribution functions of the form pdf(mu, sd, x) and returning the probability density at x
mean	Output mean of each distribution, value when called will be used for initialization so it should be pre-set and pre-allocated
sd	Output standard deviation of each distribution, value when called will be used for initialization so it should be pre-set and pre-allocated
prior	Output prior probability of each distribution, initial value is not used but it should be pre-allocated
plotfile	file to plot histogram in (svg)

int npl::fastSearchFindDP	(	const Eigen::MatrixXf &	samples,
		double	thresh,
		double	outthresh,
		Eigen::VectorXi &	classes,
		bool	brute = `false`
	)

Algorithm of unsupervised learning (clustering) based on density.

see "Clustering by fast search and find of density peaks" by Rodriguez, a. Laio, A.

Parameters

samples	Samples, S x D matrix with S is the number of samples and D is the dimensionality. This must match the internal dimension count.
thresh	Threshold distance for density calculation
outthresh	threshold for outlier, ratio of standard devation. Should be > 2 because you want to be well outside the center of the distribution.
classes	Output classes
brute	whether ther use slower brute force method for density calculation (for testing purposes only)

return -1 if maximum number of iterations hit, 0 otherwise (converged)

template<typename T >

void npl::fillGaussian ( Ref< T > m )

Definition at line 60 of file statistics.h.

int npl::findDensityPeaks	(	const Eigen::MatrixXf &	samples,
		double	thresh,
		Eigen::VectorXf &	rho,
		Eigen::VectorXf &	delta,
		Eigen::VectorXi &	parent
	)

Computes Density and Peak computation for Fast Search and Find of Density Peaks algorithm.

Parameters

samples	Samples, S x D matrix with S is the number of samples and D is the dimensionality. This must match the internal dimension count.
thresh	Threshold for density calculation
rho	Point densities
delta	Distance to nearest peak
parent	Index (point) that is the nearest peak

Returns: 0 if successful

int npl::findDensityPeaks_brute	(	const Eigen::MatrixXf &	samples,
		double	thresh,
		Eigen::VectorXf &	rho,
		Eigen::VectorXf &	delta,
		Eigen::VectorXi &	parent
	)

Computes Density and Peak computation for Fast Search and Find of Density Peaks algorithm. This is a slower, non-bin based version.

Parameters

samples	Samples, S x D matrix with S is the number of samples and D is the dimensionality. This must match the internal dimension count.
thresh	Threshold for density calculation
rho	Point densities
delta	Distance to nearest peak
parent	Index (point) that is the nearest peak

Returns: 0 if successful

double npl::gammaPDF_MS	(	double	mean,
		double	sd,
		double	x
	)

PDF for the gamma distribution, if mean is negative then it is assumed that x should be negated as well. Unlike gammaPDF, this takes the mean and standard deviation (_MS)

mean = k theta var = k theta theta

theta = var / mean k = mean / theta

prob(mu, sd, x) = x^{k-1}exp(-x/theta)/(gamma(k) theta^k) log prob(mu, sd, x) = (k-1)log(x)-x/theta-log(gamma(k)) - k*log(theta)

Parameters

mean	Mean to compute density of
sd	Standard deviation of distribution
x	X position to compute density of (<0 produces 0 probability unless mu is negative

Returns: Probability Density

void npl::gaussGammaMixtureModel	(	const Ref< const VectorXd >	data,
		Ref< VectorXd >	mu,
		Ref< VectorXd >	sd,
		Ref< VectorXd >	prior,
		std::string	plotfile
	)

Computes the mean and standard deviation of multiple distributions based on 1D data. This is a special version in which a negative gamma, a gaussian and positive gamma. The means of the negative and positive gamma are relative to the center of the gaussian.

Parameters

data	Data points to fit
mu	Output mean of each distribution, value when called will be used for initialization so it should be pre-set and pre-allocated
sd	Output standard deviation of each distribution, value when called will be used for initialization so it should be pre-set and pre-allocated
prior	Output prior probability of each distribution, initial value is not used but it should be pre-allocated
plotfile	file to plot histogram in (svg)

double npl::gaussianCDF	(	double	mean,
		double	sd,
		double	x
	)

1D Gaussian cumulative distribution function

Parameters

mean	Mean of gaussian distribution
sd	Standard deviation
x	Position to get probability of

Returns: Probability of value falling below x

double npl::gaussianPDF	(	double	mean,
		double	sd,
		double	x
	)

1D Gaussian distribution function

Parameters

mean	Mean of gaussian distribution
sd	Standard deviation
x	Position to get probability of

Returns: Probability Density at Point

double npl::mutualInformation	(	size_t	len,
		double *	a,
		double *	b,
		size_t	mbin
	)

Computes mutual information between signal a and signal b which are of length len. Marginal bins used is mbin.

Parameters

len	Length of signal and and b
a	Signal a
b	Signal b
mbin	Bins to use in marginal distribution (mbin*mbin) used in joint

Returns

MatrixXd npl::pca	(	const Ref< const MatrixXd >	X,
		double	varth = `1`,
		int	odim = `-1`
	)

Computes the Principal Components of input matrix X.

Outputs reduced dimension (fewer cols) in output. Note that prior to this, the columns of X should be 0 mean otherwise the first component will be the mean

Parameters

X	RxC matrix where each column row is a sample, each column a dimension (or feature). The number of columns in the output will be fewer because there will be fewer features
varth	Variance threshold. This is the ratio (0-1) of variance to include in the output. This is used to determine the dimensionality of the output. If this is 1 then all variance will be included. If this < 1 and odim > 0 then whichever gives a larger output dimension will be selected.
odim	Threshold for output dimensions. If this is <= 0 then it is ignored, if it is > 0 then max(dim(varth), odim) is used as the output dimension.

Returns: RxP matrix, where P is the number of principal components

MatrixXd npl::pcacov	(	const Ref< const MatrixXd >	cov,
		double	varth
	)

Computes the Principal Components of input matrix X using the covariance matrix.

Outputs projections in the columns of a matrix.

Parameters

cov	Covariance matrix of XtX
varth	Variance threshold. Don't include dimensions after this percent of the variance has been explained.

Returns: RxP matrix, where P is the number of principal components

MatrixXd npl::pseudoInverse ( const Ref< const MatrixXd > X )

Computes the pseudoinverse of the input matrix.

$ P = UE^-1V^* $

Returns: Psueodinverse

void npl::randomizePowerIterationSVD	(	const Ref< const MatrixXd >	A,
		size_t	subsize,
		size_t	poweriters,
		MatrixXd &	U,
		VectorXd &	E,
		MatrixXd &	V
	)

Parameters

A	M x N
subsize	Columns in projection matrix,
poweriters	Number of power iterations to perform
U	Output Left Singular Vectors
E	OUtput Singular Values
V	Output Right Singular Vectors

void npl::randomizePowerIterationSVD	(	const Ref< const MatrixXd >	A,
		double	tol,
		size_t	startrank,
		size_t	maxrank,
		size_t	poweriters,
		MatrixXd &	U,
		VectorXd &	E,
		MatrixXd &	V
	)

void npl::regress	(	RegrResult *	out,
		const Ref< const VectorXd >	y,
		const Ref< const MatrixXd >	X,
		const Ref< const VectorXd >	covInv,
		const Ref< const MatrixXd >	Xinv,
		const StudentsT &	distrib
	)

Computes the Ordinary Least Square predictors, beta for.

$y = \hat \beta X$

Returning beta. This is the same as the other regress function, but allows for cacheing of pseudoinverse of X

Parameters

out	Struct with Regression Results.
y	response variables
X	independent variables
covInv	Inverse of covariance matrix, to compute us pseudoinverse(X^TX)
Xinv	Pseudo inverse of X. Compute with pseudoInverse(X)
distrib	Pre-computed students' T distribution. Example: auto v = students_t_cdf(X.rows()-1, .1, 1000);

Returns: Struct with Regression Results.

void npl::regress	(	RegrResult *	out,
		const Ref< const VectorXd >	y,
		const Ref< const MatrixXd >	X
	)

Computes the Ordinary Least Square predictors, beta for.

$y = \hat \beta X$

Returning beta. This is the same as the other regress function, but allows for cacheing of pseudoinverse of X

Parameters

out	Struct with Regression Results.
y	response variables
X	independent variables

void npl::regressOutLS	(	VectorXd &	signal,
		const MatrixXd &	X,
		const MatrixXd &	covInv
	)

inline

Removes the effects of X from signal (y). Note that this takes both X and the pseudoinverse of X because the bulk of the computation will be on the pseudoinverse.

Each column of X should be a regressor, Xinv should be the pseudoinverse of X

Beta in OLS may be computed with the pseudoinverse (P): B = Py where P is the pseudoinverse: P = VE^{-1}U^*

Parameters

signal	response term (y), will be modified to remove the effects of X
X	Design matrix, or independent values in colums
covInv	the pseudoinverse of XtX

Definition at line 53 of file statistics.h.

MatrixXd npl::rpiPCA	(	const Ref< const MatrixXd >	X,
		double	varth,
		int	odim
	)

Computes the Principal Components of input matrix X using the randomized power iteration SVD algorithm.

Outputs reduced dimension (fewer cols) in output. Note that prior to this, the columns of X should be 0 mean otherwise the first component will be the mean

Parameters

X	RxC matrix where each column row is a sample, each column a dimension (or feature). The number of columns in the output will be fewer because there will be fewer features
varth	Variance threshold. This is the ratio (0-1) of variance to include in the output. This is used to determine the dimensionality of the output. If this is 1 then all variance will be included. If this < 1 and odim > 0 then whichever gives a larger output dimension will be selected.
odim	Threshold for output dimensions. If this is <= 0 then it is ignored, if it is > 0 then max(dim(varth), odim) is used as the output dimension.

Returns: RxP matrix, where P is the number of principal components

double npl::sample_corr	(	int	count,
		double	sum1,
		double	sum2,
		double	sumsq1,
		double	sumsq2,
		double	s1s2
	)

inline

Computes the sample correlation.

Parameters

count	Number of samples
sum1	Sum of group 1
sum2	Sum of group 2
sumsq1	Sum of the sqaure of the elements of group1
sumsq2	Sum of the sqaure of the elements of group2
s1s2	Elements of group1*elements of group 2

Returns: Sample Correlation

Definition at line 179 of file statistics.h.

double npl::sample_var ( const Ref< const VectorXd > vec )

Computes the statistical variance of a column vector.

Parameters

vec	Input samples

Returns: Variance

double npl::sample_var	(	int	count,
		double	sum,
		double	sumsqr
	)

inline

Takes a count, sum and sumsqr and returns the sample variance. This is slightly different than the variance definition and I can never remember the exact formulation.

Parameters

count	Number of samples
sum	sum of samples
sumsqr	sum of square of the samples

Returns: sample variance

Definition at line 161 of file statistics.h.

VectorXd npl::shootingRegr	(	const Ref< const MatrixXd >	X,
		const Ref< const VectorXd >	y,
		double	gamma
	)

Performs LASSO regression using the 'shooting' algorithm of.

Fu, W. J. (1998). Penalized Regressions: The Bridge versus the Lasso. Journal of Computational and Graphical Statistics, 7(3), 397. doi:10.2307/1390712

Essentially solves the equation:

y = X * beta

where beta is mostly 0's

Parameters

X	Design matrix
y	Measured value
gamma	Weight of regularization (larger values forces sparser model)

Returns: Beta vector

MatrixXd npl::symICA	(	const Ref< const MatrixXd >	Xin,
		MatrixXd *	unmix = `NULL`
	)

Computes the Independent Components of input matrix X using symmetric decorrlation. Note that you should run PCA on X before running ICA.

Same number of cols as output

Note that this whole problem is the transpose of the version listed on wikipedia.

In: I number of components In: X RxC matrix with rows representing C-D samples for p in 1 to I: wp = random weight while wp changes: wp = g(X wp)X/R - wp SUM(g'(X wp))/R wp = wp - SUM wp^T wj wj normalize(wp)

Output: W = [w0 w1 w2 ... ] Output: S = XW, where each column is a dimension, each row a sample

Parameters

Xin	RxC matrix where each row is a sample, each column a dimension (or feature). The number of columns in the output will be fewer because there will be fewer features. Columns should be zero-mean and uncorrelated with one another.
unmix	output unmixing matrix, can be null if you don't need it

Returns: RxP matrix, where P is the number of independent components

Classes

Functions

Detailed Description

Function Documentation