Statistics Toolbox Release Notes

Chapter 1
Statistics Toolbox 4.0 Release Notes

New Features

This section summarizes the new features and enhancements introduced in the Statistics Toolbox 4.0.

If you are upgrading from a release earlier than Release 12.0, then you should also see New Features in the Statistics Toolbox 3.0 Release Notes.

Multivariate Analysis

Cluster Analysis

The new kmeans function performs K-means clustering and supports five different distance measures. The new function silhouette plots silhouettes of clusters created using either K-means or hierarchical clustering methods. The pdist function now allows several new distance measures and is more efficient for large datasets.
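To illustrate what kmeans and silhouette compute, here is a minimal Python sketch. It is not the toolbox implementation. It covers Lloyd's K-means iteration with squared Euclidean distance, which is only one of the distance measures kmeans supports. It also computes the silhouette value s(i) = (b - a)/max(a, b). Here a is the mean distance from point i to the other members of its own cluster, and b is the smallest mean distance from i to the members of any other cluster. The function names, the random initialization, and the use of Euclidean distance for silhouettes are assumptions made for illustration.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means iteration; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centers
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def silhouette(points, labels):
    """Silhouette value s(i) = (b - a) / max(a, b) for every point (Euclidean)."""
    values = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own:  # singleton cluster: use 0, a common convention
            values.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        values.append((b - a) / max(a, b))
    return values
```

Silhouette values near 1 indicate points that sit well inside their own cluster. Values near 0 or below indicate points lying between clusters.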

Factor Analysis

The new factoran function fits a Common Factor Analysis model using maximum likelihood, including rotation of the estimated factor loadings and estimation of factor scores.

Multidimensional Scaling and Procrustes Analysis

The new cmdscale function performs classical (metric) Multidimensional Scaling, which creates a configuration of points in Euclidean space solely from interpoint distance data. The new procrustes function performs orthogonal Procrustes rotations to match one set of points onto another.
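The core of classical scaling fits in a few lines. The squared distances are double-centered to form B = -1/2 J D2 J, where D2 holds the squared distances and J = I - (1/n) 1 1'. The coordinates are then the leading eigenvectors of B, each scaled by the square root of its eigenvalue. The Python sketch below is a hypothetical illustration, not cmdscale itself. It recovers a one-dimensional configuration using power iteration for the leading eigenvector. cmdscale returns as many dimensions as the data support, and procrustes is not shown.

```python
import math


def double_center(D):
    """B = -1/2 * J * D2 * J, where D2 is the matrix of squared distances."""
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    row_mean = [sum(r) / n for r in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand)
             for j in range(n)] for i in range(n)]


def top_coordinate(B, iters=200):
    """1-D configuration: leading eigenvector of B scaled by sqrt(eigenvalue)."""
    n = len(B)
    v = [1.0 / math.sqrt(n) + 0.01 * i for i in range(n)]  # non-degenerate start
    for _ in range(iters):  # power iteration
        w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
    return [x * math.sqrt(lam) for x in v]
```

For points that really lie on a line, the recovered coordinates reproduce the input distances exactly, up to a shift and a sign flip.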

Canonical Correlation Analysis

The new canoncorr function performs Canonical Correlation Analysis, which finds the linear combinations of the variables in two datasets that are most highly correlated with each other.

Discriminant Analysis

The classify function now supports three types of discrimination (linear, quadratic, and Mahalanobis) and allows specification of prior probabilities.

Because 'linear' is now the default, you must specify 'mahalanobis' to duplicate the behavior of the previous version.
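The three discrimination types differ in how each group's spread is modeled. The following one-dimensional Python sketch illustrates rules of these three kinds under Gaussian group models. It is not the classify implementation. The linear rule uses a single pooled variance, so the decision boundary is linear in x. The quadratic rule keeps a separate variance for each group. The Mahalanobis rule picks the group with the smallest standardized distance. Several details are assumptions: the unbiased variance estimates, the pooling with an n - g denominator, the choice to ignore priors in the Mahalanobis rule, and the function names.

```python
import math
from statistics import mean, variance


def fit(groups):
    """groups: dict label -> list of 1-D training values."""
    stats = {g: (mean(v), variance(v), len(v)) for g, v in groups.items()}
    n = sum(s[2] for s in stats.values())
    pooled = sum((s[2] - 1) * s[1] for s in stats.values()) / (n - len(stats))
    return stats, pooled


def classify_1d(x, groups, kind="linear", prior=None):
    stats, pooled = fit(groups)
    prior = prior or {g: 1 / len(stats) for g in stats}  # equal priors by default

    def score(g):
        mu, var, _ = stats[g]
        if kind == "linear":       # shared (pooled) variance
            return -(x - mu) ** 2 / (2 * pooled) + math.log(prior[g])
        if kind == "quadratic":    # each group keeps its own variance
            return (-(x - mu) ** 2 / (2 * var) - 0.5 * math.log(var)
                    + math.log(prior[g]))
        if kind == "mahalanobis":  # smallest standardized distance; no priors here
            return -(x - mu) ** 2 / var
        raise ValueError(kind)

    return max(stats, key=score)
```

In this sketch, shifting the priors moves the linear boundary toward the less likely group. When the group variances differ sharply, the quadratic and Mahalanobis rules can disagree with the linear one.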

Nonlinear Regression Models

Classification and Regression Trees

A collection of new functions (treefit, treeprune, treedisp, treetest, treeval) performs classification and regression using decision trees. These functions fit trees to data, display them, prune them, compute error rates for them using test data or cross-validation, and apply them to new data.
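A regression tree is grown by repeatedly splitting the data. The minimal Python sketch below shows one such split, a "stump" on a single predictor, chosen to minimize the total within-node sum of squared errors. That is a common regression-tree criterion, but it is assumed here rather than taken from treefit. The names best_split and predict are hypothetical, and the sketch assumes at least two distinct x values.

```python
def sse(ys):
    """Sum of squared deviations from the mean."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(xs, ys):
    """Return (cut, left_mean, right_mean) minimizing sse(left) + sse(right)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint threshold
            best = (cost, cut, sum(left) / len(left), sum(right) / len(right))
    return best[1:]


def predict(stump, x):
    """Fitted value: the mean response of the node that x falls into."""
    cut, left_mean, right_mean = stump
    return left_mean if x < cut else right_mean
```

A full tree applies the same search recursively to each child node and across all predictors. Pruning then trims the resulting tree back.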

Probability Distributions

Several new functions generate random samples from multivariate distributions. The wishrnd and iwishrnd functions generate random matrices from the Wishart and inverse Wishart distributions, respectively. The lhsdesign and lhsnorm functions use Latin hypercube sampling to generate samples from the multivariate uniform and normal distributions. In addition, other probability functions have been improved, particularly those for the negative binomial distribution. Finally, the new mvnpdf function computes the probability density function for the multivariate normal distribution.
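Latin hypercube sampling divides each coordinate's range into n equal strata and uses every stratum exactly once per coordinate. The Python sketch below is a hypothetical minimal version for the unit cube and is not lhsdesign itself. It shuffles the strata independently for each column and draws one uniform point inside each stratum. The toolbox function offers further options that this sketch omits.

```python
import random


def lhs(n, p, seed=None):
    """n points in the unit cube [0, 1)^p by Latin hypercube sampling."""
    rng = random.Random(seed)
    columns = []
    for _ in range(p):
        strata = list(range(n))
        rng.shuffle(strata)  # each stratum used exactly once per column
        # Place one point uniformly at random inside each stratum [k/n, (k+1)/n).
        columns.append([(k + rng.random()) / n for k in strata])
    return [list(row) for row in zip(*columns)]
```

Compared with plain uniform sampling, this guarantees that every one-dimensional projection of the sample is evenly spread.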

Descriptive Statistics

Density Estimation

The new ksdensity function produces a nonparametric density estimate using a kernel smoothing technique.
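A kernel density estimate averages a smooth bump centered at each observation: f(x) = (1/(n h)) * sum K((x - x_i)/h). The Python sketch below uses a Gaussian kernel K. The bandwidth helper applies the familiar normal-reference rule of thumb, 1.06 s n^(-1/5). That rule is an assumption here and is not necessarily the default ksdensity uses. Both function names are hypothetical.

```python
import math
from statistics import stdev


def kde(data, x, bandwidth):
    """Gaussian-kernel density estimate evaluated at x."""
    scale = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return scale * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)


def normal_reference_bandwidth(data):
    """Rule-of-thumb bandwidth 1.06 * s * n^(-1/5) (assumed, not ksdensity's rule)."""
    return 1.06 * stdev(data) * len(data) ** (-1 / 5)
```

Evaluating kde on a grid of x values gives the smooth curve. A larger bandwidth trades detail for smoothness.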

Empirical Cumulative Distribution

The new ecdf function computes the empirical cumulative distribution function (cdf) and confidence bounds for it. For censored data (common in survival analysis), it computes the Kaplan-Meier estimate of the cdf.
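For censored data, the Kaplan-Meier estimate multiplies the conditional survival probabilities at each failure time: S(t) = product over t_i <= t of (1 - d_i/n_i). Here d_i is the number of failures at t_i and n_i is the number still at risk. The cdf estimate is F(t) = 1 - S(t). The Python sketch below is hypothetical. It assumes the usual convention that observations censored at t still count as at risk at t. It omits the confidence bounds that ecdf also reports.

```python
def kaplan_meier(times, censored):
    """Kaplan-Meier cdf estimate F(t) = 1 - S(t) at each distinct failure time."""
    data = sorted(zip(times, censored))
    at_risk = len(data)
    surv = 1.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [c for tt, c in data[i:] if tt == t]  # all observations at time t
        deaths = sum(1 for c in tied if not c)
        if deaths:
            surv *= 1 - deaths / at_risk
            out.append((t, 1 - surv))
        # Failures and censorings at t both leave the risk set afterwards.
        at_risk -= len(tied)
        i += len(tied)
    return out
```

With no censored observations, every factor is (1 - d_i/n_i) over the full remaining sample, and the estimate reduces to the ordinary empirical cdf.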

Design of Experiments

Response Surface Designs

New functions support two commonly used response surface designs: central composite designs (ccdesign) and Box-Behnken designs (bbdesign). Central composite designs support fitting a full quadratic model and can have three or five levels of each factor. ccdesign supports three types of central composite design: circumscribed, inscribed, and faced.

Box-Behnken designs are rotatable (or nearly rotatable) designs that also support fitting a full quadratic model, but use just three levels of each factor.
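The level counts above follow from how the coded design points are laid out. In the hypothetical Python sketch below, which is not ccdesign or bbdesign, a central composite design combines the 2^k factorial corners, 2k axial points at plus or minus alpha, and center runs. With alpha > 1 (circumscribed), each factor takes five levels. With alpha = 1 (faced), each takes three. The inscribed type, which rescales so that the axial points sit at plus or minus 1, is not shown. The rotatable default alpha = (2^k)^(1/4) and the single center run are assumptions. The Box-Behnken function covers only three factors: each pair of factors is set to plus or minus 1 while the remaining factor is held at 0.

```python
from itertools import combinations, product


def ccd(k, alpha=None, n_center=1):
    """Coded central composite design: corners, axial points, center runs."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25  # rotatable choice (an assumption here)
    corners = [list(pt) for pt in product((-1, 1), repeat=k)]
    axial = []
    for i in range(k):
        for s in (-alpha, alpha):
            pt = [0.0] * k
            pt[i] = s
            axial.append(pt)
    return corners + axial + [[0.0] * k for _ in range(n_center)]


def box_behnken_3(n_center=1):
    """Three-factor Box-Behnken design: each pair of factors at +/-1, the other at 0."""
    runs = []
    for i, j in combinations(range(3), 2):
        for a, b in product((-1, 1), repeat=2):
            pt = [0, 0, 0]
            pt[i], pt[j] = a, b
            runs.append(pt)
    return runs + [[0, 0, 0] for _ in range(n_center)]
```

The Box-Behnken layout avoids the extreme corners of the cube. This can help when running every factor at its extreme level at once is impractical.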

D-Optimal Designs

The D-optimal design generation functions are faster than in the past. In addition, the two new functions candgen and candexch provide more control over the row-exchange algorithm for design generation.
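A D-optimal design maximizes det(X'X), where X is the model matrix of the chosen runs. A row-exchange algorithm starts from an initial design. It then tries replacing each row with each point in a candidate set and keeps any swap that increases the determinant, until a full pass produces no improvement. The Python sketch below is a simplified greedy version in that spirit. It is not candexch's actual implementation, and all names are hypothetical. For a one-factor quadratic model with three runs on [-1, 1], it should arrive at the points -1, 0, and 1.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0  # (numerically) singular
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for cc in range(c, n):
                a[r][cc] -= f * a[c][cc]
    return result


def info_det(design, model):
    """det(X'X) for the model matrix X built from the design points."""
    X = [model(x) for x in design]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    return det(XtX)


def row_exchange(candidates, initial, model, max_iter=10):
    """Greedy row exchange: accept any single-row swap that raises det(X'X)."""
    design = list(initial)
    best = info_det(design, model)
    for _ in range(max_iter):
        improved = False
        for i in range(len(design)):
            for c in candidates:
                trial = design[:i] + [c] + design[i + 1:]
                d = info_det(trial, model)
                if d > best + 1e-12:
                    design, best, improved = trial, d, True
        if not improved:
            break
    return design, best
```

In this example the model function is `lambda x: [1, x, x * x]` and the candidate set is `[-1, -0.5, 0, 0.5, 1]`. A greedy exchange like this can stop at a local optimum, so restarting from several initial designs is a common safeguard.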

Function Summary

Version 4.0 of the Statistics Toolbox provides the following:

New functions
Functions with new or changed capabilities

New Functions

Function	Purpose
`bbdesign`	Generate Box-Behnken design
`candexch`	D-optimal design from candidate set using row exchanges
`candgen`	Generate candidate set for D-optimal design
`canoncorr`	Canonical correlation analysis
`ccdesign`	Generate central composite design
`cmdscale`	Classical multidimensional scaling
`ecdf`	Empirical (Kaplan-Meier) cumulative distribution function
`factoran`	Perform Factor Analysis by maximum likelihood
`iwishrnd`	Generate inverse Wishart random matrix
`kmeans`	K-means clustering
`ksdensity`	Compute a probability density estimate using a kernel smoothing method
`lhsdesign`	Generate a Latin hypercube sample
`lhsnorm`	Generate a multivariate normal random matrix using Latin hypercube sampling
`mvnpdf`	Multivariate normal probability density function (pdf)
`nbinfit`	Parameter estimates and confidence intervals for negative binomial data
`procrustes`	Procrustes analysis
`silhouette`	Silhouette plot for clustered data
`treefit`	Fit a tree-based model for classification or regression
`treeprune`	Produce a sequence of subtrees by pruning
`treedisp`	Show classification or regression tree graphically
`treetest`	Compute error rate for tree
`treeval`	Compute fitted value for decision tree applied to data
`wishrnd`	Generate Wishart random matrix

Statistics Functions with New or Changed Capabilities

Function	Enhancement or Change
`classify`	A new syntax lets you specify the type of discriminant function as `'linear'` (default), `'quadratic'`, or `'mahalanobis'`. Specify `'mahalanobis'` to duplicate the behavior of the previous version. Another new syntax enables you to specify prior probabilities for the groups. A new output returns an estimate of the misclassification error rate.
`cluster`	Now also allows clustering based on distance measures. A new syntax also enables you to specify values for these parameters: `'cutoff'` (cutoff for the inconsistent or distance measure), `'maxclust'` (maximum number of clusters to form), `'criterion'` (either `'inconsistent'` or `'distance'`), and `'depth'` (depth for computing inconsistent values). The old syntax still works but is undocumented.
`clusterdata`	`clusterdata(Z,'param1',val1,'param2',val2,...)` now enables you to specify parameters that `clusterdata` uses in calling `pdist`, `linkage`, and `cluster`: `'distance'` (any of the distance metric names allowed by `pdist`), `'linkage'` (any of the linkage methods allowed by `linkage`), and `'cutoff'`, `'maxclust'`, `'criterion'`, and `'depth'` (with the same meanings as for `cluster`).
`cordexch`, `daugment`, `dcovary`, `rowexch`	A new syntax, `function(...,'param1',value1,'param2',value2,...)`, provides more control over design generation through a set of parameter-value pairs. Valid parameters are `'display'` (controls display of the iteration counter), `'init'` (specifies an initial design; the default is a randomly selected set of points), and `'maxiter'` (specifies the maximum number of iterations; the default is 10).
`corrcoef` (MATLAB)	Provides three new syntaxes: `[R,P] = corrcoef(...)` returns `P`, a matrix of p-values for testing the hypothesis of no correlation. `[R,P,RLO,RUP] = corrcoef(...)` returns matrices `RLO` and `RUP`, which contain lower and upper bounds for a 95% confidence interval for each coefficient. `[...] = corrcoef(...,'param1',val1,'param2',val2,...)` accepts parameter-value pairs that enable you to override the default confidence level and specify how to treat rows of `X` that contain `NaN`s.
`nbincdf`, `nbininv`, `nbinpdf`, `nbinrnd`, `nbinstat`	Consistent with a more general interpretation of the negative binomial distribution, these functions now accept any positive value, including nonintegers, for the size parameter `R`.
`pdist`	Provides four new metrics for calculating the pairwise distance between observations: `'cosine'`, `'correlation'`, `'hamming'`, and `'jaccard'`. It now also accepts a function handle to a user-defined distance function.
`regstats`	A new syntax, `stats = regstats(responses,DATA,model,whichstats)`, creates an output structure `stats` containing the statistics listed in `whichstats`. `whichstats` can be a single name or a cell array of names. The list of available statistics remains the same.
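The four new pdist metrics can be stated compactly. Cosine distance is one minus the cosine of the angle between two observations. Correlation distance is one minus their sample correlation. Hamming distance is the fraction of coordinates that differ. Jaccard distance is the fraction of differing coordinates among those where at least one observation is nonzero. The Python sketch below illustrates these definitions and is not the toolbox code. The pairwise helper is hypothetical; like the new function-handle option, it accepts any callable as the metric. The jaccard helper assumes at least one nonzero coordinate.

```python
import math


def cosine(u, v):
    """1 - cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1 - dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def correlation(u, v):
    """1 - sample (Pearson) correlation between the coordinates of u and v."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])


def hamming(u, v):
    """Fraction of coordinates that differ."""
    return sum(a != b for a, b in zip(u, v)) / len(u)


def jaccard(u, v):
    """Fraction of differing coordinates among those where either value is nonzero."""
    nz = [(a, b) for a, b in zip(u, v) if a != 0 or b != 0]
    return sum(a != b for a, b in nz) / len(nz)


def pairwise(X, metric):
    """Distances for all pairs i < j, ordered (1,2), (1,3), ..., (2,3), ..."""
    return [metric(X[i], X[j]) for i in range(len(X)) for j in range(i + 1, len(X))]
```

For example, `pairwise(X, lambda u, v: max(abs(a - b) for a, b in zip(u, v)))` computes a user-defined (Chebyshev) distance without any change to the helper.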

Major Bug Fixes