Statistics Toolbox Release Notes    

Chapter 1
Statistics Toolbox 4.0 Release Notes


New Features

This section summarizes the new features and enhancements introduced in the Statistics Toolbox 4.0.

If you are upgrading from a release earlier than Release 12.0, then you should also see New Features in the Statistics Toolbox 3.0 Release Notes.

Multivariate Analysis

Cluster Analysis

The new kmeans function performs K-means clustering and supports five different distance measures. The new function silhouette plots silhouettes of clusters created using either K-means or hierarchical clustering methods. The pdist function now allows several new distance measures and is more efficient for large datasets.
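As an illustration of what a silhouette plot displays, the sketch below computes silhouette values in plain Python. It is not the toolbox's code: the name `silhouette_values` is hypothetical, Euclidean distance is assumed, and at least two clusters are assumed. For each point, `a` is the mean distance to the other points in its own cluster, `b` is the smallest mean distance to the points of any other cluster, and the silhouette value is (b - a) / max(a, b).

```python
import math

def silhouette_values(points, labels):
    """Return one silhouette value per point (Euclidean distance assumed)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            values.append(0.0)  # assumed convention for a one-point cluster
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values
```

Values near 1 indicate points that sit well inside their cluster; negative values indicate points that are, on average, closer to another cluster.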

Factor Analysis

The new factoran function fits a Common Factor Analysis model using maximum likelihood, including rotation of the estimated factor loadings and estimation of factor scores.

Multidimensional Scaling and Procrustes Analysis

The new cmdscale function performs classical (metric) Multidimensional Scaling, to create a configuration of points in Euclidean space solely from distance data. The new function procrustes performs orthogonal Procrustes rotations to match one set of points onto another.

Canonical Correlation Analysis

The new function canoncorr performs Canonical Correlation Analysis, to find the linear combinations of variables in two datasets that best correlate with each other.

Discriminant Analysis

The classify function now supports three types of discrimination (linear, quadratic, and Mahalanobis) and allows specification of prior probabilities.

'linear' is now the default, and you must specify 'mahalanobis' to duplicate the behavior of the previous version.

Nonlinear Regression Models

Classification and Regression Trees

A collection of new functions (treefit, treeprune, treedisp, treetest, treeval) performs classification and regression using decision trees. These functions fit trees to data, display them, prune them, compute error rates for them using test data or cross-validation, and apply them to new data.

Probability Distributions

Several new functions support the generation of random samples from multivariate distributions. There are functions for generating random matrices from the Wishart (wishrnd) or inverse Wishart (iwishrnd) distributions. Other functions (lhsdesign, lhsnorm) use Latin hypercube sampling methods to generate samples from the multivariate uniform and normal distributions. In addition, there have been improvements in other probability functions, particularly those for the negative binomial distribution. Finally, a new function (mvnpdf) computes the probability density function for the multivariate normal distribution.
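The idea behind Latin hypercube sampling can be sketched in a few lines of Python. This is an illustration of the general technique only, with a hypothetical function name; these notes do not describe how lhsdesign places points within each stratum, so the uniform jitter used here is an assumption.

```python
import random

def latin_hypercube(n, p, rng=None):
    """Return n points in [0, 1)^p with one point per stratum in every dimension."""
    rng = rng or random.Random()
    columns = []
    for _ in range(p):
        # Split [0, 1) into n equal strata and visit them in random order.
        strata = list(range(n))
        rng.shuffle(strata)
        # Place one point uniformly at random inside each stratum (assumed scheme).
        columns.append([(s + rng.random()) / n for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]
```

Each coordinate, taken on its own, is spread evenly across [0, 1), which is what distinguishes the sample from n independent uniform draws.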

Descriptive Statistics

Density Estimation

The new ksdensity function produces a nonparametric density estimate using a kernel smoothing technique.
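A kernel density estimate averages a smooth bump centered on each observation. The Python sketch below uses a Gaussian kernel and a bandwidth supplied by the caller. Both choices, and the function name, are assumptions made for illustration; these notes do not state which kernel or bandwidth rule ksdensity uses.

```python
import math

def gaussian_kde(sample, x, bandwidth):
    """Evaluate a Gaussian-kernel density estimate of `sample` at the point x."""
    norm = math.sqrt(2.0 * math.pi)
    total = 0.0
    for xi in sample:
        z = (x - xi) / bandwidth
        total += math.exp(-0.5 * z * z) / norm  # standard normal kernel
    # Average the kernels and rescale by the bandwidth so the estimate integrates to 1.
    return total / (len(sample) * bandwidth)
```

A larger bandwidth gives a smoother estimate; a smaller one follows the data more closely.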

Empirical Cumulative Distribution

The new ecdf function computes the empirical cumulative distribution function (cdf) and confidence bounds for it. For censored data (common in survival analysis), it computes the Kaplan-Meier estimate of the cdf.
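The Kaplan-Meier estimate can be sketched as follows in Python. The function name and argument layout are hypothetical and are not the ecdf interface; confidence bounds are omitted. At each distinct uncensored time the survival estimate is multiplied by (1 - d/n), where d is the number of events at that time and n is the number of observations still at risk. With no censoring, the result reduces to the ordinary empirical cdf.

```python
def kaplan_meier_cdf(times, censored):
    """Return [(t, F(t))] at each distinct uncensored time.

    censored[i] is True when observation i is right-censored.
    """
    data = sorted(zip(times, censored))
    at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = removed = 0
        # Group all observations that share the same time.
        while i < len(data) and data[i][0] == t:
            if not data[i][1]:
                events += 1
            removed += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            steps.append((t, 1.0 - survival))
        # Both events and censored observations leave the risk set.
        at_risk -= removed
    return steps
```

For example, `kaplan_meier_cdf([1, 2, 3, 4], [False, True, False, False])` returns `[(1, 0.25), (3, 0.625), (4, 1.0)]`: the censored observation at time 2 adds no step of its own but shrinks the risk set for later times.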

Design of Experiments

Response Surface Designs

New functions support two commonly used designs: central composite designs (ccdesign) and Box-Behnken designs (bbdesign). Central composite designs fit a full quadratic model and can have three or five levels of each factor. ccdesign supports three types of design: circumscribed, inscribed, and faced.

Box-Behnken designs are rotatable designs that also fit a full quadratic model but use just three levels of each factor.
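The geometry of these designs can be sketched in Python. The function names are hypothetical and the code is not the toolbox's. It assumes a single center point, a full two-level factorial for the corner points, and the usual rotatable axial distance (2^k)^(1/4) for k factors; these notes do not state the defaults that ccdesign and bbdesign use. The Box-Behnken sketch takes every pair of factors at their four ±1 combinations with the remaining factors held at 0, which gives the standard design for three factors.

```python
from itertools import combinations, product

def central_composite(k, kind="circumscribed"):
    """Return the runs of a central composite design for k factors."""
    rot = (2 ** k) ** 0.25  # rotatable axial distance for a full factorial
    if kind == "circumscribed":
        corner, axial = 1.0, rot        # axial points outside the cube: 5 levels
    elif kind == "inscribed":
        corner, axial = 1.0 / rot, 1.0  # whole design shrunk to fit [-1, 1]
    elif kind == "faced":
        corner, axial = 1.0, 1.0        # axial points on the cube faces: 3 levels
    else:
        raise ValueError("unknown design type: " + kind)
    runs = [tuple(corner * v for v in run) for run in product((-1.0, 1.0), repeat=k)]
    for j in range(k):
        for sign in (-1.0, 1.0):
            run = [0.0] * k
            run[j] = sign * axial
            runs.append(tuple(run))
    return runs + [(0.0,) * k]

def box_behnken_pairs(k):
    """Return pairwise +/-1 runs plus one center point (standard for k = 3)."""
    runs = []
    for i, j in combinations(range(k), 2):
        for a, b in product((-1.0, 1.0), repeat=2):
            run = [0.0] * k
            run[i], run[j] = a, b
            runs.append(tuple(run))
    return runs + [(0.0,) * k]
```

With two factors, the circumscribed design has nine runs and five distinct levels per factor, while the faced design has the same nine runs on three levels.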

D-Optimal Designs

The D-optimal design generation functions are faster than in the past. In addition, the two new functions candgen and candexch provide more control over the row-exchange algorithm for design generation.

Function Summary

Version 4.0 of the Statistics Toolbox provides the following:

New Functions

Function      Purpose
bbdesign      Generate Box-Behnken design
candexch      D-optimal design from candidate set using row exchanges
candgen       Generate candidate set for D-optimal design
canoncorr     Canonical correlation analysis
ccdesign      Generate central composite design
cmdscale      Classical multidimensional scaling
ecdf          Empirical (Kaplan-Meier) cumulative distribution function
factoran      Perform Factor Analysis by maximum likelihood
iwishrnd      Generate inverse Wishart random matrix
kmeans        K-means clustering
ksdensity     Compute a probability density estimate using a kernel smoothing method
lhsdesign     Generate a Latin hypercube sample
lhsnorm       Generate a multivariate normal random matrix using Latin hypercube sampling
mvnpdf        Multivariate normal probability density function (pdf)
nbinfit       Parameter estimates and confidence intervals for negative binomial data
procrustes    Procrustes Analysis
silhouette    Silhouette plot for clustered data
treefit       Fit a tree-based model for classification or regression
treeprune     Produce a sequence of subtrees by pruning
treedisp      Show classification or regression tree graphically
treetest      Compute error rate for tree
treeval       Compute fitted value for decision tree applied to data
wishrnd       Generate Wishart random matrix

Statistics Functions with New or Changed Capabilities

Function
    Enhancement or Change

classify
    A new syntax lets you specify the type of discriminant function as 'linear' (default), 'quadratic', or 'mahalanobis'. Specify 'mahalanobis' to duplicate the behavior of the previous version.
    Another new syntax enables you to specify prior probabilities for the groups.
    A new output returns an estimate of the misclassification error rate.

cluster
    Now also allows clustering based on distance measures. A new syntax also enables you to specify values for these parameters:
        'cutoff'       Cutoff for inconsistent and distance measure
        'maxclust'     Maximum number of clusters to form
        'criterion'    Either 'inconsistent' or 'distance'
        'depth'        Depth for computing inconsistent values
    The old syntax still works but is undocumented.

clusterdata
    clusterdata(Z,'param1',val1,'param2',val2,...) now enables you to specify parameters that clusterdata uses in calling pdist, linkage, and cluster:
        'distance'     Any of the distance metric names allowed by pdist
        'linkage'      Any of the linkage methods allowed by linkage
        'cutoff'       Cutoff for inconsistent and distance measure
        'maxclust'     Maximum number of clusters to form
        'criterion'    Either 'inconsistent' or 'distance'
        'depth'        Depth for computing inconsistent values

cordexch, daugment, dcovary, rowexch
    A new syntax provides more control over design generation through a set of parameter-value pairs:
        function(...,'param1',value1,'param2',value2,...)
    Valid parameters are:
        'display'      Controls display of iteration counter.
        'init'         Specifies an initial design. The default is a randomly selected set of points.
        'maxiter'      Specifies the maximum number of iterations. The default is 10.

corrcoef (MATLAB)
    Provides three new syntaxes:
    [R,P] = corrcoef(...) returns P, a matrix of p-values for testing the hypothesis of no correlation.
    [R,P,RLO,RUP] = corrcoef(...) returns matrices RLO and RUP which contain lower and upper bounds for a 95% confidence interval for each coefficient.
    [...]=corrcoef(...,'param1',val1,'param2',val2,...) accepts parameter-value pairs that enable you to override the default confidence interval, and specify how to treat rows of X that contain NaNs.

nbincdf, nbininv, nbinpdf, nbinrnd, nbinstat
    Consistent with a more general interpretation of the negative binomial, these functions now accept any positive value, including nonintegers, for the size parameter R.

pdist
    Provides four new metrics for calculating the pairwise distance between observations: 'cosine', 'correlation', 'hamming', and 'jaccard'. It now also accepts a function handle to a user-defined distance function.

regstats
    A new syntax stats = regstats(responses,DATA,model,whichstats) creates an output structure stats containing the statistics listed in whichstats. whichstats can be a single name or a cell array of names. The list of available statistics remains the same.
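The note above on the negative binomial functions can be made concrete with a short Python sketch. Writing the binomial coefficient with gamma functions lets the size parameter R take any positive value, integer or not. The function name is hypothetical and this is not the toolbox's code; it assumes 0 < p < 1 and the parameterization in which x counts failures before the R-th success with success probability p.

```python
import math

def nbin_pmf(x, r, p):
    """Negative binomial pmf at x = 0, 1, 2, ... for any real r > 0 and 0 < p < 1."""
    # Gamma(r + x) / (Gamma(r) * x!), computed on the log scale for stability.
    log_coef = math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
    return math.exp(log_coef + r * math.log(p) + x * math.log1p(-p))
```

When r is an integer the result agrees with the familiar formula C(x + r - 1, x) p^r (1 - p)^x; for noninteger r the probabilities still sum to 1 over x.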

