Statistics Toolbox Release Notes
New Features
This section summarizes the new features and enhancements introduced in the Statistics Toolbox 4.0.
If you are upgrading from a release earlier than Release 12.0, then you should also see New Features in the Statistics Toolbox 3.0 Release Notes.
Cluster Analysis
The new kmeans function performs K-means clustering and supports five different distance measures. The new silhouette function plots silhouettes of clusters created using either K-means or hierarchical clustering methods. The pdist function now allows several new distance measures and is more efficient for large datasets.
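As a minimal sketch of how these three functions fit together, the following clusters made-up data and inspects the result. The data, the choice of two clusters, and the 'cityblock' metric are assumptions for illustration only.

```matlab
% Synthetic data: two well-separated groups of 2-D points (illustrative only).
X = [randn(50,2); randn(50,2) + 4];

% Partition the points into 2 clusters using the city block distance,
% one of the distance measures kmeans accepts.
idx = kmeans(X, 2, 'distance', 'cityblock');

% Plot a silhouette for each cluster, using the same metric.
silhouette(X, idx, 'cityblock');

% Pairwise distances between all observations, returned as a row vector
% with one entry per pair of rows in X.
D = pdist(X, 'cityblock');
```

Because kmeans starts from randomly chosen centers, the cluster numbering can differ from run to run.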
Factor Analysis
The new factoran function fits a Common Factor Analysis model using maximum likelihood, including rotation of the estimated factor loadings and estimation of factor scores.
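A short sketch of a call that requests loadings, rotation, and factor scores together might look like the following. The simulated data and the choice of two factors are assumptions made for the example.

```matlab
% Synthetic data: 6 observed variables driven by 2 latent factors plus noise.
F = randn(200, 2);
L = [0.9 0; 0.8 0; 0.7 0; 0 0.9; 0 0.8; 0 0.7];
X = F*L' + 0.5*randn(200, 6);

% Fit a 2-factor model by maximum likelihood.
% lambda: estimated loadings, psi: specific variances,
% T: rotation matrix, stats: fit information, scores: factor scores.
[lambda, psi, T, stats, scores] = factoran(X, 2);
```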
Multidimensional Scaling and Procrustes Analysis
The new cmdscale function performs classical (metric) Multidimensional Scaling to create a configuration of points in Euclidean space solely from distance data. The new procrustes function performs orthogonal Procrustes rotations to match one set of points onto another.
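The two functions are often used together: a configuration recovered from distances alone is determined only up to rotation, reflection, and translation, so it needs to be aligned before it can be compared with a reference. The sketch below illustrates this on made-up points; the data and variable names are assumptions.

```matlab
% Synthetic configuration of 10 points in the plane (illustrative only).
X = randn(10, 2);

% Keep only the inter-point distances, as a square matrix.
D = squareform(pdist(X));

% Recover a configuration of points from the distances alone.
Y = cmdscale(D);

% Align the first two recovered coordinates to the original points.
% d is a dissimilarity measure (near 0 for a good match); Z is the aligned Y.
[d, Z] = procrustes(X, Y(:, 1:2));
```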
Canonical Correlation Analysis
The new canoncorr function performs Canonical Correlation Analysis to find the linear combinations of variables in two datasets that are most highly correlated with each other.
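A minimal sketch, assuming two made-up datasets that share one common signal:

```matlab
% Synthetic data: Y's first column is a noisy copy of X's first column.
X = randn(100, 3);
Y = [X(:,1) + 0.5*randn(100,1), randn(100,2)];

% A and B hold the canonical coefficients for X and Y;
% r holds the canonical correlations.
[A, B, r] = canoncorr(X, Y);
```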
Discriminant Analysis
The classify function now supports three types of discrimination (linear, quadratic, and Mahalanobis) and allows specification of prior probabilities. 'linear' is now the default, and you must specify 'mahalanobis' to duplicate the behavior of the previous version.
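The following sketch shows the default call next to calls that request a specific discriminant type, with and without prior probabilities. The training data, sample points, and prior values are made up for illustration.

```matlab
% Synthetic training data: two groups of 2-D points (illustrative only).
training = [randn(30,2); randn(30,2) + 3];
group    = [ones(30,1); 2*ones(30,1)];
sample   = [0 0; 3 3];

% Default behavior in this release: linear discrimination.
classL = classify(sample, training, group);

% Request Mahalanobis distances to reproduce the previous version's behavior.
classM = classify(sample, training, group, 'mahalanobis');

% Quadratic discrimination with assumed prior probabilities for the two groups.
prior  = [0.7 0.3];
classQ = classify(sample, training, group, 'quadratic', prior);
```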
Classification and Regression Trees
A collection of new functions (treefit, treeprune, treedisp, treetest, treeval) performs classification and regression using decision trees. These functions fit trees to data, display them, prune them, compute error rates for them using test data or cross-validation, and apply them to new data.
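One possible workflow with these five functions is sketched below for a regression tree. It assumes a release of the toolbox that ships these functions, and the data and variable names are invented for the example.

```matlab
% Synthetic regression data: the response steps up when the first predictor
% exceeds 0.5 (illustrative only).
X = rand(100, 2);
y = 3*(X(:,1) > 0.5) + 0.1*randn(100, 1);

% Fit a tree and display it.
t = treefit(X, y);
treedisp(t);

% Estimate the error of each pruned subtree by cross-validation.
[cost, secost, ntnodes, bestlevel] = treetest(t, 'crossvalidate', X, y);

% Prune to the suggested level and apply the pruned tree to data.
tpruned = treeprune(t, 'level', bestlevel);
yfit    = treeval(tpruned, X);
```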
Probability Distributions
Several new functions support the generation of random samples from multivariate distributions. There are functions for generating random matrices from the Wishart (wishrnd) or inverse Wishart (iwishrnd) distributions. Other functions (lhsdesign, lhsnorm) use Latin hypercube sampling methods to generate samples from the multivariate uniform and normal distributions. In addition, other probability functions have been improved, particularly those for the negative binomial distribution. Finally, a new function (mvnpdf) computes the probability density function for the multivariate normal distribution.
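As a brief sketch of the new multivariate functions, the calls below use an assumed 2-by-2 covariance matrix, 10 degrees of freedom, and a sample size of 20; all of these values are arbitrary.

```matlab
% Assumed covariance matrix for the examples (illustrative only).
sigma = [2 0.5; 0.5 1];

% Random matrices from the Wishart and inverse Wishart distributions, 10 df.
W  = wishrnd(sigma, 10);
IW = iwishrnd(sigma, 10);

% Latin hypercube samples: 20 points in the unit square, and 20 points
% from a bivariate normal distribution with mean [0 0].
U = lhsdesign(20, 2);
Z = lhsnorm([0 0], sigma, 20);

% Multivariate normal density evaluated at the mean.
p = mvnpdf([0 0], [0 0], sigma);
```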
Density Estimation
The new ksdensity function produces a nonparametric density estimate using a kernel smoothing technique.
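A minimal sketch, assuming a made-up bimodal sample that a single normal fit would describe poorly:

```matlab
% Synthetic bimodal sample (illustrative only).
x = [randn(100,1); randn(100,1) + 5];

% f is the estimated density evaluated at the points xi.
[f, xi] = ksdensity(x);
plot(xi, f);
```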
Empirical Cumulative Distribution
The new ecdf function computes the empirical cumulative distribution function (cdf) and confidence bounds for it. For censored data (common in survival analysis), it computes the Kaplan-Meier estimate of the cdf.
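The censored case can be sketched as follows. The simulated failure times and the cutoff of 15 time units are assumptions chosen only to produce some censored observations.

```matlab
% Synthetic failure times; observation stops at time 15, so longer
% lifetimes are recorded as censored at 15 (illustrative only).
t = exprnd(10, 50, 1);
censored = t > 15;
t(censored) = 15;

% Kaplan-Meier estimate of the cdf with lower and upper confidence bounds.
[f, x, flo, fup] = ecdf(t, 'censoring', censored);
stairs(x, f);
```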
Response Surface Designs
New functions support two commonly used designs: central composite designs (ccdesign) and Box-Behnken designs (bbdesign). Central composite designs fit a full quadratic model and can have three or five levels of each factor. ccdesign supports three types: circumscribed, inscribed, and faced.
Box-Behnken designs are rotatable designs that also fit a full quadratic model but use just three levels of each factor.
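As a short sketch, the calls below generate designs for three factors; the number of factors is an arbitrary choice for the example.

```matlab
% Central composite design for 3 factors, using the default type.
dCC = ccdesign(3);

% Faced central composite design: the axial points lie on the faces of
% the cube, so each factor takes only the levels -1, 0, and 1.
dFaced = ccdesign(3, 'type', 'faced');

% Box-Behnken design for 3 factors, also with three levels per factor.
dBB = bbdesign(3);
```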
D-Optimal Designs
The D-optimal design generation functions are faster than in the past. In addition, the two new functions candgen and candexch provide more control over the row-exchange algorithm for design generation.
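One way the two functions might be combined is sketched below: generate a candidate set, then select rows from it by row exchange. The two-factor quadratic model and the run count of 8 are assumptions made for the example.

```matlab
% Candidate set for 2 factors under a full quadratic model.
% dC holds the candidate factor settings; C holds the model terms
% evaluated at those settings.
[dC, C] = candgen(2, 'quadratic');

% Select 8 rows of the candidate set by row exchange.
rows = candexch(C, 8);

% Factor settings of the selected design.
design = dC(rows, :);
```

Separating the two steps makes it possible to edit the candidate set, for example to remove infeasible factor combinations, before rows are selected.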
Function Summary
Version 4.0 of the Statistics Toolbox provides the following:
New Functions
Function | Purpose
bbdesign | Generate Box-Behnken design
candexch | D-optimal design from candidate set using row exchanges
candgen | Generate candidate set for D-optimal design
canoncorr | Canonical correlation analysis
ccdesign | Generate central composite design
ecdf | Empirical (Kaplan-Meier) cumulative distribution function
factoran | Perform Factor Analysis by maximum likelihood
ksdensity | Compute a probability density estimate using a kernel smoothing method
lhsnorm | Generate a multivariate normal random matrix using Latin hypercube sampling
nbinfit | Parameter estimates and confidence intervals for negative binomial data
procrustes | Procrustes Analysis
treefit | Fit a tree-based model for classification or regression
treeprune | Produce a sequence of subtrees by pruning
treedisp | Show classification or regression tree graphically
treetest | Compute error rate for tree
treeval | Compute fitted value for decision tree applied to data
Statistics Functions with New or Changed Capabilities
Function | Enhancement or Change
classify | A new syntax lets you specify the type of discriminant function as 'linear' (default), 'quadratic', or 'mahalanobis'. Specify 'mahalanobis' to duplicate the behavior of the previous version. Another new syntax enables you to specify prior probabilities for the groups. A new output returns an estimate of the misclassification error rate.
cluster | Now also allows clustering based on distance measures. A new syntax also enables you to specify values for these parameters:
    'cutoff' | Cutoff for inconsistent and distance measure
    'maxclust' | Maximum number of clusters to form
    'criterion' | Either 'inconsistent' or 'distance'
    'depth' | Depth for computing inconsistent values
    The old syntax still works but is undocumented.
clusterdata | clusterdata(Z,'param1',val1,'param2',val2,...) now enables you to specify parameters that clusterdata uses in calling pdist, linkage, and cluster:
    'distance' | Any of the distance metric names allowed by pdist
    'linkage' | Any of the linkage methods allowed by linkage
    'cutoff' | Cutoff for inconsistent and distance measure
    'maxclust' | Maximum number of clusters to form
    'criterion' | Either 'inconsistent' or 'distance'
    'depth' | Depth for computing inconsistent values
 | A new syntax provides more control over design generation through a set of parameter-value pairs.
    'display' | Controls display of iteration counter.
    'init' | Specifies an initial design. The default is a randomly selected set of points.
    'maxiter' | Specifies the maximum number of iterations. The default is 10.
corrcoef | Provides three new syntaxes: [R,P] = corrcoef(...) returns P, a matrix of p-values for testing the hypothesis of no correlation.
 | Consistent with a more general interpretation of the negative binomial, these functions now accept any positive value, including nonintegers, for the size parameter R.
pdist | Provides four new metrics for calculating the pairwise distance between observations:
 | A new syntax