Programmer's guide
------------------------------------------------------------------------------
1  Introduction: adding a new database
2  Viewing images
3  Using metrics
4  List of metrics
5  List of display modes
------------------------------------------------------------------------------
1  Introduction: adding a new database

A database in Photobook has seven components:
  1. An index of members (database entries)
  2. Annotations for members (optional)
  3. A list of display modes
  4. A list of search modes
  5. Power-assisted labeling information
  6. Image data for the members
  7. Content data for the members

Items 1-5 are provided by the "index", "spec", and "data" files given to the
"annotate" utility (described in the annotate manual). The information in
these items is provided to Photobook via FRAMER.

Items 6 and 7 are data files which exist in certain places on the hard disk.
Photobook arranges data first by database, then by field name, and then by
member, e.g. "textures/sar_coeff/1001" is a file of SAR coefficients for
member "1001" in the "textures" database. This pathname is relative to the
PHOTOBOOK_DATA_DIR environment variable, which is usually "./data".

A new database is added in three steps:
  1. Write the .index, .spec, and .data files for "annotate", and run
     "new-database " to enter this information in FRAMER.
  2. Convert the image data into the Photobook format and place it in the
     appropriate directories under PHOTOBOOK_DATA_DIR.
  3. Compute the content data (image features), convert it into the
     Photobook format, and place it in the appropriate directories under
     PHOTOBOOK_DATA_DIR. The "convert-data" script is a convenient way to
     perform this conversion.

Example database specification files can be found in the database
distributions.

These steps do not have to be performed in order. Typically, the images will
be provided first, along with a small FRAMER description for viewing in
Photobook.
Features are then computed on the images, and the FRAMER description is
extended. Additional image sets are then generated as by-products of the
feature extraction, and are added as display modes.

------------------------------------------------------------------------------
2  Viewing images

To set up a database for simple browsing (i.e. by hand), image data and a
small FRAMER description file need to be provided. Suppose we wish to store
the original images under the field name "image", i.e. in the directory
"example/image/" ("example" is the name of the database). Then a sufficient
example.spec file is:

    Example database
    display image
      field image
      width 160
      height 120
      channels 3
    end

which specifies that the images are 160x120 interleaved RGB pixels in a raw
format, i.e. one byte per channel per pixel, with no file header. This format
is equivalent to the output of "rletoraw" on an RLE file or "tail +4" on a
PPM file.

The first line of the example.spec file is a comment and is ignored. The
"display image" line specifies that the display mode name and class is
"image". The "field image" line specifies that the filename of the images is
"image". If we had stored another image "fft" for each member, we could add
the lines

    display fft
      class image
      field fft
      width 160
      height 120
      channels 1
    end

to the file. These should be added before the "display image" specification,
because the default display mode (when Photobook starts up) is the last one
specified in this file, and we want "image" to be the default.

The example.index file contains the member names, one per line. Since the
image file names must be the same as the corresponding member names, the
index file can be created by the unix command
"cd example/image; ls -1 > ../index".

If we have annotations for the images, they can be specified in the
example.data file. For example, if each image has a serial number, these can
be listed in the example.data file, one per line.
The ordering of lines in this file corresponds to the ordering of lines in
the index file.

Assuming that the example.* files were placed in the top-level Photobook
directory, the FRAMER information can be entered with
"new-database example example" or "annotate example example". Photobook can
then be started with "photobook example" and we can view the images.

------------------------------------------------------------------------------
3  Using metrics

The next step is to provide image features for browsing by content. After
image features have been extracted, a metric (classifier function) is chosen
from Photobook's repertoire.

For example, suppose that we wish to browse our images based on a linear
combination of their "coarseness", "contrast", and "directionality". For each
image, we would compute a vector of three numbers, each number corresponding
to one of these features. To compute the similarity of two images, we would
compute the squared error (or Euclidean distance) between the corresponding
vectors. Each element of the vector could have its own weight in the error
function, so that we can bias ourselves towards contrast, or ignore
coarseness.

There is a metric called "euclidean" which corresponds to this kind of
classifier. To specify that we want the database to use the euclidean metric,
we add these lines to example.spec:

    data vector =ptr 3 double

    search euclidean
      vector-size 3
      field vector
    end

Note that a data field called "vector" must be specified for each database
member that we wish to use this metric on. This is done with the "data"
annotation above.

------------------------------------------------------------------------------
4  List of metrics

This is a list of the metric classes supported in version 5.0, each followed
by the customization fields it supports. Arrays are specified using square
brackets, e.g. [1 2 3] is an array of three integers. Arrays are indexed
starting from zero.
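
The bracketed array notation can be illustrated with a small reader. This is
only a sketch in Python, not Photobook's own parser: "parse_array" is a
hypothetical helper, and it assumes that elements are separated by whitespace
and that anything which is not a number is kept as a string.

```python
def parse_array(text):
    """Read a bracketed, whitespace-separated array such as "[1 2 3]".

    Hypothetical helper for illustration; not part of Photobook.
    """
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("array must be enclosed in square brackets")
    values = []
    for item in text[1:-1].split():
        # Try integer first, then float; otherwise keep the raw string.
        try:
            values.append(int(item))
        except ValueError:
            try:
                values.append(float(item))
            except ValueError:
                values.append(item)
    return values


weights = parse_array("[1.0 0.5 2.0]")
print(weights[0])  # element at index zero, the first entry
```
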

Vector-based metrics
--------------------

The following five metrics operate on vector-based features:

Euclidean
---------
Distance between vectors is computed using weighted squared error.
D(x,y) = (x-y)'*W*(x-y), where W is diagonal.

field          description              value type
--------------------------------------------------------
vector-size    vector size              integer
field          vector directory         string
from           starting index           integer
to             ending index             integer
weights        coefficient weights      array of float

For example,

    vector-size 3
    field tamura_coeff
    from 1
    to 2
    weights [1.0 0.5 2.0]

specifies that distance is computed with the values in the "tamura_coeff"
directory using the formula

    D = sqrt( 0.5 * (a[1]-b[1])^2 + 2.0 * (a[2]-b[2])^2 )

The a[0] and b[0] coefficients are left out since "from" is 1.
"from" defaults to zero, "to" defaults to vector-size minus one.
"weights" defaults to all 1.0.

Mahalanobis
-----------
Distance between vectors is computed using a general covariance matrix for
each image; euclidean distance is a special case of this. Note that this is
asymmetric.
D(x,y) = (x-y)'*K*(x-y), where K is the inverse covariance for x.

field          description              value type
--------------------------------------------------------
vector-size    vector size              integer
coeff          vector directory         string
icovar         inverse covariance       string
               matrix directory

Divergence
----------
Distance between vectors is computed using the formula for undirected
Gaussian divergence: D(1,2) = H(1,2) + H(2,1) (relative entropy). This method
has proven to be useful even when the data is not known to be Gaussian.
Has the same fields as Mahalanobis plus one more:

field          description              value type
--------------------------------------------------------
covar          covariance matrix        string
               directory

VSpace
------
Distance between vector spaces. This is normally asymmetric; symmetry is
achieved by computing the minimum of the distance both ways.
The asymmetric distance D(1,2) is the squared error between C1 and
B2*B2'*C1, where C1 is the correlation matrix for the first image and B2 is
the basis matrix for the second image. The distance returned by this metric
is min( D(1,2), D(2,1) ).

field          description              value type
--------------------------------------------------------
rows           dimensions of corr       int
cols           and basis matrices       int
corr           correlation matrix       string
basis          basis matrix             string

Min
---
Distance between histograms used in Swain and Ballard ().

    SumOf(x_i, all i) - SumOf(min(x_i, y_i), all i).

field          description              value type
---------------------------------------------------------
vector-size    vector size              integer
field          vector directory         string

Tsw
---
Tree-structured Euclidean comparison operator used by Chang and Kuo
(IEEE Trans. Image Processing, vol. 2, no. 4, pp. 429-441, 1993).
Distance between quad-trees is the squared error between their highest valued
leaves. The measure is asymmetric; if a test image is missing a leaf present
in the query image, its distance is increased by 1e6.

field          description              value type
---------------------------------------------------------
field          energy tree directory    string
levels         height of the tree       integer
cutoff         pruning constant         float
keep           pruning constant         integer

The height of the tree includes the root, and defaults to 4 (totaling 85
nodes). The cutoff value is used to prune parts of the tree; any node whose
energy is < cutoff*(max energy of siblings) is not expanded (i.e. forced to
be a leaf). Hence, cutoff should be in [0, 1]; it defaults to 0.3. The keep
value specifies how many leaves are used in the comparison; if keep is 5 (the
default), then the 5 largest energy leaves are used. The vector packing
format corresponds to a preorder traversal.

Peaks
-----
Peak match value used by the Wold texture model.
Distance between peak-sets a and b is

    Sum_i_j(a_i^2 * w_i_j * b_j / (a_i + b_j)^2)

where w_i_j is a neighborhood weighting function which decreases as peaks a_i
and b_j get farther apart. The radius of the neighborhood is specified by
"nbr-size" (which defaults to 2).

field          description              value type
---------------------------------------------------------
peaks          peak directory           string
num-peaks      number of peaks          integer
nbr-size       size of neighborhood     integer

The vectors in the peaks directory should contain three things: the width of
the image from which peaks were extracted, its height, and a peak description
matrix. This matrix is stored row-major where each row is a peak entry and
the three columns are x location, y location, and magnitude. Peak reflections
through the origin are computed automatically, and included in the matching
process.

Tree
----
The tree metric expects features to be cluster assignments, where clusters
are arranged in a tree. Distance between clusters is "ancestral distance",
e.g. nodes with same parent are distance "1", nodes with same grandparent are
distance "2", etc. The cluster tree is specified by a tree file (see
"annotate.txt" for info on this format). The leaf values of the tree are
image indices, starting at 1; the interior node values are ignored.

field          description              value type
--------------------------------------------------------
tree           tree file                string

Combination
-----------
The combination metric computes distance via a weighted sum of the distances
computed by other metrics. For example, it can be used to define a metric
which is the sum of the euclidean distance and the Gaussian divergence of two
vectors.
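
The weighted sum can be sketched in a few lines of Python. This is an
illustration only, not Photobook code: "combine" is a hypothetical helper, and
the two component functions are simple stand-ins (plain squared error and a
city-block distance) rather than Photobook's euclidean and divergence metrics.

```python
def squared_error(a, b):
    """Unweighted squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def city_block(a, b):
    """Sum of absolute differences; a stand-in for a second metric."""
    return sum(abs(x - y) for x, y in zip(a, b))


def combine(metrics, weights):
    """Return a distance function that is the weighted sum of `metrics`.

    Hypothetical helper: one weight per component metric.
    """
    if len(metrics) != len(weights):
        raise ValueError("need exactly one weight per metric")

    def distance(a, b):
        return sum(w * m(a, b) for m, w in zip(metrics, weights))

    return distance


combined = combine([squared_error, city_block], [1.0, 0.5])
print(combined([0.0, 0.0], [3.0, 4.0]))  # 1.0 * 25.0 + 0.5 * 7.0
```
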

field          description              value type
--------------------------------------------------------
num-metrics    number of metrics        integer
               to be combined
metrics        metric names             array of string
factors        metric weights           array of float
weights        metric weights           array of float

The "metrics" field is something like [metric/sar metric/texture-ev], where
"metric/" must precede each metric name. The "factors" and "weights" fields
are multiplied together to get the total weight multiplying each metric's
distance value. They both default to all ones.

Rank_Combo
----------
Computes distance via a weighted sum of ranks computed by other metrics. This
differs from the above by ignoring absolute distances. For example, if metric
A gives an image rank 3 and metric B gives it rank 1 (first), then the
resulting distance is 3 * weight_A + 1 * weight_B. It also differs by using
weights specific to the query image.

field          description              value type
--------------------------------------------------------
num-metrics    number of metrics        integer
               to be combined
metrics        metric names             array of string
weights        metric weight directory  string

Combining a peaks metric with MRSAR (usually mahalanobis) and computing
weights from a harmonicity measure yields the Wold model developed by Fang
Liu.

------------------------------------------------------------------------------
5  List of display modes

This is a list of the display classes supported in version 5.0, each followed
by the customization fields it supports.

Image
-----
Displays a stored image file.

field          description              value type
--------------------------------------------------------
field          image directory          string

Bar_graph
---------
Displays the bar graph of a data vector associated with the image, optionally
overlaid on top of an image file.

field          description              value type
--------------------------------------------------------
vector         vector directory         string
maximum        vector component max     float
minimum        vector component min     float
spacing        whether bars are spaced  integer (0 or 1)
color          bar color                array of three integers
field          image directory          string

The "vector" field specifies the data vector to be displayed. It should
correspond to a "data" annotation in the database specification file.

The "maximum" and "minimum" fields are used to scale the bars. Values which
are out of this range will be clipped to fit the image size. They default to
1.0 and 0.0.

The "spacing" flag is a suggestion to space out the bars. The system may push
the bars together anyway if the vector is too long for the image width. The
default is 1.

The "color" field looks like "[0 255 255]" to get cyan bars. The default is
[255 255 255] (bright white).

If "field" is specified, the images in that directory will be displayed under
the bar graph. Useful for seeing the image and its coefficients at the same
time. The default is "", i.e. a blank background.

Stretch
-------
Takes the output of another display mode and dynamically stretches the pixel
values so that the brightest pixel maps to 255 and the darkest pixel maps to
0. Only works (properly) on single channel images. Useful for pulling out the
details in an image.

field          description              value type
--------------------------------------------------------
field          display mode source      string

The display mode source is specified using "view/", such as "view/image" or
"view/fft".

Zoom
----
Takes the output of another display mode and up/downsamples it. Upsampling is
pixel replication; downsampling is block averaging.

field          description              value type
--------------------------------------------------------
field          display mode source      string
zfact          zoom factor              integer

If zfact > 0, upsamples by that factor. If zfact < 0, downsamples by -zfact.
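
The two resampling rules can be sketched in Python as follows. This is a
minimal illustration, not Photobook's implementation: "zoom" is a hypothetical
function operating on a single-channel image stored as a list of rows, and it
assumes that trailing pixels are dropped when the image size is not a multiple
of the downsampling factor and that zfact = 0 is rejected (the text above does
not say what happens in either case).

```python
def zoom(image, zfact):
    """Resample a single-channel image given as a list of pixel rows.

    zfact > 0: upsample by pixel replication.
    zfact < 0: downsample by averaging blocks of size -zfact.
    """
    if zfact > 0:
        out = []
        for row in image:
            # Repeat every pixel zfact times across, then the row zfact times.
            wide = [p for p in row for _ in range(zfact)]
            out.extend(list(wide) for _ in range(zfact))
        return out
    if zfact < 0:
        n = -zfact
        rows, cols = len(image) // n, len(image[0]) // n
        out = []
        for by in range(rows):
            out_row = []
            for bx in range(cols):
                # Average the n-by-n block whose corner is (by*n, bx*n).
                block = [image[by * n + dy][bx * n + dx]
                         for dy in range(n) for dx in range(n)]
                out_row.append(sum(block) / len(block))
            out.append(out_row)
        return out
    raise ValueError("zfact must be nonzero (assumption of this sketch)")


small = [[1, 2], [3, 4]]
big = zoom(small, 2)    # each pixel becomes a 2x2 block
back = zoom(big, -2)    # averaging the blocks recovers the pixel values
print(big)
print(back)
```
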
Since the size of the output image varies with zfact, be sure to press
"Resize" after changing it.

Tsw_tree
--------
Displays a wavelet energy tree in the format of Chang and Kuo ().

field          description              value type
--------------------------------------------------------
field          tsw metric source        string

The tree directory and the cutoff value for determining leaves are taken from
the metric specified by field, e.g. "metric/my_tsw".

Peaks
-----
Displays a peak map by pasting small diamonds on a black background. The
brightness of the diamond is proportional to the peak's magnitude; these are
dynamically stretched for each image so that the highest peak has value 255.

field          description              value type
--------------------------------------------------------
peaks          peaks metric source      string

------------------------------------------------------------------------------
tpminka@media.mit.edu 6/23/95