Debugging, Profiling, & Style Guidelines in Matlab

July 22, 2010 by Admin · 2 Comments
Filed under: Code Optimization 

by Matt Dunham

Intro

Matlab code is very flexible, particularly with respect to its dynamic typing. However, this flexibility can also lead to enormous headaches when it comes time to debug a crashing program. You may have overwritten or misspelled an important variable without knowing it, or inadvertently expanded the size of your matrix, and Matlab will happily continue executing without warning until your program grinds to a halt. Discovering the original problem, which may have started dozens of lines earlier or in another function, is not always easy.

In this section, we describe a number of tools and techniques that can help, as well as ways to assess the speed of your code and find potential bottlenecks. We finish by discussing a few stylistic points and best practices that can make your code more readable and less prone to bugs in the first place.

Useful Functions

dbstop, dbquit, dbclear, dbstep, dbstep nlines, dbstep in, dbstep out, dbcont, dbstatus, keyboard, workspace, tic, toc, profile on, profile off, profile viewer

M-lint Warnings & Errors

Matlab automatically checks for certain problems and suggests fixes as you edit your m-files. The problem code is underlined in red, much like word processors underline misspelled words. It is worth paying attention to these, as they can often point out problems before you run your code and frequently suggest ways to speed up execution. The suggestions appear when you hover your mouse over the underlined text, and you can quickly find these spots by looking for the red markers to the right of the document. The set of warnings and errors that M-lint reports can be configured under File->Preferences->M-lint.

In newer versions of Matlab, you can generate a full M-lint html report by going to Tools->Save and Show M-lint Report. You can also bring up a file dependency report or compare two versions of a file from the Tools drop down menu.

Stop! if Errors/Warnings…

If your program is crashing or displaying cryptic warnings, it is very useful to have it automatically halt execution right at the point where it ran into trouble. Select Debug –> Stop if Errors/Warnings to turn this on.

Break Points

Break points can be set at any line in the document that executes code by clicking just to the right of the line number. A small circle will appear, and it will turn red when the file is saved.

These can be temporarily disabled by right clicking on them and selecting disable. To clear them all, type dbclear all or press the equivalent tool bar button.

You can set a condition on a breakpoint so that it is only triggered when a variable takes on a certain value, by right clicking on the breakpoint and selecting ‘Set/Modify Condition’.

Once your code has stopped at a breakpoint, you can step one line at a time, continue on until the next break point, or exit debug mode completely using the tool bar buttons at the top of the editor.

The step in and step out buttons let you enter or leave a function called at the current line.

There are function equivalents to these commands if you prefer, namely dbstep, dbstep nlines, dbstep in, dbstep out, dbcont, and dbquit. The dbstop() function can be used to set breakpoints and the dbstatus() function displays all of the breakpoints currently set. You can save these into a variable as in s = dbstatus(), clear the breakpoints and then reset them at a later point with dbstop(s).
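As a rough sketch, a command-line session using these functions might look like the following. The file name myfunc, the line number 12, and the variable x are hypothetical placeholders, so those lines are left commented out.

```matlab
% Halt automatically wherever an error or warning is raised.
dbstop if error
dbstop if warning

% Hypothetical examples: break at line 12 of myfunc.m, optionally only
% when x is negative (uncomment once such a file exists).
% dbstop in myfunc at 12
% dbstop in myfunc at 12 if x < 0

s = dbstatus;     % save the current breakpoints in a struct array
dbclear all       % remove every breakpoint
dbstop(s)         % restore the saved breakpoints later on
dbstatus          % list what is currently set
dbclear all       % clean up again
```

Saving the breakpoints before clearing them is handy because commands such as clear all also wipe out breakpoints.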

When you are in debug mode, the command window prompt will look slightly different: it will have a K in front (K>> instead of >>).

The keyboard() function can also be used to stop execution of a program, temporarily relinquishing control back to the command window. Simply add the line keyboard anywhere in your file to stop at that point. To return execution, type return.

Variable Stacks

Once execution has stopped because of a break point or keyboard() command, you can inspect the current values of the variables in the workspace window. Type workspace at the command prompt if it is not already open. If your program contains or calls multiple functions, you can move among the variable stacks from the top of this window (next to where it says Stack:). This is particularly useful if your program stopped in a third party function and you want to return to your function’s stack to see what went wrong. You can also view the base workspace from here.

You can also execute commands at the command prompt while execution has stopped, and assign new values to existing variables.

Run Configurations

Each m-file has one or more run configurations associated with it. These can be used to set up tests for your function by specifying test input values and other validation code. You can reach this window by selecting Debug –> Run Configuration for yourfile.m –> Edit Run Configuration…. Add new configurations with the + button and edit the code to execute on the right. These can then be run from the Debug drop down menu or from the run button in the editor.

Profiling & Timing Code

We have already seen the tic() and toc() functions, which can be used to time how long your code takes to run. Simply run tic() before your code and toc() after. Matlab, however, has a much more powerful framework, called profiling, which gives you a detailed report about how long was spent executing each subfunction. You can use this report to find bottlenecks that you might be able to improve.

You can turn on profiling with the profile on command and turn it off again with profile off. Once profiling is on, execute your code, and then type profile viewer to see the report.

The report shows a breakdown of all the functions called from your function, the number of times they were called, and the total time spent executing them. Self-time, denoted by a dark blue band, is the time spent within a function itself, not including the time spent in the functions it calls. This is really the statistic you should pay attention to.
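The two approaches can be sketched as follows. The workloads (cumsum and sort on random data) are arbitrary stand-ins chosen for illustration, not code from this tutorial.

```matlab
% Time a single block of code with tic/toc.
tic
x = cumsum(rand(1, 1e6));
elapsed = toc;                       % seconds since the matching tic
fprintf('cumsum took %.4f seconds\n', elapsed);

% Profile a block of code and collect the results programmatically.
profile on
for i = 1:20
    y = sort(rand(1, 1e5));
end
profile off
stats = profile('info');             % struct holding the raw results
fprintf('%d functions were recorded\n', numel(stats.FunctionTable));
% profile viewer                     % uncomment to open the report
```

Capturing the output of toc in a variable, rather than letting it print, makes it easy to compare several alternatives in a loop.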

Style Guidelines

There is a lot of Matlab code floating around and too much of it is totally unreadable. It does not have to be this way. Readable code is easier to use, maintain, debug, and extend, and can often serve to communicate your ideas, particularly when they are mathematical in nature. What makes one piece of code more readable than another is somewhat subjective, but there are fairly uncontentious and straightforward heuristics we can nevertheless employ. We describe a few here. Many of these suggestions are from Richard Johnson’s Matlab Programming Style Guidelines available here.

Matlab Style Guidelines

Layout

Taking the time to organize and lay out your m-file well can help you find bugs later and jump to the code you are looking for much faster.

Use indentation to denote scope, indenting the code in function bodies, and further indenting the code within loops, switch statements, try/catch blocks as well as nested functions.

Include spaces around operators like ||, &&, ==, etc., and consider breaking long commands into multiple lines by using an ellipsis (...). Keep lines to fewer than, say, 80 characters and be consistent throughout.

Align variables and values by equal signs and commas to show parallel structure.

plot(Xequal,f(Xequal), 'o' ,'MarkerFaceColor' , 'g'...
                           ,'MarkerEdgeColor' , 'k'...
                           ,'LineWidth'       ,  2 ...
                           ,'MarkerSize'      , 10);

Matlab programmers seem to love packing as much into a single line of code as possible. When you cannot think of an informative name for a temporary variable, passing values from one function directly to another via composition is not a bad approach. It is certainly better than having variables with names like temp1 and temp2 floating around. However, this approach can be taken to extremes too. If you find yourself squinting at a line of your own code, trying to decipher its purpose for more than a few seconds, consider breaking it into multiple commands.

Comments

The easiest way to make your program more readable is to document it well. However, this is no substitute for well written code, which should 'speak' for itself. If you find yourself commenting many lines of code, consider adding greater structure, by writing subfunctions for instance. Subfunctions with well chosen names document their own behavior and help to abstract the details.

Assume programmers reading your code want to know how it actually works and will not be satisfied by assurances in the comments. Make it as easy as possible for them (and yourself) to verify the correctness of your program.

Comments should be written directly below function headers, as this is where Matlab looks when you call help() or doc(). The lookfor() function searches only the first line of comments in a function, so this line should be particularly concise and informative. The same applies to class definitions.

It is very important to describe the inputs and outputs to your function as well as any expectations or preconditions. Does your function work with unstandardized data, or missing values? What can the user of your function expect as output so long as the preconditions are met?

Provide examples of all of the important ways in which your function can be called. If you have 6 optional parameters, you do not have to show every possible combination, but include enough (say 6) so that the user can reasonably extrapolate as to what a particular combination will do. Consider writing a separate function or script to demonstrate certain functionality in context. Separate advanced or infrequently used options and comment on these below the rest. When in doubt, follow the style of built in Matlab functions.
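A help block following this advice might look like the sketch below. The function standardizeColumns is a hypothetical example written for illustration; its name and behavior are assumptions, not part of any toolbox.

```matlab
function Z = standardizeColumns(X)
% STANDARDIZECOLUMNS Center each column of X and scale it to unit variance.
%
%   Z = standardizeColumns(X) returns a matrix the same size as X whose
%   columns have zero mean and unit standard deviation.
%
%   Input:  X - n-by-d numeric matrix, n > 1, with no missing (NaN)
%               values and no constant columns.
%   Output: Z - n-by-d matrix of standardized values.
%
%   Example:
%       Z = standardizeColumns([1 2; 3 4; 5 9]);

    % Check the stated preconditions before doing any work.
    assert(isnumeric(X) && size(X, 1) > 1, 'X must be numeric with n > 1 rows.');
    assert(~any(isnan(X(:))), 'X must not contain NaN values.');

    mu    = mean(X, 1);          % column means
    sigma = std(X, 0, 1);        % column standard deviations
    assert(all(sigma > 0), 'X must not contain constant columns.');

    Z = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);
end
```

Saved as standardizeColumns.m, typing help standardizeColumns prints the comment block, and lookfor can match words in its first line.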

Consider using process_options(), written by Mark Paskin, when a function takes many inputs. This allows users to call your function with the inputs specified in any order they like, each preceded by a string name, as in the following:

myfunc('niterations',3,'maxdepth',5,'verbose',true)

We describe process_options() in more depth here.

process_options
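To give a feel for the idea, here is a minimal hand-rolled sketch of name/value parsing. It is not process_options() and does not reproduce its interface; the helper name parseOptions and the default values are assumptions made for illustration.

```matlab
function opts = parseOptions(varargin)
% PARSEOPTIONS Collect name/value pairs into a struct of options.
%   Hypothetical helper for illustration; it is not process_options().

    % Defaults for every option the function understands.
    opts = struct('niterations', 10, 'maxdepth', 3, 'verbose', false);

    assert(mod(numel(varargin), 2) == 0, 'Options must come in name/value pairs.');
    for k = 1:2:numel(varargin)
        name = lower(varargin{k});               % names are case-insensitive
        assert(ischar(name) && isfield(opts, name), 'Unknown option name.');
        opts.(name) = varargin{k + 1};           % overwrite the default
    end
end
```

Calling parseOptions('maxdepth', 5, 'verbose', true) keeps the default for niterations and overrides the other two fields, regardless of the order in which the pairs are given.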

Variable Names

It should go without saying that variable names should be meaningful and informative; part of this, however, is a matter of convention. Here are some suggestions.

  • Short, single letter variables should only be used in one of three cases: where the structure of the algorithm is important, as in a mathematical derivation; for local temporary variables such as loop indices; or when well defined conventions exist. In all of these cases, document their meaning through comments.
  • Use lower case variable names when there is only one word, or when one of two words is very short as in isvalid, otherwise use camelCase.
  • Capitalize constant variables whose values will not change.
  • Prefix variables denoting a number of elements with the letter n as in nvalues for the number of values.
  • Suffix variables storing indices with NDX as in dataNDX.
  • Prefix logical functions with is as in isfinite().
  • Use i,j,k for loop variables.
  • Do not use any magic numbers, i.e. constant values appearing out of nowhere. Rather, assign these values to variables with descriptive names and use these instead.
  • Be consistent with pluralization for non-scalar data, i.e. pick one of value(j) or values(j) and use that convention throughout.
  • Reuse variable names only when the data is related and even then, with caution. It can be very confusing when a variable you have been tracing through a program suddenly changes role.
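A short snippet applying several of these conventions at once might look like this. The data and names are invented for illustration.

```matlab
MAX_SCORE = 100;                    % constant, so capitalized
PASS_MARK = 50;                     % named instead of a magic number

scores  = [72 38 91 45 66];         % plural name for non-scalar data
nscores = numel(scores);            % n prefix for a number of elements
isvalid = all(scores >= 0 & scores <= MAX_SCORE);   % lower case, short first word

passNDX = find(scores >= PASS_MARK);   % NDX suffix for indices
for i = 1:numel(passNDX)               % i as the loop variable
    fprintf('score %d of %d passed: %d\n', passNDX(i), nscores, scores(passNDX(i)));
end
```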

See the section on Functions for more suggestions.

Functions


Strings, Cells, Structs, and Sets in Matlab

July 6, 2010 by Admin · Leave a Comment
Filed under: Code Optimization 

by Matt Dunham

In this section, we examine strings and string operations as well as two very important Matlab data structures: cell arrays, and structs. We also examine various set-theoretic operations and end with a comprehensive example.

Useful Functions

  • repmat, ischar, isletter, isspace, upper, lower, strtrim, deblank,
  • isstrprop, char, abs, dec2hex, hex2dec, bin2dec, dec2bin, num2str,
  • mat2str, str2num, strcat, strvcat, sortrows, strjust, sprintf, fprintf,
  • cell, iscell, num2cell, mat2cell, cell2mat, cellstr, iscellstr, cellfun,
  • strcmp, strcmpi, strncmp, strncmpi, strfind, strmatch, strtok
  • intersect, union, setdiff, setxor, ismember, all, any, perms, issorted,
  • unique, struct, isstruct, fieldnames, isfield, orderfields, rmfield,
  • isvarname, genvarname, vertcat, cell2struct, struct2cell

Character Arrays

Strings in Matlab are actually character matrices, which can be manipulated in very similar ways to numeric matrices.

A = ' This is Test String #1! '
B = A(1:5)              % extract the first 5 characters
C = [A ; A]             % concatenate vertically
D = repmat('@!',2,5)    % replicate char arrays, just like numeric ones
E = 'z':-1:'a'          % create the matrices just like numeric ones.
check = ischar(A)       % is it a char array?
F = isletter(A(1:6))    % which characters are letters? - returns a logical array
G = isspace(A(1:6))     % which characters are spaces? - returns a logical array
H = upper(A)            % convert to upper case
I = lower(A)            % convert to lower case
J = strtrim(A)          % trim leading and trailing blank spaces.
K = deblank(A)          % trim trailing blank spaces only.
A =
 This is Test String #1!
B =
 This
C =
 This is Test String #1!
 This is Test String #1!
D =
@!@!@!@!@!
@!@!@!@!@!
E =
zyxwvutsrqponmlkjihgfedcba
check =
     1
F =
     0     1     1     1     1     0
G =
     1     0     0     0     0     1
H =
 THIS IS TEST STRING #1!
I =
 this is test string #1!
J =
This is Test String #1!
K =
 This is Test String #1!

The isstrprop() function can be used much like the isletter() or isspace() functions, allowing you to test which characters in a matrix belong to one of several different categories. Type doc isstrprop for the full list.

str = ' a1!'
A = isstrprop(str,'punct')       % punctuation
B = isstrprop(str,'alphanum')    % alpha or numeric characters
C = isstrprop(str,'digit')       % decimal digits
D = isstrprop('3A','xdigit')     % valid hexadecimal digits
str =
 a1!
A =
     0     0     0     1
B =
     0     1     1     0
C =
     0     0     1     0
D =
     1     1

The char() and abs() functions convert from integers to their ASCII equivalents and vice versa.

A = char(65)
B = abs('B')
C = abs('abcdefg')
A =
A
B =
    66
C =
    97    98    99   100   101   102   103

We can convert from string representations of hexadecimal or binary numbers to decimal numbers and back using the dec2hex(), hex2dec(), dec2bin(), and bin2dec() functions. The num2xxx and xxx2num functions operate on signed numbers.

A = dec2hex(211)
B = hex2dec('D3')
C = dec2bin(211)
D = bin2dec('11010011')
A =
D3
B =
   211
C =
11010011
D =
   211

We can also use the num2str() and mat2str() functions to generate string representations of numeric matrices, or parse a number from a string with str2num().

A = num2str([1:5;1:5]) %Takes an optional formatting string - see Formatting Strings section
B = mat2str([1:5;1:5])
C = str2num('44')
A =
1  2  3  4  5
1  2  3  4  5
B =
[1 2 3 4 5;1 2 3 4 5]
C =
    44

If the sizes of the strings match, we can concatenate vertically and horizontally just like numeric matrices. If not, we can either pad with blanks ourselves using the blanks() function, or use the strvcat() function, which concatenates vertically and adds the blanks for us. The strcat() function concatenates strings horizontally.

C = strvcat('hello','this','is','a','test') %concatenate vertically
C =
hello
this
is
a
test
D = sortrows(C)                             % sort the rows alphabetically
D =
a
hello
is
test
this
E = strjust(C)                              % justify the char array
E =
hello
 this
   is
    a
 test
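The blanks() and strcat() functions mentioned above can be sketched as follows. Note that strcat() removes trailing whitespace from char array inputs, whereas square brackets keep it.

```matlab
a = 'hello';
b = 'is';

% Pad the shorter string by hand so that both rows have the same length.
padded = [a ; [b, blanks(numel(a) - numel(b))]];   % 2-by-5 char array

% Square brackets keep trailing blanks; strcat() drops them from char inputs.
withBlank    = ['hello ', 'world'];                % 'hello world'
withoutBlank = strcat('hello ', 'world');          % 'helloworld'
```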

Formatting Strings

The sprintf() and fprintf() functions can be used to format strings for output: sprintf() returns a string, while fprintf() directly displays the string, or writes it to a file, depending on the mode.

We pass these functions a string that includes place holders (denoted by % signs), which will be replaced by the corresponding values listed after the string. These place holders define how the values will be formatted. We use %s for a string, %d for an integer, and %05.2f to indicate that we want a floating point number with 5 characters in total, two digits after the decimal point, and padded with zeros if necessary. There are many formatting options; type doc sprintf for the full list. We can use escape characters like \n for a new line and \t for a tab. The examples will make this clearer.

fprintf('\n %s won the %s medal in the %s \n for his time of %05.2f seconds.\n',...
         'Kosuke Kitajima','gold','100m breaststroke',60.08);
str = sprintf('%07.4f',pi) % display pi to 4 decimals, 7 chars in total, padded with zeros.
str = sprintf('%x',999)    % display number in hexadecimal
 Kosuke Kitajima won the gold medal in the 100m breaststroke
 for his time of 60.08 seconds.
str =
03.1416
str =
3e7

Cell Arrays

In addition to matrices, Matlab supports another very general and powerful data structure, the cell array. Cell arrays can hold any type of Matlab object or structure including numeric matrices of different sizes, character arrays, other cells, as well as structs and objects, which we will see later. In fact, the same cell array can hold elements of different types. Cell arrays are frequently used to store strings, (i.e. char arrays of different sizes), which is why we discuss them here. Much of what was said about indexing matrices also applies to cells with one or two important differences.

We can create a cell array by using the cell() command

A = cell(2,4)               % create a 2-by-4 cell array
check = iscell(A)           % really a cell?
A =
     []     []     []     []
     []     []     []     []
check =
     1

or by enclosing an object or objects in curly braces

B = {[1,2,3],'hello',{1};[3;5],'yes',{'no'}}   % add a bunch of objects to a cell array
B =
    [1x3 double]    'hello'    {1x1 cell}
    [2x1 double]    'yes'      {1x1 cell}

A 2-by-4 cell array is in fact made up of eight 1-by-1 cell arrays (simply called cells), which store the data. There are two ways to index into (and assign into) a cell array: using parentheses () and using curly braces {}. Using parentheses, we access or assign cells. Using curly braces, we access or assign the data within those cells.

C = B(1,2)      % Returns a cell holding the string 'hello'
D = B{1,2}      % Returns the string itself.
E = B(:,1)      % Returns a cell array holding the first column
C =
    'hello'
D =
hello
E =
    [1x3 double]
    [2x1 double]

If we extract the data from more than one cell at once using the curly bracket indexing, Matlab returns each element one at a time much like a function that returns multiple values. We can assign each of these to new variables or perhaps pass them directly to a function expecting that many parameters.

[F,G] = B{:,1}
F =
     1     2     3
G =
     3
     5
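As a small sketch of passing such a list directly to a function, the contents of a cell array can be expanded into separate arguments. The variable names here are chosen for illustration.

```matlab
dims = {2, 3};
Z = zeros(dims{:});                       % same as zeros(2, 3)

parts = {'2010', '07', '22'};
stamp = sprintf('%s-%s-%s', parts{:});    % same as sprintf(fmt, '2010', '07', '22')
```

This is a convenient way to build up an argument list at runtime before making a single call.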

When assigning data, we must be careful what kind of brackets we use.

B(1,1) = {'test'}   % must pass it a cell as we are using () brackets
B{1,1} = 'test'     % same effect as line before.
B{1,2} = {'test'}   % careful, this adds a cell to the cell at (1,2), (nesting cells)
H = B{1,2}{1}       % to then extract it, we have to index twice.
B =
    'test'          'hello'    {1x1 cell}
    [2x1 double]    'yes'      {1x1 cell}
B =
    'test'          'hello'    {1x1 cell}
    [2x1 double]    'yes'      {1x1 cell}
B =
    'test'          {1x1 cell}    {1x1 cell}
    [2x1 double]    'yes'         {1x1 cell}
H =
test

We can transpose, reshape, replicate, concatenate, and delete cell arrays just like matrices.

I = B'                           % transpose
J = reshape(B,1,6)               % reshape
K = [repmat(C,1,3);B]            % replicate and concatenate
K(end,:) = []                    % delete
I =
    'test'        [2x1 double]
    {1x1 cell}    'yes'
    {1x1 cell}    {1x1 cell  }
J =
    'test'    [2x1 double]    {1x1 cell}    'yes'    {1x1 cell}    {1x1 cell}
K =
    'hello'         'hello'       'hello'
    'test'          {1x1 cell}    {1x1 cell}
    [2x1 double]    'yes'         {1x1 cell}
K =
    'hello'    'hello'       'hello'
    'test'     {1x1 cell}    {1x1 cell}

Suppose we store numeric matrices of different sizes in a cell array.

A = {[1,2,3],[4,5],[6],[7,8,9,10]}
A =
    [1x3 double]    [1x2 double]    [6]    [1x4 double]

We can concatenate the entries themselves by first extracting all of the elements using the colon operator and then passing the results to the concatenation operator [].

B = [A{:}]
B =
     1     2     3     4     5     6     7     8     9    10

We can also use the num2cell(), mat2cell(), and cell2mat() functions to convert between matrices and cell arrays.

A = num2cell(1:5)                     % convert [1,2,3,4,5] to {[1],[2],[3],[4],[5]}
B = mat2cell(ones(4,8),[2,2],[3,3,2]) % partition matrix ones(4,8) into 6 cells
C = cell2mat(B)                       % inverse operation, (group together)
A =
    [1]    [2]    [3]    [4]    [5]
B =
    [2x3 double]    [2x3 double]    [2x2 double]
    [2x3 double]    [2x3 double]    [2x2 double]
C =
     1     1     1     1     1     1     1     1
     1     1     1     1     1     1     1     1
     1     1     1     1     1     1     1     1
     1     1     1     1     1     1     1     1

We can convert from a character matrix to a cell array of strings, where each string is taken to be a row of the matrix, using the cellstr() command, and back again using the char() command.

A = strvcat('cell','array','example')   % make a char array
B = cellstr(A)                          % convert to a cell array of strings
check = iscellstr(B)                    % check that it's a cell array of strings
C = char(B)                             % convert back to a char array
A =
cell
array
example
B =
    'cell'
    'array'
    'example'
check =
     1
C =
cell
array
example

The cellfun() function can be very useful when we want to apply a function to the data inside every cell. We make use of function handles here. Read the section on functions if you are unfamiliar with them.

A = {'This, ', 'a ', 'test ', 'message, ', 'contains ', '3 ','punctuation ' ,'marks!'}
f = @(str)str(isstrprop(str,'alphanum')); % function to remove non-alphanumeric chars
B = cellfun(f,A,'uniformOutput',false)    % apply function to A, don't expect same-sized output
A =
  Columns 1 through 6
    'This, '    'a '    'test '    'message, '    'contains '    '3 '
  Columns 7 through 8
    'punctuation '    'marks!'
B =
  Columns 1 through 7
    'This'    'a'    'test'    'message'    'contains'    '3'    'punctuation'
  Column 8
    'marks'

String Matching

There are several functions we can use to compare strings.

A = 'testString';
test1 = strcmp(A,'testString');       % compare two strings
test2 = strcmpi(A,'TESTstring');      % compare two strings but ignore case
test3 = strncmp(A,'testFoo',4);       % compare only the first 4 chars of two strings
test4 = strncmpi(A,'TEST',4);         % same as above, but ignore case.
result = test1 && test2 && test3 && test4
result =
     1

We can find the occurrences of one substring inside another using the strfind() function, or search for all strings (stored as rows in a matrix or cells in a cell array) that begin with a certain string using the strmatch() function. We can also grab the first token in a char array delimited by spaces using the strtok() command (the delimiter it uses can be changed).

str = 'actgcgctgacgctgatacacgggagctgacgactgaggacgagc'
A = strfind(str,'ctga')
str =
actgcgctgacgctgatacacgggagctgacgactgaggacgagc
A =
     7    13    27    34
str2 = {'foobar','bar','barfoo','foofoo'}
B = strmatch('foo',str2)
str2 =
    'foobar'    'bar'    'barfoo'    'foofoo'
B =
     1
     4
[token, remaining] = strtok('this is a test')
token =
this
remaining =
 is a test

Matlab also supports search and replace operations using regular expressions. Type doc regexp for numerous examples and useful functions.
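A minimal sketch of matching and replacing with regular expressions follows. The example strings are made up for illustration.

```matlab
str = 'Order 66 shipped   12 items';

numbers  = regexp(str, '\d+', 'match');      % cell array: {'66', '12'}
squeezed = regexprep(str, '\s+', ' ');       % collapse runs of whitespace

% Tokens captured with parentheses can be reused in the replacement.
swapped  = regexprep('Dunham, Matt', '(\w+), (\w+)', '$2 $1');   % 'Matt Dunham'
```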

Set Operations

We can treat matrices and cell arrays as sets or multisets and perform various set operations with the functions union(), intersect(), setdiff(), setxor(), and ismember().

set1 = 1:2:9
set2 = 1:4
int = intersect(set1,set2)
uni = union(set1,set2)
dif = setdiff(set1,set2)
xor = setxor(set1,set2)
check = ismember(3,set1)
set1 =
     1     3     5     7     9
set2 =
     1     2     3     4
int =
     1     3
uni =
     1     2     3     4     5     7     9
dif =
     5     7     9
xor =
     2     4     5     7     9
check =
     1
set3 = {'alpha','beta','gamma'}
set4 = {'delta','beta','epsilon'}
intc = intersect(set3,set4)
check2 = ismember('delta',set4)
set3 =
    'alpha'    'beta'    'gamma'
set4 =
    'delta'    'beta'    'epsilon'
intc =
    'beta'
check2 =
     1

If we are taking the set difference of integers, it can be much faster to use the custom mysetdiff() function, which uses logical indexing.

mysetdiff
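The source of mysetdiff() is not reproduced here. The following is only a sketch of how a set difference based on logical indexing could work, assuming both inputs contain positive integers.

```matlab
A = [1 3 5 7 9];
B = 1:4;

present = false(1, max([A, B]));   % one flag per candidate integer
present(A) = true;                 % mark the members of A
present(B) = false;                % unmark anything that is also in B
C = find(present);                 % sorted, duplicate-free difference

assert(isequal(C, setdiff(A, B)))  % agrees with the built-in
```

The trade-off is memory: the flag array is as long as the largest value, so this approach only pays off when the integers are not too large.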

We can also perform basic quantification over logical arrays using the all() and any() commands: all() returns true if all of the inputs are true, whereas any() returns true if at least one input is true.

forall = all(isprime(1:2:7))
exists = any(isprime(1:2:7))
forall =
     0
exists =
     1

We can also extract the unique elements of a cell array or matrix using the unique() function.

A = ['bba';'bab';perms('aba');'aba']                          % perms() generates every permutation
[uniqueElems, firstIndices, perm] = unique(A,'rows');         % find unique rows of A
sorted = issorted(uniqueElems,'rows')                         % are they sorted? - yes!
check = isequal(A,uniqueElems(perm,:),A(firstIndices(perm),:))% note what each return var represents
uniqueNums = unique([1,2,1,1,2,3,4,4,5,3,2,1])                % numeric matrix
uniqueNames = unique({'Bob','Fred','Bob','Ed','Fred','Chris','Ed'})   % cell array
A =
bba
bab
aba
aab
baa
baa
aba
aab
aba
sorted =
     1
check =
     1
uniqueNums =
     1     2     3     4     5
uniqueNames =
    'Bob'    'Chris'    'Ed'    'Fred'

Structs

In addition to matrices and cell arrays, Matlab supports structured arrays or structs, which allow you to organize data and access it by name. For those familiar with other programming languages, structs are basically hashmaps with string keys, but depending on how they are used, they can also operate much like a simple database. Structs, like cell arrays, can store anything you throw at them. Conversely, you can store structs in cell arrays and even within matrices so long as the fieldnames of the structs are the same.

We can create a struct by using the struct() function, passing it fieldnames and data in alternating order.

S = struct('time',0:10,'distance',0:0.1:1,'height',1:0.1:2)
check = isstruct(S)             % really a struct?
names = fieldnames(S)           % list the fieldnames
check2 = isfield(S,'time')      % check that 'time' is really a fieldname
S = orderfields(S)              % order the fields alphabetically
S = rmfield(S,'height')         % remove a field
S =
        time: [0 1 2 3 4 5 6 7 8 9 10]
    distance: [1x11 double]
      height: [1x11 double]
check =
     1
names =
    'time'
    'distance'
    'height'
check2 =
     1
S =
    distance: [1x11 double]
      height: [1x11 double]
        time: [0 1 2 3 4 5 6 7 8 9 10]
S =
    distance: [1x11 double]
        time: [0 1 2 3 4 5 6 7 8 9 10]

Access the data using the . operator and the name of the field.

time = S.time
time =
     0     1     2     3     4     5     6     7     8     9    10

Alternatively, we can use a string for the name, which allows us to access fields dynamically at runtime, much like a map.

distance = S.('distance')
distance =
  Columns 1 through 7
         0    0.1000    0.2000    0.3000    0.4000    0.5000    0.6000
  Columns 8 through 11
    0.7000    0.8000    0.9000    1.0000
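Dynamic field access becomes more useful when the names come from a variable. A short sketch, using a struct similar to the one above, loops over whatever fields happen to exist:

```matlab
S = struct('distance', 0:0.1:1, 'time', 0:10);
names = fieldnames(S);
nelements = zeros(1, numel(names));
for i = 1:numel(names)
    nelements(i) = numel(S.(names{i}));     % field name chosen at runtime
    fprintf('%s holds %d values\n', names{i}, nelements(i));
end
```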

We can set a new value for a field

S.time = 2*S.time
S =
    distance: [1x11 double]
        time: [0 2 4 6 8 10 12 14 16 18 20]

or add new fields and data on the fly

S.newField = 'foo'
S =
    distance: [1x11 double]
        time: [0 2 4 6 8 10 12 14 16 18 20]
    newField: 'foo'

When the names for the fields will be generated dynamically (i.e. at runtime), it is often prudent to ensure that the string is a valid fieldname. Fieldnames must begin with a letter and can contain only letters, numbers and the underscore symbol. You can check that a string is valid with the isvarname() command and auto-generate a valid name from a source string with the genvarname() command.

test = isvarname('3alpha')
better = genvarname('3alpha')
test =
     0
better =
x3alpha

We can create an array of structs all having the same fieldnames, which allows us to build a kind of database of entries.

S = struct('Name',{},'ID',{},'Position',{});
S(1).Name = 'Greg'; S(1).ID = '123'; S(1).Position = 'Manager';
S(2).Name = 'Ed'  ; S(2).ID = '312'; S(2).Position = 'Clerk';
S(3).Name = 'Pete'; S(3).ID = '301'; S(3).Position = 'CEO';

We can then access an individual record, itself a struct,

EdsRecord = S(2)
EdsRecord =
        Name: 'Ed'
          ID: '312'
    Position: 'Clerk'

Or access data across all of the records at once.

[gID,eID,pID] = S.ID
gID =
123
eID =
312
pID =
301

We can concatenate the output from the above command

A = [S.ID]
A =
123312301

However, when dealing with structures, it is often more useful to concatenate vertically. We can do this by using the vertcat() function, which is the same function called when you concatenate with semicolons as in [A ; B]. Since we cannot control how we get the data from the struct, we sometimes have to call vertcat explicitly.

B = vertcat(S.ID)
B =
123
312
301

We can create structs from cell arrays using the cell2struct() function and, (possibly multidimensional), cell arrays from structs using the struct2cell() function.

data = {1,2,3,4};
fieldNames = {'one','two','three','four'};
dim = 2;                                    %data for each fieldname is ordered along dim 2
S = cell2struct(data,fieldNames,dim)
S =
      one: 1
      two: 2
    three: 3
     four: 4
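The reverse direction, struct2cell(), can be sketched as a round trip. Here the data is arranged along dimension 1, so the field values come back as a column.

```matlab
S = cell2struct({1; 2; 3; 4}, {'one'; 'two'; 'three'; 'four'}, 1);
C = struct2cell(S);             % 4-by-1 cell array of the field values
names = fieldnames(S);          % matching 4-by-1 cell array of names
T = cell2struct(C, names, 1);   % round trip back to a struct
assert(isequal(S, T))
```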

Example

In the example below, we put many of the functions and constructs just discussed to work. We load Darwin's On the Origin of Species into a cell array, convert to lower case, remove the punctuation and any non-alpha characters, and sort the words by how frequently they occur in the text.

Here is a link to the text. Save it as darwin.txt, the file name the code expects, and place it somewhere on the Matlab path before running this code.

http://www.gutenberg.org/dirs/etext98/otoos11.txt

if(exist('darwin.txt','file'))  % make sure the file exists
    tic                         % time how long this takes
    fid = fopen('darwin.txt');  % Open file
    text = textscan(fid,'%s');  % Grab every word and put it in a cell array
    fclose(fid);                % Close file
    %one big cell is returned, unwrap it and convert to lowercase
    text = lower(text{:});
    %delete any tokens that do not contain at least one alpha character
    noAlpha = cellfun(@(x)~any(x),isstrprop(text,'alpha'));
    text(noAlpha) = [];
    %remove punctuation and any non-alpha characters
    puncRemover = @(str)str(isstrprop(str,'alpha'));
    text = cellfun(puncRemover,text,'UniformOutput',false);
    %find the unique words and assign them numeric ids.
    [uniqueWords, numericIDs, wordOrder] = unique(text);
    %make sure the variables hold what we think they do.
    assert(isequal(text,uniqueWords(wordOrder),text(numericIDs(wordOrder))));
    %count how often each word occurs.
    counts = histc(wordOrder,1:numel(uniqueWords));
    %sort from most frequently occurring to least
    [frequency,order] = sort(counts,'descend');
    %list the words from most frequently occurring to least.
    sortedWords = uniqueWords(order);
    %create a cell array of the frequencies
    freqcell = num2cell(frequency);
    %create a structure from the words so that we can easily search for
    %the frequency of particular words. This is basically a hashmap.
    map = cell2struct(freqcell,sortedWords);
    %check a few words to make sure we didn't make a mistake
    testWord = {'origin','of','the','species','natural','selection'};
    test = true;
    for i=1:numel(testWord)
        test = test && sum(strcmp(testWord{i},text)) == map.(testWord{i});
    end
    assert(test);
    %create a formatted string array of the frequencies as percentages
    freqstring = num2str(100*frequency/sum(frequency),'%2.2f');
    %add percentage signs and convert to a cell array
    freqstring = cellstr([freqstring,repmat('%',size(frequency,1),1)]);
    %display the top 15 words with their percent frequencies.
    display([sortedWords(1:15),freqstring(1:15,:)]);
    toc
end

clear all;
    'the'        '6.96%'
    'of'         '5.02%'
    'and'        '2.81%'
    'in'         '2.59%'
    'to'         '2.30%'
    'a'          '1.61%'
    'that'       '1.31%'
    'as'         '1.07%'
    'have'       '1.01%'
    'be'         '1.01%'
    'is'         '1.00%'
    'on'         '0.94%'
    'species'    '0.90%'
    'by'         '0.89%'
    'which'      '0.86%'
Elapsed time is 11.917106 seconds.

Flow of Control and Vectorization in Matlab

June 15, 2010 by Admin
Filed under: Code Optimization

by Matt Dunham

Matlab’s constructs for controlling the conditional execution of commands are very similar to those found in most popular programming languages. One major difference, however, is that variables created within if, for, while, switch, and try statements are not locally scoped but instead share their scope with all other variables in the same function. This is quite unlike Java, for example, where a variable declared inside a loop can only be used inside that loop.
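
The scoping rule can be seen in a few lines. This is only a small sketch; the variable name insideLoop is made up for the illustration.

```matlab
for i = 1:3
    insideLoop = 10*i;     % created inside the loop body
end
% both the loop index and insideLoop are still visible after the loop
disp([i, insideLoop])      % i is 3 and insideLoop is 30 at this point
```

This is convenient, but it also means that a loop can silently overwrite a variable of the same name defined earlier in the function.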

Useful Functions

zeros, continue, break, return, waitbar, inputdlg, listdlg, warning, error, rethrow, tic, toc, bsxfun, repmat, cellfun, cell2mat, mat2cell, arrayfun, vectorize, meshgrid, blkdiag, cumsum, cumprod, filter, conv, accumarray

if, else, elseif

Matlab if statements allow you to execute different code depending on the current state of the program, i.e. the values of certain variables.

test1 = true;
test2 = false;
test3 = false;
if(test1), A = 1; end       % simple if statement on one line. A=1 executed if test1 is true
if(test1)
    A = 2;                  % executed if test1 = true
else
    A = 3;                  % executed if test1 = false
end
if(test1)
    A = 3;                  % executed if test1 = true
elseif(test2)
    A = 4;                  % executed if test1 = false and test2 = true
elseif(test3)
    A = 5;                  % executed if test1 = false, test2 = false, test3 = true
else
    A = 6;                  % executed if test1=test2=test3=false
end

All if statements must end with an end statement.

for loops

For loops allow you to execute a block of code a specified number of times. That number can be determined dynamically as the program runs.

n = ceil(100*rand);                 % can be set dynamically
A = zeros(n,1);                     % improve speed by preallocating space
for i=1:n                           % set i = 1, then loop and increment i by 1, until i = n
    A(i,1) = max(i,50);             % execute code within the loop - usually depends on i.
end                                 % both i and A can then be accessed outside the loop.

We can nest for loops and include if statements. Note that we do not have to start looping from 1. The example below features the continue, break, and return commands. Continue instructs Matlab to skip the remaining lines of the loop body and go directly to the next iteration of the current loop. Break, on the other hand, exits the current (innermost) loop completely. Return exits the current script or function without executing any further code.

A = rand(20,20,20);
counter = 0;
for i=1:size(A,1)
    for j=2:size(A,2)
        for k=3:size(A,3)
            %if k is even, go immediately to beginning of loop
            if(mod(k,2) == 0),          continue; end
            %if j+k is prime, break from this inner loop completely
            if(isprime(j+k)),           break   ; end
            %if all three of i,j,k are prime, stop all further execution.
            %(the && false disables the return here so the example runs to completion)
            if(all(isprime([i,j,k])) && false),  return  ; end
            if(isprime(floor(100*A(i,j,k))))
                counter = counter + 1;
            end
        end
    end
end

The continue, break, and return statements should be used sparingly as they can easily obscure the code and can almost always be replaced by if, else, and elseif statements.

while loops

While loops are used to execute a block of code until some condition is satisfied. This condition is usually more complicated than simply reaching a set number of iterations as with a for loop. The comments on scope, and the continue, break and return statements apply equally to while loops.

A = true; B = true; C = true;
val = 1;
while(A || B || C)
    val = 2*val +1;
    A = isprime(val);
    B = val < 10;
    C = ((round(sqrt(val)))^2) == val;
end

Here is a common code idiom involving break. It effectively allows us to test at the end of the loop (or in multiple spots).

while(true)
    %execute code
    if(condition)
        break;
    end
end
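
A runnable instance of the template may make this clearer. The sketch below uses a made-up task, halving x until it drops below a tolerance; x, tol, and nIter are hypothetical names chosen for this example.

```matlab
x = 100;                 % hypothetical starting value
tol = 1;                 % hypothetical stopping tolerance
nIter = 0;
while(true)
    x = x/2;             % the loop body always runs at least once
    nIter = nIter + 1;
    if(x < tol)          % the test sits at the end of the loop
        break;
    end
end
```

Because the condition is only checked after the body has run, this behaves like the do-while loop found in C or Java.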

progress and message bars

Newer versions of Matlab have nice, easy to use, graphical message and progress windows. Search for Predefined Dialog Boxes in help for a full list of available dialogs. We give examples of three here: waitbar(), inputdlg(), and listdlg().

Keep the user informed of the progress of a loop or lengthy calculation.

w = waitbar(0,'My Progress Bar');                       % create a new waitbar, w with 0% progress
for i=1:500
   isprime(i);
   w = waitbar(i/500,w,['iteration: ',num2str(i)]);     % update the wait bar each iteration
end
close(w);                                               % remember to close it

Ask the user for input with a graphical window.

answer = inputdlg('What would you like for dinner?');   % ask the user for input

Give the user a list of options. The index of the selected one is returned.

options = {'Chicken','Beef','Fish','Pasta'};
message = 'Here are your options:';
[selection, ok] = listdlg('PromptString',message,'ListString',options);

switch statements

Switch statements are useful when the code to execute depends on a variable that takes on one of a small, discrete set of values. Most commonly, this value is an integer or a string. Switch statements can be replaced by a long series of if-else statements, but this usually results in less readable code. Note that unlike in languages such as C or Java, switch statements do not fall through; that is, the code from at most one case statement is executed. As such, break statements are not necessary.

color = 'blue';
switch color                    % switching variable
    case 'red'
        A = 1;                  % code for case 'red'
    case 'blue'
        A = 2;
    case {'green','purple'}     % either 'green' or 'purple'
        A = 3;
    otherwise                   % optional 'catch all'
        A = 4;
end

try/catch statements

Try catch blocks give you some control over Matlab error handling. They are useful for executing code that might potentially fail, such as writing to a file, allowing you to perform cleanup or recover gracefully. Overuse, however, can easily obscure your program and does not necessarily make it easier to debug.

a = rand;
b = a*(a< 0.5);
try
    c = a / b;
    assert(true);                       % set to false to have code throw an error
catch ME                                % disaster recovery, cleanup, inform user, etc...
    display('Something went wrong');
    warning('WARNING:ID','my own warning message');
    display(ME.message);                % ME is an MException object with info on the error
    %error('my own error message');     % stops execution
    %rethrow(ME);                       % rethrows the original error and stops execution
end
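
The caught object carries an identifier as well as a message, which lets us react only to the errors we expect. The following is a small sketch; the identifier 'demo:badInput', its message, and the variable recovered are invented for this example.

```matlab
recovered = false;
try
    % throw an error with a hypothetical identifier and message
    error('demo:badInput','value %d is out of range',7);
catch ME
    id  = ME.identifier;          % the identifier given to error()
    msg = ME.message;             % the formatted message text
    switch id
        case 'demo:badInput'
            recovered = true;     % handle the error we expected
        otherwise
            rethrow(ME);          % pass anything unexpected along
    end
end
```

Rethrowing anything we do not recognize keeps the try/catch block from hiding unrelated bugs.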

preallocation

Matlab stores matrices in contiguous blocks of memory. When the size of a matrix changes, Matlab, if it has not preallocated enough space, must find a new chunk of memory large enough and copy the matrix over. When a matrix grows inside a loop, this process may have to be repeated over and over again, causing huge delays. You can therefore significantly speed up your code by preallocating a chunk of memory before entering a loop. The zeros() command is the most common way to do this. Below we see two simple loops in which we store the numbers 1 to 30,000. We preallocate only in the second. Timing the two loops with the tic and toc commands, we see that preallocating in this case speeds up the code by about 30 times. The larger the matrices, the more important this becomes.

tic
for i = 1:30000
    A(i) = i;
end
without = toc
without =
    1.3971
tic
B = zeros(30000,1);      % Preallocate B with the zeros command.
for i = 1:30000
    B(i) = i;
end
with = toc
ratio = without / with
with =
    0.0471
ratio =
   29.6756

vectorization

Matlab is an interpreted language, which means that each line of code must be reduced to machine instructions as the program runs, whereas with compiled code, this is done before execution. Moreover, compiled code can often be optimized automatically by the compiler in ways interpreted code cannot. As such, interpreted code is generally slower than an equivalent compiled version.

Many of Matlab’s built-in functions, however, particularly those involving matrix operations, are highly optimized and compiled from a low-level language like C or Fortran, resulting in very fast code. Where Matlab is at a disadvantage is with regard to loops. Although recent versions have seen a considerable increase in speed, loops are still a major bottleneck.

Thankfully, we can frequently replace loops with matrix operations or calls to fast, built-in functions, a process called vectorization. Learning how to do this well is an extremely important skill.

We will see a number of examples comparing vectorized with non-vectorized code. Many of these might seem obvious to seasoned Matlab programmers, but we show them anyway to emphasize two points: the speed increase, and the relative terseness of the vectorized versions. Unfortunately, publishing the document skews the timing results, but try the code out for yourself. For some of the examples, we see a 30-fold increase in speed.

vectorization examples

A = rand(200,200);                    % We will use this as our data

Most functions in Matlab are already vectorized, so that to take the log of every number in an array A, for instance, we simply execute B = log(A).

non-vectorized version

tic                                   % time the code
Bnv = zeros(size(A));                 % We preallocate to level the playing field
for i=1:size(A,1)
    for j=1:size(A,2)
        Bnv(i,j) = log(A(i,j));
    end
end
nonvec = toc;

vectorized version

tic
Bv = log(A);
vec = toc;
assert(isequal(Bnv,Bv));
ratio = nonvec / vec;

Matlab supports parallel indexing and assignment so that we can retrieve and assign multiple values at once.

A1 = A; A2 = A;
rsndx = 1:100;   csndx = 80:130;
rtndx = 101:200; ctndx = 150:200;

non-vectorized version

tic
for i=1:numel(rsndx)
    for j=1:numel(csndx)
        A1(rsndx(i),csndx(j)) = A1(rtndx(i),ctndx(j));
    end
end
nonvec = toc;

vectorized version

tic
A2(rsndx,csndx) = A2(rtndx,ctndx);
vec = toc;
ratio = nonvec / vec;
assert(isequal(A1,A2));

Here we see the benefit of logical indexing.

non-vectorized version

tic
B1 = [];                                % note, it is difficult to preallocate here
counter = 1;
 for j=1:size(A,2)
     for i=1:size(A,1)
        if(A(i,j) < 0.2)
            B1(counter,1) = A(i,j);
            counter = counter + 1;
        end
    end
end
nonvec = toc;

vectorized version

tic
B2 = A(A < 0.2);
vec = toc;
ratio = nonvec / vec;
assert(isequal(B1,B2));

Here we perform three tricks at once, as it were. First, recall that operators such as ^, *, and \ have element-wise equivalents (e.g. .^ and .*), which we can apply to the corresponding elements of two same-sized matrices. Second, Matlab performs automatic scalar expansion in expressions like A+1. Third, we can easily multiply two matrices together without loops. Most loops involving patterned additions and multiplications of vector elements can be translated, with a little thought, into equivalent vectorized statements.

non-vectorized version

tic
B1 = zeros(size(A));
for i=1:size(A,1)
    for j=1:size(A,2)
       T = 0;
       for k=1:size(A,2)               % sum over columns, matching A*A'
           T = T + A(i,k)*A(j,k);
       end
       B1(i,j) = T * (A(i,j)/2) + 1;
    end
end
nonvec = toc;

vectorized version

tic
B2 = ((A*A') .* (A/2)) + 1;
vec = toc;
test = mean(abs(B1(:) - B2(:))); % very small differences between B1, & B2 because of numerical error
ratio = nonvec / vec;

Recall from the chapter on matrices that we can use repmat() or bsxfun() to perform element-wise operations on non-scalar matrices of different sizes as long as a singleton dimension can be extended to make them the same size. Here we subtract off the mean of the third dimension and leave our ‘non-vectorized’ version at least somewhat vectorized to emphasize the role of bsxfun().

A3d = rand(100,100,100);
A1 = A3d; A2 = A3d; A3 = A3d;

non-vectorized version

tic
m = mean(A1,3);
for i=1:size(A1,3)
   A1(:,:,i) = A1(:,:,i) - m;
end
nonvec = toc;

vectorized version

tic
A2 = bsxfun(@minus,A2,mean(A2,3));
vec = toc;

We could have also used repmat() as follows, but this requires more memory and is slightly slower.

tic
A3 = A3 - repmat(mean(A3,3),[1,1,size(A3,3)]);
rep = toc;
assert(isequal(A1,A2,A3));
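
In MATLAB R2016b and later, the arithmetic operators expand singleton dimensions on their own (implicit expansion), so the same subtraction can be written without bsxfun(). The following is a minimal sketch, assuming such a release and using a smaller stand-in array; viaBsxfun and viaImplicit are names made up for this example.

```matlab
A3d = rand(10,10,10);                          % smaller stand-in for the data above
viaBsxfun   = bsxfun(@minus,A3d,mean(A3d,3));  % explicit singleton expansion
viaImplicit = A3d - mean(A3d,3);               % assumes R2016b or later
assert(isequal(viaBsxfun,viaImplicit));        % both subtract the same 10-by-10 mean
```

On older releases the second subtraction fails with a dimension mismatch error, which is why bsxfun() is used throughout this article.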

Here is a real example involving the last two techniques. We are calculating a scalar value lambda given X, y, & W, with the following dimensions.

  • X is n-by-d
  • y is n-by-1
  • W is d-by-1

With the following formula, we can calculate a single lambda value per W vector.

lambda = 2*max(abs(X'*(y-X*W)),[],1);

However, we would like to calculate multiple lambda values given multiple W vectors. We could use a loop, but then this would not be much of an example. Instead, we stack k of the d-by-1 W vectors column-wise into a d-by-k matrix beforehand. X*W is then an n-by-k matrix. We use bsxfun() to effectively expand y, which was an n-by-1 vector, into an n-by-k matrix with k identical columns and perform the subtraction. Finally, we multiply X’ by the resulting n-by-k matrix, yielding a d-by-k matrix, and take the maximum along the first dimension, yielding the k lambda values we were after. Note that this new version still works even if k=1, that is, even if we are after only one lambda value.

  lambdaVals = 2*max(abs(X'*(bsxfun(@minus,y,X*W))),[],1);

Vector equations tend to naturally generalize into matrix equations.
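
To check that the matrix version really does agree with applying the original formula to one W vector at a time, we can compare the two on random data. This is only a sketch: the sizes n, d, and k, the random inputs, and the name lambdaLoop are assumptions made for the test.

```matlab
n = 50; d = 4; k = 3;                 % hypothetical problem sizes
X = randn(n,d);
y = randn(n,1);
W = randn(d,k);                       % k weight vectors stacked column-wise
lambdaLoop = zeros(1,k);
for j = 1:k                           % original formula, one W vector at a time
    lambdaLoop(j) = 2*max(abs(X'*(y-X*W(:,j))),[],1);
end
% vectorized formula, all k columns at once
lambdaVals = 2*max(abs(X'*(bsxfun(@minus,y,X*W))),[],1);
% compare with a tolerance since the order of floating point operations may differ
assert(max(abs(lambdaLoop - lambdaVals)) < 1e-9);
```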


Here is one final example.

Suppose we have a large numeric matrix and we want to apply a function to arbitrarily sized blocks of it. That is, we want to partition a matrix of size m-by-n into many smaller matrices of differing sizes, apply a function to each block, and group the results back together. We could extract each block first with a long series of indexing operations and then loop over them all applying the function, but there is a better way involving the mat2cell() and cellfun() functions.

We have not discussed cells yet and you can skip this example until after you read about them in the next chapter if you like. However, for now you could just use this example as a template.

A = rand(100,40);                               % here is our data

Partition the matrix into 12 blocks of different sizes. These blocks are stored in a 4×3 cell array. Notice the sizes of each of the 12 blocks and how we achieved these sizes with the inputs to mat2cell().

groups = mat2cell(A,[10,30,20,40],[5,27,8])
groups =
    [10x5 double]    [10x27 double]    [10x8 double]
    [30x5 double]    [30x27 double]    [30x8 double]
    [20x5 double]    [20x27 double]    [20x8 double]
    [40x5 double]    [40x27 double]    [40x8 double]

Create a function to apply to each block; we will choose something simple like replacing each element in a block with the block’s largest value.

f = @(x)repmat(max(x(:)),size(x));

Use the cellfun() function to apply this function to every one of the 12 elements in groups, (i.e. to every matrix block). We have to set ‘UniformOutput’ to false because the sizes of the elements returned by cellfun() will be different.

groupSums = cellfun(f,groups,'UniformOutput',false)
groupSums =
    [10x5 double]    [10x27 double]    [10x8 double]
    [30x5 double]    [30x27 double]    [30x8 double]
    [20x5 double]    [20x27 double]    [20x8 double]
    [40x5 double]    [40x27 double]    [40x8 double]

We then convert back to a numeric matrix with the same size as our original matrix A.

B = cell2mat(groupSums);

more tips

In the last example, we used the cellfun() function, but there is a similar function, arrayfun(), that applies a function to every element of an array. When other vectorization techniques fail, this can be a better alternative to looping over every element yourself.
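
Here is a brief sketch of arrayfun() in both of its modes; the input vector lens and the output names ramps and totals are made up for this illustration.

```matlab
lens = [3 1 4];                                             % hypothetical vector lengths
% outputs of different sizes must be collected in a cell array
ramps  = arrayfun(@(n) 1:n, lens, 'UniformOutput', false);
% scalar outputs are collected into an ordinary numeric array
totals = arrayfun(@(n) sum(1:n), lens);
```

As with cellfun(), 'UniformOutput' only needs to be set to false when the function does not return a scalar for every element.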

Some functions, like mvnpdf() for example, interpret an n-by-d matrix, not as n-times-d elements but as n, d-dimensional vectors. If this is not what we are after, we can convert the matrix into a vector using the (:) operator, pass it to the function, and reshape the output back into the original size with the reshape() function.

The vectorize() function takes in a string or function handle and converts all operators (e.g. ^) to their element-wise equivalents (e.g. .^). This can be useful when using someone else’s function that was not vectorized to begin with.

Recall from the matrix chapter that there are many functions that will create matrices for us, such as meshgrid() or blkdiag(), yet again helping us avoid loops.

When a value vec(i) depends on the earlier entries vec(1)…vec(i-1), for instance, we can use functions like cumsum(), cumprod(), filter(), conv(), and accumarray(). See their help entries for more information.
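
As a small illustration of two of these functions, the sketch below replaces a running-total loop with cumsum() and a counting loop with accumarray(). The data in v and ids and all variable names are hypothetical.

```matlab
v = [2 5 1 7 3];                 % hypothetical data
runLoop = zeros(size(v));
runLoop(1) = v(1);
for i = 2:numel(v)               % each entry depends on the one before it
    runLoop(i) = runLoop(i-1) + v(i);
end
runVec = cumsum(v);              % same running total without the loop
assert(isequal(runLoop,runVec));

ids = [1 3 1 2 3 3];             % hypothetical group labels
counts = accumarray(ids(:),1)';  % number of times each label occurs
```

Here counts(g) holds the number of entries in ids equal to g, which is the same kind of tally the word-frequency example computes.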

Finally, if a loop is essential and proves to be a significant bottleneck in your program, consider compiling it via emlmex, (if possible). Alternatively, write the loop in another language like C or Java and call that code from within your Matlab program. For details, see the chapter on calling external code.

Calling External Code

clear all;