Learning to Write CodeXM Checkers
Copyright © 2020 Synopsys, Inc. All rights reserved worldwide.
The name CodeXM is short for Code eXaMination.
CodeXM is a specialized language used to write customized checkers that run using the Coverity engine. It allows you to concisely define problematic patterns that you want to find in your source code.
CodeXM checkers are written in the CodeXM (CXM) language, which is easy to learn and allows you to write meaningful checkers quickly. The CodeXM checker definition involves three elements:
CodeXM checkers are seamlessly integrated with the typical Coverity workflow.
After writing your checker definitions in a text file, you simply pass the name of that file to the cov-analyze
command.
The checkers defined in that file are executed along with Coverity’s built-in checkers, and results are reported in the same way.
The first section of this guide, “QuickStart Tutorial”, demonstrates how to define a CodeXM checker and then run it.
Describing CodeXM is a bit different from describing other languages. Most programming books talk about a single language only. To describe CodeXM, we have to talk about CodeXM code itself, and the code of the language it is inspecting—the target language.
This guide shows all code (except pseudocode) in monospace
type.
CodeXM code appears in shades of green.
Target code, whatever the language. appears in shades of orange.
Reminder: By default, many browsers do not print out background colors.
This section walks you through the steps of using CodeXM to create a checker.
Let’s jump right in. Specifically, let’s look at creating a checker that detects goto statements.
The first thing to do is to give your checker an appropriate name.
At its simplest, a CodeXM checker has the parts shown in the code sample that follows. Open a text editor or development environment of your choice (not a word processor, though), and then type the following code, or copy it and paste it:
Note: In these first samples, we use the CodeXM /* comment */ notation to represent parts of the checker you will later replace with more detailed code. (CodeXM comments have two forms. You can use the /* slash-asterisk-multi-line comments */ form. You can also use the single-line form: // slash-slash single-line comment.)
Recommendation: In most cases, CodeXM disregards white space such as indentation or line breaks. The “Style Guide” section at the end of this guide describes a consistent style for CodeXM source that we find useful and readable. The code samples in this guide follow that style. For guidelines to white space, in particular, see the entries Braces and Spacing and Line Length.
We have begin to define a checker named NO_GOTO (as declared in the checker’s name field). Next, we put the checker implementation itself (such as code patterns you’re looking for and how to report defects) into the reports statement.
The next thing to do is to specify where to look and what to look for.
Since we want the checker to look for goto statements in each function of C/C++ source code, the logic to find what we’re looking for would look like this:
Here we have added a simple expression that looks at all the code in our project (that’s what the for loop does), and uses the gotoStatement pattern to match all instances of goto in the source code under examination.
The include statement above the checker includes the CodeXM C/C++ Standard Library; this provides the patterns needed for CodeXM to understand C or C++, and ensures that only C or C++ source code is analyzed by this checker. (And yes, those are back-ticks being used to surround the language name.)
Now that we’ve defined the pattern that we’re looking for, we want to report findings as events, or issues.
For a simple one-event defect like this goto statement, we just need to provide the description (what the event is supposed to say) and the location (where the event should be placed relative to your source code.)
What follows is a completed checker for finding a goto statement:
Note: In this checker, our loop variable c matches a gotoStatement pattern that is specified in the C/C++ standard library. This library provides patterns that are appropriate for C/C++ source code, and that limits this checker to analyzing C/C++ source code only. The full set of available patterns can be found in the CodeXM C/C++ Library Reference.
When it runs, our checker displays the message Use of goto is not allowed
wherever it encounters a goto
in the source code being scanned.
The following section shows how to run a scan.
Let’s run the checker that we’ve just created.
Your code base might not actually have any goto statements—and we’d rather not have you put one in if that’s the case—but we would like you to see some results, so here is a simple C program with a goto in it, just for this purpose:
Save this as mygoto.c
, and run one of the variants of cov-build
as shown below, depending on which compiler you have at hand: gcc
or the Microsoft® C/C++ compiler cl
. This builds your intermediate directory—we’ve named it mycxm
—which you can reference when you run your CodeXM checker.
... or:
Note: If you are using a different compiler—and assuming that it is available on your path—simply substitute
that compiler’s command name for gcc
or cl
, above.
Now that we have an intermediate directory, we can get back to running our CodeXM checker, as follows.
To apply your checker to the little sample we gave you, execute the following command line:
Once the analysis is complete, its output should appear something like this:
Typically you would use cov-commit-defects --dir intermediate-dir
to commit your results to Coverity Connect.
For the purposes of this exercise (and when rapidly developing a CodeXM checker), you can also quickly inspect the results by invoking
cov-format-errors
, as follows:
When this finishes, you should find a folder, errs
, that contains some HTML files, notably
index.html
.
To view the defects found by the analysis, use a browser to open index.html
.
This concludes our QuickStart. To learn more, you can read further in this manual, or you can consult some of the other documents provided with CodeXM.
With the CodeXM extension for Microsoft® Visual Studio® Code, you can interactively write, test, and debug CodeXM checkers. (You can also test source code in the VS Code IDE.) The CodeXM extension is available from Synopsys, and can be found in the Marketplace site for Visual Studio Code.
This QuickStart Tutorial only scrapes the surface, getting you up and running with just a hint of what CodeXM is capable of doing. For a fuller understanding of CodeXM, we encourage you to also examine the following documents:
Describes the CodeXM syntax, including all the directives, expressions, and built-in types.
... These manuals describe the patterns, functions, and utilities available for analyzing code in each of the languages that CodeXM supports.
Describes functions that are common to all the CodeXM language libraries.
These sections elaborate on the QuickStart by describing further ways to use CodeXM, and further features of the CodeXM language.
Let’s take a moment to assimilate what we’ve learned. (If you haven’t gone through the “QuickStart Tutorial”, please do so now.)
CodeXM checkers examine your project’s source code by looking for specific kinds of code constructs. These can be fairly simple (like specific statement types) or they can be much more complex. More sophisticated checkers (which we’ll take a look at later on), look for more complex patterns. The main idea is that CodeXM checkers look for patterns.
Now that we know how to write a basic checker that finds goto statements (using the gotoStatement pattern we invoked in the QuickStart), try modifying your NO_GOTO checker to look for an ifStatement pattern, forLoop pattern, and returnStatement. Since any of these are likely to occur in your own code, you can just run the analysis against your code base.
You can also modify the NO_GOTO source file to include the new checkers.
Be sure to re-run cov-build
so analysis will detect the changes you made.
To illustrate, in the next section we write a simple checker, named VAR_DECL, to detect all declarations of variables.
We’ve now reviewed the basics of creating a CodeXM checker, so let’s go ahead and extend them.
CodeXM comes with some predefined patterns, but you can define your own as well. This allows you to match more complex code. In NO_GOTO, when code matches a goto statement in the source code, it generates an event.
To illustrate, let’s write a simple checker called VAR_DECL to detect all declarations of variables.
We can use the basic structure of a CodeXM checker and start off with a stock checker definition (like we did in the NO_GOTO example):
This time, instead of matching goto statements, we want to match variable declarations. To do this, we use the aptly named variableDeclaration pattern in the following way:
Note: Just like the gotoStatement pattern, the variableDeclaration pattern is provided by the CodeXM C/C++ library. The use of this library is specified by the include directive shown at the top of the code sample.
This checker is complete. You can run it by following the same steps you followed for your NO_GOTO checker, as previously described in Running Your CodeXM Checker.
Going to the next level of complexity, let’s say we want to determine whether the variable is initialized or not. To check for this, we can make a new checker by enhancing our current VAR_DECL checker.
In CodeXM, when a pattern matches some code, the pattern doesn’t just return true. More usefully, we get access to the matched code, along with properties about that code. By adding as <name> to the end of a matches operator, we tell CodeXM to assign a name to the result, so that we can refer to the result later on.
Our work-in-progress checker follows:
In the example above, the variable code can refer to any sort of code: if statements, assignment expressions, function calls, and yes, variable declarations. But decl will only ever be a variable declaration because it is the result of a match for that particular C or C++ pattern. So we can use decl to access properties of a variable declaration in order to check for further conditions.
Now that we know we have a variable declaration, we want to decide if this is the kind of declaration we’re interested in: that is, a declaration without an initialization. The general pattern for narrowing down a search in this way looks like the following code fragment:
Seeing the general form above, we know we want to only consider variable declarations that are uninitialized. It turns out that variableDeclaration has a property, initializer, whose value is null when a variable isn’t initialized at declaration time. Pattern properties, such as fields in a record or attributes in an object, are specified by a dot notation. So to express that a variable does not have an initializer, we use the following code:
Combining all of the above, the code for the UNINIT_VAR checker should now look like the following:
In the previous sections, we learned how to refine our pattern-match search by using the logical && conjunction. Now let’s see how we can further specify the pattern for properties we’re interested in, by using a more compact notation known as pattern decomposition.
The previous example looked like this:
The following CodeXM construct tests for the same condition:
In other words, pattern decomposition means breaking a test pattern into one or more constituent constraints—such as .initializer == null. The decomposed pattern is enclosed in curly brackets. You can specify more than one constraint within the decomposition. Separate one constraint from the next by using a semicolon ( ; ). When you specify multiple constraints, all the constraints must be true in order for the overall pattern to match.
To illustrate another case of pattern decomposition, let’s now write a checker that detects calls to a problematic function, system().
To find calls to system() we must match functionCall patterns. For each function call, we must then check its calledFunction property, which has a subproperty, .identifier, that contains the declared name of the function being called.
Here’s an example:
As the sample code shows, a pattern decomposition can use properties that are nested. Simply use the . operator to name them in sequence—rather like the way you specify a path to a particular file.
A tip about nested properties: Nested properties can be tremendously useful, so be sure to check for them, especially when you use a pattern you haven’t used before. For example, if you look at the description of functionCall in the C/C++ Library Reference, you see a table of that pattern’s properties. Always check the property types. In the example we have just shown, the .calledFunction property, which we use, has the type functionSymbol. Now, functionSymbol is itself a library pattern: one that has a large number of properties—one of which, .identfier, we take advantage of in our example.
You can use the % operator, a set-filter expression (also known as the which are operator) to filter or simplify the code even further. For this example, we’ll use the % operator to help detect function calls.
Since the where clause happens to limit itself to just expressing a more precise functionCall pattern, we can use the % operator and rewrite our code as follows (we are only interested in those portions of the code that are function calls to system()):
We can now quite economically define the entire checker, using both pattern decomposition and filtering, as follows:
To summarize in plain English what the code does, the checker finds each call to the system() function, and reports each find. The report describes the issue and shows where the call is located in the source code.
Note: The construct .formattedAsCode, which we have not shown before, is a CodeXM property of any string value, either constant or variable. It simply formats the string for display in Coverity Analysis output.
You have now used patterns from the C/C++ library, such as functionCall and gotoStatement. In CodeXM, you can also define your own patterns. This can be useful if you find yourself matching the same thing many times, or if you just want to assign a descriptive name to what you are trying to match: It is important to write readable code.
Let’s assume that you want to find any use of the comma operator in your source code.
Since the comma operator is a binary operator, we can find it by using the C/C++ library pattern binaryOperator.
This fragment shows the structure for using binaryOperator:
// ...
But there are many kinds of binary operators. We need to be more specific. The binaryOperator pattern has a property, .operator, that tells us which kind of operator it has matched. So the following code fragment shows how to detect instances of using the comma operator:
Note: The various C-language binary operators are represented by literal values—they are members of an enum—so in CodeXM code you need to surround them with backticks. For example, the addition operator is indicated by using `+` and the multiplication operator is indicated by using `*`.
Now, let’s suppose we are writing a complex checker that requires finding the C-language comma operator several times. In order to prevent repeating the previous code over and over again, we can declare a reusable custom pattern. We can call the new pattern, theCommaOperator.
Here is a piece of code that does just that:
(Always end a pattern declaration with a semicolon, as you do for any other kind of declaration.)
That’s it! This new pattern finds any use of the comma operator.
Remember: Not all commas are instances of the comma operator: a comma can also be a separator in a list, a separator between parameters in a function declaration or a call to that function, and so forth. The comma operator is defined as evaluating its left-hand operand, throwing it away, then evaluating its right-hand operand, whose result then becomes the result of the whole expression.
Tip: To use a custom pattern as often as you’d like, put the pattern declaration above the checker keyword. That is, make it global to the individual checkers in your code.
Now that you have defined theCommaOperator, you can use it anywhere a library pattern would be valid; for example, as an argument to the matches operator or to the % operator. The following code fragments show this usage:
A custom CodeXM pattern can also return a new object. The object can have a different set of properties than the patterns for which the custom pattern searches. This is useful when you want to match multiple related code constructs and then return information in a uniform way.
The following sample code finds loops of various types:
The pattern above matches four different kinds of loops: while, do-while, and the two kinds of for loop. The C# or Java library patterns that match these (forLoopRange, forLoopSimple, whileLoop, and doWhileLoop) each has its own particular set of properties. The loop variable returned by our new allLoops pattern, on the other hand, uses three properties to summarize those different loop types. (The .after property applies only to forLoopSimple: It is the statement that is executed at the end of every iteration.)
As an example of using the custom allLoops pattern, suppose we wanted to write a CodeXM checker that inspected the body of each loop in the target code, without caring what kind of loop contained that body. The following code sample shows how you could set this up:
The new pattern provides a generic way to locate loop bodies, greatly simplifying our search condition.
Note: Some language libraries have a pattern to match all loop types. The C/C++ library and the Python library do not.
With the functionCall pattern, available for all supported target languages, you can detect a literal value being passed to a function.
The following example simply checks for a mandatory value. The functionCall pattern’s argumentList property lets your checker inspect what is being passed to a speficied function.
In the previous pattern for loops (Returning Objects from Patterns), you might have noticed that the after property and condition property can be null, depending on the type of loop that is matched. In fact, we say that these properties are nullable.
In CodeXM documentation, a nullable value is indicated by the potential type followed by a question mark; for example, string?.
Accessing a null value is not a fatal run-time error in CodeXM, as it is in many other languages. A nullable property is one that might not return an actual value: This does require additional CodeXM code to account for that possibility.
You can handle a nullable type in one of the following ways:
The following code sample uses the method of matching NonNull:
The following code sample uses the converse method of checking that a property value is null:
The null-coalescing operator simply specifies a default value to use when a property is null. The following code fragment shows how to use it in code:
... After this expression, the CodeXM code can now access myNullableString just as it accesses any non-nullable property. If the property has a value, that is what CodeXM inspects. If the property is null, CodeXM uses the value "default value" instead.
Note: Some language libraries have a pattern to match all loop types. The C/C++ library and the Python library do not.
Imagine that you want to write a checker to find all switch statements that don’t have a default clause.
The following code outline is an initial pass for such a checker:
Thus far, the code shown above finds all instances of switch statements. We need to narrow the search. The variable sw is a switchStatement object, so it has a property called caseList, which contains all the switch statements, including the default case.
We might be inclined to search sw.caselist for defaultStatement—but this just returns the statement itself, when what we want to know is whether such a statement is present. The exists expression comes to our aid.
A CodeXM exists expression looks like a for-in loop (to help illustrate what it does). It differs from the for loop in that it returns true if there is at least one object that satisfies the condition. It returns false otherwise.
The following code fragment shows how an exists expression is constructed:
Actually, we want to also use the logical NOT operator, !, to find the switch statement that does not contain a default clause. So the completed checker looks like the following code:
Note: The space after the NOT operator ! is not strictly required, but we put it there to make the operator stand out. While you are reading code, neglecting to notice a NOT in a condition can lead to great confusion.
Like most common programming languages, CodeXM has the concept of variables.
Unlike most common programming languages, where you can change the value of a variable once it has been declared, when you declare a variable in CodeXM you must assign it a value, and thereafter you cannot assign a new value to it. The value of a CodeXM variable is effectively constant.
In CodeXM, you can define global and local variables, and the two have similar—though not identical—syntax. They are also placed in two different locations within a file.
To define a global variable, you place it outside of a checker. This global variable is accessible (or, in scope) from any part of the file in which the variable is declared. The following code shows the syntax for defining a global variable:
Always close a global variable declaration with a semicolon (;).
The following code fragment is an example of a global declaration as it might appear in CodeXM code:
You define a local variable within the body of a checker or a function. The following code shows the syntax for defining a local variable:
The variable takes on the value computed, just as in the global case, but its value is available (in scope) only within the expression that follows the keyword in.
The following example demonstrates variables local to particular for loops:
Reviewing what the example shows, the first line identifies all switch statements (which we refer to, individually, as sw). The first for loop enumerates all the case statements within a given instance of sw, looking specifically for those that have an integer literal expression (for example, case 42:) and then collecting that set into a variable named sw_cases.
In the lines that follow the third in (the one on its own line), sw_cases is used in the next for loop to inspect each such integer to see if its value is less than zero.
The variable vcs is defined only within the first of the nested for loops, and the variable c is defined only within the second. The variable sw is available to both of the nested for loops, but it is available nowhere outside the enclosing for loop that begins this code fragment.
The types that the CodeXM language supports, such as integers (int) or strings (string), are likely to be familiar to you. Like many programming languages, CodeXM is strongly typed. In many cases, however, when you declare a variable you don’t need to use the name of the type: The type is inferred from the declaration syntax. For example, the following code snippet shows implicit typing:
... In this example, i is clearly an integer whose value is 5, so it has the type int.
You can specify a type explicitly. To do so, use a colon (:), as shown in the following code snippet:
Types are described fully in the Syntax Reference. These are some of the widely used ones:
As many common languages do, CodeXM has a switch expression that compares one value to multiple patterns and returns a value based on the first pattern that matches. The following sample code shows the overall syntax of a switch:
CodeXM only goes through the cases—in the order they are listed—until one of the patterns matches, or the until it reaches the default case. In any event, when a match does occur, the switch returns the corresponding result. Therefore, the order of your patterns within the switch matters. The default case will always match if no case before it has done so. (The default case is optional, but we recommend that you always use it.)
Since you specify a pattern before each arrow, you can also specify an as variable. The variable definition goes after the pattern but before the arrow. This way, the expression to the right of the arrow can use the local variable to refer to the pattern’s result. A variable specified within a case remains in scope until the next case begins (at the vertical bar). Because the variable is local to a single case only, you can use the same variable name for several patterns.
The following sample code shows a completed switch statement that uses local, case-by-case variables:
Note: Strictly speaking, the vertical bar (|) before the first of the cases is optional, but our recommended style is to use it, to maintain readability and visual consistency.
To summarize this example, the switch checks whether code matches one of three statement types: in order, an if, a while, or a for. If code does match one of these, then cond is assigned a one-element list that contains the matching statement’s conditional expression. If code does not match (the default case), then cond is assigned an empty list.
CodeXM also has an operator that is similar to the one used in other programming languages; namely, the ?: ternary or conditional operator. It provides the expected if-then-else functionality.
The following code fragment uses the conditional operator:
In other words, if code matches the whileLoop pattern, the conditional returns the string "is a while loop"; otherwise, it returns "is not a while loop".
The conditional operator can be hard to read. We recommend you use it only for brief, one-line tests such as the previous example. If the test requires more than a single line, and especially if it contains subexpressions, use an if statement.
Here is a sample of using an if statement to perform the same comparison we performed in the previous example:
An if statement is constructed out of the keywords if then elsif else endif, where the elsif clauses (there can be more than one of these) are optional.
The debug() function is useful when you debug your CodeXM checker.
This function prints its parameter (for example, a message string, the value of a variable, or both) to the console while your CodeXM checker executes.
Watch Out: The debug() function does not generate output unless you specify the
--codexm-print-debug
option when you invoke cov-analyze
.
Because all functions return values (and you might not care what debug() returns), when you use the debug() function wrap it inside a let ... in expression.
The following code sample shows how you might call debug(). (Because we don’t care about the value that debug() returns, we use the underscore as a placeholder variable name. We won’t reference it elsewhere.)
The code in this example prints to the console every expression that matches the filter pattern provided.
In addition to custom patterns, CodeXM allows you to define your own functions. This is particularly useful for structuring your CodeXM checker. Functions are even required to perform certain kinds of tests. This section shows an example of that technique.
This kind of search requires recursion, and to search recursively you need a function to call.
Here is an example of the code we want to examine:
We can use the following pattern to check for throw operators that are not inside of a try statement:
... But this example is not complete, because it relies on a call to the function insideTry(). Now we need to define that function as well.
CodeXM functions, like functions elsewhere, take arguments and return the value obtained from evaluating their function body.
To satisfy the needs of the pattern we defined above, the implementation of insideTry() returns a Boolean value that indicates whether some ancestor of the function’s argument is a try statement or not. The code to define the function looks like this:
Examining this example, we see that the function expects an astnode argument and returns a Boolean value. (An astnode is a node in the Abstract Syntax Tree: in effect, any statement or expression.)
First, the function checks whether the node has a parent.
Functions, like patterns, are good for modularizing your code, providing easy-to-maintain reusable elements, or otherwise just giving descriptive names to complicated logic. Used together with checkers, they can make CodeXM an effective and powerful language.
With CodeXM, you can include other CodeXM files by using the include directive. When you include another file, the patterns, functions, and other definitions kept in that file become available to the current CodeXM file. This allows you to define a collection of useful tools in one place, and share them among many checker files.
To do this, use the following syntax in your checker file:
You’ll notice that unlike other instances of include mentioned in this document,
the code that follows the include keyword is
a (quote-enclosed) string, not a (backtick-enclosed) enumeration member.
When quotes are used, this tells CodeXM to look for a file on your file system.
In contrast, the enum-based include directives cause predefined libraries installed with cov-analyze
to be included.
Relative paths used with the quoted-string variant are relative to the location of the CodeXM file.
Sometimes it might not be clear what pattern you should use to match a particular bit of your code.
To help you find the pattern (or patterns) that would match a specific part of your code, use cov-manage-emit
to dump the specific part of your code.
(In particular, a function in which you notice the thing you’re looking for.)
To do this, it helps to generate an intermediate directory that contains the desired code function.
You might want to try this on a small test case first, where a small source file contains the function
(or a relevant part of it, anyway) for which you’re trying to find a pattern.
Given that, invoke cov-build
as follows:
This captures the source of test.c
into the mycxm
directory.
Now, we need to see the CodeXM patterns in our test file. We do this by running cov-manage-emit
. The general form is:
So for our example, let’s say you’ve created a little function called example()
(that you put into test.c
, above) containing the code of interest. Use the following command syntax:
After running cov-manage-emit
, the output for any function, class, enum, or global variable is presented in the following form:
By reading this JSON code, you can identify all the patterns present in your function, including the part that you’re interested in matching.
Note: As you can see in the sample output, the JSON pattern descriptions also use the <typeName>? convention to show nullable types.
A function property is, in effect, a sub-checker that provides information about a called function. You can think of it as a pattern on steroids, since it allows the main checker to match characteristics found in other functions that the current function calls. Examples might include Does the function being called contains side effects? Or, Does this function, or any function it calls, call system()? Function properties permit a lightweight form of interprocedural analysis.
Function properties are defined using the key-phrase function property, followed by an identifier, followed by a record definition that has a single property named models, which should be a pattern. The general form appears like this:
The pattern you provide to the models property should have the following form:
That is to say, the pattern should expect to match an astnode (a statement, an expression, and so on) and return a record object with the member field events containing a list of eventRecord items (much like an ordinary checker does). This pattern should match nodes of interest within the called function, and produce one or more events descriptive of what was matched.
The following example shows a function property that determines whether a given called function calls system().
Here we see that the pattern in the function property contains two cases. The first (top) case identifies an event of interest directly: a call of a function named system(). The second (bottom) case applies the function property recursively on call sites within the called function. (If that transitive behavior is not desired, that alternative of the pattern can be omitted.)
The function property behaves like a pattern you match against functionCall results. If the property matches, the called function in question exhibits that characteristic; if the property does not match, neither does the function.
To use our new function property we could write the following, treating the function property as an expression:
To be more complete, here the pattern is used to implement a complete checker. In this case, the function property is treated as a pattern:
In this second example we use the function property like a pattern, making it the right-hand operand of the % (which-are) operator, instead of filtering against functionCall and then matching against the function property. Both methods work.
If you play with using this code, you’ll see that this brief checker finds all instances where a function calls system(), including indirectly so. This is a more general test than the earlier checker that searched for system() calls, but it might generate more issues than you expect.
If FunctionA calls FunctionB, which in turn calls FunctionC, which finally calls system(), you will get exactly what the checker was told to do: an issue report for each of FunctionA, FunctionB, and FunctionC, because each of them ultimately causes a call to system().
This might not be the result you want. More often you will probably want to combine the function property with another condition, specific to the current function, so only when both conditions are met do you actually generate an issue event.
So far we have focused on detecting the presence of single elements of code within your project: a goto statement, for example, or a call to the system() function. Finding such standalone elements is useful, but it can only help you detect a certain class of coding problems. Wouldn’t it be nice if you could also have CodeXM find problems that involve a specific sequence of code elements?
You can, through a mechanism called path-sensitive analysis.
As you are probably aware, the computer executes the instructions in your functions one at a time (at least conceptually; let’s ignore threads and all the pipeline optimizations modern processors do to speed up your code: in spite of all these neat tricks, it’s all meant to give the illusion that instructions are executed in discrete, ordered steps). Any nontrivial function will likely have some control flow: perhaps a condition is tested, and two or possibly more different ways can be taken based on the outcome. Each possible way through a given function—from the entry point to where it returns—is called a path.
CodeXM has the ability to evaluate the feasible paths through a function, looking for specific sequences of code along any one of them. This is what path-sensitive analysis is all about.
Finding sequences of code along paths is not much more work than what we’ve already done. We just need to introduce two syntax elements: the sequence() function and the >=> operator (AKA the happens before operator). These constructs, working together, describe sequences of code elements rather than just individual elements.
Calling things in the wrong order is a common problem when programming. Take, for example, forgetting to seed the pseudorandom number generator before you start to pull numbers out of it.
Here is an example of outright forgetting to call srand():
More troublesome, though—and not as easily detected manually—is the case where srand() is called, but only sometimes:
This second example collapses what might be a complicated bit of control flow into a single if statement, but the point is that while there is a path that does call srand(), there is also a path where srand() is not called. The second of these paths is the one we want to find.
With the problem laid out, let’s see how we attack it. To express it in words: We are looking for a sequence where we have not called srand() before we call rand(). Here is how we can express it in CodeXM:
These are some issues you should be aware of:
The sequence() function just produces a special kind of pattern that understands paths. In fact, we’ll end up putting it into a checker that specifically walks along paths. The checker looks like this:
So the checker itself is pretty straightforward. All we need to do now is to define what we mean by callToSRand and callToRand. Given that CodeXM is a define-before-use language, somewhere above the preceding snippet we need to define the two patterns that match what we are looking for. The first one is the simpler of the two and should be fairly self-evident, but it does introduce one new concept: the path step. Let’s examine it:
This pattern is special in that it specifically matches a path step; that is, a single, discrete bit of code that is part of a larger path object. In order to be interesting to us, the node in question is expected to be a call to the function srand().
Another curiosity is that this pattern returns an empty event list. We can get away with this because the sequence is looking for the absence of a call to srand() (that is, ! callToSRand) so we have nothing meaningful to report if srand() is actually called.
The other pattern we need finds a call to rand(), and report its finding. Structurally this is similar to the previous pattern, but of course we change the name of the function we’re looking for, and in this pattern we do want to report something. To construct the report, we need to save what was matched, in a new variable, and then construct a proper event list that references the variable. Here is the pattern in its entirety:
So what might you expect to find when you run this checker? The simple case we illustrated above is straightforward. The following sample annotates the code with a message generated by the checker UNSEEDED_RAND:
The second example—where some paths seed the generator but others don’t—is more interesting: Not only does the checker tell you that the code forgot to call rand(), it tells you which path that was taken failed to do so.
Here is the second example, with messages from Coverity Analyze added:
So describing sequences of steps in your code is not much more difficult than writing other kinds of checkers. This capability opens the door to solving many problems: For example, you can use path sensitivity to check that resources you opened are always closed, or that you only call certain functions between two “book-end” functions.
Remember: A sequence is limited to finding steps within a single function. Writing an interprocedural checker that combines function properties and sequences is possible, but we’ll leave that as a proverbial exercise for the reader.
A family of Coverity Analysis checkers known as Don’t Call checkers (their names begin with “DC.”) report calls to functions that for one reason or another are considered risks to the security and integrity of a program. You can use CodeXM to add custom checkers of the Don’t Call type.
Because it can execute operating system commands, the system() call in C and C++ is one such function, as we have mentioned before. Those sections were written with teaching the use of CodeXM in mind, and so the examples they use are fragments of code. Here is the complete code for a checker that provides a Don’t Call for system():
Although this is not a lengthy example, it brings together several of the features and techniques that previous sections have described.
Here are some guidelines for CodeXM source code. Both developers at Synopsys and developers outside the company have found them helpful for keeping code consistent and easy to read. Code examples in the CodeXM documents follow these guidelines, too (for the most part).
Alternatives in multiline patterns that use the | separator should include the first, optional |. (The switch expression uses the same convention.)
The following code shows an example:
Try to keep anonymous, or lambda, functions to a single line, as in the following code:
If the lambda function has to use several lines, use the following indentation style:
Use the style known as the One True Brace Style (1TBS or OTBS).
This style is a variation of the Kernighan and Ritchie (K&R) style, the style used in those authors’ book, The C Programming Language.
The salient points of the 1TBS style are:
The following code is an example of using the One True Brace Style:
Caution: Unlike some languages, CodeXM does not support using brackets to enclose a single statement (as opposed to a block) for an if statement, its else clause or elsif clauses, or the contents of a for loop.
Use the ?: operator only for simple conditional expressions that fit onto a single line, as in the following example:
If the conditional expression requires more than one line, or if you think that readability is important, use if then else endif instead, as in the following example:
Place the keywords if, else, and endif at the same indentation level. This applies to elsif clauses as well, as in the following example:
If the logic of the conditions is hard to follow, it might be better to nest if-expressions, as the next example shows:
Avoid using “raw” literals in code. To improve maintainability and readability, use the let operator to create a global value that has a name, as in the following example:
Unless the line becomes too long (>100 characters), put the function’s name, argument list, and optional return type on a single line, and close the line with the -> arrow that introduces the function body.
If the entire function fits on a single line, that is OK. Here is an example:
If the function arguments don’t all fit on one line, give each argument after the first one a line of its own, aligned with the first argument, as the following example shows:
If the function requires more than one line, place the semicolon ( ; ) that closes the function body on a line of its own, at the same indentation level as the keyword function. This is shown in the previous example and in the one that follows.
If a return type fits on the same line as the function keyword, that is OK. Otherwise, put the return type on a line of its own and align the colon ( : ) that introduces it with the name of the function. Here is an example:
Use camelCase for identifiers (as opposed to snake_case).
If a let expression fits on a single line, end the line with the in keyword. Indent the lines that follow by one more level, as in the following sample code:
If the let expression is too long to fit on one line, then use multiple lines and align the in keyword with the opening let.
Here is an example of a multiline let expression:
If a conditional expression is too long to fit on one line, begin a new line with the first subcondition, and indent this line by one level. If there are further subconditions, put each on its own line. Begin each new line with the logical operator (such as && or ||). Indent each subcondition at the same level as the first.
Here is an example of a multiline logical expression:
The indentation of pattern-decomposition code should parallel the level of nesting it is matching in the target code. Here is an example:
The following sample code illustrates all of these conventions: