Coverity Extend Software Development Kit (Coverity Extend SDK) is part of Coverity Analysis.
Copyright © 2020 Synopsys, Inc. All rights reserved worldwide.
Table of Contents
Coverity Extend SDK is a framework for writing program analyzers (that is, checkers) in C++ that support analyses of C/C++, Java, and C# applications. Much of this framework is the same as that used by the checkers in Coverity Analysis. The framework provides the following services:
Basic front-end features: parsing, type checking and elaboration, abstract syntax construction, template instantiation, and linking across translation units.
Facilities to inspect abstract syntax, using pattern matching.
Mechanisms to traverse paths in the abstract syntax in execution order, prune false paths, and merge similar states to ensure termination in loops.
Flexible checker state management for derivation of flow-sensitive properties.
Output routines that work with the false path pruning (FPP) mechanism to ensure that defects are only reported in feasible paths.
What you must write is a description of a state machine, also known as an abstract interpreter. This description specifies how the state transitions occur and which states constitute errors. The Coverity Extend SDK framework then runs this state machine over each function in the code that is undergoing analysis, collects the defect (issue) reports (those produced in error states), and allows you to commit the reports to Coverity Connect, where developers can learn about and triage the issues.
![]() | |
CodeXM is a language specifically designed for writing new checkers.
If you have not already invested in the Extend SDK, we strongly recommend you use CodeXM rather
than the mechanisms described in this manual.
See
Learning to Write CodeXM Checkers
|
The installer puts Coverity Extend SDK resources into the
<install_dir>/sdk
directory. It contains the following
subdirectories:
compiler
— Coverity Extend SDK compiler for checker source
code.
doc
— Coverity Extend SDK documentation in HTML and PDF
formats.
headers
— Coverity Analysis specific C++ header files that are
needed to compile a Coverity Extend SDK checker.
See Section 2.8.1, “Header files”
for a description of each header.
libs
— Binary files that are needed to link to a
Coverity Extend SDK checker.
samples
— Sample checkers that illustrate
different Coverity Extend SDK features.
The Coverity Extend SDK provides a few tools for compiling checkers:
On Unix and Windows: The build-checker
or, on Windows,
build-checker.bat
command. You will use this command to build
the Hello checker sample in Chapter 1.2, Creating your first checker: Hello.
<install_dir>/sdk/build-checker
<checker-name
>
This command compiles a Coverity Extend SDK checker. It looks in the
current directory for
<
or checker_name
>.c<
,
compiles it, and places the executable in the current directory. You
can then run it as you would run checker_name
>.cppcov-analyze
.
(For information about cov-analyze
, see the
Coverity Command Reference.)
The build-checker
command needs to be able to
find the Coverity Extend SDK installation directory. If you copy the command
from <install_dir>/sdk
to another location, you
must set environment variable PREVENT_ROOT to the root directory of
the Coverity Extend SDK installation. For example:
For example, on Unix:
export PREVENT_ROOT=<install_dir>/sdk
For example, on Windows:
set PREVENT_ROOT=<install_dir>/sdk
Makefile
in <install_dir>
/sdk/samples
compiles all the sample checkers located in
<install_dir>/sdk/samples
. You need to run
Makefile
from this directory. For example:
> cd <install_dir>/sdk/samples
> make
![]() | |
When compiling on a Windows system, you must use the |
The console output consists of a number of compile and link command lines. After successful completion, the console will print a message similar to the following:
SUCCESS! Your checker has been compiled to ./whileloopassign
On Unix only: Each sample checker subdirectory (for example,
<install_dir>/sdk/samples/hello
) includes a
Makefile
that is designed to compile just that checker.
See Section 1.2.6, “Creating a makefile for convenience”.
In this section, you will build and run the Hello checker, then commit issues it finds to a stream in Coverity Connect. In addition to using Coverity Connect and the Coverity Extend SDK, you will also need to run Coverity Analysis commands on sample code and use Coverity Analysis to run your Hello checker.
![]() | Requirements |
---|---|
|
In this section, you create a simple Coverity Extend SDK checker
(hello.cpp
) designed to print every abstract syntax tree
that is passed into the
ANALYZE_TREE
function. In subsequent sections, you will compile
and run this checker, then commit the issues that it finds in a code sample to
Coverity Connect.
To create the Hello checker:
Type or copy the following source code into a text editor:
/* (c) 2017, Synopsys, Inc. All rights reserved worldwide. The information contained in this file is the proprietary and confidential information of Synopsys, Inc. and its licensors, and is supplied subject to, and may be used only by Synopsys customers in accordance with the terms and conditions of a previously executed license agreement between Synopsys and that customer. */ // hello.c // trivial Extend checker #include "extend-lang.hpp" // Extend API START_EXTEND_CHECKER( hello, simple ); ANALYZE_TREE() { cout << "ANALYZE_TREE: " << CURRENT_TREE << endl; OUTPUT_ERROR("ANALYZE_TREE: " << CURRENT_TREE); ReturnPat ret; if( MATCH(ret) ) print_tree(CURRENT_TREE); } END_EXTEND_CHECKER(); MAKE_MAIN( hello ) // EOF
The source code and makefile for this checker are located in the
<install_dir>/sdk/samples/hello
directory.
Unlike the print_tree checker (see Section 1.3.2, “Examining nodes in the AST with print_tree”), this
checker restricts the level of information that is returned by using an
if
statement before calling print_tree
.
Save this file as hello.cpp
in a directory that is
outside of the Coverity Extend SDK installation
directory.
For example: <
so that
your checker source file is now called
HELLO
>HELLO/hello.cpp
![]() | |
Saving the file inside of the installation directory can make the upgrade process for Coverity Extend SDK more difficult. |
In this section, you compile the Hello checker that you built in Section 1.2.1, “The Hello checker source code”.
To compile the Hello checker:
Go to your <
directory,
and invoke HELLO
>build-checker
:
> cd <HELLO
>
> <install_dir>/sdk/build-checker hello
Note that the argument is hello
, not
hello.cpp
.
<install_dir
> is the Coverity Analysis root directory.
After printing the compilation and linking command line, this build command prints the following:
SUCCESS! Your checker has been compiled to ./hello
If an error occurs, set your PATH. See
build-checker
in Section 1.1.3, “Compiling Coverity Extend SDK checkers”.
![]() | |
On Unix, you can run the makefile for the sample checker instead of
running On Windows, if you see the following error when compiling your checker,
you must either restart your console as an administrator or make a copy of
the C:/Program Files/Coverity/Coverity Static Analysis/sdk/compiler/bin/ld.exe: cannot open output file hello.exe: Permission denied collect2: ld returned 1 exit status ERROR: Checker "hello" did not successfully compile. |
Locate the following output in your
<
directory:HELLO
>
hello
(on Unix)
hello.exe
(on Windows)
This output is the checker program, which supports a command-line interface
similar to cov-analyze
, except that it can only run one
checker. (For information about cov-analyze
, see the
Coverity 2020.12 Command Reference
.)
In this section, you run the Hello checker that you compiled in Section 1.2.2, “Compiling the Hello checker”. See Requirements before attempting to complete the steps in this section.
To run the Hello checker:
Create some sample input for the checker.
You can save the following code as a file called
HELLO/test1/hello.test.c
:
/* (c) 2017, Synopsys, Inc. All rights reserved worldwide. The information contained in this file is the proprietary and confidential information of Synopsys, Inc. and its licensors, and is supplied subject to, and may be used only by Synopsys customers in accordance with the terms and conditions of a previously executed license agreement between Synopsys and that customer. */ // test1/hello.test.c // test input for 'hello' checker int foo() { int x = 1; x += 5; return x; } // EOF
Use Coverity to configure a compiler.
To configure gcc and g++ compilers with Coverity Analysis:
> cd<install_dir>
/bin
> ./cov-configure --gcc
To configure the Microsoft C/C++ compiler cl.exe
with
Coverity Analysis:
> cd<install_dir>
\bin
> cov-configure --msvc
![]() | |
The remaining steps in this section assume that you are using the
Unix-based gcc compiler. If you are using a different compiler, configure it
instead, and adjust the command lines to use the appropriate command-line
syntax for that compiler and operating system. For guidance with the
configuration of such compilers, see the Coverity Analysis 2020.12 User and Administrator Guide. For more complete information
about compiler configuration, you can also refer to the Coverity Analysis 2020.12 User and Administrator Guide and the
Coverity 2020.12 Command Reference documentation on the |
Use cov-build
to intercept calls to the compiler and save
its abstract syntax in the intermediate directory.
On Unix:
> cd <HELLO
>
> <install_dir>/bin/cov-build --dir int_dir gcc -c test1/hello.test.c
On Windows:
> <install_dir>\bin\cov-build --dir int_dir cl test1\hello.test.c
Upon successful completion, this command prints the following output:
[...] 1 C/C++ compilation units (100%) are ready for analysis The cov-build utility completed successfully.
This command creates a c/emit
subdirectory (an emit
directory) in your intermediate directory that contains the
cov-build
output:
<
.HELLO
>/int_dir/c/emit
For information about the cov-build
command, see the
Coverity 2020.12 Command Reference
.
Use your Hello checker (hello
) to analyze this intermediate
directory:
Copy the hello
checker program into the Coverity Analysis
bin
directory.
For example, on Unix:
> cp hello<install_dir>
/bin
On Windows:
> copy hello.exe <install_dir>
\bin
Run hello
from your
<HELLO
> directory:
> cd <HELLO
> ><install_dir>
/bin
/hello --dir int_dir --force
![]() | |
The options to the |
The output looks something like the following:
Looking for translation units |0----------25-----------50----------75---------100| **************************************************** [STATUS] Computing links for 1 translation unit |0----------25-----------50----------75---------100| **************************************************** [STATUS] Computing virtual overrides |0----------25-----------50----------75---------100| **************************************************** [STATUS] Computing callgraph |0----------25-----------50----------75---------100| **************************************************** [STATUS] Topologically sorting 1 function |0----------25-----------50----------75---------100| **************************************************** [STATUS] Computing node costs |0----------25-----------50----------75---------100| **************************************************** [STATUS] Starting analysis run ANALYZE_TREE: "{...}" ANALYZE_TREE: "int x = 1" ANALYZE_TREE: "x" ANALYZE_TREE: "1" ANALYZE_TREE: "x = 1" ANALYZE_TREE: "x = 1;" ANALYZE_TREE: "int x = 1;" ANALYZE_TREE: "x" ANALYZE_TREE: "x" ANALYZE_TREE: "5" ANALYZE_TREE: "x + 5" ANALYZE_TREE: "x = x + 5" ANALYZE_TREE: "x += 5;" ANALYZE_TREE: "x" ANALYZE_TREE: "return x;" tree = S_return: loc = <file ID 0>:5:3-<file ID 0>:5:11 expr = E_variable: type = int deepID = 2147483646 var = x, type = int, dflags = {} isImplicit = 0 ANALYZE_TREE: "<destruction for x>" |0----------25-----------50----------75---------100| **************************************************** Analysis summary report: ------------------------ Files analyzed : 1 Total LoC input to cov-analyze : 2990 Functions analyzed : 1 Paths analyzed : 1 Time taken by Coverity analysis : 00:00:00 Defect occurrences found : 15 hello
Aside from the usual cov-analyze
text, the output consists of one
line for each call to the ANALYZE_TREE
function, showing the
abstract syntax tree that was passed to it.
The checker also creates
<
,
containing that output in a format that can be committed to Coverity Connect.HELLO
>/int_dir/<programming_language
>/output/hello.errors.xml
You can run a Coverity Extend SDK checker from a directory other than
(as shown in Section 1.2.3, “Running the Hello checker”). You might do so if Coverity is installed to
a read-only directory.<install_dir>
/bin
The Hello checker requires the following options:
The installation directory: Specified by the
--prevent-root
option.
The intermediate directory: Specified by the --dir
option.
For example:
> cd <HELLO
> > <install_dir>/sdk/build-checker hello ><install_dir>
/bin
/cov-build --dir int_dir_2 gcc -c test1/hello.test.c > ./hello --dir int_dir_2 --prevent-root=<install_dir>
Just as you can commit the output of the cov-analyze
command to
Coverity Connect, you can also commit the issues that the Hello checker finds.
To commit issues found by Hello to Coverity Connect:
Prepare Coverity Connect to receive the issues found by the Hello checker:
StartCoverity Connect.
The startup command is located in the Coverity Connect /bin
directory:
> cd <install_dir_cc>
/bin
> ./cov-start-im
Log into Coverity Connect.
Create a project that is configured with a Coverity Analysis stream for the C/C++ programming language.
For example: hello_stream
in the project
extend_examples
![]() | |
If you need help with any of these steps, contact your Coverity Connect administrator. |
Use Coverity to commit (push) the issues to Coverity Connect:
> <install_dir>/bin/cov-commit-defects \ --host <server_hostname
> \ --port <port_number
> \ --stream hello_stream \ --user admin --dir int_dir
This command produces output similar to the following on the console:
Connecting to server sduke-t61p:9090 2012-08-06 21:54:57 UTC - Committing 4 file descriptions... |0----------25-----------50----------75---------100| **************************************************** 2012-08-06 21:54:57 UTC - Committing 4 source files... |0----------25-----------50----------75---------100| **************************************************** 2012-08-06 21:54:56 UTC - Calculating 4 cross-references... |0----------25-----------50----------75---------100| **************************************************** 2012-08-06 21:54:57 UTC - Committing 4 cross-references... |0----------25-----------50----------75---------100| **************************************************** 2012-08-06 21:54:58 UTC - Committing 0 functions... 2012-08-06 21:54:58 UTC - Committing 15 defect occurrences... |0----------25-----------50----------75---------100| **************************************************** 2012-08-06 21:54:59 UTC - Committing 3 output files... |0----------25-----------50----------75---------100| **************************************************** New snapshot ID 10004 added. Elapsed time: 00:00:04
This section describes each line of hello.cpp
.
The following include is for the header file that contains declarations for the classes, functions, and macros that comprise the Coverity Extend SDK API:
#include "extend-lang.hpp" // Coverity Extend SDK API
The headers are explained in detail in Section 2.8.1, “Header files”.
The next line declares that the name of the checker is hello
and
that it is a simple checker:
START_EXTEND_CHECKER( hello, simple );
Here, simple
means that the checker is flow-insensitive: the checker is
stateless, and the Abstract Syntax Tree (AST) nodes are not visited in any particular
order. An AST is a tree-shaped data structure that represents the phrase structure of
the concrete input syntax (for more information, see Chapter 1.3, The Abstract Syntax Tree).
>Note that subsequent sections will introduce you to examples of flow-sensitive
checkers. The Coverity Extend SDK macros such as START_EXTEND_CHECKER
and
ANALYZE_TREE
are explained in Chapter 2.2, Handler functions.
The next line starts the principal function of a checker:
ANALYZE_TREE()
The body of ANALYZE_TREE
is called for every AST in the program
that is undergoing analysis. The ASTs are passed one at a time, and as they arrive, each
one is then the CURRENT_TREE
.
The next line prints the current AST to standard output:
{ cout << "ANALYZE_TREE: " << CURRENT_TREE << endl;
The next line prints the current AST as an issue report:
OUTPUT_ERROR("ANALYZE_TREE: " << CURRENT_TREE);
This report is also stored in an intermediate directory output file,
output/hello.errors.xml
. Note that, as discussed in Chapter 1.5, Output, not every call to OUTPUT_ERROR
results in a user-visible issue report. For example, if the path along which the issue
is found is later determined to be infeasible, the report is suppressed. Consequently,
printing to standard output is useful as a debugging aid, but
OUTPUT_ERROR
should be used for the actual issue
reports.
The Extend SDK defines pattern types which can be used to
determine if the current AST node meets certain criteria. As an example, here we create
a pattern which matches return statements by declaring a variable ret of type
ReturnPat
. The MATCH predicate determines whether the current AST node
being processed by the checker matches the pattern. See Section 1.3.3, “Patterns ”
for more details.
The print_tree
function displays detailed information about the AST node;
for more information, see Section 1.3.2, “Examining nodes in the AST with print_tree”
ReturnPat ret; if( MATCH(ret) ) print_tree(CURRENT_TREE); }
The next line signals the end of the checker:
END_EXTEND_CHECKER();
The final line creates main()
, the entry point to the checker
executable. The name of the checker is passed as an argument.
MAKE_MAIN( hello )
When creating your own checkers, you can adapt the Hello makefile
(<install_dir>/sdk/samples/hello/Makefile
) and the
checker.mk
file that it includes:
checker.mk
and include.mk
.
Makefile
for the Hello sample checker:
# hello/Makefile # Makefile for the 'hello' Extend checker # name of the checker include checker.mk # default target all: $(CHECKER) # rules shared by the example checkers include ../include.mk # EOF
The checker.mk
file for the Hello sample checker:
# hello/checker.mk # set CHECKER to 'hello' # This fragment is separated into its own file so that it can # be used by the Makefile in this directory, as well of those # in the test subdirectories. CHECKER := hello # EOF
There are three main tasks that a checker typically performs:
Inspect the AST to recognize syntax of importance.
Update the store to reflect the effect of that syntax (simple checkers do not do this).
Output errors when appropriate.
An Abstract Syntax Tree (AST) is a tree-shaped data structure that
represents the phrase of the concrete input syntax of the source code. For example, the
input 1+2*3
has a corresponding AST that looks like this:
+ / \ 1 * / \ 2 3
From this AST fragment, you can infer that the multiplication of 2 and 3 happens first
(even though *
occurs textually after the +
), and the result
is added to 1. If the input is instead (1+2)*3
, then the AST is:
* / \ + 3 / \ 1 2
Note that the grouping parentheses has affected the AST by changing the order of operations. The parentheses are not explicitly present in the AST, since the tree structure is sufficient to represent their effect.
In Coverity Analysis, the AST is the output of the parser, and the input to the checker.
AST fragments can be grouped into several categories:
Expressions, such as the examples shown previously.
Statements, such as x = 3;
or while (true) { ...
}
.
Type identifiers, such as int
or class C { ...
}
.
Function definitions as a whole.
Although there are a few other categories, these are the main ones that most checkers use.
You can use the print_tree
function to return detailed
information about each node of the AST, as shown in the print_tree checker next:
// print_tree.c // Coverity Extend SDK checker printing the AST tree at every node #include "extend-lang.hpp" // Coverity Extend SDK API START_EXTEND_CHECKER( hello, simple ); ANALYZE_TREE() { print_tree(CURRENT_TREE); } END_EXTEND_CHECKER(); MAKE_MAIN( hello ) // EOFThe source code and makefile for this checker are located in the
<install_dir>/sdk/samples/print_tree
directory.
Like the hello checker, this checker prints every
abstract syntax tree that is passed into the ANALYZE_TREE
function.
However, the print_tree
checker returns much more detailed information than
the hello
checker. You can use print_tree(CURRENT_TREE)
to
expose detailed information about an AST node. For example, the exact kind and many
other details of each program element are exposed by the element
structure,
which you can see in the output below.
If you run this checker on the sample program at
<install_dir>/sdk/samples/print_tree/test1/print_tree_test.cpp
the output has the following format:
[STATUS] Reading call graph [STATUS] Computing class hierarchy |0----------25-----------50----------75---------100| **************************************************** [STATUS] Computing call graph [STATUS] Starting analysis run (analysis pass) |0----------25-----------50----------75---------100| ***************************************************tree = S_compound: loc = /nfs/foo2/extend/samples/print_tree/test1/print_tree.test.cpp:5 endLoc = /nfs/foo2/extend/samples/print_tree/test1/print_tree.test.cpp:9 stmts = { element = S_decl: loc = /nfs/foo2/extend/samples/print_tree/test1/print_tree.test.cpp:6 decl = Declaration: var = "x" init = IN_expr: expr = E_intLit: cached_hash = 0 type = int i = 1 original_expr = <null Expression> destructionCode = <null Statement> initCode = S_expr: loc = /nfs/foo2/extend/samples/print_tree/test1/print_tree.test.cpp:6 expr = E_assign: cached_hash = 0 type = int target = E_variable: cached_hash = 0 type = int var = "x" op = 21 src = E_intLit: cached_hash = 0 type = int i = 1 original_expr = <null Expression> destroyStmts = { tree = S_destroy } element = S_expr: loc = /nfs/foo2/extend/samples/print_tree/test1/print_tree.test.cpp:7 expr = E_assign: cached_hash = 0 type = int target = E_variable: cached_hash = 0 type = int var = "x" op = 9 src = E_intLit: cached_hash = 0 type = int i = 5 original_expr = <null Expression> element = S_return: loc = /nfs/foo2/extend/samples/print_tree/test1/print_tree.test.cpp:8 expr = E_variable: cached_hash = 0 type = int var = "x" isImplicit = 0 }
The main method that checkers use to inspect the AST is through the use of patterns, which are fragments of syntax with pattern variables (holes) that can match arbitrary subtrees. The basic approach is to define a pattern with named holes, test to see if the pattern matches some input syntax, and then use the named holes to examine the parts of the patterns that match.
The expression patterns are, in most cases, constructed using C++ operator
overloading. For an example, see
<install_dir>/sdk/samples/patterns/patterns.cpp
.
Expr a, b; if (MATCH_TREE( a + b , CURRENT_TREE)) { cout << " matched an addition; a=" << a << ", b=" << b << endl; }
This checker fragment declares two pattern variables (holes)
called a
and b
, with type
Expr
. The Expr
pattern type matches
any expression; there are other pattern types that are more restrictive. For
convenience, there is a predefined pattern variable called "_" (underscore) that
matches anything.
The previous code then constructs a pattern expression:
a + b
The +
here is actually an overloaded operator that constructs a
pattern. The pattern matches input ASTs that use the binary operator
+
.
It then uses MATCH_TREE
to compare the constructed pattern to
CURRENT_TREE
(which is the AST passed to
ANALYZE_TREE
). This returns true when the input matches the
pattern. For example, when given the following line of input syntax:
z = x + y;
the checker fragment prints out:
matched an addition; a="x", b="y"because there is exactly one subexpression in the input file that matches the pattern.
When the line of code:
cout << "matched an addition; a=" << a << ", b=" << b << endl;prints a pattern variable (such as
a
), it prints the AST fragment that
the pattern variable matched. This AST fragment can also be obtained explicitly by
calling the Pattern::get_tree()
method.
Since it is common to use MATCH_TREE
with
CURRENT_TREE
, the MATCH
macro is
available as a shortcut:
MATCH(<pattern>) is equivalent to MATCH_TREE(<pattern>, CURRENT_TREE)
To match a function call expression, create a pattern of type
CallSite
:
CallSite bar("bar"); // match call to bar() if (MATCH( bar )) cout << "call to bar: " << CURRENT_TREE << endl;When the
CallSite
pattern matches, you can inspect its
arguments using the nargs
and get_arg
methods:
for (int i=0; i < bar.nargs(); i++) { cout << " arg " << (i+1) << ": " << bar.get_arg(i) << endl; }To match a call site with specific patterns for the arguments, simply pass the argument patterns to
CallSite::operator()
:
Const_int ci; if (MATCH( bar(ci) )) cout << " single literal integer argument: " << ci.llval() << endl;
The previous fragment utilizes the Const_int
pattern, which
matches an expression that is an integer literal.
See Section 2.3.8, “ Expression patterns” for more information.
There are several pattern types that match uses of specific kinds of variables. Some of the more important are next.
Table 1.3.1. Patterns for variables
Parameter | Usage |
---|---|
LocalVar
| use of local variable |
StaticVar
| use of a variable with static storage duration |
GlobalVar
| use of a variable with global scope |
See Section 2.3.8.3, “Variable reference expression patterns” for more information.
Patterns can be combined using the following general-purpose combinators.
Table 1.3.2. Pattern combinators
And(p1,p2)
| match if p1 and p2 match the given AST |
Or(p1,p2)
| match if p1 or p2 matches the given AST |
Within(p)
| match if some enclosing (parent or ancestor of given) AST matches p |
Contains(p)
| match if a subtree (descendant) of the given AST matches p |
For example, given the declarations:
CallSite bar("bar"); Const_int ct; LocalVar lv; StaticVar sv; GlobalVar gv;the pattern:
bar(Or(ct,lv), Or(sv,gv))matches any call to
bar
with two arguments, where the first
argument is either an integer literal or a local variable, and the second is
either a static variable or a global variable.
The following figure illustrates how Within()
looks for parents/ancestors, and Contains()
looks for
descendants:
Thus, the following statement
MATCH( Within( Contains (pat)))matches the pattern from anywhere within the current function.
See Section 2.3.10.5, “Combinators (And, Or, etc.)” for more information.
In addition to the patterns, there are a variety of functions that perform various queries on the AST. This section covers some of the more common functions. See Chapter 2.4, Accessors for more information.
Several accessors return information about the current function being analyzed.
Table 1.3.3. Accessors for functions
current_function_get_name
| name of current function |
current_function_get_return_type
| return type of current function |
current_file_get_name
| file containing current function |
print_tree
| Returns detailed information on the node, including the type of pattern |
Others return information about a specific AST.
Table 1.3.4. Other accessors
get_type_of_tree
| get expression's type |
get_size_of_type
| representation size of a type |
After inspecting the AST of a particular segment of code to decide what it means, a checker must then respond to this meaning. Typically, this is done via the state machine interaction model.
The simple
checker type (where simple
is the second argument
to START_EXTEND_CHECKER
) is stateless. It is not sensitive to the order in
which it encounters AST fragments. In the program analysis literature this is known as a
flow-insensitive analysis.
In contrast, the int_store
checker type is stateful. It has a
store, which is a map from ASTs to values. This map can be used
to implement an abstract interpreter, a concept explained in the next section. This is
called flow-sensitive analysis and is what makes the Extend SDK so
powerful.
Abstract interpretation is a general framework for doing program analysis. The core of the analysis is an abstract store, which is a map from program variables to abstract values. At one extreme, the abstract values could actually be concrete values, and then we would have a real (concrete) interpreter. However, by abstracting the value space, we enable the analysis of programs with loops or inputs. The choice of abstraction is dictated by the property being checked.
For example, in an analysis that looks for occurrences of a call to
fopen
and then a call to fclose
, you might
use the following rules:
variable maps to 1 means fopen
has been called
but fclose
has not.
variable is unmapped (mapped to nothing) means either
fopen
has not been called, or else
fclose
has subsequently been called.
As another example, you might want to check that a negative value is never cast to
unsigned
, and use a store with these rules:
expression maps to 0 if it is negative.
expression maps to 1 if it is negative or zero.
expression maps to 2 if it is zero.
expression maps to 3 if it is positive or zero.
expression maps to 4 if it is positive.
expression is unmapped if its sign is unknown
Note that these two examples not only use different abstract values (store ranges), but they map from different constructs (variables vs. expressions, the store domains). The choice of store domain is usually determined by concerns such as soundness and completeness: a simple domain such as local variables will tend toward a sound analysis (where a program bug implies a defect report), while a complex domain such as expressions will tend towards a complete analysis (where a bug-free program implies no defect report), though this characterization oversimplifies things.
In practice, it takes some experimentation to select a good store domain and range for your particular properties of interest.
In addition to the abstraction, which was the subject of the previous section, an abstract interpreter also interprets. This means that it simulates execution, modulo the chosen abstraction.
Consequently, like a real interpreter, an abstract interpreter visits the abstract syntax statements and expressions in execution order. For example, in the following fragment:
x = y + z; a = foo(b, c*2);the visit order is:
y z y + z x = y + z x x = y + z; // statement b c 2 c*2 foo(b, c*2) a = foo(b, c*2) a a = foo(b, c*2); // statement
This is called postorder traversal: the children of a given node are visited (recursively) in order, and then the node itself is visited.
Left-hand sides of assignments are evaluated after the assignment, since the left-hand side becomes the value of the entire assignment expression.
The previous visit order can be seen by running the hello
checker that we
created in the previously (with the source file at
<HELLO>/hello.c
) on
.<install_dir>
/sdk
/hello/test2/hello.test.c
When the abstract interpreter reaches choice points (such as an if
statement), it first follows one path, then later backtracks to follow the other. In
this way, all paths in the function are explored. See Chapter 1.7, Paths for
more detail about paths.
The store is a map from expressions to integer values:
store : expression -> integerSome of the common functions that are used to manipulate the map are described next. For a full listing, see Chapter 2.5, The store.
Retrieve the mapping for t
. If it exists,
GET_STATE
returns true and stores the value in
v
. Otherwise, GET_STATE
returns false and
v
is undefined.
First, CLEAR_STATE(dst)
. Then, if src
is mapped, copy
its mapping to dst
.
The previous store functions (except for MATCH_STATE
) are demonstrated in
sign.cpp
, a checker that tracks the sign of expressions. See
or Section 3.1.1, “sign checker”.<install_dir>
/sdk
/samples/sign/sign.cpp
After some preliminary code to define the abstract domain, it uses the store to remember sign information for variables and expressions.
When it sees certain function calls, it prints out some of the information that it is
tracking. Since we haven't yet covered the output routines, this checker just uses
cout
.
The sign.test.c
file (see
) provides some
basic input to the <install_dir>
/sdk
/samples/sign/test1/sign.test.csign.cpp
checker:
// sign.test.c // test input for 'sign' checker void whatis(int); void print_store(); unsigned something(); int foo(int x, int y) { whatis(x); whatis(y); int n_one = -1; int zero = 0; int one = 1; whatis(n_one); whatis(zero); whatis(one); whatis(n_one + zero); whatis(n_one + one); whatis(n_one - one); whatis(one + one); x = 3; y = 3; whatis(x+y); x = -3; y = -3; whatis(x+y); x = 0; x = x+1; whatis(x); unsigned u = something(); whatis(u); x = zero - u; whatis(x); whatis(x + y); print_store(); return 0; } // EOFComplete the same steps that you followed for the hello checker as follows:
build-checker
on sign.cpp
(build-checker sign
).
sign
to the Coverity Analysis bin
directory.
cov-build
on the
sign.test.c
file.
sign
checker on the intermediate
directory.
sign.test.c:10: "x" has unknown value sign.test.c:11: "y" has unknown value sign.test.c:17: "n_one" has value AV_NEGATIVE sign.test.c:18: "zero" has value AV_ZERO sign.test.c:19: "one" has value AV_POSITIVE sign.test.c:21: "(n_one + zero)" has value AV_NEGATIVE sign.test.c:22: "(n_one + one)" has unknown value sign.test.c:23: "(n_one - one)" has value AV_NEGATIVE sign.test.c:24: "(one + one)" has value AV_POSITIVE sign.test.c:28: "(x + y)" has value AV_POSITIVE sign.test.c:32: "(x + y)" has value AV_NEGATIVE sign.test.c:36: "x" has value AV_POSITIVE sign.test.c:39: "u" has value AV_POS_ZERO sign.test.c:42: "x" has value AV_NEG_ZERO sign.test.c:43: "(x + y)" has value AV_NEGATIVE sign.test.c:45: print_store: "y" has value AV_NEGATIVE "x" has value AV_NEG_ZERO "u" has value AV_POS_ZERO "zero" has value AV_ZERO "one" has value AV_POSITIVE "n_one" has value AV_NEGATIVE "-1" has value AV_NEGATIVE "0" has value AV_ZERO "1" has value AV_POSITIVE "(n_one + zero)" has value AV_NEGATIVE "(n_one - one)" has value AV_NEGATIVE "(one + one)" has value AV_POSITIVE "3" has value AV_POSITIVE "(x + y)" has value AV_NEGATIVE "-3" has value AV_NEGATIVE "(x + 1)" has value AV_POSITIVE "(zero - u)" has value AV_NEG_ZERO 17 mappings
Outputting defects.
So far we have just been using cout
to communicate information
from a checker, but the Coverity Extend SDK has a more sophisticated defect reporting mechanism with
several advantages:
The resulting reports are suitable for display in the Coverity Connect, just like other defect reports.
They properly take into account path feasibility, a topic covered in more detail in Section 1.7.2, “False path pruning (FPP)”.
A series of reports can be associated with specific variables or expressions, allowing the checker to communicate a timeline of important events in the diagnosis of the defect. This can greatly improve the comprehensibility of the report for flow-sensitive checkers.
The simplest reporting routine is OUTPUT_ERROR(<message>)
, where
<message>
has an ostream
operator<<
to its left. For example:
OUTPUT_ERROR("Zounds! " << some_expr << " is " << some_value);This method has properties (a) and (b) from previous, but not (c). It is suitable for flow-insensitive checkers.
In Section 1.4.4, “Manipulating the store”, we oversimplified things a bit. The store actually maps an expression to a value and a set of events:
expression -> integer, set of events
An event is created by calling ADD_EVENT(t, tag, text)
:
tree t
— The associated expression tree. Among other
things, this is used to obtain the line number to display the event.
string tag
— Something that identifies the general
kind of event, for example var_assign
. It provides a name for
hyperlinks pointing at the event. Also, source code annotations can be used
to suppress errors with a given event tag. Finally, the event tag affects
CID merging; reports with different sets of event tags are never
merged.
string text
— A string that explains the event, for
example "a is assigned to the value of b"
.
The set of events associated with an expression are (conventionally) a history of what has happened to give the expression the abstract value it currently has. A good rule of thumb is that any time you set or change an abstract value, you should add an event explaining why.
The store operations explained in Section 1.4.4, “Manipulating the store” operate on events as
well as values, transparently. For example, CLEAR_STATE
removes all
events, and COPY_STATE
copies events.
It is not possible to associate an event with an expression (with ADD_EVENT) without
first giving that expression an abstract value (with SET_STATE). In some cases, you
might need to invent a new abstract value (for example, unknown
) so that
you can assign a value in order to attach an event.
Also note that simply adding an event to an expression will not cause it to be output.
Rather, calling COMMIT_ERROR(t, ....)
or
COMMIT_INPUTFILE_ERROR(t, ....)
will output the event that you
add this way.
You use COMMIT_ERROR(t, tag, text)
to output a defect report that
contains all the events in the store that were previously associated with t
and one final, main event given by tag
and text
:
tree t
: The expression tree with which events were previously
associated through the use of ADD_EVENT(t,...)
or
ADD_INPUTFILE_EVENT(t,...)
. It is an error if
t
does not have any associated events (in the current
implementation, COMMIT_ERROR
does nothing in this case).
tag/text
: These are the same as described in
ADD_EVENT
and are used to create one final event for the
defect report (but this final event is not added to the store). If you pass
the empty string (""
) for both, no final event is
created.
For a description, see ADD_INPUTFILE_ONLY_EVENT(/*extend_inputfile_t*/ f, line, tag, text_utf8).
For a description, see COMMIT_INPUTFILE_ONLY_ERROR(/*extend_inputfile_t*/ f, line, tag, text_utf8).
For a description, see ADD_INPUTFILE_EVENT (tree, /*extend_inputfile_t*/ f, line, tag, text_utf8).
For a description, see COMMIT_INPUTFILE_ERROR (tree, /*extend_inputfile_t*/ f, line, tag, text_utf8).
The error output routines are demonstrated in the sign2.cpp
checker (see <install_dir>/sdk/samples/sign2/sign2.cpp
or Section 3.1.2, “sign2 checker”), which is an extension of the
sign.cpp
checker discussed previously (see Section 1.4.5, “Example: tracking the sign of expressions”).
The main difference between the sign.cpp
and
sign2.cpp
checkers is that in sign2.cpp
every call to SET_STATE
is followed by a call to
ADD_EVENT
. There are also some calls to
CLEAR_STATE
, used to remove previously-associated events in
cases where a new value is being stored. COPY_STATE
is used to copy
events from one expression to another, to reflect the history of each value.
The whatis
query now uses COMMIT_ERROR
to
output the events associated with the expression being queried, or
OUTPUT_ERROR
if nothing is known about the expression.
The print_store
query uses an ostringstream
to construct a big string, which it then sends to OUTPUT_ERROR
.
Finally, this checker has some true defect detection because it recognizes typecasts from signed to unsigned where the source might be negative. Though these potential defects include many false positives at first, the premise for this checker is to suppress them by adding assertions. However, in order to respond to assertions, the checker needs to handle conditionals, which is the subject of the next section.
A checker has two fundamental sources of information about a path: assignments and conditionals. Assignments are handled by computing an abstraction of the right-hand side, and storing that abstraction in the left-hand side. That is, they simply correspond to updating the store.
Conditionals, on the other hand, act as constraints: the current abstract state (the store) must be refined in such a way as to be consistent with the branch of the conditional being taken. The refinement algorithm is dependent on the abstraction being used by the checker.
ANALYZE_CONDITION
handles conditional guards in the same way that
ANALYZE_TREE
handles statements and expression side
effects.
Within ANALYZE_CONDITION
, you can use
MATCH_COND
to inspect the guard expression.
MATCH_COND
automatically takes account of whether the true or
false branch is being taken. For example, if the code to analyze says:
if (x == y) { // then-branch } else { // else-branch }
then the checker fragment:
Expr a, b; MATCH_COND(a != b)
matches only when the else
branch is followed.
![]() | |
It is important to note that MATCH_COND only works when matching comparisons. For
instance, |
Often, an analysis is able to determine that a path is
infeasible, meaning that it cannot be executed at run time. This
discovery happens when the engine attempts to traverse through a conditional that is
inconsistent with the known facts in the store. In such cases, the checker can call
force_backtrack()
, which stops further exploration of this
path.
The sign3.cpp
checker illustrates handling conditionals (see
<install_dir>/sdk/samples/sign3/sign3.cpp
or Section 3.1.3, “sign3 checker”).
The core of the checker is in the section titled abstract
comparison, which defines what it means to compare two abstract values
using operators like <
and ==
. Whenever an operator
<op>
is used to compare abstract values a
and
b
, and we follow the path where the comparison yields true, there are
several possible consequences embodied in AbstractComparisonResult:
We might decide that the comparison could not possibly have yielded true, in which case its truth is inconsistent with the current abstract state. For example:
int x = 0; if (x < 0) { // inconsistent, i.e., unreachable }
We might discover new facts about a
or b
. For
example:
int x = 0; if (x == y) { // discover that 'y' equals 0 as well } if (y <= 0 && z >= 0) { // discover: y <= 0 // discover: z >= 0 if (z == y) { // discover: y == 0 and z == 0 } }
We might discover no new information. For example:
if (x < y) { // lacking any previous information about 'x' and 'y', the // constraint 'x < y' cannot be expressed in our abstraction }
In the sign3
checker, this computation is performed by the
abstractComparison
function. This function is probably more complicated
than ones you write, but illustrates the general technique in a realistic
setting.
Because there are six relational comparison operators, there are six precomputed
abstract relation tables stored in the relationalOperators
global variable.
The RelationalOperator
class effectively maps from an AST tree code
to an abstract operator table.
The ANALYZE_CONDITION
function begins by attempting to match the
input condition with each of the six relational operators:
for (int i=0; i < NUM_RELATIONAL_OPERATORS; i++) { RelationalOperator *relop = relationalOperators[i]; if (MATCH_COND(Binop(relop->treeCode, a, b))) {
When it finds a match, it abstractly evaluates the arguments a
and
b
; an unmapped expression corresponds to
AV_UNKNOWN
:
if (!GET_STATE(a, va)) { va = AV_UNKNOWN; } if (!GET_STATE(b, vb)) { vb = AV_UNKNOWN; }
It then performs the abstract comparison by doing a table lookup:
AbstractComparisonResult &res = relop->map[va][vb];
If the comparison is infeasible, that is, it could not possibly have evaluated to true, then we abort analysis of the current path:
if (!res.consistent) { force_backtrack(); }
Otherwise, if incorporation of the new constraint has led to a refinement of the
abstract value of a
or b
, then the store is updated
accordingly:
if (res.newAValue != va) { SET_STATE(a, res.newAValue); } if (res.newBValue != vb) { SET_STATE(b, res.newBValue); }
With these refinements, the sign3
checker is able to confirm that
<install_dir>/sdk/samples/sign3/test1/sign3.test.c
never
converts a negative integer to unsigned
even though there are three places
that have implicit conversions from int
to unsigned
. A portion
of the output is shown next:
matched conditional "x >= 0"; "x" = AV_UNKNOWN, "0" = AV_ZERO refined "x" to AV_POS_ZERO matched conditional "y > 0"; "y" = AV_NEG_ZERO, "0" = AV_ZERO backtracking due to inconsistency matched conditional !"y > 0"; "y" = AV_NEG_ZERO, "0" = AV_ZERO matched conditional "x == z"; "x" = AV_POS_ZERO, "z" = AV_UNKNOWN refined "z" to AV_POS_ZERO matched conditional !"x == z"; "x" = AV_POS_ZERO, "z" = AV_UNKNOWN matched conditional !"x >= 0"; "x" = AV_UNKNOWN, "0" = AV_ZERO refined "x" to AV_NEGATIVE
Realities of paths.
The mechanics of path traversal are relevant when writing a checker. Among
other things, this chapter explains why using cout
often produces
misleading results.
The Coverity Extend SDK engine does not simply traverse one path through a function; instead it traverses many paths to achieve complete coverage of all relevant sequences of operations.
The paths are not executed in sequence, one after another. Instead, common prefixes of paths are analyzed once with the analysis branching at conditionals to investigate each path separately.
What this means is that your checker's handler functions such as
ANALYZE_TREE
are called for expression and statement trees in
different paths at different times; from the checker's point of view, the engine seems
to be jumping from path to path unpredictably. This is why using cout
is
misleading.
The way to keep track of separate paths is by using the store. The Coverity Extend SDK engine will always call a handler with the same store (contents) for the same path. Thus, all knowledge about the current path should be saved to the store.
Since the events are also in the store, using ADD_EVENT
is the
best way to produce coherent, path-dependent output.
Among the reasons for what at first might seem to be unpredictable switching among paths is false path pruning (FPP). Your Coverity Extend SDK checker is actually running in conjunction with a number of FPP modules, each of which is looking for sequences of operations that are inconsistent. For example, in this code:
int x = 2; if (x != 2) { // unreachable }there is an FPP module that detects that the then branch of this conditional cannot be taken. That FPP module calls
force_backtrack
, but won't
otherwise inform your checker, so that the checker finds the next
ANALYZE_TREE
coming from a different path.
Because the FPP modules are somewhat computationally expensive, they are disabled
until the checker calls either the COMMIT_ERROR
or
OUTPUT_ERROR
functions. Once the checker tries to output an error,
the analysis of that function is restarted with the FPP modules enabled. This saves time
in the usual case where a checker does not find any problems on any path, but still
filters out reports from infeasible paths.
This is another reason why cout
produces misleading results: your
checker appears to analyze some functions twice due to the activation of the second
pass.
Any function that has a loop will have an infinite number of apparent paths. Even with FPP enabled, there might still be no bound on the number of paths. For example, consider the following loop:
void foo(int n) { for (int i=0; i<n; i++) { // ... } }
How does the Coverity Extend SDK engine avoid running forever checking an example like this? The Coverity Extend SDK engine notices that the second and subsequent paths through the loop are not significantly different from the first iteration, and stops analyzing the loop. This condition is called a fixpoint of the loop.
The concept of significantly different is difficult to describe, and is not necessary to understand fully to write a checker. You must be aware that the contents of the store is the key determiner of whether something is considered different by the Coverity Extend SDK; engine. As a first approximation: exploration of the loop terminates if and only if two different iterations produce the same store. Although the actual rule for this behavior is more complex, this abstraction is generally accurate.
What this means for a checker writer is that you must not try to track values too precisely, otherwise you risk putting the Coverity Extend SDK engine into an infinite loop. You must choose abstractions that are precise enough to check the property of interest, but sufficiently imprecise to allow termination.
A simple example of the imprecise concept is integers. If you track integer values exactly, then you have an infinite abstract domain and hence the analysis does not terminate. But by abstracting this down to just three values (negative, zero, and positive), you can ensure termination.
If you track values of arbitrarily complex expressions, then there is no guarantee of
reaching a fixpoint in a loop, so the abstract intepretation could go on forever. As
another example, when tracking values in the heap, you should avoid tracking information
about arbitrarily deep nested pointer dereference expressions. For example, tracking
p->field
is usually fine, but tracking
p->field1->field2->field3
is not, since the latter has enough precision to
take considerable (infinite) time exploring all of its variations.
You can use the Coverity Extend SDK to examine the class hierarchy of C++, C#, or java programs. This capability consists of an in-memory representation of the hierarchy and an API to query that representation.
The representation and API are declared in
<install_dir>/sdk/headers/types/extend-types.hpp
. The comments
in that header file are the definitive documentation for individual methods. This
section describes how to use the API at a high level, and Chapter 2.7, Types explains it at an intermediate level of detail.
To obtain the type of an expression or variable, match it with a pattern that inherits
from TypedExpr
(such as Expr
), and then use
the TypedExpr::get_type_t()
method.
This returns a pointer to a types::type_t
(hereafter referred to
as type_t
) object, which is the root of a C++ class hierarchy for
representing types.
The type_t
class has a number of methods for querying the actual
type. For example, type_t::as_class_p()
returns a pointer to a
class_type_t
if that type_t is a
class_type_t
, otherwise it returns
NULL
.
The following is an Coverity Extend SDK checker fragment that checks for the current expression being a pointer to a class:
Expr e; if (MATCH(e)) { if (type_t *t = e.get_type_t()) { if (class_type_t *ct = t->as_class_p()) { // now 'ct' is the class_type_t representing the type // of the expression matched by 'e' } } }
A given type_t object
is actually a tree (or subtree). For
example, the type int (*)(float, char**)
looks like the following:
pointer_type_t | |pointed_to V function_type_t / \ \ return_type/ \args[0] \args[1] V V V scalar_type_t scalar_type_t pointer_type_t | | | |name |name |pointed_to | | | "int" "float" pointer_type_t | |pointed_to | scalar_type_t | |name | "char"
scalar_type_t
is a leaf type.
The class_type_t
and union_type_t
classes are also leaf types of a sort. Since all recursive types (such as the type of a
linked list) must go through a class or struct or union in order to recurse, you can
think of them as leaves — so that types really are trees, rather than arbitrary
graphs. However, this "leafness" is a property of the code that
uses the type_t
structure. It's merely a
convenient convention. This convention is used by
type_recursive_visitor_t
,
among others.
Each class in the program being analyzed is represented by a
class_type_t
object in the Extend SDK.
Given a class_type_t
, the inheritance hierarchy can be examined
using, for example, the following:
class_type_t *c = ...; defined_class_type_t d = c->load_definition(); if(d) { d->get_parents(); }
This returns a vector of the immediate ancestors.
To iterate over all parents, including those that are inherited, use:
class_type_t *c = ...; defined_class_type_t d = c->load_definition(); if(d) { d->get_all_parents(); }
Given two function_t
objects, the
function_t::overrides()
method can be used to determine
whether one overrides the other. This function can only be called if the class
hierarchy allows it. For example, if you call f1->overrides(f2)
, then
f1->get_owner_class()->derives_from(f2->get_owner_class())
must be
true.
The type_t
visitor classes allow types to be processed using
the visitor pattern, which is sometimes more convenient than
explicit recursive processing.
The type_visitor_t
is an interface that clients can
implement. The handlers, such as on_function
or
on_class
, react to the various kinds of
type_t
nodes. Invoking its
operator()
on a type.t
object
invokes the appropriate handler for the dynamic type of that object.
The subtype_visitor_t
is another interface. When a type
object's iter_subtypes
method is invoked, the type object
invokes the passed subtype_visitor_t::operator()
on each of
its component or "sub" types. For example, the fields of a class are considered sub
types, as is the pointed_to
element of a
pointer_type_t
.
The previous two interfaces are combined to form the
type_recursive_visitor_t
, which is both an interface and
a tree traversal mechanism. To use this class, inherit from it, and then implement
the appropriate on_XXX
methods. By default, these methods
recursively traverse their sub types (hence the recursive
in the name
of the visitor), except for classes and unions. That is, they visit the tree
structure described in Section 1.8.3, “Tree structure of types”. If you override one
of the methods other than on_class
or
on_union
, you must call the superclass method if you want
recursive traversal to proceed below the overridden point.
In general, a checker should not save type_t
pointers beyond
the analysis of the function when they were obtained. The reason is that
type_t
objects get loaded and unloaded as the analysis runs.
The analysis guarantees to keep resident the set of types defined when the function
being analyzed was compiled, but not beyond that.
The PRINT_TYPES checker example (see also,
<install_dir>/sdk/samples/print_types/print_types.cpp
)
demonstrates some of the type_t
API. The checker itself simply
matches declarations and uses the declared types as a root set in a recursive
exploration of the program's type hierarchy. It also prints out the name and type of
those declarations. The bulk of the work is done by the
ClassTypeVisitor
, which inherits from
type_recursive_visitor_t
. Its on_class
method prints out the base classes and (non-static) fields of each class that can be
reached from the root set of types. It keeps a set of class names so it can avoid
printing information about the same class twice. Note that, as explained in the previous
section, you cannot keep a set of class_type_t
pointers.
The SWITCH_DEFAULT checker example (see
also, <install_dir>/sdk/samples/switch_default.cpp
) demonstrates
how to match the case
and default
substatements of a switch
case statement. It reports switch statements that do not have an explicit
default
statement and shows how it is possible to extract the
particular values in case
statements.
In addition to analyzing ASTs and reporting defects in source code, Coverity Extend SDK checkers
can also inspect the contents of files captured during the Coverity build and emit processes
and report defects in them. These files can include source files, files packaged within a
WAR file (and emitted with cov-emit-java --webapp-archive
or similar), or
an Android AndroidManifest.xml
and its associated APK file (emitted
with the --android-apk
and --input-file
options to
cov-emit-java
).
The general workflow uses cov-emit-java
to emit the files to the
intermedidate directory, then runs the custom checkers on the emitted files. The
checkers iterate over the input files to produce the reports.
Emit files, such as AndroidManifest.xml
, that your
checker requires to the intermediate directory. Such files must be associated
with an APK, for example:
cov-emit-java --dir myIntDir --android-apk myAPK.apk --input-file AndroidManifest.xml
The --input-file
option is required and can be specified
multiple times. The --android-apk
option is required and can
only be specified once. Descriptions of these command line options are in the
Coverity 2020.12 Command Reference.
Run your custom checker to analyze the input files.
This methods in this class are for iterating over and reporting defects on files built or emitted by Coverity Analysis processes.
FOREACH_MATCHING_INPUTFILE(f, suffix_utf8)
This macro iterates over the input files (including source and application archives) when the filesystem path matches the specified suffix.
f
is of type extend_inputfile_t
and will be
set in each iteration.
suffix_utf8
is of type const char *
, which
is interpreted as a NUL-terminated UTF-8 sequence.
Arguments to the function declarations in the remaining macros (below) share the following characteristics:
f
is of typeextend_inputfile_t
.
line
is a positive integer line number on which to report the event.
tag
andtext_utf8
are of typeconst char *
, which is interpreted as a NUL-terminated UTF-8 sequence.
ADD_INPUTFILE_ONLY_EVENT(/*extend_inputfile_t*/ f, line, tag, text_utf8)
Queue an event (in an input file) to report at the next use of COMMIT_INPUTFILE_ONLY_ERROR. This macro is useful when reporting events that are entirely inside input files. It is most appropriate to use this macro inside CHECKER_INIT or CHECKER_FINAL.
COMMIT_INPUTFILE_ONLY_ERROR(/*extend_inputfile_t*/ f, line, tag, text_utf8)
Report an error in an input file. This macro will include only events previously queued through ADD_INPUTFILE_ONLY_EVENT. This is useful when reporting events that are entirely inside input files.
It is most appropriate to use this macro inside CHECKER_INIT or CHECKER_FINAL.
ADD_INPUTFILE_EVENT (tree, /*extend_inputfile_t*/ f, line, tag, text_utf8)
Create an event in input file f
, and add it to the set of events
in the store for tree
. A subsequent use of
COMMIT_INPUTFILE_ERROR(tree, ...) or COMMIT_ERROR(tree, ...) will include this
event in the defect report it produces. This macro is for reporting events in
input files in the context of a flow-sensitive checker; it allows mixing events
in source code and input files.
Use this macro inside any ANALYZE_* or FUNCTION_INIT handler.
Note that subsequent use of COMMIT_INPUTFILE_ERROR or COMMIT_ERROR will include this event in the defect it creates.
For related information, see Chapter 1.9, Reporting events and defects on input files.
COMMIT_INPUTFILE_ERROR (tree, /*extend_inputfile_t*/ f, line, tag, text_utf8)
Report the error in an input file using the events from tree
, an
AST node.
This macro is for use with a flow-sensitive checker and allows mixing events in source code and input files.
Use this macro inside any ANALYZE_* or FUNCTION_INIT handler.
For related information, see Chapter 1.9, Reporting events and defects on input files.
The sample checkers iterate over input files, querying encoding and parent archives, loading the contents to the emit (part of the intermediate directory), and reporting defects in both simple and stateful checkers.
Location: <intermediate_directory>
sdk/samples
java_input_file_simple.cpp
java_input_file_stateful.cpp
Example:
[STATUS] Reading call graph [STATUS] Computing callgraph. |0----------25-----------50----------75---------100| **************************************************** [STATUS] Incremental analysis could not be used - this may take a while. [STATUS] Starting analysis run (analysis pass) |0----------25-----------50----------75---------100| *******************************************| extend-patterns.hpp:111:operator P5_tree: Tree used with no match in pattern Var SM NAME: a_never_follows_b ANALYZING: a_never_follows_b_test.cpp:_Z5test6i LINE 50: "B()" 0xb23c80 0xb23c68 call_expr, NAME: _Z1Bv TYPE: void -0xb23c38 0xb23c20 addr_expr TYPE: void (void)* -0xb23f08 0xb23bc0 function_decl NAME: _Z1Bv, public TYPE: void (void) This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. Returned with error code 0x3 at function 6.
Solution:
A pattern has been used before it matches anything.
For example, you can get this error if you run the following checker:
#include "extend-lang.hpp" START_EXTEND_CHECKER( a_never_follows_b, int_store ); ANALYZE_TREE() { CallSite a( "A" ); CallSite b( "B" ); Var v; tree t; int i; if( MATCH( b ) ) { SET_STATE( v, 1 ); ADD_EVENT( v, "B", "B is called" << v ); } else if( MATCH( a ) ) { FOREACH_IN_STORE( t, i ) { ADD_EVENT( v, "A", "A is called" << v ); COMMIT_ERROR( t, "A", "A is follows b" << t ); } } // CLEAR_STATE( v ); } END_EXTEND_CHECKER(); MAKE_MAIN( a_never_follows_b )against the following test case:
extern void A(); extern void B(); // OK void test1() { A(); } // OK void test2() { B(); } // OK void test3() { A(); B(); } // Defect void test4() { B(); A(); } // OK void test5(int x) { if (x) { A(); } else { B(); } if (!x) { B(); } else { A(); } } // Defect void test6(int x) { if (x) { A(); } else { B(); } if (!x) { A(); } else { B(); } }
Table of Contents
The Coverity Extend Software Development Kit (Coverity Extend SDK) is a framework for writing program analyzers (that is, checkers) in C++ that support the analysis of C/C++, C#, and Java applications. Features include:
Basic frontend features: Parsing, type checking and elaboration, abstract syntax construction, template instantiation, and linking across translation units.
Facilities to inspect abstract syntax using pattern matching.
Mechanism to traverse paths in the abstract syntax in execution order, prune false paths, and merge similar states to ensure termination in loops.
Flexible notion of checker state for checking flow-sensitive properties.
Output routines that work with the false path pruning (FPP) mechanism to ensure that defects are only detected from feasible paths.
This guide provides detailed information about the Coverity Extend SDK API. See Part 1, “Coverity Extend SDK Usage” for basic concepts and an introduction to using the Coverity Extend SDK.
The core of an Coverity Extend SDK checker is a set of handler functions. These handlers are called by Coverity Analysis to inform the checker of key events.
An Coverity Extend SDK checker source file is organized as follows:
// checker_name.c // (comment about what the checker does) #include "extend-lang.hpp" // Coverity Extend SDK API (1) START_EXTEND_CHECKER( checker_name, checker_type ); (2) END_EXTEND_CHECKER(); MAKE_MAIN( checker_name )In section (1) you can define arbitrary C/C++ functions and data structures. Syntactically, it is in the global scope.
In section (2) you define the checker handler functions. Syntactically, section (2) is inside a class definition.
You can define member variables inside section (2); doing so is somewhat cleaner than defining them as global variables in section (1), but either method works. However, member variables cannot be initialized at the declaration site (see Section 2.2.1.5, “INIT_OPTIONS”).
START_EXTEND_CHECKER( checker_name, checker_type );
Description
The START_EXTEND_CHECKER
macro call ends section (1) and begins section (2). This macro begins a class
declaration.
Arguments
checker_name
is the name of your checker. This is the same name as
the name of the source file (without the .c
extension). This name is
used by Coverity Connect to identify defect reports that are created by your checker.
checker_type
is one of the following:
simple
— Checker type used for flow-insensitive
(stateless) checkers. It has no store. For more information about the store,
see Chapter 2.5, The store.
int_store
— Checker type used for flow-sensitive
(stateful) checkers. Its store maps from expressions to integers; the exact
meaning of the integers is up to you to establish.
type
— A special kind of
checker that has no store and does not analyze abstract syntax trees. It
visits each class that has been defined. See Section 2.2.1.17, “ANALYZE_CLASS”.
END_EXTEND_CHECKER();
Description
This macro call ends section (2). It
terminates the class declaration started by START_EXTEND_CHECKER()
.
MAKE_MAIN( checker_name );
Description
This macro call defines the main
function of the entire checker
program. checker_name
is the name of the checker and should be the same
as the name of the source file (without the .c
extension).
The call to MAKE_MAIN
should come directly after the call to END_EXTEND_CHECKER
, and should
be the last statement in the checker source file.
INIT_OPTIONS() { <
code
> }
Description
The code in INIT_OPTIONS
is executed at program startup. Use this
code to initialize member variables declared in section (2). This is the first handler
function that is called when the program starts up (even before CHECKER_INIT
).
HANDLE_OPTION() { <code> }
Description
This handler is called for every command-line argument to the
cov-analyze
command of the form:
(--checker_option|-co) checker_name:option_name[:option_value]
where checker_name
is the same as the argument to START_EXTEND_CHECKER
.
The following macros can be used in HANDLE_OPTION
to determine the
argument:
CHECK_OPTION(opt) { <code> }
— Executes
code
if the option_name
on the command
line equals opt
.
OPTION_VALUE — <option_value>
(as a
char const *
) passed on the command line, or NULL if no
value was passed.
OPTION_HANDLED()
— Tells Coverity Analysis that the command-line
option has been recognized and processed. Calling this function causes a
return from HANDLE_OPTION
.
OPTION_NOT_HANDLED()
— Tells Coverity Analysis that the option
has not been recognized. Calling this function
causes a return from HANDLE_OPTION
.
For example:
// The following member variables are located between // the checkers START_EXTEND_CHECKER and END_EXTEND_CHECKER. int mode; bool use_mangled; // called at program startup INIT_OPTIONS() { mode = 0; use_mangled = false; } // called for each --checker_option command-line argument HANDLE_OPTION() { CHECK_OPTION( "mode" ) { mode = atoi( OPTION_VALUE ); OPTION_HANDLED(); } CHECK_OPTION( "use_mangled" ) { use_mangled = true; OPTION_HANDLED(); } OPTION_NOT_HANDLED(); } // You can now use mode and use_mangled in ANALYZE_TREE..
CHECKER_INIT() { <code> }
Description
This function is called at program startup, after INIT_OPTIONS
and HANDLE_OPTION
have been called.
Initialization code, particularly code that depends on the command-line options, can
be placed in this function.
CHECKER_FINAL() { <code> }
Description
This function is called when the program is about to terminate.
FUNCTION_INIT() { <code> }
Description
This function is called when the checker is about to start analyzing a function. You can use functions such as current_function_get_name to get information about the function that will be analyzed.
FUNCTION_FINAL() { <code> }
Description
This function is called when the checker is finished analyzing all of the paths in a function.
ANALYZE_TREE() { <code> }
Description
This is one of two central checker functions (ANALYZE_CONDITION
is
the other). It is called for each statement and expression, as described in Section 1.4.3, “Visit order”.
Options
Within the body of ANALYZE_TREE
, you can use several macros to
inspect the AST fragment that is undergoing analysis:
CURRENT_TREE
— The AST node that is undergoing
analysis. It has type tree
.
MATCH(pat)
— Matches CURRENT_TREE
against pattern pat
, returning true if it matches. See
Chapter 2.3, Patterns.
MATCH_TREE(pat,t)
— Matches t
against
pattern pat
.
ANALYZE_CONDITION() { <code> }
Description
This handler is called for every conditional expression through which the current
path passes (see Chapter 1.7, Paths for more information). To inspect
the condition, use MATCH_COND(pat)
. This returns true if the current
condition matches pat
.
![]() | |
Do not use |
ANALYZE_END_OF_PATH() { <code> }
Description
This function is called each time that the end of a particular path is reached. It is typically used in checkers that need to flag the absence of something along a path.
Since multiple paths are analyzed, this function can be called many times for a single function.
PREFER_TO_ANALYZE_CSHARP()
Description
This function makes the checker analyze C#-based output in the intermediate directory. By default, checkers otherwise analyze the C/C++ output.
Example:
START_EXTEND_CHECKER( cs1, simple ); PREFER_TO_ANALYZE_CSHARP(); ANALYZE_TREE() { ...
The following error occurs if the intermediate directory does not contain C# output:
[ERROR] This program operates on Static C#
but specified intermediate directory <int-dir
>
only contains data for C/C++.
![]() | |
If you have both C# and C/C++ output in your intermediate directory and want
to use your checker on the C/C++, you can use the following option on the
command line to override For example: > < On the other hand, if you want to analyze C# output with a checker that does
not call For example: > < |
PREFER_TO_ANALYZE_JAVA()
Description
This function makes the checker analyze Java-based output in the intermediate directory. By default, checkers otherwise analyze the C/C++ output.
Example:
START_EXTEND_CHECKER( java1, simple ); PREFER_TO_ANALYZE_JAVA(); ANALYZE_TREE() { ...
The following error occurs if the intermediate directory does not contain Java output:
[ERROR] This program operates on Static Java
but specified intermediate directory <int-dir
>
only contains data for C/C++.
![]() | |
If you have both Java and C/C++ output in your intermediate directory and want
to use your checker on the C/C++, you can use the following option on the
command line to override For example: > < On the other hand, if you want to analyze Java output with a checker that does
not call For example: > < |
PREFER_TO_ANALYZE_JAVASCRIPT()
Description
This function makes the checker analyze JavaScript-based output in the intermediate directory. By default, checkers otherwise analyze the C/C++ output.
Example:
START_EXTEND_CHECKER( js1, simple ); PREFER_TO_ANALYZE_JAVASCRIPT(); ANALYZE_TREE() { ...
![]() | |
If you have both JavaScript and C/C++ output in your intermediate directory and want
to use your checker on the C/C++, you can use the following option on the
command line to override For example: > < On the other hand, if you want to analyze JavaScript output with a checker
that does not call For example: > < |
ANALYZE_CLASS() { <
code
> }
Description
This function can only be used for
type
checkers. It is called for each class that is
defined in the source code. You can use the CURRENT_CLASS
macro to get
the class that is undergoing analysis. It has type
defined_class_type_t
.
This introduction explains the pattern matching
API. It does not replace the comments in the
><install_dir>/sdk/headers/patterns/*-patterns.hpp
header files, but
provides a high level overview. You should refer to *-patterns.hpp
as
you read this section.
Refer to the sample pattern checker at
><install_dir>/sdk/samples/patterns/patterns.cpp
, which exercises
all of the APIs described in this section.
Patterns are a mechanism for inspecting AST fragments (the AST, or Abstract Syntax Tree,
is the internal representation of the code to analyze). The basic idea is to create a
pattern object and then use its match()
method to compare the pattern
to a specific AST fragment. When match()
returns true, each subpattern
can be queried to obtain the AST fragment that it matched.
In addition to AST fragments (that represent code), patterns can match types and symbols. Types are C/C++ types such as structs, classes, or typedefs. Symbols are unique representatives for variables, functions, and class fields (as opposed to a specific appearance of them in the code). For instance, a variable expression (which is an AST fragment) references the symbol corresponding to the variable.
Correspondingly, there are three pattern hierarchies, with the corresponding C++ type
hierarchy that they are used to match (argument type to match()
is a
rough equivalent). Note that only an ASTNodePattern
(or a subclass)
can be used directly with MATCH
or MATCH_TREE
. For
MATCH_COND
, you should use
ExpressionPattern
.
ASTNodePattern
(class
ASTNode
)
TypePattern
(class
type_t
)
SymbolPattern
(class
symbol_t
)
ASTNodePattern
has two important sub-hierarchies for matching
statements (such as a for
loop) or expressions (such as an addition).
Correspondingly, class ASTNodePattern
has these subclasses:
StatementPattern
(class
Statement
)
ExpressionPattern
(class
Expression
)
Class ASTNode
has a third subclass,
Declaration
(variable declaration), that can be matched by the
Decl
pattern. Since Declaration
has no
further subclasses, there is no hierarchy.
Next are diagrams of each of those hierarchies. In these diagrams triangles represent inheritance. Filled triangles indicate that the superclass is abstract (not instantiable), while the superclasses for the open triangles are concrete.
The Coverity Extend SDK was originally used only to analyze C/C++ source code. Now that the Coverity Extend SDK also supports the creation of checkers that analyze C# and Java source code, you should note the following conventions when writing such checkers:
To match the use of a C# struct or class or Java class, use the pattern
StructType
. See Section 2.3.5, “TypePattern superclass”.
The terms function and method are
sometimes used interchangeably in the Coverity Extend SDK header files and documentation. To
match a C# or Java method call, use the pattern CallSite
. See
Section 2.3.8.2, “Function call site expression patterns ”.
To match the use of a C# or Java static field, use the pattern
StaticVar
. See
Section 2.3.8.3, “Variable reference expression patterns”.
To match the use of a C# or Java instance field, use the pattern
Component
. Keep in mind that matching a non-static class instance field
involves an implicit dereference. In other words, the code obj.field
involves a dereference of the object reference obj
. For a code example,
see <install_directory>/sdk/samples/java_match_field
. For
more information, see Section 2.3.8.1, “Basic expression patterns”.
To match C# or Java references in an Coverity Extend SDK checker, you must use the
Pointer
pattern. The Reference
pattern is only used for
analyzing C/C++ code. Using the Reference
pattern to analyze C# or Java
code will not result in a match. See
Section 2.3.8.4, “Type-filtered expression patterns”.
Every pattern superclass exposes a number of methods suitable for use in the Extend SDK.
In the following, T
is used to represent either
ASTNode
, type_t
, or
symbol_t
(as appropriate).
bool match(const T *t)
— The primary matching
function. Call it to attempt to match a data structure t
with
the pattern. If the match is a success, match
returns
true.
T last_<
—
Returns the last T that matched the pattern. Use only if the match
succeeded. The value for <XXX
>()XXX
> depends on which
hierarchy you're using, and which level you're at in this hierarchy, and can
be astnode
, type
,
symbol
, expr
, or
stmt
. For compatibility with previous versions of
the Extend SDK, get_tree
is equivalent to
last_astnode
.
void print(ostream &out) const
— Prints a
textual representation of this pattern. The text does not depend on whether
the pattern has been used to match anything; it simply describes the
structure of the pattern itself.
operator const T *()
— Same as
last_XXX()
; provided as a syntactic
convenience.
Patterns can also be passed to operator<<(ostream)
, in
which case it will print the last matched T
.
The ASTNodePattern
is primarily used as a superclass for
StatementPattern
and
ExpressionPattern
. It is also used to inspect the tree hierarchy
formed by all ASTNode
s (for instance, an expression can be
contained within a for
loop). It also has a function,
recursive_match
, that returns a list of all the
ASTNode
s underneath (and including) the given one that
matched the pattern.
ExpressionPattern
matches expressions. By default, most
ExpressionPattern
s implicitly strip casts off of the
expressions they match (exceptions to this are noted), since casts are often just noise
from the point of view of program analysis; likewise for the return value of
last_expr()
and like functions.
ExpressionPattern
also has a few extra functions that are not
available in other pattern hierarchies:
bool match(const Expression *e, bool polarity)
— Matches an expression that is negated if and only if
polarity
is false. For instance, (a == b).match(e,
false)
will match a != b
. Typically used in
ANALYZE_CONDITION
and is used in the implementation of MATCH_COND
.
match_with_casts
— Matches an expressions
without first stripping casts. This does not affect cast stripping in
subpatterns.
field()
— Returns a pattern that matches a
field taken off an expression matched by this pattern. For instance, if
A
matches foo
, then A.field()
matches foo.bar
. The method can also take a
SymbolPattern
argument to restrict the specific
fields to match.
method()
— Same as
field()
, except that it matches a non-static method
call.
get_type()
— Returns the type of the last
matched expression.
ExpressionPattern
, that allows for the construction of
patterns that match equivalent syntax. For instance, pattern A + B
matches
an addition.
TypePattern
matches types. They cannot be used directly in
MATCH
or MATCH_TREE
(because these
functions match AST nodes) but they can be used as parameters to other patterns, or used
directly on the result of get_type_of_tree
. By default, most
TypePattern
s remove typedefs and qualifiers before attempting
to match (exceptions are noted). This means for instance, that a pattern
x_p
defined like this:
StructType x("X"); PointerType x_p(x);matches all these types:
typedef struct X * X_p; X_p struct X * struct X const *
TypePattern
also has a
match_with_typedefs_and_qualifiers
that is analogous to
ExpressionPattern::match_with_casts
.
SymbolPattern
matches symbols. Like the
TypePattern
class, SymbolPattern
cannot be used directly in a MATCH
or
MATCH_TREE
. Instead , use SymbolPattern
as a parameter to other patterns. The most common SymbolPattern
is NamedSymbol
, which can be used to match a specific function or
variable.
The Extend SDK has several pattern objects predefined for convenience.
anyASTNode, anyExpr, anyStmt, anyType, anySymbol
—
Patterns that matches anything in the relevant pattern hierarchy.
![]() | |
Do not pass |
_
(underscore) — A pattern that matches anything in any
pattern hierarchy. Because of potential ambiguity, it's preferable to use
any<
described previously.
XXX
>
The most commonly used pattern classes are
ExpressionPattern
s.
AnyExpression
(also Expr
) is the most general pattern and it
matches any expression. It doesn't strip casts.
The basic expression patterns are those that correspond directly to primitive syntactic expression constructors. Here's a list of some of them.
Binop (BinaryOp op, ExpressionPattern &a, ExpressionPattern
&b)
— Match a binary operator, for example a
+ b
. Typically you do not need to explicitly create a Binop,
because there are overloaded operators on
ExpressionPattern
s that will do so
automatically. However, there are occasions where direct use is
convenient. The possible values for op
are listed in
><install_dir>/sdk/headers/ast/cc_flags.hpp
SymBinop(BinaryOp op, ExpressionPattern &a,
ExpressionPattern &b
— Match an operator that is
symmetric, for example a + b
. Any of the Binop
operators can be used, but only the symmetric ones make sense. This is
useful when the a
and b
patterns are
different, as SymBinop
matches both (a,b)
and
(b,a)
orderings.
ASymBinop(BinaryOp op, ExpressionPattern &,
ExpressionPattern &b)
— Match an anti-symmetric
operator or its dual. For example, a < b
or b >
a
. Only anti-symmetric binops (inequalities) can be
used.
Unop(UnaryOp op, ExpressionPattern &)
— Match
a unary operator, for example -a
. As with
BinOp
, direct use of Unop
is only
occasionally useful, since overloaded operators are provided that cover
most of the common uses. The possible values for op
are
listed in
><install_dir>/sdk/headers/ast/cc_flags.hpp
MapAccess(ExpressionPattern &map, ExpressionPattern &key)
This expression pattern applies only to JavaScript.
Properties of other expressions will be represented by this. Also,
global variables are represented by a MapAccess
off a
GlobalVar
map. For example, the following will match
all global variable expressions:
GlobalVar global; MapAccess access(global, _); if (MATCH(access)) { ... }
Star
— Match a dereference, for example
*p
. This will not match if
p
is an array (see below).
ArrayIndex
— Match an array element reference,
i.e. a[i]
where a
is an array. This will
not match if a
is a
pointer.
Assign
— Match an assignment or compound
assignment, for example a = b
or a *=
2
.
Effect
— Match an increment or decrement, for
example ++a
or b--
.
CondPattern
— Match a use of the ?:
operator.
Const_int
— Match an integer literal.
Const_float
— Match a floating point
literal.
Const_string
— Match a string literal.
Component
— Match a use of a field of an object,
for example a.b
.
Cast
— Match a cast. The flags control whether
automatic (implicit) casts, manual (explicit) casts, or both are
matched. This includes C++-style casts such as
reinterpret_cast
but not
dynamic_cast
.
StmtExpr
— Match a GNU statement expression, for
example ({ x; y; z; })
.
This
— Match a use, either explicit or implicit, of
the this
expression.
Throw
— Matches a throw
expression.
NewPattern
— Matches a new
expression.
DeletePattern
— Matches a delete
expression.
DynamicCast
— Matches a dynamic_cast
expression.
The two remaining primitive constructs are variable reference and function call, which are addressed in subsequent sections.
There are a variety of patterns to match function call sites:
CallSite
— Match function calls. This pattern is
used to implement all the other ones below, which are only provided for
backwards compatibility and convenience. A CallSite
pattern
can be set up to match function pointers, direct function calls or both
(using setCalledExpression)
. It can also include or exclude
method/non-method calls (using setReceiverObject
). It also
has an operator()
method that allows specifying call
arguments.
![]() | |
See bullet items below for uses of |
Fun()
— Match a call to any function, including
calls through function pointers. Equivalent to:
CallSite()For backwards compatibility, note that using
operator()
with no arguments has no effect
(but using CallSite
with no arguments matches a call with
no arguments).
![]() | |
This function is deprecated as of version 2020.12. Use
|
Fun(char const *name, fun_options flags = NONE)
—
Match a call to a function, restricted as follows:
If the call is through a function pointer, match if and only
if flags
includes FUNCTION_POINTERS_ALLOWED.
If the call is to a named function (including class methods):
If name
is NULL, then match.
Otherwise, if flags
includes
UNMANGLE_NAME, match if the unmangled name of the called
function equals name
.
Name mangling is a technique used by compilers to encode the type of an entity in its linker symbol name. For more information, see Section 2.8.2, “Name mangling”.
Otherwise, match if the mangled name of the called
function equals name
.
![]() | |
This function is deprecated as of version 2020.12. Use
|
Fun(MultipleNamesTag mntag, const char **fnames, fun_options
flags = NONE)
— Similar to the previous pattern,
except for a call to a named function, match if the name equals any of
the strings in the NULL-terminated fnames
array. If
fnames
begins with NULL, match regardless of the called
function's name.
![]() | |
This function is deprecated as of version 2020.12. Use
|
MemberFun
— Match a call to a nonstatic member
function, and provide patterns for the arguments. At a minimum, a
pattern must be provided for the receiver object (the
instance). Optionally, a sequence of argument
patterns may be specified using operator()
.
![]() | |
This function is deprecated as of version 2020.12. Use
|
Constructor
— Match a call to a
constructor.
CopyConstructor
— Match a call to a copy
constructor, which is a special case of what Constructor
matches.
Destructor
— Match a call to a destructor.
Anyfun
— Match a call to any function (there is no
filtering based on function name). If you specify an argument pattern,
the pattern matches if any argument at the call
site matches the argument pattern. For example,
Anyfun()(Const_int)
matches any call where a literal
int is among the arguments.
![]() | |
This function is deprecated as of version 2020.12. Use
|
There are several patterns that match variable references, possibly taking into account scope and linkage:
Var
— Match a variable used in an expression.
TempVar
— Match a use of a temporary variable
inserted by the parser.
LocalVar
— Match a use of a (nonstatic) local
variable.
StaticVar
— Match a use of a static
variable.
GlobalVar
— Match a use of a global
variable.
Parm
— Match a use of a formal parameter.
FunctionDecl
— Match a use of a function as an
expression (either the called expression in a function call, or taking
the address of a function).
FunLocal
— Match a use of a local variable, formal
parameter, or a field of (recursively) a class/struct/union-valued local
or parameter. This corresponds to stack-allocated storage (except
arrays).
Several patterns match expressions with certain types:
Array
— Match an expression with array
type.
Const_obj
— Match an expression whose type has the
const
qualifier.
ExprWithType (TypePattern &)
— Matches an
expression whose type is matched by the given
TypePattern
. For example,
Integer integeris equivalent to:
IntegralType itype; ExprWithType integer(itype);
Float
— Matches an expression with a floating point
type (such as float
or double
). Same as
ExprWithType (FloatType)
.
FloatType
— Matches a floating point type (float,
double, or long double). For example, you can pass this as a parameter
to ExprWithType's constructor.e.
FunctionPointer
— Match an expression with pointer
to function type.
Integer
— Matches an expression with an integral
type. Same as ExprWithType (IntegralType)
.
IntegralType
— Matches an integral type, for
instance int
, char
or bool
but
not float
or double
.
NonConstAddr
— Like
Addr
(match a use of the address-of operator), except it does not
match if the object whose address is taken has the const
type qualifier (for instance, when using a const reference function
argument).
Pointer
— Match an expression with pointer or
reference type.
Reference
— Match an expression with a C++
reference type.
Scalar
— Matches an expression with a scalar type.
Same as ExprWithType (ScalarType)
.
ScalarType
— Matches any scalar type (integral or
floating point).
Struct
— Match an expression with
struct
or class
type.
Union
— Match an expression with union
type.
Program analyses often need to detect certain kinds of more complex expression patterns. Several Extend SDK patterns do just this:
Arg
— Match an expression used as an argument to a
function call. Accessors are provided to navigate to the call itself,
and to see where in the call list the matched expression appeared. Will
not strip casts.
ConditionPattern
— Match an expression used as a
guard for a control flow statement or short-circuit operator.
Offset
— Match an expression that is the same as or
an offset off of a given expression (pattern). For example, if you have
an AST p
that denotes a pointer, then
Offset(Same(p))
matches an expression like
&p->foo
which denotes a pointer to the same object
that p
does, but displaced by the offset of field
foo
. It will also recursively handle e.g. &(p
+ 10)->foo
.
AnyField
— Given an expression (pattern), match an
expression that is formed by appending field access operators. For
example, given a
, AnyField(a)
matches
a.b
and a.b.c
.
AnySubpart
— Like AnyField
, except
also allow array accesses, and (optionally) pointer dereferences.
Lval
— Match an lvalue, which
is an expression that can appear on the left-hand side of an assignment
operator. For example, x
and *p
are lvalues,
whereas 3
and a+b
are not (assuming those
expressions use the built-in operators). A function call
f()
that returns a reference is translated by the
parser into an explicit dereference *f()
and the latter is
matched by Lval
.
The evaluation patterns find the subexpression s
of a (potentially)
larger expression e
that specifies the value yielded by e
.
These patterns are most easily explained in terms of the
Evals_to
function.
EvalPattern (Pattern &pat)
— Use
Evals_to
repeatedly to dig down into the matched
expression, stopping as soon as pat
matches a subexpression
returned by Evals_to
.
EvalsToPattern
— Use Evals_to
to dig
down to the smallest subexpression, then attempt to match
pat
against that subexpresssion.
Each kind of statement has its own pattern to match it:
AnyStatement
(also StmtPat
) — Match any
statement.
ExprStmt
— Match a statement containing a single
expression, for example a++;
.
IfPat
— Match an if
statement.
Note that the guard expression for if
, for
, and
while
is normalized to a Boolean, and the match expression
must agree with this normalized form. For example:
int f(); if (f()) { ... }
The test is normalized by the parser to:
if (f() != 0) { ... }Hence, to match this, use:
CallSite call; MATCH(IfPat(call != 0))There is an example of this in
><install_dir>/sdk/samples/patterns/patterns.cpp
.
ForPat
— Match a for
statement.
WhilePat
— Match a while
statement. See
><install_dir>/sdk/samples/whileloopassign/whileloopassign.cpp
for some examples.
SwitchPat
— Match a switch
statement.
ReturnPat
— Match a return
statement. You
can specify whether a value is returned. While there is always a return
statement even when falling through the end of a function, you should use
the
ANALYZE_END_OF_PATH
handler to respond to control flow that exits the function.
Try
— Match a try/catch
statement.
AsmPat
— Match an asm
statement.
LoopPat
— Matches for
, while
and do
statements. This pattern does not match loops with a
condition of constant 0, for example it won't match:
do {} while (0);because this isn't a loop. You can use
DoWhilePat
to match this case.
Use:
ExprPat loop_cond; LoopPat loop(loop_cond);to match a loop and pull out the conditional expression. You can then check the conditional (for example is a <= b, or a > b) and act accordingly.
The following classes are part of the implementation of Same
and PtrSame
, but they are not intended to be used directly.
Binding
PatternBinding
PtrBinding
PtrBindingPattern
ContainsPat (Pattern &subpattern)
— Match a
tree that has a subtree that matches subpattern
.
InContextPat (Pattern &context)
— Match a
tree that has a parent tree that matches context
.
InStatementContextPat
— Similar to
InContextPat
, but only searches within the closest
enclosing statement.
SubTreePat (tree t)
— Match any subtree of
t
.
STL_container
— Match an expression that denotes an
instance of an STL container. Member functions are provided that match
invocations of corresponding member functions on the
container..
STL_iterator
— Match an expression that denotes
an instance of an STL iterator. Member operators are provided that match
applications of operators to the iterator.
ExitScope
— Match a local variable going out of
scope. This matches when exiting using a return
statement,
or any other way the flow goes out of a block (for example,
break
or goto
statements).
DeadVariable
— Match a local variable becoming
dead, which means that its value is not used
again on the current path. The main purpose of this pattern is to
optimize a checker's performance by removing useless mappings from the
store.
constant(int i)
— Equivalent to
Const_int(i)
.
assign(ExpressionPattern &a, ExpressionPattern
&b)
— Equivalent to Assign(a,
b).
cast(ExpressionPattern &e)
— Equivalent to
Cast(e)
.
opt_cast(ExpressionPattern &e):
—
Equivalent to Or(e, Cast(e))
.
Const
— Build a Const_obj
pattern.
And(Pattern &a, Pattern &b, ...)
—
Match if all of the argument patterns match.
Available for all pattern hierarchies.
Or(Pattern &b, Pattern &b, ...)
— Match
if any of the argument patterns match. Available
for all pattern hierarchies.
Not(Pattern &a)
— Match if a
does not match.
Contains
— Build a ContainsPat
pattern.
Within
— Build an InContextPat
pattern.
WithinStatement
— Build an
InStatementContextPat
pattern.
Evals_to
— Look in an expression e
to
find the subexpression s
that determines the value yielded
by e
. For example, Evals_to(a = b
) returns
b
. The exact forms that are recognized are documented
in the comments above the declaration in
><install_dir>/sdk/headers/patterns/extend-patterns.hpp
.
A number of additional AST query functions are declared in
<install_dir>/sdk/headers/extend/extend-lang.hpp
, which are
described next.
The type safe Abstract Syntax Tree can be accessed from Coverity Extend SDK checkers. Details about
the interface are located in <install_dir>/sdk/headers/ast/cc.ast
.
Since the analysis works one function at a time, several functions return information about the current function being analyzed.
current_function_get_mangled_name
— Returns the full, mangled name of the current
function.
current_function_get_name
—
Returns the identifier for the current function, for example foo
.
This name is never mangled.
current_function_get_signature
— If the function's name is
mangled (that is, a C++ function, but not mixed code using extern
"C"
), returns the result of demangling. This includes scope and
parameter type information, for example N::foo(int)
. Otherwise,
this returns the identifier of the function, for example,
printf
.
current_function_get_class_name
— Returns the scope in which
the current function is defined, for example N
. For a function in
the global scope, returns the empty string (""
).
current_function_is_ctor
— Returns true if the current
function is a constructor.
current_function_is_dtor
— Returns true if the current
function is a destructor.
current_function_is_pure_virtual
— Returns true if the
current function is a pure virtual function. Note that a pure virtual function
is defined as follows:
class MyClass { ... virtual void pureDefined() = 0; // combining pure and inline NOT allowed }; void MyClass::pureDefined() { /* but this IS allowed */ }
current_function_is_virtual
— Returns true if the current
function is a virtual function.
Additional queries return information about the file in which the current function is defined.
current_file_get_name
— Returns the name of the file in
which the current function is defined.
current_file_lineno
— Returns the line number where the
current AST fragment appears.
These queries return information about the current tree.
is_tree_in_macro
— If the current tree is inside a macro,
returns the macro's name. Otherwise, it returns NULL.
get_type_of_tree(tree t)
— Returns the
type_t
representing the type of t
. If
t
does not have a type, returns NULL.
get_size_of_type(type_t *t)
— Returns the representation
size in bytes for objects of type t
.
The store is the primary data structure for a flow-sensitive checker (a flow-insensitive checker has no store). It is an approximation of a set of states that a real, running program might be in. As the checker walks over the program's abstract syntax tree (one function at a time), it simulates the program's behavior by changing the store in response to program operations such as assignments and function calls.
The store is a map from abstract syntax trees denoting expressions (AST nodes) to a pair consisting of an integer and an event sequence:
store : tree -> (int, event[])
The integer part of a mapping value is an abstract value. Whereas a real program (typically) has an infinite state space of concrete values, a checker reduces these down to a finite number of abstract values so that the checker can terminate. Much of the art of checker design is in choosing the set of abstract values. Although the examples in this manual are meant to suggest basic approaches, there are no hard and fast rules about abstract domain design, so some experimentation is required.
The event sequence part of the mapping value is for defect reporting purposes. It summarizes the sequence of abstract state transitions that have occurred for the mapped expression, so that a user viewing the report can understand what the checker did when it arrived at some conclusion. Note that an expression must be mapped to some abstract value before events can be attached to it.
The Extend SDK API provides several macros for querying and manipulating the store, as documented in the following sections.
For a detailed example of using the store, see Section 1.4.5, “Example: tracking the sign of expressions”.
If there is no mapping for t
, creates a mapping, setting the integer
component to v
and the event sequence component to the empty
sequence.
If there is no mapping for t
, returns false.
If there is a mapping for t
, returns true, and sets v
to
equal the integer component of that mapping.
If there is a mapping for t
, and the integer component of that mapping is
equal to v
, then returns true.
Otherwise, returns false.
First, calls CLEAR_STATE(dst)
.
Next, if there is no mapping for src
, returns false.
Otherwise, creates a mapping for dst
, sets its integer value and event
sequence to equal those of src
, and returns true.
For each mapping in the store, bind t
to the tree and v
to
the integer component, then execute body
. The body is syntactically in a
for
loop, so break
and continue
can be used
to control the iteration.
The loop iterates over the mappings in an undefined order.
It is an error to modify the store during the iteration.
If there is no mapping for t
, returns false.
Otherwise, appends a new event to the event sequence to which t
is
mapped, and returns true.
The new event is constructed using tag
and desc
. The latter
is evaluated as the right-hand argument to operator <<(ostream&)
so you can construct complicated event strings, for example:
ADD_EVENT(t, "my_tag", "One plus " << 1 << " is " << (1+1));See Chapter 2.6, Adding events for more information on tags.
If there is no mapping for t
, or t
has no events, returns
false.
If tag
is neither NULL nor the empty string (""
), outputs
the event sequence associated with t
, plus one more
event, constructed from tag
and event
as described in ADD_EVENT
.
![]() | |
Due to these rules, you must create a mapping for a tree
before you can add or output events. Further, you need to add at least one event (by
using |
For a full description of this macro, see ADD_INPUTFILE_EVENT (tree, /*extend_inputfile_t*/ f, line, tag, text_utf8).
For a full description of this macro, see COMMIT_INPUTFILE_ERROR (tree, /*extend_inputfile_t*/ f, line, tag, text_utf8).
A defect report (error) from a checker consists of a nonempty sequence of events. An event is a pair consisting of two strings: a tag, and a description. A typical checker creates an event each time it updates its store, and a final event when it outputs an defect report. Event tags are short, single-word strings. Typically, a checker has one tag for each major kind of event that it creates.
You can suppress defect reports by adding source code annotations as described in the
Coverity 2020.12 Checker Reference
. The event tag named is in the annotation.
Event descriptions are arbitrary strings that describe what is happening and what the defect is to the user.
The main way to create a defect report is to use
ADD_EVENT
and
COMMIT_ERROR
. However, using OUTPUT_ERROR(desc)
outputs a single-event defect report immediately, bypassing the store entirely. Input file
macros are also used for producing events and errors (see Chapter 1.9, Reporting events and defects on input files).
Outputting a defect does not necessarily mean it will ultimately be put into the final list of defects, due to two pass checking, which is explained in Section 1.7.3, “Two-pass checking”.
As explained in Chapter 1.8, Examining the class hierarchy, types are represented using
objects drawn from a class hierarchy rooted at type_t
. Each of
the classes and its fields is described in the following sections.
This class is the superclass of all type representation classes. It does not have any data members.
It has a number of virtual functions that can be used to determine which
type_t
subclass that an object is. This information is also
available using standard run-time type identification, but these methods are sometimes
more convenient. For each subclass SUB
, there are methods:
const SUB &as_SUB() const
— If this object has
dynamic type SUB
, returns a reference to it as such.
Otherwise, throws invalid_type_exception
.
SUB &as_SUB()
— Same as previous, but
accepts/returns a non-constant reference.
const SUB *as_SUB_p() const
— If this object has
dynamic type SUB
, returns a pointer to it as such.
Otherwise, returns NULL.
SUB *as_SUB_p()
— Same as previous, but
accepts/returns a non-constant pointer.
This is a placeholder type for JavaScript, since it is a dynamic language and detailed
type information is not available. Most types in JavaScript will be treated as pointers
to any_type_t
. The exceptions to this are scalar types (e.g. integer
literals) and pointers to the "char
" scalar type (e.g. string literals). As
such, the matching described for most types in this chapter is not applicable to
JavaScript.
This class represents a fundamental type such as int
.
It has the following fields and methods.
string name
— Name of the type. Possible values
are:
"void"
"bool"
"char"
"signed char"
"unsigned char"
"short"
"unsigned short"
"int"
"unsigned int"
"long"
"unsigned long"
"long long"
"unsigned long long"
"float"
"double"
"long double"
int size
— Size in bytes of objects of this
type.
bool m_is_float
— True if this type is a
floating-point type (one of float
, double
or
long double
).
bool m_is_signed
— True if this type is any of the
floating-point types, or one of the signed integer types.
This class represents a pointer, reference, or pointer to member type.
type_t pointed_to
— The referent type. If it is a
member_type_t
, then this type represents a
pointer to member.
bool m_is_ref
— True if this type is a reference,
false if it is a pointer or pointer to member.
This class represents an array type.
type_t element_type
— The type of the elements.
int element_count
— Number of elements in the array. A
value of -1 indicates that it is a variable length array.
This class represents a variant of an underlying type but with
const
or volatile
(or both) applied.
type_t target
— The type that is being wrapped with
cv-qualifiers. Cannot be cv_wrapper_type_t
.
v_flag_t flags
— Bitmap of cv-qualifiers being
applied. Never 0.
This class represents a function type.
type_t args[]
— The sequence of argument types. For
nonstatic methods, the receiver object (this
) type is the first
argument.
type_t return_type
— The return type of the
function.
bool has_varargs
— If true, the function accepts a
variable number of arguments. The args
sequence has the
required parameter types (those that precede "..."
).
This class is a superclass of types that the user can define.
string name
— The name of the type.
string unmangled_name
— The name of the type, mangled
as defined by the IA64 C++ ABI.
scope_t scope
— The scope in which this type
appears.
This class represents a named scope in the program being analyzed.
string name
— Name of the scope mangled as defined by
the IA64 C++ ABI.
string unmangled_name
— Unmangled name of the
scope.
scope_t parent
— Parent scope, or NULL.
This class is the superclass of typedef_type_t
.
string file
— The file in which the definition appears.
If it is only declared, this is the file in which the declaration
appears.
int line_no
— The line number of the previous
declaration or definition.
This class is a superclass of the types that can be forward-declared: class/union or enum only.
bool is_defined
— Indicates that the object is an
internal_defined_class_type_t
or
internal_defined_enum_type_t
, and the
as_defined
function can be called.
bool is_forward_declared
— Indicates that the type was
forward-declared and the definition is not contained in the
forward_declarable_type_t
object itself.
is_forward_declared
is the negation of
is_defined
. To find out if a definition is available,
use the has_definition
function.
bool is_unnamed
— Indicates if the type was originally
unnamed. The get_name
function returns a generated
name.
bool has_definition
— Indicates if there is a
definition for this type.
This class represents an enumeration type.
int size
— The size in bytes of an object of this
enumeration's type.
tag_t values[]
— The sequence of enumerators in this
enumeration.
This class represents a single enumerator in an enumeration.
string name
— The name of the enumerator.
int value
— The value to which the enumerator is
defined.
This class represents a union type.
int size
— The size in bytes of this union.
field_t fields[]
— The set of members of this
union.
This class represents a member of a union.
string name
— The name of the member.
type_t type
— The type of the member.
bool is_bit_field
— Determine if the field is a bit
field.
bool is_anonymous_bit_field
— Determine if the field
is an anonymous bit field.
defined_class_type_t get_owner_class
— Retrieve the
owner class as defined_class_type_t
.
class_type_t *owner_class
— Retrieve the type as
member_type_t
.
unsigned get_offset
— Retrieve the offset, in bytes,
from the beginning of the object.
unsigned char get_bit_offset
— If a bit field, this is
the non-byte offset (0 otherwise). The bit field's bit offset is therefore
get_offset() * 8 + get_bit_offset()
.
bool is_signed_bitfield
— True if the type of a
bitfield is explicitly signed, For example, signed int
.
bool is_mutable
— Determine if this is a mutable field
(mutable keyword).
This class represents a class or struct type.
bool m_is_struct
— True if this object represents a
struct, false if it represents a class.
int size
— The size in bytes of this class.
parent_t parents[]
— The sequence of base
classes.
This class represents an immediate parent class of another class.
AccessKeyword access
— The accessibility of the base
class.
bool is_virtual
— True if this is a virtual base
class.
class_type_t get_class()
— A member function.
This class represents a function.
type_t *get_return_type
— Get the return type of this
function.
function_type_t *get_ftype
— Get the function
type.
bool is_virtual
— Determine if the method is virtual.
bool is_pure
— Determine if the method is pure
virtual.
bool is_nonstatic_method
— Determine if the method is
non-static.
bool is_static_method
— Determine if the method is
static.
bool is_ctor
— Determine if the method is a
constructor.
bool is_dtor
— Determine if the method is a
destructor.
This class represents the type of a member of a class.
type_t member_type
— The type of this class
member.
class_type_t c
— The class of this member.
For a description, see Chapter 1.9, Reporting events and defects on input files.
The following sections contain header file and name mangling reference information.
The many header files in the <install_dir>/sdk/headers
directory are
necessary to compile an Coverity Extend SDK checker. However, only the following subset of these files
declare or define functionality that can be used in an Coverity Extend SDK checker.
ast/astnode.hpp
- Defines the ASTNode
class, the
base class of abstract syntax tree (AST) node types, such as expressions and
statements.
ast/cc.ast
- Defines the AST node types, representing
statements, expressions, and so on. It serves as the source for the generated header
ast/cc.ast.hpp
.
ast/cc_flags.hpp
- Defines enumerations used by AST node
types.
extend/extend-lang.hpp
- Defines the primary set of macros that an
Coverity Extend SDK checker uses.
Note that extend-checker-types
is not useful for end users.
symbols/field.hpp
- Defines the field_t
type, a
symbol_t
representing a data member.
symbols/function.hpp
- Defines the function_t
type, a symbol_t
representing a function.
symbols/symbol.hpp
- Defines the symbol_t
type,
which is the base type for all the symbols (such as variables and functions).
symbols/variable.hpp
- Defines the variable_t
type, a symbol_t
representing a variable (global or local, including
static data members).
types/extend-types.hpp
- Defines the type_t
hierarchy, representing types.
types/scalar-types.hpp
- Defines enums and macros to go with
the scalar_type_t
type.
You can also use standard C/C++ headers.
Name mangling is a technique used by compilers to encode the type of an entity in its
linker symbol name. This is necessary because of function overloading:f(int)
and f(int,int)
are distinct entities, so their linker symbol names must be
distinct.
For C++ name mangling, Coverity uses the IA64 C++
ABI name mangling scheme, regardless of whether it is running
on the IA64 platform or not. Table 2.8.1, “C++ Mangled names” provides
the mangled names of two functions, as defined by that scheme.
Coverity uses a specialized grammar to represent the names and signatures of C# types, fields, methods, and so on. The rules for C# name mangling are as follows:
Nested namespaces are separated by a period:
System.Data
Types directly contained by a namespace are separated by a period:
System.Math
Types represented by special keywords, such as int or void, are
represented by their equivalent fully-qualified system types:
System.Int32
Array types are represented as in C#, by appending brackets containing
zero or more commas: System.Char[,]
Nested types are separated by a solidus:
OuterClass/InnerClass
Unconstructed generic types and methods are represented by appending a
backtick and the generic arity (the number of type parameters) to the
mangled type or method name:
System.Collections.Generic.Dictionary`2
Constructed generic types are represented by appending to the mangled type
name a comma-separated list of mangled type arguments in angle brackets:
System.Collections.Generic.Dictionary`2<
System.Int32,System.String
>
Type members other than nested types are separated from their containing
type by a double colon: MyClass::myField
Method names are followed by first, a parenthesized, comma-separated list
of mangled formal parameter types (ote that the comma is not followed by a
space), and second, the mangled return type:
System.Math::Sin(System.Double)System.Double
Class constructors, instance constructors and destructors are treated as
void-returning methods named .cctor
,
.ctor
, or .dtor
respectively:
System.Object::.ctor()System.Void
Property accessors are represented as though they were methods; the
property name is prepended with get_
or set_
as
appropriate: System.String::get_Chars()System.Char[]
Similarly, event accessors are represented as though they were methods;
the property name is prepended with add_
and
remove_
as appropriate.
Indexers are treated as though they were methods; the indexer name is
get_Item
or set_Item
as appropriate:
System.String::get_Item(System.Int32)System.Char
User-defined operators are treated as though they were methods named
op_Addition
, op_LogicalNot
. The full list is
as follows:
!= | op_Inequality |
< | op_LessThan |
> | op_GreaterThan |
<= | op_LessThanOrEqual |
>= | op_GreaterThanOrEqual |
* | op_Multiply |
/ | op_Division |
% | op_Modulus |
+ | op_Addition / op_UnaryPlus |
- | op_Subtraction / op_UnaryNegation |
<< | op_LeftShift |
>> | op_RightShift |
& | op_BitwiseAnd |
| | op_BitwiseOr |
^ | op_ExclusiveOr |
! | op_LogicalNot |
~ | op_OnesComplement |
++ | op_Increment |
-- | op_Decrement |
operator true | op_True |
operator false | op_False |
implicit operator <type > | op_Implicit |
explicit operator <type > | op_Explicit |
Formal type parameters are represented using the name of the parameter.
Unsafe pointer types have a *
appended to the mangled
name.
For example: the mangled name of the method in:
namespace N { class O<T> { class I<U, V> { void M<W>(L<W> w, int[] i) {} } } class L<X> {} }
Would be:
N.O`1/I`2::M`1(N.L`1<W>,System.Int32[])System.Void
Coverity uses a specialized grammar to mangle the names of Java identifiers
(Identifier
), which include the names of packages, classes, fields,
methods, and so on. Table 2.8.2, “Java Mangled name grammar” provides synopses of that
grammar, and the code sample that
follows it provides the mangled name for several identifiers in the sample.
Table 2.8.2. Java Mangled name grammar
Identifier type | Synopsis[a] |
---|---|
ClassName | { Identifier "$" } Identifier |
MethodName | Identifier
|
FieldName | Identifier |
PackageName | Identifier { "." Identifier } |
MangledClassName | [ PackageName "." ] ClassName {
" [ ]" } |
MangledFieldName | MangledClassName "." FieldName |
PrimitiveTypeName | "boolean" | "char" | "float" |
"double" | "byte" | "short" |
"int" | "long" |
TypeName | MangledClassName | PrimitiveTypeName {
" [ ]" } |
ReturnTypeName | TypeName | "void" |
ArgList | TypeName { ", " TypeName } |
MangledMethodName | MangledClassName "." MethodName "(" [
ArgList ] ")"
ReturnTypeName |
[a] The synopses use the following syntax:
|
The comments in the following code sample list the mangled names of classes, fields, methods, and other items in a sample Java class.
// (c) 2017, Synopsys Inc. All rights reserved worldwide. package p; class Outer { // p.Outer[1] int i = 24; // p.Outer.i[2] String str = "s"; // p.Outer.str // The following "virtual" function is created to // initialize all non-static member fields: // p.Outer.<instance_field_initializations
>()void[3] static final int a = 1; // p.Outer.a static int b = 2; // p.Outer.b static { b = 777; } // The following "virtual" function is created to do // all the static initializations for this class: // p.Outer.<clinit
>()void Outer() { // p.Outer.<init
>()void int localVar = 4; // localVar[4] } int foo(int i, int j) { // p.Outer.foo(int, int)int return 21; } void printme(String s) { // p.Outer.printme(java.lang.String)void[5] System.out.println(s); // java.io.PrintStream.println(java.lang.String)void } class Inner { // p.Outer$Inner[6] int field = 1; // p.Outer$Inner.field int bar(int i, int j) { // p.Outer$Inner.bar(int, int)int[7][8] return 22; } } void testAnonClass(String s) { // p.Outer.testAnonClass(java.lang.String)void final String capturedLocal = s; // capturedLocal Inner anonymousInstance = new Inner() { // p.Outer$1[9] // The constructor that is generated for an anonymous class // takes the containing class instance as an argument: // p.Outer$1.<init
>(p.Outer)void // // In addition, a synthetic field called "this$0" is // created to refer to the containing class instance. // And synthetic fields called val$<var_name
> are created // for any captured local variables from the containing class. public String getS() { // p.Outer$1.getS()java.lang.String Outer myParent = // myParent Outer.this; // this$0 return capturedLocal; // val$capturedLocal[10] } }; } }
![]() | |
Also note that Java class constructors have different mangled names than C++
constructors. For example, compare the C++ name |
[1] If a class is declared within a package, then its mangled class name is always prepended with its package name. Examples: java.lang.String
, Package.ClassA
[2] Class field names are always prepended with the full class name and the package name, if any. Example: Package.ClassA.field
[3] A few special methods are created automatically (if applicable) for each class:
<init
> - A class constructor.
<clinit
> - A method containing all the static initalizers for a class.
<instance_field_initializations
> - A method containing the initializers for non-static member fields.
[4] Local variable names do not require any mangling. The package and class names are not prepended for local variables.
[5] Unless a type is one of the primitive types (such as boolean, char, or int), its mangled type name includes the package it belongs to. Example: java.lang.String
[6] Nested class names are denoted with $
. Example: Package.Outer$Inner1$Inner2
[7] The return type for a method is located at the end of the mangled
name, rather than at the beginning. Example:
Package.ClassA.printString(java.lang.String)void
.
Otherwise, mangled method names are very readable (especially compared
to mangled C functions).
[8] When matching the mangled name of a method, be careful to
notice the space between arguments. For example, attempting to
match the name foo(int,int)void
will not result in
a match. You must use the following, instead: foo(int,
int)void
[9] Anonymous classes have no real name, so they are assigned numbers. The numbers are assigned arbitrarily and you should not depend on their appearance in a particular order. For the purpose of name mangling, anonymous classes are treated as nested classes. Example: Package.Outer$1
[10] There are two kinds of synthetically-created variables, and they are both related to the way that an anonymous class captures values from its enclosing class or method.
this$0
- Pointer to the enclosing class instance.
val$<
- Captured local variables. When a final local variable is used inside an anonymous class, this name is used to refer to it.original_variable_name
>
Table of Contents
Some of the sample checkers that were too long to include in the previous sections are reproduced here.
![]() | |
The complete set of source code and makefiles are available in the
|
// keep track of the sign of each expression #include "extend-lang.hpp" // Extend SDK API // -------------------- utilities ------------------------ // skip past pathname component of a file name char const *strip_path(char const *fname) { // find last slash; don't want to rely on strrchr being present for (char const *p = fname; *p; p++) { if (*p == '/') { fname = p+1; // go one past this (maybe last) slash } } return fname; } // print out the current file/line (stripping the path of the file), // and return an ostream for additional printing ostream &cout_loc() { return cout << strip_path(current_file_get_name()) << ":" << current_file_lineno() << ": "; } // -------------------- AbsValue ------------------------ // abstract value domain enum AbsValue { AV_NEGATIVE, // < 0 AV_NEG_ZERO, // <= 0 AV_ZERO, // 0 AV_POS_ZERO, // >= 0 AV_POSITIVE, // > 0 AV_UNKNOWN // unknown; only for return value from abstract // arithmetic, not to be put into store }; // confirm the int is in the right range for an AbsValue void bcAbsValue(int i) { assert((unsigned)i < AV_UNKNOWN); } // map from int to AbsValue; this is necessary because the store // stores ints, not AbsValues, as its declared type AbsValue toAbsValue(int i) { bcAbsValue(i); return (AbsValue)i; } // print an abstract value ostream& operator<< (ostream &os, AbsValue v) { switch (v) { default: assert(!"bad AbsValue code"); case AV_NEGATIVE: return os << "AV_NEGATIVE"; case AV_NEG_ZERO: return os << "AV_NEG_ZERO"; case AV_ZERO: return os << "AV_ZERO"; case AV_POS_ZERO: return os << "AV_POS_ZERO"; case AV_POSITIVE: return os << "AV_POSITIVE"; } } // ------------------ abstract operations -------------------- // abstract addition; assumes overflow can't happen AbsValue abstractAdd(AbsValue a, AbsValue b) { static AbsValue const map[5][5] = { // b: a: <0 <=0 0 >=0 >0 /* <0 */ { AV_NEGATIVE, AV_NEGATIVE, AV_NEGATIVE, AV_UNKNOWN, AV_UNKNOWN }, /* <=0 */ { AV_NEGATIVE, AV_NEG_ZERO, AV_NEG_ZERO, AV_UNKNOWN, AV_UNKNOWN }, /* 0 */ { AV_NEGATIVE, AV_NEG_ZERO, AV_ZERO, AV_POS_ZERO, AV_POSITIVE }, /* >=0 */ { AV_UNKNOWN, AV_UNKNOWN, AV_POS_ZERO, AV_POS_ZERO, AV_POSITIVE }, /* >0 */ { AV_UNKNOWN, AV_UNKNOWN, AV_POSITIVE, AV_POSITIVE, AV_POSITIVE }, }; bcAbsValue(a); bcAbsValue(b); return map[a][b]; } AbsValue abstractSub(AbsValue a, AbsValue b) { // just invert the sign of 'b' and add bcAbsValue(b); return abstractAdd(a, toAbsValue(AV_POSITIVE - b)); } // ------------------------ the checker ---------------------- // This store maps expressions to AbsValue; unmapped expressions // have unknown sign. START_EXTEND_CHECKER( sign, int_store ); ANALYZE_TREE() { // integer literal? Const_int ci; if (MATCH(ci)) { if (ci.llval() < 0) { SET_STATE(CURRENT_TREE, AV_NEGATIVE); } else if (ci.llval() == 0) { SET_STATE(CURRENT_TREE, AV_ZERO); } else { SET_STATE(CURRENT_TREE, AV_POSITIVE); } return; } // unsigned variable? Scalar scal; Var var; if (MATCH(var) && MATCH(scal) && scal.get_type()->is_unsigned()) { int v; if (GET_STATE(CURRENT_TREE, v) && v == AV_POSITIVE) { // 'var' is already known to be positive, so leave it alone } else { // set it to >= 0 SET_STATE(CURRENT_TREE, AV_POS_ZERO); } } // arithmetic? Expr a,b; if (MATCH(a+b)) { int va, vb; if (GET_STATE(a, va) && GET_STATE(b, vb)) { AbsValue v = abstractAdd(toAbsValue(va), toAbsValue(vb)); if (v != AV_UNKNOWN) { SET_STATE(CURRENT_TREE, v); } } return; } if (MATCH(a-b)) { int va, vb; if (GET_STATE(a, va) && GET_STATE(b, vb)) { AbsValue v = abstractSub(toAbsValue(va), toAbsValue(vb)); if (v != AV_UNKNOWN) { SET_STATE(CURRENT_TREE, v); } } return; } // assignment? if (MATCH(a = b)) { COPY_STATE(a, b); return; } // query for abstract value? if (MATCH(CallSite("whatis")(a))) { int val; if (GET_STATE(a, val)) { cout_loc() << a << " has value " << toAbsValue(val) << endl; } else { cout_loc() << a << " has unknown value" << endl; } return; } // print entire store? if (MATCH(CallSite("print_store"))) { cout_loc() << "print_store:\n"; int mappings = 0; const ASTNode* t; int v; FOREACH_IN_STORE(t, v) { cout << " " << t << " has value " << toAbsValue(v) << endl; mappings++; } cout << " " << mappings << " mappings" << endl; return; } } END_EXTEND_CHECKER(); MAKE_MAIN( sign )
// keep track of the sign of each expression // Extended from sign.c: Use Extend SDK API output routines. #include "extend-lang.hpp" // Extend SDK API using std::ostringstream; // -------------------- utilities ------------------------ // skip past pathname component of a file name char const *strip_path(char const *fname) { // find last slash; don't want to rely on strrchr being present for (char const *p = fname; *p; p++) { if (p[0] == '/' && p[1] != '\0') { fname = p+1; // go one past this (maybe last) slash } } return fname; } // print out the current file/line (stripping the path of the file), // and return an ostream for additional printing ostream &cout_loc() { return cout << strip_path(current_file_get_name()) << ":" << current_file_lineno() << ": "; } // -------------------- AbsValue ------------------------ // abstract value domain enum AbsValue { AV_NEGATIVE, // < 0 AV_NEG_ZERO, // <= 0 AV_ZERO, // 0 AV_POS_ZERO, // >= 0 AV_POSITIVE, // > 0 AV_UNKNOWN // unknown; only for return value from abstract // arithmetic, not to be put into store }; // confirm (bounds check) that the int is in the right range void bcAbsValue(int i) { assert((unsigned)i < AV_UNKNOWN); } // map from int to AbsValue; this is necessary because the store // stores ints, not AbsValues, as its declared type AbsValue toAbsValue(int i) { bcAbsValue(i); return (AbsValue)i; } // print an abstract value ostream& operator<< (ostream &os, AbsValue v) { switch (v) { default: assert(!"bad AbsValue code"); case AV_NEGATIVE: return os << "AV_NEGATIVE"; case AV_NEG_ZERO: return os << "AV_NEG_ZERO"; case AV_ZERO: return os << "AV_ZERO"; case AV_POS_ZERO: return os << "AV_POS_ZERO"; case AV_POSITIVE: return os << "AV_POSITIVE"; } } // ------------------ abstract operations -------------------- // abstract addition; assumes overflow can't happen AbsValue abstractAdd(AbsValue a, AbsValue b) { static AbsValue const map[5][5] = { // b: a: <0 <=0 0 >=0 >0 /* <0 */ { AV_NEGATIVE, AV_NEGATIVE, AV_NEGATIVE, AV_UNKNOWN, AV_UNKNOWN }, /* <=0 */ { AV_NEGATIVE, AV_NEG_ZERO, AV_NEG_ZERO, AV_UNKNOWN, AV_UNKNOWN }, /* 0 */ { AV_NEGATIVE, AV_NEG_ZERO, AV_ZERO, AV_POS_ZERO, AV_POSITIVE }, /* >=0 */ { AV_UNKNOWN, AV_UNKNOWN, AV_POS_ZERO, AV_POS_ZERO, AV_POSITIVE }, /* >0 */ { AV_UNKNOWN, AV_UNKNOWN, AV_POSITIVE, AV_POSITIVE, AV_POSITIVE }, }; bcAbsValue(a); bcAbsValue(b); return map[a][b]; } // abstract subtraction AbsValue abstractSub(AbsValue a, AbsValue b) { // just invert the sign of 'b' and add bcAbsValue(b); return abstractAdd(a, toAbsValue(AV_POSITIVE - b)); } // ------------------------ the checker ---------------------- // This store maps expressions to AbsValue; unmapped expressions // have unknown sign. START_EXTEND_CHECKER( sign2, int_store ); ANALYZE_TREE() { // integer literal? Const_int ci; if (MATCH(ci)) { if (ci.llval() < 0) { SET_STATE(CURRENT_TREE, AV_NEGATIVE); } else if (ci.llval() == 0) { SET_STATE(CURRENT_TREE, AV_ZERO); } else { SET_STATE(CURRENT_TREE, AV_POSITIVE); } ADD_EVENT(CURRENT_TREE, "literal", "Saw literal value: " << ci.llval()); return; } // unsigned variable? Scalar scal; Var var; if (MATCH(var) && MATCH(scal) && scal.get_type()->is_unsigned()) { int v; if (GET_STATE(CURRENT_TREE, v) && v == AV_POSITIVE) { // 'var' is already known to be positive, so leave it alone } else { // set it to >= 0 CLEAR_STATE(CURRENT_TREE); // avoid lots of 'unsigned' events SET_STATE(CURRENT_TREE, AV_POS_ZERO); ADD_EVENT(CURRENT_TREE, "unsigned", "Variable is unsigned"); } } // arithmetic? Scalar a,b; if (MATCH(a+b)) { // any prior info we might have had regarding "a+b" is irrelevant CLEAR_STATE(CURRENT_TREE); int va, vb; if (GET_STATE(a, va) && GET_STATE(b, vb)) { AbsValue v = abstractAdd(toAbsValue(va), toAbsValue(vb)); if (v != AV_UNKNOWN) { // at this time, there is no way to copy the events from two // different sources, so just get what I can ... bug 3439 COPY_STATE(CURRENT_TREE, a); SET_STATE(CURRENT_TREE, v); ADD_EVENT(CURRENT_TREE, "addition", "Addition: " << a << " (" << toAbsValue(va) << ") plus " << b << " (" << toAbsValue(vb) << ") yields " << v); } } return; } if (MATCH(a-b)) { CLEAR_STATE(CURRENT_TREE); int va, vb; if (GET_STATE(a, va) && GET_STATE(b, vb)) { AbsValue v = abstractSub(toAbsValue(va), toAbsValue(vb)); if (v != AV_UNKNOWN) { COPY_STATE(CURRENT_TREE, a); SET_STATE(CURRENT_TREE, v); ADD_EVENT(CURRENT_TREE, "subtraction", "Subtraction: " << a << " (" << toAbsValue(va) << ") minus " << b << " (" << toAbsValue(vb) << ") yields " << v); } } return; } // assignment? if (MATCH(a = b)) { COPY_STATE(a, b); ADD_EVENT(a, "var_assign", "Assigning " << a << " to value of " << b); return; } // possible conversion error? IntegralType destType; Cast cast(a, destType); // cast from expression 'a' to type 'destType' if (MATCH(cast) && !a.get_type()->is_unsigned() && destType.is_unsigned()) { int v; if (GET_STATE(a, v)) { if (v == AV_NEGATIVE) { COMMIT_ERROR(a, "conversion_error", a << " is converted to 'unsigned' but is known to be negative"); } else if (v == AV_NEG_ZERO) { COMMIT_ERROR(a, "conversion_error", a << " is converted to 'unsigned' but may be negative"); } else { // we know it is *not* negative, so the cast is safe } } else { OUTPUT_ERROR(a << " is converted to 'unsigned' but may be negative"); } } // query for abstract value? if (MATCH(CallSite("whatis")(a))) { int val; if (GET_STATE(a, val)) { COMMIT_ERROR(a, "whatis", a << " has value " << toAbsValue(val)); } else { // here, COMMIT_ERROR would do nothing OUTPUT_ERROR("whatis: " << a << " has unknown value"); } return; } // print entire store? if (MATCH(CallSite("print_store"))) { ostringstream os; os << "print_store: "; int mappings = 0; const ASTNode* t; int v; FOREACH_IN_STORE(t, v) { if (mappings > 0) { os << ", "; } os << t << " has value " << toAbsValue(v); mappings++; } os << "; " << mappings << " mappings"; OUTPUT_ERROR(os.str()); return; } } END_EXTEND_CHECKER(); MAKE_MAIN( sign2 )
// keep track of the sign of each expression // Extended from sign.c: Use Extend SDK API output routines. #include "extend-lang.hpp" // Extend SDK API #if 1 # define DIAGNOSTIC(stuff) cout << stuff << endl /* user ; */ #else # define DIAGNOSTIC(stuff) ((void)0) /* user ; */ #endif // -------------------- AbsValue ------------------------ // abstract value domain enum AbsValue { AV_NEGATIVE, // < 0 AV_NEG_ZERO, // <= 0 AV_ZERO, // 0 AV_POS_ZERO, // >= 0 AV_POSITIVE, // > 0 AV_UNKNOWN, // unknown; only for return value from abstract // arithmetic, not to be put into store NUM_ABSVALS }; #define FOREACH_ABSVAL(var) \ for(AbsValue var = AV_NEGATIVE; var < NUM_ABSVALS; \ var = (AbsValue)(var+1)) // confirm (bounds check) that the int is in the right range void bcAbsValue(int i) { assert((unsigned)i < AV_UNKNOWN); } // this one allows the AV_UNKNOWN value void bcAbsValueU(int i) { assert((unsigned)i < NUM_ABSVALS); } // map from integer code to AbsValue; this is necessary because the // store stores ints, not AbsValues, as its declared type AbsValue toAbsValue(int i) { bcAbsValueU(i); return (AbsValue)i; } // map a integer value to abstract value AbsValue abstractSingleValue(long long v) { if (v < 0) { return AV_NEGATIVE; } else if (v == 0) { return AV_ZERO; } else { return AV_POSITIVE; } } // print an abstract value ostream& operator<< (ostream &os, AbsValue v) { switch (v) { default: assert(!"bad AbsValue code"); case AV_NEGATIVE: return os << "AV_NEGATIVE"; case AV_NEG_ZERO: return os << "AV_NEG_ZERO"; case AV_ZERO: return os << "AV_ZERO"; case AV_POS_ZERO: return os << "AV_POS_ZERO"; case AV_POSITIVE: return os << "AV_POSITIVE"; case AV_UNKNOWN: return os << "AV_UNKNOWN"; } } // ------------------ abstract operations -------------------- // abstract addition; assumes overflow can't happen AbsValue abstractAdd(AbsValue a, AbsValue b) { static AbsValue const map[AV_UNKNOWN][AV_UNKNOWN] = { // b: a: <0 <=0 0 >=0 >0 /* <0 */ { AV_NEGATIVE, AV_NEGATIVE, AV_NEGATIVE, AV_UNKNOWN, AV_UNKNOWN }, /* <=0 */ { AV_NEGATIVE, AV_NEG_ZERO, AV_NEG_ZERO, AV_UNKNOWN, AV_UNKNOWN }, /* 0 */ { AV_NEGATIVE, AV_NEG_ZERO, AV_ZERO, AV_POS_ZERO, AV_POSITIVE }, /* >=0 */ { AV_UNKNOWN, AV_UNKNOWN, AV_POS_ZERO, AV_POS_ZERO, AV_POSITIVE }, /* >0 */ { AV_UNKNOWN, AV_UNKNOWN, AV_POSITIVE, AV_POSITIVE, AV_POSITIVE }, }; bcAbsValue(a); bcAbsValue(b); return map[a][b]; } // abstract subtraction AbsValue abstractSub(AbsValue a, AbsValue b) { // just invert the sign of 'b' and add bcAbsValue(b); return abstractAdd(a, toAbsValue(AV_POSITIVE - b)); } // -------------------- abstract comparison ----------------- // Record the consequences of learning that the comparison 'a' op 'b' // is true. struct AbstractComparisonResult { // if this is false, we know the comparison could *not* be true, // so we will abort the path bool consistent; // what is the new approximation for 'a' and 'b'? AbsValue newAValue; AbsValue newBValue; }; // the type of a precomputed relation map; maps from the abstract // values of its arguments to the abstract comparison result typedef AbstractComparisonResult RelationMap[NUM_ABSVALS][NUM_ABSVALS]; // concrete relational operator typedef bool (*ConcreteOperator)(int a, int b); // compute best approximation of union of concrete values // represented by 'a' and 'b' AbsValue greatestLowerBound(AbsValue a, AbsValue b) { if (a == AV_UNKNOWN || b == AV_UNKNOWN) { return AV_UNKNOWN; } static AbsValue const map[AV_UNKNOWN][AV_UNKNOWN] = { // b: a: <0 <=0 0 >=0 >0 /* <0 */ { AV_NEGATIVE, AV_NEG_ZERO, AV_NEG_ZERO, AV_UNKNOWN, AV_UNKNOWN }, /* <=0 */ { AV_NEG_ZERO, AV_NEG_ZERO, AV_NEG_ZERO, AV_UNKNOWN, AV_UNKNOWN }, /* 0 */ { AV_NEG_ZERO, AV_NEG_ZERO, AV_ZERO, AV_POS_ZERO, AV_POS_ZERO }, /* >=0 */ { AV_UNKNOWN, AV_UNKNOWN, AV_POS_ZERO, AV_POS_ZERO, AV_POS_ZERO }, /* >0 */ { AV_UNKNOWN, AV_UNKNOWN, AV_POS_ZERO, AV_POS_ZERO, AV_POSITIVE }, }; bcAbsValue(a); bcAbsValue(b); return map[a][b]; } // Test if a given concrete value is a member of the set represented // by the given abstract value. bool elementOf(int concrete, AbsValue abstract) { switch (abstract) { default: assert(!"bad AbsValue"); case AV_NEGATIVE: return concrete < 0; case AV_NEG_ZERO: return concrete <= 0; case AV_ZERO: return concrete == 0; case AV_POS_ZERO: return concrete >= 0; case AV_POSITIVE: return concrete > 0; case AV_UNKNOWN: return true; } } // Concretize 'a' and 'b', filter for pairs satisfying 'op', then // re-abstract. AbstractComparisonResult abstractComparison(AbsValue a, AbsValue b, ConcreteOperator op) { AbstractComparisonResult ret; ret.consistent = false; ret.newAValue = AV_UNKNOWN; ret.newBValue = AV_UNKNOWN; // The algorithm here is to just compare all pairs of concrete // values drawn from [-2,2], since that is sufficient precision // to distinguish all our abstract comparisons. // // This is pretty stupid (inefficient), but it works, and we'll only // do it once at the beginning. for (int aa=-2; aa<=2; aa++) { if (!elementOf(aa, a)) { continue; } for (int bb=-2; bb<=2; bb++) { if (!elementOf(bb, b)) { continue; } // Now 'aa' is a concrete element of 'a', and 'bb' is a // concrete element of 'b'. // Filter on 'op'. if (!op(aa, bb)) { continue; } // Abstract the ('aa', 'bb') pair. AbsValue aaa = abstractSingleValue(aa); AbsValue bbb = abstractSingleValue(bb); // Fold this into our current approximation. if (!ret.consistent) { ret.consistent = true; ret.newAValue = aaa; ret.newBValue = bbb; } else { ret.newAValue = greatestLowerBound(ret.newAValue, aaa); ret.newBValue = greatestLowerBound(ret.newBValue, bbb); } } } return ret; } // ---------------- relational operators ------------------- // information about a single relational operator ("<", "==", etc.) class RelationalOperator { public: // data // Code to denote it BinaryOp binaryOp; // concrete comparison function; this is used to compute 'map' ConcreteOperator concreteOp; // abstract comparison table RelationMap map; public: // funcs RelationalOperator(BinaryOp bop, ConcreteOperator concrete); }; RelationalOperator::RelationalOperator(BinaryOp bop, ConcreteOperator concrete) : binaryOp(bop), concreteOp(concrete) { // compute the abstract operation table FOREACH_ABSVAL(a) { FOREACH_ABSVAL(b) { map[a][b] = abstractComparison(a, b, concreteOp); } } } // concrete comparisons bool compareLess(int a, int b) { return a < b; } bool compareLessEq(int a, int b) { return a <= b; } bool compareGreater(int a, int b) { return a > b; } bool compareGreaterEq(int a, int b) { return a >= b; } bool compareEqual(int a, int b) { return a == b; } bool compareNotEqual(int a, int b) { return a != b; } enum { NUM_RELATIONAL_OPERATORS = 6 }; RelationalOperator *relationalOperators[NUM_RELATIONAL_OPERATORS]; // ------------------------ the checker ---------------------- // This store maps expressions to AbsValue; unmapped expressions // have unknown sign. START_EXTEND_CHECKER( sign3, int_store ); // Called at program startup. CHECKER_INIT() { relationalOperators[0] = new RelationalOperator(BIN_LESS, compareLess); relationalOperators[1] = new RelationalOperator(BIN_LESSEQ, compareLessEq); relationalOperators[2] = new RelationalOperator(BIN_GREATER, compareGreater); relationalOperators[3] = new RelationalOperator(BIN_GREATEREQ, compareGreaterEq); relationalOperators[4] = new RelationalOperator(BIN_EQUAL, compareEqual); relationalOperators[5] = new RelationalOperator(BIN_NOTEQUAL, compareNotEqual); } ANALYZE_TREE() { // integer literal? Const_int ci; if (MATCH(ci)) { SET_STATE(CURRENT_TREE, abstractSingleValue(ci.llval())); ADD_EVENT(CURRENT_TREE, "literal", "Saw literal value: " << ci.llval()); return; } // unsigned variable? Scalar scal; Var var; if (MATCH(var) && MATCH(scal) && scal.get_type()->is_unsigned()) { int v; if (GET_STATE(CURRENT_TREE, v) && v == AV_POSITIVE) { // 'var' is already known to be positive, so leave it alone } else { // set it to >= 0 CLEAR_STATE(CURRENT_TREE); // avoid lots of 'unsigned' events SET_STATE(CURRENT_TREE, AV_POS_ZERO); ADD_EVENT(CURRENT_TREE, "unsigned", "Variable is unsigned"); } } // arithmetic? Scalar a,b; if (MATCH(a+b)) { // any prior info we might have had regarding "a+b" is irrelevant CLEAR_STATE(CURRENT_TREE); int va, vb; if (GET_STATE(a, va) && GET_STATE(b, vb)) { AbsValue v = abstractAdd(toAbsValue(va), toAbsValue(vb)); if (v != AV_UNKNOWN) { // at this time, there is no way to copy the events from two // different sources, so just get what I can ... bug 3439 COPY_STATE(CURRENT_TREE, a); SET_STATE(CURRENT_TREE, v); ADD_EVENT(CURRENT_TREE, "addition", "Addition: " << a << " (" << toAbsValue(va) << ") plus " << b << " (" << toAbsValue(vb) << ") yields " << v); } } return; } if (MATCH(a-b)) { CLEAR_STATE(CURRENT_TREE); int va, vb; if (GET_STATE(a, va) && GET_STATE(b, vb)) { AbsValue v = abstractSub(toAbsValue(va), toAbsValue(vb)); if (v != AV_UNKNOWN) { COPY_STATE(CURRENT_TREE, a); SET_STATE(CURRENT_TREE, v); ADD_EVENT(CURRENT_TREE, "subtraction", "Subtraction: " << a << " (" << toAbsValue(va) << ") minus " << b << " (" << toAbsValue(vb) << ") yields " << v); } } return; } // assignment? if (MATCH(a = b)) { COPY_STATE(a, b); ADD_EVENT(a, "var_assign", "Assigning " << a << " to value of " << b); return; } // possible conversion error? IntegralType destType; Cast cast(a, destType); // cast from expression 'a' to type 'destType' if (MATCH(cast) && !a.get_type()->is_unsigned() && destType.is_unsigned()) { int v; if (GET_STATE(a, v)) { if (v == AV_NEGATIVE) { COMMIT_ERROR(a, "conversion_error", a << " is converted to 'unsigned' but is known to be negative"); } else if (v == AV_NEG_ZERO) { COMMIT_ERROR(a, "conversion_error", a << " is converted to 'unsigned' but may be negative"); } else { // we know it is *not* negative, so the cast is safe } } else { OUTPUT_ERROR(a << " is converted to 'unsigned' but may be negative"); } } // query for abstract value? if (MATCH(CallSite("whatis")(a))) { int val; if (GET_STATE(a, val)) { COMMIT_ERROR(a, "whatis", a << " has value " << toAbsValue(val)); } else { // here, COMMIT_ERROR would do nothing OUTPUT_ERROR("whatis: " << a << " has unknown value"); } return; } // print entire store? if (MATCH(CallSite("print_store"))) { std::ostringstream os; os << "print_store: "; int mappings = 0; const ASTNode* t; int v; FOREACH_IN_STORE(t, v) { if (mappings > 0) { os << ", "; } os << t << " has value " << toAbsValue(v); mappings++; } os << "; " << mappings << " mappings"; OUTPUT_ERROR(os.str()); return; } } ANALYZE_CONDITION() { /* const cond_cfg_edge_t *cond_edge = dynamic_cast<const cond_cfg_edge_t *>(edge); if(!cond_edge) return; bool cov_polarity = cond_edge->polarity; const Expression *cond = cond_edge->cond; const ASTNode* astnode = (const ASTNode*)cond; */ // comparison? Expr a, b; for (int i=0; i < NUM_RELATIONAL_OPERATORS; i++) { RelationalOperator *relop = relationalOperators[i]; if (MATCH_COND(Binop(relop->binaryOp, a, b))) { int va, vb; if (!GET_STATE(a, va)) { va = AV_UNKNOWN; } if (!GET_STATE(b, vb)) { vb = AV_UNKNOWN; } DIAGNOSTIC("matched conditional " << (cov_polarity? "" : "!") << CURRENT_TREE << "; " << a << " = " << toAbsValue(va) << ", " << b << " = " << toAbsValue(vb)); // do the abstract comparison AbstractComparisonResult &res = relop->map[va][vb]; if (!res.consistent) { DIAGNOSTIC(" backtracking due to inconsistency"); force_backtrack(); return; } // update store? if (res.newAValue != va) { SET_STATE(a, res.newAValue); ADD_EVENT(a, "conditional", "Refined via conditional " << CURRENT_TREE); DIAGNOSTIC(" refined " << a << " to " << res.newAValue); } if (res.newBValue != vb) { SET_STATE(b, res.newBValue); ADD_EVENT(b, "conditional", "Refined via conditional " << CURRENT_TREE); DIAGNOSTIC(" refined " << b << " to " << res.newBValue); } } } } END_EXTEND_CHECKER(); MAKE_MAIN( sign3 )
// print type information for every local variable #include "extend-lang.hpp" // Extend API #include <string> // std::string #include <set> // std::set using namespace types; using namespace std; // set of classes whose info has been printed set<string> printedClasses; // type_recurisve_visitor_t is defined in extend-types.hpp class ClassTypePrinter : public type_recursive_visitor_t { public: virtual void on_class(const class_type_t &ct); }; void ClassTypePrinter::on_class(const class_type_t &c) { defined_class_type_t ct = c.load_definition(); if (!ct) { return; } // obtain qualified name as a string, e.g., "A::B::C" ostringstream os; os << ct; string name = os.str(); // check to see if we've already printed it if (printedClasses.find(name) != printedClasses.end()) { return; } printedClasses.insert(name); // print class/struct name cout << name << endl; // print what this class inherits from foreach(p, ct->get_parents()) { cout << " parent: " << p->get_class() << endl; } // print fields foreach(f, ct->get_fields()) { cout << " field: " << (*f)->get_pretty_name() << ", type: " << (*f)->get_type() << endl; } // re-examine field types, looking for classes to print; do this // after the above loop so we don't get fields from different // classes mixed together foreach(f, ct->get_fields()) { (*f)->get_type()->visit(*this); } // similarly for parent classes foreach(p, ct->get_parents()) { p->get_class()->visit(*this); } } void printVarInfo(const Expression* varTree) { if (!varTree) { return; } type_t const *t = get_type_of_tree(varTree); if (!t) { return; } cout << "local variable:\n" << " file: " << current_file_get_name() << "\n" << " line: " << current_file_lineno() << "\n" << " function: " << current_function_get_name() << "\n" << " var: " << varTree << "\n" << " type: " << *t << endl; // visit all the types in 't', looking for classes to print ClassTypePrinter ctp; ctp(t); } START_EXTEND_CHECKER( print_types, simple ); ANALYZE_TREE() { Decl decl; if (MATCH(decl)) { printVarInfo(decl.var()); } } END_EXTEND_CHECKER(); MAKE_MAIN( print_types )
// find switch/case statements with no "default" #include "extend-lang.hpp" // Extend API START_EXTEND_CHECKER( switch_default, simple ); ANALYZE_TREE() { SwitchPat sw; if (MATCH(sw)) { const S_switch *s = sw.last_stmt(); vector<const Statement *> cases; // getTargets gets all the "S_case" and the "S_default", if // any, for the switch. // See cc.ast s->getTargets(cases); bool hasDefault = false; foreach(i, cases) { const Statement *stmt = *i; // as<class>() -> downcast with assert, similar to // dynamic_cast<class &> // as<class>C() -> same, pointer to const // if<class>() -> downcast, NULL on failure, similar do // dynamic_cast<class *> const S_default *def = stmt->ifS_defaultC(); if(def) { hasDefault = true; break; } else { // Only possibilities = S_default and S_case, so this // must be an S_case. const S_case *case_stmt = stmt->asS_caseC(); // You can obtain the value of the "case" const E_intLit *case_expr = case_stmt->expr; long long case_value = case_expr->i; // This is not relevant for this checker, but is only // included as an example } } if(!hasDefault) { OUTPUT_ERROR("switch statement doesn't have a \"default\""); } } } END_EXTEND_CHECKER(); MAKE_MAIN( switch_default )
#include "extend-lang.hpp" // Extend API START_EXTEND_CHECKER( javascript_match_local, simple ); PREFER_TO_ANALYZE_JAVASCRIPT(); ANALYZE_TREE() { LocalVar local1; LocalVar local2; if (MATCH(local1 = local2)) { OUTPUT_ERROR("Found JavaScript local from " << local2 << " to " << local1); } } END_EXTEND_CHECKER(); MAKE_MAIN( javascript_match_local )
Table of Contents
When cov-build
is used to build an instrumented binary for function coverage analysis,
code for gathering coverage data is injected into the program. The runtime library is required to gather
the coverage data and write it to an output channel to make it available for analysis.
As provided, the runtime library supports file and network based output. See
Test Advisor 2020.12 User and Administrator Guide
for implementation information.
The Runtime Library source and build scripts reside in the directory $PREVENT_ROOT/sdk/runtime/ta-runtime
.
The windows-scripts
subdirectory contains the Windows command
line scripts needed to compile and test the Coverity Runtime Library.
The tests
subdirectory contains the unit and integration tests. When the library is built and tests executed,
the following subdirectories will be created:
lib
: Contains the static and shared versions of the Runtime Library.
objs
: Contains the object files produced when the Runtime Library source files are compiled.
test-results
: Contains one subdirectory for each test that ran, along with the captured output
for that test. Each test subdirectory will contain the shared and static versions of the test's executable and the
output files produced by the executables.
Bash shell scripts are provided with the Runtime Library to build the library, build and run a suite of unit and integration tests, and to clean up all build artifacts.
It is advisable to make your own copy of the source code tree and place it under version control, then verify that the code can be built and pass all tests before making modifications.
To perform a build, run the command:
bash ./unix-build.sh all
Build artifacts will be placed in the following subdirectories:
objs
: Will contain the .o
files
compiled from source.
lib
: Will contain the new versions of the daemon executable and
the shared and static runtime libraries.
#!/usr/bin/env bash export CC=cc bash ./unix-build.sh all bash ./unix-build.sh testsuite
To run the unit and integration tests on Linux, use the following command within the source directory:
bash ./unix-build.sh testsuite
Upon completion of the tests, the unix-build.sh
script will write
out a summary of the number of tests run, and the number that failed.
Individual tests can be run by going to the directory for that test and executing:
bash ./unix-build.sh
Providing BUILD_TYPE=DEBUG to the unix-build.sh
script will
enable errexit and trace modes for the bash scripts. The libraries and tests will be
compiled with debugging enabled.
Example:
BUILD_TYPE=DEBUG bash ./unix-build.shThe unit tests are written to be standalone tests so as not to introduce a dependency on any given unit test framework.
Each integration test exercises specific layers and components of the Runtime Library, with an emphasis on the correct handling of error conditions.
Tests are built and run against the shared and static versions of the Runtime Library.
Many test directories contain copies of expected output data. This can range from binary data (test-flush) to expected error messages (test-logging).
Cleanup of test artifacts can be done by running the following command in the parent source directory:
bash ./unix-build.sh clean
There are several options for deploying a Runtime Library. The choice of deployment will depend on whether the shared or static version or both will be used, and in the case of a shared library, whether tests will be run on the same machine as where it was built.
To deploy a dynamic library:
Copy the shared library to a location where it can be read in at build time. Set the access rights so that the library is readable by any user accounts building tests.
Modify the XML Configuration files for the gnu compilers and linker to reference this location. See Instrumentation Predicate Language for instructions on how to modify the configuration files.
If the test machine is different from the build machine, or if for some other reason the copy of the shared library used at build time is not accessible at runtime, the developer can make a local copy of the Runtime Library on the test machine and provide access to the copy using one of the following approaches:
Add the full path to the local copy, or the environment variable LD_LIBRARY_PATH.
Modify the XML Configuration files to change the -rpath=/dir
link directive to point at the
correct location. For this option, the Runtime Library must be in the same location on all test machines.
Install the Runtime Library in one of the standard library directories, such as /lib
,
/usr/lib
, /lib64
or /usr/lib64
.
To deploy a static library:
Copy the static library to a location where it can be read in at build time. Set the permissions so that the library is readable by any accounts used to build tests.
An executable "ci-daemon
" is built as part of the runtime used for collecting
function coverage when instrumented code is executed. Depending on the use case, you may be required
to install this executable on the test machine.
If you are using cov-build
to collect test coverage, no special
installation of ci-daemon
is required. A copy of
ci-daemon
already installed as part of the Coverity
Analysis package will be used in this case.
If you are not using cov-build
to collect test coverage, you
will need to copy this executable to the test machine. This executable should be
placed in a directory which is included in the PATH environment variable when the
tests are run. Either a directory already existing in the PATH can be used, or this
can be placed in a new directory with PATH updated accordingly.
Windows batch scripts are provided with the Runtime Library to build the library as a shared library or DLL, build and run a suite of unit and integration tests, and to clean up all build artifacts.
![]() | |
A static Runtime Library is not provided with Windows. |
It is recommended that you make your own copy of the source code tree and place it under version control, then verify that the code can be built and pass all tests before making modifications.
Prior to running the build scripts, the command line environment needs to be configured to target the desired Visual Studio compiler. This can be accomplished in one of the following ways:
Open the appropriate Visual Studio Tools Command Prompt in the Visual Studio Tools folder.
From a command line prompt, run the appropriate VCVARSALL.BAT
file.
For example, in a standard Visual Studio 2013 installation, the command would be:
> "c:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\vcvarsall.bat"
Set the environment variable VCVARSALL
to point to the location of the
VCVARSALL.BAT
file you want to run. The build script will use this
variable to automatically run the file.
Set the CPU_ARCH
variable to x64 to build a 64-bit
library (the default), or to x86 to build a 32-bit library. This
value is passed to the VCVARSALL.BAT
file to
set the target processor architecture for the Visual Studio
compiler.
See the documentation on the MSDN website for more information about VCVARSALL.BAT
and the
desired version of Visual Studio and processor architecture. Once the command line environment is configured for
Visual Studio, perform a build by running the command:
CMD /E:ON /C windows-build.bat all
Build artifacts will be placed in the following directories:
objs:
Will contain the .obj
files compiled from source.
lib:
Will contain the new version of the Runtime Daemon
EXE
, Runtime Library DLL
and
LIB
files. When compiled with debug enabled, will also
contain the .PDB
file.
To run the unit and integration tests on Windows, use the following command within the source directory:
CMD /E:ON /C windows-build.bat testsuite
Upon completion of the tests, the script will write out a summary of the number of tests run, and the number of tests that failed.
Cleanup of test artifacts can be done by running the following command in the parent source directory:
CMD /E:ON /C windows-build.bat clean
When running an executable instrumented with Coverity Function Coverage, the Runtime Library must be loaded by the program for successful execution.
If the program is run under cov-build
, the
<cov_lib_deployment_path
> and the
<cov_lib_name
> specified in the compiler configuration will
be used to resolve the location of the instrumentation runtime library and daemon.
![]() | |
If there are multiple compiler configurations with different values for the
< |
If instrumented programs are run standalone, then Windows will resolve the location of the Runtime Library and Daemon using the following order:
The directory where the executable module for the current process is located.
The current directory.
The Windows system directory. The GetSystemDirectory
function retrieves
the path of this directory.
The Windows directory. The GetWindowsDirectory
function retrieves
the path of this directory.
The directories listed in the PATH environment variable.
If all of the programs will use the same Runtime Library and Daemon, then the library and daemon can be installed in the locations described in steps 3, 4, or 5. If different versions of the library and daemon are required, then a runtime library and daemon should not be installed in these directories, and the locations described in steps 1 and 2 are preferable.
Configuration for building the ci-runtime libraries is controlled via a set of
environment variables defined in the file unix-build-common.sh
on
Linux, and windows-scripts\windows-build-common.bat
on Windows.
Overriding these variables will allow the developer to control compiler and linker options, such as the location of input and output files, and flags passed to the compiler and linker.
The following environment variables are common to both the Windows and Linux build scripts:
This environment variable must be set if the run-time source distribution is not located inside a Coverity Analysis installation.
Default destination for build artifacts in the source directory.
Destination directory for test executables and output. Each test appears in its own subdirectory.
Destination for the compiled object files.
Destination for the library files.
Path name for the shared library.
The C compiler command.
Linker command used to build the shared library.
Archive command used to build the static library.
Compiler options common to both the static and shared versions.
Flags for compiling object files for the shared library.
The flags needed to link the shared library to the integration tests.
Normally this variable is not set. Setting it to DEBUG builds the library with debug information on the target platform.
Generate detailed output of the build script's execution.
The following environment variables are only used by the Linux build scripts:
Destination for the shared version of the compiled object files.
Destination for the static version of the compiled object files.
Path name for the static library.
Flags for linking the shared library.
Common library flags used by integration tests.
The flags needd to link the static library to the integration tests.
The following environment variables are only used by the Windows build scripts:
The name of the import library associated with the DLL.
Flag to indicate the name of the Runtime Libary DLL.
Flag to indicate the name of the associated import library.
Flag to indicate a path for include files.
Directory containing the Runtime Library header files.
Directory containing header files shared by the tests.
Flag to indicate the name of an output file (usually an executable) for the C compiler.
Optional variable containing the full path name of the VCVARSALL.BAT file that will be used to set up compiler dependencies for the build. Only used when cl.exe is not among the PATH directories.
Indicates the processor architecture to target when the VCVARSALL.BAT file is used. Legal values are "x86" for 32-bit applications and "x64" for 64-bit applications. The default value is "x86".
The Instrumentation Predicate Language supports Instrumentation Predicate Language and the runtime library by allowing the specification of rules and conditional logic to modify native compiler command lines. The Instrumentation Predicate Language is specified in the form of nested xml tags, within the standard compiler configuration files.
Default Instrumentation Predicate Language rules are generated for the gcc family of compilers and linkers (gcc, g++, ld, collect2). Compiler configuration files specify whether the target binary is a compiler and/or a linker with the following xml fields:
<is_compiler>[true|false]</is_compiler>
<is_linker>[true|false]</is_linker>
is_linker
is
set to true, otherwise any Instrumentation Predicate Language rules will be ignored. These xml fields must appear after all
other fields in the compiler
node for each relevant compiler variant. Example:... <config> <build> <compiler> <template_compiler>true</template_compiler> <comp_name>ld</comp_name> <comp_translator>ld</comp_translator> <comp_lang>C/C++</comp_lang> <fake_compiler>true</fake_compiler> <is_compiler>false</is_compiler> <is_linker>true</is_linker> </compiler> ... ...Additionally, all Instrumentation Predicate Language xml fields must be contained within opening and closing
<instrument_predicate>
tags, and appear in the compiler-specific
options node above the <begin_command_line_config>
field. Example:... <config> <build> <compiler> ... <is_compiler>false</is_compiler> <is_linker>true</is_linker> </compiler> <options> <id>ld-ld-.*</id> <opt_preinclude_file>$CONFIGDIR$/../user_nodefs.h</opt_preinclude_file> <instrument_predicates> <remove_regex>-Werror</remove_regex> </instrument_predicates> ... <begin_command_line_config></begin_command_line_config> <pre_prepend_arg>--no_error_recovery</pre_prepend_arg> </options> ... ...If the
<instrument_predicate>
tags are placed under the
<begin_command_line_config>
field, the compilation will result in a fatal
error.
![]() | |
In template configuration files, the |
Instrumentation Predicate Language provides the remove_regex
rule for removing arguments from the native
command line. This xml field specifies a regex to search for, and remove from, the command
line. For example, consider the following scenario.
Original command line:
> thing1 foo1 thing2 foo2
Removal rule:
<remove_regex>foo</remove_regex>
Resulting command line:
> thing1 thing2
The regex "foo" matched foo1
and foo2
, so both were removed
from the command line.
Instrumentation Predicate Language also provides rules for inserting strings onto the command line. This process is slightly more complex than removing arguments, as the relative position of the inserted string must also be specified.
Each insertion rule must be surrounded by opening and closing
<insert>
tags. Specify the string you want to insert onto the command
line with the insertion_string
field. By default, the inserted string will be
placed at the very beginning of the command line. For example:
Original command line:
> thing1 foo1
Insertion rule:
<insert> <insertion_string>arg1</insertion_string> </insert>
Resulting command line:
> arg1 thing1 foo1
Insertion rules can also specify conditions on the placement of the new string within
the native command line. These conditions use regex matching to locate an existing argument,
and specify whether to place the new string before or after the matched regex. Condition
tags must be placed in the same set of <insert>
tags as the string to be
inserted.
The precedes_regex
condition specifies that the insertion string should
be placed on the command line, just before the left-most regex match. Multiple
precedes_regex
conditions may be specified, and the inserted string will be
placed immediately before the first argument that satisfies any condition. For example:
Original command line:
> thing1 foo1
Insertion rule:
<insert> <insertion_string>arg1</insertion_string> <precedes_regex>foo</precedes_regex> </insert>
Resulting command line:
> thing1 arg1 foo1
If there is no match on the command line for the regex, the insertion string will be placed at the default location (the beginning of the command line).
You can also specify a special value of "nothing" to precedes_regex
. This
specifies that the inserted string should precede no other argument, and will instead be
placed at the very end of the command line. Alternatively, using the special value,
"anything", will cause the string to be placed at the start of the command line.
The follows_regex
condition specifies that the insertion string should be
placed on the command line, after the right-most regex match. Multiple
follows_regex
conditions may be specified, and the inserted string will be
placed directly after the first argument that satisfies any condition. For example:
Original command line:
> thing1 foo1
Insertion rule:
<insert> <insertion_string>arg1</insertion_string> <follows_regex>foo</follows_regex> </insert>
Resulting command line:
> thing1 foo1 arg1
If there is no match on the command line for the regex, the insertion string will be placed at the default location (the beginning of the command line).
You can also specify a special value of "anything" to follows_regex
. This
specifies that the inserted string should follow all other arguments, and will be placed
at the very end of the command line. Alternatively, using the special value, "nothing",
will cause the string to be placed at the start of the command line.
It is possible to specify both precedes_regex
and
follows_regex
conditions in the same insertion rule. They will be combined to
place the insertion string between two existing arguments. For example:
Original command line:
> -one -two -three -four
Insertion rule:
<insert> <insertion_string>arg1</insertion_string> <precedes_regex>three</precedes_regex> <follows_regex>two</follows_regex> </insert>
Resulting command line:
> -one -two arg1 -three -four
In this case, arg1
follows the matched argument,
-two
, and precedes the matched argument,
-three
. If the precedes_regex
and follows_regex
conditions contradict each other (so that satisfying one condition makes it impossible to
satisfy the other), the result is a fatal error.
If there is no match on the command line for either regex, the insertion string will be placed at the default location (the beginning of the command line).
Instrumentation Predicate Language allows the grouping of insertion rules into if/else style blocks. This places
individual insertion rules in order, so that if the first insertion rule is unsatisfiable
(because it has contradictory precedes_regex
and follows_regex
conditions), the next insertion rule is executed instead. Insertion groups are specified
by opening and closing <insert_group>
tags. Within these tags will be
one <insert>
rule to attempt first, followed by one or more
<or_insert>
rules, to be attempted in order until one is executed. For
example:
Original command line:
> -one -two -three -four
Insertion rule:
<insert_group> <insert> <insertion_string>something</insertion_string> <precedes_regex>two</precedes_regex> <follows_regex>three</follows_regex> </insert> <or_insert> <insertion_string>something_else</insertion_string> <precedes_regex>three</precedes_regex> <follows_regex>two</follows_regex> </or_insert> <insert_group>
Resulting command line:
> -one -two arg1 -three -four
The first insert rule in the group is unsatisfiable against the given command line.
This means that the or_insert
rule is attempted, and because it is
satisfiable, results in the final command line.
If none of the insert rules inside an insertion group are satisfiable, the result is a fatal error.
Instrumentation variables are placeholders in an insertion string, the value of which can be specified in the configuration file by using the following syntax:
<instrument_variable> <var_name>variable name</var_name> <var_value>value of the variable</var_value> </instrument_variable>
These instrument_variable
specifications should appear outside of the
<instrument_predicates>
tags. The supported instrumentation variables
are named:
cov_lib_name
cov_lib_path
cov_lib_deployment_path
[[:variable_name:]]
. For example:
<insertion_string>-r[[:cov_lib_path:]] -l[[:cov_lib_name:]]</insertion_string>
Instrumentation Predicate Language supports conditional logic, using the following if/elif conditions:
<contains_regex>regex</contains_regex>
<not_contains_regex>regex</not_contains_regex>
<is_compiler>true|false</is_compiler>
<is_linker>true|false</is_linker>
<if> <conditions> <cond> <is_compiler>true</is_compiler> </cond> </conditions> <then> ... // insertion and/or removal rules </then> </if> <elif> <conditions> <cond> <is_linker>true</is_linker> </cond> </conditions> <then> ... // different insertion and/or removal rules </then> </elif> ...
You can use and/or conjunctions in if/elif statements, using the
<and_cond>
and <or_cond>
xml fields. See below:
<and_cond> <cond> <contains_regex>some regex</contains_regex> </cond> <cond> <is_compiler>false</is_compiler> </cond> <cond> <is_linker>true</is_linker> </cond> </and_cond>
In the example above, the <and_cond>
field returns
true if each of the nested conditions evaluate to true, otherwise the entire block is false.
When using <or_cond>
, the field is true if at least one of the nested
conditions is true, and will only be false if all of the nested conditions are false.
Conjunctions can be nested within each other. For example:
<and_cond> <cond>A</cond> <or_cond> <cond>B</cond> <cond>C</cond> </or_cond> </and_cond>
The example above will only return true if:
A
is true
AND
B
OR C
is true
Coverity instrumentation does not support certain switches for certain compilers. For example,
-spec file
is unsupported for gcc. If this appears on a native command line, it
will result in a fatal error for cov-translate
. You can specify which
switches are unsupported by using <instrument_unsupported>
tags. For example:
<instrument_unsupported>switch1</instrument_unsupported> <instrument_unsupported>switch2</instrument_unsupported>
These tags must be placed outside of the instrument_predicates
tags.
As with non-CIT compilers, CIT compilers can also make use of regex matching to achieve removal and insertion of command-line arguments. However, because CIT compilers support switch table rules, more complex extensions to the Instrumentation Predicate Language can be used.
Supported CIT predicate language extensions include non-regex versions of the standard, non-CIT predicates:
<contains>..</contains>
<not_contains>..</not_contains>
<remove>..</remove>
<precedes>..</precedes>
<following>..</following>
![]() | |
Each of the above is evaluated using the switch-parsing rules in the relevant
< |
<contains> <switch>foo</switch> <arg>arg1</arg> <arg>arg2</arg> <arg>...</arg> </contains>The "switch" xml tag should not contain any switch prefixes. For example, if the command line contains any of the following switches, then the
switch
tag should only contain "foo
":-foo
/foo
--foo
switch
tag are considered regex matches.
Specifically, consider the following command line:
> -foo=two /foo=three --foo=fourThe condition below will match on
-foo=two /foo=three
.<contain> <switch>foo</switch> <arg>^t</arg> </contains>Note in particular that the regex matches on
switch
arguments do not cross argument boundaries.
Switches such as -Ipath1,path2,path3
that are really equivalent to
-Ipath1 -Ipath2 -Ipath3
(where the switch arguments are separated by one or more delimiters)
are specified in the switch table as oa_split
followed immediately by the delimiter(s). Specifying
CIT instrumentation predicates for switches like this behave as follows:
<contains> <switch>I</switch> <arg>path</arg> </contains>
The above rule evaluates to true if there is a switch -I
on the command line that has
any argument component matching the regex "path".
In the following condition:
<contains> <switch>I</switch> <arg>path1</arg> <arg>path2</arg> </contains>If there exists a switch
-I
on the command line where either
path1
or path2
is one of its arguments, there is a match.
When a switch is marked oa_alternate_table
in the switch table, this designates that the
switch is for specifying switches to another program, and that you need to use an alternate switch table to
interpret it. For example, the following signifies that the value to Xpreprocessor
should be
interpreted by <complier
>_preprocessor_switches.dat and the results should be a
ppended to the command line:
{"Xpreprocessor", oa_dash|oa_alternate_table, "preprocessor", oa_append}
Suppose a command line contains:
-Wp,-MD file.c -Wp,foo,-MM,-MP -Wp,-MD,bar -Wp,-MD,foo2where the preprocessor switch table contains entries
{"MD", oa_dash|oa_unattached}, {"MM", oa_dash}, {"MP", oa_dash}Example 1:
<contains> <switch>Wp,</switch> <arg>MD</arg> </contains>This will result in a match against any switch grouping with
-Wp,MD
in it. In the above case, a match will
be made on all of -Wp,-MD -Wp,foo,-MM,-MP -Wp,-MD,bar -Wp,-MD,foo2
.
Example 2:
<contains> <switch>Wp,</switch> <arg>MD</arg> <arg>foo</arg> </contains>
This will match on -Wp,-MD -Wp,foo,-MM,-MP -Wp,-MD,foo2
.
Example 3:
<contains> <switch>Wp,</switch> <arg>MD</arg> <arg>bar</arg> </contains>
This will match on -Wp,-MD,bar