Synopsys logo

Coverity® 2020.12 CodeXM

Learning CodeXM

Introduction

QuickStart Tutorial

How to Write a CodeXM Checker

A CodeXM Style Guide

Hide/show Contents

Coverity® 2020.12 CodeXM

Learning to Write CodeXM Checkers

Introduction

The name CodeXM is short for Code eXaMination.

CodeXM is a specialized language used to write customized checkers that run using the Coverity engine. It allows you to concisely define problematic patterns that you want to find in your source code.

CodeXM checkers are written in the CodeXM (CXM) language, which is easy to learn and allows you to write meaningful checkers quickly. The CodeXM checker definition involves three elements:

CodeXM checkers are seamlessly integrated with the typical Coverity workflow. After writing your checker definitions in a text file, you simply pass the name of that file to the cov-analyze command. The checkers defined in that file are executed along with Coverity’s built-in checkers, and results are reported in the same way.

The first section of this guide, “QuickStart Tutorial”, demonstrates how to define a CodeXM checker and then run it.

Typographic Conventions

Describing CodeXM is a bit different from describing other languages. Most programming books talk about a single language only. To describe CodeXM, we have to talk about CodeXM code itself, and the code of the language it is inspecting—the target language.

This guide shows all code (except pseudocode) in monospace type.

CodeXM code appears in shades of green.

Target code, whatever the language. appears in shades of orange.

Reminder: By default, many browsers do not print out background colors.

QuickStart Tutorial

This section walks you through the steps of using CodeXM to create a checker.

Writing Your First CodeXM Checker

Let’s jump right in. Specifically, let’s look at creating a checker that detects goto statements.

Use case:
Remove goto statements from the target code base.

goto statements lead to spaghetti code: hard to read, hard to debug.

Name the checker

The first thing to do is to give your checker an appropriate name.

At its simplest, a CodeXM checker has the parts shown in the code sample that follows. Open a text editor or development environment of your choice (not a word processor, though), and then type the following code, or copy it and paste it:

CXM code follows
checker { name = "NO_GOTO"; reports = { // ... logic to describe what you're looking for ... events = [ { // ... messages to describe what you found ... } ]; }; };

Note: In these first samples, we use the CodeXM /* comment */ notation to represent parts of the checker you will later replace with more detailed code. (CodeXM comments have two forms. You can use the /* slash-asterisk-multi-line comments */ form. You can also use the single-line form: // slash-slash single-line comment.)

Recommendation: In most cases, CodeXM disregards white space such as indentation or line breaks. The “Style Guide” section at the end of this guide describes a consistent style for CodeXM source that we find useful and readable. The code samples in this guide follow that style. For guidelines to white space, in particular, see the entries Braces and Spacing and Line Length.

We have begin to define a checker named NO_GOTO (as declared in the checker’s name field). Next, we put the checker implementation itself (such as code patterns you’re looking for and how to report defects) into the reports statement.

Add specifics

The next thing to do is to specify where to look and what to look for.

Since we want the checker to look for goto statements in each function of C/C++ source code, the logic to find what we’re looking for would look like this:

CXM code follows
include `C/C++`; checker { name = "NO_GOTO"; reports = for c in globalset allFunctionCode where c matches gotoStatement : { events = [ { // ... Messages to describe what you found } ]; }; };

Here we have added a simple expression that looks at all the code in our project (that’s what the for loop does), and uses the gotoStatement pattern to match all instances of goto in the source code under examination.

The include statement above the checker includes the CodeXM C/C++ Standard Library; this provides the patterns needed for CodeXM to understand C or C++, and ensures that only C or C++ source code is analyzed by this checker. (And yes, those are back-ticks being used to surround the language name.)

Report events

Now that we’ve defined the pattern that we’re looking for, we want to report findings as events, or issues.

For a simple one-event defect like this goto statement, we just need to provide the description (what the event is supposed to say) and the location (where the event should be placed relative to your source code.)

What follows is a completed checker for finding a goto statement:

CXM code follows
include `C/C++`; checker { name = "NO_GOTO"; reports = for c in globalset allFunctionCode where c matches gotoStatement : { events = [ { description = "Use of goto is not allowed"; location = c.location; } ]; }; };

Note: In this checker, our loop variable c matches a gotoStatement pattern that is specified in the C/C++ standard library. This library provides patterns that are appropriate for C/C++ source code, and that limits this checker to analyzing C/C++ source code only. The full set of available patterns can be found in the CodeXM C/C++ Library Reference.

When it runs, our checker displays the message Use of goto is not allowed wherever it encounters a goto in the source code being scanned. The following section shows how to run a scan.

Running Your CodeXM Checker

Let’s run the checker that we’ve just created.

Your code base might not actually have any goto statements—and we’d rather not have you put one in if that’s the case—but we would like you to see some results, so here is a simple C program with a goto in it, just for this purpose:

C code follows
#include <stdio.h>
void donothing() {
// Absolutely nothing
}
void function(int x, int y) {
int z;
if(x > 42) {
goto pastIt;
}
donothing();
pastIt:
z = x * y;
printf("z = %d", z);
}
void main() {
function(1, 2);
}

Save this as mygoto.c, and run one of the variants of cov-build as shown below, depending on which compiler you have at hand: gcc or the Microsoft® C/C++ compiler cl. This builds your intermediate directory—we’ve named it mycxm—which you can reference when you run your CodeXM checker.

Command line follows
$ cov-build --dir mycxm gcc -o mygoto.o mygoto.c

... or:

Command line follows
$ cov-build --dir mycxm cl -o mygoto.o mygoto.c

Note: If you are using a different compiler—and assuming that it is available on your path—simply substitute that compiler’s command name for gcc or cl, above.

Now that we have an intermediate directory, we can get back to running our CodeXM checker, as follows.

To apply your checker to the little sample we gave you, execute the following command line:

Command line follows
$ cov-analyze --dir mycxm --codexm mychecker.cxm

Once the analysis is complete, its output should appear something like this:

Command output follows
Using 36 workers as limited by CPU(s) Looking for translation units |0----------25-----------50----------75---------100| **************************************************** [STATUS] Computing links for 1 translation unit |0----------25-----------50----------75---------100| **************************************************** [STATUS] Computing virtual overrides |0----------25-----------50----------75---------100| **************************************************** [STATUS] Computing callgraph |0----------25-----------50----------75---------100| **************************************************** [STATUS] Topologically sorting 3 functions |0----------25-----------50----------75---------100| **************************************************** [STATUS] Computing node costs |0----------25-----------50----------75---------100| **************************************************** [STATUS] Running analysis |0----------25-----------50----------75---------100| **************************************************** [STATUS] Exporting summaries |0----------25-----------50----------75---------100| **************************************************** Analysis summary report: ------------------------ Files analyzed : 1 Total LoC input to cov-analyze : 2526 Functions analyzed : 3 Paths analyzed : 6 Time taken by analysis : 00:00:02 Defect occurrences found : 1 NO_GOTO

Typically you would use cov-commit-defects --dir intermediate-dir to commit your results to Coverity Connect. For the purposes of this exercise (and when rapidly developing a CodeXM checker), you can also quickly inspect the results by invoking cov-format-errors, as follows:

Command line follows
$ cov-format-errors --dir mycxm --html-output errs

When this finishes, you should find a folder, errs, that contains some HTML files, notably index.html. To view the defects found by the analysis, use a browser to open index.html.

This concludes our QuickStart. To learn more, you can read further in this manual, or you can consult some of the other documents provided with CodeXM.

Interactive Checker Development

With the CodeXM extension for Microsoft® Visual Studio® Code, you can interactively write, test, and debug CodeXM checkers. (You can also test source code in the VS Code IDE.) The CodeXM extension is available from Synopsys, and can be found in the Marketplace site for Visual Studio Code.

See Also

This QuickStart Tutorial only scrapes the surface, getting you up and running with just a hint of what CodeXM is capable of doing. For a fuller understanding of CodeXM, we encourage you to also examine the following documents:

How to Write a CodeXM Checker

These sections elaborate on the QuickStart by describing further ways to use CodeXM, and further features of the CodeXM language.

Reviewing Basic CodeXM Concepts and Exercises

Let’s take a moment to assimilate what we’ve learned. (If you haven’t gone through the “QuickStart Tutorial”, please do so now.)

CodeXM checkers examine your project’s source code by looking for specific kinds of code constructs. These can be fairly simple (like specific statement types) or they can be much more complex. More sophisticated checkers (which we’ll take a look at later on), look for more complex patterns. The main idea is that CodeXM checkers look for patterns.

Further Exploration

Now that we know how to write a basic checker that finds goto statements (using the gotoStatement pattern we invoked in the QuickStart), try modifying your NO_GOTO checker to look for an ifStatement pattern, forLoop pattern, and returnStatement. Since any of these are likely to occur in your own code, you can just run the analysis against your code base.

You can also modify the NO_GOTO source file to include the new checkers. Be sure to re-run cov-build so analysis will detect the changes you made.

To illustrate, in the next section we write a simple checker, named VAR_DECL, to detect all declarations of variables.

Basic Pattern Matches and Conditions

We’ve now reviewed the basics of creating a CodeXM checker, so let’s go ahead and extend them.

CodeXM comes with some predefined patterns, but you can define your own as well. This allows you to match more complex code. In NO_GOTO, when code matches a goto statement in the source code, it generates an event.

To illustrate, let’s write a simple checker called VAR_DECL to detect all declarations of variables.

Use case:
Find variable declarations.

For example, you might want to verify that variables are initialized at declaration time. Uninitialized variables can lead to unexpected and erroneous values, and even to crashes.

We can use the basic structure of a CodeXM checker and start off with a stock checker definition (like we did in the NO_GOTO example):

CXM code follows
include `C/C++`; checker { name = "VAR_DECL"; reports = for code in globalset allFunctionCode where code matches /* something */ : { events = [ { description = "Variable declared."; location = code.location; } ] }; };

This time, instead of matching goto statements, we want to match variable declarations. To do this, we use the aptly named variableDeclaration pattern in the following way:

CXM code follows
include `C/C++`; checker { name = "VAR_DECL"; reports = for code in globalset allFunctionCode where code matches variableDeclaration : { events = [ { description = "Variable declared."; location = code.location; } ] }; };

Note: Just like the gotoStatement pattern, the variableDeclaration pattern is provided by the CodeXM C/C++ library. The use of this library is specified by the include directive shown at the top of the code sample.

This checker is complete. You can run it by following the same steps you followed for your NO_GOTO checker, as previously described in Running Your CodeXM Checker.

Going to the next level of complexity, let’s say we want to determine whether the variable is initialized or not. To check for this, we can make a new checker by enhancing our current VAR_DECL checker.

In CodeXM, when a pattern matches some code, the pattern doesn’t just return true. More usefully, we get access to the matched code, along with properties about that code. By adding as <name> to the end of a matches operator, we tell CodeXM to assign a name to the result, so that we can refer to the result later on.

Our work-in-progress checker follows:

CXM code follows
include `C/C++`; checker { name = "UNINIT_VAR"; reports = for code in globalset allFunctionCode where code matches variableDeclaration as decl: { events = [ { description = "Variable declared."; location = code.location; } ] }; };

In the example above, the variable code can refer to any sort of code: if statements, assignment expressions, function calls, and yes, variable declarations. But decl will only ever be a variable declaration because it is the result of a match for that particular C or C++ pattern. So we can use decl to access properties of a variable declaration in order to check for further conditions.

Now that we know we have a variable declaration, we want to decide if this is the kind of declaration we’re interested in: that is, a declaration without an initialization. The general pattern for narrowing down a search in this way looks like the following code fragment:

CXM code follows
where code matches variableDeclaration as decl && /* ... examine the declaration ... */ : {

Seeing the general form above, we know we want to only consider variable declarations that are uninitialized. It turns out that variableDeclaration has a property, initializer, whose value is null when a variable isn’t initialized at declaration time. Pattern properties, such as fields in a record or attributes in an object, are specified by a dot notation. So to express that a variable does not have an initializer, we use the following code:

CXM code follows
where code matches variableDeclaration as decl && (decl.initializer == null) : {

Combining all of the above, the code for the UNINIT_VAR checker should now look like the following:

CXM code follows
include `C/C++`; checker { name = "UNINIT_VAR"; reports = for code in globalset allFunctionCode where code matches variableDeclaration as decl && (decl.initializer == null) : { events = [ { description = "Variable " + decl.variable + " is declared but not initialized."; location = code.location; } ] }; };

Decomposing a Pattern to Match Specific Properties

In the previous sections, we learned how to refine our pattern-match search by using the logical && conjunction. Now let’s see how we can further specify the pattern for properties we’re interested in, by using a more compact notation known as pattern decomposition.

The previous example looked like this:

CXM code follows
code matches variableDeclaration as decl && (decl.initializer == null)

The following CodeXM construct tests for the same condition:

CXM code follows
code matches variableDeclaration { .initializer == null }

In other words, pattern decomposition means breaking a test pattern into one or more constituent constraints—such as .initializer == null. The decomposed pattern is enclosed in curly brackets. You can specify more than one constraint within the decomposition. Separate one constraint from the next by using a semicolon ( ; ). When you specify multiple constraints, all the constraints must be true in order for the overall pattern to match.

Using Nested Properties

To illustrate another case of pattern decomposition, let’s now write a checker that detects calls to a problematic function, system().

Use case:
Find calls to system().

Calls to system() are “problematic” because they can lead to security breaches. This is a function that should be used carefully and sparingly.

To find calls to system() we must match functionCall patterns. For each function call, we must then check its calledFunction property, which has a subproperty, .identifier, that contains the declared name of the function being called.

Here’s an example:

CXM code follows
for code in globalset allFunctionCode where code matches functionCall { .calledFunction.identifier == "system" }

As the sample code shows, a pattern decomposition can use properties that are nested. Simply use the . operator to name them in sequence—rather like the way you specify a path to a particular file.

A tip about nested properties: Nested properties can be tremendously useful, so be sure to check for them, especially when you use a pattern you haven’t used before. For example, if you look at the description of functionCall in the C/C++ Library Reference, you see a table of that pattern’s properties. Always check the property types. In the example we have just shown, the .calledFunction property, which we use, has the type functionSymbol. Now, functionSymbol is itself a library pattern: one that has a large number of properties—one of which, .identfier, we take advantage of in our example.

Using the Filtering Operator

You can use the % operator, a set-filter expression (also known as the which are operator) to filter or simplify the code even further. For this example, we’ll use the % operator to help detect function calls.

Use case:
Add the % which are operator to our checker that finds calls to system().

This operator makes the code more concise, easier to read.

Since the where clause happens to limit itself to just expressing a more precise functionCall pattern, we can use the % operator and rewrite our code as follows (we are only interested in those portions of the code that are function calls to system()):

CXM code follows
for code in globalset allFunctionCode % functionCall { .calledFunction.identifier == "system" }

We can now quite economically define the entire checker, using both pattern decomposition and filtering, as follows:

CXM code follows
include `C/C++`;. checker { name = "CALL_TO_SYSTEM"; reports = for code in globalset allFunctionCode % functionCall { .calledFunction.identifier` == "system" } : { events = [ { description = "This is a call to " + "system".formattedAsCode + "."; location = code.location; } ] }; };

To summarize in plain English what the code does, the checker finds each call to the system() function, and reports each find. The report describes the issue and shows where the call is located in the source code.

Note: The construct .formattedAsCode, which we have not shown before, is a CodeXM property of any string value, either constant or variable. It simply formats the string for display in Coverity Analysis output.

Creating Custom Patterns

You have now used patterns from the C/C++ library, such as functionCall and gotoStatement. In CodeXM, you can also define your own patterns. This can be useful if you find yourself matching the same thing many times, or if you just want to assign a descriptive name to what you are trying to match: It is important to write readable code.

Let’s assume that you want to find any use of the comma operator in your source code.

Use case:
Find uses of the C/C++ comma operator.

The comma operator is problematic because it is error-prone. The expression to the left of the comma is evaluated, but this result is not saved (presumably, the evaluation generates some side effect). Not only is this confusing, it is easy to overlook when reviewing code.

Since the comma operator is a binary operator, we can find it by using the C/C++ library pattern binaryOperator.

This fragment shows the structure for using binaryOperator:

CXM code follows
for code in globalset allFunctionCode where code matches binaryOperator // ...

But there are many kinds of binary operators. We need to be more specific. The binaryOperator pattern has a property, .operator, that tells us which kind of operator it has matched. So the following code fragment shows how to detect instances of using the comma operator:

CXM code follows
for code in globalset allFunctionCode where code matches binaryOperator { .operator == `,` }

Note: The various C-language binary operators are represented by literal values—they are members of an enum—so in CodeXM code you need to surround them with backticks. For example, the addition operator is indicated by using `+` and the multiplication operator is indicated by using `*`.

Now, let’s suppose we are writing a complex checker that requires finding the C-language comma operator several times. In order to prevent repeating the previous code over and over again, we can declare a reusable custom pattern. We can call the new pattern, theCommaOperator.

Here is a piece of code that does just that:

CXM code follows
pattern theCommaOperator { binaryOperator { .operator == `,` } };

(Always end a pattern declaration with a semicolon, as you do for any other kind of declaration.)

That’s it! This new pattern finds any use of the comma operator.

Remember: Not all commas are instances of the comma operator: a comma can also be a separator in a list, a separator between parameters in a function declaration or a call to that function, and so forth. The comma operator is defined as evaluating its left-hand operand, throwing it away, then evaluating its right-hand operand, whose result then becomes the result of the whole expression.

Tip: To use a custom pattern as often as you’d like, put the pattern declaration above the checker keyword. That is, make it global to the individual checkers in your code.

Now that you have defined theCommaOperator, you can use it anywhere a library pattern would be valid; for example, as an argument to the matches operator or to the % operator. The following code fragments show this usage:

CXM code follows
for code in globalset allFunctionCode where code matches theCommaOperator as comma // ... alternatively ... for comma in globalset allFunctionCode % theCommaOperator

Returning Objects from Patterns

A custom CodeXM pattern can also return a new object. The object can have a different set of properties than the patterns for which the custom pattern searches. This is useful when you want to match multiple related code constructs and then return information in a uniform way.

Use case:
Find loops.

Such a checker might come in handy when you are reviewing a program’s flow of control.

The following sample code finds loops of various types:

CXM code follows
pattern allLoops { | forLoopRange as loop -> { body = loop.bodyStatement; condition = null; after = null; } | forLoopSimple as loop -> { body = loop.bodyStatement; condition = loop.conditionExpression; after = loop.updateStatement; } | whileLoop as loop -> { body = loop.bodyStatement; condition = loop.conditionExpression; after = null; } | doWhileLoop as loop -> { body = loop.bodyStatement; condition = loop.conditionExpression; after = null; } };

The pattern above matches four different kinds of loops: while, do-while, and the two kinds of for loop. The C# or Java library patterns that match these (forLoopRange, forLoopSimple, whileLoop, and doWhileLoop) each has its own particular set of properties. The loop variable returned by our new allLoops pattern, on the other hand, uses three properties to summarize those different loop types. (The .after property applies only to forLoopSimple: It is the statement that is executed at the end of every iteration.)

As an example of using the custom allLoops pattern, suppose we wanted to write a CodeXM checker that inspected the body of each loop in the target code, without caring what kind of loop contained that body. The following code sample shows how you could set this up:

CXM code follows
for loop in allFunctionCode % allLoops : let body = loop.body in /* Something that needs to use the information in a loop's body, but doesn't care what kind of loop it is. */

The new pattern provides a generic way to locate loop bodies, greatly simplifying our search condition.

Note: Some language libraries have a pattern to match all loop types. The C/C++ library and the Python library do not.

Finding a Literal Argument Value

With the functionCall pattern, available for all supported target languages, you can detect a literal value being passed to a function.

Use case:
Find an argument that has a particular value, or that does not have a desired value.

You might want to detect an illegal value being passed, a mandatory value, or an inappropriate usage. For example, legacy code might specify a cryptographic standard that is no longer considered secure.

The following example simply checks for a mandatory value. The functionCall pattern’s argumentList property lets your checker inspect what is being passed to a speficied function.

CXM code follows
checker { name = "BAD_ARGUMENT_VALUE"; reports = for node in globalset allFunctionCode where node matches functionCall { .calledFunction.identifier == "setIterations" } as fc && fc.argumentList[1] matches intLiteral { .value != 256 } : { events = [ { tag = "testArgValue"; description = "The 'steps' parameter" + " must equal 256."; location = node.location; } ]; } };

Handling Null Values

In the previous pattern for loops (Returning Objects from Patterns), you might have noticed that the after property and condition property can be null, depending on the type of loop that is matched. In fact, we say that these properties are nullable.

In CodeXM documentation, a nullable value is indicated by the potential type followed by a question mark; for example, string?.

Accessing a null value is not a fatal run-time error in CodeXM, as it is in many other languages. A nullable property is one that might not return an actual value: This does require additional CodeXM code to account for that possibility.

You can handle a nullable type in one of the following ways:

The following code sample uses the method of matching NonNull:

CXM code follows
for loop in globalset allFunctionCode % allLoops where loop.after matches NonNull as aft && // Use aft here, knowing now that it isn't null.

The following code sample uses the converse method of checking that a property value is null:

CXM code follows
for loop in globalset allFunctionCode % allLoops where loop.after == null && ( /* The loop.after property is now confirmed to be null, so don't attempt to use its value here! */ )

The null-coalescing operator simply specifies a default value to use when a property is null. The following code fragment shows how to use it in code:

CXM code follows
myNullableString ?? "default value"

... After this expression, the CodeXM code can now access myNullableString just as it accesses any non-nullable property. If the property has a value, that is what CodeXM inspects. If the property is null, CodeXM uses the value "default value" instead.

Note: Some language libraries have a pattern to match all loop types. The C/C++ library and the Python library do not.

Using the ‘exists’ Expression

Imagine that you want to write a checker to find all switch statements that don’t have a default clause.

Use case:
Find switch statements that lack a default clause.

A missing default clause is not only poor logic; it can lead to crashes.

The following code outline is an initial pass for such a checker:

CXM code follows
include `C/C++`; checker { name = "SWITCH_NO_DEFAULT"; reports = for code in globalset allFunctionCode where code matches switchStatement as sw && /* ... we’re getting to this part next ... */ : { events = [ { description = "Switch without default"; location = sw.location; } ]; }; };

Thus far, the code shown above finds all instances of switch statements. We need to narrow the search. The variable sw is a switchStatement object, so it has a property called caseList, which contains all the switch statements, including the default case.

We might be inclined to search sw.caselist for defaultStatement—but this just returns the statement itself, when what we want to know is whether such a statement is present. The exists expression comes to our aid.

A CodeXM exists expression looks like a for-in loop (to help illustrate what it does). It differs from the for loop in that it returns true if there is at least one object that satisfies the condition. It returns false otherwise.

The following code fragment shows how an exists expression is constructed:

CXM code follows
exists t in sw.caseList where t matches defaultStatement

Actually, we want to also use the logical NOT operator, !, to find the switch statement that does not contain a default clause. So the completed checker looks like the following code:

CXM code follows
include `C/C++`; checker { name = "SWITCH_NO_DEFAULT"; reports = for code in globalset allFunctionCode where code matches switchStatement as sw && ( ! (exists t in sw.caseList where t matches defaultStatement) ) : { events = [ { description = "Switch without default"; location = sw.location; } ]; }; };

Note: The space after the NOT operator ! is not strictly required, but we put it there to make the operator stand out. While you are reading code, neglecting to notice a NOT in a condition can lead to great confusion.

Defining CodeXM Variables

Like most common programming languages, CodeXM has the concept of variables.

Unlike most common programming languages, where you can change the value of a variable once it has been declared, when you declare a variable in CodeXM you must assign it a value, and thereafter you cannot assign a new value to it. The value of a CodeXM variable is effectively constant.

In CodeXM, you can define global and local variables, and the two have similar—though not identical—syntax. They are also placed in two different locations within a file.

Global Variables

To define a global variable, you place it outside of a checker. This global variable is accessible (or, in scope) from any part of the file in which the variable is declared. The following code shows the syntax for defining a global variable:

CXM code follows
let variableName = value;

Always close a global variable declaration with a semicolon (;).

The following code fragment is an example of a global declaration as it might appear in CodeXM code:

CXM code follows
let defect_message = "A defect was found.";

Local Variables

You define a local variable within the body of a checker or a function. The following code shows the syntax for defining a local variable:

CXM code follows
let variableName = value in /* Some expression where the variable is used */

The variable takes on the value computed, just as in the global case, but its value is available (in scope) only within the expression that follows the keyword in.

The following example demonstrates variables local to particular for loops:

CXM code follows
for sw in globalset allFunctionCode % switchStatement : let sw_cases = for vcs in sw.caseList % caseStatement where vcs.valueExpression matches intLiteral as lit : lit in for c in sw_cases where c.value < 0 : // Do something with cases that have negative values.

Reviewing what the example shows, the first line identifies all switch statements (which we refer to, individually, as sw). The first for loop enumerates all the case statements within a given instance of sw, looking specifically for those that have an integer literal expression (for example, case 42:) and then collecting that set into a variable named sw_cases.

In the lines that follow the third in (the one on its own line), sw_cases is used in the next for loop to inspect each such integer to see if its value is less than zero.

The variable vcs is defined only within the first of the nested for loops, and the variable c is defined only within the second. The variable sw is available to both of the nested for loops, but it is available nowhere outside the enclosing for loop that begins this code fragment.

CodeXM Types

The types that the CodeXM language supports, such as integers (int) or strings (string), are likely to be familiar to you. Like many programming languages, CodeXM is strongly typed. In many cases, however, when you declare a variable you don’t need to use the name of the type: The type is inferred from the declaration syntax. For example, the following code snippet shows implicit typing:

CXM code follows
let i = 5;

... In this example, i is clearly an integer whose value is 5, so it has the type int.

You can specify a type explicitly. To do so, use a colon (:), as shown in the following code snippet:

CXM code follows
let s : string = "foo" in // Some logic that expects a string named s

Types are described fully in the Syntax Reference. These are some of the widely used ones:

bool
A Boolean whose value is either true or false.

int
An integer value; can be positive or zero. (CodeXM treats a minus sign ( - ) as applying a negation operation to a positive integer.)

string
A sequence of characters. A string literal is enclosed in double quotes; for example, "orange".

list<type>
An ordered collection of values; for example, let myList = list [ 1, 2, 4 ];. The values in a list must all be of the same type.

You can access a particular list element by using an array subscript operator; for example, myList[0] designates the first element in myList. Lists are indexed starting from zero.

A list also has a .length property that holds the number of elements in that list. For example, myList.length equals 3.

set<type>
Similar to a list, but the elements are not ordered. (Because of this, they can’t be indexed.)

Unlike the mathematical notion of a set, a CodeXM set can have duplicate entries.

enum<literal>
An unordered collection of literal identifiers; for example, let myDecision = { `yes`, `no`, `maybe` }.

The literals in an enum are similar to strings: They can contain any characters, including white space, and are enclosed in back-ticks.

Using Switch Statements

As many common languages do, CodeXM has a switch expression that compares one value to multiple patterns and returns a value based on the first pattern that matches. The following sample code shows the overall syntax of a switch:

CXM code follows
switch( /*variable*/ ) { | /* a pattern */ -> // What to return if a pattern matches | /* another pattern */ -> // What to return if a different pattern matches /* And so on: as many cases as you need */ | default -> // What to return if we get here. // This clause is optional // (but strongly reommended). }

CodeXM only goes through the cases—in the order they are listed—until one of the patterns matches, or the until it reaches the default case. In any event, when a match does occur, the switch returns the corresponding result. Therefore, the order of your patterns within the switch matters. The default case will always match if no case before it has done so. (The default case is optional, but we recommend that you always use it.)

Since you specify a pattern before each arrow, you can also specify an as variable. The variable definition goes after the pattern but before the arrow. This way, the expression to the right of the arrow can use the local variable to refer to the pattern’s result. A variable specified within a case remains in scope until the next case begins (at the vertical bar). Because the variable is local to a single case only, you can use the same variable name for several patterns.

The following sample code shows a completed switch statement that uses local, case-by-case variables:

CXM code follows
let cond = switch(code) { | ifStatement as c -> [ c.condition ] // A list of one element | whileLoop as c -> [ c.condition ] | forLoop as c -> [ c.condition ] | default -> [ ] // An empty list } in // An expression using the variable cond, which is of type list<expression>

Note: Strictly speaking, the vertical bar (|) before the first of the cases is optional, but our recommended style is to use it, to maintain readability and visual consistency.

To summarize this example, the switch checks whether code matches one of three statement types: in order, an if, a while, or a for. If code does match one of these, then cond is assigned a one-element list that contains the matching statement’s conditional expression. If code does not match (the default case), then cond is assigned an empty list.

Using Conditional Expressions

CodeXM also has an operator that is similar to the one used in other programming languages; namely, the ?: ternary or conditional operator. It provides the expected if-then-else functionality.

The following code fragment uses the conditional operator:

CXM code follows
let loopStr = (code matches whileLoop) ? "is a while loop" : "is not a while loop" in /* ... Further processing ... */

In other words, if code matches the whileLoop pattern, the conditional returns the string "is a while loop"; otherwise, it returns "is not a while loop".

The conditional operator can be hard to read. We recommend you use it only for brief, one-line tests such as the previous example. If the test requires more than a single line, and especially if it contains subexpressions, use an if statement.

Here is a sample of using an if statement to perform the same comparison we performed in the previous example:

CXM code follows
let loopStr = if (code matches whileLoop) then "is a while loop" else "is not a while loop" endif in /* ... Further processing ... */

An if statement is constructed out of the keywords if then elsif else endif, where the elsif clauses (there can be more than one of these) are optional.

Using the ‘debug()’ Function

The debug() function is useful when you debug your CodeXM checker.

This function prints its parameter (for example, a message string, the value of a variable, or both) to the console while your CodeXM checker executes.

Watch Out: The debug() function does not generate output unless you specify the --codexm-print-debug option when you invoke cov-analyze.

Because all functions return values (and you might not care what debug() returns), when you use the debug() function wrap it inside a let ... in expression.

The following code sample shows how you might call debug(). (Because we don’t care about the value that debug() returns, we use the underscore as a placeholder variable name. We won’t reference it elsewhere.)

CXM code follows
for e in globalset allFunctionCode % someFilterPattern : let _ = debug("Value of e is: " + e) // The message from debug() // is printed to the console. in { // Whatever you would ordinarily specify here // as the body of the 'for' expression. }

The code in this example prints to the console every expression that matches the filter pattern provided.

Defining CodeXM Functions

In addition to custom patterns, CodeXM allows you to define your own functions. This is particularly useful for structuring your CodeXM checker. Functions are even required to perform certain kinds of tests. This section shows an example of that technique.

Use case:
Find a throw operator outside a try statement.

A throw outside a try is simply poor logic.

This kind of search requires recursion, and to search recursively you need a function to call.

Here is an example of the code we want to examine:

C++ code follows
void f()
{
throw 1; // Find this one.
try {
throw 2; // But not this one.
}
catch( /*...*/ ) {
return;
}
};

We can use the following pattern to check for throw operators that are not inside of a try statement:

CXM code follows
pattern badThrow { throwOperator as t where ! insideTry(t) };

... But this example is not complete, because it relies on a call to the function insideTry(). Now we need to define that function as well.

CodeXM functions, like functions elsewhere, take arguments and return the value obtained from evaluating their function body.

To satisfy the needs of the pattern we defined above, the implementation of insideTry() returns a Boolean value that indicates whether some ancestor of the function’s argument is a try statement or not. The code to define the function looks like this:

CXM code follows
function insideTry(n : astnode) : bool -> if n.parent matches NonNull as p then // Check whether parent exists switch(p) { | tryStatement -> true // If a try statement, return true | default -> insideTry(p) // ... else, recursively check parent } else false // No more parents: not inside a try endif;

Examining this example, we see that the function expects an astnode argument and returns a Boolean value. (An astnode is a node in the Abstract Syntax Tree: in effect, any statement or expression.)

First, the function checks whether the node has a parent.

Functions, like patterns, are good for modularizing your code, providing easy-to-maintain reusable elements, or otherwise just giving descriptive names to complicated logic. Used together with checkers, they can make CodeXM an effective and powerful language.

Including Other CodeXM Files

With CodeXM, you can include other CodeXM files by using the include directive. When you include another file, the patterns, functions, and other definitions kept in that file become available to the current CodeXM file. This allows you to define a collection of useful tools in one place, and share them among many checker files.

To do this, use the following syntax in your checker file:

CXM code follows
include "../path/to/codexm/file.cxm";

You’ll notice that unlike other instances of include mentioned in this document, the code that follows the include keyword is a (quote-enclosed) string, not a (backtick-enclosed) enumeration member. When quotes are used, this tells CodeXM to look for a file on your file system. In contrast, the enum-based include directives cause predefined libraries installed with cov-analyze to be included.

Relative paths used with the quoted-string variant are relative to the location of the CodeXM file.

Finding Built-in Code Patterns

Sometimes it might not be clear what pattern you should use to match a particular bit of your code.

To help you find the pattern (or patterns) that would match a specific part of your code, use cov-manage-emit to dump the specific part of your code. (In particular, a function in which you notice the thing you’re looking for.)

To do this, it helps to generate an intermediate directory that contains the desired code function. You might want to try this on a small test case first, where a small source file contains the function (or a relevant part of it, anyway) for which you’re trying to find a pattern. Given that, invoke cov-build as follows:

Command line follows
$ cov-build --dir mycxm -- gcc -c -o test.o test.c

This captures the source of test.c into the mycxm directory.

Now, we need to see the CodeXM patterns in our test file. We do this by running cov-manage-emit. The general form is:

Command line follows
$ cov-manage-emit --dir mycxm find function-regex --print-codexm

So for our example, let’s say you’ve created a little function called example() (that you put into test.c, above) containing the code of interest. Use the following command syntax:

Command line follows
$ cov-manage-emit --dir mycxm find ".*example.*" --print-codexm

After running cov-manage-emit, the output for any function, class, enum, or global variable is presented in the following form:

Command output follows
{ pattern : "functionDefinitionCode", record : { body : { record : { containedStatements : [ { pattern : "simpleStatement", record : { expression : { pattern : "assignmentOperator", ... functionSymbol : { record : { "identifier?" : "example", ...

By reading this JSON code, you can identify all the patterns present in your function, including the part that you’re interested in matching.

Note: As you can see in the sample output, the JSON pattern descriptions also use the <typeName>? convention to show nullable types.

Function Properties

A function property is, in effect, a sub-checker that provides information about a called function. You can think of it as a pattern on steroids, since it allows the main checker to match characteristics found in other functions that the current function calls. Examples might include Does the function being called contains side effects? Or, Does this function, or any function it calls, call system()? Function properties permit a lightweight form of interprocedural analysis.

Function properties are defined using the key-phrase function property, followed by an identifier, followed by a record definition that has a single property named models, which should be a pattern. The general form appears like this:

CXM code follows
function property /* name */ { models = /* The pattern to look for in called functions */; }

The pattern you provide to the models property should have the following form:

CXM code follows
pattern(astnode) -> record { events : list<eventRecord> }

That is to say, the pattern should expect to match an astnode (a statement, an expression, and so on) and return a record object with the member field events containing a list of eventRecord items (much like an ordinary checker does). This pattern should match nodes of interest within the called function, and produce one or more events descriptive of what was matched.

The following example shows a function property that determines whether a given called function calls system().

Use case:
Find function calls that contain calls to system().

This is similar to our earlier example that found calls to system() itself.
CXM code follows
function property callsSystem { models = pattern { /* The basic part of the pattern: Does this function call system() directly? */ | functionCall { .calledFunction.identifier == "system" } as call -> { events = [ { description = "system".formattedAsCode + " is called directly here."; location = call.location; } ]; } /* An optional part of the pattern: only necessary if the property is transitive (that is, if it also applies to functions within the function called). */ | callsSystem as transitiveCall -> { events = [ { description = "system".formattedAsCode + " is called by this function" + " (or one it calls)."; location = transitiveCall.location; } ]; } } };

Here we see that the pattern in the function property contains two cases. The first (top) case identifies an event of interest directly: a call of a function named system(). The second (bottom) case applies the function property recursively on call sites within the called function. (If that transitive behavior is not desired, that alternative of the pattern can be omitted.)

The function property behaves like a pattern you match against functionCall results. If the property matches, the called function in question exhibits that characteristic; if the property does not match, neither does the function.

To use our new function property we could write the following, treating the function property as an expression:

CXM code follows
for funCall in globalset allFunctionCode % functionCall where funCall matches callsSystem : /* Do something, such as report an issue. */

To be more complete, here the pattern is used to implement a complete checker. In this case, the function property is treated as a pattern:

CXM code follows
checker { name = "CALLS_SYSTEM" ; reports = for c in globalset allFunctionCode % callsSystem : { events = [ { description = "This function calls " + "system".formattedAsCode + "."; location = c.location; } ]; }; };

In this second example we use the function property like a pattern, making it the right-hand operand of the % (which-are) operator, instead of filtering against functionCall and then matching against the function property. Both methods work.

If you play with using this code, you’ll see that this brief checker finds all instances where a function calls system(), including indirectly so. This is a more general test than the earlier checker that searched for system() calls, but it might generate more issues than you expect.

If FunctionA calls FunctionB, which in turn calls FunctionC, which finally calls system(), you will get exactly what the checker was told to do: an issue report for each of FunctionA, FunctionB, and FunctionC, because each of them ultimately causes a call to system().

This might not be the result you want. More often you will probably want to combine the function property with another condition, specific to the current function, so only when both conditions are met do you actually generate an issue event.

Path-Sensitive Analysis

So far we have focused on detecting the presence of single elements of code within your project: a goto statement, for example, or a call to the system() function. Finding such standalone elements is useful, but it can only help you detect a certain class of coding problems. Wouldn’t it be nice if you could also have CodeXM find problems that involve a specific sequence of code elements?

You can, through a mechanism called path-sensitive analysis.

As you are probably aware, the computer executes the instructions in your functions one at a time (at least conceptually; let’s ignore threads and all the pipeline optimizations modern processors do to speed up your code: in spite of all these neat tricks, it’s all meant to give the illusion that instructions are executed in discrete, ordered steps). Any nontrivial function will likely have some control flow: perhaps a condition is tested, and two or possibly more different ways can be taken based on the outcome. Each possible way through a given function—from the entry point to where it returns—is called a path.

CodeXM has the ability to evaluate the feasible paths through a function, looking for specific sequences of code along any one of them. This is what path-sensitive analysis is all about.

Finding sequences of code along paths is not much more work than what we’ve already done. We just need to introduce two syntax elements: the sequence() function and the >=> operator (AKA the happens before operator). These constructs, working together, describe sequences of code elements rather than just individual elements.

Random Thoughts

Calling things in the wrong order is a common problem when programming. Take, for example, forgetting to seed the pseudorandom number generator before you start to pull numbers out of it.

Use case:
Find calls to rand() not preceded by a call to srand().

Like failing to initialize a variable, failing to seed a pseudorandom number generator leads to unexpected and incorrect reults.

Here is an example of outright forgetting to call srand():

C++ code follows
void forgotToSeed(int i) {
int x;
// Some logic
x = rand();
// Some more logic expecting x to be random
}

More troublesome, though—and not as easily detected manually—is the case where srand() is called, but only sometimes:

C++ code follows
void onlySometimesSeeded(int i) {
int r;
// Some logic
if(someComplicstedCondition) {
srand(i);
}
// More logic
r = rand(i);
// Still more logic, now expecting r to be validly random (it might not be)
}

This second example collapses what might be a complicated bit of control flow into a single if statement, but the point is that while there is a path that does call srand(), there is also a path where srand() is not called. The second of these paths is the one we want to find.

With the problem laid out, let’s see how we attack it. To express it in words: We are looking for a sequence where we have not called srand() before we call rand(). Here is how we can express it in CodeXM:

CXM code follows
sequence( ! callToSRand >=> callToRand )

These are some issues you should be aware of:

The sequence() function just produces a special kind of pattern that understands paths. In fact, we’ll end up putting it into a checker that specifically walks along paths. The checker looks like this:

CXM code follows
checker { name = "UNSEEDED_RAND"; reports = for f in globalset allFunctionDefinitions where f.paths matches sequence( ! callToSRand >=> callToRand ) as errs : errs };

So the checker itself is pretty straightforward. All we need to do now is to define what we mean by callToSRand and callToRand. Given that CodeXM is a define-before-use language, somewhere above the preceding snippet we need to define the two patterns that match what we are looking for. The first one is the simpler of the two and should be fairly self-evident, but it does introduce one new concept: the path step. Let’s examine it:

CXM code follows
pattern callToSRand { nodePathStep { .node == functionCall { .calledFunction.identifier == "srand" } } -> { event = null } };

This pattern is special in that it specifically matches a path step; that is, a single, discrete bit of code that is part of a larger path object. In order to be interesting to us, the node in question is expected to be a call to the function srand().

Another curiosity is that this pattern returns an empty event list. We can get away with this because the sequence is looking for the absence of a call to srand() (that is, ! callToSRand) so we have nothing meaningful to report if srand() is actually called.

The other pattern we need finds a call to rand(), and report its finding. Structurally this is similar to the previous pattern, but of course we change the name of the function we’re looking for, and in this pattern we do want to report something. To construct the report, we need to save what was matched, in a new variable, and then construct a proper event list that references the variable. Here is the pattern in its entirety:

CXM code follows
pattern callToRand { nodePathStep { .node == functionCall { .calledFunction.identifier == "rand" } } as randCall -> { event = [ { tag = "call_rand"; description = "Calling " + randCall.node + " without previously calling " + "srand()".formattedAsCode + "."; location = randCall.node.location } ] } };

So what might you expect to find when you run this checker? The simple case we illustrated above is straightforward. The following sample annotates the code with a message generated by the checker UNSEEDED_RAND:

C++ code follows
void forgotToSeed(int i) {
int x;
// Some logic
x = rand(); [Issue] Event call_rand -
[1] Calling rand() without previously calling srand().
// Some more logic expecting x to be random
}

The second example—where some paths seed the generator but others don’t—is more interesting: Not only does the checker tell you that the code forgot to call rand(), it tells you which path that was taken failed to do so.

Here is the second example, with messages from Coverity Analyze added:

C++ code follows
void onlySometimesSeeded(int i) {
int r;
// Some logic
if(someComplicstedCondition) { [1] Event cond_false: Condition
"some_complicated_condition", taking false branch.
srand(i);
} [2] Event if_end: End of if statement.
// More logic
r = rand(i); [Issue] Event call_rand - [3]
Calling rand() without previously calling srand().
// Still more logic, now expecting r to be validly random (it might not be)
}

So describing sequences of steps in your code is not much more difficult than writing other kinds of checkers. This capability opens the door to solving many problems: For example, you can use path sensitivity to check that resources you opened are always closed, or that you only call certain functions between two “book-end” functions.

Remember: A sequence is limited to finding steps within a single function. Writing an interprocedural checker that combines function properties and sequences is possible, but we’ll leave that as a proverbial exercise for the reader.

Writing Your Own Don’t Call Checker

A family of Coverity Analysis checkers known as Don’t Call checkers (their names begin with “DC.”) report calls to functions that for one reason or another are considered risks to the security and integrity of a program. You can use CodeXM to add custom checkers of the Don’t Call type.

Because it can execute operating system commands, the system() call in C and C++ is one such function, as we have mentioned before. Those sections were written with teaching the use of CodeXM in mind, and so the examples they use are fragments of code. Here is the complete code for a checker that provides a Don’t Call for system():

CXM code follows
include `C/C++`; checker { name = "DONT_CALL_SYSTEM"; reports = for call in globalset allFunctionCode % functionCall { .calledFunction.mangledName == "system" } : { events = [ { description = "Calling " + call.calledExpression; location = call.location; } ]; }; };

Although this is not a lengthy example, it brings together several of the features and techniques that previous sections have described.

A CodeXM Style Guide

Here are some guidelines for CodeXM source code. Both developers at Synopsys and developers outside the company have found them helpful for keeping code consistent and easy to read. Code examples in the CodeXM documents follow these guidelines, too (for the most part).

Alternatives in Patterns

Alternatives in multiline patterns that use the | separator should include the first, optional |. (The switch expression uses the same convention.)

The following code shows an example:

CXM code follows
pattern operationExpressions { | binaryOperator as o -> { lhs = o.lhsExpression; rhs = o.rhsExpression } | unaryOperator as o -> { lhs = o.operandExpression; rhs = null } };

Anonymous Functions

Try to keep anonymous, or lambda, functions to a single line, as in the following code:

CXM code follows
let noNegatives = filter(function (a: int) -> a > 0, listType) in // ...

If the lambda function has to use several lines, use the following indentation style:

CXM code follows
let hasAcceptableValue = filter( function (a: int) -> switch (a) { | 1 -> true | 10 -> true | default -> false }, listType ) in // ...

Braces

Use the style known as the One True Brace Style (1TBS or OTBS).

This style is a variation of the Kernighan and Ritchie (K&R) style, the style used in those authors’ book, The C Programming Language.

The salient points of the 1TBS style are:

The following code is an example of using the One True Brace Style:

CXM code follows
checker { name = "NO_GOTO"; reports = { events = [ // ... ]; }; };

Caution: Unlike some languages, CodeXM does not support using brackets to enclose a single statement (as opposed to a block) for an if statement, its else clause or elsif clauses, or the contents of a for loop.

Comments

Conditional Expressions

Use the ?: operator only for simple conditional expressions that fit onto a single line, as in the following example:

CXM code follows
let maxStr = (x > y) ? "x is greater" : "y is greater"

If the conditional expression requires more than one line, or if you think that readability is important, use if then else endif instead, as in the following example:

CXM code follows
let maxStr = if (x > y) then "x is greater" else "y is greater" endif

Place the keywords if, else, and endif at the same indentation level. This applies to elsif clauses as well, as in the following example:

CXM code follows
if a matches binaryOperator as b then { lhs = b.lhsExpression; rhs = b.rhsExpression } elsif a matches unaryOperator as b then { lhs = b.operandExpression; rhs = null } else { lhs = a; rhs = null } endif

If the logic of the conditions is hard to follow, it might be better to nest if-expressions, as the next example shows:

CXM code follows
if e.type matches intType { .isSigned == true } then let interval = getIntegerInterval(e) in if interval.lower matches NonNull as lower && lower >= 0 then `NEVER` else if interval.upper matches NonNull as upper && upper < 0 then `SURE` else `UNKNOWN` endif endif else `NEVER` endif

Constants

Avoid using “raw” literals in code. To improve maintainability and readability, use the let operator to create a global value that has a name, as in the following example:

CXM code follows
let b = switch(a) { | binaryOperator as o -> o.lhsExpression | unaryOperator as o -> o.operandExpression | default -> null }

Enum, List, Record, and Set Literals

Examples of Enums

CXM code follows
typedef stateKind = enum { `INIT`, `ELSE`, `LB` };
CXM code follows
typedef statedef = enum { `warm`, `tepid`, `cold` };

Examples of Lists

CXM code follows
typedef numberList = list [ 1, 2, 3 ];
CXM code follows
typedef longList = list [ { field1 = /* ... */; field2 = /* ... */ }, { field1 = /* ... */; field2 = /* ... */ }, { field1 = /* ... */; field2 = /* ... */ } ];

Examples of Records

CXM code follows
let thisRecord = { field1 = 1; field2 = 2; /* ... */ fieldN = n };
CXM code follows
let thisRecord = { field1 = 1; field2 = 2; /* ... */; fieldN = n };

Examples of Sets

CXM code follows
typedef fib = set [ 1, 1, 2, 3, 5 ];
CXM code follows
typedef longSet = set [ { field1 = /* ... */; field2 = /* ... */ }, { field1 = /* ... */; field2 = /* ... */ }, { field1 = /* ... */; field2 = /* ... */ } ];

‘for’ Loops

CXM code follows
for n in globalset allFunctionCode where n matches binaryOperator : // ...
CXM code follows
for n in globalset allFunctionCode where n matches binaryOperator as o && o.lhsExpression matches variableReference as v : // ...

Functions

Unless the line becomes too long (>100 characters), put the function’s name, argument list, and optional return type on a single line, and close the line with the -> arrow that introduces the function body.

If the entire function fits on a single line, that is OK. Here is an example:

CXM code follows
function addOne ( num : int ) -> num + 1 ;

If the function arguments don’t all fit on one line, give each argument after the first one a line of its own, aligned with the first argument, as the following example shows:

CXM code follows
function manyArguments ( aArg: aArgType, bArg: bArgType, // ... xArg: xArgType ) -> // ... function body ... ;

If the function requires more than one line, place the semicolon ( ; ) that closes the function body on a line of its own, at the same indentation level as the keyword function. This is shown in the previous example and in the one that follows.

If a return type fits on the same line as the function keyword, that is OK. Otherwise, put the return type on a line of its own and align the colon ( : ) that introduces it with the name of the function. Here is an example:

CXM code follows
function manyArguments ( aArg: aArgType, bArg: bArgType, // ... xArg: xArgType ) : longReturnType -> // ... function body ... ;

Identifiers

Use camelCase for identifiers (as opposed to snake_case).

‘let’ Bindings

If a let expression fits on a single line, end the line with the in keyword. Indent the lines that follow by one more level, as in the following sample code:

CXM code follows
let x = x + y in // ...

If the let expression is too long to fit on one line, then use multiple lines and align the in keyword with the opening let.

Here is an example of a multiline let expression:

CXM code follows
let b = switch(a) { | binaryoperator as o -> o.lhsExpression | unaryoperator as o -> o.operandExpression | default -> null } in // ...

Multiline Logical Expressions

If a conditional expression is too long to fit on one line, begin a new line with the first subcondition, and indent this line by one level. If there are further subconditions, put each on its own line. Begin each new line with the logical operator (such as && or ||). Indent each subcondition at the same level as the first.

Here is an example of a multiline logical expression:

CXM code follows
node matches ptn as p where p.field1 matches expression && p.field2 matches pointerDereference && p.field3 matches intType

Pattern Decompositions

The indentation of pattern-decomposition code should parallel the level of nesting it is matching in the target code. Here is an example:

CXM code follows
pattern assignToHexFortyTwo { assignmentOperator { .targetExpression == variableReference; .sourceExpression == intLiteral { .value == 42; .base == `hexadecimal` } } };

Spacing and Line Length

‘switch’ Expressions

The following sample code illustrates all of these conventions:

CXM code follows
switch (a) { | case1 -> let b = someFunction(a) in { field1 = /* ... */; field2 = /* ... */; } | pattern { binaryOperator { .lhsExpression == variableReference } } -> let b = someFunction(a) in { field1 = /* ... */; field2 = /* ... */; } | default -> null }