Synopsys logo

Coverity® 2020.12 CodeXM

Syntax Reference

Introduction

Read the “QuickStart” First

Typographic Conventions

Describing Syntax

How Does Coverity Analysis Work?

CodeXM File-Level Elements

Expressions

The CodeXM Type System

Appendix: CodeXM Keywords

Hide/show Contents

Coverity® 2020.12 CodeXM

Syntax Reference

Introduction

CodeXM is all of the following things:

In CodeXM, as in Coverity Analysis, the central element is a checker: a portion of code that describes what you’re looking for, and what to do when you find it. A CodeXM file can define any number of these checkers, along with supporting functionality. Each checker can examine code in order to look for different kinds of patterns.

A number of predefined utilities are provided to support CodeXM checkers. These include both language-specific predefined patterns and functions, and general-purpose functions for use with any supported language.

As mentioned above, CodeXM is a functional programming language. A functional language is slightly different from most programming languages commonly employed in production environments today: These languages are variously described as being procedural or imperative languages.

In those other languages, you typically define a set of steps: first do this; when that is complete, do the next thing; and so on. In CodeXM, as in other functional languages, the logic is expressed in expressions, not statements. Using a functional language is more like defining the formulas in a spreadsheet, than it is like programming a procedural language such as C or Java.

In CodeXM you specify where you’re looking, and what you’re looking for. You also describe how to present the results when a CodeXM checker finds the pattern you’ve defined

Though it uses expressions rather than statements, CodeXM should still be easy to learn if you have programmed before. Most of the inner workings of code analysis are left unexposed, and what does show through is revealed in ways that ought to be familiar to developers who use typical programming languages. If you’re looking for a goto statement or an if statement, you will write a CodeXM checker that use a gotoStatement or an ifStatement pattern to find—or, in CodeXM parlance, to match—these statements in the target code.

Read the “QuickStart” First

If you have not already done so, we strongly recommend that you read the QuickStart Tutorial section of the Learning to Write CodeXM Checkers guide. This lesson gives you a taste of actually writing a CodeXM checker.

This syntax reference only describes the grammar of CodeXM: the rules of the CodeXM language. When you write a checker, as the “QuickStart” shows, you will also need to work with the patterns that are provided by the libraries for the various supported languages: Each such library has a reference manual of its own.

Typographic Conventions

Describing CodeXM is a bit different from describing other languages. Most programming books talk about one language only. To describe CodeXM, we have to talk about CodeXM code itself, and the code of the language it is inspecting—the target language.

This reference shows all code (except pseudocode) in monospace type.

CodeXM code appears in shades of green.

Target code (whatever the language) appears in shades of orange.

Nullable Types

The type of certain pattern properties is said to be nullable. This means that the property might return a value, or it might not. If it does not, its value matches the CodeXM keyword null. In the CodeXM documents, a nullable type is indicated by the name of the type the property might return, followed by a question-mark; for example, int?.

In checker code, a nullable type requires some special handling in order to avoid the error of referencing the null. The “Handling Null Values” section of Learning to Write CodeXM Checkers describes how to do this.

Describing Syntax

This reference, like much programming-language documentation, uses both railroad diagrams and a version of EBNF to describe the syntax of CodeXM.

Syntax Diagrams

Syntax diagrams (often called “railroad diagrams”) have long been used to visually illustrate the syntax of various programming languages. Because they are easy to read, we use them in this document, too.

In case you’ve never seen these before and are unfamiliar with how to read them, the notion is that you follow the path from left to right. If an item lies on the path, it must be part of the code. If the path branches, you can choose one of the options that are shown. The path can even loop back to indicate a repeating element, but you cannot reverse direction to go against the overall movement from left to right.

When you reach the right end of the diagram, you have constructed an example of the portion of syntax that the diagram describes.

Syntax diagrams use a few visual conventions to help you see what the diagram is describing. In addition to the “railroad-track” path, the diagrams can include the following three elements:

The following text and graphics show some examples of syntax diagrams.

The syntax requires keyword to introduce another portion of syntax:

keyword another-rule

The syntax requires one of two choices. You can pick one or the other, but you have to make a choice:

choice1 choice2

The syntax allows an option. You can include the option, or omit it:

optional

The syntax allows the something element to repeat an indefinite number of times:

something

The syntax allows the something element to repeat an indefinite number of times, but each additional something must be preceded by a plus sign:

something + something

As you see, syntax diagrams are easy to follow, but they are versatile enough to fully describe the syntax of any context-free grammar.

Extended Backus-Naur Form

This reference also describes CodeXM syntax by using Extended Backus-Naur Form (EBNF). EBNF is a notation that uses text rather than graphics to describe programming-language syntax.

Attention: The syntax diagrams and EBNF notation both describe exactly the same syntax. However, they do so in slightly different ways. Do not be alarmed if there is not always a one-to-one correspondence between the symbols of the two notations.

Many variations of EBNF are in use. The variation we use, and describe in this section, is based on that used by the W3C. This version is comparatively simple, clean, and free of clutter.

Basic Components of EBNF

Modifiers to Basic EBNF

The grammatical elements described in the previous list can optionally be qualified by one of the following regular-expression conventions.

Reminder: When a qualifier follows a parenthesized group, the qualifier applies to the entire group as if the group were a single entity.

For example, the following EBNF specifies a lowercase letter followed by zero or more digits:

[a-z][0-9]*

Careful! No parentheses group the range of letters [a-z] with the range of digits, so the asterisk applies only to the range of digits. This character sequence begins with a lowercase letter, but there can be only one of these.

For example, the following EBNF specifies a sequence of one or more digits—in other words, it can represent any positive integer:

[0-9]+

For example, the following EBNF specifies an optional semicolon:

';'?

CodeXM Character Set

Characters in a CodeXM source file are limited to the original 95 ASCII characters that can be typed or printed: This includes the uppercase and lowercase letters from A to Z, the ten digits, and 33 special characters in common use. Extended characters such as letters with accents are not supported.

White Space and Comments

As with most software languages, white space in CodeXM is largely discretionary (but see our recommendations in “A CodeXM Style Guide”). The exception is that you need white space to separate adjacent keywords and identifiers.

For example, function myFunction parses correctly, but functionmyFunction does not: You need a space to separate the keyword function from the identifier myFunction.

You can’t put a space within an identifier.

Spaces within a literal character sequence, such as a string value or a member of an enum, are allowed and are significant.

In CodeXM, you can place a comment anywhere that white space is allowed and is not significant. The format of comments in CodeXM will be familiar if you have used C, C++, or related programming languages. Specifically, there are two kinds of comments:

Here are examples:

CXM code follows
/* A multi-line comment: This one spans four lines. */ // This comment ends at the end of this line.

How Does Coverity Analysis Work?

To fully answer the question posed by the title of this chapter would require a lot of detail. This chapter isn’t a thorough description, just a birds-eye view of what Coverity Analysis does. If you understand the methods described here, you will have an easier time writing CodeXM checkers.

For a compiled, procedural language, one of the first steps the compiler does is to parse the source code and construct a syntax tree. You might already be familiar with how these look.

For example, the source code x - y*2 + 1 can be expressed in tree form, as the following illustration shows:

Code expressed as a tree of nodes

Constructing a syntax tree is also one of the first steps taken when you run cov-analyze.

A code compiler proceeds to traverse the syntax tree and use it to generate object code.

The cov-analyze program also traverses the syntax tree, but in this case it does not generate code. Instead it searches for the patterns specified by the checkers it is running. Those checkers can be provided off-the-shelf, as components of Coverity, or they can be custom checkers written using either CodeXM or the Extend SDK. When a checker finds a target pattern, it generates a report—an event.

A standard compiler generates a tree that accurately represents the source code. Coverity, on the other hand, applies some optimizations and simplifications, to facilitate the work of static analysis. (It would be possible to analyze literal source, but that would be much, much slower.) Because of this, the structure built by Coverity is known as an abstract syntax tree, or AST.

To use Coverity, you don’t need to know details about the AST structure or how analysis works. If you are writing checkers of your own, it can help to bear in mind that the checker is not analyzing the original source code, but a representation of it that is built of node objects organized into a tree.

In terms of CodeXM code, each of the nodes shown in the tree shown above is an astnode object. There are various types of these. Some nodes represent binary operators such as assignment, addition, and multiplication. Other nodes represent variables; still others represent literals such as integers or strings.

Note: There are no astnode objects for white space or comments.

A pattern you specify in CodeXM, for example, can look for instances of addition—this would locate the y*2 + 1 expression in the tree illustrated above, or any other occurrence of the + operator). A pattern can look for multiplication of a variable and a constant—as in y*2. It can also look for binomial expressions such as instances of addition (or subtraction) used with multiplication—such a pattern would match y*2+1 or z*3-4, but would not match 1+2.

A pattern you write can be as specific and as thorough as you need it to be.

Every CodeXM language library has an astnode object, and each astnode has a specific set of properties. In addition, there are various types of astnode objects: Each type has specific properties of its own, as well as the ones it inherits from the parent astnode object. See any of the language library references for the details.

Once it has constructed the abstract syntax tree, cov-analyze inspects all possible execution paths. This is one of the advantages of static analysis: It is fast enough to comprehensively look at what the code will do—unlike dynamic analysis, which has to execute the target program, and so to be efficient has to rely on a sampling of behaviors.

Not that we recommend you use only static analysis. A robust testing strategy should employ both techniques, as appropriate.

CodeXM File-Level Elements

CodeXM source files have a .cxm filename extension.

This chapter describes the high-level syntax elements that can be present in such a file.

Specifying the Target Language

In your CodeXM source, begin with an include declaration that names the target; for example, include `C/C++`;.

Specifying the target language includes the library for that language. This makes language-specific patterns and functions available within the CodeXM file.

The language specification also causes the checkers defined in your CodeXM file to be applied only to appropriate target source code in your code base.

The identifier

Identifiers name elements of CodeXM syntax: variables, record properties, and so on. All identifiers use the same syntax.

Syntax

An identifier begins with either an underscore ( _ ) or a letter. This first character can then be followed by zero or more letters, underscores, or digits. Letters can be either upper case or lower case, but the two are not interchangeable. CodeXM is case sensitive, so the identifier variable is a different entity than Variable, and both are different from VARIABLE.

The characters in an identifier must be in the set of allowable CodeXM characters.

identifier ::= [_a-zA-Z][0-9_a-zA-Z]* // White space is not allowed.

The CodeXM-file

A CodeXM file consists of checker definitions and the declarations that support them.

Syntax

The source file contains any number of declarations. Each declaration is terminated by a semicolon ( ; ). Declarations can appear in any order, except that a definition or declaration needs to appear before the first time a reference to it appears; in other words, the rule is Declare it before you use it.

Each of these definitions is described in greater detail below.

function-definition forward-declaration const-definition checker-definition function-property-definition include-directive pattern-definition type-definition ;

CodeXM-file ::= ( ( checker-definition | const-definition | forward-declaration | function-definition | function-property-definition | include-directive | pattern-definition | type-definition ) ';' )*

The checker-definition

Most of a typical CodeXM file consists of checker-definition elements. A checker element defines what the checker is searching for. It also specifies the report to generate when the examined target code is found to match the pattern described by this definition. Other CodeXM elements work to support the logic embodied by checkers.

Syntax

A checker is defined using the keyword checker followed by a record-expression.

When used to define a checker (as opposed to simply structuring data), a record-expression must contain a specific set of elements, structured in a specific way. These requirements are described in the “Details” section that follows.

checker record-expression

checker-definition ::= 'checker' record-expression

Details

At the top level, a checker must contain a record that contains two elements. The names of these elements are as follows:

name
A string that names the checker

reports
A record that defines the checker’s behavior

The reports record contains the following components:

First, a for-loop-expression (required) that defines the pattern for which this checker searches.
This expression must declare a variable to represent the portion of the code that is being searched. Typically this portion is in the range globalset—which, if not qualified, indicates all target-language code within the target source file.

events (required)
A list that contains at least one record (and that usually contains only a single record). The events records specify what to report when a search is successful.

issueType (optional)
A user-defined record that describes the checker. This record’s type and subtype fields can be used to categorize the checker’s purpose. Other fields can be used in the messages displayed by the reports record’s events component. To set the issueType you need to call defineIssueType(): See The ‘defineIssueType’ Function.

Finally, in its turn, a record in the events list contains the following components:

description (required)
A string or eventstring that the checker will display when it finds an issue. This text should describe the issue that was found.

location (required)
The location property must be assigned the .location property of the variable declared in the reports-level for loop.

tag (optional)
A string to identify this particular events record. For example, you might use tag values to facilitate searches in a .cxm file that contains multiple checkers.

Example of a Checker

The sample code in this section shows a simple but working checker that contains all the components that Coverity requires to run the checker successfully.

CXM code follows
include `C/C++`; // Includes patterns specific to the C or C++ // languages, and ensures that the checker // tests *only* C code or C++ code. checker { name = "NO_GOTO"; // Gives the checker a name. // This name will appear in reports of issues found. reports = for g in globalset allFunctionCode % gotoStatement : { // Scans the target source looking for the // 'gotostatement' pattern. events = [ // ... If there is a find, displays an issue report // that shows the description and location // specified here: { description = "Found a " + "goto".formattedAsCode + " statement."; location = g.location; } ] } };

The ‘defineIssueType()’ Function

Sets the value of a checker’s optional reports > issueType component. The issueType describes the purpose of the checker.

Here is an example of a call to defineIssueType():

CXM code follows
let myIssueType = defineIssueType( { type = "floating_point_equality"; name = "Bad comparison of floating-point expressions"; description = "Floating-point expressions shall not be directly" + " or indirectly tested for equality or inequality."; localEffect = "Unexpected behavior. Depending on the implementation," + " the comparison result may vary."; impact = `Medium`; } )

Parameters

Name Type Description
type symbol A type identifier for the checker
subtype symbol? (Optional) A subtype identifier for the checker; null if not specified
name string The name of the new checker
description string A description of the checker
localEffect string The consequence that the detected issue has, or might have
impact string The severity of the detected issue

The type and subtype values are identifiers, not strings. They must conform to customary rules for identifiers: have a leading alphabetical character, followed by letters or digits or the underscore ( _ ). These rules are summarized by the regular expression ^[a-zA-Z_][a-zA-Z0-9_]*$. In addition, these identifiers must be no longer than 64 characters.

The const-definition

A constant is a named value.

Syntax

A constant declaration begins with the keyword let, followed by an identifer that names the constant, then an equals sign ( = ) followed by an expression that evaluates to the desired value.

You can optionally specify a type for the constant. If you do not, the type is inferred from the expression.

let identifier : type = expression

const-definition ::= 'let' identifier ( ':' type )? '=' expression

Details

If a constant is declared at the file level, its scope is global.

Expressions can use a let binding to declare a named value. The usage is similar, but when let is used within an expression, the scope of the name is limited to the portion of the expression that follows the binding.

The forward-declaration

All identifiers must be declared before they are used. With a forward-declaration, you can declare an identifier before you provide its detailed definition. This can be convenient to do, especially when you declare two or more entities whose definitions are interdependent.

Syntax

The declaration is introduced by the keyword declare, the identifier to declare, a colon ( : ), and then the type the identifier will have when the declaration is completed.

declare identifier : type

forward-declaration ::= 'declare' identifier ':' type

The function-definition

CodeXM allows you to define functions that can evaluate specific expressions, allowing you to give a descriptive name to what otherwise might be a complicated expression. This helps readability and allows you to use that expression in more than one place.

Syntax

A function declaration is introduced by the keyword function and an identifier that names the function. This is optionally followed by generic type IDs, and then by at least one formal parameter that has a name and an explicit type. The return type of the function can optionally be specified, and then an arrow delimiter ( -> ) introduces the expression that defines the body of the function.

function identifier < type-parameter-identifier > , type-parameter-identifier > ( parameter-identifier : type , parameter-identifier : type ) : type -> expression

function-definition ::= 'function' identifier ( '<' type-parameter-identifier ( ',' type-parameter-identifier )* '>' )? '(' parameter-identifier ':' type ( ',' parameter-identifier ':' type )* ')' (':' type)? '->' expression

Each type-parameter-identifier is a generic placeholder for an actual type (see below). This identifier can be used within the parameter list as if it were a known type. It can also be used as the function’s return type, or in any portion of the expression that the function evaluates. When present, these identfiers appear immediately after the function name. Generic type identifiers are enclosed by angle brackets ( < and > ) and separated by commas ( , ).

Each parameter-identifier is an identifier for a formal, explicitly typed parameter. Each parameter name must be unique to the function. Named parameters are enclosed by parentheses ( ( and ) ) and separated by commas ( , ). Each named parameter has an associated type.

Each parameter’s type must be one of the following:

Details

Because CodeXM is a functional (rather than procedural) language, a function cannot generate side effects: For example, a function cannot set variables that persist beyond the scope of the function call itself.

A function can be recursive: The function’s name and type are accessible within the body of the function, enabling it to call itself.

For an example of recursive function use, see “Defining CodeXM Functions” in Learning to Write CodeXM Checkers.

Generic Parameters

By specifying generic parameters, as mentioned above, you are defining a parameter type (or return type) that can change according to the context in which the function is evaluated. For example, consider the following declaration:

CXM code follows
function genericDemo<T> ( l : list<T> ) : bool -> // Function body here ;

... here T is the type-parameter-identifier. At run time, T might be passed as an int, a string, a bool, or any other type.

The function-property-definition

A function property is, in effect, a subordinate checker that provides information about functions in the source code. It allows the checker to match characteristics found in functions called by top-level functions.

An example might be, “Does this function, or any function it calls, call system()?”. Function properties permit a lightweight form of interprocedural analysis. See Function Properties in Learning to Write CodeXM Checkers.

Syntax

Function properties are defined using the keyword phrase function property, followed by an identifier that names the function property, followed by a record that has a single property named models, specified by a pattern.

function property identifier { models = pattern-expression ; }

function-property-definition ::= 'function' 'property' identifier '{' 'models' '=' pattern-expression';' '}'

The include-directive

The include-directive performs two functions. First, it incorporates the contents of the specified CodeXM file, as if it were present in the current file. This permits you to write common support functionality and make it available to many different checkers, or to aggregate many checkers (separated for reasons of maintainability) into a single file that you can specify on the cov-analyze command line. Second, if an enum-literal-expression is present, and it indicates a language library, the expression ensures that the contents of this CodeXM file are applied only to code of the same language. That is to say, if your file contains the following line:

CXM code follows
include `C/C++`;

... it causes the checkers declared in your file to be applied only to C or C++ code.

Syntax

The directive is introduced by the include keyword, followed by either of the following:

include string-literal-expression enum-literal-expression

include-directive ::= 'include' ( string-literal-expression | enum-literal-expression )

The pattern-definition

A pattern-definition lets you define code patterns that can be more complex than the predefined patterns provided by a language library.

Custom patterns can be built using other patterns that have already been defined (either library patterns or other custom patterns in your own CodeXM project).

Syntax

A pattern definition is introduced by the pattern keyword, This is followed by an identifier to name the pattern, followed in turn by a match-list enclosed by curly braces ( { and } ).

pattern pattern-name { match-list }

pattern-definition ::= 'pattern' pattern-name '{' match-list '}'

Details

A pattern can be recursive, but to implement this, you need to define a function; also see “Defining CodeXM Functions” in Learning to Write CodeXM Checkers.

The type-definition

With a type-definition you can define a name for a new type based on one of the standard CodeXM types. You might want to do this to define a special-purpose type; for example, a specialized enum or record type.

Syntax

The type definition is introduced by the typedef keyword. This is followed by an identifier that gives the new type its name. Following that is an equals sign ( = ), and then the name of the base type.

typedef identifier = type

type-definition ::= 'typedef' identifier '=' type

Expressions

The expression

One of the implications of CodeXM being a functional programming language is that, loosely speaking, “everything is an expression”. More precisely, CodeXM has no statements that are executed sequentially, one after another; rather, the logic in CodeXM takes the form of expressions that are evaluated. Each expression produces a result that might in turn be used to further evaluate some other expression.

After the file-level elements, expressions are the main building blocks of a CodeXM program.

Syntax

> lambda-expression if-exists-expression if-expression happens-before-expression for-accumulate-expression for-loop-expression exists-expression element-access-expression decomposing-pattern-expression conditional-expression comparison-expression call-expression binary-expression addition-expression ( expression ) let-binding-expression list-expression literal-expression logical-expression matches-expression multiplication-expression pattern-expression property-access-expression record-expression record-update-expression set-expression set-filter-expression switch-expression unary-expression variable-expression

expression ::= ( '(' expression ' )' ) | addition-expression | binary-expression | call-expression | comparison-expression | conditional-expression | decomposing-pattern-expression | element-access-expression | exists-expression | for-accumulate-expression | for-loop-expression | happens-before-expression | if-exists-expression | if-expression | lambda-expression | let-binding-expression | list-expression | literal-expression | logical-expression | matches-expression | multiplication-expression | pattern-expression | property-access-expression | record-expression | record-update-expression | set-expression | set-filter-expression | switch-expression | unary-expression | variable-expression

Operator Precedence

Within an expression, operators have a precedence that determines the order of their evaluation in the absence of parentheses. This is a convention originated by algebra, and used by nearly all programming languages. CodeXM uses such a convention as well, though some of its operations are not commonly used in other contexts.

Operators also have an associativity, also applied in the absence of parentheses. This determines the order of evaluation when operators appear in a sequence; for example, the order in which to execute x + y + z would be ambiguous without associativity. Addition/subtraction and multiplication/division associate to the left, so the order of executing the previous example is ( x + y ) + z. (Parentheses have top priority, and override the default precedence of operations.)

The following table shows the order of precedence for the CodeXM operators. The top row has the highest precedence, the bottom row the lowest.

Operator Associativity
() {} not associative
. left
[] not associative
as not associative
% left
new not associative
default not associative
matches left
!, unary - right
* / left
+ - ++ ?? left
<=> left
> < >= <= left
== != left
&& left
|| left
>=> left
?: right
in not associative
where not associative

The ‘default’ Operator

Besides introducing the default clause of a switch-expression, you can use the keyword default as an operator that matches any value. The most common way to use default as an operator is in a matches-expression; for example:

CXM code follows
&& objectPropertyValue(obj, "expiry") matches default as x

... In other words, this expression sets the value of x to equal the value of the expiry property, regardless of what that value might be.

The addition-expression

An addition-expression adds ( + ) or subtracts ( - ) two integer values.

This expression has two specialized uses as well:

String concatenation: You can also use the plus-sign operator ( + ) to concatenate strings. However, the result of string concatenation is not of type string, but of type eventstring: so this usage is a special case that is not a true addition-expression.

Syntax

expression + - expression set-expression ++ expression nullable-expression ?? expression

addition-expression ::= ( expression ( '+' | '-' ) expression ) | ( set-expression '++' expression ) | ( nullable-expression '??' expression )

The set-expression evaluates to a set or a list.

The nullable-expression evaluates to a nullable type.

Null Coalescing

The null-coalescing operator ?? is not strictly speaking an addition operation, but it does have the same precedence as the other additive operators.

This operator evaluates its left-hand operand. If this is not null, this is the value that ?? returns; otherwise, it returns its right-hand operand. The point is to return a result that is usable in expression evaluations, and will not cause a run-time failure.

The following expression returns the same result as ??, but does so in a lengthier fashion:

CXM code follows
leftHandOperand matches NonNull as noLongerNull ? noLongerNull : rightHandOperand

For more information, see Handling Null Values in Learning to Write CodeXM Checkers.

The binary-expression

A binary-expression has a single operand infixed between two other expressions. Binary operators include standard operations for arithmetic, comparison, and logic; a few others have a more special-purpose use in the context of CodeXM

Syntax

expression == <=> ?? ++ / * - + != < <= > >= && || >=> % expression

binary-expression ::= expression ( '+' | '-' | '*' | '/' | '++' | '??' | '<=>' | '==' | '!=' | '<' | '<=' | '>' | '>=' | '&&' | '||' | '>=>' | '%' ) expression

The call-expression

Calls a function, using the specified arguments.

Syntax

The list of arguments is enclosed by parentheses ( ( ) ), and arguments are separated by commas ( , ).

The number of arguments in the call must equal the number of arguments in the function-definition. The arguments passed can be either named or unnamed. If the name is specified, it is followed by a colon ( : ) and then by the value of that argument.

function-producing-expression ( identifier : expression , identifier : expression )

call-expression ::= function-producing-expression '(' ( identifier ':' )? expression ( ',' ( identifier ':' )? expression )* ')'

The function-producing-expression specifies a function that has already been declared (unless you use forward-declaration).

Details

If an argument is named, that name must be the same as the corresponding formal parameter in that function’s definition. It is an error to use an argument name that doesn't appear in the definition. It is also an error to change the order of the arguments, whether or not you specify argument names.

For example, if a function is declared like this:

CXM code follows
function func(first: int, second: int, third: int)

... it can be invoked like this (that is, not naming the arguments):

CXM code follows
func(1, 42, 99)

... or like this (naming the arguments, in the right order):

CXM code follows
func(first: 1, second: 42, third: 99)

... or even this (naming only one argument):

CXM code follows
func(1, second: 42, 99)

... but the following is an error because reordering arguments is not permitted:

CXM code follows
func(second: 42, first: 1, third: 99)

The comparison-expression

The comparison-expression uses common operators to perform tests for equality or inequality, and relations such as greater-than or less-than.

Syntax

expression < != == <= > >= <=> expression

comparison-expression ::= expression ( '==' | '!=' | '<' | '<=' | '>' | '>=' | '<=>' ) expression

The operator <=> is the three-way comparison: Rather than a Boolean value, its result is –1 for less-than, 0 for equals, and +1 for greater-than.

Careful: The character-combination <> is not a valid operator, as it is in some languages. To compare for inequality, use the != operator.

Details

In general, you can compare expressions of the following types:

Note: If the two values being compared are not of the same type, CodeXM uses unification to find a substitution that makes the two types comparable, if that is possible. Unification is an operation that is similar to casting (also known as type conversion).

The conditional-expression

The ?: (“ternary”) operator evaluates one of two expressions, depending on whether the specified condition is true or false.

Syntax

condition ? if-true-expression : if-false-expression

conditional-expression ::= condition '?' if-true-expression ':' if-false-expression

A condition is simply an expression that evaluates to either true or false.

The decomposing-pattern-expression

With the decomposing-pattern-expression, your checker can match not only a particular pattern, but the value of specified properties of that pattern. This expression is a way to narrow a search.

Syntax

Pattern decomposition is expressed by a list enclosed by curly braces ( { and } ). Each item in the list consists of a property prefixed by a dot ( . ), an equals ( == ) or not-equals ( != ) assertion, and an expression that specifies the value of the property to match or to not match.

expression { . property-name == != comparand-expression ; . property-name == != comparand-expression }

decomposing-pattern-expression ::= expression '{' ( '.' property-name )+ ( '==' | '!=' ) comparand-expression ( ';' ( '.' property-name )+ ( '==' | '!=' ) comparand-expression )* '}'

Each property-name is an identifier that names a property of the pattern.

Properties can nest, so the property-name can be a path to a property. In other words, .calledFunction.identifier == something is legal syntax. Properties with a nullable type along the path are handled correctly.

Each comparand-expression should evaluate to one of the following:

Details

Suppose your checker uses a pattern, intLiteral, that matches integer literals in the target code. The intLiteral pattern has a property named value. The value property represents the numeric value of the integer literal.

If you use the pattern with no qualifications, as in the following code:

CXM code follows
x matches intLiteral

... then your checker matches any integer literal it finds in the code it is inspecting.

By comparison, if you use a property to qualify the pattern, as the following example shows:

CXM code follows
x matches intLiteral { .value == 42 }

... then your checker matches only instances of integer literals whose value equals 42.

The element-access-expression

This expression allows you to access individual elements within a list.

Syntax

The notation is probably familiar to you from working with arrays in other programming languages: It is simply the name of the list followed by square brackets ( [ ] ) that enclose an index value that specifies a particular list entry.

list-producing-expression [ index-expression ]

element-access-expression ::= list-producing-expression '[' index-expression ']'

The list-producing-expression is an expression that has already created a list.

The index-expression is a valid index to the list. The first element in the list is element zero (0), so the index of the last entry is one less than the list’s length property.

Details

This expression is available to lists but not to sets or enums. Sets and enums are considered to be unordered. You can convert a list into a set, but the reverse is not true.

The exists-expression

The exists-expression answers the question, “Is there at least one value in a set that meets a specified condition?” Its result, therefore, is simply either true or false.

Syntax

The expression is introduced by the keyword exists, followed by an identifier that is used to traverse the set.

The keyword in introduces the set itself. (You can also specify a list to search.) It is followed optionally by the keyword globalset (which is necessary if the set in question is a globalset type) and then by an expression that evaluates to the set.

The keyword where introduces a condition-expression, presumably expressed in terms of the identifier, that describes the object of the search.

exists identifier in globalset set-producing-expression where conditional-expression

exists-expression ::= 'exists' identifier 'in' ( 'globalset' )? set-producing-expression 'where' conditional-expression

Details

The exists-expression shown in the following code:

CXM code follows
exists x in someSet where cond // ...

... can be considered shorthand for the following lengthier (and less readable) CodeXM pattern:

CXM code follows
! (for x in someSet where cond: x).empty

If the keyword globalset is present, the condition-expression cannot reference local variables that were defined outside of the loop. See “The globalset-type” for more information.

The for-accumulate-expression

The for-accumulate-expression traverses a set or a list. For each member that it visits, it performs an operation. It accumulates the results of these operations, then returns the accumulated value as its result.

(If you are familiar with functional programming, you might recognize this as a fold operation over a set of elements.)

Syntax

The syntax of the for-accumulate-expression is very similar to the syntax of the for-loop-expression. The main difference is the additional accumulate clause.

The keyword for introduces the loop variable. The loop variable will be assigned, in turn, each of the values in the set being examined. The set itself is introduced by the keyword in, followed by the variable or expression that represents the set.

The keyword accumulate follows the set-expression. It introduces the accumulator-identifier, which names the accumulator variable. This is followed by an equals sign ( = ) followed by an initial-expression.

(You have the option of also specifying an explicit type for the accumulator.)

To filter the set to only those members that meet a specific criterion, you can use the where keyword followed by a condition-expression that describes the criterion. (If you omit the filter condition, the loop visits every member of the set: It is as if the condition-expression were simply the value true.)

Finally, a colon ( : ) introduces the yielding-expression.

The variable named by identifier is within scope, and can be referred to, in both the condition-expression and the yielding-expression. The accumulator-identifier, and any variable that is defined in the where clause, is within the scope of the yielding-expression.

for identifier in set-expression accumulate accumulator-identifier : type = initial-expression where condition-expression : yielding-expression

for-accumulate-expression ::= 'for' identifier 'in' set-expression 'accumulate' accumulator-identifier ( ':' type )? '=' initial-expression ( 'where' condition-expression )? ':' yielding-expression

The set-expression must result in a set or a list.

The initial-expression sets the initial value of the accumulator variable.

The condition-expression describes a condition (criterion). Members of the set that do not meet this condition are not evaluated.

For each member that is evaluated, the result of the yielding-expression is assigned to the accumulator variable. When the loop has completed, it returns the final value of the accumulator.

Details

Initializing the Accumulator

The value of initial-expression is typically the identity value for the accumulator’s type:

The Logic of for-accumulate

The following pseudocode shows the very basic logic of a for-accumulate-expression, expressed procedurally:

<type> accumulator = initial-expression foreach ( identifier in set-expression ) { if ( condition-expression is true ) { accumulator = yielding-expression } }

Thus, the CodeXM expression that follows:

for x in [1, 2, 3, 4, 5] accumulate sum = 0 : sum + x

... might be expressed by the following pseudocode:

int sum = 0 foreach x in [1,2,3,4,5] { // There is no ‘where’ clause, so all members are evaluated. sum = sum + x }

In a similar way, but using more complex expressions, the CodeXM code that follows finds the largest odd value (9, in this case) among a list of positive integers:

CXM code follows
for i in [ 1, 4, 8, 3, 7, 2, 9, 5 ] accumulate max = 0 where odd( i ): i > max ? i : max

... and this might be expressed procedurally by the following pseudocode:

int max = 0 foreach ( i in [ 1, 4, 8, 3, 7, 2, 9, 5 ] ) { if ( odd( i ) ) { if ( i > max ) then { max = i } else { // max doesn’t change } } }

The for-loop-expression

A for-loop-expression traverses a set or a list. Frequently a for loop is at the heart of a CodeXM checker, since analyzing code involves traversing the abstract syntax tree (AST) nodes of the target code, or a calling tree of functions in that code.

Syntax

The keyword for introduces the loop variable. The loop variable will be assigned, in turn, each of the values in the set being examined. The set itself is introduced by the keyword in, followed by the variable or expression that represents the set.

If the set to examine is of the type globalset, then the for-loop-expression must include the keyword globalset, immediately following the keyword in.

To filter the set to only those members that meet a specific criterion, you can use the where keyword followed by a condition-expression that describes the criterion. (If you omit the filter condition, the loop visits every member of the set: It is as if the condition-expression were simply the value true.)

Finally, a colon ( : ) introduces the result-expression.

The variable named by identifier is within scope, and can be referred to, in both the condition-expression and the result-expression.

for identifier in globalset set-producing-expression where condition-expression : result-expression

for-loop-expression ::= 'for' identifier 'in' ( 'globalset' )? set-producing-expression ( 'where' condition-expression )? ':' result-expression

The set-producing-expression must result in a set or a list.

The condition-expression describes a condition (criterion) and the result-expression is evaluated for each member of the set that meets that condition.

Details

The Result of a for-loop

The result of a for loop is a set that contains each result-expression that was found by the loop and whose value was not null.

For example, the following sample code shows a simple for-loop-expression (we presume that the function odd() has been coded previously, perhaps in a library):

CXM code follows
for x in [1, 2, 3, 4, 5, 6] where odd(x) : x * 2

The preceding code produces a set that contains twice the value of each odd number in the set-producing-expression. The following snippet shows this result:

CXM code follows
[2, 6, 10]

The Result of a Checker

If your for loop is the main loop of a checker, then its result-expression must have a specific form; namely, it must be an expression that contains a record named events. In its turn, events must define two properties: one named description and the other named location.

Both these properties contribute to the issue report: the description specifies a string to display, and the location specifies the code location where the issue was found.

For example, the following code shows the result-expression for a checker named UNINIT_VAR:

CXM code follows
{ events = [ { description = "Variable " + decl.variable + " is declared but not initialized."; location = code.location; } ] };

A fine point: In this example, code is the loop variable. When you traverse the globalset, the loop variable has a .location property that contains the source-line number where the loop criterion was met.

Visualizing a for-loop

The following pseudocode might help you visualize how a for loop works, especially if you are mainly used to procedural languages:

empty resultset foreach ( x in set-expression ) { /* (We assume that condition-expression and result-expression are both expressed in terms of x.) */ if ( condition-expression is true ) { result = result-expression if ( result is not null ) append result to resultset } } return resultset

That is to say, the result of a for loop begins as the empty set. When a member of the set-expression matches the specified condition AND that value of that result is not null, then the member is added to the resultset. When the loop completes, the resultset contains all the non-null matches from the traversal of set-expression.

The happens-before-expression

The happens-before-expression takes two operands, and performs a Boolean test on the sequence in which those operations are invoked in the target code.

Syntax

! path-pattern-expression >=> ! path-pattern-expression

happens-before-expression ::= [!] path-pattern-expression ( '>=>' [!] path-pattern-expression )+

Each path-pattern-expression is a CodeXM pattern to match a statement or other executable code in the target language.

Be aware: This expression is valid only when it is used as an argument to the sequence() function. It is not allowed in other locations in the CodeXM source.

Details

The two patterns do not have to be consecutive: There can be other executable code between them.

You can think of the sequence() function as a special kind of pattern that understands paths. For example, sequence(pattern_1 >=> pattern_2) returns true if pattern_1 executes in the target code before pattern_2 does, false otherwise.

The ‘sequence()’ Function

The sequence() function reports on whether a particular execution sequence occurs (or does not occur) in the code being analyzed.

The argument to sequence() is a happens-before-expression. The function returns true if the happens-before-expression expression can be matched, false if it cannot.

Typically, sequence() is called from within a matches-expression.

For more information about using the sequence() function and the >=> operator, see “Path-Sensitive Analysis” in Learning to Write CodeXM Checkers.

The if-exists-expression

The if-exists-expression finds the first member of a set that satisfies some criterion. It returns a result based on that member.

(The if-exists-expression differs from the exists-expression in the kind of information it provides. An exists expression reports only true or false about the existence of a member. An if exists expression actually returns the value of the member being searched for, or some computation based upon that value.)

Syntax

The keywords if exists introduce an identifier that is used to traverse the set.

The keyword in introduces the set itself. (You can also specify a list to search.)

The keyword where can introduce an optional condition-expression.

Finally, a colon ( : ) introduces a yields-expression. If the member being sought is present in the set or list, and if it satisfies the condition (if one is specified), then the yields-expression is evaluated and becomes the result of the if exists search.

if exists identifier in set-producing-expression where conditional-expression : yields-expression

if-exists-expression ::= 'if' 'exists' identifier 'in' set-producing-expression ( 'where' conditional-expression )? ':' yields-expression

Details

The result of an if-exists-expression is null if the set-producing-expression is empty, or if the set-producing-expression does not contain any member that meets the criterion expressed by the condition-expression.

While CodeXM is not procedural, this expression comes close. You can visualize its operation as executing the following pseudocode:

if ( x in set-producing-expression satisfies condition-expression ) { return yield-expression // The yield-expression is expressed in terms of x. }

The if-expression

The if expression is a more versatile—and much more readable—construct than the conditional-expression. An if keyword introduces a conditional expression, followed by a then clause and an optional else clause. To handle additional conditions, before the else clause you can include any number of elsif clauses.

Syntax

if condition then if-true-expression elsif condition then if-true-expression else if-false-expression endif

if-expression ::= 'if' condition 'then' if-true-expression ( 'elsif' condition 'then' if-true-expression )* 'else' if-false-expression 'endif'

A condition is simply an expression that evaluates to either true or false.

Details

Once the conditional-expression evaluates an if-true-expression, it does not evaluate any subsequent expressions in the if.

If a condition defines a variable (for example, via the matches-expression as keyword), that variable remains in scope in the corresponding if-true-expression— but it is not in scope or available in the if-false-expression.

Tip: To choose among multiple conditions, the switch-expression can be an alternative to coding an if statement that specifies multiple elsif clauses.

The lambda-expression

A lambda-expression is an unnamed function literal, frequently called an anonymous function.

In general, the criteria for using a pattern-expression also apply to a lambda-expression: Use these constructs in situations where the expression is only used once, and is simple enough to convey its intent without reducing the readability of the expression in which it appears.

Syntax

Syntactically, a lambda is similar to a function-definition. But no name appears after the function keyword, and the return type is inferred from context rather than explicitly declared.

function < type-parameter-identifier , type-parameter-identifier > ( parameter-identifier : type , parameter-identifier : type ) -> expression

lambda-expression ::= 'function' ( '<' type-parameter-identifier ( ',' type-parameter-identifier )* '>' )? '(' parameter-identifier ':' type ( ',' parameter-identifier ':' type )* ')' '->' expression

Each type-parameter-identifier is a generic placeholder for an actual type. This identifier can be used within the parameter list as if it were a known type, the return type, or any portion of the expression the function evaluates.

Each parameter-identifier is an identifier for an explicitly typed parameter. Each parameter name must be unique to the function.

Each parameter’s type must be one of the following:

The let-binding-expression

With a let-binding-expression, you can define a variable that becomes available for use in subsequent expressions.

The let-binding-expression is similar to const-definition. Each of these elements defines a variable whose value is constant for the duration of its scope. The difference is that const-definition defines a variable with global scope, which is evaluated (at most) only once during the life of the checker. The let-binding-expression defines a variable with local scope, whose use is limited to the expression specified by the let-binding-expression, and which is evaluated each time the let-binding-expression is evaluated.

Syntax

The let keyword introduces identifier for the local variable, followed by an optional type, then an equals sign ( = ) followed by the expression to evaluate and assign to the variable. Finally, the in keyword introduces the expression during which the identifier will be usable.

let identifier : type = value-expression in scope-expression

let-binding-expression ::= 'let' identifier ( ':' type )? '=' value-expression 'in' scope-expression

The value-expression can be any expression. Its result is assigned to the identifier.

The scope-expression can be any expression, and defines the scope within which identifier is available (“in scope”).

Details

Because CodeXM is a functional programming language (not a procedural one), the value of a “variable” does not change while the variable is in scope. You cannot assign a new and different value to an existing variable.

The let-binding-expression defines a variable, computes its value, then makes that variable available to a subsequent expression during which the variable is accessed. The following code scheme shows this construction:

CXM code follows
let x = // Some computation in // An expression that refers to x

Not really exceptions: With a for-loop-expression or a for-accumulate-expression, the loop variable does change over time. In both these cases, even though the variable changes while the loop is running, the value the loop returns cannot change any further, so loops are not an exception to the general rule.

The list-expression

A list-expression is a list of values considered to be an ordered group.

Syntax

The list is enclosed by square brackets ( [ ] ). The items in the list are separated by commas ( , ).

[ expression , expression ]

list-expression ::= '[' expression ( ',' expression )* ']'

Details

Each expression in the list is evaluated when the list is created.

The order of items in the list is the order in which they are specified in the source code.

Elements do not have to be unique: More than one element can have the same value.

To access elements in a list, use either a for-loop-expression or an element-access-expression. You can also use the set-filter-expression to retrieve members of a list.

The literal-expression

CodeXM provides several literal types.

(The int-literal and string-literal expressions correspond to the int and string builtin-types.)

Syntax

Literals can take a variety of forms. A literal can be a string-literal-expression, an enum-literal-expression, or an int-literal-expression: Each of these has a syntactic structure of its own. The remaining literals are simply keywords, as shown in the syntax descriptors here, and described in the “Details” section that follows:

string-literal-expression int-literal-expression enum-literal-expression null NonNull true false

literal-expression ::= enum-literal-expression | int-literal-expression | string-literal-expression | 'null' | 'NonNull' | 'true' | 'false'

Details

null, NonNull
The keyword null represents an optional value that has not been defined, as shown in the following snippet of code:
CXM code follows
somethingNullable != null
The keyword NonNull matches any nullable value that has been defined, as the following code sample shows:
CXM code follows
somethingNullable matches NonNull as myNotNullThing
For more information, see Handling Null Values in Learning to Write CodeXM Checkers.

true, false
These keywords represent the two possible Boolean values.

The enum-literal-expression

An enum literal is similar to a string literal, but it is enclosed by backticks ( ` ) rather than double quotes.

Syntax

enum-literal-expression ::= '`' ( [0-9_a-zA-Z] | [^`\] | '\`' | '\\' )* '`' // An enum literal can contain white space.

Details

As with string literals, any embedded backticks need to be escaped by a preceding backslash ( \` ) in order to not be parsed as an end-of-enum marker.

The following literal uses correct syntax to include backticks:

`Including the \`backtick\` mark`

The int-literal-expression

An integer literal is a positive 64-bit value. You can express integers in decimal, hexadecimal, or binary notation.

Syntax

int-literal-expression ::= [0-9]+ | '0x' [0-9a-fA-F]+ | '0b' [0-1]+ // white space is not allowed

Details

A positive integer value can range from 0 to 263–1.

You might recognize that this range implies that integers use a two’s complement representation. In other words, there is no “sign bit” in the CodeXM representation of integers. CodeXM does allow you to work with negative values, but the minus sign ( - ) is always treated as a unary operator, and not as part of the literal itself.

The string-literal-expression

A string is a sequence of characters enclosed by double quotation marks ( " ).

Syntax

string-literal-expression ::= '"' ( [0-9_a-zA-Z] | [^"\] | '\"' | '\\' | '\n' )* '"' // The string can contain white space.

Details

A string can contain any allowable CodeXM character, but certain situations must be handled specially, as follows:

The following string uses correct syntax to include double quotation marks:

"This is a string that contains \"quoted\" text"

For example, if the source contains the following lines:

CXM code follows
let str = "This is " "a" " multi-line string"

... then the variable str is assigned the string value "This is a multi-line string".

Careful: You cannot concatenate string variables in this way.

"The first line\nAnd the second line"

The logical-expression

A logical-expression expresses logical (as opposed to bitwise) AND or OR. The result of a logical operation is either true or false.

Syntax

Two expressions are conjoined using either the logical AND ( && ) or the logical OR ( || ) operator.

expression || && expression

logical-expression ::= expression ( '||' | '&&' ) expression

Details

Both of these operators are short-circuiting: In other words, if the overall truth of the operation can be known by evaluating the left-hand expression (that is, if it is true for || or false for &&) then the right-hand expression is not evaluated.

For logical AND ( && ) expressions, if the left-hand expression defines a variable via the matches-expression as keyword, that variable remains in scope for the right-hand expression as well.

The matches-expression

Much of the power of CodeXM relies on pattern matching: comparing code (or other things) to a pattern to find which match that pattern.

The matches-expression is a predicate in the logical sense; that is, a statement that is true or false, depending on whether the condition it describes has been satisfied or not. In the context of CodeXM, the matches-expression compares an expression to a pattern. If the two do match, the matches-expression is successful. It can optionally define an appropriately typed variable that describes the results of that match.

Syntax

The expression to match is followed by the keyword matches, and then by an expression to represent the pattern.

expression matches pattern-producing-expression as identifier

matches-expression ::= expression 'matches' pattern-producing-expression ( 'as' identifier )?

The pattern-producing-expression is simply an expression that produces a pattern. Typically this will be an identifier that names a pattern you have already created.

Details

You have the option of using the keyword as to define a variable whose value is defined by the initial match and whose scope is the remainder of the matches-expression.

Most often you would define such a variable to further refine the match, as in the following commented code snippet:

CXM code follows
// defined // | and // | used // | | node matches variableReference as v && v.name == something

Be careful: You cannot use the logical OR ( || ) in an expression of this sort, because OR implies that the expression to its right will be evaluated even when the match has failed. This is not logical, and indeed if the match fails, the value of the new variable v is undefined.

The multiplication-expression

A multiplication-expression multiplies ( * ) or divides ( / ) two integer values.

Syntax

expression * / expression

multiplication-expression ::= expression ( '*' | '/' ) expression

Details

Each operand must evaluate to an integer value.

Either operand can be a complex subexpression, as well as an integer variable or an integer literal.

If the divisor does not evenly divide the dividend, the result is a truncated value: The remainder is discarded.

Dividing by zero causes a run-time error.

As in algebraic notation and nearly all programming languages, in an expression multiplication has precedence over division, and a unary operation has precedence over multiplication.

The pattern-expression

A pattern-expression allows you to define a pattern inline—that is, within another expression. For patterns, it is the analog to a lambda (nameless) expression for functions.

In general, the criteria for using a lambda function also apply to a pattern-expression: Use these constructs in situations where the expression is only used once, and is simple enough to convey its intent without reducing the readability of the expression in which it appears.

Syntax

A pattern definition is introduced by the pattern keyword, followed by the name (an identifier) by which the pattern will be known. This is followed by a match-list enclosed by curly braces ( { and } ).

pattern { match-list }

pattern-expression ::= 'pattern' '{' match-list '}'

The match-list

A match-list is a sequence of pattern expressions. A checker evaluates the patterns in order: When a match is found, one or more expressions are evaluated, and the remaining pattern expressions are skipped.

Syntax

Patterns in the list are separated by a vertical bar ( | ). A bar before the first pattern in the list is optional but recommended.

| expression as var0 where condition -> yields-expression | expression as var1 where condition -> yields-expression

match-list ::= '|'? expression ( 'as' var0 )? ( 'where' condition )? ( '->' yields-expression )? ( '|' expression ( 'as' var1 )? ( 'where' condition )? ( '->' yields-expression )? )*
varX
An identifier for a variable you can use in subsequent expressions within the match-list.

condition
An expression that describes an additional condition that must be matched.

yields-expression
An expression that is evaluated if the pattern matches the condition.

Details

The patterns are tested in the order that they appear. If a match is found, then the yields-expression is evaluated. This becomes the value returned by the match-list, and no further patterns are tested.

If no yields-expression is specified, then the upper-level expression becomes the value returned by the match-list.

If an as clause specifies a variable, that variable is available (that is, “in scope”) for use in expressions until the match-list has finished evaluating.

The property-access-expression

Allows you to access individual proprties within a record.

Syntax

An expression that refers to a collection of properties is followed by a dot ( . ), and then by the name of the property to access.

property-producing-expression . identifier

property-access-expression ::= property-producing-expression '.' identifier

Details

The property-producing-expression can simply be the name of an existing record. It can itself be a property-access-expression: This lets you write code that accesses the properties of a property.

The property-producing-expression can also be a function, a pattern, an astnode, or any other CodeXM type that has properties.

The record-expression

A record is a collection of property-value pairs.

Remember: In CodeXM, a checker takes the form of a record-expression that contains records nested in a particular way, with specific property-name values, some of them required.

Syntax

A record-expression consists of a list of one or more properties, separated by semicolons ( ; ). Each property consists of a name, an equals sign ( = ), and an expression that represents the value of that property. The entire record-expression is enclosed by curly braces ( { and } ).

{ property-name = expression ; property-name = expression ; }

record-expression ::= '{' property-name '=' expression ( ';' property-name '=' expression )* ';'? // The final semicolon is optional, '}'

Details

Each property-name is an identifier.

Each record has its own particular record-type.

The record-update-expression

A record-update-expression permits the contents of the indicated record expression to be updated with new values.

This expression can update the value of existing properties, but it cannot add new properties to a record structure.

Syntax

The syntax of a record update is similar to that of a record-expression. Property specifications are also enclosed by matching curly braces ( { and } ), but the update expression begins by specifying an existing record, followed by the keyword with.

The final semicolon ( ; ) is optional.

{ record-producing-expression with property-name = expression ; property-name = expression ; }

record-update-expression ::= '{' record-producing-expression 'with' property-name '=' expression ( ';' property-name '=' expression )* ';'? '}'

The record-producing-expression is an expression that refers to a record that has already been created.

Each property-name in this expression must correspond to a property that was declared for the record-producing-expression.

Details

The result of this expression is a record that has the same type as the original record-producing-expression, but with the specified properties assigned new values.

The set-expression

A set-expression is created by specifying a list of values, but a set is not considered to be ordered.

Syntax

[ expression , expression ]

set-expression ::= '[' expression ( ',' expression )* ']'

Details

Each expression in the set is evaluated when the set is created.

The items in the set are not considered to have an order.

Elements do not have to be unique: More than one element can have the same value. This is comparable to a CodeXM list-expression, and differs from the usual mathematical meaning of a “set”.

To access an element in a set, you can use a for-loop-expression. You cannot use an element-access-expression, because the set elements are unordered. On the other hand, you can use a set-filter-expression to retrieve members of a set that meet a particular criterion.

The set-filter-expression

The set-filter-expression filters members of a set or a list. It returns the members that meet the filtering criterion.

If you pass it a set, the set-filter-expression returns a set. If you pass it a list, it returns a list.

Before Coverity Analysis 2019.09: In previous releases, set-filter-expression returned a set regardless of whether it was passed a set or a list. This was a source of confusion. Legacy code need not change: The update is backward compatible because it has always been possible to treat a list as a set.

Syntax

set-producing-expression % pattern-producing-expression

set-filter-expression ::= set-producing-expression '%' pattern-producing-expression

The set-producing-expression specifies a set or a list that has already been declared.

The pattern-producing-expression is simply an expression that produces a pattern. Typically this will be an identifier that names a pattern you have already created.

Details

Set filtering is essentially a syntactic shorthand for the longer notation shown here:

CXM code follows
for x in myThings where x matches something as y : // ...

Be aware: If you use the same variable name before and after the matches clause—that is, x matches something as x—then the original x and the x that is the result are two different variables that happen to have the same name. This is known as shadowing: The first value is shadowed by the second one, and the original x value is no longer accessible.

Mnemonics:

The switch-expression

A switch-expression chooses among two or more alternatives, based on the outcome of a controlling expression. A switch is a convenient way to represent a chain of conditions, when each condition is a pattern of the same expression.

Syntax

The switch-expression is introduced by the switch keyword, followed by the controlling expression enclosed by parentheses. This is followed by a match-list enclosed by curly braces ( { and } ).

The match-list must be followed by a required default alternative. This alternative is also introduced by a vertical bar ( | ), followed by the keyword default. Then an arrow operator ( -> ) introduces an expression that is evaluated only if no prior matching expression has been found.

switch ( expression ) { match-list | default -> expression }

switch-expression ::= 'switch' '(' expression ')' '{' match-list '|' 'default' '->' expression '}'

Details

The syntax and behavior of a switch-expression are similar to a pattern, but they differ in certain ways:

The unary-expression

The unary-expression takes a single operand. A unary operation negates its operand—either numerically or logically.

Syntax

A numerical expression is negated by prefixing a minus sign ( - ). A logical expression is negated by prefixing the logical NOT operator, represented by an exclamation point ( ! ).

! - expression

unary-expression ::= ( '!' | '-' ) expression

The variable-expression

In CodeXM, a variable is an entity that has been given an identifier to name it.

Examples include a global constant defined by const-definition, a local constants defined by a parent let-binding-expression, and a named value defined by a matches-expression that uses the keyword as.

Remember: The CodeXM documentation often uses the term “variable” by analogy with other languages, but because CodeXM is a functional language, even named values are constant. They are available, with the same value, a long as they remain in scope.

(You can, on the other hand, create a record that is based on an existing record but with some of the property values changed: See record-update-expression.)

Syntax

Once it has been created, you simply refer to a variable by its identifier.

identifier

variable-expression ::= identifier

The CodeXM Type System

All objects, including expressions, in a CodeXM program have their own particular type. This chapter describes the possible types.

The type

The type is itself the root of the type system.

Syntax

Declaring an object can be as simple as assigning a value to an identifier; for example, assigning the identifier a literal numeric value or a string literal. Other type declarations require more elaborate syntax; for example, defining an enum or a function-type.

function-type enum-type collection-type builtin-type ( type ) globalset-type named-type node-type nullable-type pattern-type record-type

type ::= ( '(' type ')' ) | builtin-type | collection-type | enum-type | function-type | globalset-type | named-type | node-type | nullable-type | pattern-type | record-type

The builtin-type

CodeXM has several intrinsic built-in types.

(The int and string built-in types correspond to the int-literal and string-literal literal-expressions.)

Syntax

Built-in types are identified by their reserved names.

int bool eventstring string

builtin-type ::= 'int' | 'bool' | 'string' | 'eventstring'

Details

The collection-type

A collection-type object can be either a set or a list. All members of a set or a list share the same base type.

Syntax

set list < non-nullable-type >

collection-type ::= ( 'set' | 'list' ) '<' non-nullable-type '>'

The non-nullable-type is the base type of the collection. As the EBNF name implies, it cannot be nullable: See “The nullable-type”.

Details

The members of a set or a set do not need to be unique (for sets, this is different from the usual mathematical stipulation).

Both sets and lists have the following properties:

empty
Equals true if the set or list is empty, false if it contains members.

contains(item)
Returns true or false, depending on whether the set or list contains the specified item.

For a set, item can be either a literal value, or a pattern. If item is a literal, then contains() simply uses the equals operator ( == ) to make the comparison; otherwise, it uses the matches operator. See “The matches-expression”.

The main difference between a set and a list is that the members of a list have a particular order, while the members of a set do not. This difference does have implications for the way these two types of collections behave, as the following points explain:

The type_of_empty_list

The type_of_empty_list is the type of the empty-list expression ( [] ).

Syntax

[]

type_of_empty_list ::= '[]'

Details

You can initialize a set or a list to an empty state by using the empty list ( [] ). This can be useful when you work with the for-accumulate-expression.

The empty list belongs to all list types.

The enum-type

The enum-type is similar to a set, except that each member is an enum-literal-expression; that is, a character sequence enclosed by backticks ( ` ) rather than quotation marks.

Note: The values in an enum are not considered to be ordered.

Syntax

enum { enum-literal-expression , enum-literal-expression }

enum-type ::= 'enum' '{' ( enum-literal-expression (',' enum-literal-expression)* )? '}'

Details

Unlike some other languages, where the members of an enumeration are merely identifiers, in CodeXM the value of an enum member can be nearly any string-like value. For example, an enum describing operators might define its contents by using familiar symbols such as `+`, `-`, `*`, `/`, and so on.

The value of an enum is not required to be unique among all the enum types in your CodeXM source file. In fact, an enum literal that is not unique is considered to belong to any enum type that contains the same literal. For example, the literal `X` is assumed to be of the type enum{ `X` }, which is shared among all enums in the source file that include this same member.

For example, suppose your checker declares the following two enumerations:

CXM code follows
typedef E1 = enum {`A`, `Y`}; typedef E2 = enum {`X`, `Y` };

... then a variable of either type can be compared with a variable of the other.

The function-type

When you define a function the function has an implicit type, which has three components:

You can also define a particular function-type. You can specify a function type as the type of another function’s argument: doing so enables you to pass functions as arguments.

Syntax

A function type declaration is similar to a function-definition, but it doesn’t specify a function name, and instead of an expression to evaluate it simply declares the name of the new type.

function < type-parameter-identifier , type-parameter-identifier > ( parameter-identifier : type , parameter-identifier : type ) -> type

function-type ::= 'function' ( '<' type-parameter-identifier ( ',' type-parameter-identifier )* '>' )? '(' parameter-identifier ':' type ( ',' parameter-identifier ':' type )* ')' '->' type

Each type-parameter-identifier is a generic placeholder for an actual type. This identifier can be used within the parameter list as if it were a known type, the return type, or any portion of the expression the function evaluates.

Each parameter-identifier is an identifier for an explicitly typed parameter. Each parameter name must be unique to the function.

Each parameter’s type must be one of the following:

Example

A function that accepts a single integer parameter and returns a Boolean value can be written as follows:

CXM code follows
function(int) -> bool

Once declared, the function type can be used in other functions; for example, to specify the type of parameter that the function accepts, as in the following code sample:

CXM code follows
function( function(int) -> bool ) -> bool

In the following example, a filter function from a CodeXM library takes a function and a set of values, then applies that function to the values in the set and returns a filtered list:

CXM code follows
function filter<T>( predicate: function(T) -> bool, s: set<T> ) : list<T>

The globalset-type

For every target-language source file, a globalset object contains all the nodes that correspond to the source file’s code. A globalset can be used with exists and for loop expressions. Many checkers search the entire globalset.

Most language libraries include named globalset sets that contain specific kinds of objects; for example, allFunctionCode. You can use these named globalset sets to make the scope of a search more focused, and possibly quicker to run.

Syntax

globalset < type , [ globalset-domain , globalset-domain ] >

globalset-type ::= 'globalset' '<' type ',' '[' globalset-domain ( ',' globalset-domain )* ']' '>'

Details

A few restrictions apply to globalset sets, as follows:

The type declaration: In such a declaration, the type is simply the identifier of the new set, and each globalset-domain is one of the following values:

The domains are used for internal bookkeeping and optimization. There is rarely need for a user-written checker to declare a new globalset-type.

Example

The opening of the following pattern tells CodeXM to search only code within functions:

CXM code follows
for code in globalset allFunctionCode where code matches // ... further criteria

The named-type

The named-type is any previously defined type. You can use a type-definition to define a type that is particular to your own checker code.

When you specify a complex type, you can use parentheses to impose an evaluation order that might otherwise be ambiguous or contain undesired precedence.

The node-type

The target code that CodeXM analyzes is represented by a variety of Abstract Syntax Tree (AST) node types. An astnode is the root of this class hierarchy.

Tip: The various language libraries provide meaningful subtypes that are particular to the target language. As such, you are seldom likely to deal with these more abstract node types. Instead you might deal with patterns with names such as—for example—ifStatement or binaryExpression. The abstract node types can sometimes be useful; for example, as parameter types to certain CodeXM functions, or when you must specify a return type and no more specific type is exposed by the language library.

Syntax

Each of the node types is represented by its corresponding keyword.

expression statement astnode initializer ctorinit declaration

node-type ::= 'astnode' | 'ctorinit' | 'declaration' | 'expression' | 'initializer' | 'statement'

Details

astnode
An abstract, general-purpose base type.

The remaining node types described in this section have more specialized purposes than astsnode does.

Remember: The astnode type is available in every language library, and its properties are described in each language-library reference.

ctorinit
A constructor initializer. Constructors and destructors are features of some object-oriented programming languages, notably C#.

declaration
Code used to define an entity such as a variable or a function. In some languages, a variable declaration is constructed like an assignment statement, but in others it is not: so a declaration is not a statement per se.

In a language whose variable declarations do look and behave like statements, the initializer is a subtype of expression.

expression
A portion of target code to evaluate. For example (to continue using an if statement as an illustration) the condition of an if statement is an expression.

Quite frequently an expression is composed of subexpressions. For example, the expression x + 2*y is itself composed of two subexpressions: the identifier x and the subexpression 2*y, which in turn can be broken down into constituent subexpressions: the integer literal 2 and the identifier y.

Expressions of arbitrary complexity are possible, limited only by the capabilities of the language CodeXM is examining.

initializer
An expression used to set the initial value of a variable.

statement
A complete statement in the language being analyzed. Many languages permit compound statements; in particular, most flow-of-control statements can contain subordinate statements. For example, an if statement is—in many languages—a statement that contains one or two subordinate statements: the statement that is executed if the if-condition is true, and the statement, if present, that is executed if the condition is false.

The nullable-type

When a property is not specified, that property is null. If there is a possibility of this occurring, the property’s type is said to be nullable.

CodeXM code indicates a nullable type by appending a question mark ( ? ) to the name of the underlying non-nullable type.

For example, a function in the target language might return an integer value or it might not. The pattern for that function includes a property named .returnValue. The type for .returnValue would be int?.

Syntax

type ?

nullable-type ::= type'?'

Details

If a value is null, you cannot use it in an operation and a pattern cannot match it. Your checker code needs to detect occurrences of null and avoid referencing them. There are two ways to do so: by using the keyword NonNull, or by using the null-coalescing operator ( ?? ).

The keyword NonNull matches only non-null values. You can use it with matches to avoid illegally referencing null, as shown in the following sample code:

CXM code follows
myNullableValue matches NonNull as myNonnullableValue

The null-coalescing operator works a bit differently: It replaces null with a default non-nullable value that you can specify. The following is an example of code that uses ??:

CXM code follows
myNullableValue ?? someDefaultValueIfNull

An expression, such as a function, that accepts a nullable type will also accept an object that is null.

For further information, see “Handling Null Values” in Learning to Write CodeXM Checkers.

The type_of_null

This is the type of the object null; in other words, it is the type of a property that has not been specified.

Syntax

null

type_of_null ::= 'null'

Details

An expression, such as a function, that accepts a nullable type will also accept an object that is null.

The pattern-type

Patterns differ in the type of code element the pattern can be applied to (that is, what the pattern matches) and in what the pattern returns when it successfully finds a match.

Syntax

The keyword pattern introduces the pattern-type. This is followed by the name of the type that the pattern can be applied to, then followed by an arrow ( -> ) introducing the name of the type the pattern returns on finding a match.

pattern applicable-type -> type

pattern-type ::= 'pattern' applicable-type '->' type

Details

The applicable-type must not be nullable, though it can be the nullable type’s non-nullable counterpart. You can convert a nullable type to non-nullable; for example, by using NonNull. See “The nullable-type”.

The record-type

A record assembles various named properties, which can be of various types, into a single structure.

Syntax

record [ underlying-type ] { property-name ? : type ; property-name ? : type ; }

record-type ::= 'record' ( '[' underlying-type ']' )? '{' ( property-name '?'? ':' type ( ';' property-name '?'? ':' type )* ';'? // final semicolon is optional )? '}'

Each property-name is an identifier.

Details

When a question mark ( ? ) follows a property-name, the usage is similar to the character’s use with the name of a type, but there is a difference. When a type is nullable, the question mark indicates that the property-name must be present, but that its value might be null. In a record definition, on the other hand, when the question mark follows property-name, it indicates that the property itself can be omitted.

Appendix: CodeXM Keywords

accumulate endif include record
as enum initializer set
astnode eventstring int statement
bool exists let string
checker expression list switch
ctorinit false matches then
declaration for models true
declare function NonNull typedef
default globalset null where
else if pattern with
elsif in property