This post demonstrates the ambiguity that arises when parsing the C23
AlignmentSpecifier alignas,
and the context-sensitive nature of the parsing required to correctly
parse the specifier.
The ambiguity:
The specifier is represented by the non-terminal AlignmentSpecifier
in the C2x grammar:
AlignmentSpecifier: alignas ( TypeName )
AlignmentSpecifier: alignas ( ConstantExpression )
The ambiguity arises from the fact that both TypeName and
ConstantExpression can derive an Identifier.
For e.g., in the declaration
alignas(a) int b;
the variable b is declared to be of type int, with its address aligned at
the result of the expression alignas(a).
The identifier a can be a TypedefName (TypeName derives TypedefName):
typedef struct {
long double ld;
} a;
Or, it can be an identifier that represents a ConstantExpression
(ConstantExpression derives PrimaryExpression derives Identifier):
constexpr int a = 32;
Or, it can be an identifier that represents a ConstantExpression
(ConstantExpression derives PrimaryExpression derives
EnumerationConstant derives Identifier):
enum {
a = 32,
};
The ambiguity in an LR(1) parser:
The collection of canonical-LR(1)-item-sets shows the reduce/reduce conflicts as described below.
print_item_set: item-set[ 0]:k----------------------
[TranslationObject -> . TranslationUnit] jump=1
print_item_set: item-set[ 0]:-----------------------
. . .
[AlignmentSpecifier -> . alignas ( TypeName )] jump=4025
[AlignmentSpecifier -> . alignas ( ConstantExpression )] jump=4025
. . .
print_item_set: item-set[ 0]:done-------------------
print_item_set: item-set[4025]:k----------------------
[AlignmentSpecifier -> alignas . ( TypeName )] jump=4026
[AlignmentSpecifier -> alignas . ( ConstantExpression )] jump=4026
print_item_set: item-set[4025]:done-------------------
print_item_set: item-set[4026]:k----------------------
[AlignmentSpecifier -> alignas ( . TypeName )] jump=4027
[AlignmentSpecifier -> alignas ( . ConstantExpression )] jump=4029
print_item_set: item-set[4026]:-----------------------
. . .
[TypedefName -> . Identifier] jump=983
[PrimaryExpression -> . Identifier] jump=983
[EnumerationConstant -> . Identifier] jump=983
. . .
print_item_set: item-set[4026]:done-------------------
print_item_set: item-set[ 983]:k----------------------
[TypedefName -> Identifier .] las: ... ) ...
[PrimaryExpression -> Identifier .] las: ... ) ...
[EnumerationConstant -> Identifier .] las: ... ) ...
print_item_set: item-set[ 983]:done-------------------
The parser experiences reduce/reduce conflicts, as it lands into the
item-set-#983 immediately after shifting the identifier a. The next
available token in the input, and also the current look-head for the parser, is
the punctuator ).
The ambiguity in an Earley parser:
In an Earley parser, the ambiguity manifests as two back-edges pointing to the
completion of the non-terminal AlignmentSpecifier, corresponding to the two
different ways the reduction to AlignmentSpecifier can be performed.
cc_token_print: CC_TOKEN_RIGHT_PAREN ')'
item-set[ 4]:r[ 5]----------------------
[ 0]:[ALIGNMENT_SPECIFIER -> alignas ( TYPE_NAME ) .] (0)
[ 1]:[ALIGNMENT_SPECIFIER -> alignas ( CONSTANT_EXPRESSION ) .] (0)
[ 2]:[TYPE_SPECIFIER_QUALIFIER -> ALIGNMENT_SPECIFIER(4,0)(4,1) .] (0)
. . .
The context-sensitivity:
To determine exactly what the identifier a represents, the parser must
consult the symbol-table as soon as it parses the identifier a.
This also implies that the parser, as it parses the constructs, must keep the
symbol-table up-to-date, so that relevant information is available to resolve
any ambiguities.