Project 4 (100 points)
Project 4 is due 10/30/15 on or before 11:59:59pm MST.
Your goal is to finish an incomplete parser and write a type checker for a given language. The input to your project will be a program and the output will be either error messages if there is a type mismatch or a success message if there is no type mismatch. Your type checker will enforce semantic checks on the input program, and will be described in the following.
Grammar Description
program → decl body
decl → type_decl_section var_decl_section
type_decl_section → TYPE type_decl_list
type_decl_section → ε
type_decl_list → type_decl type_decl_list
type_decl_list → type_decl
type_decl → id_list COLON type_name SEMICOLON
type_name → REAL
type_name → INT
type_name → BOOLEAN
type_name → STRING
type_name → LONG
type_name → ID
var_decl_section → VAR var_decl_list
var_decl_section → ε
var_decl_list → var_decl var_decl_list
var_decl_list → var_decl
var_decl → id_list COLON type_name SEMICOLON
id_list → ID COMMA id_list
id_list → ID
body → LBRACE stmt_list RBRACE
stmt_list → stmt stmt_list
stmt_list → stmt
stmt → while_stmt
stmt → assign_stmt
stmt → do_stmt
stmt → switch_stmt
while_stmt → WHILE condition body
assign_stmt → ID EQUAL expr SEMICOLON
do_stmt → DO body WHILE condition SEMICOLON
switch_stmt → SWITCH ID LBRACE case_list RBRACE
case_list → case case_list
case_list → case
case → CASE NUM COLON body
expr → term PLUS expr
expr → term MINUS expr
expr → term
term → factor MULT term
term → factor DIV term
term → factor
factor → LPAREN expr RPAREN
factor → NUM
factor → REALNUM
factor → ID
condition → ID
condition → primary relop primary
primary → ID
primary → NUM
primary → REALNUM
relop → GREATER
relop → GTEQ
relop → LESS
relop → NOTEQUAL
relop → LTEQ
The tokens used in the grammar description are:
TYPE = TYPE
COLON = :
SEMICOLON = ;
REAL = REAL
INT = INT
BOOLEAN = BOOLEAN
STRING = STRING
LONG = LONG
VAR = VAR
COMMA = ,
LBRACE = {
RBRACE = }
WHILE = WHILE
EQUAL = =
DO = DO
SWITCH = SWITCH
CASE = CASE
PLUS = +
MINUS = -
MULT = *
DIV = /
LPAREN = (
RPAREN = )
GREATER = >
GTEQ = >=
LESS = <
LTEQ = <=
NOTEQUAL = <>
ID = letter(letter | digit)*
NUM = 0 | (digit digit*)
REALNUM = NUM \. digit*
Language Semantics
As can be seen from the grammar, in this language types are first declared, then variables are declared, then the body of the program follows.
Types
The language has five built-in types: INT, REAL, BOOLEAN, STRING, and LONG.
Programmers can declare types either explicitly or implicitly.
Explicit types are names that are not built-in types and that have their first appearance in the program as part of the
id_list
of atype_decl
.Implicit types are not built-in types and not explicit user-declared types. Implicit types have their first appearance as a
type_name
in avar_decl
.
Example
Consider the following program written in our language:
TYPE
a : INT;
b : a;
VAR
x : b;
y : c;
{
y = x;
}
There are three types declared by the programmer in this example, a
,
b
, and c
, where a
and b
are explicit types and c
is an
implicit type.
Variables
Programmers can declare variables either explicitly or implicitly.
Explicit variables are declared in an
id_list
of avar_decl
.Implicit variables are declared if it is not declared explicitly but it appears in the program body.
Example
Consider the following program written in our language:
TYPE
a : INT;
b : a;
VAR
x : b;
y : c;
{
y = x;
z = 10;
w = z * 5;
}
This program has four variables declared: x
, y
, z
, and w
, with
x
and y
explicitly declared and z
and w
implicitly declared.
Note that the implicitly declared variables z
and w
also have an
implicitly declared type.
Type System
Our language uses structural equivalence for checking type equivalence.
Implicit types (in variable declarations or on implicitly declared variables) will be inferred from the usage (in a simplified form of Hindley-Milner type inference).
Here are all the type rules/constraints that your type checker will enforce:
- Assignment cannot be made between variables of different types
- Operations (
PLUS
,MINUS
,MULT
, andDIV
) cannot be applied to operands of different types (but can be applied to operands of the same type, includingSTRING
andBOOLEAN
) - Relational operators (
relop
) cannot be applied to operands of different types (but can be applied to operands of the same type, includingSTRING
andBOOLEAN
) - The result of
p1
relop
p2
is of typeBOOLEAN
(assuming that bothp1
andp2
are the same type) condition
should be of typeBOOLEAN
- The type of an
expr
is the same as the type of its operands - The variable that follows the
SWITCH
keyword inswitch_stmt
should be of typeINT
NUM
constants are of typeINT
REALNUM
constants are of typeREAL
Incomplete Parser
The parser given on the submission site is
incomplete, as it is missing an implementation for while_stmt
,
condition
, do_stmt
, switch_stmt
, case_list
, and case
.
As described in the evaluation section, you must
implement parsing for while_stmt
, condition
, and do_stmt
, while
switch_stmt
, case_list
, and case
are extra credit (though note
that to receive full extra credit, you must implement all of the type
checks in addition to the parsing cases.
Semantic Error Output
Your program will check for the following semantic errors (with type checking being one of those semantic errors) and output the correct output when it encounters that error. Note that there will only be at most one error per test case.
Type name declared more than once
If a type name is declared more than once, either implicitly or explicitly, then the output of your program should be:
ERROR CODE 0 <type_name>
Where <type_name>
is replaced with the name that is declared more
than once.
Type re-declared as variable
If a type name is redeclared as a variable, either implicitly or explicitly, then the output of your program should be:
ERROR CODE 1 <type_name>
Where <type_name>
is replaced with the name of the type that is
re-declared as a variable.
Variable declared more than once
If a variable name is declared more than once, either implicitly or explicitly, then the output of your program should be:
ERROR CODE 2 <variable_name>
Where <variable_name>
is replaced with the name of the variable that
is declared more than once.
Type Mismatch
If there is a type mismatch in the program, then the output of your program should be:
ERROR CODE 3 <line_number>
Where <line_number>
is replaced with the line number that the type
mismatch occurs on. Note that you can assume that anywhere a type
error can occur will be on a single line.
Variable re-declared as type name
If a variable is re-declared as a type name, either implicitly or explicitly, then the output of your program should be:
ERROR CODE 4 <variable_name>
Where <variable_name>
is replaced with the name of the variable that
is re-declared as a type.
No semantic errors
If there are no semantic errors in the program, then your output should be the following:
All systems go!
Examples
Given the following:
TYPE
a,b,c,b : INT;
VAR
x : a;
{
x = 10;
}
The output will be the following:
ERROR CODE 0 b
Given the following:
TYPE
a : INT;
VAR
x : INT;
b, a : STRING;
{
x = 10;
}
The output should be the following:
ERROR CODE 1 a
Given the following:
VAR
x1 : INT;
x2, x3, x1 : a;
{
x1 = 0;
}
The output should be the following:
ERROR CODE 2 x1
Given the following:
VAR
x100 : INT;
y : STRING;
{
x100 = y;
}
The output should be the following:
ERROR CODE 3 5
Given the following:
VAR
x : INT;
{
x = 100;
y = 20.10;
y = x;
}
The output should be the following:
ERROR CODE 3 6
Given the following:
VAR
x, y : a1;
{
WHILE x <> 10
{
x = x + y;
y = y * 1.0;
}
}
The output should be the following:
ERROR CODE 3 7
Given the following:
VAR
x, y : STRING;
z : x;
{
y = x;
}
The output should be the following:
ERROR CODE 4 x
Given the following:
TYPE
a, b : INT;
c : a;
d : STRING;
VAR
x : e;
y : c;
test : d;
{
a1 = 100;
b1 = a1 + (10 - 50);
foo = b1 / 50;
SWITCH foo
{
CASE 1:
{
foo = 0;
}
CASE 2:
{
test = test * test;
}
}
}
The output should be the following:
All systems go!
Evaluation
Your submission will be graded on passing the automated test cases.
The test cases (there will be multiple test cases in each category, each with equal weight) will be broken down in the following way (out of 100 points):
- Type error 0 (type name declared more than once) : 10 points
- Type error 1 (type re-declared as variable) : 10 points
- Type error 2 (variable declared more than once) : 10 points
- Type error 3 (type mismatch) : 60 points
- Type error 4 (variable re-declared as a type name) : 10 points
- Implementing parsing for
case
,case_list
, andswitch_stmt
: 5 points EC
Also, there will be a 50% penalty for Improperly Implemented Parser, so make sure that you implement the parser correctly!
Note that test cases must match the output exactly, and no partial credit will be given.
Submission
Submit your project here before the deadline.
Credit
This project description was adapted from Prof. Rida Bazzi, and is used with permission.