Project 2 is due 2/2/16 on or before 11:59:59pm MST.
You will be given a lexer that reads tokens from standard input. Your
goal is to write, in C or C++, a program that reads all tokens from
standard input by calling the lexer function
getToken() and storing
certain tokens in a linked list. After all tokens are read, your
program will print out the content of the linked list in a specific
The next section describes the lexer API. You also need to read the
provided code in
lexer.c and understand how the lexer works. You
will only use the
getToken() function in this project.
There are two functions that the lexer defines. These two functions
compose the application programming interface (API) of our lexer.
These functions are declared in
lexer.h (and implemented in
lexer.c). You will find the files
lexer.c on the
submission site for project 2.
getToken()reads the next token from standard input and returns its type as a
token_typeenum. If the token is of type
current_tokenas a null-terminated character array and the length of the string is stored in the global variable
token_length. There are two special
END_OF_FILE, which is returned when the lexer encounters the end of standard input and
ERROR, which is returned when the lexer encounters an unrecognized character in the input.
ungetToken()causes the next call to
getToken()to return the last token read by the previous call to
getToken(). Note that this means the next call to
getToken()will not read from standard input. It’s a logical error to call
getToken(). This function is useful for writing recursive descent parsers that you will see later on in this course.
There are four global variables declared in
lexer.h that are set
getToken() is called:
t_type: the token type is stored here. Note that this will be the same value that was returned by
current_token: the token value is stored in the array
current_token. If the token is of type
current_tokencontains the token string. For all other token types,
current_tokencontains the empty string.
token_length: the length of the string stored in
line: the current line number of the input when the token was read.
You should read the source code provided in
Here is a hint for using the lexer: you can use the token type labels
END_OF_FILE directly in the code. For example, if
you want to check if the token type is
NUM, you can write the
1 2 3 4
Your program should use the provided lexer and read all tokens from
the input by repeatedly calling the
getToken() function. Certain
token strings and additional data should be stored in a linked list.
Specifically, if either of the following conditions are true:
- The token is of type
- The token is of type
IDAND the actual token is equal to one of the following values:
Then, the token string and other information needs to be stored in a node of a linked list. The information that needs to be stored about each of these tokens in the linked list is the following:
Token type (from
Token value (from
Line number of the input where token was read (from
After reading all tokens from the input and storing information about tokens that match the criteria, your program should go over the linked list and print the information in reverse order from when that token was encountered. Each of the tokens in the linked list must be printed to standard output on a separate line with the following format:
<token_type_string> is the textual representation of the
token type. In this case, the possible values are
You should write all your code in a separate C or C++ file and include
lexer.h to be able to access the lexer functions. Here is how to do
it in C:
1 2 3 4 5 6 7
And this is how to do it in C++:
1 2 3 4 5 6 7 8 9 10
To compile your code, you should use the GCC compiler from CentOS 6.7 that you have installed from last project. This is how to compile your code if you are writing it in C:
your_code.c should be replaced by the file name where you
store your code.
To compile your C++ program, use the following two commands instead:
NOTE: You should not modify
lexer.h. Write your
code in a separate file as described above.
Here is an example input with four lines:
1 2 3 4
Here is the expected output:
1 2 3 4 5 6 7
Notice that the tokens are listed from last to first (reverse order).
This section is relevant to this project as well as all later projects in this course.
Input / Output
Your program should read the input from standard input. That is
normally the keyboard input and it can be accessed by using many
standard C functions like
scanf(), etc. In C++, you
cin to read from standard input. In this project, you
won’t read the input directly, the call to
getToken() will read the
input by using
getchar(). Read the source code in
lexer.c to see
how this is done.
Your program should write its output to standard output. That is
normally done by using such functions as
puts() in C
cout in C++.
Testing Your Code
Our grading system is automated hence another program (running on the
submission site server) tests your code for correctness. We provide
exact input/output format for every project in addition to multiple
test cases. A test case is composed of two parts:
A specific input in a file with
The expected output in a file with
Your program passes a test case if, given the input in the
file, it produces exactly the same output as the contents of the
.txt.expected file. Otherwise your program fails that
Testing with a single test case
To feed the input file to your program without typing its content on the keyboard, we use input redirection and you should too! Here is how you would redirect the standard input of a program to a file:
Let’s assume that we have a compiled program named
executable file). Normally, you would run this program like this (in a
Which would wait for you to provide the input by typing it on the
keyboard (that is if the program expects any input). To redirect the
standard input to a file named
test01.txt, you should run the
program like this instead:
The “less than” character instructs the command interpreter to read
the file specified after the
< and feed it to the program through
its standard input.
Similarly, we can redirect the standard output to a file. This way instead of printing the output of the program on the terminal, the output will be stored in a file. To redirect the standard output, we use the “greater than” character with a file name like this:
Which will store the program’s output in
NOTE: if a file with the same name exists, it will be overwritten!
Of course you can mix input and output redirection as well:
Our automated grading system compares the output generated by your
program with the expected output by using the
is a system utility that is used to compare the contents of two files.
If the files are different, it produces a report highlighting the
differences otherwise it outputs nothing. We use
option which causes it to ignore differences in whitespaces. Here is
an example of how to compare the output generated by the program with
the expected file:
Running multiple test cases
You can also use the shell script that we have provided called
test1.sh, which automates this process for multiple test cases. The
test script assumes that your compiled program is called
test cases are stored in a subdirectory named
tests. You can run it
If you see the error message
-bash: ./test1.sh: Permission denied,
then run the following command to fix the problem:
The test script runs your program with I/O redirection for each test
case in the
tests folder and compares the output with the expected
diff and reports any differences for failed cases. At the
end it will print the number of tests passed and removes any temporary
It is strongly recommended that you test your program before submission with the provided test cases the way described here.
Your submission will be graded on the number of test cases passing when you submit your code. If your program does not compile on the server, you will not receive any credit, so if you a see a compiler error message on the submission site, fix the problem and submit again.
You must use a C-style linked list: a
struct with a self-referential
field. This specific data structure is used in future projects, so it
it best that you become familiar with it. You are not allowed to use
the STL linked list libraries.
Submit your code on the course submission site. You should only upload
your source code. You should not upload the compiled program or
test cases. You should not upload
lexer.c, these files
will be automatically added to your submission. You should not use
space in your file names.
Submit your project here before the deadline: https://cse340.fulton.asu.edu/cse340/
FAQ (Frequently Asked Questions)
- How do I get input to the program to stop?
When entering input to your code on stdin (from the command line), you need to simulate the EOF input by entering Control-D in linux environments: http://unix.stackexchange.com/a/16338
This is why the programming doc for this class describes how to use file redirection to test your code (which will automatically signal the EOF when it, well, gets to the end of the file http://adamdoupe.com/teaching/classes/cse340-principles-of-programming-languages-s16/cse340_s16_programming_projects.pdf
char*is changing its value!?
We will get into the exact semantics of pointers later in the course (trust me, very in depth, so that hopefully you will never make this mistake again).
The key is to remember that a string in C is not a string “object”, but an array of characters, which are ended by a NULL (0) character (‘\0’). So a char* is a pointer to an array of strings. When you assign a variable that is a pointer from one to the other like so:
char* test = "foo"; char* bar; bar = test;
Now, bar and test both point to the same string “foo”, so that if I were to do the following:
test = 'b'; printf("%s\n", bar);
Would print out “boo”, because the variables test and bar both point to the same character array.
Now, if instead we did the following with the strdup libc function: http://linux.die.net/man/3/strdup
bar = strdup(test); test = 'b'; printf("%s\n", bar);
Would print out “foo”, because strdup copied the array of characters “foo” and returned a pointer to the new array.
This project description was created by Prof. Rida Bazzi and updated by Mohsen Zohrevandi, and is used with permission.