Another project we had during my first Winter quarter at Rose-Hulman was to use a finite state machine to colorize / perform syntax highlighting on Java source code. It’s easy enough as there are really 3 special cases that we are looking for (strings, comments, and keywords) and we only need 9 states to recognize them all (at least in my implementation).
All together, we have the following states and transitions:
If State =
NORMAL
- If you read " then output buffer (as
keyword
ornormal
) and State =STRING
- If you read / then output buffer (as
keyword
ornormal
) and State =NORMAL_SAW_SLASH
- If you read ’ then output buffer (as
keyword
ornormal
) and State =NORMAL_CHAR
- Otherwise, buffer characters until we see a non-letter; on a non-letter:
- If the buffer contains a keyword, output as a
keyword
- Otherwise, output as
normal
- Otherwise, output as
- If you read " then output buffer (as
If State =
NORMAL_CHAR
- If you read \ then State =
NORMAL_CHAR_ESCAPE
- If you read ’ then output buffer as
normal
and State =NORMAL
(yes, character literals can be more than one character long, we aren’t validating the input)- Otherwise, add the character to the buffer
- If you read \ then State =
If State =
NORMAL_CHAR_ESCAPE
- Add the character to the buffer and State =
NORMAL_CHAR
- Add the character to the buffer and State =
If State =
STRING
- If you read \ then State =
STRING_SAW_BACKSLASH
- If you read " then output buffer as
string
and State =NORMAL
- Otherwise, add the character to the buffer
- If you read \ then State =
If State =
STRING_SAW_BACKSLASH
- Add the character to the buffer and State =
STRING
- Add the character to the buffer and State =
If State =
NORMAL_SAW_SLASH
- If you read / then State =
COMMENT_LINE
- If you read * then State =
COMMENT_BLOCK
- Otherwise, State =
NORMAL
- Otherwise, State =
- If you read / then State =
If State =
COMMENT_LINE
- If you read \n then output buffer as
comment
and State =NORMAL
- If you read \n then output buffer as
If State =
COMMENT_BLOCK
- If you read * then State =
COMMENT_BLOCK_SAW_STAR
- Otherwise, add the character to the buffer
- If you read * then State =
If State =
COMMENT_BLOCK_SAW_STAR
- If you read / then output buffer as
comment
and State =NORMAL
- If you read * then add the character to the buffer
- Otherwise, add the character to the buffer and State =
COMMENT_BLOCK
- Otherwise, add the character to the buffer and State =
- If you read / then output buffer as
Perhaps it’s not perfect and it can definitely do some strange things on non-valid Java code, but it seems to work well enough. Here’s an example on a short HelloWorld program:
/**
* This HelloWorld program simply prints "Hello World!"
* to the console when it is run.
*/
class HelloWorld {
public static void main(String[] args) {
System.out.println("Hello World!"); // Display the string.
}
}
All nice and rendered, that looks something like this:
/** * This HelloWorld program simply prints "Hello World!" * to the console when it is run. */ class HelloWorld { public static void main(String[] args) { System.out.println("Hello World!"); // Display the string. } }
Well, that’s all I’ve got. If you want to take the code for a spin, you can download it below. To run it, just type java -jar colorize.jar HelloWorld
(for example, note the lack of an extension on the parameter). The source code is included in the JAR, so if you feel like modifying it, go for it.
Download: