The job of the tokenizer (tokens.e) is to break the source code of a Euphoria file into small pieces that are easier for a program to process. This version also adds the ability to easily customize how the data is processed.
The main function is tokenize. Give it a file number or a file name, and it will return a sequence in the format {{token,line,col},{token,line,col}...}.
To tokenize a file, use:

```
tokens = tokenize(file)
```
| Name | Description |
|---|---|
| tokens | A list of tokens in the form {{token,line,col},{token,line,col}...} |
| file | A file number or a file name |
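For instance, both of these calls should produce the same token list (a minimal sketch; myprog.ex is a placeholder name, and it assumes delimiters have already been registered as described further below):

```
include tokens.e

-- by file name
sequence toks
toks = tokenize("myprog.ex")

-- or by file number
integer fn
fn = open("myprog.ex", "r")
toks = tokenize(fn)
close(fn)
```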
To tokenize a string, use:
```
tokens = tokenize_string(data)
```
| Name | Description |
|---|---|
| tokens | A list of tokens in the form {{token,line,col},{token,line,col}...} |
| data | The string to tokenize |
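The exact tokens you get back depend on which delimiters you have registered (see the procedures below). As a rough sketch, with " " registered as a whitespace delimiter and "+" as an included delimiter, tokenizing a short string might go like this:

```
include tokens.e

sequence toks
toks = tokenize_string("a + b")

-- toks would then look something like
-- {{"a",1,1},{"+",1,3},{"b",1,5}}
-- (the exact line/column conventions are those of tokens.e)
for i = 1 to length(toks) do
    printf(1, "%s at line %d, column %d\n",
           {toks[i][1], toks[i][2], toks[i][3]})
end for
```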
| Procedure | Description |
|---|---|
| addWhitespaceDelimiter(delim) | Add a whitespace delimiter. For example: addWhitespaceDelimiter(" ") or addWhitespaceDelimiter("\t") |
| addNewLineDelimiter(delim) | Add a new-line delimiter. For example: addNewLineDelimiter("\n"), addNewLineDelimiter("\r"), or addNewLineDelimiter("\r\n") |
| addIncludedDelimiter(delim) | Add an included delimiter, i.e. a delimiter that becomes a token of its own, such as an operator. For example: addIncludedDelimiter("+"), addIncludedDelimiter("+="), addIncludedDelimiter("("), or addIncludedDelimiter("}") |
| addStringDelimiter(delim) | Add a string delimiter. For example: addStringDelimiter("'") or addStringDelimiter("\"") |
| addLineComment(delim) | Add a single-line comment delimiter. For example: addLineComment("--") |
| addBlockComment(start,end) | Add block-comment syntax that starts with start and ends with end. For example: addBlockComment("/*","*/") |
| addNonDelimiter(nondelim) | When nondelim is encountered, it is added to the current token. Note that the tokenizer sorts the delimiter list from longest to shortest, so it does not call the procedure for + instead of += when it encounters +=. Calling this procedure with "" makes it apply whenever no delimiter is matched. It also allows always adding "a" to the current token, though I don't know of any reason why you would want to. For example: addNonDelimiter("") |
| addSpecialDelimiter(delim,routine) | When delim is encountered, call routine (given either as the routine's name or its routine_id). See the processLineComment example below. |
| addExtendedDelimiter(delim,routine,extra) | When delim is encountered, call routine; extra is stored in DELIMITERS[whichOne][3]. See the processBlockComment example below. |
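Taken together, a minimal setup for a C-like or Euphoria-like input might look like the following sketch (the exact delimiter set is up to you, and registration order does not matter, since the list is sorted longest-first):

```
-- whitespace and line breaks separate tokens
addWhitespaceDelimiter(" ")
addWhitespaceDelimiter("\t")
addNewLineDelimiter("\r\n")
addNewLineDelimiter("\n")
addNewLineDelimiter("\r")

-- operators and brackets become tokens of their own
addIncludedDelimiter("+")
addIncludedDelimiter("+=")
addIncludedDelimiter("=")
addIncludedDelimiter("(")
addIncludedDelimiter(")")

-- strings and comments
addStringDelimiter("\"")
addStringDelimiter("'")
addLineComment("--")
addBlockComment("/*","*/")
```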
For example, here is a handler for single-line comments, registered with addSpecialDelimiter:

```
global procedure processLineComment(integer whichOne)
    -- flush any token in progress
    if length(token[1]) then
        tokens = append(tokens,token)
    end if
    token = {"",curline,curcol}
    -- consume everything up to the end of the line
    while 1 do
        if cchar > length(file_data) then
            exit
        end if
        if isNewLine() then
            exit
        end if
        token[1] = token[1] & file_data[cchar]
        cchar = cchar + 1
        curcol = curcol + 1
    end while
    -- store the whole comment as a single token
    tokens = append(tokens,token)
    token = {"",curline,curcol}
end procedure

addSpecialDelimiter("--","processLineComment")
-- or
-- addSpecialDelimiter("--",routine_id("processLineComment"))
```
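Either form registers the same handler: passing the name as a string presumably lets tokens.e resolve it with routine_id() internally, which is why the procedure is declared global, while passing a routine_id() result yourself skips that lookup.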
Likewise, here is a block-comment handler registered with addExtendedDelimiter; it reads the closing delimiter back out of DELIMITERS[whichOne][3]:

```
global procedure processBlockComment(integer whichOne)
    -- flush any token in progress
    if length(token[1]) then
        tokens = append(tokens,token)
    end if
    -- start the token with the opening delimiter
    token = {DELIMITERS[whichOne][1],curline,curcol}
    curcol = curcol + length(token[1])
    cchar = cchar + length(token[1])
    c = DELIMITERS[whichOne][3]
    while 1 do
        if cchar > length(file_data) then
            exit
        end if
        if cmp() then
            -- closing delimiter found; include it in the token
            token[1] = token[1] & DELIMITERS[whichOne][3]
            curcol = curcol + length(DELIMITERS[whichOne][3])
            cchar = cchar + length(DELIMITERS[whichOne][3])
            exit
        end if
        token[1] = token[1] & file_data[cchar]
        if isNewLine() then
            -- track line/column across embedded newlines
            cchar = cchar + length(DELIMITERS[isNewLine()][1])
            curcol = 0
            curline = curline + 1
        else
            cchar = cchar + 1
            curcol = curcol + 1
        end if
    end while
    tokens = append(tokens,token)
    token = {"",curline,curcol}
end procedure

addExtendedDelimiter("/*","processBlockComment","*/")
```
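The same module-level state used by the examples above (token, tokens, file_data, cchar, curline, curcol, DELIMITERS) can drive other custom handlers as well. As one more sketch, assuming those globals behave as in the examples, here is a string handler that honors backslash escapes, something addStringDelimiter may not handle on its own:

```
global procedure processEscapedString(integer whichOne)
    -- flush any token in progress
    if length(token[1]) then
        tokens = append(tokens,token)
    end if
    -- start the token with the opening quote
    token = {DELIMITERS[whichOne][1],curline,curcol}
    cchar = cchar + 1
    curcol = curcol + 1
    -- (for brevity this sketch does not track newlines inside the string)
    while cchar <= length(file_data) do
        token[1] = token[1] & file_data[cchar]
        if file_data[cchar] = '\\' and cchar < length(file_data) then
            -- keep the escaped character and step past it
            cchar = cchar + 1
            curcol = curcol + 1
            token[1] = token[1] & file_data[cchar]
        elsif file_data[cchar] = DELIMITERS[whichOne][1][1] then
            -- closing quote: the string token is complete
            cchar = cchar + 1
            curcol = curcol + 1
            exit
        end if
        cchar = cchar + 1
        curcol = curcol + 1
    end while
    tokens = append(tokens,token)
    token = {"",curline,curcol}
end procedure

addSpecialDelimiter("\"","processEscapedString")
```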