The job of the tokenizer (tokens.e) is to break the source code of a Euphoria file into small pieces that are easier for a program to process. This version also adds the ability to easily customize how the data is processed.
The main function is tokenize. Give it a file number or a file name, and it will return a sequence in the format {{token,line,col},{token,line,col}...}.
To tokenize a file, use:

```
tokens = tokenize(file)
```
| Name | Description |
|---|---|
| tokens | A list of tokens in the form {{token,line,col},{token,line,col}...} |
| file | A file number or a file name |
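For instance, both of these calls should produce the same token list (a minimal sketch; myprog.ex is a placeholder name, and it assumes delimiters have already been registered as described further below):

```
include tokens.e

-- by file name
sequence toks
toks = tokenize("myprog.ex")

-- or by file number
integer fn
fn = open("myprog.ex", "r")
toks = tokenize(fn)
close(fn)
```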
To tokenize a string, use:
```
tokens = tokenize_string(data)
```
| Name | Description |
|---|---|
| tokens | A list of tokens in the form {{token,line,col},{token,line,col}...} |
| data | The string to tokenize |
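The exact tokens you get back depend on which delimiters you have registered (see the procedures below). As a rough sketch, with " " registered as a whitespace delimiter and "+" as an included delimiter, tokenizing a short string might go like this:

```
include tokens.e

sequence toks
toks = tokenize_string("a + b")

-- toks would then look something like
-- {{"a",1,1},{"+",1,3},{"b",1,5}}
-- (the exact line/column conventions are those of tokens.e)
for i = 1 to length(toks) do
    printf(1, "%s at line %d, column %d\n",
           {toks[i][1], toks[i][2], toks[i][3]})
end for
```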
| Procedure | Description |
|---|---|
| addWhitespaceDelimiter(delim) | Add a whitespace delimiter. For example: addWhitespaceDelimiter(" ") or addWhitespaceDelimiter("\t") |
| addNewLineDelimiter(delim) | Add a new-line delimiter. For example: addNewLineDelimiter("\n"), addNewLineDelimiter("\r"), or addNewLineDelimiter("\r\n") |
| addIncludedDelimiter(delim) | Add an included delimiter, i.e. a delimiter that becomes a token of its own, such as an operator. For example: addIncludedDelimiter("+"), addIncludedDelimiter("+="), addIncludedDelimiter("("), or addIncludedDelimiter("}") |
| addStringDelimiter(delim) | Add a string delimiter. For example: addStringDelimiter("'") or addStringDelimiter("\"") |
| addLineComment(delim) | Add a single-line comment delimiter. For example: addLineComment("--") |
| addBlockComment(start,end) | Add block-comment syntax that starts with start and ends with end. For example: addBlockComment("/*","*/") |
| addNonDelimiter(nondelim) | When nondelim is encountered, it is added to the current token. Note that the tokenizer sorts the delimiter list from longest to shortest, so it does not call the procedure for + instead of += when it encounters +=. Calling this procedure with "" makes it apply whenever no delimiter is matched. It also allows always adding "a" to the current token, though I don't know of any reason why you would want to. For example: addNonDelimiter("") |
| addSpecialDelimiter(delim,routine) | When delim is encountered, call routine (given either as the routine's name or its routine_id). See the processLineComment example below. |
| addExtendedDelimiter(delim,routine,extra) | When delim is encountered, call routine; extra is stored in DELIMITERS[whichOne][3]. See the processBlockComment example below. |
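Taken together, a minimal setup for a C-like or Euphoria-like input might look like the following sketch (the exact delimiter set is up to you, and registration order does not matter, since the list is sorted longest-first):

```
-- whitespace and line breaks separate tokens
addWhitespaceDelimiter(" ")
addWhitespaceDelimiter("\t")
addNewLineDelimiter("\r\n")
addNewLineDelimiter("\n")
addNewLineDelimiter("\r")

-- operators and brackets become tokens of their own
addIncludedDelimiter("+")
addIncludedDelimiter("+=")
addIncludedDelimiter("=")
addIncludedDelimiter("(")
addIncludedDelimiter(")")

-- strings and comments
addStringDelimiter("\"")
addStringDelimiter("'")
addLineComment("--")
addBlockComment("/*","*/")
```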
For example, here is a handler for single-line comments, registered with addSpecialDelimiter:

```
global procedure processLineComment(integer whichOne)
    -- flush any token in progress
    if length(token[1]) then
        tokens = append(tokens,token)
    end if
    token = {"",curline,curcol}
    -- consume everything up to the end of the line
    while 1 do
        if cchar > length(file_data) then
            exit
        end if
        if isNewLine() then
            exit
        end if
        token[1] = token[1] & file_data[cchar]
        cchar = cchar + 1
        curcol = curcol + 1
    end while
    -- store the whole comment as a single token
    tokens = append(tokens,token)
    token = {"",curline,curcol}
end procedure

addSpecialDelimiter("--","processLineComment")
-- or
-- addSpecialDelimiter("--",routine_id("processLineComment"))
```
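Either form registers the same handler: passing the name as a string presumably lets tokens.e resolve it with routine_id() internally, which is why the procedure is declared global, while passing a routine_id() result yourself skips that lookup.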
Likewise, here is a block-comment handler registered with addExtendedDelimiter; it reads the closing delimiter back out of DELIMITERS[whichOne][3]:

```
global procedure processBlockComment(integer whichOne)
    -- flush any token in progress
    if length(token[1]) then
        tokens = append(tokens,token)
    end if
    -- start the token with the opening delimiter
    token = {DELIMITERS[whichOne][1],curline,curcol}
    curcol = curcol + length(token[1])
    cchar = cchar + length(token[1])
    c = DELIMITERS[whichOne][3]
    while 1 do
        if cchar > length(file_data) then
            exit
        end if
        if cmp() then
            -- closing delimiter found; include it in the token
            token[1] = token[1] & DELIMITERS[whichOne][3]
            curcol = curcol + length(DELIMITERS[whichOne][3])
            cchar = cchar + length(DELIMITERS[whichOne][3])
            exit
        end if
        token[1] = token[1] & file_data[cchar]
        if isNewLine() then
            -- track line/column across embedded newlines
            cchar = cchar + length(DELIMITERS[isNewLine()][1])
            curcol = 0
            curline = curline + 1
        else
            cchar = cchar + 1
            curcol = curcol + 1
        end if
    end while
    tokens = append(tokens,token)
    token = {"",curline,curcol}
end procedure

addExtendedDelimiter("/*","processBlockComment","*/")
```
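The same module-level state used by the examples above (token, tokens, file_data, cchar, curline, curcol, DELIMITERS) can drive other custom handlers as well. As one more sketch, assuming those globals behave as in the examples, here is a string handler that honors backslash escapes, something addStringDelimiter may not handle on its own:

```
global procedure processEscapedString(integer whichOne)
    -- flush any token in progress
    if length(token[1]) then
        tokens = append(tokens,token)
    end if
    -- start the token with the opening quote
    token = {DELIMITERS[whichOne][1],curline,curcol}
    cchar = cchar + 1
    curcol = curcol + 1
    -- (for brevity this sketch does not track newlines inside the string)
    while cchar <= length(file_data) do
        token[1] = token[1] & file_data[cchar]
        if file_data[cchar] = '\\' and cchar < length(file_data) then
            -- keep the escaped character and step past it
            cchar = cchar + 1
            curcol = curcol + 1
            token[1] = token[1] & file_data[cchar]
        elsif file_data[cchar] = DELIMITERS[whichOne][1][1] then
            -- closing quote: the string token is complete
            cchar = cchar + 1
            curcol = curcol + 1
            exit
        end if
        cchar = cchar + 1
        curcol = curcol + 1
    end while
    tokens = append(tokens,token)
    token = {"",curline,curcol}
end procedure

addSpecialDelimiter("\"","processEscapedString")
```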