The tokenizer (tokens.e) breaks the source of an Euphoria file into small pieces (tokens) so the rest of the program can process it more easily. This version also makes it easy to customize how the data is processed.
The main function is tokenize. Give it a file number or a sequence containing the data, and it will return a sequence in the format {{token,line,col},{token,line,col}...}.
To tokenize a file or a sequence of data, use:
```euphoria
tokens = tokenize(file)
```
Name | Description |
---|---|
tokens | A list of tokens in the form {{token,line,col},{token,line,col}...} |
file | A file number or a file name |
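For instance (a sketch: `myprog.ex` is a placeholder file name, and the delimiters are assumed to have been registered beforehand):

```euphoria
include tokens.e

sequence tokens
tokens = tokenize("myprog.ex")   -- a file name; an open file number also works

-- each element is {token, line, col}
for i = 1 to length(tokens) do
    printf(1, "%d:%d  %s\n", {tokens[i][2], tokens[i][3], tokens[i][1]})
end for
```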
To tokenize a string, use:
```euphoria
tokens = tokenize_string(data)
```
Name | Description |
---|---|
tokens | A list of tokens in the form {{token,line,col},{token,line,col}...} |
data | The string to tokenize |
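For instance (a sketch, assuming `" "` has been registered as a whitespace delimiter and `"+="` as an included delimiter):

```euphoria
include tokens.e

sequence tokens
tokens = tokenize_string("x += 1")
-- yields one {token,line,col} entry each for "x", "+=" and "1"
```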
Procedure | Description |
---|---|
addWhitespaceDelimiter(delim) | Adds a whitespace delimiter, e.g. `addWhitespaceDelimiter(" ")` or `addWhitespaceDelimiter("\t")`. |
addNewLineDelimiter(delim) | Adds a new-line delimiter, e.g. `addNewLineDelimiter("\n")`, `addNewLineDelimiter("\r")` or `addNewLineDelimiter("\r\n")`. |
addIncludedDelimiter(delim) | Adds an included delimiter (one that becomes a token itself, such as an operator), e.g. `addIncludedDelimiter("+")`, `addIncludedDelimiter("+=")`, `addIncludedDelimiter("(")` or `addIncludedDelimiter("}")`. |
addStringDelimiter(delim) | Adds a string delimiter, e.g. `addStringDelimiter("'")` or `addStringDelimiter("\"")`. |
addLineComment(delim) | Adds a single-line comment delimiter, e.g. `addLineComment("--")`. |
addBlockComment(start,end) | Adds block comment syntax, starting with start and ending with end, e.g. `addBlockComment("/*","*/")`. |
addNonDelimiter(nondelim) | When nondelim is encountered, it is added to the current token instead of being treated as a delimiter. See the note below. |
addSpecialDelimiter(delim,routine) | When delim is encountered, routine is called. The routine may be given by name or by routine_id. See the first example below. |
addExtendedDelimiter(delim,routine,extra) | When delim is encountered, routine is called, with extra stored in DELIMITERS[whichOne][3]. See the second example below. |

A note on addNonDelimiter: the tokenizer sorts its delimiter list from longest to shortest, so that it does not call the procedure for `+` when the input actually contains `+=`. Because `""` sorts last, `addNonDelimiter("")` acts as a fallback that fires only when no other delimiter matches. You could also register something like `"a"` to always add it to the current token, though I don't know of any reason why you would want to.

An example of addSpecialDelimiter, using a handler that collects a single-line comment into one token:

```euphoria
global procedure processLineComment(integer whichOne)
    if length(token[1]) then
        tokens = append(tokens,token)
    end if
    token = {"",curline,curcol}
    while 1 do
        if cchar > length(file_data) then
            exit
        end if
        if isNewLine() then
            exit
        end if
        token[1] = token[1] & file_data[cchar]
        cchar = cchar + 1
        curcol = curcol + 1
    end while
    tokens = append(tokens,token)
    token = {"",curline,curcol}
end procedure

addSpecialDelimiter("--","processLineComment")
-- or --
addSpecialDelimiter("--",routine_id("processLineComment"))
```

An example of addExtendedDelimiter, using a handler that collects a block comment into one token; the extra value stored in DELIMITERS[whichOne][3] serves as the closing delimiter:

```euphoria
global procedure processBlockComment(integer whichOne)
    if length(token[1]) then
        tokens = append(tokens,token)
    end if
    token = {DELIMITERS[whichOne][1],curline,curcol}
    curcol = curcol + length(token[1])
    cchar = cchar + length(token[1])
    c = DELIMITERS[whichOne][3]  -- cmp() tests for c at the current position
    while 1 do
        if cchar > length(file_data) then
            exit
        end if
        if cmp() then
            token[1] = token[1] & DELIMITERS[whichOne][3]
            curcol = curcol + length(DELIMITERS[whichOne][3])
            cchar = cchar + length(DELIMITERS[whichOne][3])
            exit
        end if
        token[1] = token[1] & file_data[cchar]
        if isNewLine() then
            cchar = cchar + length(DELIMITERS[isNewLine()][1])
            curcol = 0
            curline = curline + 1
        else
            cchar = cchar + 1
            curcol = curcol + 1
        end if
    end while
    tokens = append(tokens,token)
    token = {"",curline,curcol}
end procedure

addExtendedDelimiter("/*","processBlockComment","*/")
```
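Putting the registration procedures together, a minimal setup for a small C-like language might look like this. This is a sketch: processBlockComment refers to a user-written handler like the one in the addExtendedDelimiter example, and the exact token boundaries depend on the delimiters you register.

```euphoria
include tokens.e

-- whitespace and line endings
addWhitespaceDelimiter(" ")
addWhitespaceDelimiter("\t")
addNewLineDelimiter("\r\n")
addNewLineDelimiter("\n")

-- operators and punctuation; the tokenizer tries longer
-- delimiters first, so "+=" is matched before "+"
addIncludedDelimiter("+=")
addIncludedDelimiter("+")
addIncludedDelimiter("(")
addIncludedDelimiter(")")

-- strings and comments
addStringDelimiter("\"")
addLineComment("//")
addExtendedDelimiter("/*","processBlockComment","*/")

sequence tokens
tokens = tokenize_string("a += f(b) /* comment */")
```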