ezEngine Milestone 7
ezTokenizer Class Reference

Takes text and splits it up into ezToken objects. The result can be used for easier parsing.

#include <Tokenizer.h>

Public Member Functions

 ezTokenizer ()
 Constructor.
 
void Tokenize (const ezDynamicArray< ezUInt8 > &Data, ezLogInterface *pLog)
 Clears any previous result and creates a new token stream for the given array.
 
const ezDeque< ezToken > & GetTokens () const
 Gives read access to the token stream.
 
ezDeque< ezToken > & GetTokens ()
 Gives read and write access to the token stream.
 
ezResult GetNextLine (ezUInt32 &uiFirstToken, ezHybridArray< const ezToken *, 32 > &Tokens) const
 Returns an array of tokens that represent the next line in the file.
 
ezResult GetNextLine (ezUInt32 &uiFirstToken, ezHybridArray< ezToken *, 32 > &Tokens)
 Non-const overload of GetNextLine(); the returned tokens can be modified.
 

Private Member Functions

void NextChar ()
 
void AddToken ()
 
void HandleUnknown ()
 
void HandleString1 ()
 
void HandleString2 ()
 
void HandleLineComment ()
 
void HandleBlockComment ()
 
void HandleWhitespace ()
 
void HandleIdentifier ()
 
void HandleNonIdentifier ()
 

Private Attributes

ezLogInterface * m_pLog
 
ezTokenType::Enum m_CurMode
 
ezStringView m_Iterator
 
ezUInt32 m_uiCurLine
 
ezUInt32 m_uiCurColumn
 
ezUInt32 m_uiCurChar
 
ezUInt32 m_uiNextChar
 
ezUInt32 m_uiLastLine
 
ezUInt32 m_uiLastColumn
 
const char * m_szCurCharStart
 
const char * m_szNextCharStart
 
const char * m_szTokenStart
 
ezDeque< ezToken > m_Tokens
 
ezDynamicArray< ezUInt8 > m_Data
 

Detailed Description


The tokenizer is designed for C-like code: comments and strings are tokenized as the C language defines them. In addition, a line break preceded by a backslash is not treated as a real line break.
White space is defined as spaces and tabs.
Identifiers are names that consist of alphanumeric characters and underscores.
Non-identifiers are everything else. Currently, a non-identifier token never consists of more than a single character; for example, '++' is tokenized as two consecutive non-identifiers.
Parentheses and similar punctuation are not tokenized in any special way; they are all treated as non-identifiers.

The token stream will always end with an end-of-file token.
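The rules above can be illustrated with a small standalone sketch. It is not ezEngine's implementation: it reads a std::string instead of an ezDynamicArray< ezUInt8 >, and the TokenType values, the Token struct, the free function Tokenize, treating '\n' as its own Newline token, and folding single- and double-quoted strings into one String category are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token categories for this sketch; they loosely mirror the
// rules described above but are not ezEngine's ezTokenType.
enum class TokenType { Whitespace, Newline, Identifier, NonIdentifier,
                       String, LineComment, BlockComment, EndOfFile };

struct Token {
  TokenType type;
  std::string text;
  unsigned line;  // 1-based line on which the token starts
};

static bool IsIdentChar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

std::vector<Token> Tokenize(const std::string& src) {
  std::vector<Token> tokens;
  std::size_t i = 0;
  unsigned line = 1;
  while (i < src.size()) {
    const std::size_t start = i;
    const unsigned startLine = line;
    const char c = src[i];
    TokenType type;
    if (c == ' ' || c == '\t') {  // white space: spaces and tabs only
      type = TokenType::Whitespace;
      while (i < src.size() && (src[i] == ' ' || src[i] == '\t')) ++i;
    } else if (c == '\n') {
      type = TokenType::Newline;
      ++i;
      ++line;
    } else if (c == '/' && i + 1 < src.size() && src[i + 1] == '/') {
      type = TokenType::LineComment;  // runs up to, not including, '\n'
      while (i < src.size() && src[i] != '\n') ++i;
    } else if (c == '/' && i + 1 < src.size() && src[i + 1] == '*') {
      type = TokenType::BlockComment;  // runs up to and including "*/"
      i += 2;
      while (i < src.size() && !(src[i] == '*' && i + 1 < src.size() && src[i + 1] == '/')) {
        if (src[i] == '\n') ++line;
        ++i;
      }
      if (i < src.size()) i += 2;  // consume "*/" if the comment is terminated
    } else if (c == '"' || c == '\'') {
      type = TokenType::String;  // quoted string, honoring backslash escapes
      ++i;
      while (i < src.size() && src[i] != c) {
        if (src[i] == '\\' && i + 1 < src.size()) ++i;  // skip the escaped char
        if (src[i] == '\n') ++line;
        ++i;
      }
      if (i < src.size()) ++i;  // closing quote
    } else if (IsIdentChar(c)) {
      type = TokenType::Identifier;
      while (i < src.size() && IsIdentChar(src[i])) ++i;
    } else {
      type = TokenType::NonIdentifier;  // always exactly one character
      ++i;
    }
    tokens.push_back({type, src.substr(start, i - start), startLine});
  }
  tokens.push_back({TokenType::EndOfFile, "", line});  // stream always ends with EOF
  return tokens;
}
```

For instance, `a++` yields one identifier followed by two single-character non-identifier tokens, and a backslash at the end of a line is simply emitted as a non-identifier followed by a newline token, leaving line splicing to the line-reading stage.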

Member Function Documentation

ezResult ezTokenizer::GetNextLine (ezUInt32 & uiFirstToken, ezHybridArray< const ezToken *, 32 > & Tokens) const

Returns an array of tokens that represent the next line in the file.

Returns EZ_SUCCESS if there was more data to return, and EZ_FAILURE if the end of the file had already been reached. uiFirstToken is the index of the token at which to start; it is updated automatically, so consecutive calls to GetNextLine() with the same uiFirstToken variable return one line after the other.
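The cursor contract described above can be sketched without ezEngine as follows. This is a simplified model, not the real implementation: std::vector replaces ezDeque and ezHybridArray, bool stands in for ezResult, line splicing is ignored, and the Token/TokenType stand-ins as well as the choice to exclude the newline token from the returned line are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Minimal hypothetical stand-ins, not ezEngine types.
enum class TokenType { Other, Newline, EndOfFile };
struct Token { TokenType type; std::string text; };

// Collects the tokens of the line starting at 'firstToken' into 'line'
// (excluding the newline token) and advances 'firstToken' past that line.
// Returns false (stand-in for EZ_FAILURE) once only end-of-file is left.
// Relies on the stream ending with an EndOfFile token.
bool GetNextLine(const std::vector<Token>& tokens, std::size_t& firstToken,
                 std::vector<const Token*>& line) {
  line.clear();
  if (firstToken >= tokens.size() || tokens[firstToken].type == TokenType::EndOfFile)
    return false;
  while (tokens[firstToken].type != TokenType::EndOfFile) {
    const Token& t = tokens[firstToken++];
    if (t.type == TokenType::Newline) break;  // the line ends here
    line.push_back(&t);
  }
  return true;
}
```

A caller keeps one cursor variable for the whole pass and loops while the call succeeds, e.g. `while (GetNextLine(tokens, cursor, line)) { /* process line */ }`.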

Note
This function handles the backslash/newline combination as defined in the C language: all such sequences are skipped. Therefore the tokens returned for one line might not include every token that is actually in the stream. Also, when two or more physical lines are merged into one logical line, the returned tokens can have different line numbers.
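The splicing behavior in this note can be modeled as shown below. It is a hedged sketch, not ezEngine's code: the function name GetNextLogicalLine, the Token/TokenType stand-ins, detecting the backslash by its text, and requiring the newline token to follow the backslash directly are all assumptions. It shows both effects mentioned above: the backslash and newline tokens are left out, and the returned tokens carry different line numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins; 'line' is the physical source line of the token.
enum class TokenType { Other, Newline, EndOfFile };
struct Token { TokenType type; std::string text; unsigned line; };

// Returns the next logical line. A backslash token directly followed by a
// newline token is dropped, and the line continues on the next physical line.
// Returns false once only the end-of-file token is left.
bool GetNextLogicalLine(const std::vector<Token>& tokens, std::size_t& firstToken,
                        std::vector<const Token*>& out) {
  out.clear();
  if (firstToken >= tokens.size() || tokens[firstToken].type == TokenType::EndOfFile)
    return false;
  while (tokens[firstToken].type != TokenType::EndOfFile) {
    const Token& t = tokens[firstToken];
    // Safe to look one ahead: t is not EOF, so the EOF token lies further on.
    const bool splice = t.text == "\\" &&
                        tokens[firstToken + 1].type == TokenType::Newline;
    if (splice) {
      firstToken += 2;  // skip the backslash and the newline
      continue;
    }
    ++firstToken;
    if (t.type == TokenType::Newline) break;  // real end of the logical line
    out.push_back(&t);
  }
  return true;
}
```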
Todo:
Theoretically, if the line ends with an identifier, and the next directly starts with one again,
