ezEngine  Milestone 7
ezTokenizer Class Reference

Takes text and splits it up into ezToken objects. The result can be used for easier parsing.

#include <Tokenizer.h>

Public Member Functions

 ezTokenizer ()
void Tokenize (const ezDynamicArray< ezUInt8 > &Data, ezLogInterface *pLog)
 Clears any previous result and creates a new token stream for the given array.
const ezDeque< ezToken > & GetTokens () const
 Gives read access to the token stream.
ezDeque< ezToken > & GetTokens ()
 Gives read and write access to the token stream.
ezResult GetNextLine (ezUInt32 &uiFirstToken, ezHybridArray< const ezToken *, 32 > &Tokens) const
 Returns an array of tokens that represent the next line in the file.
ezResult GetNextLine (ezUInt32 &uiFirstToken, ezHybridArray< ezToken *, 32 > &Tokens)

Private Member Functions

void NextChar ()
void AddToken ()
void HandleUnknown ()
void HandleString1 ()
void HandleString2 ()
void HandleLineComment ()
void HandleBlockComment ()
void HandleWhitespace ()
void HandleIdentifier ()
void HandleNonIdentifier ()

Private Attributes

ezTokenType::Enum m_CurMode
ezStringView m_Iterator
ezUInt32 m_uiCurLine
ezUInt32 m_uiCurColumn
ezUInt32 m_uiCurChar
ezUInt32 m_uiNextChar
ezUInt32 m_uiLastLine
ezUInt32 m_uiLastColumn
const char * m_szCurCharStart
const char * m_szNextCharStart
const char * m_szTokenStart
ezDeque< ezToken > m_Tokens
ezDynamicArray< ezUInt8 > m_Data

Detailed Description

Takes text and splits it up into ezToken objects. The result can be used for easier parsing.

The tokenizer is built to work on code that is similar to C. It therefore tokenizes comments and strings as the C language defines them. A line that ends with a backslash continues on the next line, so that line break is not treated as a real line break.
Whitespace consists of spaces and tabs.
Identifiers are names that consist of alphanumerics and underscores.
Non-identifiers are everything else. However, they currently never consist of more than a single character, i.e. '++' is tokenized as two consecutive non-identifiers.
Parentheses etc. are not tokenized in any special way; they are all treated as non-identifiers.

The token stream will always end with an end-of-file token.
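
The classification rules above can be illustrated with a minimal standalone sketch. It is not the ezTokenizer implementation: SketchTokenType, SketchToken and SketchTokenize are hypothetical names, comments and string literals are left out, and merging a run of spaces and tabs into one whitespace token is an assumption made for brevity.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins. These are NOT the ezEngine types.
enum class SketchTokenType { Whitespace, Newline, Identifier, NonIdentifier, EndOfFile };

struct SketchToken
{
  SketchTokenType type;
  std::string text;
};

// Identifiers consist of alphanumerics and underscores.
static bool IsIdentifierChar(char c)
{
  return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Splits 'input' into tokens following the rules described above.
// Comments and string literals are deliberately not handled in this sketch.
std::vector<SketchToken> SketchTokenize(const std::string& input)
{
  std::vector<SketchToken> tokens;
  std::size_t i = 0;

  while (i < input.size())
  {
    const char c = input[i];
    const std::size_t start = i;

    if (c == ' ' || c == '\t')
    {
      // Assumption: a run of spaces and tabs becomes one whitespace token.
      while (i < input.size() && (input[i] == ' ' || input[i] == '\t'))
        ++i;
      tokens.push_back({SketchTokenType::Whitespace, input.substr(start, i - start)});
    }
    else if (c == '\n')
    {
      ++i;
      tokens.push_back({SketchTokenType::Newline, "\n"});
    }
    else if (IsIdentifierChar(c))
    {
      while (i < input.size() && IsIdentifierChar(input[i]))
        ++i;
      tokens.push_back({SketchTokenType::Identifier, input.substr(start, i - start)});
    }
    else
    {
      // Everything else is a non-identifier of exactly one character.
      ++i;
      tokens.push_back({SketchTokenType::NonIdentifier, std::string(1, c)});
    }
  }

  // The stream always ends with an end-of-file token.
  tokens.push_back({SketchTokenType::EndOfFile, ""});
  return tokens;
}
```

With this sketch, the input "i++ x" yields an identifier, two separate '+' non-identifiers, one whitespace token, another identifier and the closing end-of-file token.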

Member Function Documentation

ezResult ezTokenizer::GetNextLine (ezUInt32 &uiFirstToken, ezHybridArray< const ezToken *, 32 > &Tokens) const

Returns an array of tokens that represent the next line in the file.

Returns EZ_SUCCESS when there was more data to return, EZ_FAILURE if the end of the file was reached already. uiFirstToken is the index from where to start. It will be updated automatically. Consecutive calls to GetNextLine() with the same uiFirstToken variable will give one line after the other.

This function takes care of handling the 'backslash/newline' combination, as defined in the C language. All such sequences are skipped, so the tokens that are returned as one line might not contain all tokens that are actually in the stream. The returned tokens might also have different line numbers when two or more lines from the file are merged into one logical line.
Theoretically, if the line ends with an identifier, and the next directly starts with one again,
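
The line-by-line iteration and the backslash/newline handling described for GetNextLine() can be sketched in a standalone form. This is an illustration under assumptions, not the ezEngine code: LineTokenType, LineToken and SketchGetNextLine are hypothetical names, a bool return value stands in for EZ_SUCCESS / EZ_FAILURE, and including the terminating newline token in the returned line is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins. These are NOT the ezEngine types.
enum class LineTokenType { Identifier, NonIdentifier, Whitespace, Newline, EndOfFile };

struct LineToken
{
  LineTokenType type;
  std::string text;
  unsigned line; // line number in the original file
};

// Collects the tokens of the next logical line, starting at 'firstToken'.
// 'firstToken' is advanced so that the next call continues where this one stopped.
// Returns true if any tokens were returned, false once the end of the stream is reached.
bool SketchGetNextLine(const std::vector<LineToken>& stream,
                       std::size_t& firstToken,
                       std::vector<const LineToken*>& out)
{
  out.clear();
  if (stream.empty())
    return false;

  // The last token is the end-of-file token; it is never returned as part of a line.
  const std::size_t last = stream.size() - 1;

  while (firstToken < last)
  {
    const LineToken& cur = stream[firstToken];

    // A backslash directly followed by a newline is not a real line break: skip both.
    if (cur.type == LineTokenType::NonIdentifier && cur.text == "\\" &&
        stream[firstToken + 1].type == LineTokenType::Newline)
    {
      firstToken += 2;
      continue;
    }

    out.push_back(&cur);
    ++firstToken;

    // Assumption: the newline token that ends the line is part of the result.
    if (cur.type == LineTokenType::Newline)
      return true;
  }

  return !out.empty();
}
```

Calling the function in a loop with the same index variable, for example `while (SketchGetNextLine(stream, index, line)) { ... }`, visits one logical line after the other. Tokens taken from a continued line keep their original line numbers, which is why a single returned line can span several file lines.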

The documentation for this class was generated from the following files: