uima::TokenProperties Class Reference


Detailed Description

The class TokenProperties encapsulates information about the kinds of characters occurring in a token (for example, upper-case and lower-case letters).

At its core it is a bitset, wrapped in inline member functions for convenient access. Each compliant tokenizer has to fill it in and store it with each token.
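The following is a minimal sketch of the bitset-plus-accessors design described above, not the library's implementation. The class name SketchTokenProperties, the enum SketchBit with its bit positions, and the helper propertiesForHello() are hypothetical; the real UIMA_TOKEN_PROP_* values are not given on this page.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical bit positions for illustration; the real UIMA_TOKEN_PROP_*
// values are defined by the library and are not reproduced here.
enum SketchBit {
    kLeadingUpper = 0,
    kTrailingUpper = 1,
    kLower = 2,
    kNumeric = 3,
    kSpecial = 4,
    kBitCount = 5
};

// Stand-in for the documented class: a bitset plus inline accessors.
class SketchTokenProperties {
public:
    SketchTokenProperties() = default;  // std::bitset starts with all bits zero

    bool hasLeadingUpper() const { return bits_.test(kLeadingUpper); }
    void setLeadingUpper(bool on = true) { bits_.set(kLeadingUpper, on); }
    bool hasTrailingUpper() const { return bits_.test(kTrailingUpper); }
    void setTrailingUpper(bool on = true) { bits_.set(kTrailingUpper, on); }
    bool hasUpper() const { return hasLeadingUpper() || hasTrailingUpper(); }
    bool hasLower() const { return bits_.test(kLower); }
    void setLower(bool on = true) { bits_.set(kLower, on); }

    void reset() { bits_.reset(); }
    unsigned long to_ulong() const { return bits_.to_ulong(); }

private:
    std::bitset<kBitCount> bits_;
};

// What a tokenizer might record after scanning the token "Hello".
inline SketchTokenProperties propertiesForHello() {
    SketchTokenProperties props;
    props.setLeadingUpper();  // 'H'
    props.setLower();         // "ello"
    return props;
}
```

In this sketch hasUpper() is derived from the two upper-case bits rather than stored, which matches the fact that the class documents no setUpper() member.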

Public Member Functions

Constructors
 TokenProperties (void)
 Constructs an object, initializing all bit values to zero.
 TokenProperties (const icu::UnicodeString &ustrInputString)
 Constructs an object from a UnicodeString, computing the bit values for the string.
 TokenProperties (const UnicodeStringRef &ulstrInputString)
 Constructs an object from a UnicodeStringRef, computing the bit values for the string.
 TokenProperties (const UChar *cpucCurrent, const UChar *cpucEnd)
 Constructs an object from two pointers, computing the bit values for the string.
 TokenProperties (WORD32 w32Val)
 Initializes the bits to the value of w32Val.
Properties
bool hasLeadingUpper (void) const
 true if the first char in the token is upper case
void setLeadingUpper (bool bSetOn=true)
 sets the hasLeadingUpper() property to bSetOn
bool hasTrailingUpper (void) const
 true if some char after the first char in the token is upper case
void setTrailingUpper (bool bSetOn=true)
 sets the hasTrailingUpper() property to bSetOn
bool hasUpper (void) const
 true if the token has upper case chars (leading or trailing)
bool hasLower (void) const
 true if the token has lower case chars
void setLower (bool bSetOn=true)
 sets the hasLower() property to bSetOn
bool hasNumeric (void) const
 true if the token has numeric chars
void setNumeric (bool bSetOn=true)
 sets the hasNumeric() property to bSetOn
bool hasSpecial (void) const
 true if the token has special chars (e.g. hyphen, period etc.)
void setSpecial (bool bSetOn=true)
 sets the hasSpecial() property to bSetOn
Miscellaneous
bool isPlainWord () const
 true if not hasSpecial() and not hasNumeric()
bool isAllUppercaseWord (void) const
 true if only hasUpper()
bool isAllLowercaseWord (void) const
 true if only hasLower()
bool isInitialUppercaseWord (void) const
 true if only hasLeadingUpper() and hasTrailingUpper()
bool isPlainNumber () const
 true if hasNumeric() && !(hasLower() || hasUpper()). Note: the token might contain a decimal point and sign.
bool isPureNumber () const
 unlike isPlainNumber() this only allows for digits (no sign and point)
bool isPureSpecial () const
 true if hasSpecial() && !(hasLower() || hasUpper() || hasNumeric()). Note: the token might contain a decimal point and sign.
void reset (void)
 Resets all bits in *this.
void initFromString (const UChar *cpucCurrent, const UChar *cpucEnd)
 Resets all bits and reinitializes from the string.
std::string to_string (void) const
 Returns an object of type std::string, N characters long, where N is the number of bits in the underlying bitset.
unsigned long to_ulong (void) const
 Returns the integral value corresponding to the bits in *this.

Constructor & Destructor Documentation

uima::TokenProperties::TokenProperties ( void   )  [inline]

Constructs an object, initializing all bit values to zero.

uima::TokenProperties::TokenProperties ( const icu::UnicodeString &  ustrInputString  ) 

Constructs an object from a UnicodeString, computing the bit values for the string.

uima::TokenProperties::TokenProperties ( const UnicodeStringRef &  ulstrInputString  ) 

Constructs an object from a UnicodeStringRef, computing the bit values for the string.

uima::TokenProperties::TokenProperties ( const UChar *  cpucCurrent,
const UChar *  cpucEnd 
)

Constructs an object from two pointers, computing the bit values for the string.

Note: cpucEnd points beyond the end of the string
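As a sketch of how the bits could be computed from such a pointer pair, the function below scans the half-open range [current, end). It is an illustration under stated assumptions, not the library's algorithm: char16_t stands in for UChar, the ASCII-only helpers stand in for Unicode-aware character tests, every other character is counted as special, and the SketchBit positions and the name scanToken are hypothetical.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical bit positions; not the real UIMA_TOKEN_PROP_* values.
enum SketchBit {
    kLeadingUpper = 0,
    kTrailingUpper = 1,
    kLower = 2,
    kNumeric = 3,
    kSpecial = 4,
    kBitCount = 5
};

// ASCII-only stand-ins for the Unicode-aware tests a real tokenizer would use.
inline bool isAsciiUpper(char16_t c) { return c >= u'A' && c <= u'Z'; }
inline bool isAsciiLower(char16_t c) { return c >= u'a' && c <= u'z'; }
inline bool isAsciiDigit(char16_t c) { return c >= u'0' && c <= u'9'; }

// Computes the property bits for the characters in [current, end).
// end points one past the last character, so an empty range sets no bits.
inline std::bitset<kBitCount> scanToken(const char16_t* current,
                                        const char16_t* end) {
    assert(current <= end);  // precondition: a valid half-open range
    std::bitset<kBitCount> bits;
    for (const char16_t* p = current; p != end; ++p) {
        if (isAsciiUpper(*p)) {
            // The first character sets the leading bit, later ones the trailing bit.
            bits.set(p == current ? kLeadingUpper : kTrailingUpper);
        } else if (isAsciiLower(*p)) {
            bits.set(kLower);
        } else if (isAsciiDigit(*p)) {
            bits.set(kNumeric);
        } else {
            bits.set(kSpecial);  // assumption: anything else counts as special
        }
    }
    return bits;
}
```

For the token "iPhone-7" this sketch sets the trailing-upper, lower, numeric, and special bits, and leaves the leading-upper bit clear.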

uima::TokenProperties::TokenProperties ( WORD32  w32Val  )  [inline]

Initializes the bits to the value of w32Val.


Member Function Documentation

bool uima::TokenProperties::hasLeadingUpper ( void   )  const [inline]

true if the first char in the token is upper case

References UIMA_TOKEN_PROP_LEADING_UPPER.

void uima::TokenProperties::setLeadingUpper ( bool  bSetOn = true  )  [inline]

sets the hasLeadingUpper() property to bSetOn

References UIMA_TOKEN_PROP_LEADING_UPPER.

bool uima::TokenProperties::hasTrailingUpper ( void   )  const [inline]

true if some char after the first char in the token is upper case

References UIMA_TOKEN_PROP_TRAILING_UPPER.

void uima::TokenProperties::setTrailingUpper ( bool  bSetOn = true  )  [inline]

sets the hasTrailingUpper() property to bSetOn

References UIMA_TOKEN_PROP_TRAILING_UPPER.

bool uima::TokenProperties::hasUpper ( void   )  const [inline]

true if the token has upper case chars (leading or trailing)

References UIMA_TOKEN_PROP_LEADING_UPPER, and UIMA_TOKEN_PROP_TRAILING_UPPER.

bool uima::TokenProperties::hasLower ( void   )  const [inline]

true if the token has lower case chars

References UIMA_TOKEN_PROP_LOWER.

void uima::TokenProperties::setLower ( bool  bSetOn = true  )  [inline]

sets the hasLower() property to bSetOn

References UIMA_TOKEN_PROP_LOWER.

bool uima::TokenProperties::hasNumeric ( void   )  const [inline]

true if the token has numeric chars

References UIMA_TOKEN_PROP_NUMERIC.

void uima::TokenProperties::setNumeric ( bool  bSetOn = true  )  [inline]

sets the hasNumeric() property to bSetOn

References UIMA_TOKEN_PROP_NUMERIC.

bool uima::TokenProperties::hasSpecial ( void   )  const [inline]

true if the token has special chars (e.g. hyphen, period etc.)

References UIMA_TOKEN_PROP_SPECIAL.

void uima::TokenProperties::setSpecial ( bool  bSetOn = true  )  [inline]

sets the hasSpecial() property to bSetOn

References UIMA_TOKEN_PROP_SPECIAL.

bool uima::TokenProperties::isPlainWord ( void   )  const [inline]

true if not hasSpecial() and not hasNumeric()

bool uima::TokenProperties::isAllUppercaseWord ( void   )  const [inline]

true if only hasUpper()

bool uima::TokenProperties::isAllLowercaseWord ( void   )  const [inline]

true if only hasLower()

References UIMA_TOKEN_PROP_LOWER.

bool uima::TokenProperties::isInitialUppercaseWord ( void   )  const [inline]

true if only hasLeadingUpper() and hasTrailingUpper()

bool uima::TokenProperties::isPlainNumber (  )  const [inline]

true if hasNumeric() && !(hasLower() || hasUpper()). Note: the token might contain a decimal point and sign.

References UIMA_TOKEN_PROP_NUMERIC, and UIMA_TOKEN_PROP_SPECIAL.

bool uima::TokenProperties::isPureNumber (  )  const [inline]

unlike isPlainNumber() this only allows for digits (no sign and point)

References UIMA_TOKEN_PROP_NUMERIC.

bool uima::TokenProperties::isPureSpecial (  )  const [inline]

true if hasSpecial() && !(hasLower() || hasUpper() || hasNumeric()). Note: the token might contain a decimal point and sign.

References UIMA_TOKEN_PROP_SPECIAL.
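The predicates documented above can be read as tests on combinations of bits. The sketch below expresses some of them as free functions over a std::bitset; it is one possible reading of the descriptions, not the library's code. The SketchBit positions, the Bits alias, and the mask() helper are hypothetical, and treating isPureNumber() as "only the numeric bit is set" is an assumption based on its description.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical bit positions; not the real UIMA_TOKEN_PROP_* values.
enum SketchBit {
    kLeadingUpper = 0,
    kTrailingUpper = 1,
    kLower = 2,
    kNumeric = 3,
    kSpecial = 4,
    kBitCount = 5
};

using Bits = std::bitset<kBitCount>;

// A bitset with exactly one bit set.
inline Bits mask(SketchBit bit) { return Bits(1ul << bit); }

inline bool hasUpper(const Bits& b) {
    return b.test(kLeadingUpper) || b.test(kTrailingUpper);
}

// not hasSpecial() and not hasNumeric()
inline bool isPlainWord(const Bits& b) {
    return !b.test(kSpecial) && !b.test(kNumeric);
}

// only hasLower(): no other bit may be set
inline bool isAllLowercaseWord(const Bits& b) { return b == mask(kLower); }

// hasNumeric() && !(hasLower() || hasUpper()); special characters are allowed
inline bool isPlainNumber(const Bits& b) {
    return b.test(kNumeric) && !(b.test(kLower) || hasUpper(b));
}

// digits only (assumed here to mean: only the numeric bit is set)
inline bool isPureNumber(const Bits& b) { return b == mask(kNumeric); }

// hasSpecial() && !(hasLower() || hasUpper() || hasNumeric())
inline bool isPureSpecial(const Bits& b) {
    return b.test(kSpecial) &&
           !(b.test(kLower) || hasUpper(b) || b.test(kNumeric));
}
```

Under these definitions a token such as "-3.5" (numeric and special bits) is a plain number but not a pure number, while "42" (numeric bit only) is both.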

void uima::TokenProperties::reset ( void   )  [inline]

Resets all bits in *this.

void uima::TokenProperties::initFromString ( const UChar *  cpucCurrent,
const UChar *  cpucEnd 
)

Resets all bits and reinitializes from the string.

std::string uima::TokenProperties::to_string ( void   )  const

Returns an object of type std::string, N characters long, where N is the number of bits in the underlying bitset.

Each position in the new string is initialized with a character ('0' for zero and '1' for one), representing the value stored in the corresponding bit position of *this. Character position N - 1 corresponds to bit position 0. Subsequent decreasing character positions correspond to increasing bit positions.
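The character ordering described above is the same one std::bitset::to_string() uses, so it can be illustrated with the standard library alone. In this sketch the width of 8 bits and the function name sketchToString are assumptions chosen for readability; the actual N of the class is not stated on this page.

```cpp
#include <bitset>
#include <cassert>
#include <string>

// Renders the low 8 bits of value as a string of '0' and '1' characters.
// The last character (position N - 1) holds bit 0; earlier characters hold
// higher bits. For example, the value 5 (bits 0 and 2 set) gives "00000101".
inline std::string sketchToString(unsigned long value) {
    return std::bitset<8>(value).to_string();
}
```

Reading the string from right to left therefore walks the bit positions in increasing order.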

unsigned long uima::TokenProperties::to_ulong ( void   )  const [inline]

Returns the integral value corresponding to the bits in *this.
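Taken together with the WORD32 constructor, to_ulong() suggests that a property set can be saved as an integer and restored later. The sketch below shows that round trip with a plain std::bitset. It assumes WORD32 is a 32-bit unsigned integer (modelled here as std::uint32_t), and the function name roundTrip is hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Builds a bitset from a 32-bit value and converts it back to an integer.
// With 32 bits of storage every input value is preserved unchanged.
inline unsigned long roundTrip(std::uint32_t w32Val) {
    std::bitset<32> bits(w32Val);  // analogous to TokenProperties(WORD32)
    return bits.to_ulong();        // analogous to to_ulong()
}
```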


The documentation for this class was generated from the following file:

Generated on Mon Oct 1 11:15:09 2012 for UIMACPP API by  doxygen 1.5.6