
uima::UnicodeStringRef Class Reference



Detailed Description

The class UnicodeStringRef provides support for strings that are not zero-terminated and are represented as a pointer to a Unicode character array with an associated length.

Because this type of string is intended only as a reference into read-only buffers, the string pointer is constant. The member functions are named after those of icu::UnicodeString, but only its const (non-modifying) member functions are mirrored. This class is a quick, lightweight, shallow string (internally it consists only of a pointer and a length) that can be copied by value without a performance penalty. It allows references into other string buffers to be treated like real string objects. Since it does not own its string memory, care must be taken to ensure that the lifetime of a UnicodeStringRef object does not exceed the lifetime of the Unicode character buffer it references.
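To make the ownership model concrete, here is a minimal sketch of a pointer-plus-length string reference. It uses standard C++ `char16_t` in place of ICU's `UChar`, and the class `U16Ref` and the helper `slice` are hypothetical names; this is not the UIMA implementation, only an illustration of why copying is cheap and why the referenced buffer must outlive the reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a shallow UTF-16 string reference (not the UIMA
// class): it stores only a pointer and a length and never owns the characters.
class U16Ref {
public:
    U16Ref() : chars_(nullptr), length_(0) {}

    U16Ref(const char16_t* chars, int32_t length) : chars_(chars), length_(length) {
        assert(length >= 0);
        assert(chars != nullptr || length == 0);
    }

    // Points into the string's buffer; the string must outlive this object.
    explicit U16Ref(const std::u16string& s)
        : chars_(s.data()), length_(static_cast<int32_t>(s.size())) {}

    const char16_t* getBuffer() const { return chars_; }  // not zero-terminated in general
    int32_t length() const { return length_; }
    bool isEmpty() const { return length_ == 0; }

private:
    const char16_t* chars_;  // borrowed, read-only
    int32_t length_;         // number of UTF-16 code units
};

// Returns a reference to part of an existing buffer; no characters are copied.
inline U16Ref slice(const std::u16string& owner, int32_t start, int32_t length) {
    assert(start >= 0 && length >= 0);
    assert(static_cast<std::size_t>(start) + static_cast<std::size_t>(length) <= owner.size());
    return U16Ref(owner.data() + start, length);
}
```

Copying a `U16Ref` copies only the pointer and the length, so every copy still points into the original buffer and dangles once that buffer is destroyed or reallocated, which is the lifetime rule stated above.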

Public Member Functions

 UnicodeStringRef (void)
 Default Constructor.
 UnicodeStringRef (const icu::UnicodeString &crUniString)
 Constructor from icu::UnicodeString.
 UnicodeStringRef (UChar const *cpacString)
 Constructor from zero terminated string.
 UnicodeStringRef (UChar const *cpacString, int32_t uiLength)
 Constructor from string and length.
 UnicodeStringRef (UChar const *paucStringBegin, UChar const *paucStringEnd)
 Constructor from two pointers (begin/end).
int32_t getSizeInBytes (void) const
 Accessor for the number of bytes occupied by this string.
UChar const * getBuffer (void) const
 Const accessor for the string content (NOT zero-terminated!).
UnicodeStringRef & operator= (UnicodeStringRef const &crclRHS)
 Assignment operator.
int operator== (const UnicodeStringRef &crclRHS) const
 Equality operator.
int operator!= (const UnicodeStringRef &crclRHS) const
 Inequality operator.
bool operator< (UnicodeStringRef const &text) const
 Less-than operator.
bool operator<= (UnicodeStringRef const &text) const
 Less-than-or-equal operator.
bool operator> (UnicodeStringRef const &text) const
 Greater-than operator.
bool operator>= (UnicodeStringRef const &text) const
 Greater-than-or-equal operator.
int8_t compare (const UnicodeStringRef &text) const
 Compare the characters bitwise in this UnicodeStringRef to the characters in text.
int8_t compare (const icu::UnicodeString &text) const
 Compare the characters bitwise in this UnicodeStringRef to the characters in text.
int8_t compare (int32_t start, int32_t length, const UnicodeStringRef &srcText) const
 Compare the characters bitwise in the range [start, start + length) with the characters in srcText.
int8_t compare (int32_t start, int32_t length, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const
 Compare the characters bitwise in the range [start, start + length) with the characters in srcText in the range [srcStart, srcStart + srcLength).
int8_t compare (UChar const *srcChars, int32_t srcLength) const
 Compare the characters bitwise in this UnicodeStringRef with the first srcLength characters in srcChars.
int8_t compare (int32_t start, int32_t length, UChar const *srcChars) const
 Compare the characters bitwise in the range [start, start + length) with the first length characters in srcChars.
int8_t compare (int32_t start, int32_t length, UChar const *srcChars, int32_t srcStart, int32_t srcLength) const
 Compare the characters bitwise in the range [start, start + length) with the characters in srcChars in the range [srcStart, srcStart + srcLength).
int8_t compareBetween (int32_t start, int32_t limit, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLimit) const
 Compare the characters bitwise in the range [start, limit) with the characters in srcText in the range [srcStart, srcLimit).
int8_t compareCodePointOrder (const UnicodeStringRef &text) const
 Compare two Unicode strings in code point order.
int8_t compareCodePointOrder (int32_t start, int32_t length, const UnicodeStringRef &srcText) const
 Compare two Unicode strings in code point order.
int8_t compareCodePointOrder (int32_t start, int32_t length, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const
 Compare two Unicode strings in code point order.
int8_t compareCodePointOrder (UChar const *srcChars, int32_t srcLength) const
 Compare two Unicode strings in code point order.
int8_t compareCodePointOrder (int32_t start, int32_t length, UChar const *srcChars) const
 Compare two Unicode strings in code point order.
int8_t compareCodePointOrder (int32_t start, int32_t length, UChar const *srcChars, int32_t srcStart, int32_t srcLength) const
 Compare two Unicode strings in code point order.
int8_t compareCodePointOrderBetween (int32_t start, int32_t limit, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLimit) const
 Compare two Unicode strings in code point order.
int8_t caseCompare (const UnicodeStringRef &text, uint32_t options) const
 Compare two strings case-insensitively using full case folding.
int8_t caseCompare (int32_t start, int32_t length, const UnicodeStringRef &srcText, uint32_t options) const
 Compare two strings case-insensitively using full case folding.
int8_t caseCompare (int32_t start, int32_t length, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength, uint32_t options) const
 Compare two strings case-insensitively using full case folding.
int8_t caseCompare (UChar const *srcChars, int32_t srcLength, uint32_t options) const
 Compare two strings case-insensitively using full case folding.
int8_t caseCompare (int32_t start, int32_t length, UChar const *srcChars, uint32_t options) const
 Compare two strings case-insensitively using full case folding.
int8_t caseCompare (int32_t start, int32_t length, UChar const *srcChars, int32_t srcStart, int32_t srcLength, uint32_t options) const
 Compare two strings case-insensitively using full case folding.
int8_t caseCompareBetween (int32_t start, int32_t limit, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLimit, uint32_t options) const
 Compare two strings case-insensitively using full case folding.
bool startsWith (const UnicodeStringRef &text) const
 Determine if this starts with the characters in text.
bool startsWith (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const
 Determine if this starts with the characters in srcText in the range [srcStart, srcStart + srcLength).
bool startsWith (UChar const *srcChars, int32_t srcLength) const
 Determine if this starts with the characters in srcChars.
bool startsWith (UChar const *srcChars, int32_t srcStart, int32_t srcLength) const
 Determine if this starts with the characters in srcChars in the range [srcStart, srcStart + srcLength).
bool endsWith (const UnicodeStringRef &text) const
 Determine if this ends with the characters in text.
bool endsWith (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const
 Determine if this ends with the characters in srcText in the range [srcStart, srcStart + srcLength).
bool endsWith (UChar const *srcChars, int32_t srcLength) const
 Determine if this ends with the characters in srcChars.
bool endsWith (UChar const *srcChars, int32_t srcStart, int32_t srcLength) const
 Determine if this ends with the characters in srcChars in the range [srcStart, srcStart + srcLength).
int32_t indexOf (const UnicodeStringRef &text) const
 Locate in this the first occurrence of the characters in text, using bitwise comparison.
int32_t indexOf (const UnicodeStringRef &text, int32_t start) const
 Locate in this the first occurrence of the characters in text starting at offset start, using bitwise comparison.
int32_t indexOf (const UnicodeStringRef &text, int32_t start, int32_t length) const
 Locate in this the first occurrence in the range [start, start + length) of the characters in text, using bitwise comparison.
int32_t indexOf (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const
 Locate in this the first occurrence in the range [start, start + length) of the characters in srcText in the range [srcStart, srcStart + srcLength), using bitwise comparison.
int32_t indexOf (UChar const *srcChars, int32_t srcLength, int32_t start) const
 Locate in this the first occurrence of the characters in srcChars starting at offset start, using bitwise comparison.
int32_t indexOf (UChar const *srcChars, int32_t srcLength, int32_t start, int32_t length) const
 Locate in this the first occurrence in the range [start, start + length) of the characters in srcChars, using bitwise comparison.
int32_t indexOf (UChar const *srcChars, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const
 Locate in this the first occurrence in the range [start, start + length) of the characters in srcChars in the range [srcStart, srcStart + srcLength), using bitwise comparison.
int32_t indexOf (UChar c) const
 Locate in this the first occurrence of the code unit c, using bitwise comparison.
int32_t indexOf (UChar32 c) const
 Locate in this the first occurrence of the code point c, using bitwise comparison.
int32_t indexOf (UChar c, int32_t start) const
 Locate in this the first occurrence of the code unit c starting at offset start, using bitwise comparison.
int32_t indexOf (UChar32 c, int32_t start) const
 Locate in this the first occurrence of the code point c starting at offset start, using bitwise comparison.
int32_t indexOf (UChar c, int32_t start, int32_t length) const
 Locate in this the first occurrence of the code unit c in the range [start, start + length), using bitwise comparison.
int32_t indexOf (UChar32 c, int32_t start, int32_t length) const
 Locate in this the first occurrence of the code point c in the range [start, start + length), using bitwise comparison.
int32_t lastIndexOf (const UnicodeStringRef &text) const
 Locate in this the last occurrence of the characters in text, using bitwise comparison.
int32_t lastIndexOf (const UnicodeStringRef &text, int32_t start) const
 Locate in this the last occurrence of the characters in text starting at offset start, using bitwise comparison.
int32_t lastIndexOf (const UnicodeStringRef &text, int32_t start, int32_t length) const
 Locate in this the last occurrence in the range [start, start + length) of the characters in text, using bitwise comparison.
int32_t lastIndexOf (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const
 Locate in this the last occurrence in the range [start, start + length) of the characters in srcText in the range [srcStart, srcStart + srcLength), using bitwise comparison.
int32_t lastIndexOf (UChar const *srcChars, int32_t srcLength, int32_t start) const
 Locate in this the last occurrence of the characters in srcChars starting at offset start, using bitwise comparison.
int32_t lastIndexOf (UChar const *srcChars, int32_t srcLength, int32_t start, int32_t length) const
 Locate in this the last occurrence in the range [start, start + length) of the characters in srcChars, using bitwise comparison.
int32_t lastIndexOf (UChar const *srcChars, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const
 Locate in this the last occurrence in the range [start, start + length) of the characters in srcChars in the range [srcStart, srcStart + srcLength), using bitwise comparison.
int32_t lastIndexOf (UChar c) const
 Locate in this the last occurrence of the code unit c, using bitwise comparison.
int32_t lastIndexOf (UChar32 c) const
 Locate in this the last occurrence of the code point c, using bitwise comparison.
int32_t lastIndexOf (UChar c, int32_t start) const
 Locate in this the last occurrence of the code unit c starting at offset start, using bitwise comparison.
int32_t lastIndexOf (UChar32 c, int32_t start) const
 Locate in this the last occurrence of the code point c starting at offset start, using bitwise comparison.
int32_t lastIndexOf (UChar c, int32_t start, int32_t length) const
 Locate in this the last occurrence of the code unit c in the range [start, start + length), using bitwise comparison.
int32_t lastIndexOf (UChar32 c, int32_t start, int32_t length) const
 Locate in this the last occurrence of the code point c in the range [start, start + length), using bitwise comparison.
UChar charAt (int32_t offset) const
 Return the code unit at offset offset.
UChar operator[] (int32_t offset) const
 Return the code unit at offset offset.
UChar32 char32At (int32_t offset) const
 Return the code point that contains the code unit at offset offset.
int32_t getChar32Start (int32_t offset) const
 Adjust a random-access offset so that it points to the beginning of a Unicode character.
int32_t getChar32Limit (int32_t offset) const
 Adjust a random-access offset so that it points behind a Unicode character.
int32_t moveIndex32 (int32_t index, int32_t delta) const
 Move the code unit index along the string by delta code points.
void extract (int32_t start, int32_t length, UChar *dst, int32_t dstStart=0) const
 Copy the characters in the range [start, start + length) into the array dst, beginning at dstStart.
void extractBetween (int32_t start, int32_t limit, UChar *dst, int32_t dstStart=0) const
 Copy the characters in the range [start, limit) into the array dst, beginning at dstStart.
int32_t extract (UChar *dst, int32_t dstCapacity, UErrorCode &errorCode) const
 Copy the contents of the string into dst.
void extract (int32_t start, int32_t length, UnicodeString &dst) const
 Copy the characters in the range [start, start + length) into the UnicodeString dst.
void extractBetween (int32_t start, int32_t limit, UnicodeString &dst) const
 Copy the characters in the range [start, limit) into the UnicodeString dst.
int32_t extract (int32_t start, int32_t startLength, char *target, const char *codepage=0) const
 Copy the characters in the range [start, start + startLength) into an array of characters in a specified codepage.
int32_t extract (int32_t start, int32_t startLength, char *target, uint32_t targetLength, const char *codepage=0) const
 Copy the characters in the range [start, start + startLength) into an array of characters in a specified codepage.
int32_t extract (char *target, int32_t targetCapacity, UConverter *cnv, UErrorCode &errorCode) const
 Convert the UnicodeStringRef into a codepage string using an existing UConverter.
int32_t extract (int32_t start, int32_t startLength, std::string &target, const char *codepage=0) const
 Copy the characters in the range [start, start + startLength) into a std::string object in a specified codepage.
int32_t extract (std::string &target, const char *codepage=0) const
 Copy all the characters in the string into a std::string object in a specified codepage.
int32_t extractUTF8 (std::string &target) const
 Copy all the characters in the string into a std::string object in UTF-8.
std::string asUTF8 (void) const
 Convert to a UTF-8 string.
int32_t length (void) const
 Return the length of the UnicodeStringRef object.
int32_t countChar32 (int32_t start=0, int32_t length=0x7fffffff) const
 Count the Unicode code points in the range [start, start + length) of UChar code units of the string.
bool isEmpty (void) const
 Determine if this string is empty.
UnicodeStringRef & setTo (const UnicodeStringRef &srcText)
 Set this UnicodeStringRef to refer to the characters in srcText.
UnicodeStringRef & setTo (const UnicodeString &srcText)
 Set this UnicodeStringRef to refer to the characters in srcText.
UnicodeStringRef & setTo (const UChar *srcChars, int32_t srcLength)
 Set this UnicodeStringRef to refer to the first srcLength characters in srcChars.
void toSingleByteStream (std::ostream &outStream) const
 Print a single byte version to outStream.
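Several members listed above, such as countChar32(), getChar32Start() and char32At(), work in terms of UTF-16 code units and surrogate pairs. The following free functions over a `char16_t` buffer sketch that logic; `countCodePoints`, `char32Start` and `char32At` are hypothetical helpers, not the UIMA implementation, and this sketch counts an unpaired surrogate as one code point.

```cpp
#include <cassert>
#include <cstdint>

inline bool isLead(char16_t c)  { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isTrail(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Count code points in s[start, start + length): a surrogate pair counts once,
// an unpaired surrogate (or a pair cut by the range end) counts as one.
inline int32_t countCodePoints(const char16_t* s, int32_t start, int32_t length) {
    assert(start >= 0 && length >= 0);
    int32_t count = 0;
    int32_t i = start;
    const int32_t limit = start + length;
    while (i < limit) {
        if (isLead(s[i]) && i + 1 < limit && isTrail(s[i + 1])) {
            i += 2;  // skip the whole pair
        } else {
            i += 1;
        }
        ++count;
    }
    return count;
}

// Move offset back to the first code unit of the code point containing it,
// i.e. off a trail surrogate that directly follows a lead surrogate.
inline int32_t char32Start(const char16_t* s, int32_t length, int32_t offset) {
    assert(offset >= 0 && offset < length);
    if (offset > 0 && isTrail(s[offset]) && isLead(s[offset - 1])) {
        return offset - 1;
    }
    return offset;
}

// Return the code point that contains the code unit at offset.
inline char32_t char32At(const char16_t* s, int32_t length, int32_t offset) {
    const int32_t i = char32Start(s, length, offset);
    if (isLead(s[i]) && i + 1 < length && isTrail(s[i + 1])) {
        return 0x10000 + ((static_cast<char32_t>(s[i]) - 0xD800) << 10)
                       + (static_cast<char32_t>(s[i + 1]) - 0xDC00);
    }
    return s[i];  // BMP character or unpaired surrogate
}
```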

Static Public Member Functions

static void release (std::string &target)
 Release the contents of a string container allocated by the extract methods. This is useful when caller and callee use different heaps, e.g.


Constructor & Destructor Documentation

uima::UnicodeStringRef::UnicodeStringRef ( void   )  [inline]

Default Constructor.

Referenced by uima::strtrim().

uima::UnicodeStringRef::UnicodeStringRef ( const icu::UnicodeString &  crUniString  )  [inline]

Constructor from icu::UnicodeString.

uima::UnicodeStringRef::UnicodeStringRef ( UChar const *  cpacString  )  [inline, explicit]

Constructor from zero terminated string.

References EXISTS.

uima::UnicodeStringRef::UnicodeStringRef ( UChar const *  cpacString,
int32_t  uiLength 
) [inline]

Constructor from string and length.

References EXISTS.

uima::UnicodeStringRef::UnicodeStringRef ( UChar const *  paucStringBegin,
UChar const *  paucStringEnd 
) [inline]

Constructor from two pointers (begin/end).

Note: paucStringEnd points to the first character after the end of the string.

Deprecated:
Replace with UnicodeStringRef(paucStringBegin,paucStringEnd-paucStringBegin).

References EXISTS.
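The suggested replacement turns the end pointer into a length by pointer subtraction, so the call becomes UnicodeStringRef(paucStringBegin, paucStringEnd - paucStringBegin). A small sketch of that conversion, assuming both pointers point into the same array and the end pointer is one past the last character; `lengthFromRange` is a hypothetical helper, not part of UIMA:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: convert a half-open [begin, end) range into the length
// expected by a (pointer, length) constructor.
inline int32_t lengthFromRange(const char16_t* begin, const char16_t* end) {
    assert(begin <= end);
    const std::ptrdiff_t n = end - begin;  // counted in char16_t units, not bytes
    assert(n <= INT32_MAX);
    return static_cast<int32_t>(n);
}
```

Pointer subtraction already counts elements rather than bytes, so no division by the character size is needed.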


Member Function Documentation

int32_t uima::UnicodeStringRef::getSizeInBytes ( void   )  const [inline]

Accessor for the number of bytes occupied by this string.

UChar const * uima::UnicodeStringRef::getBuffer ( void   )  const [inline]

Const accessor for the string content (NOT zero-terminated!).

Referenced by extract(), indexOf(), lastIndexOf(), and uima::strtrim().

UnicodeStringRef & uima::UnicodeStringRef::operator= ( UnicodeStringRef const &  crclRHS  )  [inline]

Assignment operator.

References iv_pUChars, and iv_uiLength.

int uima::UnicodeStringRef::operator== ( const UnicodeStringRef &  crclRHS  )  const [inline]

Equality operator.

References iv_pUChars, and iv_uiLength.

int uima::UnicodeStringRef::operator!= ( const UnicodeStringRef &  crclRHS  )  const [inline]

Inequality operator.

bool uima::UnicodeStringRef::operator< ( UnicodeStringRef const &  text  )  const [inline]

Less-than operator.

References iv_uiLength.

bool uima::UnicodeStringRef::operator<= ( UnicodeStringRef const &  text  )  const [inline]

Less-than-or-equal operator.

References iv_uiLength.

bool uima::UnicodeStringRef::operator> ( UnicodeStringRef const &  text  )  const [inline]

Greater-than operator.

References iv_uiLength.

bool uima::UnicodeStringRef::operator>= ( UnicodeStringRef const &  text  )  const [inline]

Greater-than-or-equal operator.

References iv_uiLength.
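The four relational operators are presumably thin wrappers around a single three-way comparison, which keeps them mutually consistent. The sketch below shows that pattern on a hypothetical `Text` type; how the real operators map onto compare() is an assumption, since the header body is not reproduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical value type: every relational operator is derived from one
// three-way comparison, so all of them agree on the same (code unit) order.
struct Text {
    std::u16string s;

    // -1, 0 or +1 for this < rhs, this == rhs, this > rhs.
    int compare(const Text& rhs) const {
        const int r = s.compare(rhs.s);  // char_traits<char16_t>: unsigned code unit order
        return (r > 0) - (r < 0);        // normalize to -1 / 0 / +1
    }
    bool operator<(const Text& rhs) const  { return compare(rhs) < 0; }
    bool operator<=(const Text& rhs) const { return compare(rhs) <= 0; }
    bool operator>(const Text& rhs) const  { return compare(rhs) > 0; }
    bool operator>=(const Text& rhs) const { return compare(rhs) >= 0; }
};
```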

int8_t uima::UnicodeStringRef::compare ( const UnicodeStringRef &  text  )  const [inline]

Compare the characters bitwise in this UnicodeStringRef to the characters in text.

Parameters:
text The UnicodeStringRef to compare to this one.
Returns:
The result of bitwise character comparison: 0 if this contains the same characters as text, -1 if the characters in this are bitwise less than the characters in text, +1 if the characters in this are bitwise greater than the characters in text.

References iv_uiLength.

Referenced by startsWith().
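The return convention can be pinned down with a small sketch of a bitwise (code unit) comparison over pointer-plus-length buffers. `compareCodeUnits` is a hypothetical free function, not the UIMA implementation; it assumes ICU-style semantics, in which the sign describes the first string relative to the second and a proper prefix sorts before the longer string.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sketch: compare a[0, aLen) with b[0, bLen) code unit by code unit.
// Returns -1 if a < b, 0 if equal, +1 if a > b. Neither buffer needs a terminator.
inline int8_t compareCodeUnits(const char16_t* a, int32_t aLen,
                               const char16_t* b, int32_t bLen) {
    assert(aLen >= 0 && bLen >= 0);
    const int32_t n = std::min(aLen, bLen);
    for (int32_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) {
            return a[i] < b[i] ? -1 : 1;  // char16_t is unsigned: plain numeric order
        }
    }
    if (aLen == bLen) return 0;
    return aLen < bLen ? -1 : 1;  // the shorter string is a prefix of the longer one
}
```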

int8_t uima::UnicodeStringRef::compare ( const icu::UnicodeString &  text  )  const [inline]

Compare the characters bitwise in this UnicodeStringRef to the characters in text.

Parameters:
text The UnicodeString to compare to this one.
Returns:
The result of bitwise character comparison: 0 if this contains the same characters as text, -1 if the characters in this are bitwise less than the characters in text, +1 if the characters in this are bitwise greater than the characters in text.

int8_t uima::UnicodeStringRef::compare ( int32_t  start,
int32_t  length,
const UnicodeStringRef &  srcText 
) const [inline]

Compare the characters bitwise in the range [start, start + length) with the characters in srcText.

Parameters:
start the offset at which the compare operation begins
length the number of characters in this to compare.
srcText the text to be compared
Returns:
The result of bitwise character comparison: 0 if the characters in the range [start, start + length) are the same as those in srcText, -1 if they are bitwise less than the characters in srcText, +1 if they are bitwise greater.

References iv_uiLength.

int8_t uima::UnicodeStringRef::compare ( int32_t  start,
int32_t  length,
const UnicodeStringRef &  srcText,
int32_t  srcStart,
int32_t  srcLength 
) const [inline]

Compare the characters bitwise in the range [start, start + length) with the characters in srcText in the range [srcStart, srcStart + srcLength).

Parameters:
start the offset at which the compare operation begins
length the number of characters in this to compare.
srcText the text to be compared
srcStart the offset into srcText to start comparison
srcLength the number of characters in srcText to compare
Returns:
The result of bitwise character comparison: 0 if the characters in the range [start, start + length) are the same as those in srcText in the range [srcStart, srcStart + srcLength), -1 if they are bitwise less, +1 if they are bitwise greater.

int8_t uima::UnicodeStringRef::compare ( UChar const *  srcChars,
int32_t  srcLength 
) const [inline]

Compare the characters bitwise in this UnicodeStringRef with the first srcLength characters in srcChars.

Parameters:
srcChars The characters to compare to this UnicodeStringRef.
srcLength the number of characters in srcChars to compare
Returns:
The result of bitwise character comparison: 0 if this contains the same characters as the first srcLength characters of srcChars, -1 if the characters in this are bitwise less, +1 if they are bitwise greater.

int8_t uima::UnicodeStringRef::compare ( int32_t  start,
int32_t  length,
UChar const *  srcChars 
) const [inline]

Compare the characters bitwise in the range [start, start + length) with the first length characters in srcChars.

Parameters:
start the offset at which the compare operation begins
length the number of characters to compare.
srcChars the characters to be compared
Returns:
The result of bitwise character comparison: 0 if the characters in the range [start, start + length) are the same as the first length characters of srcChars, -1 if they are bitwise less, +1 if they are bitwise greater.

int8_t uima::UnicodeStringRef::compare ( int32_t  start,
int32_t  length,
UChar const *  srcChars,
int32_t  srcStart,
int32_t  srcLength 
) const [inline]

Compare the characters bitwise in the range [start, start + length) with the characters in srcChars in the range [srcStart, srcStart + srcLength).

Parameters:
start the offset at which the compare operation begins
length the number of characters in this to compare
srcChars the characters to be compared
srcStart the offset into srcChars to start comparison
srcLength the number of characters in srcChars to compare
Returns:
The result of bitwise character comparison: 0 if the characters in the range [start, start + length) are the same as those in srcChars in the range [srcStart, srcStart + srcLength), -1 if they are bitwise less, +1 if they are bitwise greater.

int8_t uima::UnicodeStringRef::compareBetween ( int32_t  start,
int32_t  limit,
const UnicodeStringRef &  srcText,
int32_t  srcStart,
int32_t  srcLimit 
) const [inline]

Compare the characters bitwise in the range [start, limit) with the characters in srcText in the range [srcStart, srcLimit).

Parameters:
start the offset at which the compare operation begins
limit the offset immediately following the compare operation
srcText the text to be compared
srcStart the offset into srcText to start comparison
srcLimit the offset into srcText to limit comparison
Returns:
The result of bitwise character comparison: 0 if the characters in the range [start, limit) are the same as those in srcText in the range [srcStart, srcLimit), -1 if they are bitwise less, +1 if they are bitwise greater.
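The "Between" variants take limit offsets where the other overloads take lengths, so compareBetween(start, limit, srcText, srcStart, srcLimit) should behave like compare(start, limit - start, srcText, srcStart, srcLimit - srcStart). A sketch of that relationship on `std::u16string`; `compareRange` and `compareRangeBetween` are hypothetical names, not UIMA functions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical sketch: compare self[start, start + length) with
// src[srcStart, srcStart + srcLength) in code unit order, returning -1 / 0 / +1.
inline int compareRange(const std::u16string& self, int32_t start, int32_t length,
                        const std::u16string& src, int32_t srcStart, int32_t srcLength) {
    // basic_string::compare(pos1, count1, str, pos2, count2) clamps each count
    // to the end of its string.
    const int r = self.compare(start, length, src, srcStart, srcLength);
    return (r > 0) - (r < 0);
}

// Limit-based form: ranges are [start, limit) and [srcStart, srcLimit).
inline int compareRangeBetween(const std::u16string& self, int32_t start, int32_t limit,
                               const std::u16string& src, int32_t srcStart, int32_t srcLimit) {
    assert(start <= limit && srcStart <= srcLimit);
    return compareRange(self, start, limit - start, src, srcStart, srcLimit - srcStart);
}
```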

int8_t uima::UnicodeStringRef::compareCodePointOrder ( const UnicodeStringRef &  text  )  const [inline]

Compare two Unicode strings in code point order.

This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:

In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:
text Another string to compare this one to.
Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order

References iv_uiLength.
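The difference between code unit order and code point order comes down to where surrogates sort. One well-known way to obtain code point order from a code-unit loop is to remap units at or above U+D800 before comparing them at the first mismatch: surrogates move to the top of the 16-bit range and U+E000..U+FFFF move down to make room. The sketch below applies that remapping; `fixUp` and `compareCodePoints` are hypothetical names, it assumes well-formed UTF-16 (matching the caveat above), and it is not claimed to be how UIMA implements compareCodePointOrder().

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Remap a UTF-16 code unit so that numeric order of remapped units matches
// code point order: E000..FFFF -> D800..F7FF, D800..DFFF -> F800..FFFF.
inline uint16_t fixUp(char16_t c) {
    if (c >= 0xE000) return static_cast<uint16_t>(c - 0x800);
    if (c >= 0xD800) return static_cast<uint16_t>(c + 0x2000);
    return c;  // below the surrogate block: unchanged
}

// Compare two well-formed UTF-16 strings in code point order (-1 / 0 / +1).
inline int8_t compareCodePoints(const char16_t* a, int32_t aLen,
                                const char16_t* b, int32_t bLen) {
    assert(aLen >= 0 && bLen >= 0);
    const int32_t n = std::min(aLen, bLen);
    for (int32_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) {
            return fixUp(a[i]) < fixUp(b[i]) ? -1 : 1;
        }
    }
    if (aLen == bLen) return 0;
    return aLen < bLen ? -1 : 1;
}
```

With this remapping, U+10000 (stored as D800 DC00) sorts after U+FEFF, whereas a plain code unit comparison would put it first.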

int8_t uima::UnicodeStringRef::compareCodePointOrder ( int32_t  start,
int32_t  length,
const UnicodeStringRef &  srcText 
) const [inline]

Compare two Unicode strings in code point order.

This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:

In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:
start The start offset in this string at which the compare operation begins.
length The number of code units from this string to compare.
srcText Another string to compare this one to.
Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order

References iv_uiLength.

int8_t uima::UnicodeStringRef::compareCodePointOrder ( int32_t  start,
int32_t  length,
const UnicodeStringRef &  srcText,
int32_t  srcStart,
int32_t  srcLength 
) const [inline]

Compare two Unicode strings in code point order.

This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:

In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:
start The start offset in this string at which the compare operation begins.
length The number of code units from this string to compare.
srcText Another string to compare this one to.
srcStart The start offset in that string at which the compare operation begins.
srcLength The number of code units from that string to compare.
Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order

int8_t uima::UnicodeStringRef::compareCodePointOrder ( UChar const *  srcChars,
int32_t  srcLength 
) const [inline]

Compare two Unicode strings in code point order.

This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:

In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:
srcChars A pointer to another string to compare this one to.
srcLength The number of code units from that string to compare.
Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order

int8_t uima::UnicodeStringRef::compareCodePointOrder ( int32_t  start,
int32_t  length,
UChar const *  srcChars 
) const [inline]

Compare two Unicode strings in code point order.

This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:

In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:
start The start offset in this string at which the compare operation begins.
length The number of code units from this string to compare.
srcChars A pointer to another string to compare this one to.
Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order

int8_t uima::UnicodeStringRef::compareCodePointOrder ( int32_t  start,
int32_t  length,
UChar const *  srcChars,
int32_t  srcStart,
int32_t  srcLength 
) const [inline]

Compare two Unicode strings in code point order.

This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:

In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:
start The start offset in this string at which the compare operation begins.
length The number of code units from this string to compare.
srcChars A pointer to another string to compare this one to.
srcStart The start offset in that string at which the compare operation begins.
srcLength The number of code units from that string to compare.
Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order

int8_t uima::UnicodeStringRef::compareCodePointOrderBetween ( int32_t  start,
int32_t  limit,
const UnicodeStringRef &  srcText,
int32_t  srcStart,
int32_t  srcLimit 
) const [inline]

Compare two Unicode strings in code point order.

This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:

In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:
start The start offset in this string at which the compare operation begins.
limit The offset after the last code unit from this string to compare.
srcText Another string to compare this one to.
srcStart The start offset in that string at which the compare operation begins.
srcLimit The offset after the last code unit from that string to compare.
Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order

int8_t uima::UnicodeStringRef::caseCompare ( const UnicodeStringRef &  text,
uint32_t  options 
) const [inline]

Compare two strings case-insensitively using full case folding.

This is equivalent to this->foldCase(options).compare(text.foldCase(options)).

Parameters:
text Another string to compare this one to.
options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I
Returns:
A negative, zero, or positive integer indicating the comparison result.

References iv_uiLength.

int8_t uima::UnicodeStringRef::caseCompare ( int32_t  start,
int32_t  length,
const UnicodeStringRef srcText,
uint32_t  options 
) const [inline]

Compare two strings case-insensitively using full case folding.

This is equivalent to this->foldCase(options).compare(srcText.foldCase(options)).

Parameters:
start The start offset in this string at which the compare operation begins.
length The number of code units from this string to compare.
srcText Another string to compare this one to.
options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I
Returns:
A negative, zero, or positive integer indicating the comparison result.

References iv_uiLength.

int8_t uima::UnicodeStringRef::caseCompare ( int32_t  start,
int32_t  length,
const UnicodeStringRef srcText,
int32_t  srcStart,
int32_t  srcLength,
uint32_t  options 
) const [inline]

Compare two strings case-insensitively using full case folding.

This is equivalent to this->foldCase(options).compare(srcText.foldCase(options)).

Parameters:
start The start offset in this string at which the compare operation begins.
length The number of code units from this string to compare.
srcText Another string to compare this one to.
srcStart The start offset in that string at which the compare operation begins.
srcLength The number of code units from that string to compare.
options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I
Returns:
A negative, zero, or positive integer indicating the comparison result.

int8_t uima::UnicodeStringRef::caseCompare ( UChar const *  srcChars,
int32_t  srcLength,
uint32_t  options 
) const [inline]

Compare two strings case-insensitively using full case folding.

This is equivalent to case-folding both this string and the characters in srcChars with options, then comparing the results.

Parameters:
srcChars A pointer to another string to compare this one to.
srcLength The number of code units from that string to compare.
options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I
Returns:
A negative, zero, or positive integer indicating the comparison result.

int8_t uima::UnicodeStringRef::caseCompare ( int32_t  start,
int32_t  length,
UChar const *  srcChars,
uint32_t  options 
) const [inline]

Compare two strings case-insensitively using full case folding.

This is equivalent to case-folding both this string and the characters in srcChars with options, then comparing the results.

Parameters:
start The start offset in this string at which the compare operation begins.
length The number of code units from this string to compare.
srcChars A pointer to another string to compare this one to.
options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I
Returns:
A negative, zero, or positive integer indicating the comparison result.

int8_t uima::UnicodeStringRef::caseCompare ( int32_t  start,
int32_t  length,
UChar const *  srcChars,
int32_t  srcStart,
int32_t  srcLength,
uint32_t  options 
) const [inline]

Compare two strings case-insensitively using full case folding.

This is equivalent to case-folding both this string and the characters in srcChars with options, then comparing the results.

Parameters:
start The start offset in this string at which the compare operation begins.
length The number of code units from this string to compare.
srcChars A pointer to another string to compare this one to.
srcStart The start offset in that string at which the compare operation begins.
srcLength The number of code units from that string to compare.
options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I
Returns:
A negative, zero, or positive integer indicating the comparison result.

int8_t uima::UnicodeStringRef::caseCompareBetween ( int32_t  start,
int32_t  limit,
const UnicodeStringRef srcText,
int32_t  srcStart,
int32_t  srcLimit,
uint32_t  options 
) const [inline]

Compare two strings case-insensitively using full case folding.

This is equivalent to this->foldCase(options).compareBetween(start, limit, srcText.foldCase(options), srcStart, srcLimit).

Parameters:
start The start offset in this string at which the compare operation begins.
limit The offset after the last code unit from this string to compare.
srcText Another string to compare this one to.
srcStart The start offset in that string at which the compare operation begins.
srcLimit The offset after the last code unit from that string to compare.
options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I
Returns:
A negative, zero, or positive integer indicating the comparison result.

bool uima::UnicodeStringRef::startsWith ( const UnicodeStringRef text  )  const [inline]

Determine if this starts with the characters in text.

Parameters:
text The text to match.
Returns:
TRUE if this starts with the characters in text, FALSE otherwise

References compare(), and iv_uiLength.

bool uima::UnicodeStringRef::startsWith ( const UnicodeStringRef srcText,
int32_t  srcStart,
int32_t  srcLength 
) const [inline]

Determine if this starts with the characters in srcText in the range [srcStart, srcStart + srcLength).

Parameters:
srcText The text to match.
srcStart the offset into srcText to start matching
srcLength the number of characters in srcText to match
Returns:
TRUE if this starts with the characters in srcText in the range [srcStart, srcStart + srcLength), FALSE otherwise

bool uima::UnicodeStringRef::startsWith ( UChar const *  srcChars,
int32_t  srcLength 
) const [inline]

Determine if this starts with the characters in srcChars.

Parameters:
srcChars The characters to match.
srcLength the number of characters in srcChars
Returns:
TRUE if this starts with the characters in srcChars, FALSE otherwise

bool uima::UnicodeStringRef::startsWith ( UChar const *  srcChars,
int32_t  srcStart,
int32_t  srcLength 
) const [inline]

Determine if this starts with the characters in srcChars in the range [srcStart, srcStart + srcLength).

Parameters:
srcChars The characters to match.
srcStart the offset into srcChars at which to start matching
srcLength the number of characters in srcChars to match
Returns:
TRUE if this starts with the characters in srcChars, FALSE otherwise
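The ranged startsWith() overloads can be modeled with a short standalone sketch. It is written in C++17 on std::u16string_view so it compiles without UIMACPP. The helper name startsWithSketch is hypothetical, and the comparison is bitwise on UTF-16 code units, matching the wording of this section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical model of startsWith(srcChars, srcStart, srcLength): true if
// `self` begins with the code units srcChars[srcStart, srcStart + srcLength).
// Comparison is bitwise; no normalization or case folding is applied.
bool startsWithSketch(std::u16string_view self, const char16_t* srcChars,
                      int32_t srcStart, int32_t srcLength) {
    assert(srcChars != nullptr && srcStart >= 0 && srcLength >= 0);
    std::u16string_view prefix(srcChars + srcStart, static_cast<std::size_t>(srcLength));
    return self.size() >= prefix.size() &&
           self.compare(0, prefix.size(), prefix) == 0;
}
```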

bool uima::UnicodeStringRef::endsWith ( const UnicodeStringRef text  )  const [inline]

Determine if this ends with the characters in text.

Parameters:
text The text to match.
Returns:
TRUE if this ends with the characters in text, FALSE otherwise

References iv_uiLength.

bool uima::UnicodeStringRef::endsWith ( const UnicodeStringRef srcText,
int32_t  srcStart,
int32_t  srcLength 
) const [inline]

Determine if this ends with the characters in srcText in the range [srcStart, srcStart + srcLength).

Parameters:
srcText The text to match.
srcStart the offset into srcText to start matching
srcLength the number of characters in srcText to match
Returns:
TRUE if this ends with the characters in srcText in the range [srcStart, srcStart + srcLength), FALSE otherwise

bool uima::UnicodeStringRef::endsWith ( UChar const *  srcChars,
int32_t  srcLength 
) const [inline]

Determine if this ends with the characters in srcChars.

Parameters:
srcChars The characters to match.
srcLength the number of characters in srcChars
Returns:
TRUE if this ends with the characters in srcChars, FALSE otherwise

bool uima::UnicodeStringRef::endsWith ( UChar const *  srcChars,
int32_t  srcStart,
int32_t  srcLength 
) const [inline]

Determine if this ends with the characters in srcChars in the range [srcStart, srcStart + srcLength).

Parameters:
srcChars The characters to match.
srcStart the offset into srcChars at which to start matching
srcLength the number of characters in srcChars to match
Returns:
TRUE if this ends with the characters in srcChars, FALSE otherwise
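endsWith() is the mirror image of startsWith(): it compares the last srcLength code units of this string. Below is a standalone C++17 sketch; the name endsWithSketch is hypothetical, and std::u16string_view stands in for UnicodeStringRef.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical model of endsWith(srcChars, srcStart, srcLength): true if `self`
// ends with the code units srcChars[srcStart, srcStart + srcLength).
bool endsWithSketch(std::u16string_view self, const char16_t* srcChars,
                    int32_t srcStart, int32_t srcLength) {
    assert(srcChars != nullptr && srcStart >= 0 && srcLength >= 0);
    std::u16string_view suffix(srcChars + srcStart, static_cast<std::size_t>(srcLength));
    return self.size() >= suffix.size() &&
           self.compare(self.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```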

int32_t uima::UnicodeStringRef::indexOf ( const UnicodeStringRef text  )  const [inline]

Locate in this the first occurrence of the characters in text, using bitwise comparison.

Parameters:
text The text to search for.
Returns:
The offset into this of the start of text, or -1 if not found.

References iv_uiLength.

Referenced by indexOf().

int32_t uima::UnicodeStringRef::indexOf ( const UnicodeStringRef text,
int32_t  start 
) const [inline]

Locate in this the first occurrence of the characters in text starting at offset start, using bitwise comparison.

Parameters:
text The text to search for.
start The offset at which searching will start.
Returns:
The offset into this of the start of text, or -1 if not found.

References indexOf(), and iv_uiLength.

int32_t uima::UnicodeStringRef::indexOf ( const UnicodeStringRef text,
int32_t  start,
int32_t  length 
) const [inline]

Locate in this the first occurrence in the range [start, start + length) of the characters in text, using bitwise comparison.

Parameters:
text The text to search for.
start The offset at which searching will start.
length The number of characters to search
Returns:
The offset into this of the start of text, or -1 if not found.

References indexOf(), and iv_uiLength.

int32_t uima::UnicodeStringRef::indexOf ( const UnicodeStringRef srcText,
int32_t  srcStart,
int32_t  srcLength,
int32_t  start,
int32_t  length 
) const [inline]

Locate in this the first occurrence in the range [start, start + length) of the characters in srcText in the range [srcStart, srcStart + srcLength), using bitwise comparison.

Parameters:
srcText The text to search for.
srcStart the offset into srcText at which to start matching
srcLength the number of characters in srcText to match
start the offset into this at which to start matching
length the number of characters in this to search
Returns:
The offset into this of the start of the matched srcText range, or -1 if not found.

References getBuffer(), and indexOf().

int32_t uima::UnicodeStringRef::indexOf ( UChar const *  srcChars,
int32_t  srcLength,
int32_t  start 
) const [inline]

Locate in this the first occurrence of the characters in srcChars starting at offset start, using bitwise comparison.

Parameters:
srcChars The text to search for.
srcLength the number of characters in srcChars to match
start the offset into this at which to start matching
Returns:
The offset into this of the start of srcChars, or -1 if not found.

References indexOf().

int32_t uima::UnicodeStringRef::indexOf ( UChar const *  srcChars,
int32_t  srcLength,
int32_t  start,
int32_t  length 
) const [inline]

Locate in this the first occurrence in the range [start, start + length) of the characters in srcChars, using bitwise comparison.

Parameters:
srcChars The text to search for.
srcLength the number of characters in srcChars
start The offset at which searching will start.
length The number of characters to search
Returns:
The offset into this of the start of srcChars, or -1 if not found.

References indexOf().

int32_t uima::UnicodeStringRef::indexOf ( UChar const *  srcChars,
int32_t  srcStart,
int32_t  srcLength,
int32_t  start,
int32_t  length 
) const

Locate in this the first occurrence in the range [start, start + length) of the characters in srcChars in the range [srcStart, srcStart + srcLength), using bitwise comparison.

Parameters:
srcChars The text to search for.
srcStart the offset into srcChars at which to start matching
srcLength the number of characters in srcChars to match
start the offset into this at which to start matching
length the number of characters in this to search
Returns:
The offset into this of the start of the matched srcChars range, or -1 if not found.
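The two ranges in the five-argument indexOf() are easy to mix up, so here is a standalone C++17 sketch of how they interact. The pattern is srcChars[srcStart, srcStart + srcLength), the search window is [start, start + length) of this string, and the result is an offset into the whole string rather than into the window. The name indexOfInRange is hypothetical. Returning -1 for an empty pattern is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical model of indexOf(srcChars, srcStart, srcLength, start, length):
// search self[start, start + length) for srcChars[srcStart, srcStart + srcLength)
// by comparing code units; the result is relative to the whole string.
int32_t indexOfInRange(std::u16string_view self, const char16_t* srcChars,
                       int32_t srcStart, int32_t srcLength,
                       int32_t start, int32_t length) {
    assert(srcChars != nullptr && srcStart >= 0 && srcLength >= 0);
    assert(start >= 0 && length >= 0 &&
           start + length <= static_cast<int32_t>(self.size()));
    if (srcLength == 0) return -1;  // assumption: an empty pattern is "not found"
    std::u16string_view needle(srcChars + srcStart, static_cast<std::size_t>(srcLength));
    std::u16string_view window = self.substr(static_cast<std::size_t>(start),
                                             static_cast<std::size_t>(length));
    std::size_t pos = window.find(needle);
    return pos == std::u16string_view::npos ? -1 : start + static_cast<int32_t>(pos);
}
```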

int32_t uima::UnicodeStringRef::indexOf ( UChar  c  )  const [inline]

Locate in this the first occurrence of the code unit c, using bitwise comparison.

Parameters:
c The code unit to search for.
Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf ( UChar32  c  )  const [inline]

Locate in this the first occurrence of the code point c, using bitwise comparison.

Parameters:
c The code point to search for.
Returns:
The offset into this of c, or -1 if not found.

References indexOf().
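The UChar32 overloads differ from the UChar ones because a supplementary code point occupies two code units. A plausible approach, shown here as a standalone C++17 sketch, is to encode c as UTF-16 and search for that unit sequence. The name indexOfCodePointSketch is hypothetical. The sketch assumes c is a valid code point and does not model any special handling of lone surrogate code points.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical model of indexOf(UChar32 c): encode c as one UTF-16 unit or as
// a lead/trail surrogate pair, then search for that unit sequence.
int32_t indexOfCodePointSketch(std::u16string_view self, char32_t c) {
    assert(c <= 0x10FFFF);
    char16_t units[2] = {0, 0};
    std::size_t n = 1;
    if (c <= 0xFFFF) {
        units[0] = static_cast<char16_t>(c);
    } else {
        const char32_t v = c - 0x10000;
        units[0] = static_cast<char16_t>(0xD800 + (v >> 10));    // lead surrogate
        units[1] = static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // trail surrogate
        n = 2;
    }
    std::size_t pos = self.find(std::u16string_view(units, n));
    return pos == std::u16string_view::npos ? -1 : static_cast<int32_t>(pos);
}
```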

int32_t uima::UnicodeStringRef::indexOf ( UChar  c,
int32_t  start 
) const [inline]

Locate in this the first occurrence of the code unit c starting at offset start, using bitwise comparison.

Parameters:
c The code unit to search for.
start The offset at which searching will start.
Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf ( UChar32  c,
int32_t  start 
) const [inline]

Locate in this the first occurrence of the code point c starting at offset start, using bitwise comparison.

Parameters:
c The code point to search for.
start The offset at which searching will start.
Returns:
The offset into this of c, or -1 if not found.

References indexOf().

int32_t uima::UnicodeStringRef::indexOf ( UChar  c,
int32_t  start,
int32_t  length 
) const [inline]

Locate in this the first occurrence of the code unit c in the range [start, start + length), using bitwise comparison.

Parameters:
c The code unit to search for.
start the offset into this at which to start matching
length the number of characters in this to search
Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf ( UChar32  c,
int32_t  start,
int32_t  length 
) const [inline]

Locate in this the first occurrence of the code point c in the range [start, start + length), using bitwise comparison.

Parameters:
c The code point to search for.
start the offset into this at which to start matching
length the number of characters in this to search
Returns:
The offset into this of c, or -1 if not found.

References indexOf().

int32_t uima::UnicodeStringRef::lastIndexOf ( const UnicodeStringRef text  )  const [inline]

Locate in this the last occurrence of the characters in text, using bitwise comparison.

Parameters:
text The text to search for.
Returns:
The offset into this of the start of text, or -1 if not found.

References iv_uiLength.

Referenced by lastIndexOf().

int32_t uima::UnicodeStringRef::lastIndexOf ( const UnicodeStringRef text,
int32_t  start 
) const [inline]

Locate in this the last occurrence of the characters in text starting at offset start, using bitwise comparison.

Parameters:
text The text to search for.
start The offset at which searching will start.
Returns:
The offset into this of the start of text, or -1 if not found.

References iv_uiLength, and lastIndexOf().

int32_t uima::UnicodeStringRef::lastIndexOf ( const UnicodeStringRef text,
int32_t  start,
int32_t  length 
) const [inline]

Locate in this the last occurrence in the range [start, start + length) of the characters in text, using bitwise comparison.

Parameters:
text The text to search for.
start The offset at which searching will start.
length The number of characters to search
Returns:
The offset into this of the start of text, or -1 if not found.

References iv_uiLength, and lastIndexOf().

int32_t uima::UnicodeStringRef::lastIndexOf ( const UnicodeStringRef srcText,
int32_t  srcStart,
int32_t  srcLength,
int32_t  start,
int32_t  length 
) const [inline]

Locate in this the last occurrence in the range [start, start + length) of the characters in srcText in the range [srcStart, srcStart + srcLength), using bitwise comparison.

Parameters:
srcText The text to search for.
srcStart the offset into srcText at which to start matching
srcLength the number of characters in srcText to match
start the offset into this at which to start matching
length the number of characters in this to search
Returns:
The offset into this of the start of the matched srcText range, or -1 if not found.

References getBuffer(), and lastIndexOf().

int32_t uima::UnicodeStringRef::lastIndexOf ( UChar const *  srcChars,
int32_t  srcLength,
int32_t  start 
) const [inline]

Locate in this the last occurrence of the characters in srcChars starting at offset start, using bitwise comparison.

Parameters:
srcChars The text to search for.
srcLength the number of characters in srcChars to match
start the offset into this at which to start matching
Returns:
The offset into this of the start of srcChars, or -1 if not found.

References lastIndexOf().

int32_t uima::UnicodeStringRef::lastIndexOf ( UChar const *  srcChars,
int32_t  srcLength,
int32_t  start,
int32_t  length 
) const [inline]

Locate in this the last occurrence in the range [start, start + length) of the characters in srcChars, using bitwise comparison.

Parameters:
srcChars The text to search for.
srcLength the number of characters in srcChars
start The offset at which searching will start.
length The number of characters to search
Returns:
The offset into this of the start of srcChars, or -1 if not found.

References lastIndexOf().

int32_t uima::UnicodeStringRef::lastIndexOf ( UChar const *  srcChars,
int32_t  srcStart,
int32_t  srcLength,
int32_t  start,
int32_t  length 
) const

Locate in this the last occurrence in the range [start, start + length) of the characters in srcChars in the range [srcStart, srcStart + srcLength), using bitwise comparison.

Parameters:
srcChars The text to search for.
srcStart the offset into srcChars at which to start matching
srcLength the number of characters in srcChars to match
start the offset into this at which to start matching
length the number of characters in this to search
Returns:
The offset into this of the start of the matched srcChars range, or -1 if not found.
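lastIndexOf() uses the same two ranges as indexOf() but returns the last match that lies entirely inside the search window. Below is a standalone C++17 sketch; the name lastIndexOfInRange is hypothetical, and the empty-pattern result is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical model of lastIndexOf(srcChars, srcStart, srcLength, start, length):
// the last occurrence of srcChars[srcStart, srcStart + srcLength) that lies
// entirely within self[start, start + length), as an offset into the whole string.
int32_t lastIndexOfInRange(std::u16string_view self, const char16_t* srcChars,
                           int32_t srcStart, int32_t srcLength,
                           int32_t start, int32_t length) {
    assert(srcChars != nullptr && srcStart >= 0 && srcLength >= 0);
    assert(start >= 0 && length >= 0 &&
           start + length <= static_cast<int32_t>(self.size()));
    if (srcLength == 0) return -1;  // assumption: an empty pattern is "not found"
    std::u16string_view needle(srcChars + srcStart, static_cast<std::size_t>(srcLength));
    std::u16string_view window = self.substr(static_cast<std::size_t>(start),
                                             static_cast<std::size_t>(length));
    std::size_t pos = window.rfind(needle);
    return pos == std::u16string_view::npos ? -1 : start + static_cast<int32_t>(pos);
}
```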

int32_t uima::UnicodeStringRef::lastIndexOf ( UChar  c  )  const [inline]

Locate in this the last occurrence of the code unit c, using bitwise comparison.

Parameters:
c The code unit to search for.
Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf ( UChar32  c  )  const [inline]

Locate in this the last occurrence of the code point c, using bitwise comparison.

Parameters:
c The code point to search for.
Returns:
The offset into this of c, or -1 if not found.

References lastIndexOf().

int32_t uima::UnicodeStringRef::lastIndexOf ( UChar  c,
int32_t  start 
) const [inline]

Locate in this the last occurrence of the code unit c starting at offset start, using bitwise comparison.

Parameters:
c The code unit to search for.
start The offset at which searching will start.
Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf ( UChar32  c,
int32_t  start 
) const [inline]

Locate in this the last occurrence of the code point c starting at offset start, using bitwise comparison.

Parameters:
c The code point to search for.
start The offset at which searching will start.
Returns:
The offset into this of c, or -1 if not found.

References lastIndexOf().

int32_t uima::UnicodeStringRef::lastIndexOf ( UChar  c,
int32_t  start,
int32_t  length 
) const [inline]

Locate in this the last occurrence of the code unit c in the range [start, start + length), using bitwise comparison.

Parameters:
c The code unit to search for.
start the offset into this at which to start matching
length the number of characters in this to search
Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf ( UChar32  c,
int32_t  start,
int32_t  length 
) const [inline]

Locate in this the last occurrence of the code point c in the range [start, start + length), using bitwise comparison.

Parameters:
c The code point to search for.
start the offset into this at which to start matching
length the number of characters in this to search
Returns:
The offset into this of c, or -1 if not found.

References lastIndexOf().

UChar uima::UnicodeStringRef::charAt ( int32_t  offset  )  const [inline]

Return the code unit at offset offset.

Parameters:
offset a valid offset into the text
Returns:
the code unit at offset offset

References EXISTS.

UChar uima::UnicodeStringRef::operator[] ( int32_t  offset  )  const [inline]

Return the code unit at offset offset.

Parameters:
offset a valid offset into the text
Returns:
the code unit at offset offset

References EXISTS.

UChar32 uima::UnicodeStringRef::char32At ( int32_t  offset  )  const [inline]

Return the code point that contains the code unit at offset offset.

Parameters:
offset a valid offset into the text that indicates the text offset of any of the code units that will be assembled into a code point (21-bit value) and returned
Returns:
the code point of text at offset
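char32At() must return the same code point whether offset points at the lead or at the trail unit of a surrogate pair. The standalone C++17 sketch below shows this. The helper names are hypothetical, and returning an unpaired surrogate unchanged is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

inline bool isLead(char16_t u)  { return u >= 0xD800 && u <= 0xDBFF; }
inline bool isTrail(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Combine a lead/trail surrogate pair into a supplementary code point.
inline char32_t combine(char16_t lead, char16_t trail) {
    return 0x10000 + ((static_cast<char32_t>(lead) - 0xD800) << 10) + (trail - 0xDC00);
}

// Hypothetical model of char32At(offset): the code point that contains the
// code unit at offset. An unpaired surrogate is returned as-is.
char32_t char32AtSketch(std::u16string_view s, int32_t offset) {
    const int32_t len = static_cast<int32_t>(s.size());
    assert(0 <= offset && offset < len);
    const char16_t u = s[offset];
    if (isLead(u) && offset + 1 < len && isTrail(s[offset + 1])) return combine(u, s[offset + 1]);
    if (isTrail(u) && offset > 0 && isLead(s[offset - 1])) return combine(s[offset - 1], u);
    return u;
}
```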

int32_t uima::UnicodeStringRef::getChar32Start ( int32_t  offset  )  const [inline]

Adjust a random-access offset so that it points to the beginning of a Unicode character.

The offset that is passed in points to any code unit of a code point, while the returned offset will point to the first code unit of the same code point. In UTF-16, if the input offset points to the second (trail) surrogate of a surrogate pair, then the returned offset will point to the first (lead) surrogate.

Parameters:
offset a valid offset into one code point of the text
Returns:
offset of the first code unit of the same code point

int32_t uima::UnicodeStringRef::getChar32Limit ( int32_t  offset  )  const [inline]

Adjust a random-access offset so that it points behind a Unicode character.

The offset that is passed in points behind any code unit of a code point, while the returned offset will point behind the last code unit of the same code point. In UTF-16, if the input offset points behind the first surrogate (i.e., to the second surrogate) of a surrogate pair, then the returned offset will point behind the second surrogate (i.e., to the code unit following the pair).

Parameters:
offset a valid offset after any code unit of a code point of the text
Returns:
offset of the first code unit after the same code point
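getChar32Start() and getChar32Limit() only move an offset when it falls in the middle of a surrogate pair. The standalone C++17 sketch below models both adjustments on std::u16string_view; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

inline bool isLead(char16_t u)  { return u >= 0xD800 && u <= 0xDBFF; }
inline bool isTrail(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical model of getChar32Start(offset): if offset points at the trail
// unit of a surrogate pair, move it back to the lead unit.
int32_t char32StartSketch(std::u16string_view s, int32_t offset) {
    const int32_t len = static_cast<int32_t>(s.size());
    assert(0 <= offset && offset < len);
    return (offset > 0 && isTrail(s[offset]) && isLead(s[offset - 1])) ? offset - 1 : offset;
}

// Hypothetical model of getChar32Limit(offset): if offset sits between the
// lead and trail units of a pair, move it past the trail unit.
int32_t char32LimitSketch(std::u16string_view s, int32_t offset) {
    const int32_t len = static_cast<int32_t>(s.size());
    assert(0 <= offset && offset <= len);
    return (offset > 0 && offset < len && isLead(s[offset - 1]) && isTrail(s[offset]))
               ? offset + 1
               : offset;
}
```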

int32_t uima::UnicodeStringRef::moveIndex32 ( int32_t  index,
int32_t  delta 
) const

Move the code unit index along the string by delta code points.

Interpret the input index as a code unit-based offset into the string, move the index forward or backward by delta code points, and return the resulting index. The input index should point to the first code unit of a code point, if there is more than one.

Both input and output indexes are code unit-based as for all string indexes/offsets in ICU (and other libraries, like MBCS char*). If delta<0 then the index is moved backward (toward the start of the string). If delta>0 then the index is moved forward (toward the end of the string).

This behaves like CharacterIterator::move32(delta, kCurrent).

Examples:

```cpp
// s has code points 'a' U+10000 'b' U+10ffff U+2029.
// The unescaped UnicodeString must outlive the UnicodeStringRef that refers to it.
icu::UnicodeString us = UNICODE_STRING_SIMPLE("a\\U00010000b\\U0010ffff\\u2029").unescape();
uima::UnicodeStringRef s(us);

// initial index: position of U+10000
int32_t index = 1;

// the following examples all result in index == 4, the position of U+10ffff

// skip 2 code points from some position in the string
index = s.moveIndex32(index, 2);        // skips U+10000 and 'b'

// go to the 3rd code point from the start of s (0-based)
index = s.moveIndex32(0, 3);            // skips 'a', U+10000, and 'b'

// go to the next-to-last code point of s
index = s.moveIndex32(s.length(), -2);  // backward-skips U+2029 and U+10ffff
```

Parameters:
index input code unit index
delta (signed) code point count to move the index forward or backward in the string
Returns:
the resulting code unit index
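The stepping rule behind moveIndex32() is simple: one code unit per step, or two for a well-formed surrogate pair. The standalone C++17 sketch below reproduces the example above on std::u16string_view. The function name is hypothetical. Stopping at either end of the string is this sketch's choice and may differ from the real method's handling of out-of-range deltas.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

inline bool isLead(char16_t u)  { return u >= 0xD800 && u <= 0xDBFF; }
inline bool isTrail(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical model of moveIndex32(index, delta): step over |delta| code points
// forward (delta > 0) or backward (delta < 0); a well-formed surrogate pair is
// one step. This sketch stops at either end of the string.
int32_t moveIndex32Sketch(std::u16string_view s, int32_t index, int32_t delta) {
    const int32_t len = static_cast<int32_t>(s.size());
    assert(0 <= index && index <= len);
    for (; delta > 0 && index < len; --delta) {
        index += (isLead(s[index]) && index + 1 < len && isTrail(s[index + 1])) ? 2 : 1;
    }
    for (; delta < 0 && index > 0; ++delta) {
        index -= (isTrail(s[index - 1]) && index > 1 && isLead(s[index - 2])) ? 2 : 1;
    }
    return index;
}
```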

void uima::UnicodeStringRef::extract ( int32_t  start,
int32_t  length,
UChar *  dst,
int32_t  dstStart = 0 
) const [inline]

Copy the characters in the range [start, start + length) into the array dst, beginning at dstStart.

If the string aliases to dst itself as an external buffer, then extract() will not copy the contents.

Parameters:
start offset of first character which will be copied into the array
length the number of characters to extract
dst array in which to copy characters. The length of dst must be at least (dstStart + length).
dstStart the offset in dst where the first character will be extracted

References getBuffer().

Referenced by extract(), and extractBetween().

void uima::UnicodeStringRef::extractBetween ( int32_t  start,
int32_t  limit,
UChar *  dst,
int32_t  dstStart = 0 
) const [inline]

Copy the characters in the range [start, limit) into the array dst, beginning at dstStart.

Parameters:
start offset of first character which will be copied into the array
limit offset immediately following the last character to be copied
dst array in which to copy characters. The length of dst must be at least (dstStart + (limit - start)).
dstStart the offset in dst where the first character will be extracted

References extract().
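Because extractBetween() references extract(), it is plausibly extract() with length = limit - start. The standalone C++17 sketch below models both, including the dstStart offset and the (dstStart + length) capacity requirement stated above. The function names are hypothetical, and this sketch does not append a NUL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical model of extract(start, length, dst, dstStart): copy
// s[start, start + length) to dst[dstStart, dstStart + length).
// dst must hold at least dstStart + length units.
void extractSketch(std::u16string_view s, int32_t start, int32_t length,
                   char16_t* dst, int32_t dstStart = 0) {
    assert(0 <= start && 0 <= length && start + length <= static_cast<int32_t>(s.size()));
    assert(dst != nullptr && dstStart >= 0);
    std::copy_n(s.data() + start, length, dst + dstStart);
}

// Hypothetical model of extractBetween(start, limit, dst, dstStart).
void extractBetweenSketch(std::u16string_view s, int32_t start, int32_t limit,
                          char16_t* dst, int32_t dstStart = 0) {
    assert(start <= limit);
    extractSketch(s, start, limit - start, dst, dstStart);
}
```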

int32_t uima::UnicodeStringRef::extract ( UChar *  dst,
int32_t  dstCapacity,
UErrorCode &  errorCode 
) const

Copy the contents of the string into dst.

This is a convenience function that checks if there is enough space in dst, extracts the entire string if possible, and NUL-terminates dst if possible.

If the string fits into dst but cannot be NUL-terminated (length()==dstCapacity) then the error code is set to U_STRING_NOT_TERMINATED_WARNING. If the string itself does not fit into dst (length()>dstCapacity) then the error code is set to U_BUFFER_OVERFLOW_ERROR.

If the string aliases to dst itself as an external buffer, then extract() will not copy the contents.

Parameters:
dst Destination string buffer.
dstCapacity Number of UChars available at dst.
errorCode ICU error code.
Returns:
length()
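The three outcomes described above (fits and is terminated, fits exactly without a NUL, does not fit) can be sketched as follows. To stay self-contained, the sketch uses a hypothetical ExtractStatus enum as a stand-in for the ICU error codes U_STRING_NOT_TERMINATED_WARNING and U_BUFFER_OVERFLOW_ERROR. Copying nothing on overflow is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical stand-ins for the ICU status codes named in the text.
enum class ExtractStatus { Ok, StringNotTerminatedWarning, BufferOverflowError };

// Hypothetical model of extract(dst, dstCapacity, errorCode): copy the whole
// string if it fits, NUL-terminate when there is room, and always return length().
int32_t extractWithCapacity(std::u16string_view s, char16_t* dst, int32_t dstCapacity,
                            ExtractStatus& status) {
    assert(dstCapacity >= 0 && (dstCapacity == 0 || dst != nullptr));
    const int32_t len = static_cast<int32_t>(s.size());
    if (len > dstCapacity) {  // does not fit: report the size that would be needed
        status = ExtractStatus::BufferOverflowError;
        return len;
    }
    std::copy(s.begin(), s.end(), dst);
    if (len < dstCapacity) {
        dst[len] = u'\0';
        status = ExtractStatus::Ok;
    } else {                  // fits exactly, no room for the NUL
        status = ExtractStatus::StringNotTerminatedWarning;
    }
    return len;
}
```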

void uima::UnicodeStringRef::extract ( int32_t  start,
int32_t  length,
UnicodeString &  dst 
) const [inline]

Copy the characters in the range [start, start + length) into the UnicodeString dst.

Parameters:
start offset of first character which will be copied
length the number of characters to extract
dst UnicodeString into which to copy characters.

References getBuffer().

void uima::UnicodeStringRef::extractBetween ( int32_t  start,
int32_t  limit,
UnicodeString &  dst 
) const [inline]

Copy the characters in the range [start, limit) into the UnicodeString dst.

Parameters:
start offset of first character which will be copied
limit offset immediately following the last character to be copied
dst UnicodeString into which to copy characters.

References extract().

int32_t uima::UnicodeStringRef::extract ( int32_t  start,
int32_t  startLength,
char *  target,
const char *  codepage = 0 
) const [inline]

Copy the characters in the range [start, start + startLength) into an array of characters in a specified codepage.

The output string is NUL-terminated.

Parameters:
start offset of first character which will be copied
startLength the number of characters to extract
target the target buffer for extraction
codepage the desired codepage for the characters. 0 has the special meaning of the default codepage. If codepage is an empty string (""), then a simple conversion is performed on the codepage-invariant subset ("invariant characters") of the platform encoding. See utypes.h. If target is NULL, then the number of bytes required for target is returned. NOTE: It is assumed that the target is big enough to fit all of the characters.
Returns:
the output string length, not including the terminating NUL

References extract().

int32_t uima::UnicodeStringRef::extract ( int32_t  start,
int32_t  startLength,
char *  target,
uint32_t  targetLength,
const char *  codepage = 0 
) const

Copy the characters in the range [start, start + startLength) into an array of characters in a specified codepage.

This function does not write any more than targetLength characters but returns the length of the entire output string so that one can allocate a larger buffer and call the function again if necessary. The output string is NUL-terminated if possible.

Parameters:
start offset of first character which will be copied
startLength the number of characters to extract
target the target buffer for extraction
targetLength the length of the target buffer
codepage the desired codepage for the characters. 0 has the special meaning of the default codepage. If codepage is an empty string (""), then a simple conversion is performed on the codepage-invariant subset ("invariant characters") of the platform encoding. See utypes.h. If target is NULL, then the number of bytes required for target is returned.
Returns:
the output string length, not including the terminating NUL
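Because this overload never writes more than targetLength bytes but always returns the full output length, callers can use a two-pass "preflight" pattern: size first, then allocate and extract. The standalone C++17 sketch below shows that pattern. For illustration only, it substitutes a trivial ISO-8859-1 "codepage" (with '?' for units above 0xFF) for a real codepage conversion. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Stand-in "codepage" for illustration: ISO-8859-1, '?' for units above 0xFF.
inline char toLatin1(char16_t u) { return u <= 0xFF ? static_cast<char>(u) : '?'; }

// Hypothetical model of extract(start, startLength, target, targetLength, codepage):
// writes at most targetLength bytes, NUL-terminates only if there is room, and
// always returns the full output length.
int32_t extractPreflight(std::u16string_view s, int32_t start, int32_t startLength,
                         char* target, uint32_t targetLength) {
    assert(0 <= start && 0 <= startLength &&
           start + startLength <= static_cast<int32_t>(s.size()));
    const uint32_t outLen = static_cast<uint32_t>(startLength);  // 1 byte per unit here
    if (target != nullptr) {
        for (uint32_t i = 0; i < outLen && i < targetLength; ++i)
            target[i] = toLatin1(s[static_cast<uint32_t>(start) + i]);
        if (outLen < targetLength) target[outLen] = '\0';
    }
    return static_cast<int32_t>(outLen);
}

// Two-pass pattern: preflight with a NULL target, then allocate and extract.
std::string extractAll(std::u16string_view s) {
    const int32_t len = static_cast<int32_t>(s.size());
    const int32_t needed = extractPreflight(s, 0, len, nullptr, 0);
    std::string out(static_cast<std::size_t>(needed) + 1, '\0');  // +1 for the NUL
    extractPreflight(s, 0, len, out.data(), static_cast<uint32_t>(out.size()));
    out.resize(static_cast<std::size_t>(needed));                  // drop the NUL
    return out;
}
```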

int32_t uima::UnicodeStringRef::extract ( char *  target,
int32_t  targetCapacity,
UConverter *  cnv,
UErrorCode &  errorCode 
) const

Convert the UnicodeStringRef into a codepage string using an existing UConverter.

The output string is NUL-terminated if possible.

This function avoids the overhead of opening and closing a converter if multiple strings are extracted.

Parameters:
target destination string buffer, can be NULL if targetCapacity==0
targetCapacity the number of chars available at target
cnv the converter object to be used (ucnv_resetFromUnicode() will be called), or NULL for the default converter
errorCode normal ICU error code
Returns:
the length of the output string, not counting the terminating NUL; if the length is greater than targetCapacity, then the string will not fit and a buffer of the indicated length would need to be passed in

int32_t uima::UnicodeStringRef::extract ( int32_t  start,
int32_t  startLength,
std::string &  target,
const char *  codepage = 0 
) const

Copy the characters in the range [start, start + startLength) into a std::string object in a specified codepage.

The output string is NUL-terminated.

Parameters:
start offset of first character which will be copied
startLength the number of characters to extract
target the target string for extraction
codepage the desired codepage for the characters. 0 has the special meaning of the default codepage. If codepage is an empty string (""), then a simple conversion is performed on the codepage-invariant subset ("invariant characters") of the platform encoding. See utypes.h.
Returns:
the output string length, not including the terminating NUL

int32_t uima::UnicodeStringRef::extract ( std::string &  target,
const char *  codepage = 0 
) const [inline]

Copy all the characters in the string into a std::string object in a specified codepage.

Equivalent to extract(0, length(), target, codepage)

Parameters:
target the target string for extraction
codepage the desired codepage for the characters.
Returns:
the output string length, not including the terminating NUL

References extract().

int32_t uima::UnicodeStringRef::extractUTF8 ( std::string &  target  )  const

Copy all the characters in the string into a std::string object in UTF-8.

Slightly more efficient than asUTF8() because it avoids one copy.

Parameters:
target the target string for extraction
Returns:
the output string length, not including the terminating NUL

Referenced by asUTF8().
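The UTF-16 to UTF-8 transcoding that extractUTF8() performs can be sketched in standalone C++17. Surrogate pairs are combined into one code point, which is then encoded as one to four bytes. The name extractUTF8Sketch is hypothetical. Replacing unpaired surrogates with U+FFFD is an assumption; the real method's error handling may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical model of extractUTF8(target): transcode UTF-16 to UTF-8 into
// target and return the number of bytes written (no terminating NUL counted).
int32_t extractUTF8Sketch(std::u16string_view s, std::string& target) {
    target.clear();
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t cp = s[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;                              // consumed the trail surrogate
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;                      // unpaired surrogate (assumption)
        }
        if (cp < 0x80) {
            target += static_cast<char>(cp);
        } else if (cp < 0x800) {
            target += static_cast<char>(0xC0 | (cp >> 6));
            target += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            target += static_cast<char>(0xE0 | (cp >> 12));
            target += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            target += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            target += static_cast<char>(0xF0 | (cp >> 18));
            target += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            target += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            target += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return static_cast<int32_t>(target.size());
}
```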

std::string uima::UnicodeStringRef::asUTF8 ( void   )  const [inline]

Convert to a UTF-8 string.

Returns:
a std::string

References extractUTF8().

static void uima::UnicodeStringRef::release ( std::string &  target  )  [static]

Release the contents of a string container allocated by the extract methods. This is useful when caller and callee use different heaps, e.g., when debug code uses a release library. Because the method is static, it can be called on the UnicodeStringRef class directly.

int32_t uima::UnicodeStringRef::length ( void   )  const [inline]

Return the length of the UnicodeStringRef object.

The length is the number of UChar (UTF-16) code units in the text, which can exceed the number of code points; see countChar32().

Returns:
the length of the UnicodeStringRef object

Referenced by uima::strtrim().

int32_t uima::UnicodeStringRef::countChar32 ( int32_t  start = 0,
int32_t  length = 0x7fffffff 
) const

Count Unicode code points in the length UChar code units of the string.

A code point may occupy either one or two UChar code units. Counting code points involves reading all code units.

This function is basically the inverse of moveIndex32().

Parameters:
start the index of the first code unit to check
length the number of UChar code units to check
Returns:
the number of code points in the specified code units
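Counting code points means walking every code unit and counting a well-formed surrogate pair once. The standalone C++17 sketch below does this on std::u16string_view. The function name is hypothetical. The sketch pins the range to the end of the string, so the default length of 0x7fffffff means "to the end"; it counts an unpaired surrogate as one code point.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

inline bool isLead(char16_t u)  { return u >= 0xD800 && u <= 0xDBFF; }
inline bool isTrail(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical model of countChar32(start, length): count code points in
// s[start, start + length), with the range pinned to the end of the string.
int32_t countChar32Sketch(std::u16string_view s, int32_t start = 0,
                          int32_t length = 0x7fffffff) {
    const int32_t len = static_cast<int32_t>(s.size());
    assert(0 <= start && start <= len && length >= 0);
    const int32_t limit = (length > len - start) ? len : start + length;
    int32_t count = 0;
    for (int32_t i = start; i < limit; ++count) {
        i += (isLead(s[i]) && i + 1 < limit && isTrail(s[i + 1])) ? 2 : 1;
    }
    return count;
}
```

Applied to the moveIndex32() example string, the whole string has 7 code units but 5 code points.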

bool uima::UnicodeStringRef::isEmpty ( void   )  const [inline]

Determine if this string is empty.

Returns:
TRUE if this string contains 0 characters, FALSE otherwise.

UnicodeStringRef & uima::UnicodeStringRef::setTo ( const UnicodeStringRef srcText  )  [inline]

Set this UnicodeStringRef to reference the characters in srcText. No characters are copied.

srcText is not modified.

Parameters:
srcText the source for the new characters
Returns:
a reference to this

References iv_pUChars, and iv_uiLength.

UnicodeStringRef & uima::UnicodeStringRef::setTo ( const UnicodeString &  srcText  )  [inline]

Set this UnicodeStringRef to reference the characters of the UnicodeString srcText. No characters are copied, so srcText must outlive this object.

srcText is not modified.

Parameters:
srcText the source for the new characters
Returns:
a reference to this

UnicodeStringRef & uima::UnicodeStringRef::setTo ( const UChar *  srcChars,
int32_t  srcLength 
) [inline]

Set this UnicodeStringRef to reference the characters in srcChars. No characters are copied.

srcChars is not modified.

Parameters:
srcChars the source for the new characters
srcLength the number of UChar code units in srcChars.
Returns:
a reference to this
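The shallow semantics of setTo() are the reason for the lifetime warning in the class description. The reference stores only a pointer and a length, so later changes to the buffer are visible through it, and the buffer must outlive it. StringRefSketch below is a hypothetical minimal class, not the real UnicodeStringRef. It mirrors only the two members this page lists (iv_pUChars, iv_uiLength).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical minimal non-owning reference (not the real class): it holds
// only a pointer and a length, like iv_pUChars / iv_uiLength.
class StringRefSketch {
public:
    StringRefSketch& setTo(const char16_t* srcChars, int32_t srcLength) {
        assert(srcLength >= 0 && (srcChars != nullptr || srcLength == 0));
        iv_pUChars = srcChars;    // store the pointer; no characters are copied
        iv_uiLength = srcLength;
        return *this;
    }
    const char16_t* getBuffer() const { return iv_pUChars; }
    int32_t length() const { return iv_uiLength; }

private:
    const char16_t* iv_pUChars = nullptr;
    int32_t iv_uiLength = 0;
};
```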

void uima::UnicodeStringRef::toSingleByteStream ( std::ostream &  outStream  )  const

Print a single byte version to outStream.

The encoding is UTF-8 if outStream is directed to disk; if outStream is cout or cerr, the encoding is a console CCSID that makes most characters readable in a shell or command window.



Generated on Mon Oct 1 11:15:09 2012 for UIMACPP API by  doxygen 1.5.6