UnicodeStringRef
provides support for strings that are not zero-terminated and are presented as pointers to Unicode character arrays with an associated length.
Because this type of string is meant to be used only as a reference into read-only buffers, the string pointer is constant. The member functions are named to follow the icu::UnicodeString interface, but only the const member functions are provided. This class is a quick, lightweight, shallow string (internally it consists only of a pointer and a length) that can be copied by value without a performance penalty. It allows references into other string buffers to be treated like real string objects. Since it does not own its string memory, care must be taken that the lifetime of a UnicodeStringRef object does not exceed the lifetime of the Unicode character buffer it references.
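The pointer-plus-length layout described above can be illustrated with a minimal sketch. The `RefSketch` struct and the `makeRef` and `toOwned` helpers are hypothetical names invented for this illustration; they are not part of uima::UnicodeStringRef, and `std::u16string` merely stands in for whatever buffer owns the characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for illustration only; NOT uima::UnicodeStringRef.
struct RefSketch {
    const char16_t* begin;   // points into a buffer owned by someone else
    std::int32_t    length;  // number of UTF-16 code units, no terminator
};

// Build a view over the code units [start, start + length) of an owning string.
inline RefSketch makeRef(const std::u16string& owner,
                         std::int32_t start, std::int32_t length) {
    assert(start >= 0 && length >= 0);
    assert(static_cast<std::size_t>(start) + static_cast<std::size_t>(length)
           <= owner.size());
    return RefSketch{owner.data() + start, length};
}

// Copy the referenced code units into a string that owns its memory.
inline std::u16string toOwned(RefSketch ref) {
    return std::u16string(ref.begin, static_cast<std::size_t>(ref.length));
}
```

Copying a `RefSketch` copies only the pointer and the length, so every copy still depends on the owning buffer staying alive and unmodified. The code unit just past a view is whatever follows it in the buffer, not necessarily a terminator.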
Public Member Functions | |
UnicodeStringRef (void) | |
Default Constructor. | |
UnicodeStringRef (const icu::UnicodeString &crUniString) | |
Constructor from icu::UnicodeString. | |
UnicodeStringRef (UChar const *cpacString) | |
Constructor from zero terminated string. | |
UnicodeStringRef (UChar const *cpacString, int32_t uiLength) | |
Constructor from string and length. | |
UnicodeStringRef (UChar const *paucStringBegin, UChar const *paucStringEnd) | |
Constructor from two pointers (begin/end). | |
int32_t | getSizeInBytes (void) const |
Accessor for the number of bytes occupied by this string. | |
UChar const * | getBuffer (void) const |
CONST Accessor for the string content (NOT ZERO DELIMITED!). | |
UnicodeStringRef & | operator= (UnicodeStringRef const &crclRHS) |
Assignment operator. | |
int | operator== (const UnicodeStringRef &crclRHS) const |
Equality operator. | |
int | operator!= (const UnicodeStringRef &crclRHS) const |
Inequality operator. | |
bool | operator< (UnicodeStringRef const &text) const |
Less-than operator. | |
bool | operator<= (UnicodeStringRef const &text) const |
Less-than-or-equal operator. | |
bool | operator> (UnicodeStringRef const &text) const |
Greater-than operator. | |
bool | operator>= (UnicodeStringRef const &text) const |
Greater-than-or-equal operator. | |
int8_t | compare (const UnicodeStringRef &text) const |
Compare the characters bitwise in this UnicodeStringRef to the characters in text . | |
int8_t | compare (const icu::UnicodeString &text) const |
Compare the characters bitwise in this UnicodeStringRef to the characters in text . | |
int8_t | compare (int32_t start, int32_t length, const UnicodeStringRef &srcText) const |
Compare the characters bitwise in the range [start , start + length ) with the characters in srcText . | |
int8_t | compare (int32_t start, int32_t length, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const |
Compare the characters bitwise in the range [start , start + length ) with the characters in srcText in the range [srcStart , srcStart + srcLength ). | |
int8_t | compare (UChar const *srcChars, int32_t srcLength) const |
Compare the characters bitwise in this UnicodeStringRef with the first srcLength characters in srcChars . | |
int8_t | compare (int32_t start, int32_t length, UChar const *srcChars) const |
Compare the characters bitwise in the range [start , start + length ) with the first length characters in srcChars . | |
int8_t | compare (int32_t start, int32_t length, UChar const *srcChars, int32_t srcStart, int32_t srcLength) const |
Compare the characters bitwise in the range [start , start + length ) with the characters in srcChars in the range [srcStart , srcStart + srcLength ). | |
int8_t | compareBetween (int32_t start, int32_t limit, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLimit) const |
Compare the characters bitwise in the range [start , limit ) with the characters in srcText in the range [srcStart , srcLimit ). | |
int8_t | compareCodePointOrder (const UnicodeStringRef &text) const |
Compare two Unicode strings in code point order. | |
int8_t | compareCodePointOrder (int32_t start, int32_t length, const UnicodeStringRef &srcText) const |
Compare two Unicode strings in code point order. | |
int8_t | compareCodePointOrder (int32_t start, int32_t length, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const |
Compare two Unicode strings in code point order. | |
int8_t | compareCodePointOrder (UChar const *srcChars, int32_t srcLength) const |
Compare two Unicode strings in code point order. | |
int8_t | compareCodePointOrder (int32_t start, int32_t length, UChar const *srcChars) const |
Compare two Unicode strings in code point order. | |
int8_t | compareCodePointOrder (int32_t start, int32_t length, UChar const *srcChars, int32_t srcStart, int32_t srcLength) const |
Compare two Unicode strings in code point order. | |
int8_t | compareCodePointOrderBetween (int32_t start, int32_t limit, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLimit) const |
Compare two Unicode strings in code point order. | |
int8_t | caseCompare (const UnicodeStringRef &text, uint32_t options) const |
Compare two strings case-insensitively using full case folding. | |
int8_t | caseCompare (int32_t start, int32_t length, const UnicodeStringRef &srcText, uint32_t options) const |
Compare two strings case-insensitively using full case folding. | |
int8_t | caseCompare (int32_t start, int32_t length, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength, uint32_t options) const |
Compare two strings case-insensitively using full case folding. | |
int8_t | caseCompare (UChar const *srcChars, int32_t srcLength, uint32_t options) const |
Compare two strings case-insensitively using full case folding. | |
int8_t | caseCompare (int32_t start, int32_t length, UChar const *srcChars, uint32_t options) const |
Compare two strings case-insensitively using full case folding. | |
int8_t | caseCompare (int32_t start, int32_t length, UChar const *srcChars, int32_t srcStart, int32_t srcLength, uint32_t options) const |
Compare two strings case-insensitively using full case folding. | |
int8_t | caseCompareBetween (int32_t start, int32_t limit, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLimit, uint32_t options) const |
Compare two strings case-insensitively using full case folding. | |
bool | startsWith (const UnicodeStringRef &text) const |
Determine if this starts with the characters in text . | |
bool | startsWith (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const |
Determine if this starts with the characters in srcText in the range [srcStart , srcStart + srcLength ). | |
bool | startsWith (UChar const *srcChars, int32_t srcLength) const |
Determine if this starts with the characters in srcChars . | |
bool | startsWith (UChar const *srcChars, int32_t srcStart, int32_t srcLength) const |
Determine if this starts with the characters in srcChars in the range [srcStart , srcStart + srcLength ). | |
bool | endsWith (const UnicodeStringRef &text) const |
Determine if this ends with the characters in text . | |
bool | endsWith (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const |
Determine if this ends with the characters in srcText in the range [srcStart , srcStart + srcLength ). | |
bool | endsWith (UChar const *srcChars, int32_t srcLength) const |
Determine if this ends with the characters in srcChars . | |
bool | endsWith (UChar const *srcChars, int32_t srcStart, int32_t srcLength) const |
Determine if this ends with the characters in srcChars in the range [srcStart , srcStart + srcLength ). | |
int32_t | indexOf (const UnicodeStringRef &text) const |
Locate in this the first occurrence of the characters in text , using bitwise comparison. | |
int32_t | indexOf (const UnicodeStringRef &text, int32_t start) const |
Locate in this the first occurrence of the characters in text starting at offset start , using bitwise comparison. | |
int32_t | indexOf (const UnicodeStringRef &text, int32_t start, int32_t length) const |
Locate in this the first occurrence in the range [start , start + length ) of the characters in text , using bitwise comparison. | |
int32_t | indexOf (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const |
Locate in this the first occurrence in the range [start , start + length ) of the characters in srcText in the range [srcStart , srcStart + srcLength ), using bitwise comparison. | |
int32_t | indexOf (UChar const *srcChars, int32_t srcLength, int32_t start) const |
Locate in this the first occurrence of the characters in srcChars starting at offset start , using bitwise comparison. | |
int32_t | indexOf (UChar const *srcChars, int32_t srcLength, int32_t start, int32_t length) const |
Locate in this the first occurrence in the range [start , start + length ) of the characters in srcChars , using bitwise comparison. | |
int32_t | indexOf (UChar const *srcChars, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const |
Locate in this the first occurrence in the range [start , start + length ) of the characters in srcChars in the range [srcStart , srcStart + srcLength ), using bitwise comparison. | |
int32_t | indexOf (UChar c) const |
Locate in this the first occurrence of the code unit c , using bitwise comparison. | |
int32_t | indexOf (UChar32 c) const |
Locate in this the first occurrence of the code point c , using bitwise comparison. | |
int32_t | indexOf (UChar c, int32_t start) const |
Locate in this the first occurrence of the code unit c starting at offset start , using bitwise comparison. | |
int32_t | indexOf (UChar32 c, int32_t start) const |
Locate in this the first occurrence of the code point c starting at offset start , using bitwise comparison. | |
int32_t | indexOf (UChar c, int32_t start, int32_t length) const |
Locate in this the first occurrence of the code unit c in the range [start , start + length ), using bitwise comparison. | |
int32_t | indexOf (UChar32 c, int32_t start, int32_t length) const |
Locate in this the first occurrence of the code point c in the range [start , start + length ), using bitwise comparison. | |
int32_t | lastIndexOf (const UnicodeStringRef &text) const |
Locate in this the last occurrence of the characters in text , using bitwise comparison. | |
int32_t | lastIndexOf (const UnicodeStringRef &text, int32_t start) const |
Locate in this the last occurrence of the characters in text starting at offset start , using bitwise comparison. | |
int32_t | lastIndexOf (const UnicodeStringRef &text, int32_t start, int32_t length) const |
Locate in this the last occurrence in the range [start , start + length ) of the characters in text , using bitwise comparison. | |
int32_t | lastIndexOf (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const |
Locate in this the last occurrence in the range [start , start + length ) of the characters in srcText in the range [srcStart , srcStart + srcLength ), using bitwise comparison. | |
int32_t | lastIndexOf (UChar const *srcChars, int32_t srcLength, int32_t start) const |
Locate in this the last occurrence of the characters in srcChars starting at offset start , using bitwise comparison. | |
int32_t | lastIndexOf (UChar const *srcChars, int32_t srcLength, int32_t start, int32_t length) const |
Locate in this the last occurrence in the range [start , start + length ) of the characters in srcChars , using bitwise comparison. | |
int32_t | lastIndexOf (UChar const *srcChars, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const |
Locate in this the last occurrence in the range [start , start + length ) of the characters in srcChars in the range [srcStart , srcStart + srcLength ), using bitwise comparison. | |
int32_t | lastIndexOf (UChar c) const |
Locate in this the last occurrence of the code unit c , using bitwise comparison. | |
int32_t | lastIndexOf (UChar32 c) const |
Locate in this the last occurrence of the code point c , using bitwise comparison. | |
int32_t | lastIndexOf (UChar c, int32_t start) const |
Locate in this the last occurrence of the code unit c starting at offset start , using bitwise comparison. | |
int32_t | lastIndexOf (UChar32 c, int32_t start) const |
Locate in this the last occurrence of the code point c starting at offset start , using bitwise comparison. | |
int32_t | lastIndexOf (UChar c, int32_t start, int32_t length) const |
Locate in this the last occurrence of the code unit c in the range [start , start + length ), using bitwise comparison. | |
int32_t | lastIndexOf (UChar32 c, int32_t start, int32_t length) const |
Locate in this the last occurrence of the code point c in the range [start , start + length ), using bitwise comparison. | |
UChar | charAt (int32_t offset) const |
Return the code unit at offset offset . | |
UChar | operator[] (int32_t offset) const |
Return the code unit at offset offset . | |
UChar32 | char32At (int32_t offset) const |
Return the code point that contains the code unit at offset offset . | |
int32_t | getChar32Start (int32_t offset) const |
Adjust a random-access offset so that it points to the beginning of a Unicode character. | |
int32_t | getChar32Limit (int32_t offset) const |
Adjust a random-access offset so that it points behind a Unicode character. | |
int32_t | moveIndex32 (int32_t index, int32_t delta) const |
Move the code unit index along the string by delta code points. | |
void | extract (int32_t start, int32_t length, UChar *dst, int32_t dstStart=0) const |
Copy the characters in the range [start , start + length ) into the array dst , beginning at dstStart . | |
void | extractBetween (int32_t start, int32_t limit, UChar *dst, int32_t dstStart=0) const |
Copy the characters in the range [start , limit ) into the array dst , beginning at dstStart . | |
int32_t | extract (UChar *dst, int32_t dstCapacity, UErrorCode &errorCode) const |
Copy the contents of the string into dst. | |
void | extract (int32_t start, int32_t length, UnicodeString &dst) const |
Copy the characters in the range [start , start + length ) into the UnicodeString dst . | |
void | extractBetween (int32_t start, int32_t limit, UnicodeString &dst) const |
Copy the characters in the range [start , limit ) into the UnicodeString dst . | |
int32_t | extract (int32_t start, int32_t startLength, char *target, const char *codepage=0) const |
Copy the characters in the range [start, start + startLength) into an array of characters in a specified codepage. | |
int32_t | extract (int32_t start, int32_t startLength, char *target, uint32_t targetLength, const char *codepage=0) const |
Copy the characters in the range [start, start + startLength) into an array of characters in a specified codepage. | |
int32_t | extract (char *target, int32_t targetCapacity, UConverter *cnv, UErrorCode &errorCode) const |
Convert the UnicodeStringRef into a codepage string using an existing UConverter. | |
int32_t | extract (int32_t start, int32_t startLength, std::string &target, const char *codepage=0) const |
Copy the characters in the range [start, start + startLength) into a std::string object in a specified codepage. | |
int32_t | extract (std::string &target, const char *codepage=0) const |
Copy all the characters in the string into an std::string object in a specified codepage. | |
int32_t | extractUTF8 (std::string &target) const |
Copy all the characters in the string into an std::string object in UTF-8. | |
std::string | asUTF8 (void) const |
Convert to a UTF8 string. | |
int32_t | length (void) const |
Return the length of the UnicodeStringRef object. | |
int32_t | countChar32 (int32_t start=0, int32_t length=0x7fffffff) const |
Count Unicode code points in the length UChar code units of the string. | |
bool | isEmpty (void) const |
Determine if this string is empty. | |
UnicodeStringRef & | setTo (const UnicodeStringRef &srcText) |
Set the text in the UnicodeStringRef object to the characters in srcText. | |
UnicodeStringRef & | setTo (const UnicodeString &srcText) |
Set the text in the UnicodeStringRef object to the characters in srcText. | |
UnicodeStringRef & | setTo (const UChar *srcChars, int32_t srcLength) |
Set the characters in the UnicodeStringRef object to the characters in srcChars. | |
void | toSingleByteStream (std::ostream &outStream) const |
Print a single byte version to outStream. | |
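The member list above uses two range conventions: most overloads take a start offset and a length, covering [start, start + length), while the ...Between variants take a start and a limit, covering [start, limit). The following sketch, assuming ordinary `std::u16string` storage and the hypothetical helpers `sliceByLength` and `sliceBetween` (neither is part of this class), shows how the two conventions relate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helpers for illustration; not the uima::UnicodeStringRef API.

// Copy the code units in the half-open range [start, start + length).
inline std::u16string sliceByLength(const std::u16string& s,
                                    std::int32_t start, std::int32_t length) {
    assert(start >= 0 && length >= 0);
    return s.substr(static_cast<std::size_t>(start),
                    static_cast<std::size_t>(length));
}

// Copy the code units in the half-open range [start, limit).
// A limit is an offset, so the equivalent length is limit - start.
inline std::u16string sliceBetween(const std::u16string& s,
                                   std::int32_t start, std::int32_t limit) {
    assert(start >= 0 && limit >= start);
    return sliceByLength(s, start, limit - start);
}
```

With this mapping, a call in the length convention with (2, 3) and a call in the limit convention with (2, 5) address the same three code units.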
Static Public Member Functions | |
static void | release (std::string &target) |
Release contents of string container allocated by extract methods. Useful when caller and callee use different heaps, e.g. |
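Several members listed above (char32At, getChar32Start, countChar32) distinguish UTF-16 code units from code points. The sketch below shows the underlying surrogate-pair arithmetic on a plain `std::u16string`. The function names `codePointStart`, `codePointAt`, and `countCodePoints` are hypothetical and chosen for this illustration; how the class itself treats malformed input is not stated here, so returning an unpaired surrogate unchanged is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helpers for illustration; not the uima::UnicodeStringRef API.
inline bool isLeadSurrogate(char16_t c)  { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isTrailSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Move offset back to the first code unit of the code point it falls in.
inline std::int32_t codePointStart(const std::u16string& s, std::int32_t offset) {
    assert(offset >= 0 && static_cast<std::size_t>(offset) < s.size());
    const std::size_t i = static_cast<std::size_t>(offset);
    if (i > 0 && isTrailSurrogate(s[i]) && isLeadSurrogate(s[i - 1])) {
        return offset - 1;
    }
    return offset;
}

// Return the code point that contains the code unit at offset.
// Assumption of this sketch: an unpaired surrogate is returned unchanged.
inline char32_t codePointAt(const std::u16string& s, std::int32_t offset) {
    const std::size_t i = static_cast<std::size_t>(codePointStart(s, offset));
    const char32_t first = s[i];
    if (isLeadSurrogate(s[i]) && i + 1 < s.size() && isTrailSurrogate(s[i + 1])) {
        const char32_t second = s[i + 1];
        return 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00);
    }
    return first;
}

// Count code points, treating each valid surrogate pair as one.
inline std::int32_t countCodePoints(const std::u16string& s) {
    std::int32_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i, ++count) {
        if (isLeadSurrogate(s[i]) && i + 1 < s.size() && isTrailSurrogate(s[i + 1])) {
            ++i;  // skip the trail surrogate of this pair
        }
    }
    return count;
}
```

For a string holding one BMP character, one supplementary character, and another BMP character, the code unit length is 4 while the code point count is 3, and both offsets inside the surrogate pair resolve to the same code point.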
uima::UnicodeStringRef::UnicodeStringRef | ( | void | ) | [inline] |
uima::UnicodeStringRef::UnicodeStringRef | ( | const icu::UnicodeString & | crUniString | ) | [inline] |
Constructor from icu::UnicodeString.
uima::UnicodeStringRef::UnicodeStringRef | ( | UChar const * | cpacString | ) | [inline, explicit] |
uima::UnicodeStringRef::UnicodeStringRef | ( | UChar const * | cpacString, | |
int32_t | uiLength | |||
) | [inline] |
uima::UnicodeStringRef::UnicodeStringRef | ( | UChar const * | paucStringBegin, | |
UChar const * | paucStringEnd | |||
) | [inline] |
Constructor from two pointers (begin/end).
Note: end points to the first char behind the string.
References EXISTS.
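Because end points one past the last character, the length of the referenced string is the pointer difference end - begin. A minimal sketch, using a hypothetical helper `lengthFromRange` that is not part of this class:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration; not the uima::UnicodeStringRef API.
// Length in code units of the half-open pointer range [begin, end).
// Both pointers must refer to the same buffer.
inline std::int32_t lengthFromRange(const char16_t* begin, const char16_t* end) {
    assert(begin <= end);
    return static_cast<std::int32_t>(end - begin);
}
```

An empty string is expressed by passing the same pointer for begin and end.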
int32_t uima::UnicodeStringRef::getSizeInBytes | ( | void | ) | const [inline] |
Accessor for the number of bytes occupied by this string.
UChar const * uima::UnicodeStringRef::getBuffer | ( | void | ) | const [inline] |
CONST Accessor for the string content (NOT ZERO DELIMITED!).
Referenced by extract(), indexOf(), lastIndexOf(), and uima::strtrim().
UnicodeStringRef & uima::UnicodeStringRef::operator= | ( | UnicodeStringRef const & | crclRHS | ) | [inline] |
int uima::UnicodeStringRef::operator== | ( | const UnicodeStringRef & | crclRHS | ) | const [inline] |
int uima::UnicodeStringRef::operator!= | ( | const UnicodeStringRef & | crclRHS | ) | const [inline] |
Inequality operator.
bool uima::UnicodeStringRef::operator< | ( | UnicodeStringRef const & | text | ) | const [inline] |
bool uima::UnicodeStringRef::operator<= | ( | UnicodeStringRef const & | text | ) | const [inline] |
bool uima::UnicodeStringRef::operator> | ( | UnicodeStringRef const & | text | ) | const [inline] |
bool uima::UnicodeStringRef::operator>= | ( | UnicodeStringRef const & | text | ) | const [inline] |
int8_t uima::UnicodeStringRef::compare | ( | const UnicodeStringRef & | text | ) | const [inline] |
Compare the characters bitwise in this UnicodeStringRef to the characters in text.
text | The UnicodeStringRef to compare to this one. |
Returns 0 if text contains the same characters as this, -1 if the characters in text are bitwise less than the characters in this, +1 if the characters in text are bitwise greater than the characters in this.
References iv_uiLength.
Referenced by startsWith().
int8_t uima::UnicodeStringRef::compare | ( | const icu::UnicodeString & | text | ) | const [inline] |
Compare the characters bitwise in this UnicodeStringRef to the characters in text.
text | The UnicodeString to compare to this one. |
Returns 0 if text contains the same characters as this, -1 if the characters in text are bitwise less than the characters in this, +1 if the characters in text are bitwise greater than the characters in this.
int8_t uima::UnicodeStringRef::compare | ( | int32_t | start, | |
int32_t | length, | |||
const UnicodeStringRef & | srcText | |||
) | const [inline] |
Compare the characters bitwise in the range [start, start + length) with the characters in srcText.
start | the offset at which the compare operation begins | |
length | the number of characters of text to compare. | |
srcText | the text to be compared |
Returns 0 if text contains the same characters as this, -1 if the characters in text are bitwise less than the characters in this, +1 if the characters in text are bitwise greater than the characters in this.
References iv_uiLength.
int8_t uima::UnicodeStringRef::compare | ( | int32_t | start, | |
int32_t | length, | |||
const UnicodeStringRef & | srcText, | |||
int32_t | srcStart, | |||
int32_t | srcLength | |||
) | const [inline] |
Compare the characters bitwise in the range [start, start + length) with the characters in srcText in the range [srcStart, srcStart + srcLength).
start | the offset at which the compare operation begins | |
length | the number of characters in this to compare. | |
srcText | the text to be compared | |
srcStart | the offset into srcText to start comparison | |
srcLength | the number of characters in src to compare |
Returns 0 if text contains the same characters as this, -1 if the characters in text are bitwise less than the characters in this, +1 if the characters in text are bitwise greater than the characters in this.
int8_t uima::UnicodeStringRef::compare | ( | UChar const * | srcChars, | |
int32_t | srcLength | |||
) | const [inline] |
Compare the characters bitwise in this UnicodeStringRef with the first srcLength characters in srcChars.
srcChars | The characters to compare to this UnicodeStringRef. | |
srcLength | the number of characters in srcChars to compare |
Returns 0 if text contains the same characters as this, -1 if the characters in text are bitwise less than the characters in this, +1 if the characters in text are bitwise greater than the characters in this.
int8_t uima::UnicodeStringRef::compare | ( | int32_t | start, | |
int32_t | length, | |||
UChar const * | srcChars | |||
) | const [inline] |
Compare the characters bitwise in the range [start, start + length) with the first length characters in srcChars.
start | the offset at which the compare operation begins | |
length | the number of characters to compare. | |
srcChars | the characters to be compared |
Returns 0 if text contains the same characters as this, -1 if the characters in text are bitwise less than the characters in this, +1 if the characters in text are bitwise greater than the characters in this.
int8_t uima::UnicodeStringRef::compare | ( | int32_t | start, | |
int32_t | length, | |||
UChar const * | srcChars, | |||
int32_t | srcStart, | |||
int32_t | srcLength | |||
) | const [inline] |
Compare the characters bitwise in the range [start, start + length) with the characters in srcChars in the range [srcStart, srcStart + srcLength).
start | the offset at which the compare operation begins | |
length | the number of characters in this to compare | |
srcChars | the characters to be compared | |
srcStart | the offset into srcChars to start comparison | |
srcLength | the number of characters in srcChars to compare |
Returns 0 if text contains the same characters as this, -1 if the characters in text are bitwise less than the characters in this, +1 if the characters in text are bitwise greater than the characters in this.
int8_t uima::UnicodeStringRef::compareBetween | ( | int32_t | start, | |
int32_t | limit, | |||
const UnicodeStringRef & | srcText, | |||
int32_t | srcStart, | |||
int32_t | srcLimit | |||
) | const [inline] |
Compare the characters bitwise in the range [start, limit) with the characters in srcText in the range [srcStart, srcLimit).
start | the offset at which the compare operation begins | |
limit | the offset immediately following the compare operation | |
srcText | the text to be compared | |
srcStart | the offset into srcText to start comparison | |
srcLimit | the offset into srcText to limit comparison |
Returns 0 if text contains the same characters as this, -1 if the characters in text are bitwise less than the characters in this, +1 if the characters in text are bitwise greater than the characters in this.
int8_t uima::UnicodeStringRef::compareCodePointOrder | ( | const UnicodeStringRef & | text | ) | const [inline] |
Compare two Unicode strings in code point order.
This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.
text | Another string to compare this one to. |
References iv_uiLength.
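The difference between code unit order and code point order can be reproduced with standard C++ alone. The sketch below is not the implementation behind compareCodePointOrder; it decodes each UTF-16 string into code points first, which is simple but allocates, and the names `decodeUtf16`, `compareCodeUnits`, and `compareCodePoints` are hypothetical. It assumes well-formed input apart from passing unpaired surrogates through unchanged.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration; not the uima::UnicodeStringRef API.

// Decode UTF-16 into code points; unpaired surrogates are copied as they are.
inline std::u32string decodeUtf16(const std::u16string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            const char32_t trail = s[i + 1];
            c = 0x10000 + ((c - 0xD800) << 10) + (trail - 0xDC00);
            ++i;  // the trail surrogate has been consumed
        }
        out.push_back(c);
    }
    return out;
}

// Three-way lexicographic comparison returning -1, 0, or +1.
template <typename Str>
int threeWay(const Str& a, const Str& b) {
    if (a < b) return -1;
    if (b < a) return 1;
    return 0;
}

// Order by raw 16-bit code units (the "bitwise" order).
inline int compareCodeUnits(const std::u16string& a, const std::u16string& b) {
    return threeWay(a, b);
}

// Order by decoded code points.
inline int compareCodePoints(const std::u16string& a, const std::u16string& b) {
    return threeWay(decodeUtf16(a), decodeUtf16(b));
}
```

With U+10000 (stored as the surrogate pair 0xD800 0xDC00) and U+FEFF, the code unit comparison places U+10000 first because 0xD800 is less than 0xFEFF, while the code point comparison places U+FEFF first.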
int8_t uima::UnicodeStringRef::compareCodePointOrder | ( | int32_t | start, | |
int32_t | length, | |||
const UnicodeStringRef & | srcText | |||
) | const [inline] |
Compare two Unicode strings in code point order.
This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.
start | The start offset in this string at which the compare operation begins. | |
length | The number of code units from this string to compare. | |
srcText | Another string to compare this one to. |
References iv_uiLength.
int8_t uima::UnicodeStringRef::compareCodePointOrder | ( | int32_t | start, | |
int32_t | length, | |||
const UnicodeStringRef & | srcText, | |||
int32_t | srcStart, | |||
int32_t | srcLength | |||
) | const [inline] |
Compare two Unicode strings in code point order.
This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.
start | The start offset in this string at which the compare operation begins. | |
length | The number of code units from this string to compare. | |
srcText | Another string to compare this one to. | |
srcStart | The start offset in that string at which the compare operation begins. | |
srcLength | The number of code units from that string to compare. |
int8_t uima::UnicodeStringRef::compareCodePointOrder | ( | UChar const * | srcChars, | |
int32_t | srcLength | |||
) | const [inline] |
Compare two Unicode strings in code point order.
This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.
srcChars | A pointer to another string to compare this one to. | |
srcLength | The number of code units from that string to compare. |
int8_t uima::UnicodeStringRef::compareCodePointOrder | ( | int32_t | start, | |
int32_t | length, | |||
UChar const * | srcChars | |||
) | const [inline] |
Compare two Unicode strings in code point order.
This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.
start | The start offset in this string at which the compare operation begins. | |
length | The number of code units from this string to compare. | |
srcChars | A pointer to another string to compare this one to. |
int8_t uima::UnicodeStringRef::compareCodePointOrder | ( | int32_t | start, | |
int32_t | length, | |||
UChar const * | srcChars, | |||
int32_t | srcStart, | |||
int32_t | srcLength | |||
) | const [inline] |
Compare two Unicode strings in code point order.
This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.
start | The start offset in this string at which the compare operation begins. | |
length | The number of code units from this string to compare. | |
srcChars | A pointer to another string to compare this one to. | |
srcStart | The start offset in that string at which the compare operation begins. | |
srcLength | The number of code units from that string to compare. |
int8_t uima::UnicodeStringRef::compareCodePointOrderBetween | ( | int32_t | start, | |
int32_t | limit, | |||
const UnicodeStringRef & | srcText, | |||
int32_t | srcStart, | |||
int32_t | srcLimit | |||
) | const [inline] |
Compare two Unicode strings in code point order.
This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.
start | The start offset in this string at which the compare operation begins. | |
limit | The offset after the last code unit from this string to compare. | |
srcText | Another string to compare this one to. | |
srcStart | The start offset in that string at which the compare operation begins. | |
srcLimit | The offset after the last code unit from that string to compare. |
int8_t uima::UnicodeStringRef::caseCompare | ( | const UnicodeStringRef & | text, | |
uint32_t | options | |||
) | const [inline] |
Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compare(text.foldCase(options)).
text | Another string to compare this one to. | |
options | Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I |
References iv_uiLength.
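The fold-then-compare equivalence stated above can be sketched as follows. This is a deliberately reduced illustration: `foldAscii` folds only the ASCII letters A to Z, whereas full case folding relies on Unicode data (for example, it maps U+00DF to "ss") and therefore needs a library such as ICU. Both function names are hypothetical, and the options parameter is not modelled.

```cpp
#include <string>

// Hypothetical helpers for illustration; not the uima::UnicodeStringRef API.

// ASCII-only stand-in for case folding: maps 'A'..'Z' to 'a'..'z'
// and leaves every other code unit unchanged.
inline std::u16string foldAscii(std::u16string s) {
    for (char16_t& c : s) {
        if (c >= u'A' && c <= u'Z') {
            c = static_cast<char16_t>(c + (u'a' - u'A'));
        }
    }
    return s;
}

// Fold both operands, then compare the folded strings by code units.
// Returns -1, 0, or +1.
inline int caseCompareAscii(const std::u16string& a, const std::u16string& b) {
    const std::u16string fa = foldAscii(a);
    const std::u16string fb = foldAscii(b);
    if (fa < fb) return -1;
    if (fb < fa) return 1;
    return 0;
}
```

The point of the sketch is the shape of the operation: both sides are folded with the same rules before an ordinary comparison, so strings that differ only in case compare as equal.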
int8_t uima::UnicodeStringRef::caseCompare | ( | int32_t | start, | |
int32_t | length, | |||
const UnicodeStringRef & | srcText, | |||
uint32_t | options | |||
) | const [inline] |
Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compare(srcText.foldCase(options)).
start | The start offset in this string at which the compare operation begins. | |
length | The number of code units from this string to compare. | |
srcText | Another string to compare this one to. | |
options | Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I |
References iv_uiLength.
int8_t uima::UnicodeStringRef::caseCompare | ( | int32_t | start, | |
int32_t | length, | |||
const UnicodeStringRef & | srcText, | |||
int32_t | srcStart, | |||
int32_t | srcLength, | |||
uint32_t | options | |||
) | const [inline] |
Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compare(srcText.foldCase(options)).
start | The start offset in this string at which the compare operation begins. | |
length | The number of code units from this string to compare. | |
srcText | Another string to compare this one to. | |
srcStart | The start offset in that string at which the compare operation begins. | |
srcLength | The number of code units from that string to compare. | |
options | Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I |
int8_t uima::UnicodeStringRef::caseCompare | ( | UChar const * | srcChars, | |
int32_t | srcLength, | |||
uint32_t | options | |||
) | const [inline] |
Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compare(srcChars.foldCase(options)).
srcChars | A pointer to another string to compare this one to. | |
srcLength | The number of code units from that string to compare. | |
options | Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I |
int8_t uima::UnicodeStringRef::caseCompare | ( | int32_t | start, | |
int32_t | length, | |||
UChar const * | srcChars, | |||
uint32_t | options | |||
) | const [inline] |
Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compare(srcChars.foldCase(options)).
start | The start offset in this string at which the compare operation begins. | |
length | The number of code units from this string to compare. | |
srcChars | A pointer to another string to compare this one to. | |
options | Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I |
int8_t uima::UnicodeStringRef::caseCompare | ( | int32_t | start, | |
int32_t | length, | |||
UChar const * | srcChars, | |||
int32_t | srcStart, | |||
int32_t | srcLength, | |||
uint32_t | options | |||
) | const [inline] |
Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compare(srcChars.foldCase(options)).
start | The start offset in this string at which the compare operation begins. | |
length | The number of code units from this string to compare. | |
srcChars | A pointer to another string to compare this one to. | |
srcStart | The start offset in that string at which the compare operation begins. | |
srcLength | The number of code units from that string to compare. | |
options | Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I |
int8_t uima::UnicodeStringRef::caseCompareBetween | ( | int32_t | start, | |
int32_t | limit, | |||
const UnicodeStringRef & | srcText, | |||
int32_t | srcStart, | |||
int32_t | srcLimit, | |||
uint32_t | options | |||
) | const [inline] |
Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compareBetween(srcText.foldCase(options)).
start | The start offset in this string at which the compare operation begins. | |
limit | The offset after the last code unit from this string to compare. | |
srcText | Another string to compare this one to. | |
srcStart | The start offset in that string at which the compare operation begins. | |
srcLimit | The offset after the last code unit from that string to compare. | |
options | Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I |
bool uima::UnicodeStringRef::startsWith | ( | const UnicodeStringRef & | text | ) | const [inline] |
Determine if this starts with the characters in text.
text | The text to match. |
Returns TRUE if this starts with the characters in text, FALSE otherwise.
References compare(), and iv_uiLength.
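Because the class is only a pointer plus a length, prefix and suffix tests reduce to a length check followed by a comparison of code units. The following is a minimal sketch of that idea; Utf16View, startsWithSketch and endsWithSketch are hypothetical names used for illustration and are not the class's actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a pointer-plus-length string reference.
struct Utf16View {
    const char16_t* data;
    std::int32_t length;
};

// True if s begins with the code units of prefix (bitwise comparison).
inline bool startsWithSketch(Utf16View s, Utf16View prefix) {
    return prefix.length <= s.length &&
           std::equal(prefix.data, prefix.data + prefix.length, s.data);
}

// True if s ends with the code units of suffix (bitwise comparison).
inline bool endsWithSketch(Utf16View s, Utf16View suffix) {
    return suffix.length <= s.length &&
           std::equal(suffix.data, suffix.data + suffix.length,
                      s.data + (s.length - suffix.length));
}

inline void startsEndsDemo() {
    const Utf16View s{u"annotator", 9};
    assert(startsWithSketch(s, Utf16View{u"anno", 4}));
    assert(endsWithSketch(s, Utf16View{u"tor", 3}));
    assert(!startsWithSketch(s, Utf16View{u"tor", 3}));
}
```

No terminating zero is ever read, which matches the non zero-terminated buffers this class is meant to reference.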
bool uima::UnicodeStringRef::startsWith | ( | const UnicodeStringRef & | srcText, | |
int32_t | srcStart, | |||
int32_t | srcLength | |||
) | const [inline] |
Determine if this starts with the characters in srcText in the range [srcStart, srcStart + srcLength).
srcText | The text to match. | |
srcStart | the offset into srcText to start matching | |
srcLength | the number of characters in srcText to match |
Returns TRUE if this starts with the characters in srcText, FALSE otherwise.
bool uima::UnicodeStringRef::startsWith | ( | UChar const * | srcChars, | |
int32_t | srcLength | |||
) | const [inline] |
Determine if this starts with the characters in srcChars.
srcChars | The characters to match. | |
srcLength | the number of characters in srcChars |
Returns TRUE if this starts with the characters in srcChars, FALSE otherwise.
bool uima::UnicodeStringRef::startsWith | ( | UChar const * | srcChars, | |
int32_t | srcStart, | |||
int32_t | srcLength | |||
) | const [inline] |
Determine if this starts with the characters in srcChars in the range [srcStart, srcStart + srcLength).
srcChars | The characters to match. | |
srcStart | the offset into srcChars to start matching | |
srcLength | the number of characters in srcChars to match |
Returns TRUE if this starts with the characters in srcChars, FALSE otherwise.
bool uima::UnicodeStringRef::endsWith | ( | const UnicodeStringRef & | text | ) | const [inline] |
Determine if this ends with the characters in text.
text | The text to match. |
Returns TRUE if this ends with the characters in text, FALSE otherwise.
References iv_uiLength.
bool uima::UnicodeStringRef::endsWith | ( | const UnicodeStringRef & | srcText, | |
int32_t | srcStart, | |||
int32_t | srcLength | |||
) | const [inline] |
Determine if this ends with the characters in srcText in the range [srcStart, srcStart + srcLength).
srcText | The text to match. | |
srcStart | the offset into srcText to start matching | |
srcLength | the number of characters in srcText to match |
Returns TRUE if this ends with the characters in srcText, FALSE otherwise.
bool uima::UnicodeStringRef::endsWith | ( | UChar const * | srcChars, | |
int32_t | srcLength | |||
) | const [inline] |
Determine if this ends with the characters in srcChars.
srcChars | The characters to match. | |
srcLength | the number of characters in srcChars |
Returns TRUE if this ends with the characters in srcChars, FALSE otherwise.
bool uima::UnicodeStringRef::endsWith | ( | UChar const * | srcChars, | |
int32_t | srcStart, | |||
int32_t | srcLength | |||
) | const [inline] |
Determine if this ends with the characters in srcChars in the range [srcStart, srcStart + srcLength).
srcChars | The characters to match. | |
srcStart | the offset into srcChars to start matching | |
srcLength | the number of characters in srcChars to match |
Returns TRUE if this ends with the characters in srcChars, FALSE otherwise.
int32_t uima::UnicodeStringRef::indexOf | ( | const UnicodeStringRef & | text | ) | const [inline] |
Locate in this the first occurrence of the characters in text, using bitwise comparison.
text | The text to search for. |
Returns the offset into this of the start of text, or -1 if not found.
References iv_uiLength.
Referenced by indexOf().
int32_t uima::UnicodeStringRef::indexOf | ( | const UnicodeStringRef & | text, | |
int32_t | start | |||
) | const [inline] |
Locate in this the first occurrence of the characters in text starting at offset start, using bitwise comparison.
text | The text to search for. | |
start | The offset at which searching will start. |
Returns the offset into this of the start of text, or -1 if not found.
References indexOf(), and iv_uiLength.
int32_t uima::UnicodeStringRef::indexOf | ( | const UnicodeStringRef & | text, | |
int32_t | start, | |||
int32_t | length | |||
) | const [inline] |
Locate in this the first occurrence in the range [start, start + length) of the characters in text, using bitwise comparison.
text | The text to search for. | |
start | The offset at which searching will start. | |
length | The number of characters to search |
Returns the offset into this of the start of text, or -1 if not found.
References indexOf(), and iv_uiLength.
int32_t uima::UnicodeStringRef::indexOf | ( | const UnicodeStringRef & | srcText, | |
int32_t | srcStart, | |||
int32_t | srcLength, | |||
int32_t | start, | |||
int32_t | length | |||
) | const [inline] |
Locate in this the first occurrence in the range [start, start + length) of the characters in srcText in the range [srcStart, srcStart + srcLength), using bitwise comparison.
srcText | The text to search for. | |
srcStart | the offset into srcText at which to start matching | |
srcLength | the number of characters in srcText to match | |
start | the offset into this at which to start matching | |
length | the number of characters in this to search |
Returns the offset into this of the start of srcText, or -1 if not found.
References getBuffer(), and indexOf().
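A range-limited forward search of this kind can be sketched with std::search over a half-open range. The sketch below uses std::u16string in place of UnicodeStringRef; indexOfSketch is a hypothetical name, and both the clamping of start and length and the treatment of an empty pattern are assumptions of this sketch rather than documented behaviour of the class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Search s in [start, start + length) for text; return the offset into s or -1.
inline std::int32_t indexOfSketch(const std::u16string& s, const std::u16string& text,
                                  std::int32_t start, std::int32_t length) {
    const std::int32_t size = static_cast<std::int32_t>(s.size());
    if (text.empty()) return -1;  // assumption: an empty pattern counts as not found
    // Assumption: out-of-range arguments are clamped to the string bounds.
    if (start < 0) start = 0;
    if (start > size) start = size;
    if (length < 0) length = 0;
    if (length > size - start) length = size - start;
    const auto first = s.begin() + start;
    const auto last = first + length;
    const auto hit = std::search(first, last, text.begin(), text.end());
    return hit == last ? -1 : static_cast<std::int32_t>(hit - s.begin());
}

inline void indexOfDemo() {
    const std::u16string s = u"abcabc";
    assert(indexOfSketch(s, u"bc", 0, 6) == 1);   // first match in the whole string
    assert(indexOfSketch(s, u"bc", 2, 4) == 4);   // search starts after the first match
    assert(indexOfSketch(s, u"bc", 2, 2) == -1);  // range [2, 4) holds only "ca"
}
```

The returned offset is relative to the whole string, not to start, which is how the parameter descriptions above read.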
int32_t uima::UnicodeStringRef::indexOf | ( | UChar const * | srcChars, | |
int32_t | srcLength, | |||
int32_t | start | |||
) | const [inline] |
Locate in this the first occurrence of the characters in srcChars starting at offset start, using bitwise comparison.
srcChars | The text to search for. | |
srcLength | the number of characters in srcChars to match | |
start | the offset into this at which to start matching |
Returns the offset into this of the start of srcChars, or -1 if not found.
References indexOf().
int32_t uima::UnicodeStringRef::indexOf | ( | UChar const * | srcChars, | |
int32_t | srcLength, | |||
int32_t | start, | |||
int32_t | length | |||
) | const [inline] |
Locate in this the first occurrence in the range [start, start + length) of the characters in srcChars, using bitwise comparison.
srcChars | The text to search for. | |
srcLength | the number of characters in srcChars | |
start | The offset at which searching will start. | |
length | The number of characters to search |
Returns the offset into this of the start of srcChars, or -1 if not found.
References indexOf().
int32_t uima::UnicodeStringRef::indexOf | ( | UChar const * | srcChars, | |
int32_t | srcStart, | |||
int32_t | srcLength, | |||
int32_t | start, | |||
int32_t | length | |||
) | const |
Locate in this the first occurrence in the range [start, start + length) of the characters in srcChars in the range [srcStart, srcStart + srcLength), using bitwise comparison.
srcChars | The text to search for. | |
srcStart | the offset into srcChars at which to start matching | |
srcLength | the number of characters in srcChars to match | |
start | the offset into this at which to start matching | |
length | the number of characters in this to search |
Returns the offset into this of the start of srcChars, or -1 if not found.
int32_t uima::UnicodeStringRef::indexOf | ( | UChar | c | ) | const [inline] |
Locate in this the first occurrence of the code unit c, using bitwise comparison.
c | The code unit to search for. |
Returns the offset into this of c, or -1 if not found.
int32_t uima::UnicodeStringRef::indexOf | ( | UChar32 | c | ) | const [inline] |
Locate in this the first occurrence of the code point c, using bitwise comparison.
c | The code point to search for. |
Returns the offset into this of c, or -1 if not found.
References indexOf().
int32_t uima::UnicodeStringRef::indexOf | ( | UChar | c, | |
int32_t | start | |||
) | const [inline] |
Locate in this the first occurrence of the code unit c starting at offset start, using bitwise comparison.
c | The code unit to search for. | |
start | The offset at which searching will start. |
Returns the offset into this of c, or -1 if not found.
int32_t uima::UnicodeStringRef::indexOf | ( | UChar32 | c, | |
int32_t | start | |||
) | const [inline] |
Locate in this the first occurrence of the code point c starting at offset start, using bitwise comparison.
c | The code point to search for. | |
start | The offset at which searching will start. |
Returns the offset into this of c, or -1 if not found.
References indexOf().
int32_t uima::UnicodeStringRef::indexOf | ( | UChar | c, | |
int32_t | start, | |||
int32_t | length | |||
) | const [inline] |
Locate in this the first occurrence of the code unit c in the range [start, start + length), using bitwise comparison.
c | The code unit to search for. | |
start | the offset into this at which to start matching | |
length | the number of characters in this to search |
Returns the offset into this of c, or -1 if not found.
int32_t uima::UnicodeStringRef::indexOf | ( | UChar32 | c, | |
int32_t | start, | |||
int32_t | length | |||
) | const [inline] |
Locate in this the first occurrence of the code point c in the range [start, start + length), using bitwise comparison.
c | The code point to search for. | |
start | the offset into this at which to start matching | |
length | the number of characters in this to search |
Returns the offset into this of c, or -1 if not found.
References indexOf().
int32_t uima::UnicodeStringRef::lastIndexOf | ( | const UnicodeStringRef & | text | ) | const [inline] |
Locate in this the last occurrence of the characters in text, using bitwise comparison.
text | The text to search for. |
Returns the offset into this of the start of text, or -1 if not found.
References iv_uiLength.
Referenced by lastIndexOf().
int32_t uima::UnicodeStringRef::lastIndexOf | ( | const UnicodeStringRef & | text, | |
int32_t | start | |||
) | const [inline] |
Locate in this the last occurrence of the characters in text starting at offset start, using bitwise comparison.
text | The text to search for. | |
start | The offset at which searching will start. |
Returns the offset into this of the start of text, or -1 if not found.
References iv_uiLength, and lastIndexOf().
int32_t uima::UnicodeStringRef::lastIndexOf | ( | const UnicodeStringRef & | text, | |
int32_t | start, | |||
int32_t | length | |||
) | const [inline] |
Locate in this the last occurrence in the range [start, start + length) of the characters in text, using bitwise comparison.
text | The text to search for. | |
start | The offset at which searching will start. | |
length | The number of characters to search |
Returns the offset into this of the start of text, or -1 if not found.
References iv_uiLength, and lastIndexOf().
int32_t uima::UnicodeStringRef::lastIndexOf | ( | const UnicodeStringRef & | srcText, | |
int32_t | srcStart, | |||
int32_t | srcLength, | |||
int32_t | start, | |||
int32_t | length | |||
) | const [inline] |
Locate in this the last occurrence in the range [start, start + length) of the characters in srcText in the range [srcStart, srcStart + srcLength), using bitwise comparison.
srcText | The text to search for. | |
srcStart | the offset into srcText at which to start matching | |
srcLength | the number of characters in srcText to match | |
start | the offset into this at which to start matching | |
length | the number of characters in this to search |
Returns the offset into this of the start of srcText, or -1 if not found.
References getBuffer(), and lastIndexOf().
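The backward search differs from indexOf only in which match inside the range is reported. A minimal sketch using std::find_end over the same half-open range follows; lastIndexOfSketch is a hypothetical name, std::u16string stands in for UnicodeStringRef, and the clamping and empty-pattern handling are assumptions of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Search s in [start, start + length) for the last occurrence of text.
inline std::int32_t lastIndexOfSketch(const std::u16string& s, const std::u16string& text,
                                      std::int32_t start, std::int32_t length) {
    const std::int32_t size = static_cast<std::int32_t>(s.size());
    if (text.empty()) return -1;  // assumption: an empty pattern counts as not found
    // Assumption: out-of-range arguments are clamped to the string bounds.
    if (start < 0) start = 0;
    if (start > size) start = size;
    if (length < 0) length = 0;
    if (length > size - start) length = size - start;
    const auto first = s.begin() + start;
    const auto last = first + length;
    const auto hit = std::find_end(first, last, text.begin(), text.end());
    return hit == last ? -1 : static_cast<std::int32_t>(hit - s.begin());
}

inline void lastIndexOfDemo() {
    const std::u16string s = u"abcabc";
    assert(lastIndexOfSketch(s, u"bc", 0, 6) == 4);  // last match in the whole string
    assert(lastIndexOfSketch(s, u"bc", 0, 4) == 1);  // range [0, 4) excludes the second match
}
```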
int32_t uima::UnicodeStringRef::lastIndexOf | ( | UChar const * | srcChars, | |
int32_t | srcLength, | |||
int32_t | start | |||
) | const [inline] |
Locate in this the last occurrence of the characters in srcChars starting at offset start, using bitwise comparison.
srcChars | The text to search for. | |
srcLength | the number of characters in srcChars to match | |
start | the offset into this at which to start matching |
Returns the offset into this of the start of srcChars, or -1 if not found.
References lastIndexOf().
int32_t uima::UnicodeStringRef::lastIndexOf | ( | UChar const * | srcChars, | |
int32_t | srcLength, | |||
int32_t | start, | |||
int32_t | length | |||
) | const [inline] |
Locate in this the last occurrence in the range [start, start + length) of the characters in srcChars, using bitwise comparison.
srcChars | The text to search for. | |
srcLength | the number of characters in srcChars | |
start | The offset at which searching will start. | |
length | The number of characters to search |
Returns the offset into this of the start of srcChars, or -1 if not found.
References lastIndexOf().
int32_t uima::UnicodeStringRef::lastIndexOf | ( | UChar const * | srcChars, | |
int32_t | srcStart, | |||
int32_t | srcLength, | |||
int32_t | start, | |||
int32_t | length | |||
) | const |
Locate in this the last occurrence in the range [start, start + length) of the characters in srcChars in the range [srcStart, srcStart + srcLength), using bitwise comparison.
srcChars | The text to search for. | |
srcStart | the offset into srcChars at which to start matching | |
srcLength | the number of characters in srcChars to match | |
start | the offset into this at which to start matching | |
length | the number of characters in this to search |
Returns the offset into this of the start of srcChars, or -1 if not found.
int32_t uima::UnicodeStringRef::lastIndexOf | ( | UChar | c | ) | const [inline] |
Locate in this the last occurrence of the code unit c, using bitwise comparison.
c | The code unit to search for. |
Returns the offset into this of c, or -1 if not found.
int32_t uima::UnicodeStringRef::lastIndexOf | ( | UChar32 | c | ) | const [inline] |
Locate in this the last occurrence of the code point c, using bitwise comparison.
c | The code point to search for. |
Returns the offset into this of c, or -1 if not found.
References lastIndexOf().
int32_t uima::UnicodeStringRef::lastIndexOf | ( | UChar | c, | |
int32_t | start | |||
) | const [inline] |
Locate in this the last occurrence of the code unit c starting at offset start, using bitwise comparison.
c | The code unit to search for. | |
start | The offset at which searching will start. |
Returns the offset into this of c, or -1 if not found.
int32_t uima::UnicodeStringRef::lastIndexOf | ( | UChar32 | c, | |
int32_t | start | |||
) | const [inline] |
Locate in this the last occurrence of the code point c starting at offset start, using bitwise comparison.
c | The code point to search for. | |
start | The offset at which searching will start. |
Returns the offset into this of c, or -1 if not found.
References lastIndexOf().
int32_t uima::UnicodeStringRef::lastIndexOf | ( | UChar | c, | |
int32_t | start, | |||
int32_t | length | |||
) | const [inline] |
Locate in this the last occurrence of the code unit c in the range [start, start + length), using bitwise comparison.
c | The code unit to search for. | |
start | the offset into this at which to start matching | |
length | the number of characters in this to search |
Returns the offset into this of c, or -1 if not found.
int32_t uima::UnicodeStringRef::lastIndexOf | ( | UChar32 | c, | |
int32_t | start, | |||
int32_t | length | |||
) | const [inline] |
Locate in this the last occurrence of the code point c in the range [start, start + length), using bitwise comparison.
c | The code point to search for. | |
start | the offset into this at which to start matching | |
length | the number of characters in this to search |
Returns the offset into this of c, or -1 if not found.
References lastIndexOf().
UChar uima::UnicodeStringRef::charAt | ( | int32_t | offset | ) | const [inline] |
Return the code unit at offset offset.
offset | a valid offset into the text |
Returns the code unit at offset offset.
References EXISTS.
UChar uima::UnicodeStringRef::operator[] | ( | int32_t | offset | ) | const [inline] |
Return the code unit at offset offset.
offset | a valid offset into the text |
Returns the code unit at offset offset.
References EXISTS.
UChar32 uima::UnicodeStringRef::char32At | ( | int32_t | offset | ) | const [inline] |
Return the code point that contains the code unit at offset offset.
offset | a valid offset into the text that indicates the text offset of any of the code units that will be assembled into a code point (21-bit value) and returned |
Returns the code point of the text at offset offset.
int32_t uima::UnicodeStringRef::getChar32Start | ( | int32_t | offset | ) | const [inline] |
Adjust a random-access offset so that it points to the beginning of a Unicode character.
The offset that is passed in points to any code unit of a code point, while the returned offset will point to the first code unit of the same code point. In UTF-16, if the input offset points to the second surrogate of a surrogate pair, then the returned offset will point to the first surrogate.
offset | a valid offset into one code point of the text |
int32_t uima::UnicodeStringRef::getChar32Limit | ( | int32_t | offset | ) | const [inline] |
Adjust a random-access offset so that it points behind a Unicode character.
The offset that is passed in points behind any code unit of a code point, while the returned offset will point behind the last code unit of the same code point. In UTF-16, if the input offset points behind the first surrogate (i.e., to the second surrogate) of a surrogate pair, then the returned offset will point behind the second surrogate.
offset | a valid offset after any code unit of a code point of the text |
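The two offset adjustments can be sketched as follows. This is a minimal illustration on std::u16string, assuming well-formed UTF-16; char32StartSketch and char32LimitSketch are hypothetical names and do not reproduce the class's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

inline bool isLeadSurrogate(char16_t c)  { return (c & 0xFC00) == 0xD800; }
inline bool isTrailSurrogate(char16_t c) { return (c & 0xFC00) == 0xDC00; }

// If offset points at the second surrogate of a pair, step back to the first.
inline std::int32_t char32StartSketch(const std::u16string& s, std::int32_t offset) {
    const std::int32_t len = static_cast<std::int32_t>(s.size());
    if (offset > 0 && offset < len &&
        isTrailSurrogate(s[offset]) && isLeadSurrogate(s[offset - 1])) {
        return offset - 1;
    }
    return offset;
}

// If offset sits between the two surrogates of a pair, step forward past the second.
inline std::int32_t char32LimitSketch(const std::u16string& s, std::int32_t offset) {
    const std::int32_t len = static_cast<std::int32_t>(s.size());
    if (offset > 0 && offset < len &&
        isLeadSurrogate(s[offset - 1]) && isTrailSurrogate(s[offset])) {
        return offset + 1;
    }
    return offset;
}

inline void char32BoundaryDemo() {
    const std::u16string s = u"a\U00010000b";  // code units: 'a', 0xD800, 0xDC00, 'b'
    assert(char32StartSketch(s, 2) == 1);      // middle of the pair -> its start
    assert(char32LimitSketch(s, 2) == 3);      // middle of the pair -> just behind it
}
```

Offsets that already lie on a code point boundary are returned unchanged by both helpers.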
int32_t uima::UnicodeStringRef::moveIndex32 | ( | int32_t | index, | |
int32_t | delta | |||
) | const |
Move the code unit index along the string by delta code points.
Interpret the input index as a code unit-based offset into the string, move the index forward or backward by delta code points, and return the resulting index. The input index should point to the first code unit of a code point, if there is more than one.
Both input and output indexes are code unit-based as for all string indexes/offsets in ICU (and other libraries, like MBCS char*). If delta<0 then the index is moved backward (toward the start of the string). If delta>0 then the index is moved forward (toward the end of the string).
This behaves like CharacterIterator::move32(delta, kCurrent).
Examples:
// s has code points 'a' U+10000 'b' U+10ffff U+2029
UnicodeStringRef s=UNICODE_STRING("a\\U00010000b\\U0010ffff\\u2029", 31).unescape();
// initial index: position of U+10000
int32_t index=1;
// the following examples will all result in index==4, position of U+10ffff
// skip 2 code points from some position in the string
index=s.moveIndex32(index, 2); // skips U+10000 and 'b'
// go to the 3rd code point from the start of s (0-based)
index=s.moveIndex32(0, 3); // skips 'a', U+10000, and 'b'
// go to the next-to-last code point of s
index=s.moveIndex32(s.length(), -2); // backward-skips U+2029 and U+10ffff
index | input code unit index | |
delta | (signed) code point count to move the index forward or backward in the string |
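How a code unit index moves by whole code points can be sketched independently of the class. The version below works on std::u16string and assumes well-formed UTF-16; moveIndex32Sketch is a hypothetical name, and clamping the index to the string bounds is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

inline bool isLead(char16_t c)  { return (c & 0xFC00) == 0xD800; }
inline bool isTrail(char16_t c) { return (c & 0xFC00) == 0xDC00; }

// Move a code unit index forward (delta > 0) or backward (delta < 0) by code points.
inline std::int32_t moveIndex32Sketch(const std::u16string& s, std::int32_t index,
                                      std::int32_t delta) {
    const std::int32_t len = static_cast<std::int32_t>(s.size());
    if (index < 0) index = 0;      // assumption: out-of-range input is clamped
    if (index > len) index = len;
    for (; delta > 0 && index < len; --delta) {
        const bool pair = isLead(s[index]) && index + 1 < len && isTrail(s[index + 1]);
        index += pair ? 2 : 1;     // a surrogate pair counts as one code point
    }
    for (; delta < 0 && index > 0; ++delta) {
        const bool pair = isTrail(s[index - 1]) && index >= 2 && isLead(s[index - 2]);
        index -= pair ? 2 : 1;
    }
    return index;
}

inline void moveIndex32Demo() {
    // Code points: 'a' U+10000 'b' U+10FFFF U+2029 (7 code units in total).
    const std::u16string s = u"a\U00010000b\U0010FFFF\u2029";
    assert(moveIndex32Sketch(s, 1, 2) == 4);
    assert(moveIndex32Sketch(s, 0, 3) == 4);
    assert(moveIndex32Sketch(s, static_cast<std::int32_t>(s.size()), -2) == 4);
}
```

The demo mirrors the three documented examples, each of which ends at index 4, the position of U+10FFFF.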
void uima::UnicodeStringRef::extract | ( | int32_t | start, | |
int32_t | length, | |||
UChar * | dst, | |||
int32_t | dstStart = 0 | |||
) | const [inline] |
Copy the characters in the range [start, start + length) into the array dst, beginning at dstStart.
If the string aliases dst itself as an external buffer, then extract() will not copy the contents.
start | offset of first character which will be copied into the array | |
length | the number of characters to extract | |
dst | array in which to copy characters. The length of dst must be at least (dstStart + length ). | |
dstStart | the offset in dst where the first character will be extracted |
References getBuffer().
Referenced by extract(), and extractBetween().
void uima::UnicodeStringRef::extractBetween | ( | int32_t | start, | |
int32_t | limit, | |||
UChar * | dst, | |||
int32_t | dstStart = 0 | |||
) | const [inline] |
Copy the characters in the range [start, limit) into the array dst, beginning at dstStart.
start | offset of first character which will be copied into the array | |
limit | offset immediately following the last character to be copied | |
dst | array in which to copy characters. The length of dst must be at least (dstStart + (limit - start) ). | |
dstStart | the offset in dst where the first character will be extracted |
References extract().
int32_t uima::UnicodeStringRef::extract | ( | UChar * | dst, | |
int32_t | dstCapacity, | |||
UErrorCode & | errorCode | |||
) | const |
Copy the contents of the string into dst.
This is a convenience function that checks if there is enough space in dst, extracts the entire string if possible, and NUL-terminates dst if possible.
If the string fits into dst but cannot be NUL-terminated (length()==dstCapacity) then the error code is set to U_STRING_NOT_TERMINATED_WARNING. If the string itself does not fit into dst (length()>dstCapacity) then the error code is set to U_BUFFER_OVERFLOW_ERROR.
If the string aliases dst itself as an external buffer, then extract() will not copy the contents.
dst | Destination string buffer. | |
dstCapacity | Number of UChars available at dst. | |
errorCode | ICU error code. |
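The capacity rules described above can be summarised in a small sketch. It does not use ICU: ExtractStatus is a hypothetical stand-in for the UErrorCode values named in the text, extractSketch is a hypothetical free function, and returning the source length is an assumption, since the return value is not described here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the ICU status codes mentioned in the description.
enum class ExtractStatus { Ok, NotTerminatedWarning, BufferOverflow };

// Copy srcLength code units into dst if they fit; NUL-terminate if room remains.
inline std::int32_t extractSketch(const char16_t* src, std::int32_t srcLength,
                                  char16_t* dst, std::int32_t dstCapacity,
                                  ExtractStatus& status) {
    if (srcLength > dstCapacity) {
        status = ExtractStatus::BufferOverflow;        // nothing is copied
        return srcLength;
    }
    std::copy(src, src + srcLength, dst);
    if (srcLength < dstCapacity) {
        dst[srcLength] = 0;                            // room for the terminator
        status = ExtractStatus::Ok;
    } else {
        status = ExtractStatus::NotTerminatedWarning;  // exact fit, no terminator
    }
    return srcLength;  // assumption: the length of the string is reported back
}

inline void extractDemo() {
    const char16_t src[] = {u'a', u'b', u'c'};
    char16_t roomy[4] = {1, 1, 1, 1};
    ExtractStatus status = ExtractStatus::Ok;
    extractSketch(src, 3, roomy, 4, status);
    assert(status == ExtractStatus::Ok && roomy[3] == 0);
    char16_t exact[3] = {0, 0, 0};
    extractSketch(src, 3, exact, 3, status);
    assert(status == ExtractStatus::NotTerminatedWarning);
}
```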
void uima::UnicodeStringRef::extract | ( | int32_t | start, | |
int32_t | length, | |||
UnicodeString & | dst | |||
) | const [inline] |
Copy the characters in the range [start, start + length) into the UnicodeString dst.
start | offset of first character which will be copied | |
length | the number of characters to extract | |
dst | UnicodeString into which to copy characters. |
References getBuffer().
void uima::UnicodeStringRef::extractBetween | ( | int32_t | start, | |
int32_t | limit, | |||
UnicodeString & | dst | |||
) | const [inline] |
Copy the characters in the range [start, limit) into the UnicodeString dst.
start | offset of first character which will be copied | |
limit | offset immediately following the last character to be copied | |
dst | UnicodeString into which to copy characters. |
References extract().
int32_t uima::UnicodeStringRef::extract | ( | int32_t | start, | |
int32_t | startLength, | |||
char * | target, | |||
const char * | codepage = 0 | |||
) | const [inline] |
Copy the characters in the range [start, start + startLength) into an array of characters in a specified codepage.
The output string is NUL-terminated.
start | offset of first character which will be copied | |
startLength | the number of characters to extract | |
target | the target buffer for extraction | |
codepage | the desired codepage for the characters. 0 has the special meaning of the default codepage. If codepage is an empty string ("" ), then a simple conversion is performed on the codepage-invariant subset ("invariant characters") of the platform encoding. See utypes.h. If target is NULL, then the number of bytes required for target is returned. NOTE: It is assumed that the target is big enough to fit all of the characters. |
References extract().
int32_t uima::UnicodeStringRef::extract | ( | int32_t | start, | |
int32_t | startLength, | |||
char * | target, | |||
uint32_t | targetLength, | |||
const char * | codepage = 0 | |||
) | const |
Copy the characters in the range [start, start + startLength) into an array of characters in a specified codepage.
This function does not write any more than targetLength characters but returns the length of the entire output string so that one can allocate a larger buffer and call the function again if necessary. The output string is NUL-terminated if possible.
start | offset of first character which will be copied | |
startLength | the number of characters to extract | |
target | the target buffer for extraction | |
targetLength | the length of the target buffer | |
codepage | the desired codepage for the characters. 0 has the special meaning of the default codepage. If codepage is an empty string ("" ), then a simple conversion is performed on the codepage-invariant subset ("invariant characters") of the platform encoding. See utypes.h. If target is NULL, then the number of bytes required for target is returned. |
int32_t uima::UnicodeStringRef::extract | ( | char * | target, | |
int32_t | targetCapacity, | |||
UConverter * | cnv, | |||
UErrorCode & | errorCode | |||
) | const |
Convert the UnicodeStringRef into a codepage string using an existing UConverter.
The output string is NUL-terminated if possible.
This function avoids the overhead of opening and closing a converter if multiple strings are extracted.
target | destination string buffer, can be NULL if targetCapacity==0 | |
targetCapacity | the number of chars available at target | |
cnv | the converter object to be used (ucnv_resetFromUnicode() will be called), or NULL for the default converter | |
errorCode | normal ICU error code |
int32_t uima::UnicodeStringRef::extract | ( | int32_t | start, | |
int32_t | startLength, | |||
std::string & | target, | |||
const char * | codepage = 0 | |||
) | const |
Copy the characters in the range [start, start + startLength) into a std::string object in a specified codepage.
The output string is NUL-terminated.
start | offset of first character which will be copied | |
startLength | the number of characters to extract | |
target | the target string for extraction | |
codepage | the desired codepage for the characters. 0 has the special meaning of the default codepage. If codepage is an empty string ("" ), then a simple conversion is performed on the codepage-invariant subset ("invariant characters") of the platform encoding. See utypes.h. |
int32_t uima::UnicodeStringRef::extract | ( | std::string & | target, | |
const char * | codepage = 0 | |||
) | const [inline] |
Copy all the characters in the string into a std::string object in a specified codepage.
Equivalent to extract(0, length(), target, codepage).
target | the target string for extraction | |
codepage | the desired codepage for the characters. |
References extract().
int32_t uima::UnicodeStringRef::extractUTF8 | ( | std::string & | target | ) | const |
std::string uima::UnicodeStringRef::asUTF8 | ( | void | ) | const [inline] |
static void uima::UnicodeStringRef::release | ( | std::string & | target | ) | [static] |
Release the contents of a string container allocated by the extract methods.
Useful when caller and callee use different heaps, e.g. when debug code uses a release library. The function is static, so it can be called on the UnicodeStringRef class directly.
int32_t uima::UnicodeStringRef::length | ( | void | ) | const [inline] |
Return the length of the UnicodeStringRef object.
The length is the number of characters in the text.
Referenced by uima::strtrim().
int32_t uima::UnicodeStringRef::countChar32 | ( | int32_t | start = 0, | |
int32_t | length = 0x7fffffff | |||
) | const |
Count Unicode code points in the length UChar code units of the string.
A code point may occupy either one or two UChar code units. Counting code points involves reading all code units.
This function is basically the inverse of moveIndex32().
start | the index of the first code unit to check | |
length | the number of UChar code units to check |
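Counting code points in a run of code units can be sketched as a single pass that treats each surrogate pair as one unit of count. The sketch uses std::u16string and assumes well-formed UTF-16; countChar32Sketch is a hypothetical name, and clamping start and length to the string bounds is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

inline bool isLeadUnit(char16_t c)  { return (c & 0xFC00) == 0xD800; }
inline bool isTrailUnit(char16_t c) { return (c & 0xFC00) == 0xDC00; }

// Count code points in the code units [start, start + length) of s.
inline std::int32_t countChar32Sketch(const std::u16string& s, std::int32_t start = 0,
                                      std::int32_t length = 0x7fffffff) {
    const std::int32_t size = static_cast<std::int32_t>(s.size());
    // Assumption: out-of-range arguments are clamped to the string bounds.
    if (start < 0) start = 0;
    if (start > size) start = size;
    if (length < 0) length = 0;
    if (length > size - start) length = size - start;
    const std::int32_t limit = start + length;
    std::int32_t count = 0;
    for (std::int32_t i = start; i < limit; ++count) {
        const bool pair = isLeadUnit(s[i]) && i + 1 < limit && isTrailUnit(s[i + 1]);
        i += pair ? 2 : 1;  // every code unit in the range is read exactly once
    }
    return count;
}

inline void countChar32Demo() {
    // Code points: 'a' U+10000 'b' U+10FFFF U+2029 (7 code units in total).
    const std::u16string s = u"a\U00010000b\U0010FFFF\u2029";
    assert(countChar32Sketch(s) == 5);        // default arguments cover the whole string
    assert(countChar32Sketch(s, 1, 2) == 1);  // exactly the surrogate pair of U+10000
}
```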
bool uima::UnicodeStringRef::isEmpty | ( | void | ) | const [inline] |
Determine if this string is empty.
UnicodeStringRef & uima::UnicodeStringRef::setTo | ( | const UnicodeStringRef & | srcText | ) | [inline] |
Set the text in the UnicodeStringRef object to the characters in srcText. srcText is not modified.
srcText | the source for the new characters |
References iv_pUChars, and iv_uiLength.
UnicodeStringRef & uima::UnicodeStringRef::setTo | ( | const UnicodeString & | srcText | ) | [inline] |
Set the text in the UnicodeStringRef object to the characters in srcText. srcText is not modified.
srcText | the source for the new characters |
UnicodeStringRef & uima::UnicodeStringRef::setTo | ( | const UChar * | srcChars, | |
int32_t | srcLength | |||
) | [inline] |
Set the characters in the UnicodeStringRef object to the characters in srcChars. srcChars is not modified.
srcChars | the source for the new characters | |
srcLength | the number of Unicode characters in srcChars. |
void uima::UnicodeStringRef::toSingleByteStream | ( | std::ostream & | outStream | ) | const |
Print a single byte version to outStream.
The encoding is UTF-8 if outStream is directed to disk. If outStream is cout or cerr, the encoding is a Console-CCSID that will allow most characters to be readable in a shell/command window.