layout | title | ref | framework | rating | description |
---|---|---|---|---|---|
post | CFStringTransform | | CoreFoundation | 9.1 | `NSString` is the crown jewel of Foundation. But as powerful as it is, we would be remiss not to mention its toll-free bridged cousin, `CFMutableString`. Or more specifically, `CFStringTransform`. |
There are two indicators that can tell you pretty much everything you need to know about how nice a language is to use:
- API Consistency
- Quality of String Implementation
`NSString` is the crown jewel of Foundation. In an age where other languages still struggle to handle Unicode correctly, `NSString` can not only handle anything you throw at it, but it can turn around and parse that input into linguistic tags. It's unfairly good.
But as powerful as `NSString` / `NSMutableString` are, we would be remiss not to mention their toll-free bridged cousin, `CFMutableString`. Or more specifically, `CFStringTransform`.
As denoted by the `CF` prefix, `CFStringTransform` is part of Core Foundation, which is a C API rather than an Objective-C one. The function returns a `Boolean` indicating whether the transform succeeded, and takes the following arguments:
- `string`: The string to be transformed. Since this argument is a `CFMutableStringRef`, an `NSMutableString` can be passed using toll-free bridging.
- `range`: The range of the string over which the transformation should be applied.
- `transform`: The transformation to apply. This argument takes one of the string constants described below.
- `reverse`: Whether to run the transformation in reverse, where applicable.
`CFStringTransform` covers a lot of ground with its `transform` argument. Here's a rundown of what it can do:
Énġlišh långuãge lẳcks iñterêßţing diaçrïtičş, so it can be useful to normalize extended Latin characters into ASCII-friendly representations. You can get rid of the squiggly bits using the `kCFStringTransformStripCombiningMarks` transformation.
`kCFStringTransformToUnicodeName` allows you to determine the Unicode standard name for special characters, including Emoji. For instance, "🐑💨✨" becomes "{SHEEP} {DASH SYMBOL} {SPARKLES}".
With the exception of English (with its complicated spelling inconsistencies), writing systems encode speech sounds into phonetic, written representations. European languages generally use the Latin alphabet with a few added diacritics, Russian uses Cyrillic, Japanese uses Hiragana & Katakana, and Thai, Korean, & Arabic each have their own scripts.
Although each language has a particular inventory of sounds that other languages may not have, the overlap across all of the major writing systems is remarkably high--enough so that one can rather effectively transliterate from one to another (not to be confused with translation).
`CFStringTransform` can transliterate between Latin and Arabic, Cyrillic, Greek, Korean (Hangul), Hebrew, Japanese (Hiragana & Katakana), Mandarin Chinese, and Thai. And not only that, but those transformations are all reversible:
Transformation | Input | Output |
---|---|---|
kCFStringTransformLatinArabic | mrḥbạ | مرحبا |
kCFStringTransformLatinCyrillic | privet | привет |
kCFStringTransformLatinGreek | geiá sou | γειά σου |
kCFStringTransformLatinHangul | annyeonghaseyo | 안녕하세요 |
kCFStringTransformLatinHebrew | şlwm | שלום |
kCFStringTransformLatinHiragana | hiragana | ひらがな |
kCFStringTransformLatinKatakana | katakana | カタカナ |
kCFStringTransformLatinThai | s̄wạs̄dī | สวัสดี |
kCFStringTransformHiraganaKatakana | にほんご | ニホンゴ |
kCFStringTransformMandarinLatin | 中文 | zhōng wén |
One of the more practical applications for all of this is to normalize unpredictable user input in useful ways. Even if your application doesn't specifically deal with languages, you should be able to intelligently process anything the user types into your app.
Let's say you want to build a searchable index of movies on the device, which includes titles from around the world. You could:
- First, apply the `kCFStringTransformToLatin` transform to transliterate all non-English text into a phonetic Latin alphabetic representation.
  Hello! こんにちは! สวัสดี! مرحبا! 您好! →
  Hello! kon'nichiha! s̄wạs̄dī! mrḥbạ! nín hǎo!
- Next, apply the `kCFStringTransformStripCombiningMarks` transform to remove any diacritics or accents.
  Hello! kon'nichiha! s̄wạs̄dī! mrḥbạ! nín hǎo! →
  Hello! kon'nichiha! swasdi! mrhba! nin hao!
- Finally, downcase the text and use `CFStringTokenizer` to split the text into tokens, and index the movie on them.
  (hello, kon'nichiha, swasdi, mrhba, nin, hao)
If you do the same to search text entered by the user, you now have an easy way to search for names of titles, regardless of either the language of the search string or the movie title. Mathematical!
`CFStringTransform` can be an insanely powerful way to bend language to your will. And it's but one of many powerful features that await you if you're brave enough to explore outside of Objective-C's warm OO embrace.