Runes and character encoding
Characters, ASCII and Unicode
The rune type is an alias for int32, and is used to emphasize that an integer represents a code point.
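A minimal sketch of what the alias means in practice. The helper name codePoint is hypothetical and only there for illustration; the point is that a rune can be assigned to an int32 without any conversion.

```go
package main

import "fmt"

// codePoint is a hypothetical helper: it returns a rune as an int32.
// No conversion is needed because rune and int32 are the same type.
func codePoint(r rune) int32 {
	return r
}

func main() {
	var r rune = '\u00e9' // é
	var i int32 = r       // compiles without a conversion

	fmt.Println(i)              // 233
	fmt.Printf("%T\n", r)       // int32
	fmt.Printf("%c %U\n", r, r) // é U+00E9
}
```

The %T verb reports int32 because the alias introduces no new type; rune is just a more descriptive name.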
ASCII defines 128 characters, identified by the code points 0–127. It covers English letters, the digits 0–9, and a few other characters such as punctuation and control codes.
Unicode, which is a superset of ASCII, defines a codespace of 1,114,112 code points. Unicode version 10.0 covers 139 modern and historic scripts (including the runic alphabet, but not Klingon) as well as multiple symbol sets.
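The numbers above can be checked against constants in the standard unicode package. This is a small sketch; the helper names codespaceSize and isASCII are made up for this example.

```go
package main

import (
	"fmt"
	"unicode"
)

// codespaceSize returns the number of code points in the Unicode
// codespace: everything from 0 up to and including unicode.MaxRune.
func codespaceSize() int {
	return int(unicode.MaxRune) + 1
}

// isASCII reports whether r lies in the ASCII range 0–127.
func isASCII(r rune) bool {
	return r >= 0 && r <= unicode.MaxASCII
}

func main() {
	fmt.Println(codespaceSize())                 // 1114112
	fmt.Println(isASCII('a'), isASCII('\u00e9')) // true false

	// U+16A0 (ᚠ) is a letter from the runic alphabet.
	fmt.Println(unicode.Is(unicode.Runic, '\u16a0')) // true
}
```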
Strings and UTF-8 encoding
A string is a sequence of bytes, not runes.
However, strings often contain Unicode text encoded in UTF-8, which encodes all Unicode code points using one to four bytes. (ASCII characters are encoded with one byte, while other code points use more.)
Since Go source code itself is encoded as UTF-8, string literals will automatically get this encoding.
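To see the one-to-four-byte rule in action, the sketch below asks the standard unicode/utf8 package how many bytes each of a few sample code points needs. The wrapper encodedLen is a hypothetical name used only here.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedLen returns the number of bytes UTF-8 needs to encode r,
// or -1 if r is not a valid code point.
func encodedLen(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	for _, r := range []rune{'a', '\u00e9', '€', '😀'} {
		fmt.Printf("%c %U %d\n", r, r, encodedLen(r))
	}
	// Output:
	// a U+0061 1
	// é U+00E9 2
	// € U+20AC 3
	// 😀 U+1F600 4
}
```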
For example, in the string "café" the character é (code point 233) is encoded using two bytes, while the ASCII characters c, a and f (code points 99, 97 and 102) only use one:
fmt.Println([]byte("café")) // [99 97 102 195 169]
fmt.Println([]rune("café")) // [99 97 102 233]
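The difference between bytes and runes also shows up when measuring or looping over a string. The following sketch uses the same "café" text, written with an escape so the é is unambiguously code point 233; the helper names byteLen and runeLen are made up for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteLen returns the number of bytes in s.
func byteLen(s string) int { return len(s) }

// runeLen returns the number of UTF-8 encoded code points in s.
func runeLen(s string) int { return utf8.RuneCountInString(s) }

func main() {
	const s = "caf\u00e9" // "café"

	fmt.Println(byteLen(s)) // 5
	fmt.Println(runeLen(s)) // 4

	// Indexing yields a single byte, not a character.
	fmt.Println(s[3]) // 195

	// A range loop decodes UTF-8: i is the byte offset, r the rune.
	for i, r := range s {
		fmt.Printf("%d:%c ", i, r)
	}
	fmt.Println() // 0:c 1:a 2:f 3:é
}
```

Note that len counts bytes, so it overstates the number of characters as soon as the string contains anything outside ASCII.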
See Convert between byte array/slice and string and Convert between rune array/slice and string.
Escapes and multiline strings