Runes and character encoding
Characters, ASCII and Unicode
The rune type is an alias for int32, and is used to emphasize that an integer represents a code point.
ASCII defines 128 characters, identified by the code points 0–127. It covers English letters, the decimal digits 0–9, and a few other characters.
Unicode, which is a superset of ASCII, defines a codespace of 1,114,112 code points. Unicode version 10.0 covers 139 modern and historic scripts (including the runic alphabet, but not Klingon) as well as multiple symbol sets.
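As a minimal sketch of these ideas, the program below treats a rune as a plain int32 and checks it against the ASCII range. The helper isASCII is a hypothetical name introduced only for illustration; unicode.MaxASCII and unicode.MaxRune are constants from the standard library.

```go
package main

import (
	"fmt"
	"unicode"
)

// isASCII reports whether r lies in the ASCII range 0–127.
// Hypothetical helper, not part of the standard library.
func isASCII(r rune) bool {
	return r >= 0 && r <= unicode.MaxASCII
}

func main() {
	var r rune = 'é' // a rune literal is just an integer code point
	var i int32 = r  // no conversion needed: rune is an alias for int32

	fmt.Println(i)                        // 233
	fmt.Println(isASCII('a'), isASCII(r)) // true false

	// The codespace runs from 0 to unicode.MaxRune (U+10FFFF) inclusive.
	fmt.Println(unicode.MaxRune + 1) // 1114112
}
```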
Strings and UTF-8 encoding
A string is a sequence of bytes, not runes.
However, strings often contain Unicode text encoded in UTF-8, which encodes all Unicode code points using one to four bytes. (ASCII characters are encoded with one byte, while other code points use more.)
Since Go source code itself is encoded as UTF-8, string literals will automatically get this encoding.
For example, in the string "café" the character é (code point 233) is encoded using two bytes, while the ASCII characters c, a and f (code points 99, 97 and 102) only use one:
fmt.Println([]byte("café")) // [99 97 102 195 169]
fmt.Println([]rune("café")) // [99 97 102 233]
See Convert between byte array/slice and string and Convert between rune array/slice and string.
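The difference between bytes and runes also shows up when measuring or iterating over a string. The sketch below is one way to see it, assuming the standard unicode/utf8 package; runeOffsets is a hypothetical helper written for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets returns the byte offset at which each rune of s starts.
// Hypothetical helper for illustration.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s { // range decodes UTF-8; i is a byte index
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "café!"

	// len counts bytes, utf8.RuneCountInString counts code points.
	fmt.Println(len(s), utf8.RuneCountInString(s)) // 6 5

	// The offset jumps from 3 to 5 because é occupies two bytes.
	fmt.Println(runeOffsets(s)) // [0 1 2 3 5]

	// UTF-8 uses between one and four bytes per code point.
	for _, r := range []rune{'a', 'é', '€', '😀'} {
		fmt.Printf("%c needs %d byte(s)\n", r, utf8.RuneLen(r))
	}
}
```

The last loop reports 1, 2, 3 and 4 bytes respectively.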
Special escape characters
Arbitrary character values can be encoded with backslash escapes and used in any "" string literal or '' character literal.
There are four different formats:
\x followed by exactly two hexadecimal digits,
\ followed by exactly three octal digits,
\u followed by exactly four hexadecimal digits,
\U followed by exactly eight hexadecimal digits,
where the escapes \u and \U represent Unicode code points.
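The four formats can be compared side by side. The following is a small sketch, with constant names chosen only for this example, that spells the code point 233 (é) in each format and then shows how the formats differ inside a string literal.

```go
package main

import "fmt"

// Four spellings of the same rune value, 233 (é).
// The constant names are hypothetical, chosen for this example.
const (
	hexEsc = '\xe9'       // two hexadecimal digits
	octEsc = '\351'       // three octal digits
	smallU = '\u00e9'     // four hexadecimal digits
	bigU   = '\U000000e9' // eight hexadecimal digits
)

func main() {
	fmt.Println(hexEsc == octEsc, octEsc == smallU, smallU == bigU) // true true true

	// In a string literal, \x and octal escapes insert a single byte,
	// whereas \u and \U insert the UTF-8 encoding of the code point.
	fmt.Println(len("\xe9"), len("\u00e9")) // 1 2
	fmt.Println("\u00e9" == "\xc3\xa9")     // true
}
```

Note that "\xe9" on its own is not valid UTF-8, so \u or \U is usually the safer choice for non-ASCII text in strings.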
The following predefined special values are also available.
|\a|U+0007 alert or bell|
|\t|U+0009 horizontal tab|
|\n|U+000A line feed or newline|
|\f|U+000C form feed|
|\r|U+000D carriage return|
|\v|U+000B vertical tab|
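The predefined escapes such as \a, \t and \n are ordinary rune values, which can be checked by printing them. This is only an illustrative sketch; controlRunes is a hypothetical helper, while the %q and %U verbs are standard fmt formatting verbs.

```go
package main

import "fmt"

// controlRunes returns a few predefined escapes as rune values.
// Hypothetical helper for illustration.
func controlRunes() []rune {
	return []rune{'\a', '\t', '\n', '\f', '\r', '\v'}
}

func main() {
	for _, r := range controlRunes() {
		// %q prints the escaped rune literal, %U prints the code point.
		fmt.Printf("%q = %U\n", r, r)
	}
	// Output:
	// '\a' = U+0007
	// '\t' = U+0009
	// '\n' = U+000A
	// '\f' = U+000C
	// '\r' = U+000D
	// '\v' = U+000B
}
```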
fmt.Println("\\caf\u00e9") // Prints: \café
fmt.Printf("%c", '\u00e9') // Prints: é
However, in raw string literals, delimited by back quotes, text is interpreted literally and backslashes have no special meaning. See 2 ways to write multiline strings.
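To illustrate the difference, the sketch below writes the same characters once as an interpreted string and once as a raw string; the constant names are assumptions made for this example.

```go
package main

import "fmt"

const (
	interpreted = "a\tb" // \t is replaced by a tab character
	raw         = `a\tb` // backslash and t are kept as two characters
)

func main() {
	fmt.Println(len(interpreted), len(raw)) // 3 4
	fmt.Println(raw)                        // a\tb
	fmt.Println(raw == "a\\tb")             // true
}
```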