Runes and character encoding

Runic letters encoded in stone, Ballstorp 1900

Characters, ASCII and Unicode

The rune type is an alias for int32 and is used to emphasize that an integer represents a code point.
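
Because rune is only an alias, a rune value is an int32 in every respect. This minimal sketch (the helper describe is an illustrative name) asks fmt for the type of a rune and assigns it to an int32 variable without any conversion.

```go
package main

import "fmt"

// describe formats the type name and numeric value of a rune.
// Since rune is an alias for int32, %T reports "int32".
func describe(r rune) string {
	return fmt.Sprintf("%T %d", r, r)
}

func main() {
	var r rune = 'é'
	var i int32 = r // no conversion needed: rune and int32 are the same type
	fmt.Println(describe(r)) // int32 233
	fmt.Println(i == r)      // true
}
```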

ASCII defines 128 characters, identified by the code points 0–127. It covers the English letters, the digits 0–9, punctuation, and a set of control characters.
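
Since ASCII occupies exactly the code points 0–127, comparing each rune with the standard constant unicode.MaxASCII (127) is enough to tell whether a string is pure ASCII. The function isASCII is a hypothetical helper written for this sketch, not a library function.

```go
package main

import (
	"fmt"
	"unicode"
)

// isASCII reports whether every rune in s lies in the ASCII range 0–127.
func isASCII(s string) bool {
	for _, r := range s {
		if r > unicode.MaxASCII {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isASCII("cafe")) // true
	fmt.Println(isASCII("café")) // false: é is code point 233
}
```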

Unicode, which is a superset of ASCII, defines a codespace of 1,114,112 code points. Unicode version 10.0 covers 139 modern and historic scripts (including the runic alphabet, but not Klingon) as well as multiple symbol sets.
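
The size of the codespace follows from unicode.MaxRune, the highest valid code point (U+10FFFF). In this sketch, codespaceSize is an illustrative helper; unicode.Version reports the Unicode version implemented by the installed Go release, which is likely newer than 10.0.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// codespaceSize counts the code points U+0000 through unicode.MaxRune.
func codespaceSize() int {
	return int(unicode.MaxRune) + 1
}

func main() {
	fmt.Println(codespaceSize())          // 1114112
	fmt.Printf("%U\n", unicode.MaxRune)   // U+10FFFF
	fmt.Println(unicode.Version)          // depends on the Go release
	fmt.Println(utf8.ValidRune(0x110000)) // false: beyond the codespace
}
```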

Strings and UTF-8 encoding

A string is a sequence of bytes, not runes.
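
Both len and indexing operate on those bytes. In this sketch (byteAt is an illustrative helper), "café" is five bytes long, and the byte at index 3 is only the first of the two bytes that encode é.

```go
package main

import "fmt"

// byteAt returns s[i]; indexing a string yields a byte (uint8), never a rune.
func byteAt(s string, i int) byte {
	return s[i]
}

func main() {
	s := "café"
	fmt.Println(len(s))       // 5: length in bytes, not characters
	fmt.Println(byteAt(s, 3)) // 195: first byte of é
	fmt.Println(byteAt(s, 4)) // 169: second byte of é
}
```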

However, strings often contain Unicode text encoded in UTF-8, which encodes all Unicode code points using one to four bytes. (ASCII characters are encoded with one byte, while other code points use more.)
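
The standard function utf8.RuneLen reports how many bytes UTF-8 needs for a given code point. This sketch (byteLengths is a hypothetical helper) shows one character from each of the four length classes.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteLengths returns the UTF-8 encoded length of each rune in s.
func byteLengths(s string) []int {
	var lens []int
	for _, r := range s {
		lens = append(lens, utf8.RuneLen(r))
	}
	return lens
}

func main() {
	// a (U+0061), é (U+00E9), € (U+20AC), 😀 (U+1F600)
	fmt.Println(byteLengths("aé€😀")) // [1 2 3 4]
}
```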

Since Go source code is itself encoded as UTF-8, non-ASCII characters typed directly into a string literal are stored in this encoding.

For example, in the string "café" the character é (code point 233) is encoded using two bytes, while the ASCII characters c, a and f (code points 99, 97 and 102) only use one:

fmt.Println([]byte("café")) // [99 97 102 195 169]
fmt.Println([]rune("café")) // [99 97 102 233]
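
To see the difference without allocating slices, you can count bytes with len and code points with utf8.RuneCountInString, or loop over the string with for-range, which decodes UTF-8 and yields the byte offset where each rune starts. The helpers counts and runeStarts are illustrative names for this sketch.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts returns the length of s in bytes and in runes (code points).
func counts(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

// runeStarts returns the byte offset at which each rune of s begins.
func runeStarts(s string) []int {
	var starts []int
	for i := range s { // range over a string steps rune by rune
		starts = append(starts, i)
	}
	return starts
}

func main() {
	fmt.Println(counts("café"))     // 5 4
	fmt.Println(runeStarts("café")) // [0 1 2 3]
	for i, r := range "café" {
		fmt.Printf("%d: %c (%d)\n", i, r, r) // the last line is 3: é (233)
	}
}
```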

See "Convert between byte array/slice and string" and "Convert between rune array/slice and string".
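
Converting in the other direction uses the same string(...) syntax. This sketch rebuilds "café" from the byte and rune slices printed earlier; the helper roundTrips is hypothetical and shows that invalid UTF-8 does not survive a trip through []rune, because each bad byte is decoded as U+FFFD.

```go
package main

import "fmt"

// roundTrips reports whether s is unchanged after converting it to []rune
// and back, and to []byte and back.
func roundTrips(s string) bool {
	return string([]rune(s)) == s && string([]byte(s)) == s
}

func main() {
	fmt.Println(string([]byte{99, 97, 102, 195, 169})) // café
	fmt.Println(string([]rune{99, 97, 102, 233}))      // café
	fmt.Println(roundTrips("café"))                    // true
	fmt.Println(roundTrips("\xff"))                    // false: not valid UTF-8
}
```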

Special escape characters

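Arbitrary character values can be encoded with backslash escapes and used in any "" string or '' rune literal. There are four different formats:

\x followed by exactly two hexadecimal digits,
\ followed by exactly three octal digits,
\u followed by exactly four hexadecimal digits,
\U followed by exactly eight hexadecimal digits,

where the escapes \u and \U represent Unicode code points. In string literals, the \x and octal escapes represent individual bytes, while \u and \U are stored as the UTF-8 encoding of the code point.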
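
As a sketch, here is é written with each escape format. Because \x and octal escapes produce raw bytes, they must spell out both UTF-8 bytes (C3 A9); a lone "\xe9" would be a single invalid byte. The variable name eacute is illustrative.

```go
package main

import "fmt"

// eacute holds é written with each of the four escape formats.
var eacute = []string{
	"\xc3\xa9",   // hex: the two UTF-8 bytes
	"\303\251",   // octal: the same two bytes
	"\u00e9",     // code point U+00E9, stored as UTF-8
	"\U000000e9", // the same code point, eight hex digits
}

func main() {
	for _, s := range eacute {
		fmt.Println(s, len(s), s == "é") // é 2 true
	}
}
```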

The following predefined special values are also available.

Escape  Code point  Description
\a      U+0007      alert or bell
\b      U+0008      backspace
\\      U+005C      backslash
\t      U+0009      horizontal tab
\n      U+000A      line feed or newline
\f      U+000C      form feed
\r      U+000D      carriage return
\v      U+000B      vertical tab

fmt.Println("\\caf\u00e9") // Prints: \café
fmt.Printf("%c", '\u00e9') // Prints: é
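
Each predefined escape is just another way to write its code point. This sketch maps the escape sequences (written as backquoted raw strings so the backslash is kept) to the rune each one denotes and prints the code points from the table; the map name specials is hypothetical.

```go
package main

import "fmt"

// specials maps each predefined escape sequence to the rune it denotes.
var specials = map[string]rune{
	`\a`: '\a', `\b`: '\b', `\t`: '\t', `\n`: '\n',
	`\v`: '\v', `\f`: '\f', `\r`: '\r', `\\`: '\\',
}

func main() {
	for _, k := range []string{`\a`, `\b`, `\t`, `\n`, `\v`, `\f`, `\r`, `\\`} {
		fmt.Printf("%s %U\n", k, specials[k]) // e.g. \n U+000A
	}
}
```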

However, in raw string literals, delimited by backquotes (`), text is interpreted literally and backslashes have no special meaning. See "2 ways to write multiline strings".
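
A small comparison, assuming nothing beyond the standard fmt package: the same characters in a raw and an interpreted literal give strings of different lengths, because only the double-quoted form processes \u00e9 and \n.

```go
package main

import "fmt"

var (
	raw    = `caf\u00e9\n` // backquoted: 11 bytes, backslashes kept as-is
	interp = "caf\u00e9\n" // double-quoted: escapes processed, 6 bytes
)

func main() {
	fmt.Println(len(raw), len(interp)) // 11 6
	fmt.Println(raw)                   // caf\u00e9\n
	fmt.Print(interp)                  // café, followed by a newline
}
```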
