Runes and character encoding

yourbasic.org/golang

Runic letters encoded in stone, Ballstorp 1900

Characters, ASCII and Unicode

The rune type is an alias for int32, and is used to emphasize that an integer represents a code point.
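A minimal sketch of what the alias means in practice. The helper name codePoint is hypothetical and exists only for illustration; it returns a rune as an int32 without any conversion, which compiles because the two names denote the same type.

```go
package main

import "fmt"

// codePoint is a hypothetical helper: it returns r as an int32.
// No conversion is needed because rune and int32 are the same type.
func codePoint(r rune) int32 {
	return r
}

func main() {
	r := 'é' // a rune literal; its default type is rune

	fmt.Println(codePoint(r))         // 233
	fmt.Printf("%c %U %T\n", r, r, r) // é U+00E9 int32
}
```

Note that the %T verb reports int32, not rune, since an alias does not create a distinct type.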

ASCII defines 128 characters, identified by the code points 0–127. It covers the English letters, the digits 0–9, and a few other characters such as punctuation and control codes.

Unicode, which is a superset of ASCII, defines a codespace of 1,114,112 code points. Unicode version 10.0 covers 139 modern and historic scripts (including the runic alphabet, but not Klingon) as well as multiple symbol sets.

Strings and UTF-8 encoding

A string is a sequence of bytes, not runes.

However, strings often contain Unicode text encoded in UTF-8, which encodes all Unicode code points using one to four bytes. (ASCII characters are encoded with one byte, while other code points use more.)
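The one-to-four-byte range can be checked with utf8.RuneLen from the standard unicode/utf8 package. The sketch below is illustrative only; byteLen is a hypothetical wrapper name, and the sample characters are arbitrary picks from different parts of the codespace.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteLen is a hypothetical wrapper around utf8.RuneLen: it reports how
// many bytes UTF-8 needs to encode r (or -1 if r is not a valid code point).
func byteLen(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	// One sample each for the 1-, 2-, 3- and 4-byte encodings.
	for _, r := range []rune{'a', 'é', 'ᚠ', '😀'} {
		fmt.Printf("%U needs %d byte(s)\n", r, byteLen(r))
	}
	// U+0061 needs 1 byte(s)
	// U+00E9 needs 2 byte(s)
	// U+16A0 needs 3 byte(s)
	// U+1F600 needs 4 byte(s)
}
```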

Since Go source code itself is encoded as UTF-8, string literals will automatically get this encoding.

For example, in the string "café" the character é (code point 233) is encoded using two bytes, while the ASCII characters c, a and f (code points 99, 97 and 102) only use one:

fmt.Println([]byte("café")) // [99 97 102 195 169]
fmt.Println([]rune("café")) // [99 97 102 233]
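The same distinction shows up when measuring or iterating over a string. The following sketch assumes the é is stored as the single precomposed code point U+00E9 (hence the \u00e9 escape); counts is a hypothetical helper name. The built-in len counts bytes, utf8.RuneCountInString counts code points, and a range loop over a string decodes one rune per iteration while reporting its starting byte index.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper: it returns the number of bytes
// and the number of code points (runes) in s.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "caf\u00e9" // "café" with a precomposed é (assumption)

	nBytes, nRunes := counts(s)
	fmt.Println(nBytes, nRunes) // 5 4

	// range yields the byte offset and the decoded rune.
	for i, r := range s {
		fmt.Printf("byte offset %d: %c\n", i, r)
	}
	// byte offset 0: c
	// byte offset 1: a
	// byte offset 2: f
	// byte offset 3: é
}
```

Indexing with s[i], by contrast, returns a single byte, so s[3] here is 195, the first byte of the two-byte encoding of é.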

See Convert between byte array/slice and string and Convert between rune array/slice and string.
