range over a string

Iterate over bytes

You can iterate over bytes in a string:

https://codeeval.dev/gist/d3520f3abf15dfb6257c1b3891d56aad
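As a minimal sketch of byte-wise iteration, assuming an index-based loop, the program below walks a string one byte at a time. The helper name byteValues and the sample string are illustrative and are not taken from the linked example.

```go
package main

import "fmt"

// byteValues returns every byte of s, collected with an index-based loop.
// (Hypothetical helper, introduced only for this illustration.)
func byteValues(s string) []byte {
	out := make([]byte, 0, len(s)) // len(s) is the number of bytes in s
	for i := 0; i < len(s); i++ {
		out = append(out, s[i]) // indexing a string yields a byte
	}
	return out
}

func main() {
	s := "Go!" // assumed sample input
	for i, b := range byteValues(s) {
		fmt.Printf("idx: %d, byte: %d\n", i, b)
	}
}
```

Indexing with s[i] always yields a single byte, regardless of how many bytes the character at that position occupies.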

Iterate over runes

Things are more complicated when you want to iterate over logical characters (runes) in a string:

https://codeeval.dev/gist/8dd33af1969cfae931d52be33f78b502

In Go, a string is an immutable sequence of bytes. Think of it as a read-only []byte slice.

Each byte holds a value in the range 0 to 255.
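To illustrate the read-only nature of strings, here is a small sketch that changes the first byte of a string by going through a []byte copy. The function name replaceFirstByte is hypothetical and the approach is just one way to do it.

```go
package main

import "fmt"

// replaceFirstByte returns a new string with the first byte of s set to b.
// A string cannot be modified in place (s[0] = b does not compile), so the
// bytes are copied into a []byte, changed there, and converted back.
// (Hypothetical helper, introduced only for this illustration.)
func replaceFirstByte(s string, b byte) string {
	if len(s) == 0 {
		return s
	}
	buf := []byte(s) // the conversion copies the bytes
	buf[0] = b
	return string(buf)
}

func main() {
	s := "hello" // assumed sample input
	t := replaceFirstByte(s, 'j')
	fmt.Println(s, t) // s itself is unchanged
}
```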

The world’s writing systems contain far more than 256 characters, so a single byte cannot represent every one of them.

The Unicode standard defines a unique value for every known character. Unicode calls these values code points; they are integers that fit in 32 bits.

To represent Unicode code points, Go has a rune type. It is an alias for int32.

String literals in Go source code are UTF-8 encoded.

In UTF-8, every Unicode code point is encoded with 1 to 4 bytes.
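The 1 to 4 byte range can be checked with utf8.RuneLen from the standard unicode/utf8 package. The sketch below is only an illustration; the wrapper name encodedLen and the sample runes are assumptions.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedLen reports how many bytes UTF-8 needs to encode r.
// utf8.RuneLen returns -1 if r is not a valid value to encode.
// (Hypothetical wrapper, introduced only for this illustration.)
func encodedLen(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	// Assumed samples: one rune for each possible encoded length.
	for _, r := range []rune{'A', 'é', '世', '😀'} {
		fmt.Printf("%c (%U): %d byte(s)\n", r, r, encodedLen(r))
	}
}
```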

When you range over a string, Go assumes that the string is UTF-8 encoded. range decodes each code point from UTF-8 and returns the decoded rune together with its byte index in the string.

In the output of the rune example, the byte index of the last code point jumps by 3 because the code point before it is a Chinese character, which takes 3 bytes in UTF-8.

Strings and UTF-8

Go strings are sequences of bytes, so you can put arbitrary binary data in them.

How the bytes are interpreted is up to your code.

Most of the time a string holds Unicode text in UTF-8 encoding, but Go doesn't check or ensure that string data forms a valid UTF-8 sequence. Even a string literal can contain invalid UTF-8 if it uses byte escapes such as \xff.
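A quick way to see that strings may hold invalid UTF-8 is utf8.ValidString from the standard unicode/utf8 package. In this sketch the wrapper name validUTF8 and both sample strings are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// validUTF8 reports whether s consists entirely of valid UTF-8.
// (Hypothetical wrapper, introduced only for this illustration.)
func validUTF8(s string) bool {
	return utf8.ValidString(s)
}

func main() {
	good := "héllo"
	bad := "h\xffllo" // the byte 0xff never appears in valid UTF-8

	fmt.Println(validUTF8(good), validUTF8(bad))

	// range still works on invalid data: each undecodable byte is
	// reported as the replacement character U+FFFD.
	for i, r := range bad {
		fmt.Printf("idx: %d, rune: %U\n", i, r)
	}
}
```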

That said, Go provides functionality for working with UTF-8 encoded data, such as range loops over strings and the standard unicode/utf8 package.
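As one illustration of that functionality, the sketch below uses utf8.RuneCountInString and utf8.DecodeRuneInString from the standard unicode/utf8 package. The helper name decodeAll and the sample string are assumptions; the manual loop is roughly equivalent to what a range loop does for you.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeAll decodes s one code point at a time.
// (Hypothetical helper, introduced only for this illustration.)
func decodeAll(s string) []rune {
	var out []rune
	for len(s) > 0 {
		r, size := utf8.DecodeRuneInString(s)
		out = append(out, r)
		s = s[size:] // advance by the encoded width of the rune
	}
	return out
}

func main() {
	s := "Hi, 世界" // assumed sample input

	fmt.Println(len(s))                    // number of bytes
	fmt.Println(utf8.RuneCountInString(s)) // number of code points
	fmt.Println(len(decodeAll(s)))         // same count, decoded by hand
}
```

Note that len counts bytes, so it differs from the rune count whenever the string contains multi-byte code points.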