In Python 2 there are two string types: byte strings, of type [`str`](<https://docs.python.org/2/library/functions.html#str>), and text strings, of type [`unicode`](<https://docs.python.org/2/library/functions.html#unicode>).
In Python 2, an object of type `str` is always a byte sequence, but it is commonly used for both text and binary data.
A string literal is interpreted as a byte string.
s = 'Cafe' # type(s) == str
There are two exceptions. You can define a Unicode (text) literal explicitly by prefixing the literal with `u`:
s = u'Café' # type(s) == unicode
b = 'Lorem ipsum' # type(b) == str
Alternatively, you can specify that all string literals in a module should be Unicode (text) literals:
from __future__ import unicode_literals
s = 'Café' # type(s) == unicode
b = 'Lorem ipsum' # type(b) == unicode
To check whether a variable is a string (either Unicode or a byte string), use:
isinstance(s, basestring)
In Python 3, the `str` type is a Unicode text type.
s = 'Cafe' # type(s) == str
s = 'Café' # type(s) == str (note the accented trailing e)
Additionally, Python 3 added a [`bytes` object](https://docs.python.org/3/library/functions.html#func-bytes), suitable for binary “blobs” or for writing to encoding-independent files. To create a `bytes` object, prefix a string literal with `b` or call the string’s `encode` method:
# Or, if you really need a byte string:
s = b'Cafe' # type(s) == bytes
s = 'Café'.encode() # type(s) == bytes
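As a minimal sketch of how the two Python 3 types relate, encoding a `str` produces `bytes` and decoding reverses the step. The variable names `text` and `data` are illustrative, and UTF-8 is passed explicitly here rather than relying on the default:

```python
# Python 3: round trip between text (str) and binary data (bytes).
text = 'Café'                   # str: four Unicode characters
data = text.encode('utf-8')     # bytes: the accented e needs two bytes in UTF-8
restored = data.decode('utf-8') # back to str

print(type(text).__name__, len(text))
print(type(data).__name__, len(data))
print(restored == text)
```

The lengths differ because `len` counts characters for `str` but bytes for `bytes`.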
To test whether a value is a string, use:
isinstance(s, str)
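If the same module has to run under both versions, the two checks shown so far (`basestring` in Python 2, `str` in Python 3) can be combined. The following is only a sketch: `string_types` and `is_string` are hypothetical names chosen for this example, not part of the standard library.

```python
import sys

# Pick the base type(s) that mean "string" for the running interpreter.
if sys.version_info[0] >= 3:
    string_types = (str,)          # Python 3: text strings only
else:
    string_types = (basestring,)   # Python 2: covers both str and unicode


def is_string(value):
    """Return True if value is a string under the running Python version."""
    return isinstance(value, string_types)


print(is_string('Cafe'))
print(is_string(42))
```

Note that under Python 3 this helper returns `False` for `bytes` objects, whereas under Python 2 a byte string (`str`) counts as a string.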
You can also prefix string literals with `u` to ease compatibility between Python 2 and Python 3 code bases (Python 3 accepts the prefix again since version 3.3). Since all strings in Python 3 are Unicode by default, the `u` prefix has no effect:
u'Cafe' == 'Cafe'
Python 2’s raw Unicode string prefix `ur` is not supported, however:
>>> ur'Café'
File "<stdin>", line 1
ur'Café'
^
SyntaxError: invalid syntax
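In Python 3 a plain `r` prefix already yields a raw Unicode string, so it is the usual replacement for `ur`. The sketch below assumes a Python 3 interpreter; `compiles` is a hypothetical helper written for this example that reports whether a snippet of source is accepted.

```python
# Python 3: r'' is a raw literal of type str, so no separate ur prefix is needed.
pattern = r'\bCafé\b'   # backslashes are kept literally


def compiles(source):
    """Return True if source is a valid expression for the running interpreter."""
    try:
        compile(source, '<example>', 'eval')
        return True
    except SyntaxError:
        return False


print(type(pattern).__name__, len(pattern))
print(compiles("r'Cafe'"))
print(compiles("ur'Cafe'"))
```

One difference to keep in mind when porting: Python 2's `ur` literals still processed `\uXXXX` escapes, while Python 3's `r` literals leave them untouched.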