We know that the earliest widely used character encoding was ASCII, which covers only the 10 digits, the 26 English letters in uppercase and lowercase, and some special characters. ASCII is a 7-bit code that defines 128 characters (8-bit extensions of it can represent up to 256 symbols), and each character takes only 1 byte.
With the development of information technology, the characters of many countries needed to be encoded, so GBK, GB2312, UTF-8, and other encodings appeared in succession. GBK and GB2312 stipulate that English letters occupy 1 byte and Chinese characters occupy 2 bytes. UTF-8 is an internationally adopted encoding format that covers the characters needed by all countries in the world. In UTF-8, English characters occupy 1 byte and most common Chinese characters occupy 3 bytes.
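The byte counts described above can be checked directly in Python. The following is a minimal sketch; the sample characters and variable names are chosen here for illustration only.

```python
# Compare how many bytes one character needs under different encodings.
english = "A"
chinese = "\u4e2d"  # the Chinese character "zhong" (middle)

gbk_english = len(english.encode("gbk"))     # 1 byte
gbk_chinese = len(chinese.encode("gbk"))     # 2 bytes
utf8_english = len(english.encode("utf-8"))  # 1 byte
utf8_chinese = len(chinese.encode("utf-8"))  # 3 bytes

print(gbk_english, gbk_chinese, utf8_english, utf8_chinese)
```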
In Python, there are two commonly used string types, str and bytes: str represents Unicode text, and bytes represents binary data. Converting between the two is done with the encode() and decode() methods.
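As a small illustration of the difference between the two types, the sketch below compares a str and a bytes object with the same ASCII content. The variable names are illustrative.

```python
text = "Python"    # str: a sequence of Unicode characters
data = b"Python"   # bytes: a sequence of integers in the range 0-255

print(type(text))  # <class 'str'>
print(type(data))  # <class 'bytes'>

# Indexing a str yields a one-character str; indexing bytes yields an int.
first_char = text[0]
first_byte = data[0]
print(first_char, first_byte)

# The two types never compare equal, even with the same ASCII content.
same = (text == data)
print(same)
```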
encode() is a method of the string type (str) that converts a str object to the bytes type. This process is also called "encoding".
The syntax of the encode() function is as follows:
str.encode([encoding="utf-8"][, errors="strict"])
Note that the parameters enclosed in [] are optional: they can be supplied or omitted when calling the method.
The meaning of each parameter of this method is shown in the following table:
Parameter | Meaning |
---|---|
str | Represents a string to be converted. |
encoding = "utf-8" | Specifies the character encoding used for encoding. This option defaults to UTF-8. When only one argument is passed, you can omit the preceding "encoding =" and write the encoding name directly, such as str.encode("UTF-8"). |
errors = "strict" | Specifies the error handling method. The selectable values include strict (raise a UnicodeError exception when a character cannot be encoded), ignore (skip such characters), replace (substitute "?" for them), and xmlcharrefreplace (substitute XML character references). The default value of this parameter is strict. |
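To see how the errors parameter changes the result, the sketch below encodes a word containing a non-ASCII character to ASCII. The sample word and variable names are illustrative and do not come from the tutorial.

```python
text = "caf\u00e9"  # "cafe" with an accented final e, which ASCII cannot represent

# errors="strict" (the default) raises an exception.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

ignored = text.encode("ascii", errors="ignore")              # drops the character
replaced = text.encode("ascii", errors="replace")            # substitutes "?"
xml_refs = text.encode("ascii", errors="xmlcharrefreplace")  # uses &#233;

print(strict_failed, ignored, replaced, xml_refs)
```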
Note that encode() does not modify the original string; it returns a new bytes object. To keep the result, assign it to a variable (for example, re-assign it to the original name).
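A short sketch of this behavior follows. The names original and encoded are illustrative.

```python
original = "Python Language Website"

# encode() returns a new bytes object and leaves the str untouched.
encoded = original.encode()
print(type(original))  # still <class 'str'>
print(type(encoded))   # <class 'bytes'>

# Re-assigning the name is what "modifies" the variable.
original = original.encode()
print(type(original))  # now <class 'bytes'>
```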
[Example:] Convert the str object "Python Language Website" to the bytes type.
string = "Python Language Website"
string.encode()
The output is:
b'Python Language Website'
In contrast to the encode() function, the decode() function is used to convert binary data of type bytes to str. This process is also called "decoding".
The syntax of the decode() method is as follows:
bytes.decode([encoding="utf-8"][, errors="strict"])
The meanings of the parameters in this method are shown in the following table.
Parameter | Meaning |
---|---|
bytes | Represents binary data to be converted. |
encoding = "utf-8" | Specifies the character encoding used during decoding. The default is UTF-8. When only one argument is passed, you can omit "encoding =" and write the encoding name directly. Note that when decoding bytes data, you must choose the same encoding that was used when the data was originally encoded. |
errors = "strict" | Specifies the error handling method. The selectable values include strict (raise a UnicodeError exception when bytes cannot be decoded), ignore (skip such bytes), and replace (substitute the replacement character U+FFFD). The default value of this parameter is strict. |
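The errors parameter of decode() can be illustrated in a similar way. The sketch below uses a byte sequence, chosen here for illustration, that is valid Latin-1 but not valid UTF-8.

```python
data = b"caf\xe9"  # "cafe" with an accented final e, encoded as Latin-1

# errors="strict" (the default) raises an exception for invalid UTF-8.
try:
    data.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

ignored = data.decode("utf-8", errors="ignore")    # drops the bad byte
replaced = data.decode("utf-8", errors="replace")  # inserts U+FFFD

print(strict_failed, ascii(ignored), ascii(replaced))
```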
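[Example:] Calling decode() on a str object instead of a bytes object fails, because str has no decode() method in Python 3.
string = "Python Language Website"
string.decode()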
The output is:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-8e0054fd2199> in <module>
1 string = "Python Language Website"
----> 2 string.decode()
AttributeError: 'str' object has no attribute 'decode'
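For comparison, a minimal sketch of a working round trip is shown here: the string is first encoded to bytes, and decode() is then called on the bytes object. The names encoded and decoded are illustrative.

```python
string = "Python Language Website"

encoded = string.encode()   # str -> bytes, UTF-8 by default
decoded = encoded.decode()  # bytes -> str, UTF-8 by default

print(encoded)
print(decoded)
```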
Note that if the encoding is not the default UTF-8, you must decode with the same encoding that was used when the data was originally encoded.
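The need to decode with the original encoding can be sketched with GBK. The sample text and variable names are illustrative.

```python
text = "\u4e2d\u6587"  # two Chinese characters ("Chinese language")

gbk_bytes = text.encode("gbk")

# Decoding with the same encoding recovers the original text.
round_trip_ok = (gbk_bytes.decode("gbk") == text)

# Decoding the same bytes as UTF-8 (the default) fails.
try:
    gbk_bytes.decode()
    mismatch_failed = False
except UnicodeDecodeError:
    mismatch_failed = True

print(round_trip_ok, mismatch_failed)
```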