How to read multiple files line by line in Python?
In previous chapters, we learned to combine the open function with read (or readline, readlines)
to read data from a single file. In some scenarios, however, data must be read from
multiple files, and repeating that combination for every file quickly becomes cumbersome.
Fortunately, Python provides the fileinput module. With the input function in this module, we
can specify multiple files in a single call and read their contents one file after another.
The syntax of the input() function in the fileinput module is as follows:
fileinput.input(files=None, inplace=False, backup='', bufsize=0, mode='r', openhook=None)
This function returns a FileInput object, which can be thought of as a single file object formed by
concatenating the specified files. The meaning of each parameter is as follows:
files: a list (or tuple) of file paths; a single path may also be passed as a string;
inplace: specifies whether standard output is redirected back into the file being read. The default
value of this parameter is False;
backup: specifies the extension of the backup file that is kept when inplace is True;
bufsize: specifies the size of the buffer, the default is 0 (this parameter was deprecated in Python 3.6 and removed in Python 3.8);
mode: the mode in which the files are opened, default is r (read-only);
openhook: a function that controls how files are opened, for example to set the encoding.
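The inplace and backup parameters are easiest to understand in action. The following is a minimal sketch, not a definitive recipe: the helper name number_lines_in_place and the file demo.txt are made up for illustration, and the file is created in a temporary directory so nothing real is overwritten.

```python
import fileinput
import os
import tempfile

def number_lines_in_place(path):
    """Rewrite the file at path so every line is prefixed with its line number."""
    # With inplace=True, anything printed inside the loop is written back to
    # the file being read; backup='.bak' keeps the original as path + '.bak'.
    with fileinput.input(files=[path], inplace=True, backup='.bak') as f:
        for line in f:
            print(f"{f.filelineno()}: {line}", end='')

# Throwaway demo file with hypothetical contents.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'demo.txt')
with open(demo_path, 'w') as fh:
    fh.write('alpha\nbeta\n')

number_lines_in_place(demo_path)

with open(demo_path) as fh:
    rewritten = fh.read()          # the numbered version
with open(demo_path + '.bak') as fh:
    original = fh.read()           # the untouched backup copy
```

After the call, demo.txt holds the numbered lines and demo.txt.bak holds the original text.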
Note that, unlike the open function, the input function in older Python versions has no parameter for
specifying the encoding of the files it opens (an encoding parameter was added in Python 3.10). Unless an
openhook such as fileinput.hook_encoded is supplied, or the files are read in binary mode, every file is decoded
with the default encoding of the platform. All files must therefore use that encoding; otherwise
the Python interpreter may raise a UnicodeDecodeError.
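One way to control decoding is the openhook parameter together with fileinput.hook_encoded, which is part of the standard library. The sketch below assumes two small Latin-1 files (one.txt and two.txt, invented for this example) and creates them in a temporary directory first.

```python
import fileinput
import os
import tempfile

# Two throwaway files saved as Latin-1 rather than UTF-8 (hypothetical contents).
demo_dir = tempfile.mkdtemp()
paths = []
for name, text in (('one.txt', 'café\n'), ('two.txt', 'naïve\n')):
    path = os.path.join(demo_dir, name)
    with open(path, 'w', encoding='latin-1') as fh:
        fh.write(text)
    paths.append(path)

# hook_encoded returns an opener that decodes every file with the given encoding.
lines = []
with fileinput.input(files=paths, openhook=fileinput.hook_encoded('latin-1')) as f:
    for line in f:
        lines.append(line)

print(lines)
```

Because the hook applies to every file in the list, this approach still assumes that all files share one encoding.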
Unlike the single file object returned by the open function, a FileInput object does not require calls to
functions such as read, readline, or readlines; data from all of the files can be read directly through a
for loop.
The fileinput module also provides a number of helper functions, listed below.
Calling them makes it easier to track where the data being read comes from.
fileinput.filename(): Returns the name of the file currently being read.
fileinput.fileno(): Returns the file descriptor of the file currently being read.
fileinput.lineno(): Returns the cumulative line number of the line just read, counted across all files.
fileinput.filelineno(): Returns the line number of the line just read within the current file.
fileinput.isfirstline(): Returns True if the line just read is the first line of the current file.
fileinput.nextfile(): Closes the file currently being read so that reading continues with the next file.
fileinput.close(): Closes the FileInput object.
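To see how these helper functions behave while iterating, here is a small sketch. The file names first.txt and second.txt and their contents are assumptions made for the demonstration; they are created in a temporary directory.

```python
import fileinput
import os
import tempfile

# Throwaway files with hypothetical contents.
demo_dir = tempfile.mkdtemp()
paths = []
for name, text in (('first.txt', 'a1\na2\n'), ('second.txt', 'b1\n')):
    path = os.path.join(demo_dir, name)
    with open(path, 'w') as fh:
        fh.write(text)
    paths.append(path)

records = []
with fileinput.input(files=paths) as f:
    for line in f:
        records.append((
            os.path.basename(fileinput.filename()),  # file being read
            fileinput.lineno(),                      # line number across all files
            fileinput.filelineno(),                  # line number within this file
            fileinput.isfirstline(),                 # True on the first line of a file
            line.rstrip('\n'),
        ))

for record in records:
    print(record)
```

Note how lineno() keeps counting across files, while filelineno() starts again at 1 for each new file.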
Here is an example. Suppose we use the input function to read two files,
a.txt and file.txt, which are in the same directory and each contains the following:
The following program shows how to use the input function to read these two files one by one:
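import fileinput
# Use a for loop to iterate over the FileInput object
for line in fileinput.input(files=('a.txt', 'file.txt')):
    # Output the content that was read (each line already ends with a newline)
    print(line, end='')
# Close the file stream
fileinput.close()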
The order in which the file contents are read follows the order of the file names passed to the input function.
Before using the input function in the fileinput module, make sure to import the fileinput module.
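The effect of file order can be checked with a short experiment. In this sketch the helper read_all and the demo file names are hypothetical. It also uses the with statement, which FileInput objects support, as an alternative to calling close explicitly.

```python
import fileinput
import os
import tempfile

def read_all(paths):
    """Return every line of the given files, in the order the paths are listed."""
    # The with statement closes the FileInput object even if an error occurs.
    with fileinput.input(files=paths) as f:
        return [line.rstrip('\n') for line in f]

# Throwaway files with hypothetical contents.
demo_dir = tempfile.mkdtemp()
path_a = os.path.join(demo_dir, 'a_demo.txt')
path_b = os.path.join(demo_dir, 'b_demo.txt')
with open(path_a, 'w') as fh:
    fh.write('from a\n')
with open(path_b, 'w') as fh:
    fh.write('from b\n')

forward = read_all([path_a, path_b])
backward = read_all([path_b, path_a])
print(forward)
print(backward)
```

Swapping the two paths swaps the order of the lines that come back.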
Python linecache Module
How to randomly read a specified line of a file?
In addition to the fileinput module, Python also provides a linecache module. Unlike the former,
the linecache module is designed for reading a specific line of a specific file. If we only need the
data on a given line of a file, we can use the linecache module.
The linecache module is commonly used to read the code in Python source files, and it reads file contents as UTF-8
by default. This means that a file read by this module should also be encoded in UTF-8;
otherwise the data read may be garbled, or the read may fail outright (the Python interpreter may report a SyntaxError).
To use the linecache module, you need to know which functions it contains. The commonly used functions
in the linecache module and their purposes are listed below:
linecache.getline(filename, lineno, module_globals=None)
Reads the specified line of the specified file (module_globals is not needed
when reading an ordinary file).
• filename specifies the file name,
• lineno specifies the line number, counting from 1,
• module_globals optionally specifies the globals dictionary of a module, which helps locate the source of a module loaded through an import loader.
Note that when the file cannot be found directly and filename is a relative path, the function also looks for
the file relative to the directories listed in sys.path.
linecache.clearcache()
Clears the cache. Use this function when the program no longer needs the lines previously read
with the getline function.
linecache.checkcache(filename=None)
Checks the validity of the cache. If a file read with the getline function has since been modified
on disk and we need the new data, calling this function discards the outdated cache entry. Note that
if the file name is omitted, this function checks the validity of all cached data.
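The interplay of the three functions can be sketched as follows. The file notes.txt and its contents are invented for the demonstration, and the file is created in a temporary directory with an absolute path so that the sys.path lookup does not come into play.

```python
import linecache
import os
import tempfile

# Throwaway UTF-8 file with hypothetical contents.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'notes.txt')
with open(demo_path, 'w', encoding='utf-8') as fh:
    fh.write('first\nsecond\nthird\n')

second = linecache.getline(demo_path, 2)    # the line, including its newline
missing = linecache.getline(demo_path, 99)  # out-of-range line: empty string

# Modify the file on disk; the cached copy is now out of date.
with open(demo_path, 'w', encoding='utf-8') as fh:
    fh.write('FIRST\nSECOND (edited)\nTHIRD\n')

# checkcache drops the stale entry, so the next getline re-reads the file.
linecache.checkcache(demo_path)
refreshed = linecache.getline(demo_path, 2)

# Release all cached lines once they are no longer needed.
linecache.clearcache()

print(repr(second), repr(missing), repr(refreshed))
```

A line number beyond the end of the file yields an empty string rather than an exception, so callers should check for that case.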
import linecache
import string
# Read line 3 of the source file of the string module
print(linecache.getline(string.__file__, 3))
# Read the second line of an ordinary file
print(linecache.getline('a.txt', 2))
Before executing this program, make sure that the a.txt file is saved in UTF-8 encoding
(the modules that ship with Python are usually UTF-8 encoded). With that in place, running the program
produces the following output: