Reading files in C++ can sometimes be a complex process. I wrote this article on four different ways to read an entire file into a string. I outline the pros and cons of each method, and I benchmarked them to figure out which one is the fastest.
Problem statement: Given a file path (std::string), read the entire file into fileData (std::string).
The four methods I used are C’s fread, C++’s read, rdbuf, and istreambuf_iterator.
A std::string tracks its own length instead of relying on a null terminator to mark its end the way a C string does, so we can read binary data into it, including embedded null bytes.
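To make that concrete, here is a minimal sketch contrasting how a std::string and a C string measure the same three bytes. The helper names (makeBinarySample, stringLength, cStringLength) are made up for this illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical demo data: three bytes with a null byte in the middle.
inline std::string makeBinarySample()
{
    const char raw[] = {'a', '\0', 'b'};
    // The (pointer, length) constructor copies exactly 3 bytes, null included.
    return std::string(raw, sizeof(raw));
}

// std::string stores its length, so size() counts the embedded null byte.
inline std::size_t stringLength(const std::string& s)
{
    return s.size();
}

// strlen stops at the first null byte, so it only sees the leading 'a'.
inline std::size_t cStringLength(const std::string& s)
{
    return std::strlen(s.c_str());
}
```

For the sample above, stringLength reports 3 while cStringLength reports 1, which is why length-based APIs are the safe choice for binary file contents.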
Basic Algorithm Outline
The basic outline for each method is:
- Open the file
- Check if the file is opened
- Determine file size
- Resize string to file size
- Read the data
- Close the file
Open the file
We open the file for reading in binary mode. Why binary? Because I don’t know what type of data you want to open. Binary mode simply prevents any interpretation of the data, such as newline translation. If you have a text file with lots of newlines, you can read it line by line with getline instead.
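For the text-file case, a line-by-line read with getline might look like the sketch below. The function name readLines and its choice to return an empty vector on failure are assumptions made for this example, not part of the four methods compared in this article.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads a text file line by line.
// Returns an empty vector if the file cannot be opened.
std::vector<std::string> readLines(const std::string& filePath)
{
    std::vector<std::string> lines;
    std::ifstream file(filePath); // text mode: no std::ios::binary flag
    std::string line;
    // The loop ends when getline fails, which includes a stream that never opened.
    while (std::getline(file, line))
    {
        lines.push_back(line); // getline drops the '\n' delimiter
    }
    return lines;
}
```

This suits line-oriented processing; for grabbing the whole file in one go, the binary approaches in the rest of the article avoid the per-line overhead.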
Check if file is opened
Checking that the file opened is important because it lets us handle the failure, for example when the file is missing or locked by another process. Otherwise, later operations misbehave: passing a null FILE* to fread is undefined behavior and will likely crash the program, while a C++ stream that failed to open makes every read fail, leaving us with no data.
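C++ streams report a failed open through their state flags by default rather than by throwing. If you would rather get an exception, you can opt in through the stream's exception mask. The sketch below assumes a hypothetical helper named openOrThrow; the examples in this article stick with the is_open check instead.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: opens a file in binary mode and throws
// std::ios_base::failure if the open fails.
std::ifstream openOrThrow(const std::string& filePath)
{
    std::ifstream file;
    // Opt in to exceptions; by default streams only set state flags.
    file.exceptions(std::ifstream::failbit | std::ifstream::badbit);
    file.open(filePath, std::ios::binary); // throws here if the open fails
    return file;
}
```

One caveat with this design: the mask stays active on the returned stream, so a later read that comes up short also throws. Call file.exceptions(std::ifstream::goodbit) afterwards if only the open should throw.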
Determine file size
Determining the file size is important for allocating space for the string.
Resize string to file size
Resizing the string to the file size up front (that is, allocating space for the whole file) is important because repeatedly re-allocating memory as the string grows is generally slow. If we open a large file, we want the string to have room for all of its contents from the start.
Read the data
Finally, read the data into the string. Ideally, we should only need one read with the file size.
Close the file
We close the file so other processes can use it again if needed. With the C++ methods, the stream closes the file automatically when the stream object goes out of scope. A C-style FILE* is not closed automatically, so we have to call fclose ourselves. In general, it is good practice to close the file as soon as you’re done with it.
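If you want the C-style handle to close itself at scope exit as well, one option is to wrap it in a std::unique_ptr with a custom deleter. This is a sketch only; FileCloser, FilePtr, and openBinary are names invented for this example.

```cpp
#include <cstdio>
#include <memory>
#include <string>

// Deleter that closes the FILE* when the owning pointer is destroyed.
// unique_ptr only invokes the deleter for non-null pointers.
struct FileCloser
{
    void operator()(std::FILE* f) const noexcept
    {
        std::fclose(f);
    }
};

// Hypothetical alias: a FILE* that closes itself at scope exit.
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Returns an empty (null) FilePtr if the file could not be opened.
FilePtr openBinary(const std::string& filePath)
{
    return FilePtr(std::fopen(filePath.c_str(), "rb"));
}
```

Pass f.get() to fread, fseek, and ftell as usual. With this wrapper, an early return on an error path can no longer leak the handle.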
Benchmarks
I ran some back-of-the-envelope benchmarks to understand the performance of each method. For small files that are opened occasionally, the speed of the method matters less. However, for larger files, or when reading many files, speed matters.
I tested with 3 runs on a 500 MB file and 5 runs on a 25 MB file. I normalized the results to the fastest method, C’s fread, as 1.0.
Method | Relative time (lower is better) |
---|---|
C’s fread | 1.0x |
C++’s read | ~2x |
rdbuf | ~10x |
istreambuf_iterator | ~20x |
fread was fastest, C++’s read took about twice as long, rdbuf about 10x as long, and istreambuf_iterator about 20x as long.
rdbuf was about 10x slower because the string has to be copied out of the stringstream.
I suggest timing your own code. Measure, then optimize!
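A simple way to take such measurements is a small timing helper built on std::chrono. The sketch below is an assumption about how one might do it, not the harness used for the numbers above; elapsedMs is a made-up name.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in milliseconds.
template <typename Work>
double elapsedMs(Work&& work)
{
    // steady_clock is monotonic, so it is not affected by system clock changes.
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Usage would look like `double ms = elapsedMs([] { doRead(); });`. Take several runs per method and compare them, since a single run can be skewed by whatever else the machine is doing.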
C’s fread
This is C-style I/O rather than idiomatic C++, so you may want to skip this method. However, if you need speed, this was the fastest in my tests.
⚠️ Check the ftell result before casting it to an unsigned type. ftell returns -1 if something goes wrong, and casting -1 to an unsigned type yields the largest value that type can hold, which is not something you want to allocate!
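#include <cstdio>
#include <print>
#include <string>

// filePath is the std::string from the problem statement
void doRead()
{
    std::string fileData;
    // Open the file
    FILE* f = fopen(filePath.c_str(), "rb");
    // Check if file opened
    if (!f)
    {
        // Handle error
        std::println("Unable to open file");
        return;
    }
    // Determine file size
    fseek(f, 0, SEEK_END);
    long ftellResult = ftell(f);
    if (ftellResult < 0)
    {
        // Handle error; close the handle before returning so it does not leak
        std::println("ftell failed");
        fclose(f);
        return;
    }
    rewind(f);
    size_t fileSize = static_cast<size_t>(ftellResult);
    // Resize string to file size
    fileData.resize(fileSize);
    // Read data; fread returns the number of items actually read
    size_t bytesRead = fread(&fileData[0], sizeof(char), fileSize, f);
    if (bytesRead != fileSize)
    {
        // Handle error
        std::println("Read fewer bytes than expected");
    }
    // Close file
    fclose(f);
}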
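The ftell warning above can be captured in a small conversion helper, so the negative check cannot be forgotten. This is only a sketch; toFileSize is a hypothetical name and std::optional requires C++17.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical helper: converts an ftell-style result to a size,
// rejecting negative values instead of letting them wrap around.
// Without the check, static_cast<std::size_t>(-1L) would produce the
// largest value std::size_t can hold.
std::optional<std::size_t> toFileSize(long tellResult)
{
    if (tellResult < 0)
    {
        return std::nullopt; // -1 signals failure; never cast it
    }
    return static_cast<std::size_t>(tellResult);
}
```

The caller then has to inspect the optional before resizing the string, which makes the failure path explicit.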
C++’s read
C++’s read is the read member function of std::ifstream. From my understanding, it is a wrapper over C’s file I/O operations that provides more error handling. For some reason, it cuts the performance in half though!
ℹ️ A nifty trick is to open the file with the std::ios::ate flag, which positions the stream at the end of the file, so tellg can report the file size right away.
⚠️ As with ftell, check the tellg result before casting it to an unsigned type. It returns -1 if something goes wrong, and casting -1 to an unsigned type yields the largest value that type can hold, which is not something you want to allocate!
#include <fstream>
#include <print>
#include <string>

// filePath is the std::string from the problem statement
void doRead()
{
    std::string fileData;
    // Open file and seek to the end with std::ios::ate
    std::ifstream file(filePath, std::ios::in | std::ios::binary | std::ios::ate);
    // Check if file opened
    if (!file.is_open())
    {
        // Handle error
        std::println("Unable to open file");
        return;
    }
    size_t fileSize;
    // Determine file size (method 1)
    auto tellgResult = file.tellg();
    if (tellgResult == -1)
    {
        // Handle error
        std::println("tellg failed");
        return;
    }
    fileSize = static_cast<size_t>(tellgResult);
    file.seekg(0, std::ios_base::beg);
    // Resize string
    fileData.resize(fileSize);
    // Read file into string
    file.read(fileData.data(), static_cast<std::streamsize>(fileSize));
    if (!file)
    {
        // Handle error: fewer bytes were read than requested
        std::println("Read failed");
    }
    // Close file (the destructor would also do this at scope exit)
    file.close();
}
ℹ️ Alternatively, we can determine the file size with std::filesystem:
// Determine file size (method 2); requires #include <filesystem>
namespace fs = std::filesystem;
std::error_code errorCode;
auto file_sizeResult = fs::file_size(filePath, errorCode);
if (errorCode)
{
    // Handle error
    std::println("file_size failed");
    return;
}
fileSize = static_cast<size_t>(file_sizeResult);
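Putting the std::filesystem size query together with a single read gives something like the following sketch. The function name readFileSized and the std::optional return type are assumptions made for this example (C++17 or later); the article's own examples print an error and return instead.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical helper: sizes the file with std::filesystem, then reads it
// with a single ifstream::read. Returns std::nullopt on any failure.
std::optional<std::string> readFileSized(const std::string& filePath)
{
    namespace fs = std::filesystem;

    std::error_code errorCode;
    const auto size = fs::file_size(filePath, errorCode);
    if (errorCode)
    {
        return std::nullopt; // e.g. the file does not exist
    }

    std::ifstream file(filePath, std::ios::binary);
    if (!file.is_open())
    {
        return std::nullopt;
    }

    // Allocate the whole buffer up front, then fill it with one read.
    std::string fileData(static_cast<std::size_t>(size), '\0');
    file.read(fileData.data(), static_cast<std::streamsize>(fileData.size()));
    if (file.gcount() != static_cast<std::streamsize>(fileData.size()))
    {
        return std::nullopt; // short read: fewer bytes than file_size reported
    }
    return fileData;
}
```

Note that the size is queried before the file is opened, so the file could in principle change in between; the gcount check catches the case where it shrank.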
Rdbuf method
Alternatively, we can use rdbuf to read the file into a string stream. However, it requires an extra copy into a string, significantly reducing performance 😢. This method is a good fit if you need a string stream anyway.
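#include <fstream>
#include <print>
#include <sstream>
#include <string>

// filePath is the std::string from the problem statement
void doRead()
{
    // Open file
    std::ifstream in(filePath, std::ios::binary);
    std::ostringstream stringStream;
    // Check if file opened
    if (!in.is_open())
    {
        // Handle error
        std::println("Unable to open file");
        return;
    }
    // Read file: stream the whole file buffer into the string stream
    stringStream << in.rdbuf();
    // Copy string stream to string
    std::string fileData = stringStream.str();
    // Close file
    in.close();
}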
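Since C++20, str() has an overload for rvalue string streams that moves the underlying buffer out instead of copying it. The sketch below shows how that might look; readViaRdbuf is a hypothetical name, and this variant was not part of the benchmark above, so measure before relying on it.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical variant of the rdbuf approach. Returns an empty string
// if the file cannot be opened.
std::string readViaRdbuf(const std::string& filePath)
{
    std::ifstream in(filePath, std::ios::binary);
    if (!in.is_open())
    {
        return {};
    }

    std::ostringstream stringStream;
    stringStream << in.rdbuf();

    // With C++20, str() on an rvalue stream moves the buffer out.
    // Older standards still compile this line, but they copy.
    return std::move(stringStream).str();
}
```

The string stream still grows as data is inserted, so this removes the final copy but not the lack of an up-front allocation.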
istreambuf_iterator method
This is by far the slowest method; however, it needs the least amount of code.
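#include <fstream>
#include <iterator>
#include <print>
#include <string>

// filePath is the std::string from the problem statement
void doRead()
{
    // Open file
    std::ifstream ifs(filePath, std::ios::binary);
    // Check if file opened
    if (!ifs.is_open())
    {
        // Handle error
        std::println("Unable to open file");
        return;
    }
    // Read file into string; the empty {} is the end-of-stream iterator
    std::string fileData(std::istreambuf_iterator<char>{ifs}, {});
    // Close file
    ifs.close();
}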
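If you want to throw caution to the wind and neither check that the file opened correctly nor close it manually, you can write the following one-liner. It passes rdbuf() because a temporary stream cannot bind to the iterator's non-const reference parameter on conforming compilers:
std::string fileData(std::istreambuf_iterator<char>{std::ifstream(filePath, std::ios::binary).rdbuf()}, {});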
Summary and Recommendations
We discussed four different ways to read an entire file into a string, along with the pros and cons of each and their performance.
I recommend that you benchmark (i.e., measure) your file I/O methods and determine which is best for your situation.
From my data, the fastest method was C’s fread; however, it is not technically C++ code. C++’s read was the fastest C++ method. I recommend avoiding any method that makes additional copies or cannot allocate the string up front, as these are costly operations.