Pitfall - Small reads / writes on unbuffered streams are inefficient

Consider the following code to copy one file to another:

import java.io.*;

public class FileCopy {

    public static void main(String[] args) throws Exception {
        try (InputStream is = new FileInputStream(args[0]);
             OutputStream os = new FileOutputStream(args[1])) {
           int octet;
           while ((octet = is.read()) != -1) {
               os.write(octet);
           }
        }
    }
}

(We have deliberately omitted normal argument checking, error reporting and so on because they are not relevant to the point of this example.)

If you compile the above code and use it to copy a huge file, you will notice that it is very slow. In fact, it will be at least a couple of orders of magnitude slower than the standard OS file copy utilities.
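The slowdown is easy to observe for yourself. The following is a rough timing sketch, not a rigorous benchmark: it times the byte-at-a-time loop from FileCopy against Files.copy, which is used here as a stand-in for the OS copy utility. The class name CopyTiming, its helper methods, and the 4 MiB file size are hypothetical choices made for this illustration. A single run is affected by JIT warm-up and the file system cache, so treat the numbers as indicative only.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class CopyTiming {

    // Same loop as FileCopy above: one byte per read() and write() call.
    static void copyByteAtATime(Path src, Path dst) throws IOException {
        try (InputStream is = new FileInputStream(src.toFile());
             OutputStream os = new FileOutputStream(dst.toFile())) {
            int octet;
            while ((octet = is.read()) != -1) {
                os.write(octet);
            }
        }
    }

    // Creates a temporary file containing 'size' zero bytes.
    static Path createTestFile(int size) throws IOException {
        Path p = Files.createTempFile("copy-timing", ".bin");
        Files.write(p, new byte[size]);
        return p;
    }

    public static void main(String[] args) throws IOException {
        int size = 4 * 1024 * 1024; // 4 MiB; an arbitrary size for illustration
        Path src = createTestFile(size);
        Path slow = Files.createTempFile("copy-slow", ".bin");
        Path fast = Files.createTempFile("copy-fast", ".bin");
        try {
            long t0 = System.nanoTime();
            copyByteAtATime(src, slow);
            long t1 = System.nanoTime();
            // The target already exists as an empty temp file, so replace it.
            Files.copy(src, fast, StandardCopyOption.REPLACE_EXISTING);
            long t2 = System.nanoTime();
            System.out.printf("byte-at-a-time: %d ms%n", (t1 - t0) / 1_000_000);
            System.out.printf("Files.copy:     %d ms%n", (t2 - t1) / 1_000_000);
        } finally {
            Files.deleteIfExists(src);
            Files.deleteIfExists(slow);
            Files.deleteIfExists(fast);
        }
    }
}
```

Increase the file size to make the difference more pronounced; the byte-at-a-time time should grow roughly in proportion to the number of bytes.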


The primary reason that the example above is slow (in the large file case) is that it is performing one-byte reads and one-byte writes on unbuffered byte streams. The simple way to improve performance is to wrap the streams with buffered streams. For example:

import java.io.*;

public class FileCopy {

    public static void main(String[] args) throws Exception {
        try (InputStream is = new BufferedInputStream(
                     new FileInputStream(args[0]));
             OutputStream os = new BufferedOutputStream(
                     new FileOutputStream(args[1]))) {
           int octet;
           while ((octet = is.read()) != -1) {
               os.write(octet);
           }
        }
    }
}

These small changes will improve the data copy rate by at least a couple of orders of magnitude, depending on various platform-related factors. The buffered stream wrappers cause the data to be read and written in larger chunks. Both wrapper instances have buffers implemented as byte arrays.

For the input stream (is), data is read from the file into the buffer a few kilobytes at a time. When read() is called, the implementation will typically return a byte from the buffer. It will only read from the underlying input stream if the buffer has been emptied.
The behavior of the output stream (os) is analogous. Calls to os.write(int) write single bytes into the buffer. Data is only written to the underlying output stream when the buffer is full, or when os is flushed or closed.
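The effect of the buffer can be made visible by counting how often the underlying stream is actually asked for data. The sketch below uses a hypothetical CountingInputStream helper (not part of the JDK) wrapped around an in-memory ByteArrayInputStream, so the counter records method calls on the wrapped stream rather than real operating system calls. The class and method names are assumptions made for this illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadCallCounter {

    // Hypothetical helper: counts each request made to the wrapped stream.
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return in.read(b, off, len);
        }
    }

    // Reads 'size' bytes one at a time and reports how many calls
    // reached the underlying (counted) stream.
    static int countUnderlyingReads(int size, boolean buffered) throws IOException {
        CountingInputStream counter =
                new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        try (InputStream is = buffered ? new BufferedInputStream(counter) : counter) {
            while (is.read() != -1) {
                // Discard the data; only the call count matters here.
            }
        }
        return counter.calls;
    }

    public static void main(String[] args) throws IOException {
        int size = 100_000;
        System.out.println("unbuffered: " + countUnderlyingReads(size, false));
        System.out.println("buffered:   " + countUnderlyingReads(size, true));
    }
}
```

Without the wrapper, every read() reaches the underlying stream. With the wrapper, the underlying stream should only be called once per buffer refill, so the count drops from roughly one per byte to roughly one per buffer's worth of data.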

What about character-based streams?

As you should be aware, Java I/O provides different APIs for reading and writing binary and text data.

InputStream and OutputStream are the base APIs for stream-based binary I/O.
Reader and Writer are the base APIs for stream-based text I/O.

For text I/O, BufferedReader and BufferedWriter are the equivalents of BufferedInputStream and BufferedOutputStream.
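As an illustration, here is a minimal sketch of the same copy loop written with character streams. The class name TextCopy and the choice of UTF-8 are assumptions made for this example; use whatever encoding your files actually have.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class TextCopy {

    // Copies a text file one char at a time. The buffered wrappers keep
    // the single-char calls from reaching the underlying streams directly.
    static void copyText(String src, String dst) throws IOException {
        try (Reader in = new BufferedReader(new InputStreamReader(
                     new FileInputStream(src), StandardCharsets.UTF_8));
             Writer out = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream(dst), StandardCharsets.UTF_8))) {
            int ch;
            while ((ch = in.read()) != -1) {
                out.write(ch);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        copyText(args[0], args[1]);
    }
}
```

The structure mirrors the byte-stream version: only the stream types change, plus the explicit charset used to convert between bytes and characters.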

Why do buffered streams make this much difference?

The real reason that buffered streams help performance has to do with the way that an application talks to the operating system:

Java method calls in a Java application, or native procedure calls in the JVM’s native runtime libraries, are fast. They typically take a couple of machine instructions and have minimal performance impact.
By contrast, JVM runtime calls to the operating system are not fast. They involve something known as a “syscall”. The typical pattern for a syscall is as follows:

1. Put the syscall arguments into registers.
2. Execute a trap instruction (for example, SYSENTER on x86).
3. The trap handler switches to privileged state and changes the virtual memory mappings.  Then it dispatches to the code to handle the specific syscall.
4. The syscall handler checks the arguments, taking care that it isn't being told to access memory that the user process should not see.
5. The syscall specific work is performed.  In the case of a `read` syscall, this may involve:
   1. checking that there is data to be read at the file descriptor's current position
   2. calling the file system handler to fetch the required data from disk (or wherever it is stored) into the buffer cache,
   3. copying data from the buffer cache to the JVM-supplied address
   4. adjusting the file descriptor position
6. Return from the syscall.  This entails changing VM mappings again and switching out of privileged state.
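Because the steps above carry a roughly fixed cost per syscall, the way to reduce the total cost is to move more bytes per call. Besides wrapping the streams, another common technique is to read and write through an explicit byte array. The sketch below assumes that each read(byte[]) call on a FileInputStream results in about one read syscall, which is typical but implementation-dependent. The class name BulkCopy and the 8192-byte buffer size are arbitrary choices for this illustration.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class BulkCopy {

    // Copies all remaining bytes from 'is' to 'os' in chunks and
    // returns the number of bytes copied.
    static long copy(InputStream is, OutputStream os) throws IOException {
        byte[] buffer = new byte[8192]; // arbitrary size, chosen for illustration
        long total = 0;
        int n;
        // read(byte[]) returns the number of bytes actually read, or -1 at
        // end of stream; it may return fewer bytes than the buffer holds.
        while ((n = is.read(buffer)) != -1) {
            os.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream is = new FileInputStream(args[0]);
             OutputStream os = new FileOutputStream(args[1])) {
            System.out.println("Copied " + copy(is, os) + " bytes");
        }
    }
}
```

With an 8192-byte array, the per-call overhead is paid once per chunk of up to 8192 bytes instead of once per byte.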