Reading From a Binary File in Java
Reading files in Java is the cause of a lot of confusion. There are multiple ways of accomplishing the same task, and it's often not clear which file reading method is best to use. Something that's quick and dirty for a small example file might not be the best method to use when you need to read a very large file. Something that worked in an earlier Java version might not be the preferred method anymore.
This article aims to be the definitive guide for reading files in Java 7, 8 and 9. Too often, you'll read an article that tells you one way to read a file, only to discover later that there are other ways to do it. I'm going to cover 15 different ways to read a file in Java, using both the core Java libraries and two third party libraries.
But that's not all: what good is knowing how to do something in multiple ways if you don't know which way is best for your situation?
I also put each of these methods to a real performance test and documented the results. That way, you will have some hard data on the performance of each method.
Methodology
JDK Versions
Java code samples don't live in isolation, especially when it comes to Java I/O, as the API keeps evolving. All code for this article has been tested on:
- Java SE 7 (jdk1.7.0_80)
- Java SE 8 (jdk1.8.0_162)
- Java SE 9 (jdk-9.0.4)
When there is an incompatibility, it is stated in that section. Otherwise, the code works unaltered across these Java versions. The main incompatibility is the use of lambda expressions, which were introduced in Java 8.
Java File Reading Libraries
There are multiple ways of reading from files in Java. This article aims to be a comprehensive collection of all the different methods. I will cover:
- java.io.FileReader.read()
- java.io.BufferedReader.readLine()
- java.io.FileInputStream.read()
- java.io.BufferedInputStream.read()
- java.nio.file.Files.readAllBytes()
- java.nio.file.Files.readAllLines()
- java.nio.file.Files.lines()
- java.util.Scanner.nextLine()
- org.apache.commons.io.FileUtils.readLines() – Apache Commons
- com.google.common.io.Files.readLines() – Google Guava
Closing File Resources
Prior to JDK 7, when opening a file in Java, all file resources had to be closed manually using a try-catch-finally block. JDK 7 introduced the try-with-resources statement, which simplifies the process of closing streams. You no longer need to write explicit code to close streams, because every resource declared in the try statement is closed automatically when the block exits, whether an exception occurred or not. All examples in this article use the try-with-resources statement so that files are always closed.
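To make the difference concrete, here is a minimal sketch of both styles side by side. The class and method names (CloseComparison, firstLineOldStyle, firstLineTryWithResources) are hypothetical and exist only for this illustration; a StringReader stands in for a file so the sketch runs anywhere.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class CloseComparison {

    // Pre-JDK 7 style: the reader has to be closed by hand in a finally block.
    static String firstLineOldStyle(Reader source) throws IOException {
        BufferedReader reader = null;
        try {
            reader = new BufferedReader(source);
            return reader.readLine();
        } finally {
            if (reader != null) {
                reader.close(); // runs whether readLine() returned or threw
            }
        }
    }

    // JDK 7+ style: close() is called automatically when the try block exits.
    static String firstLineTryWithResources(Reader source) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstLineOldStyle(new StringReader("alpha\nbeta")));
        System.out.println(firstLineTryWithResources(new StringReader("alpha\nbeta")));
    }
}
```

Closing the BufferedReader also closes the reader it wraps, so in both versions the underlying source is released.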
File Location
All examples will read test files from C:\temp.
Encoding
Character encoding is not explicitly saved with text files, so Java makes assumptions about the encoding when reading files. Usually the assumption is right, but sometimes you want to be explicit when instructing your programs to read from files. When the encoding isn't correct, you'll see funny characters appear when reading files.
All examples for reading text files use two encoding variations:
- the default system encoding, where no encoding is specified
- an explicitly set encoding of UTF-8
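To see what "funny characters" look like, the sketch below encodes a word as UTF-8 and then decodes the same bytes twice: once with the matching charset and once with ISO-8859-1. The class and method names are hypothetical, chosen for this illustration; Charset.defaultCharset() reports what Java assumes when you don't specify an encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {

    // Decodes the given bytes with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The charset Java falls back to when none is specified.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // "caf\u00e9" is "café"; the escape keeps this source file ASCII-only.
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Correct: decode with the same charset that was used to encode.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8));

        // Wrong charset: the two-byte UTF-8 sequence for the accented e becomes two characters.
        System.out.println(decode(utf8Bytes, StandardCharsets.ISO_8859_1));
    }
}
```

Decoded as UTF-8 the word comes back as "café"; decoded as ISO-8859-1, the two bytes of "é" turn into two separate characters, giving "cafÃ©".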
Download Code
All code files are available from GitHub.
Code Quality and Code Encapsulation
There is a difference between writing code for your personal or work project and writing code to explain and teach concepts.
If I were writing this code for my own project, I would use proper object-oriented principles like encapsulation, abstraction, polymorphism, etc. But I wanted to make each example stand alone and be easily understood, which meant that some of the code has been copied from one example to the next. I did this on purpose because I didn't want the reader to have to figure out all the encapsulation and object structures I so cleverly created. That would take away from the examples.
For the same reason, I chose NOT to write these examples with a unit testing framework like JUnit or TestNG, because that's not the purpose of this article. That would add another library for the reader to understand that has nothing to do with reading files in Java. That's why all the examples are written inline within the main method, without extra methods or classes.
My main purpose is to make the examples as easy to understand as possible, and I believe that extra unit testing and encapsulation code would not help with this. That doesn't mean that's how I would encourage you to write your own code. It's just the way I chose to write the examples in this article to make them easier to understand.
Exception Handling
All examples declare any checked exceptions in the throws clause of the method.
The purpose of this article is to show all the different ways to read from files in Java. It's not meant to show how to handle exceptions, which will be very specific to your situation.
So instead of creating unhelpful try-catch blocks that just print exception stack traces and clutter up the code, all examples declare any checked exceptions in the calling method. This makes the code cleaner and easier to understand without sacrificing any functionality.
Future Updates
As Java file reading evolves, I will update this article with any required changes.
File Reading Methods
I organized the file reading methods into three groups:
- Classic I/O classes that have been part of Java since before JDK 1.7. This includes the java.io and java.util packages.
- New Java I/O classes that have been part of Java since JDK 1.7. This covers the java.nio.file.Files class.
- Third party I/O classes from the Apache Commons and Google Guava projects.
Classic I/O – Reading Text
1a) FileReader – Default Encoding
FileReader.read() returns one character at a time, and the examples below call it without a BufferedReader in front. It's meant for reading text files. It uses your system's default character encoding, so I have provided examples for both the default case and for specifying the encoding explicitly.
import java.io.FileReader;
import java.io.IOException;

public class ReadFile_FileReader_Read {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";

        try (FileReader fileReader = new FileReader(fileName)) {
            int singleCharInt;
            char singleChar;
            while ((singleCharInt = fileReader.read()) != -1) {
                singleChar = (char) singleCharInt;

                // display one character at a time
                System.out.print(singleChar);
            }
        }
    }
}
1b) FileReader – Explicit Encoding (InputStreamReader)
In Java 7 to 9 it's not possible to set the encoding explicitly on a FileReader, so you have to use its parent class, InputStreamReader, and wrap it around a FileInputStream (Java 11 later added FileReader constructors that accept a Charset):
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadFile_FileReader_Read_Encoding {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";

        // specify UTF-8 encoding explicitly; both streams are closed automatically
        try (FileInputStream fileInputStream = new FileInputStream(fileName);
             InputStreamReader inputStreamReader =
                     new InputStreamReader(fileInputStream, "UTF-8")) {
            int singleCharInt;
            char singleChar;
            while ((singleCharInt = inputStreamReader.read()) != -1) {
                singleChar = (char) singleCharInt;
                System.out.print(singleChar); // display one character at a time
            }
        }
    }
}
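Both FileReader examples call the no-argument read(), which returns a single character per call. A minimal sketch of the chunked alternative, assuming UTF-8 input: read(char[]) fills a whole array on each call. The class name, method name and the 8192-character chunk size are illustrative choices, not values from the article's performance tests.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadFile_InputStreamReader_ReadChunks {

    // Reads the whole stream as UTF-8, one chunk of characters per read() call.
    // The stream is closed before returning.
    static String readAll(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] chunk = new char[8192];
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int count;
            // read(char[]) returns how many chars were stored, or -1 at end of stream
            while ((count = reader.read(chunk)) != -1) {
                text.append(chunk, 0, count);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // readAll() closes the stream it is given
        System.out.print(readAll(Files.newInputStream(Paths.get("c:\\temp\\sample-10KB.txt"))));
    }
}
```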
2a) BufferedReader – Default Encoding
BufferedReader.readLine() reads an entire line at a time, instead of one character at a time like FileReader.read(). It's meant for reading text files.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadFile_BufferedReader_ReadLine {
    public static void main(String[] args) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";

        try (FileReader fileReader = new FileReader(fileName);
             BufferedReader bufferedReader = new BufferedReader(fileReader)) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
2b) BufferedReader – Explicit Encoding
In a similar way to how we set the encoding explicitly for FileReader, we need to create a FileInputStream, wrap it inside an InputStreamReader with an explicit encoding, and pass that to BufferedReader:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadFile_BufferedReader_ReadLine_Encoding {
    public static void main(String[] args) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";

        // specify UTF-8 encoding explicitly; all three resources are closed automatically
        try (FileInputStream fileInputStream = new FileInputStream(fileName);
             InputStreamReader inputStreamReader = new InputStreamReader(fileInputStream, "UTF-8");
             BufferedReader bufferedReader = new BufferedReader(inputStreamReader)) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
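The FileInputStream, InputStreamReader and BufferedReader chain above can also be created in one call with Files.newBufferedReader(Path, Charset), which exists since Java 7. A minimal sketch, assuming a UTF-8 file; the class and method names are hypothetical, and this variant was not part of the article's performance tests.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ReadFile_Files_NewBufferedReader {

    // Collects every line of a UTF-8 file using a BufferedReader obtained from Files.
    static List<String> readLines(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        // One call replaces the FileInputStream + InputStreamReader + BufferedReader chain
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String line : readLines(Paths.get("c:\\temp\\sample-10KB.txt"))) {
            System.out.println(line);
        }
    }
}
```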
Classic I/O – Reading Bytes
1) FileInputStream
FileInputStream reads in one byte at a time, without any buffering. While it's meant for reading binary files such as images or audio files, it can still be used to read text files. It's similar to reading with FileReader in that you're reading one value at a time as an int, and you need to cast that int to a char to see the ASCII character.
Unlike FileReader, FileInputStream works with raw bytes and does not apply any character encoding, so there is no default versus explicit encoding variant here. Casting each byte to a char only gives the right result for single-byte text such as ASCII; multi-byte UTF-8 characters will come out garbled.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

public class ReadFile_FileInputStream_Read {
    public static void main(String[] pArgs) throws FileNotFoundException, IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        try (FileInputStream fileInputStream = new FileInputStream(file)) {
            int singleCharInt;
            char singleChar;

            while ((singleCharInt = fileInputStream.read()) != -1) {
                singleChar = (char) singleCharInt;
                System.out.print(singleChar);
            }
        }
    }
}
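Because FileInputStream hands back raw bytes, casting each one to char only works for single-byte text such as ASCII. A minimal sketch of the usual alternative, assuming the file is UTF-8: read the bytes in 8KB chunks with read(byte[]), then decode them once with an explicit charset. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadFile_FileInputStream_ReadChunks {

    // Copies all bytes from the stream into memory, up to 8KB per read() call.
    static byte[] readBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int count;
        // read(byte[]) returns the number of bytes stored, or -1 at end of stream
        while ((count = in.read(chunk)) != -1) {
            out.write(chunk, 0, count);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        try (FileInputStream in = new FileInputStream("c:\\temp\\sample-10KB.txt")) {
            // Decode once, with an explicit charset, instead of casting each byte to char
            String text = new String(readBytes(in), StandardCharsets.UTF_8);
            System.out.print(text);
        }
    }
}
```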
2) BufferedInputStream
BufferedInputStream reads a set of bytes all at once into an internal byte array buffer. The buffer size can be set explicitly or left at the default, which is what we'll demonstrate in our example. The default buffer size is 8192 bytes (8KB) in the JDK versions tested here. All performance tests used the default buffer size.
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

public class ReadFile_BufferedInputStream_Read {
    public static void main(String[] pArgs) throws FileNotFoundException, IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        try (FileInputStream fileInputStream = new FileInputStream(file);
             BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream)) {
            int singleCharInt;
            char singleChar;
            while ((singleCharInt = bufferedInputStream.read()) != -1) {
                singleChar = (char) singleCharInt;
                System.out.print(singleChar);
            }
        }
    }
}
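The example above relies on the default buffer size. A minimal sketch of setting it explicitly through the two-argument BufferedInputStream constructor; the 64KB value and the class and method names are illustrative assumptions, and no explicit buffer size was benchmarked in this article.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFile_BufferedInputStream_BufferSize {

    // Counts the bytes in a stream, reading through a buffer of the given size.
    // The caller owns the underlying stream and is responsible for closing it.
    static long countBytes(InputStream in, int bufferSize) throws IOException {
        // The second constructor argument sets the internal buffer size in bytes
        BufferedInputStream buffered = new BufferedInputStream(in, bufferSize);
        long total = 0;
        // Each read() is served from the buffer; the underlying stream is only
        // read again once the buffer has been used up
        while (buffered.read() != -1) {
            total++;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream("c:\\temp\\sample-10KB.txt")) {
            // 64KB is an arbitrary example value
            System.out.println(countBytes(in, 64 * 1024) + " bytes read");
        }
    }
}
```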
New I/O – Reading Text
1a) Files.readAllLines() – Default Encoding
The Files class is part of the new Java I/O classes introduced in JDK 1.7. It only has static utility methods for working with files and directories.
The readAllLines() method that doesn't take a charset was introduced in JDK 1.8, so this example will not work in Java 7. Note that, despite taking no encoding argument, it does not use the system default encoding: it always decodes the file as UTF-8.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;

public class ReadFile_Files_ReadAllLines {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        List<String> fileLinesList = Files.readAllLines(file.toPath());
        for (String line : fileLinesList) {
            System.out.println(line);
        }
    }
}
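Because the one-argument readAllLines(Path) always decodes as UTF-8, a file saved in the platform's default encoding needs the two-argument overload. A minimal sketch, assuming you really do want the platform default, passes Charset.defaultCharset() explicitly; this form also works in Java 7. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class ReadFile_Files_ReadAllLines_PlatformDefault {

    // Reads all lines using the JVM's default charset instead of UTF-8.
    static List<String> readWithPlatformDefault(Path path) throws IOException {
        return Files.readAllLines(path, Charset.defaultCharset());
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("c:\\temp\\sample-10KB.txt");
        for (String line : readWithPlatformDefault(path)) {
            System.out.println(line);
        }
    }
}
```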
1b) Files.readAllLines() – Explicit Encoding
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

public class ReadFile_Files_ReadAllLines_Encoding {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        // use UTF-8 encoding
        List<String> fileLinesList = Files.readAllLines(file.toPath(), StandardCharsets.UTF_8);
        for (String line : fileLinesList) {
            System.out.println(line);
        }
    }
}
2a) Files.lines() – Default Encoding
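This code was tested and works in Java 8 and 9. It doesn't compile in Java 7, which has neither Files.lines() nor lambda expressions (both arrived in Java 8). Like the one-argument readAllLines(), Files.lines(Path) decodes the file as UTF-8 rather than using the system default encoding.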
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ReadFile_Files_Lines {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        try (Stream<String> linesStream = Files.lines(file.toPath())) {
            linesStream.forEach(line -> {
                System.out.println(line);
            });
        }
    }
}
2b) Files.lines() – Explicit Encoding
Just like in the previous example, this code was tested and works in Java 8 and 9, but not in Java 7.
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ReadFile_Files_Lines_Encoding {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        try (Stream<String> linesStream = Files.lines(file.toPath(), StandardCharsets.UTF_8)) {
            linesStream.forEach(line -> {
                System.out.println(line);
            });
        }
    }
}
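Files.lines() returns a lazily populated Stream, so lines can be processed as they are read instead of being collected into a List first; that is plausibly why it coped with the 1GB test file when readAllLines() did not. A minimal sketch, assuming a UTF-8 file, that counts matching lines; the class name, method name and search text are illustrative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ReadFile_Files_Lines_Filter {

    // Counts lines containing the given text without holding the whole file in memory.
    static long countLinesContaining(Path path, String text) throws IOException {
        // Lines are read on demand; try-with-resources closes the underlying file
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.contains(text)).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("c:\\temp\\sample-10KB.txt");
        System.out.println(countLinesContaining(path, "java"));
    }
}
```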
3a) Scanner – Default Encoding
The Scanner class, part of java.util since Java 5, can be used to read from files or from the console (user input).
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ReadFile_Scanner_NextLine {
    public static void main(String[] pArgs) throws FileNotFoundException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        try (Scanner scanner = new Scanner(file)) {
            String line;
            while (scanner.hasNextLine()) {
                line = scanner.nextLine();
                System.out.println(line);
            }
        }
    }
}
3b) Scanner – Explicit Encoding
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ReadFile_Scanner_NextLine_Encoding {
    public static void main(String[] pArgs) throws FileNotFoundException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        // use UTF-8 encoding
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            String line;
            while (scanner.hasNextLine()) {
                line = scanner.nextLine();
                System.out.println(line);
            }
        }
    }
}
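What Scanner adds over BufferedReader is tokenizing: besides whole lines, it can hand back typed values such as int. A minimal sketch, assuming a hypothetical file c:\temp\numbers.txt that contains whitespace-separated numbers; the class and method names are also hypothetical.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ReadFile_Scanner_NextInt {

    // Sums every whitespace-separated integer token, skipping tokens that are not integers.
    static long sumIntegers(Scanner scanner) {
        long sum = 0;
        while (scanner.hasNext()) {
            if (scanner.hasNextInt()) {
                sum += scanner.nextInt();
            } else {
                scanner.next(); // discard a non-integer token
            }
        }
        return sum;
    }

    public static void main(String[] args) throws FileNotFoundException {
        try (Scanner scanner = new Scanner(new File("c:\\temp\\numbers.txt"), "UTF-8")) {
            System.out.println(sumIntegers(scanner));
        }
    }
}
```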
New I/O – Reading Bytes
Files.readAllBytes()
Even though the documentation for this method states that "it is not intended for reading in large files", I found this to be the absolute best performing file reading method, even on files as large as 1GB.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class ReadFile_Files_ReadAllBytes {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        byte[] fileBytes = Files.readAllBytes(file.toPath());
        char singleChar;
        for (byte b : fileBytes) {
            // casting a byte to char is only correct for ASCII text (values 0-127)
            singleChar = (char) b;
            System.out.print(singleChar);
        }
    }
}
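Casting each byte to char, as the example above does, only gives correct output for single-byte text. A minimal sketch, assuming the file is UTF-8, that decodes the whole array in one step with new String(byte[], Charset); the class and method names are hypothetical. The whole file still has to fit in one byte array: the Files.readAllBytes() documentation notes that it throws OutOfMemoryError when the array cannot be allocated, for example for a file larger than 2GB.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadFile_Files_ReadAllBytes_Decode {

    // Reads the file in one call and decodes the bytes as UTF-8.
    static String readText(Path path) throws IOException {
        byte[] fileBytes = Files.readAllBytes(path);
        return new String(fileBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.print(readText(Paths.get("c:\\temp\\sample-10KB.txt")));
    }
}
```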
Third Party I/O – Reading Text
Commons – FileUtils.readLines()
Apache Commons IO is an open source Java library that comes with utility classes for reading and writing text and binary files. I listed it in this article because it can be used instead of the built-in Java libraries. The class we're using is FileUtils.
For this article, version 2.6 was used, which is compatible with JDK 1.7+.
Note that you need to specify the encoding explicitly, because the method that uses the default encoding has been deprecated.
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.commons.io.FileUtils;

public class ReadFile_Commons_FileUtils_ReadLines {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        List<String> fileLinesList = FileUtils.readLines(file, "UTF-8");
        for (String line : fileLinesList) {
            System.out.println(line);
        }
    }
}
Guava – Files.readLines()
Google Guava is an open source library that comes with utility classes for common tasks like collections handling, cache management, IO operations and string processing.
I listed it in this article because it can be used instead of the built-in Java libraries, and I wanted to compare its performance with them.
For this article, version 23.0 was used.
I'm not going to examine all the different ways to read files with Guava, since this article is not meant for that. For a more detailed look at all the different ways to read and write files with Guava, have a look at Baeldung's in-depth article.
When reading a file, Guava requires that the character encoding be set explicitly, just like Apache Commons.
Compatibility note: This code was tested successfully on Java 8 and 9. I couldn't get it to work on Java 7 and kept getting an "Unsupported major.minor version 52.0" error. Class file version 52.0 corresponds to Java 8, which means the Guava 23.0 classes were compiled for Java 8 and cannot be loaded by a Java 7 JVM. Guava has a separate API doc for Java 7, which uses a slightly different version of the Files.readLines() method. I thought I could get it to work, but I kept getting that error.
import java.io.File;
import java.io.IOException;
import java.util.List;

import com.google.common.base.Charsets;
import com.google.common.io.Files;

public class ReadFile_Guava_Files_ReadLines {
    public static void main(String[] args) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        List<String> fileLinesList = Files.readLines(file, Charsets.UTF_8);
        for (String line : fileLinesList) {
            System.out.println(line);
        }
    }
}
Performance Testing
Since there are so many ways to read from a file in Java, a natural question is "What file reading method is the best for my situation?" So I decided to test these methods against each other using sample data files of different sizes and timing the results.
Each code sample in this article reads the file and prints its contents to the console (System.out). However, during the performance tests the System.out line was commented out, since it would seriously slow down each method.
Each performance test measures the time it takes to read in the file (line by line, character by character, or byte by byte) without displaying anything to the console. I ran each test 5-10 times and took the average to reduce the influence of outliers. I also used the default encoding version of each file reading method, i.e. I didn't specify the encoding explicitly.
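The article does not include its timing harness, so the following is only a hypothetical sketch of the approach described above: time one read with System.nanoTime(), repeat it, and average, with nothing printed inside the timed code. The class and method names are made up, and BufferedReader.readLine() stands in for whichever method is being measured. A rigorous benchmark would also account for JIT warm-up, for example with a harness such as JMH.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadTimer {

    // Stand-in for one file reading method under test; returns the line count.
    static long readWithBufferedReader(Path path) throws IOException {
        long lines = 0;
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            while (reader.readLine() != null) {
                lines++; // no System.out here, mirroring the timed runs
            }
        }
        return lines;
    }

    // Runs the read `runs` times and returns the mean elapsed time in milliseconds.
    static double averageMillis(Path path, int runs) throws IOException {
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            readWithBufferedReader(path);
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (runs * 1_000_000.0);
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("c:\\temp\\sample-10KB.txt");
        System.out.printf("average: %.3f ms%n", averageMillis(path, 10));
    }
}
```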
Dev Setup
The dev environment used for these tests:
- Intel Core i7-3615QM @ 2.3 GHz, 8GB RAM
- Windows 8 x64
- Eclipse IDE for Java Developers, Oxygen.2 Release (4.7.2)
- Java SE 9 (jdk-9.0.4)
Data Files
GitHub doesn't allow pushing files larger than 100 MB, so I couldn't find a practical way to store my large test files to allow others to replicate my tests. So instead of storing them, I'm providing the tools I used to generate them, so you can create test files that are similar in size to mine. Obviously they won't be the same, but they will be similar in size to the ones I used in my performance tests.
Random String Generator was used to generate sample text, and then I just copy-pasted it to create larger versions of the file. When the file started getting too large to manage within a text editor, I had to use the command line to merge multiple text files into a larger text file:
re-create *.txt sample-1GB.txt
I created the following 7 data file sizes to test each file reading method across a range of file sizes:
- 1KB
- 10KB
- 100KB
- 1MB
- 10MB
- 100MB
- 1GB
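As an alternative to copy-pasting and merging files, a short program can write a file of an exact size directly. This is a hypothetical sketch, not the tool used for the article: it writes lines of random lowercase letters and digits in ASCII, treats 1KB as 1024 bytes, and uses an arbitrary seed; the class name, method name and output file names are assumptions.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

public class SampleFileGenerator {

    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

    // Writes lines of random ASCII characters until the file is exactly targetBytes long.
    static void generate(Path path, long targetBytes, long seed) throws IOException {
        Random random = new Random(seed);
        long written = 0;
        try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.US_ASCII)) {
            while (written < targetBytes) {
                // At most 80 characters plus '\n', and never more than the remaining bytes
                int lineLength = (int) Math.min(80, targetBytes - written - 1);
                StringBuilder line = new StringBuilder(lineLength);
                for (int i = 0; i < lineLength; i++) {
                    line.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
                }
                writer.write(line.toString());
                writer.write('\n');
                written += lineLength + 1; // ASCII: one byte per character
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long[] sizes = {1L << 10, 10L << 10, 100L << 10, 1L << 20, 10L << 20, 100L << 20, 1L << 30};
        String[] names = {"1KB", "10KB", "100KB", "1MB", "10MB", "100MB", "1GB"};
        for (int i = 0; i < sizes.length; i++) {
            generate(Paths.get("c:\\temp\\sample-" + names[i] + ".txt"), sizes[i], 42L);
        }
    }
}
```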
Performance Summary
There were some surprises and some expected results from the performance tests.
As expected, the worst performers were the methods that read in a file character by character or byte by byte. But what surprised me was that the native Java IO libraries outperformed both third party libraries, Apache Commons IO and Google Guava.
What's more, both Google Guava and Apache Commons IO threw a java.lang.OutOfMemoryError when trying to read in the 1GB test file. This also happened with the Files.readAllLines(Path) method, but the remaining 7 methods were able to read in all test files, including the 1GB test file.
The following table summarizes the average time (in milliseconds) each file reading method took to complete. I highlighted the top 3 methods in green, the average performing methods in yellow and the worst performing methods in red:
The following chart summarizes the above table, but with the following changes:
- I removed java.io.FileInputStream.read() from the chart because its performance was so bad it would skew the entire chart and you wouldn't see the other lines properly.
- I summarized the data from 1KB to 1MB because after that, the chart would get too skewed with so many underperformers, and also some methods threw a java.lang.OutOfMemoryError at 1GB.
The Winners
The new Java I/O libraries (java.nio) had the best overall winner (java.nio.file.Files.readAllBytes()), but it was followed closely by BufferedReader.readLine(), which was also a proven top performer across the board. The other excellent performer was java.nio.file.Files.lines(Path), which had slightly worse numbers for smaller test files but really excelled with the larger test files.
The absolute fastest file reader across all data tests was java.nio.file.Files.readAllBytes(Path). It was consistently the fastest, and even reading a 1GB file only took about 1 second.
The following chart compares performance for a 100KB test file:
You can see that the lowest times were for Files.readAllBytes(), BufferedInputStream.read() and BufferedReader.readLine().
The following chart compares performance for reading a 10MB file. I didn't bother including the bar for FileInputStream.read() because the performance was so bad it would skew the entire chart and you couldn't tell how the other methods performed relative to each other:
Files.readAllBytes() really outperforms all other methods, and BufferedReader.readLine() is a distant second.
The Losers
As expected, the absolute worst performer was java.io.FileInputStream.read() which was orders of magnitude slower than its rivals for most tests. FileReader.read() was also a poor performer for the same reason – reading files byte past byte (or character past character) instead of with buffers drastically degrades operation.
Both the Apache Eatables IO FileUtils.readLines() and Guava Files.readLines() crashed with an OutOfMemoryError when trying to read the 1GB examination file and they were most average in performance for the remaining test files.
java.nio.Files.readAllLines() besides crashed when trying to read the 1GB test file simply information technology performed quite well for smaller file sizes.
Performance Rankings
Hither'south a ranked list of how well each file reading method did, in terms of speed and handling of large files, also equally compatibility with different Java versions.
| Rank | File Reading Method |
|---|---|
| 1 | coffee.nio.file.Files.readAllBytes() |
| two | java.io.BufferedFileReader.readLine() |
| 3 | coffee.nio.file.Files.lines() |
| 4 | java.io.BufferedInputStream.read() |
| v | java.util.Scanner.nextLine() |
| 6 | java.nio.file.Files.readAllLines() |
| 7 | org.apache.commons.io.FileUtils.readLines() |
| viii | com.google.common.io.Files.readLines() |
| 9 | java.io.FileReader.read() |
| ten | java.io.FileInputStream.Read() |
Conclusion
I tried to present a comprehensive set of methods for reading files in Java, both text and binary. We looked at 15 different ways of reading files in Java, and we ran performance tests to see which methods are the fastest.
The new Java IO library (java.nio) proved to be a great performer, but so was the classic BufferedReader.
Source: https://funnelgarden.com/java_read_file/