Preface
In daily work, it's often necessary to determine a file's type. This summary outlines two common principles for identifying file types:
- Based on File Extension
  - Advantages: fast, simple code.
  - Disadvantages: cannot detect the true type of forged files or of files without an extension.
- Based on the First Few Bytes of the File Stream (the "magic number")
  - Advantages: can identify the true file type.
  - Disadvantages: slower, more complex code.
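To make the difference between the two principles concrete, here is a minimal JDK-only sketch of both. The class TwoPrinciples, its two-entry extension table and its method names are hypothetical, made up for illustration; the byte check only knows the 8-byte PNG signature.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class, not part of any library
class TwoPrinciples {

    // Principle 1: a (deliberately tiny) extension -> MIME type table
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("png", "image/png");
        BY_EXTENSION.put("doc", "application/msword");
    }

    // Principle 2: the 8-byte signature every PNG file starts with
    private static final byte[] PNG_SIGNATURE = {
        (byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'
    };

    /** Guesses the type from the file name alone; returns null if unknown. */
    static String byExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return null; // no extension: this principle has nothing to go on
        }
        return BY_EXTENSION.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    /** Guesses the type from the leading bytes of the content; returns null if unknown. */
    static String byMagicNumber(byte[] head) {
        if (head.length < PNG_SIGNATURE.length) {
            return null;
        }
        for (int i = 0; i < PNG_SIGNATURE.length; i++) {
            if (head[i] != PNG_SIGNATURE[i]) {
                return null;
            }
        }
        return "image/png";
    }

    public static void main(String[] args) {
        byte[] head = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};
        // A PNG renamed to test.doc: the two principles disagree
        System.out.println(byExtension("test.doc")); // application/msword
        System.out.println(byMagicNumber(head));     // image/png
    }
}
```

The extension lookup never touches the file, which is why it is fast; the byte check has to read the content, but it cannot be fooled by a rename.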
To illustrate, tests were conducted with the following files:
- test.png: a standard PNG file.
- test.doc: a copy of test.png renamed to test.doc.
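The article does not say how the test files were produced. One way to recreate them, sketched here as an assumption, is to write a tiny image with ImageIO and copy it under the .doc name. The class name TestFiles and the use of a temporary directory are hypothetical choices; the article itself uses d:/.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.imageio.ImageIO;

// Hypothetical helper that recreates the two test files
class TestFiles {

    /** Writes a tiny PNG as test.png into dir, plus a byte-for-byte copy named test.doc. */
    static void create(Path dir) throws IOException {
        BufferedImage image = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        Path png = dir.resolve("test.png");
        // ImageIO.write returns false when no writer for the format is available
        if (!ImageIO.write(image, "png", png.toFile())) {
            throw new IOException("No PNG writer available");
        }
        // Identical bytes, misleading extension
        Files.copy(png, dir.resolve("test.doc"), StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("filetype-test");
        create(dir);
        System.out.println("Test files written to " + dir);
    }
}
```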
1. Using Files.probeContentType
Introduced in Java 7, the Files.probeContentType method detects MIME types.
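import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
public static void test() throws IOException {
    Path path = Paths.get("d:/test.png");
    // May return null if no detector recognizes the file
    String mimeType = Files.probeContentType(path);
    System.out.println(mimeType);
}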
Results
File | Result | Conclusion |
---|---|---|
test.png | image/png | ✔️ |
test.doc | application/msword | ❌ |
- Mechanism: asks each installed FileTypeDetector implementation in turn and, if none of them recognizes the file, falls back to the platform's default detector.
- Limitation: the result depends on the operating system and its configuration.

Conclusion: This method relies on file extensions.
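Because the mechanism is pluggable, a content-based detector can be plugged in. The sketch below, with the hypothetical class PngSignatureDetector, extends java.nio.file.spi.FileTypeDetector and recognizes only the PNG signature. Files.probeContentType would pick it up only if the class were made public and listed in a META-INF/services/java.nio.file.spi.FileTypeDetector file on the classpath; here it is simply called directly.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;
import java.util.Arrays;

// Hypothetical detector; make it public and register it as a service to use it via Files.probeContentType
class PngSignatureDetector extends FileTypeDetector {

    private static final byte[] PNG_SIGNATURE = {
        (byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'
    };

    @Override
    public String probeContentType(Path path) throws IOException {
        byte[] head = new byte[PNG_SIGNATURE.length];
        int n = 0;
        try (InputStream in = Files.newInputStream(path)) {
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
        }
        // null means "not recognized", letting other detectors have a go
        return n == head.length && Arrays.equals(head, PNG_SIGNATURE) ? "image/png" : null;
    }
}
```

The extension of the path is never consulted, so a PNG renamed to test.doc would be reported as image/png.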
2. Using URLConnection
The URLConnection class offers several APIs for detecting MIME types.
2.1 Using getContentType
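import java.io.File;
import java.io.IOException;
import java.net.URLConnection;
public void test() throws IOException {
    File file = new File("d:/test.png");
    // File.toURL() is deprecated; convert via toURI() instead
    URLConnection connection = file.toURI().toURL().openConnection();
    String mimeType = connection.getContentType();
    System.out.println(mimeType);
}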
Results
File | Result | Conclusion |
---|---|---|
test.png | image/png | ✔️ |
test.doc | image/png | ✔️ |
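- Conclusion: Reported the true type of test.doc here, but it is slow because it opens the file. Note that for file: URLs the JDK appears to look up the extension first and sniff the stream only when the extension is unknown (as .doc is in the default table), so it behaves as a hybrid of the two principles.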
2.2 Using guessContentTypeFromName
import java.io.File;
import java.net.URLConnection;
public void test() {
    File file = new File("d:/test.png");
    String mimeType = URLConnection.guessContentTypeFromName(file.getName());
    System.out.println(mimeType);
}
Results
File | Result | Conclusion |
---|---|---|
test.png | image/png | ✔️ |
test.doc | null | ❌ Please refer to 2.4 below for details |
This method looks up the MIME type in the internal FileNameMap.
- Conclusion: Relies on file extensions.
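Since guessContentTypeFromName is a thin wrapper over the shared FileNameMap, the two lookups should always agree. In this sketch the class name NameLookup and the sample names are illustrative; the files do not need to exist, because only the names are examined.

```java
import java.net.FileNameMap;
import java.net.URLConnection;
import java.util.Objects;

// Hypothetical demo class
class NameLookup {

    public static void main(String[] args) {
        FileNameMap map = URLConnection.getFileNameMap();
        // Only the names matter here; the files do not have to exist
        for (String name : new String[] {"test.png", "test.doc", "README"}) {
            String viaGuess = URLConnection.guessContentTypeFromName(name);
            String viaMap = map.getContentTypeFor(name);
            System.out.println(name + " -> " + viaGuess
                    + " (agrees with FileNameMap: " + Objects.equals(viaGuess, viaMap) + ")");
        }
    }
}
```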
2.3 Using guessContentTypeFromStream
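import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URLConnection;
public static void test() throws Exception {
    // BufferedInputStream supplies the mark/reset support the method needs
    try (InputStream in = new BufferedInputStream(new FileInputStream("d:/test.doc"))) {
        System.out.println(URLConnection.guessContentTypeFromStream(in));
    }
}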
Results
File | Result | Conclusion |
---|---|---|
test.png | image/png | ✔️ |
test.doc | image/png | ✔️ |
- Conclusion: Detects the true file type by analyzing the file stream.
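The method only peeks at the first bytes of the stream and then resets it, so it needs a stream that supports mark/reset. A plain FileInputStream does not, which is why the code above wraps it in a BufferedInputStream. This sketch (StreamSniffing is a hypothetical name) uses in-memory data instead of d:/test.doc to show both cases.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

// Hypothetical demo class
class StreamSniffing {

    // First bytes of any PNG file
    static final byte[] PNG_HEADER = {
        (byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'
    };

    public static void main(String[] args) throws IOException {
        // ByteArrayInputStream supports mark/reset, so the JDK can peek at the header
        InputStream markable = new ByteArrayInputStream(PNG_HEADER);
        System.out.println(URLConnection.guessContentTypeFromStream(markable)); // image/png

        // A stream without mark support: the JDK does not try to guess and returns null
        InputStream unmarkable = new InputStream() {
            @Override
            public int read() {
                return -1;
            }
        };
        System.out.println(URLConnection.guessContentTypeFromStream(unmarkable)); // null
    }
}
```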
2.4 Using getFileNameMap
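import java.io.File;
import java.net.FileNameMap;
import java.net.URLConnection;
public void test() {
    File file = new File("d:/test.png");
    FileNameMap fileNameMap = URLConnection.getFileNameMap();
    String mimeType = fileNameMap.getContentTypeFor(file.getName());
    System.out.println(mimeType);
}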
Results
File | Result | Conclusion |
---|---|---|
test.png | image/png | ✔️ |
test.doc | null | ❌ |
This method returns the MIME type table shared by all URLConnection instances; the table is then used to look up the type of a file from its name.
The built-in table is quite limited. By default it is loaded from the content-types.properties file (located in the JRE_HOME/lib directory on Java 8 and earlier; bundled inside the JDK in later versions). We can supply our own table through the content.types.user.table system property. Note that the user table replaces the default one rather than extending it, and the property must be set before the table is first loaded:
System.setProperty("content.types.user.table","<path-to-file>");
Conclusion: Relies on file extensions.
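Besides the content.types.user.table property, the table can be replaced programmatically with URLConnection.setFileNameMap. The sketch below keeps the default table and adds a single extension; the class name CustomFileNameMap, the .foo extension and the application/x-foo type are hypothetical, chosen only for illustration. Because the map is global, this affects every caller of guessContentTypeFromName and getFileNameMap in the JVM.

```java
import java.net.FileNameMap;
import java.net.URLConnection;
import java.util.Locale;

// Hypothetical demo class
class CustomFileNameMap {

    /** Adds one hypothetical extension and delegates everything else to the default table. */
    static void install() {
        final FileNameMap defaults = URLConnection.getFileNameMap();
        URLConnection.setFileNameMap(fileName -> {
            if (fileName.toLowerCase(Locale.ROOT).endsWith(".foo")) {
                return "application/x-foo"; // hypothetical type, for illustration only
            }
            return defaults.getContentTypeFor(fileName);
        });
    }

    public static void main(String[] args) {
        install();
        System.out.println(URLConnection.guessContentTypeFromName("data.foo")); // application/x-foo
        System.out.println(URLConnection.guessContentTypeFromName("test.png")); // image/png
    }
}
```

Unlike the system property, this approach extends the default table instead of replacing it, but it is still purely extension-based.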
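3. Using MimetypesFileTypeMap
This class shipped with the JDK from Java 6 as part of the JavaBeans Activation Framework (javax.activation); it was removed from the JDK in Java 11, so newer versions need the activation library as a separate dependency. It uses mime.types files for MIME type detection.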
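import java.io.File;
import javax.activation.MimetypesFileTypeMap;
public void test() {
    File file = new File("d:/test.png");
    MimetypesFileTypeMap fileTypeMap = new MimetypesFileTypeMap();
    // Looks up the extension only; unknown extensions yield application/octet-stream
    String mimeType = fileTypeMap.getContentType(file.getName());
    System.out.println(mimeType);
}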
Results
File | Result | Conclusion |
---|---|---|
test.png | image/png | ✔️ |
test.doc | application/octet-stream | ❌ |
We can pass either the file name or the File instance itself to getContentType; the overload that takes a File simply delegates to the one that takes the file name.
Internally, the map is built from mime.types-style entries, which it looks for in the following order:
- Entries added programmatically to the MimetypesFileTypeMap instance
- The file .mime.types in the user's home directory
- The file <java.home>/lib/mime.types
- Resources named META-INF/mime.types
- The resource META-INF/mimetypes.default (usually found only in the activation.jar file)

If none of these entries matches the file's extension, the method returns application/octet-stream, which is what happened with test.doc above.
Conclusion: The file type is determined based on the file extension.
4. Using jMimeMagic
jMimeMagic is a third-party library for detecting MIME types.
Configure the Maven dependency:
<dependency>
<groupId>net.sf.jmimemagic</groupId>
<artifactId>jmimemagic</artifactId>
<version>0.1.5</version>
</dependency>
We can find the latest version of this library on Maven Central.
Next, let’s explore how to use this library:
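import java.io.File;
import net.sf.jmimemagic.Magic;
import net.sf.jmimemagic.MagicMatch;
public void test() throws Exception {
    File file = new File("d:/test.doc");
    // false: do not use the file extension as a hint, inspect the content only
    MagicMatch match = Magic.getMagicMatch(file, false);
    System.out.println(match.getMimeType());
}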
The library can also work on in-memory data (Magic.getMagicMatch has an overload that takes a byte[]), so the file does not need to exist in the file system.
Results
File | Result | Conclusion |
---|---|---|
test.png | image/png | ✔️ |
test.doc | image/png | ✔️ |
- Conclusion: Detects true file types based on file streams.
5. Using Apache Tika
Apache Tika is a toolkit that detects and extracts metadata and text from many kinds of files. Its API is rich, but for detecting MIME types the tika-core module alone is enough.
Configure the Maven dependency:
<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-core</artifactId>
<version>1.18</version>
</dependency>
Next, we use the detect() method to determine the type:
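import java.io.File;
import java.io.IOException;
import org.apache.tika.Tika;
public void whenUsingTika_thenSuccess() throws IOException {
    File file = new File("d:/test.doc");
    Tika tika = new Tika();
    // Detection uses the file content, with the extension as an additional hint
    String mimeType = tika.detect(file);
    System.out.println(mimeType);
}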
Results
File | Result | Conclusion |
---|---|---|
test.png | image/png | ✔️ |
test.doc | image/png | ✔️ |
- Conclusion: Accurately detects true file types using file streams.
Summary
The classification based on the detection principles is summarized as follows:
Detection Principle | Methods |
---|---|
Based on File Extension | Files.probeContentType, URLConnection.guessContentTypeFromName, URLConnection.getFileNameMap, MimetypesFileTypeMap |
Based on File Stream | URLConnection.getContentType, URLConnection.guessContentTypeFromStream, jMimeMagic, Apache Tika |
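Because each principle has blind spots, one possible compromise (not part of the comparison above) is to check the content first and fall back to the name. Here is a minimal JDK-only sketch with the hypothetical class CombinedDetector: it tries guessContentTypeFromStream, then guessContentTypeFromName, then application/octet-stream. The JDK's built-in sniffer recognizes only a handful of formats, so for broader coverage the same structure could call Apache Tika instead.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper combining both detection principles
class CombinedDetector {

    /** Content first, name second, generic binary type as a last resort. */
    static String detect(Path path) throws IOException {
        // BufferedInputStream adds the mark/reset support the sniffer needs
        try (InputStream in = new BufferedInputStream(Files.newInputStream(path))) {
            String fromContent = URLConnection.guessContentTypeFromStream(in);
            if (fromContent != null) {
                return fromContent;
            }
        }
        String fromName = URLConnection.guessContentTypeFromName(path.getFileName().toString());
        return fromName != null ? fromName : "application/octet-stream";
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + " -> " + detect(Paths.get(arg)));
        }
    }
}
```

For example, running it as java CombinedDetector d:/test.doc should report the content-based type for the renamed PNG.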