DEV Community

Excalibra


Summary of Methods to Obtain MIME Types of Files in Java

Preface

In day-to-day development, we often need to determine a file's type. This summary outlines the two common principles for identifying it:

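  1. Based on File Extension

    • Advantages: Fast, simple code.
    • Disadvantages: Cannot detect the true type of a forged file (one whose extension was changed) or of a file without an extension.
  2. Based on the First Few Bytes of the File Content

    • Advantages: Can identify the true file type, because many formats begin with a fixed signature (a "magic number"), such as the 8-byte header that starts every PNG file.
    • Disadvantages: Slower, because the file must be opened and read, and the code is more complex.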
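To make the two principles concrete, here is a minimal JDK-only sketch of each (it assumes Java 9+ for Map.of). The class name DetectionPrinciples, the two-entry extension table, and the single PNG signature check are illustrative assumptions; real detectors use far larger tables and signature databases.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;

public class DetectionPrinciples {

    // Hypothetical, deliberately tiny extension table for illustration only.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "png", "image/png",
            "doc", "application/msword");

    // The 8-byte signature that every PNG file starts with.
    private static final byte[] PNG_SIGNATURE = {
            (byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    // Principle 1: look only at the name. Fast, but trusts the extension.
    static String byExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return null; // no extension, nothing to go on
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.get(ext);
    }

    // Principle 2: compare the first bytes of the content with a known signature.
    static String byContent(byte[] head) {
        if (head.length >= PNG_SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(head, PNG_SIGNATURE.length), PNG_SIGNATURE)) {
            return "image/png";
        }
        return null; // signature not recognized
    }

    public static void main(String[] args) {
        byte[] pngBytes = Arrays.copyOf(PNG_SIGNATURE, 16); // fake PNG header, zero-padded
        System.out.println(byExtension("test.doc")); // application/msword
        System.out.println(byContent(pngBytes));     // image/png
    }
}
```

Applied to the same forged file, the two functions disagree: the name claims a Word document while the bytes say PNG, which is exactly the difference the tests below expose.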

Each method below was tested with the following two files:

  • test.png: A standard PNG file named test.png.
  • test.doc: A copy of test.png renamed to test.doc.
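For readers who want to reproduce the tests, the forged file can be created by copying the PNG under a .doc name. A minimal sketch; the class name MakeForgedFile is hypothetical, and it assumes the original PNG already exists:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class MakeForgedFile {

    // Copies a real PNG to "test.doc" in the same directory:
    // identical bytes, misleading extension.
    static Path forge(Path png) throws IOException {
        Path doc = png.resolveSibling("test.doc");
        return Files.copy(png, doc, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

For example, forge(Paths.get("d:/test.png")) produces d:/test.doc.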

1. Using Files.probeContentType

Introduced in Java 7, the Files.probeContentType method detects MIME types.

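import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public static void test() throws IOException {
    Path path = Paths.get("d:/test.png");
    String mimeType = Files.probeContentType(path); // null if the type cannot be determined
    System.out.println(mimeType);
}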

Results

File      Result              Conclusion
test.png  image/png           ✔️
test.doc  application/msword  ❌
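  • Mechanism: Queries the installed FileTypeDetector implementations in turn, then falls back to a platform-specific default detector.
  • Limitation: Results depend on the operating system, and the method returns null when no detector can determine the type.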

Conclusion: With the default detectors, this method relies on file extensions.
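Because Files.probeContentType asks installed FileTypeDetector implementations before the platform default, it can be made content-aware by registering a custom detector. A minimal sketch, assuming Java 9+ (for InputStream.readNBytes); the class PngSignatureDetector is hypothetical and only recognizes PNG:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;
import java.util.Arrays;

// Hypothetical detector that recognizes PNG by its signature, not its name.
public class PngSignatureDetector extends FileTypeDetector {

    private static final byte[] PNG_SIGNATURE = {
            (byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    @Override
    public String probeContentType(Path path) throws IOException {
        byte[] head = new byte[PNG_SIGNATURE.length];
        int read;
        try (InputStream in = Files.newInputStream(path)) {
            read = in.readNBytes(head, 0, head.length);
        }
        // Returning null means "not recognized", so the next detector
        // (and finally the platform default) gets a chance.
        return read == head.length && Arrays.equals(head, PNG_SIGNATURE) ? "image/png" : null;
    }
}
```

To install it, list its fully qualified class name in a META-INF/services/java.nio.file.spi.FileTypeDetector file on the classpath; Files.probeContentType loads detectors through the service-provider mechanism.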


2. Using URLConnection

The URLConnection class offers several APIs for detecting MIME types.

2.1 Using getContentType

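import java.io.File;
import java.io.IOException;
import java.net.URLConnection;

public void test() throws IOException {
    File file = new File("d:/test.png");
    // File.toURL() is deprecated; convert through toURI() instead
    URLConnection connection = file.toURI().toURL().openConnection();
    String mimeType = connection.getContentType();
}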

Results

File      Result     Conclusion
test.png  image/png  ✔️
test.doc  image/png  ✔️
  • Conclusion: Detects the true file type, but is slow.
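For file: URLs, getContentType may open the file to inspect it, so it is worth closing the connection's stream afterwards. In OpenJDK, the file: handler also appears to consult the extension table first and sniff the leading bytes only when the extension is unknown, which is consistent with the results above (2.4 shows .doc has no entry in the table). Treat that as an observation about the implementation, not documented behavior. A minimal sketch; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import java.nio.file.Path;

public class ConnectionContentType {

    static String contentType(Path path) throws IOException {
        URLConnection connection = path.toUri().toURL().openConnection();
        // Opening the stream explicitly lets try-with-resources release the
        // file handle that the connection holds, even after sniffing.
        try (InputStream in = connection.getInputStream()) {
            return connection.getContentType();
        }
    }
}
```

For example, contentType(Paths.get("d:/test.doc")) corresponds to the test.doc row in the table above.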

2.2 Using guessContentTypeFromName

import java.io.File;
import java.net.URLConnection;

public void test() {
    File file = new File("d:/test.png");
    String mimeType = URLConnection.guessContentTypeFromName(file.getName());
    System.out.println(mimeType);
}

Results

File      Result     Conclusion
test.png  image/png  ✔️
test.doc  null       ❌ (see 2.4 below for details)

Internally, this method looks up the file's extension in the FileNameMap returned by URLConnection.getFileNameMap().

  • Conclusion: Relies on file extensions.

2.3 Using guessContentTypeFromStream

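import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.net.URLConnection;

public static void test() throws Exception {
    // BufferedInputStream provides the mark/reset support the method needs
    try (BufferedInputStream in = new BufferedInputStream(new FileInputStream("d:/test.doc"))) {
        System.out.println(URLConnection.guessContentTypeFromStream(in));
    }
}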

Results

File      Result     Conclusion
test.png  image/png  ✔️
test.doc  image/png  ✔️
  • Conclusion: Detects the true file type by analyzing the file stream.
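guessContentTypeFromStream expects a stream that supports mark/reset, so it can peek at the first bytes and rewind; in OpenJDK it returns null for a stream without mark support. That is why the code above wraps the FileInputStream in a BufferedInputStream. A minimal sketch that applies the wrapping only when needed; the class StreamSniffing and the helper sniff are hypothetical:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

public class StreamSniffing {

    // Wraps the stream only if it cannot mark/reset on its own.
    static String sniff(InputStream in) throws IOException {
        InputStream markable = in.markSupported() ? in : new BufferedInputStream(in);
        return URLConnection.guessContentTypeFromStream(markable);
    }

    public static void main(String[] args) throws IOException {
        byte[] pngHead = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};
        System.out.println(sniff(new ByteArrayInputStream(pngHead))); // image/png
    }
}
```

When sniff has to wrap a stream, the peeked bytes sit in the wrapper's buffer, so a caller that wants to keep reading the content should create the BufferedInputStream itself and pass that in.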

2.4 Using getFileNameMap

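import java.io.File;
import java.net.FileNameMap;
import java.net.URLConnection;

public void test() {
    File file = new File("d:/test.png");
    FileNameMap fileNameMap = URLConnection.getFileNameMap();
    String mimeType = fileNameMap.getContentTypeFor(file.getName());
    System.out.println(mimeType);
}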

Results

File      Result     Conclusion
test.png  image/png  ✔️
test.doc  null       ❌

getFileNameMap returns the file-name map shared by all URLConnection instances, and getContentTypeFor then looks up the MIME type by the file's extension.

The built-in MIME table used by URLConnection is quite limited, which is why test.doc yields null here.

By default, this class uses a content-types.properties table that ships with the JRE (on older JDKs it is located in the JRE_HOME/lib directory). We can customize it by pointing the content.types.user.table system property at a user-specific table:

System.setProperty("content.types.user.table","<path-to-file>");
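A minimal sketch of supplying such a table, assuming Java 11+ (for Files.writeString). The table uses the same properties format as the JDK's own content-types.properties; the application/msword entry and the class name UserMimeTable are illustrative. Two caveats: the property is read only when the table is first loaded, so it must be set before anything calls getFileNameMap() or guessContentTypeFromName(); and, depending on the JDK version, the user table may replace the built-in one rather than extend it, so it is safer to include every entry you still need.

```java
import java.io.IOException;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;

public class UserMimeTable {

    // Writes a one-entry table and points the JDK at it. Must run before the
    // MIME table is first loaded, or the property has no effect.
    static void installUserTable() throws IOException {
        Path table = Files.createTempFile("content-types", ".properties");
        Files.writeString(table,
                "application/msword: description=Microsoft Word Document;file_extensions=.doc\n");
        System.setProperty("content.types.user.table", table.toString());
    }

    public static void main(String[] args) throws IOException {
        installUserTable();
        System.out.println(URLConnection.getFileNameMap().getContentTypeFor("report.doc"));
    }
}
```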

Conclusion: Relies on file extensions.


3. Using MimetypesFileTypeMap

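Available since Java 6 in the javax.activation package, this class resolves MIME types from mime.types files. javax.activation was removed from the JDK in Java 11, so on newer versions the JavaBeans Activation Framework (or Jakarta Activation) must be added as a separate dependency.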

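import java.io.File;
import javax.activation.MimetypesFileTypeMap; // jakarta.activation in newer releases

public void test() {
    File file = new File("d:/test.png");
    MimetypesFileTypeMap fileTypeMap = new MimetypesFileTypeMap();
    String mimeType = fileTypeMap.getContentType(file.getName());
    System.out.println(mimeType);
}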

Results

File      Result                    Conclusion
test.png  image/png                 ✔️
test.doc  application/octet-stream  ❌

getContentType accepts either a file name or a File instance. The File overload simply calls the String overload with the file's name, so in both cases only the name is examined.

Internally, the class reads MIME type entries from mime.types files and searches them in the following order:

  1. Entries added programmatically to the MimetypesFileTypeMap instance
  2. .mime.types in the user's home directory
  3. <java.home>/lib/mime.types
  4. A resource named META-INF/mime.types
  5. A resource named META-INF/mimetypes.default (usually found only in the activation.jar file)

If no entry matches the file's extension, getContentType returns the default type, application/octet-stream.

Conclusion: The file type is determined based on the file extension.
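Each line of a mime.types file names one MIME type followed by its extensions, and lines starting with # are comments. As an illustration (the entry is an example, not something the JDK ships), placing the following in .mime.types in the user's home directory should make test.doc resolve to application/msword instead of application/octet-stream:

```text
# MIME type            extensions (without dots)
application/msword     doc dot
```

The same line format can also be passed to MimetypesFileTypeMap.addMimeTypes(String), which adds entries programmatically, the first place in the search order above.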


4. Using jMimeMagic

jMimeMagic is a third-party library for detecting MIME types.

Configure the Maven dependency:

<dependency>
    <groupId>net.sf.jmimemagic</groupId>
    <artifactId>jmimemagic</artifactId>
    <version>0.1.5</version>
</dependency>

We can find the latest version of this library on Maven Central.

Next, let’s explore how to use this library:

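import java.io.File;
import net.sf.jmimemagic.Magic;
import net.sf.jmimemagic.MagicMatch;

public void test() throws Exception {
    File file = new File("d:/test.doc");
    MagicMatch match = Magic.getMagicMatch(file, false);
    System.out.println(match.getMimeType());
}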

Besides File objects, the library can inspect in-memory data such as a byte array, so the content does not have to exist in the file system.

Results

File      Result     Conclusion
test.png  image/png  ✔️
test.doc  image/png  ✔️
  • Conclusion: Detects true file types based on file streams.

5. Using Apache Tika

Apache Tika is a toolkit that detects and extracts metadata and text from a wide range of file formats. Its API is rich, and the tika-core module alone is enough to detect the MIME type of files.

Configure the Maven dependency:

<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>1.18</version>
</dependency>

Next, we use the detect() method to determine the type:

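import java.io.File;
import java.io.IOException;
import org.apache.tika.Tika;

public void whenUsingTika_thenSuccess() throws IOException {
    File file = new File("d:/test.doc");
    Tika tika = new Tika();
    String mimeType = tika.detect(file);
    System.out.println(mimeType);
}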

Results

File      Result     Conclusion
test.png  image/png  ✔️
test.doc  image/png  ✔️
  • Conclusion: Accurately detects true file types using file streams.

Summary

The classification based on the detection principles is summarized as follows:

Based on file extension:

  1. Files.probeContentType
  2. URLConnection.guessContentTypeFromName
  3. URLConnection.getFileNameMap
  4. MimetypesFileTypeMap

Based on file stream:

  1. URLConnection.getContentType
  2. URLConnection.guessContentTypeFromStream
  3. jMimeMagic
  4. Apache Tika
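Because the two groups can disagree, one way to catch forged files like test.doc is to run a name-based and a stream-based method side by side. A minimal sketch using only the two static URLConnection helpers from section 2; the class name ExtensionCheck and the rule "content is recognized but the name does not claim the same type" are illustrative choices, not an established API:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExtensionCheck {

    // True when the content is recognized and the name suggests a different
    // type (or no type at all). Unrecognized content is never flagged.
    static boolean looksForged(Path path) throws IOException {
        String byName = URLConnection.guessContentTypeFromName(path.getFileName().toString());
        String byContent;
        // BufferedInputStream supplies the mark/reset support sniffing needs
        try (InputStream in = new BufferedInputStream(Files.newInputStream(path))) {
            byContent = URLConnection.guessContentTypeFromStream(in);
        }
        return byContent != null && !byContent.equals(byName);
    }
}
```

With the two test files, looksForged flags test.doc (the name gives null or a Word type, the bytes give image/png) and leaves test.png alone. The built-in sniffer only knows a handful of signatures, so for broader coverage the stream-based side could be swapped for jMimeMagic or Tika.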
