Get File Extension From MIME Type in Java

1. Overview

A MIME type is a label that specifies the type and format of data on the internet. A single MIME type can be associated with multiple file extensions. For instance, the “image/jpeg” MIME type encompasses extensions like “.jpg”, “.jpeg” or “.jpe”.

In this tutorial, we’ll explore different methods for determining the file extension for a particular MIME type in Java. We’ll focus on four major approaches to solve the problem.

Some of our implementations include a leading dot in the extension, while others don’t. For example, for the MIME type “image/jpeg”, an implementation returns either the string “jpg” or “.jpg” as the file’s extension.
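
Because the approaches disagree on the leading dot, it can help to normalize their results to a single form before comparing or storing them. The following is a minimal sketch of such a helper; the ExtensionFormat class and its withDot(), withoutDot() and allWithDot() methods are hypothetical names of our own, not part of any library:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not from any library) that normalizes extensions,
// regardless of whether the source included a leading dot.
final class ExtensionFormat {

    private ExtensionFormat() {
    }

    // Prepends a dot unless the extension already starts with one, e.g. "jpg" -> ".jpg"
    static String withDot(String extension) {
        return extension.startsWith(".") ? extension : "." + extension;
    }

    // Removes a single leading dot if present, e.g. ".jpg" -> "jpg"
    static String withoutDot(String extension) {
        return extension.startsWith(".") ? extension.substring(1) : extension;
    }

    // Applies withDot() to every element, preserving the original order
    static List<String> allWithDot(List<String> extensions) {
        return extensions.stream().map(ExtensionFormat::withDot).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(withDot("jpg"));     // prints .jpg
        System.out.println(withoutDot(".jpg")); // prints jpg
        System.out.println(allWithDot(List.of("jpeg", ".jpg", "jpe"))); // prints [.jpeg, .jpg, .jpe]
    }
}
```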

2. Using Apache Tika

Apache Tika is a toolkit that detects and extracts metadata and text from various file types. Its API also includes a registry of MIME types that we can query for the file extensions associated with a given MIME type.

Let’s begin by configuring the Maven dependency:

<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>2.9.0</version>
</dependency>

As mentioned before, a single MIME type can have multiple extensions. To handle this, the MimeType class provides two distinct methods: getExtension() and getExtensions().

The getExtension() method returns the preferred file extension, while getExtensions() returns the list of all known file extensions for that MIME type.

Next, we’ll use both methods from the MimeType class to retrieve the extensions:

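import static org.assertj.core.api.Assertions.assertThat;
import static org.junit.jupiter.api.Assertions.assertEquals;
import java.util.Arrays;
import java.util.List;
import org.apache.tika.mime.MimeType;
import org.apache.tika.mime.MimeTypeException;
import org.apache.tika.mime.MimeTypes;
import org.junit.jupiter.api.Test;

@Test
public void whenUsingTika_thenGetFileExtension() throws MimeTypeException {
    List<String> expectedExtensions = Arrays.asList(".jpg", ".jpeg", ".jpe", ".jif", ".jfif", ".jfi");
    MimeTypes allTypes = MimeTypes.getDefaultMimeTypes();
    // forName() declares the checked MimeTypeException, so the test must declare it too
    MimeType type = allTypes.forName("image/jpeg");
    String primaryExtension = type.getExtension();
    assertEquals(".jpg", primaryExtension);
    List<String> detectedExtensions = type.getExtensions();
    assertThat(detectedExtensions).containsExactlyElementsOf(expectedExtensions);
}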

3. Using Jodd Util

We can alternatively use the Jodd Util library, which contains a utility to find file extensions for a MIME type.

Let’s begin by adding the Maven dependency:

<dependency>
    <groupId>org.jodd</groupId>
    <artifactId>jodd-util</artifactId>
    <version>6.2.1</version>
</dependency>

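Next, we’ll use the static findExtensionsByMimeTypes() method of Jodd’s MimeTypes class (jodd.net.MimeTypes, not to be confused with Tika’s class of the same name) to get all the supported file extensions: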

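import static org.assertj.core.api.Assertions.assertThat;
import java.util.Arrays;
import java.util.List;
import jodd.net.MimeTypes;
import org.junit.jupiter.api.Test;

@Test
public void whenUsingJodd_thenGetFileExtension() {
    List<String> expectedExtensions = Arrays.asList("jpeg", "jpg", "jpe");
    String[] detectedExtensions = MimeTypes.findExtensionsByMimeTypes("image/jpeg", false);
    assertThat(detectedExtensions).containsExactlyElementsOf(expectedExtensions);
}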

Jodd Util provides a limited set of recognized file types and extensions. It prioritizes simplicity over comprehensive coverage.

The second, boolean parameter of findExtensionsByMimeTypes() enables wildcard mode when set to true. In that mode, we can pass a wildcard pattern as the MIME type and get back the extensions of every MIME type that matches the pattern.

For instance, when we pass image/* as the MIME type with wildcard mode enabled, we obtain the extensions of all MIME types within the image category.
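
To make the idea of wildcard matching concrete, here’s a minimal, standard-library-only sketch of how such a lookup could work over a map of MIME types. It illustrates the concept under our own assumptions and is not Jodd’s actual implementation; the WildcardMimeLookup class, its register() and findExtensions() methods, and the sample entries are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of wildcard lookup over a MIME-type-to-extensions map.
// It illustrates the idea only; it is not how Jodd Util implements it.
final class WildcardMimeLookup {

    // LinkedHashMap keeps registration order, so results come back in a stable order
    private final Map<String, List<String>> mimeToExtensions = new LinkedHashMap<>();

    void register(String mimeType, String... extensions) {
        mimeToExtensions.computeIfAbsent(mimeType, k -> new ArrayList<>()).addAll(Arrays.asList(extensions));
    }

    // With useWildcard = false, only an exact MIME type match counts.
    // With useWildcard = true, a pattern such as "image/*" matches every
    // registered type whose top-level type is "image".
    List<String> findExtensions(String mimeType, boolean useWildcard) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : mimeToExtensions.entrySet()) {
            if (matches(mimeType, entry.getKey(), useWildcard)) {
                result.addAll(entry.getValue());
            }
        }
        return result;
    }

    private static boolean matches(String pattern, String mimeType, boolean useWildcard) {
        if (!useWildcard || !pattern.endsWith("/*")) {
            return pattern.equals(mimeType);
        }
        // Keep the trailing slash so "image/*" can't match a type like "imagex/foo"
        String prefix = pattern.substring(0, pattern.length() - 1); // e.g. "image/"
        return mimeType.startsWith(prefix);
    }

    public static void main(String[] args) {
        WildcardMimeLookup lookup = new WildcardMimeLookup();
        lookup.register("image/jpeg", "jpeg", "jpg", "jpe");
        lookup.register("image/png", "png");
        lookup.register("text/plain", "txt");

        System.out.println(lookup.findExtensions("image/jpeg", false)); // [jpeg, jpg, jpe]
        System.out.println(lookup.findExtensions("image/*", true));     // [jpeg, jpg, jpe, png]
        System.out.println(lookup.findExtensions("image/*", false));    // []
    }
}
```

In this sketch, a pattern is treated as a wildcard only when wildcard mode is on and the pattern ends in “/*”; otherwise the lookup falls back to an exact match, which is why image/* returns nothing when wildcard mode is off.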

4. Using SimpleMagic

SimpleMagic is a utility package whose primary use is MIME type detection for files. It also contains a way to convert a MIME type to a file extension.

Let’s start by adding the Maven dependency:

<dependency>
    <groupId>com.j256.simplemagic</groupId>
    <artifactId>simplemagic</artifactId>
    <version>1.17</version>
</dependency>

Now, we’ll use the getFileExtensions() method of the ContentType enum to get all the supported file extensions:

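import static org.assertj.core.api.Assertions.assertThat;
import java.util.Arrays;
import java.util.List;
import com.j256.simplemagic.ContentType;
import org.junit.jupiter.api.Test;

@Test
public void whenUsingSimpleMagic_thenGetFileExtension() {
    List<String> expectedExtensions = Arrays.asList("jpeg", "jpg", "jpe");
    String[] detectedExtensions = ContentType.fromMimeType("image/jpeg").getFileExtensions();
    assertThat(detectedExtensions).containsExactlyElementsOf(expectedExtensions);
}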

The SimpleMagic library defines a ContentType enum that maps MIME types to their corresponding file extensions and simple names. The fromMimeType() method looks up the enum constant for the provided MIME type, and getFileExtensions() returns that constant’s file extensions.
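
To illustrate this design, here’s a minimal sketch of an enum in the same spirit, where each constant carries a MIME type, a simple name and its extensions. SimpleContentType, its constants and its methods are hypothetical and deliberately small; they aren’t copied from SimpleMagic’s source:

```java
import java.util.Arrays;

// Hypothetical enum modelled on the idea behind SimpleMagic's ContentType:
// each constant holds a MIME type, a simple name and its file extensions.
// The constants and method names here are our own, not SimpleMagic's.
enum SimpleContentType {
    JPEG("image/jpeg", "jpeg", "jpeg", "jpg", "jpe"),
    PNG("image/png", "png", "png"),
    PDF("application/pdf", "pdf", "pdf"),
    // Fallback for MIME types this enum doesn't know about
    OTHER(null, "other");

    private final String mimeType;
    private final String simpleName;
    private final String[] fileExtensions;

    SimpleContentType(String mimeType, String simpleName, String... fileExtensions) {
        this.mimeType = mimeType;
        this.simpleName = simpleName;
        this.fileExtensions = fileExtensions;
    }

    // Linear scan over the constants; MIME type names are case-insensitive,
    // so we compare ignoring case. Returns OTHER when nothing matches.
    static SimpleContentType fromMimeType(String mimeType) {
        for (SimpleContentType type : values()) {
            if (type.mimeType != null && type.mimeType.equalsIgnoreCase(mimeType)) {
                return type;
            }
        }
        return OTHER;
    }

    String getSimpleName() {
        return simpleName;
    }

    // Returns a copy so callers can't modify the enum's internal array
    String[] getFileExtensions() {
        return fileExtensions.clone();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fromMimeType("image/jpeg").getFileExtensions())); // [jpeg, jpg, jpe]
        System.out.println(fromMimeType("image/unknown")); // OTHER
    }
}
```

A linear scan over values() is fine for a handful of constants; a larger enum could instead build a static Map from MIME type to constant once, when the class is initialized.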

5. Using a Custom Map of MIME Type to Extensions

We can also obtain a file extension from a MIME type without depending on external libraries. We’ll create a custom mapping of MIME types to file extensions to do this.

Let’s create a HashMap named mimeToExtensionMap to associate MIME types with their corresponding file extensions. We can then call get() on the map to look up the preconfigured file extensions for a given MIME type; note that get() returns null for a MIME type that isn’t in the map:

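import static org.assertj.core.api.Assertions.assertThat;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.junit.jupiter.api.Test;

@Test
public void whenUsingCustomMap_thenGetFileExtension() {
    Map<String, Set<String>> mimeToExtensionMap = new HashMap<>();
    List<String> expectedExtensions = Arrays.asList(".jpg", ".jpe", ".jpeg");
    addMimeExtensions(mimeToExtensionMap, "image/jpeg", ".jpg");
    addMimeExtensions(mimeToExtensionMap, "image/jpeg", ".jpe");
    addMimeExtensions(mimeToExtensionMap, "image/jpeg", ".jpeg");
    Set<String> detectedExtensions = mimeToExtensionMap.get("image/jpeg");
    assertThat(detectedExtensions).containsExactlyElementsOf(expectedExtensions);
}

void addMimeExtensions(Map<String, Set<String>> map, String mimeType, String extension) {
    // LinkedHashSet keeps insertion order, so the exact-order assertion above is deterministic
    map.computeIfAbsent(mimeType, k -> new LinkedHashSet<>()).add(extension);
}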

The sample map covers only the “image/jpeg” MIME type, but we can easily extend it by adding further mappings as needed.
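
For example, we might wrap the map in a small class that also returns a preferred extension, loosely mirroring the distinction between Tika’s getExtension() and getExtensions(). The following is a minimal sketch under our own assumptions; the MimeExtensionRegistry class and its methods are hypothetical:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical registry class (our own name, not from any library) that wraps
// a custom MIME-type-to-extensions map with a few conveniences.
final class MimeExtensionRegistry {

    private final Map<String, Set<String>> mimeToExtensions = new HashMap<>();

    // LinkedHashSet preserves insertion order, so the first extension
    // registered for a MIME type becomes its preferred one.
    void register(String mimeType, String... extensions) {
        Set<String> set = mimeToExtensions.computeIfAbsent(normalize(mimeType), k -> new LinkedHashSet<>());
        Collections.addAll(set, extensions);
    }

    // Returns all known extensions as a read-only view, or an empty set (instead of null) for unknown types
    Set<String> getExtensions(String mimeType) {
        return Collections.unmodifiableSet(
          mimeToExtensions.getOrDefault(normalize(mimeType), Collections.emptySet()));
    }

    // The first registered extension, if any
    Optional<String> getPreferredExtension(String mimeType) {
        return getExtensions(mimeType).stream().findFirst();
    }

    // MIME type names are case-insensitive, and a Content-Type value may carry
    // parameters such as "; charset=UTF-8", which we drop before the lookup
    private static String normalize(String mimeType) {
        int semicolon = mimeType.indexOf(';');
        String base = semicolon >= 0 ? mimeType.substring(0, semicolon) : mimeType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        MimeExtensionRegistry registry = new MimeExtensionRegistry();
        registry.register("image/jpeg", ".jpg", ".jpe", ".jpeg");
        registry.register("text/plain", ".txt");

        System.out.println(registry.getExtensions("image/jpeg"));                        // [.jpg, .jpe, .jpeg]
        System.out.println(registry.getPreferredExtension("TEXT/PLAIN; charset=UTF-8")); // Optional[.txt]
        System.out.println(registry.getExtensions("application/unknown"));               // []
    }
}
```

Returning an empty set and an Optional spares callers the null check that a plain get() on the map would require.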

6. Conclusion

In this article, we explored different methods for extracting file extensions from MIME types. We examined two distinct approaches: leveraging existing libraries and crafting custom logic tailored to our needs.

When dealing with a limited set of MIME types, custom logic is an option, though it brings maintenance overhead as the list of types grows. Conversely, a library such as Apache Tika offers broad MIME type coverage and ease of use, making it a reliable choice for handling a wide array of MIME types, while lighter libraries such as Jodd Util favor simplicity over comprehensive coverage.

As always, the source code used in this article is available over on GitHub.
