How much memory does JEP 254 save?


How much memory do we save when JEP 254 replaces the char[] inside String with a byte[]? Let's look at the real-world impact of JEP 254.

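Since JEP 254 was delivered in Java 9, Java strings have been compact. If every character of a String belongs to the LATIN1 character set, each character is stored in one byte instead of two. This can save a lot of space, depending on the length of the String: at a length of 165, the saving is about 45%.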
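
As a rough check of the 45% figure, here is a minimal sketch that applies the same size formula as the getStringSize method shown further down. The class and method names are made up for illustration, and the 24-byte and 16-byte overheads assume a 64-bit JVM with compressed oops. The fitsLatin1 check mirrors the rule that a String can only be stored compactly when every char is at most 0xFF.

```java
import java.util.Locale;

class CompactStringEstimate {
    // Assumed overheads (64-bit JVM, compressed oops), as in getStringSize
    static final int STRING_OBJECT_BYTES = 24;
    static final int ARRAY_HEADER_BYTES = 16;

    // True if every char fits into a single byte (LATIN1 range)
    static boolean fitsLatin1(String s) {
        return s.chars().allMatch(c -> c <= 0xFF);
    }

    // Estimated footprint of a String with the given number of chars
    static long estimatedSize(int length, int bytesPerChar) {
        long payload = (long) length * bytesPerChar;
        long padded = (payload + 7) / 8 * 8; // round up to a multiple of 8
        return STRING_OBJECT_BYTES + ARRAY_HEADER_BYTES + padded;
    }

    // Percentage saved by storing the chars in one byte instead of two
    static double savingPercent(int length) {
        long utf16 = estimatedSize(length, 2);
        long latin1 = estimatedSize(length, 1);
        return 100.0 * (utf16 - latin1) / utf16;
    }

    public static void main(String... args) {
        String text = "x".repeat(165);
        System.out.printf(Locale.US, "fits LATIN1: %b%n", fitsLatin1(text));
        System.out.printf(Locale.US,
                "two bytes per char: %d bytes, one byte per char: %d bytes%n",
                estimatedSize(text.length(), 2),
                estimatedSize(text.length(), 1));
        System.out.printf(Locale.US, "saving: %.1f%%%n",
                savingPercent(text.length()));
    }
}
```

Under these assumptions, a 165-character String takes 376 bytes with two bytes per char and 208 bytes with one, which is a saving of roughly 44.7%.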

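How can we calculate the String memory savings between Java 11+ and Java 8? Is there a way to analyze the heap and find all the String instances? Fortunately, I found an interesting talk by Ryan Cuprak that pointed me to a simple solution: use the NetBeans profiler API to analyze the heap. Since it is on Maven Central, we can add a dependency to our pom.xml file: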

<dependency>
  <groupId>org.netbeans.modules</groupId>
  <artifactId>org-netbeans-lib-profiler</artifactId>
  <version>RELEASE160</version>
</dependency>

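Unfortunately, the NetBeans profiler API has not been modularized for JPMS, so we need to rely on automatic modules. That is not ideal, but it works for now. Here is my module-info.java file: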

module eu.javaspecialists.tjsn.issue306 {
  requires org.netbeans.lib.profiler.RELEASE160;
}
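
The analysis needs a heap dump file to work on. One way to produce one, sketched here under the assumption that the target runs on a HotSpot JVM, is to ask the HotSpotDiagnosticMXBean of the running process to write an .hprof file. The class name HeapDumper and the file name strings.hprof are made up for this example. In a modular project the code would also need "requires jdk.management;". For an already running process, "jcmd <pid> GC.heap_dump <file>" is an alternative.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

class HeapDumper {
    // Writes a heap dump of the current JVM into a fresh temp directory
    static Path dumpToTempDir() {
        try {
            Path dir = Files.createTempDirectory("heapdump");
            // The target file must not exist yet and must end in .hprof
            Path target = dir.resolve("strings.hprof");
            HotSpotDiagnosticMXBean bean = ManagementFactory
                    .getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            // true = dump only live (reachable) objects
            bean.dumpHeap(target.toString(), true);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String... args) {
        Path file = dumpToTempDir();
        System.out.println("Wrote " + file
                + " (" + file.toFile().length() + " bytes)");
    }
}
```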

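The code for analyzing the heap dump looks like this. It is fairly simple: we read in the heap dump with NetBeans' HeapFactory, then filter by the class "java.lang.String" and create StringData instances holding the coder and the length. If we pass the -verbose flag, we also print out all the strings. Finally, we print the total number of strings and how much space JEP 254 saves us in Java 11 compared to Java 8.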

package eu.javaspecialists.tjsn.issue306;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.stream.Stream;

import org.netbeans.lib.profiler.heap.HeapFactory;
import org.netbeans.lib.profiler.heap.Instance;
import org.netbeans.lib.profiler.heap.PrimitiveArrayInstance;

import static java.nio.charset.StandardCharsets.ISO_8859_1;
import static java.nio.charset.StandardCharsets.UTF_16LE;

public class HeapAnalysis {
    private enum Coder {LATIN1, UTF16}
    private record StringData(Coder coder, int length) {}

    public static void main(String... args) throws IOException {
        if (args.length < 1 || args.length > 2 ||
                args.length > 1 && !args[0].equals("-verbose")) {
            System.err.println("Usage: java HeapAnalysis " +
                    "[-verbose] heapdump");
            System.exit(1);
        }
        var verbose = args.length == 2;
        var filename = args[args.length - 1];
        System.out.println("Inspecting heap file " + filename);
        var heap = HeapFactory.createHeap(new File(filename));
        var stringClass = heap.getJavaClassByName("java.lang.String");
        var instances = stringClass.getInstancesIterator();
        var stats = extractStringData(instances, verbose);
        printStatistics(stats);
    }

    private static List<StringData> extractStringData(
            Iterator<Instance> instances, boolean verbose) {
        var result = new ArrayList<StringData>();
        while (instances.hasNext()) {
            Instance instance = instances.next();
            Coder coder = getCoder(instance);
            int length = getLength(instance, coder, verbose);
            result.add(new StringData(coder, length));
        }
        return result;
    }

    private static Coder getCoder(Instance instance) {
        Byte coder = (Byte) instance.getValueOfField("coder");
        // "case null" relies on pattern matching for switch (final in Java 21)
        return switch (coder) {
            case 0 -> Coder.LATIN1;
            case 1 -> Coder.UTF16;
            case null -> throw new IllegalStateException(
                    "Analysis for Java 11+ heap dumps only -"
                            + " field coder not found in"
                            + " java.lang.String");
            default -> throw new IllegalStateException(
                    "Unknown coder: " + coder);
        };
    }

    private static int getLength(Instance instance, Coder coder,
                                 boolean verbose) {
        var array = (PrimitiveArrayInstance)
                instance.getValueOfField("value");
        if (array == null)
            throw new IllegalStateException(
                    "java.lang.String instances did not have a"
                            + " value array field");

        int length = array.getLength();

        if (verbose) {
            // getValues() hands back each byte as a decimal String
            List<String> arrayValues = array.getValues();
            byte[] bytes = new byte[length];
            int i = 0;
            for (String str : arrayValues)
                bytes[i++] = Byte.parseByte(str);
            // UTF_16LE assumes the dump came from a little-endian machine
            System.out.println(switch (coder) {
                case LATIN1 -> "LATIN1: "
                        + new String(bytes, ISO_8859_1);
                case UTF16 -> "UTF16: "
                        + new String(bytes, UTF_16LE);
            });
        }
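        // The value array holds bytes; a UTF16 string uses two per char,
        // so convert to the char count that memoryUsed() multiplies by
        // bytesPerChar.
        return coder == Coder.UTF16 ? length / 2 : length;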
    }

    private static final Predicate<StringData> LATIN1_FILTER =
            datum -> datum.coder() == Coder.LATIN1;
    private static final Predicate<StringData> UTF16_FILTER =
            datum -> datum.coder() == Coder.UTF16;

    private static void printStatistics(List<StringData> data) {
        long j8Memory = memoryUsed(data.stream(), 2);
        long j11MemoryLatin1 =
                memoryUsed(data.stream().filter(LATIN1_FILTER), 1);
        long j11MemoryUTF16 =
                memoryUsed(data.stream().filter(UTF16_FILTER), 2);
        long j11Memory = j11MemoryLatin1 + j11MemoryUTF16;
        var latin1Size = data.stream().filter(LATIN1_FILTER).count();
        var utf16Size = data.stream().filter(UTF16_FILTER).count();
        System.out.printf(Locale.US, """
                Total number of String instances:
                    LATIN1 %,d
                    UTF16  %,d
                    Total  %,d
                """,
                latin1Size, utf16Size, latin1Size + utf16Size);
        System.out.printf(Locale.US, """
                Java 8 memory used by String instances:
                    Total  %,d bytes
                """, j8Memory);
        System.out.printf(Locale.US, """
                Java 11+ memory used by String instances:
                    LATIN1 %,d bytes
                    UTF16  %,d bytes
                    Total  %,d bytes
                """, j11MemoryLatin1, j11MemoryUTF16, j11Memory);
        System.out.printf(Locale.US, "Saving of %.2f%%%n", 100.0 *
                (j8Memory - j11Memory) / j8Memory);
    }

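    // Returns long so that the sum cannot overflow on very large heaps
    private static long memoryUsed(Stream<StringData> stats,
                                   int bytesPerChar) {
        return stats
                .mapToLong(datum -> getStringSize(datum.length(),
                        bytesPerChar))
                .sum();
    }

    // 24 bytes for the String object plus 16 for the array header
    // (assuming a 64-bit JVM with compressed oops), plus the array
    // contents rounded up to a multiple of 8 bytes
    private static int getStringSize(int length, int bytesPerChar) {
        return 24 + 16 +
                (int) (Math.ceil(length * bytesPerChar / 8.0) * 8);
    }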
}
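
To make the -verbose branch easier to follow, here is a small stand-alone sketch of the same decoding step, without the NetBeans dependency. It takes the array elements as a list of decimal strings and turns them back into text. The class name and the sample values are made up for illustration. As in the code above, UTF_16LE assumes that the bytes were written on a little-endian machine.

```java
import java.util.List;

import static java.nio.charset.StandardCharsets.ISO_8859_1;
import static java.nio.charset.StandardCharsets.UTF_16LE;

class ValueArrayDecoding {
    // Converts decimal strings such as "74" or "-27" into raw bytes
    static byte[] toBytes(List<String> values) {
        byte[] bytes = new byte[values.size()];
        int i = 0;
        for (String value : values)
            bytes[i++] = Byte.parseByte(value);
        return bytes;
    }

    // LATIN1 strings use one byte per char, UTF16 strings use two
    static String decode(List<String> values, boolean latin1) {
        byte[] bytes = toBytes(values);
        return new String(bytes, latin1 ? ISO_8859_1 : UTF_16LE);
    }

    public static void main(String... args) {
        // "Java" stored with the LATIN1 coder: four bytes
        System.out.println(decode(List.of("74", "97", "118", "97"), true));
        // U+65E5 stored with the UTF16 coder: two bytes, low byte first
        System.out.println(decode(List.of("-27", "101"), false));
        // The same text needs twice as many bytes in UTF16
        System.out.println("Java".getBytes(ISO_8859_1).length
                + " vs " + "Java".getBytes(UTF_16LE).length);
    }
}
```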

Here is an example of running our code against a heap dump of IntelliJ IDEA:

Inspecting heap file idea.hprof
Total number of String instances:
    LATIN1 2,656,321
    UTF16  47,231
    Total  2,703,552
Java 8 memory used by String instances:
    Total  275,473,456 bytes
Java 11+ memory used by String instances:
    LATIN1 192,371,520 bytes
    UTF16  8,066,152 bytes
    Total  200,437,672 bytes
Saving of 27.24%

And here is the analysis of my JavaSpecialists.eu Tomcat server:

Inspecting heap file javaspecialists.hprof
Total number of String instances:
    LATIN1 165,865
    UTF16  174
    Total  166,039
Java 8 memory used by String instances:
    Total  15,145,800 bytes
Java 11+ memory used by String instances:
    LATIN1 11,210,944 bytes
    UTF16  111,600 bytes
    Total  11,322,544 bytes
Saving of 25.24%
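
Both savings are well below the 45% quoted for a String of length 165. A likely reason is that every String carries a fixed overhead (24 + 16 bytes in getStringSize) that compaction cannot shrink, so heaps full of short strings benefit less. The sketch below recomputes the percentages from the totals printed above and adds the average footprint per String. The class and method names are made up for illustration.

```java
import java.util.Locale;

class SavingsCheck {
    // Percentage of memory saved going from the Java 8 to the Java 11+ layout
    static double savingPercent(long java8Bytes, long java11Bytes) {
        return 100.0 * (java8Bytes - java11Bytes) / java8Bytes;
    }

    // Average footprint per String instance, in bytes
    static double averageBytes(long totalBytes, long instances) {
        return (double) totalBytes / instances;
    }

    static void report(String name, long instances,
                       long java8Bytes, long java11Bytes) {
        System.out.printf(Locale.US,
                "%s: saving %.2f%%, average %.1f -> %.1f bytes per String%n",
                name,
                savingPercent(java8Bytes, java11Bytes),
                averageBytes(java8Bytes, instances),
                averageBytes(java11Bytes, instances));
    }

    public static void main(String... args) {
        // Totals copied from the two runs shown above
        report("idea.hprof", 2_703_552L, 275_473_456L, 200_437_672L);
        report("javaspecialists.hprof", 166_039L, 15_145_800L, 11_322_544L);
    }
}
```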