Removing Duplicates with a DistinctBy Approach in the Java Stream API


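The Stream API provides the distinct() method, which returns the distinct elements of a stream according to the equals() method (and hashCode()) of the element class. That becomes a problem when the class defines equality on the wrong field and cannot be changed, for example because it is legacy code: LegacyObject below compares only its id, while the duplicates to remove here are the objects that share the same foo. A wrapper class, DeduplicateWrapper, redefines equality on foo instead. Here are the legacy data class and the wrapper used to detect duplicates: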

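import java.util.Objects;
import java.util.UUID;

public class LegacyObject {

  private final UUID id;
  private final String foo;
  private final int bar;

  public LegacyObject(UUID id, String foo, int bar) {
    this.id = id;
    this.foo = foo;
    this.bar = bar;
  }

  @Override
  public int hashCode() {
    return Objects.hash(id);
  }

  // equals() compares only the id field, consistent with hashCode()
  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof LegacyObject)) return false;
    return Objects.equals(id, ((LegacyObject) o).id);
  }

  // Getters
  public UUID getId() { return id; }
  public String getFoo() { return foo; }
  public int getBar() { return bar; }
}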

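import java.util.Objects;

public class DeduplicateWrapper {

  private final LegacyObject object;

  public DeduplicateWrapper(LegacyObject object) {
    this.object = object;
  }

  public LegacyObject getObject() {
    return object;
  }

  @Override
  public int hashCode() {
    return Objects.hash(object.getFoo());
  }

  // equals() compares only the foo field of the wrapped object
  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof DeduplicateWrapper)) return false;
    return Objects.equals(object.getFoo(), ((DeduplicateWrapper) o).object.getFoo());
  }
}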

Deduplicating a collection with the Stream API:

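List<LegacyObject> duplicates = ...;

// Wrap, deduplicate on foo, then unwrap and collect the result
// (requires java.util.stream.Collectors)
List<LegacyObject> deduplicated = duplicates.stream()
    .map(DeduplicateWrapper::new)
    .distinct()
    .map(DeduplicateWrapper::getObject)
    .collect(Collectors.toList());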
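
To see the difference the wrapper makes, here is a minimal self-contained sketch. The classes Legacy and ByFoo are condensed stand-ins for LegacyObject and DeduplicateWrapper, and the helper names distinctById and distinctByFoo are made up for this illustration; none of them come from a library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.UUID;
import java.util.stream.Collectors;

class WrapperDedupDemo {

  // Condensed stand-in for LegacyObject: equality is based on id only.
  static final class Legacy {
    final UUID id;
    final String foo;
    Legacy(UUID id, String foo) { this.id = id; this.foo = foo; }
    String getFoo() { return foo; }
    @Override public boolean equals(Object o) {
      return o instanceof Legacy && Objects.equals(id, ((Legacy) o).id);
    }
    @Override public int hashCode() { return Objects.hash(id); }
  }

  // Condensed stand-in for DeduplicateWrapper: equality is based on the wrapped foo.
  static final class ByFoo {
    final Legacy object;
    ByFoo(Legacy object) { this.object = object; }
    Legacy getObject() { return object; }
    @Override public boolean equals(Object o) {
      return o instanceof ByFoo
          && Objects.equals(object.getFoo(), ((ByFoo) o).object.getFoo());
    }
    @Override public int hashCode() { return Objects.hash(object.getFoo()); }
  }

  // Plain distinct(): falls back on Legacy.equals(), i.e. on id.
  static List<Legacy> distinctById(List<Legacy> input) {
    return input.stream().distinct().collect(Collectors.toList());
  }

  // Wrap, deduplicate on foo, unwrap.
  static List<Legacy> distinctByFoo(List<Legacy> input) {
    return input.stream()
        .map(ByFoo::new)
        .distinct()
        .map(ByFoo::getObject)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Legacy> input = Arrays.asList(
        new Legacy(UUID.randomUUID(), "a"),
        new Legacy(UUID.randomUUID(), "b"),
        new Legacy(UUID.randomUUID(), "a")); // same foo as the first, different id
    System.out.println(distinctById(input).size());  // 3: all ids differ
    System.out.println(distinctByFoo(input).size()); // 2: one element per foo
  }
}
```

On an ordered, sequential stream such as this one, distinct() keeps the first element of each group of equal elements, so the surviving objects are the first "a" and the "b".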

The equivalent pre-Java 8 code, without streams:

List<LegacyObject> deduplicated = new ArrayList<>();
Set<DeduplicateWrapper> wrappers = new HashSet<>();
for (LegacyObject duplicate: duplicates) {
  wrappers.add(new DeduplicateWrapper(duplicate));
}
for (DeduplicateWrapper wrapper: wrappers) {
  deduplicated.add(wrapper.getObject());
}
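
One difference from the stream version is that a HashSet does not guarantee iteration order, so the loop above may return the survivors in a different order than they appeared in the input. If order matters, one possible variant, sketched here with a hypothetical Item class rather than LegacyObject, keys a LinkedHashMap on foo and avoids the wrapper altogether:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OrderedLoopDedup {

  // Hypothetical minimal element type; only the foo key matters here.
  static final class Item {
    final String foo;
    final int bar;
    Item(String foo, int bar) { this.foo = foo; this.bar = bar; }
  }

  // Keeps the first element seen for each foo, in encounter order.
  static List<Item> dedupByFoo(List<Item> input) {
    Map<String, Item> firstByFoo = new LinkedHashMap<String, Item>();
    for (Item item : input) {
      if (!firstByFoo.containsKey(item.foo)) {
        firstByFoo.put(item.foo, item);
      }
    }
    return new ArrayList<Item>(firstByFoo.values());
  }

  public static void main(String[] args) {
    List<Item> input = Arrays.asList(
        new Item("a", 1), new Item("b", 2), new Item("a", 3));
    for (Item item : dedupByFoo(input)) {
      System.out.println(item.foo + " " + item.bar); // prints "a 1" then "b 2"
    }
  }
}
```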

If you are lucky enough to be able to use Kotlin:

val duplicates: List<LegacyObject> = ...
duplicates.distinctBy { it.foo }
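
Java's Stream API has no built-in distinctBy. A commonly used approximation is a stateful predicate passed to filter(); the sketch below is one way to write it. The helper name distinctByKey and the demo method firstPerInitial are not part of the JDK and are assumed here only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctBy {

  // Returns a stateful predicate that accepts an element only the first
  // time its key is seen. Create a fresh predicate for every pipeline.
  // Keys must be non-null because the backing ConcurrentHashMap rejects null.
  static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return element -> seen.add(keyExtractor.apply(element));
  }

  // Demo: keep the first word for each initial letter.
  static List<String> firstPerInitial(List<String> words) {
    return words.stream()
        .filter(distinctByKey(w -> w.substring(0, 1)))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> words =
        Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");
    System.out.println(firstPerInitial(words)); // [apple, banana, cherry]
  }
}
```

With LegacyObject, the call would read duplicates.stream().filter(distinctByKey(LegacyObject::getFoo)). On a sequential stream this keeps the first element per key; on a parallel stream, which element survives for a given key is not deterministic.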