Java Stream API Basics @ntalbs' stuff

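The Stream API was introduced in Java 8, released in 2014. More than ten years have passed since then, but I have had few chances to actually use it, and I often fumbled even with simple tasks. I'm writing down the basics of streams here so I can look them up whenever I need them.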

Stream types

Stream<T> is a stream of ordinary objects, while IntStream, LongStream, and DoubleStream are streams of the primitive types int, long, and double, respectively.

IntStream, LongStream, and DoubleStream are all numeric streams. When working with a stream of numbers, they are more convenient in several ways: min/max need no Comparator, and sum/average are available directly. summaryStatistics returns count/sum/min/average/max in one go.

return IntStream.of(1, 3, 5, 7, 9)
    .summaryStatistics();

IntSummaryStatistics{count=5, sum=25, min=1, average=5.000000, max=9}

To extract a number from each element of a Stream<T> and build a new stream from it, use mapTo{Int|Long|Double} to convert to an {Int|Long|Double}Stream. Plain map would instead produce a stream of the Integer/Long/Double wrapper types.
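
To make the difference between mapToInt and map concrete, here is a minimal sketch. The class and method names (MapToIntExample, totalLength, totalLengthBoxed) are made up for illustration; only mapToInt, map, sum, and reduce come from the Stream API.

```java
import java.util.List;

public class MapToIntExample {
    // mapToInt yields a primitive IntStream, so sum() is available directly.
    public static int totalLength(List<String> words) {
        return words.stream()
            .mapToInt(String::length)   // Stream<String> -> IntStream
            .sum();
    }

    // Plain map yields a boxed Stream<Integer>, which has no sum();
    // the total has to be computed with reduce (or a collector).
    public static int totalLengthBoxed(List<String> words) {
        return words.stream()
            .map(String::length)        // Stream<String> -> Stream<Integer>
            .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> words = List.of("alice", "bob", "carol");
        System.out.println(totalLength(words));      // 13
        System.out.println(totalLengthBoxed(words)); // 13
    }
}
```

Both methods return the same value; the primitive version simply avoids boxing every element and gives access to the numeric helpers.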

Creating streams

Streams can be created directly with the methods that the stream interfaces provide, as follows.

of

Stream.of (and likewise IntStream.of) creates a stream from elements listed explicitly.

return Arrays.toString(
  IntStream
    .of(1, 2, 3, 4, 5)
    .toArray()
);

[1, 2, 3, 4, 5]

return Arrays.toString(
  Stream.of("alice", "bob", "carol", "david", "emma")
    .toArray()
);

[alice, bob, carol, david, emma]

generate

generate creates an infinite stream. It is useful for random streams or constant streams. Because the stream never ends, cut it with limit before collecting.

return Arrays.toString(
  IntStream.generate(new Random()::nextInt)
    .limit(5)
    .toArray()
);

[1865190359, -1056582372, -111348795, 1248050296, -1927407346]

return Arrays.toString(
  Stream.generate(() -> "yes")
    .limit(5)
    .toArray()
);

[yes, yes, yes, yes, yes]

iterate

iterate creates an infinite stream by starting from a seed value and repeatedly applying a function to the previous element.

return Arrays.toString(
  IntStream.iterate(1, x -> x * 10)
    .limit(5)
    .toArray()
);

[1, 10, 100, 1000, 10000]

range

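For integer streams (IntStream, LongStream), range and rangeClosed create a stream of the numbers in a given range; range excludes the end value and rangeClosed includes it. The numbers increase by 1 and there is no way to adjust the increment, so a stream that increases by 2, or by n, takes extra work.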

return IntStream.rangeClosed(1, 10)
  .filter(x -> x % 2 == 0)
  .boxed()
  .toList()
  .toString();

[2, 4, 6, 8, 10]

range and rangeClosed take only two arguments, the start and the end of the range. In my opinion, adding a range(start, end, step) method would make range-based creation a little easier for IntStream and LongStream, and would also let DoubleStream be created with range, but the current API does not support it.
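
Until such a method exists, a stepped range can be approximated with what the API already offers. The sketch below is only one possible workaround: intRange and doubleRange are hypothetical helpers, not JDK methods. It relies on the three-argument IntStream.iterate (available since Java 9) and on mapping an index range to doubles.

```java
import java.util.Arrays;
import java.util.stream.DoubleStream;
import java.util.stream.IntStream;

public class SteppedRange {
    // Hypothetical helper: ints in [start, end) spaced by a positive step.
    public static IntStream intRange(int start, int end, int step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        // Three-argument iterate stops as soon as the predicate fails.
        // Caveat: x + step can overflow when end is near Integer.MAX_VALUE.
        return IntStream.iterate(start, x -> x < end, x -> x + step);
    }

    // Hypothetical helper: doubles in [start, end) spaced by a positive step.
    public static DoubleStream doubleRange(double start, double end, double step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        // Derive each value from its index rather than adding step repeatedly,
        // so rounding errors do not accumulate.
        int count = (int) Math.ceil((end - start) / step);
        return IntStream.range(0, count).mapToDouble(i -> start + i * step);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(intRange(1, 10, 3).toArray()));
        // [1, 4, 7]
        System.out.println(Arrays.toString(doubleRange(0.0, 1.0, 0.25).toArray()));
        // [0.0, 0.25, 0.5, 0.75]
    }
}
```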

Collection

If you already have a List, Map, or Set, you can call stream() or parallelStream() to get a sequential or parallel stream. For a Map, first get the entry set with entrySet(), then call stream() or parallelStream() on it.

return List.of(1, 2, 3, 4, 5)
  .stream()
  .map(x -> x * 10)
  .toList()
  .toString();

[10, 20, 30, 40, 50]

return Set.of(1, 2, 3, 4, 5)
  .stream()
  .map(x -> x + 10)
  .collect(Collectors.toSet())
  .toString();

[11, 12, 13, 14, 15]

return Map.of("alice", 10, "bob", 20, "carter", 30, "dan", 40)
  .entrySet()
  .stream()
  .mapToInt(e -> e.getValue())
  .sum();

100

An array can also be turned into a stream with Arrays.stream.

String[] names = {"alice", "bob", "carter", "dan"};
return Arrays.stream(names)
  .map(String::toUpperCase)
  .collect(Collectors.joining(", "));

ALICE, BOB, CARTER, DAN

Transforming streams

filter, map

return Arrays.toString(
  IntStream.range(1, 10)
    .filter(x -> x % 2 == 0)
    .map(x -> x * x)
    .toArray()
);

[4, 16, 36, 64]

flatMap

var nestedStream = Stream.of(
  Stream.of(1, 2, 3),
  Stream.of(4, 5, 6),
  Stream.of(7, 8, 9)
);

return nestedStream
  .flatMap(x -> x)
  .toList()
  .toString();

[1, 2, 3, 4, 5, 6, 7, 8, 9]

return IntStream.range(1, 5)
  .mapToDouble(x -> x)
  .boxed()
  .map(x -> List.of(x / 10, x, x * 10))
  .flatMap(x -> x.stream())
  .toList()
  .toString();

[0.1, 1.0, 10.0, 0.2, 2.0, 20.0, 0.3, 3.0, 30.0, 0.4, 4.0, 40.0]

skip, limit, dropWhile, takeWhile

return Stream.iterate(1, x -> x + 2)
  .skip(10)
  .limit(10)
  .toList()
  .toString();

[21, 23, 25, 27, 29, 31, 33, 35, 37, 39]

return Stream.iterate(1, x -> x + 2)
  .dropWhile(x -> x < 20)
  .takeWhile(x -> x < 40)
  .toList()
  .toString();

[21, 23, 25, 27, 29, 31, 33, 35, 37, 39]

In my opinion, had skip been named drop (or, conversely, dropWhile named skipWhile) and limit named take, the names would pair up as drop/dropWhile (or skip/skipWhile) and take/takeWhile, which would have been more consistent.

Producing results

A terminal operation consumes the stream and produces a final result. The result can be a single value, as with count/sum/average, or another collection.

findFirst, findAny

return Stream.of("alice", "bob", "cater")
  .findFirst();

Optional[alice]

return Stream.of("alice", "bob", "cater")
  .parallel()
  .findAny();

Optional[cater]

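According to the API documentation, the behavior of findAny is explicitly nondeterministic: it is free to select any element in the stream, which allows maximal performance in parallel operations. Running it several times on a parallel stream shows the result changing.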
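
The contrast with findFirst can be sketched as follows. FindAnyExample and its methods are illustrative names only. On an ordered source such as a List, findFirst still returns the first element even when the stream is parallel, whereas findAny may return any of them, so no expected output is shown for it.

```java
import java.util.List;
import java.util.Optional;

public class FindAnyExample {
    public static final List<String> NAMES = List.of("alice", "bob", "cater");

    // Respects encounter order, even on a parallel stream.
    public static Optional<String> first() {
        return NAMES.parallelStream().findFirst();
    }

    // May return any element; the result can differ between runs.
    public static Optional<String> any() {
        return NAMES.parallelStream().findAny();
    }

    public static void main(String[] args) {
        System.out.println(first()); // Optional[alice]
        System.out.println(any());   // some element of NAMES, not fixed
    }
}
```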

anyMatch, allMatch, noneMatch

All three methods return a boolean.

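anyMatch: returns true if at least one element of the stream satisfies the given predicate.
allMatch: returns true if every element of the stream satisfies the given predicate.
noneMatch: returns true if no element of the stream satisfies the given predicate.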

return Stream.of(1, 2, 3, 4, 5).anyMatch(x -> x > 4);

true

return Stream.of(1, 2, 3, 4, 5).anyMatch(x -> x < 0);

false

return Stream.of(1, 2, 3, 4, 5).anyMatch(x -> x % 2 == 0);

true

return Stream.of(1, 2, 3, 4, 5).noneMatch(x -> x > 10);

true
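
The examples above do not exercise allMatch, and none of them cover the empty-stream case, so here is a small sketch of both. MatchExample and its helper methods are made-up names. On an empty stream allMatch and noneMatch return true and anyMatch returns false, as the API documentation states.

```java
import java.util.stream.IntStream;

public class MatchExample {
    public static boolean allPositive(int... xs) {
        return IntStream.of(xs).allMatch(x -> x > 0);
    }

    public static boolean anyPositive(int... xs) {
        return IntStream.of(xs).anyMatch(x -> x > 0);
    }

    public static boolean nonePositive(int... xs) {
        return IntStream.of(xs).noneMatch(x -> x > 0);
    }

    public static void main(String[] args) {
        System.out.println(allPositive(1, 2, 3));   // true
        System.out.println(allPositive(1, -2, 3));  // false

        // Empty stream: "all" and "none" hold vacuously, "any" does not.
        System.out.println(allPositive());   // true
        System.out.println(anyPositive());   // false
        System.out.println(nonePositive());  // true
    }
}
```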

min, max

On numeric streams, just call min/max. Note that the return value is an optional (OptionalInt, OptionalLong, OptionalDouble, or Optional<T>), because the stream may be empty.

return IntStream.of(1, 2, 3, 4, 5)
  .min();

OptionalInt[1]

return LongStream.range(1, 10)
  .max();

OptionalLong[9]

For an object stream, the object with the smallest (or largest) value can be found by passing a Comparator to min (or max), as follows.

public record Employee(String name, int salary) {}

return Stream.of(
  new Employee("Alice", 2000),
  new Employee("Bob", 2500),
  new Employee("Carter", 3000)
).min((a, b) -> a.salary() - b.salary());

Optional[Employee[name=Alice, salary=2000]]

return Stream.of(
      new Employee("Alice", 2000),
      new Employee("Bob", 2200),
      new Employee("Carter", 3000)
    )
    .mapToInt(e -> e.salary())
    .max();

OptionalInt[3000]
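
As a variation on the two examples above, the sketch below finds the highest-paid employee as an object and then unwraps the Optional. It uses Comparator.comparingInt instead of subtracting salaries, since subtraction can overflow for extreme int values. MaxSalaryExample, sample, highestPaid, and highestPaidName are names invented for this sketch.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class MaxSalaryExample {
    public record Employee(String name, int salary) {}

    // Sample data matching the employees used in the text.
    public static List<Employee> sample() {
        return List.of(
            new Employee("Alice", 2000),
            new Employee("Bob", 2500),
            new Employee("Carter", 3000));
    }

    // max returns Optional<Employee>: empty when the list is empty.
    public static Optional<Employee> highestPaid(List<Employee> staff) {
        return staff.stream()
            .max(Comparator.comparingInt(Employee::salary));
    }

    // Unwrap the Optional with a fallback instead of calling get().
    public static String highestPaidName(List<Employee> staff) {
        return highestPaid(staff)
            .map(Employee::name)
            .orElse("(nobody)");
    }

    public static void main(String[] args) {
        System.out.println(highestPaidName(sample()));  // Carter
        System.out.println(highestPaidName(List.of())); // (nobody)
    }
}
```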

count, sum, average, summaryStatistics

return IntStream.range(1, 10)
  .limit(10)
  .count();

9

return IntStream.range(1, 10)
  .limit(10)
  .sum();

45

return IntStream.range(1, 10)
  .limit(10)
  .average();

OptionalDouble[5.0]

return IntStream.range(1, 10)
  .limit(10)
  .summaryStatistics();

IntSummaryStatistics{count=9, sum=45, min=1, average=5.000000, max=9}

reduce

return IntStream.range(1, 10)
  .reduce(0, (a, b) -> a + b);

45

forEach

IntStream.range(1, 10)
  .forEach(x -> System.out.printf("val = %d\n", x));

val = 1
val = 2
val = 3
val = 4
val = 5
val = 6
val = 7
val = 8
val = 9

collect

import static java.util.stream.Collectors.toSet;

return Stream.of('h', 'e', 'l', 'l', 'o')
  .collect(toSet())
  .toString();

[e, h, l, o]

import static java.util.stream.Collectors.joining;

return "hello, world".chars()
  .map(c -> c + 1)
  .mapToObj(c -> Character.valueOf((char)c))
  .map(String::valueOf)
  .collect(joining());

ifmmp-!xpsme

return "hello, world".codePoints()
  .map(c -> c + 1)
  .collect(
    StringBuilder::new,
    StringBuilder::appendCodePoint,
    StringBuilder::append)
  .toString();

ifmmp-!xpsme

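The use of collect in the last example is a bit complicated. In the API documentation, the Stream<T> version has the following signature (the IntStream version used above takes an ObjIntConsumer<R> as the accumulator instead).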

<R> R collect(
  Supplier<R> supplier,
  BiConsumer<R, ? super T> accumulator,
  BiConsumer<R,R> combiner
)

According to the description, it produces a result equivalent to the following.

R result = supplier.get();
for (T element : this stream)
  accumulator.accept(result, element);
return result;

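It isn't very complicated, but it doesn't quite add up: supplier and accumulator are clear, but nothing shows how combiner is used. I asked Google Gemini, and it said that combiner is not needed for a sequential stream, but for a parallel stream it is used to merge the intermediate results into the final result. That makes sense.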

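Rereading the description of the combiner parameter in the API documentation, it makes a bit more sense. It would have been nice if the documentation had explained it more clearly.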
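
The role of the combiner can be observed with a small experiment. This is a sketch under my own assumptions, with invented names (CombinerExample, shift, combinerCalls). It reuses the code-point example from above and counts how often the combiner runs. On the OpenJDK builds I know of, a sequential stream never calls it, while a parallel stream usually does; the exact count depends on how the source is split, so no number is shown for the parallel case.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class CombinerExample {
    // Shifts every code point by one, optionally on a parallel stream.
    public static String shift(String s, boolean parallel) {
        IntStream cps = s.codePoints().map(c -> c + 1);
        if (parallel) {
            cps = cps.parallel();
        }
        return cps.collect(
                StringBuilder::new,              // supplier: a fresh container per chunk
                StringBuilder::appendCodePoint,  // accumulator: add one element
                StringBuilder::append)           // combiner: merge two partial results
            .toString();
    }

    // Counts how many times the combiner is invoked for the given input.
    public static int combinerCalls(String s, boolean parallel) {
        AtomicInteger calls = new AtomicInteger();
        IntStream cps = parallel ? s.codePoints().parallel() : s.codePoints();
        cps.collect(
            StringBuilder::new,
            StringBuilder::appendCodePoint,
            (StringBuilder left, StringBuilder right) -> {
                calls.incrementAndGet();
                left.append(right);
            });
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(shift("hello, world", false)); // ifmmp-!xpsme
        System.out.println(shift("hello, world", true));  // ifmmp-!xpsme
        System.out.println(combinerCalls("hello, world", false));
        System.out.println(combinerCalls("hello, world", true));
    }
}
```

The parallel result matches the sequential one because the partial StringBuilders are merged in encounter order.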

A Collector can also be passed to collect. You can build one with Collector.of, or use one of the ready-made implementations in the Collectors class.

import static java.util.function.Function.identity;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

return "hello, world!".chars()
  .mapToObj(c -> Character.valueOf((char)c))
  .map(String::valueOf)
  .collect(groupingBy(identity(), counting()))
  .toString();

{ =1, !=1, r=1, d=1, e=1, w=1, h=1, ,=1, l=3, o=2}
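
The example above uses ready-made collectors from Collectors. For the other option, Collector.of, a minimal sketch might look like this. CollectorOfExample, concat, and join are illustrative names. The collector mirrors the supplier/accumulator/combiner trio discussed earlier and adds a finisher that converts the StringBuilder into a String. Note that the combiner here is a BinaryOperator, so it must return the merged container.

```java
import java.util.List;
import java.util.stream.Collector;

public class CollectorOfExample {
    // A hand-rolled collector that concatenates strings.
    public static Collector<String, StringBuilder, String> concat() {
        return Collector.of(
            () -> new StringBuilder(),   // supplier: create the container
            (sb, s) -> sb.append(s),     // accumulator: add one element
            (a, b) -> a.append(b),       // combiner: merge and return the result
            sb -> sb.toString());        // finisher: container -> final result
    }

    public static String join(List<String> parts) {
        return parts.stream().collect(concat());
    }

    public static void main(String[] args) {
        System.out.println(join(List.of("h", "e", "l", "l", "o"))); // hello
    }
}
```

For plain concatenation Collectors.joining() already does this; the point is only to show the four pieces that Collector.of expects.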

Postscript

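I wrote this post in Emacs using org-mode (ob-java). Executing a code block inserts its result into the text, so code can be edited and run in place and its output lands right in the document. With an IDE or other tools, you would have to edit and run the code, then copy and paste both the code and its output, and repeat that every time the code changes, which is quite tedious. Emacs skips this step and makes writing much easier.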

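As explained in "Org-mode로 블로그 쓰기 > 기본 스타일" (Blogging with Org-mode > Basic styles), org-mode does not render an inline style properly when a Korean particle is attached directly to it. A ZERO WIDTH SPACE works around the problem, but inserting one every time an inline style is used is tedious and, decisively, Hugo's Org renderer still breaks the style, so I had no choice but to add a space between the inline-styled text and the particle.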

The return statements, Arrays.toString(), and so on in the code blocks are there so that Hugo can render ob-java's output properly.