OpenJDK / lambda / lambda / jdk
changeset 7626:5a128a6aa03e
Additional package-info spec
Contributed-by: dholmes
author       briangoetz
date         Wed, 13 Mar 2013 15:03:35 -0400
parents      f9763ce154d8
children     f136adc07108
files        src/share/classes/java/util/stream/package-info.java
diffstat     1 files changed, 87 insertions(+), 0 deletions(-)
--- a/src/share/classes/java/util/stream/package-info.java	Wed Mar 13 14:32:07 2013 -0400
+++ b/src/share/classes/java/util/stream/package-info.java	Wed Mar 13 15:03:35 2013 -0400
@@ -213,6 +213,93 @@
  * lambdas sharing mutable state
  *
  * <h2>Side-effects</h2>
+ *
+ * <h2><a name="Reduction">Reduction operations</a></h2>
+ *
+ * In simple terms, a <em>reduction</em> operation takes a stream of elements and processes them in a way
+ * that reduces them to a single value, such as finding the sum or maximum of a set of numbers. In more complex
+ * scenarios, the reduction operation might need to extract data from the elements before reducing that data
+ * to a single value, such as finding the sum of weights of a set of blocks. This would require extracting the
+ * weight from each block before summing up the weights.
+ *
+ * <p>Of course, such operations can be readily implemented as simple sequential loops, as in:
+ * <pre>
+ *     int max = Integer.MIN_VALUE;
+ *     for (int x : numbers) {
+ *         if (x > max)
+ *             max = x;
+ *     }
+ * </pre>
+ * However, there may be a significant advantage to preferring a {@link Stream#reduce reduce operation} over
+ * a mutative accumulation such as the above -- a properly constructed reduce operation is inherently parallelizable,
+ * so long as the {@link BinaryOperator} has the right characteristics, specifically that it is
+ * <a href="#Associativity">associative</a>. For example, given a stream of numbers for which we want to
+ * find the maximum, we can write:
+ * <pre>
+ *     int max = numbers.reduce(Integer.MIN_VALUE, (x, y) -> x > y ? x : y);
+ * </pre>
+ * or more succinctly:
+ * <pre>
+ *     int max = numbers.reduce(Integer.MIN_VALUE, Integer::max);
+ * </pre>
+ *
+ * Reduction parallelizes easily, since the implementation of {@code reduce} can operate on subsets of the
+ * stream in parallel and then combine the intermediate results to get the final correct answer. Even if you were
+ * to use a parallelizable form of the {@link Stream#forEach forEach} method in place of the original for-each loop above, you would
+ * still have to provide thread-safe updates to the shared accumulating variable {@code max}, and the required
+ * synchronization would likely eliminate any performance gain from parallelism. Using a {@code reduce} method instead
+ * removes all of the burden of parallelizing the reduction operation, and the library can provide an efficient
+ * parallel implementation.
+ *
+ * <p>In its more general form, a {@code reduce} operation on elements of type {@code T} yielding a result of
+ * type {@code U} requires three parameters:
+ * <pre>
+ *     <U> U reduce(U identity,
+ *                  BiFunction<U, ? super T, U> accumulator,
+ *                  BinaryOperator<U> combiner);
+ * </pre>
+ * Here, the <em>identity</em> element is both an initial seed for the reduction and a default result if
+ * there are no elements. The <em>accumulator</em> function takes a partial result and the next element, and
+ * produces a new partial result. The <em>combiner</em> function combines the partial results of two
+ * accumulators to produce a new partial result, or eventually the final result.
+ *
+ * <p>In our {@code max} example above, {@code Integer.MIN_VALUE} was used as the identity element, while
+ * {@code Integer::max} was used as both the accumulator and the combiner. For a more complicated example such as the
+ * sum-of-weights, we would have:
+ * <pre>
+ *     int sumOfWeights = blocks.stream().reduce(0,
+ *                                               (sum, b) -> sum + b.getWeight(),
+ *                                               Integer::sum);
+ * </pre>
+ * This computation could also be expressed as:
+ * <pre>
+ *     int sumOfWeights = blocks.stream().map(b -> b.getWeight())
+ *                                       .reduce(0, Integer::sum);
+ * </pre>
+ * though there are situations where the generality of the three-argument form may enable a more efficient
+ * computation.
+ *
+ * <p>More formally, the {@code identity} value must be an <em>identity</em> for the combiner function.
+ * This means that for all {@code u}, {@code combiner(identity, u)} is equal to {@code u}.
+ * Additionally, the {@code combiner} function must be <a href="#Associativity">associative</a> and must be
+ * compatible with the {@code accumulator} function; for all {@code u} and {@code t}, the following must hold:
+ * <pre>
+ *     combiner(u, accumulator(identity, t)) == accumulator(u, t)
+ * </pre>
+ *
+ * <a name="Associativity"><h2>Associativity</h2></a>
+ *
+ * An operator or function {@code op} is <em>associative</em> if the following holds:
+ * <pre>
+ *     (a op b) op c == a op (b op c)
+ * </pre>
+ * The importance of this to parallel evaluation can be seen if we expand this to four terms:
+ * <pre>
+ *     a op b op c op d == (a op b) op (c op d)
+ * </pre>
+ * So we can evaluate {@code (a op b)} in parallel with {@code (c op d)} and then invoke {@code op} on
+ * the results.
+ *
  */
 package java.util.stream;
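To make the reduction discussion above concrete, the following is a minimal, self-contained sketch of the three-argument reduce form. It is written against the java.util.stream API as it eventually shipped (Collection.stream(), Collection.parallelStream(), Stream.map(), Stream.reduce()), which may differ in detail from the in-progress API in this lambda snapshot, and it supplies a hypothetical Block class providing the getWeight() accessor assumed by the sum-of-weights example.

    import java.util.Arrays;
    import java.util.List;

    public class SumOfWeightsSketch {

        // Hypothetical Block class, defined here only so the sum-of-weights
        // example from the package documentation can run as written.
        static class Block {
            private final int weight;
            Block(int weight) { this.weight = weight; }
            int getWeight() { return weight; }
        }

        public static void main(String[] args) {
            List<Block> blocks = Arrays.asList(new Block(3), new Block(5), new Block(7));

            // Three-argument form: identity, accumulator (folds a Block into a
            // partial sum), and combiner (merges two partial sums).
            int sumOfWeights = blocks.stream()
                                     .reduce(0,
                                             (sum, b) -> sum + b.getWeight(),
                                             Integer::sum);

            // Equivalent two-step formulation: map to weights, then reduce.
            int sumViaMap = blocks.stream()
                                  .map(Block::getWeight)
                                  .reduce(0, Integer::sum);

            // The same reduction runs unchanged on a parallel stream, because
            // Integer::sum is associative and 0 is its identity.
            int parallelSum = blocks.parallelStream()
                                    .reduce(0,
                                            (sum, b) -> sum + b.getWeight(),
                                            Integer::sum);

            System.out.println(sumOfWeights + " " + sumViaMap + " " + parallelSum);  // 15 15 15
        }
    }

Because the combiner is associative and the identity value really is an identity for it, the identical arguments serve both the sequential and the parallel stream; no shared mutable accumulator or external synchronization is needed.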
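The constraints stated above on the identity value and the combiner (the identity law, associativity, and compatibility with the accumulator) can also be spot-checked directly. A minimal sketch, assuming plain integer addition in the role of the combiner, as in the sum-of-weights example; the specific values of u, t, a, b and c are arbitrary:

    import java.util.function.BiFunction;
    import java.util.function.BinaryOperator;

    public class ReductionLawsSketch {
        public static void main(String[] args) {
            // Identity, accumulator and combiner in the shape used by the
            // three-argument reduce; here U and T are both Integer.
            int identity = 0;
            BiFunction<Integer, Integer, Integer> accumulator = (sum, weight) -> sum + weight;
            BinaryOperator<Integer> combiner = Integer::sum;

            int u = 10, t = 4, a = 1, b = 2, c = 3;

            // Identity law: combiner(identity, u) == u
            boolean identityLaw = combiner.apply(identity, u) == u;

            // Associativity: (a op b) op c == a op (b op c)
            int left  = combiner.apply(combiner.apply(a, b), c);
            int right = combiner.apply(a, combiner.apply(b, c));
            boolean associative = left == right;

            // Compatibility: combiner(u, accumulator(identity, t)) == accumulator(u, t)
            int viaCombiner    = combiner.apply(u, accumulator.apply(identity, t));
            int viaAccumulator = accumulator.apply(u, t);
            boolean compatible = viaCombiner == viaAccumulator;

            System.out.println(identityLaw + " " + associative + " " + compatible);  // true true true
        }
    }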