OpenJDK / lambda / lambda / jdk
changeset 7645:b4d0ae8455be
More spec on reduce in package-info
author   | briangoetz
date     | Thu, 14 Mar 2013 12:06:51 -0400
parents  | 3981ddbd7b5f
children | 91ae0cb93f70
files    | src/share/classes/java/util/stream/package-info.java
diffstat | 1 files changed, 96 insertions(+), 22 deletions(-)
--- a/src/share/classes/java/util/stream/package-info.java	Thu Mar 14 17:00:20 2013 +0100
+++ b/src/share/classes/java/util/stream/package-info.java	Thu Mar 14 12:06:51 2013 -0400
@@ -217,40 +217,52 @@
  * <h2><a name="Reduction">Reduction operations</h2>
  *
  * In simple terms, a <em>reduction</em> operation takes a stream of elements and processes them in a way
- * that reduces to a single value, such as finding the sum or maximum of a set of numbers.  In more complex
+ * that reduces to a single value, such as finding the sum or maximum of a set of numbers.  (In more complex
  * scenarios, the reduction operation might need to extract data from the elements before reducing that data
- * to a single value, such as finding the sum of weights of a set of blocks.  This would requireextracting the
- * weight from each block before summing up the weights.
+ * to a single value, such as finding the sum of weights of a set of blocks.  This would require extracting the
+ * weight from each block before summing up the weights.)
  *
  * <p>Of course, such operations can be readily implemented as simple sequential loops, as in:
  * <pre>
- *     int max = Integer.MIN_VALUE;
+ *     int sum = 0;
  *     for (int x : numbers) {
- *         if (x > max)
- *             max = x;
+ *         sum += x;
  *     }
  * </pre>
  * However, there may be a significant advantage to preferring a {@link Stream#reduce reduce operation} over
  * a mutative accumulation such as the above -- a properly constructed reduce operation is inherently parallelizable,
  * so long as the {@link BinaryOperator} has the right characteristics, specifically that it is
  * <a href="#Associativity">associative</a>.  For example, given a stream of numbers for which we want to
- * find the maximum, we can write:
+ * find the sum, we can write:
  * <pre>
- *     int max = numbers.reduce(Integer.MIN_VALUE, (x,y) -> x > y ? x : y);
+ *     int sum = numbers.reduce(0, (x,y) -> x + y);
  * </pre>
  * or more succinctly:
  * <pre>
- *     int max = numbers.reduce(Integer.MIN_VALUE, Integer::max);
+ *     int sum = numbers.reduce(0, Integer::sum);
  * </pre>
  *
- * Reduction parallellizes easily since the implementation of {@code reduce} can operate on subsets of the
+ * <p>(The primitive specializations of {@link java.util.stream.Stream}, such as {@link java.util.stream.IntStream},
+ * even have convenience methods for common reductions, such as {@link java.util.stream.IntStream#sum()} or
+ * {@link java.util.stream.IntStream#max()}, which are implemented as convenience wrappers around reduce.)
+ *
+ * <p>Reduction parallelizes well since the implementation of {@code reduce} can operate on subsets of the
  * stream in parallel, and then combine the intermediate results to get the final correct answer.  Even if you were
  * to use a parallelizable form of the {@link forEach} method in place of the original for-each loop above, you would
- * still have to provide thread-safe updates to the shared accumulating variable {@code max}, and the required
+ * still have to provide thread-safe updates to the shared accumulating variable {@code sum}, and the required
  * synchronization would likely eliminate any performance gain from parallelism.  Using a {@code reduce} method instead
  * removes all of the burden of parallelizing the reduction operation, and the library can provide an efficient
  * parallel implementation.
  *
+ * <p>A slightly more complicated example involves applying a transform to data elements before the reduction.
+ * If {@code blocks} is a collection of {@code Block} objects, which have a {@code getWeight} method, we can find
+ * the sum of the weights of all blocks with:
+ * <pre>
+ *     int sumOfWeights = blocks.stream()
+ *                              .map(b -> b.getWeight())
+ *                              .reduce(0, Integer::sum);
+ * </pre>
+ *
  * <p>In its more general form, a {@code reduce} operation on elements of type {@code T} yielding a result of
  * type {@code U} requires three parameters:
  * <pre>
@@ -260,24 +272,21 @@
  * </pre>
  * Here, the <em>identity</em> element is both an initial seed for the reduction, and a default result if
  * there are no elements.  The <em>accumulator</em> function takes a partial result and the next element, and
- * produce a new partial result.  The <em>combiner</em> function combines combine the partial results of two
+ * produces a new partial result.  The <em>combiner</em> function combines the partial results of two
  * accumulators to produce a new partial result, or eventually the final result.
  *
- * <p>In our {@code max} example above, {@code Integer.MIN_VALUE} was used the identity element, while
- * {@code Integer::max} was used as both the reducer and combiner.  For a more complicated example such as the
- * sum-of-weights, we would have:
+ * <p>This form is a generalization of the two-argument form, and is also a generalization of the map-reduce
+ * construct illustrated above.  If we wanted to re-cast the simple {@code sum} example using the more general form,
+ * {@code 0} would be the identity element, while {@code Integer::sum} would be both the accumulator and combiner.
+ * For the sum-of-weights example, this could be re-cast as:
  * <pre>
  *     int sumOfWeights = blocks.stream().reduce(0,
  *                                               (sum, b) -> sum + b.getWeight(),
  *                                               Integer::sum);
  * </pre>
- * This computation could also be expressed as:
- * <pre>
- *     int sumOfWeights = blocks.stream().map(b -> b.getWeight())
- *                                       .reduce(0, Integer::sum);
- * </pre>
- * though there are situations where the generality of the three-argument form may enable a more efficient
- * computation.
+ * though the map-reduce form is more readable and generally preferable.  The generalized form is provided
+ * for cases where significant work can be optimized away by combining mapping and reducing into a single
+ * function.
  *
  * <p>More formally, the {@code identity} value must be an <em>identity</em> for the combiner function.
  * This means that for all {@code u}, {@code combiner(identity, u)} is equal to {@code u}.
@@ -287,6 +296,71 @@
  *     combiner(u, accumulator(identity, t)) == accumulator(u, t)
  * </pre>
  *
+ * <h3><a name="MutableReduction">Mutable Reduction</a></h3>
+ *
+ * A <em>mutable</em> reduction operation is similar to an ordinary reduction, in that it reduces a stream
+ * of values to a single value, but instead of producing a distinct single-valued result, it
+ * mutates a general <em>result container</em>, such as a {@code Collection} or {@code StringBuilder}, as it
+ * processes the elements in the stream.
+ *
+ * <p>If we wanted to take a stream of strings and concatenate them into a single long string, we <em>could</em>
+ * achieve this with ordinary reduction:
+ * <pre>
+ *     String concatenated = strings.reduce("", String::concat);
+ * </pre>
+ *
+ * We would get the desired result, and it would even work in parallel.  However, we might not be happy about
+ * the performance!  Such an implementation would do a great deal of string copying, and the run time would be
+ * <em>O(n^2)</em> in the number of elements.  A more performant approach would be to accumulate the results into
+ * a {@link StringBuilder}, which is a mutable container for accumulating strings.  We can use the same technique
+ * to parallelize mutable reduction as we do with ordinary reduction.
+ *
+ * <p>The mutable reduction operation is generally referred to as {@link Stream#collect collect}, as it collects
+ * together the desired results into a result container such as {@code StringBuilder}.  A {@code collect} operation
+ * requires three things: a factory function which will construct new instances of the result container, an
+ * accumulating function that will update a result container by incorporating a new element, and a combining function
+ * that can take two result containers and merge their contents.  The form of this is very similar to the general
+ * form of ordinary reduction:
+ * <pre>
+ *     <R> R collect(Supplier<R> resultFactory,
+ *                   BiConsumer<R, ? super T> accumulator,
+ *                   BiConsumer<R, R> combiner);
+ * </pre>
+ * As with {@code reduce()}, the benefit of expressing {@code collect} in this abstract way is that it is directly
+ * amenable to parallelization: we can accumulate partial results in parallel and then combine them.  For example,
+ * to collect the string representations of the elements in a stream into an {@code ArrayList}, we
+ * could write the obvious sequential for-each form:
+ * <pre>
+ *     ArrayList<String> strings = new ArrayList<>();
+ *     for (T element : stream) {
+ *         strings.add(element.toString());
+ *     }
+ * </pre>
+ * Or we could use a parallelizable collect form:
+ * <pre>
+ *     ArrayList<String> strings = stream.collect(() -> new ArrayList<>(),
+ *                                                (c, e) -> c.add(e.toString()),
+ *                                                (c1, c2) -> c1.addAll(c2));
+ * </pre>
+ * or more succinctly:
+ * <pre>
+ *     ArrayList<String> strings = stream.map(Object::toString)
+ *                                       .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
+ * </pre>
+ * Here, our supplier is just the {@link ArrayList#ArrayList() ArrayList constructor}, the accumulator adds
+ * the stringified element to an {@code ArrayList}, and the combiner simply uses {@link ArrayList#addAll addAll}
+ * to copy the strings from one container into the other.  As with the regular reduction operation, the ability to
+ * parallelize only comes if the associativity conditions are met.
+ *
+ * <p>The three aspects of {@code collect}: supplier, accumulator, and combiner, are often very tightly coupled, and
+ * it is convenient to introduce the notion of a {@link Collector} as being an object that embodies all three aspects.
+ * There is a {@link Stream#collect(Collector) collect} method that simply takes a {@code Collector} and returns
+ * the resulting container.  The above example for collecting strings into a {@code List} can be written as:
+ * <pre>
+ *     List<String> strings = stream.map(Object::toString)
+ *                                  .collect(Collectors.toList());
+ * </pre>
+ *
  * <a name="Associativity"><h2>Associativity</h2></a>
  *
  * An operator or function {@code op} is <em>associative</em> if the following holds:
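As an illustration of the two-argument {@code reduce} form described in the patch above, here is a minimal, self-contained sketch (the class name and sample data are invented for this example):

```java
import java.util.Arrays;
import java.util.List;

public class ReduceSum {
    // Two-argument reduce: 0 is the identity, Integer::sum the accumulator.
    static int sum(List<Integer> numbers) {
        return numbers.stream().reduce(0, Integer::sum);
    }

    // The same reduction with an explicit lambda, run on a parallel stream;
    // the result is identical because addition is associative.
    static int parallelSum(List<Integer> numbers) {
        return numbers.parallelStream().reduce(0, (x, y) -> x + y);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(3, 1, 4, 1, 5);
        System.out.println(sum(numbers));          // prints 14
        System.out.println(parallelSum(numbers));  // prints 14
    }
}
```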
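The sum-of-weights example appears in the patch in both its map-reduce form and its generalized three-argument form; the following sketch exercises both (the {@code Block} class here is a hypothetical stand-in, as the spec text never defines it):

```java
import java.util.Arrays;
import java.util.List;

public class WeightSum {
    // Hypothetical Block class with the getWeight accessor the spec assumes.
    static class Block {
        private final int weight;
        Block(int weight) { this.weight = weight; }
        int getWeight() { return weight; }
    }

    // Map-reduce form: extract each weight, then reduce with Integer::sum.
    static int viaMapReduce(List<Block> blocks) {
        return blocks.stream()
                     .map(Block::getWeight)
                     .reduce(0, Integer::sum);
    }

    // General three-argument form: identity, an accumulator that folds one
    // Block into a partial sum, and a combiner that merges two partial sums
    // (the combiner is what enables parallel execution).
    static int viaThreeArgReduce(List<Block> blocks) {
        return blocks.stream()
                     .reduce(0,
                             (sum, b) -> sum + b.getWeight(),
                             Integer::sum);
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(new Block(2), new Block(3), new Block(5));
        System.out.println(viaMapReduce(blocks));       // prints 10
        System.out.println(viaThreeArgReduce(blocks));  // prints 10
    }
}
```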
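The identity and compatibility laws quoted in the patch ({@code combiner(identity, u) == u} and {@code combiner(u, accumulator(identity, t)) == accumulator(u, t)}) can be checked mechanically for the sum reduction; a small sketch, with plain integers standing in for stream elements:

```java
import java.util.function.BinaryOperator;

public class ReduceLaws {
    static final BinaryOperator<Integer> COMBINER = Integer::sum;

    // accumulator(sum, t) for the sum example: fold element t into a partial sum.
    static int accumulate(int sum, int t) {
        return sum + t;
    }

    public static void main(String[] args) {
        int identity = 0;
        int u = 42, t = 7;

        // Identity law: combiner(identity, u) == u
        System.out.println(COMBINER.apply(identity, u) == u);  // prints true

        // Compatibility law: combiner(u, accumulator(identity, t)) == accumulator(u, t)
        System.out.println(
            COMBINER.apply(u, accumulate(identity, t)) == accumulate(u, t));  // prints true
    }
}
```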
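The string-concatenation contrast drawn in the Mutable Reduction section can also be run directly; this sketch shows both the O(n^2) ordinary reduction and the {@code StringBuilder}-based mutable reduction (method and class names are invented for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class ConcatDemo {
    // Ordinary reduction: correct, but each step copies the whole partial
    // string, so total work is quadratic in the output length.
    static String viaReduce(List<String> strings) {
        return strings.stream().reduce("", String::concat);
    }

    // Mutable reduction into a StringBuilder: elements are appended in place,
    // and two partial builders are merged with append, avoiding the copies.
    static String viaCollect(List<String> strings) {
        return strings.stream()
                      .collect(StringBuilder::new,
                               StringBuilder::append,
                               StringBuilder::append)
                      .toString();
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("a", "b", "c");
        System.out.println(viaReduce(strings));   // prints abc
        System.out.println(viaCollect(strings));  // prints abc
    }
}
```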
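Finally, the explicit supplier/accumulator/combiner {@code collect} form and its {@code Collector}-based shorthand from the patch can be compared side by side (class and method names are invented for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CollectStrings {
    // Explicit three-argument collect: supplier builds the container,
    // accumulator folds in one element, combiner merges two containers.
    static ArrayList<String> viaThreeArgCollect(List<Integer> values) {
        return values.stream()
                     .map(Object::toString)
                     .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    // The same reduction packaged as a Collector object.
    static List<String> viaCollector(List<Integer> values) {
        return values.stream()
                     .map(Object::toString)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println(viaThreeArgCollect(values));  // prints [1, 2, 3]
        System.out.println(viaCollector(values));        // prints [1, 2, 3]
    }
}
```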