changeset 7645:b4d0ae8455be

More spec on reduce in package-info
author briangoetz
date Thu, 14 Mar 2013 12:06:51 -0400
parents 3981ddbd7b5f
children 91ae0cb93f70
files src/share/classes/java/util/stream/package-info.java
diffstat 1 files changed, 96 insertions(+), 22 deletions(-) [+]
line wrap: on
line diff
--- a/src/share/classes/java/util/stream/package-info.java	Thu Mar 14 17:00:20 2013 +0100
+++ b/src/share/classes/java/util/stream/package-info.java	Thu Mar 14 12:06:51 2013 -0400
@@ -217,40 +217,52 @@
  * <h2><a name="Reduction">Reduction operations</a></h2>
  *
  * In simple terms, a <em>reduction</em> operation takes a stream of elements and processes them in a way
- * that reduces to a single value, such as finding the sum or maximum of a set of numbers. In more complex
+ * that reduces to a single value, such as finding the sum or maximum of a set of numbers. (In more complex
  * scenarios, the reduction operation might need to extract data from the elements before reducing that data
- * to a single value, such as finding the sum of weights of a set of blocks.  This would requireextracting the
- * weight from each block before summing up the weights.
+ * to a single value, such as finding the sum of weights of a set of blocks.  This would require extracting the
+ * weight from each block before summing up the weights.)
  *
  * <p>Of course, such operations can be readily implemented as simple sequential loops, as in:
  * <pre>
- *    int max = Integer.MIN_VALUE;
+ *    int sum = 0;
  *    for (int x : numbers) {
- *       if (x > max)
- *           max = x;
+ *       sum += x;
  *    }
  * </pre>
  * However, there may be a significant advantage to preferring a {@link Stream#reduce reduce operation} over
  * a mutative accumulation such as the above -- a properly constructed reduce operation is inherently parallelizable,
  * so long as the {@link BinaryOperator} has the right characteristics, specifically that it is
  * <a href="#Associativity">associative</a>.  For example, given a stream of numbers for which we want to
- * find the maximum, we can write:
+ * find the sum, we can write:
  * <pre>
- *    int max = numbers.reduce(Integer.MIN_VALUE, (x,y) -> x > y ? x : y );
+ *    int sum = numbers.reduce(0, (x,y) -> x+y );
  * </pre>
  * or more succinctly:
  * <pre>
- *    int max = numbers.reduce(Integer.MIN_VALUE, Integer::max);
+ *    int sum = numbers.reduce(0, Integer::sum);
  * </pre>
  *
- * Reduction parallellizes easily since the implementation of {@code reduce} can operate on subsets of the
+ * <p>(The primitive specializations of {@link java.util.stream.Stream}, such as {@link java.util.stream.IntStream},
+ * even have convenience methods for common reductions, such as {@link java.util.stream.IntStream#sum()} or
+ * {@link java.util.stream.IntStream#max()}, which are implemented as convenience wrappers around reduce.)
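+ *
+ * <p>For example, with a hypothetical {@code IntStream} named {@code intNumbers}, the sum can be written
+ * simply as:
+ * <pre>
+ *     int sum = intNumbers.sum();      // intNumbers is a hypothetical IntStream
+ * </pre>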
+ *
+ * <p>Reduction parallelizes well since the implementation of {@code reduce} can operate on subsets of the
  * stream in parallel, and then combine the intermediate results to get the final correct answer.  Even if you were
  * to use a parallelizable form of the {@link Stream#forEach forEach} method in place of the original for-each loop above, you would
- * still have to provide thread-safe updates to the shared accumulating variable {@code max}, and the required
+ * still have to provide thread-safe updates to the shared accumulating variable {@code sum}, and the required
  * synchronization would likely eliminate any performance gain from parallelism. Using a {@code reduce} method instead
  * removes all of the burden of parallelizing the reduction operation, and the library can provide an efficient
  * parallel implementation.
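+ *
+ * <p>To make the contrast concrete, here is a sketch of the shared-accumulator approach described above,
+ * with {@code java.util.concurrent.atomic.AtomicInteger} standing in for the required thread-safe update
+ * ({@code numbers} is assumed to be a stream):
+ * <pre>
+ *     AtomicInteger sum = new AtomicInteger();      // shared accumulator; updates must be thread-safe
+ *     numbers.forEach(x -> sum.addAndGet(x));       // safe, but contended when run in parallel
+ * </pre>
+ * The contention on {@code sum} is exactly the cost that the {@code reduce} formulation avoids.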
  *
+ * <p>A slightly more complicated example involves applying a transform to data elements before the reduction.
+ * If {@code blocks} is a collection of {@code Block} objects, which have a {@code getWeight} method, we can find
+ * the sum of the weights of all the blocks with:
+ * <pre>
+ *     int sumOfWeights = blocks.stream()
+ *                              .map(b -> b.getWeight())
+ *                              .reduce(0, Integer::sum);
+ * </pre>
+ *
  * <p>In its more general form, a {@code reduce} operation on elements of type {@code T} yielding a result of
  * type {@code U} requires three parameters:
  * <pre>
@@ -260,24 +272,21 @@
  * </pre>
  * Here, the <em>identity</em> element is both an initial seed for the reduction, and a default result if
  * there are no elements. The <em>accumulator</em> function takes a partial result and the next element, and
- * produce a new partial result. The <em>combiner</em> function combines combine the partial results of two
+ * produces a new partial result. The <em>combiner</em> function combines the partial results of two
  * accumulators to produce a new partial result, or eventually the final result.
  *
- * <p>In our {@code max} example above, {@code Integer.MIN_VALUE} was used the identity element, while
- * {@code Integer::max} was used as both the reducer and combiner.  For a more complicated example such as the
- * sum-of-weights, we would have:
+ * <p>This form is a generalization of the two-argument form, and is also a generalization of the map-reduce
+ * construct illustrated above.  If we wanted to re-cast the simple {@code sum} example using the more general form,
+ * {@code 0} would be the identity element, while {@code Integer::sum} would be both the accumulator and combiner.
+ * For the sum-of-weights example, this could be re-cast as:
  * <pre>
  *     int sumOfWeights = blocks.stream().reduce(0,
  *                                               (sum, b) -> sum + b.getWeight(),
  *                                               Integer::sum);
  * </pre>
- * This computation could also be expressed as:
- * <pre>
- *     int sumOfWeights = blocks.stream().map(b -> b.getWeight())
- *                                       .reduce(0, Integer::sum);
- * </pre>
- * though there are situations where the generality of the three-argument form may enable a more efficient
- * computation.
+ * though the map-reduce form is more readable and generally preferable.  The generalized form is provided
+ * for cases where significant work can be optimized away by combining mapping and reducing into a single
+ * function.
  *
  * <p>More formally, the {@code identity} value must be an <em>identity</em> for the combiner function.
  * This means that for all {@code u}, {@code combiner(identity, u)} is equal to {@code u}.
@@ -287,6 +296,71 @@
  *     combiner(u, accumulator(identity, t)) == accumulator(u, t)
  * </pre>
  *
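+ * For instance, for the sum-of-weights reduction above (identity {@code 0}, accumulator
+ * {@code (sum, b) -> sum + b.getWeight()}, combiner {@code Integer::sum}), both conditions are easy to check:
+ * <pre>
+ *     combiner(0, u) == 0 + u == u
+ *     combiner(u, accumulator(0, t)) == u + (0 + t.getWeight())
+ *                                    == u + t.getWeight()
+ *                                    == accumulator(u, t)
+ * </pre>
+ *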
+ * <h3><a name="MutableReduction">Mutable Reduction</a></h3>
+ *
+ * A <em>mutable</em> reduction operation is similar to an ordinary reduction, in that it reduces a stream
+ * of values to a single value, but instead of producing a distinct single-valued result, it
+ * mutates a general <em>result container</em>, such as a {@code Collection} or {@code StringBuilder}, as it
+ * processes the elements in the stream.
+ *
+ * <p>If we wanted to take a stream of strings and concatenate them into a single long string, we <em>could</em>
+ * achieve this with ordinary reduction:
+ * <pre>
+ *     String concatenated = strings.reduce("", String::concat);
+ * </pre>
+ *
+ * We would get the desired result, and it would even work in parallel.  However, we might not be happy about
+ * the performance!  Such an implementation would do a great deal of string copying, and the run time would be
+ * <em>O(n^2)</em> in the number of elements.  A more performant approach would be to accumulate the results into
+ * a {@link StringBuilder}, which is a mutable container for accumulating strings.  We can use the same technique
+ * to parallelize mutable reduction as we do with ordinary reduction.
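+ *
+ * <p>As a sketch, a sequential {@code StringBuilder} version (assuming, for illustration, that the
+ * strings are available in a collection {@code stringList}) might look like:
+ * <pre>
+ *     StringBuilder sb = new StringBuilder();
+ *     for (String s : stringList) {        // stringList is a hypothetical Collection&lt;String>
+ *         sb.append(s);                    // each string is copied once, rather than O(n^2) copying
+ *     }
+ *     String concatenated = sb.toString();
+ * </pre>
+ * The parallel-friendly way to express this kind of accumulation is the {@code collect} operation
+ * described below.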
+ *
+ * <p>The mutable reduction operation is generally referred to as {@link Stream#collect collect}, as it collects
+ * together the desired results into a result container such as {@code StringBuilder}.  A {@code collect} operation
+ * requires three things: a factory function which will construct new instances of the result container, an
+ * accumulating function that will update a result container by incorporating a new element, and a combining function
+ * that can take two result containers and merge their contents.  The form of this is very similar to the general
+ * form of ordinary reduction:
+ * <pre>
+ * &lt;R> R collect(Supplier&lt;R> resultFactory,
+ *                  BiConsumer&lt;R, ? super T> accumulator,
+ *                  BiConsumer&lt;R, R> combiner);
+ * </pre>
+ * As with {@code reduce()}, the benefit of expressing {@code collect} in this abstract way is that it is directly
+ * amenable to parallelization: we can accumulate partial results in parallel and then combine them.  For example,
+ * to collect the string representations of the elements in a stream and accumulate them into an {@code ArrayList}, we
+ * could write the obvious sequential for-each form:
+ * <pre>
+ *     ArrayList&lt;String> strings = new ArrayList&lt;>();
+ *     for (T element : stream) {
+ *         strings.add(element.toString());
+ *     }
+ * </pre>
+ * Or we could use a parallelizable collect form:
+ * <pre>
+ *     ArrayList&lt;String> strings = stream.collect(() -> new ArrayList&lt;>(),
+ *                                                   (c, e) -> c.add(e.toString()),
+ *                                                   (c1, c2) -> c1.addAll(c2));
+ * </pre>
+ * or more succinctly:
+ * <pre>
+ *     ArrayList&lt;String> strings = stream.map(Object::toString)
+ *                                          .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
+ * </pre>
+ * Here, our supplier is just the {@link ArrayList#ArrayList() ArrayList constructor}, the accumulator adds
+ * the stringified element to an {@code ArrayList}, and the combiner simply uses {@link ArrayList#addAll addAll}
+ * to copy the strings from one container into the other. As with the regular reduction operation, the ability to
+ * parallelize comes only if an associativity condition is met.
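+ *
+ * <p>Returning to the string concatenation example, a {@code StringBuilder}-based mutable reduction
+ * might be sketched as:
+ * <pre>
+ *     String concatenated = strings.collect(StringBuilder::new,      // supplier: a new, empty builder
+ *                                           StringBuilder::append,   // accumulator: append one string
+ *                                           StringBuilder::append)   // combiner: append one builder to another
+ *                                  .toString();
+ * </pre>
+ * which avoids the quadratic copying of the {@code String::concat} version above.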
+ *
+ * <p>The three aspects of {@code collect} -- supplier, accumulator, and combiner -- are often very tightly coupled, and
+ * it is convenient to introduce the notion of a {@link Collector} as an object that embodies all three aspects.
+ * There is a {@link Stream#collect(Collector) collect} method that simply takes a {@code Collector} and returns
+ * the resulting container.  The above example for collecting strings into a {@code List} can be written as:
+ * <pre>
+ *     List&lt;String> strings = stream.map(Object::toString)
+ *                                     .collect(Collectors.toList());
+ * </pre>
+ *
  * <a name="Associativity"><h2>Associativity</h2></a>
  *
  * An operator or function {@code op} is <em>associative</em> if the following holds: