5 - MH instantiation is slow (native calls to set up assembly stubs).
6 - exact MH invocation is slow (no good compiled code path).
7 - MH compilation work is not shared among similar MHs.
8 - Multi-bound MH layout is not compact.
9 - Ricochet frames introduce a significant porting and maintenance burden.
12 - Make MH instantiation be lightweight symbolic composition.
13 - Create optimized invocation paths only after "sufficient" invocation (cf. hot-spot treatment of methods).
14 - Collect bound values at the root level of MH structure (for eventual use by fast path).
15 - Reuse/cache common code structure (modulo bound values).
16 - Have MH.invoke methods include an extra level of recursion, in all but the most optimized states.
20 header : usual JVM header
22 form : immutable behavior (type `LambdaForm`)
23 form.exactInvoker : mutable (optimizable) entry point for invoking the MH
24 data... : immutable values, inline
27 The various MH constructors initialize the type, form, and data to permanent values.
28 The type, form, and data determine the semantics of the MH.
29 The type, form, and data are disjoint (as a rule, when convenient to the implementation).
31 The form's `exactInvoker` field is set to an initial linking routine.
32 This field may be amended later to an optimized routine.
33 There are no race conditions, since the exactInvoker field always contains
34 a valid entry point for the given (immutable) type, form, and data.
36 When the form.exactInvoker field is executed, the MH reference is in leading position (e.g., register rcx).
37 Therefore, any machine code is reached by the MH code address is able to obtain the MH type, form, and data.
38 Code for a given "form" is optimized, but operates on the type and data as free variables (extra arguments).
40 Composition is performed by creating a new version of the MH with an adjusted MH.form field.
41 The new MH may also have additional (or completely different) data fields.
43 Additional fields may be worth adding:
44 - Have a code field directly in the MH itself, which is a copy (or specialization) of form.exactInvoker.
45 - Have additional invoker paths for the `LambdaForm`, such as `genericInvoker`, `spreadInvoker`, `partialInvoker`.
47 ### old MH layout _(as of JDK7 FCS)_
48 Same as above, with the following exceptions:
50 - no "form" field (no symbolic representation, which limits sharing)
51 - adapter or bound MH contains a tail-called "vmtarget" (sharing of executable nodes, only)
52 - data fields are limited to a single reference (a common case, but requires boxing, cascading, etc.)
53 - code field is called "vmentry", and various among a small (~100) set of predefined combinators
54 - code field does not support compiled fast path (NYI)
55 - composition is performed by tail-call chaining, with incremental argument transformations
57 Having the code field be a raw machine-level pointer meant that the Java code
58 for creating method handles had to include native calls, which tend to be slow.
60 ### uses of metadata in new data structures
62 In the new design raw machine-level pointers are limited to the internal
63 type `MemberName`. Member names must be resolved by a reflective query to
64 the JVM, which involves native calls.
66 The machine-level pointer in a member name is called `vmtarget`, and it
67 is of (internal) type `methodOop`. There is also an internal integer
68 field called `vmindex`, used to store a vtable or itable index.
70 (Member names can also refer to fields, in which case `vmtarget` is
71 of type `klassOop`, and `vmindex` is a field offset.)
73 A `MemberName` reference can be bound to a method handle or other structure
74 (`LambdaForm.exactInvoker`) with a simple Java-level field assignment.
76 Unlike method handles, member names are relatively easy to cache, and
77 do not need to be created frequently. They are associated with uses
78 of the `MethodHandles.Lookup` API, which is advertised as "reflective".
80 Member names are also used by lazy bytecode spinning mechanisms which
81 create optimized invokers for "mature" or "warm" lambda forms.
82 The internal method oops created in this way are subject to further
83 profiling and optimization by the JVM, just as for user-level methods.
85 ## MH exact invocation paths
87 Here are the paths to aim at. All other paths can be comparatively slow,
88 as long as the fast paths are attained in warm code.
90 (Also, if families of similar method handles are created, warming some
91 of them should allow all the future ones to share the optimized code.
92 The later method handles in the same family should quickly acquire the
93 code. This requires that the forms be built with structure sharing,
94 or acquire structure sharing over time, and that the shared structures
95 be mappable to the optimized invocation paths.)
97 ### compiled fast path
99 - place method handle in rcx (standard register for first reference argument)
100 - preserve stack pointer (spill rsp to callee-saved rbp)
101 - materialize expected MH.type (as constant)
102 - verify `rcx.type` (slowly throw error if mismatched)
103 - load `rcx.form.exactInvoker.vmtarget` (three indirections to get to methodOop)
104 - jump directly to `_from_compiled_entry` of the `LambdaForm.exactInvoker` method
108 - enter nmethod code for `LambdaForm.exactInvoker` method (`L0 == rcx == the MH`)
109 - shuffle arguments as needed (load `MH.data[*]` and/or `MH.type[*]`)
110 - tail-call eventual target, return a value, etc...
112 - restore stack pointer (reload rsp from rbp)
114 In JDK7 FCS, the compiled path is longer:
116 - place method handle in rcx (standard register for first reference argument)
117 - preserve stack pointer (spill rsp to callee-saved rbp)
118 - jump to compiled entry point of the MH.invokeExact method
119 - execute a C2I adapter: extend the stack, rebuild argument list for interpreter
120 - jump to interpreter entry point of the same MH.invokeExact method
121 - perform interpreter-mode MH invocation sequence (see below)
123 ### MH exact invocation, *interpreted path*
125 - preserve stack pointer (spill rsp to rsi, as for all interpreter calls)
126 - materialize expected MH.type (from MH.invokeExact methodOop in rbx)
127 - load method handle from stack into rcx (using stack depth obtained from MH.type)
128 - verify `rcx.type` (slowly throw error if mismatched)
129 - load `rcx.form.exactInvoker.vmtarget` (three indirections to get to methodOop)
130 - jump directly to `_from_interpreted_entry` of the `LambdaForm.exactInvoker` method
132 After the jump, if the lambda form is not yet mature:
134 - enter interpreted or compiled code for common JVM bytecoded method (`L0 == rcx == the MH`)
135 - execute generic AST-walking interpreter for LFs
137 After the jump, if the lambda form is mature and has been given customized `exactInvoker`:
139 - enter interpreted or compiled code for customized JVM bytecoded method (`L0 == rcx == the MH`)
140 - shuffle arguments as needed (load `MH.data[*]` and/or `MH.type[*]`)
141 - tail-call eventual target, return a value, etc...
145 The field `LambdaForm.exactInvoker` points to a `MemberName` that is
146 (at all times) capable of executing the semantics of the lambda form.
148 Initially, this method is an AST-walking interpreter which is shared
149 by all LF's of the same basic type signature.
151 The LF's exact invoker is created as a static bytecoded method.
152 Its first argument is the MH itself.
154 The behavior of the invoker is determined by the lamba form itself.
155 Typically, it is at least partially optimized,
156 by rendering the contents of the form as bytecodes.
158 Initially, the invoker may be unspecialized to the form's body, in which
159 case it must interpret the structure for the form, as a non-constant
160 parameter (on the same footing as the MH type and data).
161 Such invokers can be show great laziness, generating more optimized
162 forms only after complex sharing heuristics are consulted.
164 In principle, and in the reverse direction, the invoker may be
165 specialized to information in the MH beyond the form. For example,
166 the MH type field can be folded into the invoker as a constant,
167 or one or more MH data elements (or merely their types) can be
168 folded in. This would require a new code invoker field
169 in method handles, or some other sort of multiway dispatch
170 on method handle invocation.
172 The current case is that the invoker depends on only the form, and
173 functionally replaces it. The fully optimized code of the invoker
174 will unpack the MH type and data into additional arguments, and will
175 ignore the form. (I.e., no AST walking.)
177 ### direct MH invoker
178 For a direct method handle to the static method `C#m(A)`,
179 there are no data fields, just a type and form.
180 The invoker is therefore:
182 (mh, A a)->{ignore mh; C.m(a)}
184 A direct method handle to something other than a static method
185 is exactly analogous.
187 ### bound MH invoker to virtual
188 For a bound method handle to a virtual method `c#m(A)`,
189 there is one data field of type `C`, plus the type and form.
190 The invoker is therefore:
192 (mh, A a)->{C c = mh.data_c; c.m(a)}
194 A bound method handle to another kind of direct method handle
195 is exactly analogous.
197 ### multiply-bound MH invoker
198 If a method handle is bound several times, or bound to a complex
199 adapted method handle, the newly bound value is accumulated into
200 a copy of the target method handle, and the invokers are composed.
201 The old invoker is adjusted to take its parameter values from the
202 new (copied) method handle.
204 Here is a doubly-bound method handle to a static method `C#m(P,Q,A)`:
206 (mh, A a)->{P p = mh.data_p; Q q = mh.data_q; C.m(p, q, a)}
208 More generally, a binding on top of a complex method handle looks
211 (mh, A a)->{P p = mh.data_p; target.invoker(mh, p, a)}
213 Note that mh need not contain a pointer to its target, as long as
214 the target's invoker is known (or can be extracted from mh).
215 In fact, assuming the target's invoker depends only on the
216 target's form, and the new bound handle's form is derived from
217 the target's form, the target's invoker is know from the
218 new bound handle's own form, and can be compiled directly into
219 the new bound handle's invoker.
221 There is an additional trick assumed by the code above.
222 The new bound method is passed to the target's invoker,
223 even though the target invoker expects the target.
224 This trick only works if the target and the new bound
225 handle supply the identical values required by the target
226 invoker. This will be the case only if (a) the bound
227 handle copies all data values from the target handle
228 that the target invoker requires, (b) the `data_p`
229 field does not overlap with any of those fields,
230 and (c) any use of the target's type field by the
231 target invoker is consistent with the bound handle's
234 If this trick fails, the target invoker can (and must)
235 be inlined into the new invoker.
237 ### return adapter MH invoker
238 A adapter method handle which adapts its return value can be
239 viewed as a simple composition of a return transform `#r` on
240 top of the original target method handle. The adapted method
241 handle is a copy of the original target method, with a different
242 form, whose invoker is therefore:
244 (mh, A a)->{ x=target.invoker(mh, a); r(x) }
246 This assumes that the return transform is a simple static method.
248 ### composed method handle invoker
249 The composition of two arbitrary method handles requires
250 a composed invoker which can emulate the invokers of the
251 two component method handles, and requires that all bound
252 values in both component method handles be available in
253 composite method handle.
255 There are several ways to implement this. The composite
256 handle can be populated with all data values from the
257 components, or (at the other extreme) the composite
258 handle can have two data pointers, one to each component.
260 A mixed approach seems promising, where one handle,
261 viewed as the nominal "target", is copied into the
262 composite. The other, viewed as the nominal "transform",
263 is inlined into the invoker as much as possible,
264 and any remaining parameters are added to the
265 target, probably with a level of indirection.
269 A form is simply a symbolic representation of an invoker.
271 The form schema is competent to specify all MH transforms.
276 function: method | intrinsic | lambda
277 method: (CONSTANT_MethodHandle instance)
278 intrinsic: (ad hoc intrinsic, such as iadd)
279 lambda: formals returns '->' body
280 formals: '(' LIST[formal] ')'
281 returns: ':' (basictype | 'void')
282 body: '{' (temp? call ';')* call '}'
284 call: function '(' LIST[argument] ')'
285 LIST[E]: (E (',' E)*)?
286 formal: basictype vardef
287 basictype: 'int' | 'long' | 'float' | 'double' | 'reference'
288 vardef: (nothing; vars are lexically numbered)
289 argument: varref | constant | MAYBE_LATER[function]
290 varref: (a natural number; the lexical index of var; no uplevel refs)
291 constant: (any constant pool constant or "live" reference)
292 MAYBE_LATER[E]: (maybe E, later)
294 Calls cannot be nested. The final call is always treated as a tail-call.
296 The incoming formal parameters are numbered starting at zero.
297 The temporary variables (if any) are numbered immediately after.
298 Variable references are only relative to the immediately enclosing lambda.
300 The MH parameter (passed to the invoker) is conventionally the first formal parameter.
301 It is not otherwise distinguished in this grammar. Bound values are extracted
302 via field references, which are encoded as calls to direct method handle constants.
304 (Also, this formalism might be useful for spinning JDK 8 closures.
305 In that case, the first argument would be a non-MH closure object.)
307 If lambdas are allowed as literals, they will be interpreted as pre-optimized
308 direct method handle references. There is presently no need to define internal
309 "frames" or bound value displays, other than the data fields of the MH itself.
311 The basic type structure of a lambda type is a list of simplified "basic types",
312 similar to an erased `MethodType`. Subword types like `boolean` are
313 represented as `int` and all references map to `Object`.
315 Although this erasing means that unsafe code can be constructed,
316 it allows much greater code sharing among lambda forms, and therefore
317 much less reliance on dynamic creation of bytecodes.
318 In particular, many direct method handles can be implemented by
319 a single lambda form, as long as they all agree on a common
320 arity and use of integral or floating primitives.
322 (Compare this with the reflection implementation of Java SE, which
323 must spin one internal bytecoded method per distinct Java method
324 that will be invoked.)
326 Lambda forms are required and trusted to preserve static type safety with
327 respect to the full set of MethodType signatures.
329 The raw structure of a lambda body is a list of calls.
330 The raw structure of each call is a list of functions and arguments.
331 Thus, a lambda body is, in its raw structure, a ragged array of values.
333 The classes representing these will be something like
334 `FunctionForm`, `LambdaForm`, `CallForm`, `ArgumentForm`.
336 The form types will be private to the JVM implementation.
337 Forms will be trusted and (usually) not verified.
339 The typing of intermediate values will use only basic types, excluding
340 the subword types (boolean, byte, char, short).
342 This representation is designed to be as simple as possible, so that it can be
343 readily constructed, walked, memoized, and optimized to bytecode.
345 ### relation to JDK 7 implementation
347 The supported MH transforms include the internal nodes of the JDK 7 FCS
348 representation. They relate to the new symbolic form roughly as follows:
351 <tr><th>operation</th> <th>invoker component</th></tr>
352 <tr><td>direct MH</td> <td>JVM <code>CONSTANT_MethodHandle</code></td></tr>
353 <tr><td>bound MH</td> <td>field load (of <code>MH.data_N</code>)</td></tr>
354 <tr><td>profiling MH</td> <td>field load/store (of <code>CMH.data_vmcount</code>)</td></tr>
355 <tr><td>retype AMH</td> <td>none (invokers are trusted)</td></tr>
356 <tr><td>checkcast AMH</td> <td>invoke of Class.cast</td></tr>
357 <tr><td>prim/ref conversion AMH</td> <td>invoke of conversion method</td></tr>
358 <tr><td>permuting AMH</td> <td>explicit numbering of component inputs</td></tr>
359 <tr><td>collect AMH</td> <td>invoke of array constructor</td></tr>
360 <tr><td>spread AMH</td> <td>invoke of element accessors</td></tr>
361 <tr><td>filter/fold AMH</td> <td>invoke of transform method handle</td></tr>
364 ### MemberName invokers
366 There are invokers for direct method handles, one for each basic type
367 signature and invocation mode:
369 dmh.invokeStatic(a*) => M(a*)
370 dmh.invokeVirtual(x,a*) => M(x.class)(x,a*)
371 dmh.invokeSpecial(x,a*) => M(x,a*)
372 dmh.invokeInterface(x,a*) => M(x.class)(x,a*)
374 In each case the invoked method M depends on the target method handle `dmh`,
375 which must be a `DirectMethodHandle` whose `member` field is a member name
376 that refers to a method of the compatible type.
378 (In the case of `invokeSpecial`, the member name can be a constructor also.)
380 For creating weakly typed method handle invocations from inside lambda forms,
381 there is also a signature-polymorphic method that acts as an unchecked form
384 mh.invokeBasic(a*) => mh.invokeExact(a*) (no WrongMethodTypeException)
386 As with `invokeExact`, the call site signature of `invokeBasic` must agree with
387 the target method handle. The agreement is loosened to basic type signature.
388 (I.e, references are erased to `Object` and subword types to `int`.)
389 Failure to agree is an error which can crash the JVM.
391 The Java Language Specification allows all these additional invocation methods
392 to be signature polymorphic. Note that they are *not* public.
394 The compilers are expected to be able to compile calls to these methods
395 efficiently, both with constant and non-constant MH targets.
396 In the case of a constant direct method handle for `invokeStatic`,
397 `invokeVirtual`, etc., the compiler should generate the same code
398 as if had encountered a normal symbolic reference to the `methodOop`
399 of the constant `MemberName`.
403 For each basic type signature sig:
405 invokeVarargs[sig] := lambda (mh, args) {
406 assert(mh.type.basicTypeSignature == sig);
407 assert(!mh.isVarargsInvoker);
408 Object arg_ref_i = ((Object[]) args )[i]; ...
409 T_i arg_i = convertAndUnbox[T_i]( mh.type.parameterType(i), arg_ref_i ); ...
410 assert(sig[i] == T_i); ...
411 T_r res = (T_r) mh.invokeExactTrusted(arg_i...);
412 Object res_ref = box[T_r](res);
418 Conversion functions (for arguments and return values) can be made via
419 MHs.identity and MH.asType.
421 In order to lift these to pure JVM methods, a reified type is sometimes
422 required as the leading argument. For uniformity we present the leading
423 argument even when it is not used.
425 convertAndUnbox[T in {int, long, float, double}] = lambda (_, Object x) : T {
426 MH conv = MHs.identity(T).asType(methodType(T, Object));
427 return (T) conv.invoke(x);
430 To deal with the representation of boolean/byte/short/char as int, an
431 additional cast may also be necessary.
433 convertAndUnbox[T in {byte, short, char}] = lambda (_, Object x) : int {
434 MH conv = MHs.identity(T).asType(methodType(T, Object));
435 MH rawconv = conv.asType(methodType(int, Object));
436 return (int) rawconv.invoke(x);
439 convertAndUnbox[T = boolean] = lambda (_, Object x) : int {
440 MH conv = MHs.identity(boolean).asType(methodType(boolean, Object));
441 MH rawconv = MHs.explicitCastArguments(conv, methodType(int, Object)); // handles boolean->int
442 return (int) rawconv.invoke(x);
445 For reference types, the reified target type must be dynamically specified:
447 convertAndUnbox[T <: Object] = lambda (Class T, Object x) : Object {
448 MH conv = MHs.identity(Object).asType(methodType(Object, T));
449 return (Object) conv.invoke(x);
452 This boils down to a simple cast:
454 convertAndUnbox[T <: Object] = lamba (T, x) (Object) T.cast(x)
455 convertAndUnbox[T <: Object] = Class::cast
457 In cases where the reference type is statically known to be exactly Object,
458 things can be much simpler:
460 convertAndUnbox[T = Object] = lambda (_, Object x) : Object { return x; }
461 convertAndUnbox[T = Object] = MHs.identity(Object).dropArguments(0, Class)
463 In cases where we are generating code for a basic type signature to be shared
464 by multiple MethodTypes, certain runtime information must be incrementally
465 available. This is why the extra argument is present. Each reference type
466 (except those somehow known in advance, e.g., as Object or Object[]) must be
467 reified dynamically. If it participates in reference casting only, it needs
468 to be combined with something as simple as Class::cast. If it participates in
469 primitive unboxing or boxing from an inexact reference type, something
470 trickier from ValueConversions is needed to manage primitive conversions on
471 the fly. Each integer subrange type (boolean/byte/short/char) needs to be
472 reified somehow also, enough to select one out of four possible cases.