1 Notes for tagged unions in the JVM
5 - near-complete coverage of jlong range, without gaps
6 - complete coverage of jdouble range (via longBitsToDouble)
7 - support object heap sizes 1000 .. 10^6 times larger than present
8 - fast conversion to/from reference
9 - fast conversion to/from jlong
10 - fast conversion to/from jdouble (via longBitsToDouble, etc.)
11 - allow code space (~ 33 bits) for primitive values other than longs and references
12 - portable specification of jlong range
13 - portable specification of jdouble range (except perhaps for NaN bits)
14 - for non-float values, occupy unused jdouble code points (there are 2^53 - 3)
15 - if data loss occurs (because of silent truncation), magnitudes must be preserved
17 Key architectural constant: MAX_UNION_LONG_VALUE = 0x7ff0_000000000000.
18 This value is chosen so that there are enough code points to represent all doubles.
19 All values in MAX_UNION_LONG_VALUE+1 to Long.MAX_VALUE are reserved for references.
20 This is true regardless of the internal representation of the tagged union.
21 The internal representation will in general have unused code-points.
23 The relevant user-visible ranges of primitives are thus:
24 0x8000_000000000000 .. 0x7ff0_000000000000 : partial jlong, full jdouble
25 0x7ff0_000000000001 .. 0x7fff_fffffffffffe : illegal values (never encoded or decoded)
26 0x7fff_ffffffffffff : sentinel value (never encoded)
28 The sentinel value (Long.MAX_VALUE) may be returned from the non-signaling
29 decode operations, unionToLongSafe and unionToDoubleSafe.
31 Key implementation choice: Which type is favored by being stored natively?
32 If longs, we say "long-native". This also happens to be "double-native",
33 since all points above MAX_UNION_LONG_VALUE encode the single NaN value.
34 Decoding a long-native representation to a reference costs a test and add.
36 If addresses are stored natively (with padding), we say "address-native".
37 If addresses occupy one 50-bit range, decoding a long costs a test and add.
38 Address-native is the only representation that works well with Hotspot's "oops_do" APIs.
40 If the padding is all zero, we say "positive-address-native".
41 In this case, the representation is not "double-native", since native references
42 when loaded as doubles represent denormalized numbers or zero.
44 The bias for decoding must be such that for a pointer p, p+encBias is somewhere
45 in the illegal jlong value range. Of course decBias = uint64_t(-encBias).
47 Preferred "positive-address-native" representation as padded address (jlong/intptr_t):
48 0x8000_000000000000 .. 0xfff1_000000000000 : biased jlong/jdouble
49 0xfff1_000000000001 .. 0xfffe_ffffffffffff : unused code points (never stored)
50 0xffff_000000000000 .. 0xffff_ffffffffffff : natural negative address (2^48-1 code points)
51 0x0000_000000000000 : natural null address
52 0x0000_000000000001 .. 0x0000_ffffffffffff : natural positive address (2^48-2 code points)
53 0x0001_000000000000 .. 0x7fff_ffffffffffff : biased jlong/jdouble
55 The encoding and decoding biases are 0x8001_000000000000 and 0x7fff_000000000000.
56 0x8000_000000000000 .. 0x7ff0_000000000000 : partial jlong (decoded)
57 0x0001_000000000000 :: 0xfff1_000000000000 : encoded (wrapped range)
58 0xfff1_000000000001 .. 0x0000_ffffffffffff : non-jlong code points (range complement)
60 If pointer signs are negative, the representation is "negative-address-native".
61 In this case, the pointer padding is not zeroes but ones, and the bias differs.
62 The null address (all zero bits) must be adjoined to the negative range after -1.
64 (The "negative-address-native" representation can also be "double-native",
65 since such values are NaNs. They include small-magnitude negative jlongs
66 like jlong(-1), and so are are key values of the jlong range.
67 Unless long and double bit patterns are decoupled, treating such values
68 as native doubles does not pay off. Solaris is notable for allocating
69 negative addresses to managed heaps. Most OSs allocate positive addresses.
70 Also, all 32-bit addresses can be treated as positive regardless of bit 31.)
73 BasicType: T_UNION = 3
75 occupies two stack slots, like long and double
76 garbage collector can find embedded references, on stack/locals/registers/heap
78 API in package sun.misc:
80 public static final long MAX_UNION_LONG_VALUE = 0x7ff0_000000000000;
81 assert(longBitsToDouble(MAX_UNION_LONG_VALUE) == POSITIVE_INFINITY);
83 public static final long NO_UNION_LONG_VALUE = 0x7fff_ffffffffffff;
84 // NO_UNION_LONG_VALUE is similar in magnitude to MAX_UNION_LONG_VALUE
85 assert(NO_UNION_LONG_VALUE == Long.MAX_VALUE);
86 // it is also a NaN, when viewed as a double
87 assert(isNan(longBitsToDouble(NO_UNION_LONG_VALUE)));
89 public static boolean isReference(U<?> x) { ... }
91 public static U<?> longToUnion(long x) throws IllegalArgumentException { encodes the long }
92 public static U<?> doubleToUnion(double x) { encodes the double (no error) }
93 public static <T> U<T> referenceToUnion(T x) { encodes the reference (no error) }
95 public static long unionToLong(U<?> x) throws IllegalUnionException { returns long value }
96 public static double unionToDouble(U<?> x) throws IllegalUnionException { returns double value }
97 public static <T> T unionToReference(U<T> x) throws IllegalUnionException { returns reference }
99 public static long unionToLongSafe(U<?> x) { returns long value else NO_UNION_LONG_VALUE }
100 public static double unionToDoubleSafe(U<?> x) { returns double value else NaN }
101 public static <T> T unionToReferenceSafe(U<T> x) { returns reference else null }
103 class IllegalUnionException extends RuntimeException { ... }
105 Details of representation (assuming decoding and encoding biases above):
107 private final long __ENCODING_BIAS = 0x8001_000000000000;
108 private final long __DECODING_BIAS = 0x7fff_000000000000;
109 assert((long)(__ENCODING_BIAS + __DECODING_BIAS) == 0);
111 private final long __ADDRESS_LIMIT = 0x0001_000000000000;
112 private final bool __NEGATIVE_ADDRESSES = true; // can set false in ILP32, Windows, etc.
114 int64_t union_is_object(int64_t ux) {
115 bool r0 = (ux + __DECODING_BIAS) > MAX_UNION_LONG_VALUE;
116 bool r1 = ((uint64_t)ux >> 48 == 0x0000
117 || (__NEGATIVE_ADDRESSES && (uint64_t)ux >> 48 == 0xffff));
118 bool r2 = ((uint64_t)ux < (uint64_t)__ADDRESS_LIMIT
119 || (__NEGATIVE_ADDRESSES && (uint64_t)ux > -__ADDRESS_LIMIT));
120 bool r3 = ((uint64_t)ux + (__NEGATIVE_ADDRESSES ? __ADDRESS_LIMIT : 0)
121 < (uint64_t)__ADDRESS_LIMIT);
122 assert(r0 == r1 && r1 == r2 && r2 == r3);
126 jlong union_to_jlong(int64_t ux, bool safe = false) {
127 jlong x = ux + __DECODING_BIAS;
128 if (x <= MAX_UNION_LONG_VALUE) {
131 if (!safe) throw new IllegalUnionException("unexpected reference");
132 return NO_UNION_LONG_VALUE;
135 int64_t jlong_to_union(jlong x, bool safe = false) {
136 if (x <= MAX_UNION_LONG_VALUE)
137 return x + __ENCODING_BIAS;
138 if (!safe) throw new IllegalUnionException("unexpected reference");
139 // convert to null reference
140 assert(union_to_object(0) == (object)null);
144 jdouble union_to_jdouble(int64_t ux, bool safe = false) {
145 return jdouble_cast(union_to_jlong(ux, safe));
148 int64_t jdouble_to_union(jdouble dx) {
149 bool safe = true; // always safe to yield a NaN
150 return jlong_to_union(jlong_cast(dx, safe));
153 object union_to_object(int64_t ux, bool safe = false) {
154 if (union_is_object(ux))
155 return object_cast((uintptr_t)(uint64_t) ux);
156 if (!safe) throw new IllegalUnionException("unexpected non-reference");
160 int64_t object_to_union(object rx) {
161 // this is an error-free operation; encode all references, including null
162 return object_uncast(rx);
165 object object_cast(uintptr_t x) {
167 assert((object)x == (object)null);
169 x *= OBJECT_HEAP_SCALE; // might be a no-op
170 x += OBJECT_HEAP_BASE; // might be a no-op
175 uintptr_t object_uncast(object rx) {
176 uintptr_t x = ((uintptr_t)rx);
177 if (rx == (object)null) {
180 x -= OBJECT_HEAP_BASE; // might be a no-op
181 x /= OBJECT_HEAP_SCALE; // might be a no-op
182 assert(x > 0 && x < __ENCODING_BIAS);
187 typedef void oops_do_function(object* x);
188 void union_oops_do(int64_t *field, oops_do_function* fn) {
189 // if primitive or null, return immediately
190 if (!union_is_object(*field) || (*field) == 0) return;
191 object* objAddr = (object*) field;
192 if (sizeof(object) != sizeof(int64_t) && __BIG_ENDIAN)
193 objAddr = (object*) (field+1) - 1;
197 FTR, relevant ranges which are native to the double format:
198 0x0000_000000000000 .. 0x7fef_ffffffffffff : positive rationals
199 0x7ff0_000000000000 : positive infinity
200 0x7ff0_000000000001 .. 0x7ff7_ffffffffffff : nonstandard NaNs
201 0x7ff8_000000000000 : standard NaN (for Java)
202 0x7ff8_000000000001 .. 0x7fff_ffffffffffff : nonstandard NaNs
203 0x8000_000000000000 : negative zero
204 0x8000_000000000001 .. 0xffef_ffffffffffff : negative rationals
205 0xfff0_000000000000 : negative infinity
206 0xfff0_000000000001 .. 0xffff_ffffffffffff : nonstandard NaNs
208 All NaN ranges are available as code points for non-double values.
209 Only one NaN code point needs to be reserved to represent all NaNs in common.
211 Non-preferred "long-native" representation using jlong (int64_t):
212 0x8000_000000000000 .. 0x7ff0_000000000000 : native partial jlong; native full jdouble
213 0x7ff0_000000000001 .. 0x7fff_ffffffffffff : biased reference (2^52 - 1 code points)
214 This is non-preferred because the biased references are hard for the Hotspot GC to process.
216 Non-preferred "unsigned-long-native" representation using non-standard type "julong" (u8/uint64_t)
217 0x0000_000000000000 .. 0xfff0_000000000000 : partial unsigned long; full jdouble
218 0xfff0_000000000001 .. 0xffff_ffffffffffff : biased reference (2^52 - 1 code points)
219 This is worse than "long-native" because it opens a hole in the middle of the jlong range.