tagu.txt
author jrose
Tue Sep 27 19:02:18 2011 -0700 (20 months ago)
changeset 379 0afdd88556bc
permissions -rw-r--r--
tagu: first experiment with 64-bit tagged unions
        1 Notes for tagged unions in the JVM
        2 John Rose, 9/2011
        3 
        4 Goals:
        5  - near-complete coverage of jlong range, without gaps
        6  - complete coverage of jdouble range (via longBitsToDouble)
        7  - support object heap sizes 1000 .. 10^6 times larger than present
        8  - fast conversion to/from reference
        9  - fast conversion to/from jlong
       10  - fast conversion to/from jdouble (via longBitsToDouble, etc.)
       11  - allow code space (~ 33 bits) for primitive values other than longs and references
       12  - portable specification of jlong range
       13  - portable specification of jdouble range (except perhaps for NaN bits)
       14  - for non-float values, occupy unused jdouble code points (there are 2^53 - 3)
       15  - if data loss occurs (because of silent truncation), magnitudes must be preserved
       16 
       17 Key architectural constant:  MAX_UNION_LONG_VALUE = 0x7ff0_000000000000.
       18 This value is chosen so that there are enough code points to represent all doubles.
       19 All values in MAX_UNION_LONG_VALUE+1 to Long.MAX_VALUE are reserved for references.
       20 This is true regardless of the internal representation of the tagged union.
       21 The internal representation will in general have unused code-points.
       22 
       23 The relevant user-visible ranges of primitives are thus:
       24   0x8000_000000000000 .. 0x7ff0_000000000000 : partial jlong, full jdouble
       25   0x7ff0_000000000001 .. 0x7fff_fffffffffffe : illegal values (never encoded or decoded)
       26   0x7fff_ffffffffffff                        : sentinel value (never encoded)
       27 
       28 The sentinel value (Long.MAX_VALUE) may be returned from the non-signaling
       29 decode operations, unionToLongSafe and unionToDoubleSafe.
       30 
       31 Key implementation choice:  Which type is favored by being stored natively?
       32 If longs, we say "long-native".  This also happens to be "double-native",
       33 since all points above MAX_UNION_LONG_VALUE encode the single NaN value.
       34 Decoding a long-native representation to a reference costs a test and add.
       35 
       36 If addresses are stored natively (with padding), we say "address-native".
       37 If addresses occupy one 50-bit range, decoding a long costs a test and add.
       38 Address-native is the only representation that works well with Hotspot's "oops_do" APIs.
       39 
       40 If the padding is all zero, we say "positive-address-native".
       41 In this case, the representation is not "double-native", since native references
       42 when loaded as doubles represent denormalized numbers or zero.
       43 
       44 The bias for decoding must be such that for a pointer p, p+encBias is somewhere
       45 in the illegal jlong value range.  Of course decBias = uint64_t(-encBias).
       46 
       47 Preferred "positive-address-native" representation as padded address (jlong/intptr_t):
       48   0x8000_000000000000 .. 0xfff1_000000000000 : biased jlong/jdouble
       49   0xfff1_000000000001 .. 0xfffe_ffffffffffff : unused code points (never stored)
       50   0xffff_000000000000 .. 0xffff_ffffffffffff : natural negative address (2^48-1 code points)
       51   0x0000_000000000000                        : natural null address
       52   0x0000_000000000001 .. 0x0000_ffffffffffff : natural positive address (2^48-2 code points)
       53   0x0001_000000000000 .. 0x7fff_ffffffffffff : biased jlong/jdouble
       54 
       55 The encoding and decoding biases are 0x8001_000000000000 and 0x7fff_000000000000.
       56   0x8000_000000000000 .. 0x7ff0_000000000000 : partial jlong (decoded)
       57   0x0001_000000000000 :: 0xfff1_000000000000 : encoded (wrapped range)
       58   0xfff1_000000000001 .. 0x0000_ffffffffffff : non-jlong code points (range complement)
       59 
       60 If pointer signs are negative, the representation is "negative-address-native".
       61 In this case, the pointer padding is not zeroes but ones, and the bias differs.
       62 The null address (all zero bits) must be adjoined to the negative range after -1.
       63 
       64 (The "negative-address-native" representation can also be "double-native",
       65 since such values are NaNs.  They include small-magnitude negative jlongs
       66 like jlong(-1), and so are are key values of the jlong range.
       67 Unless long and double bit patterns are decoupled, treating such values
       68 as native doubles does not pay off.  Solaris is notable for allocating
       69 negative addresses to managed heaps.  Most OSs allocate positive addresses.
       70 Also, all 32-bit addresses can be treated as positive regardless of bit 31.)
       71 
       72 JVM structure:
       73   BasicType: T_UNION = 3
       74   signature sytax: 'U'
       75   occupies two stack slots, like long and double
       76   garbage collector can find embedded references, on stack/locals/registers/heap
       77 
       78 API in package sun.misc:
       79 class Union {
       80   public static final long MAX_UNION_LONG_VALUE = 0x7ff0_000000000000;
       81   assert(longBitsToDouble(MAX_UNION_LONG_VALUE) == POSITIVE_INFINITY);
       82 
       83   public static final long NO_UNION_LONG_VALUE  = 0x7fff_ffffffffffff;
       84   // NO_UNION_LONG_VALUE is similar in magnitude to MAX_UNION_LONG_VALUE
       85   assert(NO_UNION_LONG_VALUE == Long.MAX_VALUE);
       86   // it is also a NaN, when viewed as a double
       87   assert(isNan(longBitsToDouble(NO_UNION_LONG_VALUE)));
       88 
       89   public static boolean isReference(U<?> x) { ... }
       90 
       91   public static U<?> longToUnion(long x) throws IllegalArgumentException { encodes the long }
       92   public static U<?> doubleToUnion(double x) { encodes the double (no error) }
       93   public static <T> U<T> referenceToUnion(T x) { encodes the reference (no error) }
       94 
       95   public static long unionToLong(U<?> x) throws IllegalUnionException { returns long value }
       96   public static double unionToDouble(U<?> x) throws IllegalUnionException { returns double value }
       97   public static <T> T unionToReference(U<T> x) throws IllegalUnionException { returns reference }
       98 
       99   public static long unionToLongSafe(U<?> x) { returns long value else NO_UNION_LONG_VALUE }
      100   public static double unionToDoubleSafe(U<?> x) { returns double value else NaN }
      101   public static <T> T unionToReferenceSafe(U<T> x) { returns reference else null }
      102 }
      103 class IllegalUnionException extends RuntimeException { ... }
      104 
      105 Details of representation (assuming decoding and encoding biases above):
      106 
      107 private final long __ENCODING_BIAS = 0x8001_000000000000;
      108 private final long __DECODING_BIAS = 0x7fff_000000000000;
      109 assert((long)(__ENCODING_BIAS + __DECODING_BIAS) == 0);
      110 
      111 private final long __ADDRESS_LIMIT = 0x0001_000000000000;
      112 private final bool __NEGATIVE_ADDRESSES = true;  // can set false in ILP32, Windows, etc.
      113 
      114 int64_t union_is_object(int64_t ux) {
      115   bool r0 = (ux + __DECODING_BIAS) > MAX_UNION_LONG_VALUE;
      116   bool r1 = ((uint64_t)ux >> 48 == 0x0000
      117              || (__NEGATIVE_ADDRESSES && (uint64_t)ux >> 48 == 0xffff));
      118   bool r2 = ((uint64_t)ux < (uint64_t)__ADDRESS_LIMIT
      119              || (__NEGATIVE_ADDRESSES && (uint64_t)ux > -__ADDRESS_LIMIT));
      120   bool r3 = ((uint64_t)ux + (__NEGATIVE_ADDRESSES ? __ADDRESS_LIMIT : 0)
      121              < (uint64_t)__ADDRESS_LIMIT);
      122   assert(r0 == r1 && r1 == r2 && r2 == r3);
      123   return r1;
      124 }
      125 
      126 jlong union_to_jlong(int64_t ux, bool safe = false) {
      127   jlong x = ux + __DECODING_BIAS;
      128   if (x <= MAX_UNION_LONG_VALUE) {
      129     return x;
      130   }
      131   if (!safe)  throw new IllegalUnionException("unexpected reference");
      132   return NO_UNION_LONG_VALUE;
      133 }
      134 
      135 int64_t jlong_to_union(jlong x, bool safe = false) {
      136   if (x <= MAX_UNION_LONG_VALUE)
      137     return x + __ENCODING_BIAS;
      138   if (!safe)  throw new IllegalUnionException("unexpected reference");
      139   // convert to null reference
      140   assert(union_to_object(0) == (object)null);
      141   return 0;
      142 }
      143 
      144 jdouble union_to_jdouble(int64_t ux, bool safe = false) {
      145   return jdouble_cast(union_to_jlong(ux, safe));
      146 }
      147 
      148 int64_t jdouble_to_union(jdouble dx) {
      149   bool safe = true;  // always safe to yield a NaN
      150   return jlong_to_union(jlong_cast(dx, safe));
      151 }
      152 
      153 object union_to_object(int64_t ux, bool safe = false) {
      154   if (union_is_object(ux))
      155     return object_cast((uintptr_t)(uint64_t) ux);
      156   if (!safe)  throw new IllegalUnionException("unexpected non-reference");
      157   return null;
      158 }
      159 
      160 int64_t object_to_union(object rx) {
      161   // this is an error-free operation; encode all references, including null
      162   return object_uncast(rx);
      163 }
      164 
      165 object object_cast(uintptr_t x) {
      166   if (x == 0) {
      167     assert((object)x == (object)null);
      168   } else {
      169     x *= OBJECT_HEAP_SCALE;  // might be a no-op
      170     x += OBJECT_HEAP_BASE;   // might be a no-op
      171   }
      172   return (object)x;
      173 }
      174 
      175 uintptr_t object_uncast(object rx) {
      176   uintptr_t x = ((uintptr_t)rx);
      177   if (rx == (object)null) {
      178     assert(x == 0);
      179   } else {
      180     x -= OBJECT_HEAP_BASE;   // might be a no-op
      181     x /= OBJECT_HEAP_SCALE;  // might be a no-op
      182     assert(x > 0 && x < __ENCODING_BIAS);
      183   }
      184   return x;
      185 }
      186 
      187 typedef void oops_do_function(object* x);
      188 void union_oops_do(int64_t *field, oops_do_function* fn) {
      189   // if primitive or null, return immediately
      190   if (!union_is_object(*field) || (*field) == 0)  return;
      191   object* objAddr = (object*) field;
      192   if (sizeof(object) != sizeof(int64_t) && __BIG_ENDIAN)
      193     objAddr = (object*) (field+1) - 1;
      194   fn(objAddr);
      195 }
      196 
      197 FTR, relevant ranges which are native to the double format:
      198   0x0000_000000000000 .. 0x7fef_ffffffffffff : positive rationals
      199   0x7ff0_000000000000                        : positive infinity
      200   0x7ff0_000000000001 .. 0x7ff7_ffffffffffff : nonstandard NaNs
      201   0x7ff8_000000000000                        : standard NaN (for Java)
      202   0x7ff8_000000000001 .. 0x7fff_ffffffffffff : nonstandard NaNs
      203   0x8000_000000000000                        : negative zero
      204   0x8000_000000000001 .. 0xffef_ffffffffffff : negative rationals
      205   0xfff0_000000000000                        : negative infinity
      206   0xfff0_000000000001 .. 0xffff_ffffffffffff : nonstandard NaNs
      207 
      208 All NaN ranges are available as code points for non-double values.
      209 Only one NaN code point needs to be reserved to represent all NaNs in common.
      210 
      211 Non-preferred "long-native" representation using jlong (int64_t):
      212   0x8000_000000000000 .. 0x7ff0_000000000000 : native partial jlong; native full jdouble
      213   0x7ff0_000000000001 .. 0x7fff_ffffffffffff : biased reference (2^52 - 1 code points)
      214 This is non-preferred because the biased references are hard for the Hotspot GC to process.
      215 
      216 Non-preferred "unsigned-long-native" representation using non-standard type "julong" (u8/uint64_t)
      217   0x0000_000000000000 .. 0xfff0_000000000000 : partial unsigned long; full jdouble
      218   0xfff0_000000000001 .. 0xffff_ffffffffffff : biased reference (2^52 - 1 code points)
      219 This is worse than "long-native" because it opens a hole in the middle of the jlong range.