Package com.fasterxml.jackson.core.sym
Class ByteQuadsCanonicalizer
- java.lang.Object
  - com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer
public final class ByteQuadsCanonicalizer extends java.lang.Object
Replacement for BytesToNameCanonicalizer that aims at more localized memory access by flattening name quad data. The performance improvement is modest for simple JSON document data binding (maybe 3%), but should help more for larger symbol tables, or for binary formats like Smile.
Hash area is divided into 4 sections:
- Primary area (1/2 of total size), direct match from hash (LSB)
- Secondary area (1/4 of total size), match from hash (LSB) >> 1
- Tertiary area (1/8 of total size), match from hash (LSB) >> 2
- Spill-over area (remaining 1/8) with linear scan, insertion order
Each entry occupies 4 ints, where 1 - 3 ints contain 1 - 12 UTF-8 encoded bytes of name (null-padded), and last int is offset in _names that contains actual name Strings.
- Since:
- 2.6
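To make the entry layout more concrete, the following minimal sketch packs the UTF-8 bytes of a short name into 1 - 3 ints. The class name QuadPackingSketch, the byte order within each int (first byte in the most significant position), and the zero padding on the right are assumptions made for illustration; the library's parsers may arrange the bytes differently.

```java
import java.nio.charset.StandardCharsets;

final class QuadPackingSketch {
    /**
     * Packs 1 - 12 UTF-8 bytes into 1 - 3 ints, four bytes per int.
     * Assumption: first byte goes into the most significant position,
     * and unused positions of the last int stay zero (null padding).
     */
    static int[] pack(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        if (utf8.length < 1 || utf8.length > 12) {
            throw new IllegalArgumentException("expected 1 - 12 bytes, got " + utf8.length);
        }
        int[] quads = new int[(utf8.length + 3) / 4];
        for (int i = 0; i < utf8.length; i++) {
            int shift = 24 - 8 * (i % 4);           // 24, 16, 8, 0 within each int
            quads[i / 4] |= (utf8[i] & 0xFF) << shift;
        }
        return quads;
    }
}
```

Under these assumptions, a 4-byte name such as "name" fits in a single int, while "hello" needs two ints with the second one padded.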
-
-
Field Summary
Fields (modifier and type, field, description):
- protected int _count: Total number of Strings in the symbol table; only used for child tables.
- protected boolean _failOnDoS: Flag that indicates whether we should throw an exception if enough hash collisions are detected (true); or just work around them (false).
- protected int[] _hashArea: Primary hash information area: consists of 2 * _hashSize entries of 16 bytes (4 ints), arranged in a cascading lookup structure (details of which may be tweaked depending on expected rates of collisions).
- protected boolean _hashShared: Flag that indicates whether underlying data structures for the main hash area are shared or not.
- protected int _hashSize: Number of slots for primary entries within _hashArea; which is at most 1/8 of actual size of the underlying array (4-int slots, primary covers only half of the area; plus, additional area for longer symbols after hash area).
- protected boolean _intern: Whether canonical symbol Strings are to be intern()ed before being added to the table or not.
- protected int _longNameOffset: Offset within _hashArea that follows main slots and contains quads for longer names (13 bytes or longer), and points to the first available int that may be used for appending quads of the next long name.
- protected java.lang.String[] _names: Array that contains String instances matching entries in _hashArea.
- protected ByteQuadsCanonicalizer _parent: Reference to the root symbol table, for child tables, so that they can merge table information back as necessary.
- protected int _secondaryStart: Offset within _hashArea where secondary entries start.
- protected int _seed: Seed value we use as the base to make hash codes non-static between different runs, but still stable for lifetime of a single symbol table instance.
- protected int _spilloverEnd: Pointer to the offset within spill-over area where there is room for more spilled over entries (if any).
- protected java.util.concurrent.atomic.AtomicReference<com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.TableInfo> _tableInfo: Member that is only used by the root table instance: root passes immutable state info to child instances, and children may return new state if they add entries to the table.
- protected int _tertiaryShift: Constant that determines size of buckets for tertiary entries: 1 << _tertiaryShift is the size, and shift value is also used for translating from primary offset into tertiary bucket (shift right by 4 + _tertiaryShift).
- protected int _tertiaryStart: Offset within _hashArea where tertiary entries start.
-
Method Summary
Methods (modifier and type, method, description):
- protected void _reportTooManyCollisions()
- java.lang.String addName(java.lang.String name, int q1)
- java.lang.String addName(java.lang.String name, int[] q, int qlen)
- java.lang.String addName(java.lang.String name, int q1, int q2)
- java.lang.String addName(java.lang.String name, int q1, int q2, int q3)
- int bucketCount()
- int calcHash(int q1)
- int calcHash(int[] q, int qlen)
- int calcHash(int q1, int q2)
- int calcHash(int q1, int q2, int q3)
- static ByteQuadsCanonicalizer createRoot(): Factory method to call to create a symbol table instance with a randomized seed value.
- protected static ByteQuadsCanonicalizer createRoot(int seed)
- java.lang.String findName(int q1)
- java.lang.String findName(int[] q, int qlen)
- java.lang.String findName(int q1, int q2)
- java.lang.String findName(int q1, int q2, int q3)
- int hashSeed()
- ByteQuadsCanonicalizer makeChild(int flags): Factory method used to create actual symbol table instance to use for parsing.
- boolean maybeDirty(): Method called to quickly check whether a child symbol table may have gotten additional entries.
- int primaryCount(): Method mostly needed by unit tests; calculates number of entries that are in the primary slot set.
- void release(): Method called by the using code to indicate it is done with this instance.
- int secondaryCount(): Method mostly needed by unit tests; calculates number of entries in secondary buckets.
- int size()
- int spilloverCount(): Method mostly needed by unit tests; calculates number of entries in shared spill-over area.
- int tertiaryCount(): Method mostly needed by unit tests; calculates number of entries in tertiary buckets.
- java.lang.String toString()
- int totalCount()
-
-
-
Field Detail
-
_parent
protected final ByteQuadsCanonicalizer _parent
Reference to the root symbol table, for child tables, so that they can merge table information back as necessary.
-
_tableInfo
protected final java.util.concurrent.atomic.AtomicReference<com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.TableInfo> _tableInfo
Member that is only used by the root table instance: root passes immutable state info to child instances, and children may return new state if they add entries to the table. Child tables do NOT use the reference.
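The hand-off between root and children can be pictured with a small sketch built on AtomicReference. SharedStateSketch, Snapshot, current() and mergeBack() are hypothetical names standing in for the real root table and its TableInfo; the merge rule shown (accept only a larger snapshot, and only if the root state has not changed in the meantime) is an assumption for illustration, not a description of the library's exact policy.

```java
import java.util.concurrent.atomic.AtomicReference;

final class SharedStateSketch {
    /** Immutable snapshot; a hypothetical stand-in for the real TableInfo. */
    static final class Snapshot {
        final int count;
        final String[] names;

        Snapshot(int count, String[] names) {
            this.count = count;
            this.names = names;
        }
    }

    // Only the root holds this reference; children work on snapshots.
    private final AtomicReference<Snapshot> state =
            new AtomicReference<>(new Snapshot(0, new String[0]));

    /** A child starts from whatever snapshot the root currently holds. */
    Snapshot current() {
        return state.get();
    }

    /**
     * A child offers its state back. The swap succeeds only if the child has
     * more entries and the root still holds the snapshot the child started from.
     */
    boolean mergeBack(Snapshot base, Snapshot updated) {
        if (updated.count <= base.count) {
            return false; // nothing new to share
        }
        return state.compareAndSet(base, updated);
    }
}
```

Because the snapshot is immutable and swapped atomically, readers never observe a half-updated table, and a child that loses the race simply leaves the root unchanged.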
-
_seed
protected final int _seed
Seed value we use as the base to make hash codes non-static between different runs, but still stable for lifetime of a single symbol table instance. This is done for security reasons, to avoid potential DoS attack via hash collisions.
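A seeded hash of a single quad might look roughly like the sketch below. The class name SeededHashSketch and the particular mixing steps are illustrative assumptions and should not be read as the exact computation performed by calcHash(int); the point is only that the same quad hashes differently under different seeds, while staying stable for a given seed.

```java
final class SeededHashSketch {
    /** Mixes a per-table seed into the hash of one quad (illustrative mixing only). */
    static int hash(int q1, int seed) {
        int h = q1 ^ seed;   // seed changes the result from run to run
        h += (h >>> 16);     // fold high bits into low bits
        h ^= (h << 3);       // spread low bits upward
        h += (h >>> 12);     // one more fold
        return h;
    }
}
```

An attacker who cannot predict the seed cannot precompute a set of names that all land in the same slot.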
-
_intern
protected boolean _intern
Whether canonical symbol Strings are to be intern()ed before being added to the table or not.
NOTE: non-final to allow disabling intern()ing in case of excessive collisions.
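The effect of the flag can be shown with plain String.intern(). InternSketch and canonical() are hypothetical names used only for this illustration.

```java
final class InternSketch {
    /** Returns the canonical instance, interning only while the flag is set. */
    static String canonical(String name, boolean intern) {
        return intern ? name.intern() : name;
    }
}
```

With interning enabled, the returned name is the same instance as an equal string literal, so callers can compare names by identity; with it disabled, the name is returned as given.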
-
_failOnDoS
protected final boolean _failOnDoS
Flag that indicates whether we should throw an exception if enough hash collisions are detected (true); or just work around them (false).
- Since:
- 2.4
-
_hashArea
protected int[] _hashArea
Primary hash information area: consists of 2 * _hashSize entries of 16 bytes (4 ints), arranged in a cascading lookup structure (details of which may be tweaked depending on expected rates of collisions).
-
_hashSize
protected int _hashSize
Number of slots for primary entries within _hashArea; which is at most 1/8 of actual size of the underlying array (4-int slots, primary covers only half of the area; plus, additional area for longer symbols after hash area).
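The size relationships can be worked through in a short sketch. HashAreaLayoutSketch and its members are hypothetical names; the arithmetic follows the 1/2, 1/4, 1/8, 1/8 split given in the class description, with every slot taking 4 ints, and is not copied from the library source. The slot selection for primary and secondary lookups uses the low bits of the hash, as the class description indicates.

```java
final class HashAreaLayoutSketch {
    static final int INTS_PER_SLOT = 4;

    final int hashSize;        // number of primary slots (power of two in this sketch)
    final int secondaryStart;  // int offset where the secondary area starts
    final int tertiaryStart;   // int offset where the tertiary area starts
    final int spilloverStart;  // int offset where the spill-over area starts
    final int fixedAreaLength; // ints covered by the four areas together

    HashAreaLayoutSketch(int hashSize) {
        if (hashSize < 4 || Integer.bitCount(hashSize) != 1) {
            throw new IllegalArgumentException("hashSize must be a power of two, at least 4");
        }
        this.hashSize = hashSize;
        int primaryInts = hashSize * INTS_PER_SLOT;               // primary: 1/2 of the area
        this.secondaryStart = primaryInts;
        this.tertiaryStart = secondaryStart + primaryInts / 2;    // secondary: 1/4
        this.spilloverStart = tertiaryStart + primaryInts / 4;    // tertiary: 1/8
        this.fixedAreaLength = spilloverStart + primaryInts / 4;  // spill-over: 1/8
    }

    /** Int offset of the primary slot: direct match from the low bits of the hash. */
    int primaryOffset(int hash) {
        return (hash & (hashSize - 1)) * INTS_PER_SLOT;
    }

    /** Int offset of the secondary slot: low bits of the hash, shifted right by one. */
    int secondaryOffset(int hash) {
        int primarySlot = hash & (hashSize - 1);
        return secondaryStart + (primarySlot >> 1) * INTS_PER_SLOT;
    }
}
```

For 64 primary slots, the four areas cover 512 ints, which is 8 times the slot count; any long-name area appended after them is why the primary slot count is at most 1/8 of the array length.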
-
_secondaryStart
protected int _secondaryStart
Offset within _hashArea where secondary entries start.
-
_tertiaryStart
protected int _tertiaryStart
Offset within _hashArea where tertiary entries start.
-
_tertiaryShift
protected int _tertiaryShift
Constant that determines size of buckets for tertiary entries: 1 << _tertiaryShift is the size, and shift value is also used for translating from primary offset into tertiary bucket (shift right by 4 + _tertiaryShift).
Default value is 2, for buckets of 4 slots; grows bigger with bigger table sizes.
-
_count
protected int _count
Total number of Strings in the symbol table; only used for child tables.
-
_names
protected java.lang.String[] _names
-
_spilloverEnd
protected int _spilloverEnd
Pointer to the offset within spill-over area where there is room for more spilled over entries (if any). Spill-over area is within fixed-size portion of _hashArea.
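A spill-over region with an append pointer and a linear scan in insertion order could be sketched as follows. SpilloverSketch is a hypothetical, simplified model: it handles only single-quad names, keeps its own small arrays instead of sharing _hashArea, and reports a full area to the caller instead of resizing.

```java
final class SpilloverSketch {
    private static final int INTS_PER_SLOT = 4;

    private final int[] area;      // fixed-size spill-over region, 4 ints per entry
    private final String[] names;  // name Strings referenced from the entries
    private int spilloverEnd = 0;  // int offset of the next free entry

    SpilloverSketch(int slots) {
        this.area = new int[slots * INTS_PER_SLOT];
        this.names = new String[slots];
    }

    /** Appends a single-quad entry; returns false when no room is left. */
    boolean add(int q1, String name) {
        if (spilloverEnd >= area.length) {
            return false; // a real table would have to resize here
        }
        int nameIndex = spilloverEnd / INTS_PER_SLOT;
        area[spilloverEnd] = q1;            // first int: the name quad
        area[spilloverEnd + 3] = nameIndex; // last int: index of the name String
        names[nameIndex] = name;
        spilloverEnd += INTS_PER_SLOT;
        return true;
    }

    /** Scans the used part of the region in insertion order. */
    String find(int q1) {
        for (int offset = 0; offset < spilloverEnd; offset += INTS_PER_SLOT) {
            if (area[offset] == q1) {
                return names[area[offset + 3]];
            }
        }
        return null;
    }
}
```

The scan cost grows with the number of spilled entries, which is why this region is the last resort after the primary, secondary and tertiary areas.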
-
_longNameOffset
protected int _longNameOffset
-
_hashShared
protected boolean _hashShared
Flag that indicates whether underlying data structures for the main hash area are shared or not. If they are, then they need to be handled in a copy-on-write way, i.e. if they need to be modified, a copy needs to be made first; at this point it will not be shared any more, and can be modified.
This flag needs to be checked both when adding new main entries, and when adding new collision list queues (i.e. creating a new collision list head entry).
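The copy-on-write rule can be sketched in a few lines. CopyOnWriteSketch and its methods are hypothetical names; the sketch assumes a child that starts out pointing at an array owned by someone else and copies it on the first write.

```java
import java.util.Arrays;

final class CopyOnWriteSketch {
    private int[] hashArea;
    private boolean hashShared;

    /** Starts out sharing the given array; it must not be modified in place. */
    CopyOnWriteSketch(int[] sharedArea) {
        this.hashArea = sharedArea;
        this.hashShared = true;
    }

    /** Writes a value, making a private copy first if the array is still shared. */
    void set(int offset, int value) {
        if (hashShared) {
            hashArea = Arrays.copyOf(hashArea, hashArea.length);
            hashShared = false; // from now on the array is private
        }
        hashArea[offset] = value;
    }

    int get(int offset) {
        return hashArea[offset];
    }

    boolean isShared() {
        return hashShared;
    }
}
```

Read-only use never pays for a copy; only the first modification does, and the original array stays untouched for other users.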
-
-
Method Detail
-
createRoot
public static ByteQuadsCanonicalizer createRoot()
Factory method to call to create a symbol table instance with a randomized seed value.
- Returns:
- Root instance to use for constructing new child instances
-
createRoot
protected static ByteQuadsCanonicalizer createRoot(int seed)
-
makeChild
public ByteQuadsCanonicalizer makeChild(int flags)
Factory method used to create actual symbol table instance to use for parsing.
- Parameters:
- flags - Bit flags of active JsonFactory.Features enabled.
- Returns:
- Actual canonicalizer instance that can be used by a parser
-
release
public void release()
Method called by the using code to indicate it is done with this instance. This lets instance merge accumulated changes into parent (if need be), safely and efficiently, and without calling code having to know about parent information.
-
size
public int size()
- Returns:
- Number of symbol entries contained by this canonicalizer instance
-
bucketCount
public int bucketCount()
- Returns:
- number of primary slots table has currently
-
maybeDirty
public boolean maybeDirty()
Method called to quickly check whether a child symbol table may have gotten additional entries. Used for checking to see if a child table should be merged into shared table.
- Returns:
- Whether main hash area has been modified
-
hashSeed
public int hashSeed()
-
primaryCount
public int primaryCount()
Method mostly needed by unit tests; calculates number of entries that are in the primary slot set. These are "perfect" entries, accessible with a single lookup.
- Returns:
- Number of entries in the primary hash area
-
secondaryCount
public int secondaryCount()
Method mostly needed by unit tests; calculates number of entries in secondary buckets.
- Returns:
- Number of entries in the secondary hash area
-
tertiaryCount
public int tertiaryCount()
Method mostly needed by unit tests; calculates number of entries in tertiary buckets.
- Returns:
- Number of entries in the tertiary hash area
-
spilloverCount
public int spilloverCount()
Method mostly needed by unit tests; calculates number of entries in shared spill-over area.
- Returns:
- Number of entries in the linear spill-over area
-
totalCount
public int totalCount()
-
toString
public java.lang.String toString()
- Overrides:
- toString in class java.lang.Object
-
findName
public java.lang.String findName(int q1)
-
findName
public java.lang.String findName(int q1, int q2)
-
findName
public java.lang.String findName(int q1, int q2, int q3)
-
findName
public java.lang.String findName(int[] q, int qlen)
-
addName
public java.lang.String addName(java.lang.String name, int q1)
-
addName
public java.lang.String addName(java.lang.String name, int q1, int q2)
-
addName
public java.lang.String addName(java.lang.String name, int q1, int q2, int q3)
-
addName
public java.lang.String addName(java.lang.String name, int[] q, int qlen)
-
calcHash
public int calcHash(int q1)
-
calcHash
public int calcHash(int q1, int q2)
-
calcHash
public int calcHash(int q1, int q2, int q3)
-
calcHash
public int calcHash(int[] q, int qlen)
-
_reportTooManyCollisions
protected void _reportTooManyCollisions()
-
-