Class CharTypes


  • public final class CharTypes
    extends java.lang.Object
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected static byte[] HB  
      protected static char[] HC  
      protected static int[] sHexValues
      Lookup table for the first 256 Unicode characters (the ASCII / UTF-8 range).
      protected static int[] sInputCodes
      Lookup table used for determining which input characters need special handling when contained in a text segment.
      protected static int[] sInputCodesComment
      Decoding table used to quickly determine characters that are relevant within comment content.
      protected static int[] sInputCodesJsNames
      To support the non-default (and non-standard) unquoted-field-names mode, an alternate check is needed.
      protected static int[] sInputCodesUTF8
      Additionally, UTF-8 decoding information can be combined into a similar data table.
      protected static int[] sInputCodesUtf8JsNames
      This table is similar to the Latin-1 one, except that it marks all "high-bit" codes as valid.
      protected static int[] sInputCodesWS
      Decoding table used for skipping white space and comments.
      protected static int[] sOutputEscapes128
      Lookup table used for determining which output characters in 7-bit ASCII range need to be quoted.
    • Constructor Summary

      Constructors 
      Constructor Description
      CharTypes()  
    • Field Detail

      • HC

        protected static final char[] HC
      • HB

        protected static final byte[] HB
      • sInputCodes

        protected static final int[] sInputCodes
        Lookup table used for determining which input characters need special handling when contained in a text segment.
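        The idea of an input-code table can be illustrated with a small sketch. The class name `QuotedTextScanner`, the specific codes (control characters marked -1, `"` and `\` marked with their own value) and the scanning loop are assumptions for illustration, not taken from this page. The point is that one array lookup per character tells the parser whether it can copy the character as-is (code 0) or must leave the fast path.

```java
import java.util.Arrays;

// Hypothetical sketch of a "which input characters need special handling" table.
// Code 0 = ordinary character; any non-zero code = leave the fast path.
final class QuotedTextScanner {
    static final int[] CODES = new int[256];

    static {
        // Assumption: control characters are not allowed unescaped in quoted text.
        Arrays.fill(CODES, 0, 32, -1);
        // The closing quote and the escape character both need handling.
        CODES['"'] = '"';
        CODES['\\'] = '\\';
    }

    /** Returns the index of the first char at or after {@code start} needing handling, or -1. */
    static int findSpecial(String text, int start) {
        for (int i = start; i < text.length(); i++) {
            char c = text.charAt(i);
            // Characters outside the table are treated as ordinary in this sketch.
            if (c < CODES.length && CODES[c] != 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String body = "hello\\nworld\" rest";
        System.out.println(findSpecial(body, 0)); // prints 5 (the backslash)
    }
}
```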
      • sInputCodesUTF8

        protected static final int[] sInputCodesUTF8
        Additionally, UTF-8 decoding information can be combined into a similar data table.
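        As a rough illustration of folding UTF-8 decoding information into the same kind of table, the sketch below stores, for each byte value 0x80..0xFF, the length of the UTF-8 sequence it starts (2, 3 or 4) and -1 for bytes that cannot start a sequence. The class name and the exact code values are assumptions; only the general idea of one 256-entry table serving both purposes comes from the description above.

```java
// Hypothetical sketch: one 256-entry table that both flags ASCII bytes needing
// special handling and records UTF-8 sequence lengths for high-bit lead bytes.
final class Utf8InputCodes {
    static final int[] CODES = new int[256];

    static {
        // ASCII part (assumption): control chars, quote and backslash need handling.
        for (int c = 0; c < 32; c++) {
            CODES[c] = -1;
        }
        CODES['"'] = '"';
        CODES['\\'] = '\\';
        // High-bit part: encode the length of the multi-byte sequence.
        for (int c = 128; c < 256; c++) {
            int code;
            if ((c & 0xE0) == 0xC0) {        // 110xxxxx: starts a 2-byte sequence
                code = 2;
            } else if ((c & 0xF0) == 0xE0) { // 1110xxxx: starts a 3-byte sequence
                code = 3;
            } else if ((c & 0xF8) == 0xF0) { // 11110xxx: starts a 4-byte sequence
                code = 4;
            } else {                         // continuation byte or invalid lead byte
                code = -1;
            }
            CODES[c] = code;
        }
    }

    /** Looks up a (signed) Java byte by masking it to 0..255 first. */
    static int codeFor(byte b) {
        return CODES[b & 0xFF];
    }

    public static void main(String[] args) {
        byte[] euro = "\u20AC".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        System.out.println(codeFor(euro[0])); // prints 3: the euro sign is 3 bytes in UTF-8
    }
}
```

        This table only classifies lead bytes; it does not reject overlong forms such as 0xC0 or 0xC1, which would need separate validation while decoding the continuation bytes.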
      • sInputCodesJsNames

        protected static final int[] sInputCodesJsNames
        To support the non-default (and non-standard) unquoted-field-names mode, an alternate check is needed. Basically, this is a list of the 8-bit (Latin-1) characters that are legal as part of a JavaScript identifier.
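        A minimal sketch of such a table, assuming that Java's `Character.isJavaIdentifierPart` is an acceptable approximation of JavaScript identifier rules for the 8-bit range (an assumption made here, not stated above). The class name, the `readName` helper, and the convention that 0 means "valid name character" while -1 means "ends the name" are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of an "unquoted name characters" table for the 8-bit range.
// 0 = may appear in an unquoted name, -1 = terminates the name.
final class UnquotedNameTable {
    static final int[] CODES = new int[256];

    static {
        Arrays.fill(CODES, -1);
        // Approximation (assumption): reuse Java identifier rules for JS names.
        // Start at 33 to skip the control characters 0..31 and the space character.
        for (int c = 33; c < 256; c++) {
            if (Character.isJavaIdentifierPart((char) c)) {
                CODES[c] = 0;
            }
        }
    }

    /** Returns the unquoted name starting at {@code start}, stopping at the first non-name char. */
    static String readName(String input, int start) {
        int i = start;
        while (i < input.length()) {
            char c = input.charAt(i);
            // Characters beyond the 8-bit range simply end the name in this sketch.
            if (c >= CODES.length || CODES[c] != 0) {
                break;
            }
            i++;
        }
        return input.substring(start, i);
    }

    public static void main(String[] args) {
        System.out.println(readName("$field_1: 42", 0)); // prints $field_1
    }
}
```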
      • sInputCodesUtf8JsNames

        protected static final int[] sInputCodesUtf8JsNames
        This table is similar to the Latin-1 one, except that it marks all "high-bit" codes as valid. Those codes are validated later, when the name is decoded.
      • sInputCodesComment

        protected static final int[] sInputCodesComment
        Decoding table used to quickly determine characters that are relevant within comment content.
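        To illustrate how a comment-specific table can speed up scanning, the sketch below marks only `*`, `\n` and `\r` as relevant (an assumption; the exact contents of the real table are not listed on this page). The loop does one array lookup per ordinary character and only branches when it may have reached the end of a `/* ... */` comment or a line break. The class, method and result names are hypothetical.

```java
// Hypothetical sketch: scan the body of a C-style block comment using a lookup table.
final class BlockCommentSkipper {
    static final int[] CODES = new int[128];

    static {
        // Only these characters matter inside a block comment in this sketch.
        CODES['*'] = '*';
        CODES['\n'] = '\n';
        CODES['\r'] = '\r';
    }

    static final class Result {
        final int end;   // index just past the closing delimiter
        final int lines; // line breaks seen inside the comment

        Result(int end, int lines) {
            this.end = end;
            this.lines = lines;
        }
    }

    /**
     * Scans from {@code start} (just after the opening slash-star) to the closing
     * star-slash. Returns null if the comment is not terminated.
     */
    static Result skip(String in, int start) {
        int lines = 0;
        int i = start;
        while (i < in.length()) {
            char c = in.charAt(i++);
            int code = (c < CODES.length) ? CODES[c] : 0;
            if (code == 0) {
                continue; // ordinary comment character
            }
            if (code == '*') {
                if (i < in.length() && in.charAt(i) == '/') {
                    return new Result(i + 1, lines);
                }
            } else if (code == '\n') {
                lines++;
            } else if (code == '\r') {
                lines++;
                // Treat CR LF as a single line break.
                if (i < in.length() && in.charAt(i) == '\n') {
                    i++;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Result r = skip("/* a\r\n b ** c */ x", 2);
        System.out.println(r.end + " " + r.lines); // prints 16 1
    }
}
```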
      • sInputCodesWS

        protected static final int[] sInputCodesWS
        Decoding table used for skipping white space and comments.
        Since:
        2.3
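        A sketch of the whitespace-skipping idea, under the assumption that spaces and tabs map to a single "plain whitespace" code, line breaks keep their own value (so a caller could track line numbers), and `/` and `#` are flagged as possible comment starts. The class name, method name and code values are hypothetical; the sketch stops at a possible comment rather than parsing it.

```java
// Hypothetical sketch of a whitespace-skipping table; codes are illustrative only.
final class WhitespaceSkipper {
    static final int CODE_SPACE = 1;
    static final int[] CODES = new int[128];

    static {
        // Other control characters are invalid outside strings (assumption).
        for (int c = 0; c < 32; c++) {
            CODES[c] = -1;
        }
        CODES[' '] = CODE_SPACE;
        CODES['\t'] = CODE_SPACE;
        CODES['\n'] = '\n';
        CODES['\r'] = '\r';
        CODES['/'] = '/'; // may start a comment
        CODES['#'] = '#'; // may start a hash comment
    }

    /**
     * Returns the index of the first character that is not plain whitespace or a
     * line break; this includes a possible comment start, which the caller handles.
     */
    static int skipWhitespace(String in, int start) {
        int i = start;
        while (i < in.length()) {
            char c = in.charAt(i);
            int code = (c < CODES.length) ? CODES[c] : 0;
            if (code == CODE_SPACE || code == '\n' || code == '\r') {
                i++;
            } else {
                return i; // 0 = significant char, '/' or '#' = comment, -1 = invalid
            }
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(skipWhitespace(" \t\r\n  {\"a\":1}", 0)); // prints 6
    }
}
```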
      • sOutputEscapes128

        protected static final int[] sOutputEscapes128
        Lookup table used for determining which output characters in 7-bit ASCII range need to be quoted.
      • sHexValues

        protected static final int[] sHexValues
        Lookup table for the first 256 Unicode characters (the ASCII / UTF-8 range). For hex digits, it contains the corresponding digit value; for all other characters, -1.

        NOTE: before 2.10.1, this table had 128 entries; it was extended to 256 for simpler handling.
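        The hex-value table described above can be sketched as follows. The class name and the bounds-checked `hexValue` and `decodeFourHex` helpers are hypothetical (this page documents only the signature of `charToHex`, not its behavior); the sketch shows how a 256-entry table with -1 for non-hex characters turns hex decoding into a single lookup.

```java
import java.util.Arrays;

// Hypothetical sketch of a 256-entry hex lookup table: digit value or -1.
final class HexTable {
    static final int[] HEX_VALUES = new int[256];

    static {
        Arrays.fill(HEX_VALUES, -1);
        for (int i = 0; i < 10; i++) {
            HEX_VALUES['0' + i] = i;
        }
        for (int i = 0; i < 6; i++) {
            HEX_VALUES['a' + i] = 10 + i;
            HEX_VALUES['A' + i] = 10 + i;
        }
    }

    /** Returns the value of a hex digit, or -1; code points beyond the table are rejected. */
    static int hexValue(int ch) {
        return (ch >= 0 && ch < HEX_VALUES.length) ? HEX_VALUES[ch] : -1;
    }

    /** Decodes the four hex digits of a backslash-u escape, or returns -1 if any is invalid. */
    static int decodeFourHex(CharSequence s, int start) {
        int value = 0;
        for (int i = start; i < start + 4; i++) {
            int digit = hexValue(s.charAt(i));
            if (digit < 0) {
                return -1;
            }
            value = (value << 4) | digit;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(decodeFourHex("00e9", 0)); // prints 233
    }
}
```

        With 256 entries, any byte value masked to 0..255 can index the table directly; only wider `char` or code-point values still need the range check in `hexValue`. This is presumably the "simpler handling" the note above refers to.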

    • Constructor Detail

      • CharTypes

        public CharTypes()
    • Method Detail

      • getInputCodeLatin1

        public static int[] getInputCodeLatin1()
      • getInputCodeUtf8

        public static int[] getInputCodeUtf8()
      • getInputCodeLatin1JsNames

        public static int[] getInputCodeLatin1JsNames()
      • getInputCodeUtf8JsNames

        public static int[] getInputCodeUtf8JsNames()
      • getInputCodeComment

        public static int[] getInputCodeComment()
      • getInputCodeWS

        public static int[] getInputCodeWS()
      • get7BitOutputEscapes

        public static int[] get7BitOutputEscapes()
        Accessor for a read-only encoding table covering the first 128 Unicode code points (single-byte UTF-8 characters). A value of 0 means "no escaping"; any other positive value is the character to write after a backslash; a negative value means that generic (backslash-u) escaping is to be used.
        Returns:
        128-entry int[] that contains escape definitions
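        The value convention described above (0 = copy as-is, positive = character after a backslash, negative = generic backslash-u escape) can be exercised with a small self-contained sketch. It builds its own 128-entry table instead of calling `get7BitOutputEscapes()`, so the table contents here (control characters, `"` and `\`) are an assumption chosen to match common JSON escaping, not a copy of the library's table. The class and method names are hypothetical.

```java
// Hypothetical sketch: build and apply a 128-entry escape table using the convention
// 0 = no escaping, positive = char written after a backslash, negative = backslash-u escape.
final class EscapeTableDemo {
    static final int[] ESCAPES = new int[128];
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static {
        for (int c = 0; c < 32; c++) {
            ESCAPES[c] = -1; // generic escaping for control characters
        }
        ESCAPES['"'] = '"';
        ESCAPES['\\'] = '\\';
        ESCAPES['\b'] = 'b';
        ESCAPES['\t'] = 't';
        ESCAPES['\f'] = 'f';
        ESCAPES['\n'] = 'n';
        ESCAPES['\r'] = 'r';
    }

    /** Appends {@code content} to {@code sb}, escaping per the table (no surrounding quotes). */
    static void appendEscaped(StringBuilder sb, String content, int[] escapes) {
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (c >= escapes.length || escapes[c] == 0) {
                sb.append(c); // outside the table, or no escaping needed
                continue;
            }
            int code = escapes[c];
            sb.append('\\');
            if (code > 0) {
                sb.append((char) code);
            } else {
                // Generic escape: 'u' followed by four hex digits.
                sb.append('u')
                  .append(HEX[(c >> 12) & 0xF])
                  .append(HEX[(c >> 8) & 0xF])
                  .append(HEX[(c >> 4) & 0xF])
                  .append(HEX[c & 0xF]);
            }
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        appendEscaped(sb, "say \"hi\"\n" + (char) 1, ESCAPES);
        System.out.println(sb);
    }
}
```

        Because the table covers only 7-bit code points, characters at or above 128 pass through unchanged in this sketch; a generator that must produce ASCII-only output would need a separate policy for them.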
      • get7BitOutputEscapes

        public static int[] get7BitOutputEscapes(int quoteChar)
        Alternative to get7BitOutputEscapes() when a non-standard quote character is used.
        Parameters:
        quoteChar - Character used for quoting textual values and property names; usually a double quote, but sometimes changed to a single quote (apostrophe)
        Returns:
        128-entry int[] that contains escape definitions
        Since:
        2.10
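        One plausible way to derive a table for a different quote character is to copy the default table, mark the new quote character for escaping, and cache one table per quote character. The sketch below is an assumption, not the library's documented behavior: it marks the quote character with a negative value (generic backslash-u escaping) and keeps `"` escaped as in the default table. The class name `QuoteEscapes` and its members are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: derive (and cache) an escape table for a non-default quote character.
final class QuoteEscapes {
    static final int[] DEFAULT = new int[128];
    private static final int[][] CACHE = new int[128][];

    static {
        for (int c = 0; c < 32; c++) {
            DEFAULT[c] = -1;
        }
        DEFAULT['"'] = '"';
        DEFAULT['\\'] = '\\';
    }

    /** Returns a shared table that callers must treat as read-only. */
    static synchronized int[] escapesFor(int quoteChar) {
        if (quoteChar == '"') {
            return DEFAULT; // standard case: reuse the shared table
        }
        if (quoteChar < 0 || quoteChar >= 128) {
            throw new IllegalArgumentException("quote char must be 7-bit: " + quoteChar);
        }
        int[] esc = CACHE[quoteChar];
        if (esc == null) {
            esc = Arrays.copyOf(DEFAULT, DEFAULT.length);
            // Only add an escape if the character does not already have one.
            if (esc[quoteChar] == 0) {
                esc[quoteChar] = -1;
            }
            CACHE[quoteChar] = esc;
        }
        return esc;
    }

    public static void main(String[] args) {
        int[] apos = escapesFor('\'');
        System.out.println(apos['\''] + " " + DEFAULT['\'']); // prints -1 0
    }
}
```

        Copying rather than mutating the default table keeps the standard double-quote table unchanged for other callers; the method is synchronized so that the lazily filled cache is safely published across threads.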
      • charToHex

        public static int charToHex(int ch)
      • appendQuoted

        public static void appendQuoted(java.lang.StringBuilder sb,
                                        java.lang.String content)
      • copyHexChars

        public static char[] copyHexChars()
      • copyHexBytes

        public static byte[] copyHexBytes()
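        The `copy...` naming suggests that these methods hand out copies of the internal `HC` and `HB` tables rather than the tables themselves, so callers cannot corrupt shared state. That reading, and the assumption that the tables hold the sixteen hex digit characters, are not stated on this page; the sketch below only demonstrates the defensive-copy pattern, using a hypothetical `HexDigits` class.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the defensive-copy pattern for shared lookup tables.
final class HexDigits {
    // Assumption: upper-case hex digits; the real tables' contents are not documented here.
    private static final char[] HEX_CHARS = "0123456789ABCDEF".toCharArray();
    private static final byte[] HEX_BYTES =
            new String(HEX_CHARS).getBytes(StandardCharsets.US_ASCII);

    /** Returns a fresh copy, so callers may modify it without affecting other users. */
    static char[] copyHexChars() {
        return HEX_CHARS.clone();
    }

    /** Byte-array counterpart of {@link #copyHexChars()}. */
    static byte[] copyHexBytes() {
        return HEX_BYTES.clone();
    }

    public static void main(String[] args) {
        char[] mine = copyHexChars();
        mine[10] = 'a'; // only this copy changes
        System.out.println(copyHexChars()[10]); // prints A
    }
}
```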