Class CharacterTokenization

  • All Implemented Interfaces:
    java.io.Serializable, Annotatable, SymbolTokenization, Changeable

    public class CharacterTokenization
    extends Unchangeable
    implements SymbolTokenization, java.io.Serializable

    Implementation of SymbolTokenization that binds symbols to single Unicode characters.

    Many alphabets (including all of the simple built-in alphabets such as DNA, RNA and Protein) have an instance of CharacterTokenization registered under the name "token", so you can write CharacterTokenization ct = (CharacterTokenization) alpha.getTokenization("token"); and expect it to work. A newly constructed instance of this class has no initial associations between Symbols and characters. It is your responsibility to populate the new tokenization appropriately.

    Since:
    1.2
    Author:
    Thomas Down, Matthew Pocock, Greg Cox, Keith James
    See Also:
    Serialized Form
    • Constructor Detail

      • CharacterTokenization

        public CharacterTokenization(Alphabet alpha,
                                     boolean caseSensitive)
    • Method Detail

      • getAnnotation

        public Annotation getAnnotation()
        Description copied from interface: Annotatable
        Should return the associated annotation object.
        Specified by:
        getAnnotation in interface Annotatable
        Returns:
        an Annotation object, never null
      • bindSymbol

        public void bindSymbol(Symbol s,
                               char c)

        Bind a Symbol to a character.

        This method ensures that when this char is observed, it resolves to this Symbol. If the char was previously associated with another Symbol, the old binding is removed. If this is the first time the Symbol has been bound to any character, this char becomes the default tokenization of the Symbol; that is, it is the char used when converting the Symbol into characters. If the Symbol has previously been bound to another character, this char will not be produced when stringifying the Symbol, but the Symbol will still be produced when tokenizing this char.

        Parameters:
        s - the Symbol to bind
        c - the char to bind it to
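
The binding rules described above can be illustrated with a small stand-alone model. This is only a sketch of the documented behaviour, not the BioJava source: the class BindingSketch and its methods symbolFor and defaultCharFor are hypothetical names, and Symbols are modelled as plain strings so that the example runs without BioJava on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CharacterTokenization; Symbols are plain strings here.
class BindingSketch {
    // char -> symbol, consulted when tokenizing (parsing) a character
    private final Map<Character, String> charToSymbol = new HashMap<>();
    // symbol -> default char, consulted when stringifying a symbol
    private final Map<String, Character> defaultChar = new HashMap<>();

    void bindSymbol(String symbol, char c) {
        // A char resolves to one symbol only: rebinding replaces the old symbol.
        charToSymbol.put(c, symbol);
        // Only the first char ever bound to a symbol becomes its default.
        defaultChar.putIfAbsent(symbol, c);
    }

    // Returns the symbol bound to c, or null if c is unbound.
    String symbolFor(char c) {
        return charToSymbol.get(c);
    }

    // Returns the default char for the symbol, or null if it was never bound.
    Character defaultCharFor(String symbol) {
        return defaultChar.get(symbol);
    }

    public static void main(String[] args) {
        BindingSketch t = new BindingSketch();
        t.bindSymbol("adenine", 'a');   // first binding: 'a' becomes the default
        t.bindSymbol("adenine", 'A');   // second binding: parse-only alias
        System.out.println(t.symbolFor('A'));             // adenine
        System.out.println(t.defaultCharFor("adenine"));  // a
        t.bindSymbol("gap", 'A');       // 'A' now resolves to "gap" instead
        System.out.println(t.symbolFor('A'));             // gap
    }
}
```

What happens to a Symbol's default char when that char is rebound to a different Symbol is not stated in the description above; the sketch simply leaves the earlier default in place, which is an assumption.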
      • parseToken

        public Symbol parseToken(java.lang.String token)
                          throws IllegalSymbolException
        Description copied from interface: SymbolTokenization
        Returns the symbol for a single token.

        The Symbol will be a member of the alphabet. If the token is not recognized as mapping to a symbol, an exception will be thrown.

        Specified by:
        parseToken in interface SymbolTokenization
        Parameters:
        token - the token to retrieve a Symbol for
        Returns:
        the Symbol for that token
        Throws:
        IllegalSymbolException - if there is no Symbol for the token
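
The parseToken contract (return the Symbol for a recognized token, throw otherwise) might be sketched as follows. ParseSketch, its bind method and UnknownTokenException are hypothetical names; the checked exception stands in for IllegalSymbolException so that the example compiles without BioJava. Rejecting tokens longer than one character is an assumption based on this class binding symbols to single characters.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the parseToken contract; Symbols are plain strings here.
class ParseSketch {
    // Stand-in for IllegalSymbolException (checked, like the original).
    static class UnknownTokenException extends Exception {
        UnknownTokenException(String message) {
            super(message);
        }
    }

    private final Map<Character, String> charToSymbol = new HashMap<>();

    void bind(char c, String symbol) {
        charToSymbol.put(c, symbol);
    }

    String parseToken(String token) throws UnknownTokenException {
        // Assumption: a character tokenization only accepts one-character tokens.
        if (token.length() != 1) {
            throw new UnknownTokenException("Expected a single character but got: " + token);
        }
        String symbol = charToSymbol.get(token.charAt(0));
        if (symbol == null) {
            // Unrecognized tokens raise an exception rather than returning null.
            throw new UnknownTokenException("No symbol bound to: " + token);
        }
        return symbol;
    }
}
```

Callers therefore never need a null check on the result, but must handle the checked exception for input that may contain unexpected characters.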
      • getTokenTable

        protected Symbol[] getTokenTable()