Class SeqIOConstants


  • public final class SeqIOConstants
    extends java.lang.Object
    SeqIOConstants contains constants used to identify sequence formats, alphabets, etc., when reading and writing sequences.

    An int used to specify symbol alphabet and sequence format type is derived thus:

    • The two least significant bytes are reserved for format types such as RAW, FASTA, EMBL etc.
    • The two most significant bytes are reserved for alphabet and symbol information such as AMBIGUOUS, DNA, RNA, AA etc.
    • Bitwise OR combinations of each component int are used to specify combinations of format type and symbol information. To derive an int identifier for DNA with ambiguity codes in Fasta format, bitwise OR the AMBIGUOUS, DNA and FASTA values.
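The combination rule above can be sketched as follows. This is a minimal illustration, not the library source: the class name is hypothetical, the alphabet bit positions assume that "first bit of the most significant word" means bit 16 (counting from zero), and the FASTA number is a made-up stand-in. In real code, use the SeqIOConstants fields directly.

```java
// Illustrative stand-ins only; the real values are listed under
// "Constant Field Values" for SeqIOConstants.
final class SeqIdComposeSketch {
    // Assumed positions: alphabet flags occupy the upper 16 bits.
    static final int AMBIGUOUS = 1 << 16; // first bit of the most significant word
    static final int DNA       = 1 << 17; // second bit of the most significant word
    // Hypothetical format number; format types occupy the lower 16 bits.
    static final int FASTA     = 102;

    /** Builds the identifier for DNA with ambiguity codes in Fasta format. */
    static int ambiguousDnaFasta() {
        return AMBIGUOUS | DNA | FASTA;
    }

    public static void main(String[] args) {
        // With the stand-in values above this prints 30066.
        System.out.println(Integer.toHexString(ambiguousDnaFasta()));
    }
}
```

Because the format number and the alphabet flags occupy disjoint halves of the int, the OR never lets one component overwrite the other.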
    Author:
    Keith James
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int AA
      AA indicates that a sequence contains AA (amino acid) symbols.
      static int AMBIGUOUS
      AMBIGUOUS indicates that a sequence contains ambiguity symbols.
      static int DNA
      DNA indicates that a sequence contains DNA (deoxyribonucleic acid) symbols.
      static int EMBL
      EMBL indicates that the sequence format is EMBL.
      static int EMBL_AA
      EMBL_AA premade EMBL | AA.
      static int EMBL_DNA
      EMBL_DNA premade EMBL | DNA.
      static int EMBL_RNA
      EMBL_RNA premade EMBL | RNA.
      static int FASTA
      FASTA indicates that the sequence format is Fasta.
      static int FASTA_AA
      FASTA_AA premade FASTA | AA.
      static int FASTA_DNA
      FASTA_DNA premade FASTA | DNA.
      static int FASTA_RNA
      FASTA_RNA premade FASTA | RNA.
      static int GCG
      GCG indicates that the sequence format is GCG.
      static int GENBANK
      GENBANK indicates that the sequence format is GENBANK.
      static int GENBANK_AA
      GENBANK_AA premade GENBANK | AA.
      static int GENBANK_DNA
      GENBANK_DNA premade GENBANK | DNA.
      static int GENBANK_RNA
      GENBANK_RNA premade GENBANK | RNA.
      static int GENPEPT
      GENPEPT indicates that the sequence format is GENPEPT.
      static int GFF
      GFF indicates that the sequence format is GFF.
      static int IG
      IG indicates that the sequence format is IG.
      static int INTEGER
      INTEGER indicates that a sequence contains integer alphabet symbols, such as those used to describe sequence quality data.
      static LifeScienceIdentifier LSID_EMBL_AA
      LSID_EMBL_AA sequence format LSID for EMBL AA.
      static LifeScienceIdentifier LSID_EMBL_DNA
      LSID_EMBL_DNA sequence format LSID for EMBL DNA.
      static LifeScienceIdentifier LSID_EMBL_RNA
      LSID_EMBL_RNA sequence format LSID for EMBL RNA.
      static LifeScienceIdentifier LSID_FASTA_AA
      LSID_FASTA_AA sequence format LSID for Fasta AA.
      static LifeScienceIdentifier LSID_FASTA_DNA
      LSID_FASTA_DNA sequence format LSID for Fasta DNA.
      static LifeScienceIdentifier LSID_FASTA_RNA
      LSID_FASTA_RNA sequence format LSID for Fasta RNA.
      static LifeScienceIdentifier LSID_GENBANK_AA
      LSID_GENBANK_AA sequence format LSID for Genbank AA.
      static LifeScienceIdentifier LSID_GENBANK_DNA
      LSID_GENBANK_DNA sequence format LSID for Genbank DNA.
      static LifeScienceIdentifier LSID_GENBANK_RNA
      LSID_GENBANK_RNA sequence format LSID for Genbank RNA.
      static LifeScienceIdentifier LSID_SWISSPROT
      LSID_SWISSPROT sequence format LSID for Swissprot.
      static int NBRF
      NBRF indicates that the sequence format is NBRF.
      static int PDB
      PDB indicates that the sequence format is PDB.
      static int PHRED
      PHRED indicates that the sequence format is PHRED.
      static int RAW
      RAW indicates that the sequence format is raw (symbols only).
      static int REFSEQ
      REFSEQ indicates that the sequence format is REFSEQ.
      static int REFSEQ_AA
      REFSEQ_AA premade REFSEQ | AA.
      static int REFSEQ_DNA
      REFSEQ_DNA premade REFSEQ | DNA.
      static int REFSEQ_RNA
      REFSEQ_RNA premade REFSEQ | RNA.
      static int RNA
      RNA indicates that a sequence contains RNA (ribonucleic acid) symbols.
      static int SWISSPROT
      SWISSPROT indicates that the sequence format is SWISSPROT.
      static int UNKNOWN
      UNKNOWN indicates that the sequence format is unknown.
    • Constructor Summary

      Constructors 
      Constructor Description
      SeqIOConstants()  
    • Method Summary

      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • AMBIGUOUS

        public static final int AMBIGUOUS
        AMBIGUOUS indicates that a sequence contains ambiguity symbols. The first bit of the most significant word of the int is set.
        See Also:
        Constant Field Values
      • DNA

        public static final int DNA
        DNA indicates that a sequence contains DNA (deoxyribonucleic acid) symbols. The second bit of the most significant word of the int is set.
        See Also:
        Constant Field Values
      • RNA

        public static final int RNA
        RNA indicates that a sequence contains RNA (ribonucleic acid) symbols. The third bit of the most significant word of the int is set.
        See Also:
        Constant Field Values
      • AA

        public static final int AA
        AA indicates that a sequence contains AA (amino acid) symbols. The fourth bit of the most significant word of the int is set.
        See Also:
        Constant Field Values
      • INTEGER

        public static final int INTEGER
        INTEGER indicates that a sequence contains integer alphabet symbols, such as those used to describe sequence quality data. The fifth bit of the most significant word of the int is set.
        See Also:
        Constant Field Values
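Since the five alphabet flags sit in the most significant word and the format types in the least significant one, an identifier can be split back into its parts with two masks. The sketch below assumes that layout; the class, the mask names, the helper methods and the EMBL number are hypothetical and are not part of SeqIOConstants.

```java
// Hypothetical helper showing how a combined identifier could be decoded.
final class SeqIdDecodeSketch {
    static final int FORMAT_MASK   = 0x0000FFFF; // two least significant bytes
    static final int ALPHABET_MASK = 0xFFFF0000; // two most significant bytes

    // Assumed bit positions, following the field descriptions above.
    static final int DNA = 1 << 17; // second bit of the most significant word
    static final int AA  = 1 << 19; // fourth bit of the most significant word
    // Made-up format number standing in for the real constant.
    static final int EMBL = 110;

    /** Returns only the format part of the identifier. */
    static int formatOf(int id) {
        return id & FORMAT_MASK;
    }

    /** Returns only the alphabet and symbol flags of the identifier. */
    static int alphabetOf(int id) {
        return id & ALPHABET_MASK;
    }

    /** Tests whether every bit of the given flag is set in the identifier. */
    static boolean hasFlag(int id, int flag) {
        return (id & flag) == flag;
    }

    public static void main(String[] args) {
        int id = EMBL | DNA;
        System.out.println(formatOf(id) == EMBL); // format part is unchanged
        System.out.println(hasFlag(id, DNA));     // DNA flag is present
        System.out.println(hasFlag(id, AA));      // AA flag is absent
    }
}
```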
      • UNKNOWN

        public static final int UNKNOWN
        UNKNOWN indicates that the sequence format is unknown.
        See Also:
        Constant Field Values
      • RAW

        public static final int RAW
        RAW indicates that the sequence format is raw (symbols only).
        See Also:
        Constant Field Values
      • FASTA

        public static final int FASTA
        FASTA indicates that the sequence format is Fasta.
        See Also:
        Constant Field Values
      • NBRF

        public static final int NBRF
        NBRF indicates that the sequence format is NBRF.
        See Also:
        Constant Field Values
      • IG

        public static final int IG
        IG indicates that the sequence format is IG.
        See Also:
        Constant Field Values
      • EMBL

        public static final int EMBL
        EMBL indicates that the sequence format is EMBL.
        See Also:
        Constant Field Values
      • SWISSPROT

        public static final int SWISSPROT
        SWISSPROT indicates that the sequence format is SWISSPROT. This format is always protein, so the constant already has the AA bit set.
        See Also:
        Constant Field Values
      • GENBANK

        public static final int GENBANK
        GENBANK indicates that the sequence format is GENBANK.
        See Also:
        Constant Field Values
      • GENPEPT

        public static final int GENPEPT
        GENPEPT indicates that the sequence format is GENPEPT. This format is always protein, so the constant already has the AA bit set.
        See Also:
        Constant Field Values
      • REFSEQ

        public static final int REFSEQ
        REFSEQ indicates that the sequence format is REFSEQ.
        See Also:
        Constant Field Values
      • GCG

        public static final int GCG
        GCG indicates that the sequence format is GCG.
        See Also:
        Constant Field Values
      • GFF

        public static final int GFF
        GFF indicates that the sequence format is GFF.
        See Also:
        Constant Field Values
      • PDB

        public static final int PDB
        PDB indicates that the sequence format is PDB. This format is always protein, so the constant already has the AA bit set.
        See Also:
        Constant Field Values
      • PHRED

        public static final int PHRED
        PHRED indicates that the sequence format is PHRED. This format is always DNA, so the constant already has the DNA bit set. It also has the INTEGER bit set for quality data.
        See Also:
        Constant Field Values
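Some format constants (SWISSPROT, GENPEPT, PDB, PHRED) are described as carrying alphabet bits already. A consequence, sketched below with made-up format numbers and a hypothetical class, is that OR-ing the same alphabet flag in again changes nothing, and that the bare format number has to be recovered by masking off the upper word.

```java
// Hypothetical stand-ins; only the "preset alphabet bits" idea is taken
// from the field descriptions, not the numeric values.
final class PresetAlphabetSketch {
    static final int DNA     = 1 << 17; // assumed: second bit of the high word
    static final int AA      = 1 << 19; // assumed: fourth bit of the high word
    static final int INTEGER = 1 << 20; // assumed: fifth bit of the high word

    // Made-up format numbers with their alphabet bits preset.
    static final int SWISSPROT = 111 | AA;
    static final int PHRED     = 130 | DNA | INTEGER;

    /** Tests whether every bit of the given flag is set in the identifier. */
    static boolean hasFlag(int id, int flag) {
        return (id & flag) == flag;
    }

    /** Strips the alphabet flags, leaving the bare format number. */
    static int formatNumber(int id) {
        return id & 0xFFFF;
    }

    public static void main(String[] args) {
        // OR-ing a flag that is already set is a no-op.
        System.out.println((PHRED | DNA) == PHRED);
        System.out.println(hasFlag(PHRED, INTEGER));
        System.out.println(hasFlag(SWISSPROT, AA));
    }
}
```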
      • EMBL_DNA

        public static final int EMBL_DNA
        EMBL_DNA premade EMBL | DNA.
        See Also:
        Constant Field Values
      • EMBL_RNA

        public static final int EMBL_RNA
        EMBL_RNA premade EMBL | RNA.
        See Also:
        Constant Field Values
      • GENBANK_DNA

        public static final int GENBANK_DNA
        GENBANK_DNA premade GENBANK | DNA.
        See Also:
        Constant Field Values
      • GENBANK_RNA

        public static final int GENBANK_RNA
        GENBANK_RNA premade GENBANK | RNA.
        See Also:
        Constant Field Values
      • GENBANK_AA

        public static final int GENBANK_AA
        GENBANK_AA premade GENBANK | AA.
        See Also:
        Constant Field Values
      • REFSEQ_DNA

        public static final int REFSEQ_DNA
        REFSEQ_DNA premade REFSEQ | DNA.
        See Also:
        Constant Field Values
      • REFSEQ_RNA

        public static final int REFSEQ_RNA
        REFSEQ_RNA premade REFSEQ | RNA.
        See Also:
        Constant Field Values
      • REFSEQ_AA

        public static final int REFSEQ_AA
        REFSEQ_AA premade REFSEQ | AA.
        See Also:
        Constant Field Values
      • FASTA_DNA

        public static final int FASTA_DNA
        FASTA_DNA premade FASTA | DNA.
        See Also:
        Constant Field Values
      • FASTA_RNA

        public static final int FASTA_RNA
        FASTA_RNA premade FASTA | RNA.
        See Also:
        Constant Field Values
      • FASTA_AA

        public static final int FASTA_AA
        FASTA_AA premade FASTA | AA.
        See Also:
        Constant Field Values
      • LSID_FASTA_DNA

        public static final LifeScienceIdentifier LSID_FASTA_DNA
        LSID_FASTA_DNA sequence format LSID for Fasta DNA.
      • LSID_FASTA_RNA

        public static final LifeScienceIdentifier LSID_FASTA_RNA
        LSID_FASTA_RNA sequence format LSID for Fasta RNA.
      • LSID_FASTA_AA

        public static final LifeScienceIdentifier LSID_FASTA_AA
        LSID_FASTA_AA sequence format LSID for Fasta AA.
      • LSID_EMBL_DNA

        public static final LifeScienceIdentifier LSID_EMBL_DNA
        LSID_EMBL_DNA sequence format LSID for EMBL DNA.
      • LSID_EMBL_RNA

        public static final LifeScienceIdentifier LSID_EMBL_RNA
        LSID_EMBL_RNA sequence format LSID for EMBL RNA.
      • LSID_EMBL_AA

        public static final LifeScienceIdentifier LSID_EMBL_AA
        LSID_EMBL_AA sequence format LSID for EMBL AA.
      • LSID_GENBANK_DNA

        public static final LifeScienceIdentifier LSID_GENBANK_DNA
        LSID_GENBANK_DNA sequence format LSID for Genbank DNA.
      • LSID_GENBANK_RNA

        public static final LifeScienceIdentifier LSID_GENBANK_RNA
        LSID_GENBANK_RNA sequence format LSID for Genbank RNA.
      • LSID_GENBANK_AA

        public static final LifeScienceIdentifier LSID_GENBANK_AA
        LSID_GENBANK_AA sequence format LSID for Genbank AA.
      • LSID_SWISSPROT

        public static final LifeScienceIdentifier LSID_SWISSPROT
        LSID_SWISSPROT sequence format LSID for Swissprot.
    • Constructor Detail

      • SeqIOConstants

        public SeqIOConstants()