Class EmblCDROMIndexReader

  • Direct Known Subclasses:
    AcnumHitReader, AcnumTrgReader, DivisionLkpReader, EntryNamIdxReader

    public abstract class EmblCDROMIndexReader
    extends java.lang.Object

    EmblCDROMIndexReader is an abstract class whose concrete subclasses read EMBL CD-ROM format indices from an underlying InputStream. This format is used by the EMBOSS package for database indexing (see programs dbiblast, dbifasta, dbiflat and dbigcg). Indexing produces four binary files with a simple format:

    • division.lkp : master index
    • entrynam.idx : sequence ID index
    • acnum.trg : accession number index
    • acnum.hit : accession number auxiliary index

    Internally, EMBOSS checks for big-endian architectures and switches the byte order to little-endian, so the index files are always written in little-endian order. This means trouble if you try to read the files using DataInputStream, which always assumes big-endian order, but at least the binaries are consistent across architectures. This class carries out the necessary conversion.
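The byte-order conversion can be sketched in plain Java. This is a minimal sketch, assuming the numeric fields are unsigned 32-bit and 16-bit little-endian values; the class `LittleEndian` and its methods are hypothetical and not part of BioJava. It contrasts a `ByteBuffer` set to `ByteOrder.LITTLE_ENDIAN` with `DataInputStream.readInt`, which always reads big-endian.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of BioJava: decodes little-endian fields
// of the kind found in EMBOSS dbi index files.
final class LittleEndian {
    private LittleEndian() {}

    // Decode 4 bytes starting at offset as an unsigned 32-bit value.
    static long uint32(byte[] b, int offset) {
        return ByteBuffer.wrap(b, offset, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt() & 0xFFFFFFFFL;
    }

    // Decode 2 bytes starting at offset as an unsigned 16-bit value.
    static int uint16(byte[] b, int offset) {
        return ByteBuffer.wrap(b, offset, 2)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getShort() & 0xFFFF;
    }

    // DataInputStream always reads big-endian, so the same bytes decode differently.
    static int bigEndianInt(byte[] b) throws IOException {
        return new DataInputStream(new ByteArrayInputStream(b)).readInt();
    }
}
```

With the bytes {0x01, 0x00, 0x00, 0x00}, `uint32` returns 1 while `DataInputStream.readInt` returns 16777216, which is the mismatch described above.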

    The EMBL CD-ROM format stores the date in 4 bytes. The first byte is unused, leaving one byte each for the year, the month and the day; that is, only one (!) byte for the year.
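Splitting the 4-byte date into its fields might look like the following. This is a hypothetical sketch, not BioJava's code, assuming the layout given under readDBDate (byte 0 unused, 1 year, 2 month, 3 day). The class name `IndexDate` and the day/month/year output format are illustrative choices; how the one-byte year maps to a calendar year is not documented here, so it is kept raw.

```java
// Hypothetical sketch, not BioJava's code: decodes the 4-byte header date,
// ASSUMING the layout byte 0 unused, 1 year, 2 month, 3 day.
final class IndexDate {
    final int year;   // raw one-byte year; its base is not documented here
    final int month;
    final int day;

    IndexDate(byte[] raw) {
        if (raw.length != 4) {
            throw new IllegalArgumentException("date field must be 4 bytes");
        }
        // Mask with 0xFF so byte values above 127 are not read as negative.
        this.year = raw[1] & 0xFF;
        this.month = raw[2] & 0xFF;
        this.day = raw[3] & 0xFF;
    }

    @Override
    public String toString() {
        // Arbitrary day/month/year format chosen for illustration.
        return String.format("%02d/%02d/%02d", day, month, year);
    }
}
```

For the bytes {0, 2, 6, 15}, `toString()` gives "15/06/02".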

    For further information see the EMBOSS documentation, or for a full description, the source code of the dbi programs and the Ajax library.

    Since:
    1.2
    Author:
    Keith James
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected java.io.InputStream input  
      protected org.biojava.bio.seq.db.emblcd.RecordParser recParser  
      protected java.lang.StringBuffer sb  
    • Constructor Summary

      Constructors 
      Constructor Description
      EmblCDROMIndexReader(java.io.InputStream input)
      Creates a new EmblCDROMIndexReader instance.
    • Method Summary

      Modifier and Type Method Description
      void close()
      close closes the underlying InputStream.
      java.lang.String readDBDate()
      readDBDate reads the date from the index header.
      java.lang.String readDBName()
      readDBName returns the database name from the index header.
      java.lang.String readDBRelease()
      readDBRelease returns the database release from the index header.
      long readFileLength()
      readFileLength returns the file length in bytes (stored within the file's header by the indexing program).
      byte[] readRawRecord()
      readRawRecord returns the raw bytes of a single record from the index.
      abstract java.lang.Object[] readRecord()
      readRecord returns an array of objects parsed from a single record.
      long readRecordCount()
      readRecordCount returns the number of records in the file.
      int readRecordLength()
      readRecordLength returns the record length (bytes).
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • input

        protected java.io.InputStream input
      • sb

        protected java.lang.StringBuffer sb
      • recParser

        protected org.biojava.bio.seq.db.emblcd.RecordParser recParser
    • Constructor Detail

      • EmblCDROMIndexReader

        public EmblCDROMIndexReader(java.io.InputStream input)
                             throws java.io.IOException
        Creates a new EmblCDROMIndexReader instance. A BufferedInputStream is probably the most suitable input.
        Parameters:
        input - an InputStream.
        Throws:
        java.io.IOException - if an error occurs.
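Opening an index file with the recommended buffering can be sketched as follows. The helper class `IndexFiles` is hypothetical; it uses only standard `java.nio.file` and `java.io` calls. Buffering presumably helps because the reader consumes the file in many small fixed-width reads.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: opens an EMBOSS index file (e.g. entrynam.idx)
// wrapped in a BufferedInputStream, as the constructor documentation suggests.
final class IndexFiles {
    private IndexFiles() {}

    static InputStream openBuffered(Path indexFile) throws IOException {
        // Files.newInputStream is unbuffered, so wrap it explicitly.
        return new BufferedInputStream(Files.newInputStream(indexFile));
    }
}
```

The resulting stream would then be handed to a concrete subclass, assuming (as with this base class) that the subclass has a constructor taking an InputStream; calling close() on the reader afterwards closes the file.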
    • Method Detail

      • readFileLength

        public long readFileLength()
        readFileLength returns the file length in bytes (stored within the file's header by the indexing program). This may be called more than once as the value is cached.
        Returns:
        the file length in bytes, as a long.
      • readRecordCount

        public long readRecordCount()
        readRecordCount returns the number of records in the file. This may be called more than once as the value is cached.
        Returns:
        the number of records, as a long.
      • readRecordLength

        public int readRecordLength()
        readRecordLength returns the record length (bytes). This may be called more than once as the value is cached.
        Returns:
        the record length in bytes, as an int.
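The "called more than once as the value is cached" behaviour of the three numeric accessors can be sketched as a read-once header. This is a hypothetical sketch, not BioJava's implementation. It ASSUMES the header begins with a 4-byte file length, a 4-byte record count and a 2-byte record length, all little-endian; check this layout against the Ajax library source before relying on it.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of cached header fields. ASSUMED layout of the first
// 10 header bytes: file length (4), record count (4), record length (2),
// all little-endian.
final class CachedHeader {
    private final InputStream input;
    private ByteBuffer cached; // null until the first accessor call

    CachedHeader(InputStream input) {
        this.input = input;
    }

    // Reads the numeric header bytes on first use only; later calls reuse them.
    private ByteBuffer header() throws IOException {
        if (cached == null) {
            byte[] raw = new byte[10];
            new DataInputStream(input).readFully(raw); // byte order does not matter here
            cached = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        }
        return cached;
    }

    long fileLength() throws IOException {
        return header().getInt(0) & 0xFFFFFFFFL;
    }

    long recordCount() throws IOException {
        return header().getInt(4) & 0xFFFFFFFFL;
    }

    int recordLength() throws IOException {
        return header().getShort(8) & 0xFFFF;
    }
}
```

Because the bytes are consumed from the stream only once, repeated calls return the same values without disturbing the stream position.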
      • readDBName

        public java.lang.String readDBName()
        readDBName returns the database name from the index header. This may be called more than once as the value is cached.
        Returns:
        the database name, as a String.
      • readDBRelease

        public java.lang.String readDBRelease()
        readDBRelease returns the database release from the index header. This may be called more than once as the value is cached.
        Returns:
        the database release, as a String.
      • readDBDate

        public java.lang.String readDBDate()
        readDBDate reads the date from the index header. The date is stored in 4 bytes: 0, unused; 1, year; 2, month; 3, day. With only a 1-byte year the value is of limited use, and I'm not sure that the EMBOSS programs set it correctly anyway.
        Returns:
        the database date, as a String.
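The name and release are fixed-width text fields in the header, so decoding them means dropping padding. This is a minimal sketch with a hypothetical helper `FixedWidthText`, ASSUMING the fields are ASCII padded with NUL bytes and/or spaces; the field widths and offsets are not given in this documentation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes a fixed-width text field from an index header.
// ASSUMPTION: the text is ASCII, padded with NUL bytes and/or spaces.
final class FixedWidthText {
    private FixedWidthText() {}

    static String decode(byte[] buf, int offset, int width) {
        int end = offset;
        int limit = offset + width;
        // Stop at the first NUL; anything after it is treated as padding.
        while (end < limit && buf[end] != 0) {
            end++;
        }
        // trim() removes any remaining leading or trailing space padding.
        return new String(buf, offset, end - offset, StandardCharsets.US_ASCII).trim();
    }
}
```

For example, a 20-byte field holding "EMBL" followed by NUL bytes decodes to "EMBL", and a 10-byte field "71.0" followed by spaces decodes to "71.0".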
      • readRecord

        public abstract java.lang.Object[] readRecord()
                                               throws java.io.IOException
        readRecord returns an array of objects parsed from a single record. Its content will depend on the type of index file. Concrete subclasses must provide an implementation of this method.
        Returns:
        an Object[] array whose contents depend on the index type.
        Throws:
        java.io.IOException - if an error occurs.
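The split between a concrete raw read and an abstract, index-specific readRecord is a template-method pattern. The sketch below mirrors that split in standalone classes; `RecordReaderSketch` and `IdRecordReader` are hypothetical, and the record layout used by the subclass (a 4-byte little-endian integer followed by a NUL-padded ASCII identifier) is invented for illustration, not taken from any real EMBOSS index.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical base class mirroring readRawRecord/readRecord; not BioJava's code.
abstract class RecordReaderSketch {
    protected final DataInputStream input;
    private final int recordLength;

    RecordReaderSketch(InputStream input, int recordLength) {
        this.input = new DataInputStream(input);
        this.recordLength = recordLength;
    }

    // Shared by all subclasses: one fixed-width record as raw bytes.
    byte[] readRawRecord() throws IOException {
        byte[] raw = new byte[recordLength];
        input.readFully(raw);
        return raw;
    }

    // Each subclass decides how a record's bytes map to objects.
    abstract Object[] readRecord() throws IOException;
}

// Hypothetical subclass with an INVENTED layout: 4-byte little-endian int,
// then a NUL-padded ASCII identifier filling the rest of the record.
final class IdRecordReader extends RecordReaderSketch {
    IdRecordReader(InputStream input, int recordLength) {
        super(input, recordLength);
    }

    @Override
    Object[] readRecord() throws IOException {
        byte[] raw = readRawRecord();
        int number = ByteBuffer.wrap(raw, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        int end = 4;
        while (end < raw.length && raw[end] != 0) {
            end++;
        }
        String id = new String(raw, 4, end - 4, StandardCharsets.US_ASCII);
        return new Object[] {id, number}; // number is autoboxed to Integer
    }
}
```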
      • readRawRecord

        public byte[] readRawRecord()
                             throws java.io.IOException
        readRawRecord returns the raw bytes of a single record from the index.
        Returns:
        a byte[] array holding one record.
        Throws:
        java.io.IOException - if an error occurs.
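Reading a fixed-width raw record has one subtlety: a single `InputStream.read(byte[], int, int)` call may legally return fewer bytes than requested. The hypothetical helper below loops until the record is complete and reports a truncated index as an `EOFException`; it is a sketch of the general technique, not BioJava's implementation.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: reads exactly recordLength bytes, looping because
// InputStream.read may return fewer bytes than requested.
final class RawRecords {
    private RawRecords() {}

    static byte[] readExactly(InputStream in, int recordLength) throws IOException {
        byte[] record = new byte[recordLength];
        int filled = 0;
        while (filled < recordLength) {
            int n = in.read(record, filled, recordLength - filled);
            if (n < 0) {
                // Stream ended part-way through a record: the index is truncated.
                throw new EOFException("expected " + recordLength + " bytes, got " + filled);
            }
            filled += n;
        }
        return record;
    }
}
```

`DataInputStream.readFully(byte[])` gives the same guarantee; the explicit loop only makes the short-read case visible.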
      • close

        public void close()
                   throws java.io.IOException
        close closes the underlying InputStream.
        Throws:
        java.io.IOException - if an error occurs.