Class CharacterReader


  • public final class CharacterReader
    extends java.lang.Object
    CharacterReader consumes tokens off a string. Used internally by jsoup. The API is subject to change.
    • Method Summary

      Modifier and Type Method Description
      void advance()
      Moves the current position by one.
      private void bufferUp()  
      private static java.lang.String cacheString​(char[] charBuf, java.lang.String[] stringCache, int start, int count)
      Caches short strings, as a flyweight pattern, to reduce GC load.
      (package private) char consume()  
      (package private) java.lang.String consumeData()  
      (package private) java.lang.String consumeDigitSequence()  
      (package private) java.lang.String consumeHexSequence()  
      (package private) java.lang.String consumeLetterSequence()  
      (package private) java.lang.String consumeLetterThenDigitSequence()  
      (package private) java.lang.String consumeTagName()  
      java.lang.String consumeTo​(char c)
      Reads characters up to the specified char.
      (package private) java.lang.String consumeTo​(java.lang.String seq)  
      java.lang.String consumeToAny​(char... chars)
      Read characters until the first of any delimiters is found.
      (package private) java.lang.String consumeToAnySorted​(char... chars)  
      (package private) java.lang.String consumeToEnd()  
      (package private) boolean containsIgnoreCase​(java.lang.String seq)  
      char current()
      Get the char at the current position.
      (package private) boolean isBinary()
      Heuristic to determine if the current buffer looks like binary content.
      boolean isEmpty()
      Tests if all the content has been read.
      private boolean isEmptyNoBufferUp()  
      (package private) void mark()  
      (package private) boolean matchConsume​(java.lang.String seq)  
      (package private) boolean matchConsumeIgnoreCase​(java.lang.String seq)  
      (package private) boolean matches​(char c)  
      (package private) boolean matches​(java.lang.String seq)  
      (package private) boolean matchesAny​(char... seq)  
      (package private) boolean matchesAnySorted​(char[] seq)  
      (package private) boolean matchesDigit()  
      (package private) boolean matchesIgnoreCase​(java.lang.String seq)  
      (package private) boolean matchesLetter()  
      (package private) int nextIndexOf​(char c)
      Returns the number of characters between the current position and the next instance of the input char.
      (package private) int nextIndexOf​(java.lang.CharSequence seq)
      Returns the number of characters between the current position and the next instance of the input sequence.
      int pos()
      Gets the current cursor position in the content.
      (package private) static boolean rangeEquals​(char[] charBuf, int start, int count, java.lang.String cached)
      Check if the value of the provided range equals the string.
      (package private) boolean rangeEquals​(int start, int count, java.lang.String cached)  
      (package private) void rewindToMark()  
      java.lang.String toString()  
      (package private) void unconsume()  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Field Detail

      • charBuf

        private final char[] charBuf
      • reader

        private final java.io.Reader reader
      • bufLength

        private int bufLength
      • bufSplitPoint

        private int bufSplitPoint
      • bufPos

        private int bufPos
      • readerPos

        private int readerPos
      • bufMark

        private int bufMark
      • stringCache

        private final java.lang.String[] stringCache
      • numNullsConsideredBinary

        private static final int numNullsConsideredBinary
    • Constructor Detail

      • CharacterReader

        public CharacterReader​(java.io.Reader input,
                               int sz)
      • CharacterReader

        public CharacterReader​(java.io.Reader input)
      • CharacterReader

        public CharacterReader​(java.lang.String input)
    • Method Detail

      • bufferUp

        private void bufferUp()
      • pos

        public int pos()
        Gets the current cursor position in the content.
        Returns:
        current position
      • isEmpty

        public boolean isEmpty()
        Tests if all the content has been read.
        Returns:
        true if nothing is left to read.
      • isEmptyNoBufferUp

        private boolean isEmptyNoBufferUp()
      • current

        public char current()
        Get the char at the current position.
        Returns:
        the char at the current position
      • consume

        char consume()
      • unconsume

        void unconsume()
      • advance

        public void advance()
        Moves the current position by one.
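
The cursor methods above (pos, isEmpty, current, advance) are typically used together in a read loop. The following is a minimal sketch of that pattern using a hypothetical stand-in class, CursorSketch, not jsoup's implementation. The EOF sentinel returned by current() at the end of input is an assumption; this page does not state what the real method returns there.

```java
// Hypothetical stand-in for illustration only; not jsoup's CharacterReader.
final class CursorSketch {
    // Assumed end-of-input sentinel; the documentation above does not specify one.
    static final char EOF = (char) -1;

    private final String input;
    private int pos = 0;

    CursorSketch(String input) {
        this.input = input;
    }

    // Current cursor position in the content.
    int pos() {
        return pos;
    }

    // True once every character has been read.
    boolean isEmpty() {
        return pos >= input.length();
    }

    // Char at the current position, or the assumed EOF sentinel at the end.
    char current() {
        return isEmpty() ? EOF : input.charAt(pos);
    }

    // Moves the current position by one.
    void advance() {
        pos++;
    }

    public static void main(String[] args) {
        CursorSketch r = new CursorSketch("abc");
        StringBuilder seen = new StringBuilder();
        while (!r.isEmpty()) {
            seen.append(r.current());
            r.advance();
        }
        System.out.println(seen + " @ " + r.pos()); // prints "abc @ 3"
    }
}
```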
      • mark

        void mark()
      • rewindToMark

        void rewindToMark()
      • nextIndexOf

        int nextIndexOf​(char c)
        Returns the number of characters between the current position and the next instance of the input char.
        Parameters:
        c - scan target
        Returns:
        offset between the current position and the next instance of the target, or -1 if not found.
      • nextIndexOf

        int nextIndexOf​(java.lang.CharSequence seq)
        Returns the number of characters between the current position and the next instance of the input sequence.
        Parameters:
        seq - scan target
        Returns:
        offset between the current position and the next instance of the target, or -1 if not found.
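
Both nextIndexOf overloads return an offset relative to the current position rather than an absolute index. A minimal sketch of that semantics over a plain String follows; the class NextIndexOfSketch and its explicit pos parameter are hypothetical, since the real methods are package-private and read from an internal buffer.

```java
// Hypothetical illustration of the relative-offset semantics; not jsoup's code.
final class NextIndexOfSketch {
    // Offset from pos to the next occurrence of c, or -1 if not found.
    static int nextIndexOf(String content, int pos, char c) {
        int idx = content.indexOf(c, pos);
        return idx == -1 ? -1 : idx - pos;
    }

    // Offset from pos to the next occurrence of seq, or -1 if not found.
    static int nextIndexOf(String content, int pos, CharSequence seq) {
        int idx = content.indexOf(seq.toString(), pos);
        return idx == -1 ? -1 : idx - pos;
    }
}
```

For example, with the content "one two" and a current position of 2, the next 't' sits at absolute index 4, so the returned offset is 2.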
      • consumeTo

        public java.lang.String consumeTo​(char c)
        Reads characters up to the specified char.
        Parameters:
        c - the delimiter
        Returns:
        the chars read
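
A minimal sketch of what consumeTo(char) describes, using a hypothetical String-backed class ConsumeToSketch. Two behaviours here are assumptions not stated on this page: the delimiter itself is left unconsumed, and when the delimiter is absent the remainder of the input is consumed.

```java
// Hypothetical stand-in for illustration only; not jsoup's CharacterReader.
final class ConsumeToSketch {
    private final String input;
    private int pos = 0;

    ConsumeToSketch(String input) {
        this.input = input;
    }

    int pos() {
        return pos;
    }

    // Reads characters up to (but, by assumption, not including) the delimiter.
    String consumeTo(char c) {
        int idx = input.indexOf(c, pos);
        // Assumption: if the delimiter is missing, read through to the end.
        int end = (idx == -1) ? input.length() : idx;
        String consumed = input.substring(pos, end);
        pos = end;
        return consumed;
    }
}
```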
      • consumeTo

        java.lang.String consumeTo​(java.lang.String seq)
      • consumeToAny

        public java.lang.String consumeToAny​(char... chars)
        Read characters until the first of any delimiters is found.
        Parameters:
        chars - delimiters to scan for
        Returns:
        characters read up to the matched delimiter.
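
The multi-delimiter variant can be sketched as a scan that stops at the first character matching any of the given delimiters. ConsumeToAnySketch is a hypothetical String-backed class, not jsoup's implementation; leaving the matched delimiter unconsumed is an assumption.

```java
// Hypothetical stand-in for illustration only; not jsoup's CharacterReader.
final class ConsumeToAnySketch {
    private final String input;
    private int pos = 0;

    ConsumeToAnySketch(String input) {
        this.input = input;
    }

    int pos() {
        return pos;
    }

    // Reads until the first of any delimiter is found, or to the end of input.
    String consumeToAny(char... chars) {
        int start = pos;
        scan:
        while (pos < input.length()) {
            char c = input.charAt(pos);
            for (char delimiter : chars) {
                if (c == delimiter) {
                    break scan; // stop at the delimiter, leaving it unconsumed (assumption)
                }
            }
            pos++;
        }
        return input.substring(start, pos);
    }
}
```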
      • consumeToAnySorted

        java.lang.String consumeToAnySorted​(char... chars)
      • consumeData

        java.lang.String consumeData()
      • consumeTagName

        java.lang.String consumeTagName()
      • consumeToEnd

        java.lang.String consumeToEnd()
      • consumeLetterSequence

        java.lang.String consumeLetterSequence()
      • consumeLetterThenDigitSequence

        java.lang.String consumeLetterThenDigitSequence()
      • consumeHexSequence

        java.lang.String consumeHexSequence()
      • consumeDigitSequence

        java.lang.String consumeDigitSequence()
      • matches

        boolean matches​(char c)
      • matches

        boolean matches​(java.lang.String seq)
      • matchesIgnoreCase

        boolean matchesIgnoreCase​(java.lang.String seq)
      • matchesAny

        boolean matchesAny​(char... seq)
      • matchesAnySorted

        boolean matchesAnySorted​(char[] seq)
      • matchesLetter

        boolean matchesLetter()
      • matchesDigit

        boolean matchesDigit()
      • matchConsume

        boolean matchConsume​(java.lang.String seq)
      • matchConsumeIgnoreCase

        boolean matchConsumeIgnoreCase​(java.lang.String seq)
      • containsIgnoreCase

        boolean containsIgnoreCase​(java.lang.String seq)
      • isBinary

        boolean isBinary()
        Heuristic to determine if the current buffer looks like binary content. The reader should already have decoded the input correctly, so a large number of NULL characters indicates a binary file.
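
The heuristic can be sketched as counting NULL characters in the buffer and comparing the count with a threshold. BinaryHeuristicSketch and looksBinary are hypothetical names, and the threshold value of 10 is an assumption: this page lists the numNullsConsideredBinary constant but not its value.

```java
// Hypothetical illustration of a NULL-counting heuristic; not jsoup's code.
final class BinaryHeuristicSketch {
    // Assumed threshold; the real value of numNullsConsideredBinary is not shown on this page.
    static final int NULLS_CONSIDERED_BINARY = 10;

    // Returns true if the first `length` chars contain at least the threshold number of NULLs.
    static boolean looksBinary(char[] buf, int length) {
        int nulls = 0;
        for (int i = 0; i < length; i++) {
            if (buf[i] == '\0') {
                nulls++;
            }
        }
        return nulls >= NULLS_CONSIDERED_BINARY;
    }
}
```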
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • cacheString

        private static java.lang.String cacheString​(char[] charBuf,
                                                    java.lang.String[] stringCache,
                                                    int start,
                                                    int count)
        Caches short strings, as a flyweight pattern, to reduce GC load. The cache is scoped to this document only, to prevent leaks.

        Simplistic: on a hash collision it just falls back to creating a new string, rather than using a full HashMap with an Entry list. That avoids both creating objects as hash keys and walking the entry list, at the expense of some more duplicates.
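
The trade-off described above can be sketched as a fixed-size array indexed by a hash of the character range. This is a minimal sketch, not jsoup's implementation: the MAX_CACHE_LEN cutoff, the hash function, and overwriting the slot on a collision are all assumptions, and sameChars is a hypothetical helper that plays the role of rangeEquals.

```java
// Hypothetical illustration of an array-backed string cache; not jsoup's code.
final class StringCacheSketch {
    // Assumed cutoff for "short" strings; longer ranges bypass the cache.
    static final int MAX_CACHE_LEN = 12;

    static String cacheString(char[] charBuf, String[] stringCache, int start, int count) {
        if (count > MAX_CACHE_LEN) {
            return new String(charBuf, start, count);
        }
        if (count < 1) {
            return "";
        }
        // Hash the range directly, so no key object is ever allocated.
        int hash = 0;
        for (int i = 0; i < count; i++) {
            hash = 31 * hash + charBuf[start + i];
        }
        int index = Math.floorMod(hash, stringCache.length);
        String cached = stringCache[index];
        if (cached != null && sameChars(charBuf, start, count, cached)) {
            return cached; // hit: reuse the existing instance
        }
        // Miss or collision: create a new string and (by assumption) take over the slot.
        String fresh = new String(charBuf, start, count);
        stringCache[index] = fresh;
        return fresh;
    }

    // Checks whether the range holds exactly the characters of the cached string.
    static boolean sameChars(char[] charBuf, int start, int count, String cached) {
        if (count != cached.length()) {
            return false;
        }
        for (int i = 0; i < count; i++) {
            if (charBuf[start + i] != cached.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```

With this layout a repeated short token, such as a tag name that appears many times in one document, resolves to a single String instance, while a collision costs at most one extra allocation.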

      • rangeEquals

        static boolean rangeEquals​(char[] charBuf,
                                   int start,
                                   int count,
                                   java.lang.String cached)
        Check if the value of the provided range equals the string.
      • rangeEquals

        boolean rangeEquals​(int start,
                            int count,
                            java.lang.String cached)