Class HItemExtractor

  • All Implemented Interfaces:
    org.apache.any23.extractor.Extractor<Document>, org.apache.any23.extractor.Extractor.TagSoupDOMExtractor

    public class HItemExtractor
    extends EntityBasedMicroformatExtractor
    Extractor for the h-item microformat.
    Author:
    Nisala Nirmana
    • Constructor Detail

      • HItemExtractor

        public HItemExtractor()
    • Method Detail

      • getDescription

        public org.apache.any23.extractor.ExtractorDescription getDescription()
        Description copied from class: MicroformatExtractor
        Returns the description of this extractor.
        Specified by:
        getDescription in interface org.apache.any23.extractor.Extractor<Document>
        Specified by:
        getDescription in class MicroformatExtractor
        Returns:
        a human-readable description.
      • extractEntity

        protected boolean extractEntity(Node node,
                                        org.apache.any23.extractor.ExtractionResult out)
                                 throws org.apache.any23.extractor.ExtractionException
        Description copied from class: EntityBasedMicroformatExtractor
        Extracts an entity from a DOM node.
        Specified by:
        extractEntity in class EntityBasedMicroformatExtractor
        Parameters:
        node - the DOM node.
        out - the extraction result collector.
        Returns:
        true if the extraction produced something, false otherwise.
        Throws:
        org.apache.any23.extractor.ExtractionException - if there is an error during extraction
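
The return contract of extractEntity (true only when something was produced) can be illustrated with a minimal, self-contained sketch. This is not the Any23 implementation: the class HItemContractSketch, the helper parse, and the use of a plain Map in place of ExtractionResult are all hypothetical stand-ins chosen so the example runs on the JDK alone. The property classes p-name, u-url and u-photo are the ones defined for h-item by microformats2; how HItemExtractor actually maps them is not shown on this page.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical illustration of the extractEntity contract; not Any23 code.
public class HItemContractSketch {

    // Assumption: a Map stands in for ExtractionResult as the collector.
    // Returns true if at least one h-item property was found under node.
    static boolean extractEntity(Node node, Map<String, String> out) {
        if (!(node instanceof Element)) {
            return false;
        }
        boolean found = false;
        NodeList descendants = ((Element) node).getElementsByTagName("*");
        for (int i = 0; i < descendants.getLength(); i++) {
            Element e = (Element) descendants.item(i);
            // An element may carry several property classes at once.
            for (String cls : e.getAttribute("class").trim().split("\\s+")) {
                if (cls.equals("p-name")) {
                    out.put("name", e.getTextContent().trim());
                    found = true;
                } else if (cls.equals("u-url")) {
                    out.put("url", e.getAttribute("href"));
                    found = true;
                } else if (cls.equals("u-photo")) {
                    out.put("photo", e.getAttribute("src"));
                    found = true;
                }
            }
        }
        return found;
    }

    // Hypothetical helper: parses well-formed markup and returns its root element.
    static Element parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        String markup = "<div class=\"h-item\">"
                + "<a class=\"p-name u-url\" href=\"http://example.org/\">Widget</a>"
                + "</div>";
        Map<String, String> out = new LinkedHashMap<>();
        boolean produced = extractEntity(parse(markup), out);
        System.out.println(produced + " " + out);
    }
}
```

A caller can use the boolean to decide whether the node contributed anything to the result, which mirrors the "true if the extraction produced something" wording above.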