Class MicroformatExtractor

    • Nested Class Summary

      • Nested classes/interfaces inherited from interface org.apache.any23.extractor.Extractor

        org.apache.any23.extractor.Extractor.BlindExtractor, org.apache.any23.extractor.Extractor.ContentExtractor, org.apache.any23.extractor.Extractor.TagSoupDOMExtractor
    • Method Summary

      Modifier and Type Method Description
      protected void addBNodeProperty​(org.eclipse.rdf4j.model.Resource subject, org.eclipse.rdf4j.model.IRI property, org.eclipse.rdf4j.model.BNode bnode)
      Helper method that adds a BNode property to a node.
      protected void addBNodeProperty​(Node n, org.eclipse.rdf4j.model.Resource subject, org.eclipse.rdf4j.model.IRI property, org.eclipse.rdf4j.model.BNode bnode)
      Helper method that adds a BNode property to a node.
      protected void addIRIProperty​(org.eclipse.rdf4j.model.Resource subject, org.eclipse.rdf4j.model.IRI property, org.eclipse.rdf4j.model.IRI object)
      Helper method that adds an IRI property to a node.
      protected boolean conditionallyAddLiteralProperty​(Node n, org.eclipse.rdf4j.model.Resource subject, org.eclipse.rdf4j.model.IRI property, org.eclipse.rdf4j.model.Literal literal)
      Helper method that adds a literal property to a node.
      protected boolean conditionallyAddResourceProperty​(org.eclipse.rdf4j.model.Resource subject, org.eclipse.rdf4j.model.IRI property, org.eclipse.rdf4j.model.IRI uri)
      Helper method that adds an IRI property to a node.
      protected boolean conditionallyAddStringProperty​(Node n, org.eclipse.rdf4j.model.Resource subject, org.eclipse.rdf4j.model.IRI p, String value)
      Helper method that adds a literal property to a subject only if the value of the property is a valid string.
      protected abstract boolean extract()
      Performs the extraction of the data and writes it to the model.
      protected org.eclipse.rdf4j.model.IRI fixLink​(String link)  
      protected org.eclipse.rdf4j.model.IRI fixLink​(String link, String defaultSchema)  
      protected org.apache.any23.extractor.ExtractionResult getCurrentExtractionResult()
      Returns the ExtractionResult associated with the extraction session.
      abstract org.apache.any23.extractor.ExtractorDescription getDescription()
      Returns the description of this extractor.
      org.eclipse.rdf4j.model.IRI getDocumentIRI()  
      org.apache.any23.extractor.ExtractionContext getExtractionContext()  
      HTMLDocument getHTMLDocument()  
      static boolean includes​(Class<? extends MicroformatExtractor> including, Class<? extends MicroformatExtractor> included)
      Checks whether there is a native nesting relationship between two MicroformatExtractor classes.
      protected org.apache.any23.extractor.ExtractionResult openSubResult​(org.apache.any23.extractor.ExtractionContext context)  
      void run​(org.apache.any23.extractor.ExtractionParameters extractionParameters, org.apache.any23.extractor.ExtractionContext extractionContext, Document in, org.apache.any23.extractor.ExtractionResult out)  
      protected void setCurrentExtractionResult​(org.apache.any23.extractor.ExtractionResult out)  
    • Constructor Detail

      • MicroformatExtractor

        public MicroformatExtractor()
    • Method Detail

      • getDescription

        public abstract org.apache.any23.extractor.ExtractorDescription getDescription()
        Returns the description of this extractor.
        Specified by:
        getDescription in interface org.apache.any23.extractor.Extractor<Document>
        Returns:
        a human-readable description.
      • extract

        protected abstract boolean extract()
                                    throws org.apache.any23.extractor.ExtractionException
        Performs the extraction of the data and writes it to the model. The nodes generated in the model can have any name or implicit label, but where possible they SHOULD have names (either URIs or AnonId) that are uniquely derivable from their position in the DOM tree, so that multiple extractors can merge information.
        Returns:
        true if extraction is successful
        Throws:
        org.apache.any23.extractor.ExtractionException - if there is an error during extraction
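        The recommendation that node names be derivable from their DOM position can be illustrated with a minimal sketch. It uses only the JDK's org.w3c.dom API; the class DomPathSketch and its methods pathOf and parse are hypothetical names introduced here, and this is not necessarily the naming scheme Any23 itself applies.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical illustration, not part of Any23.
final class DomPathSketch {

    /** Builds a positional path such as /html[1]/body[1]/div[2] for an element node. */
    public static String pathOf(Node node) {
        StringBuilder path = new StringBuilder();
        // Walk up through element ancestors; the loop stops at the Document node.
        for (Node cur = node;
             cur != null && cur.getNodeType() == Node.ELEMENT_NODE;
             cur = cur.getParentNode()) {
            int index = 1;
            // Count preceding sibling elements that share the same tag name.
            for (Node sib = cur.getPreviousSibling(); sib != null; sib = sib.getPreviousSibling()) {
                if (sib.getNodeType() == Node.ELEMENT_NODE
                        && sib.getNodeName().equals(cur.getNodeName())) {
                    index++;
                }
            }
            path.insert(0, "/" + cur.getNodeName() + "[" + index + "]");
        }
        return path.toString();
    }

    /** Parses a well-formed XML string; checked exceptions are wrapped for brevity. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><div/><div class=\"vcard\"/></body></html>");
        Node secondDiv = doc.getElementsByTagName("div").item(1);
        System.out.println(pathOf(secondDiv)); // prints /html[1]/body[1]/div[2]
    }
}
```

        Because the path depends only on the node's position, two extractors that visit the same element would derive the same name, which is what allows their output to be merged.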
      • getExtractionContext

        public org.apache.any23.extractor.ExtractionContext getExtractionContext()
      • getDocumentIRI

        public org.eclipse.rdf4j.model.IRI getDocumentIRI()
      • run

        public final void run​(org.apache.any23.extractor.ExtractionParameters extractionParameters,
                              org.apache.any23.extractor.ExtractionContext extractionContext,
                              Document in,
                              org.apache.any23.extractor.ExtractionResult out)
                       throws IOException,
                              org.apache.any23.extractor.ExtractionException
        Specified by:
        run in interface org.apache.any23.extractor.Extractor<Document>
        Throws:
        IOException
        org.apache.any23.extractor.ExtractionException
      • getCurrentExtractionResult

        protected org.apache.any23.extractor.ExtractionResult getCurrentExtractionResult()
        Returns the ExtractionResult associated with the extraction session.
        Returns:
        a valid extraction result.
      • setCurrentExtractionResult

        protected void setCurrentExtractionResult​(org.apache.any23.extractor.ExtractionResult out)
      • openSubResult

        protected org.apache.any23.extractor.ExtractionResult openSubResult​(org.apache.any23.extractor.ExtractionContext context)
      • conditionallyAddStringProperty

        protected boolean conditionallyAddStringProperty​(Node n,
                                                         org.eclipse.rdf4j.model.Resource subject,
                                                         org.eclipse.rdf4j.model.IRI p,
                                                         String value)
        Helper method that adds a literal property to a subject only if the value of the property is a valid string.
        Parameters:
        n - the HTML node from which the property value has been extracted.
        subject - the property subject.
        p - the property IRI.
        value - the property value.
        Returns:
        true if the value has been accepted and added, false otherwise.
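        The conditional-add contract above can be sketched in isolation. The page does not define what counts as a "valid string", so the sketch assumes it means non-null and non-blank after trimming. The class ConditionalAddSketch, its string-based statement list, and the simplified three-argument signature are hypothetical stand-ins for the real RDF model and the Node, Resource, and IRI parameters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not the Any23 implementation.
final class ConditionalAddSketch {

    // Stand-in for the model that a real extractor writes statements to.
    private final List<String> statements = new ArrayList<>();

    /**
     * Adds a literal statement only when the value is usable.
     * ASSUMPTION: "valid" means non-null and non-empty after trimming.
     */
    public boolean conditionallyAddStringProperty(String subject, String property, String value) {
        if (value == null) {
            return false;
        }
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        statements.add(subject + " " + property + " \"" + trimmed + "\"");
        return true;
    }

    public List<String> statements() {
        return statements;
    }

    public static void main(String[] args) {
        ConditionalAddSketch sketch = new ConditionalAddSketch();
        System.out.println(sketch.conditionallyAddStringProperty("_:card", "vcard:fn", " Jane Doe ")); // prints true
        System.out.println(sketch.conditionallyAddStringProperty("_:card", "vcard:nickname", "   ")); // prints false
        System.out.println(sketch.statements().size()); // prints 1
    }
}
```

        Returning a boolean lets the caller record whether any property was actually extracted, for example to decide what extract() should return.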
      • conditionallyAddLiteralProperty

        protected boolean conditionallyAddLiteralProperty​(Node n,
                                                          org.eclipse.rdf4j.model.Resource subject,
                                                          org.eclipse.rdf4j.model.IRI property,
                                                          org.eclipse.rdf4j.model.Literal literal)
        Helper method that adds a literal property to a node.
        Parameters:
        n - the HTML node from which the property value has been extracted.
        subject - the property subject.
        property - the property IRI.
        literal - the property value.
        Returns:
        true if the literal has been accepted and added, false otherwise.
      • conditionallyAddResourceProperty

        protected boolean conditionallyAddResourceProperty​(org.eclipse.rdf4j.model.Resource subject,
                                                           org.eclipse.rdf4j.model.IRI property,
                                                           org.eclipse.rdf4j.model.IRI uri)
        Helper method that adds an IRI property to a node.
        Parameters:
        subject - the property subject.
        property - the property IRI.
        uri - the property object.
        Returns:
        true if the resource has been added, false otherwise.
      • addBNodeProperty

        protected void addBNodeProperty​(Node n,
                                        org.eclipse.rdf4j.model.Resource subject,
                                        org.eclipse.rdf4j.model.IRI property,
                                        org.eclipse.rdf4j.model.BNode bnode)
        Helper method that adds a BNode property to a node.
        Parameters:
        n - the HTML node used for extracting such property.
        subject - the property subject.
        property - the property IRI.
        bnode - the property value.
      • addBNodeProperty

        protected void addBNodeProperty​(org.eclipse.rdf4j.model.Resource subject,
                                        org.eclipse.rdf4j.model.IRI property,
                                        org.eclipse.rdf4j.model.BNode bnode)
        Helper method that adds a BNode property to a node.
        Parameters:
        subject - the property subject.
        property - the property IRI.
        bnode - the property value.
      • addIRIProperty

        protected void addIRIProperty​(org.eclipse.rdf4j.model.Resource subject,
                                      org.eclipse.rdf4j.model.IRI property,
                                      org.eclipse.rdf4j.model.IRI object)
        Helper method that adds an IRI property to a node.
        Parameters:
        subject - subject to add
        property - predicate to add
        object - object to add
      • fixLink

        protected org.eclipse.rdf4j.model.IRI fixLink​(String link)
      • fixLink

        protected org.eclipse.rdf4j.model.IRI fixLink​(String link,
                                                      String defaultSchema)