Package org.apache.any23.extractor.html
Class HReviewExtractor
- java.lang.Object
  - org.apache.any23.extractor.html.MicroformatExtractor
    - org.apache.any23.extractor.html.EntityBasedMicroformatExtractor
      - org.apache.any23.extractor.html.HReviewExtractor
All Implemented Interfaces:
org.apache.any23.extractor.Extractor<Document>, org.apache.any23.extractor.Extractor.TagSoupDOMExtractor
public class HReviewExtractor extends EntityBasedMicroformatExtractor
Extractor for the hReview microformat.
- Author: Gabriele Renzi
Field Summary
Fields inherited from class org.apache.any23.extractor.html.MicroformatExtractor
BEGIN_SCRIPT, END_SCRIPT, valueFactory
Constructor Summary
Constructors
- HReviewExtractor()
Method Summary
- protected boolean extractEntity(Node node, org.apache.any23.extractor.ExtractionResult out)
  Extracts an entity from a DOM node.
- protected String getBaseClassName()
  Returns the base class name for the extractor.
- org.apache.any23.extractor.ExtractorDescription getDescription()
  Returns the description of this extractor.
- protected void resetExtractor()
  Resets the internal status of the extractor to prepare it for a new extraction session.
Methods inherited from class org.apache.any23.extractor.html.EntityBasedMicroformatExtractor
extract, getBlankNodeFor
Methods inherited from class org.apache.any23.extractor.html.MicroformatExtractor
addBNodeProperty, addBNodeProperty, addIRIProperty, conditionallyAddLiteralProperty, conditionallyAddResourceProperty, conditionallyAddStringProperty, fixLink, fixLink, getCurrentExtractionResult, getDocumentIRI, getExtractionContext, getHTMLDocument, includes, openSubResult, run, setCurrentExtractionResult
Method Detail
getDescription
public org.apache.any23.extractor.ExtractorDescription getDescription()
Description copied from class: MicroformatExtractor
Returns the description of this extractor.
- Specified by: getDescription in interface org.apache.any23.extractor.Extractor<Document>
- Specified by: getDescription in class MicroformatExtractor
- Returns: a human-readable description.
getBaseClassName
protected String getBaseClassName()
Description copied from class: EntityBasedMicroformatExtractor
Returns the base class name for the extractor.
- Specified by: getBaseClassName in class EntityBasedMicroformatExtractor
- Returns: a string containing the base class name of the extractor.
resetExtractor
protected void resetExtractor()
Description copied from class: EntityBasedMicroformatExtractor
Resets the internal status of the extractor to prepare it for a new extraction session.
- Specified by: resetExtractor in class EntityBasedMicroformatExtractor
extractEntity
protected boolean extractEntity(Node node, org.apache.any23.extractor.ExtractionResult out) throws org.apache.any23.extractor.ExtractionException
Description copied from class: EntityBasedMicroformatExtractor
Extracts an entity from a DOM node.
- Specified by: extractEntity in class EntityBasedMicroformatExtractor
- Parameters:
  node - the DOM node.
  out - the extraction result collector.
- Returns: true if the extraction has produced something, false otherwise.
- Throws: org.apache.any23.extractor.ExtractionException - if there is an error during extraction
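The three protected methods above (getBaseClassName, resetExtractor, extractEntity) read like hooks that the inherited extract method calls in turn. The following is a minimal, self-contained sketch of that contract only; it does not use the Any23 API. The names HReviewSketch, FakeNode, EntityExtractorSketch and ReviewExtractorSketch are hypothetical, the driver loop is an assumption about how such a base class might work, and the base class name "hreview" is assumed from the hReview root class name rather than taken from this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HReviewSketch {

    /** Hypothetical stand-in for a DOM node: a class attribute plus text content. */
    static final class FakeNode {
        final String cssClass;
        final String text;

        FakeNode(String cssClass, String text) {
            this.cssClass = cssClass;
            this.text = text;
        }
    }

    /** Hypothetical base class mirroring the documented hooks (not the Any23 class). */
    abstract static class EntityExtractorSketch {
        protected abstract String getBaseClassName();

        protected abstract void resetExtractor();

        protected abstract boolean extractEntity(FakeNode node, List<String> out);

        /** Assumed driver: reset state, then offer each matching node to extractEntity. */
        final boolean extract(List<FakeNode> nodes, List<String> out) {
            resetExtractor();
            boolean foundAny = false;
            for (FakeNode node : nodes) {
                if (getBaseClassName().equals(node.cssClass) && extractEntity(node, out)) {
                    foundAny = true;
                }
            }
            return foundAny;
        }
    }

    /** Hypothetical subclass playing the role of an hReview extractor. */
    static final class ReviewExtractorSketch extends EntityExtractorSketch {
        private int seen; // internal status cleared by resetExtractor()

        @Override
        protected String getBaseClassName() {
            return "hreview"; // assumption: the hReview root class name
        }

        @Override
        protected void resetExtractor() {
            seen = 0;
        }

        @Override
        protected boolean extractEntity(FakeNode node, List<String> out) {
            if (node.text.isEmpty()) {
                return false; // nothing produced for this node
            }
            seen++;
            out.add("review#" + seen + ": " + node.text);
            return true; // something was produced
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        List<FakeNode> nodes = Arrays.asList(
                new FakeNode("hreview", "Great pizza"),
                new FakeNode("vcard", "not a review"),
                new FakeNode("hreview", ""));
        boolean found = new ReviewExtractorSketch().extract(nodes, out);
        System.out.println(found + " " + out);
    }
}
```

The boolean return value of extractEntity is what lets the driver report whether anything was extracted at all, which matches the "true if the extraction has produced something" wording above.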