Package | Description |
---|---|
org.apache.nutch.parse | The Parse interface and related classes. |
org.apache.nutch.parse.html | An HTML document parsing plugin. |
org.apache.nutch.parse.js | Parser and parse filter plugin to extract all (possible) links from JavaScript files and embedded JavaScript code snippets. |
org.apache.nutch.parse.tika | Parse various document formats with help of Apache Tika. |
Modifier and Type | Method and Description |
---|---|
Parser | ParserFactory.getParserById(java.lang.String id): Returns the Parser instance whose extension ID matches the specified id. |
Parser[] | ParserFactory.getParsers(java.lang.String contentType, java.lang.String url): Returns an array of Parsers for a given content type. |
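
The two lookup styles in the table above (by extension ID, and by content type plus URL) can be illustrated with a small stand-alone sketch. This is not Nutch's implementation: `SketchParser`, `SimpleSketchParser`, and `SketchParserRegistry` are hypothetical names introduced here, the ID strings in `main` are illustrative only, and throwing `IllegalArgumentException` on a missing ID is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parser plugin; NOT Nutch's Parser interface.
interface SketchParser {
    String id();                           // extension ID of the plugin
    boolean supports(String contentType);  // whether this parser handles the type
}

// Hypothetical parser that supports a fixed list of content types.
class SimpleSketchParser implements SketchParser {
    private final String id;
    private final List<String> contentTypes;

    SimpleSketchParser(String id, String... contentTypes) {
        this.id = id;
        this.contentTypes = Arrays.asList(contentTypes);
    }

    @Override
    public String id() {
        return id;
    }

    @Override
    public boolean supports(String contentType) {
        return contentTypes.contains(contentType);
    }
}

// Hypothetical registry mirroring the two lookup shapes listed in the table.
class SketchParserRegistry {
    // Insertion order is kept so results come back in registration order.
    private final Map<String, SketchParser> byId = new LinkedHashMap<>();

    void register(SketchParser parser) {
        byId.put(parser.id(), parser);
    }

    // Lookup by extension ID; failing with an exception is an assumption here.
    SketchParser getParserById(String id) {
        SketchParser parser = byId.get(id);
        if (parser == null) {
            throw new IllegalArgumentException("No parser with id: " + id);
        }
        return parser;
    }

    // Lookup by content type. The url argument is accepted only to mirror the
    // two-argument shape; this sketch does not use it.
    SketchParser[] getParsers(String contentType, String url) {
        List<SketchParser> matches = new ArrayList<>();
        for (SketchParser parser : byId.values()) {
            if (parser.supports(contentType)) {
                matches.add(parser);
            }
        }
        return matches.toArray(new SketchParser[0]);
    }
}

class ParserLookupDemo {
    public static void main(String[] args) {
        SketchParserRegistry registry = new SketchParserRegistry();
        registry.register(new SimpleSketchParser("html-sketch", "text/html"));
        registry.register(new SimpleSketchParser("generic-sketch", "text/html", "application/pdf"));

        System.out.println(registry.getParserById("html-sketch").id());
        System.out.println(registry.getParsers("text/html", "http://example.com/").length);
    }
}
```

Returning an array from the content-type lookup lets a caller try several candidate parsers in order, which is one plausible reason for the `Parser[]` return type shown in the table.
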
Modifier and Type | Class and Description |
---|---|
class | HtmlParser |
Modifier and Type | Class and Description |
---|---|
class | JSParseFilter: This class is a heuristic link extractor for JavaScript files and code snippets. |
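
To give a sense of what heuristic link extraction from JavaScript can look like, the sketch below scans source text for quoted string literals that begin with `http://` or `https://`. It is a deliberately simple illustration and not the heuristic JSParseFilter actually uses: the class `JsLinkSketch`, the method `extractLinks`, and the regular expression are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; NOT the logic used by JSParseFilter.
class JsLinkSketch {
    // A single- or double-quoted literal that starts with http:// or https://.
    // Group 1 is the opening quote, group 2 the URL; \1 requires the same
    // quote character to close the literal.
    private static final Pattern URL_LITERAL =
            Pattern.compile("([\"'])(https?://[^\"'\\s]+)\\1");

    // Returns the distinct URL-like literals in order of first appearance.
    static List<String> extractLinks(String javascript) {
        List<String> links = new ArrayList<>();
        Matcher matcher = URL_LITERAL.matcher(javascript);
        while (matcher.find()) {
            String url = matcher.group(2);
            if (!links.contains(url)) {
                links.add(url);
            }
        }
        return links;
    }
}

class JsLinkSketchDemo {
    public static void main(String[] args) {
        String js = "var a = \"http://example.com/a\";"
                + " window.location = 'https://example.org/b?x=1';";
        for (String link : JsLinkSketch.extractLinks(js)) {
            System.out.println(link);
        }
    }
}
```

A pattern like this misses relative paths and URLs assembled by string concatenation, which is why such extraction is described as yielding "possible" links rather than a definitive list.
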
Modifier and Type | Class and Description |
---|---|
class | TikaParser: Wrapper for Tika parsers. |
Copyright © 2019 The Apache Software Foundation