Package | Description |
---|---|
org.apache.nutch.analysis.lang |
Text document language identifier.
|
org.apache.nutch.api.impl.db | |
org.apache.nutch.crawl |
Crawl control code and tools to run the crawler.
|
org.apache.nutch.fetcher |
The Nutch robot.
|
org.apache.nutch.host |
Host database to store metadata per host.
|
org.apache.nutch.indexer |
Index content, configure and run indexing and cleaning jobs to
add, update, and delete documents from an index.
|
org.apache.nutch.indexer.anchor |
An indexing plugin for inbound anchor text.
|
org.apache.nutch.indexer.basic |
A basic indexing plugin that adds basic fields: url, host, title, content, etc.
|
org.apache.nutch.indexer.html |
Index raw HTML content.
|
org.apache.nutch.indexer.jsoup.extractor |
Indexing filter for the jsoup-extractor plugin.
|
org.apache.nutch.indexer.metadata |
Indexing filter to add document metadata to the index.
|
org.apache.nutch.indexer.more |
An indexing plugin that adds "more" index fields:
last modified date, MIME type, content length.
|
org.apache.nutch.indexer.subcollection |
Indexing filter to assign documents to subcollections.
|
org.apache.nutch.indexer.tld |
Top Level Domain Indexing plugin.
|
org.apache.nutch.microformats.reltag |
A microformats Rel-Tag Parser/Indexer/Querier plugin.
|
org.apache.nutch.net |
Web-related interfaces: URL filters and normalizers. |
org.apache.nutch.parse |
The Parse interface and related classes. |
org.apache.nutch.parse.html |
An HTML document parsing plugin.
|
org.apache.nutch.parse.js |
Parser and parse filter plugin to extract all (possible) links
from JavaScript files and embedded JavaScript code snippets.
|
org.apache.nutch.parse.jsoup.extractor |
Parse filter based on Jsoup.
|
org.apache.nutch.parse.metatags |
Parse filter to extract meta tags: keywords, description, etc.
|
org.apache.nutch.parse.tika |
Parse various document formats with the help of Apache Tika.
|
org.apache.nutch.plugin |
The Nutch Plugin System. |
org.apache.nutch.protocol |
Classes related to the Protocol interface; see also org.apache.nutch.net.protocols. |
org.apache.nutch.protocol.file |
Protocol plugin which supports retrieving local file resources.
|
org.apache.nutch.protocol.ftp |
Protocol plugin which supports retrieving documents via the ftp protocol.
|
org.apache.nutch.protocol.http |
Protocol plugin which supports retrieving documents via the http protocol.
|
org.apache.nutch.protocol.http.api |
Common API used by HTTP plugins (http, httpclient) |
org.apache.nutch.protocol.sftp |
Protocol plugin which supports retrieving documents via the sftp protocol.
|
org.apache.nutch.scoring |
The ScoringFilter interface. |
org.apache.nutch.scoring.link |
Scoring filter
|
org.apache.nutch.scoring.opic |
Scoring filter implementing a variant of the Online Page Importance Computation
(OPIC) algorithm.
|
org.apache.nutch.scoring.tld |
Top Level Domain Scoring plugin.
|
org.apache.nutch.storage |
Representation (web pages, host metadata) of data in abstracted storage. |
org.apache.nutch.util |
Miscellaneous utility classes.
|
org.apache.nutch.util.domain |
Classes for domain name analysis.
|
org.creativecommons.nutch |
Sample plugins that parse and index Creative Commons metadata.
|
Class (used by org.apache.nutch.analysis.lang) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.api.impl.db) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
Class (used by org.apache.nutch.crawl) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.fetcher) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.host) and Description |
---|
Host
Host represents a store of webpages or other data which resides on a server or other computer so that it can be accessed over the Internet
|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
Class (used by org.apache.nutch.indexer) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.indexer.anchor) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.indexer.basic) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.indexer.html) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.indexer.jsoup.extractor) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.indexer.metadata) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.indexer.more) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.indexer.subcollection) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.indexer.tld) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.microformats.reltag) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.net) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
Class (used by org.apache.nutch.parse) and Description |
---|
ParseStatus
A nested container representing parse status data captured from invocation of parsers on fetch of a WebPage
|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.parse.html) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.parse.js) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.parse.jsoup.extractor) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.parse.metatags) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.parse.tika) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.plugin) and Description |
---|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.protocol) and Description |
---|
ProtocolStatus
A nested container representing data captured from web server responses.
|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.protocol.file) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.protocol.ftp) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.protocol.http) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.protocol.http.api) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
Class (used by org.apache.nutch.protocol.sftp) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.scoring) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.scoring.link) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.scoring.opic) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.scoring.tld) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Class (used by org.apache.nutch.storage) and Description |
---|
Host
Host represents a store of webpages or other data which resides on a server or other computer so that it can be accessed over the Internet
|
Host.Builder
RecordBuilder for Host instances.
|
Host.Field
Enum containing all data bean's fields.
|
Host.Tombstone |
Mark |
ParseStatus
A nested container representing parse status data captured from invocation of parsers on fetch of a WebPage
|
ParseStatus.Builder
RecordBuilder for ParseStatus instances.
|
ParseStatus.Field
Enum containing all data bean's fields.
|
ParseStatus.Tombstone |
ProtocolStatus
A nested container representing data captured from web server responses.
|
ProtocolStatus.Builder
RecordBuilder for ProtocolStatus instances.
|
ProtocolStatus.Field
Enum containing all data bean's fields.
|
ProtocolStatus.Tombstone |
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Builder
RecordBuilder for WebPage instances.
|
WebPage.Field
Enum containing all data bean's fields.
|
WebPage.Tombstone |
Class (used by org.apache.nutch.util) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
Class (used by org.apache.nutch.util.domain) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
Class (used by org.creativecommons.nutch) and Description |
---|
WebPage
WebPage is the primary data structure in Nutch representing crawl data for a given WebPage at some point in time
|
WebPage.Field
Enum containing all data bean's fields.
|
Copyright © 2019 The Apache Software Foundation