Package | Description |
---|---|
org.apache.nutch.analysis.lang | Text document language identifier. |
org.apache.nutch.crawl | Crawl control code and tools to run the crawler. |
org.apache.nutch.fetcher | The Nutch robot. |
org.apache.nutch.indexer | Index content, configure and run indexing and cleaning jobs to add, update, and delete documents from an index. |
org.apache.nutch.indexer.anchor | An indexing plugin for inbound anchor text. |
org.apache.nutch.indexer.basic | A basic indexing plugin that adds basic fields: url, host, title, content, etc. |
org.apache.nutch.indexer.html | Index raw HTML content. |
org.apache.nutch.indexer.jsoup.extractor | Indexing filter for the jsoup-extractor plugin. |
org.apache.nutch.indexer.metadata | Indexing filter to add document metadata to the index. |
org.apache.nutch.indexer.more | A "more" indexing plugin that adds further index fields: last modified date, MIME type, content length. |
org.apache.nutch.indexer.subcollection | Indexing filter to assign documents to subcollections. |
org.apache.nutch.indexer.tld | Top Level Domain indexing plugin. |
org.apache.nutch.microformats.reltag | A microformats Rel-Tag Parser/Indexer/Querier plugin. |
org.apache.nutch.parse | The Parse interface and related classes. |
org.apache.nutch.parse.html | An HTML document parsing plugin. |
org.apache.nutch.parse.js | Parser and parse filter plugin to extract all (possible) links from JavaScript files and embedded JavaScript code snippets. |
org.apache.nutch.parse.jsoup.extractor | Parse filter based on Jsoup. |
org.apache.nutch.parse.metatags | Parse filter to extract meta tags: keywords, description, etc. |
org.apache.nutch.parse.tika | Parse various document formats with the help of Apache Tika. |
org.apache.nutch.plugin | The Nutch Plugin System. |
org.apache.nutch.protocol | Classes related to the Protocol interface, see also org.apache.nutch.net.protocols. |
org.apache.nutch.protocol.file | Protocol plugin which supports retrieving local file resources. |
org.apache.nutch.protocol.ftp | Protocol plugin which supports retrieving documents via the FTP protocol. |
org.apache.nutch.protocol.http | Protocol plugin which supports retrieving documents via the HTTP protocol. |
org.apache.nutch.protocol.httpclient | Protocol plugin which supports retrieving documents via the HTTP and HTTPS protocols, optionally with Basic, Digest and NTLM authentication schemes for the web server as well as the proxy server. |
org.apache.nutch.protocol.sftp | Protocol plugin which supports retrieving documents via the SFTP protocol. |
org.apache.nutch.scoring | The ScoringFilter interface. |
org.apache.nutch.scoring.link | Scoring filter. |
org.apache.nutch.scoring.opic | Scoring filter implementing a variant of the Online Page Importance Computation (OPIC) algorithm. |
org.apache.nutch.scoring.tld | Top Level Domain scoring plugin. |
org.apache.nutch.storage | Representation (web pages, host metadata) of data in abstracted storage. |
org.creativecommons.nutch | Sample plugins that parse and index Creative Commons metadata. |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | HTMLLanguageParser.getFields() |
java.util.Collection<WebPage.Field> | LanguageIndexingFilter.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | MD5Signature.getFields() |
java.util.Set<WebPage.Field> | AbstractFetchSchedule.getFields() |
java.util.Collection<WebPage.Field> | TextMD5Signature.getFields() |
java.util.Collection<WebPage.Field> | FetchSchedule.getFields() |
abstract java.util.Collection<WebPage.Field> | Signature.getFields() |
java.util.Collection<WebPage.Field> | TextProfileSignature.getFields() |
static java.util.Collection<WebPage.Field> | SignatureFactory.getFields(Configuration conf) |
java.util.Collection<WebPage.Field> | GeneratorJob.getFields(Job job) |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | FetcherJob.getFields(Job job) |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | IndexingFilters.getFields(): Gets all the fields for a given WebPage. Many datastores need to set up the MapReduce job by specifying the fields needed. |
java.util.Collection<WebPage.Field> | IndexCleaningFilters.getFields() |
java.util.Collection<WebPage.Field> | CleaningJob.getFields(Job job) |
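
The `getFields()` methods listed here all follow one idea: each component declares the `WebPage` fields it reads, so that a job can request only those fields from the datastore. The following is a minimal, self-contained sketch of that pattern, not the Nutch implementation. The `Field` enum, the `FieldSource` interface, and the `union` helper are hypothetical stand-ins introduced for illustration; the real `WebPage.Field` enum has more constants.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class FieldUnionSketch {

  /** Hypothetical stand-in for WebPage.Field; the real enum has more constants. */
  public enum Field { BASE_URL, TITLE, TEXT, INLINKS, SCORE }

  /** Hypothetical stand-in for a component that declares the fields it reads. */
  public interface FieldSource {
    Collection<Field> getFields();
  }

  /** Collects the union of the fields declared by every source, skipping null results. */
  public static Set<Field> union(List<? extends FieldSource> sources) {
    Set<Field> fields = EnumSet.noneOf(Field.class);
    for (FieldSource source : sources) {
      Collection<Field> declared = source.getFields();
      if (declared != null) {
        fields.addAll(declared);
      }
    }
    return fields;
  }

  public static void main(String[] args) {
    // Three illustrative sources, each declaring the fields it would read.
    FieldSource anchorFilter = () -> EnumSet.of(Field.INLINKS);
    FieldSource basicFilter = () -> EnumSet.of(Field.TITLE, Field.TEXT);
    FieldSource scoringFilter = () -> EnumSet.of(Field.SCORE, Field.TITLE);

    // Fields shared by several sources appear only once in the union.
    System.out.println(union(Arrays.asList(anchorFilter, basicFilter, scoringFilter)));
  }
}
```

An `EnumSet` is used for the union because it removes duplicates and iterates in declaration order, which keeps the resulting field list stable across runs.
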
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | AnchorIndexingFilter.getFields(): Gets all the fields for a given WebPage. Many datastores need to set up the MapReduce job by specifying the fields needed. |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | BasicIndexingFilter.getFields(): Gets all the fields for a given WebPage. Many datastores need to set up the MapReduce job by specifying the fields needed. |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | HtmlIndexingFilter.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | JsoupIndexingFilter.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | MetadataIndexer.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | MoreIndexingFilter.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | SubcollectionIndexingFilter.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | TLDIndexingFilter.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | RelTagIndexingFilter.getFields(): Gets all the fields for a given WebPage. Many datastores need to set up the MapReduce job by specifying the fields needed. |
java.util.Collection<WebPage.Field> | RelTagParser.getFields(): Gets all the fields for a given WebPage. Many datastores need to set up the MapReduce job by specifying the fields needed. |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | ParseFilters.getFields() |
java.util.Collection<WebPage.Field> | ParserFactory.getFields() |
java.util.Collection<WebPage.Field> | NutchSitemapParser.getFields() |
java.util.Collection<WebPage.Field> | ParserJob.getFields(Job job) |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | HtmlParser.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | JSParseFilter.getFields(): Gets all the fields for a given WebPage. Many datastores need to set up the MapReduce job by specifying the fields needed. |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | JsoupHtmlParser.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | MetaTagsParser.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | TikaParser.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | FieldPluggable.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | ProtocolFactory.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | File.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | Ftp.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | Http.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | Http.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | Sftp.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | ScoringFilters.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | LinkAnalysisScoringFilter.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | OPICScoringFilter.getFields() |
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | TLDScoringFilter.getFields() |
Modifier and Type | Method and Description |
---|---|
static WebPage.Field | WebPage.Field.valueOf(java.lang.String name): Returns the enum constant of this type with the specified name. |
static WebPage.Field[] | WebPage.Field.values(): Returns an array containing the constants of this enum type, in the order they are declared. |
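
`valueOf` and `values` are the standard methods the Java compiler generates for every enum, so they behave the same way for `WebPage.Field` as for any other enum type. The sketch below illustrates that behaviour with a hypothetical three-constant `Field` enum standing in for `WebPage.Field`; the `lookup` helper is also an assumption added for illustration and is not part of Nutch.

```java
import java.util.Arrays;

public class FieldEnumSketch {

  /** Hypothetical stand-in for WebPage.Field with only three constants. */
  public enum Field { BASE_URL, STATUS, FETCH_TIME }

  /**
   * Looks a constant up by its exact name and returns null for an unknown name,
   * instead of letting valueOf throw IllegalArgumentException.
   */
  public static Field lookup(String name) {
    try {
      return Field.valueOf(name);
    } catch (IllegalArgumentException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    // values() returns the constants in declaration order.
    System.out.println(Arrays.toString(Field.values()));

    // valueOf() requires an exact, case-sensitive match of the constant name.
    System.out.println(Field.valueOf("STATUS"));

    // A name with different case is not a constant, so the helper yields null.
    System.out.println(lookup("status"));
  }
}
```

Because `valueOf` matches the identifier exactly, configuration values read from text usually need to be normalised before the lookup, or wrapped as in the `lookup` helper.
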
Modifier and Type | Method and Description |
---|---|
static <K,V> void | StorageUtils.initMapperJob(Job job, java.util.Collection<WebPage.Field> fields, java.lang.Class<K> outKeyClass, java.lang.Class<V> outValueClass, java.lang.Class<? extends org.apache.gora.mapreduce.GoraMapper<java.lang.String,WebPage,K,V>> mapperClass) |
static <K,V> void | StorageUtils.initMapperJob(Job job, java.util.Collection<WebPage.Field> fields, java.lang.Class<K> outKeyClass, java.lang.Class<V> outValueClass, java.lang.Class<? extends org.apache.gora.mapreduce.GoraMapper<java.lang.String,WebPage,K,V>> mapperClass, java.lang.Class<? extends Partitioner<K,V>> partitionerClass) |
static <K,V> void | StorageUtils.initMapperJob(Job job, java.util.Collection<WebPage.Field> fields, java.lang.Class<K> outKeyClass, java.lang.Class<V> outValueClass, java.lang.Class<? extends org.apache.gora.mapreduce.GoraMapper<java.lang.String,WebPage,K,V>> mapperClass, java.lang.Class<? extends Partitioner<K,V>> partitionerClass, boolean reuseObjects) |
static <K,V> void | StorageUtils.initMapperJob(Job job, java.util.Collection<WebPage.Field> fields, java.lang.Class<K> outKeyClass, java.lang.Class<V> outValueClass, java.lang.Class<? extends org.apache.gora.mapreduce.GoraMapper<java.lang.String,WebPage,K,V>> mapperClass, java.lang.Class<? extends Partitioner<K,V>> partitionerClass, org.apache.gora.filter.Filter<java.lang.String,WebPage> filter, boolean reuseObjects) |
static <K,V> void | StorageUtils.initMapperJob(Job job, java.util.Collection<WebPage.Field> fields, java.lang.Class<K> outKeyClass, java.lang.Class<V> outValueClass, java.lang.Class<? extends org.apache.gora.mapreduce.GoraMapper<java.lang.String,WebPage,K,V>> mapperClass, org.apache.gora.filter.Filter<java.lang.String,WebPage> filter) |
static java.lang.String[] | StorageUtils.toStringArray(java.util.Collection<WebPage.Field> fields) |
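
The `toStringArray` signature suggests a conversion from a collection of fields to the plain field names a datastore query expects. The sketch below shows one way such a conversion could look. It is an assumption, not the Nutch source: the `Field` enum, its `getName()` accessor, and the column names are hypothetical stand-ins for `WebPage.Field`.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.EnumSet;

public class FieldNamesSketch {

  /** Hypothetical stand-in for WebPage.Field: each constant carries a column name. */
  public enum Field {
    BASE_URL("baseUrl"), TITLE("title"), TEXT("text");

    private final String columnName;

    Field(String columnName) {
      this.columnName = columnName;
    }

    public String getName() {
      return columnName;
    }
  }

  /** Copies the name of each field into an array, in the collection's iteration order. */
  public static String[] toStringArray(Collection<Field> fields) {
    String[] names = new String[fields.size()];
    int i = 0;
    for (Field field : fields) {
      names[i++] = field.getName();
    }
    return names;
  }

  public static void main(String[] args) {
    // An EnumSet iterates in declaration order, regardless of insertion order.
    Collection<Field> fields = EnumSet.of(Field.TITLE, Field.BASE_URL);
    System.out.println(Arrays.toString(toStringArray(fields)));
  }
}
```

The order of the resulting array follows the iteration order of the collection passed in, so callers that need a deterministic order should pass an ordered collection.
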
Modifier and Type | Method and Description |
---|---|
java.util.Collection<WebPage.Field> | CCIndexingFilter.getFields() |
java.util.Collection<WebPage.Field> | CCParseFilter.getFields() |
Copyright © 2019 The Apache Software Foundation