Package | Description |
---|---|
org.apache.nutch.analysis.lang |
Text document language identifier.
|
org.apache.nutch.indexer.anchor |
An indexing plugin for inbound anchor text.
|
org.apache.nutch.indexer.basic |
A basic indexing plugin, adds basic fields: url, host, title, content, etc.
|
org.apache.nutch.indexer.html |
Index raw HTML content.
|
org.apache.nutch.indexer.jsoup.extractor |
Indexing filter for jsoup-extractor plugin
|
org.apache.nutch.indexer.metadata |
Indexing filter to add document metadata to the index.
|
org.apache.nutch.indexer.more |
A more indexing plugin, adds "more" index fields:
last modified date, MIME type, content length.
|
org.apache.nutch.indexer.subcollection |
Indexing filter to assign documents to subcollections.
|
org.apache.nutch.indexer.tld |
Top Level Domain Indexing plugin.
|
org.apache.nutch.microformats.reltag |
A microformats Rel-Tag
Parser/Indexer/Querier plugin.
|
org.creativecommons.nutch |
Sample plugins that parse and index Creative Commons medadata.
|
Modifier and Type | Class and Description |
---|---|
class |
LanguageIndexingFilter
An
IndexingFilter that adds a
lang (language) field to the document. |
Modifier and Type | Class and Description |
---|---|
class |
AnchorIndexingFilter
Indexing filter that offers an option to either index all inbound anchor text
for a document or deduplicate anchors.
|
Modifier and Type | Class and Description |
---|---|
class |
BasicIndexingFilter
Adds basic searchable fields to a document.
|
Modifier and Type | Class and Description |
---|---|
class |
HtmlIndexingFilter
Add raw HTML content of a document to the index.
|
Modifier and Type | Class and Description |
---|---|
class |
JsoupIndexingFilter |
Modifier and Type | Class and Description |
---|---|
class |
MetadataIndexer
Indexer which can be configured to extract metadata from the crawldb, parse
metadata or content metadata.
|
Modifier and Type | Class and Description |
---|---|
class |
MoreIndexingFilter
Add (or reset) a few metaData properties as respective fields (if they are
available), so that they can be accurately used within the search index.
|
Modifier and Type | Class and Description |
---|---|
class |
SubcollectionIndexingFilter |
Modifier and Type | Class and Description |
---|---|
class |
TLDIndexingFilter
Adds the Top level domain extensions to the index
|
Modifier and Type | Class and Description |
---|---|
class |
RelTagIndexingFilter
An
IndexingFilter that adds tag
field(s) to the document. |
Modifier and Type | Class and Description |
---|---|
class |
CCIndexingFilter
Adds basic searchable fields to a document.
|
Copyright © 2019 The Apache Software Foundation