public class IndexUtil
extends java.lang.Object
Constructor and Description |
---|
IndexUtil(Configuration conf) |
Modifier and Type | Method and Description |
---|---|
NutchDocument | index(java.lang.String key, WebPage page) Index a WebPage, adding the following fields: id: the default uniqueKey for the NutchDocument; digest: identifies pages (like a unique ID) and is used to remove duplicates during the dedup procedure. |
public IndexUtil(Configuration conf)
public NutchDocument index(java.lang.String key, WebPage page)
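Index a WebPage, adding the following fields:
id: the default uniqueKey for the NutchDocument.
digest: identifies pages (like a unique ID) and is used to remove duplicates during the dedup procedure. It is computed with MD5Signature or TextProfileSignature.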
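A small, self-contained sketch can show how the digest field supports deduplication. It assumes a page's digest is a hex-encoded MD5 hash of its content. The hash is computed here with `java.security.MessageDigest`, not with Nutch's MD5Signature, and a document is modeled as a plain map. The class `DigestDedupSketch` and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not Nutch code.
public class DigestDedupSketch {

    // Hex-encoded MD5 of the content: a stand-in for a page signature.
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] hash = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : hash) {
                sb.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Hypothetical document: a map holding only the "id" and "digest" fields.
    static Map<String, String> toDocument(String key, String content) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", key);                  // unique key of the document
        doc.put("digest", md5Hex(content));  // assumed: exact hash of the content
        return doc;
    }

    // Keeps the first document seen for each digest and drops later duplicates.
    static List<Map<String, String>> dedup(List<Map<String, String>> docs) {
        Set<String> seen = new HashSet<>();
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (seen.add(doc.get("digest"))) { // add() returns false for a repeat
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            toDocument("com.example.www:http/a", "same body"),
            toDocument("com.example.www:http/b", "same body"),
            toDocument("com.example.www:http/c", "other body"));
        for (Map<String, String> doc : dedup(docs)) {
            System.out.println(doc.get("id") + " " + doc.get("digest"));
        }
    }
}
```

In this example the pages keyed `/a` and `/b` have identical content. They therefore share a digest, and only `/a` is kept. An exact hash like this only catches byte-identical content. TextProfileSignature is designed to give near-identical text the same digest as well.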
Parameters:
key - The key of the page (reversed URL).
page - The WebPage.
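The key parameter is a reversed URL. The sketch below assumes one possible layout for such a key. It reverses the host's labels so that pages from the same domain sort next to each other, then appends the protocol, an optional port, and the path with its query. The class `ReversedKeySketch` and method `reverseUrl` are hypothetical, and the layout may differ from the key format Nutch actually uses.

```java
import java.net.URI;

// Hypothetical illustration of a reversed-URL key; not Nutch code.
public class ReversedKeySketch {

    // Assumes an absolute, hierarchical URL with a host (e.g. http/https).
    static String reverseUrl(String urlString) {
        URI uri = URI.create(urlString);
        String[] labels = uri.getHost().split("\\.");
        StringBuilder buf = new StringBuilder();
        // Reverse the host labels: "bar.foo.com" -> "com.foo.bar".
        for (int i = labels.length - 1; i >= 0; i--) {
            buf.append(labels[i]);
            if (i > 0) {
                buf.append('.');
            }
        }
        buf.append(':').append(uri.getScheme());
        if (uri.getPort() != -1) {            // -1 means no explicit port
            buf.append(':').append(uri.getPort());
        }
        String path = uri.getRawPath();       // empty string when there is no path
        if (!path.startsWith("/")) {
            buf.append('/');
        }
        buf.append(path);
        if (uri.getRawQuery() != null) {      // null when there is no query
            buf.append('?').append(uri.getRawQuery());
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        // prints "com.foo.bar:http:8983/to/index.html?a=b"
        System.out.println(reverseUrl("http://bar.foo.com:8983/to/index.html?a=b"));
    }
}
```

Putting the reversed host first means that a lexicographically sorted store groups all pages of `foo.com` and its subdomains together. This is the usual motivation for reversed keys.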
Copyright © 2019 The Apache Software Foundation