public interface ScoringFilter extends Configurable, FieldPluggable
Modifier and Type | Field and Description |
---|---|
static java.lang.String | X_POINT_ID: The name of the extension point. |
Modifier and Type | Method and Description |
---|---|
void | distributeScoreToOutlinks(java.lang.String fromUrl, WebPage page, java.util.Collection<ScoreDatum> scoreData, int allCount): Distribute score value from the current page to all its outlinked pages. |
float | generatorSortValue(java.lang.String url, WebPage page, float initSort): This method prepares a sort value for the purpose of sorting and selecting top N scoring pages during fetchlist generation. |
float | indexerScore(java.lang.String url, NutchDocument doc, WebPage page, float initScore): This method calculates a Lucene document boost. |
void | initialScore(java.lang.String url, WebPage page): Set an initial score for newly discovered pages. |
void | injectedScore(java.lang.String url, WebPage page): Set an initial score for newly injected pages. |
void | updateScore(java.lang.String url, WebPage page, java.util.List<ScoreDatum> inlinkedScoreData): This method calculates a new score during table update, based on the values contributed by inlinked pages. |
Methods inherited from interface Configurable: getConf, setConf
Methods inherited from interface FieldPluggable: getFields
void injectedScore(java.lang.String url, WebPage page) throws ScoringFilterException
Parameters:
url - url of the page
page - new page. Filters will modify it in-place.
Throws:
ScoringFilterException
void initialScore(java.lang.String url, WebPage page) throws ScoringFilterException
Parameters:
url - url of the page
page
Throws:
ScoringFilterException
float generatorSortValue(java.lang.String url, WebPage page, float initSort) throws ScoringFilterException
Parameters:
url - url of the page
page - WebPage object relative to the URL
initSort - initial sort value, or a value from previous filters in chain
Throws:
ScoringFilterException
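The initSort parameter implies that filters run as a chain, each one receiving the value returned by the one before it. The following is a minimal sketch of that idea only. The names Page, SortFilter and runChain are hypothetical stand-ins invented for this example so that it compiles without Nutch on the classpath; they are not part of the Nutch API, and the two filter policies are purely illustrative.

```java
import java.util.Arrays;
import java.util.List;

public class SortValueChainSketch {

    /** Hypothetical stand-in for WebPage; it only carries a score. */
    static final class Page {
        final float score;
        Page(float score) { this.score = score; }
    }

    /** Hypothetical one-method slice of ScoringFilter, without the checked exception. */
    interface SortFilter {
        float generatorSortValue(String url, Page page, float initSort);
    }

    /** Feeds each filter's return value to the next filter as its initSort. */
    static float runChain(List<SortFilter> filters, String url, Page page, float initSort) {
        float sort = initSort;
        for (SortFilter filter : filters) {
            sort = filter.generatorSortValue(url, page, sort);
        }
        return sort;
    }

    public static void main(String[] args) {
        // Illustrative policy: scale the incoming sort value by the page score.
        SortFilter byScore = (url, page, initSort) -> page.score * initSort;
        // Illustrative policy: halve the sort value for URLs longer than 40 characters.
        SortFilter penalizeLongUrls =
                (url, page, initSort) -> url.length() > 40 ? initSort * 0.5f : initSort;

        float sort = runChain(Arrays.asList(byScore, penalizeLongUrls),
                "http://example.com/", new Page(2.0f), 1.0f);
        System.out.println(sort);
    }
}
```

A filter that has no opinion about ordering can simply return initSort unchanged, which keeps it neutral within the chain.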
void distributeScoreToOutlinks(java.lang.String fromUrl, WebPage page, java.util.Collection<ScoreDatum> scoreData, int allCount) throws ScoringFilterException
Parameters:
fromUrl - url of the source page
scoreData - A list of ScoreDatum
allCount - number of all collected outlinks from the source page
Throws:
ScoringFilterException
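One way to picture the roles of scoreData and allCount is an even split of the source page's score across all collected outlinks. The sketch below assumes that policy; the interface itself does not mandate it. Datum and distributeEvenly are hypothetical names created for this example and are not the Nutch ScoreDatum class. The sketch also assumes that allCount can be larger than the number of entries in scoreData, for instance when some outlinks were dropped before scoring.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class DistributeScoreSketch {

    /** Hypothetical stand-in for ScoreDatum: an outlink URL plus the score passed to it. */
    static final class Datum {
        final String url;
        float score;
        Datum(String url) { this.url = url; }
    }

    /**
     * Assigns every datum an equal share of pageScore, dividing by allCount
     * rather than by scoreData.size().
     */
    static void distributeEvenly(float pageScore, Collection<Datum> scoreData, int allCount) {
        if (allCount <= 0) {
            return; // no outlinks were collected, so there is nothing to distribute
        }
        float share = pageScore / allCount;
        for (Datum datum : scoreData) {
            datum.score = share;
        }
    }

    public static void main(String[] args) {
        List<Datum> scoreData = new ArrayList<>();
        scoreData.add(new Datum("http://example.com/a"));
        scoreData.add(new Datum("http://example.com/b"));

        // Four outlinks were collected in total, but only two are being scored here.
        distributeEvenly(1.0f, scoreData, 4);
        for (Datum datum : scoreData) {
            System.out.println(datum.url + " " + datum.score);
        }
    }
}
```

Dividing by allCount instead of the collection size means that outlinks missing from scoreData do not inflate the share given to the remaining ones.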
void updateScore(java.lang.String url, WebPage page, java.util.List<ScoreDatum> inlinkedScoreData) throws ScoringFilterException
Parameters:
url - url of the page
page - WebPage object relative to the URL
inlinkedScoreData - list of ScoreDatums for all inlinks pointing to this URL.
Throws:
ScoringFilterException
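As a rough illustration of computing a new score from inlink contributions, the sketch below adds the sum of the contributed values to the page's current score. This additive rule is an assumption chosen for simplicity, not something the interface prescribes. Datum and updatedScore are hypothetical names for this example, and the method returns the new score instead of modifying a WebPage in place so that the sketch stays independent of Nutch.

```java
import java.util.Arrays;
import java.util.List;

public class UpdateScoreSketch {

    /** Hypothetical stand-in for ScoreDatum: the score contributed by one inlink. */
    static final class Datum {
        final float score;
        Datum(float score) { this.score = score; }
    }

    /** Returns the current score plus the sum of all inlink contributions. */
    static float updatedScore(float currentScore, List<Datum> inlinkedScoreData) {
        float contributed = 0.0f;
        for (Datum datum : inlinkedScoreData) {
            contributed += datum.score;
        }
        return currentScore + contributed;
    }

    public static void main(String[] args) {
        List<Datum> inlinks = Arrays.asList(new Datum(0.25f), new Datum(0.5f));
        System.out.println(updatedScore(1.0f, inlinks));
    }
}
```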
float indexerScore(java.lang.String url, NutchDocument doc, WebPage page, float initScore) throws ScoringFilterException
Parameters:
url - url of the page
doc - document. NOTE: this already contains all information collected by indexing filters. Implementations may modify this instance, in order to store/remove some information.
initScore - initial boost value for the Lucene document.
Throws:
ScoringFilterException
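The sketch below shows one conceivable way to turn a page score and initScore into a document boost: raise the page score to a configurable power and multiply by the incoming boost. Both the formula and the scorePower parameter are assumptions made for illustration. The method works on plain floats rather than on NutchDocument and WebPage so that it runs on its own.

```java
public class IndexerBoostSketch {

    /**
     * Hypothetical boost rule: pageScore raised to scorePower, multiplied by initScore.
     * A scorePower below 1 dampens the influence of very high page scores.
     */
    static float indexerScore(float pageScore, float initScore, float scorePower) {
        return (float) Math.pow(pageScore, scorePower) * initScore;
    }

    public static void main(String[] args) {
        // With scorePower = 0.5 the boost grows with the square root of the page score.
        System.out.println(indexerScore(4.0f, 1.0f, 0.5f));
    }
}
```

Because initScore is a multiplier here, a boost set by an earlier filter in the chain is scaled rather than overwritten.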
Copyright © 2019 The Apache Software Foundation