Package | Description |
---|---|
org.apache.nutch.protocol | Classes related to the Protocol interface; see also org.apache.nutch.net.protocols. |
org.apache.nutch.protocol.file | Protocol plugin which supports retrieving local file resources. |
org.apache.nutch.protocol.ftp | Protocol plugin which supports retrieving documents via the ftp protocol. |
org.apache.nutch.protocol.http | Protocol plugin which supports retrieving documents via the http protocol. |
org.apache.nutch.protocol.http.api | Common API used by HTTP plugins (http, httpclient). |
org.apache.nutch.protocol.sftp | Protocol plugin which supports retrieving documents via the sftp protocol. |
Modifier and Type | Method and Description |
---|---|
Protocol | ProtocolFactory.getProtocol(java.lang.String urlString): Returns the appropriate Protocol implementation for a url. |
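
The lookup described for ProtocolFactory.getProtocol can be pictured as a dispatch on the URL scheme. The following is a minimal, self-contained sketch of that idea and not the Nutch implementation: the class `SchemeDispatchSketch` and the method `pluginPackageFor` are hypothetical names, and the assumption that the choice depends only on the scheme is ours. The scheme-to-package pairs are taken from the package table at the top of this page.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration only; this class is not part of Nutch. */
final class SchemeDispatchSketch {

    // Scheme-to-package pairs copied from the package table on this page.
    private static final Map<String, String> PLUGIN_PACKAGES = Map.of(
            "file", "org.apache.nutch.protocol.file",
            "ftp", "org.apache.nutch.protocol.ftp",
            "http", "org.apache.nutch.protocol.http",
            "sftp", "org.apache.nutch.protocol.sftp");

    /**
     * Returns the plugin package that would handle the URL, assuming the
     * choice depends only on the URL scheme (a simplifying assumption).
     */
    public static String pluginPackageFor(String urlString) {
        String scheme = URI.create(urlString).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("URL has no scheme: " + urlString);
        }
        // Schemes are case-insensitive, so normalize before the lookup.
        String pkg = PLUGIN_PACKAGES.get(scheme.toLowerCase(Locale.ROOT));
        if (pkg == null) {
            throw new IllegalArgumentException("No plugin sketched for scheme: " + scheme);
        }
        return pkg;
    }

    public static void main(String[] args) {
        System.out.println(pluginPackageFor("ftp://example.org/pub/readme.txt"));
    }
}
```

In this sketch an unknown scheme raises an exception rather than returning null, so a caller cannot silently proceed without a plugin. Whether the real factory behaves the same way is not stated on this page.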
Modifier and Type | Method and Description |
---|---|
crawlercommons.robots.BaseRobotRules | RobotRulesParser.getRobotRulesSet(Protocol protocol, java.lang.String url) |
abstract crawlercommons.robots.BaseRobotRules | RobotRulesParser.getRobotRulesSet(Protocol protocol, java.net.URL url) |
Modifier and Type | Class and Description |
---|---|
class | File: This class is a protocol plugin used for the file: scheme. |
Modifier and Type | Class and Description |
---|---|
class | Ftp: This class is a protocol plugin used for the ftp: scheme. |
Modifier and Type | Method and Description |
---|---|
crawlercommons.robots.BaseRobotRules | FtpRobotRulesParser.getRobotRulesSet(Protocol ftp, java.net.URL url): For hosts whose robots rules are not yet cached, sends an FTP request to the host of the given URL, fetches the robots file, parses the rules, and caches the resulting rules object to avoid repeating the work later. |
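
The fetch-once-then-cache behaviour described for FtpRobotRulesParser.getRobotRulesSet can be sketched as follows. This is an illustration under stated assumptions, not the Nutch code: `RobotRulesCacheSketch`, `rulesFor`, and the `fetcher` function are hypothetical names, the rules are represented by a plain String standing in for a BaseRobotRules object, and the cache is keyed by host name alone.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical illustration only; this class is not part of Nutch. */
final class RobotRulesCacheSketch {

    // Parsed rules per host; a String stands in for a real rules object.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Caller-supplied function that fetches and parses the robots file for a host.
    private final Function<String, String> fetcher;

    public RobotRulesCacheSketch(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    /** Returns the rules for the URL's host, fetching them only on the first request. */
    public String rulesFor(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        // computeIfAbsent invokes the fetcher only when the host is not cached yet.
        return cache.computeIfAbsent(host.toLowerCase(Locale.ROOT), fetcher);
    }

    public static void main(String[] args) {
        RobotRulesCacheSketch sketch =
                new RobotRulesCacheSketch(host -> "rules-for-" + host);
        System.out.println(sketch.rulesFor("ftp://example.org/pub/a.txt"));
    }
}
```

Keying by host means two URLs on the same server share one fetch. A production cache might also include the scheme and port in the key, which this sketch leaves out for brevity.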
Modifier and Type | Class and Description |
---|---|
class | Http |
Modifier and Type | Class and Description |
---|---|
class | HttpBase |
Modifier and Type | Method and Description |
---|---|
crawlercommons.robots.BaseRobotRules | HttpRobotRulesParser.getRobotRulesSet(Protocol http, java.net.URL url): Gets the rules from robots.txt that apply to the given url. |
Modifier and Type | Class and Description |
---|---|
class | Sftp: This class uses the JSch package to fetch content using the SFTP protocol. |
Copyright © 2019 The Apache Software Foundation