A C D G M N O R S V

A

addListener(CrawlerListener) - Method in class org.apache.any23.plugin.crawler.SiteCrawler
Registers a CrawlerListener to this crawler.

C

Crawler - Class in org.apache.any23.cli
Implementation of a CLI crawler based on Rover.
Crawler() - Constructor for class org.apache.any23.cli.Crawler
 
CrawlerListener - Interface in org.apache.any23.plugin.crawler
Defines a listener for a SiteCrawler.
createOptions() - Method in class org.apache.any23.cli.Crawler
 

D

DEFAULT_NUM_OF_CRAWLERS - Static variable in class org.apache.any23.plugin.crawler.SiteCrawler
Default number of crawler instances.
DEFAULT_PAGE_FILTER_RE - Static variable in class org.apache.any23.plugin.crawler.SiteCrawler
 
DEFAULT_WEB_CRAWLER - Static variable in class org.apache.any23.plugin.crawler.SiteCrawler
Default crawler implementation.
defaultFilters - Variable in class org.apache.any23.plugin.crawler.SiteCrawler
Default filter applied to skip contents.
DefaultWebCrawler - Class in org.apache.any23.plugin.crawler
Default WebCrawler implementation.
DefaultWebCrawler() - Constructor for class org.apache.any23.plugin.crawler.DefaultWebCrawler
 

G

getInstance() - Static method in class org.apache.any23.plugin.crawler.SharedData
 
getMaxDepth() - Method in class org.apache.any23.plugin.crawler.SiteCrawler
 
getMaxPages() - Method in class org.apache.any23.plugin.crawler.SiteCrawler
 
getNumOfCrawlers() - Method in class org.apache.any23.plugin.crawler.SiteCrawler
 
getPattern() - Method in class org.apache.any23.plugin.crawler.SharedData
 
getPolitenessDelay() - Method in class org.apache.any23.plugin.crawler.SiteCrawler
 
getSeed() - Method in class org.apache.any23.plugin.crawler.SharedData
 
getWebCrawler() - Method in class org.apache.any23.plugin.crawler.SiteCrawler
 

M

main(String[]) - Static method in class org.apache.any23.cli.Crawler
 

N

notifyPage(Page) - Method in class org.apache.any23.plugin.crawler.SharedData
Notifies all listeners that a page has been discovered.

O

org.apache.any23.cli - package org.apache.any23.cli
TODO fillme
org.apache.any23.plugin.crawler - package org.apache.any23.plugin.crawler
TODO fillme

R

removeListener(CrawlerListener) - Method in class org.apache.any23.plugin.crawler.SiteCrawler
Deregisters a CrawlerListener from this crawler.
run(String[]) - Method in class org.apache.any23.cli.Crawler
 

S

setCrawlData(String, Pattern, List<CrawlerListener>) - Static method in class org.apache.any23.plugin.crawler.SharedData
Initializes the crawler data.
setMaxDepth(int) - Method in class org.apache.any23.plugin.crawler.SiteCrawler
Sets the maximum depth.
setMaxPages(int) - Method in class org.apache.any23.plugin.crawler.SiteCrawler
Sets the maximum number of pages to collect.
setNumOfCrawlers(int) - Method in class org.apache.any23.plugin.crawler.SiteCrawler
Sets the number of crawler instances.
setPolitenessDelay(int) - Method in class org.apache.any23.plugin.crawler.SiteCrawler
Sets the politeness delay.
setWebCrawler(Class<? extends WebCrawler>) - Method in class org.apache.any23.plugin.crawler.SiteCrawler
Sets the actual crawler class.
SharedData - Class in org.apache.any23.plugin.crawler
This class hosts shared data structures accessible to all the DefaultWebCrawler instances run by the SiteCrawler.
shouldVisit(WebURL) - Method in class org.apache.any23.plugin.crawler.DefaultWebCrawler
Override this method to specify whether the given URL should be visited or not.
SiteCrawler - Class in org.apache.any23.plugin.crawler
A basic site crawler to extract semantic content of small/medium size sites.
SiteCrawler(File) - Constructor for class org.apache.any23.plugin.crawler.SiteCrawler
Constructor.
start(URL, Pattern, boolean) - Method in class org.apache.any23.plugin.crawler.SiteCrawler
Starts the crawling process.
start(URL, boolean) - Method in class org.apache.any23.plugin.crawler.SiteCrawler
Starts the crawling process using SiteCrawler.defaultFilters.
stop() - Method in class org.apache.any23.plugin.crawler.SiteCrawler
Interrupts the crawling process if it was started with the wait flag set to false.
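
The entries for shouldVisit(WebURL), DEFAULT_PAGE_FILTER_RE, and start(URL, Pattern, boolean) suggest that a crawl is bounded by a seed URL and a regular-expression page filter. The following is a minimal sketch of how such a decision could be made; it is not the Any23 implementation. The class VisitFilterSketch, the SKIP pattern, and the rule "the URL starts with the seed and does not match the filter" are assumptions for illustration, and plain strings stand in for WebURL.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names come from the Any23 API.
class VisitFilterSketch {

    // Assumed filter that skips common static assets.
    // This is NOT the actual value of DEFAULT_PAGE_FILTER_RE.
    static final Pattern SKIP = Pattern.compile(
            ".*\\.(css|js|gif|jpe?g|png|zip|pdf)$", Pattern.CASE_INSENSITIVE);

    // Visit a URL only if it lies under the seed and is not excluded by the filter.
    static boolean shouldVisit(String seed, Pattern skip, String url) {
        return url.startsWith(seed) && !skip.matcher(url).matches();
    }

    public static void main(String[] args) {
        String seed = "http://example.org/";
        System.out.println(shouldVisit(seed, SKIP, "http://example.org/about.html"));
        System.out.println(shouldVisit(seed, SKIP, "http://example.org/logo.PNG"));
        System.out.println(shouldVisit(seed, SKIP, "http://other.org/index.html"));
    }
}
```

Passing the pattern as a parameter mirrors the two start overloads listed above: one takes an explicit Pattern, the other falls back to a default filter.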

V

visit(Page) - Method in class org.apache.any23.plugin.crawler.DefaultWebCrawler
Override this method to implement the single page processing logic.
visitedPage(Page) - Method in interface org.apache.any23.plugin.crawler.CrawlerListener
Notifies the listener that a page has been discovered.
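
Taken together, addListener, removeListener, notifyPage, and visitedPage describe a listener (observer) arrangement: listeners are registered, then called back for each discovered page. The sketch below illustrates that pattern with stand-in types only. PageListener and ListenerRegistry are hypothetical names, a String replaces the Page argument, and the thread-safe list is an assumption made because several crawler instances may notify concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for CrawlerListener; the indexed interface takes a Page.
interface PageListener {
    void visitedPage(String pageUrl);
}

// Hypothetical stand-in combining the register/deregister/notify roles.
class ListenerRegistry {
    // Assumption: a copy-on-write list keeps iteration safe while listeners change.
    private final List<PageListener> listeners = new CopyOnWriteArrayList<>();

    void addListener(PageListener listener) {
        listeners.add(listener);
    }

    void removeListener(PageListener listener) {
        listeners.remove(listener);
    }

    // Calls every registered listener with the discovered page.
    void notifyPage(String pageUrl) {
        for (PageListener listener : listeners) {
            listener.visitedPage(pageUrl);
        }
    }
}

class ListenerSketch {
    // Registers a collecting listener, notifies once, deregisters, notifies again.
    static List<String> demo() {
        List<String> seen = new ArrayList<>();
        ListenerRegistry registry = new ListenerRegistry();
        PageListener collector = seen::add;
        registry.addListener(collector);
        registry.notifyPage("http://example.org/a");
        registry.removeListener(collector);
        registry.notifyPage("http://example.org/b");
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Only the first page reaches the collector, because the listener is deregistered before the second notification.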

Copyright © 2010-2012 The Apache Software Foundation. All Rights Reserved.