Class | Description |
---|---|
HTMLLanguageParser | Adds metadata identifying the language of the document, if found. We could also run statistical analysis here, but we'd miss all other formats. |
LanguageIndexingFilter | An IndexingFilter that adds a lang (language) field to the document. |
Text document language identifier.
Language profiles are based on material from http://www.homepages.inf.ed.ac.uk/pkoehn/publications/europarl.ps/.
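As a rough illustration of the two roles in the table above, the following self-contained sketch reads a language declared in an HTML document and stores it under a `lang` field. It does not use the Nutch API: `LangFieldSketch`, `declaredLanguage`, and `index` are hypothetical names, the document is modelled as a plain `Map`, and the regular expression is a simplification that only inspects the `lang` attribute of the `<html>` element. The actual plugin may consult other sources and behave differently.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of Nutch.
final class LangFieldSketch {

    // Assumption: the language is declared as <html ... lang="xx">.
    // A regex is a simplification; robust code would use an HTML parser.
    private static final Pattern HTML_LANG = Pattern.compile(
            "<html\\b[^>]*?\\blang\\s*=\\s*[\"']([A-Za-z0-9-]+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns the primary language subtag (e.g. "en" for "en-US"), if declared.
    static Optional<String> declaredLanguage(String html) {
        Matcher m = HTML_LANG.matcher(html);
        if (!m.find()) {
            return Optional.empty();
        }
        String tag = m.group(1).toLowerCase(Locale.ROOT);
        int dash = tag.indexOf('-');
        return Optional.of(dash > 0 ? tag.substring(0, dash) : tag);
    }

    // Builds a toy "document" and adds a lang field only when one was found.
    static Map<String, String> index(String url, String html) {
        Map<String, String> doc = new HashMap<>();
        doc.put("url", url);
        declaredLanguage(html).ifPresent(lang -> doc.put("lang", lang));
        return doc;
    }

    public static void main(String[] args) {
        String html = "<html lang=\"en-US\"><body>Hello</body></html>";
        // prints "en"
        System.out.println(index("http://example.com/", html).get("lang"));
    }
}
```

When no declaration is present, the sketch leaves the `lang` field out entirely rather than guessing. That is the point at which a statistical identifier working on extracted text could take over.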
Copyright © 2019 The Apache Software Foundation