Apache Tika Java API

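The Tika Java API is built around a few core types. AutoDetectParser detects a document's format and delegates to the matching parser. The Metadata class holds the metadata fields extracted during parsing. ContentHandler is the standard SAX interface through which extracted text is streamed; Tika's BodyContentHandler is a common implementation. Detector identifies a document's MIME type. For simple cases, the facade Tika class wraps these pieces so that text can be extracted from any supported format with a single method call.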
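The following is a minimal sketch of how these classes fit together. It assumes Tika 2.x with both tika-core and the tika-parsers-standard-package artifact on the classpath. Without the parser modules, AutoDetectParser falls back to an empty parser and returns no text. The class name TikaApiSketch and its helper methods are hypothetical, and an in-memory byte array stands in for a real file. Passing -1 to BodyContentHandler disables its default 100,000-character write limit.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.Tika;
import org.apache.tika.detect.DefaultDetector;
import org.apache.tika.detect.Detector;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical helper class illustrating the facade, parser and detector APIs.
public class TikaApiSketch {

    // Facade API: one call detects the format and returns the plain text.
    public static String extractWithFacade(byte[] data) throws IOException, TikaException {
        Tika tika = new Tika();
        try (InputStream in = new ByteArrayInputStream(data)) {
            return tika.parseToString(in);
        }
    }

    // Parser API: AutoDetectParser streams SAX events into a BodyContentHandler
    // and fills the supplied Metadata object as a side effect.
    public static String extractWithParser(byte[] data, Metadata metadata)
            throws IOException, SAXException, TikaException {
        AutoDetectParser parser = new AutoDetectParser();
        BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no write limit
        try (InputStream in = new ByteArrayInputStream(data)) {
            parser.parse(in, handler, metadata, new ParseContext());
        }
        return handler.toString();
    }

    // Detector API: identify the MIME type without parsing the content.
    // ByteArrayInputStream supports mark/reset, which detectors rely on.
    public static MediaType detectType(byte[] data) throws IOException {
        Detector detector = new DefaultDetector();
        try (InputStream in = new ByteArrayInputStream(data)) {
            return detector.detect(in, new Metadata());
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "Hello from Tika".getBytes(StandardCharsets.UTF_8);

        System.out.println("Detected type: " + detectType(data));
        System.out.println("Facade text: " + extractWithFacade(data).trim());

        Metadata metadata = new Metadata();
        System.out.println("Parser text: " + extractWithParser(data, metadata).trim());

        // Print every metadata field the parser recorded.
        for (String name : metadata.names()) {
            System.out.println(name + " = " + metadata.get(name));
        }
    }
}
```

The facade is the shortest path to plain text. The explicit AutoDetectParser call is worth the extra lines when you need the Metadata populated by the parser, or when you want to plug in a different ContentHandler.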