Directory tree metadata parser using Apache Tika
Updated May 3, 2024 - Python
A Windows service wrapper for the Tika JSR 311 network server.
A document search tool for files on the local host, built on Tika with OCR, Elasticsearch, and Kibana.
WORK IN PROGRESS - Dataiku DSS plugin to extract text data from documents
A simple monolithic application that demonstrates extracting PDF page images with Apache Tika, storing the image files on the local filesystem, and displaying the pages with the ngx-swiper-wrapper library.
Early Buddhist texts from the Tipitaka (Tripitaka). Suttas (sutras) with the Buddha's teachings on mindfulness, insight, wisdom, and meditation.
Information retrieval system for documents.
Extracts GPS coordinates from PDF files and Points/Polygons from KMZ files to create a master KML file. 🌎
This project builds an information retrieval system that can handle documents of different formats from an information repository. The application uses tools such as Lucene and Tika to index the documents and extract information from them.
A Java application that uses Lucene and Tika to search documents and display the part of each document where the match is found, along with precision and recall values.
A Windows Installer (MSI) for the Windows service wrapper of the Tika JSR 311 network server.
Containerized (Docker) GeoTopicParser-enabled Apache Tika Server with Lucene Geo Gazetteer.
DocClusterizer is a Java desktop application designed to analyze and cluster documents based on their content similarity. The application utilizes Lucene and Tika libraries to process various file extensions such as txt, pdf, docx, and pptx.