One scenario where we'd like to use the Filesystem Collector is crawling a collection of textual metadata files on the file system (one file per document). We can use taggers in the preparsehandlers to extract this text as document metadata. However, each record can (though not always) reference an external file path to the actual document file, which we'd want the document parser to parse.
Is there an easy way, through configuration, to route this external document file to the parser so that the metadata record and document content are effectively combined?
I do not think there is an out-of-the-box way to do this. If you know your Java, here is a suggestion:
Implement an IFileDocumentProcessor and add it as an entry under <postImportProcessors>.
Your document processor receives a FileDocument argument containing the file metadata and content. From it, get the path of the child document you want to merge. Then use the FileSystemManager argument to fetch that file, call the Importer module explicitly to parse the target document, and merge the result yourself. Not the most trivial thing, but that is the only option that comes to mind right now.
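To make the merge step more concrete, here is a rough, standalone sketch of the logic such a processor would carry out. It deliberately does not use the Norconex types (IFileDocumentProcessor, FileDocument, FileSystemManager, Importer); plain java.nio stands in for them so the flow can be tried in isolation. The `document.path` field name, the `key=value` record format, and every class and method name below are assumptions made for illustration, and `parseContent` is only a placeholder for the explicit Importer call.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: none of these names come from the Norconex API.
class MetadataContentMerger {

    // Assumed name of the metadata field that points at the real document.
    static final String PATH_FIELD = "document.path";

    // Stand-in for the merged result: metadata fields plus extracted text.
    static final class MergedDocument {
        final Map<String, String> metadata;
        final String content;

        MergedDocument(Map<String, String> metadata, String content) {
            this.metadata = metadata;
            this.content = content;
        }
    }

    // Reads "key=value" lines; the record format is an assumption.
    static Map<String, String> readMetadata(Path record) throws IOException {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : Files.readAllLines(record, StandardCharsets.UTF_8)) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                fields.put(line.substring(0, eq).trim(),
                           line.substring(eq + 1).trim());
            }
        }
        return fields;
    }

    // Placeholder for the explicit Importer call: here the referenced
    // file is simply read as UTF-8 text instead of being parsed.
    static String parseContent(Path target) throws IOException {
        return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
    }

    // Combines a metadata record with the content of the file it references.
    // Records without a usable reference keep their metadata and get empty content.
    static MergedDocument merge(Path record) throws IOException {
        Map<String, String> metadata = readMetadata(record);
        String ref = metadata.get(PATH_FIELD);
        if (ref == null || ref.isEmpty()) {
            return new MergedDocument(metadata, "");
        }
        // Relative references are resolved against the record's own directory;
        // absolute references are used as-is.
        Path base = record.toAbsolutePath().getParent();
        Path target = base.resolve(ref).normalize();
        if (!Files.isRegularFile(target)) {
            return new MergedDocument(metadata, "");
        }
        return new MergedDocument(metadata, parseContent(target));
    }
}
```

In a real processor, the record's fields would come from the FileDocument metadata, the referenced file would be fetched through the FileSystemManager, and `parseContent` would be replaced by the Importer call. The handling of records without a reference reflects the "though not always" case from the question.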