Is there any processing method to exclude <rPh> Tag from sharedStrings.xml in Crawled xlsx File #74

ki-suzuki · 2023-09-21T04:19:35Z

The sharedStrings.xml file inside the xlsx file I am crawling contains the following:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="8" uniqueCount="8">
<si><t>月日</t><rPh sb="0" eb="2"><t>ガッピ</t></rPh><phoneticPr fontId="2"/></si>
<si><t>会社名</t><rPh sb="0" eb="3"><t>カイシャメイ</t></rPh><phoneticPr fontId="2"/></si>
<si><t>金額</t><rPh sb="0" eb="2"><t>キンガク</t></rPh><phoneticPr fontId="2"/></si>
<si><t>支払日</t><rPh sb="0" eb="3"><t>シハライビ</t></rPh><phoneticPr fontId="2"/></si>
<si><t>締日</t><rPh sb="0" eb="2"><t>シメビ</t></rPh><phoneticPr fontId="2"/></si>
<si><t>S社</t><rPh sb="1" eb="2"><t>シャ</t></rPh><phoneticPr fontId="2"/></si>
<si><t>A社</t><rPh sb="1" eb="2"><t>シャ</t></rPh><phoneticPr fontId="2"/></si>
<si><t>B社</t><rPh sb="1" eb="2"><t>シャ</t></rPh><phoneticPr fontId="2"/></si>
</sst>

What I ultimately want is the content without the <rPh> elements.
(That is, I want to remove fragments such as <rPh sb="0" eb="2"><t>キンガク</t></rPh>.)
Is there a processing method for this? I would greatly appreciate any help.

sakanaosama · 2023-10-25T18:58:34Z

After the import process and content extraction, all tags have been removed. If you want to exclude specific content based on these tags, you must work at the 'preParseHandlers' level under "Importer", where the tags are still present because extraction has not yet run. The configuration is documented at https://opensource.norconex.com/importer/v2/configuration#tbl-transformer. You can do this with the 'ReduceConsecutivesTransformer' or by writing a custom script with the 'ScriptTransformer'.
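
As a rough illustration of the kind of pattern a regex-based handler or custom script would need, here is a minimal standalone Python sketch. It is not Norconex configuration or Norconex API. The function name `strip_phonetic_runs` and the pattern are assumptions made for this example, and it assumes every <rPh> element has an explicit closing tag and is never nested, as in the sample from the question.

```python
import re

# Hypothetical pattern (an assumption, not taken from any Norconex docs):
# matches one phonetic run, <rPh ...> ... </rPh>. The non-greedy ".*?" stops
# at the first closing tag, so each run is removed separately. DOTALL lets a
# run span line breaks.
RPH_PATTERN = re.compile(r"<rPh\b[^>]*>.*?</rPh>", re.DOTALL)


def strip_phonetic_runs(shared_strings_xml: str) -> str:
    """Return the XML text with every <rPh>...</rPh> element removed."""
    return RPH_PATTERN.sub("", shared_strings_xml)


# One <si> entry copied from the sharedStrings.xml sample in the question.
sample = (
    '<si><t>金額</t><rPh sb="0" eb="2"><t>キンガク</t></rPh>'
    '<phoneticPr fontId="2"/></si>'
)

cleaned = strip_phonetic_runs(sample)
print(cleaned)
# prints: <si><t>金額</t><phoneticPr fontId="2"/></si>
```

The same regular expression could serve as a starting point inside a custom script, but how the script receives the raw XML depends on your handler setup, so treat this only as a sketch of the matching logic.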

ohtwadi assigned GlimpseVanodiya Oct 13, 2023

ohtwadi assigned sakanaosama and unassigned GlimpseVanodiya Oct 23, 2023
