This repository has been archived by the owner on Mar 31, 2023. It is now read-only.

Characterization tool for plain text files, made by KEEP SOLUTIONS.


keeps/keeps-characterization-txt


⚠️ This project is no longer maintained.
For tools related to RODA, please look at https://market.roda-community.org

keeps-characterization-txt

Characterization tool for text files, made by KEEP SOLUTIONS.

Build & Use

To build the application, clone the project and run the following Maven command: mvn clean package
The binary is written to target/txt-characterization-tool-1.0-SNAPSHOT-jar-with-dependencies.jar

To see the usage options, run:

$ java -jar target/txt-characterization-tool-1.0-SNAPSHOT-jar-with-dependencies.jar -h
usage: java -jar [jarFile]
 -f <arg> file to analyze
 -h       print this message
 -v       print this tool version
 -c <arg> minimum confidence in charset detection
 -a       accepted charset (separated with ,)
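The options above can also be assembled programmatically when the tool is called from a script. The following is a minimal sketch, not part of the project: `build_command` and `run_tool` are hypothetical helper names, and it assumes that `-a` takes its comma-separated charset list as a single argument (the usage text does not show an `<arg>` for it) and that the result XML is written to standard output.

```python
import subprocess
from typing import List, Optional, Sequence


def build_command(
    jar_path: str,
    file_path: str,
    min_confidence: Optional[int] = None,
    accepted_charsets: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build the argument list for one run of the characterization tool."""
    cmd = ["java", "-jar", jar_path, "-f", file_path]
    if min_confidence is not None:
        # -c: minimum confidence in charset detection
        cmd += ["-c", str(min_confidence)]
    if accepted_charsets:
        # -a: accepted charsets, joined with commas
        # (assumption: passed as one argument right after the flag)
        cmd += ["-a", ",".join(accepted_charsets)]
    return cmd


def run_tool(cmd: List[str]) -> str:
    """Run the tool and return whatever it prints to standard output.

    Assumption: the XML result is written to stdout.
    """
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout
```

For example, `build_command("target/txt-characterization-tool-1.0-SNAPSHOT-jar-with-dependencies.jar", "notes.txt", 50, ["UTF-8", "ISO-8859-1"])` yields a list ending in `-c 50 -a UTF-8,ISO-8859-1`. Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.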

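Tool Output Example

The first result below is for a file that failed validation because the charset detection confidence was below the required minimum. The second is for a valid file and lists its extracted features.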

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<textCharacterizationResult>
    <validationInfo>
        <valid>false</valid>
        <validationError>The file have an acceptable encoding, but the confidence is too low (19 &lt; 50)</validationError>
    </validationInfo>
    <features/>
</textCharacterizationResult>


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<textCharacterizationResult>
    <validationInfo>
        <valid>true</valid>
    </validationInfo>
    <features>
        <item>
            <key>numberOfLines</key>
            <value>282</value>
        </item>
        <item>
            <key>charset</key>
            <value>ISO-8859-1</value>
        </item>
    </features>
</textCharacterizationResult>
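A consumer of this output usually needs only the validity flag, the error message (if any), and the feature key/value pairs. The following is a minimal sketch of reading them with Python's standard library. The function name `parse_result` and the shape of the returned dictionary are illustrative choices, and the element names are taken only from the two examples above, so other elements the tool may emit are not handled.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def parse_result(xml_text: str) -> Dict[str, Any]:
    """Extract validity, error message, and features from one result document."""
    # strip() guards against leading blank lines before the XML declaration
    root = ET.fromstring(xml_text.strip())

    # <valid> holds the literal text "true" or "false"
    valid_text = root.findtext("validationInfo/valid", default="false")
    valid = valid_text.strip() == "true"

    # <validationError> is absent when the file is valid, so this may be None
    error = root.findtext("validationInfo/validationError")

    # Each <item> under <features> carries one <key>/<value> pair;
    # an empty <features/> element yields an empty dict
    features = {
        item.findtext("key"): item.findtext("value")
        for item in root.findall("features/item")
    }

    return {"valid": valid, "error": error, "features": features}
```

Feature values are kept as strings (for example `"282"` for `numberOfLines`), because the examples do not say which value types the tool guarantees.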

License

This software is available under the LGPL version 3 license.
