Archery

License: GPL v3 Maven Central Snyk security score Snyk Known Vulnerabilities Test Build Servier Inspired

A Java API to manipulate semi-structured documents and extract data from them.

Description

Semi-structured documents come in diverse formats and lack standardization, so manipulating and analyzing them often requires specialized skills. Archery is a framework that addresses this challenge. It combines algorithms and machine learning techniques to give you control over the data extraction process through tweakable, repeatable settings. Automating the extraction saves time and reduces errors, which is particularly beneficial for industries that handle large volumes of such documents. The framework also integrates with machine learning workflows, opening new possibilities for data enrichment and predictive modeling. Because it relies on deterministic algorithms, it is well suited to preparing data for training processes in a predictable and reproducible manner. Aligned with the data-as-a-service paradigm, it offers a scalable and efficient means of managing semi-structured data, expanding the toolkit of data services available to organizations.

Visit our full documentation and learn more about how it works, try our tutorials and find a full list of plugins and models.

Getting Started

Dependencies

  • The Java Development Kit (JDK), version 17.
  • Apache Maven, version 3.0 or above.
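
Before building, it can help to confirm that both tools are installed and visible on the PATH. The snippet below is a minimal sketch, not part of Archery: `check_tool` is a hypothetical helper name, and it only prints the first line of each tool's version banner so you can compare it by eye against the requirements above.

```shell
#!/bin/sh
# check_tool is a hypothetical helper (not shipped with Archery).
# It prints the first line of a tool's version banner, or a hint
# when the tool cannot be found on the PATH.
check_tool() {
    tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        # java and mvn both accept -version; java writes to stderr,
        # so stderr is merged into stdout before taking the first line.
        "$tool" -version 2>&1 | head -n 1
    else
        echo "$tool: not found on PATH"
        return 1
    fi
}

check_tool java || true   # look for a 17.x version
check_tool mvn || true    # look for 3.0 or above
```

If either tool is reported as missing, install it or fix the PATH before running the Maven commands in the following sections.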

Apache Maven Installation

For more details, see the Installation Guide.

Update dependencies

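Run the following command to list the dependencies that have newer versions available (it reports updates but does not modify pom.xml):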

mvn -DcreateChecksum=true versions:display-dependency-updates

Update pom.xml plugins

Run the following command to list the plugins that have newer versions available (it reports updates but does not modify pom.xml):

mvn -DcreateChecksum=true versions:display-plugin-updates

Build and install locally

Run the following command line:

mvn clean install

Build and deploy a snapshot to the Maven repository

Run the following command line:

mvn -P snapshot clean deploy

Build and deploy a release to the Maven repository

Run the following command line:

mvn -P release clean deploy

Build the javadoc documentation

Run the following command line:

mvn -P documentation clean site site:stage

Do not forget to configure GitHub authentication in ~/.m2/settings.xml as follows (the <server> entry belongs inside the <servers> element):

<server>
    <id>github</id>
    <password>PERSONAL_TOKEN_CLASSIC</password>
</server>

Documentation

The following links provide background information, walk through some implementation details, and give step-by-step instructions for getting the most out of Archery:

  • Using Archery: here.
  • API Reference: here.

Contribute

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Authors

Version History

  • 2.37
  • ...
  • Initial Release
