Merge branch 'master' into dependabot/maven/org.bouncycastle-bcmail-jdk15on-1.70
jazzido authored Jul 17, 2024
2 parents 9cab20f + 2ef079f commit 3dbcb03
Showing 29 changed files with 597 additions and 499 deletions.
23 changes: 23 additions & 0 deletions .github/workflows/tests-windows.yml
@@ -0,0 +1,23 @@
name: Java CI (Windows)

on: [push]

jobs:
  build:
    runs-on: windows-latest

    steps:
      # https://github.com/actions/checkout/issues/135#issuecomment-602171132
      - name: Set git to use LF
        run: |
          git config --global core.autocrlf false
          git config --global core.eol lf
      - uses: actions/checkout@v3
      - name: Set up JDK 11
        uses: actions/setup-java@v3
        with:
          java-version: '11'
          distribution: 'adopt'
          cache: maven
      - name: Build with Maven
        run: mvn --batch-mode test
18 changes: 18 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,18 @@
name: Java CI

on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3
      - name: Set up JDK 11
        uses: actions/setup-java@v3
        with:
          java-version: '11'
          distribution: 'adopt'
          cache: maven
      - name: Build with Maven
        run: mvn --batch-mode test
9 changes: 0 additions & 9 deletions .travis.yml

This file was deleted.

46 changes: 44 additions & 2 deletions README.md
@@ -1,4 +1,4 @@
tabula-java [![Build Status](https://travis-ci.org/tabulapdf/tabula-java.svg?branch=master)](https://travis-ci.org/tabulapdf/tabula-java) [![Build status](https://ci.appveyor.com/api/projects/status/l5gym1mjhrd2v8yn?svg=true)](https://ci.appveyor.com/project/jazzido/tabula-java)
tabula-java [![Build Status](https://travis-ci.org/tabulapdf/tabula-java.svg?branch=master)](https://travis-ci.org/tabulapdf/tabula-java)
===========

`tabula-java` is a library for extracting tables from PDF files — it is the table extraction engine that powers [Tabula](http://tabula.technology/) ([repo](http://github.com/tabulapdf/tabula)). You can use `tabula-java` as a command-line tool to programmatically extract tables from PDFs.
@@ -9,7 +9,7 @@ tabula-java [![Build Status](https://travis-ci.org/tabulapdf/tabula-java.svg?bra

Download a version of the tabula-java's jar, with all dependencies included, that works on Mac, Windows and Linux from our [releases page](../../releases).

## Usage Examples
## Commandline Usage Examples

`tabula-java` provides a command line application:

@@ -75,11 +75,53 @@ You can also integrate `tabula-java` with any JVM language. For Java examples, s

JVM start-up time is a lot of the cost of the `tabula` command, so if you're trying to extract many tables from PDFs, you have a few options for speeding it up:

- the -b option, which allows you to convert all pdfs in a given directory
- the [drip](https://github.com/ninjudd/drip) utility
- the [Ruby](http://github.com/tabulapdf/tabula-extractor), [Python](https://github.com/chezou/tabula-py), [R](https://github.com/leeper/tabulizer), and [Node.js](https://github.com/ezodude/tabula-js) bindings
- writing your own program in any JVM language (Java, JRuby, Scala) that imports tabula-java.
- waiting for us to implement an API/server-style system (it's on the [roadmap](https://github.com/tabulapdf/tabula-api))
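
As a rough illustration of the "write your own JVM program" option above, the sketch below collects the PDF files in a directory so that a single JVM run can process all of them and pay the start-up cost once. It uses only the Java standard library. The class name `PdfBatch`, the helper `findPdfs`, and the placeholder print statement are assumptions made for this example and are not part of tabula-java; the actual extraction call would go where the placeholder is.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class for illustration; not part of tabula-java.
class PdfBatch {

    // Returns the regular files directly inside dir whose name ends in ".pdf"
    // (case-insensitive), sorted by path.
    static List<Path> findPdfs(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path pdf : findPdfs(dir)) {
            // Placeholder: run the table extraction for this file here.
            System.out.println("would extract tables from " + pdf);
        }
    }
}
```

Because every file is handled inside one process, the JVM starts only once no matter how many PDFs the directory contains.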

## API Usage Examples

A simple Java code example which extracts all rows and cells from all tables of all pages of a PDF document:

```java
InputStream in = this.getClass().getResourceAsStream("my.pdf");
try (PDDocument document = PDDocument.load(in)) {
    SpreadsheetExtractionAlgorithm sea = new SpreadsheetExtractionAlgorithm();
    PageIterator pi = new ObjectExtractor(document).extract();
    while (pi.hasNext()) {
        // iterate over the pages of the document
        Page page = pi.next();
        List<Table> table = sea.extract(page);
        // iterate over the tables of the page
        for(Table tables: table) {
            List<List<RectangularTextContainer>> rows = tables.getRows();
            // iterate over the rows of the table
            for (List<RectangularTextContainer> cells : rows) {
                // print all column-cells of the row plus linefeed
                for (RectangularTextContainer content : cells) {
                    // Note: Cell.getText() uses \r to concat text chunks
                    String text = content.getText().replace("\r", " ");
                    System.out.print(text + "|");
                }
                System.out.println();
            }
        }
    }
}
```
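
The per-row output step in the loop above can be isolated and tried without a PDF. The following is a minimal sketch, assuming the cell texts have already been read into plain strings; `RowFormatter` and `formatRow` are hypothetical names for this illustration and are not part of tabula-java. It applies the same two rules as the example: `\r` inside a cell becomes a space, and every cell is followed by `|`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of tabula-java.
class RowFormatter {

    // Replaces the "\r" chunk separators inside each cell with a space and
    // terminates every cell with "|", matching the output of the loop above.
    static String formatRow(List<String> cellTexts) {
        return cellTexts.stream()
                .map(text -> text.replace("\r", " ") + "|")
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("Name", "Unit\rprice", "");
        System.out.println(formatRow(row)); // prints: Name|Unit price||
    }
}
```

An empty cell still contributes its `|`, so the number of separators in a line equals the number of cells in the row.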


For more detailed information, check the Javadoc.
The Javadoc API documentation can be generated (see also '_Building from Source_' section) via

```
mvn javadoc:javadoc
```

which generates the HTML files to directory ```target/site/apidocs/```

## Building from Source

Clone this repo and run:
21 changes: 0 additions & 21 deletions appveyor.yml

This file was deleted.
