Skip to content
/ exide Public

Information extraction API for presentation documents

License

Notifications You must be signed in to change notification settings

youakrim/exide

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

exide

Exide is a Python API for information extraction from presentation documents.

The API is currently in developement and bugs are likely to occur.

Install exide

Download the repository, cd to the root of the exide package directory and then type the following:

sudo su
pip install .
python
import nltk
nltk.download("all")

then press enter. Exide is now installed on your system.

Implemented features :

  • Slide title extraction
  • Slide body text extraction
  • Named-entities recognition (unaccurate)
  • Emphasized text recognition
  • URLs recognition
  • Structure detection and outline generation
  • Recognition of the following silde types :
    • Introduction
    • Conclusion
    • Definition
    • Example
    • Table of contents
    • References
    • Section header

Note : slide types can be added by editing the datatypes/types file.

Supported file types

  • Office Open XML Presentations (PPTX)
  • OpenDocument Presentations (ODP)
  • LaTeX beamer Presentations

Simple example

The following code show you how to print the outline string of the file my_presentation.pptx.

from exide.parse import parse
exide_presentation = parse("my_folder/my_presentation.pptx")
print(exide_presentation.outline)

More examples here.

Documentation

Exide documentation is available here.

About

Information extraction API for presentation documents

Resources

License

Stars

Watchers

Forks

Packages

No packages published