
Information extraction API for presentation documents


youakrim/exide


exipe

Exipe is a Python API for information extraction from presentation documents.

The API is currently in development, and bugs are likely to occur.

Implemented features:

  • Slide title extraction
  • Slide body text extraction
  • Named-entity recognition (inaccurate)
  • Emphasized text recognition
  • URL recognition
  • Structure detection and outline generation
  • Recognition of the following slide types:
    • Introduction
    • Conclusion
    • Definition
    • Example
    • Table of contents
    • References
    • Section header

Note: slide types can be added by editing the datatypes/types file.
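As an illustration of what title-based slide type recognition can look like, here is a minimal sketch. It is not exipe's implementation and does not reflect the actual contents of the datatypes/types file: the `SLIDE_TYPE_KEYWORDS` table, its keywords, and the `guess_slide_type` function are all hypothetical. The "Section header" type is left out because it would more plausibly depend on the slide layout than on the title wording.

```python
import re

# Hypothetical keyword table, for illustration only.
# This is NOT the content of exipe's datatypes/types file.
SLIDE_TYPE_KEYWORDS = {
    "Introduction": ("introduction", "overview"),
    "Conclusion": ("conclusion", "summary"),
    "Definition": ("definition",),
    "Example": ("example",),
    "Table of contents": ("table of contents", "outline", "agenda"),
    "References": ("references", "bibliography"),
}


def guess_slide_type(title):
    """Return the first slide type with a keyword starting a word in the title."""
    lowered = (title or "").lower()
    for slide_type, keywords in SLIDE_TYPE_KEYWORDS.items():
        # \b anchors the keyword at a word start, so "example" also matches "examples".
        if any(re.search(r"\b" + re.escape(k), lowered) for k in keywords):
            return slide_type
    return None  # no known type matched
```

In a table-driven design like this one, adding a slide type amounts to adding one entry, for example `SLIDE_TYPE_KEYWORDS["Exercise"] = ("exercise",)`.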

Install exipe

cd to the root of the exipe package directory, then run:

sudo pip install .

Notes

For now, the API works only with Office Open XML Presentation files (PPTX) and OpenDocument Presentation files (ODP). It uses the python-pptx and NLTK libraries.
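To make the PPTX side concrete, the sketch below shows roughly what slide title and body text extraction involves at the file level. It deliberately uses only the Python standard library instead of python-pptx, and it is not exipe's code: the function names `shape_text`, `parse_slide`, and `read_pptx` are made up for this example. It relies on two properties of the format: a PPTX file is a ZIP archive with slide parts stored as `ppt/slides/slideN.xml`, and title shapes carry a placeholder of type `title` or `ctrTitle`.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by PresentationML slide parts.
NS = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
}
TITLE_PLACEHOLDERS = {"title", "ctrTitle"}


def shape_text(shape):
    """Join the text runs of each paragraph in a shape, one paragraph per line."""
    paragraphs = []
    for para in shape.iterfind(".//a:p", NS):
        text = "".join(run.text or "" for run in para.iterfind(".//a:t", NS))
        if text.strip():
            paragraphs.append(text.strip())
    return "\n".join(paragraphs)


def parse_slide(xml_bytes):
    """Return (title, body) for one slide part; title is None if there is none."""
    root = ET.fromstring(xml_bytes)
    title, body = None, []
    for shape in root.iterfind(".//p:sp", NS):
        text = shape_text(shape)
        if not text:
            continue  # skip shapes without any text
        ph = shape.find("p:nvSpPr/p:nvPr/p:ph", NS)
        is_title = ph is not None and ph.get("type") in TITLE_PLACEHOLDERS
        if is_title and title is None:
            title = text
        else:
            body.append(text)
    return title, "\n".join(body)


def read_pptx(path_or_file):
    """Yield (file_number, title, body) for each slide part in the archive.

    The number comes from the part name; it usually, but not always,
    matches the order in which slides are shown.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        numbered = []
        for name in archive.namelist():
            match = re.fullmatch(r"ppt/slides/slide(\d+)\.xml", name)
            if match:
                numbered.append((int(match.group(1)), name))
        # Sort numerically so slide10 comes after slide9.
        for number, name in sorted(numbered):
            title, body = parse_slide(archive.read(name))
            yield number, title, body
```

A call such as `for number, title, body in read_pptx("deck.pptx"): print(number, title)` would list the titles, where `deck.pptx` is a placeholder file name. ODP files use a different XML vocabulary and are not covered by this sketch.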
