ETL App for PDF Data Extraction and Transformation

Overview

ETL App is a Streamlit application that extracts data from PDF files, transforms it, and exports the result. It is built on the Python libraries pypdf, pdfplumber, pandas, and streamlit. The app reads PDF files containing structured data, pulls out the relevant fields, organizes them into a table, and writes that table to a CSV file.

Features

File Upload: Users can upload PDF files directly through the Streamlit interface.
Data Extraction: The application iterates over each page of the uploaded PDF file and extracts specific data points such as observed mass, sample positions, and FLP UV % area.
Data Transformation: Extracted data is organized into a pandas DataFrame (df), which is then cleaned (for example, by handling NaN values) and merged with supplementary data such as sample positions.
Data Sorting: The application applies custom sorting logic to order the resulting DataFrame (df_new) by the Sample Position column.
CSV Export: Once the data is processed and transformed, it is exported to a CSV file (Updated_plate_2.csv) for further analysis or integration with other systems.
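
The custom sorting step can be illustrated with a minimal sketch. It assumes that sample positions look like plate-well labels such as "A1" or "B12" (row letters followed by a column number); the actual label format, and the helper names `position_sort_key` and `sort_by_position`, are assumptions made for this example and are not taken from the application code.

```python
import re

# Assumed label shape: row letters followed by a column number, e.g. "A1", "B12".
# This pattern is an illustration; the real app may use a different format.
_POSITION_RE = re.compile(r"([A-Za-z]+)\s*(\d+)")


def position_sort_key(position):
    """Return a key that orders "A2" before "A10" and puts unparseable labels last."""
    match = _POSITION_RE.fullmatch(str(position).strip())
    if match is None:
        # Covers NaN, empty strings, and any label that does not fit the pattern.
        return (1, "", 0)
    return (0, match.group(1).upper(), int(match.group(2)))


def sort_by_position(df, column="Sample Position"):
    """Return a copy of a pandas DataFrame ordered by the parsed position in `column`."""
    order = sorted(range(len(df)), key=lambda i: position_sort_key(df[column].iloc[i]))
    return df.iloc[order].reset_index(drop=True)
```

A plain string sort would place "A10" before "A2"; splitting the label into a row and a numeric column avoids that. With a DataFrame in hand, `sort_by_position(df_new)` would return the rows in plate order under the assumptions above.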

Dependencies

Ensure you have the following Python libraries installed:

pypdf
pdfplumber
pandas
streamlit

You can install these dependencies using pip:

pip install pypdf pdfplumber pandas streamlit
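
To confirm that the installation succeeded before launching the app, a short check like the following may help. It is a sketch that uses only the standard library; the function name `missing_packages` is chosen for this example.

```python
from importlib.util import find_spec

# Import names of the libraries listed above.
REQUIRED = ["pypdf", "pdfplumber", "pandas", "streamlit"]


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```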

Usage

Clone Repository:
```
git clone <repository_url>
cd ETL-Transformation
```
Install Dependencies:

Make sure the libraries listed under Dependencies are installed.

Run the Application:

Start the Streamlit application locally:
```
streamlit run Pdf-to-csv.py
```
Upload a PDF File:
- Click on "Choose a file" and select a PDF file containing structured data.
- The application will automatically process the uploaded PDF file.
View Results:
- The extracted and transformed data is displayed in the Streamlit interface, sorted by Sample Position.
- The CSV file (Updated_plate_2.csv) will be downloaded automatically, containing the processed data.
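
The upload, extract, and export flow described in these steps can be sketched roughly as follows. This is not the application's actual code: the text pattern for "Observed mass", the function names `parse_observed_masses` and `run_app`, and the output columns are assumptions made for illustration, and the sketch offers the CSV through a download button, which may differ from how the app delivers the file.

```python
import re

# Hypothetical pattern: the real report layout is not documented here, so this
# assumes lines such as "Observed mass: 1234.5".
OBSERVED_MASS_RE = re.compile(
    r"Observed mass\s*[:=]?\s*([0-9]+(?:\.[0-9]+)?)", re.IGNORECASE
)


def parse_observed_masses(page_text):
    """Return every observed-mass value found in one page of text as floats."""
    return [float(value) for value in OBSERVED_MASS_RE.findall(page_text or "")]


def run_app():
    """Upload a PDF, show the extracted values, and offer them as a CSV download."""
    # Third-party imports live here so the parser above can be used without them.
    import pandas as pd
    import pdfplumber
    import streamlit as st

    st.title("ETL App")
    uploaded = st.file_uploader("Choose a file", type="pdf")
    if uploaded is None:
        return  # nothing to process until a file is selected

    rows = []
    with pdfplumber.open(uploaded) as pdf:
        for page_number, page in enumerate(pdf.pages, start=1):
            for mass in parse_observed_masses(page.extract_text()):
                rows.append({"Page": page_number, "Observed Mass": mass})

    df = pd.DataFrame(rows, columns=["Page", "Observed Mass"])
    st.dataframe(df)
    st.download_button(
        "Download CSV",
        data=df.to_csv(index=False),
        file_name="Updated_plate_2.csv",
        mime="text/csv",
    )
```

To try the sketch, save it to a script, add a call to `run_app()` at the bottom, and start it with `streamlit run`. Keeping the text parsing in its own function makes it possible to check the extraction logic on sample strings without a PDF or a running Streamlit session.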

Contributing

Contributions to improve the application's functionality or fix issues are welcome. Fork the repository, make your changes, and submit a pull request for review.

License

This project is licensed under the MIT License.
