This Streamlit application, named "ETL App," facilitates the extraction, transformation, and export of data from PDF files. It leverages Python libraries such as pypdf, pdfplumber, pandas, and streamlit to achieve these tasks efficiently. The app is designed to process PDF files containing structured data, extract relevant information, transform it into a structured format, and export the transformed data to a CSV file.
-
File Upload: Users can upload PDF files directly through the Streamlit interface.
-
Data Extraction: The application traverses through each page of the uploaded PDF file to extract specific data points such as observed mass, sample positions, and FLP UV % area.
-
Data Transformation: Extracted data is organized into a Pandas DataFrame (
df), where additional transformations such as handling NaN values and merging with supplementary data (e.g., sample positions) are performed. -
Data Sorting: The application includes a custom sorting logic to sort the DataFrame (
df_new) based on a specified column (Sample Position) in a structured format. -
CSV Export: Once the data is processed and transformed, it is exported into a CSV file (
Updated_plate_2.csv) for further analysis or integration with other systems.
Ensure you have the following Python libraries installed:
pypdfpdfplumberpandasstreamlit
You can install these dependencies using pip:
pip install pypdf pdfplumber pandas streamlit-
Clone Repository:
git clone <repository_url> cd ETL-App
-
Install Dependencies:
Ensure all dependencies are installed as mentioned above.
-
Run the Application:
Start the Streamlit application locally:
streamlit run streamlit_app.py
-
Upload a PDF File:
- Click on "Choose a file" and select a PDF file containing structured data.
- The application will automatically process the uploaded PDF file.
-
View Results:
- The extracted and transformed data will be displayed in a sorted format on the Streamlit interface.
- The CSV file (
Updated_plate_2.csv) will be downloaded automatically, containing the processed data.
Contributions to improve the application's functionality or fix issues are welcome. Fork the repository, make your changes, and submit a pull request for review.
This project is licensed under the MIT License.