SRN Documents Statistics Repository

Overview

Welcome to the srn_docs_stat repository, a tool for downloading data from the SRN documents database API and computing simple summary statistics on the downloaded data. The repository contains a Jupyter Notebook written in Python (srn_docs_stat.ipynb) that gives insight into the dataset's characteristics.

Functionality

The repository provides a range of checks and features:

1. Selective Downloads

Download only those files that are not already present in your local directory, so that the local copy can be updated as the database continues to evolve.
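A minimal sketch of the selection step, assuming each document has an id and that a local file is named after that id, with or without a suffix. The function name `select_missing` and this naming scheme are assumptions made for illustration; the notebook may organize this differently.

```python
from pathlib import PurePath


def select_missing(document_ids, local_filenames):
    """Return the ids that have no matching local file.

    Assumption: a local file matches an id when its name without the
    suffix equals the id, so both 'abc' and 'abc.pdf' match id 'abc'.
    """
    present = {PurePath(name).stem for name in local_filenames}
    return [doc_id for doc_id in document_ids if doc_id not in present]
```

The list of local file names could come, for example, from `os.listdir("data")`; only the returned ids would then be requested from the API.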

2. Failed Downloads List

Compile a list of the files that could not be downloaded, along with the corresponding error messages.
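One way to collect such a list is to catch the exception for each download and keep going. In the sketch below, `fetch` is a hypothetical callable that takes a document id and returns the file content; it stands in for whatever request the notebook sends to the SRN API.

```python
def download_all(document_ids, fetch):
    """Try every download and record failures instead of stopping.

    `fetch` is a hypothetical callable: fetch(doc_id) -> bytes.
    Returns (downloaded, failed), where `failed` is a list of
    {'id': ..., 'error': ...} records.
    """
    downloaded = {}
    failed = []
    for doc_id in document_ids:
        try:
            downloaded[doc_id] = fetch(doc_id)
        except Exception as exc:  # keep the message for the failure table
            failed.append(
                {"id": doc_id, "error": f"{type(exc).__name__}: {exc}"}
            )
    return downloaded, failed
```

Catching the broad `Exception` is a deliberate choice here: the goal is a complete failure table, so any error is recorded rather than aborting the run.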

3. Filetype Classification and Renaming

Categorize the downloaded files based on their filetypes and optionally append appropriate suffixes to the local file names.
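The source does not say how filetypes are detected. A common approach, shown here purely as an assumption, is to inspect the first bytes of each file. The signature table and the helper names `classify` and `renamed` are illustrative and cover only two well-known formats.

```python
from pathlib import PurePath

# Leading bytes of two well-known formats (illustrative, not exhaustive).
SIGNATURES = [
    (b"%PDF-", ".pdf"),
    (b"PK\x03\x04", ".zip"),  # also the container format of .docx and .xlsx
]


def classify(head):
    """Return a suffix for the given leading bytes, or '' if unknown."""
    for magic, suffix in SIGNATURES:
        if head.startswith(magic):
            return suffix
    return ""


def renamed(filename, head):
    """Return the filename with the detected suffix appended if it is missing."""
    suffix = classify(head)
    if suffix and PurePath(filename).suffix.lower() != suffix:
        return filename + suffix
    return filename
```

For a file on disk, `head` could be the result of reading the first few bytes in binary mode. Files of unknown type keep their name unchanged.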

4. Filetype Frequency Summary

Analyze the distribution of filetypes among the locally downloaded files. This helps to understand the composition of the data.
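A frequency table of this kind can be sketched with the standard library alone, assuming the filetype is read from the file suffix. The function name `filetype_frequency` and the "(none)" label for files without a suffix are illustrative choices.

```python
from collections import Counter
from pathlib import PurePath


def filetype_frequency(filenames):
    """Count files per suffix, most frequent first.

    Suffixes are lower-cased so '.PDF' and '.pdf' are counted together;
    files without a suffix are grouped under '(none)'.
    """
    counts = Counter(
        PurePath(name).suffix.lower() or "(none)" for name in filenames
    )
    return counts.most_common()
```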

5. Missing Files and Company Linkage

Link missing files to their corresponding companies, showing which companies have mostly missing files.
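The linkage can be illustrated as a per-company share of missing documents. The input layout below, a list of (document id, company) pairs plus a collection of missing ids, is an assumption for this sketch and not the format returned by the SRN API.

```python
from collections import defaultdict


def missing_share_by_company(records, missing_ids):
    """Return (company, n_missing, n_total, share) rows, worst first.

    `records` is assumed to be an iterable of (document_id, company) pairs.
    """
    missing_ids = set(missing_ids)
    totals = defaultdict(int)
    missing = defaultdict(int)
    for doc_id, company in records:
        totals[company] += 1
        if doc_id in missing_ids:
            missing[company] += 1
    rows = [
        (company, missing[company], total, missing[company] / total)
        for company, total in totals.items()
    ]
    # Companies with the highest share of missing files come first.
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```

Sorting by share rather than by absolute count puts companies whose files are mostly missing at the top, even if they have few documents overall.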

6. Year Frequency Distribution

Compute the frequency distribution of years within the downloaded dataset to show how the documents are spread over time.
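As a brief illustration, the year distribution can be tabulated in chronological order. The sketch assumes each document carries a year value that may be missing (`None`); how the year is stored in the SRN data is not stated in the source.

```python
from collections import Counter


def year_frequency(years):
    """Return (year, count) pairs sorted by year, ignoring missing years."""
    counts = Counter(year for year in years if year is not None)
    return sorted(counts.items())
```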

7. Summary Statistics for .pdf Files

Report the maximum, minimum, median, and mean number of pages of the downloaded .pdf files.
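Given a list of page counts, the four statistics can be computed with the standard library. Reading the page count from each file requires a PDF library and is not shown; the function name `page_count_summary` is illustrative.

```python
import statistics


def page_count_summary(page_counts):
    """Return max, min, median and mean of the page counts.

    Returns None for an empty input, since the statistics are undefined.
    """
    page_counts = list(page_counts)
    if not page_counts:
        return None
    return {
        "max": max(page_counts),
        "min": min(page_counts),
        "median": statistics.median(page_counts),
        "mean": statistics.fmean(page_counts),
    }
```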

Output and Data

The repository's "output" folder contains a snapshot of the tables generated by the code. When the code is run, the up-to-date output tables are saved here.

The "data" folder contains the actual documents that are downloaded from the SRN API. By default and before running the code, this folder is empty.

Usage

To get started, clone this repository and follow the instructions provided in the documentation. The repository is intended to help users understand the strengths and limitations of the SRN documents data.

