Automatic generation of Meta-Data for a dataset
Table of Contents
- Motivation
- How to run the application
- Project Structure
- Methodology
- Where this Library fits in the overall architecture
- Approach to determine Meta-Data
  - File Path
  - Units
  - Temporal Coverage
  - Granularity
  - Spatial Coverage
  - File Formats Available
  - Is Public Dataset
Run the application locally:

```shell
poetry run uvicorn app.main:app --reload --port 8005
```

Or run it with Docker:

```shell
docker compose up --build
```
Files related to the application are in the `app` or `tests` directories.
Application parts are:

```
app
├── api      - web related stuff.
│   └── routes - web routes.
├── core     - application configuration, startup events, logging.
├── models   - pydantic models for this application.
├── services - logic that is not just CRUD related.
└── main.py  - FastAPI application creation and configuration.

tests        - pytest tests.
```
General Workflow:

```mermaid
graph LR;
    A[Dataset] --> B{Unit column exists?};
    B -- NO --> C(RETURN null string);
    B -- YES --> D[Get all unique units from the UNIT column];
    D --> E[Prepare a list of all separate units];
    E --> F(RETURN all units as a comma-separated string);
```
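The workflow above can be sketched in Python with pandas. This is a minimal illustration, not the library's actual implementation; the case-insensitive match on a `unit` column name is an assumption drawn from the diagram.

```python
import pandas as pd

def get_units(df: pd.DataFrame, unit_col: str = "unit") -> str:
    """Return all unique units as a comma-separated string, or '' if absent."""
    # Case-insensitive lookup of the unit column (assumed naming)
    matches = [c for c in df.columns if c.lower() == unit_col]
    if not matches:
        return ""  # no unit column -> null string
    units = df[matches[0]].dropna().astype(str).unique()
    return ", ".join(sorted(units))
```

For example, a dataset whose `unit` column contains `Tonnes` and `Count` would yield the string `"Count, Tonnes"`.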
General Workflow:

```mermaid
flowchart LR
    A(Dataset) --> B{Year column exists?}
    B -- NO --> C(RETURN null string)
    B -- YES --> D[Calendar / non-calendar year columns]
    D --> E{Years are in sequence?}
    E -- YES --> F(RETURN string representation of the range, e.g. 2012 to 2020 or 2012-13 to 2020-21)
    E -- NO --> G(RETURN comma-separated values for all years, e.g. 2012, 2015, 2018 or 2012-13, 2015-16, 2018-19)
```
Notes:
- Determination of temporal coverage is based on the presence of a year column.
- If both calendar-year and non-calendar-year columns are present in the dataset, priority is given to the calendar year.
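The sequence check above can be sketched as follows. This is a simplified illustration for calendar years only; handling of non-calendar years such as `2012-13` is omitted, and the function name is hypothetical.

```python
def temporal_coverage(years: list[int]) -> str:
    """Summarise years as a range if sequential, else comma-separated."""
    if not years:
        return ""  # no year column -> null string
    ys = sorted(set(years))
    if ys == list(range(ys[0], ys[-1] + 1)):
        return f"{ys[0]} to {ys[-1]}"        # consecutive, e.g. "2012 to 2020"
    return ", ".join(str(y) for y in ys)     # gaps, e.g. "2012, 2015, 2018"
```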
General Workflow:

```mermaid
flowchart LR
    A(Dataset) --> B{Any date-time or geography columns exist?}
    B -- NO --> C(RETURN null string)
    B -- YES --> D[Map all column levels in sorted order for their respective domains]
    D --> E[Map the column groups to granularity using the proper naming convention]
    E --> F(RETURN comma-separated values of all granularities, e.g. Quarterly, District)
```
Notes:
- Granularity is calculated for 2 domains:
  - Geography
  - Date-Time
- Granularity ranks for each domain are defined in `config.py`.
- `config.py` also lists the keywords used to detect granularity in datasets.
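The rank-and-keyword approach described above can be sketched as follows. The rank tables here are hypothetical stand-ins for what `config.py` actually holds, and matching keywords against column names is an assumption based on the notes.

```python
# Hypothetical keyword -> rank tables; the real ones live in config.py.
# Higher rank = finer granularity.
TIME_RANKS = {"year": 1, "quarter": 2, "month": 3}
GEO_RANKS = {"country": 1, "state": 2, "district": 3}

def granularities(columns: list[str]) -> str:
    """Return the finest granularity found per domain, comma-separated."""
    found: dict[str, list[tuple[int, str]]] = {"time": [], "geo": []}
    for col in columns:
        name = col.lower()
        for kw, rank in TIME_RANKS.items():
            if kw in name:
                found["time"].append((rank, kw.title()))
        for kw, rank in GEO_RANKS.items():
            if kw in name:
                found["geo"].append((rank, kw.title()))
    # Keep only the finest (highest-ranked) level in each domain
    parts = [max(levels)[1] for levels in found.values() if levels]
    return ", ".join(parts)
```

A dataset with `Quarter`, `State`, and `District` columns would report one granularity per domain, keeping the finest level of each.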
Mentioned below are the cases for spatial coverage:

| Spatial Location | Dataset with categories as | Methodology | Spatial Coverage |
|---|---|---|---|
| Countries | India, Pakistan, China, etc. | | Country |
| Specific country | India | Represent it with the specific country name | India |
| States of a country | Andhra Pradesh, Assam, etc. | | States of India |
| Regions of a country | South India, NE states, etc. | | Regions of India |
| Specific state of a country | Andhra Pradesh | Represent it with the specific state name | Andhra Pradesh |
| Districts of a state/states | Adilabad, Hyderabad, etc. | | Districts of Telangana or Districts of India |
| Specific district of a state | Hyderabad | Represent it with the specific district name | Hyderabad |
General Workflow:

```mermaid
flowchart LR
    A(Dataset) --> B{Geographical columns exist?}
    B -- NO --> C(RETURN default value INDIA)
    B -- YES --> D[Sort the order of the different geographical levels]
    D --> E(RETURN value of the biggest geographical order with the proper naming convention)
```
Notes:
- This library currently supports only Country-, State-, and District-level spatial coverage.
- Mapping of geographic column levels is decided by the column names, not their values; hence a change in column names will affect the mapping.
- If there is no geographic column, the result defaults to INDIA.
- The spatial coverage order, keyword mapping, and naming convention are defined in `config.py`.
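A minimal sketch of the column-name-based rule above, under the assumption that the level order and naming convention resemble what `config.py` defines (the tables and names below are hypothetical). Value-based refinements, such as narrowing to a single state's name, are omitted.

```python
# Hypothetical stand-ins for the order and naming tables in config.py.
GEO_ORDER = ["country", "state", "district"]  # broadest -> narrowest
NAMING = {
    "country": "Country",
    "state": "States of India",
    "district": "Districts of India",
}

def spatial_coverage(columns: list[str]) -> str:
    """Return coverage for the broadest geographic column found."""
    names = [c.lower() for c in columns]
    for level in GEO_ORDER:  # the broadest matching level wins
        if any(level in n for n in names):
            return NAMING[level]
    return "INDIA"  # no geographic column -> default
```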
Notes:
- The file format is read from the file name (i.e. its extension).
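Reading the format from the file name can be sketched with the standard library's `pathlib`; the function name and comma-separated output style are assumptions for illustration.

```python
from pathlib import Path

def file_formats(file_paths: list[str]) -> str:
    """Derive available file formats from file-name extensions."""
    formats = {
        Path(p).suffix.lstrip(".").upper()  # e.g. "data.csv" -> "CSV"
        for p in file_paths
        if Path(p).suffix  # skip names without an extension
    }
    return ", ".join(sorted(formats))
```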