Skip to content

Commit

Permalink
Added Readme markdown file
Browse files Browse the repository at this point in the history
  • Loading branch information
kimaina committed Jun 27, 2017
1 parent 4447384 commit 8418e73
Show file tree
Hide file tree
Showing 2 changed files with 185 additions and 0 deletions.
185 changes: 185 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,185 @@
# OpenMRS ETL
> ETL (Extract, Transform and Load) is a process in data warehousing responsible for pulling data out of the source systems and placing it into a data warehouse. For more information about ETL go to http://datawarehouse4u.info/ETL-process.html
This project is responsible for the first 2 layers of ETL i.e ET (Extract, Transform) while L (Load) is handled by a different project (https://github.com/AMPATH/etl-rest-server)
The diagram below shows you the entire lifecycle of openmrs data from Extraction to Transformation to Loading
![](model.png)

## Getting Started

Create an empty database called 'etl'
```sh
create database etl;
```

clone the repository and run this command
```sh
cd /etl/etl/
git pull origin master
```

Open up a screen(sudo apt-get install screen) using the below command:
```sh
screen -r
```
Run this commands sequentially in that order (please use latest version of each script. See etl/database_updates/update_flat_tables_and_calculated_tables)
Each scripts takes a couple of seconds to hours depending with the amount of data in openmrs db
```sh
mysql etl < ../flat_tables/flat_obs_v1.3.sql
mysql etl < ../flat_tables/flat_orders_v1.0.sql
mysql etl < ../flat_tables/flat_lab_obs_v1.2.sql
mysql etl < ../calculated_tables/hiv_summary_v2.12.sql
mysql etl < ../calculated_tables/labs_and_imaging_v2.5.sql
mysql etl < ../calculated_tables/vitals_v2.0.sql
mysql etl < ../calculated_tables/hiv_vl_summary_v1.0.sql
mysql etl < ../calculated_tables/pep_summary_v1.0.sql
mysql etl < ../calculated_tables/defaulters_v2.3.sql

```

After all the tasks are completed, schedule the syncing process.
## Scheduling Syncing Process
The syncing process allows ETL process to run after every 5 minutes or (x minutes). This facilitate access of data in realtime (5 minutes delay) as data are inserted and updated on base db (openmrs/amrs)
Run the following commands
```sh
screen -r
cd /etl/etl/
git pull origin master
chmod +x database_updates/*
crontab -e
```
when you run crontab -e a nano editor will open up. paste this code

```sh
37 5-20 * * 1-5 ~/etl/etl/database_updates/update_calculated_tables_hourly
0-59/5 5-20 * * 1-5 /home/etl/etl/etl/database_updates/update_flat_tables_and_calculated_tables > etl_log
```
to understand more about cronjobs, please see https://help.ubuntu.com/community/CronHowto

## Contributing

1. Fork it!
2. Create your feature branch: `git checkout -b my-new-feature`
> While making changes, If you add any new column to any flat table, please remember to create a new script file. Don't modify the current version.
> You also need to test and verify your result using test server by running changed scripts individually
3. Commit your changes: `git commit -am 'Add some feature'`
4. Push to the branch: `git push origin my-new-feature`
5. Submit a pull request :D
6. Rebuild the changes
* Log in to ETL server
* Before rebuilding please ensure you shut down the syncing process (cronjob). Here is how to do this:
```sh
screen -r
crontab -e
```
* then comment out these lines using #:
```sh
#37 5-20 * * 1-5 ~/etl/etl/database_updates/update_calculated_tables_hourly
#0-59/5 5-20 * * 1-5 /home/etl/etl/etl/database_updates/update_flat_tables_and_calculated_tables > etl_log
```
* After shutting down cronjob, you need to backup the previous version of flat table before rebuilding it afresh by:
```sh
mysql -u xxx
ALTER TABLE `etl`.`flat_hiv_summary` RENAME TO `etl`.`flat_hiv_summary_20_05_2017` ;
DROP TABLE `etl`.`flat_hiv_summary`;
```
* After Backing up, run the rebuilding process by using these commands (this is an example for edited hiv_summary_v2.x.sql)
```sh
screen -r
cd /etl/etl/
git pull origin master
chmod +x database_updates/*
mysql etl < ../calculated_tables/hiv_summary_v2.x.sql
```
* Finally after rebuilding processes, turn back on syncing by uncommenting the 2 lines.

## Monitoring and logging
All successful builds and syncing metadata are stored in a table called flat_log
```sh
select * from etl.flat_log where table_name = 'flat_obs_v1.2' order by seconds_to_complete desc limit 50
```
## Backing Up
You always need to back up flat tables before any rebuilding process. Anything can happen!
```sh
ALTER TABLE `etl`.`flat_orders` RENAME TO `etl`.`flat_orders_backup_20_05_2017` ;
```
# Release History
## Flat Obs Release History
* v1.0 Notes:
* added encounter types: 1,2,3,4,5,6,7,8,9,10,13,14,15,17,19,22,23,26,43,47,21
* Replace flat_obs with flat_obs_name
* Replace concept_id in () with concept_id in (obs concept_ids)
* Add column definitions
* Add obs_set column definitions
* v1.1 Notes:
* Added visit_id. This makes it easier to query for visits related indicators
* v1.3 Notes:
* Added updated encounter tracking when updating flat_obs
* Removed voided patients data from flat_obs
## Flat HIV Summary Release History
* v2.1 Notes:
* Updated out_of_care to include untraceable
* Added tb_prophylaxis_start_date
* Updated patient_care_status to be more inclusive of other status questions
* Fixed problem with next_clinic_datetime_hiv
* v2.2 Notes:
* Added encounter types for GENERALNOTE (112), CLINICREVIEW (113), MOH257BLUECARD (114), HEIFOLLOWUP (115), TRANSFERFORM (116)
* v2.3 Notes:
* Added arv_first_regimen_start_date and arv_start_location. This makes it easier to query for the cumulative ever indicator
* Added visit_id This makes it easier to query visits related indicators eg scheduled, unscheduled
* Added prev_clinical_rtc_date_hiv and next_clinical_rtc_date_hiv this required for creating outreach dataset
* v2.4 Notes:
* Corrected Errors with definations for vl_1 and vl_2
* v2.5 Notes:
* Changed the definition arv_start_date to account for regimen changes
* Added prev_arv_start_date track when they stared the previous regimen
* Added prev_arv_end_date to track when they stopped the previous regimen
* Added prev_arv_line to track the arv line the patient was previously on
* Added prev_arv_meds to track the arv medications the patient was previously on
* Fixed arv_start_date and arv_first_regimen_start_date
* v2.7 Notes:
* This indicators were added inorder to produce vl supression assesment report
* Added hiv_status_disclosed indicator
* Added prev_arv_adherence, cur_arv_adherence indicators
* Added prev_vl_1_date and prev_vl_1
* fixed cur_clinic_datetime to include NONCLINICALENCOUNTER Encounter type
* v2.8 Notes:
* Removed prev_vl_1_date and prev_vl_1
* Fixed vl_2 and vl_2_date
* v2.9 Notes:
* Added tb_prophylaxis_end_date and modified tb_prophylaxis_start_date
* fixed tb_tx_start_date defintion
* v2.10 Notes:
* added encounter type 120 to is_clinical_encounter
* added encounter type index so that flat_defaulters can execute much faster
* added tb_screening_result indicator
* added index for location_uuid_rtc_date
* added hiv_exposed_occupational and pep_start_date
* v2.11 Notes:
* Added encounter date_change tracking to update patients whose encounter
* data changes without affecting obs or orders eg a change in encounter location
* removed PEP indicators (hiv_exposed_occupational and pep_start_date) from this flat table: this was affecting hiv active in care.
* v2.12 Notes:
* added encounter type 127 and 128 to is_clinical_encounter
* Added Concept TRANSFER CARE TO OTHER CENTER to transfer_out indicator
* Added ability to rebuild/sync at any day of the week
* Added ability to continue rebuilding in case of an error
* Renamed new_data_person_ids table to flat_hiv_summary_queue and made it permanent
* Added 1594 (PATIENT TRANSFERRED OUT) to transfer_out indicator
Binary file added model.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit 8418e73

Please sign in to comment.