Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate data from 2011 till 2017 from API into Ingest. #735

Open
robouden opened this issue Nov 19, 2020 · 6 comments
Open

Migrate data from 2011 till 2017 from API into Ingest. #735

robouden opened this issue Nov 19, 2020 · 6 comments

Comments

@robouden
Copy link
Contributor

For the 10th anniversary of the Safecast it would be great if we can show all our data in Grafana.I would like to migrate the data from 2011 till the time we started using Ingest into ingest, to be able to pull with Grafana or other scripts all the data for the anniversary in one dashboard.

@matschaffer
Copy link
Contributor

Likely challenges:

  • mapping from old numeric device id to device urn
  • accounting for duplication (ttserve publishes api/ingest): maybe do it base on point in time for when ttserve introduced the feature.

@matschaffer matschaffer changed the title Migrade data from 2011 till 2017 from API into Ingest. Migrate data from 2011 till 2017 from API into Ingest. Nov 25, 2020
@matschaffer
Copy link
Contributor

matschaffer commented Dec 23, 2020

First choice: https://grafana.safecast.cc/d/DFSxrOLWk/safecast-device-details?orgId=1&from=now-5y&to=now&var-device_urn=pointcast:10009 - https://rt.safecast.org/sensors/100092

Second choice: https://grafana.safecast.cc/d/DFSxrOLWk/safecast-device-details?orgId=1&from=now-5y&to=now&var-device_urn=ngeigie:40 - https://realtime.safecast.org/sensor/40

These devices might be good choices to try to establish a continuous line since the devices have been near the Fukushima area and relatively stable over the last 10 years.

@matschaffer
Copy link
Contributor

matschaffer commented Dec 23, 2020

Let's see if we can get some data in place by Feb 11 so we have a month before the 10th anniversary to evaluate and write things up for discussion during the Feb 16th board meeting.

@matschaffer
Copy link
Contributor

matschaffer commented Dec 23, 2020

Noting that https://github.com/sakshamg94/safecast-unsupervised-anomaly-detection could maybe help here

@matschaffer
Copy link
Contributor

@matschaffer
Copy link
Contributor

So what I did here is first, a distinct query to find the API device_id numbers located in the area pointcast:10009 was in most recently.

select distinct(device_id)
from measurements
where ST_DWithin(location, ST_GeogFromText('POINT (140.978431 37.32925)'), 1000);

This yielded

42
100091
100092
100099

Then I found the point where the data started for ingest for that device (just from grafana output): 2017-09-07T08:28:24Z

Then I exported all the measurements (IntelliJ was handy for this, but easy to script):

select *
from measurements
where
      device_id in (42, 100091, 100092, 100099) and
      captured_at < '2017-09-07T08:28:24Z';

Then I imported it using the kibana data visualizer: https://5bc3d4f7330a4459881197a85659caf5.us-west-2.aws.found.io:9243/app/ml#/datavisualizer which can import and auto-discover fields for CSVs less than 100MB (this was 98M).

For the moment I just put all of it into a single api-measurements-ms-2021-02-19 index, but I think we could do the same thing to import everything that's in api's measurements table (maybe yearly indices just to keep size under control).

Once it's all in there graphing becomes just a matter of picking what you want and throwing it into grafana.

Massaging capabilities are a little limited, but getting better over time (check out runtime fields) so I think this could be a good way to catalog all the data in an explorable way.

We could also probably look at dumping all of the tables to s3+presto for folks who want to do their own massaging.

@matschaffer matschaffer removed their assignment Apr 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants