Validate accuweather data #83

Open
erensezener opened this issue Jul 14, 2016 · 6 comments
@erensezener
Contributor

Please download the data with the _aw suffix here: https://drive.google.com/folderview?id=0BwQc_CC3arWWMTNYaEpCOHlKZmc&usp=sharing

Then look at nanmax(), unique(), etc. of the columns and at the number of entries to see whether it makes sense.
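A minimal sketch of such a check, assuming the file and dataset names used further down in this thread (hourly_database.hdf5, weather_data) and that all columns are numeric; since the column names/order are not spelled out here, the loop just reports per-column statistics:

    import h5py
    import numpy as np

    # File and dataset names are taken from the example later in this thread.
    with h5py.File('hourly_database.hdf5', 'r') as h5:
        data = h5['weather_data'][:]

    print('number of rows:', data.shape[0])
    for col in range(data.shape[1]):
        values = data[:, col]
        finite = values[~np.isnan(values)]
        print('column', col,
              '| nanmax:', np.nanmax(values) if finite.size else 'all NaN',
              '| unique (first 5):', np.unique(finite)[:5])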

@erensezener
Contributor Author

It works like this:

>>> import h5py
>>> import numpy as np
>>> h5 = h5py.File('hourly_database.hdf5', 'r')
>>> data = h5['weather_data'][:]
>>> np.unique(data[:, 2])
array([  0.00000000e+00,   1.00000000e+00,   4.00000000e+00,
         2.01606212e+11])

Beware that the data is padded with rows of zeros at the bottom.
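One way to drop that padding before computing statistics, as a sketch assuming a row is padding exactly when every entry is zero:

    import h5py
    import numpy as np

    with h5py.File('hourly_database.hdf5', 'r') as h5:
        data = h5['weather_data'][:]

    # Assumption: real rows always contain at least one non-zero value,
    # so an all-zero row can only be bottom padding.
    data = data[~np.all(data == 0, axis=1)]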

@denisalevi
Contributor

I have checked the data using nanmax(); the values are reasonable.
Temperature, cloud_cover and station_id are zero (since they are not included in the daily database I tested on).

I didn't get what information unique() was supposed to give me; I only get NaNs.

I have also looked through the caught exceptions and fixed some of them, so if the data is needed and you have time tonight, you can rerun it. But it will only add data from ~60 / 3000 HTML files, which are not in the current database because they threw errors before.
There are still 184 / 3000 HTML files that raise a UnicodeDecodeError; nothing I can do about that right now.
And 304 / 3000 files are not included because French city forecasts were downloaded instead of German ones. To my surprise, that was happening for an entire month (around 28.4. - 28.5.), so for that period there is no data.

If you run the scraper again, can you change the ex == ...Error to type(ex) == ...Error? Then the error counters work properly:

    try:
        sc_ac(date_string, city, DATAPATH)
    except Exception as ex:
        if type(ex) == AssertionError:
            assertion_count += 1
        elif type(ex) == UnicodeDecodeError:
            unicode_count += 1
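An equivalent and arguably more idiomatic variant would be to catch the two exception types directly (sketch only, using the names from the snippet above):

    try:
        sc_ac(date_string, city, DATAPATH)
    except AssertionError:
        assertion_count += 1
    except UnicodeDecodeError:
        unicode_count += 1

Note that unlike the except Exception version, this lets any other exception propagate instead of being silently swallowed.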

@erensezener
Contributor Author

Did you push your changes?

@erensezener
Contributor Author

I pulled some stuff, but I didn't notice your changes to the file. Maybe I overlooked them.

@denisalevi
Contributor

I thought I had pushed my changes, but they only touched my accuweather/functions.py file. You didn't get those?


@erensezener
Contributor Author

I pulled and started running it an hour ago, so results should be ready soon.
