- TaeHyoung Kim (s0dkhim)
- YunAh Baek (yunah0515)
- WoongKooK Seo (woongkook)
- sales data for all stores & dates in the training set
- stores & dates for forecasting (missing 'units', which you must predict)
- the relational mapping between stores and the weather stations that cover them
- a file containing the NOAA weather information for each station and day
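The four files relate to each other through `store_nbr`, `station_nbr` and `date`. A minimal sketch of how they might be joined, assuming pandas and using tiny in-memory stand-ins instead of the real files (the frame names `train`, `key`, `weather` and the helper `attach_weather` are illustrative, not taken from the project code):

```python
import pandas as pd

# Tiny stand-ins for the sales, mapping and weather files (assumed layout).
train = pd.DataFrame({
    "date": ["2012-01-01", "2012-01-01", "2012-01-02"],
    "store_nbr": [1, 2, 1],
    "item_nbr": [9, 9, 9],
    "units": [29, 0, 31],
})
key = pd.DataFrame({"store_nbr": [1, 2], "station_nbr": [1, 14]})
weather = pd.DataFrame({
    "station_nbr": [1, 14, 1],
    "date": ["2012-01-01", "2012-01-01", "2012-01-02"],
    "tavg": ["42", "51", "M"],  # raw weather columns may hold text codes
})

def attach_weather(sales, key, weather):
    """Map each store to its station, then join that station's daily weather."""
    with_station = sales.merge(key, on="store_nbr", how="left")
    return with_station.merge(weather, on=["station_nbr", "date"], how="left")

merged = attach_weather(train, key, weather)
print(merged)
```

Left joins keep every sales row even when a station has no weather record for that day, so gaps show up as missing values rather than dropped rows.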
Index | Feature | Description |
---|---|---|
1 | units | the quantity sold of an item on a given day (Target) |
2 | date | the day of sales or weather |
3 | store_nbr | an id representing one of the 45 stores |
4 | station_nbr | an id representing one of 20 weather stations |
5 | item_nbr | an id representing one of the 111 products |
6 | tmax | maximum degrees Fahrenheit |
7 | tmin | minimum degrees Fahrenheit |
8 | tavg | average degrees Fahrenheit |
9 | depart | departure from normal |
10 | dewpoint | average dew point |
11 | wetbulb | average wet bulb |
12 | heat | heating (season begins with July) |
13 | cool | cooling (season begins with January) |
14 | sunrise | sunrise (calculated, not observed) |
15 | sunset | sunset (calculated, not observed) |
16 | codesum | significant weather types (weather phenomena) |
17 | snowfall | snowfall (inches and tenths); T = Trace, M = Missing data |
18 | preciptotal | water equivalent (inches and hundredths); T = Trace, M = Missing data |
19 | stnpressure | average station pressure |
20 | sealevel | average sea level pressure |
21 | resultspeed | resultant wind speed |
22 | resultdir | resultant wind direction |
23 | avgspeed | average wind speed |
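Because `snowfall` and `preciptotal` mix numbers with the codes `T` and `M`, they need converting before any numeric analysis. The sketch below is one possible approach, assuming pandas; treating a trace amount as `0.0` is an assumption made here for illustration, and `parse_weather_value` is a hypothetical helper:

```python
import math
import pandas as pd

def parse_weather_value(raw, trace_value=0.0):
    """Convert one raw weather cell to a float.

    'M' (missing) becomes NaN; 'T' (trace) becomes trace_value,
    which is an assumed stand-in for a tiny, unmeasured amount.
    """
    text = str(raw).strip()
    if text == "M":
        return math.nan
    if text == "T":
        return trace_value
    try:
        return float(text)
    except ValueError:
        return math.nan  # any other unexpected code is treated as missing

preciptotal = pd.Series(["0.05", "T", "M", "  T", "1.20"])
numeric = preciptotal.map(parse_weather_value).astype(float)
print(numeric.tolist())
```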
- Missing values: filled with the most recent available value
- Excluding rows where units is 0
- Weather table: codesum removed and missing data handled
- Adding holiday and other variables
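Filling a gap with the most recent value corresponds to a forward fill. A minimal sketch, assuming pandas and assuming the fill is done separately for each station in date order (the grouping choice and the helper name `fill_with_most_recent` are illustrative):

```python
import numpy as np
import pandas as pd

weather = pd.DataFrame({
    "station_nbr": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2012-01-01", "2012-01-02", "2012-01-03", "2012-01-01", "2012-01-02"]
    ),
    "tavg": [40.0, np.nan, 44.0, np.nan, 55.0],
})

def fill_with_most_recent(weather, value_cols):
    """Forward-fill each station's readings in date order."""
    out = weather.sort_values(["station_nbr", "date"]).copy()
    # Grouping keeps one station's values from leaking into another's.
    out[value_cols] = out.groupby("station_nbr")[value_cols].ffill()
    return out

filled = fill_with_most_recent(weather, ["tavg"])
print(filled)
```

A gap on a station's first day has no earlier value to copy, so it stays missing and needs a separate rule.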
- The closer the value is to zero, the less distortion
- Normalization of target data
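Assuming the value in question is the skewness of `units` and that the normalization is a `log1p` transform (neither is stated explicitly here), the effect can be sketched with NumPy as follows. The sample values are made up for illustration:

```python
import numpy as np

def skewness(values):
    """Population skewness: third central moment over variance**1.5."""
    x = np.asarray(values, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

# Made-up, right-skewed sales counts.
units = np.array([1, 1, 2, 2, 3, 4, 6, 9, 15, 120], dtype=float)
log_units = np.log1p(units)      # log(1 + units), defined at 0

print("raw skewness:", skewness(units))
print("log1p skewness:", skewness(log_units))

# Predictions made on the log scale are mapped back with expm1.
restored = np.expm1(log_units)
```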
- Categorical variable analysis
- Numerical variables analysis: select 9 out of 17
- Multicollinearity
- Selecting the 9 most influential numerical variables
- VIF
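The variance inflation factor for a variable is 1 / (1 - R²), where R² comes from regressing that variable on all the others; large values flag multicollinearity. A NumPy-only sketch on synthetic data follows (the helper name and the generated columns are illustrative, and the project itself may have used a library routine instead):

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of every column of X (rows = observations)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ coef
        r_squared = 1.0 - residual.var() / target.var()
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs

rng = np.random.default_rng(0)
tmax = rng.normal(70, 10, 500)
tmin = tmax - 15 + rng.normal(0, 1, 500)  # nearly collinear with tmax
avgspeed = rng.normal(8, 3, 500)          # unrelated to temperature

vifs = variance_inflation_factors(np.column_stack([tmax, tmin, avgspeed]))
print(dict(zip(["tmax", "tmin", "avgspeed"], vifs.round(2))))
```

In this synthetic case `tmax` and `tmin` get very large VIFs while `avgspeed` stays near 1, which is the pattern that motivates dropping one of a pair of strongly related weather variables.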
- Modeling function
- OLS (Ordinary Least Squares)
- Modeling per store: outliers removed
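One way to fit a separate OLS model per store with outliers removed is sketched below. The outlier rule (drop points whose residual exceeds 3 standard deviations, then refit) is an assumption, as are the column name `log_units`, the synthetic data and all helper names; `np.linalg.lstsq` stands in for whatever OLS routine the project used:

```python
import numpy as np
import pandas as pd

def design_matrix(X):
    """Prepend an intercept column."""
    return np.column_stack([np.ones(len(X)), X])

def fit_ols(X, y):
    """Least-squares coefficients: [intercept, slope_1, ...]."""
    coef, *_ = np.linalg.lstsq(design_matrix(X), y, rcond=None)
    return coef

def fit_without_outliers(X, y, z_cutoff=3.0):
    """Fit once, drop large-residual rows (assumed rule), then refit."""
    residuals = y - design_matrix(X) @ fit_ols(X, y)
    keep = np.abs(residuals) <= z_cutoff * residuals.std()
    return fit_ols(X[keep], y[keep])

def fit_per_store(df, feature_cols, target_col="log_units"):
    """Return {store_nbr: coefficients}, one model per store."""
    return {
        store: fit_without_outliers(
            group[feature_cols].to_numpy(dtype=float),
            group[target_col].to_numpy(dtype=float),
        )
        for store, group in df.groupby("store_nbr")
    }

# Synthetic demo: two stores with different temperature effects.
rng = np.random.default_rng(1)
frames = []
for store, slope in {1: 0.5, 2: -0.2}.items():
    tavg = rng.uniform(30, 90, 200)
    log_units = 2 + slope * tavg + rng.normal(0, 0.1, 200)
    frames.append(pd.DataFrame(
        {"store_nbr": store, "tavg": tavg, "log_units": log_units}
    ))
df = pd.concat(frames, ignore_index=True)
df.loc[0, "log_units"] += 50  # inject one gross outlier into store 1

models = fit_per_store(df, ["tavg"])
print({store: coef.round(3) for store, coef in models.items()})
```

Fitting per store lets each location have its own response to the weather variables, at the cost of fewer observations per model.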
- Total teams: 485
- Final score: 0.51053
- Leaderboard: 361 / 485
- New feature selection
- Modeling
- Score