-------------------------------------------------(b)-------------------------------------------------

  • Yearly Analysis: Prices mostly increase until November and then drop sharply in December, the last month of the year.
  • Probable reasons:

    1. Supply increases in December, which pushes prices down.
    2. Several factors, such as weather and soil conditions, drive the increase in prices during the rest of the year.
  • Quarterly Analysis
    1. Quarter-wise, there is not much difference between the 1st and 2nd quarters, but the 3rd quarter stands out: prices are highest during July, August and September.
  • Analysis regarding Variety, Market and Prices
    1. The Fatehpur and Samsabad markets mostly sell the local variety, while the other markets sell Desi.
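
The monthly, quarterly and market/variety observations above can be reproduced with a few pandas group-bys. This is a minimal sketch, not the code from the Price Trends notebook: it assumes the exported CSV has been loaded into a DataFrame with columns named `Price Date`, `Market Name`, `Variety` and `Modal Price` (assumed names; adjust them to the actual export headers), and the sample rows are made up purely to show the output shape.

```python
import pandas as pd


def summarize_prices(df: pd.DataFrame):
    """Return (monthly mean, quarterly mean, market x variety record counts).

    Column names are assumptions about the exported CSV, not confirmed headers.
    """
    df = df.copy()
    # Parse dates; pass format=... if the export uses a non-ISO date layout.
    df["Price Date"] = pd.to_datetime(df["Price Date"])
    dates = df["Price Date"].dt
    # Mean modal price per calendar month (1 = January, ..., 12 = December).
    monthly = df.groupby(dates.month)["Modal Price"].mean()
    # Mean modal price per calendar quarter (3 = July to September).
    quarterly = df.groupby(dates.quarter)["Modal Price"].mean()
    # Number of records for every (market, variety) pair.
    variety_by_market = pd.crosstab(df["Market Name"], df["Variety"])
    return monthly, quarterly, variety_by_market


# Made-up sample rows, only to illustrate the output.
sample = pd.DataFrame({
    "Price Date": ["2019-01-10", "2019-07-10", "2019-11-10", "2019-12-10"],
    "Market Name": ["Fatehpur", "Samsabad", "Jagnair", "Jagnair"],
    "Variety": ["Local", "Local", "Desi", "Desi"],
    "Modal Price": [500, 900, 1200, 700],
})

monthly, quarterly, variety_by_market = summarize_prices(sample)
print(monthly)
print(quarterly)
print(variety_by_market)
```

Plotting `monthly` and `quarterly` as bar charts gives the yearly and quarterly views, and the crosstab shows at a glance which variety each market mostly sells.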
-------------------------------------------------(c)-------------------------------------------------
  • Data Pre-processing techniques
    1. Checking for missing values
    2. Converting categorical values to numeric
    3. Scaling values (normalization/standardization), which helps models converge faster during training
    4. After analysing the data (refer to the Price Trends notebook), I found that there is one data point with the variety "Potato" in the Achmera market and one with the variety "Other" in the Jagnair market. These points need handling: we can either remove them or replace their variety with the most frequent variety, Desi.
  • Relevant Features
    1. Commodity
    2. Variety, because different varieties have different prices
    3. Time
    4. Market Name, because prices differ from market to market
  • ML problem
    As time is the most important factor here, we can treat this as a forecasting problem, i.e. use time as the independent variable and forecast future prices.

    Why Forecasting?
    Prices depend on time and the data is a time series, so methods built for time-series forecasting, such as the ones below, are a natural fit.

    1. Classic ARIMA/SARIMA
    2. Deep learning (RNN or LSTM)
  • Target Variable
    1. Modal Price
  • Loss function
    1. Since the target (Modal Price) is a continuous value, we could use mean squared error (MSE), which measures the average squared difference between the forecasted and actual prices.
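
The four pre-processing steps listed above could look roughly like this in pandas. It is a sketch under assumptions: the column names `Variety`, `Market Name` and `Modal Price` are assumed, the stray varieties are detected with a hypothetical rule (any variety seen fewer than `min_count` times is replaced by the most frequent one, which would catch the single Achmera and Jagnair records described above), and categorical columns are one-hot encoded, which is one of several possible encodings.

```python
import pandas as pd


def report_missing(df: pd.DataFrame) -> pd.Series:
    """Step 1: number of missing values in each column."""
    return df.isna().sum()


def preprocess(df: pd.DataFrame, min_count: int = 2) -> pd.DataFrame:
    """Apply steps 1-4 to a copy of df. Column names and min_count are assumptions."""
    # Step 1: drop rows whose target (Modal Price) is missing.
    df = df.dropna(subset=["Modal Price"]).copy()

    # Step 4: replace varieties seen fewer than min_count times
    # with the most frequent variety (Desi in the data described above).
    counts = df["Variety"].value_counts()
    top_variety = counts.idxmax()
    rare = counts[counts < min_count].index
    df.loc[df["Variety"].isin(rare), "Variety"] = top_variety

    # Step 2: categorical -> numeric, one 0/1 column per category.
    df = pd.get_dummies(df, columns=["Variety", "Market Name"], dtype=int)

    # Step 3: standardize the price to zero mean and unit variance.
    mean = df["Modal Price"].mean()
    std = df["Modal Price"].std(ddof=0)
    df["Modal Price (scaled)"] = (df["Modal Price"] - mean) / std
    return df


# Made-up rows: one "Potato" and one "Other" record, plus one missing price.
raw = pd.DataFrame({
    "Market Name": ["Achmera", "Achmera", "Jagnair", "Fatehpur",
                    "Samsabad", "Achmera", "Jagnair", "Jagnair"],
    "Variety": ["Desi", "Desi", "Desi", "Local",
                "Local", "Potato", "Other", "Desi"],
    "Modal Price": [1000, 1100, 1200, 900, 950, 1050, 980, None],
})
print(report_missing(raw))
print(preprocess(raw))
```

For a forecasting model, the mean and standard deviation in step 3 should be computed on the training period only; scaling with statistics from the whole dataset leaks information from the future into training.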

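Before fitting ARIMA/SARIMA or an LSTM, it helps to have a simple baseline and to score it with the proposed MSE loss on a time-ordered split (train on earlier months, test on later ones, never shuffle). The sketch below uses a seasonal-naive forecast, a swapped-in baseline rather than either of the two methods listed above: each future month is predicted with the price from the same month one season earlier. It uses only the standard library; `season_length=12` assumes monthly average prices, and the numbers are made up.

```python
from typing import List, Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error: the average of the squared forecast errors."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def seasonal_naive_forecast(history: Sequence[float], horizon: int,
                            season_length: int = 12) -> List[float]:
    """Predict each future step with the value observed one season before it."""
    if len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    last_season = list(history[-season_length:])
    return [last_season[i % season_length] for i in range(horizon)]


# Made-up monthly average prices: 12 months of history plus 3 later months.
series = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 12, 11, 12, 14]
horizon = 3
train, test = series[:-horizon], series[-horizon:]  # time-ordered split
forecast = seasonal_naive_forecast(train, horizon)
print(forecast)             # [10, 11, 12]
print(mse(test, forecast))  # 2.0
```

MSE is expressed in squared price units; its square root (RMSE) is in the same units as Modal Price and is easier to read. An ARIMA/SARIMA or LSTM model is only worth its extra complexity if it beats this baseline's MSE on the same split.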
------------------------------------------Uploaded file Info-----------------------------------------

    1. crawler.py - contains the crawler, but it only scrapes the table from the first page.
    2. crawler.ipynb - a step-by-step explanation of what the crawler does.
    3. (b) price trends analysis - contains the data and the visualizations from which the observations above are derived.
    4. I was not able to scrape the tables from every page (but I am willing to learn how to do it), so instead I downloaded the data with the site's "export to CSV" option and did the analysis on that.
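
One common way to go past the first page is to keep requesting pages until one comes back without data rows, parsing each page's table the same way. The sketch below shows only that loop plus a standard-library table parser. `fetch_page` is a hypothetical function you would supply (for example, one that requests page N of the results), because how the site actually paginates (URL parameter, form post-back, etc.) is not described here and is an assumption; the parser also assumes well-formed `<tr>`/`<td>` markup.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional


class TableRowParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # ignore text outside <td> (e.g. <th> headers)
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain no <td>, so they stay empty
                self.rows.append(self._row)
            self._row = None


def scrape_all_pages(fetch_page: Callable[[int], str],
                     max_pages: int = 1000) -> List[List[str]]:
    """Fetch pages 1, 2, ... (via the hypothetical fetch_page) until one has no rows."""
    all_rows: List[List[str]] = []
    for page in range(1, max_pages + 1):
        parser = TableRowParser()
        parser.feed(fetch_page(page))
        parser.close()
        if not parser.rows:
            break
        all_rows.extend(parser.rows)
    return all_rows


# Fake pages standing in for real HTTP responses.
fake_pages = {
    1: "<table><tr><th>Market</th><th>Price</th></tr>"
       "<tr><td>Fatehpur</td><td>900</td></tr></table>",
    2: "<table><tr><td>Jagnair</td><td>980</td></tr></table>",
}
rows = scrape_all_pages(lambda n: fake_pages.get(n, "<table></table>"))
print(rows)  # [['Fatehpur', '900'], ['Jagnair', '980']]
```

`max_pages` is a safety cap in case the site never returns an empty page (some sites keep returning the last page instead).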

    PS: I used Medium and Pluralsight articles as references to build the crawler.