
Predict fails with "cannot allocate vector of size..." with large number of prediction points #215

Open
moredatapls opened this issue Jun 8, 2017 · 6 comments

Comments

@moredatapls

Hi,
I am currently trying to use Prophet to predict byte flows in a network. The data set has the following properties:

  • 4 periods (1 period = 1 week)
  • minute granularity
  • 44355 data points

The first couple of lines look as follows, with y being the number of bytes monitored in the network:

# ds y
1 2016-08-18 19:00:00 8476391449
2 2016-08-18 19:01:00 6432109555
3 2016-08-18 19:02:00 4039378069
4 2016-08-18 19:03:00 4124949796
5 2016-08-18 19:04:00 6911448995
6 2016-08-18 19:05:00 4021931960

I wrote some simple R code to fit the model and predict a single period:

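library(prophet)
# data: data.frame with columns ds (minute timestamps) and y (bytes)
fit <- prophet(data)
# Append one future period to the history (freq defaults to "day")
future <- make_future_dataframe(fit, periods = 1)
forecast <- predict(fit, future)
plot(fit, forecast)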

Fitting the model works, but during prediction R allocates more and more memory until both RAM and swap are exhausted (I have a machine with 32 GB of RAM, Debian 8.8 Jessie, R 3.3.3). Meanwhile, a single CPU core stays at 100% load. The program eventually crashes with the following error:

Error: cannot allocate vector of size 242.4 Mb
Execution halted
Warning message:
system call failed: Cannot allocate memory

I've done the same thing with the Holt-Winters prediction built into R, which works flawlessly and uses around 300 MB of RAM.

Any ideas what the problem is here? Could it be the granularity?

@bletham
Contributor

bletham commented Jun 13, 2017

Prophet currently uses the Date data type in R, and so it only works with daily data. In this case I expect that R is truncating the timestamps to dates, so every observation within a day ends up with the same date, which might give wacky results (even if it did work). If you are willing to work with daily predictions, you'd want to aggregate to daily data before passing it to Prophet. Hopefully there will soon be support for finer-granularity data (#29); it's mostly a matter of getting rid of all of the as.Date calls.
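For the daily-aggregation route, a minimal sketch in plain Python (the language used later in this thread) is below. `aggregate_daily` is a hypothetical helper, not part of Prophet, and it assumes that summing bytes per calendar day is the right aggregation for this data; with pandas the same step is usually `df.resample("D", on="ds").sum()`.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_daily(rows):
    """Sum (timestamp, y) rows into one (date, total) row per calendar day.

    Hypothetical helper: assumes 'YYYY-MM-DD HH:MM:SS' timestamps and that a
    daily sum (total bytes per day) is the aggregation you want.
    """
    totals = defaultdict(int)
    for ts, y in rows:
        day = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date()
        totals[day] += y
    # Sorted by day, ready to be used as the ds / y columns of a daily frame
    return [(day.isoformat(), totals[day]) for day in sorted(totals)]


# First rows from the report above (all on the same day)
rows = [
    ("2016-08-18 19:00:00", 8476391449),
    ("2016-08-18 19:01:00", 6432109555),
    ("2016-08-18 19:02:00", 4039378069),
    ("2016-08-18 19:03:00", 4124949796),
    ("2016-08-18 19:04:00", 6911448995),
    ("2016-08-18 19:05:00", 4021931960),
]
print(aggregate_daily(rows))  # [('2016-08-18', 34006209824)]
```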

That said, this particular issue may remain. I haven't tried evaluating a forecast on that many data points. For uncertainty estimation, Prophet constructs a ~ndata x 1000 matrix, which I suspect is the issue. This is something we will have to make more efficient, so that it is not all held in RAM for finer-grained data.

@bletham
Contributor

bletham commented Jul 30, 2017

Support for sub-daily data (#29) is now finished in the v0.2 branch, and so this issue needs to be addressed.

The issue is in calculating the lower and upper uncertainty intervals for predictions. Suppose we are making predictions on a dataframe with n_dates rows (the datetimes to predict). The Prophet argument uncertainty_samples controls how many draws of future trends / posterior parameter values are used to estimate the uncertainty intervals, and Prophet constructs an n_dates x uncertainty_samples matrix filled with these predictions. Most daily datasets have at most a few thousand n_dates, but sub-daily data can easily have many thousands of datetimes in the history. Since by default we make predictions on all of them, constructing this n_dates x 1000 matrix can cause RAM issues.
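To put rough numbers on the n_dates x uncertainty_samples matrix, here is a back-of-the-envelope sketch. `interval_matrix_bytes` is a hypothetical helper that assumes a single matrix of 8-byte floats; the real peak is likely a multiple of this, since several such matrices and intermediate copies may be alive at once.

```python
def interval_matrix_bytes(n_dates, uncertainty_samples=1000, bytes_per_value=8):
    """Approximate size in bytes of one n_dates x uncertainty_samples float matrix."""
    return n_dates * uncertainty_samples * bytes_per_value


MIB = 2 ** 20

# A prediction frame the size of the original report: 44355 history rows + 1 future row
full = interval_matrix_bytes(44355 + 1)
print(round(full / MIB, 1), "MiB")  # 338.4 MiB

# Reducing uncertainty_samples shrinks the matrix proportionally
reduced = interval_matrix_bytes(44355 + 1, uncertainty_samples=100)
print(round(reduced / MIB, 1), "MiB")  # 33.8 MiB
```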

It seems the right approach here is to chunk the prediction dataframe (i.e., along n_dates) and compute the percentiles separately for each chunk. Rather than having Prophet do this, I think we can just catch this error (or the corresponding MemoryError in Python) inside sample_posterior_predictive and raise an exception explaining that there are too many dates in the prediction dataframe and that it should be split into chunks (or that uncertainty_samples should be reduced).
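Chunking works because each row's interval depends only on that row's draws, so rows can be processed a slice at a time and peak memory is bounded by chunk_size x uncertainty_samples instead of n_dates x uncertainty_samples. A minimal sketch follows; `chunked_intervals` and `draw_fn` are hypothetical names (`draw_fn` stands in for whatever simulates the draws for one prediction row), and the 10th/90th percentiles assume an 80% interval, which is Prophet's default interval width.

```python
import random
import statistics


def chunked_intervals(n_rows, draw_fn, uncertainty_samples=1000,
                      chunk_size=5000, lower_pct=10, upper_pct=90):
    """Return a list of (lower, upper) tuples, one per prediction row.

    draw_fn(row, n) is a hypothetical stand-in that returns n simulated
    predictions for one row. At most chunk_size x uncertainty_samples draws
    are held in memory at once.
    """
    intervals = []
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        # Draws for this chunk only; released before the next chunk is built
        chunk = [draw_fn(row, uncertainty_samples) for row in range(start, stop)]
        for draws in chunk:
            cuts = statistics.quantiles(draws, n=100)  # 99 percentile cut points
            intervals.append((cuts[lower_pct - 1], cuts[upper_pct - 1]))
    return intervals


def noisy_draws(row, n):
    # Hypothetical draws: Gaussian noise around a level equal to the row index
    rng = random.Random(row)
    return [row + rng.gauss(0, 1) for _ in range(n)]


intervals = chunked_intervals(2000, noisy_draws, uncertainty_samples=200, chunk_size=500)
print(len(intervals))  # 2000
```

The chunk size only trades memory for loop overhead; the resulting intervals are the same whatever chunk_size is used.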

PR welcome!
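The other option, catching the allocation failure and re-raising it with actionable advice, could look roughly like this in Python. `build_draw_matrix` and its `allocate` argument are hypothetical names for illustration; this is not the actual code of sample_posterior_predictive.

```python
def build_draw_matrix(n_dates, uncertainty_samples, allocate):
    """Build the n_dates x uncertainty_samples matrix, or explain how to avoid it.

    allocate(rows, cols) is a hypothetical stand-in for the code that fills
    the matrix of simulated predictions.
    """
    try:
        return allocate(n_dates, uncertainty_samples)
    except MemoryError as err:
        # Re-raise with guidance, keeping the original error as the cause
        raise MemoryError(
            f"Not enough memory to estimate uncertainty intervals for {n_dates} "
            f"prediction dates x {uncertainty_samples} samples. Split the "
            "prediction dataframe into smaller chunks and predict on each, "
            "or reduce uncertainty_samples."
        ) from err


def zeros(rows, cols):
    # Plain-Python stand-in for allocating the matrix of simulated predictions
    return [[0.0] * cols for _ in range(rows)]


matrix = build_draw_matrix(3, 1000, zeros)
print(len(matrix), len(matrix[0]))  # 3 1000
```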

@bletham bletham added this to the v0.2-release milestone Jul 30, 2017
@bletham bletham removed this from the v0.2-release milestone Sep 12, 2017
@kikizxd

kikizxd commented Oct 12, 2017

Hi,
I am currently trying to use Prophet to predict flows in a network. The data looks like this:

ds y
2017-07-28 09:30:00 5012
2017-07-28 09:30:01 3582
2017-07-28 09:30:02 3205
2017-07-28 09:30:03 2680
... ... ... ...
2017-07-28 10:30:00 4510
2017-08-28 09:30:00 6831
2017-08-28 09:30:01 4370
2017-08-28 09:30:02 3196
2017-08-28 09:30:03 2642
... ... ... ...
2017-08-28 10:30:00 1862
2017-09-28 09:30:00 4382
2017-09-28 09:30:01 5542
2017-09-28 09:30:02 2572
2017-09-28 09:30:03 2576
... ... ... ...
2017-09-28 10:30:00 2770

I want to predict the next period: 2017-10-28 from 09:30:00 to 10:30:00. However, it seems that Prophet can't predict second-level data. I tried to adapt 'forecaster.py', but it didn't work.

I need your help, please.

@bletham
Contributor

bletham commented Oct 17, 2017

Prophet now does support second-level data, as of v0.2. You can check what version you have with

import fbprophet
print(fbprophet.__version__)  # second-level data needs 0.2 or later
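If a script should fail early on an older install, a small version check can help. `supports_subdaily` is a hypothetical helper that assumes the version string starts with numeric major.minor components, as in 0.2 or 0.2.1.

```python
import re


def supports_subdaily(version):
    """Return True if a 'major.minor[...]' version string is 0.2 or newer."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = (int(part) for part in match.groups())
    # Sub-daily (including second-level) data is supported from v0.2 on
    return (major, minor) >= (0, 2)


print(supports_subdaily("0.1.1"))  # False
print(supports_subdaily("0.2.1"))  # True
```

In a session where fbprophet is importable, the check would be `supports_subdaily(fbprophet.__version__)`.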

@kikizxd

kikizxd commented Oct 18, 2017

Thanks for letting me know, @bletham.
I have downloaded v0.2, but I found another problem.
My data is not continuous: it covers only one day per month.
Is there something I can do to handle this?

@bletham
Contributor

bletham commented Nov 2, 2017

@kikizxd sorry for the slow reply. This is an interesting challenge. Make sure that daily seasonality is turned on, but weekly and yearly seasonality are turned off. I would expect it to work if you have enough days of data. Otherwise, you could definitely get an estimate of the daily seasonality by adjusting the dates so that they are all consecutive. This is a little off-topic for this issue, though, so if you'd like to discuss it further, go ahead and open a new issue.
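The date-adjustment workaround might be sketched as follows. The hypothetical helper `make_days_consecutive` keeps each observation's time of day but moves every distinct calendar day onto consecutive days starting from the first one, so the monthly one-hour windows above become back-to-back days. It assumes only the within-day pattern matters; the month-to-month gaps are discarded.

```python
from datetime import datetime, timedelta


def make_days_consecutive(timestamps):
    """Move each distinct calendar day onto consecutive days, keeping time of day."""
    days = sorted({ts.date() for ts in timestamps})
    # The k-th distinct day (in date order) becomes first_day + k days
    new_day = {day: days[0] + timedelta(days=k) for k, day in enumerate(days)}
    return [datetime.combine(new_day[ts.date()], ts.time()) for ts in timestamps]


stamps = [
    datetime(2017, 7, 28, 9, 30, 0),
    datetime(2017, 8, 28, 9, 30, 1),
    datetime(2017, 9, 28, 10, 30, 0),
]
for ts in make_days_consecutive(stamps):
    print(ts)
# 2017-07-28 09:30:00
# 2017-07-29 09:30:01
# 2017-07-30 10:30:00
```

The remapped series would then be fit with daily seasonality on and weekly and yearly seasonality off (in the Python API, `Prophet(daily_seasonality=True, weekly_seasonality=False, yearly_seasonality=False)`). A forecast for the next consecutive day would stand in for the next month's window, assuming the daily pattern does not drift between months.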

@bletham bletham changed the title Predict fails with "cannot allocate vector of size..." with minute granularity data Predict fails with "cannot allocate vector of size..." with large number of prediction points May 25, 2018

3 participants