[BUG] Unable to convert between grains when using pandas==2.2.0 #494

Closed
00milsg opened this issue Mar 7, 2024 · 2 comments
00milsg commented Mar 7, 2024

Describe the bug

When converting a triangle from grain OMDM to OQDQ, every cell is converted to NaN.

To Reproduce

To reproduce the behaviour, I used the following package versions. Note: when I downgrade pandas to a version below 2.2.0, the issue disappears, so I will use that as a workaround for now.

pandas==2.2.0
numpy==1.26.4
chainladder==0.8.18

The full code to reproduce the behaviour is attached as a text file (it's 50 lines long and I didn't want to bloat this post). I used the prism example dataset and worked backwards from it to create monthly and quarterly triangles. The code is self-contained. See the link below.

If you'd rather I add the example code as a comment then let me know.

chain-ladder-grain-conversion-error-reprex.txt

Results of my digging into the issue

I think this is caused by the initial dev_to_val conversion at the top of the grain method (possibly related to this issue?). The result flows through and upsets the calculation of the d_start variable. With earlier pandas versions, d_start is calculated correctly; with versions >= 2.2.0, d_start is set to the start of the previous month. This triggers the conditional block below, which sets the data to NaN.

```python
d_start = pd.Period(
    obj.valuation[0],
    freq=dgrain_old if dgrain_old == 'M' else dgrain_old + obj.origin.freqstr[-4:]
).to_timestamp(how='s')
if (len(obj.ddims) > 1 and obj.origin.to_timestamp(how='s')[0] != d_start):
    addl_ts = (
        pd.period_range(obj.odims[0], obj.valuation[0], freq=dgrain_old)[:-1]
        .to_timestamp()
        .values
    )
    addl = obj.iloc[..., -len(addl_ts) :] * 0
    addl.ddims = addl_ts
    obj = concat((addl, obj), axis=-1)
    obj.values = num_to_nan(obj.values)
```
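
To make the failure mode concrete, here is a minimal standalone sketch of the d_start check in plain pandas. The function names development_start and triggers_padding are hypothetical (they are not part of chainladder), and the quarterly case assumes calendar quarters ("Q-DEC") instead of reading the suffix from obj.origin.freqstr.

```python
import pandas as pd


def development_start(first_valuation, dev_grain="M"):
    """Start of the development period containing the first valuation date.

    Hypothetical, simplified stand-in for the d_start expression: monthly
    grain uses freq "M"; quarterly grain assumes calendar quarters ("Q-DEC").
    """
    freq = "M" if dev_grain == "M" else "Q-DEC"
    return pd.Period(pd.Timestamp(first_valuation), freq=freq).to_timestamp(how="s")


def triggers_padding(first_origin_start, first_valuation, n_dev_periods, dev_grain="M"):
    """Hypothetical mirror of the `if` condition that leads to the padding / NaN branch."""
    d_start = development_start(first_valuation, dev_grain)
    return n_dev_periods > 1 and pd.Timestamp(first_origin_start) != d_start


# First origin starts 2008-01-01 and the first valuation is in the same month.
print(triggers_padding("2008-01-01", "2008-01-31", n_dev_periods=12))
# First valuation shifted back one month, as described for pandas >= 2.2.0.
print(triggers_padding("2008-01-01", "2007-12-31", n_dev_periods=12))
```

With the first valuation on 2008-01-31 the condition is false. If that valuation is shifted back to 2007-12-31, which matches the previous-month behaviour described above, d_start becomes 2007-12-01 and the padding branch runs.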

Question: could we add a warning explaining what is happening (and potentially why)? It could also suggest how to correct it (filter your input data so that min(dev_month) >= min(orig_month)).
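
A minimal sketch of that suggested workaround, assuming the input is a long-format pandas DataFrame with hypothetical datetime columns named origin and development (the column names in the reprex may differ). Rows whose development month falls before the earliest origin month are dropped, and a warning reports how many were removed.

```python
import warnings

import pandas as pd


def drop_pre_origin_rows(df, origin_col="origin", development_col="development"):
    """Keep only rows whose development month is on or after the earliest origin month.

    Hypothetical helper; column names are assumptions, not chainladder's API.
    """
    first_origin = df[origin_col].min().to_period("M")
    keep = df[development_col].dt.to_period("M") >= first_origin
    n_dropped = int((~keep).sum())
    if n_dropped:
        # Surface the problem instead of silently producing an all-NaN triangle.
        warnings.warn(
            f"Dropping {n_dropped} row(s) with a development month before the "
            f"earliest origin month ({first_origin}); otherwise the grain "
            "conversion may pad the triangle with NaN.",
            UserWarning,
        )
    return df.loc[keep].copy()
```

The filter compares at monthly resolution, so a month-end development date in the same month as a month-start origin is kept.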

Expected behavior

When converting a triangle from grain OMDM to OQDQ, the total sum of the triangle contents should remain the same. Below is a potential unit test.

```python
# Test that triangles contain the same starting information
qtr_sum = qtr_triangle['reportedCount'].sum().sum()
mth_sum = mth_triangle['reportedCount'].sum().sum()
assert qtr_sum == mth_sum, "Triangles are not equal before grain change."

# Test that we still get the same answers when we convert the monthly tri to quarterly
mth_conv_sum = mth_triangle['reportedCount'].grain('OQDQ').sum().sum()
assert mth_conv_sum == mth_sum, "Triangles are not equal after grain change."
```
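
For intuition about why the totals should match, here is a plain-pandas sketch (not chainladder's implementation) that rolls hypothetical incremental monthly cells up to quarterly origin and valuation periods. It assumes incremental values, which is the case where a simple cell sum is preserved by a grain change.

```python
import pandas as pd


def to_quarterly(df, value_col="reportedCount"):
    """Roll incremental monthly cells up to (origin quarter, valuation quarter)."""
    quarterly = df.assign(
        origin=df["origin"].dt.to_period("Q"),
        valuation=df["valuation"].dt.to_period("Q"),
    )
    return quarterly.groupby(["origin", "valuation"], as_index=False)[value_col].sum()


# Hypothetical incremental monthly data: one row per (origin, valuation) cell.
monthly = pd.DataFrame(
    {
        "origin": pd.to_datetime(["2008-01-01", "2008-01-01", "2008-02-01", "2008-04-01"]),
        "valuation": pd.to_datetime(["2008-01-31", "2008-03-31", "2008-04-30", "2008-05-31"]),
        "reportedCount": [5, 2, 4, 1],
    }
)
quarterly = to_quarterly(monthly)

# The grain change regroups cells but should not change the grand total.
assert quarterly["reportedCount"].sum() == monthly["reportedCount"].sum()
```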

@jbogaardt (Collaborator)

@00milsg, thanks for reporting, providing a fully functioning reprex, and doing the legwork on the root cause. This is a fantastic bug report. I actually think it's a bug in pandas==2.2.0, or at the very least an undocumented deprecation. I've opened an issue over there to confirm: pandas-dev/pandas#57781

00milsg (Author) commented Mar 11, 2024

No problem @jbogaardt, happy to help :)
