Duplicate entries in DynamoDB when updating an existing object #162

Open
wagaboy opened this issue Nov 7, 2013 · 4 comments

wagaboy commented Nov 7, 2013

I have a model (shown below) that gets duplicated (as observed in the DynamoDB console) when I update it.
My sequence of operations, starting with an empty table:

User.create(...) # creates one item in the table
User.count # = 1
u = User.first
u.first_name = "Bar"
u.save # Now there are two items in the table
User.count # = 2

Just calling u.save multiple times on the same object creates multiple entries. Am I missing something? I'm fairly new to DynamoDB and Dynamoid. Or is this a bug or a known issue?

My Model

class User
  include Dynamoid::Document

  field :email
  field :provider
  field :uid

  # Do not allow modification of first and last name.
  field :first_name
  field :last_name
  field :bio
  field :roles, :set

  index [:uid, :provider]
  ...

wagaboy commented Nov 7, 2013

Details about my setup:
aws-sdk (1.24.0)
dynamoid (0.7.1)
rails (3.2.14)
ruby 1.9.3p392

jasoncox commented Feb 7, 2014

I've noticed this too. No duplicates are created if you set the key to something else, like:

table :key => :user_id

Also, on the duplicates the id hash remains the same but has a changing number appended (e.g. .123). Is this some kind of versioning? Any ideas on this?

jasoncox commented Feb 7, 2014

This is a feature of Dynamoid - see Partitioning 👍

From the README:
Dynamoid attempts to obviate this problem transparently by employing a partitioning strategy to divide up keys randomly across DynamoDB's servers. Each ID is assigned an additional number (by default 0 to 199, but you can increase the partition size in Dynamoid's configuration) upon save; when read, all 200 hashes are retrieved simultaneously and the most recently updated one is returned to the application. This results in a significant net performance increase, and is usually invisible to the application itself. It does, however, bring up the important issue of provisioning your DynamoDB tables correctly.

With partitioning enabled I suppose .count is not returning the expected result?
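
To make the README passage concrete, here is a minimal plain-Ruby sketch of the partitioning idea it describes: each save appends a random suffix to the id, and a read checks every suffix and returns the most recently updated copy. It makes no Dynamoid or AWS calls. `PartitionedStore`, its methods, and the logical clock are hypothetical names invented for illustration, not Dynamoid's actual implementation.

```ruby
# Plain-Ruby model of the partitioning idea quoted from the README.
# PartitionedStore and its methods are hypothetical, not Dynamoid's API.
class PartitionedStore
  PARTITIONS = 200 # README default: suffixes 0 to 199

  def initialize
    @items = {} # physical key ("id.N") => { attrs:, updated_at: }
    @clock = 0  # logical clock standing in for an updated_at timestamp
  end

  # Each save writes to a randomly chosen partition of the logical id.
  def save(id, attrs)
    @clock += 1
    key = "#{id}.#{rand(PARTITIONS)}"
    @items[key] = { attrs: attrs, updated_at: @clock }
    key
  end

  # A read looks at every partition and returns the newest copy.
  def find(id)
    copies = (0...PARTITIONS).map { |n| @items["#{id}.#{n}"] }.compact
    newest = copies.max_by { |c| c[:updated_at] }
    newest && newest[:attrs]
  end

  # Counts physical items, which is what a raw table scan would see.
  def physical_count
    @items.size
  end
end

store = PartitionedStore.new
store.save("abc", first_name: "Foo")
store.save("abc", first_name: "Bar")
p store.find("abc")    # the newest copy of the logical record
p store.physical_count # 1 or 2, depending on the random suffixes
```

In this model a logical record can occupy several physical items, which would be one way for a raw item count to exceed the number of logical records.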

@ngordon17

@jasoncox the issue has nothing to do with partitioning.

@wagaboy you are getting this behavior because you haven't specified what the table key should be, so when you save, Dynamoid automatically fills in the 'id' field with a randomly generated string (which is why @jasoncox's solution works). However, when you do User.find, it queries using the index you defined and therefore only pulls in the fields from the index, not the saved 'id'. When you then resave the object, the 'id' field is blank, so Dynamoid generates another random id and saves the object under that new 'id'. The result appears to be two separate objects, because User.count queries the table with the primary key, not the table with your index. Note that the index table will actually only have one user.
This should probably be fixed so that querying on an index also loads the primary key. In the meantime, I would suggest either specifying your own primary key so that you don't run into the issue, or making sure to load the id field before saving.
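
The failure mode described in this comment can be reproduced with a small plain-Ruby model. This is only a sketch of the explanation, assuming an index entry that stores the indexed fields without the id. `FakeTable`, `find_by_index`, and the two hashes are hypothetical stand-ins, not Dynamoid's code.

```ruby
require "securerandom"

# Plain-Ruby model of the explanation above. FakeTable and its methods are
# hypothetical stand-ins, not Dynamoid's implementation.
class FakeTable
  def initialize
    @main  = {} # primary table, keyed by id
    @index = {} # index table, keyed by [uid, provider]
  end

  # Saving a record with no id generates a fresh random one.
  def save(record)
    record[:id] ||= SecureRandom.uuid
    @main[record[:id]] = record.dup
    # Assumption: the index entry holds the indexed fields only, not the id.
    @index[[record[:uid], record[:provider]]] =
      { uid: record[:uid], provider: record[:provider] }
    record
  end

  # Loading through the index returns a record without its id.
  def find_by_index(uid, provider)
    entry = @index[[uid, provider]]
    entry && entry.dup
  end

  def count
    @main.size
  end

  def index_count
    @index.size
  end
end

table = FakeTable.new
created = table.save(uid: "42", provider: "github", first_name: "Foo")

loaded = table.find_by_index("42", "github") # comes back without :id
loaded[:first_name] = "Bar"
table.save(loaded)  # a second random id is generated here
p table.count       # => 2
p table.index_count # => 1

# Workaround from the comment: restore the id before saving.
again = table.find_by_index("42", "github")
again[:id] = created[:id]
table.save(again)
p table.count       # => 2 (no new duplicate)
```

Restoring the id before the second save overwrites the existing item instead of adding one, which is the effect the suggested workarounds aim for.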
