Use 'create' instead of 'index' to avoid stale data scenario #22

Open
@jaddison

Following up on the discussion with @avelis in #20: in the bulk reindexing scenario only, the following line:

data = {'delete' if delete else 'index': data}

would likely be better as:

data = {'delete' if delete else 'create': data}

Even better, for flexibility, it probably makes sense to add a parameter with a safe default to bulk_index(), like:

def bulk_index(cls, es=None, index_name='', queryset=None, create_only=False):
  ...
  action = 'delete' if delete else ('create' if create_only else 'index')
  data = {action: data}

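Using 'create' would prevent the reindex process from overwriting data that is already in the new index: unlike 'index', which adds or replaces a document, a 'create' action fails when a document with the same ID already exists. That said, such an overwrite cannot happen given the current structure and usage of this library. It becomes relevant with the parallel index writing strategy discussed in #20.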
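To make the stale data scenario concrete, here is a minimal sketch that uses a plain dict as a stand-in for the new index. It is not this library's code and does not talk to Elasticsearch: choose_action() mirrors the action selection proposed above, while apply_action() and the sample documents are hypothetical helpers that only imitate the difference between the 'index' and 'create' bulk actions.

```python
def choose_action(delete=False, create_only=False):
    """Mirror the action selection proposed in this issue."""
    return 'delete' if delete else ('create' if create_only else 'index')


def apply_action(store, action, doc_id, source=None):
    """Apply one bulk-style action to a toy in-memory index (a plain dict).

    Hypothetical helper for illustration only.
    Returns True if the action was applied, False if it was rejected.
    """
    if action == 'delete':
        return store.pop(doc_id, None) is not None
    if action == 'create' and doc_id in store:
        # 'create' refuses to touch an existing document; Elasticsearch
        # reports a conflict for that item instead of replacing it.
        return False
    # 'index' (or 'create' for a new ID) writes the document.
    store[doc_id] = source
    return True


# A live writer has already put a fresh copy of document 1 into the new index.
new_index = {1: {'title': 'fresh'}}

# The reindex job arrives later with an older snapshot of the same document.
stale = {'title': 'stale'}

with_index = dict(new_index)
apply_action(with_index, choose_action(), 1, stale)

with_create = dict(new_index)
applied = apply_action(with_create, choose_action(create_only=True), 1, stale)

print(with_index[1]['title'])            # stale: 'index' overwrote the fresh copy
print(with_create[1]['title'], applied)  # fresh False: 'create' left it alone
```

With create_only=False the toy index ends up holding the stale snapshot, while create_only=True leaves the fresh document in place and reports the rejected item, which is the behaviour a parallel writing strategy would rely on.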

See ES bulk docs for more information.
