Skip to content

Easy maintenence of collections of processed data in MongoDB with the Mongoid ODM

License

Notifications You must be signed in to change notification settings

mongoid/mongoid-collection-snapshot

 
 

Repository files navigation

Mongoid Collection Snapshot

Easy maintenance of collections of processed data in MongoDB with Mongoid 3, 4, 5 and 6.

This is a forked, renamed, maintained and supported version of mongoid_collection_snapshot.

Gem Version Build Status

Example

Suppose that you have a Mongoid model called Artwork, stored in a MongoDB collection called artworks and the underlying documents look something like:

{ name: 'Flowers', artist_id: ..., price: 3000000 }

From time to time, your system runs a map/reduce job to compute the total price of all artist's works, resulting in a collection called artist_artwork_price that contains documents that look like:

{ _id: ..., artist_id: ..., sum: 1500000 }

If your system wants to maintain and use this price data, it has to do so at the level of raw MongoDB operations, since map/reduce result documents don't map well to models in Mongoid. Furthermore, even though map/reduce jobs can take some time to run, you probably want the entire artist_artwork_price collection populated atomically from the point of view of your system, since otherwise you don't ever know the state of the data in the collection - you could access it in the middle of a map/reduce and get partial, incorrect results.

A mongoid-collection-snapshot solves this problem by providing an atomic view of collections of data like map/reduce results that live outside of Mongoid.

In the example below, we set up our artist price sum collection by including Mongoid::CollectionSnapshot and implementing a build method.

class ArtistArtworkPrice
  include Mongoid::CollectionSnapshot

  def build
    map = <<-EOS
      function() {
        emit({ artist_id: this['artist_id']}, { count: 1, sum: this['price'] })
      }
    EOS

    reduce = <<-EOS
      function(key, values) {
        var sum = 0;
        var count = 0;
        values.forEach(function(value) {
          sum += value['sum'];
          count += value['count'];
        });
        return({ count: count, sum: sum });
      }
    EOS

    Artwork.map_reduce(map, reduce).out(inline: 1).each do |doc|
      collection_snapshot.insert_one(
        artist_id: doc['_id']['artist_id'],
        count: doc['value']['count'],
        sum: doc['value']['sum']
      )
    end
  end
end

Now, if you want to schedule a recomputation, just call ArtistArtworkPrice.create.

The latest snapshot is always available as ArtistArtworkPrice.latest.

andy_warhol = Artist.where(name: 'Andy Warhol').first
andy_warhol_price = ArtistArtworkPrice.latest.collection_snapshot.where(artist_id: andy_warhol.id).first
average_price = andy_warhol_price['sum'] / andy_warhol_price['count']

Snapshot Cleanup

By default, mongoid-collection-snapshot maintains the most recent two snapshots computed any given time. Set max_collection_snapshot_instances to change this.

ArtistArtworkPrice.max_collection_snapshot_instances = 3

Query Snapshot Data with Mongoid

You can do better than the average price example above and define first-class models for your collection snapshot data, then access them as any other Mongoid collection via collection snapshot's .documents method.

class AverageArtistPrice
  document do
    belongs_to :artist, inverse_of: nil
    field :sum, type: Integer
    field :count, type: Integer
  end

  def average_price(artist_name)
    artist = Artist.where(name: artist_name).first
    doc = documents.where(artist: artist).first
    doc.sum / doc.count
  end
end

The following example iterates through all latest artist price averages.

AverageArtistPrice.latest.documents.each do |doc|
  puts "#{doc.artist.name}: #{doc.sum / doc.count}"
end

This code can be found in the example folder.

Multi-Collection Snapshots

You can maintain multiple collections atomically within the same snapshot by passing unique collection identifiers to collection_snaphot when you call it in your build or query methods:

class ArtistArtworkPrice
  include Mongoid::CollectionSnapshot

  def build
    # define map/reduce for average and max aggregations
    Mongoid.default_session.command('mapreduce' => 'artworks', map: map_avg, reduce: reduce_avg, out: collection_snapshot('average'))
    Mongoid.default_session.command('mapreduce' => 'artworks', map: map_max, reduce: reduce_max, out: collection_snapshot('max'))
  end

  def average_price(artist)
    doc = collection_snapshot('average').find('_id.artist' => artist).first
    doc['value']['sum'] / doc['value']['count']
  end

  def max_price(artist)
    doc = collection_snapshot('max').find('_id.artist' => artist).first
    doc['value']['max']
  end
end

Specify the name of the collection to define first class Mongoid models.

class ArtistArtworkPrice
  document('average') do
    field :value, type: Hash
  end

  document('max') do
    field :value, type: Hash
  end
end

Access these by name.

ArtistArtworkPrice.latest.documents('average')
ArtistArtworkPrice.latest.documents('max')

If fields across multiple collection snapshots are identical, a single default document is sufficient.

class ArtistArtworkPrice
  document do
    field :value, type: Hash
  end
end

Custom Database Connections

Your class can specify a custom database for storage of collection snapshots by overriding the snapshot_session instance method. In this example, we memoize the connection at the class level to avoid creating many separate connection instances.

class ArtistArtworkPrice
  include Mongoid::CollectionSnapshot

  def build
    # ...
  end

  def snapshot_session
    @snapshot_session ||= Mongo::Client.new('mongodb://localhost:27017/artists_and_artworks')
  end
end

Another common way of configuring this is through mongoid.yml.

development:
  sessions:
    default:
      database: dev_data
    imports:
      database: dev_imports
def snapshot_session
  Mongoid.session('imports')
end

License

Copyright (c) 2011-2017 Art.sy Inc. and Contributors

MIT License, see LICENSE.txt for details.

About

Easy maintenence of collections of processed data in MongoDB with the Mongoid ODM

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Ruby 100.0%