#summary Guide to (obsolete) Objectify-Appengine v3

*NOTE: This is the documentation for (now obsolete) Objectify v3.*

If you haven't read the [Concepts] yet, please do so first.

<wiki:toc max_depth="2"></wiki:toc>

This will explain how to use Objectify to get, put, delete, and query data. You may find it helpful to open the Objectify javadocs while reading. These examples omit getter and setter methods for clarity.


Create Your Entity Classes

The first step is to define your entity class(es). Here is an example of a Car:

Things to note:

  * Objectify persists fields and fields only.  It does not arbitrarily map fields to the datastore; if you want to change the way a field is stored... rename the field.

 * Objectify will not persist {{{static}}} fields, {{{final}}} fields, or fields annotated with {{{javax.persistence.Transient}}} (it will persist fields with the {{{transient}}} keyword).

  * One field must be annotated with {{{javax.persistence.Id}}}.  It can be of type {{{Long}}}, {{{long}}}, or {{{String}}}.  If you use {{{Long}}} and put() an object with a null id, a value will be generated for you.  If you use {{{String}}} or the primitive {{{long}}} type, values will never be autogenerated.

  * You can persist any of the [http://code.google.com/appengine/docs/java/datastore/dataclasses.html#Core_Value_Types core value types], Collections (ie Lists and Sets) of the core value types, or arrays of the core value types.  You can also persist properties of type {{{Key}}}.

  * There must be a no-arg constructor (or no constructors - Java creates a default no-arg constructor).  The no-arg constructor can have any protection level (private, public, etc).

  * If you are converting entities from a JDO project, note that Objectify uses JPA annotations ({{{javax.persistence}}}) and not JDO annotations ({{{javax.jdo.annotations}}}).  Of course, Objectify adds several annotations of its own.

  * {{{String}}} fields which store more than 500 characters (the GAE limit) are automatically converted to {{{Text}}} internally.  {{{Text}}} fields, like {{{Blob}}} fields, are never indexed.

  * {{{byte[]}}} fields are automatically converted to {{{Blob}}} internally.  However, {{{Byte[]}}} is persisted "normally" as an array of (potentially indexed) {{{Byte}}} objects.  Note that GAE internally stores all integral values as a 64-bit long.

More information can be found in the AnnotationReference.
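
The static/final/transient rules above can be illustrated with plain reflection. This is a minimal sketch, not Objectify's implementation: the {{{Car}}} fields are made up, and the {{{javax.persistence.Transient}}} annotation check is left out so the example needs only the JDK.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashSet;
import java.util.Set;

class PersistableFieldsSketch {
    // Example POJO; the field names are hypothetical.
    static class Car {
        static int instances;        // skipped: static
        final String kind = "car";   // skipped: final
        transient String scratch;    // kept: the transient keyword does not exclude a field
        Long id;                     // kept
        String vin;                  // kept
    }

    // Returns the names of declared fields that are neither static nor final.
    static Set<String> persistableFields(Class<?> type) {
        Set<String> names = new LinkedHashSet<>();
        for (Field f : type.getDeclaredFields()) {
            int mods = f.getModifiers();
            if (Modifier.isStatic(mods) || Modifier.isFinal(mods)) continue;
            if (f.isSynthetic()) continue; // ignore compiler-generated fields
            names.add(f.getName());
        }
        return names;
    }
}
```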

Registering Your Classes

Before you perform any datastore operations, you must register all your entity classes with the {{{ObjectifyService}}}.

Objectify does not scan your classpath for classes. There are good reasons for and against this - see the discussion in BestPractices. If you are using Spring, see the objectify-appengine-spring project.

Basic Operations: Get, Put, Delete

You can obtain an {{{Objectify}}} interface from the {{{ObjectifyService}}}:

The {{{Objectify}}} interface supports batch operations:

Querying

Here are some examples of using queries. Objectify's Query mimics the human-friendly Query class from GAE/Python rather than the machine-friendly GAE/Java version.

Note that queries are closely related to indexes. See the appengine documentation for indexes for detail about what you can and cannot filter by.

Cursors

Cursors let you take a "checkpoint" in a query result set, store the checkpoint elsewhere, and then resume from where you left off later. This is often used in combination with the Task Queue API to iterate through large datasets that cannot be processed in the 30s limit of a single request. The algorithm for this is roughly:

  # Create a query, using an existing cursor if you have one.
  # Iterate through the results, processing as you go.
  # If you near the 30s timeout:
    # Get the cursor
    # Create a new processing task with the cursor
    # Break out of the loop
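
The steps above can be sketched without any datastore at all. In this minimal sketch an {{{int}}} position stands in for the cursor, an in-memory list stands in for the query results, and an item budget stands in for "nearing the 30s timeout"; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CursorLoopSketch {
    // Processes items starting at 'cursor' until 'budget' items are done.
    // Returns the position to resume from, or -1 when the data is exhausted.
    static int processBatch(List<String> data, int cursor, int budget, List<String> processed) {
        if (budget <= 0) throw new IllegalArgumentException("budget must be positive");
        int position = cursor;
        while (position < data.size()) {
            if (position - cursor >= budget) {
                return position; // checkpoint: hand this to the next task
            }
            processed.add(data.get(position));
            position++;
        }
        return -1;
    }

    // Simulates a chain of tasks, each resuming from the previous checkpoint.
    static List<String> processAll(List<String> data, int budget) {
        List<String> processed = new ArrayList<>();
        int cursor = 0;
        while (cursor != -1) {
            cursor = processBatch(data, cursor, budget, processed);
        }
        return processed;
    }
}
```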

Cursor Example

The {{{Iterable}}}s provided by Objectify (including the {{{Query}}} object) are actually {{{QueryResultIterable}}}. This will produce a {{{QueryResultIterator}}}, which allows you to obtain a {{{Cursor}}}.

This is an example of a servlet that will iterate through *all* the Car entities:

Asynchronous Calls

The GAE's low-level datastore API supports parallel asynchronous operations. GAE's model of asynchrony does not follow Javascript's "pass in a callback function" model; rather, when you make an asynchronous call, you get back a reference to the pending operation. You can create multiple references which will execute in parallel, however, any request to fetch a concrete result will block until the result is available.

This is better explained by example.
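
As a generic illustration of the "reference to a pending operation" model, here is a minimal sketch using the JDK's {{{CompletableFuture}}} rather than the datastore API. {{{slowFetch}}} is a hypothetical stand-in for a slow datastore call.

```java
import java.util.concurrent.CompletableFuture;

class PendingOperationSketch {
    // Stand-in for a slow datastore call; purely illustrative.
    static String slowFetch(String key) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "value-of-" + key;
    }

    static String fetchBoth() {
        // Both operations are started before either result is requested,
        // so they can run in parallel.
        CompletableFuture<String> first = CompletableFuture.supplyAsync(() -> slowFetch("a"));
        CompletableFuture<String> second = CompletableFuture.supplyAsync(() -> slowFetch("b"));
        // join() blocks until the corresponding result is available.
        return first.join() + "," + second.join();
    }
}
```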

Asynchronous Queries

All *queries* are now asynchronous by default. The "reference" to a query is the {{{Iterator}}} object. For example, these two queries are executed in parallel:

Create multiple {{{Iterator}}}s, then execute over the iterators.

Asynchronous get()/put()/delete()

*NOTE: This requires Objectify v3.x*

*NOTE: If you use Objectify's global memcache with asynchronous operations, you MUST install the {{{AsyncCacheFilter}}}. If you do not, your cache will not properly synchronize with the datastore.* This is a workaround for a limitation of the GAE SDK; please star this issue.

Queries require no special interface to parallelize requests because the {{{Iterator}}} interface acts as a convenience reference to a pending operation. However, {{{get()}}}, {{{put()}}}, and {{{delete()}}} return concrete results. The GAE low-level API provides a parallel set of methods that return results in a layer of indirection, the {{{Future<?>}}} class. However, {{{Future<?>}}} is cumbersome to use because it wraps and rethrows all exceptions as checked exceptions.

Objectify provides a similar set of parallel methods, but they return {{{Result<?>}}}, which is just like {{{Future<?>}}} but with sane exception handling behavior. Here are the salient parts of Objectify's API:

You get the picture. The {{{AsyncObjectify}}} interface has methods that parallel the synchronous {{{Objectify}}} methods, but return {{{Result<?>}}} instead. You can issue multiple parallel requests like this:

Considerations of Asynchronous Requests

Parallel requests must be used carefully:

 * If you use Objectify's global memcache (the @Cached annotation), you *must* install the {{{com.googlecode.objectify.cache.AsyncCacheFilter}}} in your web application.
 * You cannot have more than a fixed number of asynchronous requests going simultaneously.  This number is documented in the Low-Level API documentation, currently 10.  Additional requests will block until previous requests complete.
 * All pending requests will complete *before* your HTTP request returns data to the caller.  If you return from your {{{HttpServlet.service()}}} method while there are async requests pending, the SDK will block and complete these requests for you.
 * This does not allow you to work around the 30s limit for requests (or 10m for task queue requests).  Any async requests pending when a {{{DeadlineExceededException}}} happens will be aborted.  The datastore may or may not reflect any writes.
 * If you run up against {{{DeadlineExceededException}}} while using the global memcache, it is very likely that your cache will go out of sync with the datastore - even with the {{{AsyncCacheFilter}}}.  Do not do this.
 * The synchronous API is no more efficient than the asynchronous API.  In fact, both Objectify's synchronous API and Google's low level synchronous API are implemented as calls to the respective async API followed by an immediate get().  
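
The last point, and the idea of a result wrapper that avoids checked exceptions, can be sketched with the JDK alone. This is a hypothetical wrapper written for illustration, not Objectify's {{{Result}}} implementation; {{{getSync}}} is an invented name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

class UncheckedResultSketch<T> {
    private final Future<T> future;

    UncheckedResultSketch(Future<T> future) {
        this.future = future;
    }

    // Blocks like Future.get(), but surfaces failures as unchecked exceptions.
    T get() {
        try {
            return future.get();
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException) throw (RuntimeException) cause;
            throw new RuntimeException(cause);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    // A "synchronous" call expressed as an async call plus an immediate get().
    static String getSync(String key) {
        Future<String> pending = CompletableFuture.supplyAsync(() -> "value-of-" + key);
        return new UncheckedResultSketch<>(pending).get();
    }
}
```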

Optimizing Storage

Indexes are necessary for queries, but they are very expensive to create and update. It costs, in api_cpu_ms, about 48ms to put() a single entity with no indexes. Each standard indexed field adds 17ms to this number. The indexes are written in parallel, so they do not add real-world time... but you'll see the real-world cost on your bill at the end of the week! Indexes also consume a significant amount of storage space - sometimes many times the amount of original data.

@Indexed and @Unindexed

By default, all entity fields except {{{Text}}} and {{{Blob}}} are indexed. You can control this behavior with {{{@Indexed}}} and {{{@Unindexed}}} annotations on fields or classes:

Partial Indexes

Often you only need to query on a particular subset of values for a field. If these represent a small percentage of your entities, why index all the rest? Some examples:

  * You might have a boolean "admin" field and only ever need to query for a list of the (very few) admins.
  * You might have a "status" field and never need to query for inactive values.
  * Your queries might not include null values.

Objectify gives developers the ability to define arbitrary conditions for any field. You can create your own {{{If}}} classes or use one of the provided ones:

These conditions work with both {{{@Indexed}}} and {{{@Unindexed}}} on fields. You cannot specify conditions on the class-level annotations.

Check the javadocs for available {{{If}}} classes. Here are some basics to start:

{{{IfDefault}}} is special. It tests true when the field value is whatever the default value is when you construct an object of your class. For example:

Note that you can initialize field values inline (as above) or in your no-arg constructor; either will work.
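
One way to picture the "default value" test is to compare a field against the same field on a freshly constructed instance. This is a minimal reflection sketch under that assumption, not Objectify's code; the {{{Account}}} class and {{{isDefault}}} helper are hypothetical.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.Objects;

class DefaultValueSketch {
    // Hypothetical entity with one inline default and one constructor default.
    static class Account {
        String status = "active";
        int level;

        Account() {
            level = 1;
        }
    }

    // True when the field holds the same value a newly constructed object would have.
    static boolean isDefault(Object pojo, String fieldName) {
        try {
            Class<?> type = pojo.getClass();
            Constructor<?> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            Object fresh = ctor.newInstance();
            Field field = type.getDeclaredField(fieldName);
            field.setAccessible(true);
            return Objects.equals(field.get(pojo), field.get(fresh));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```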

Custom Conditions

You can easily create your own custom conditions by extending {{{ValueIf}}} or {{{PojoIf}}}. {{{ValueIf}}} is a simple test of a field value. For example:

You can use {{{PojoIf}}} to examine other fields to determine whether or not to index! This example is inspired by the example in the Partial Index Wikipedia page, and will use a static inner class for convenience:

You can examine the source code of the {{{If}}} classes to see how to construct your own. Most are one or two lines of code.
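
The difference between a value-level and a POJO-level condition can be shown with two tiny interfaces. These are hypothetical stand-ins defined only for this sketch, not Objectify's classes; the {{{Order}}} type and its fields are invented.

```java
class ConditionSketch {
    // Hypothetical stand-ins for a value-level and a POJO-level condition.
    interface ValueCondition<V> {
        boolean matches(V value);
    }

    interface PojoCondition<P> {
        boolean matches(P pojo);
    }

    static class Order {
        final String status;
        final boolean archived;

        Order(String status, boolean archived) {
            this.status = status;
            this.archived = archived;
        }
    }

    // Value-level: looks only at the field's own value.
    static final ValueCondition<String> IF_PENDING = value -> "pending".equals(value);

    // POJO-level: may consult other fields of the same object.
    static final PojoCondition<Order> IF_LIVE_AND_PENDING =
            order -> !order.archived && "pending".equals(order.status);
}
```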

If you would like to exclude a field value from being stored, you can use the {{{@NotSaved}}} annotation. The field will not be saved and will not occupy any space in the datastore. This works well in concert with {{{IfDefault}}}:

Note that {{{@NotSaved}}} values are not stored at all, so they aren't indexed and you can't query for them.

Polymorphism

*NOTE: This requires Objectify v3.x*

Objectify lets you define a polymorphic hierarchy of related entity classes, and then load and query them without knowing the specific subtype. Here are some examples:

Things to note:

  * The root of your polymorphic hierarchy *must* be annotated with {{{@Entity}}}.
  * All polymorphic subclasses must be annotated with {{{@Subclass}}}.
  * You can skip {{{@Subclass}}} on intermediate classes which will never be materialized or queried for.
  * You should register all classes in the hierarchy separately, but order is not important.
  * Polymorphism applies only to entities, not to @Embedded classes.

In a polymorphic hierarchy, you can {{{get()}}} and {{{query()}}} without knowing the actual type:

Implementation Considerations

When you store a polymorphic entity subclass (but not an instance of the base type), your entity is stored with two additional, hidden synthetic properties:

  * _^d_ holds a discriminator value for the concrete class type.  This defaults to the class shortname but can be modified with the {{{@Subclass(name="alternate")}}} annotation.
  * _^i_ holds an indexed list of all the discriminators relevant to a class; for example a Cat would have ["Mammal", "Cat"].

The indexed property is what allows polymorphic queries to work. It also means that you cannot simply change your hierarchy arbitrarily and expect queries to continue to work as expected - you may need to re-put() all affected entities to rewrite the indexed field.
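
To make the two synthetic properties concrete, here is a minimal sketch that derives them from a class hierarchy. It assumes every class below the root is an indexed subclass with the default (short name) discriminator; the real behavior depends on the {{{@Subclass}}} annotations, which this sketch does not read.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DiscriminatorSketch {
    static class Animal {}                // root of the hierarchy
    static class Mammal extends Animal {}
    static class Cat extends Mammal {}

    // Discriminator for the concrete class: the class short name.
    static String discriminator(Class<?> type) {
        return type.getSimpleName();
    }

    // Discriminators for the class and its ancestors below the root, root-most first.
    static List<String> indexedDiscriminators(Class<?> type, Class<?> root) {
        List<String> names = new ArrayList<>();
        for (Class<?> c = type; c != null && c != root; c = c.getSuperclass()) {
            names.add(c.getSimpleName());
        }
        Collections.reverse(names);
        return names;
    }
}
```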

There are two ways you can affect this:

  # You can leave some subclasses unindexed by specifying {{{@Subclass(unindexed=true)}}}.  You will not be able to query by these subclasses (although simple {{{get()}}}s work, and queries for indexed superclasses will return a properly instantiated instance of the subclass).
  # You can use {{{@Subclass(alsoLoad="OldDiscriminator")}}} to "reclaim" old discriminator values when changing class names.  Note that this will not help with query indexes, which must be re-put().

Relationships

A relationship is simply a {{{Key}}} stored as a field in an entity. Objectify does not provide "managed" relationships in the way that JDO or JPA does; this is both a blessing and a curse. However, because {{{Key}}} is a generified class, it carries type information about what it points to.

There are fundamentally three different kinds of relationships in Objectify:

Parent Relationship

An entity can have a single {{{Key}}} field annotated with {{{@Parent}}}:

Each Car entity is part of the parent owner's entity group and both can be accessed within a single transaction. When loading the child entity, the parent must be used to generate the child's key:

Note that this is an inappropriate use of the @Parent annotation; if a car were to be sold to a new owner, you would need to delete the Car and create a new one. It is often better to use Single Value Relationships even when there is a conceptual parent-child or owner-object relationship; in that case you could simply change the parent.

*If you get() an entity, change the @Parent key field, and put() the entity, you will create a new entity*. The old entity (with the old parent) will still exist. You cannot simply change the value of a @Parent key field. This is a fundamental aspect of the appengine datastore; @Parent values form part of an entity's identity.
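
The reason a changed parent produces a second entity can be seen with a simplified key whose identity is the combination of parent, kind, and id. This is an illustrative model with hypothetical names, not the datastore's {{{Key}}} class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class ParentKeySketch {
    // Simplified key: identity is the combination of parent, kind, and id.
    static final class SimpleKey {
        final String parent;
        final String kind;
        final long id;

        SimpleKey(String parent, String kind, long id) {
            this.parent = parent;
            this.kind = kind;
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SimpleKey)) return false;
            SimpleKey other = (SimpleKey) o;
            return id == other.id && kind.equals(other.kind) && Objects.equals(parent, other.parent);
        }

        @Override
        public int hashCode() {
            return Objects.hash(parent, kind, id);
        }
    }

    static Map<SimpleKey, String> demo() {
        Map<SimpleKey, String> store = new HashMap<>();
        store.put(new SimpleKey("Person:fred", "Car", 1L), "red car");
        // Same kind and id, different parent: this is a different key,
        // so the put adds a second entry instead of replacing the first.
        store.put(new SimpleKey("Person:barney", "Car", 1L), "red car");
        return store;
    }
}
```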

Single-Value Relationship

In Objectify (and the underlying datastore), Keys are just properties like any other value. Whether it defines a one-to-one relationship or a many-to-one relationship is up to you. Furthermore, a {{{Key}}} field could refer to any type of entity class.

One To One

The simplest type of single-value relationship is one-to-one.

Many To One

A {{{Key}}} field can represent a many-to-one relationship.

It looks identical to the one-to-one relationship because it is. The only difference is a conceptual one. What if you want to know all the employees managed by Fred? You use a query.

Multi-Value Relationship

The datastore can persist simple object types (Long, String, etc) and collections of simple object types. It can also persist collections (and arrays) of {{{Key}}}. This creates an alternative approach for defining one-to-many (and many-to-many) relationships.

This is sometimes useful, but should be used with caution for three reasons:

  # Every time you {{{get()}}} and {{{put()}}} an object, it will fetch and store the entire list of subordinate keys.  If you have large numbers of subordinates, this could become a performance problem.
  # Appengine limits you to 5,000 entries.
  # Because appengine creates an index entry for every value in the collection, you can suffer from [http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Big_Entities_and_Exploding_Indexes Exploding Indexes].

Because appengine stores an index entry for each value in the collection, it is possible to issue queries like this:

The decision to use a Multi-Value Relationship will depend heavily upon the shape of your data and the queries you intend to perform.

Transactions

Working with transactions is almost the same as working with Objectify normally.

All data manipulation methods are the same as you would normally use.

Since entities in Objectify really are Plain Old Java Objects and transactions are tied to the Objectify object, it's easy to work with data inside and outside of transactions (or multiple transactions running in parallel!):

You can interleave multiple transactions or nontransactional actions as long as you obey the cardinal rule: Within a single transaction (defined by an Objectify object created with beginTransaction()), you may only read or write from a single entity group.

Yes, this means you can get() objects from a transactional Objectify and put() to a nontransactional Objectify.

Lifecycle Callbacks

Objectify supports two of the JPA lifecycle callbacks: {{{@PostLoad}}} and {{{@PrePersist}}}. If you mark methods on your POJO entity class (or any superclasses) with these annotations, they will be called:

  * {{{@PostLoad}}} methods are called after your data has been populated on your POJO class from the datastore.
  * {{{@PrePersist}}} methods are called just before your data is written to the datastore from your POJO class.

You can have any number of these callback methods in your POJO entity class or its superclasses. They will be called in order of declaration, with superclass methods called first. Two parameter types are allowed:

  * The instance of {{{Objectify}}} which is being used to load/save the entity.
  * The datastore {{{Entity}}} which is associated with the Java POJO entity.

*Caution*: You can't update @Id or @Parent fields in a @PrePersist callback; by this time, the low-level Entity has already been constructed with a Key so it can be passed in to the callback as an optional parameter. You can, however, update any other fields and the new values will be persisted.

Migrating Schemas

It is a rare schema that remains unchanged through the life of an application. BigTable's schemaless nature is both a blessing and a curse - you can easily change schemas object-by-object on the fly, but you can't easily do it in bulk with an ALTER TABLE. Objectify provides some simple but powerful tools to help with common types of structure change.

The basic process of schema migration using Objectify looks like this:

  # Change your entity classes to reflect your desired schema.
  # Use Objectify's annotations to map data in the old schema onto the new schema.
  # Deploy your code, which now works with objects in the old schema and the new schema.
  # Let your natural get()/put() churn convert objects for as long as you care to wait.
  # Run a batch job to get() & put() any remaining entities.

Here are some common cases.

Adding Or Removing Fields

This is the easiest - just do it!

You can add any fields to your classes; if there is no data in the datastore associated with that field, it will be left at its default value when the class is initialized. This is worlds better than the exceptions you often get from JDO.

You can remove a field from your classes. The data in the datastore will be ignored when the entity is get(). When you next put() the entity, the entity will be saved without this field.

Renaming A Field

Let's say you have an entity that looks like this:

You're doing some refactoring and you want to rename the field "name" to "fullName". You can!

When a Person is get()ed, the {{{fullName}}} field will be loaded with the value of either _fullName_ or _name_. If both fields exist, an IllegalStateException will be thrown. When put(), only _fullName_ will be written.

Caveat: Queries do not know about the rename; if you filter by "fullName", you will only get entities that have been converted. You can still filter by "name" to get only the old ones.
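
The load and save rules for a renamed field can be modelled with a plain property map. This is a minimal sketch of the behavior described above, not Objectify's loader; {{{loadRenamed}}} and {{{save}}} are hypothetical helpers.

```java
import java.util.HashMap;
import java.util.Map;

class RenameLoadSketch {
    // Loads the value stored under the new name, falling back to the old name.
    // Having both present is treated as an error, as described above.
    static Object loadRenamed(Map<String, Object> stored, String newName, String oldName) {
        boolean hasNew = stored.containsKey(newName);
        boolean hasOld = stored.containsKey(oldName);
        if (hasNew && hasOld) {
            throw new IllegalStateException("Both " + newName + " and " + oldName + " are present");
        }
        if (hasNew) return stored.get(newName);
        return stored.get(oldName); // null when neither is present
    }

    // On save, only the new name is written.
    static Map<String, Object> save(String newName, Object value) {
        Map<String, Object> out = new HashMap<>();
        out.put(newName, value);
        return out;
    }
}
```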

Transforming Data

Now that you've migrated all of your data to the new Person format, let's say you now want to store separate first and last names instead of a single fullName field. Objectify can help:

You can specify {{{@AlsoLoad}}} on the parameter of any method that takes a single parameter. The parameter must be type-appropriate for what is in the datastore; you can pass Object and use reflection if you aren't sure. Process the data in whatever way you see fit. When the entity is put() again, it will only have _firstName_ and _lastName_.

Caution: Objectify has no way of knowing that the importCruft() method has loaded the firstName and lastName fields. If both fullName and firstName/lastName exist in the datastore, the results are undefined.
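
The body of such an import method is ordinary Java. Here is a minimal sketch of one possible splitting rule (everything before the first space is the first name); the annotations are omitted so the example needs only the JDK, and the rule itself is an assumption.

```java
class NameSplitSketch {
    String firstName;
    String lastName;

    // Plays the role of the single-parameter import method described above.
    void importCruft(String fullName) {
        if (fullName == null) return;
        String trimmed = fullName.trim();
        int space = trimmed.indexOf(' ');
        if (space < 0) {
            // No space: treat the whole value as the first name.
            firstName = trimmed;
            lastName = "";
        } else {
            firstName = trimmed.substring(0, space);
            lastName = trimmed.substring(space + 1).trim();
        }
    }
}
```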

Changing Enums

Changing enum values is just a special case of transforming data. Enums are actually stored as Strings (and actually, all fields can be converted to String automatically), so you can use an @AlsoLoad method to process the data.

Let's say you wanted to delete the AQUA color and replace it with GREEN:

The {{{@AlsoLoad}}} method automatically overrides the loading of the Color field, but the Color field is what gets written on save. Note that you cannot have conflicting {{{@AlsoLoad}}} values on multiple methods.
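
The conversion itself is a small mapping from the stored String to the current enum. This is a minimal sketch with the annotations omitted; the {{{Color}}} values other than AQUA and GREEN are invented.

```java
class EnumMigrationSketch {
    // AQUA has been removed from the enum; RED is a made-up extra value.
    enum Color { RED, GREEN }

    // Maps the stored String onto the current enum, replacing the retired value.
    static Color fromStored(String stored) {
        if (stored == null) return null;
        if ("AQUA".equals(stored)) return Color.GREEN;
        return Color.valueOf(stored);
    }
}
```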

Moving Fields

Changing the structure of your entities is by far the most challenging kind of schema migration; perhaps you want to combine two entities into one, or perhaps you want to move an {{{@Embedded}}} field into a separate entity. There are many possible scenarios that require many different approaches. Your essential tools are:

  * {{{@AlsoLoad}}}, which lets you load from a variety of field names (or former field names), and lets you transform data in methods.
  * {{{@NotSaved}}}, which lets you load data into fields without saving them again.
  * {{{@PostLoad}}}, which lets you execute arbitrary code after all fields have been loaded.
  * {{{@PrePersist}}}, which lets you execute arbitrary code before your entity gets written to the datastore.

Let's say you have some embedded address fields and you want to make them into a separate Address entity. You start with:

You can take two general approaches, either of which can be appropriate depending on how you use the data. You can perform the transformation on save or on load. Here is how you do it on load:

If changing the data on load is not right for your app, you can change it on save:

If you have an especially difficult transformation, post to the objectify-appengine google group. We're happy to help.

@Embedded

Objectify supports embedded classes and collections of embedded classes. This allows you to store structured data within a single POJO entity in a way that remains queryable. With a few limitations, this can be an excellent replacement for storing JSON data.

Embedded Classes

You can nest objects to any arbitrary level.

Embedded Collections and Arrays

You can use @Embedded on collections or arrays:

Some things to keep in mind:


Indexing Embedded Classes

As with normal entities, all fields within embedded classes are indexed by default. You can control this:

  * Putting {{{@Indexed}}} or {{{@Unindexed}}} on a class (entity or embedded) will make all of its fields default to indexed or unindexed, respectively.
  * Putting {{{@Indexed}}} or {{{@Unindexed}}} on a field will make it indexed or unindexed, respectively.
  * {{{@Indexed}}} or {{{@Unindexed}}} status for nested classes and fields are generally inherited from containing fields and classes, except that:
    * {{{@Indexed}}} or {{{@Unindexed}}} on a field overrides the default of the class containing the field.
    * {{{@Indexed}}} or {{{@Unindexed}}} on a field of type {{{@Embedded}}} will override the default on the class inside the field (be it a single class or a collection).

If you persist one of these EntityWithComplicatedIndexing objects, you will find:

|| || not indexed ||
|| || indexed ||
|| || indexed ||
|| || not indexed ||

Note that is *not* indexed; the annotation on overrides 's class default.

Querying By Embedded Fields

For any indexed field, you can query like this:

Filtering works for embedded collections just as it does for normal collections:

Entity Representation

You may wish to know how @Embedded fields are persisted so that you can access them through the Low-Level API. Here is an example:

This will produce an entity that contains:

|| one.foo || "Foo Value" ||
|| one.two.bar || "Bar Value" ||

You can see why query filters work the way they do.
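
The dotted property names in the table can be reproduced by flattening a nested structure. This is a minimal sketch that uses nested maps in place of embedded objects; it illustrates the naming scheme only and is not Objectify's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EmbeddedFlattenSketch {
    // Flattens nested maps into a single map keyed by dotted paths.
    static Map<String, Object> flatten(Map<String, Object> nested) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", nested, out);
        return out;
    }

    private static void flattenInto(String prefix, Map<String, Object> nested, Map<String, Object> out) {
        for (Map.Entry<String, Object> e : nested.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                flattenInto(path, child, out);
            } else {
                out.put(path, value);
            }
        }
    }

    // Builds the structure from the table above and flattens it.
    static Map<String, Object> example() {
        Map<String, Object> two = new LinkedHashMap<>();
        two.put("bar", "Bar Value");
        Map<String, Object> one = new LinkedHashMap<>();
        one.put("foo", "Foo Value");
        one.put("two", two);
        Map<String, Object> entity = new LinkedHashMap<>();
        entity.put("one", one);
        return flatten(entity);
    }
}
```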

For @Embedded collections and arrays, the storage mechanism is more complicated:

This will produce an entity that contains:

|| ones.foo || ||
|| ones.two.bar || ||

This is what the entity would look like if the second and third values in the collection were {{{null}}}:

|| ones.foo^null || ||
|| ones.foo || ||
|| ones.two.bar || ||

The synthetic ^null property only exists if the collection contains nulls. It is never indexed.

Schema Migration

The {{{@AlsoLoad}}} annotation can be used on any field, including {{{@Embedded}}} fields. For example, this class will safely read in instances previously saved with :

{{{@AlsoLoad}}} methods work as well; however, you cannot use {{{@Embedded}}} on method parameters.

<wiki:comment>

Embedded Maps

There is one additional special behavior of {{{@Embedded}}}: If you put it on a {{{Map}}} keyed by String, this will allow you to create <a href="http://code.google.com/appengine/docs/python/datastore/datamodeling.html#The_Expando_Class" target="_blank">"expando"</a> dynamic properties. For example: ...will produce this entity structure: || stuff.foo || "fooValue" || || stuff.bar || "barValue" || If the Map field is indexed, you can filter by "stuff.foo" or "stuff.bar". Note that while the Map value can be of any type, the Map key *must* be String. </wiki:comment>

@Serialized

An alternative to {{{@Embedded}}} is to use {{{@Serialized}}}, which will let you store nearly any Java object graph.

There are some limitations:

  * All objects stored in the graph must follow Java serialization rules, including implementing {{{java.io.Serializable}}}.
  * The total size of an entity cannot exceed 1 megabyte.  If your serialized data exceeds this size, you will get an exception when you try to {{{put()}}} it.
  * You will not be able to use the field or any child fields in queries.
  * As per serialization rules, {{{transient}}} (the java keyword, not the annotation) fields will not be stored.
  * All Objectify annotations will be ignored within your serialized data structure.  This means {{{@Transient}}} fields within your serialized structure will be stored!
  * Java serialization data is opaque to the datastore viewer and other languages (ie GAE/Python).  You will only be able to retrieve your data from Java.

However, there are significant benefits to storing data this way:

  * You can store nearly any object graph - nested collections, circular object references, etc.  If Java can serialize it, you can store it.
  * Your field need not be statically typed.  Declare {{{Object}}} if you want.
  * Collections can be stored in their full state; for example, a SortedSet will remember its Comparator implementation.
  * {{{@Serialized}}} collections can be nested inside {{{@Embedded}}} collections.

You are *strongly* advised to place {{{serialVersionUID}}} on all classes that you intend to store as {{{@Serialized}}}. Without this, *any* change to your classes will prevent stored objects from being deserialized on fetch. Example:
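
A minimal sketch of such a class, using a hypothetical {{{Preferences}}} type and a plain Java serialization round trip in place of the datastore. It also shows that a {{{transient}}} field does not survive serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerialVersionSketch {
    // Hypothetical class intended for storage in a serialized field.
    static class Preferences implements Serializable {
        // Fixing the version lets later compatible class changes still deserialize.
        private static final long serialVersionUID = 1L;

        String theme = "dark";
        transient String cachedLabel = "not stored";
    }

    // Serializes to bytes and back, standing in for a put() followed by a get().
    static Preferences roundTrip(Preferences in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream input =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Preferences) input.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```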

Caching

Objectify provides two different types of caches:

  * A _session cache_ which holds entity instances inside a specific {{{Objectify}}} instance.
  * A _global cache_ which holds entity data in the appengine memcache service.

You must explicitly decide to use these caches. If you do nothing, every get() will read through to the datastore.

Session Cache

The session cache associates your entity object instances with a specific {{{Objectify}}} instance. You must explicitly enable it by passing in {{{ObjectifyOpts}}} to the {{{ObjectifyService.begin()}}} method:

Note:

  * The session cache holds your *specific entity object instances*.  If you {{{get()}}} or {{{query()}}} for the same entity, you will receive the exact same Java entity object instance.
  * The session cache is local to the {{{Objectify}}} instance.  If you {{{begin()}}} a new instance, it will have a separate cache.
  * A {{{get()}}} (batch or otherwise) operation for a cached entity will return the entity instance *without* a call to the datastore or even to the memcache (if the global cache is enabled).  The operation is a simple hashmap lookup.
  * A {{{query()}}} will return cached entity instances, however the (potentially expensive) call to the datastore will still be made.
  * The session cache is *not* thread-safe.  You should never share an {{{Objectify}}} instance between threads.
  * The session cache appears to be very similar to a JPA, JDO, or Hibernate session cache with one exception - there is no dirty change detection.  As per standard Objectify behavior, if you wish to change an entity in the datastore, you must explicitly {{{put()}}} your entity.
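
The instance-identity behavior in the notes above is essentially an identity map in front of a loader. This is a minimal, single-threaded sketch with hypothetical names, not Objectify's cache; the {{{loader}}} function stands in for the datastore.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class SessionCacheSketch<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader; // stands in for a datastore read
    int loads = 0;                       // counts how often the loader was hit

    SessionCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached instance if present; otherwise loads and remembers it.
    V get(K key) {
        V cached = cache.get(key);
        if (cached != null) return cached;
        loads++;
        V loaded = loader.apply(key);
        if (loaded != null) cache.put(key, loaded);
        return loaded;
    }
}
```

Like the session cache, this sketch is not thread-safe and does no dirty checking: changing a returned object changes the cached instance but nothing else.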

Global Cache

Objectify can cache your entity data globally in the appengine memcache service for improved read performance. This cache is shared by all running instances of your application.

The global cache is enabled by default; however, you must still annotate your entity classes with {{{@Cached}}} to make them cacheable:

That's it! Objectify will utilize the memcache service to reduce read load on the datastore.

What you should know about the global cache:

  * The fields of your entity are cached, not your POJO class itself.  Your entity objects will not be serialized (although any @Serialized fields will be).
  * Only get(), put(), and delete() interact with the cache.  query() is not cached.
  * Writes will "write through" the cache to the datastore.  Performance is only improved on read-heavy applications (which, fortunately, most are).
  * Negative results are cached as well as positive results.
  * Transactional reads bypass the cache.  Only successful commits modify the cache.
  * You can define an expiration time for each entity in the annotation: {{{@Cached(expirationSeconds=600)}}}.  By default entities will be cached until memory pressure (or an 'incident' in the datacenter) evicts them.
  * You can disable the global cache for an {{{Objectify}}} instance by creating it with the appropriate {{{ObjectifyOpts}}}.
  * The global cache can work in concert with the session cache.
    * Remember:  The session cache caches entity Java object instances, the global cache caches entity data.

*Warning*: Objectify's global cache support prior to v3.1 suffered from synchronization problems under contention. Do not use it for entities which require transactional integrity, and you are strongly advised to apply an expiration period to all cache values.

The cache in 3.1 has been rewritten from scratch to provide near-transactional consistency with the datastore. Only DeadlineExceededException should be able to produce synchronization problems.

For more commentary about the new v3.1 cache, see MemcacheStandalone.

Example

Andrew Glover wrote an excellent article for IBM developerWorks: _Twitter Mining with Objectify-Appengine_, part 1 and part 2.


Now, read the BestPractices.