Skip to content

What flex_columns is NOT Good For

Andrew Geweke edited this page Dec 11, 2013 · 4 revisions

flex_columns does not magically give you the full power of something like HBase, Riak, or MongoDB. (That's why you don't have to pay for that complexity.) Attributes are not queryable, and you can't build indexes on them.

In the broadest possible sense: if you need "big data" support, flex_columns is definitely not that. It will scale with your relational database very well if you use it properly, but it is no substitute for a dedicated "big data" system. It also will not work well if you lean on it for storing data that's critical for analytics or online, bulk operations: querying data in flex columns for many rows at once or doing analytics on it is a really bad idea.

These are all bad ideas for using flex_columns, mapped to the reasons why it's a bad idea, below:

  • Storing data that you will need to retrieve en masse (as opposed to one-by-one): avatar pictures for users, ages, the "deleted" flag for a user, and so on. (This obviously depends on your application, but, for a site that shows users profiles/photos of other users in some capacity, these are all attributes that you'd want to retrieve for many users at once.) (1)(2)
  • Storing data that you will need to query: user email address, physical address, date of birth, and so on. (Think carefully, too: just because you don't need to query on physical address now doesn't mean you won't, later — and think about analytics needs, too.) (3)
  • Storing common or "default values": imagine you allow users to set a custom background color for your site, but only 3% of users do. For the 97% of users who leave the color at the default, don't store anything in the flex column at all — you'll just increase the size of your database (and the amount of RAM required to buffer it) enormously. Instead, leave the default value in code and only store it if they override it. (4)
  • Storing large amounts of data in a CLOB column directly on a highly frequently-accessed table: this will make the table itself much larger, which means far less of it will fit in cache; you're spending valuable RAM caching flex-column data that you don't need very often. (4)

Reasons those things are a bad idea:

  1. Deserializing JSON, while fast, is still considerably more expensive than getting this data in individual columns from a database.
  2. You can't "partially retrieve" or "partially deserialize" JSON and just grab one data item from it — once you touch it, you're getting all of it.
  3. Relying on being able to index a JSON-formatted column is probably a pretty bad idea. PostgreSQL can create indices on individual fields of JSON columns using its support for JSON types; MySQL is more limited. Either way, if you need the attribute to be queryable, you're better off putting it in a normal column, not a flex column.
  4. JSON columns are considerably less efficient (in terms of number of bytes used) for storing data than native database types. This will cause your database size to grow; the additional disk used is probably a mere annoyance, but the additional RAM required to effectively cache your database can be very important.
Clone this wiki locally