The following document uses a hierarchical structure of features, from high-level descriptions of general functionality down to the subfeatures of which those general functions are composed.
- geomesa-tools (command line tools for interacting with geomesa)
- Creating a geomesa datastore for accumulo
- Pointer: CreateCommand.scala
- Behavior tested; works
- Ingest vectors, provided a `GeoMesaInputFormat`
- Pointer: IngestCommand.scala
- Predefined, common `SimpleFeatureType`s are provided: gdelt, geolife, geonames, gtd, nyctaxi, osm-gpx, tdrive, twitter
- Behavior tested; works
- Ingest rasters
- Pointer: IngestRasterCommand.scala
- Supported file formats: "tif", "tiff", "geotiff", "dt0", "dt1", "dt2"
- geomesa-convert (tools for converting various serialization formats to `SimpleFeature`s for ingest; conversion mechanisms are specified by way of configuration files)
- delimited text (usually CSV/TSV)
- Pointer: DelimitedTextConverter.scala
- Currently supported formats: "CSV" | "DEFAULT", "EXCEL", "MYSQL", "TDF" | "TSV" | "TAB", "RFC4180", "QUOTED", "QUOTE_ESCAPE", "QUOTED_WITH_QUOTE_ESCAPE". Columns are referenced as $1 through $n for n values per line ($0 refers to the entire line).
- fixed width
- Pointer: FixedWidthConverters.scala
- avro
- Pointer: geomesa-convert-avro
- json
- Pointer: geomesa-convert-json
- xml
- Pointer: geomesa-convert-xml
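The `$0` and `$1` through `$n` column references described for the delimited text converter can be illustrated with a small sketch. This is not GeoMesa code: the `resolve_refs` helper and the sample line are hypothetical, and Python's `csv` module stands in for the converter's own parser.

```python
import csv
import io
import re

def resolve_refs(line, expression, delimiter=","):
    """Replace $0..$n in `expression` with values from one delimited line.

    $0 is the entire raw line; $1..$n are the parsed columns (1-based).
    Hypothetical helper for illustration only, not part of GeoMesa.
    """
    columns = next(csv.reader(io.StringIO(line), delimiter=delimiter))
    values = [line] + columns  # index 0 -> whole line, 1..n -> columns

    def substitute(match):
        return values[int(match.group(1))]

    return re.sub(r"\$(\d+)", substitute, expression)

line = "42,2016-07-12,-77.03,38.89"
print(resolve_refs(line, "id=$1 date=$2 lon=$3 lat=$4"))
print(resolve_refs(line, "raw=$0"))
```

Keeping the raw line at index 0 lets one lookup table serve both the whole-line reference and the per-column references.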
- geomesa-stream (support for streaming input)
- A datastore which listens for updates from a source which meets certain conditions
- Pointer: StreamDataStore.scala
- [A generic apache-camel based implementation](https://github.com/locationtech/geomesa/blob/b7056fae4988ef524913bf3dc33d9ff2a3476b09/geomesa-stream/geomesa-stream-generic/src/main/scala/org/locationtech/geomesa/stream/generic/GenericSimpleFeatureStreamSourceFactory.scala)
- Hooks for updating GeoServer on stream update
- Pointer: stub pomfile
- Storm/Kafka ingest (mentioned in [Other Features](#Other Features) below)
- geomesa-compute
- Generating `RDD`s of `SimpleFeature`s
- Pointer: GeoMesaSpark.scala
- Capable of querying with CQL to fill an RDD with some subset of your data
- Carrying out spark SQL queries to process geomesa data
- Pointer: GeoMesaSparkSql.scala
- When constructing a spark context, "yarn-client" is set to be the master, which isn't always a good assumption
- As of 7/12/16, some stubbed out functions remain in the `GeoMesaDataContext`
- geomesa-jobs
- Reading data for use in a custom M/R job
- Pointer: geomesa-jobs mapreduce
- Pointer: geomesa-jobs mapred
- Apparently capable of reading from any GeoMesa `DataStore` as well as from the filesystem, with or without avro files specifying the details of the conversion.
- geomesa-process (on Accumulo-backed GeoMesa instances only, with the possible exception of `Point2Point` and `DensityProcess`, based on file locations and accumulo imports within said files. All processes are registered in https://github.com/locationtech/geomesa/blob/b7056fae4988ef524913bf3dc33d9ff2a3476b09/geomesa-process/src/main/scala/org/locationtech/geomesa/process/ProcessFactory.scala)
- computing a heatmap from a provided CQL query
- Pointer: DensityProcess.scala
- Given CQL and a description of the stats of interest, compute said stats on said CQL results
- Currently supported statistics: count, enumeration, frequency (countMinSketch), histogram, top-k, and min/max (bounds).
- Command line tools expose the following statistics: count, histogram, min/max (bounds), and top-k
- Pointer: StatsIteratorProcess.scala
- 'Tube selection' (space/time correlated queries)
- Pointer: geomesa 'tube' queries
- This is a pretty sophisticated query mechanism. The basic idea is that, given a collection of points (with associated times), you should be able to return similar collections of points (in terms of where the lines connecting said points exist). Constraints on the query include the size of the spatial and temporal buffers (this is the sense in which we're dealing with 'tubes') and maximum speed attained by the entity whose points make up a given trajectory. Read more here: http://www.geomesa.org/documentation/tutorials/geomesa-tubeselect.html
- Proximity Search
- Pointer: ProximitySearchProcess.scala
- Given a set of vectors to search through and a set of vectors to establish proximity, return the members of the former set which lie within the (specified) proximity of members of the latter set
- Query against an accumulo GeoMesa store
- Pointer: QueryProcess.scala
- Takes advantage of accumulo optimization to carry out geomesa queries
- Find the K nearest neighbors to a given point
- Pointer: KNearestNeighborSearchProcess.scala
- Identify unique values for an attribute in results of a CQL query
- Pointer: UniqueProcess.scala
- Convert points to lines
- Pointer: Point2PointProcess.scala
- Convert a collection of points into a collection of line segments given a middle term parameter. Optionally break on the day of occurrence. This feature isn't really advertised.
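As a rough illustration of the point-to-line conversion, the sketch below sorts each track's points by time and connects consecutive points into two-point segments, optionally refusing to connect points that fall on different calendar days. It assumes the points are grouped by some track identifier, which may or may not correspond to the parameter mentioned above. The function name and tuple layout are hypothetical, and nothing here is taken from Point2PointProcess.scala.

```python
from collections import defaultdict
from datetime import datetime

def points_to_segments(points, break_on_day=False):
    """Connect time-ordered points of each track into 2-point segments.

    `points` is an iterable of (track_id, datetime, x, y) tuples.
    With break_on_day=True, consecutive points on different calendar
    days are not connected. Hypothetical helper, not GeoMesa's API.
    """
    tracks = defaultdict(list)
    for track_id, when, x, y in points:
        tracks[track_id].append((when, x, y))

    segments = []
    for track_id, pts in tracks.items():
        pts.sort(key=lambda p: p[0])  # order each track by time
        for (t0, x0, y0), (t1, x1, y1) in zip(pts, pts[1:]):
            if break_on_day and t0.date() != t1.date():
                continue  # do not join points across a day boundary
            segments.append((track_id, (x0, y0), (x1, y1)))
    return segments

pts = [
    ("a", datetime(2016, 7, 12, 23, 0), 0.0, 0.0),
    ("a", datetime(2016, 7, 12, 23, 30), 1.0, 0.0),
    ("a", datetime(2016, 7, 13, 0, 15), 2.0, 0.0),
]
print(len(points_to_segments(pts)))
print(len(points_to_segments(pts, break_on_day=True)))
```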
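The 'tube' idea described above can be sketched in a few lines. This is a deliberately simplified sketch, assuming planar coordinates, no interpolation between track points, and no maximum-speed constraint: a candidate is kept when it lies within both the spatial and the temporal buffer of at least one track point. The function and parameter names are hypothetical and are not GeoMesa's.

```python
from math import hypot

def tube_select(track, candidates, space_buffer, time_buffer):
    """Return candidates inside the space/time 'tube' around `track`.

    `track` and `candidates` are lists of (x, y, t), with t in seconds.
    Illustrative only; a real tube query would also interpolate between
    track points and could bound the speed of the tracked entity.
    """
    selected = []
    for cx, cy, ct in candidates:
        for tx, ty, tt in track:
            near_in_space = hypot(cx - tx, cy - ty) <= space_buffer
            near_in_time = abs(ct - tt) <= time_buffer
            if near_in_space and near_in_time:
                selected.append((cx, cy, ct))
                break  # one matching track point is enough
    return selected

track = [(0.0, 0.0, 0), (1.0, 0.0, 60), (2.0, 0.0, 120)]
candidates = [
    (1.0, 0.1, 70),    # close in space and time
    (1.0, 0.1, 4000),  # close in space, far in time
    (5.0, 5.0, 60),    # close in time, far in space
]
print(tube_select(track, candidates, space_buffer=0.5, time_buffer=30))
```

Only the first candidate satisfies both buffers at once, which is what distinguishes a tube query from a purely spatial proximity search.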
- Default Indices
- XZ3 (GeoMesa 1.2.5+)
- Pointer: XZ3IdxStrategy.scala
- Notes: Default for objects with extent in GeoMesa 1.2.5. Objects are indexed with a maximum resolution of 36 bits (12 divisions into eighths).
- XZ2 (GeoMesa 1.2.5+)
- Pointer: XZ2IdxStrategy.scala
- Notes: Default for objects with extent in GeoMesa 1.2.5. Objects are indexed with a maximum resolution of 24 bits (12 divisions into quarters).
- Z3
- Pointer: Z3IdxStrategy.scala
- Notes: For points, X, Y, and Time have resolutions of 21, 21, and 20 bits, respectively.
- Z2
- Pointer: Z2IdxStrategy.scala
- Notes: For points, X and Y both have resolutions of 31 bits.
- Record
- Pointer: RecordIdxStrategy.scala
- Notes: This is an index over object UUIDs.
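To make the bit resolutions above concrete, the sketch below builds a Z2-style key by scaling longitude and latitude to 31-bit integers and interleaving their bits. It illustrates the general Z-order (Morton) technique under an assumed normalization and bit order; it is not GeoMesa's implementation, and `normalize`, `interleave`, and `z2_encode` are hypothetical names.

```python
BITS = 31  # per-dimension resolution quoted for Z2 above

def normalize(value, lo, hi, bits=BITS):
    """Scale value in [lo, hi] to an integer in [0, 2**bits - 1]."""
    max_index = (1 << bits) - 1
    scaled = int((value - lo) / (hi - lo) * (1 << bits))
    return min(max(scaled, 0), max_index)  # clamp the upper edge into range

def interleave(x, y, bits=BITS):
    """Interleave bits: x in even positions, y in odd positions."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def z2_encode(lon, lat):
    """Assumed layout: longitude on even bits, latitude on odd bits."""
    return interleave(normalize(lon, -180.0, 180.0),
                      normalize(lat, -90.0, 90.0))

print(z2_encode(-180.0, -90.0))              # 0
print(z2_encode(180.0, 90.0).bit_length())   # 62
```

The same arithmetic explains the other figures in the list: Z2 uses 31 + 31 = 62 bits, Z3 uses 21 + 21 + 20 = 62 bits, XZ3 uses 12 levels of 3 bits (36 bits), and XZ2 uses 12 levels of 2 bits (24 bits).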
- Optional Indices
- Attribute
- Pointer: AttributeIdxStrategy.scala
- Notes: This is an index over SimpleFeature attributes. One can create a join index over the UUID, date, and geometry or a full index.
- ST
- Pointer: STIdxStrategy.scala
- Notes: Spatio-Temporal Index? Deprecated?
- Cost-Based Optimization (CBO) is used to select which index to use
- geomesa-accumulo
- A reader for directly querying a datastore in java/scala
- Pointer: .getFeatureReader
- This is the best bet for high speed accumulo reads, per the GeoMesa gitter.
- Produce a collection of features for a given datastore
- Pointer: .getFeatureSource
- Performance characteristics vs the above reader are unclear. This feature is used, however, in the command line export.
- Direct map/reduce exports
- Pointer: Map/Reduce Export
- geomesa-tools (command line tools for interacting with geomesa)
- Serialize and export stored features (vectors)
- Pointer: ExportCommand.scala
- Supported export formats: CSV, shapefile, geojson, GML, BIN, Avro
- GeoMesa Native API
- An alternative to the geotools interface for interaction with GeoMesa stores
- Pointer: geomesa-native-api
- HBase backend
- Pointer: geomesa-hbase-datastore
- Google BigTable backend
- Pointer: geomesa-bigtable-datastore
- BLOB backend
- Pointer: geomesa-blobstore
- Sampling of data for custom statistics
- Example of sampling query
- geomesa-cassandra
- Back a geomesa datastore with cassandra
- cassandra datastore
- Docs describe this feature as 'alpha' quality currently
- geomesa-kafka
- Use kafka backed geomesa datastore to pipe simplefeature types from producers, through kafka, to consumers
- Details can be found here
- Metrics reporting
- Pointer: geomesa-metrics
- Real time reporting of performance for GeoMesa instances. Supports multiple reporting backends - Ganglia, Graphite, and CSV/TSV