This document lists features hierarchically, from high-level descriptions of general functionality down to the subfeatures that make up each function.

Data Ingest/Input

geomesa-tools (command line tools for interacting with geomesa)
- Creating a geomesa datastore for accumulo
  - Pointer: CreateCommand.scala
  - Behavior tested; works
- Ingest vectors, provided a GeoMesaInputFormat
  - Pointer: IngestCommand.scala
  - Predefined, common SimpleFeatureTypes are provided - gdelt, geolife, geonames, gtd, nyctaxi, osm-gpx, tdrive, twitter
  - Behavior tested; works
- Ingest rasters
  - Pointer: IngestRasterCommand.scala
  - Supported file formats: "tif", "tiff", "geotiff", "dt0", "dt1", "dt2"
geomesa-convert (tools for converting various serialization formats to SimpleFeatures for ingest; conversions are specified in configuration files)
- delimited text (usually CSV/TSV)
  - Pointer: DelimitedTextConverter.scala
  - Currently supported formats: "CSV" | "DEFAULT", "EXCEL", "MYSQL", "TDF" | "TSV" | "TAB", "RFC4180", "QUOTED", "QUOTE_ESCAPE", "QUOTED_WITH_QUOTE_ESCAPE". Columns are referenced as $1 through $n for the n values on a line ($0 refers to the entire line).
- fixed width
  - Pointer: FixedWidthConverters.scala
- avro
  - Pointer: geomesa-convert-avro
- json
  - Pointer: geomesa-convert-json
- xml
  - Pointer: geomesa-convert-xml
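
The `$0` through `$n` column references noted for delimited text can be illustrated with a small sketch. It is written in Python rather than GeoMesa's Scala, and `resolve_refs` is a hypothetical helper invented for illustration, not part of geomesa-convert. It only shows how 1-based column references could map onto one parsed line.

```python
import csv
import io
import re

def resolve_refs(line, expression, delimiter=","):
    """Replace $0..$n in `expression` with values taken from one delimited line.

    Hypothetical helper: $0 is the whole line, $1..$n are the 1-based columns.
    """
    # Parse the single line with the standard csv module so quoting is honored.
    columns = next(csv.reader(io.StringIO(line), delimiter=delimiter))

    def lookup(match):
        index = int(match.group(1))
        return line if index == 0 else columns[index - 1]

    return re.sub(r"\$(\d+)", lookup, expression)

line = "42,2016-07-12,-77.03,38.89"
print(resolve_refs(line, "POINT($3 $4)"))  # prints POINT(-77.03 38.89)
print(resolve_refs(line, "$0"))            # prints the entire line
```

In a real converter configuration the references appear inside transform expressions; the sketch above only demonstrates the indexing convention.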
geomesa-stream (support for streaming input)
- A datastore which listens for updates from a source which meets certain conditions
  - Pointer: StreamDataStore.scala
  - [A generic apache-camel based implementation](https://github.com/locationtech/geomesa/blob/b7056fae4988ef524913bf3dc33d9ff2a3476b09/geomesa-stream/geomesa-stream-generic/src/main/scala/org/locationtech/geomesa/stream/generic/GenericSimpleFeatureStreamSourceFactory.scala)
- Hooks for updating GeoServer on stream update
  - Pointer: stub pomfile
Storm/Kafka ingest (mentioned in [Other Features](#other-features) below)

Data Processing

geomesa-compute
- Generating RDDs of SimpleFeatures
  - Pointer: GeoMesaSpark.scala
  - Capable of querying with CQL to fill an RDD with some subset of your data
  - Carrying out Spark SQL queries to process GeoMesa data
    - Pointer: GeoMesaSparkSql.scala
    - When the Spark context is constructed, the master is set to "yarn-client", which is not always a valid assumption
    - As of 7/12/16, some stubbed out functions remain in the GeoMesaDataContext
geomesa-jobs
- Reading data for use in a custom M/R job
  - Pointer: geomesa-jobs mapreduce
  - Pointer: geomesa-jobs mapred
  - Apparently capable of reading from any GeoMesa DataStore as well as from the filesystem with or without avro files specifying the details of the conversion.
geomesa-process (Accumulo-backed GeoMesa instances only, with the possible exception of Point2Point and DensityProcess, judging by file locations and the Accumulo imports within those files. All processes are registered in https://github.com/locationtech/geomesa/blob/b7056fae4988ef524913bf3dc33d9ff2a3476b09/geomesa-process/src/main/scala/org/locationtech/geomesa/process/ProcessFactory.scala)
- computing a heatmap from a provided CQL query
  - Pointer: DensityProcess.scala
- Given a CQL query and a description of the stats of interest, compute those stats over the query results
  - Currently supported statistics: count, enumeration, frequency (countMinSketch), histogram, top-k, and min/max (bounds).
  - Command line tools expose the following statistics: count, histogram, min/max (bounds), and top-k
  - Pointer: StatsIteratorProcess.scala
- 'Tube selection' (space/time correlated queries)
  - Pointer: geomesa 'tube' queries
  - This is a fairly sophisticated query mechanism. Given a collection of points with associated times, it returns similar collections of points, where similarity is judged by where the lines connecting the points lie. Query constraints include the sizes of the spatial and temporal buffers (this is the sense in which we're dealing with 'tubes') and the maximum speed attained by the entity whose points make up a given trajectory. Read more here: http://www.geomesa.org/documentation/tutorials/geomesa-tubeselect.html
- Proximity Search
  - Pointer: ProximitySearchProcess.scala
  - Given a set of vectors to search through and a set of vectors to establish proximity, return the members of the former set which lie within the (specified) proximity of members of the latter set
- Query against an accumulo GeoMesa store
  - Pointer: QueryProcess.scala
  - Takes advantage of accumulo optimization to carry out geomesa queries
- Find the K nearest neighbors to a given point
  - Pointer: KNearestNeighborSearchProcess.scala
- Identify unique values for an attribute in results of a CQL query
  - Pointer: UniqueProcess.scala
- Convert points to lines
  - Pointer: Point2PointProcess.scala
  - Convert a collection of points into a collection of line segments given a middle term parameter. Optionally break on the day of occurrence. This feature isn't really advertised.
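
The 'tube' idea described under tube selection can be sketched in a few lines. This is a simplified illustration in Python, not GeoMesa's algorithm: it assumes planar coordinates, ignores the maximum-speed constraint, and the names `point_segment_distance` and `in_tube` are hypothetical. A candidate point is accepted when it lies within the spatial buffer of some track segment and within the temporal buffer of that segment's time span.

```python
import math

def point_segment_distance(px, py, x1, y1, x2, y2):
    """Planar distance from point (px, py) to the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - x1, py - y1)
    # Project the point onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / length_sq))
    return math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))

def in_tube(track, candidate, spatial_buffer, time_buffer):
    """True if candidate (x, y, t) falls inside the buffered track ('tube')."""
    cx, cy, ct = candidate
    for (x1, y1, t1), (x2, y2, t2) in zip(track, track[1:]):
        # Temporal check: candidate time within the segment's buffered span.
        if not (min(t1, t2) - time_buffer <= ct <= max(t1, t2) + time_buffer):
            continue
        # Spatial check: candidate within the buffer distance of the segment.
        if point_segment_distance(cx, cy, x1, y1, x2, y2) <= spatial_buffer:
            return True
    return False

# Track of (x, y, t) points; units are arbitrary for this sketch.
track = [(0, 0, 0), (10, 0, 100), (10, 10, 200)]
print(in_tube(track, (5, 1, 50), spatial_buffer=2, time_buffer=10))   # True
print(in_tube(track, (5, 1, 500), spatial_buffer=2, time_buffer=10))  # False
```

The second candidate is spatially close to the track but far outside its time span, which is what distinguishes a tube query from a plain spatial buffer.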

Indices

Default Indices
- XZ3 (GeoMesa 1.2.5+)
  - Pointer: XZ3IdxStrategy.scala
  - Notes: Default for objects with extent in GeoMesa 1.2.5. Objects are indexed with a maximum resolution of 36 bits (12 divisions into eighths).
- XZ2 (GeoMesa 1.2.5+)
  - Pointer: XZ2IdxStrategy.scala
  - Notes: Default for objects with extent in GeoMesa 1.2.5. Objects are indexed with a maximum resolution of 24 bits (12 divisions into quarters).
- Z3
  - Pointer: Z3IdxStrategy.scala
  - Notes: For points, X, Y, and Time have resolutions of 21, 21, and 20 bits, respectively.
- Z2
  - Pointer: Z2IdxStrategy.scala
  - Notes: For points, X and Y both have resolutions of 31 bits.
- Record
  - Pointer: RecordIdxStrategy.scala
  - Notes: This is an index over object UUIDs.
Optional Indices
- Attribute
  - Pointer: AttributeIdxStrategy.scala
  - Notes: This is an index over SimpleFeature attributes. One can create a join index over the UUID, date, and geometry or a full index.
- ST
  - Pointer: STIdxStrategy.scala
  - Notes: Spatio-Temporal Index? Deprecated?
Cost-Based Optimization (CBO) is used to select which index to use
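
To make the Z2 resolution figures above concrete, the following sketch builds a Z-order (Morton) key from a longitude/latitude pair with 31 bits per dimension. It is a generic bit-interleaving illustration in Python, not GeoMesa's implementation; the function names and the choice of which dimension occupies the even bit positions are assumptions.

```python
BITS = 31  # per-dimension resolution quoted for Z2 above

def normalize(value, lo, hi, bits=BITS):
    """Map a value in [lo, hi] to an integer in [0, 2**bits - 1]."""
    max_index = (1 << bits) - 1
    scaled = int((value - lo) / (hi - lo) * (1 << bits))
    return min(max(scaled, 0), max_index)

def interleave(x, y, bits=BITS):
    """Interleave the bits of x and y (x in even positions, y in odd)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def z2(lon, lat):
    """Z-order key for a point: 31 bits each for longitude and latitude."""
    return interleave(normalize(lon, -180.0, 180.0), normalize(lat, -90.0, 90.0))

print(interleave(0b11, 0b00, bits=2))  # prints 5 (binary 0101)
print(z2(-180.0, -90.0))               # prints 0
print(z2(180.0, 90.0) == 2**62 - 1)    # prints True
```

Two 31-bit dimensions yield a 62-bit key. The XZ figures follow the same arithmetic: each division into eighths consumes 3 bits (12 x 3 = 36), and each division into quarters consumes 2 bits (12 x 2 = 24).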

Output

geomesa-accumulo
- A reader for directly querying a datastore in java/scala
  - Pointer: .getFeatureReader
  - This is the best bet for high speed accumulo reads, per the GeoMesa gitter.
- Produce a collection of features for a given datastore
  - Pointer: .getFeatureSource
  - Performance characteristics vs the above reader are unclear. This feature is used, however, in the command line export
- Direct map/reduce exports
  - Pointer: Map/Reduce Export
geomesa-tools (command line tools for interacting with geomesa)
- Serialize and export stored features (vectors)
  - Pointer: ExportCommand.scala
  - Supported export formats: CSV, shapefile, geojson, GML, BIN, Avro

Other Features

GeoMesa Native API
- An alternative to the geotools interface for interaction with GeoMesa stores
- Pointer: geomesa-native-api
HBase backend
- Pointer: geomesa-hbase-datastore
Google BigTable backend
- Pointer: geomesa-bigtable-datastore
BLOB backend
- Pointer: geomesa-blobstore
Sampling of data for custom statistics
- Example of sampling query
geomesa-cassandra
- Back a geomesa datastore with cassandra
  - cassandra datastore
  - Docs describe this feature as 'alpha' quality currently
geomesa-kafka
- Use a Kafka-backed GeoMesa datastore to pipe SimpleFeature types from producers, through Kafka, to consumers
- Details can be found here
Metrics reporting
- Pointer: geomesa-metrics
- Real-time reporting of performance for GeoMesa instances. Supports multiple reporting backends: Ganglia, Graphite, and CSV/TSV
