GeoMesa Feature List

File metadata and controls

149 lines (139 loc) · 17.4 KB

This document organizes features hierarchically, from high-level descriptions of general functionality down to the subfeatures that make up each of those functions.

Data Ingest/Input

Data Processing

  • geomesa-compute
    • Generating RDDs of SimpleFeatures
      • Pointer: GeoMesaSpark.scala
      • Can query with CQL to fill an RDD with a subset of your data
      • Running Spark SQL queries to process GeoMesa data
        • Pointer: GeoMesaSparkSql.scala
        • When constructing a SparkContext, the master is set to "yarn-client", which is not always a safe assumption
        • As of 7/12/16, some stubbed-out functions remain in the GeoMesaDataContext
  • geomesa-jobs
    • Reading data for use in a custom M/R job
      • Pointer: geomesa-jobs mapreduce
      • Pointer: geomesa-jobs mapred
      • Apparently capable of reading from any GeoMesa DataStore, as well as from the filesystem, with or without Avro files that specify the conversion details.
  • geomesa-process (Accumulo-backed GeoMesa instances only, with the possible exception of Point2Point and DensityProcess, judging by the file locations and the Accumulo imports in those files; all processes are registered in https://github.com/locationtech/geomesa/blob/b7056fae4988ef524913bf3dc33d9ff2a3476b09/geomesa-process/src/main/scala/org/locationtech/geomesa/process/ProcessFactory.scala)
    • Compute a heatmap from the results of a provided CQL query
    • Given a CQL filter and a description of the statistics of interest, compute those statistics over the matching features
      • Currently supported statistics: count, enumeration, frequency (countMinSketch), histogram, top-k, and min/max (bounds)
      • The command-line tools expose the following statistics: count, histogram, min/max (bounds), and top-k
      • Pointer: StatsIteratorProcess.scala
    • 'Tube selection' (space/time correlated queries)
      • Pointer: GeoMesa 'tube' queries
      • A fairly sophisticated query mechanism. The basic idea is that, given a collection of points with associated times, the query returns similar collections of points, judged by where the lines connecting those points lie. Query constraints include the size of the spatial and temporal buffers (the sense in which these are 'tubes') and the maximum speed attained by the entity whose points make up a given trajectory. Read more here: http://www.geomesa.org/documentation/tutorials/geomesa-tubeselect.html
    • Proximity Search
      • Pointer: ProximitySearchProcess.scala
      • Given a set of vectors to search through and a set of vectors that define proximity, return the members of the first set that lie within a specified distance of members of the second set
    • Query against an Accumulo-backed GeoMesa store
      • Pointer: QueryProcess.scala
      • Takes advantage of Accumulo-side optimizations to carry out GeoMesa queries
    • Find the K nearest neighbors to a given point
    • Identify unique values for an attribute in results of a CQL query
    • Convert points to lines
      • Pointer: Point2PointProcess.scala
      • Convert a collection of points into a collection of line segments, given a middle term parameter, optionally breaking lines on the day of occurrence. This feature is not prominently advertised.
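As a rough illustration of what the heatmap computation involves, the sketch below bins point features into a fixed grid over a bounding box and counts the hits in each cell. It is not GeoMesa's DensityProcess; the function name, the (minx, miny, maxx, maxy) bounding-box layout, and the unweighted counts are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def density_grid(points: Iterable[Tuple[float, float]],
                 bbox: Tuple[float, float, float, float],
                 width: int, height: int) -> List[List[int]]:
    """Count points per cell of a width x height grid covering bbox (minx, miny, maxx, maxy).

    Hypothetical helper; row 0 is the southernmost (min y) row of cells.
    """
    minx, miny, maxx, maxy = bbox
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        # Drop points outside the bounding box, as a spatial filter would.
        if not (minx <= x <= maxx and miny <= y <= maxy):
            continue
        # Map the coordinate to a cell index; points on the max edge go in the last cell.
        col = min(int((x - minx) / (maxx - minx) * width), width - 1)
        row = min(int((y - miny) / (maxy - miny) * height), height - 1)
        grid[row][col] += 1
    return grid
```

A real heatmap would typically also spread each point's weight over neighboring cells (a kernel) before rendering; plain counting keeps the sketch short.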
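The frequency statistic is listed above as a countMinSketch. For readers unfamiliar with the structure, here is a minimal, generic Count-Min sketch; it is not GeoMesa's own stat class, and the width, depth, and salted-blake2b hashing are illustrative choices.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counts in fixed memory; estimates can overcount but never undercount."""

    def __init__(self, width: int = 1024, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Derive one hash function per row by salting blake2b with the row number (16-byte salt).
        digest = hashlib.blake2b(item.encode("utf-8"), salt=row.to_bytes(16, "little")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions only ever inflate counters, so the minimum across rows is the best estimate.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))
```

The design trade-off is memory versus accuracy: a wider table reduces collisions, and more rows reduce the chance that every row collides for the same item.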
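To make the 'tube' idea concrete, the sketch below tests whether a candidate point falls inside a space-time buffer around a track: it interpolates the track's position at the candidate's time, allows a temporal buffer before and after the track, and compares the planar distance against a spatial buffer. This is a simplified planar approximation with no great-circle distances and no maximum-speed constraint, and every name in it is hypothetical; see the GeoMesa tube-select tutorial for the actual behavior.

```python
import math
from bisect import bisect_left
from typing import List, Tuple

Track = List[Tuple[float, float, float]]  # hypothetical layout: (t, x, y), sorted by time t


def position_at(track: Track, t: float) -> Tuple[float, float]:
    """Linearly interpolate the track position at time t, clamped to the track's endpoints."""
    times = [p[0] for p in track]
    if t <= times[0]:
        return track[0][1], track[0][2]
    if t >= times[-1]:
        return track[-1][1], track[-1][2]
    i = bisect_left(times, t)  # first index with times[i] >= t, so times[i - 1] < t
    (t0, x0, y0), (t1, x1, y1) = track[i - 1], track[i]
    f = (t - t0) / (t1 - t0)
    return x0 + f * (x1 - x0), y0 + f * (y1 - y0)


def in_tube(track: Track, point: Tuple[float, float, float],
            spatial_buffer: float, temporal_buffer: float) -> bool:
    """True if point (t, x, y) lies within the space-time tube around the track."""
    t, x, y = point
    # Temporal buffer: the tube extends temporal_buffer before the first and after the last fix.
    if t < track[0][0] - temporal_buffer or t > track[-1][0] + temporal_buffer:
        return False
    px, py = position_at(track, t)
    # Spatial buffer: planar distance to the interpolated track position.
    return math.hypot(x - px, y - py) <= spatial_buffer
```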
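Proximity search reduces to keeping each candidate whose distance to any reference feature is within the threshold. The brute-force planar sketch below shows that logic; ProximitySearchProcess works on GeoTools feature collections and geometries, which this sketch replaces with hypothetical (id, x, y) tuples.

```python
import math
from typing import Iterable, List, Tuple

Feature = Tuple[str, float, float]  # hypothetical layout: (id, x, y)


def proximity_search(candidates: Iterable[Feature], references: List[Feature],
                     distance: float) -> List[str]:
    """Return ids of candidates within `distance` of at least one reference feature."""
    hits = []
    for fid, x, y in candidates:
        # any() stops at the first reference that is close enough.
        if any(math.hypot(x - rx, y - ry) <= distance for _, rx, ry in references):
            hits.append(fid)
    return hits
```

This is O(candidates x references); a data store would instead use the reference geometries, buffered by the distance, as a spatial filter so the index prunes most candidates.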
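The result a K-nearest-neighbor query should produce can be shown with a brute-force sketch that keeps the k closest features using a heap. It scans every feature and is not how GeoMesa's process searches; names and the tuple layout are hypothetical.

```python
import heapq
import math
from typing import Iterable, List, Tuple


def k_nearest(point: Tuple[float, float],
              features: Iterable[Tuple[str, float, float]], k: int) -> List[str]:
    """Return the ids of the k features (id, x, y) closest to `point`, nearest first."""
    px, py = point
    # nsmallest keeps only k items in its heap and returns them sorted by the key.
    nearest = heapq.nsmallest(k, features, key=lambda f: math.hypot(f[1] - px, f[2] - py))
    return [fid for fid, _, _ in nearest]
```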
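The points-to-lines conversion can be pictured as: group points by track, sort each group by time, and join consecutive points into segments, optionally refusing to connect points that fall on different days. This is a sketch under assumptions, not Point2PointProcess itself; the track-id grouping key, the tuple layout, and the function name are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

Point = Tuple[str, datetime, float, float]  # hypothetical layout: (track_id, time, x, y)
Segment = Tuple[Tuple[float, float], Tuple[float, float]]


def points_to_segments(points: Iterable[Point],
                       break_on_day: bool = False) -> Dict[str, List[Segment]]:
    """Join consecutive points of each track, ordered by time, into line segments."""
    tracks = defaultdict(list)
    for track_id, time, x, y in points:
        tracks[track_id].append((time, x, y))
    result = {}
    for track_id, pts in tracks.items():
        pts.sort(key=lambda p: p[0])
        segments = []
        for (t0, x0, y0), (t1, x1, y1) in zip(pts, pts[1:]):
            # With break_on_day, skip any segment that would cross a calendar-day boundary.
            if break_on_day and t0.date() != t1.date():
                continue
            segments.append(((x0, y0), (x1, y1)))
        result[track_id] = segments
    return result
```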

Indices

Output

  • geomesa-accumulo
    • A reader for directly querying a data store from Java/Scala
      • Pointer: .getFeatureReader
      • Per the GeoMesa Gitter channel, this is the best option for high-speed Accumulo reads.
    • Produce a collection of features from a given data store
      • Pointer: .getFeatureSource
      • Its performance relative to the reader above is unclear; it is, however, what the command-line export uses.
    • Direct MapReduce exports
  • geomesa-tools (command-line tools for interacting with GeoMesa)
    • Serialize and export stored features (vectors)
      • Pointer: ExportCommand.scala
      • Supported export formats: CSV, shapefile, GeoJSON, GML, BIN, Avro
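To show what serializing stored features into one of the listed formats involves, here is a minimal standard-library sketch that writes point features as a GeoJSON FeatureCollection. The (id, lon, lat, properties) input layout is hypothetical; ExportCommand works on GeoTools SimpleFeatures and handles more geometry types.

```python
import json
from typing import Dict, Iterable, Tuple


def to_geojson(features: Iterable[Tuple[str, float, float, Dict]]) -> str:
    """Serialize (id, lon, lat, properties) tuples as a GeoJSON FeatureCollection string."""
    collection = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "id": fid,
                # GeoJSON (RFC 7946) orders positions as [longitude, latitude].
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": props,
            }
            for fid, lon, lat, props in features
        ],
    }
    return json.dumps(collection)
```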

Other Features

  • GeoMesa Native API
    • An alternative to the GeoTools interface for interacting with GeoMesa stores
    • Pointer: geomesa-native-api
  • HBase backend
  • Google BigTable backend
  • BLOB backend
  • Sampling of data for custom statistics
  • geomesa-cassandra
    • Back a GeoMesa data store with Cassandra
  • geomesa-kafka
    • Use a Kafka-backed GeoMesa data store to pipe SimpleFeatures from producers, through Kafka, to consumers
    • Details can be found here
  • Metrics reporting
    • Pointer: geomesa-metrics
    • Real-time performance reporting for GeoMesa instances. Supports multiple reporting backends: Ganglia, Graphite, and CSV/TSV
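The CSV/TSV backend boils down to periodically writing metric snapshots as delimited rows. The sketch below illustrates that idea with the standard library and a hypothetical timer-style metric; it does not reproduce the reporter classes geomesa-metrics actually uses, and the class name and column set are assumptions.

```python
import csv
import io
import time
from typing import Dict, List


class DelimitedReporter:
    """Collect timing samples per metric and render a snapshot as CSV or TSV text."""

    def __init__(self, delimiter: str = ","):
        self.delimiter = delimiter
        self.samples: Dict[str, List[float]] = {}

    def record(self, name: str, millis: float) -> None:
        self.samples.setdefault(name, []).append(millis)

    def report(self) -> str:
        buf = io.StringIO()
        writer = csv.writer(buf, delimiter=self.delimiter, lineterminator="\n")
        writer.writerow(["timestamp", "metric", "count", "mean_ms", "max_ms"])
        now = int(time.time())
        # One row per metric: sample count, mean, and maximum since recording began.
        for name, values in sorted(self.samples.items()):
            writer.writerow([now, name, len(values), sum(values) / len(values), max(values)])
        return buf.getvalue()
```

A scheduler (for example, a timer thread) would call `report()` at a fixed interval and append the output to a file; that loop is left out to keep the sketch small.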