ObjectID
Marrow Mongo contains an ObjectID implementation independent from the bson package bundled with PyMongo, developed in "clean-room" isolation based on publicly available end-use documentation.
Implementations are provided of all known ObjectId generation methods and interpretations, primarily as a mechanism to use or transition older IDs on modern systems, and as an option going forward if you prefer the guarantees and information provided by older versions. Additionally, our variant permits explicit "hardware identification" through custom, fixed byte strings.
Being Python 3 specific, we are stricter about the type of string being passed. Where PyMongo's bson.ObjectId permits hex-encoded bytes values, our ObjectID does not: binary values will only be interpreted as a raw binary ObjectID; no transformations will be applied. If you have IDs encoded in hexadecimal, pass them as text (str) rather than bytes.
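If an existing code base holds hex-encoded IDs as bytes (which PyMongo tolerates), they can be decoded to text before being handed over. The following is a minimal sketch using only the standard library; the helper name `as_text_id` is hypothetical and not part of Marrow Mongo.

```python
def as_text_id(value):
    """Normalize a hex-encoded bytes ID to str; pass anything else through.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, bytes) and len(value) == 24:
        # 24 bytes of ASCII hex digits: decode to the textual representation.
        # (UnicodeDecodeError is a ValueError subclass.)
        text = value.decode("ascii")
        # bytes.fromhex rejects non-hex digits; the length check also rejects
        # embedded whitespace, which fromhex would otherwise skip.
        if len(bytes.fromhex(text)) != 12:
            raise ValueError("expected 24 hexadecimal characters")
        return text
    # 12-byte raw values (and str values) are left untouched.
    return value
```

A 12-byte value is deliberately left alone here, since it would be treated as a raw binary ObjectID.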
ObjectId was originally[1] defined (prior to MongoDB 3.3) as a combination of:
- 4-byte UNIX timestamp.
- 3-byte machine identifier.
- 2-byte process ID.
- 3-byte counter with random IV ("initialization vector", or starting point) on process start.
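The original 4/3/2/3 layout can be sketched with the standard `struct` module. This is an illustration rather than Marrow Mongo's implementation: the function name `pack_legacy` is hypothetical, and the big-endian packing of the process ID is an assumption.

```python
import struct


def pack_legacy(timestamp, machine, pid, counter):
    """Assemble a 12-byte ID using the original 4/3/2/3 layout (sketch)."""
    if len(machine) != 3:
        raise ValueError("machine identifier must be exactly 3 bytes")
    return (
        struct.pack(">I", timestamp)                   # 4-byte UNIX timestamp
        + machine                                      # 3-byte machine identifier
        + struct.pack(">H", pid & 0xFFFF)              # 2-byte process ID (assumed big-endian)
        + struct.pack(">I", counter & 0xFFFFFF)[1:]    # low 3 bytes of the counter
    )
```

For example, `pack_legacy(0x5F1D7F3E, b"abc", 0x1234, 0x010203).hex()` yields `5f1d7f3e6162631234010203`, with each field visible in order.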
The server itself never had a complex interpretation, treating the data after the timestamp as an "arbitrary hardware/node identifier" followed by a counter. The documentation and client drivers were brought more in line with this intended lack of structure[2], replacing the hardware and process identifiers with literal random data initialized on process startup. As such, the modern structure is now composed of:
- 4-byte UNIX timestamp.
- 5-byte random process identifier. ("Random value" in the docs.)
- 3-byte counter with random IV on process start.
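The modern 4/5/3 layout might be generated and taken apart roughly as follows. The class and function names (`ModernGenerator`, `split_modern`) are hypothetical, and the optional `now` parameter exists only to make the sketch easy to exercise; none of this is Marrow Mongo's actual API.

```python
import os
import struct
import threading
import time


class ModernGenerator:
    """Sketch of a generator for the 4/5/3 layout (illustrative only)."""

    def __init__(self):
        self._random = os.urandom(5)                            # fixed per process
        self._counter = int.from_bytes(os.urandom(3), "big")    # random IV
        self._lock = threading.Lock()

    def next(self, now=None):
        timestamp = int(time.time()) if now is None else int(now)
        with self._lock:
            count = self._counter
            self._counter = (count + 1) & 0xFFFFFF               # wrap at 3 bytes
        return struct.pack(">I", timestamp) + self._random + count.to_bytes(3, "big")


def split_modern(oid):
    """Return (timestamp, random bytes, counter) from a 12-byte value."""
    if len(oid) != 12:
        raise ValueError("expected exactly 12 bytes")
    return int.from_bytes(oid[:4], "big"), oid[4:9], int.from_bytes(oid[9:], "big")
```

Two IDs produced by the same generator share the 5-byte random value and differ by one in the counter (modulo wrap-around), which is all the structure the modern definition promises.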
Additionally, the mechanism used to determine the hardware identifier has changed in the past. Initially it used a substring of the hex-encoded result of MD5 hashing the value returned by gethostname(). For Federal Information Processing Standard (FIPS)[3] compliance, use of MD5 was eliminated and a custom FNV implementation added. We avoid embedding yet another hashing implementation in our own code and instead use the fnv package, if installed. (This will be automatically installed if your own application depends upon marrow.mongo[fips].) Without the library installed, the fips choice will not be available.
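To make the two historical approaches concrete, here is a rough standalone sketch of each. Several details are assumptions: which part of the MD5 hex digest is taken (the first six characters here), the use of 32-bit FNV-1a, the XOR-fold down to 24 bits, and the byte order of the result. The FNV routine is written inline purely for illustration; Marrow Mongo itself relies on the fnv package instead. Function names are hypothetical.

```python
import hashlib


def md5_machine_id(hostname):
    """3 bytes taken from the hex-encoded MD5 digest of the host name.

    Assumption: the first six hex characters are used. On a FIPS-enforcing
    system hashlib.md5 may be unavailable and raise ValueError.
    """
    digest = hashlib.md5(hostname.encode("utf-8")).hexdigest()
    return bytes.fromhex(digest[:6])


def fnv1a_24(data):
    """32-bit FNV-1a over bytes, XOR-folded to 24 bits (assumed reduction)."""
    value = 0x811C9DC5                                # FNV-1a 32-bit offset basis
    for byte in data:
        value ^= byte
        value = (value * 0x01000193) & 0xFFFFFFFF     # FNV 32-bit prime
    return (value >> 24) ^ (value & 0xFFFFFF)


def fnv_machine_id(hostname):
    """3-byte identifier from the folded hash (big-endian is an assumption)."""
    return fnv1a_24(hostname.encode("utf-8")).to_bytes(3, "big")
```

In real use the host name would come from `socket.gethostname()`; it is a parameter here so the functions stay deterministic.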
To determine which approach is used for generation, specify the hwid keyword argument to the ObjectID() constructor. Possibilities include:
- The string "legacy": use the host name MD5 substring value and process ID. Note that if FIPS compliance is enabled, the "md5" hash will literally be unavailable for use, making this choice unusable.
- The string "fips": use the FIPS-compliant FNV hash of the host name, in combination with the current process ID. Requires that the fnv package be installed.
- The string "random": pure random bytes, the default, aliased as "modern".
- Any 5-byte bytes value: use the given HWID explicitly.
You are permitted to add additional entries to this mapping within your own application, if desired.
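One way such a lookup could be organized is sketched below. Everything here is hypothetical: the names `HWID` and `resolve_hwid`, the choice to store zero-argument callables in the mapping, and the "rack-7" entry are illustrative assumptions, not Marrow Mongo's actual structure. Only the "random" strategy and its "modern" alias are filled in.

```python
import os


def _random_hwid():
    """Five random bytes, generated when the strategy is resolved."""
    return os.urandom(5)


# Hypothetical registry of named strategies; "modern" aliases "random".
HWID = {
    "random": _random_hwid,
    "modern": _random_hwid,
}


def resolve_hwid(hwid="random"):
    """Return 5 identifier bytes for a strategy name or an explicit value."""
    if isinstance(hwid, bytes):
        if len(hwid) != 5:
            raise ValueError("explicit hwid must be exactly 5 bytes")
        return hwid                      # explicit HWID used as given
    try:
        factory = HWID[hwid]
    except KeyError:
        raise ValueError(f"unknown hwid strategy: {hwid!r}") from None
    return factory()


# An application-specific entry added to the mapping (illustrative).
HWID["rack-7"] = lambda: b"RACK7"
```

With this shape, `resolve_hwid("rack-7")` returns `b"RACK7"`, while `resolve_hwid(b"abcde")` bypasses the mapping entirely.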
Unlike the PyMongo-supplied ObjectId implementation, this does not use a custom Exception subclass to represent invalid values. TypeError will be raised if passed a value that cannot be stringified, ValueError if the resulting string is not 12 binary bytes or 24 hexadecimal characters. Warning: any 12-byte bytes value will be accepted as-is.
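The validation rules described above could look something like the following. This is a standalone approximation under stated assumptions, not the library's code; `parse_object_id` is a hypothetical name, and rejecting bytes of any length other than 12 with ValueError is our reading of the rule.

```python
from binascii import unhexlify


def parse_object_id(value):
    """Return 12 raw bytes, raising only built-in exception types (sketch)."""
    if isinstance(value, bytes):
        if len(value) != 12:
            raise ValueError("binary ObjectID must be exactly 12 bytes")
        return value                     # any 12-byte value is accepted as-is
    try:
        text = str(value)
    except Exception as exc:
        raise TypeError("value cannot be converted to a string") from exc
    if len(text) != 24:
        raise ValueError("textual ObjectID must be 24 hexadecimal characters")
    try:
        return unhexlify(text)           # binascii.Error is a ValueError subclass
    except ValueError as exc:
        raise ValueError("textual ObjectID contains non-hexadecimal characters") from exc
```

Callers can therefore handle bad input with plain `except (TypeError, ValueError)`. Note that `parse_object_id(b"not an id!!!")` succeeds, since that value happens to be 12 bytes long.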
Additional points of reference: