Create an 'id' field per card #63

Sembiance · 2015-08-03T12:21:38Z

A unique 'id' field per card should be added.

This has been requested a LOT by many people, as it would make importing into a database a lot easier. Also, this id field would be needed in order to implement #55 and #47

There are a number of different ways the id could be created. Some additional discussion and thought will be needed to pick the best way.

My initial hunch is a hash of (setCode + cardName + imageName) would work. Depending on the type of hash used though the id field might be kinda long, which I may want to avoid. This sort of 'predictable hash' would be needed so if anyone were to generate all the files from scratch the id's would end up being the same.

ZeldaZach · 2015-08-03T19:12:09Z

You could potentially md5 the desired combination (setCode + cardName + imageName) as that's always a predictable length, 32.

You could then truncate a portion of the result (maybe down to only 15 characters), albeit it does increase the risk for collisions. I've yet seen a cockatrice hash collide, so this doesn't seem like a terrible way to go.

Snawboll · 2015-08-06T03:21:16Z

Hey, I'm a dev for www.ozguild.co and we have been using your database to create unique hashes already. I just sent you an email about some changes to Collectors edition basic lands which messed with our system (multiple cards have the exact same details). We would be happy to let you know how we have been doing it and working with you to get this implemented as it would be very useful to us

ZeldaZach · 2015-08-06T03:22:26Z

Sounds awesome @RavenousWolf :)

Sembiance · 2015-08-18T14:26:08Z

Ok, an 'id' field will be coming soon to mtgjson.

It is composed of sha1(setCode + cardName + imageName)

phrozen · 2015-09-14T19:24:37Z

I'm doing a bit of work with the foreign names and I just found that this "pronto" solution to de UUID of any card might not be the best. First, it uses an arbitrary string field ('imageName') which is already deprecated (even if its still supported). Second it uses a 160bit cryptographic hash for something as a uuid in a set with (at most) 200k records (I'm exagerating and even counting like there is every translation for every print), this is not only not efficient, it is exagerated in most aspects and can't be reproduced without knowledge of this specific environment.

IMHO an ID should represent a card by itself in its printing form, though it must include the fields that better represent that card (physical or virtual), in the most straightforward and simple way, those fields are:

SET
NAME
LANGUAGE
NUMBER (this to avoid issue Cards with alternate art have duplicate foreignNames entries #70 only with basic lands) or a variation art identifier in case of Alternate Art cards

This 4 fields can represent ANY card (even if number is missing it should not collide in it's set ie. Promos) ever printed, and this 4 fields can be obtained without any external reference (like this project and the imageName field) so they are a more universal way of describing that specific card.

On the other point, even if SHA1 is a very well known Hashing function, this is definitely not it's use, first of all is a correct cryptographic hashing algorithm, which makes it pretty secure, but also slow, and it produces a 160bits Sum which is actually HUGE for a set of records less than 0.0001% the size. I'm already doing collision tests using 32 and 64bit NON-Cryptographic functions like Fast-Hash, SipHash, Murmur3 (I recommend this one) using the fields provided above, NAME+SET+LANG+NUMBER? and it's working perfectly with 32bit Sums (a single integer) although I have found a lot of collisions on alternate art cards. So I'm currently trying to identify those.

Thoughts?

Sembiance · 2015-09-14T19:43:21Z

@phrozen

The four fields you mention are not unique enough to represent a card.

Early sets do not have card numbers, so LEA for example has two swamps that have identical data, even the same artist, but they have different art so they are different cards.

This is why the imageName field exists. Yes, I agree it is a mtgjson specific field, but it's used in the id, and other apps/websites use it too for uniqueness purposes, so therefore I'll never be dropping support for it.

As for language, gatherer lacks foreign language info on many sets and mtgjson does not yet support multiple languages anyways (Issue #40) Still, when the day comes to add full foreign language support, I've detailed how I'll handle it in Issue #40 in regards to the id field so existing id's will not change.

As far as the choice of hash function and the size of the id field, I don't necessarily disagree with you on that.
However, the 'id' field has already gone live, and tons of websites and apps utilize mtgjson. Undoubtedly several have already started relying on the 'id' field and changing it at this stage would make life very painful for those folks. So I won't be changing the id field at this point.

Sembiance added the enhancement label Aug 3, 2015

Sembiance mentioned this issue Aug 3, 2015

Add tokens #47

Closed

Sembiance closed this as completed in 560f550 Aug 18, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create an 'id' field per card #63

Create an 'id' field per card #63

Sembiance commented Aug 3, 2015

ZeldaZach commented Aug 3, 2015

Snawboll commented Aug 6, 2015

ZeldaZach commented Aug 6, 2015

Sembiance commented Aug 18, 2015

phrozen commented Sep 14, 2015

Sembiance commented Sep 14, 2015

Create an 'id' field per card #63

Create an 'id' field per card #63

Comments

Sembiance commented Aug 3, 2015

ZeldaZach commented Aug 3, 2015

Snawboll commented Aug 6, 2015

ZeldaZach commented Aug 6, 2015

Sembiance commented Aug 18, 2015

phrozen commented Sep 14, 2015

Sembiance commented Sep 14, 2015