General questions on record linkage for live event data #2509

nabebaye · 2024-11-14T16:59:38Z

Hello,

First, I want to say a big thank you for developing this helpful library and for all the support you provide.

I am trying to link multiple tables of records that describe live events from different ticketing providers. Each table comes from a different source, such as SeatGeek, StubHub, or Ticketmaster, and from each source I extract the columns name, venue_name, timezone_local_date, timezone_local_time, and event type.

I have a few questions regarding this problem:

1. Since I expect each event source to hold a single record for each event, is it possible to tell Splink that at most one record from each table should match? Splink currently just gives match weights for all blocked comparisons. I think building this constraint into the algorithm would likely improve match accuracy.
2. My comparisons for name and venue_name use a string similarity measure (Token Set Ratio) that treats the words in each string as a set. Would it be appropriate (or even necessary) to apply the array tf computation described in "A possible methodology for combining array fields with term frequency adjustments" (#2022), even though these are not array fields?
3. Some of my event sources give the ID of the matching record in other event sources. For instance, some records in the SeatGeek table give IDs for StubHub. Only a few of the tables provide this, and it covers only part of their records. These provided mappings are mostly accurate but sometimes erroneous. Would it be appropriate to use them to estimate m-values?
4. Is there a general resource or guide you would recommend on how to tune string comparison measures and comparison levels for string comparisons through data exploration or other empirical methods?
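
For readers unfamiliar with the measure mentioned in question 2, here is a minimal sketch of a token set ratio using only the Python standard library. It is a simplified illustration, not the exact implementation found in fuzzy-matching libraries and not the code used in my pipeline; the function names `token_set_ratio` and `_ratio` and the example event names are hypothetical.

```python
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> float:
    """Character-level similarity of two strings on a 0-100 scale."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def token_set_ratio(left: str, right: str) -> float:
    """Simplified token set ratio: word order and repeated words are ignored."""
    left_tokens = set(left.lower().split())
    right_tokens = set(right.lower().split())

    # Assumption for this sketch: an empty string never matches anything.
    if not left_tokens or not right_tokens:
        return 0.0

    # Split the words into those shared by both strings and those unique to each.
    common = " ".join(sorted(left_tokens & right_tokens))
    left_only = " ".join(sorted(left_tokens - right_tokens))
    right_only = " ".join(sorted(right_tokens - left_tokens))

    combined_left = f"{common} {left_only}".strip()
    combined_right = f"{common} {right_only}".strip()

    # Take the best score among the three pairings of shared and combined strings.
    return max(
        _ratio(common, combined_left),
        _ratio(common, combined_right),
        _ratio(combined_left, combined_right),
    )


if __name__ == "__main__":
    # Same words in a different order score as identical.
    print(token_set_ratio("Taylor Swift The Eras Tour", "The Eras Tour Taylor Swift"))
    # One name being a subset of the other also scores as identical.
    print(token_set_ratio("Hamilton", "Hamilton (Touring)"))
```

Because a name whose words are a subset of another name's words scores 100, common words such as "tour" or "live" can carry a lot of weight in this kind of comparison, which is the reason I am asking about term frequency adjustments.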
