
Create inverse_streetTalk() function (02/28/18) #10

Open
dmarulli opened this issue Feb 6, 2018 · 11 comments

Comments

@dmarulli
Contributor

dmarulli commented Feb 6, 2018

This function should receive a street_name, a street_from, and a street_to, and return a lat/lng pair.

@dmarulli dmarulli changed the title Create inverse StreetTalk() function Create inverse streetTalk() function Feb 6, 2018
@dmarulli dmarulli changed the title Create inverse streetTalk() function Create inverse_streetTalk() function Feb 6, 2018
@vr00n
Member

vr00n commented Feb 8, 2018

@dmarulli
Contributor Author

dmarulli commented Feb 8, 2018

LEX may be useful for #8.

This is an intermediate step that only deals with text.

@dmarulli dmarulli changed the title Create inverse_streetTalk() function Create inverse_streetTalk() function (02/28/18) Feb 14, 2018
@YukunVVan
Collaborator

@dmarulli a very rough version of the inverse_streetTalk() function has been pushed!

My approach is to find the three subsets of the data whose names match on_street, from_street, and to_street. I then find where these three sets intersect and return the matching index in on_street. Finally, the function returns the x, y coordinates of the centroid of that linestring.
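A minimal sketch of the set-intersection logic described above, assuming each street edge is a record with a name, two endpoint node ids, and a list of (x, y) coordinates. The `Edge` type, the rule that the chosen on_street edge must share one node with from_street and one with to_street, and the length-weighted centroid are all assumptions for illustration, not the pushed code.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Edge:
    # Hypothetical edge record: street name, endpoint node ids, geometry
    name: str
    u: int
    v: int
    coords: List[Point]


def linestring_centroid(coords: List[Point]) -> Point:
    """Length-weighted centroid: each segment's midpoint weighted by its length."""
    total = cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        seg = math.hypot(x2 - x1, y2 - y1)
        cx += seg * (x1 + x2) / 2
        cy += seg * (y1 + y2) / 2
        total += seg
    if total == 0:
        return coords[0]
    return (cx / total, cy / total)


def inverse_streetTalk_sketch(edges: List[Edge], on_street: str,
                              from_street: str, to_street: str) -> Optional[Point]:
    # Build the three named subsets (exact, case-sensitive name match)
    on_edges = [e for e in edges if e.name == on_street]
    from_nodes = {n for e in edges if e.name == from_street for n in (e.u, e.v)}
    to_nodes = {n for e in edges if e.name == to_street for n in (e.u, e.v)}
    # Intersect: keep the on_street edge touching from_street and to_street
    for e in on_edges:
        ends = {e.u, e.v}
        if ends & from_nodes and ends & to_nodes:
            return linestring_centroid(e.coords)
    return None
```

Because names are compared with plain equality here, a query such as "8th avenue" would not find an edge named "8th Avenue" and the sketch returns None.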

@dmarulli
Contributor Author

dmarulli commented Mar 8, 2018

@YukunVVan okay, I finally had a chance to look at the code; a lot of stuff is in the works over here.

Great stuff. Here are some comments:

  • networkType doesn't seem to be defined yet, so perhaps that needs to be an input parameter.

  • I think we'll have to get just a bit fancier with our approach here. Currently, inverse_streetTalk("8th Avenue", "West 48th Street", "West 49th Street", "New York City") returns the correct point while inverse_streetTalk("8th avenue", "west 48th Street", "west 49th Street", "New York City") returns None. One way to handle this that comes to mind is to use fuzzy matching. (cc: @vr00n )
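Before reaching for fuzzy matching, the capitalization case above can be handled by normalizing both sides of the comparison. This is a hedged sketch, not project code: `normalize_street_name` and `same_street` are hypothetical helpers that case-fold, turn anything outside a-z and 0-9 into spaces, and collapse whitespace.

```python
import re


def normalize_street_name(name: str) -> str:
    # Case-fold, replace characters outside a-z/0-9 with spaces, collapse whitespace
    name = re.sub(r"[^0-9a-z]+", " ", name.casefold())
    return " ".join(name.split())


def same_street(a: str, b: str) -> bool:
    # Exact comparison after normalization; word order still matters
    return normalize_street_name(a) == normalize_street_name(b)
```

Normalization alone does not make "avenue 8th" equal "8th avenue"; reordered words need something like a token sort.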

@YukunVVan
Collaborator

Thanks for the comments. I'll try the fuzzywuzzy package to solve the second issue.

@dmarulli
Contributor Author

For sure. Sounds good.

@patwater

@YukunVVan any questions or need any assistance?

@YukunVVan
Collaborator

YukunVVan commented Mar 22, 2018

@dmarulli @patwater I have tried several methods to solve the fuzzy matching issue:
1/ For the case inverse_streetTalk("8th avenue", "west 48th Street", "west 49th Street", "New York City"), we can simply apply .lower() before comparing.
2/ For the case inverse_streetTalk("avenue 8th", "Street west 48th", "west 49th Street", "New York City"), fuzzywuzzy helps.
For example:
fuzz.token_sort_ratio("avenue 8th", "8th avenue") = 100
So, we can write

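from fuzzywuzzy import fuzz

# Keep edges whose name has the same tokens as street_name, ignoring word order and case
on_st = citymap_edge[citymap_edge['name'].apply(
    lambda x: fuzz.token_sort_ratio(x.lower(), street_name) == 100)]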

However, this approach increases the running time: the method above takes 5 s per search, whereas on_st = citymap_edge[citymap_edge['name'].str.lower() == street_name] takes only 160 ms.
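For strings as short as street names, token_sort_ratio(...) == 100 effectively means the processed, token-sorted strings are identical. One possible speed-up (an assumption, not something measured in this thread) is to compute that sorted-token key once per edge name and look it up in a dictionary, so each search is an exact-match lookup instead of a fuzzy score against every row. The helper names are hypothetical, and `sort_key` only approximates fuzzywuzzy's preprocessing.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def sort_key(name: str) -> str:
    # Roughly mirrors token_sort_ratio's preprocessing:
    # lowercase, non-alphanumerics to spaces, then sort the tokens
    tokens = re.sub(r"[^0-9a-z]+", " ", name.lower()).split()
    return " ".join(sorted(tokens))


def build_name_index(names: Iterable[str]) -> Dict[str, List[int]]:
    # One pass over all edge names: sorted-token key -> row positions
    index: Dict[str, List[int]] = defaultdict(list)
    for i, name in enumerate(names):
        index[sort_key(name)].append(i)
    return dict(index)


def lookup(index: Dict[str, List[int]], street_name: str) -> List[int]:
    # Dictionary lookup instead of scoring every row
    return index.get(sort_key(street_name), [])
```

If the index is built from citymap_edge['name'] in row order, the returned positions could be passed to citymap_edge.iloc[positions].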

fuzzywuzzy gives us a similarity ratio for two strings, but it's hard to choose a threshold. Some examples:

fuzz.token_sort_ratio("vallejo Villas Street","vallejo Villas street") = 100
fuzz.token_sort_ratio("vallejo Villas Street","vallejo Villas") = 80
fuzz.token_sort_ratio("west 8th street","west 8th") = 70
fuzz.token_sort_ratio("west 8th street","west 9th street") = 93
fuzz.token_sort_ratio("west 8th street","west 14th street") = 90

Do you have any advice on how to use this function? What kinds of fuzzy matching do we need to handle?
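To experiment with thresholds without extra dependencies, scores like the ones above can be approximated with the standard library. fuzzywuzzy's pure-Python path computes its ratio with difflib.SequenceMatcher on the processed, token-sorted strings; the function below is a hedged re-implementation of that idea, not fuzzywuzzy itself, and with python-Levenshtein installed the library's numbers may differ slightly.

```python
import re
from difflib import SequenceMatcher


def token_sort_score(a: str, b: str) -> int:
    # Lowercase, non-alphanumerics to spaces, sort tokens, then compare
    def prep(s: str) -> str:
        return " ".join(sorted(re.sub(r"[^0-9a-z]+", " ", s.lower()).split()))

    ratio = SequenceMatcher(None, prep(a), prep(b)).ratio()
    return int(round(100 * ratio))


pairs = [
    ("vallejo Villas Street", "vallejo Villas street"),
    ("vallejo Villas Street", "vallejo Villas"),
    ("west 8th street", "west 8th"),
    ("west 8th street", "west 9th street"),
    ("west 8th street", "west 14th street"),
]
for a, b in pairs:
    print(f"{a!r} vs {b!r}: {token_sort_score(a, b)}")
```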

@patwater

Hmmmm, this is a tricky tradeoff. Thanks for laying this out so clearly, @YukunVVan. I'm not sure how to handle the speed issue, though I would note that premature optimization is the root of all evil.

In terms of operationalizing this, the API seems like the key path forward. How is #11 coming along, @vr00n @dmarulli?

As for the threshold, my initial idea would be to set it at 85. Note that "west 8th" and "Vallejo Villas" probably shouldn't be matched, since there can often be a "street" and an "avenue" with the same name. Having both a "Vallejo Villas Avenue" and a "Vallejo Villas Street" in the same city may not be the case here, though.
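With the scores listed above, a threshold of 85 would still accept "west 9th street" (93) and "west 14th street" (90) as matches for "west 8th street". One possible guard, sketched here as an assumption rather than an agreed design, is to require numbers and street-type words to match exactly before the fuzzy score is consulted, which also rejects "west 8th" and "Vallejo Villas" as noted above. `STREET_TYPES` and `is_match` are hypothetical, and the scorer is a difflib approximation of fuzz.token_sort_ratio.

```python
import re
from difflib import SequenceMatcher
from typing import List, Set

# Hypothetical, deliberately short list; abbreviations such as "st" are not handled
STREET_TYPES = {"street", "avenue", "road", "boulevard", "place", "lane"}


def tokens(s: str) -> List[str]:
    return re.sub(r"[^0-9a-z]+", " ", s.lower()).split()


def score(a: str, b: str) -> int:
    # Token-sort similarity via difflib (approximates fuzz.token_sort_ratio)
    ratio = SequenceMatcher(None, " ".join(sorted(tokens(a))),
                            " ".join(sorted(tokens(b)))).ratio()
    return int(round(100 * ratio))


def numbers(s: str) -> Set[str]:
    # Digit runs inside tokens such as "8th" or "48th"
    return {m for t in tokens(s) for m in re.findall(r"\d+", t)}


def street_types(s: str) -> Set[str]:
    return {t for t in tokens(s) if t in STREET_TYPES}


def is_match(query: str, candidate: str, threshold: int = 85) -> bool:
    # Hard constraints first: numbers and street types must agree exactly
    if numbers(query) != numbers(candidate):
        return False
    if street_types(query) != street_types(candidate):
        return False
    return score(query, candidate) >= threshold
```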

@vr00n
Member

vr00n commented Mar 28, 2018

To clarify:

The goal here is to be able to get the centroid of OnStreet-FromStreet-ToStreet.

A suggestion is to look up the intersection of OnStreet and FromStreet and the intersection of OnStreet and ToStreet, and then compute the centroid of the 2 points.

So for ("8th avenue", "west 48th Street", "west 49th Street", "New York City")

  1. Compute the location of the intersection of 8th avenue and West 48th Street
  2. Compute the location of the intersection of 8th avenue and West 49th Street
  3. Compute the centroid of 1 and 2
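The three steps above can be sketched as follows, assuming an intersection lookup already exists. Here it is stood in for by a hypothetical dictionary keyed on an unordered pair of street names; in practice it might come from a geocoder or from the node shared by two street edges. Averaging latitude and longitude directly is a reasonable approximation for two intersections one block apart, but not for points far apart.

```python
from typing import Dict, FrozenSet, Optional, Tuple

LatLng = Tuple[float, float]


def block_centroid(
    intersections: Dict[FrozenSet[str], LatLng],
    on_street: str,
    from_street: str,
    to_street: str,
) -> Optional[LatLng]:
    # Steps 1 and 2: look up both ends of the block; key order does not matter
    start = intersections.get(frozenset({on_street, from_street}))
    end = intersections.get(frozenset({on_street, to_street}))
    if start is None or end is None:
        return None
    # Step 3: centroid of the 2 points, i.e. their midpoint
    return ((start[0] + end[0]) / 2, (start[1] + end[1]) / 2)
```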

@YukunVVan
Collaborator

@vr00n Thanks for clarifying! I'm gonna update my code.


4 participants