Potential Input Features to Explore #1
X,Y position information and depth will be the easiest to add first, especially depth; those convert directly to features. With street topology, we'll need to give some thought to how to convert it into features, and we'll almost certainly want to experiment with this as well. @jonfroehlich, you said that you had some street topology already computed. What format is that in, and what does it contain? |
Agree with everything you said. Best to probably have an in-person brainstorm/discussion about the street topology stuff. At the very least, the easiest thing (and perhaps the most critical) might be simply the percentage position on a street segment and whether a label is in an intersection (which might be captured by percentage position). By percentage position, I mean simply take every street segment (see the CHI'19 paper for a definition), cut it in half, and use the lat/lng position of the label and the two intersections to calculate a percentage position, where both intersections are 0% and the midpoint is 100%. Does that make sense? We may want to use discretized bins of 15% or so. |
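For concreteness, a sketch of how this percentage position could be computed. It assumes straight street segments and uses a flat-earth approximation (fine at segment scale); the function and variable names are illustrative, not the project's actual code:

```python
import math

def percentage_position(label, end_a, end_b, bin_size=15):
    """label, end_a, end_b are (lat, lng) pairs.

    Returns a binned percentage: 0% at either intersection,
    100% at the segment midpoint.
    """
    # Project to a local flat frame (meters) centered on end_a;
    # adequate for street-segment-scale distances.
    lat0 = math.radians(end_a[0])
    def to_xy(p):
        return ((p[1] - end_a[1]) * 111_320 * math.cos(lat0),
                (p[0] - end_a[0]) * 110_540)
    bx, by = to_xy(end_b)
    lx, ly = to_xy(label)
    # Fraction of the way from end_a to end_b, clamped to [0, 1].
    t = max(0.0, min(1.0, (lx * bx + ly * by) / (bx * bx + by * by)))
    pct = (1.0 - abs(2.0 * t - 1.0)) * 100.0   # 0 at ends, 100 at midpoint
    return min(100, bin_size * round(pct / bin_size))  # discretized ~15% bins
```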
Based on our in-person discussion today, we can also consider adding features such as building age and zoning category. |
We discussed in person the importance of distinguishing between intersection/non-intersection panos, which I think is best represented by including the percentage-towards-intersection value described above. |
For the intersection stuff, Mikey says this is not currently tracked but likely should be in the future. We may have to ask Anthony, Manaswi, or Kotaro. |
Per an email conversation with Anthony, we're now working on getting the intersection position features added. X,Y and heading are already exported alongside the imagery in the new dataset generation code. I'm currently working on writing a custom PyTorch dataloader that will be flexible and extensible in our incorporation of the features described here, as well as any we choose to add in the future. We will at some point probably want to experiment with how we incorporate the extra features into the network architecture, and I'm trying to set up a meeting with Joe Redmon to discuss this further. Will update this thread when I have content to report from that, but for now I'm just appending the extra features to the image vector - worth noting that with this approach we will not be able to use the pretrained resnet architecture. |
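As a rough illustration of the dataloader shape described here (not the actual implementation), something like the following, where the sidecar-JSON layout and feature keys are hypothetical:

```python
import json
import torch
from torch.utils.data import Dataset
from torchvision import transforms
from PIL import Image

class PanoCropDataset(Dataset):
    """Returns (image tensor, extra-feature tensor, label) per sample."""
    def __init__(self, samples, feature_keys=("x", "y", "heading")):
        # samples: list of (image_path, sidecar_json_path, label) tuples
        self.samples = samples
        self.feature_keys = feature_keys
        self.to_tensor = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        img_path, sidecar_path, label = self.samples[idx]
        image = self.to_tensor(Image.open(img_path).convert("RGB"))
        # Extra features live in a per-pano sidecar file (hypothetical layout).
        with open(sidecar_path) as f:
            meta = json.load(f)
        extras = torch.tensor([float(meta[k]) for k in self.feature_keys])
        return image, extras, label
```

Keeping the feature keys as a constructor argument is what makes it easy to add or drop features later without touching the model code.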
I wanted to put in more information about some simple street segment location features, so I've copied the full email I sent to Anthony et al. below. We were discussing two different potential input features for our model, which relate to the location of the labeled panorama on a street segment with respect to its two endpoints (intersections). I have attached an example figure, which I hope is helpful. In the example, the labeled panos are in gray and occur at three points along a single street segment. We divide the segment in half so that we are tracking distance from the nearest endpoint.
Both inputs are listed under the pano labels after the bold Loc. Does this make sense? |
I had another excellent meeting with Joe Redmon yesterday, and wanted to document here what we discussed regarding methods for incorporating additional features into resnet while still being able to use the pretrained weights rather than training from scratch. Broadly, we discussed two approaches: The first approach is to "widen" the standard resnet architecture to include the new features. The "widened" section of the net would be initialized with either random weights, or random weights chosen from the existing pretrained weights. To do this, Joe recommended normalizing all new features to the same [0, 1] range as the RGB channels. For the current 7x7x3 filters, we'd change each filter to be 7x7x(3 + the number of new feature channels). Joe believed this approach would give the best possible performance, but it would be the most difficult to implement, as it would require essentially re-implementing resnet from scratch (see the sketch after this comment). The second, simpler approach is to use resnet as a 'feature extractor' that produces a feature vector of, say, length 1024 from the images, then concatenate our new features onto the end of this image feature vector and feed the combined vector through a few more hidden layers. This is much more straightforward to implement, so I'm planning on tackling it first. A few more suggestions from the meeting, in no particular order:
Joe also (very kindly) offered to help troubleshoot my current Google Compute Engine woes, which have been dramatically slowing the speed of our training. Planning on circling back with him on that on Monday. |
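A minimal sketch of what the first ("widening") approach could look like in PyTorch. The resnet18 choice and the initialize-from-a-random-RGB-channel scheme are illustrative assumptions, and it presumes each scalar extra feature is broadcast to a per-pixel HxW plane and stacked with the RGB channels at input time:

```python
import torch
import torch.nn as nn
from torchvision import models

def widen_resnet_input(k_extra):
    """Widen resnet's first conv from 3 to 3 + k_extra input channels."""
    net = models.resnet18(pretrained=True)
    old = net.conv1  # Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
    new = nn.Conv2d(3 + k_extra, 64, kernel_size=7, stride=2,
                    padding=3, bias=False)
    with torch.no_grad():
        new.weight[:, :3] = old.weight  # keep the pretrained RGB filters
        # Initialize each extra channel from a randomly chosen pretrained
        # RGB channel, per the "random weights chosen from the existing
        # pretrained weights" idea above.
        for c in range(k_extra):
            src = torch.randint(0, 3, (1,)).item()
            new.weight[:, 3 + c] = old.weight[:, src]
    net.conv1 = new
    return net
```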
Sine + cosine decomposition for the resnet "feature extractor" approach has been implemented, along (of course) with the rest of the "feature extractor" modified network architecture. The extra features (other than the imagery) are converted to a 7-channel tensor and concatenated into the final fully connected layer of the network, alongside the 512 features output by the resnet. See this hasty and colorful diagram of the new architecture: |
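The diagram image isn't reproduced in this text, so here is a rough code sketch of the same architecture; the resnet18 backbone, hidden-layer width, and class count are illustrative assumptions, not the project's actual values:

```python
import torch
import torch.nn as nn
from torchvision import models

class FeatureExtractorNet(nn.Module):
    def __init__(self, n_extra=7, n_classes=5):
        super().__init__()
        backbone = models.resnet18(pretrained=True)
        # Everything except the final fc layer: a 512-d image feature extractor.
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])
        self.head = nn.Sequential(
            nn.Linear(512 + n_extra, 256),
            nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, image, extras):
        # extras: (batch, 7) tensor; heading is carried as (sin θ, cos θ)
        # so the model sees its circular structure.
        feats = self.backbone(image).flatten(1)            # (batch, 512)
        return self.head(torch.cat([feats, extras], dim=1))
```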
Neat. Can’t wait to see some results. :)
|
I spent some more time scheming about additional demographic features to potentially include. A few thoughts.... Of course, there are many sources of data out there. Many municipalities have their own local databases, many of which are publicly query-able. However, the more specific we get to each city, the more difficult it becomes to port models from one city to another (for example, if the model from DC includes data that isn't readily available for, say, Newberg). As such, I would imagine it makes sense to limit our features to data available universally, and of course, the best set of universal data for the US comes from the US census. The census provides lots and lots of useful measurements, which you can play around with a bit here. Some of those potentially useful features are:
All of this data is available for the entirety of the United States. Outside of the US it gets a bit trickier (for instance, if we wanted to include Tohme crops from Saskatchewan). This data is all available at the Census Tract level, which is fairly high resolution, but not enormously so. The only higher-resolution datasets that I found are limited to specific cities, and even then, those datasets are hard to find (for things like population density, etc.). To give a sense of the resolution of data at the census tract level, Washington DC is divided into 179 census tracts. The city website provides a map. So within DC, for example, we would not be able to differentiate between two points in the same census tract by census data alone; it remains to be seen how significant a limitation this is. All this raises the question of the cost-benefit of going ahead and building a system that will allow us to query census data given a lat/lng (see the sketch after this comment). I haven't spent too much time digging into the details (i.e., is it easier to download all the data or query an API, etc.), but it doesn't seem like it should be too bad. As I mentioned to Jon in person yesterday, my hypothesis is that no matter the resolution, these demographic factors won't make a dramatic difference in the performance of our CV system, but they are potentially worth discussing in the paper even if we don't implement them. |
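As a sketch of how such a lat/lng-to-census-tract lookup might work, the US Census Bureau's public geocoder exposes a coordinates endpoint; the parameters below follow its documentation, but treat the exact response field names as an unverified assumption:

```python
import requests

GEOCODER = "https://geocoding.geo.census.gov/geocoder/geographies/coordinates"

def tract_for_point(lat, lng):
    """Return the census tract GEOID containing a lat/lng point."""
    resp = requests.get(GEOCODER, params={
        "x": lng,   # the geocoder takes x=longitude, y=latitude
        "y": lat,
        "benchmark": "Public_AR_Current",
        "vintage": "Current_Current",
        "format": "json",
    })
    resp.raise_for_status()
    tracts = resp.json()["result"]["geographies"]["Census Tracts"]
    # The GEOID can then be joined against downloaded ACS tables.
    return tracts[0]["GEOID"]
```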
Thanks @galenweld for thinking about this further. I still feel like we should capture the lower-hanging fruit first, like the things we discussed in our meeting this week, including: street-related features (i.e., where a label is placed with respect to a street segment) and geographic position features (i.e., where a label is placed in the city). I'm also interested in more extrinsic features like real-estate pricing, zoning category, and socio-economic data. I think zoning category would be relatively easy to get (we actually used it in our CHI'19 paper and performed some analysis exploring accuracy as a function of zone). While I agree with the general point that we should favor input features that are universally available (at least in the US), I am also, in principle, not really against investigating features only available in a given city. However, we don't really have time to do this for the ASSETS push imo. |
@jonfroehlich I agree wholeheartedly, and both street segment positioning and geographic positioning are on the list to be added as I re-run the metadata for the dataset. I'm only broaching this topic before I do so because the re-run will take a day or two, and I was wondering whether it makes sense to fold those items into this round. However, it sounds like I should hold off for now, and we can always revisit them later. |
An update on this front: I've now written code to incorporate the following additional features, with thanks to @tongning:
Currently, I'm looping over the dataset and adding these additional features. I'll let that run over the weekend, and when it's finished I'll upload to the VM and tweak the model to use them. This raises one design question: for a number of reasons, primarily bad XML data in the GSV scrapes, some small percentage of the panos in our dataset have no latitude and longitude information, which makes it impossible to compute these new features. Currently, I'm just writing NaNs to their sidecar files in this case, but when we tweak the model, we'll need to decide how we want to handle these panos. I don't think it'll make a big performance difference, given the limited impact the extra features have had so far, but it's worth considering a bit. Possible options are to skip panos with bad data (I'll have an accurate number on what fraction of the dataset this is when I finish computing the new features), to hardcode some backup value or compute one via a different heuristic, or to try to come up with some way of encoding NaN into the model (see the sketch after this comment). |
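One common way to realize that last option, encoding missingness so the model can learn to ignore it, is a mask-and-impute scheme. A sketch, not the project's code:

```python
import torch

def encode_with_mask(extras):
    """extras: (batch, n_extra) tensor, possibly containing NaNs.

    Replaces each NaN with a neutral value and appends a binary
    "missing" indicator, so the network can learn to discount
    imputed values rather than choking on NaN arithmetic.
    """
    missing = torch.isnan(extras)
    filled = torch.where(missing, torch.zeros_like(extras), extras)
    # Doubles the feature width: [filled values | missing indicators].
    return torch.cat([filled, missing.float()], dim=1)
```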
Update: I got errors getting lat/long or block position info for 2,925 panos out of the 57,446 total, so ~5.09%. |
Ugh, that's too bad. Glad to know it's only ~5%. What should we do in these cases? I'd prefer not to skip these panos... can we just encode a null value for those input features? Can you clarify what you are using for 'downtown' in distance from downtown and heading to downtown. Is this the center point of DC? If so, how did you calculate this? Can you also expand on how you calculated block 'middleness' and distance to end of block? Was this similar to our discussions? |
I'm using the center of the White House right now, but can use any arbitrary point. Takes about 20 minutes to redo that one, so let me know where you'd prefer. For middleness and distance to end of block, that's exactly what we discussed. I'm using Anthony's code for that. And yes, we can encode a null value, that's what I'm doing at the moment. We just need to figure out the best way to input that to the model.
|
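For reference, a sketch of how distance-from-downtown and heading-to-downtown could be computed from a fixed reference point, using the haversine distance and initial great-circle bearing. The White House coordinates and function names are illustrative; this is not the actual code referenced above:

```python
import math

DOWNTOWN = (38.8977, -77.0365)   # White House lat/lng (illustrative)

def downtown_features(lat, lng, ref=DOWNTOWN):
    """Return (distance in meters, bearing in degrees from north)."""
    lat1, lng1, lat2, lng2 = map(math.radians, (lat, lng, *ref))
    dlat, dlng = lat2 - lat1, lng2 - lng1
    # Haversine great-circle distance.
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2)
    dist_m = 2 * 6_371_000 * math.asin(math.sqrt(a))
    # Initial bearing from the pano toward downtown.
    y = math.sin(dlng) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlng))
    bearing = math.degrees(math.atan2(y, x)) % 360
    return dist_m, bearing
```

The bearing would presumably then get the sine + cosine decomposition mentioned earlier before being fed to the model.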
Got it. Thanks. Can you expand on: "We just need to figure out the best way to input that to the model."
|
Sure. The neural net takes as input a vector of size 224*224*3 + the number of extra features. All of those features have a numeric value that we normalize to the range [0,1]. For the features we are missing data for, we encode a NaN, but we can't input a NaN into the neural net because one can't perform any computation on NaN values. So we need to figure out how to either a) pick a discrete numerical value to use instead of NaN, or b) modify the neural network's architecture to ignore those features when they're NaN, which, now that I think about it, may be straightforward and the best option. I'll experiment with that and ping Joe Redmon to see if he has any suggestions as well.
|
Got it. Thanks for the additional explanation.
|
This is one of those brain-dump threads that are hard to close out... so I suggest keeping it open and continuing to discuss new input features. We could also split out discussions of particular input features into their own Issues in the future. |
In this thread, I'd love to list, discuss, and prioritize different input features for the ML model. We can always create new Issues once we start diving down individual paths.
- [ ] The first input feature is a normalized % distance from the nearest endpoint to the middle of the street segment
- [ ] The second input feature is the raw distance from the nearest endpoint
- [ ] Binary label of intersection or not (this is subsumed by the features above, so not necessary)