Potential Input Features to Explore #1

Open · 4 of 7 tasks · jonfroehlich opened this issue Jan 10, 2019 · 21 comments

@jonfroehlich commented Jan 10, 2019

In this thread, I'd love to list, discuss, and prioritize different input features for the ML model. We can always create new Issues once we start diving down individual paths.

  • Depth map
  • Street topology (e.g., where a label seems to be placed in a street network)
    - [ ] The first input feature is a normalized % distance from the nearest endpoint to the middle of the street segment
    - [ ] The second input feature is the raw distance from the nearest endpoint
    - [ ] Binary label of intersection or not (subsumed by the features above, so not necessary)
  • x,y label position information
  • Building age
  • Zoning category
  • Geography-related features like distance from the center point of the city, angle from the center point, neighborhood, and quadrant
  • ... lots more, but I wanted to get the list started!
@galenweld

X,Y position information and depth will be the easiest to add first, especially depth. Those are clearly easy to convert directly to features. With street topology, we'll need to give some thought as to how to convert to features, and we'll almost certainly want to experiment with this, as well. @jonfroehlich, you said that you had some street topology already computed. What format is that in, and what does it contain?

@jonfroehlich

Agree with everything you said.

Best to probably have an in-person brainstorm/discussion about street topology stuff. At the very least, the easiest thing (and perhaps the most critical) might be simply percentage position on a street segment and whether a label is in an intersection (which might be captured by percentage position).

By percentage position, I mean simply take every street segment (see the CHI'19 paper for the definition), cut it in half, and use the lat/lng position of the label and the two intersections to calculate a percentage position, where both intersections are 0% and the midpoint is 100%. Does that make sense? We may want to use discretized bins of 15% or so.

@galenweld

Based on our in-person discussion today, we can also consider adding features such as building age and zoning category.

@galenweld

We discussed in person the importance of distinguishing between intersection and non-intersection panos, which I think is best represented by including the percentage-towards-intersection value described above.

@jonfroehlich

For the intersection stuff, Mikey says this is not currently tracked but likely should be in the future. We may have to ask Anthony, Manaswi, or Kotaro.

@galenweld

Per an email conversation with Anthony, we're now working on getting the intersection position features added. X,Y and heading are already exported alongside the imagery in the new dataset generation code. I'm currently working on writing a custom PyTorch dataloader that will be flexible and extensible in our incorporation of the features described here, as well as any we choose to add in the future.

We will at some point probably want to experiment with how we incorporate the extra features into the network architecture, and I'm trying to set up a meeting with Joe Redmon to discuss this further. Will update this thread when I have content to report from that, but for now I'm just appending the extra features to the image vector - worth noting that with this approach we will not be able to use the pretrained resnet architecture.
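
As a minimal sketch of that stopgap (assuming PyTorch tensors; names are illustrative, not the actual dataloader code), appending the extra features to the flattened image produces one long 1-D input vector, which is why the pretrained convolutional ResNet can't consume it directly:

```python
import torch

# Hedged sketch: flatten the image crop and append the extra features.
# The result is a single 1-D vector, not an image, so it can't be fed
# through pretrained convolutional layers.
def combine(image: torch.Tensor, extra: torch.Tensor) -> torch.Tensor:
    # image: (3, H, W) crop; extra: (k,) vector of additional features
    return torch.cat([image.flatten(), extra])
```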

@jonfroehlich

I wanted to put in more information about some simple street segment location features, so I've copied the full email I sent to Anthony et al. below.

We were discussing two potential different input features to our model, which relate to the location of the labeled panorama on a street segment with respect to its two endpoints (intersections). I have attached an example figure, which I hope is helpful.

[Figure: three labeled panos (gray) along a single street segment, with their Loc values shown]

In the example above, the labeled panos are in gray and occur at three points along a single street segment. We divide the segment in half so that we are tracking distance from the endpoint.

  • The first input feature is a normalized % distance from the nearest endpoint to the middle of the street segment
  • The second input feature is the raw distance from the nearest endpoint

Both inputs are listed under the pano labels after the bold Loc. Does this make sense?
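
For concreteness, here is a minimal sketch of the two features, assuming (lat, lng) points and an equirectangular distance approximation; the helper names are illustrative rather than taken from the actual pipeline:

```python
import math

def dist(p, q):
    """Approximate distance in meters between two (lat, lng) points."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*p, *q))
    x = (lng2 - lng1) * math.cos((lat1 + lat2) / 2)
    return 6371000 * math.hypot(x, lat2 - lat1)

def segment_location_features(label_pt, endpoint_a, endpoint_b):
    raw = min(dist(label_pt, endpoint_a), dist(label_pt, endpoint_b))  # feature 2
    half = dist(endpoint_a, endpoint_b) / 2  # endpoint-to-midpoint distance
    pct = raw / half if half > 0 else 0.0    # feature 1: 0% at an intersection, 100% at the midpoint
    return pct, raw
```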

@galenweld commented Mar 2, 2019

I had another excellent meeting with Joe Redmon yesterday, and I wanted to document here what we discussed regarding methods for incorporating additional features into ResNet while still being able to use the pretrained weights rather than training from scratch.

Broadly, we discussed two approaches:

The first approach was to "widen" the standard ResNet architecture to include the new features. The "widened" section of the net would be initialized either with random weights or with weights drawn at random from the existing pretrained weights.

To do this, Joe recommended normalizing all new features to the same [0...1] range as the RGB channels. For the current 7x7x3 filters, we'd change each filter to be 7x7x(3 + the number of new feature channels). Joe believed this approach would yield the best possible performance but would be the most difficult to implement, as it would require essentially re-implementing ResNet from scratch.
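
A hedged PyTorch sketch of that widening, using torchvision's ResNet-18 as a stand-in (seeding the new channels from randomly chosen pretrained RGB filters, per the description above; `k` is a hypothetical channel count):

```python
import torch
import torch.nn as nn
import torchvision.models as models

k = 4  # hypothetical number of extra feature channels

resnet = models.resnet18(pretrained=True)
old_conv = resnet.conv1  # Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# Widen the first conv layer from 3 to 3 + k input channels.
new_conv = nn.Conv2d(3 + k, 64, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    new_conv.weight[:, :3] = old_conv.weight          # reuse pretrained RGB filters
    idx = torch.randint(0, 3, (k,))
    new_conv.weight[:, 3:] = old_conv.weight[:, idx]  # seed new channels from pretrained ones
resnet.conv1 = new_conv
```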

The second, simpler approach we discussed was to use ResNet as a 'feature extractor' to learn a feature vector of, say, length 1024 from the images, then concatenate our new features onto the end of this image feature vector and feed the combined vector through a few more hidden layers. This is much more straightforward to implement, so I'm planning on tackling it first.
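
A minimal sketch of this second approach (ResNet-18 as a stand-in; the 512-dim image feature matches ResNet-18's output, while the extra-feature count, hidden size, and output size are illustrative):

```python
import torch
import torch.nn as nn
import torchvision.models as models

class FeatureConcatNet(nn.Module):
    """Pretrained ResNet as a feature extractor, with extra (non-image)
    features concatenated before a small classifier head."""

    def __init__(self, n_extra=7, n_out=5):
        super().__init__()
        resnet = models.resnet18(pretrained=True)
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop the final fc layer
        self.head = nn.Sequential(
            nn.Linear(512 + n_extra, 256),  # image features + extra features
            nn.ReLU(),
            nn.Linear(256, n_out),
        )

    def forward(self, image, extra):
        feats = self.backbone(image).flatten(1)  # (batch, 512)
        return self.head(torch.cat([feats, extra], dim=1))
```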

A few more additional suggestions from the meeting, in no particular order:

  • Any feature expressed in degrees (such as heading) should be decomposed into x and y components using sin and cos, instead of feeding the raw heading into the system. While 359 degrees and 0 degrees are numerically far apart, they are very close in 2D space, and having almost-correct answers be numerically far apart throws off the squared-error computation dramatically (see the first sketch after this list).

  • At some point, we could attempt to re-map our 3-channel RGB images + depth data to a single 4-channel RGBD image, and extract sliding-window crops from the 4-channel image instead of just taking a crop from the 3-channel image and annotating the depth at the crop's center. This should improve performance, but it significantly complicates the pipeline, especially since the depth data is of lower resolution than the RGB data; interpolation would be required, which would be computationally intensive when run on the entire dataset (see the second sketch after this list).
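
A minimal sketch of the sin/cos decomposition from the first bullet (function name illustrative):

```python
import math

def decompose_heading(heading_deg):
    """Encode a heading in degrees as (sin, cos), so 359° and 0° map to nearby points."""
    rad = math.radians(heading_deg)
    return math.sin(rad), math.cos(rad)

decompose_heading(359.0)  # ≈ (-0.0175, 0.9998), nearly identical to decompose_heading(0.0)
```

And a hedged sketch of the RGBD remapping from the second bullet, assuming PyTorch tensors and bilinear upsampling for the depth interpolation (the actual pipeline might choose differently):

```python
import torch
import torch.nn.functional as F

def make_rgbd(rgb, depth):
    """Upsample a low-res depth map to the RGB resolution and stack it as a 4th channel.

    rgb: (3, H, W) tensor; depth: (1, h, w) tensor with h < H, w < W.
    """
    depth_up = F.interpolate(depth.unsqueeze(0), size=rgb.shape[1:],
                             mode="bilinear", align_corners=False)[0]
    return torch.cat([rgb, depth_up], dim=0)  # (4, H, W) RGBD image
```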

Joe also (very kindly) offered to help troubleshoot my current Google Compute Engine woes, which have been dramatically slowing the speed of our training. Planning on circling back with him on that on Monday.

@galenweld

Sine/cosine decomposition for the ResNet "feature extractor" approach has been implemented, along (of course) with the rest of the modified "feature extractor" network architecture. The extra features (other than the imagery) are converted to a 7-channel tensor and fed into the final fully connected layer of the network, alongside the 512 features output by the ResNet.

See this hasty and colorful diagram of the new architecture:

[Figure: modified_resnet_diagram — the modified "feature extractor" ResNet architecture]

@jonfroehlich commented Mar 9, 2019 via email

@galenweld

I spent some more time scheming about additional demographic features to potentially include. A few thoughts....

Of course, there are many sources of data out there. Many municipalities have their own local databases, many of which are publicly query-able. However, the more specific we get to each city, the more difficult it becomes to port models from one city to another (for example, if the model from DC includes data that isn't readily available for, say, Newberg). As such, I would imagine it makes sense to limit our features to data available universally, and of course, the best set of universal data for the US comes from the US census. The census provides lots and lots of useful measurements, which you can play around with a bit here. Some of those potentially useful features are:

  • Median Age
  • Percent Male/Female
  • Percent composed of a certain race
  • Percent Family
  • Rental and Homeowner Vacancy Rate
  • Percent Occupied Housing Units

All of this data is available for the entirety of the United States. Outside of the US it gets a bit trickier (for instance, if we wanted to include Tohme crops from Saskatchewan).

This data is all available at the Census Tract level, which is fairly high resolution, but not enormously so. The only higher resolution datasets that I found are limited to specific cities, and even then, those datasets are hard to find (for things like population density, etc). To give a sense of the resolution of data at the census tract level, Washington DC is divided into 179 census tracts. The city website provides a map. So within DC, for example, we would not be able to differentiate between two points in the same census tract by census data alone - it remains to be seen how significant a limitation this is.

All this raises the question of the cost-benefit of going ahead and building a system that will allow us to query census data given a lat/long. I haven't spent too much time digging into the details (i.e., whether it's easier to download all the data or to query an API), but it doesn't seem like it should be too bad. As I mentioned to Jon in person yesterday, my hypothesis is that no matter the resolution, these demographic factors won't make a dramatic difference in the performance of our CV system, but they are potentially worth discussing in the paper even if we don't implement them.
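
For a rough sense of what such a query might look like, here is a hedged sketch against the US Census Bureau's public geocoder (endpoint and parameter names as I understand them; treat the details as assumptions to verify):

```python
import requests

def tract_for_point(lat, lng):
    """Look up the census tract GEOID containing a (lat, lng) point."""
    resp = requests.get(
        "https://geocoding.geo.census.gov/geocoder/geographies/coordinates",
        params={
            "x": lng, "y": lat,  # note: x is longitude, y is latitude
            "benchmark": "Public_AR_Current",
            "vintage": "Current_Current",
            "format": "json",
        },
    )
    resp.raise_for_status()
    tracts = resp.json()["result"]["geographies"]["Census Tracts"]
    return tracts[0]["GEOID"]  # join this against downloaded ACS tables

# tract_for_point(38.9072, -77.0369)  # a point in Washington DC
```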

@jonfroehlich

Thanks @galenweld for thinking about this further.

I still feel like we should capture the lower hanging fruit first--like the things we discussed in our meeting this week--including: street-related features (i.e., where a label is placed with respect to a street segment) and geographic position features (i.e., where a label is placed in the city). I'm also interested in more extrinsic features like real-estate pricing, zoning category, and socio-economic data. I think zoning category would be relatively easy to get (we actually used it in our CHI'19 paper and performed some analysis exploring accuracy as a function of zone).

While I agree with the general point that we should prioritize input features that are universally available (at least in the US), I am also, in principle, not really against investigating features only available in a given city. However, we don't really have time to do this for the ASSETS push imo.

@galenweld

@jonfroehlich I agree wholeheartedly, and both street segment positioning and geographic positioning are on the list to be added as I re-run the metadata for the dataset. I'm only broaching this topic before I do so because that run will take a day or two, and I was wondering whether it makes sense to wrap those other items into this round.

However, it sounds like I should hold off for now, and we can always revisit them later.

@galenweld

An update on this front:

I've now written code to incorporate the following additional features, with thanks to @tongning:

  • latitude
  • longitude
  • block "middleness"
  • distance to end of block
  • distance from downtown
  • heading to downtown

Currently, I'm looping over the dataset and adding these additional features. I'll let that run over the weekend, and when it's finished I'll upload to the VM and tweak the model to use them.
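
For reference, a hedged sketch of how the distance- and heading-to-downtown features could be computed with the standard haversine and initial-bearing formulas; the downtown center point below is a hypothetical placeholder, not necessarily what the code actually uses:

```python
import math

DOWNTOWN = (38.9072, -77.0369)  # hypothetical center point for Washington DC

def downtown_features(lat, lng, center=DOWNTOWN):
    lat1, lng1, lat2, lng2 = map(math.radians, (lat, lng, *center))
    dlng = lng2 - lng1
    # haversine great-circle distance, in meters
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2)
    distance = 2 * 6371000 * math.asin(math.sqrt(a))
    # initial bearing from the pano toward downtown, in degrees clockwise from north
    y = math.sin(dlng) * math.cos(lat2)
    x = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(dlng)
    heading = math.degrees(math.atan2(y, x)) % 360
    return distance, heading
```

(As noted earlier in the thread, a heading in degrees would then be sin/cos-decomposed before going into the model.)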

This will raise one design question: For a number of reasons, primarily bad XML data in the GSV scrapes, some small percentage of the panos in our dataset have no latitude and longitude information, which makes it impossible to compute these new features. Currently, I'm just writing NaNs to their sidecar files in this case, but when we tweak the model, we'll need to decide how we want to handle these panos that don't have this information.

I don't think it'll make a big performance difference, given the limited impact that the extra features have had so far, but worth considering a bit.

Possible options are to skip panos with bad data (will have an accurate number on what fraction of the dataset this is when I finish computing the new features), to hardcode some backup value or compute one via a different heuristic, or to try and come up with some way of encoding NaN into the model.
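
A minimal sketch of that last option, encoding missingness explicitly with a zero-fill plus a binary indicator (a standard trick, not necessarily what we'll adopt):

```python
import torch

def encode_features(feats):
    """Zero-fill NaN features and append a per-feature 'missing' indicator.

    feats: 1-D float tensor that may contain NaNs; returns a tensor of twice the length.
    """
    missing = torch.isnan(feats)
    filled = torch.where(missing, torch.zeros_like(feats), feats)
    return torch.cat([filled, missing.float()])
```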

@galenweld

Update: I got errors getting lat/long or block position info for 2,925 panos out of the 57,446 total, so ~5.09%.

@jonfroehlich

Ugh, that's too bad. Glad to know it's only ~5%. What should we do in these cases? I'd prefer not to skip these panos... can we just encode a null value for those input features?

Can you clarify what you are using for 'downtown' in distance from downtown and heading to downtown? Is this the center point of DC? If so, how did you calculate it?

Can you also expand on how you calculated block 'middleness' and distance to end of block? Was this similar to our discussions?

@galenweld commented Apr 6, 2019 via email

@jonfroehlich

Got it. Thanks. Can you expand on:

We just need to figure out the best way to input that to the model.

@galenweld commented Apr 7, 2019 via email

@jonfroehlich commented Apr 7, 2019 via email

@jonfroehlich

This is one of those brain-dump threads that is hard to close out... so I suggest keeping it open and continuing to discuss new input features. We could also split discussions of particular input features into their own Issues in the future.
