Potential Input Features to Explore #1

Open · 4 of 7 tasks · jonfroehlich opened this issue Jan 10, 2019 · 21 comments

@jonfroehlich commented Jan 10, 2019

In this thread, I'd love to list, discuss, and prioritize different input features for the ML model. We can always create new Issues once we start diving down individual paths.

  • Depth map
  • Street topology (e.g., where a label seems to be placed in a street network)
    - [ ] The first input feature is a normalized % distance from the nearest endpoint to the middle of the street segment
    - [ ] The second input feature is the raw distance from the nearest endpoint
    - [ ] Binary label of intersection or not (subsumed by the features above, so not necessary)
  • x,y label position information
  • Building age
  • Zoning category
  • Geography-related features like distance from the center point of the city, angle from the center point, neighborhood, and quadrant
  • ... lots more, but I wanted to get the list started!
@galenweld

X,Y position information and depth will be the easiest to add first, especially depth. Those are clearly easy to convert directly to features. With street topology, we'll need to give some thought as to how to convert to features, and we'll almost certainly want to experiment with this, as well. @jonfroehlich, you said that you had some street topology already computed. What format is that in, and what does it contain?

@jonfroehlich

Agree with everything you said.

Best to probably have an in-person brainstorm/discussion about street topology stuff. At the very least, the easiest thing (and perhaps the most critical) might be simply percentage position on a street segment and whether a label is in an intersection (which might be captured by percentage position).

By percentage position, I mean simply take every street segment (see the CHI'19 paper for the definition), cut it in half, and use the lat/lng position of the label and the two intersections to calculate a percentage position, where both intersections are 0% and the midpoint is 100%. Does that make sense? We may want to use discretized bins of 15% or so.

@galenweld

Based on our in-person discussion today, we can also consider adding features such as building age and zoning category.

@galenweld

We discussed in person the importance of distinguishing between intersection and non-intersection panos, which I think is best represented by including the percentage-towards-intersection value described above.

@jonfroehlich

For the intersection stuff, Mikey says this is not currently tracked but likely should be in the future. We may have to ask Anthony, Manaswi, or Kotaro.

@galenweld

Per an email conversation with Anthony, we're now working on getting the intersection position features added. X,Y and heading are already exported alongside the imagery in the new dataset generation code. I'm currently working on writing a custom PyTorch dataloader that will be flexible and extensible in our incorporation of the features described here, as well as any we choose to add in the future.

We will at some point probably want to experiment with how we incorporate the extra features into the network architecture, and I'm trying to set up a meeting with Joe Redmon to discuss this further. Will update this thread when I have content to report from that, but for now I'm just appending the extra features to the image vector - worth noting that with this approach we will not be able to use the pretrained resnet architecture.
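
As a minimal sketch of that stopgap (assuming PyTorch tensors; names are illustrative, not the actual dataloader code), appending the extra features to the flattened image produces one long 1-D input vector, which is why the pretrained convolutional ResNet can't consume it directly:

```python
import torch

# Hedged sketch: flatten the image crop and append the extra features.
# The result is a single 1-D vector, not an image, so it can't be fed
# through pretrained convolutional layers.
def combine(image: torch.Tensor, extra: torch.Tensor) -> torch.Tensor:
    # image: (3, H, W) crop; extra: (k,) vector of additional features
    return torch.cat([image.flatten(), extra])
```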

@jonfroehlich

I wanted to put in more information about some simple street segment location features, so I've copied the full email I sent to Anthony et al. below.

We were discussing two potential different input features to our model, which relate to the location of the labeled panorama on a street segment with respect to its two endpoints (intersections). I have attached an example figure, which I hope is helpful.

[Figure: three labeled panos (gray) along a single street segment, with their Loc values shown]

In the example above, the labeled panos are in gray and occur at three points along a single street segment. We divide the segment in half so that we are tracking distance from the endpoint.

  • The first input feature is a normalized % distance from the nearest endpoint to the middle of the street segment
  • The second input feature is the raw distance from the nearest endpoint

Both inputs are listed under the pano labels after the bold Loc. Does this make sense?
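
For concreteness, here is a minimal sketch of the two features, assuming (lat, lng) points and an equirectangular distance approximation; the helper names are illustrative rather than taken from the actual pipeline:

```python
import math

def dist(p, q):
    """Approximate distance in meters between two (lat, lng) points."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*p, *q))
    x = (lng2 - lng1) * math.cos((lat1 + lat2) / 2)
    return 6371000 * math.hypot(x, lat2 - lat1)

def segment_location_features(label_pt, endpoint_a, endpoint_b):
    raw = min(dist(label_pt, endpoint_a), dist(label_pt, endpoint_b))  # feature 2
    half = dist(endpoint_a, endpoint_b) / 2  # endpoint-to-midpoint distance
    pct = raw / half if half > 0 else 0.0    # feature 1: 0% at an intersection, 100% at the midpoint
    return pct, raw
```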

@galenweld commented Mar 2, 2019

I had another excellent meeting with Joe Redmon yesterday, and I wanted to document here what we discussed regarding methods for incorporating additional features into ResNet while still being able to use the pretrained weights rather than training from scratch.

Broadly, we discussed two approaches:

The first approach was to "widen" the standard ResNet architecture to include the new features. The "widened" section of the net would be initialized either with random weights or with weights drawn at random from the existing pretrained weights.

To do this, Joe recommended normalizing all new features to the same [0...1] range as the RGB channels. For the current 7x7x3 filters, we'd change each filter to be 7x7x(3 + the number of new feature channels). Joe believed this approach would yield the best possible performance but would be the most difficult to implement, as it would require essentially re-implementing ResNet from scratch.
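
A hedged PyTorch sketch of that widening, using torchvision's ResNet-18 as a stand-in (seeding the new channels from randomly chosen pretrained RGB filters, per the description above; `k` is a hypothetical channel count):

```python
import torch
import torch.nn as nn
import torchvision.models as models

k = 4  # hypothetical number of extra feature channels

resnet = models.resnet18(pretrained=True)
old_conv = resnet.conv1  # Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# Widen the first conv layer from 3 to 3 + k input channels.
new_conv = nn.Conv2d(3 + k, 64, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    new_conv.weight[:, :3] = old_conv.weight          # reuse pretrained RGB filters
    idx = torch.randint(0, 3, (k,))
    new_conv.weight[:, 3:] = old_conv.weight[:, idx]  # seed new channels from pretrained ones
resnet.conv1 = new_conv
```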

The second, simpler approach we discussed was to use ResNet as a 'feature extractor' to learn a feature vector of, say, length 1024 from the images, then concatenate our new features onto the end of this image feature vector and feed the combined vector through a few more hidden layers. This is much more straightforward to implement, so I'm planning on tackling it first.
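
A minimal sketch of this second approach (ResNet-18 as a stand-in; the 512-dim image feature matches ResNet-18's output, while the extra-feature count, hidden size, and output size are illustrative):

```python
import torch
import torch.nn as nn
import torchvision.models as models

class FeatureConcatNet(nn.Module):
    """Pretrained ResNet as a feature extractor, with extra (non-image)
    features concatenated before a small classifier head."""

    def __init__(self, n_extra=7, n_out=5):
        super().__init__()
        resnet = models.resnet18(pretrained=True)
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop the final fc layer
        self.head = nn.Sequential(
            nn.Linear(512 + n_extra, 256),  # image features + extra features
            nn.ReLU(),
            nn.Linear(256, n_out),
        )

    def forward(self, image, extra):
        feats = self.backbone(image).flatten(1)  # (batch, 512)
        return self.head(torch.cat([feats, extra], dim=1))
```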

A few more additional suggestions from the meeting, in no particular order:

  • Any feature expressed in degrees (such as heading) should be decomposed into x and y components using sin and cos, instead of feeding the raw heading into the system. While 359 degrees and 0 degrees are numerically far apart, they are very close in 2D space, and having almost-correct answers be numerically far apart throws off the squared-error computation dramatically (see the first sketch after this list).

  • At some point, we could attempt to re-map our 3-channel RGB images + depth data to a single 4-channel RGBD image, and extract sliding-window crops from the 4-channel image instead of just taking a crop from the 3-channel image and annotating the depth at the crop's center. This should improve performance, but it significantly complicates the pipeline, especially since the depth data is of lower resolution than the RGB data; interpolation would be required, which would be computationally intensive when run on the entire dataset (see the second sketch after this list).
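
A minimal sketch of the sin/cos decomposition from the first bullet (function name illustrative):

```python
import math

def decompose_heading(heading_deg):
    """Encode a heading in degrees as (sin, cos), so 359° and 0° map to nearby points."""
    rad = math.radians(heading_deg)
    return math.sin(rad), math.cos(rad)

decompose_heading(359.0)  # ≈ (-0.0175, 0.9998), nearly identical to decompose_heading(0.0)
```

And a hedged sketch of the RGBD remapping from the second bullet, assuming PyTorch tensors and bilinear upsampling for the depth interpolation (the actual pipeline might choose differently):

```python
import torch
import torch.nn.functional as F

def make_rgbd(rgb, depth):
    """Upsample a low-res depth map to the RGB resolution and stack it as a 4th channel.

    rgb: (3, H, W) tensor; depth: (1, h, w) tensor with h < H, w < W.
    """
    depth_up = F.interpolate(depth.unsqueeze(0), size=rgb.shape[1:],
                             mode="bilinear", align_corners=False)[0]
    return torch.cat([rgb, depth_up], dim=0)  # (4, H, W) RGBD image
```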

Joe also (very kindly) offered to help troubleshoot my current Google Compute Engine woes, which have been dramatically slowing the speed of our training. Planning on circling back with him on that on Monday.

@galenweld

Sine/cosine decomposition for the ResNet "feature extractor" approach has been implemented, along (of course) with the rest of the modified "feature extractor" network architecture. The extra features (other than the imagery) are converted to a 7-channel tensor and fed into the final fully connected layer of the network, alongside the 512 features output by the ResNet.

See this hasty and colorful diagram of the new architecture:

[Figure: modified_resnet_diagram — the modified "feature extractor" ResNet architecture]

@jonfroehlich commented Mar 9, 2019 via email

@galenweld

I spent some more time scheming about additional demographic features to potentially include. A few thoughts....

Of course, there are many sources of data out there. Many municipalities have their own local databases, many of which are publicly query-able. However, the more specific we get to each city, the more difficult it becomes to port models from one city to another (for example, if the model from DC includes data that isn't readily available for, say, Newberg). As such, I would imagine it makes sense to limit our features to data available universally, and of course, the best set of universal data for the US comes from the US census. The census provides lots and lots of useful measurements, which you can play around with a bit here. Some of those potentially useful features are:

  • Median Age
  • Percent Male/Female
  • Percent composed of a certain race
  • Percent Family
  • Rental and Homeowner Vacancy Rate
  • Percent Occupied Housing Units

All of this data is available for the entirety of the United States. Outside of the US it gets a bit trickier (for instance, if we wanted to include Tohme crops from Saskatchewan).

This data is all available at the Census Tract level, which is fairly high resolution, but not enormously so. The only higher resolution datasets that I found are limited to specific cities, and even then, those datasets are hard to find (for things like population density, etc). To give a sense of the resolution of data at the census tract level, Washington DC is divided into 179 census tracts. The city website provides a map. So within DC, for example, we would not be able to differentiate between two points in the same census tract by census data alone - it remains to be seen how significant a limitation this is.

All this raises the question of the cost-benefit of going ahead and building a system that will allow us to query census data given a lat/long. I haven't spent too much time digging into the details (i.e., whether it's easier to download all the data or to query an API), but it doesn't seem like it should be too bad. As I mentioned to Jon in person yesterday, my hypothesis is that no matter the resolution, these demographic factors won't make a dramatic difference in the performance of our CV system, but they are potentially worth discussing in the paper even if we don't implement them.
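
For a rough sense of what such a query might look like, here is a hedged sketch against the US Census Bureau's public geocoder (endpoint and parameter names as I understand them; treat the details as assumptions to verify):

```python
import requests

def tract_for_point(lat, lng):
    """Look up the census tract GEOID containing a (lat, lng) point."""
    resp = requests.get(
        "https://geocoding.geo.census.gov/geocoder/geographies/coordinates",
        params={
            "x": lng, "y": lat,  # note: x is longitude, y is latitude
            "benchmark": "Public_AR_Current",
            "vintage": "Current_Current",
            "format": "json",
        },
    )
    resp.raise_for_status()
    tracts = resp.json()["result"]["geographies"]["Census Tracts"]
    return tracts[0]["GEOID"]  # join this against downloaded ACS tables

# tract_for_point(38.9072, -77.0369)  # a point in Washington DC
```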

@jonfroehlich

Thanks @galenweld for thinking about this further.

I still feel like we should capture the lower hanging fruit first--like the things we discussed in our meeting this week--including: street-related features (i.e., where a label is placed with respect to a street segment) and geographic position features (i.e., where a label is placed in the city). I'm also interested in more extrinsic features like real-estate pricing, zoning category, and socio-economic data. I think zoning category would be relatively easy to get (we actually used it in our CHI'19 paper and performed some analysis exploring accuracy as a function of zone).

While I agree with the general point that we should prioritize input features that are universally available (at least in the US), I am also, in principle, not really against investigating features only available in a given city. However, we don't really have time to do this for the ASSETS push imo.

@galenweld

@jonfroehlich I agree wholeheartedly, and both street segment positioning and geographic positioning are on the list to be added as I re-run the metadata for the dataset. I'm only broaching this topic before I do so because that run will take a day or two, and I was wondering whether it makes sense to wrap those other items into this round.

However, it sounds like I should hold off for now, and we can always revisit them later.

@galenweld

An update on this front:

I've now written code to incorporate the following additional features, with thanks to @tongning:

  • latitude
  • longitude
  • block "middleness"
  • distance to end of block
  • distance from downtown
  • heading to downtown

Currently, I'm looping over the dataset and adding these additional features. I'll let that run over the weekend, and when it's finished I'll upload to the VM and tweak the model to use them.
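
For reference, a hedged sketch of how the distance- and heading-to-downtown features could be computed with the standard haversine and initial-bearing formulas; the downtown center point below is a hypothetical placeholder, not necessarily what the code actually uses:

```python
import math

DOWNTOWN = (38.9072, -77.0369)  # hypothetical center point for Washington DC

def downtown_features(lat, lng, center=DOWNTOWN):
    lat1, lng1, lat2, lng2 = map(math.radians, (lat, lng, *center))
    dlng = lng2 - lng1
    # haversine great-circle distance, in meters
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2)
    distance = 2 * 6371000 * math.asin(math.sqrt(a))
    # initial bearing from the pano toward downtown, in degrees clockwise from north
    y = math.sin(dlng) * math.cos(lat2)
    x = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(dlng)
    heading = math.degrees(math.atan2(y, x)) % 360
    return distance, heading
```

(As noted earlier in the thread, a heading in degrees would then be sin/cos-decomposed before going into the model.)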

This will raise one design question: For a number of reasons, primarily bad XML data in the GSV scrapes, some small percentage of the panos in our dataset have no latitude and longitude information, which makes it impossible to compute these new features. Currently, I'm just writing NaNs to their sidecar files in this case, but when we tweak the model, we'll need to decide how we want to handle these panos that don't have this information.

I don't think it'll make a big performance difference, given the limited impact that the extra features have had so far, but worth considering a bit.

Possible options are to skip panos with bad data (will have an accurate number on what fraction of the dataset this is when I finish computing the new features), to hardcode some backup value or compute one via a different heuristic, or to try and come up with some way of encoding NaN into the model.
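
A minimal sketch of that last option, encoding missingness explicitly with a zero-fill plus a binary indicator (a standard trick, not necessarily what we'll adopt):

```python
import torch

def encode_features(feats):
    """Zero-fill NaN features and append a per-feature 'missing' indicator.

    feats: 1-D float tensor that may contain NaNs; returns a tensor of twice the length.
    """
    missing = torch.isnan(feats)
    filled = torch.where(missing, torch.zeros_like(feats), feats)
    return torch.cat([filled, missing.float()])
```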

@galenweld

Update: I got errors getting lat/long or block position info for 2,925 panos out of the 57,446 total, so ~5.09%.

@jonfroehlich

Ugh, that's too bad. Glad to know it's only ~5%. What should we do in these cases? I'd prefer not to skip these panos... can we just encode a null value for those input features?

Can you clarify what you are using for 'downtown' in distance from downtown and heading to downtown? Is this the center point of DC? If so, how did you calculate it?

Can you also expand on how you calculated block 'middleness' and distance to end of block? Was this similar to our discussions?

@galenweld commented Apr 6, 2019 via email

@jonfroehlich

Got it. Thanks. Can you expand on:

We just need to figure out the best way to input that to the model.

@galenweld commented Apr 7, 2019 via email

@jonfroehlich commented Apr 7, 2019 via email

@jonfroehlich

This is one of those brain-dump threads that is hard to close out... so I suggest keeping it open and continuing to discuss new input features. We could also split discussions of particular input features into their own Issues in the future.
