Seattle and Newberg experiments for ASSETS camera ready #27

Open
jonfroehlich opened this issue Jun 28, 2019 · 7 comments

@jonfroehlich
Member

For the ASSETS CR, we want to rerun a few experiments. To do this, we need updated data.

@misaugstad, could you run the following queries for us ASAP:

  • How many researcher-provided labels do we have in the Newberg and Seattle datasets?
  • How many labels do we have in total that are either researcher-provided or researcher-validated (for both cities)?
  • How many labels do we have in total if we count only researcher-validated labels (for both cities)?
@misaugstad
Member

How many researcher-provided labels do we have in the Newberg and Seattle datasets?

Seattle: 1.7k, Newberg: 3.2k

How many labels do we have in total if we count only researcher-validated labels (for both cities)?

Seattle: 4.1k, Newberg: 0.5k

How many labels do we have in total that are either researcher-provided or researcher-validated (for both cities)?

Seattle: 5.3k, Newberg: 3.5k (note that these are smaller than the sum of the two counts above, because researcher-validated labels that were placed by other researchers would otherwise be counted twice).
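For concreteness, here's a minimal sketch of how these three counts relate, assuming a hypothetical `labels` table with a placed-by-researcher flag and a separate set of researcher-validated label IDs (the column names are illustrative, not the actual Project Sidewalk schema):

```python
# Hypothetical sketch of the three counts above; column names are assumptions.
import pandas as pd

def label_counts(labels: pd.DataFrame, researcher_validated_ids: set) -> dict:
    provided = labels["placed_by_researcher"]                      # boolean flag per label
    validated = labels["label_id"].isin(researcher_validated_ids)  # researcher-validated labels
    return {
        "researcher_provided": int(provided.sum()),
        "researcher_validated": int(validated.sum()),
        # Union, not sum: a label that is both researcher-provided and
        # researcher-validated is counted once, which is why this is smaller
        # than the sum of the two counts above.
        "provided_or_validated": int((provided | validated).sum()),
    }
```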

@misaugstad
Member

Actually, you should lower the researcher-validated estimates slightly: I didn't calculate a majority vote for the numbers above, I only counted labels that have any upvotes from researchers. I don't expect this to have a large effect, but the numbers will likely be revised down slightly.
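A small sketch of that distinction, assuming a hypothetical `validations` table with one row per researcher vote; the column names and "Agree"/"Disagree" values are assumptions, not the real schema:

```python
# Illustrative only: "any researcher upvote" vs. a majority vote among researcher votes.
import pandas as pd

def validated_label_ids(validations: pd.DataFrame, majority: bool = False) -> set:
    by_label = validations.groupby("label_id")["result"]
    if majority:
        # Majority vote: more than half of the researcher votes are "Agree".
        keep = by_label.apply(lambda r: (r == "Agree").mean() > 0.5)
    else:
        # Any-upvote rule used for the estimates above: at least one "Agree".
        keep = by_label.apply(lambda r: (r == "Agree").any())
    return set(keep[keep].index)
```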

@galenweld
Collaborator

Great, thanks Mikey. I was misremembering in my email when I said that we had 6k labels for Newberg.

@galenweld galenweld changed the title How many labels from researchers and how many labels validated by researchers? Seattle and Newberg experiments for ASSETS camera ready Jun 28, 2019
@galenweld
Collaborator

I sent the following email yesterday, wanted to add it here for continuity:

Hi Jon,
Sorry for the delay on this. Lots of exciting new results to discuss, and thankfully nothing too surprising. So far, we've finished running all of the equivalent experiments for Seattle that we've run for Newberg, except for the model trained on Seattle+DC data, which is much slower to run since it is much larger. That model is training right now and should finish on Thursday.

I recreated the same figure that we have for Newberg with our Seattle data:

[Figure: newberg_results]

[Figure: seattle_results]

You'll notice that basically everything we observed in our Newberg experiments also holds for Seattle. Our DC model offers the best performance on curb ramps, presumably because we have so many more examples of curb ramps from DC than from any other city. In Seattle, just as in Newberg, training on Seattle data offers much better performance on "null" crops than the DC-trained model, presumably because the "background" environment, i.e. all of the aspects of the city that aren't sidewalk features, is very specific to each city – a model trained on DC curb ramps will do a good job learning Seattle curb ramps, but a model trained on DC null crops will do an atrocious job recognizing Seattle curb ramps.

Perhaps I've digressed a little bit, but the point is, there are lots of interesting tidbits to be teased out of both our Seattle and our Newberg experiments – our challenge for the camera ready will be rolling this all into a coherent narrative and presenting it in a clear manner.

I think for tomorrow's meeting, we should discuss the following points:
  • How best to present the results we have so far (both with graphics and in our discussion)
  • Additional Seattle+Newberg experiments to run

Looking forward to chatting about all this soon,
Galen

@galenweld
Collaborator

All the models we ran on Newberg have been re-run on Seattle with the fixed crop sizing. As expected, nothing changed dramatically, and our results all still hold.

I also tweaked the plots to make the Overall values more visible.

[Figure: newberg_acc]

[Figure: seattle_acc]

The plots above have been added to the paper, and I re-wrote our analysis to incorporate numbers from Seattle.

As far as additional experiments for Seattle and Newberg together go, I'm currently training a model on the three-way combination of Seattle+Newberg+D.C. data, as we decided in person that this was the most promising thing to try first. I will update with results from this when I have them, but even without this, I think we're in good shape on this front.
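In case it helps when we revisit this, here's a minimal sketch of assembling the three-way combined training set, assuming PyTorch and one ImageFolder-style directory of crops per city (the framework, paths, and loader settings are assumptions, not the actual training code):

```python
# Hedged sketch: combine per-city crop datasets into one training set.
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Hypothetical directory layout: one folder of labeled crops per city.
city_dirs = ["crops/seattle", "crops/newberg", "crops/dc"]
combined = ConcatDataset([datasets.ImageFolder(d, transform=transform) for d in city_dirs])
loader = DataLoader(combined, batch_size=64, shuffle=True, num_workers=4)
```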

@jonfroehlich
Member Author

jonfroehlich commented Jul 17, 2019 via email

@galenweld
Collaborator

Done. Let's take a look at the opacity in person in a little bit.

I also re-did all the other figures so we're 100% consistent with our color schemes.
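One way to keep the color schemes locked across figures is to pin a single color per training set up front and reuse it everywhere; a small matplotlib sketch (the model names, colors, and alpha are placeholders, not the values used in the paper):

```python
# Illustrative sketch: one fixed color per model so every accuracy figure matches.
import numpy as np
import matplotlib.pyplot as plt

MODEL_COLORS = {
    "DC": "tab:blue",
    "Seattle": "tab:orange",
    "Newberg": "tab:green",
    "Seattle+Newberg+DC": "tab:red",
}

def plot_accuracy(ax, categories, accuracies_by_model, alpha=0.9):
    """Grouped bar chart of per-category accuracy, one bar group per label type."""
    x = np.arange(len(categories))
    width = 0.8 / len(accuracies_by_model)
    for i, (model, acc) in enumerate(accuracies_by_model.items()):
        ax.bar(x + i * width, acc, width, color=MODEL_COLORS[model], alpha=alpha, label=model)
    ax.set_xticks(x + width * (len(accuracies_by_model) - 1) / 2)
    ax.set_xticklabels(categories)
    ax.set_ylabel("Accuracy")
    ax.legend()
```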
