Session 1
Following along with https://www.youtube.com/watch?v=7Hlb8YX2-W8
Download the initial mp4 and figure out what his first steps are
- He just wanted to make a basic video player
Session 2
Make some notes on SLAM
- Constructing a map of the surrounding environment while updating our position within it
How might we do this?
- Use OpenCV to track visual markers across frames.
- Use the camera specs to calculate the distance of the markers from the camera
- Then on the next frame we update our position in the map by observing the change in position of the markers
- Then look for more markers to track and add to our map
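A rough sketch of the distance step, assuming a pinhole camera, a focal length in pixels taken from the camera specs, and a marker whose real width is known. The function name and the numbers are hypothetical.

```python
def distance_from_marker(focal_length_px: float, real_width_m: float,
                         width_in_image_px: float) -> float:
    """Pinhole model: width_in_image_px = focal_length_px * real_width_m / distance."""
    if width_in_image_px <= 0:
        raise ValueError("marker width in the image must be positive")
    return focal_length_px * real_width_m / width_in_image_px


# Hypothetical numbers: 800 px focal length, 0.5 m wide marker seen 100 px wide.
print(distance_from_marker(800.0, 0.5, 100.0))  # 4.0
```

If the marker appears half as wide in the next frame, the same formula says it is twice as far away.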
What if we don't have the camera specs?
Apparently we are using feature SLAM, not dense SLAM. It is not clear to me what the difference is
some slam links
https://faculty.cc.gatech.edu/~afb/classes/CS7495-Fall2014/presentations/visual_slam.pdf
https://etd.ohiolink.edu/apexprod/rws_etd/send_file/send?accession=ouhonors1461320700&disposition=inline
https://medium.com/data-breach/introduction-to-harris-corner-detector-32a88850b3f6
https://arxiv.org/pdf/1502.00956.pdf
ORB-SLAM uses the FAST keypoint detector and the BRIEF descriptor.
FAST is Features from Accelerated Segment Test. It compares each candidate pixel against a ring of 16 surrounding pixels and uses a machine-learned decision tree to make that test fast. The idea is that corners are less likely to change in appearance as the camera translates or rotates
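A minimal sketch of the segment test at the heart of FAST, in plain Python on a list-of-rows image. It leaves out the machine-learned speed-up entirely, and the threshold of 20 and arc length of 9 are assumed values, not taken from the video or the paper.

```python
# Offsets (dx, dy) of the 16-pixel circle of radius 3, in clockwise order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img, x, y, threshold=20, arc_length=9):
    """Segment test: True if `arc_length` contiguous circle pixels are all
    brighter than centre + threshold or all darker than centre - threshold."""
    centre = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for is_extreme in (lambda v: v > centre + threshold,
                       lambda v: v < centre - threshold):
        flags = [is_extreme(v) for v in ring]
        run = 0
        # Walk the ring twice so runs that wrap around the start are counted.
        for flag in flags + flags:
            run = run + 1 if flag else 0
            if run >= arc_length:
                return True
    return False


# A bright square in the top-left of a 7x7 image has a corner at (3, 3);
# a straight vertical edge through the same pixel does not.
corner = [[200 if (x <= 3 and y <= 3) else 50 for x in range(7)] for y in range(7)]
edge = [[200 if x <= 3 else 50 for x in range(7)] for y in range(7)]
print(is_fast_corner(corner, 3, 3), is_fast_corner(edge, 3, 3))  # True False
```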
Session 3
BRIEF stands for Binary Robust Independent Elementary Features. A binary string is generated to represent a keypoint/feature. Two features are compared by the Hamming distance between their strings, which is an XOR followed by a bit count, so a match test takes only a few machine instructions.
It wasn't designed to be invariant to rotation, so it performs poorly if the camera is rotated.
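The XOR-based comparison of two binary descriptors can be sketched in plain Python. Treating a descriptor as a bytes object is an assumption made for the example, and the function name is hypothetical.

```python
def hamming_distance(desc_a: bytes, desc_b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    if len(desc_a) != len(desc_b):
        raise ValueError("descriptors must have the same length")
    # XOR leaves a 1 wherever the bits differ; counting the 1s gives the distance.
    return sum(bin(a ^ b).count("1") for a, b in zip(desc_a, desc_b))


# 0b10110000 vs 0b00100001 differ in three bit positions.
print(hamming_distance(b"\xb0", b"\x21"))  # 3
```

A small distance means the two features probably match; a distance near half the descriptor length means they are unrelated.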
I will now hook up the cv2 functions to extract features from each frame of the video. I might need a dict to store them by their key. I wonder what metadata comes with them.
See https://docs.opencv.org/3.4/d1/d89/tutorial_py_orb.html for the extra modifications ORB makes to account for orientation of the camera.
Some points of confusion in the OpenCV ORB docs
- what is meant by binary tests? Just features in a patch?
- isn't the rotation different for each feature, so how does it make sense to apply one rotation matrix to all features?
- what is a bit feature? What does it mean for them to have a variance and mean?
- what is using a pyramid to find multiscale features?
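On the rotation question: as far as I can tell from the ORB tutorial, each keypoint gets its own angle, computed from the intensity centroid of the patch around it, and the BRIEF test pattern is rotated by that keypoint's angle. A sketch of the angle computation only, using a square patch for simplicity (an assumption; the function name is hypothetical):

```python
import math


def patch_orientation(patch):
    """Angle (radians) from the patch centre to its intensity centroid.

    `patch` is a square list of rows of pixel intensities with odd side length.
    """
    half = len(patch) // 2
    m10 = m01 = 0.0
    for row_idx, row in enumerate(patch):
        for col_idx, value in enumerate(row):
            m10 += (col_idx - half) * value  # first moment in x
            m01 += (row_idx - half) * value  # first moment in y (down is positive)
    return math.atan2(m01, m10)


# Brightness concentrated on the right side of the patch gives an angle of 0.
print(patch_orientation([[0, 0, 10], [0, 0, 10], [0, 0, 10]]))  # 0.0
```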
It looks like I am going to get into the weeds with ORB-SLAM.
There are some key ideas I need to clarify at the top of the research paper.
Tracking - Presumably finding features in a frame and reidentifying them in the next frame.
Mapping - Using the features to construct a 3D map of the environment. Needs some way of transforming from image space to world space.
Relocalization - Sometimes there are no familiar features to be found, and in order to join up with the map we need to look up the current view in a database to get a match. Need more info on how this is done
Loop Closing - Slight errors in map generation and localization can mean that travelling in a loop results in previously detected features having different map locations. An algorithm is run to deform the map and our localized path so that the common features match up in map space and the loop is closed.
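A toy illustration of what "deforming the path" could mean, and not the ORB-SLAM method, which as far as I know optimises a graph of poses. It assumes 2D positions and that the last pose should coincide with the first, then spreads the accumulated drift linearly along the path. The function name is hypothetical.

```python
def close_loop(path):
    """Spread the end-to-start gap linearly over a list of (x, y) poses.

    Assumes the path has at least two poses and that the last pose should
    coincide with the first.
    """
    (x0, y0), (xn, yn) = path[0], path[-1]
    err_x, err_y = xn - x0, yn - y0  # accumulated drift
    last = len(path) - 1
    corrected = []
    for i, (x, y) in enumerate(path):
        fraction = i / last  # 0 at the start, 1 at the end
        corrected.append((x - fraction * err_x, y - fraction * err_y))
    return corrected


# A square loop that drifted: it ends at (1, 0) instead of back at (0, 0).
drifted = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 0.0)]
print(close_loop(drifted)[-1])  # (0.0, 0.0)
```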
More questions
- How does the deformation work to close the loop?
- How do we know if we are in a loop? Is it by detecting similar groupings of features?
- How does image lookup work?
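On image lookup: ORB-SLAM does this with a bag of visual words. A much simplified sketch of the idea follows, with a tiny made-up vocabulary of integer descriptors and a plain cosine similarity between word histograms. All names and values are hypothetical, and the real system weights and scores words differently.

```python
import math


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def word_histogram(descriptors, vocabulary):
    """Count how many descriptors fall nearest to each vocabulary word."""
    hist = [0] * len(vocabulary)
    for desc in descriptors:
        nearest = min(range(len(vocabulary)),
                      key=lambda i: hamming(desc, vocabulary[i]))
        hist[nearest] += 1
    return hist


def cosine_similarity(h1, h2):
    dot = sum(a * b for a, b in zip(h1, h2))
    norm = math.sqrt(sum(a * a for a in h1)) * math.sqrt(sum(b * b for b in h2))
    return dot / norm if norm else 0.0


def best_match(query, database, vocabulary):
    """Index of the stored image whose word histogram is closest to the query's."""
    q = word_histogram(query, vocabulary)
    scores = [cosine_similarity(q, word_histogram(d, vocabulary)) for d in database]
    return max(range(len(scores)), key=scores.__getitem__)


vocabulary = [0b00000000, 0b11111111, 0b00001111]
database = [[0b00000001, 0b00000010],   # image 0
            [0b11111110, 0b11110111],   # image 1
            [0b00001111, 0b00001110]]   # image 2
print(best_match([0b11111111, 0b11101111], database, vocabulary))  # 1
```

The appeal is that the current view is compared against compact histograms rather than against every stored descriptor.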
Food for thought. I still need to read the ORB-SLAM paper. There is a high barrier to entry for this coding session.
Session 4
links
https://docs.opencv.org/3.4/dd/d1a/group__imgproc__feature.html#ga1d6bb77486c8f92d79c8793ad995d541
https://docs.opencv.org/4.6.0/dc/dc3/tutorial_py_matcher.html
Skipped to 46:06
This is now a git repo. I decided yesterday to combine whatever I make here with a Flutter app, probably streaming some results to be visualized.
I've hooked up the basic OpenCV ORB keypoint detector and descriptor computation. They are drawn on each frame nicely. Next I suppose we need to track them across frames.
Apparently the keypoints are not well distributed (they mostly sit on the treeline), so we need to break the image up into a grid to distribute them more evenly.
The ORB-SLAM paper mentions that we want at least 5 features per grid cell.
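A sketch of the grid idea in plain Python, assuming keypoints are (x, y, response) tuples. Note that this version caps each cell at the strongest few keypoints rather than guaranteeing a minimum, which is a simplification of what the paper describes. The function name and the cell size are hypothetical.

```python
from collections import defaultdict


def distribute_keypoints(keypoints, cell_size, per_cell=5):
    """Keep the `per_cell` strongest keypoints in each grid cell.

    `keypoints` is a list of (x, y, response) tuples; `cell_size` is in pixels.
    """
    cells = defaultdict(list)
    for kp in keypoints:
        x, y, _ = kp
        cells[(int(x // cell_size), int(y // cell_size))].append(kp)
    kept = []
    for cell_kps in cells.values():
        cell_kps.sort(key=lambda kp: kp[2], reverse=True)  # strongest first
        kept.extend(cell_kps[:per_cell])
    return kept


# Eight keypoints crowd one cell and a single weak one sits in the next cell.
crowded = [(10.0 + i, 20.0, float(i)) for i in range(1, 9)] + [(150.0, 20.0, 0.5)]
print(len(distribute_keypoints(crowded, cell_size=100)))  # 6
```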
Now the keypoints are clumped together in the grid cells.
There are 3 main keypoint/corner detectors: FAST, Harris, and Shi-Tomasi. Apparently Shi-Tomasi gives better results when time constraints are not tight.
OpenCV implements Shi-Tomasi in a function called goodFeaturesToTrack. It is a big improvement over the previous method.
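For reference, Harris and Shi-Tomasi both score a window from the same 2x2 structure tensor of image gradients and differ only in how they combine its eigenvalues. A small sketch, where the tensor entries are given directly and k = 0.04 is a commonly used value that I am assuming here:

```python
import math


def corner_scores(sxx, sxy, syy, k=0.04):
    """Return (shi_tomasi, harris) scores for the structure tensor
    [[sxx, sxy], [sxy, syy]] built from image gradients in a window."""
    trace = sxx + syy
    det = sxx * syy - sxy * sxy
    # Smaller eigenvalue of a symmetric 2x2 matrix.
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lambda_min = trace / 2 - spread
    shi_tomasi = lambda_min            # min(lambda1, lambda2)
    harris = det - k * trace ** 2      # det(M) - k * trace(M)^2
    return shi_tomasi, harris


# Strong gradients in both directions (a corner) vs. in one direction (an edge).
print(corner_scores(2.0, 0.0, 2.0)[0])  # 2.0
print(corner_scores(4.0, 0.0, 0.0)[0])  # 0.0
```

An edge scores zero under Shi-Tomasi because one eigenvalue vanishes, which is why it is a good filter for features that can be tracked in both directions.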
So now we need to track the keypoints across frames.
I'll start with the brute force matcher. It appears to work well enough but I get this error on frame 34
Traceback (most recent call last):
  File "/Users/georgeoconnor/geohot_livestreams/slam/./slam.py", line 39, in <module>
    frame = cv2.drawMatches(prev_frame, orb.prev['kps'], frame, kps, matches[:10], None, flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
cv2.error: OpenCV(4.6.0) /Users/runner/work/opencv-python/opencv-python/opencv/modules/features2d/src/draw.cpp:242: error: (-215:Assertion failed) i2 >= 0 && i2 < static_cast<int>(keypoints2.size()) in function 'drawMatches'
Tried a few things with no success
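One possible cause, offered as a guess since the matching code is not shown here: each match stores a queryIdx into the first keypoint list and a trainIdx into the second, so drawMatches fails if the keypoint lists it receives are in a different order, or have a different length, than the descriptors that were matched. The length can differ if the descriptor step drops keypoints it cannot describe and the old list is kept. The pure-Python sketch below mimics a brute-force matcher on integer descriptors and checks that invariant; the function names are hypothetical and train_desc is assumed to be non-empty.

```python
def brute_force_match(query_desc, train_desc):
    """For every query descriptor (an int) find the nearest train descriptor.

    Returns (query_idx, train_idx, distance) tuples sorted by distance.
    """
    matches = []
    for qi, q in enumerate(query_desc):
        ti = min(range(len(train_desc)),
                 key=lambda i: bin(q ^ train_desc[i]).count("1"))
        matches.append((qi, ti, bin(q ^ train_desc[ti]).count("1")))
    return sorted(matches, key=lambda m: m[2])


def indices_in_range(matches, n_query_kps, n_train_kps):
    """True if every match can be drawn against keypoint lists of these sizes."""
    return all(0 <= qi < n_query_kps and 0 <= ti < n_train_kps
               for qi, ti, _ in matches)


prev_desc = [0b0000, 0b1111, 0b1010]   # 3 keypoints in the previous frame
cur_desc = [0b1110, 0b0001]            # 2 keypoints in the current frame
matches = brute_force_match(prev_desc, cur_desc)
print(indices_in_range(matches, 3, 2))  # True: same order as the match call
print(indices_in_range(matches, 2, 3))  # False: keypoint lists swapped
```

Running a check like this just before the draw call on frame 34 would show whether the indices or the list lengths are the problem.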
Session 5
Quickly checking in, he's talking about using the fundamental matrix to filter results, better find out what that is...
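From what I can gather, the fundamental matrix F ties matching points in two frames together through the epipolar constraint x2^T F x1 = 0, so matches that violate it badly can be thrown out. A sketch of the filtering step only, with a hard-coded, hypothetical F for a camera sliding sideways; in practice F has to be estimated from the matches themselves (OpenCV has cv2.findFundamentalMat for that), and the tolerance here is an assumed value.

```python
def epipolar_residual(F, p1, p2):
    """|x2^T F x1| for pixel points p1 = (x, y) in frame 1 and p2 in frame 2."""
    x1 = (p1[0], p1[1], 1.0)
    x2 = (p2[0], p2[1], 1.0)
    Fx1 = [sum(F[r][c] * x1[c] for c in range(3)) for r in range(3)]
    return abs(sum(x2[r] * Fx1[r] for r in range(3)))


def filter_matches(F, matches, tolerance=1.0):
    """Keep only the (p1, p2) pairs that roughly satisfy the epipolar constraint."""
    return [(p1, p2) for p1, p2 in matches
            if epipolar_residual(F, p1, p2) <= tolerance]


# Hypothetical F for a purely sideways camera motion: matching points keep the same y.
F_SIDEWAYS = [[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]]

candidates = [((10.0, 20.0), (15.0, 20.0)),   # same row: consistent
              ((30.0, 40.0), (33.0, 80.0))]   # jumped 40 rows: outlier
print(len(filter_matches(F_SIDEWAYS, candidates)))  # 1
```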
Also, I think I should read some of the ORB-SLAM paper to get a better insight into where this is going. I am still a little hazy on how mapping will work.
There is no explicitly encoded depth information in the video, so how will we be able to assign a 3D position?
And beyond that, how exactly do we simultaneously use new matches to update the map and our position in the map?
- In that regard I suppose we compute our position delta from other odometry sources and then do the map update using the new matches and the position delta just calculated.
- But then in that case are we just using wheel angles and an IMU to get the position delta? Or are we somehow using the matches without creating a chicken and egg situation?
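On the depth question, a sketch of the simplest case: if the camera slides sideways by a known distance between two frames, a point's depth follows from how far it shifts in the image. The known baseline is an assumption; with a single camera and no other sensor it is only known up to scale, so the depths are too. The function name and numbers are hypothetical.

```python
def depth_from_disparity(focal_length_px, baseline_m, x_first_px, x_second_px):
    """Depth of a point seen at x_first_px, then at x_second_px after the camera
    moved sideways by baseline_m (pinhole camera, no rotation)."""
    disparity = abs(x_first_px - x_second_px)
    if disparity == 0:
        return float("inf")  # no parallax: the point is effectively infinitely far
    return focal_length_px * baseline_m / disparity


# Hypothetical numbers: 800 px focal length, 0.5 m of sideways motion, 20 px shift.
print(depth_from_disparity(800.0, 0.5, 420.0, 400.0))  # 20.0
```

Nearby points shift a lot between frames and distant points barely move, which is the parallax that makes depth recoverable from plain video.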
Session 5
Time to get a flutter app up and running and this thing on GitHub.
Session 6
I decided I should also build the equivalent server side code in C++.
So far, that involves opening a video file, reading the frames and tracking the features. Let's do that now.
I've had to run brew install opencv to get version 4 of OpenCV for C++.
I can use the recommended CMake build system from the docs to build the project.
Session 7
Just need to do a few things to get the c++ up to speed with the python...
Session 8
Still doing the above. I had a moment of confusion over the difference between keypoints and descriptors. It is explained here https://answers.opencv.org/question/37985/meaning-of-keypoints-and-descriptors/
Some key differences
- keypoints contain location info, descriptors should be location invariant (if the visual feature is the same)
- descriptor should be invariant to rotation, scale, and illumination
I found some handy sample code to help with the c++ version. https://docs.opencv.org/4.6.0/d2/d1d/samples_2cpp_2lkdemo_8cpp-example.html#a20
I should read this at some point and get up to speed with the essential matrix https://www.cs.cmu.edu/~16385/s17/Slides/12.2_Essential_Matrix.pdf
I have now set up C++ debugging. It's always very handy to view the data structures in the debugger when doing image processing.
Ok now c++ is up to speed.
Let's get the Flutter app up and running. For now I just want to get the images from the C++ or the Python code and display them in the Flutter app.
I can open a socket and send each frame over separately, maybe I can make a button to allow the user to step through the frames.
Maybe I make a proper H.265 stream? https://pub.dev/packages/video_player_web_hls might work. I could pipe images into ffmpeg and stream them to the app.
Looking into HLS in more detail, I have my doubts that the added complexity will be worth it, especially if the user will just press buttons to step through frames.
So for now I am using a socket to send images. I'll start on the python side first.
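A minimal sketch of one way to send frames over a socket from the Python side: each frame is prefixed with its length so the receiver knows where one image ends and the next begins. The framing scheme and function names are assumptions for illustration, and the payload would be an encoded image such as a JPEG.

```python
import socket
import struct


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send one frame as a 4-byte big-endian length followed by the bytes."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv may return fewer than requested."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock: socket.socket) -> bytes:
    """Read the length prefix, then the frame body of that length."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

The client has to parse the same 4-byte prefix. Stepping through frames one button press at a time then just means reading one frame per press.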
Ok, now the image streaming is working, I can pause, play and restart. I'll add a frame rate slider and then move on to making it pretty.
Session 9
I've done all sorts with Flutter, but haven't gone too deep into customizing the UI themes. I'm going to look at a few examples of compelling dashboards online and try to match one.
Ok, seems like a blurred off white colour gradient background with some rounded corners is the way to go.
Alright, the UI has some nice rounded corners and funky colors. I suspect some people would hate the colors though. Oh well.
I have multiple widgets making separate socket connections to the server, so next time I had better handle that properly.