CSc 471 Spring 2022 - Assignment 4

Computer Science – The City College of New York
Computer Vision
Assignment 4 (Deadline: Sunday 04/24 before midnight, extended by one week to Sunday 05/01 before midnight)

(Items marked with * are optional, for extra credit.)

Note: Turn in a PDF document (typed, not handwritten) containing a list of your .m files (not the code itself), images showing the results of your experiments, and an analysis of the results. All written work must be submitted as a soft copy and sent to Prof. Zhu by email (Zhigang Zhu <>). For the programming part, send ONLY your source code by email; please don't send your images or executables (even if you use C++). Write "CSC 471 Computer Vision Assignment 4" in the subject line of your email; if you don't, you are responsible for the loss of your submission. Do write your names and IDs (last four digits) in both your report and your code. Please don't zip your report with your code and other files; send the report as a separate PDF file. The rest can be in a zipped file.

1. (Stereo – 30 points) Estimate the accuracy of the simple stereo system (Figure 3 in the lecture notes on stereo vision), assuming that the only source of noise is the localization of corresponding points in the two images. Derive the error estimation equation (10 points) and discuss (20 points) how the error in the estimated depth of a 3D point depends on (1) the baseline width, (2) the focal length, (3) the stereo matching error, and (4) the depth of the 3D point.

Hint: D = fB/d, where D is the depth, f the focal length, B the baseline width, and d the disparity; take the partial derivative of D with respect to the disparity d.
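Spelled out, the hint leads to the following first-order (linearized) estimate. This is a sketch assuming that f and B are known exactly and that the disparity error Δd is small compared with d:

```latex
D = \frac{fB}{d}, \qquad
\frac{\partial D}{\partial d} = -\frac{fB}{d^{2}} = -\frac{D^{2}}{fB}, \qquad
|\Delta D| \approx \left|\frac{\partial D}{\partial d}\right| |\Delta d| = \frac{D^{2}}{fB}\,|\Delta d|
```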

2. (Motion – 20 points) Can you obtain 3D information about a scene from multiple frames taken by a camera rotating about its optical center (5 points)? Discuss why or why not (5 points). What about translating (moving, not zooming!) the camera along its optical axis (5 points)? Explain (5 points).

3. (Stereo and Motion – 20 points) (1) Give 5 examples of humans using stereo or motion in daily life or at work (10 points). (2) Give another 5 examples of real applications that use computer vision techniques based on stereo or motion (10 points).

4. (Stereo Programming – 30 points + 10 bonus points) Use the image pair (Image 1 and Image 2) for the following exercises.

(1) Fundamental Matrix – Design and implement a program that, given a stereo pair, determines at least eight point matches and then recovers the fundamental matrix (5 points) and the locations of the epipoles (5 points). Check the accuracy of the result by measuring the distances between the estimated epipolar lines and image points that were not used to estimate the matrix (5 points). Also overlay the epipolar lines of the control points and the test points on one of the images (say Image 1; I already did this in the starting code, FmatGUI.m). Control points are the correspondences (matches) used to compute the fundamental matrix, and test points are those used to check the accuracy of the computation.
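One common way to recover F from eight or more manual matches is the normalized eight-point algorithm (Hartley normalization, linear solve by SVD, then rank-2 enforcement). The sketch below is in Python/NumPy rather than MATLAB, is not taken from the lecture notes or from FmatGUI.m, and the function names are hypothetical. It assumes the convention x2^T F x1 = 0, with x1 in Image 1 and x2 in Image 2; check that this matches the convention in your starting code. The epipoles are the null vectors of F and its transpose.

```python
import numpy as np

def normalize_points(pts):
    """Hartley normalization: move the centroid to the origin and scale so the
    mean distance from it is sqrt(2). Returns homogeneous points and the 3x3 transform."""
    pts = np.asarray(pts, dtype=float)
    centroid = pts.mean(axis=0)
    mean_dist = np.sqrt(((pts - centroid) ** 2).sum(axis=1)).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    homog = np.column_stack([pts, np.ones(len(pts))])
    return (T @ homog.T).T, T

def estimate_fundamental(pts1, pts2):
    """Normalized eight-point algorithm. pts1, pts2: (N, 2) pixel matches, N >= 8.
    Returns F (unit Frobenius norm) with x2^T F x1 = 0."""
    x1, T1 = normalize_points(pts1)
    x2, T2 = normalize_points(pts2)
    # Each match contributes one row of A f = 0, where f is F flattened row by row.
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1)),
    ])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)          # least-squares solution with ||f|| = 1
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt   # enforce rank 2
    F = T2.T @ F @ T1                 # undo the normalization
    return F / np.linalg.norm(F)

def epipoles(F):
    """e1 with F e1 = 0 (epipole in Image 1) and e2 with F^T e2 = 0 (epipole in Image 2),
    in homogeneous coordinates. For a nearly rectified pair the third coordinate is
    close to 0 (epipole near infinity), so avoid dividing by it blindly."""
    _, _, Vt = np.linalg.svd(F)
    e1 = Vt[-1]
    _, _, Vt = np.linalg.svd(F.T)
    e2 = Vt[-1]
    return e1, e2
```

Normalizing before the SVD keeps the linear system well conditioned when pixel coordinates are in the hundreds; in MATLAB the same steps map onto `svd` and matrix products.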

Hint: You can pick the matches of both the control points and the test points manually. You may use my MATLAB code (FmatGUI.m) as a starting point; it provides an interface for picking point matches by mouse clicks. The epipolar lines should be (almost) parallel in this stereo pair. If they are not, something is wrong with either your code or the point matches.
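The accuracy check in (1) can be sketched as follows: map each test point in Image 2 to its epipolar line l1 = F^T x2 in Image 1, then measure the perpendicular pixel distance from the matching Image 1 point to that line. The line orientations also give a quick numerical version of the "almost parallel" sanity check. This is a Python/NumPy sketch with hypothetical function names, assuming the same x2^T F x1 = 0 convention as above.

```python
import numpy as np

def epipolar_lines_image1(F, pts2):
    """Epipolar lines l1 = F^T x2 in Image 1, one row (a, b, c) per Image 2 point,
    for F with x2^T F x1 = 0."""
    pts2 = np.asarray(pts2, dtype=float)
    x2 = np.column_stack([pts2, np.ones(len(pts2))])
    return x2 @ F  # row i equals (F^T x2_i)^T

def point_line_distances(lines, pts1):
    """Perpendicular pixel distance from each point (x, y) to its line a*x + b*y + c = 0."""
    pts1 = np.asarray(pts1, dtype=float)
    num = np.abs(lines[:, 0] * pts1[:, 0] + lines[:, 1] * pts1[:, 1] + lines[:, 2])
    return num / np.hypot(lines[:, 0], lines[:, 1])

def line_angles_deg(lines):
    """Orientation of each line in degrees, folded into [-90, 90).
    The direction vector of a*x + b*y + c = 0 is (b, -a)."""
    lines = np.asarray(lines, dtype=float)
    ang = np.degrees(np.arctan2(-lines[:, 0], lines[:, 1]))
    return (ang + 90.0) % 180.0 - 90.0
```

Reporting the mean and maximum of `point_line_distances` over the test points (not the control points) gives an error figure in pixels; a small spread in `line_angles_deg` is consistent with nearly parallel epipolar lines. The angle folding assumes the lines are not close to vertical, which is an assumption about this pair.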

(2) Feature-based Matching – Design a stereo vision system that performs "feature-based matching", and explain your algorithm in writing: what the feature is, how effective it is, and what its problems are (5 points). The system should have a user interface that lets a user select a point in the first image, say by a mouse click (5 points). The system should then find and highlight the corresponding point in the second image, say with a crosshair (… points). Try to use the epipolar geometry derived in (1) to search for correspondences along the epipolar lines (5 points).
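Below is a sketch of one possible design, not the required one: the "feature" is a small square intensity window around the clicked point, compared with normalized cross-correlation (NCC) at every pixel along the epipolar line l2 = F x1 in the second image. This is closer to window (area) correlation than to corner or edge features, so treat it as a baseline you could replace with your own feature. The sketch assumes grayscale images as 2-D NumPy arrays, the x2^T F x1 = 0 convention, and an arbitrary window half-size `half=7`; all function names are hypothetical.

```python
import numpy as np

def ncc(p, q):
    """Normalized cross-correlation of two equal-size patches, in [-1, 1]."""
    p = p - p.mean()
    q = q - q.mean()
    denom = np.sqrt((p * p).sum() * (q * q).sum())
    return float((p * q).sum() / denom) if denom > 0 else -1.0

def candidates_on_line(line, width, height, half):
    """Integer pixels (u, v) on a*u + b*v + c = 0 whose (2*half+1)^2 window fits in the image."""
    a, b, c = line
    if a == 0 and b == 0:
        return  # degenerate line, nothing to search
    if abs(b) >= abs(a):   # closer to horizontal: step along u
        pts = ((u, int(round(-(a * u + c) / b))) for u in range(width))
    else:                  # closer to vertical: step along v
        pts = ((int(round(-(b * v + c) / a)), v) for v in range(height))
    for u, v in pts:
        if half <= u < width - half and half <= v < height - half:
            yield u, v

def match_along_epipolar_line(img1, img2, F, pt, half=7):
    """Best NCC match in img2 for pixel pt = (x, y) of img1, searched along l2 = F @ (x, y, 1).
    Returns ((u, v), score); (u, v) is None if no candidate window fits."""
    x, y = int(round(pt[0])), int(round(pt[1]))
    h1, w1 = img1.shape
    if not (half <= x < w1 - half and half <= y < h1 - half):
        raise ValueError("window around pt does not fit inside img1")
    template = img1[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    line = F @ np.array([x, y, 1.0])
    h2, w2 = img2.shape
    best, best_score = None, -np.inf
    for u, v in candidates_on_line(line, w2, h2, half):
        patch = img2[v - half:v + half + 1, u - half:u + half + 1].astype(float)
        score = ncc(template, patch)
        if score > best_score:
            best, best_score = (u, v), score
    return best, best_score
```

Restricting the search to the epipolar line turns a 2-D search into a 1-D one; the returned score can be thresholded to report "no match", which becomes relevant for occluded points in (3).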

Hint: You may use an interface similar to the one I provided for question (1).

(3) Discussions. Show your results on points with different properties, such as points at corners, on edges, in smooth regions, in textured regions, and in occluded regions that are visible in only one of the images. For each case, discuss why your vision system succeeds or fails in finding the correct match (5 extra points). Compare the performance of your system against a human user (e.g., yourself) who marks the corresponding matches on the second image by mouse click (5 extra points).
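To make the comparison with a human user quantitative, one option is to tabulate, per region type, how often the automatic match lands within a pixel tolerance of the human click and the average pixel error. The sketch below is in Python; the record format, the category labels, and `tol_px=2.0` are illustrative assumptions, and it treats the human click as ground truth, even though human clicks carry their own error of a pixel or two.

```python
import math
from collections import defaultdict

def compare_with_human(records, tol_px=2.0):
    """records: iterable of (category, auto_xy, human_xy); auto_xy is None when the
    system reported no match. Returns {category: (success_rate, mean_error_px)},
    with the human click treated as ground truth (an assumption)."""
    total = defaultdict(int)
    found = defaultdict(int)
    success = defaultdict(int)
    err_sum = defaultdict(float)
    for cat, auto_xy, human_xy in records:
        total[cat] += 1
        if auto_xy is None:
            continue  # counted as a failure; no error distance to add
        err = math.dist(auto_xy, human_xy)
        found[cat] += 1
        err_sum[cat] += err
        if err <= tol_px:
            success[cat] += 1
    return {
        cat: (success[cat] / total[cat],
              err_sum[cat] / found[cat] if found[cat] else math.nan)
        for cat in total
    }
```

For occluded points there is no true correspondence, so a "no match" answer from the system may be the correct outcome; such cases are worth discussing separately rather than only counting them as failures.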