Computer Vision Spring 2023 – Assignment 2

Computer Science – The CUNY Graduate Center
CSC 83020 – 01 (55719) Human & Computer Vision with Advanced Topics
Assignment 2 (Deadline: Tuesday, 03/07, before midnight)

===============================================================
Note: All written parts of your assignment must be submitted as a “soft” copy (a single PDF file) sent to Prof. Zhu <cv.zhu.ccny@gmail.com> as an email attachment. You are responsible for the loss of your submission if you don’t include “CSC 83020 – 01” (exactly) in the subject line of your email. For the programming part, in addition to the written report included in the PDF file mentioned above, please also send your source code in a separate file (zipped if there are multiple files), in its original format; please don’t convert it to PDF or Word. Please don’t send your images or executables. Instead, you may include images, tables, etc. in your report to show the results of your work. (Items marked with * are optional and count for extra credit.)

Please don’t forget to write your name and ID (last four digits) in both your report and the code, right after the title (if any) of your report. Then under your name, please write this statement:
“The work in this assignment is my own. Any outside sources have been properly cited.”
Submissions without this statement will not receive any score.

1. (Camera Models – 20 points) Prove that the vector from the viewpoint of a pinhole camera to the vanishing point (in the image plane) of a set of 3D parallel lines is parallel to the direction of the parallel lines. Please show the steps of your proof.

Hint: You can either use geometric reasoning or algebraic calculation. 

If you choose to use geometric reasoning, you can use the fact that the projection of a 3D line is the intersection of its “interpretation plane” with the image plane. Here the interpretation plane (IP) of a 3D line is the plane passing through the 3D line and the center of projection (viewpoint) of the camera. Also, the interpretation planes of two parallel lines intersect in a line that passes through the viewpoint and is parallel to the two lines.

If you choose to use algebraic calculation, you may use the parametric representation of a 3D line: P = P0 + tV, where P = (X, Y, Z)^T is any point on the line (here ^T denotes transpose), P0 = (X0, Y0, Z0)^T is a given fixed point on the line, the vector V = (a, b, c)^T gives the direction of the line, and t is a scalar parameter that controls the signed distance between P and P0 (t equals that signed distance when V is a unit vector).

If you want to use the determinant formed by three 3D points, you will need to explain in detail both the meaning of the determinant and the steps leading to your conclusion. Finding a solution somewhere online and copying it into your submission is not acceptable.
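A quick numerical sanity check of the claim (not a substitute for the proof) can be sketched in Python with only the standard library. It assumes the viewpoint is at the origin and the image plane is Z = f, so a camera-frame point (X, Y, Z)^T projects to (fX/Z, fY/Z); the helper names, the focal length, and the line parameters are illustrative. Points taken very far along two different lines with the same direction V project to nearly the same image point, and the ray from the viewpoint to that point is checked for parallelism with V.

```python
import math

def point_on_line(p0, v, t):
    """P = P0 + tV for a 3D line with fixed point p0 and direction v."""
    return tuple(p + t * d for p, d in zip(p0, v))

def project(point, f):
    """Pinhole projection: viewpoint at the origin, image plane Z = f (assumed convention)."""
    X, Y, Z = point
    return f * X / Z, f * Y / Z

def cross(u, w):
    return (u[1] * w[2] - u[2] * w[1],
            u[2] * w[0] - u[0] * w[2],
            u[0] * w[1] - u[1] * w[0])

def norm(u):
    return math.sqrt(sum(k * k for k in u))

f = 16.0                                          # illustrative focal length
V = (1.0, 2.0, 4.0)                               # common direction of the parallel lines (c != 0)
for p0 in [(0.0, 0.0, 10.0), (3.0, -1.0, 12.0)]:  # one fixed point per line
    x, y = project(point_on_line(p0, V, 1e6), f)  # image of a point very far along the line
    ray = (x, y, f)                               # vector from the viewpoint to that image point
    sin_angle = norm(cross(ray, V)) / (norm(ray) * norm(V))  # ~0 when the ray is parallel to V
    print(f"image point ~ ({x:.4f}, {y:.4f}), sin(angle to V) = {sin_angle:.1e}")
```

Increasing t drives sin(angle to V) toward zero for both lines; the proof should explain why this holds for any P0 and any direction with c ≠ 0.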

2. (Camera Models – 20 points) Show that the relation between any image point (x_im, y_im)^T of a plane (written as (x1, x2, x3)^T in projective space) and its corresponding point (Xw, Yw, Zw)^T on the plane in 3D space can be represented by a 3×3 matrix. You should start from the general form of the camera model, (x1, x2, x3)^T = M_int M_ext (Xw, Yw, Zw, 1)^T, where M = M_int M_ext is a 3×4 matrix, and the image center (ox, oy), the focal length f, the scaling factors (sx and sy), the rotation matrix R, and the translation vector T are all unknown. Note that in the course slides and lecture notes, I used a simplified model of perspective projection by assuming that ox and oy are known and sx = sy = 1, and discussed only special cases of planes. So you cannot directly copy the equations I used. Nor can you simply derive the 3×4 matrix M. Instead, you should start from the general form of the projection matrix (5 points) and the general form of a plane, nx Xw + ny Yw + nz Zw = d (5 points), combine them (5 points), and form a 3×3 matrix relating a 3D point on the plane to its 2D image projection (5 points).
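To experiment with the general model, the Python sketch below (standard library only) composes M = M_int M_ext from all five intrinsic parameters (f, sx, sy, ox, oy) and a pose (R, T), then projects a world point. It assumes the positive-focal-length form M_int = [[f/sx, 0, ox], [0, f/sy, oy], [0, 0, 1]] and M_ext = [R | T], i.e., camera coordinates Pc = R Pw + T; the lecture notes may use different signs. It does not derive the 3×3 matrix; it only provides a reference projection against which a derived matrix can be checked numerically. The function names and numeric values are illustrative.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def intrinsic(f, sx, sy, ox, oy):
    """M_int with all five intrinsic parameters (assumed sign convention: +f/sx, +f/sy)."""
    return [[f / sx, 0.0, ox],
            [0.0, f / sy, oy],
            [0.0, 0.0, 1.0]]

def extrinsic(R, T):
    """M_ext = [R | T], assuming camera coordinates Pc = R Pw + T."""
    return [list(R[i]) + [T[i]] for i in range(3)]

def project(M, Pw):
    """(x1, x2, x3)^T = M (Xw, Yw, Zw, 1)^T, then (x_im, y_im) = (x1/x3, x2/x3)."""
    x1, x2, x3 = (row[0] for row in matmul(M, [[Pw[0]], [Pw[1]], [Pw[2]], [1.0]]))
    return x1 / x3, x2 / x3

# Illustrative values: f = 16 mm, 512 x 512 pixels on an 8.8 mm x 6.6 mm sensor,
# no rotation, world origin 4000 mm in front of the camera (world units: mm).
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
M = matmul(intrinsic(16.0, 8.8 / 512, 6.6 / 512, 256.0, 256.0),
           extrinsic(I3, [0.0, 0.0, 4000.0]))
print(project(M, (100.0, 50.0, 0.0)))
```

To test a 3×3 matrix you derive, pick several points satisfying nx Xw + ny Yw + nz Zw = d and compare its dehomogenized outputs with project(M, ·).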

3. (Calibration – 20 points) Prove the Orthocenter Theorem by geometric arguments: Let T be the triangle on the image plane defined by the three vanishing points of three mutually orthogonal sets of parallel lines in space. Then the image center is the orthocenter of the triangle T (i.e., the common intersection of its three altitudes).
(1) Basic proof: use the result of Question 1, assuming the aspect ratio of the camera is 1. Note that you are asked to prove the Orthocenter Theorem, not just properties of the orthocenter of a triangle. (7 points)
(2) If you do not know the focal length of the camera, can you still find the image center using the Orthocenter Theorem? Explain why or why not (3 points). Can you also estimate the focal length after you find the image center? If yes, explain how; if not, explain why (5 points).
(3) If you know neither the aspect ratio nor the focal length of the camera, can you still find the image center using the Orthocenter Theorem? Explain why or why not. (5 points)
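The theorem can also be checked numerically, and the same helpers can serve the Orthocenter method in Question 4. This minimal sketch (Python, standard library only) assumes camera coordinates Pc = R Pw + T, so the columns of R are the camera-frame directions of the three world axes; it also assumes the rotation convention R = Rz(gamma) Ry(beta) Rx(alpha) and a positive-focal-length projection, either of which may differ from the lecture notes. The angles are illustrative, and fx, fy come from the example camera in part B of Question 4.

```python
import math

def rotation(alpha, beta, gamma):
    """R = Rz(gamma) @ Ry(beta) @ Rx(alpha), angles in radians (assumed convention)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [[cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
            [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
            [-sb, cb * sa, cb * ca]]

def vanishing_point(d, fx, fy, ox, oy):
    """Pixel vanishing point of all lines with camera-frame direction d (needs d[2] != 0)."""
    return ox + fx * d[0] / d[2], oy + fy * d[1] / d[2]

def orthocenter(a, b, c):
    """Intersection of the altitudes from a and b of triangle abc (2D points)."""
    d1 = (c[0] - b[0], c[1] - b[1])   # altitude from a: (p - a) . (c - b) = 0
    d2 = (a[0] - c[0], a[1] - c[1])   # altitude from b: (p - b) . (a - c) = 0
    r1 = d1[0] * a[0] + d1[1] * a[1]
    r2 = d2[0] * b[0] + d2[1] * b[1]
    det = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(det) < 1e-12:
        raise ValueError("degenerate triangle")
    # Cramer's rule for the 2x2 system d1 . p = r1, d2 . p = r2
    return (r1 * d2[1] - d1[1] * r2) / det, (d1[0] * r2 - r1 * d2[0]) / det

R = rotation(0.6, 0.7, 0.2)                              # illustrative pose
dirs = [[R[i][j] for i in range(3)] for j in range(3)]   # columns of R: mutually orthogonal
fx, fy = 16 / (8.8 / 512), 16 / (6.6 / 512)              # example camera (pixels)
square = [vanishing_point(d, fx, fx, 256.0, 256.0) for d in dirs]     # aspect ratio 1
stretched = [vanishing_point(d, fx, fy, 256.0, 256.0) for d in dirs]  # fx/fy = 0.75
print("aspect ratio 1:   ", orthocenter(*square))   # should match (256, 256) up to rounding
print("aspect ratio 0.75:", orthocenter(*stretched))
```

Comparing the two printed points against (256, 256) is one way to start the aspect-ratio experiment asked for in C(ii).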


4. Calibration Programming Exercises (40 points): Implement the direct parameter calibration method in order to (1) learn how to use SVD to solve systems of linear equations; (2) understand the physical constraints on the camera parameters; and (3) understand important issues related to calibration, such as calibration pattern design, point localization accuracy, and the robustness of the algorithms. Since calibrating a real camera involves a lot of work on calibration pattern design, image processing, and error control, as well as solving the equations, we will use simulated data to understand the algorithms. As a by-product, we will also learn how to generate 2D images from 3D models using a “virtual” pinhole camera.

  • A. Calibration pattern “design”. Generate data for a “virtual” 3D cube similar to the one shown in the camera-calibration lecture notes. For example, you can hypothesize a 1 m × 1 m × 1 m cube and pick the 3D coordinates of one corner of each black square in your world coordinate system. Make sure that you have enough 3D points for the calibration procedures that follow. To show that your data are correct, draw your cube (with the control points marked) using Matlab (or whatever language you are using). I have provided some starter code in Matlab for you to use. (5 points)
  • B. “Virtual” camera and images. Design a “virtual” camera with known intrinsic parameters, including the focal length f, the image center (ox, oy), and the pixel size (sx, sy). As an example, you can assume that the focal length is f = 16 mm, the image frame is 512 × 512 pixels with image center (ox, oy) = (256, 256), and the image sensor inside your camera is 8.8 mm × 6.6 mm (so the pixel size is (sx, sy) = (8.8/512, 6.6/512) mm). Capture an image of your “virtual” calibration cube with your virtual camera at a given pose (rotation R and translation T). For example, you can take the picture of the cube from 4 meters away with a tilt angle of 30 degrees. Use three rotation angles alpha, beta, gamma to generate the rotation matrix R (refer to the lecture notes on the camera model, and double-check the signs in that equation, since it may contain typos). You may need to try different poses to obtain a suitable image of your calibration target. (5 points)
  • C. Direct calibration method: Estimate the intrinsic parameters (fx, fy, aspect ratio a, image center (ox, oy)) and the extrinsic parameters (R, T, and further alpha, beta, gamma). Use SVD to solve the homogeneous linear system and the least-squares problem, and to enforce the orthogonality constraint on the estimate of R.

      C(i). Apply the algorithms to the accurately simulated (noise-free) data (both the 3D world coordinates and the 2D image coordinates), and compare the results with the “ground truth” data given in steps A and B. Remember that you are practicing camera calibration, so you should pretend you know nothing about the camera parameters (i.e., you cannot use the ground truth data in your calibration process). However, in the direct calibration method, you may use knowledge of the image center (in the homogeneous system used to find the extrinsic parameters) and of the aspect ratio (in the Orthocenter Theorem method used to find the image center). (15 points)

      C(ii). Using experimental results, study whether an unknown aspect ratio matters when estimating the image center (5 points), and how the initial estimate of the image center affects the estimation of the remaining parameters (5 points). Propose a solution to any problems you find (5 points).

      C(iii). Accuracy issues. Add some random noise to the simulated data and run the calibration algorithms again. Examine how the “design tolerance” of the calibration target and the localization errors of the 2D image points affect the calibration accuracy. For example, you can add random errors of 0.1 mm (or more) to the 3D points and 0.5 pixel (or more) to the 2D points. Also analyze how sensitive the Orthocenter method is to the extrinsic parameters (the camera pose) used when imaging the three sets of mutually orthogonal parallel lines. (*Extra credit: 10 points)
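For parts A and B, the following Python sketch (standard library only) generates control points and a synthetic image. The pattern is an assumption made for illustration: the three faces of a 1 m cube that meet at the world origin (planes X = 0, Y = 0, Z = 0), each divided into a 4 × 4 checkerboard, with squares where i + j is even taken as black and the corner farthest from the origin used as the control point. The rotation convention R = Rz(gamma) Ry(beta) Rx(alpha), the model Pc = R Pw + T, the positive-focal-length projection, and the pose (alpha = 30°, beta = 45°, gamma = 0, cube center 4 m along the optical axis) are also assumptions; the camera values are the example ones from part B.

```python
import math

def cube_control_points(size=1.0, n=4):
    """Hypothetical pattern: faces X=0, Y=0, Z=0 of a cube, each an n x n checkerboard.
    Returns the corner farthest from the origin of every 'black' square ((i + j) even)."""
    s = size / n
    pts = []
    for i in range(n):
        for j in range(n):
            if (i + j) % 2 == 0:
                u, v = (i + 1) * s, (j + 1) * s
                pts += [(0.0, u, v), (u, 0.0, v), (u, v, 0.0)]
    return pts

def rotation(alpha, beta, gamma):
    """R = Rz(gamma) @ Ry(beta) @ Rx(alpha), radians (check signs against the lecture notes)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [[cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
            [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
            [-sb, cb * sa, cb * ca]]

def project_points(points, R, T, f=16.0, sensor=(8.8, 6.6), frame=(512, 512),
                   center=(256.0, 256.0)):
    """Pc = R Pw + T, then x_im = ox + (f/sx) Xc/Zc, y_im = oy + (f/sy) Yc/Zc (assumed signs)."""
    sx, sy = sensor[0] / frame[0], sensor[1] / frame[1]   # pixel size in mm
    fx, fy = f / sx, f / sy                                # focal length in pixels
    image = []
    for P in points:
        Xc, Yc, Zc = (sum(R[r][k] * P[k] for k in range(3)) + T[r] for r in range(3))
        image.append((center[0] + fx * Xc / Zc, center[1] + fy * Yc / Zc))
    return image

# World units are metres; f and the pixel size are both in mm, so their ratio is in pixels.
R = rotation(math.radians(30.0), math.radians(45.0), 0.0)
Rc = [sum(R[r][k] * 0.5 for k in range(3)) for r in range(3)]  # R times cube center (0.5, 0.5, 0.5)
T = [-Rc[0], -Rc[1], 4.0 - Rc[2]]          # puts the cube center 4 m along the optical axis
pts3d = cube_control_points()
pts2d = project_points(pts3d, R, T)
print(len(pts3d), "control points; first image point:", pts2d[0])
```

Centering the cube on the optical axis keeps the pattern near the middle of the 512 × 512 frame; when trying other poses, check that every projected point stays inside the frame.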

In all of the steps, you should present your results using tables, graphs, or both.
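A sketch of the linear core of part C, written in Python with NumPy (an assumption; Matlab's svd can be used the same way). It assumes the image center (ox, oy) is already known, for example from the Orthocenter method, and uses the model x_im - ox = fx Xc/Zc, y_im - oy = fy Yc/Zc with Pc = R Pw + T; if the lecture notes use -f, the sign test must be reversed. The homogeneous system is solved with the last right singular vector, R is projected onto the nearest rotation with a second SVD, and Tz and fx come from a least-squares fit. The function name direct_calibration is hypothetical.

```python
import numpy as np

def direct_calibration(Pw, pix, ox, oy):
    """Estimate R, T, fx, fy from N >= 7 non-coplanar points, assuming a known image center.

    Pw  : (N, 3) world points;  pix : (N, 2) pixel coordinates (x_im, y_im).
    """
    Pw = np.asarray(Pw, dtype=float)
    pix = np.asarray(pix, dtype=float)
    x, y = pix[:, 0] - ox, pix[:, 1] - oy

    # Each point gives x (r2.P + Ty) - y * a (r1.P + Tx) = 0 with a = fx / fy,
    # which is linear in v = (r21, r22, r23, Ty, a r11, a r12, a r13, a Tx).
    A = np.column_stack([x[:, None] * Pw, x, -y[:, None] * Pw, -y])
    _, _, Vt = np.linalg.svd(A)
    v = Vt[-1]                                # null vector, known only up to a scale gamma

    gamma = np.linalg.norm(v[0:3])            # |gamma|, since r2 is a unit vector
    aspect = np.linalg.norm(v[4:7]) / gamma   # a = fx / fy
    r2, Ty = v[0:3] / gamma, v[3] / gamma
    r1, Tx = v[4:7] / (gamma * aspect), v[7] / (gamma * aspect)

    # Fix the overall sign: under this model x and r1.P + Tx must have the same sign (Zc > 0).
    k = np.argmax(np.abs(x))
    if x[k] * (r1 @ Pw[k] + Tx) < 0:
        r1, r2, Tx, Ty = -r1, -r2, -Tx, -Ty

    # Complete R with r3 = r1 x r2, then replace it by the nearest orthogonal matrix U Vt.
    U, _, Wt = np.linalg.svd(np.vstack([r1, r2, np.cross(r1, r2)]))
    R = U @ Wt
    r1, r2, r3 = R

    # Least squares for (Tz, fx) from x (r3.P + Tz) = fx (r1.P + Tx).
    B = np.column_stack([x, -(Pw @ r1 + Tx)])
    b = -x * (Pw @ r3)
    (Tz, fx), *_ = np.linalg.lstsq(B, b, rcond=None)
    return R, np.array([Tx, Ty, Tz]), fx, fx / aspect
```

For C(iii), one option is to perturb the Pw and pix arrays before the call, e.g., adding numpy.random.default_rng().normal(scale=0.1e-3, size=Pw.shape) for 0.1 mm on 3D points given in metres, and normal(scale=0.5, size=pix.shape) for 0.5 pixel on 2D points, then comparing the estimates with the ground truth.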