Computer Vision Spring 2023 – Assignment 1

Computer Science – The CUNY Graduate Center
CSC 83020 – 01  (55719) Human & Computer Vision with Advanced Topics
Assignment 1 (Deadline: Tuesday 02/21, before midnight)

===============================================================
Note: All written parts of your assignment must be submitted as a “soft” copy (a single PDF file), sent to Prof. Zhu <cv.zhu.ccny@gmail.com> as an email attachment. You are responsible for the loss of your submission if you don’t include “CSC 83020 – 01” (exactly) in the subject of your email. For the programming part, in addition to the written report included in the PDF file mentioned above, please also send your source code in a separate file (zipped, if there are multiple files, with the files kept in their original formats); please don’t convert the code to PDF or Word format. Please don’t send your images or executables. You may want to include images, tables, etc. inside your report, as they show the results of your work.

Please don’t forget to write your name and ID (last four digits) in both your report and the code, right after the title (if any) of your report. Then under your name, please write this statement:
“The work in this assignment is my own. Any outside sources have been properly cited.”
Without this statement, your submission will not receive any score.

A. Writing Assignments (10×3 = 30 points)

  1. How does an image change (consider at least the sizes of objects in the image, the field of view, etc.) if the focal length of a pinhole camera is varied?
  2. Give an intuitive explanation of why a pinhole camera has an infinite depth of field, meaning everything is in focus regardless of the distances of the objects, and why a thin lens camera model causes blur in images.
  3. Prove that, in the pinhole camera model, three collinear points (i.e., points that lie on a line) in 3D space are imaged into three collinear points on the image plane. You may use either geometric reasoning (with line drawings) or algebraic deduction (using equations).
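
As a reminder for questions 1 and 3, the standard pinhole (perspective) projection can be written as below. The notation is an assumption, since the assignment does not fix one: a camera-centered frame with the optical axis along Z, focal length f, a 3D point (X, Y, Z) with Z > 0, and image coordinates (x, y). Your lecture slides may use a different sign convention.

```latex
x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}
```

Both image coordinates scale linearly with f and inversely with the depth Z, which is a reasonable starting point for either the geometric or the algebraic route.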

B. Programming Assignments on basic image transformation (Matlab preferred – here is a quick Matlab tutorial. You may use C++ or Java if you like, but you may need to bring your own machine to my office hours to run your programs if I ask you to.) (total 70 points)

In this small project, you are going to use Matlab to read, manipulate and write image data. The purpose of the project is to make you familiar with the basics of digital image formation. Whatever programming language you use, please don’t use built-in functions for image processing or computer vision; instead, write your own transformation functions and array operations. Your program should do the following things:

  1. [5 points] Read in a color image C1(x,y) = (R(x,y), G(x,y), B(x,y)) in Windows BMP format, and display it.
  2. [5 points] Display the images of the three color components, R(x,y), G(x,y) and B(x,y), separately. You should display three black-white-like images.
  3. [5 points] Generate an intensity image I(x,y) and display it. You should use the equation I = 0.299R + 0.587G + 0.114B (the NTSC standard for luminance) and tell us how the intensity image generated this way differs from one computed as a simple average of the R, G and B components. Please be specific by analyzing your quantitative results rather than just speculating from the appearance of the images.
  4. [10 points] The original intensity image should have K = 256 gray levels. Please uniformly quantize this image into K levels (with K = 4, 16, 32, 64). As an example, when K = 2, pixels whose values are below 128 are set to 0, and all others to 255. Display the four quantized images, one for each value of K, and tell us how closely they still resemble the original.
  5. [10 points] Quantize the original three-band color image C1(x,y) into color images CK(x,y) = (R'(x,y), G'(x,y), B'(x,y)) (with uniform intervals for each band having K levels), and display them. You may choose K = 2 and 4 (for each band). Do they have any advantages in viewing and/or in computer processing (e.g. segmentation)? Please be specific about your chosen image(s).
  6. [10 points] Please transform the original RGB image into (1) an HSI image, and (2) a YUV image, and display each band individually as a gray-scale image. You may use the equation on the slides of I-4 for YUV, but you will have to work out the equations from RGB to HSI yourself. Please give an intuitive explanation of each band, and of the relations between bands, if any.
  7. [25 points] Suppose you apply the Sobel operator to each of the RGB color bands of a color image. How might you combine these results into a color edge detector (5 points)? Do the resulting edge results differ from the gray-scale results? How and why (5 points)? You may compare the edge map of the intensity image (of the color image), the gray-scale edge map that combines the three edge maps from the three color bands, or a real color edge map in which edge points have colors (5 points). Please discuss their similarities and differences, and how each of them can be used for image enhancement or feature extraction (5 points). Note that you should first generate gradient maps and then use thresholding to generate edge maps. In the end, please try to generate a color sketch of an image, such as the ID image of Prof. Zhu. You may also consider local, adaptive thresholding in generating a color edge map (5 points).
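
The intensity and quantization steps in items 3 and 4 can be sketched as follows. This is a minimal illustration, not the provided starting code: it uses Python with plain nested lists (an assumption made only for the sketch; the same array operations carry over to Matlab), and the function names and the choice to spread the K output levels evenly over 0..255 are hypothetical. The K = 2 case reproduces the example in item 4.

```python
# Sketch only: images are nested lists, gray[y][x] or rgb[y][x] = (R, G, B),
# with 8-bit values in 0..255. All names here are illustrative assumptions.

NTSC_WEIGHTS = (0.299, 0.587, 0.114)   # weights from item 3
MEAN_WEIGHTS = (1 / 3, 1 / 3, 1 / 3)   # simple average of R, G, B


def intensity_image(rgb_image, weights):
    """Weighted sum of the three bands, rounded to integer gray levels."""
    wr, wg, wb = weights
    return [[int(round(wr * r + wg * g + wb * b)) for (r, g, b) in row]
            for row in rgb_image]


def quantize(value, k):
    """Uniformly quantize one 8-bit value into k levels spread over 0..255."""
    index = value * k // 256          # which of the k equal-width bins
    return index * 255 // (k - 1)     # map bin 0 -> 0 and bin k-1 -> 255


def quantize_image(gray, k):
    """Apply quantize() to every pixel of a gray-level image."""
    return [[quantize(v, k) for v in row] for row in gray]


def mean_abs_difference(img_a, img_b):
    """Average absolute per-pixel difference between two gray images."""
    total = 0
    count = 0
    for row_a, row_b in zip(img_a, img_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count
```

A number such as `mean_abs_difference(ntsc, mean)` is one possible quantitative measure for the comparison asked for in item 3; histograms or per-band statistics would serve equally well. For item 5, the same `quantize_image` could be applied to each color band separately.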

For each of the above, please provide your analysis, observations and conclusions, rather than just showing the experimental results in images and/or charts. I have provided a piece of starting code for you to use, in which Questions 1 and 2 have been done. You may use Prof. Zhu’s old ID picture for testing your algorithm.
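
For item 7 of part B, the gradient-then-threshold pipeline can be sketched as below. This is a hedged illustration in Python with plain nested lists (an assumption for the sketch; Matlab is the preferred language). The function names are hypothetical, border pixels are simply left at zero, and taking the per-pixel maximum across the three bands is only one possible way to combine them, not necessarily the intended answer.

```python
import math

# Standard 3x3 Sobel kernels (horizontal and vertical derivative estimates).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(band):
    """Gradient magnitude of one band, band[y][x]; borders stay at 0."""
    h, w = len(band), len(band[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    p = band[y + dy][x + dx]
                    gx += SOBEL_X[dy + 1][dx + 1] * p
                    gy += SOBEL_Y[dy + 1][dx + 1] * p
            mag[y][x] = math.hypot(gx, gy)
    return mag


def edge_map(mag, threshold):
    """Global thresholding: 255 where the gradient is strong, else 0."""
    return [[255 if m >= threshold else 0 for m in row] for row in mag]


def combined_gradient_max(r, g, b):
    """One possible combination (an assumption): per-pixel max over bands."""
    mr, mg, mb = sobel_magnitude(r), sobel_magnitude(g), sobel_magnitude(b)
    return [[max(a, c, d) for a, c, d in zip(row_r, row_g, row_b)]
            for row_r, row_g, row_b in zip(mr, mg, mb)]
```

Thresholding each band's gradient map separately and stacking the three binary maps as R, G and B would give an edge map whose edge points have colors; a sum or Euclidean norm across bands are other common combinations worth comparing in the report.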