Convolutional Neural Networks Quiz 3

Source: Internet | Published by: 英国域名 | Edited by: 程序博客网 | Time: 2024/06/05 14:52

Question 1
You are building a 3-class object classification and localization algorithm. The classes are: pedestrian (c=1), car (c=2), motorcycle (c=3). What would be the label for the following image? Recall y=[p_c,b_x,b_y,b_h,b_w,c₁,c₂,c₃]
[Image]
y=[1,0.3,0.7,0.3,0.3,0,1,0]

y=[1,0.7,0.5,0.3,0.3,0,1,0]

y=[1,0.3,0.7,0.5,0.5,0,1,0]

y=[1,0.3,0.7,0.5,0.5,1,0,0]

y=[0,0.2,0.4,0.5,0.5,0,1,0]

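Explanation: The image contains an object, so p_c = 1. The object is a car, so c₁ = 0, c₂ = 1, c₃ = 0. The car is centered at roughly (0.3, 0.7), so b_x = 0.3 and b_y = 0.7. It occupies roughly 0.3 × 0.3 of the image, so b_h = b_w = 0.3. The answer is y=[1,0.3,0.7,0.3,0.3,0,1,0].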
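A small sketch of how such a label vector could be assembled. The helper `make_label` and its argument layout are hypothetical; only the ordering y=[p_c,b_x,b_y,b_h,b_w,c₁,c₂,c₃] and the class numbering come from the question.

```python
from typing import List, Optional, Tuple

CLASSES = ["pedestrian", "car", "motorcycle"]  # c=1, c=2, c=3 as in the question


def make_label(box: Optional[Tuple[float, float, float, float]],
               cls: Optional[int]) -> List[Optional[float]]:
    """Build y = [p_c, b_x, b_y, b_h, b_w, c1, c2, c3] (hypothetical helper).

    box is (b_x, b_y, b_h, b_w) relative to the image; cls is the 1-based class id.
    If box is None (no object), every entry after p_c is a don't-care, stored as None.
    """
    n = len(CLASSES)
    if box is None:
        return [0.0] + [None] * (4 + n)
    one_hot = [1.0 if i == cls else 0.0 for i in range(1, n + 1)]
    return [1.0, *box, *one_hot]


print(make_label((0.3, 0.7, 0.3, 0.3), cls=2))  # [1.0, 0.3, 0.7, 0.3, 0.3, 0.0, 1.0, 0.0]
```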

Question 2
Continuing from the previous problem, what should y be for the image below? Remember that “?” means “don’t care”, which means that the neural network loss function won’t care what the neural network gives for that component of the output. As before, y=[p_c,b_x,b_y,b_h,b_w,c₁,c₂,c₃].
[Image]
y=[1,?,?,?,?,0,0,0]

y=[0,?,?,?,?,?,?,?]

y=[0,?,?,?,?,0,0,0]

y=[1,?,?,?,?,?,?,?]

y=[?,?,?,?,?,?,?,?]

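Explanation: There is no object in the image, so p_c = 0 and every other component is a don't-care (?). The answer is y=[0,?,?,?,?,?,?,?].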
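To illustrate what "don't care" means for training, here is a sketch that assumes a plain squared-error loss (the simple loss used for illustration in the lecture) and represents each "?" as `None`. The function name is hypothetical; the point is that when p_c = 0 only the first component contributes to the loss.

```python
from typing import Optional, Sequence


def masked_squared_error(y_hat: Sequence[float], y: Sequence[Optional[float]]) -> float:
    """Sum of squared errors over the components of y that are not don't-cares (None).

    Assumption: plain squared error for every component; real detectors often use
    other losses (e.g. logistic loss for p_c, softmax loss for the classes).
    """
    return sum((yh - t) ** 2 for yh, t in zip(y_hat, y) if t is not None)


y_no_object = [0.0] + [None] * 7  # y = [0, ?, ?, ?, ?, ?, ?, ?]
y_hat = [0.2, 0.9, 0.1, 0.5, 0.5, 0.3, 0.3, 0.4]  # arbitrary network output
print(masked_squared_error(y_hat, y_no_object))  # only (0.2 - 0)**2 contributes, about 0.04
```

Whatever the network outputs in positions 2 to 8 for this image, the loss is unchanged.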

Question 3
You are working on a factory automation task. Your system will see a can of soft-drink coming down a conveyor belt, and you want it to take a picture and decide whether (i) there is a soft-drink can in the image, and if so (ii) its bounding box. Since the soft-drink can is round, the bounding box is always square, and the soft drink can always appears as the same size in the image. There is at most one soft drink can in each image. Here’re some typical images in your training set:
[Image]
What is the most appropriate set of output units for your neural network?

Logistic unit (for classifying if there is a soft-drink can in the image)

Logistic unit, b_x and b_y

Logistic unit, b_x, b_y, b_h (since b_w = b_h)

Logistic unit, b_x, b_y, b_h, b_w

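Explanation: The network must decide (i) whether a can is present, which a logistic unit handles, and (ii) where it is. Because the can always appears at the same size and its box is always square, b_h and b_w are known constants and do not need to be predicted; only the position b_x, b_y does. The answer is: logistic unit, b_x and b_y.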
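A sketch of how the three outputs could be turned back into a full box. `CAN_SIDE` is a hypothetical constant standing in for the can's known, fixed size, and `decode_can` is a hypothetical helper; the network itself only produces a logit plus b_x, b_y.

```python
import math
from typing import Optional, Tuple

CAN_SIDE = 0.2  # hypothetical: the can's fixed square size, relative to the image


def decode_can(output: Tuple[float, float, float],
               threshold: float = 0.5) -> Optional[Tuple[float, float, float, float]]:
    """Turn the 3 outputs (logit, b_x, b_y) into (b_x, b_y, b_h, b_w), or None if no can."""
    logit, bx, by = output
    p = 1.0 / (1.0 + math.exp(-logit))  # logistic unit: probability that a can is present
    if p < threshold:
        return None
    # b_h = b_w = the known fixed size, so the network never has to predict them
    return (bx, by, CAN_SIDE, CAN_SIDE)


print(decode_can((2.0, 0.4, 0.6)))  # (0.4, 0.6, 0.2, 0.2)
```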

Question 4
If you build a neural network that inputs a picture of a person’s face and outputs N landmarks on the face (assume the input image always contains exactly one face), how many output units will the network have?

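Explanation:
[Image]
Each landmark is a point that needs two values, (x, y), so N landmarks require 2N output units. The same approach is used in human pose estimation, where a pose is recorded by annotating keypoints at characteristic locations on the body.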
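A sketch of reading the 2N outputs back as N points, assuming the outputs are interleaved as (x₁, y₁, x₂, y₂, …). That layout is an assumption; any fixed ordering works as long as the training labels use the same one. `split_landmarks` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def split_landmarks(output: Sequence[float]) -> List[Tuple[float, float]]:
    """Interpret a 2N-length output vector as N (x, y) landmark coordinates.

    Assumed layout: [x1, y1, x2, y2, ..., xN, yN].
    """
    if len(output) % 2 != 0:
        raise ValueError("expected an even number of outputs (2N)")
    return [(output[i], output[i + 1]) for i in range(0, len(output), 2)]


print(split_landmarks([0.1, 0.2, 0.3, 0.4]))  # [(0.1, 0.2), (0.3, 0.4)]
```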

Question 5
When training one of the object detection systems described in lecture, you need a training set that contains many pictures of the object(s) you wish to detect. However, bounding boxes do not need to be provided in the training set, since the algorithm can learn to detect the objects by itself.

True

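False

Explanation: The detection systems in the lecture are trained with supervised learning. The target y for each training image includes the bounding box (b_x, b_y, b_h, b_w), so the training set must provide the bounding boxes. The statement is false.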

Question 6
Suppose you are applying a sliding windows classifier (non-convolutional implementation). Increasing the stride would tend to increase accuracy, but decrease computational cost.

True

False

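Explanation: A larger stride means fewer windows are evaluated. This lowers the computational cost, but the windows are placed more coarsely, so objects can be missed or poorly localized and accuracy does not improve. The statement is false.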
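A rough sketch of why the cost falls as the stride grows: in the non-convolutional implementation the classifier runs once per window position, and the number of positions shrinks roughly with the square of the stride. The image and window sizes below are arbitrary, and `num_windows` is a hypothetical helper.

```python
def num_windows(img_w: int, img_h: int, win_w: int, win_h: int, stride: int) -> int:
    """Number of window positions a sliding-window pass evaluates (one classifier run each)."""
    cols = (img_w - win_w) // stride + 1
    rows = (img_h - win_h) // stride + 1
    return rows * cols


for s in (4, 10, 20):
    print(s, num_windows(100, 100, 20, 20, s))
# 4 441
# 10 81
# 20 25
```

The same reduction that saves computation also means the windows land on a coarser set of positions, which is why accuracy suffers.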

Question 7
In the YOLO algorithm, at training time, only one cell —the one containing the center/midpoint of an object— is responsible for detecting this object.

True

False

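Explanation: An object is assigned to a grid cell by finding the object's midpoint and assigning the object to the cell that contains it. Even if the object spans several cells, it is assigned only to the cell containing its midpoint; the other cells are labeled as not containing that object (p_c = 0). The statement is true.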
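A minimal sketch of the assignment rule, assuming the midpoint (b_x, b_y) is given relative to the whole image in [0, 1], as in Question 1. `responsible_cell` is a hypothetical helper name.

```python
from typing import Tuple


def responsible_cell(bx: float, by: float, grid: int) -> Tuple[int, int]:
    """Return (row, col) of the grid cell that contains the object's midpoint.

    bx, by are midpoint coordinates relative to the whole image, in [0, 1].
    min(...) keeps a midpoint lying exactly on the right/bottom edge inside the grid.
    """
    col = min(int(bx * grid), grid - 1)
    row = min(int(by * grid), grid - 1)
    return row, col


print(responsible_cell(0.3, 0.7, 3))  # (2, 0)
```

Only this one cell gets p_c = 1 for the object; every other cell it overlaps keeps p_c = 0.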

Question 8
What is the IoU between these two boxes? The upper-left box is 2x2, and the lower-right box is 2x3. The overlapping region is 1x1.

[Image]

1/6

1/9

1/10

None of the above

Explanation: The intersection is 1 × 1 = 1 and the union is 2×2 + 2×3 − 1 = 9, so IoU = 1/9.
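A sketch of the IoU computation for axis-aligned boxes given as corner coordinates (x1, y1, x2, y2). The coordinates in the usage line are made up to reproduce the described geometry (a 2x2 box and a 2x3 box overlapping in a 1x1 square); they are not read from the figure.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): top-left and bottom-right corners


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 4)))  # about 0.111, i.e. 1/9
```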

Question 9
Suppose you run non-max suppression on the predicted boxes above. The parameters you use for non-max suppression are that boxes with probability ≤ 0.4 are discarded, and the IoU threshold for deciding if two boxes overlap is 0.5. How many boxes will remain after non-max suppression?
[Image]
5

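Explanation: The NMS algorithm, using single-class detection as an example:
Each grid cell outputs a prediction y_i = [p_c, b_x, b_y, b_h, b_w], where p_c is the probability that an object is present.
Discard all bounding boxes with p_c ≤ 0.6 (0.6 is the threshold from the lecture example; this question uses 0.4).
While any boxes remain:

Pick the box with the largest p_c and output it as a prediction.
Discard every remaining box whose IoU with the chosen box is ≥ 0.5.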

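For this example: first discard every box with p_c ≤ 0.4, which removes the small car at the bottom right. Next, pick the box with the highest p_c, 0.98; it does not overlap any other box, so count = 1. Next, pick 0.74; it overlaps the 0.46 tree, but their IoU is below 0.5, so the tree is kept; count = 2. Next, pick 0.73 and discard the 0.62 car that overlaps it; count = 3. Next, pick 0.58; count = 4. Finally, pick the 0.46 tree; count = 5. So 5 boxes remain.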
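The NMS steps described for this question can be sketched in Python for a single class. The helper names are hypothetical, and the boxes and scores in the usage example are invented for illustration, not read from the quiz figure. With several classes, the lecture runs this procedure independently for each class.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: List[Box], scores: List[float],
                        score_threshold: float = 0.4,
                        iou_threshold: float = 0.5) -> List[int]:
    """Return the indices of the boxes kept by NMS for one class."""
    # Discard boxes with p_c <= score_threshold, then order the rest by p_c, highest first.
    remaining = [i for i, s in enumerate(scores) if s > score_threshold]
    remaining.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    while remaining:
        best = remaining.pop(0)  # the box with the largest p_c becomes a prediction
        kept.append(best)
        # Discard remaining boxes whose IoU with the chosen box is >= iou_threshold.
        remaining = [i for i in remaining if iou(boxes[best], boxes[i]) < iou_threshold]
    return kept


boxes = [(0, 0, 2, 2), (0.1, 0.1, 2.1, 2.1), (5, 5, 7, 7), (10, 10, 11, 11)]
scores = [0.9, 0.8, 0.7, 0.3]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

Box 1 is suppressed because it overlaps box 0 with IoU above 0.5, and box 3 is dropped by the score threshold.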

Question 10
Suppose you are using YOLO on a 19x19 grid, on a detection problem with 20 classes, and with 5 anchor boxes. During training, for each image you will need to construct an output volume y as the target value for the neural network; this corresponds to the last layer of the neural network. (y may include some “?”, or “don’t cares”). What is the dimension of this output volume?

19x19x(25x20)

19x19x(20x25)

19x19x(5x20)

19x19x(5x25)

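Explanation: p_c, b_x, b_y, b_h, b_w take 5 values and the remaining 20 are class scores, so each anchor box needs 25 values. With 5 anchor boxes, multiply by 5: the output volume is 19x19x(5x25) = 19x19x125.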
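A small sketch of the shape arithmetic. `yolo_output_shape` is a hypothetical helper, and the assumed channel layout is 5 + num_classes values per anchor, with the anchors stacked one after another along the last axis.

```python
from typing import Tuple


def yolo_output_shape(grid: int, num_anchors: int, num_classes: int) -> Tuple[int, int, int]:
    """Shape of the YOLO target volume.

    Each anchor gets [p_c, b_x, b_y, b_h, b_w] plus one score per class.
    """
    per_anchor = 5 + num_classes
    return (grid, grid, num_anchors * per_anchor)


print(yolo_output_shape(19, 5, 20))  # (19, 19, 125)
```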

Reference: http://blog.csdn.net/koala_tree/article/details/78597575
