The main task is to find the result of an equation based on a video sequence.
The equation will be indicated by a moving robot.
Several mathematical operators (multiplication, division, minus, plus, equals) are placed on the table.
Several handwritten digits (0 to 8) are placed on the table.
From an initial location somewhere on the table, the robot moves around the table.
Each time the robot passes above an operator or a digit, the symbol located below the robot is added to the equation.
For example, the sequence “2” → “+” → “3” → “=” becomes “2+3=”.
The goal is, given a new video sequence, to retrieve the formula and its associated answer.
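The rule above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names, the `radius` threshold, the use of `*` and `/` for the multiplication and division signs, and the assumption that single digits alternate with operators are all hypothetical.

```python
import math

def build_formula(track, symbols, radius=20.0):
    """Append a symbol each time the robot centre enters its neighbourhood.

    track   : list of (x, y) robot centres, one per frame
    symbols : list of (label, (x, y)) symbol centres on the table
    radius  : assumed distance below which the robot counts as "above"
    """
    formula = ""
    current = None  # index of the symbol the robot is currently above
    for rx, ry in track:
        above = None
        for i, (_, (sx, sy)) in enumerate(symbols):
            if math.hypot(rx - sx, ry - sy) <= radius:
                above = i
                break
        # Append only on entry, so lingering above a symbol adds it once.
        if above is not None and above != current:
            formula += symbols[above][0]
        current = above
    return formula

def evaluate(formula):
    """Evaluate e.g. "2+3*4=" with the usual operator precedence.

    Assumes single digits alternating with operators (an assumption).
    Division by the digit 0 raises ZeroDivisionError.
    """
    tokens = list(formula.rstrip("="))
    if len(tokens) % 2 == 0:
        raise ValueError("expected digit, operator, digit, ...")
    values, pending = [float(tokens[0])], []
    for op, digit in zip(tokens[1::2], tokens[2::2]):
        if op == "*":
            values[-1] *= float(digit)
        elif op == "/":
            values[-1] /= float(digit)
        elif op in "+-":
            pending.append(op)
            values.append(float(digit))
        else:
            raise ValueError(f"unknown operator: {op!r}")
    # Second pass: additions and subtractions, left to right.
    result = values[0]
    for op, value in zip(pending, values[1:]):
        result = result + value if op == "+" else result - value
    return result

# Hypothetical layout and robot track, in pixel coordinates.
symbols = [("2", (10, 10)), ("+", (100, 10)), ("3", (200, 10)), ("=", (300, 10))]
track = [(0, 10), (10, 10), (12, 10), (50, 10), (100, 10),
         (150, 10), (200, 10), (250, 10), (300, 10)]
formula = build_formula(track, symbols)
print(formula, evaluate(formula))  # prints: 2+3= 5.0
```

Tracking the symbol the robot is currently above keeps a slow-moving robot from appending the same symbol on consecutive frames.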
To test the pipeline, three different scenarios will be presented:
SC1: All operators and digits have vertical orientations.
SC2: Both operators and digits have random orientations.
SC3 (Bonus): Both operators and digits have black colors. Orientations are random.
The input of the algorithm is an “.avi” video sequence, recorded at 2 FPS. The output should be a video sequence with
the same frame rate, duration and resolution as the input video. Each frame (e.g., the frame at time t) of the output video
should contain the following information, printed on the same frame:
The current state of the formula at time t.
The trajectory of the robot from start to time t.
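A possible way to organise the per-frame output is to compute, for every frame t, the formula text and the trajectory accumulated so far, and only then draw them. The sketch below covers the bookkeeping only; `frame_overlays` and its arguments are hypothetical names, and drawing on the frame and writing the video at 2 FPS are left to an imaging library.

```python
def frame_overlays(track, appended):
    """Return, for each frame t, the formula and trajectory up to t.

    track    : list of (x, y) robot centres, one per frame
    appended : list with the symbol added at each frame, or None
    """
    overlays = []
    formula = ""
    trajectory = []
    for position, symbol in zip(track, appended):
        trajectory.append(position)
        if symbol is not None:
            formula += symbol
        # Copy the trajectory so later frames do not alter earlier ones.
        overlays.append({"formula": formula, "trajectory": list(trajectory)})
    return overlays

# Hypothetical three-frame example.
overlays = frame_overlays([(0, 0), (1, 0), (2, 0)], ["2", None, "+"])
```

Producing one overlay per input frame keeps the output at the same frame count as the input, and therefore the same duration at the same frame rate.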
Project Overview
IAPR Project 2020
Evan Béal, Maxime Délitroz & Eric Bergkvist
Global architecture
Object-Oriented Programming:
Object class: Robot, Number and Operator derived classes
Equation class
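The class layout above might look like the following sketch. Only the class names come from the slide; the base class is called `SceneObject` here to avoid shadowing Python's built-in `object`, and every attribute and method is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Point = Tuple[float, float]

@dataclass
class SceneObject:
    """Base class for everything located on the table (hypothetical name)."""
    center: Point

@dataclass
class Robot(SceneObject):
    trajectory: List[Point] = field(default_factory=list)

    def __post_init__(self):
        # Seed the trajectory with the initial position.
        if not self.trajectory:
            self.trajectory.append(self.center)

    def move_to(self, center: Point) -> None:
        self.center = center
        self.trajectory.append(center)

@dataclass
class Number(SceneObject):
    value: int

@dataclass
class Operator(SceneObject):
    symbol: str

@dataclass
class Equation:
    terms: List[Union[Number, Operator]] = field(default_factory=list)

    def append(self, term: Union[Number, Operator]) -> None:
        self.terms.append(term)

    def __str__(self) -> str:
        return "".join(
            str(t.value) if isinstance(t, Number) else t.symbol
            for t in self.terms
        )

# Usage with hypothetical positions.
eq = Equation()
for term in (Number((0, 0), 2), Operator((1, 0), "+"),
             Number((2, 0), 3), Operator((3, 0), "=")):
    eq.append(term)
robot = Robot((0, 0))
robot.move_to((1, 1))
print(eq)  # prints: 2+3=
```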
Workflow
Segmentation
Objects (numbers & operators) on initial frame
Robot in all frames
With and without colored operators
Description
Classification
Numbers
Operators
Robot
Operator Segmentation & Classification
Scenario 1 & 2
Scenario 3
Generalized Hough transform, Number of sub-parts, Fourier descriptors
First assessment: Generalized Hough transform
Second assessment: Number of sub-parts, Elongation
Third assessment: Fourier descriptors
Figure panels: Reference image, Target image, Accumulator array, Detection
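To illustrate the second assessment, the sketch below counts the sub-parts of a binary operator mask and measures its elongation. It is not the project's code: the elongation definition (one minus the ratio of the minor to the major eigenvalue of the pixel covariance), the 0.8 threshold, the shortlist rule and the assumption that the division sign is drawn as a bar with two dots are all illustrative choices.

```python
from collections import deque
import math

def connected_parts(mask):
    """Return the 8-connected foreground components of a binary mask.

    mask : list of rows of 0/1 values; each part is a list of (row, col).
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    parts = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, part = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                part.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            parts.append(part)
    return parts

def elongation(pixels):
    """Return 1 - minor/major covariance eigenvalue, a value in [0, 1]."""
    n = len(pixels)
    my = sum(p[0] for p in pixels) / n
    mx = sum(p[1] for p in pixels) / n
    syy = sum((p[0] - my) ** 2 for p in pixels) / n
    sxx = sum((p[1] - mx) ** 2 for p in pixels) / n
    sxy = sum((p[0] - my) * (p[1] - mx) for p in pixels) / n
    # Closed-form eigenvalues of the 2x2 covariance matrix.
    mean = (sxx + syy) / 2
    diff = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    major, minor = mean + diff, mean - diff
    return 0.0 if major == 0 else 1.0 - minor / major

def shortlist_operator(mask, elongated=0.8):
    """Narrow down the operator from part count and elongation (assumed rule)."""
    parts = connected_parts(mask)
    if not parts:
        return []
    if len(parts) == 2:
        return ["="]
    if len(parts) == 3:
        return ["/"]  # assumes a bar with two dots
    pixels = [p for part in parts for p in part]
    if elongation(pixels) >= elongated:
        return ["-"]
    return ["+", "*"]  # left to the other assessments

minus = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [0, 0, 0, 0, 0]]
equals = [[1, 1, 1], [0, 0, 0], [1, 1, 1]]
divide = [[0, 1, 0], [0, 0, 0], [1, 1, 1], [0, 0, 0], [0, 1, 0]]
plus = [[0, 0, 1, 0, 0], [0, 0, 1, 0, 0], [1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0], [0, 0, 1, 0, 0]]
```

Both quantities are unchanged by rotation of the shape, which is why they remain usable when orientations are random.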
Number Classification
"TI-pooling: transformation-invariant pooling for feature learning in Convolutional Neural Networks", D. Laptev, N. Savinov, J.M. Buhmann, M. Pollefeys, CVPR 2016.
Network architecture: TI pooling
Results on MNIST test set: 2.65% error rate
Segmented number processing
Randomly selected samples with random rotations applied
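The idea behind TI pooling can be shown on a toy example: the same feature extractor, with shared weights, is applied to several transformed copies of the input, and the element-wise maximum is kept. The sketch below is a deliberate simplification under stated assumptions: it uses only the four 90-degree rotations and a few fixed linear filters in place of a convolutional network, and all names are hypothetical.

```python
def rotate90(image):
    """Rotate a square image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def features(image, filters):
    """Toy shared feature extractor: one dot product per filter."""
    return [
        sum(w * v
            for filter_row, image_row in zip(f, image)
            for w, v in zip(filter_row, image_row))
        for f in filters
    ]

def ti_pool(image, filters):
    """Element-wise max of the shared features over the four rotations."""
    rotated, pooled = image, None
    for _ in range(4):
        feats = features(rotated, filters)
        if pooled is None:
            pooled = feats
        else:
            pooled = [max(a, b) for a, b in zip(pooled, feats)]
        rotated = rotate90(rotated)
    return pooled

# Hypothetical 3x3 image and two filters.
image = [[1, 2, 0],
         [0, 3, 0],
         [4, 0, 5]]
filters = [
    [[1, 0, 0], [0, 0, 0], [0, 0, 0]],  # responds to the top-left pixel
    [[0, 1, 0], [0, 1, 0], [0, 0, 0]],  # responds to a vertical stroke
]
same = ti_pool(image, filters) == ti_pool(rotate90(image), filters)
```

Because the four rotations form a closed set, rotating the input only reorders the copies being pooled, so the pooled features are identical and `same` is `True`.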
Results on the most complex scenario