The coach is concerned that though the team has won 9 of its last 10
games, it has done so on the strength of the field goal kicker. The offense has been
unable to reliably move the ball once inside the "red zone" --- the area of the
field inside the opponent's 20-yard line.
In preparation for the big game next week, the head coach asks the Video Athletic
Coordinator (VAC) for some compilation tapes. The tapes should contain every offensive
play over the last three years in which the ball was within the red zone, it was 2nd or 3rd
down, there were more than 5 yards to go, and the team executed a running play. Furthermore,
since he is going to use the tapes to review particular plays with the team, he'd like to
separate the draw plays from the sweeps from the traps. The VAC types the necessary
information into the computer, and a short while later hands the coach the separate tapes
he asked for.
The coaching scenario above is not fiction. Every university
with a significant football program as well as every professional football team has a
Video Athletic Coordinator and a video database of all the games played. The video is
recorded using a camera with a high vantage point that is controlled by a cameraman tasked
with keeping all the players in the field of view. Using specialized database software,
the VAC manually annotates every play, recording attributes such as yard line, down, yards
to go, formation, type of play executed, and result. These descriptions, along with
timecode information, are used to automatically edit input tapes into the necessary
compilations. The NFL has recently converted to all-digital media so that the video
can be accessed and viewed directly from a computer.
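To make the annotation record and the coach's request concrete, the following is a minimal sketch in Python. It is not the VAC's actual database software: the class `PlayAnnotation`, the function `red_zone_runs`, and every field name are hypothetical, chosen only to mirror the attributes listed above. Two assumptions are worth stating: the yard line is stored as the distance to the opponent's goal line, and a ball on the 20 itself counts as inside the red zone.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayAnnotation:
    """One manually annotated play. All field names are hypothetical."""
    season: int
    yard_line: int     # assumed: yards from the opponent's goal line
    down: int          # 1 through 4
    yards_to_go: int
    formation: str
    play_type: str     # e.g. "run" or "pass"
    run_kind: str      # e.g. "draw", "sweep", "trap"; "" if not a run
    result: str
    timecode_in: str   # start of the play on the source tape
    timecode_out: str  # end of the play on the source tape


def red_zone_runs(plays, first_season):
    """Group the timecodes of matching plays by kind of running play.

    A play matches if it is from first_season or later, the ball is in
    the red zone (assumed to include the 20), it is 2nd or 3rd down,
    there are more than 5 yards to go, and the play is a run.
    """
    tapes = defaultdict(list)
    for p in plays:
        if (p.season >= first_season
                and p.yard_line <= 20
                and p.down in (2, 3)
                and p.yards_to_go > 5
                and p.play_type == "run"):
            tapes[p.run_kind].append((p.timecode_in, p.timecode_out))
    return dict(tapes)
```

Each key of the returned dictionary corresponds to one compilation tape (draws, sweeps, traps), and each value is the list of timecode pairs that an editing system would need in order to cut that tape from the source video.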
Video annotation is the task of generating such descriptions.
It differs from conventional computer vision image understanding in that one is
primarily interested in what is happening in a scene, as opposed to what is in the scene.
The goal is to describe the behavior or action that takes place in a manner relevant to
the domain. In the "football domain," we would like to build a computer system
that will annotate video automatically or provide a semi-automatic process
for the VAC.
Video annotation is a problem that will become much more important in the next few years
as video databases begin to grow and methods must be developed for automatic database
summary, analysis, and retrieval. Other annotation problems being studied in the Vision
and Modeling Group of the MIT Media Lab include dance steps and human gesture.
We have chosen to study the automatic annotation of football plays for four reasons: (1)
football has a known descriptive language, (2) football has a rich set of domain rules
and domain expectations, (3) football annotation is a real-world problem, and (4) its
descriptive language is the football playbook. Players, coaches, and fans have developed a
categorization system that includes virtually all possible plays. The classification
problem is difficult, however, because distinctions between play types can be subtle and
there is a significant amount of variation of player movement between plays within the
same category. Fortunately, a football game is governed by the rules of the game and
expected events. These rules and likelihoods must be used to identify the key events in a
play that can be used to assign the most appropriate play label.
Automatic or semi-automatic video annotation is not the prototypical computer vision
problem, but it will become increasingly important as access to video databases increases.
An automatic football annotation system must have some input
data upon which to make a preliminary play hypothesis. In the football annotation problem,
we are using player trajectories. In the first stage of our annotation project, we have
implemented a computer vision football-player tracker that uses contextual knowledge to
track football players as they move around a field.
A football play annotation system has many uses, from generating
"chalkboard" diagrams for network newscasters (better than the current
chalkboard systems) to home "call-the-play" television sets that score living
room coaches on how well they predict each play. The most direct application is to aid the
VAC in the tedious play annotation task and to provide better data to professional,
college, and high school coaches.