Artificial Intelligence and Robotics blog
Computer Vision
With computer vision by your side you will never have to ask for directions again
Mar 14th
MIT’s Technology Review recently published an article describing how Augmented Reality (AR) will have a large impact on the way that we find information; they consider it one of the 10 emerging technologies of 2007. Augmented Reality is a specialization of computer vision that attempts to extract the 3D structure of a scene from video and then insert virtual or 3D objects into the scene with high precision. AR can be useful in many applications including entertainment, heads-up displays, tourism and surgical visualization.
Consider, for example, visiting a foreign country without a map. If you want directions to your hotel from any location in the city, you can just take a photo with your cell phone camera and let the phone’s built-in AR software determine your location; the AR software would then retrieve directions wirelessly from a large database and present them to you by annotating the original photo. Another application would be retrieving information about a historical monument that you are looking at. An example of AR is shown in the following video from related research at the University of British Columbia.
Nokia is currently in the early stages of testing a cell phone equipped with AR algorithms to identify and provide information about a city’s landmark buildings. According to the Technology Review,
Last October, a team led by Markus Kähäri unveiled a prototype of the system at the International Symposium on Mixed and Augmented Reality. The team added a GPS sensor, a compass, and accelerometers to a Nokia smart phone. Using data from these sensors, the phone can calculate the location of just about any object its camera is aimed at. Each time the phone changes location, it retrieves the names and geographical coordinates of nearby landmarks from an external database. The user can then download additional information about a chosen location from the Web–say, the names of businesses in the Empire State Building, the cost of visiting the building’s observatories, or hours and menus for its five eateries.
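To make the sensor idea above concrete, here is a minimal sketch of how a GPS fix and a compass heading could be combined to decide which nearby landmarks fall inside the camera's view. This is not Nokia's implementation: the function names, the 60-degree field of view and the landmark tuples are all assumptions made for illustration, and the tilt information from the accelerometers is ignored.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2,
    in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0

def landmarks_in_view(phone_lat, phone_lon, heading_deg, landmarks, fov_deg=60.0):
    """Return the names of landmarks whose bearing lies inside the camera's
    horizontal field of view (hypothetical helper; fov_deg is an assumed value).

    landmarks is a list of (name, latitude, longitude) tuples, standing in for
    the records the phone would fetch from an external database.
    """
    visible = []
    for name, lat, lon in landmarks:
        b = bearing_deg(phone_lat, phone_lon, lat, lon)
        # Signed angle between the landmark and the camera axis, in [-180, 180).
        offset = (b - heading_deg + 180.0) % 360.0 - 180.0
        if abs(offset) <= fov_deg / 2.0:
            visible.append(name)
    return visible
```

Calling landmarks_in_view with the phone's current position and compass heading returns only the landmarks the camera is roughly pointed at, which is the subset a system like this would need to annotate on screen.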
You can find more examples of Augmented Reality at Nokia’s Mobile Augmented Reality Applications (MARA) project site and also at David Lowe’s website at the University of British Columbia.
The Semantic Robot Vision Challenge
Mar 11th
For the first time, researchers will compete in the Semantic Robot Vision Challenge (SRVC) as part of AAAI’s Mobile Robot Competition and Exhibition. The idea behind SRVC is to test machine learning algorithms applied to computer vision using online databases. The competition will essentially send mobile robots on a scavenger hunt.
The Semantic Robot Vision Challenge is a new research competition that is designed to push the state of the art in image understanding and automatic acquisition of knowledge from large unstructured databases of images (such as those generally found on the web).
Each robot will be given a textual list of objects that it must identify by searching in a confined area. It will first be allowed a small amount of time to go online and search Google Images for example photos of what the objects look like. The robot should use this information to learn a model of each object’s appearance without supervision. Lastly, it will be allowed to enter a given area and autonomously search for the objects by using its on-board camera. The winner is the robot that identifies the most objects correctly.
This challenge is not an easy task. The machine learning part is tricky because the robot must work with the large collection of images returned by Google. In addition, many of the example images will have lots of clutter, making it difficult to identify the visual characteristics of just the object searched for. For example, if the robot is searching for images of bicycle helmets, then many of the results returned by Google will likely include photos of people wearing helmets, making it difficult to segment just the helmet without human supervision. The robotics part will also be difficult because it will require autonomous navigation and planning in a very large state space.
The final set of rules will be posted in just a few days on March 15th. The Semantic Robot Vision Challenge will take place during the 22nd National Conference on Artificial Intelligence to be held July 22-25 in Vancouver, Canada. The SRVC is also sponsored by the National Science Foundation.
I am really curious to see what kind of solutions people will come up with in order to complete this object discovery task.
ASPOGAMO: A computer vision system for the visual tracking of soccer players
Jan 24th
The Intelligent Autonomous Systems Group at the Munich University of Technology (TUM) is developing a system for automatically tracking soccer players during live games. The idea is to use the camera data from live TV broadcasts to estimate the trajectories of all the players visible in order to determine their intent as well as the overall team strategy. Part of the Automated SPOrt Game Analysis Model (ASPOGAMO) was recently presented during the International Joint Conference on Artificial Intelligence held in Hyderabad, India.
The paper, written by M. Beetz, S. Gedikli, J. Bandouch, B. Kirchlechner, N. v. Hoyningen-Huene and A. Perzylo, focuses on the visual tracking aspect of ASPOGAMO. The system takes as input images captured with TV broadcast cameras and, in near real-time, estimates the camera’s direction and zoom factor and tracks and smooths the player trajectories.
The team demonstrates the system’s abilities with taped data from live broadcasts of games from the 2006 World Cup, which was held in Germany last summer. The project’s website is hosting a number of short video clips showcasing ASPOGAMO’s ability to track the players’ and the ball’s positions. The paper reports that ASPOGAMO estimates players’ positions to within 0.5 meters. In addition, the system’s player detection rate is over 90% for what are considered challenging image sequences.
Tracking multiple targets using a moving camera is a very difficult problem in computer vision. Recent advances in object detection and state estimation using stochastic frameworks are making systems such as ASPOGAMO possible. I expect that in less than 5 years, a computer will be able to watch a soccer game live, understand a team’s strategy and make suggestions for countering it.
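As a rough illustration of what stochastic state estimation means for a tracked player, here is a small constant-velocity Kalman filter over one coordinate of a noisy position track. It is only a sketch and not the method used in ASPOGAMO: the function name, the noise variances and the simplified process noise (added to the diagonal of the covariance only) are assumptions chosen for readability.

```python
def kalman_filter_1d(measurements, dt=1.0, meas_var=0.25, process_var=0.01):
    """Filter noisy 1D position measurements with a constant-velocity model.

    State is (position p, velocity v); the covariance is the 2x2 matrix
    [[P00, P01], [P10, P11]]. All parameter values are illustrative assumptions.
    """
    p, v = measurements[0], 0.0
    P00, P01, P10, P11 = 1.0, 0.0, 0.0, 1.0
    estimates = []
    for z in measurements:
        # Predict: move the state forward by dt and grow the uncertainty.
        p = p + v * dt
        P00, P01, P10, P11 = (
            P00 + dt * (P01 + P10) + dt * dt * P11 + process_var,
            P01 + dt * P11,
            P10 + dt * P11,
            P11 + process_var,
        )
        # Update: blend the prediction with the new position measurement z.
        S = P00 + meas_var             # innovation variance
        K0, K1 = P00 / S, P10 / S      # Kalman gain for position and velocity
        y = z - p                      # innovation (measurement residual)
        p += K0 * y
        v += K1 * y
        P00, P01, P10, P11 = (
            (1.0 - K0) * P00,
            (1.0 - K0) * P01,
            P10 - K1 * P00,
            P11 - K1 * P01,
        )
        estimates.append(p)
    return estimates
```

A real tracker would run a filter like this per player in two dimensions (field coordinates) and would also have to solve the harder problem of deciding which detection belongs to which player in each frame.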
Ookles wants to automatically organize your photos
Dec 18th
Ookles is a new Web company that will officially launch early next year (2007), hoping to help people organize their digital photos using machine vision technologies. Ookles wants to be a website similar to the very popular Flickr. However, its founders want to automate the semantic labeling and categorization of user photos, taking advantage of recent advances in machine vision algorithms. Currently, there is little information available about how Ookles is supposed to work; however, Michael Arrington (TechCrunch) was given an early look. He briefly explains how Ookles uses supervised learning to find photos of the same person using face recognition.
Like Riya, Ookles will find and show thumbnails of faces from photos, and then analyze that face against other faces in your photos. You tell Ookles which ones are a match. Ookles repeats the process a couple of times until it has a good idea of who the person is. It will then tag all photos with the name, and future photos containing that person will also be auto-tagged. The demo worked perfectly – it took a few steps to train it and then all photos were properly tagged.
Arrington also discusses a feature that enables Ookles to group similar photos together into albums. Ookles hopes to use object recognition techniques to automatically label a user’s digital photos. How well this will work compared to human tagging is not known but I don’t expect it to be as good. In fact, I would not be surprised if the early version of Ookles simply suggests tags which the user could use for each photo.
Finding ways to automatically categorize the large number of digital photos available online is certainly a worthy cause. To my knowledge, however, even state of the art machine vision algorithms are not currently capable of doing this. It will be interesting to track the progress of Ookles during 2007 and hopefully very soon any one of us will be able to try out their system.
CMU’s TK60 symposium to honor Takeo Kanade on his 60th birthday
Dec 13th
Carnegie Mellon University (CMU) will organize a special symposium titled “Celebrating Kanade’s Vision.” The event, known as the TK60 symposium, is centered on the Robotics Institute’s Professor Takeo Kanade and is meant to honor him for his contributions to computer vision and robotics over the last 30+ years. The symposium will take place in Pittsburgh on March 8th and 9th, 2007.
Throughout his career, Takeo Kanade, U.A. and Helen Whitaker University Professor, has collaborated with researchers and students from a variety of scientific disciplines who are honoring him with this symposium on the occasion of his 60th birthday.
Some of Takeo Kanade’s most notable contributions include his work on face recognition, multi-baseline and real-time stereo vision, video surveillance and monitoring (VSAM), robot manipulators (CMU DD Arm I) and virtualized reality. In fact, one of his best-known projects is the “Eye Vision” system, which he developed for CBS and which was used to create breathtaking, Matrix-like replays of the action during the broadcast of Super Bowl XXXV in 2001.
The symposium will include a number of invited talks. Some notable speakers include Tomaso Poggio (MIT) and Harry Shum (Microsoft Research) among many others. A special Distinguished Lecture from President Yuichiro Anzai of Keio University will kick-start the event on the morning of March 8th.
Interesting online demos of visual illusions
Oct 16th
I came across some very nice demonstrations that show that our brain perceives the color, form and brightness of objects differently depending on the scene. For example, the color that we perceive for an object is directly affected by the colors of the objects surrounding it. In other words, there is no direct mapping between the energy reflected by an object and our perception of it. Understanding these visual illusions is essential to understanding how the human brain works. Artificial Intelligence research can benefit greatly from understanding human perception because such knowledge could be used to construct computational models of perception, allowing us to design artificial cognitive agents.
The Lottolab at University College London (UCL) is doing a large amount of work on human perception, including a number of visual illusions about color, form and brightness. You can find descriptions and demos of many such illusions posted on their website.
Tyzx real-time 3DAWARE vision systems
Aug 30th
Tyzx is a company that specializes in real-time 3D or stereo vision cameras and software. A stereo camera works similarly to human vision. Using two cameras that are spatially separated, stereo vision software finds the location of the same object in both images and then uses geometry to compute the distance to that object. It turns out, however, that doing this computation is expensive, especially if one cares about accuracy.
So, how does Tyzx technology solve these problems? According to their website, “Tyzx’s 3DAWARE technology relies on stereo vision, based on a scientific concept called parallax shift. It works much like the human eye and brain. A pair of inexpensive CMOS imagers captures left and right views of the same scene. The left and right images are compared to find the same pixel in each image and determine their relative shift with sub-pixel accuracy. The apparent shift of corresponding pixels is directly related to the object’s distance from the sensor. The image information is processed at rates of up to 132 frames per second (films shown in movie theaters have a frame rate of 30 per second) to track moving objects over a wide range of distances. The computations are performed at over 50 billion calculations per second (compared to today’s fastest microprocessors that perform approximately 4 billion calculations per second) while using very little power.”
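The relationship between pixel shift and distance mentioned in the quote is the standard one for a rectified stereo pair: depth equals focal length times baseline divided by disparity. The short sketch below shows that formula only. It says nothing about how Tyzx implements matching in hardware, and the function name and the sample numbers are assumptions for illustration.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Distance (in meters) to a point seen by a rectified stereo pair.

    disparity_px: horizontal shift of the point between left and right images
    focal_px:     focal length expressed in pixels
    baseline_m:   distance between the two camera centers, in meters
    """
    if disparity_px <= 0:
        # Zero disparity means the point is at infinity; negative is a bad match.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Assumed example values: 500-pixel focal length, cameras 10 cm apart.
near = depth_from_disparity(20.0, 500.0, 0.1)  # large shift, close object
far = depth_from_disparity(2.0, 500.0, 0.1)    # small shift, distant object
```

Because depth is inversely proportional to disparity, a small matching error matters far more for distant objects than for near ones, which is one reason sub-pixel accuracy is worth the computational cost.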
There are many applications for 3D vision technologies. A couple of days ago we saw how NEC’s new IMAPCAR sensor will be used in cars to detect obstacles and help drivers avoid collisions. Earlier, I also posted about SEEGRID’s efforts to use volumetric grids constructed using stereo vision for robot navigation. Tyzx claims that there are even more applications including human-computer interfaces, security, video gaming and consumer electronics.
The payoff for functional 3D vision systems is predicted to be large. “By 2010, according to UK-based semiconductor researcher Future Horizons, there will be 55.5 million robots in the world—many requiring 3D vision—in a market worth some $60 billion. Industry research firm Frost & Sullivan projects a compound annual growth rate of 12.7% for the security and surveillance market alone. They estimate the market will be an $11 billion industry by 2008, as manufacturers migrate to digital with many new smart systems benefiting from 3D vision technology.”