Here are a few facts and announcements from the meetup.com page:
* About the Google Deep Learning Project: Andrew Ng and his team built massive deep learning systems, resulting in a highly distributed neural network, trained on 16,000 CPU cores, that learned by itself to discover high-level concepts such as common objects in video
About Andrew Ng:
• Founder of Coursera and instructor of the seminal Machine Learning course
• His early work includes the Stanford Autonomous Helicopter project, ROS (Robot OS), and the STAIR (Stanford Artificial Intelligence Robot) project
• Named by Time Magazine in 2013 as one of the 100 most influential people in the world
As far as I know, Andrew is quite busy running his co-founded company, Coursera, yet here he was talking about his recent work on "Deep Learning", so I was a little doubtful before attending this talk. As you'll see in this post, though, such a worry was unnecessary.
First of all, he talked about his experience with Coursera: uploading the machine learning course to the web let him reach more than 100k people, which would take about 220 years of conventional university lecturing. So he co-founded Coursera to distribute courses to anyone in the world, and it is now a successful startup.
Then he introduced his recent practical machine learning work on the "Computer Vision" problem. Even deciding something as simple as "there is a motorcycle in this image" is very hard with existing computer vision techniques, so he started to investigate this "image object recognition" task using machine learning approaches. FYI, a classification problem in machine learning utilizes "features" that are captured and/or extracted from the source data. But in the world of computer vision, there are too many hand-crafted approaches and heuristics that humans must tune and match to maximize performance on the existing classification targets.
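To make that contrast concrete, here is a minimal sketch (my own illustration, not from the talk) of one classic hand-engineered feature: the Sobel vertical-edge filter, applied with a naive 2-D convolution. Pipelines built from filters and heuristics like this, tuned by humans, are what the talk contrasts with features learned automatically.

```python
import numpy as np

# A classic hand-engineered feature: the Sobel vertical-edge kernel.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

def convolve2d(img, kernel):
    """Naive 'valid' 2-D filtering (cross-correlation) with a 3x3 kernel."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i+3, j:j+3] * kernel)
    return out

# Toy image: dark left half, bright right half -> one strong vertical edge.
img = np.zeros((8, 8))
img[:, 4:] = 1.0

edges = np.abs(convolve2d(img, sobel_x))
print(edges.max())  # strongest response sits on the boundary column
```

The point is that the kernel's weights are fixed by a human; deep learning instead learns such filters from data.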
He then mentioned the "one learning algorithm" hypothesis, rooted in human nature along the lines of "seeing is believing", or "one picture is worth a thousand words".
His approach treats the whole image as a collection of 14 x 14 image patches (each patch a vector of 196 pixel values). He selected 64 pre-determined basis images and represents every image patch as a combination of those 64 pre-learned patches. Note that, usually, image classification relies on existing 'edge detection mechanisms'. He called this approach "sparse coding", and it indeed acts as a coding, i.e., a reduced-data representation of the image.
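As a rough illustration of the idea (my own sketch, not Ng's actual method), the snippet below greedily encodes one 196-pixel patch as a sparse combination of 64 dictionary atoms using matching pursuit. Here the dictionary is random for simplicity; in the talk the 64 basis patches would be learned from data (and typically end up looking like edges).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dictionary: 64 basis patches, each 14 x 14 = 196 pixels,
# normalized to unit length. (Learned, not random, in the real approach.)
D = rng.standard_normal((64, 196))
D /= np.linalg.norm(D, axis=1, keepdims=True)

def sparse_code(x, D, n_nonzero=8):
    """Greedy matching pursuit: represent patch x with a few atoms of D."""
    residual = x.astype(float).copy()
    code = np.zeros(D.shape[0])
    for _ in range(n_nonzero):
        corr = D @ residual               # correlation with every atom
        k = np.argmax(np.abs(corr))       # pick the best-matching atom
        code[k] += corr[k]
        residual -= corr[k] * D[k]        # remove its contribution
    return code, residual

patch = rng.standard_normal(196)          # stand-in for a real 14x14 patch
code, residual = sparse_code(patch, D)

print(np.count_nonzero(code))             # at most 8 active atoms out of 64
print(np.linalg.norm(residual) < np.linalg.norm(patch))  # error shrank
```

The sparse `code` (a few numbers instead of 196 pixels) is exactly the "reduced data" coding the talk refers to.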
Using such sparse coding on images, he did unsupervised feature learning, then trained on input images discriminated as 'this image includes a motorcycle' or 'no motorcycle at all'. With this approach, he could defeat all the well-known combinations of hand-engineered image recognition mechanisms.
Then he described how he started the deep learning project with a massive number of cores. He wanted to increase the performance (quality) of the machine learning process, so he used a 'server-client' approach to run it in a massively distributed, parallel way.
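A toy sketch of that 'server-client' idea (my own assumption of the setup, not the actual Google code): several workers each compute gradients on their own shard of the data, and a central server averages them to update the shared model. Here the "model" is just a 2-parameter least-squares fit so the whole loop fits in a few lines.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy problem: recover w_true in y = X @ w, with data split across "workers".
w_true = np.array([2.0, -3.0])
X = rng.standard_normal((400, 2))
y = X @ w_true

shards = np.array_split(np.arange(400), 4)   # 4 hypothetical workers

def worker_gradient(w, idx):
    """Each worker computes the least-squares gradient on its own shard."""
    Xi, yi = X[idx], y[idx]
    return 2 * Xi.T @ (Xi @ w - yi) / len(idx)

# "Parameter server": aggregates worker gradients, updates the shared model.
w = np.zeros(2)
for step in range(200):
    grads = [worker_gradient(w, idx) for idx in shards]
    w -= 0.05 * np.mean(grads, axis=0)

print(np.allclose(w, w_true, atol=1e-3))  # → True
```

In the real system the workers also hold different slices of the network itself, but the aggregate-and-update pattern is the same in spirit.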
This effort has now been ported to Android's speech recognition, so that each person's voice becomes training input for that person's model. This way, Google's speech recognition understands better as time goes by than existing competitors' systems. It is also very useful for Google's StreetView, which needs to recognize the house numbers on each building to know the address in detail.
There is still much left uncovered, and we are far from real-world adoption, but his approach outperformed the state of the art on 'image recognition' tasks.
Recently, he also did some engineering with GPUs (using Stanford GPU clusters) and confirmed that GPUs could further increase the performance of machine learning tasks.
If you're interested in Deep Learning, please visit the sites below:
Here are the Q&As received via #PayPalTechX, with a few answers from Andrew Ng.
Q) What are your preferred programming languages and why? (a kind of silly question)
- A: Matlab, R, and Python: Python because it is open-source, and R because it is good for machine learning.
Q) The term, 'Deep Learning', where did it come from?
- A: One key vision of the 'Deep Learning' project is enabling 'unsupervised machine learning' by utilizing the massive amount of computational power available in the world.
Q) What about the data bottleneck? (not the computing power bottleneck)
- A: Due to bandwidth limitations on multimedia, and because the volume of data is exploding (massively increasing) so fast, computing power may not be able to scale up to the massive amount of data in the world.
Q) Do you pre-process the audio and video data to use in the Deep Learning process?
- A: We do some pre-processing, like whitening images to increase contrast, but we keep it as minimal as possible before the Deep Learning process.
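For reference, a minimal whitening sketch (my own example, not Google's pipeline): ZCA whitening decorrelates pixel values across a batch of flattened patches and equalizes their variance, which is one common minimal preprocessing step before feature learning.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in batch of flattened 14x14 grayscale patches (rows = patches).
patches = rng.uniform(0.0, 255.0, size=(1000, 196))

def zca_whiten(X, eps=1e-5):
    """ZCA whitening: rotate into the eigenbasis of the pixel covariance,
    rescale each direction to unit variance, and rotate back."""
    X = X - X.mean(axis=0)                 # zero-center each pixel
    cov = X.T @ X / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return X @ W

white = zca_whiten(patches)
cov_after = white.T @ white / white.shape[0]
print(np.allclose(cov_after, np.eye(196), atol=1e-2))  # covariance ≈ identity
```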
Q) Brain-machine interfaces? Are we close, or still far away?
- A: Thanks to neuroscientists, a specific (limited) set of tasks can already be done between brain and machine, but there are many different unsolved problems that have not even been touched.
Even though we're still very far from human-level robots, or machines that think like humans, we hope to see real 'Deep Learning' used in the mainstream.
- written by ANTICONFIDENTIAL in San Jose on August 27, 2013