I had to divide this post into two parts because of some critical decision-making and idea debates.
Yes, on the second day, Samantha and I exchanged ideas and weighed the pros and cons of each approach we could take to accomplish our project:
- If we use the Microsoft Kinect, we can capture the user's skeleton via the infrared projector and infrared sensor on the Kinect, so the vision-processing part would be mostly solved. All that would be left (though still challenging) is identifying gestures and mapping them to music files. Further, the Kinect has a lot of libraries for both Mac OS X and Windows, and those can be easily used with the Android tablet and with Mac and Windows computers.
- If we use a simple camera with OpenCV for image/video processing, we would practically have to do a PhD thesis within a 5-day span to figure out the locations of the user's body joints from a single 2D image. Naturally, that is difficult for a first-attempt vision project. The advantage, though, would be that the application could be designed for all kinds of devices, including the iPhone.
To be realistic while still enjoying some risk-taking, we decided to start working with the Kinect first and see how that goes. If the Kinect does not get us to a working application, we will switch to OpenCV.
So, a step-wise plan (mostly tailored for working with the Kinect) is as follows:
- Get the skeleton image and joint information for a single person.
- Since the demonstration is in a public, heavily occupied space, keep the focus on our application's user within the crowd.
- Decide on a maximum of 3 gestures for demonstration purposes, to be added in the following progression:
  - Have SOME user motion --> play music (any file); a minimal sketch of this first step appears right after this list
  - Have 1 SPECIFIC gesture (a sequence of postures rather than a 'still' posture) --> play music (any file)
  - Have 1 SPECIFIC gesture (a sequence of postures rather than a 'still' posture) --> play 1 SPECIFIC music file/playlist
  - Add the other 2 SPECIFIC gestures --> play other types of music files/playlists
  - Note: all of these music files will be played on the laptop rather than on a mobile device
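As a starting point for the first step above, here is roughly the kind of Processing sketch I have in mind. Treat it as a sketch under assumptions: it uses the SimpleOpenNI wrapper (the one the codasign tutorials use) together with the Minim audio library that ships with Processing, "motion" is approximated as "the middleware has segmented at least one person out of the depth map", and song.mp3 is a placeholder file name. The exact SimpleOpenNI calls differ between versions, so this follows the older API we ended up with.

```
// Play a (placeholder) song whenever the Kinect detects at least one user.
import SimpleOpenNI.*;
import ddf.minim.*;

SimpleOpenNI kinect;
Minim minim;
AudioPlayer player;
boolean userPresent = false;

void setup() {
  size(640, 480);
  kinect = new SimpleOpenNI(this);
  kinect.enableDepth();
  kinect.enableUser(SimpleOpenNI.SKEL_PROFILE_ALL);  // turn on user detection
  minim = new Minim(this);
  player = minim.loadFile("song.mp3");  // placeholder file in the sketch's data folder
}

void draw() {
  kinect.update();
  image(kinect.depthImage(), 0, 0);
  // For now, any detected user counts as "SOME motion".
  if (userPresent && !player.isPlaying()) {
    player.play();
  }
}

// SimpleOpenNI fires these callbacks when the middleware segments a new
// person out of the depth map (or loses them).
void onNewUser(int userId) {
  userPresent = true;
}

void onLostUser(int userId) {
  userPresent = false;
}
```

The later steps would swap the userPresent check for an actual gesture classifier and map each gesture to its own file or playlist.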
Now, enough talking! Let's jump into some real action!
So I spent most of Sunday night trying to get the Kinect working with Mac OS X Mountain Lion. Here are some things that happened in this process:
- I learnt that the OpenNI (Open Natural Interaction) library is used for interacting with devices like the Kinect (devices with infrared cameras), where there is no external input from a mouse or keyboard, only body movements from the user.
- There are a TON of websites that help with installing Kinect software. By "Kinect software" I mean the libraries we can use for development and for tapping the resources of the Kinect for Xbox 360 (which is what the Build18 competition staff gave us). Kinect for Windows is a good option too; it has really cool libraries for many applications, but not for Mac OS X. I particularly liked this one: http://developkinect.com/resource/mac-os-x/install-openni-nite-and-sensorkinect-mac-os-x. However, I got really frustrated after redoing the procedure about 6 times, so I used older versions of the OpenNI and NITE libraries with SensorKinect to get things working and to code in C++.
- The sad thing was that I couldn't find any source-code guides or documentation to help me with skeleton tracking using these libraries (I promise I searched a lot). During my search I saw Processing being a really popular choice, so I decided to give it a shot. This website is AWESOME: http://learning.codasign.com/index.php?title=Using_the_Kinect_with_Processing. I will eventually upload some videos of the modified, compiled code (credit to that website for the cool tutorials). I am still unhappy with the inaccuracy of the skeleton mapping, but it's a wonderful feeling to actually code something and make things work!!!
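To give a feel for what that Processing code looks like, here is a stripped-down version of the kind of skeleton-tracking sketch those tutorials build up to, adapted from memory rather than copied, so treat the exact SimpleOpenNI calls as assumptions that vary between library versions. It tracks one user's right arm and draws it over the depth image; the old OpenNI/NITE stack also requires the user to strike the "Psi" calibration pose before tracking starts.

```
import SimpleOpenNI.*;

SimpleOpenNI kinect;

void setup() {
  size(640, 480);
  kinect = new SimpleOpenNI(this);
  kinect.enableDepth();
  kinect.enableUser(SimpleOpenNI.SKEL_PROFILE_ALL);  // skeleton tracking on
}

void draw() {
  kinect.update();
  image(kinect.depthImage(), 0, 0);

  IntVector userList = new IntVector();
  kinect.getUsers(userList);
  if (userList.size() > 0) {
    int userId = (int) userList.get(0);   // focus on the first tracked person
    if (kinect.isTrackingSkeleton(userId)) {
      drawRightArm(userId);
    }
  }
}

// Fetch two 3D joint positions, project them onto the 2D depth image,
// and draw the arm as a line.
void drawRightArm(int userId) {
  PVector shoulder3d = new PVector();
  PVector hand3d = new PVector();
  kinect.getJointPositionSkeleton(userId, SimpleOpenNI.SKEL_RIGHT_SHOULDER, shoulder3d);
  kinect.getJointPositionSkeleton(userId, SimpleOpenNI.SKEL_RIGHT_HAND, hand3d);

  PVector shoulder2d = new PVector();
  PVector hand2d = new PVector();
  kinect.convertRealWorldToProjective(shoulder3d, shoulder2d);
  kinect.convertRealWorldToProjective(hand3d, hand2d);

  stroke(255, 0, 0);
  strokeWeight(3);
  line(shoulder2d.x, shoulder2d.y, hand2d.x, hand2d.y);
}

// Old-style OpenNI/NITE calibration flow: the user strikes the "Psi" pose,
// and tracking starts once calibration succeeds.
void onNewUser(int userId) {
  kinect.startPoseDetection("Psi", userId);
}

void onStartPose(String pose, int userId) {
  kinect.stopPoseDetection(userId);
  kinect.requestCalibrationSkeleton(userId, true);
}

void onEndCalibration(int userId, boolean successful) {
  if (successful) kinect.startTrackingSkeleton(userId);
  else kinect.startPoseDetection("Psi", userId);
}
```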
So we really want to do complicated gesture tracking rather than a 'still' posture like holding your hands up in the air for ages. For a simple example (yet a complicated one, as it involves the motion of the hands), look at the following demo: http://youtu.be/DkgDjqiWArc. We AIM to have some sort of program/tool that recognizes the user moving their right hand diagonally (the left hand in the mirrored picture). That is the fun part!
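I have not settled on the recognition algorithm yet, but to make the idea concrete, here is one naive approach I am toying with (not our final method): keep a short trail of right-hand positions from the skeleton and call it a diagonal swipe when the hand has travelled far enough along both axes within the window. The window size and travel threshold are pure guesses to be tuned on real data, and the SimpleOpenNI calls again follow the older API.

```
import SimpleOpenNI.*;

SimpleOpenNI kinect;
ArrayList<PVector> handTrail = new ArrayList<PVector>();
int WINDOW = 15;          // roughly half a second at 30 fps (guess)
float MIN_TRAVEL = 200;   // millimeters of travel per axis (guess)

void setup() {
  size(640, 480);
  kinect = new SimpleOpenNI(this);
  kinect.enableDepth();
  kinect.enableUser(SimpleOpenNI.SKEL_PROFILE_ALL);
}

void draw() {
  kinect.update();
  image(kinect.depthImage(), 0, 0);

  IntVector userList = new IntVector();
  kinect.getUsers(userList);
  if (userList.size() == 0) return;
  int userId = (int) userList.get(0);
  if (!kinect.isTrackingSkeleton(userId)) return;

  // Record the right hand's 3D position (in millimeters) each frame.
  PVector hand = new PVector();
  kinect.getJointPositionSkeleton(userId, SimpleOpenNI.SKEL_RIGHT_HAND, hand);
  handTrail.add(hand);
  if (handTrail.size() > WINDOW) {
    handTrail.remove(0);
    if (isDiagonalSwipe()) {
      println("Diagonal right-hand swipe!");  // later: trigger a specific playlist
      handTrail.clear();                      // avoid re-firing every frame
    }
  }
}

// "Diagonal" = the hand travelled at least MIN_TRAVEL both horizontally and
// vertically between the oldest and newest samples in the window.
boolean isDiagonalSwipe() {
  PVector start = handTrail.get(0);
  PVector end = handTrail.get(handTrail.size() - 1);
  return abs(end.x - start.x) > MIN_TRAVEL && abs(end.y - start.y) > MIN_TRAVEL;
}

// Same "Psi" calibration callbacks as the tracking sketch above.
void onNewUser(int userId) { kinect.startPoseDetection("Psi", userId); }
void onStartPose(String pose, int userId) {
  kinect.stopPoseDetection(userId);
  kinect.requestCalibrationSkeleton(userId, true);
}
void onEndCalibration(int userId, boolean successful) {
  if (successful) kinect.startTrackingSkeleton(userId);
  else kinect.startPoseDetection("Psi", userId);
}
```

A real version would also need to check the direction of travel and the joint-confidence values, which ties into the accuracy problem I mention below.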
PS: I really like Processing as a beginner's tool for Computer Vision and Kinect projects. There is OpenCV for Processing as well, but at this point I am trying to decide on the algorithm for gesture recognition from the skeletal information. In my next post, I will describe what Samantha and I are doing for the music player and how we will deal with the accuracy of the skeletal information.