Sudeep's Blog

Disorganized Thoughts in Organized Manner

Kaggle - Whale Detection Challenge

2013-04-08

Today marks the end of Kaggle's MarineExplore Whale Detection challenge. The challenge, simply stated, is this: you are given a set of 2-second .aiff sound files. Some contain sounds from one species of whale, while others contain other ambient noises in the sea (possibly including sounds from different species of whale). The dataset consists of a training set with 0/1 labels (30000 samples) and an unlabelled test set (54503 samples). The task was to predict the presence of the relevant species of whale in each test clip.

Like many, my initial approach was to read the .aiff files and directly use sound frequencies from the file as features. This approach is enough to break into the 0.90 AUC (area under the curve) range. Some of the most successful submissions, however, treat the problem as an image-processing problem, using the audio spectrogram as the relevant feature. Check this forum for more information on these approaches.

Using this approach, I obtained an AUC of 0.96016, good for a respectable 56th place out of 249 participants. This gives me a (sorta) coveted Top 25% badge on Kaggle. Click here to check out my code on GitHub.

Kaggle competition and ipython notebook

2013-03-22

I recently participated in a basic Kaggle competition arranged by the folks at Scikit. Here is a link to the competition: http://www.kaggle.com/c/data-science-london-scikit-learn

My biggest takeaway from it is the IPython notebook, a cool tool (like an R notebook) for running and documenting your data analysis in the browser. Here is my first IPython notebook: http://nbviewer.ipython.org/url/dl.dropbox.com/u/69791784/ipython%2520notebooks/Expository%2520Analysis.ipynb

Python, sklearn, pandas, numpy/scipy

2013-02-25

I have been playing with Python's machine learning/big data packages and must say that they give R quite a run for its money!

For now, I can offer a step-by-step guide for installing these packages on Mac OS X. Click here

Finally, here is the main page for sklearn and the amazing things it can do. Go like!

Last Day in Mumbai

2013-01-09

Nothing much to report, except this exciting new course announced on Coursera: Startup Engineering.

Multithreading in C++

2012-12-02

I remember using pthread mutexes from C++ in the ancient past (well, during my undergraduate years). That was my entire exposure to multithreading. Well, that and a little of Java's Thread.

That was till last week. Like everything else, C++ multithreading has been given a boost (pun, eh) by the Boost library. Here, in a nutshell, is how it works:
  • Every boost::thread takes a callable: a function, or an object whose class overloads operator().
  • The thread stores its own copy of the callable, so the object passed to it must be copyable. If it is not, or if the copies need to share state, it's best to have the callable hold a shared_ptr to the real object (or to pass it with boost::ref).
  • The boost::thread object itself is not copyable, but it can be placed in a move-aware container. I found it best to keep pointers to the thread objects in a vector and hand them to a boost::thread_group using its add_thread method. Note that add_thread takes a raw thread* and the group takes ownership of it, so a thread added this way should not also be owned by a shared_ptr.
  • Waiting for threads to finish is achieved by simply calling join on a thread or join_all on a thread_group.
Check this for more on threads.
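
The points above can be sketched in a few lines. To keep the example free of external dependencies, this sketch swaps in C++11's std::thread, which mirrors boost::thread closely, rather than Boost itself. CounterTask and run_workers are hypothetical names made up for illustration. With Boost, the join loop at the end would roughly correspond to join_all on a thread_group.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical callable: a class with operator() overloaded.
// The thread stores a copy of it, so shared state lives behind shared_ptr.
class CounterTask {
public:
    CounterTask(std::shared_ptr<long> total,
                std::shared_ptr<std::mutex> guard,
                int increments)
        : total_(total), guard_(guard), increments_(increments) {}

    void operator()() const {
        for (int i = 0; i < increments_; ++i) {
            std::lock_guard<std::mutex> lock(*guard_);  // protect the shared counter
            ++(*total_);
        }
    }

private:
    std::shared_ptr<long> total_;
    std::shared_ptr<std::mutex> guard_;
    int increments_;
};

// Starts num_workers threads, waits for all of them, returns the final count.
long run_workers(int num_workers, int increments_per_worker) {
    assert(num_workers > 0);
    auto total = std::make_shared<long>(0);
    auto guard = std::make_shared<std::mutex>();

    // Threads are not copyable, but they are movable,
    // so a move-aware container such as std::vector can hold them.
    std::vector<std::thread> workers;
    workers.reserve(num_workers);
    for (int i = 0; i < num_workers; ++i) {
        workers.emplace_back(CounterTask(total, guard, increments_per_worker));
    }

    // Wait for every worker to finish before reading the result.
    for (std::thread& worker : workers) {
        worker.join();
    }
    return *total;
}
```

Calling run_workers(4, 1000) starts four threads that each add 1000 to the shared counter. Because every copy of the task holds the same shared_ptr, all the threads update one counter instead of private copies.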

I Live

2012-11-23

Survived a big storm and a bout of cough/cold after that. 

In the meantime, I learned a thing or two about the factory pattern.

The best resource I can give right now is this Stack Overflow post.

Hurricane Sandy Tweets

2012-10-29



R with big data

2012-10-26

I am currently digging for resources on 'big data with R'.

Here are a few that I have found: