Wednesday, October 13, 2010
I was wondering how long it would take to write a multivariate classifier in Python.
With Python and NumPy, not long. We only need to compute the covariance matrix, its determinant, and its inverse. Even if the covariance matrix is singular, which means it cannot be inverted, you can easily compute the Moore-Penrose pseudo-inverse instead (i.e., numpy.linalg.pinv).
As expected, assuming too much about the data leads to poor classification.
You can find a simple Python program of 75 lines here.
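To make the ingredients above concrete, here is a minimal sketch of such a classifier. It is not the 75-line program linked above. It assumes one Gaussian per class with its own mean and covariance matrix, and the class name GaussianClassifier and its fit/predict methods are names chosen for illustration. When the covariance is singular, the sketch replaces the determinant with the product of the non-zero eigenvalues (a pseudo-determinant), which is an assumption of this example rather than something the original program is known to do.

```python
import numpy as np


class GaussianClassifier:
    """Illustrative sketch: one multivariate Gaussian per class."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            # Covariance of the class samples (rows are observations).
            cov = np.atleast_2d(np.cov(Xc, rowvar=False))
            # The pseudo-inverse also works when cov is singular.
            inv = np.linalg.pinv(cov)
            # Assumption: use the log pseudo-determinant (non-zero
            # eigenvalues only) so a singular cov does not give log(0).
            eig = np.linalg.eigvalsh(cov)
            logdet = np.sum(np.log(eig[eig > 1e-12]))
            log_prior = np.log(len(Xc) / float(len(X)))
            self.params_.append((mean, inv, logdet, log_prior))
        return self

    def predict(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        scores = []
        for mean, inv, logdet, log_prior in self.params_:
            d = X - mean
            # Squared Mahalanobis distance of each row to the class mean.
            maha = np.einsum('ij,jk,ik->i', d, inv, d)
            # Log-density up to a constant shared by all classes.
            scores.append(-0.5 * (maha + logdet) + log_prior)
        return self.classes_[np.argmax(np.array(scores), axis=0)]
```

Usage is fit(X, y) with one sample per row of X, then predict(new_samples). Because every class gets a full covariance matrix, the model is only as good as the Gaussian assumption behind it.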
Sunday, October 10, 2010
Dimensionality reduction is a powerful way to reduce input size, shorten training time, and visualize data.
For example, you can use PCA (Principal Component Analysis), ICA (Independent Component Analysis), or LLE (Locally Linear Embedding) to see how classes group together. You can try it on your own data with a couple of lines of Python.
    import mdp
    import pylab

    # ds is the dataset object loaded elsewhere; ds.data holds the samples
    pca = mdp.pca(ds.data)
    pylab.title("PCA")
    pylab.plot(pca[:,0], pca[:,1], '.')
The figure shows PCA dimensionality reduction applied to a digit dataset. You can find the source code here to see how to do PCA, ICA, or LLE in Python. Unfortunately, ICA doesn't work on our dataset because it fails to converge.
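For readers curious about what the PCA step does, here is a rough NumPy-only sketch. It is not how mdp implements it, and the function name pca_project is made up for this example. It assumes the data has one sample per row, centers it, and projects it onto the leading principal directions obtained from an SVD.

```python
import numpy as np


def pca_project(data, n_components=2):
    """Project samples (one per row) onto the top principal components."""
    X = np.asarray(data, dtype=float)
    # Center each feature so the components describe variance, not the mean.
    centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing variance.
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    return np.dot(centered, Vt[:n_components].T)
```

The first two columns of the result can be plotted just like the mdp output, for example with pylab.plot(proj[:,0], proj[:,1], '.'). Note that the sign of each component is arbitrary, so a plot may come out mirrored compared to another PCA implementation.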
Saturday, October 9, 2010
If you want to watermark a PDF, you can use this simple App Engine service:
This service uses pdfrw (a PDF file manipulation library written by Patrick Maupin) and ReportLab. pdfrw is much faster than pyPdf for watermarking.