I am a French engineer and PhD student in machine learning. I wanted to apply data mining to the medical field in order to bring meaning to what I do, so I chose to do my PhD on the automatic diagnosis and prognosis of neurodegenerative diseases at AramisLab. I also enjoy signal and image processing, which I studied during my engineering studies at INSA Rouen.
I love reading and taking pictures, and I am always eager to travel and discover new places and cultures. That is why I spent a year as an exchange student in the US in 2011, and I would enjoy having another international experience after my PhD.
What I do
I am currently doing my PhD in the AramisLab team at the Brain and Spine Institute (ICM) in Paris, supervised by Didier Dormont and Stanley Durrleman.
I am working on building an automatic diagnosis and prognosis system for neurodegenerative diseases using longitudinal data, that is, repeated measurements taken at different time points for each individual. I rely on statistics and machine learning methods, such as kernel density estimation.
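To give an idea of the kind of tool involved, a one-dimensional Gaussian kernel density estimate fits in a few lines of NumPy. This is a toy sketch, not my actual pipeline; the sample data and bandwidth below are made up for illustration:

```python
import numpy as np

def gaussian_kde(samples, x, bandwidth):
    """Evaluate a Gaussian kernel density estimate at points x.

    density(x) = (1 / (n * h)) * sum_i K((x - s_i) / h),
    where K is the standard normal density and h the bandwidth.
    """
    samples = np.asarray(samples, dtype=float)
    u = (x[:, None] - samples[None, :]) / bandwidth  # shape (len(x), n)
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (len(samples) * bandwidth)

# Toy example: repeated measurements clustered around two values.
obs = np.array([1.0, 1.2, 0.9, 3.0, 3.1])
grid = np.linspace(0.0, 4.0, 9)
density = gaussian_kde(obs, grid, bandwidth=0.5)
```

The bandwidth controls the smoothness of the estimate: too small and the density overfits each observation, too large and the two clusters blur into one.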
As part of my engineering studies, I spent a year working on an end-of-studies project. I was part of an 8-member team working 22 hours a week with the Scrum methodology. Our client was Libon, a VoIP application by Orange Vallée. We studied the different applications of graph processing and data mining to social networks, and we developed a program to detect communities in the network formed by Libon users, using Scala, Spark and GraphX.
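The Spark/GraphX code belonged to the project, but the idea behind one simple community-detection approach, label propagation, can be sketched in plain Python. This is a toy illustration with a made-up graph, not the Libon code:

```python
from collections import Counter

def label_propagation(adjacency, n_iter=10):
    """Naive synchronous label propagation.

    Each node starts in its own community, then repeatedly adopts the
    most common label among its neighbours until nothing changes.
    """
    labels = {node: node for node in adjacency}
    for _ in range(n_iter):
        new_labels = {}
        for node, neighbours in adjacency.items():
            if neighbours:
                counts = Counter(labels[n] for n in neighbours)
                # Break ties deterministically: highest count, then lowest label.
                new_labels[node] = min(counts, key=lambda l: (-counts[l], l))
            else:
                new_labels[node] = labels[node]
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Two triangles joined by a single edge: two communities emerge.
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
communities = label_propagation(graph)
```

GraphX ships a distributed variant of this same idea; the sequential version above only serves to show the mechanism.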
In the summer of 2015, I did my specialization internship at Creative Data, a French start-up in Rouen. I wrote a Scala program to automate retrieving and cleaning open data from the French platform data.gouv and saving it to HDFS and Hive tables. I also evaluated different prediction tools for the company, and worked in Python on a Kaggle competition whose goal was to identify hand movements from electroencephalograms. Finally, I followed a sales-forecasting project and wrote its unit tests in R.
In 2014, I was the president of AJIR, the Junior-Entreprise of INSA Rouen. A Junior-Entreprise is a student-run association whose goal is to carry out projects for companies. As president, I managed a team of 10 students, set the strategy of the association, met with clients and supervised scientific projects. I was also an auditor for the CNJE (the French confederation of Junior-Entreprises), auditing other Junior-Entreprises across France to help protect the label and advise them in their development.
As part of my summer internship at Creative Data in 2015, I worked on a Kaggle competition whose goal was to identify different hand movements based on electroencephalograms. I wrote several Python scripts to tackle this problem.
I first tried some classic machine learning algorithms, such as logistic regression, SVMs and random forests. I then tried neural networks, which performed significantly better on my validation set: a dense neural network made of two dense–dropout layer pairs, and a convolutional neural network made of a convolution layer, a max-pooling layer and a dense layer. I used a weighted mean of the scores predicted by these two networks to compute my final result.
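The final combination step is just a weighted mean of the two models' predicted class probabilities. Here is a minimal sketch with made-up weights and scores, not my actual predictions:

```python
import numpy as np

def weighted_ensemble(scores_a, scores_b, weight_a=0.5):
    """Weighted mean of two models' predicted probabilities.

    weight_a is the weight of the first model; the second model gets
    1 - weight_a. Both arrays have shape (n_samples, n_classes).
    """
    return weight_a * scores_a + (1.0 - weight_a) * scores_b

# Toy probabilities standing in for the dense and convolutional networks.
dense_scores = np.array([[0.9, 0.1], [0.4, 0.6]])
conv_scores = np.array([[0.7, 0.3], [0.2, 0.8]])
final = weighted_ensemble(dense_scores, conv_scores, weight_a=0.5)
# final → [[0.8, 0.2], [0.3, 0.7]]
```

In practice the weight is itself a hyper-parameter, tuned on the validation set like any other.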
Seeing that one class was over-represented, I tried rebalancing the classes by keeping only 20% of the data from the largest class. This did not work well because too much data was lost, so I left it out of my final solution. I also applied a band-pass filter, as well as a Common Spatial Patterns (CSP) algorithm, which I used to create new features. Finally, I tried reducing the number of features, but it resulted in a greater test error.
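The rebalancing experiment amounted to subsampling the majority class. A minimal sketch of that step, with made-up labels rather than the actual EEG data:

```python
import numpy as np

def subsample_majority(X, y, fraction=0.2, seed=0):
    """Keep only a fraction of the samples of the most frequent class.

    All minority-class samples are kept; returns the rebalanced (X, y).
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    majority = classes[np.argmax(counts)]
    maj_idx = np.flatnonzero(y == majority)
    keep_maj = rng.choice(maj_idx, size=int(len(maj_idx) * fraction),
                          replace=False)
    keep = np.concatenate([np.flatnonzero(y != majority), keep_maj])
    keep.sort()
    return X[keep], y[keep]

# Toy data: class 0 is heavily over-represented.
X = np.arange(100).reshape(100, 1)
y = np.array([0] * 80 + [1] * 20)
X_bal, y_bal = subsample_majority(X, y, fraction=0.2)
# 16 samples of class 0 remain against 20 of class 1.
```

This illustrates the trade-off I ran into: the classes end up balanced, but 64 of the 100 samples are simply thrown away.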
Recognition of 3D point clouds
I worked on a school project whose goal was to implement solutions from the paper "Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes" by Andrew E. Johnson and Martial Hebert. The aim was to identify 3D point clouds representing objects on the road (pedestrians, cars...).
We first practised on only two classes, using a Gaussian-kernel and a linear SVM, and decided to keep the linear SVM for the multi-class algorithm. We used both one-versus-all and one-versus-one SVMs; for the latter we tried two kinds of hyper-parameter validation: one where the parameter was chosen separately for each one-versus-one SVM, and one where the same parameter was shared by all SVMs.
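The two multi-class schemes can be illustrated with any binary classifier. Below is a toy NumPy sketch that uses a least-squares linear scorer as a stand-in for the SVMs we actually trained, on made-up cluster data:

```python
import numpy as np
from itertools import combinations

def fit_binary(X, y_signed):
    """Least-squares linear stand-in for a binary SVM: fits weights so
    that [X, 1] @ w approximates the +/-1 labels."""
    X_aug = np.hstack([X, np.ones((len(X), 1))])  # bias column
    w, *_ = np.linalg.lstsq(X_aug, y_signed, rcond=None)
    return w

def score(X, w):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def one_vs_all(X, y, X_test):
    """One classifier per class (class c vs the rest); predict by argmax."""
    classes = np.unique(y)
    scores = np.column_stack([
        score(X_test, fit_binary(X, np.where(y == c, 1.0, -1.0)))
        for c in classes])
    return classes[np.argmax(scores, axis=1)]

def one_vs_one(X, y, X_test):
    """One classifier per pair of classes; predict by majority vote."""
    classes = np.unique(y)
    votes = np.zeros((len(X_test), len(classes)), dtype=int)
    for i, j in combinations(range(len(classes)), 2):
        mask = (y == classes[i]) | (y == classes[j])
        signed = np.where(y[mask] == classes[i], 1.0, -1.0)
        s = score(X_test, fit_binary(X[mask], signed))
        votes[s >= 0, i] += 1
        votes[s < 0, j] += 1
    return classes[np.argmax(votes, axis=1)]

# Three well-separated toy clusters.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])
y = np.repeat([0, 1, 2], 20)
```

For k classes, one-versus-all trains k classifiers while one-versus-one trains k(k-1)/2, which is why validating a separate hyper-parameter per pairwise SVM quickly becomes expensive.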
The choice of features was based on the recommendations of the article we were studying. We used statistics on the intensity of the points, the bounding box, and the scatter-ness, linear-ness and surface-ness attributes. As one class was over-represented compared to the others, we also rebalanced the classes.
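The scatter-ness, linear-ness and surface-ness attributes are commonly defined from the sorted eigenvalues of the covariance matrix of a point neighbourhood. Here is a sketch of one common convention (the exact definitions in our project may have differed slightly), on a made-up planar cloud:

```python
import numpy as np

def saliency_features(points):
    """Shape saliencies from the eigenvalues of the covariance of a 3D
    point cloud, sorted so that l1 >= l2 >= l3.

    One common convention:
      linear-ness  = l1 - l2   (dominant for elongated clouds)
      surface-ness = l2 - l3   (dominant for planar clouds)
      scatter-ness = l3        (dominant for volumetric clouds)
    """
    cov = np.cov(np.asarray(points, dtype=float).T)
    l3, l2, l1 = np.linalg.eigvalsh(cov)  # eigvalsh returns ascending order
    return {"linear": l1 - l2, "surface": l2 - l3, "scatter": l3}

# A nearly flat cloud: surface-ness should dominate.
rng = np.random.default_rng(0)
plane = np.column_stack([
    rng.uniform(-1, 1, 500),
    rng.uniform(-1, 1, 500),
    0.01 * rng.standard_normal(500),  # almost no vertical extent
])
feats = saliency_features(plane)
```

Intuitively, a pedestrian and a car wall produce very different eigenvalue profiles, which is what makes these attributes discriminative.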
I studied data mining in class for three semesters. It is hard to sum up everything we covered during that time and all the algorithms we implemented, but here are some topics:
- Unsupervised data mining: k-means, fuzzy k-means, k-nearest neighbours, hierarchical clustering, PCA;
- Optimisation: unconstrained (gradient descent, Newton's method) and constrained;
- Regression: linear and polynomial regression;
- Classification: SVM, neural networks, random forest, bagging, Bayesian decision, logistic regression;
- Focus on SVM: multi-class SVM, hyper-parameters tuning, kernels;
- Others: Lasso, ridge regression.
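As a small taste of the last item, ridge regression has a closed-form solution that fits in a few lines of NumPy. The data and regularization strength below are made up for illustration:

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X^T X + alpha * I)^(-1) X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Toy data generated from y = 2*x1 - x2 plus a little noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.standard_normal(200)
w = ridge_fit(X, y, alpha=0.1)
# w is close to [2, -1]; increasing alpha shrinks it towards zero.
```

The alpha * I term is the only difference from ordinary least squares: it penalizes large weights, which also makes the linear system well-conditioned even with correlated features.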