I am a graduate engineer (B.Tech) from DPGITM (Maharshi Dayanand University), Gurugram. I'm eager to learn, proactive and meticulous, and I love nothing more than working with people, building great software and solving technical problems. My major areas of interest are mathematics, solving real-world data science problems, software development and contributing to the open-source community.
I have good experience with Machine Learning, Deep Learning and NLP. I have worked with scikit-learn, XGBoost and TensorFlow to solve various real-world classification, regression and clustering problems using Logistic Regression, SVM, Random Forest, K-Means and other techniques. I have posted some of my data science projects here.
I have good experience with web scraping. While working with a startup, I built a scraper using Scrapy, Python and Tesseract (OCR) that stored the scraped data as JSON in DynamoDB.
I did my Summer 2016 internship in Java development at Xavient Information Systems, Noida, where I built a dynamic web application using Java, web technologies and MySQL.
I did my Winter 2017 internship in Big Data and Hadoop at Xavient Information Systems, Noida, where I gained exposure to the Big Data ecosystem and its tooling. I set up a multi-node cluster (CentOS 7) using Hortonworks Data Platform with many other big data services such as Spark, HDFS, ZooKeeper and Kafka. I also worked with Apache Hadoop, Hive, HBase, Apache Kafka, Apache Storm and Apache Spark, and built an end-to-end data ingestion application that supports both batch and real-time processing of data in various formats such as CSV, JSON and XML.
I'm available for remote work if you would like to build something together!
Predict the pickup density of yellow cabs at a given time and location in New York City using Linear Regression, Random Forest, XGBoost, time series forecasting and Fourier transform features.
Classify genetic variations/mutations based on evidence from text-based clinical literature using Logistic Regression, Random Forest, TF-IDF and feature engineering.
Suggest tags based on the content of a question posted on Stack Overflow. Techniques used: Logistic Regression (One-vs-Rest multilabel classifier).
Identify which questions asked on Quora are duplicates of questions that have already been asked. This can be used to instantly provide answers to questions that have already been answered. The task is to predict whether a given pair of questions are duplicates. Techniques used: Logistic Regression, Linear SVM and XGBoost.
Netflix provided a large amount of anonymized rating data and set a prediction-accuracy bar 10% better than what Cinematch achieves on the same training data set. (Accuracy measures how closely predicted movie ratings match subsequent actual ratings.) Techniques used: XGBoost, SVD++.
Identify whether a given file or piece of software is malware. Techniques used: KNN, Logistic Regression, Random Forest and XGBoost.
Build a recommendation engine that suggests similar products (apparel) to a given product using an Amazon.com dataset. Techniques used: VGG-16 CNN, TensorFlow, TF-IDF and average Word2Vec models.
Predict whether a review in the Amazon Fine Food Reviews dataset is positive or negative by training models with many different classification algorithms and comparing their results against each other.
A complete Scrabble game written in Python 3 with the following features: challenge mode, time limit, point limit, multiplayer on a single computer, multiplayer over LAN and playing against the computer.