Education

Boston University

Computer Science & Neuroscience double major, BA | Sep 2012 — Jun 2017


I began as a Neuroscience student and became interested in applications of data and technology, which led me to extend my studies to Computer Science with coursework focused on data science.

Experience

StratoDem Analytics

Data Scientist Intern | Jun 2016 — Aug 2016


I was a summer intern at StratoDem, a real-estate consulting startup that uses demographic data to build predictive solutions for investors and businesses. I worked in a team of analysts, using data-mining tools to build predictive models for real-estate and government demographic data.

During my internship, I developed Stratplotlib, a Python tool that gave the team an easy way to create maps and visualize geospatial data.
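Stratplotlib itself was internal to StratoDem, so the sketch below only illustrates the kind of geospatial aggregation such a mapping tool wraps: binning latitude/longitude points into grid cells before rendering them as a density map (the points and cell size are made up).

```python
from collections import Counter

def grid_bin(points, cell=1.0):
    """Count points per (lat, lon) grid cell; a choropleth or heatmap
    layer is essentially a colored rendering of these counts."""
    counts = Counter()
    for lat, lon in points:
        counts[(int(lat // cell), int(lon // cell))] += 1
    return counts

# toy points roughly around Boston (42.36 N, -71.06 W)
points = [(42.36, -71.06), (42.35, -71.07), (42.90, -71.50), (41.20, -70.10)]
density = grid_bin(points, cell=1.0)
print(density[(42, -72)])  # 3 points fall in the 1-degree cell containing Boston
```

A real tool would hand these counts to a plotting backend; the binning step is the part that turns raw coordinates into something mappable.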

BIDMC, Department of Neuroscience

Research Assistant | Jan 2015 — Jan 2016


This was my first encounter with programming in the workplace. As a research assistant, I was responsible for manually analyzing videos for neuroscience research. I saw room for improvement and wrote Python automations that yielded significant increases in analysis speed.

  • Analyzed and processed eye-tracking video data, identifying features that 
    characterize the spatial attention of research subjects.
  • Took the initiative to write Python scripts that partially automated data
    processing, making analysis 50% faster.
  • Created visualizations that yielded insights on the behavioral patterns of individuals.

Skills

Software and Programming Languages

Python (pandas, numpy, scikit-learn, scipy, NetworkX, matplotlib,
seaborn, Selenium, etc.), SQL, HTML/CSS

Analytics

Natural Language Processing, stream sampling, search with hashing, principal component analysis and dimensionality reduction

Misc.

Agile development, AWS, web scraping, LaTeX, Linux, Git, Jupyter Notebook

Projects

Duplicate document identification in big data

Python, MinHash, reservoir sampling | 2017


Implemented an efficient way to find duplicate book documents in large volumes of data using locality-sensitive hashing and sampling.
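The project's code isn't reproduced here, but the MinHash core of the approach can be sketched as follows (the documents, shingle size, and signature length are illustrative):

```python
import random

def shingles(text, k=3):
    """Break text into overlapping k-character shingles."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def minhash_signature(shingle_set, hash_seeds):
    """One minimum per seeded hash function approximates a random permutation;
    the vector of minima is the document's compact signature."""
    return [min(hash((seed, s)) for s in shingle_set) for seed in hash_seeds]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

random.seed(0)
seeds = [random.randrange(2**32) for _ in range(128)]
doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "the quick brown fox jumped over the lazy dog"
doc3 = "completely different text about databases"
sig1 = minhash_signature(shingles(doc1), seeds)
sig2 = minhash_signature(shingles(doc2), seeds)
sig3 = minhash_signature(shingles(doc3), seeds)
print(estimated_jaccard(sig1, sig2))  # near-duplicates: high estimate
print(estimated_jaccard(sig1, sig3))  # unrelated documents: low estimate
```

Comparing short signatures instead of full shingle sets is what makes deduplication tractable at scale; locality-sensitive hashing then bands these signatures so only likely duplicates are compared at all.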

Amazon natural language processing

Python, NLTK, pandas, TF-IDF, sentiment analysis | 2016


Built a model that predicts a user's rating of a product by analyzing their review and comparing it to the ratings of similar reviews. This was done by building a TF-IDF vector for each user's text and finding its Euclidean distance from different clusters.
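As a rough sketch of that idea, the from-scratch TF-IDF below scores a new review by its Euclidean distance to earlier reviews (the reviews are toy examples, not the project's data):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF vectors over a shared vocabulary (no smoothing, for clarity)."""
    tokenized = [d.split() for d in docs]
    vocab = sorted({w for words in tokenized for w in words})
    n = len(docs)
    idf = {w: math.log(n / sum(1 for words in tokenized if w in words))
           for w in vocab}
    vectors = []
    for words in tokenized:
        tf = Counter(words)
        vectors.append([tf[w] * idf[w] for w in vocab])
    return vectors

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

docs = [
    "great product works perfectly",   # a highly rated review
    "terrible broke after one day",    # a poorly rated review
    "great product works well",        # new review to score
]
vecs = tfidf_vectors(docs)
# the new review sits closer to the highly rated review than to the poor one,
# so its predicted rating would follow the nearer neighbor/cluster
print(euclidean(vecs[2], vecs[0]) < euclidean(vecs[2], vecs[1]))  # True
```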

Classifying the safety state of containers

Agile, Python, Docker, Elasticsearch | 2016


Worked in a team of engineers to develop a tool that finds suspicious code in Docker images. Intended for use in data centers, the tool mines security data from containers and then uses similarity matching to find malicious code snippets in file binaries. (Mentored by IBM)
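The tool itself isn't public, but one simple form of similarity matching over binaries, byte n-gram Jaccard similarity, can be sketched like this (the byte strings are invented stand-ins, not real malware signatures):

```python
def byte_ngrams(data, n=4):
    """Set of overlapping n-byte windows from a binary blob."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def similarity(a, b):
    """Jaccard similarity between two binaries' n-gram sets."""
    grams_a, grams_b = byte_ngrams(a), byte_ngrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)

known_bad = b"\x90\x90\xeb\xfe\x31\xc0\x50\x68"          # hypothetical bad snippet
candidate = b"\x00\x90\x90\xeb\xfe\x31\xc0\x50\x68\x00"  # embeds the snippet
unrelated = b"hello world, this is a benign file"

print(similarity(candidate, known_bad))  # high: shares almost all n-grams
print(similarity(unrelated, known_bad))  # zero: no n-grams in common
```

A file that shares many byte n-grams with a known-bad snippet gets flagged for review; the real pipeline indexed such features in Elasticsearch rather than comparing pairs directly.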

YouTube video prediction

Python, NetworkX, AWS, multiprocessing | 2016


Built a model that predicts the relationship between two YouTube videos (i.e., whether they are linked) by analyzing shortest-path metrics in a constructed YouTube graph. Computations ran in parallel processes on a remote AWS instance.
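The project used NetworkX on a large graph; the toy sketch below shows the underlying shortest-path feature with a plain BFS (the graph and video IDs are made up):

```python
from collections import deque

def shortest_path_length(graph, src, dst):
    """BFS shortest-path length in an unweighted directed graph;
    returns None if dst is unreachable from src."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nbr in graph.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None

# toy "related videos" graph: an edge means one video links to another
videos = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}
# a short path between two videos suggests they are likely to be linked
print(shortest_path_length(videos, "A", "D"))  # 2
```

Features like this distance (computed with NetworkX in the actual project) were fed into the link-prediction model.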

TripAdvisor web scraping

web scraping, Python, Selenium, BeautifulSoup | 2016


Scraped all of Boston's hotel reviews in order to build a model of the factors that influence people's ratings.
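The actual scrape ran Selenium and BeautifulSoup against the live site; as a self-contained stand-in, the sketch below extracts (rating, text) pairs from a static snippet with the standard library's html.parser (the markup is invented, not TripAdvisor's real structure):

```python
from html.parser import HTMLParser

SNIPPET = """
<div class="review"><span class="rating">5</span><p>Loved the rooftop bar</p></div>
<div class="review"><span class="rating">2</span><p>Room was noisy</p></div>
"""

class ReviewParser(HTMLParser):
    """Collects (rating, text) pairs from review markup like the snippet above."""
    def __init__(self):
        super().__init__()
        self.in_rating = False
        self.in_text = False
        self.reviews = []
        self._rating = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "rating":
            self.in_rating = True
        elif tag == "p":
            self.in_text = True

    def handle_data(self, data):
        if self.in_rating:
            self._rating = int(data)
            self.in_rating = False
        elif self.in_text:
            self.reviews.append((self._rating, data))
            self.in_text = False

parser = ReviewParser()
parser.feed(SNIPPET)
print(parser.reviews)  # [(5, 'Loved the rooftop bar'), (2, 'Room was noisy')]
```

Once reviews are structured like this, rating-versus-text modeling becomes a standard data-science problem.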

Yelp text sentiment analysis

Python, sentiment analysis, text processing | 2016


Implemented a way of predicting user ratings by analyzing the number of sentiment-bearing words in their reviews.
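The idea reduces to counting lexicon hits; a minimal sketch, assuming a tiny hand-made sentiment lexicon rather than the one the project used:

```python
POSITIVE = {"great", "amazing", "delicious", "friendly", "love"}
NEGATIVE = {"bad", "awful", "slow", "rude", "bland"}

def sentiment_score(review):
    """Net count of sentiment-bearing words; the sign and magnitude
    predict whether the rating is high or low."""
    words = review.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(sentiment_score("The food was delicious and the staff friendly"))  # 2
print(sentiment_score("Slow service and bland food"))                    # -2
```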

Visualizing StackOverflow

Python, NetworkX, Gephi, web scraping | 2016


Visualized the relationships between programming tools by constructing a network of “tags” attached to user questions on stackoverflow.com, where an edge's weight reflects how often two tags appear together. The goal was to simplify and understand where technologies fit in the grand scheme of things.
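The network construction boils down to counting tag co-occurrences per post; a minimal sketch with made-up posts:

```python
from collections import Counter
from itertools import combinations

# each post contributes one co-occurrence per unordered pair of its tags
posts = [
    ["python", "pandas", "dataframe"],
    ["python", "pandas"],
    ["javascript", "css"],
]
edges = Counter()
for tags in posts:
    for a, b in combinations(sorted(tags), 2):
        edges[(a, b)] += 1

print(edges[("pandas", "python")])  # 2 — the pair appears together in two posts
```

These weighted edges are exactly what gets loaded into NetworkX and Gephi for layout and rendering.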

I expected StackOverflow.com to yield an organic result, given its importance in the tech community.

A single post contributes only a handful of tag connections; aggregating a few million posts produced a much more descriptive visualization of the full network.