arvkevi's CodersRank profile

Kevin Arvai

Washington, United States

Intro

I am a data scientist with experience in clinical genomics. I am also a Python enthusiast and an open-source advocate. My ambition is to create software that leverages data to improve human health.

Experience: Creating machine learning models with scikit-learn and deep learning models using Keras and TensorFlow. Delivering natural language processing solutions for named entity recognition. Collaborating on software projects using git for version control. Optimizing algorithms on high-performance computing platforms. Python, R, Perl, SQL, shell scripting, and cloud computing.

Personal: I developed and currently maintain Imputer, an open-source genome imputation application that performs genotype imputation for members of the non-profit organization, Open Humans. I am active in the open-source community, frequently developing and contributing to open-source repositories on GitHub. Occasionally I compete in machine learning challenges or write kernels on Kaggle (https://www.kaggle.com/kevinarvai/kernels). I also enjoy participating in hackathons and local coding meetups.

Scores & Badges

CodersRank Score: 1,712.3

CodersRank Rank: Top 1%

Based on: 103 events

Top 100 Jupyter Notebook Developer, United States

Top 10 Python Developer, United States

Language overview

Python: 725.9 exp.

Top 0.2% out of 165K (Worldwide)

Top 1% out of 1K (United States)

Jupyter Notebook: 49.3 exp.

Top 14% out of 37K (Worldwide)

Top 16% out of 339 (United States)

Work Experiences

DataRobot Full-time

Jan 2021 - Jun 2021 (5 months)

Washington, United States

Senior Data Scientist, Team Lead

python, data science, sql, pyspark, machine learning, data visualization, nlp, data analytics

GeneDx

5 years 9 months

Gaithersburg, Maryland

Senior Data Scientist

Apr 2018 - Jan 2021 (2 years 9 months)

Developed software to assess the semantic similarity between two or more sets of phenotypes.
https://github.com/genedx/phenopy

Significantly reduced the time genetic counselors spend abstracting patient phenotypes from PDFs by implementing named entity recognition with natural language processing tools.

Developed a calculator that determines the classification of genetic variants given a list of ACMG criteria.

Served as an internal resource for data preprocessing and feature engineering techniques.

python, sql, pandas

Scientist

Apr 2015 - Apr 2018 (3 years)

Developed and implemented machine learning models to support decision making for next-generation sequencing (NGS) assays. Models included gene prioritization in clinical exomes, NGS quality control improvements, and natural language processing for named entity recognition.

Performed study design, programming, and analysis for genetic association studies based on hereditary cancer multi-gene panel testing assays.

Created semantic similarity algorithms that compare phenotypic similarity between cases, diseases, and genes using phenotype terms from the Human Phenotype Ontology.

Validated new bioinformatics software and, when necessary, integrated it into existing pipelines.

machine learning, sklearn, pandas

Quest Diagnostics

6 years 3 months

Chantilly, Virginia

Scientist

Jan 2012 - Apr 2015 (3 years 3 months)

Member of a small research and development team focused on designing new molecular diagnostic assays for the clinical anatomic pathology laboratory. Principal investigator on a study that led to a publication describing the role of stem cells and cancer pathways in tumor progression in colorectal cancer. Methods used in the study included immunohistochemistry, fluorescence in situ hybridization, oligo-SNP arrays, and NGS. Developed bioinformatics workflows for hematopoietic and solid tumor NGS assays. Validated a clinical NGS mutation-calling pipeline with current open-source software tools. Investigated the role of somatic mutations in hematopoietic disorders using multi-gene NGS panels.

excel, data analysis, python, jupyter notebook

Histotechnologist

Jun 2011 - Jan 2012 (7 months)

Performed quality control by microscopic examination of immunohistochemical tissue slides in a high-throughput setting.
Validated and optimized new antibodies on multiple automated staining platforms. Performed company-wide validation of a new automated staining platform.

excel, data analysis

Lab Associate III

Jan 2009 - Jun 2011 (2 years 5 months)

Worked closely with pathologists and clinical directors to validate immunohistochemistry and fluorescence in situ hybridization image analysis algorithms.

excel, data analysis

Portfolio

Imputer

Feb 2018 - Present

I developed and maintain an open-source genome imputation pipeline for Open Humans.

Education

George Mason University

Master of Science (M.S.), Biomathematics, Bioinformatics, and Computational Biology

Jan 2012 - Jan 2014

Udacity

Deep Learning Nanodegree, Artificial Intelligence

Jan 2017 - Jan 2017

Met the nanodegree requirements by successfully creating five unique neural networks written in Python using the TensorFlow library. The projects were drawn from diverse subject areas, providing exposure to different network architectures. Following a rigorous machine learning introduction, the first project tackled the basics of multi-layer perceptrons. Next, I created a convolutional neural network to classify images. The course then shifted focus to text processing for the next two projects, using recurrent neural networks to generate novel TV scripts and translate languages. For the final project, I created a Generative Adversarial Network to generate fake images of human faces.

Indiana University Bloomington

Certificate, Histologic Technology/Histotechnologist

Jan 2011 - Jan 2011

Lycoming College

Bachelor of Science (B.S.), Biology, General

Jan 2004 - Jan 2008

Certificates