Kevin Arvai
Washington, United States

I am a data scientist with experience in clinical genomics. I am also a Python enthusiast and an open-source advocate. My ambition is to create software that leverages data to improve human health.

Experience: Creating machine learning models with scikit-learn and deep learning models with Keras and TensorFlow. Delivering natural language processing solutions for named entity recognition. Collaborating on software projects using git for version control. Optimizing algorithms on high-performance computing platforms. Working with Python, R, Perl, SQL, shell scripting, and cloud computing.

Personal: I developed and currently maintain Imputer, an open-source application that performs genotype imputation for members of the non-profit organization Open Humans. I am active in the open-source community, frequently developing and contributing to open-source repositories on GitHub. Occasionally I compete in machine learning challenges or write kernels on Kaggle. I also enjoy participating in hackathons and local coding meetups.


CodersRank Rank
Top 1%
Based on: Stack Overflow (103 events)
Top 100
Jupyter Notebook
Jupyter Notebook
United States
Top 10
United States

Work history
DataRobot Full-time
Jan 2021 - Jun 2021 (5 months)
Washington, United States
Senior Data Scientist, Team Lead
python, data science, sql, pyspark, machine learning, data visualization, nlp, data analytics
5 years 9 months
Gaithersburg, Maryland
Senior Data Scientist
Apr 2018 - Jan 2021 (2 years 9 months)
Developed software to assess the semantic similarity between two or more sets of phenotypes.

Significantly reduced the time genetic counselors spent abstracting patient phenotypes from PDFs by implementing named entity recognition with natural language processing tools.

Developed a calculator that determines the classification of genetic variants given a list of ACMG criteria.

Served as an internal resource for data preprocessing and feature engineering techniques.
python, sql, pandas
Apr 2015 - Apr 2018 (3 years)
Developed and implemented machine learning models to support decision making for next-generation sequencing (NGS) assays. Models included gene prioritization in clinical exomes, NGS quality control improvements, and natural language processing for named entity recognition.

Performed study design, programming and analysis for genetic association studies from hereditary cancer multi-gene panel testing assays.

Created semantic similarity algorithms that compare phenotypic similarity between cases, diseases, and genes using phenotype terms from the Human Phenotype Ontology.

Validated new bioinformatics software and, when necessary, integrated it into existing pipelines.
machine learning, sklearn, pandas
Quest Diagnostics
6 years 3 months
Chantilly, Va
Jan 2012 - Apr 2015 (3 years 3 months)
Member of a small research and development team focused on designing new molecular diagnostic assays for the clinical anatomic pathology laboratory.

Principal investigator on a study that led to a publication describing the role of stem cells and cancer pathways in tumor progression in colorectal cancer. Methods included immunohistochemistry, fluorescence in situ hybridization, oligo-SNP arrays, and NGS.

Developed bioinformatics workflows for hematopoietic and solid tumor NGS assays.

Validated a clinical NGS mutation-calling pipeline with current open-source software tools.

Investigated the role of somatic mutations in hematopoietic disorders using multi-gene NGS panels.
excel, data analysis, python, jupyter notebook
Jun 2011 - Jan 2012 (7 months)
Performed quality control by microscopic examination of immunohistochemical tissue slides in a high-throughput setting.
Validated and optimized new antibodies on multiple automated staining platforms. Company-wide validation of a new, automated staining platform.
excel, data analysis
Lab Associate III
Jan 2009 - Jun 2011 (2 years 5 months)
Worked closely with pathologists and clinical directors to validate image analysis algorithms for immunohistochemistry and fluorescence in situ hybridization.
excel, data analysis
Projects
Feb 2018 - Present
I developed and maintain an open-source genome imputation pipeline for Open Humans.
Education
George Mason University
Master of Science (M.S.), Biomathematics, Bioinformatics, and Computational Biology
Jan 2012 - Jan 2014
Deep Learning Nanodegree, Artificial Intelligence
Jan 2017 - Jan 2017
Met the nanodegree requirements by creating five neural networks in Python with the TensorFlow library. The projects were drawn from diverse subject areas and provided exposure to different network architectures. Following a rigorous machine learning introduction, the first project covered the basics of multi-layer perceptrons. Next, I created a convolutional neural network to classify images. The course then shifted to text processing for two projects, which used recurrent neural networks to generate novel TV scripts and to translate languages. For the final project, I created a Generative Adversarial Network to generate fake images of human faces.
Indiana University Bloomington
Certificate, Histologic Technology/Histotechnologist
Jan 2011 - Jan 2011
Lycoming College
Bachelor of Science (B.S.), Biology, General
Jan 2004 - Jan 2008
Jun 2011
