arvkevi
Kevin Arvai
Washington, United States

I am a data scientist with experience in clinical genomics. I am also a Python enthusiast and an open-source advocate. My ambition is to create software that leverages data to improve human health.

Experience:
Creating machine learning models with scikit-learn and deep learning models using Keras and TensorFlow.
Delivering natural language processing solutions for named entity recognition.
Collaborating on software projects using Git for version control.
Optimizing algorithms on high-performance computing platforms.
Python, R, Perl, SQL, shell scripting, and cloud computing.

Personal:
I developed and currently maintain Imputer, an open-source application that performs genotype imputation for members of the non-profit organization Open Humans. I am active in the open-source community, frequently developing and contributing to open-source repositories on GitHub. Occasionally I compete in machine learning challenges or write kernels on Kaggle (https://www.kaggle.com/kevinarvai/kernels). I also enjoy participating in hackathons and local coding meetups.

CodersRank Score

1,712.3
CodersRank Rank
Top 1%
Based on:
Stack Overflow: 103 events
Top 100 Jupyter Notebook Developer, United States
Top 10 Python Developer, United States
DataRobot Full-time
Jan 2021 - Jun 2021 (5 months)
Washington, United States
Senior Data Scientist, Team Lead
python, data science, sql, pyspark, machine learning, data visualization, nlp, data analytics
GeneDx
5 years 9 months
Gaithersburg, Maryland
Senior Data Scientist
Apr 2018 - Jan 2021 (2 years 9 months)
Developed software to assess the semantic similarity between two or more sets of phenotypes.
https://github.com/genedx/phenopy

Significantly reduced the time genetic counselors spend abstracting patient phenotypes from PDFs by implementing named entity recognition with natural language processing tools.

Developed a calculator that determines the classification of genetic variants given a list of ACMG criteria.

Served as an internal resource for data preprocessing and feature engineering techniques.
python, sql, pandas
Scientist
Apr 2015 - Apr 2018 (3 years)
Developed and implemented machine learning models to support decision making for next-generation sequencing (NGS) assays. Models included gene prioritization in clinical exomes, NGS quality control improvements, and natural language processing for named entity recognition.

Performed study design, programming and analysis for genetic association studies from hereditary cancer multi-gene panel testing assays.

Created semantic similarity algorithms that compare phenotypic similarity between cases, diseases, and genes using phenotype terms from the Human Phenotype Ontology.

Validated new bioinformatics software and, when necessary, integrated it into existing pipelines.
machine learning, sklearn, pandas
Quest Diagnostics
6 years 3 months
Chantilly, Va
Scientist
Jan 2012 - Apr 2015 (3 years 3 months)
Member of a small research and development team focused on designing new molecular diagnostic assays for the clinical anatomic pathology laboratory. Principal investigator on a study that led to a publication describing the role of stem cells and cancer pathways in tumor progression in colorectal cancer. Methods used in the study included immunohistochemistry, fluorescence in situ hybridization, oligo-SNP arrays, and NGS. Developed bioinformatics workflows for hematopoietic and solid tumor NGS assays. Validated a clinical NGS mutation-calling pipeline with current open-source software tools. Investigated the role of somatic mutations in hematopoietic disorders using multi-gene NGS panels.
excel, data analysis, python, jupyter notebook
Histotechnologist
Jun 2011 - Jan 2012 (7 months)
Performed quality control by microscopic examination of immunohistochemical tissue slides in a high-throughput setting.
Validated and optimized new antibodies on multiple automated staining platforms. Company-wide validation of a new, automated staining platform.
excel, data analysis
Lab Associate III
Jan 2009 - Jun 2011 (2 years 5 months)
Worked closely with pathologists and clinical directors to validate immunohistochemistry and fluorescence in situ hybridization image analysis algorithms.
excel, data analysis