fabiosv
Fabio Valonga
Sao Paulo, Brazil

I am a versatile data professional with a background spanning data engineering, NLP, and web development. With a strong track record of creating and optimizing data pipelines, leading teams, and improving code quality, I thrive on complex challenges. My experience includes working with diverse tech stacks and cloud platforms, ensuring efficient data processing and analytics. I am passionate about delivering data-driven insights and fostering a culture of excellence in every role I take on.

CodersRank Score

This represents current experience, calculated by analyzing the connected repositories. By measuring skills through code, CodersRank builds a ranking that shows how a developer compares to others and what to improve.

928.1
CodersRank Rank
Top 1%
Ruby: Associate Developer
JavaScript: Mid Developer
Python: Mid Developer
Highest experience points: 0 points; 0 activities in the last year

Work Experience
Hvar Consulting Services
Jun 2023 - Apr 2024 (10 months)
Remote
Tech Lead - Data Engineer
Summary:
• Migrating data from the SAP Data Warehouse to Azure Storage Gen2 and the Databricks Lakehouse (Unity Catalog), enabling seamless integration with Power BI reports
• Providing leadership to the team and ensuring the successful execution of projects
• Ingesting SAP tables through Azure Data Factory pipelines and making them available in a star-schema architecture
• Supporting streaming data pipelines utilizing Spark Structured Streaming, Change Data Feed, and Delta Live Tables (see the sketch after this list)
• Exploring view materialization with Spark Streaming for incremental joins
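For illustration, below is a minimal PySpark sketch of that streaming pattern: reading a Delta table's Change Data Feed with Spark Structured Streaming and merging the changes into a downstream table. It is a sketch under assumed names; the source and target tables, business key, and checkpoint path are hypothetical, not the project's actual code.

```python
# Minimal sketch, assuming hypothetical table names and keys; not the actual
# project code. Streams a Delta table's Change Data Feed and merges changes
# into a downstream table (deletes omitted for brevity).
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Stream row-level changes captured by Change Data Feed on the source table
changes = (
    spark.readStream
    .format("delta")
    .option("readChangeFeed", "true")
    .table("bronze.sap_orders")  # hypothetical source table
)

def upsert_batch(batch_df, batch_id):
    # Keep only the latest change per business key within the micro-batch,
    # then drop the CDF metadata columns before merging
    w = Window.partitionBy("order_id").orderBy(F.col("_commit_version").desc())
    latest = (
        batch_df
        .filter(F.col("_change_type") != "update_preimage")
        .withColumn("rn", F.row_number().over(w))
        .filter("rn = 1")
        .drop("rn", "_change_type", "_commit_version", "_commit_timestamp")
    )
    target = DeltaTable.forName(spark, "gold.dim_orders")  # hypothetical target
    (
        target.alias("t")
        .merge(latest.alias("s"), "t.order_id = s.order_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

(
    changes.writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/mnt/checkpoints/dim_orders")
    .start()
)
```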


Day-to-day responsibilities:
• Designing data pipeline architectures to ensure optimal performance
• Assisting the team in overcoming technical challenges and roadblocks
• Mentoring and guiding squad members in their career development
• Defining and promoting code patterns for reusable and maintainable code for Databricks and Data Factory

Improvements/Accomplishments:
• Created a generic framework that supports all data wrangling
• Reduced compute costs by optimizing cluster usage/resources and applying Spark Structured Streaming
• Reduced cloud costs by migrating Data Factory parallelism to Databricks Jobs running Spark Structured Streaming pipelines

Technology Stack:
• Azure Data Factory, Azure Functions, Azure Storage Gen2
• Azure Databricks
Databricks, Azure, PySpark, Unity Catalog
Softensity
Apr 2022 - Jun 2023 (1 year 2 months)
Remote
Data Engineer
Summary:
• Migrating data from RDS to S3 using EMR and Databricks, which involved Spark and Hadoop
• Establishing a data-serving mechanism from S3 through a REST API developed with Node.js and Express
• Creating a robust data pipeline leveraging Delta Lake libraries and Kafka (see the sketch below)
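As a rough illustration of that Kafka pipeline (the production version was written in Scala, as noted under the responsibilities below), here is a PySpark sketch of reading a Kafka topic, deduplicating events, and writing a Delta table to S3. The topic, schema, brokers, and paths are all hypothetical.

```python
# Minimal PySpark sketch, assuming a hypothetical topic, schema, and S3 paths;
# the production pipeline was written in Scala.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StringType, TimestampType

spark = SparkSession.builder.getOrCreate()

schema = (
    StructType()
    .add("event_id", StringType())
    .add("payload", StringType())
    .add("event_time", TimestampType())
)

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical brokers
    .option("subscribe", "orders")                     # hypothetical topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Deduplicate replayed events; including the event-time column lets the
# watermark bound the deduplication state
deduped = (
    events
    .withWatermark("event_time", "1 hour")
    .dropDuplicates(["event_id", "event_time"])
)

(
    deduped.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://bucket/_checkpoints/orders")
    .start("s3://bucket/delta/orders")  # hypothetical target path
)
```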

Day-to-day responsibilities:
• Processed historical data, optimizing performance and efficiency by coalescing thousands of small files and database dumps into a Delta table using batch and stream processing, specifically Databricks Auto Loader (see the sketch after this list)
• Applied intricate business rules to the data, employing PySpark for data wrangling
• Supported an ongoing pipeline, adding new datasets with Scala, reading Kafka topics, applying deduplication, business rules, and saving them to S3
• Facilitated data access via a REST API developed with Node.js and Express
• Pioneered code patterns for reusability and maintainability
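A minimal sketch of the Auto Loader ingestion pattern referenced above, assuming hypothetical S3 paths, file format, and table name; not the project's actual code.

```python
# Minimal sketch, assuming hypothetical S3 paths and table names.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Auto Loader incrementally discovers new small files in the landing path
raw = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3://bucket/_schemas/raw_events")
    .load("s3://bucket/raw/events/")
)

# Land all incoming files in a single Delta table instead of thousands of
# small files scattered across the bucket
(
    raw.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://bucket/_checkpoints/raw_events")
    .trigger(availableNow=True)  # run as an incremental batch
    .toTable("bronze.raw_events")
)
```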

Improvements/Accomplishments:
• Creating a historical pipeline responsible for ingesting more than 1 petabyte of data, demonstrating my capability to handle large-scale data operations
• Developing reports for data quality, aiding in the identification of erroneous or missing data and rule changes
• Supporting the creation of monitoring tools to ensure the health of the data pipeline
• Reducing operational costs through cluster optimization and adhering to clean code practices

Technology Stack:
• Leveraging Java and Python for Blockchain and REST API collectors
• Employing Databricks pipelines powered by PySpark, Delta Lake, and Autoloader
• Crafting EMR pipelines using Spark and Scala
• Managing RDS with PostgreSQL databases
• Serving data through REST API on ECS with Node.js
• Facilitating data access via WebSockets using Python

This role not only showcased my proficiency in data engineering but also highlighted my problem-solving skills, adaptability, and commitment to maintaining data quality and pipeline efficiency.
Node.js, Databricks, AWS, Spark, Python, Ganglia, HDFS, Kafka, Big Data, QuickSight, data visualization, Scala, EMR
Hvar Consulting Services
Dec 2021 - Oct 2023 (1 year 10 months)
Remote
Tech Lead - Data Engineer
(Part-time job)

Summary:
• Migrating data from the SAP Data Warehouse to AWS Redshift or the Databricks Lakehouse, enabling seamless integration with Power BI reports
• Providing leadership to the team and ensuring the successful execution of projects
• Supporting streaming data pipelines utilizing Spark Structured Streaming, Change Data Feed, and Delta Live Tables (see the Delta Live Tables sketch after this list)
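As a rough illustration of the Delta Live Tables piece, here is a minimal Python pipeline definition. It is a sketch that would only run inside a Databricks DLT pipeline, and the table names, landing path, and data expectation are hypothetical.

```python
# Minimal Delta Live Tables sketch, assuming a hypothetical landing path and
# table names; runs only inside a Databricks DLT pipeline.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="SAP extracts landed as a streaming bronze table")
def bronze_sap_orders():
    return (
        spark.readStream  # `spark` is provided by the DLT runtime
        .format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .load("s3://bucket/landing/sap/orders/")  # hypothetical landing path
    )

@dlt.table(comment="Cleaned orders ready for Power BI consumption")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
def silver_orders():
    return (
        dlt.read_stream("bronze_sap_orders")
        .withColumn("ingested_at", F.current_timestamp())
    )
```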


Day-to-day responsibilities:
• Designing data pipeline architectures to ensure optimal performance
• Assisting the team in overcoming technical challenges and roadblocks
• Mentoring and guiding squad members in their career development
• Defining and promoting code patterns for reusable and maintainable code

Improvements/Accomplishments:
• Created a generic framework that supports all data wrangling
• Refactored the framework to run in AWS Glue, AWS EMR, and Azure Databricks
• Reduced compute costs by optimizing cluster usage/resources and applying Spark Structured Streaming

Technology Stack:
• AWS Glue, Redshift, Athena, AWS S3
• Azure Databricks
PySpark, Databricks, Python, AWS, AWS Glue, Redshift, Azure, Data Factory, Amazon S3
