About me

I graduated in Economics from Universidade Federal do Rio Grande do Sul (UFRGS) in June 2022, and I am currently seeking new opportunities in the Data field. Most recently, I worked as a Data Scientist at a data consulting company, where I was involved in Data Science projects for several clients, performing tasks ranging from data cleaning and exploratory analysis to the implementation of Machine Learning models, always aiming to deliver high-quality results. I also have some personal projects, which you can find in the "Data Science Projects" section.

Skills

Programming Languages and Databases

  • Python.
  • SQL for data extraction.
  • PostgreSQL and SQLite Databases.

Statistics and Machine Learning

  • Descriptive Statistics.
  • Regression, Classification, Clustering and Time Series Algorithms (including Neural Networks).
  • Algorithms Performance Metrics.
  • Techniques for Feature Selection, Validation and Hyperparameter Optimization.
  • Dimensionality Reduction with PCA, t-SNE, UMAP and Autoencoders.
  • Machine Learning packages (Scikit-learn, Scipy, Tensorflow, Keras).
  • Data Analysis packages (Pandas).
  • Big Data tools and packages (PySpark, Dask).
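To illustrate the dimensionality reduction item above: PCA projects centered data onto the directions of largest variance, which takes only a few lines with NumPy (in practice, Scikit-learn's PCA handles this and more). The data here is random and purely illustrative:

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its first n_components principal components."""
    X_centered = X - X.mean(axis=0)            # PCA requires centered data
    # SVD of the centered matrix; the rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T    # scores in the reduced space

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))                  # illustrative random data
X_2d = pca(X, n_components=2)
print(X_2d.shape)  # (100, 2)
```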

Data Visualization

  • Matplotlib, Seaborn and Plotly.
  • Power BI, Streamlit, Looker Studio and Metabase.

Software Engineering

  • Web Scraping techniques.
  • Git, GitHub, GitLab.
  • Linux Environment.
  • Flask, Python APIs and Telegram Bots.
  • AWS and Azure basic cloud tools.

Data Science Certifications

Data Science Projects

Creating a Bot that Predicts Rossmann Future Sales

In this project I used Python, Flask and Regression Algorithms to predict sales for Rossmann, a drugstore chain, six weeks in advance. Rossmann's CEO needs these forecasts to determine the best resource allocation for each store's renovation. The final solution is a Telegram Bot that returns a sales prediction for any given available store number and can be accessed from anywhere.

Tools and techniques used:

  • Python, Pandas, Matplotlib, Seaborn and Sklearn.
  • Jupyter Notebook and VSCode.
  • Flask and Python APIs.
  • Render Cloud and Telegram Bot.
  • Git and GitHub.
  • Exploratory Data Analysis (EDA).
  • Techniques for Feature Selection.
  • Regression Algorithms (Linear and Lasso Regression; Random Forest, XGBoost and LGBM Regressors).
  • Cross-Validation Methods, Hyperparameter Optimization and Algorithms Performance Metrics (RMSE, MAE, MAPE, R2).
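The metrics used to compare these regressors (RMSE, MAE, MAPE, R2) are simple to state in code. A small sketch with hypothetical sales figures, not the project's actual data:

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Compute the error metrics used to compare the regressors."""
    error = y_true - y_pred
    rmse = np.sqrt(np.mean(error ** 2))        # penalizes large misses
    mae = np.mean(np.abs(error))               # average absolute miss
    mape = np.mean(np.abs(error / y_true))     # relative error (needs y_true != 0)
    ss_res = np.sum(error ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot                   # share of variance explained
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}

y_true = np.array([5200.0, 4800.0, 6100.0, 5900.0])  # hypothetical weekly sales
y_pred = np.array([5000.0, 5000.0, 6000.0, 6000.0])
print(regression_report(y_true, y_pred))
```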

Customer Loyalty Program for E-commerce

In this project I used Python, Power BI and Clustering Algorithms to create a customer loyalty program for Outleto, a fictitious outlet company that sells its products through an e-commerce platform. Outleto's Marketing Team wishes to launch a loyalty program by dividing the 5,702 customers into clusters, with the best customers placed in a cluster named Insiders. To achieve this goal, the Data Science Team was asked to provide a business report on the clusters, as well as the list of customers who will participate in Insiders. With that report, the Marketing Team can target actions at each cluster in order to increase revenue. The project's final Data Science product is the business report built in Power BI, using Render Cloud and Google Drive.

Tools and techniques used:

  • Python, Pandas, Matplotlib, Seaborn, Sklearn, SciPy and Pandas Profiling.
  • SQL and PostgreSQL.
  • Jupyter Notebook and VSCode.
  • Power BI.
  • Render Cloud and Google Drive.
  • Git and GitHub.
  • Exploratory Data Analysis (EDA).
  • Techniques for Feature Selection.
  • Clustering Algorithms (K-Means, Gaussian Mixture Models, Agglomerative Hierarchical Clustering and DBSCAN).
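At its core, the clustering step groups customers by similarity. A minimal k-means sketch on hypothetical synthetic features (the project itself used Scikit-learn's implementations of the algorithms listed above):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means: assign each point to the nearest centroid, then move centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, then nearest-centroid assignment
        dists = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return labels

# Hypothetical RFM-style customer features: two well-separated spending groups
rng = np.random.default_rng(1)
regular = rng.normal(loc=[10, 100], scale=1.0, size=(50, 2))   # regular customers
insiders = rng.normal(loc=[50, 900], scale=1.0, size=(50, 2))  # "Insiders" candidates
X = np.vstack([regular, insiders])
labels = kmeans(X, k=2)
```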

Creating a Customer Ranking System for an Insurance Company

In this Learning to Rank project I used Python, Flask and Classification Algorithms to create a client ranking system for Insuricare, a fictitious insurance company that wishes to sell a new vehicle insurance product to its clients. The ranking determines which customers to prioritize, given a limit on the number of calls. Insuricare previously selected customers to call at random; with the Data Science solution, a full list of customers sorted by propensity score was provided, and for future clients a spreadsheet is available that returns the sorted propensity score for each requested client. Financially, assuming a very reasonable annual premium, this solution could bring Insuricare up to US$ 25 million annually, around 89% more than Insuricare's original model.

Tools and techniques used:

  • Python, Pandas, Matplotlib, Seaborn and Sklearn.
  • SQL and PostgreSQL.
  • Jupyter Notebook and VSCode.
  • Flask and Python APIs.
  • Render Cloud, Google Sheets and JavaScript.
  • Git and GitHub.
  • Exploratory Data Analysis (EDA).
  • Techniques for Feature Selection.
  • Classification Algorithms (KNN Classifier, Logistic Regression; Random Forest, AdaBoost, CatBoost, XGBoost and LGBM Classifiers).
  • Cross-Validation Methods, Bayesian Optimization with Optuna and Learning to Rank Performance Metrics (Precision at K, Recall at K, Cumulative Gains Curve and Lift Curve).
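The heart of such a ranking system is sorting customers by the classifier's propensity score and measuring how many actual buyers land at the top. A sketch of Precision at K with hypothetical scores and labels:

```python
import numpy as np

def precision_at_k(y_true, scores, k):
    """Share of actual buyers among the top-k customers ranked by score."""
    top_k = np.argsort(scores)[::-1][:k]   # indices of the k highest scores
    return y_true[top_k].mean()

# Hypothetical propensity scores, as a classifier's predict_proba might return
y_true = np.array([1, 0, 1, 1, 0, 0, 0, 1, 0, 0])  # 1 = interested in the insurance
scores = np.array([0.9, 0.2, 0.8, 0.7, 0.1, 0.3, 0.4, 0.6, 0.05, 0.15])

# Calling only the top 4 ranked customers reaches every interested one here
print(precision_at_k(y_true, scores, k=4))  # 1.0
```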

Business Solution for a Real Estate Company

In this insights project I used Python and Streamlit to solve a profit maximization problem for House Rocket, a fictitious real estate company, by suggesting whether each property should or shouldn't be bought and resold. If this feasible strategy were applied, the total profit would be around US$ 473 million, with an average profit of US$ 45 thousand per property.

Tools and techniques used:

  • Python, Pandas, Matplotlib, Plotly and Geopandas.
  • Jupyter Notebook and VSCode.
  • Streamlit and Streamlit Cloud.
  • Git and GitHub.
  • Measures of Central Tendency and Dispersion.
  • Exploratory Data Analysis (EDA).
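A much-simplified version of the kind of buy/don't-buy rule such a project might use: compare each property's price with the median in its region and check its condition. The data, column names and thresholds here are hypothetical:

```python
import pandas as pd

# Hypothetical sample of the property data (zipcode, price, condition)
df = pd.DataFrame({
    "id":        [1, 2, 3, 4, 5, 6],
    "zipcode":   ["98001", "98001", "98001", "98002", "98002", "98002"],
    "price":     [250_000, 310_000, 400_000, 500_000, 620_000, 700_000],
    "condition": [4, 3, 5, 5, 2, 4],
})

# The median price per region is the reference for "cheap"
df["median_zip"] = df.groupby("zipcode")["price"].transform("median")

# Buy when the property is below the regional median and in good condition
df["buy"] = (df["price"] < df["median_zip"]) & (df["condition"] >= 3)

print(df.loc[df["buy"], "id"].tolist())  # [1, 4]
```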

Predicting Next Booking Destinations for Airbnb Users

In this project I used Python, Flask, Streamlit and Classification Algorithms to predict the five most likely countries in which a US-based Airbnb user will make their next booking. The company provided data from over 200 thousand users and required predictions for another 61 thousand users, with 12 possible destination outcomes. The final product is a Streamlit App that displays a table of these 61 thousand users with their respective predictions, as well as graphical analysis of the predictions by age, gender and overall. In addition, a Flask App was built so that when new data comes in, new predictions can be generated with the click of a button and later retrieved by the Streamlit App, since both applications are connected to the same PostgreSQL database.

Tools and techniques used:

  • Python, Pandas, Matplotlib, Seaborn and Sklearn.
  • SQL and PostgreSQL.
  • Jupyter Notebook and VSCode.
  • Flask and Render Cloud.
  • Streamlit.
  • Git and GitHub.
  • Exploratory Data Analysis (EDA).
  • Techniques for Feature Selection.
  • Classification Algorithms (Logistic Regression, Decision Tree, Random Forest, ExtraTrees, AdaBoost, XGBoost and LGBM Classifiers).
  • Cross-Validation Methods, Bayesian Optimization with Optuna and Performance Metrics (NDCG at rank K).
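Ranking metrics like NDCG at rank K score a prediction by how high the true destination appears among the five suggested countries. A sketch for a single hypothetical user whose true next destination is France:

```python
import numpy as np

def ndcg_at_k(ranked_relevance, k):
    """NDCG@k: discounted gain of the predicted ranking vs. the ideal ranking."""
    rel = np.asarray(ranked_relevance, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))  # 1 / log2(rank + 1)
    dcg = (rel * discounts).sum()
    ideal = np.sort(np.asarray(ranked_relevance, dtype=float))[::-1][:k]
    idcg = (ideal * discounts[:ideal.size]).sum()
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical user: the model's top-5 countries, true destination is "FR"
top_5 = ["US", "FR", "IT", "ES", "GB"]
relevance = [1.0 if c == "FR" else 0.0 for c in top_5]  # binary relevance
print(round(ndcg_at_k(relevance, k=5), 3))  # 0.631
```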

ETL Building for an E-commerce Jeans Company

In this data engineering project I used Python, Web Scraping and PostgreSQL to create an ETL process for Star Jeans, a fictitious company. Star Jeans' owners have just started the company and, for now, plan to sell male jeans in the USA through an e-commerce platform. However, this market already has strong competitors, such as H&M, and the owners aren't familiar with this particular segment. Therefore, in order to better understand how this market works, they hired a Data Science/Engineering team to gather information about H&M. The solution is an ETL pipeline that extracts data from H&M's website, cleans it, and saves it to a PostgreSQL database on a weekly basis. The data is then displayed, with filters, in a Streamlit App that Star Jeans' owners can access from anywhere.

Tools and techniques used:

  • Python, Pandas and Beautiful Soup.
  • SQL and PostgreSQL.
  • Jupyter Notebook and VSCode.
  • Web Scraping.
  • ETL Process and Windows Task Scheduler.
  • Streamlit.
  • Git and GitHub.
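The extract-transform-load flow can be sketched end to end on a static HTML snippet. The markup, product names and prices below are hypothetical, and SQLite stands in for PostgreSQL so the example is self-contained:

```python
import sqlite3
from bs4 import BeautifulSoup

# Static snippet standing in for a scraped product page (hypothetical markup)
html = """
<article class="product"><h2>Slim Jeans</h2><span class="price">$ 19.99</span></article>
<article class="product"><h2>Skinny Jeans</h2><span class="price">$ 24.99</span></article>
"""

# Extract: parse product name and raw price out of the HTML
soup = BeautifulSoup(html, "html.parser")
rows = [(a.h2.get_text(), a.find("span", class_="price").get_text())
        for a in soup.find_all("article", class_="product")]

# Transform: clean the price string into a float
rows = [(name, float(price.replace("$", "").strip())) for name, price in rows]

# Load: persist into a database (SQLite here; the project used PostgreSQL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
print(conn.execute("SELECT name, price FROM products").fetchall())
```

In the project described above, a scheduler (Windows Task Scheduler) would run this script weekly against the live site instead of a static snippet.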

Contact

Feel free to get in touch.