Masood Salman Choudhury
Senior Data Engineer & AI Solutions Architect

About
Senior Data Engineer & AI Solutions Architect with 5+ years' experience delivering end-to-end data platforms and intelligent applications across fintech, SaaS, and industrial analytics. Expert in designing scalable data pipelines, building AI-powered systems with LLMs and machine learning models, and deploying robust cloud-native solutions on AWS, Azure, and GCP.
Experience
- Senior Data Engineer & AI Solutions Architect @ Helixiora, The Hague, Netherlands - Remote
Summary:
- Worked as a lead data engineer delivering end-to-end solutions for multiple international clients across fintech, enterprise SaaS, and industrial analytics domains. Responsible for architecting and implementing data platforms, AI solutions, and cloud-based applications.
Responsibilities:
- Designed, built, and deployed Azure data pipelines in Databricks (PySpark, SparkSQL) to ingest large-scale structured and semi-structured datasets from Blob Storage and Azure Cosmos DB, execute complex transformations, and persist curated features in Delta Lake for downstream machine learning workflows
- Led development of an enterprise-grade multi-stage RAG system for clients, combining Python-based data pipelines (Google Drive, Slack) with LLMs, LangChain, a reranker, and the Pinecone vector database to deliver highly accurate, context-aware retrieval workflows
- Designed, built, and deployed an end-to-end Azure data pipeline using App Functions and Cosmos DB, applying Kimball dimensional modeling for Power BI datasets with DAX-based measures, enabling KPI reporting and forecasting
- Led development of a full-stack AI SaaS app (Android and iOS) leveraging OpenAI Assistant API with a FastAPI backend, React Native (Expo) frontend, and PostgreSQL database, including OAuth 2.0 authentication and integrated Stripe payment processing
- Designed and implemented end-to-end CI/CD pipelines for fully automated deployment of containerized applications to AWS ECS using Docker, Docker Compose, and Terraform, following infrastructure-as-code best practices
- Secured and deployed SaaS applications with SSL, reverse proxies, Zero Trust controls, and firewall rules, optimizing load balancing for high availability, and set up logging and monitoring with Prometheus and Grafana to ensure system reliability and observability
- Provided technical mentorship and conducted code reviews for junior and mid-level engineers, promoting best practices, improving code quality, and accelerating team growth
- Python
- Databricks
- PySpark
- Delta Lake
- LangChain
- Pinecone
- React Native
- PostgreSQL
- MySQL
- Docker
- AWS
- Azure
- Terraform
- Git
- Prometheus
- Grafana
- SSL
- Nginx
- Senior Data Engineer @ Vaultoro, London, United Kingdom - On-site
Summary:
- Architected complete data warehouse solutions and led development of scalable ETL pipelines for financial data processing.
Responsibilities:
- Architected a Kimball-style star schema data warehouse using Elasticsearch and BigQuery, enabling real-time KPI dashboards in Kibana and empowering data-driven decision-making for stakeholders
- Led the design and implementation of multiple scalable ETL pipelines with Python, GCP Dataflow, and Scrapy, orchestrated with Apache Airflow, processing over 10 million rows of financial data daily for real-time analytics
- Co-led the agile development of a secure, scalable Savings Platform using FastAPI, delivering the product in 3 months; the platform currently manages $2M+ in monthly customer deposits
- Managed and optimized databases including MongoDB, PostgreSQL, Elasticsearch, and BigQuery by tuning queries, indexes, and partitions, resulting in significant performance improvements
- Conducted deep data analysis with Pandas to detect anomalies and identify potential fraud patterns, enhancing platform security
- Automated data validation workflows using Python scripts, ensuring pipeline integrity and achieving 100% uptime for critical microservices
- Deployed containerized applications with Docker and Kubernetes, ensuring high availability, scalability, and streamlined CI/CD operations
- Developed and implemented a Random Forest classification model to identify high-value clients during signup, enabling personalized onboarding experiences
- Configured and monitored Google Analytics and Tag Manager dashboards to track user behavior and support data-driven marketing strategies
- Python
- Elasticsearch
- BigQuery
- Kibana
- GCP
- FastAPI
- MongoDB
- PostgreSQL
- Apache Airflow
- Pandas
- Docker
- Kubernetes
- Data Analyst @ SoftCrop IT, Guwahati, India - On-site
Summary:
- Delivered actionable insights via Tableau dashboards and automated data collection processes.
Responsibilities:
- Delivered actionable insights via Tableau dashboards (waterfall/cohort analysis), improving stakeholder decision-making
- Automated competitor data scraping (Scrapy) and ETL into MySQL, reducing manual effort by 50%
- Analyzed sales and geographic data to guide strategic fibre network expansion
- Python
- Tableau
- MySQL
- Scrapy
- Pandas
Projects
A comprehensive collection of production-ready Docker Compose stacks for self-hosted services, including monitoring, databases, home automation, and infrastructure management tools for a personal homelab environment.
- 🐳 Designed and deployed 15+ production-ready Docker Compose stacks for self-hosted services
- 📊 Implemented comprehensive monitoring stack with Prometheus, Grafana, cAdvisor, and Node Exporter for infrastructure visibility
- 🔍 Built Elasticsearch stack for log aggregation and search capabilities with Kibana visualization
- 🏠 Integrated Home Assistant for smart home automation and IoT device management
- 💾 Configured MySQL and database services with optimized Docker configurations for data persistence
- 🚀 Deployed Portainer for container management and orchestration with streamlined deployment workflows
- ⚡ Implemented performance testing tools (LibreSpeed, OpenSpeedTest) for network and system benchmarking
A Python script that automatically tags anime episodes as filler or canon by scraping animefillerlist.com and renames the files with that metadata for better organization and viewing.
- 🔍 Built automated web scraper to extract episode metadata from animefillerlist.com for accurate filler/canon classification
- 📁 Implemented intelligent file renaming system that preserves quality tags while adding filler/canon metadata
- ⚙️ Developed configurable system supporting multiple anime series with customizable quality tags and file paths
- 🎯 Created smart episode detection algorithm to handle various filename formats and episode numbering schemes
- 📊 Automated metadata integration that enhances media library organization and viewing experience
- 🚀 Streamlined workflow for anime enthusiasts to efficiently manage large episode collections
A computer vision-based hand gesture recognition system using TensorFlow and OpenCV for real-time gaming control, enabling hands-free game interaction through custom-trained machine learning models.
- 🤖 Built custom hand gesture detection model using TensorFlow transfer learning with SSD MobileNet V2 FPNLite architecture
- 📸 Developed automated image collection pipeline for custom hand gesture dataset creation and labeling
- 🎯 Implemented real-time hand gesture recognition from video feed for gaming applications
- 🎮 Created practical gaming integration demonstrated with Chrome Dino game control
- 🔬 Applied transfer learning techniques to optimize model performance for specific gesture recognition tasks
- 📊 Structured project workflow with Jupyter notebooks for data collection, training, and detection phases
A Python-based automation tool that downloads and applies the latest Nvidia DLSS and DLSS Frame Generation DLLs to local games, ensuring optimal gaming performance and visual quality.
- 🔧 Built automated DLL management system for Nvidia DLSS and Frame Generation updates
- 🌐 Integrated with TechPowerUp.com API for real-time DLL version monitoring and downloads
- 📁 Implemented intelligent file discovery to locate DLSS DLLs across multiple game directories
- 💾 Created automated backup system with timestamped naming for safe DLL rollback
- ⚙️ Developed configurable system supporting multiple game library paths and server locations
- 🚀 Packaged as executable with PyInstaller for easy distribution and deployment
- ⭐ Achieved 10+ stars on GitHub demonstrating community recognition and utility
A Python-based web scraping application that monitors Gameloot.in for PC component stock changes and sends real-time Telegram notifications when products become available or go out of stock.
- 🕷️ Built automated web scraper using BeautifulSoup and Python for real-time PC component monitoring
- 📊 Implemented MongoDB integration for persistent storage and smart deduplication of products
- 🤖 Developed Telegram bot integration for instant stock change notifications
- 🔍 Implemented change detection algorithms to identify new products, restocked items, and sold-out products
- 📝 Added a comprehensive logging system with configurable levels for monitoring and debugging
- 🚀 Automated deployment with error handling and retry mechanisms for robust operation
MSc project at the University of Liverpool predicting ship delays with 80% accuracy using machine learning techniques.
- 🎯 Predicted ship delays 10+ days in advance with 80% accuracy
- 🔧 Cleaned and engineered features using Pandas, NumPy, and Pearson correlation
- 🤖 Evaluated multiple ML models (SVM, DT, RF, NN) and selected Random Forest
- 🚀 Deployed the model with FastAPI for real-time predictions
Personal self-learning project on diabetes prediction using machine learning techniques.
- 📊 Performed exploratory data analysis (EDA) with Seaborn and UMAP visualization
- ⚙️ Tuned XGBoost hyperparameters with Optuna and MLflow
- 🌐 Deployed the model with the Streamlit framework
Education
University of Liverpool
Asian Institute of Management and Technology
Certificates
Skills
- Python
- Databricks
- PySpark
- Delta Lake
- LangChain
- Pinecone
- React Native
- PostgreSQL
- MySQL
- Docker
- AWS
- Azure
- GCP
- Terraform
- Git
- Elasticsearch
- BigQuery
- FastAPI
- MongoDB
- Apache Airflow
- Pandas
- Kubernetes
- Tableau
- Scrapy
- Prometheus
- Grafana
- Nginx