ARULRAJ J

Senior Data Engineer & Architect

About Me

Data Scientist and Engineer with deep expertise in designing, building, and optimizing scalable, AI-driven data solutions. Proficient in leveraging Generative AI, cloud analytics platforms, and modern stream processing architectures to develop high-performance data pipelines.

I specialize in integrating AI into complex data engineering workflows, enriching datasets, and orchestrating centralized data hubs that drive strategic business impact.

Work Experience

Senior Associate - Data Solutions @ QBrainX

November 2024 - Present
  • Contributed to the development of a Large Language Model (LLM) project, applying modern NLP techniques to language processing tasks.
  • Built and tuned Vertex AI training pipelines for scalable, efficient model training.
  • Integrated stream processing workflows and machine learning solutions into enterprise cloud infrastructure.

Data Scientist / Engineer @ Avvenire Technologies

October 2019 - December 2023
  • Applied generative AI techniques using Azure Machine Learning to modernize data-driven decision-making processes.
  • Leveraged Databricks and cloud platforms to ingest, transform, and manage structured, semi-structured, and unstructured big data.
  • Developed deep learning models for outcome forecasting, along with predictive analysis and sentiment analysis pipelines.
  • Designed domain-specific data models, optimizing complex analytics workflows to deliver significant operational business impact.

Data Analyst @ Avvenire Technologies

January 2015 - September 2019
  • Developed data and pattern mining algorithms to extract vital insights from expansive and complex datasets.
  • Used the Hadoop MapReduce ecosystem to process large volumes of raw data efficiently.
  • Conducted comprehensive analysis to surface anomalies, trends, and business patterns to stakeholders.

Key Architectural Projects

  • Statewide Central Data Hub (DMS): Engineered a modular data management solution acting as the central data hub for 15 state agencies. Ensured reliable data availability for essential back-office functions including HR, procurement, and facilities.
  • Streaming Data Integration Platform: Designed an optimized stream processing architecture utilizing Apache Kafka in tandem with Snowpipe to ensure low-latency ingestion into centralized Snowflake environments.
  • Enterprise Analytics Dashboarding: Led technical discovery and documentation for extensive BI initiatives. Established rigorous data contracts and ingestion pathways, moving raw telemetry (GA4) reliably into curated silver tables for downstream Power BI semantic modeling.

Technical Skills

Data Engineering & Cloud

GCP / Vertex AI, Azure ML / Synapse, Snowflake, BigQuery, Apache Spark, Apache Kafka, ETL Pipelines, Databricks

Machine Learning & AI

LLM / GenAI, Deep Neural Networks, scikit-learn, PySpark, NLP, Training Pipelines

Analysis & Languages

Python, SQL, DAX, Power BI, Matplotlib / Seaborn, Predictive Analysis