OPEN TO · DATA ENGINEER · FULL STACK AGENTIC AI ENGINEER

Puja Ankitha Ivaturi

I'm a Data Engineer and Full Stack Agentic AI Engineer with experience building enterprise data platforms and production AI systems across Finance, Healthcare, and Social Networking. I currently own a $40B mortgage lakehouse with 5,000+ tables across 42 databases, from ingestion and modeling all the way to governed Gold marts and Power BI semantic layers.

Beyond pipelines, I've shipped 4 production AI agents (document intelligence over loan portfolios, real-time clinical dictation, RAG over 2M+ patient records, and an interview-prep voice agent), delivering each end-to-end across data, AI backend, and React frontend.

4
Industries
$40B
Lakehouse
30+
Projects
4
AI Agents
5,000+
Tables · 42 DBs
5 Agents
AI/GenAI Systems
35+ ETL
Production Pipelines
STORY GALLERY

Where Data Meets AI at Scale

A gallery of the systems, deployments, outcomes, and AI layers behind the portfolio.

Lakehouse architecture, ETL pipelines, lineage, and quality controls in one working view.

Career

Where I've Built

Banking · Federal Mortgage

Software Engineer – Data & Agentic AI

Farmer Mac · Jul 2025 – Present

Owns the $40B mortgage data platform end-to-end: Medallion Lakehouse, AI document intelligence, financial dashboards, and governance.

1. $40B Lakehouse: 5,000+ tables / 42 databases, 99.9% accuracy
2. 35+ production ETL/ELT pipelines (batch + streaming + CDC)
3. Loan Document Intelligence AI app (full-stack: React + Flask + RAG)
4. Bloomberg, Moody's, Reuters DB migrations
5. Financial dashboards + Power BI semantic models
6. Data quality framework + Dev/Test/UAT + IAM/RBAC
🎓
MS Computer Science
UT Arlington · GPA 3.9/4.0 · Dec 2023
🎓
BTech Computer Science
Ramachandra College · GPA 3.7/4.0 · Jul 2021
Portfolio

Things I've Shipped

Enterprise $40B Mortgage Lakehouse
Data Engineering
$40B · Enterprise · Banking

Farmer Mac · Medallion Architecture
5,000+
Tables
42
Databases
$40B
Portfolio
<5 min
Reporting
35+ Production ETL/ELT Pipelines
Data Engineering · Pipelines
ETL · ELT · CDC · Streaming

Farmer Mac & Portfolio
35+
Pipelines
Batch+Stream
Both Modes
1TB+
Data Moved
99.9%
Accuracy
100+ SAS → PySpark Modernization
Data Engineering · Modernization
PySpark · Legacy → Cloud

Multi-company · Code Migration
100+
Scripts Converted
~60%
Perf Gain
Cloud
Native
Zero
Data Loss
60+ EMR/EHR Client Migrations
Data Engineering · Healthcare
HIPAA · CDC · Migration

Aesthetic Record · Healthcare Data
60+
Clients
2M+
Patient Records
6,000
Client DBs
HIPAA
Compliant
Financial & Healthcare Dashboards
Data Engineering · Analytics
Power BI · Finance · Healthcare

Farmer Mac & Aesthetic Record · BI
10+
Dashboards
2 Domains
Finance+Health
Semantic
Models
Sub-second
Refresh
Bloomberg, Moody's & Reuters Migrations
Data Engineering · Finance
Bloomberg · Moody's · Reuters

Farmer Mac · Financial Data
3
Major Vendors
Tier-1
Financial Data
Real-time
Feeds
<5 sec
Queries
Data Quality & Governance Framework
Data Engineering · Governance
Quality · Lineage · IAM

Farmer Mac · Enterprise Governance
Automated
DQ Checks
Lineage
Tracked
Multi-tier
Validation
Zero
Silent Failures
Dev / Test / UAT Environments & IAM
Data Engineering · DevOps
Dev · Test · UAT · Terraform

Farmer Mac · Enterprise DevOps
3
Environments
Terraform
Provisioned
IAM
Enforced
Zero
Prod Incidents
Loan Document Intelligence App
Agentic AI · Full Stack
GenAI · RAG · React

Farmer Mac · AI + Full Stack
100+
PDFs/batch
RAG
Architecture
Full Stack
End-to-End
<1hr
vs 2 weeks
AI Dictation & Telehealth Notes Tool
Agentic AI · Healthcare
Voice AI · GenAI · Telehealth

Aesthetic Record · Clinical AI
Real-time
Transcription
Voice AI
Powered
Auto
SOAP Notes
~60%
Time Saved
Interview Preparation AI Agent
Agentic AI · Full Stack
GenAI · Voice AI · React

Ogha Inc · Agentic System
RAG
Powered
Voice AI
Enabled
Full Stack
Delivered
Real-time
Coaching
Marketing AI Agent
Agentic AI · Automation
LLM · n8n · GenAI

Ogha Inc · Content Automation
LLM
Powered
n8n
Orchestrated
Multi-channel
Output
~80%
Time Saved
Stack

Tools & Technologies

The full stack I use to build enterprise data platforms and AI systems.

AZURE
Data Factory · Databricks · Synapse Analytics · ADLS Gen2 · Event Hubs · Azure OpenAI · Functions · AI Foundry · Azure SQL · Active Directory
AWS
Glue · EMR · Redshift · Kinesis · DMS · Bedrock · Lambda · S3 · SNS/SQS · QuickSight
GCP
Dataflow · BigQuery · Cloud Storage · Pub/Sub · GKE · Cloud Functions
DATABASES
Snowflake · PostgreSQL · MySQL · SQL Server · Oracle · MongoDB · Redis · Delta Lake · Apache Parquet · Cassandra
PIPELINE
Apache Kafka · Apache Spark · PySpark · Apache Airflow · dbt · Great Expectations · Delta MERGE · CDC / DMS
DEVOPS
Terraform · Docker · Kubernetes · CI/CD · Git · GitHub Actions · IAM / RBAC · Data Lineage
AI & LLM
LangChain · OpenAI GPT-4o · Azure OpenAI · AWS Bedrock · Whisper · AssemblyAI · RAG · Vector DBs · n8n · Embeddings
LANGUAGES
Python · PySpark · SQL / T-SQL · JavaScript · React · Flask · Scala · Bash
Publications & Writing

Articles, Posts & Research

Selected writing on data engineering, cloud platforms, agentic AI systems, and peer-reviewed research.

ARTICLE · Medium

The MOLT Ecosystem: MoltBot, MoltBook, MoltHub & OpenClaw

Deep dive into the MOLT product ecosystem: bots, knowledge hub, and OpenClaw architecture.

POST · LinkedIn

Data Loading Techniques in Modern Data Engineering

Practical patterns for batch, streaming, CDC, and incremental data loading across ETL workflows in AWS.

POST · LinkedIn

Cloud Data Engineering Tools: AWS, GCP & Azure with Real-time Use Cases

Comparing AWS, GCP, and Azure data engineering stacks with concrete real-time scenarios.

RESEARCH · IJEAT Journal

Research Paper: IJEAT (Vol. 9, Issue 2)

Peer-reviewed research published in the International Journal of Engineering and Advanced Technology.

POST · LinkedIn

Medallion Lakehouse Architecture: Bronze, Silver, Gold Explained

Why the Medallion pattern wins for enterprise lakehouses, and how to lay out the zones in practice.

POST · LinkedIn

SCD Type 2 & Delta MERGE at Scale

Hands-on take on slowly changing dimensions and idempotent MERGE on Delta Lake across 5,000+ tables.

POST · LinkedIn

From SAS to PySpark: A Migration Playbook

Field notes from converting 100+ legacy SAS scripts into distributed PySpark with roughly 60% performance gains.

POST · LinkedIn

Building RAG Pipelines That Actually Work in Production

Real-world RAG over 5,000+ lakehouse tables: chunking, retrieval, grounding, and evaluation.

POST · LinkedIn

Voice AI for Telehealth: Whisper + LLMs for SOAP Notes

How real-time transcription + structured prompting cut clinical documentation time by 60%.

POST · LinkedIn

n8n + LLMs: Orchestrating Marketing AI Agents

A pragmatic recipe for chaining LLM calls with n8n to ship multi-channel marketing automation.

POST · LinkedIn

Data Quality at Lakehouse Scale: Great Expectations Patterns

Schema, null, referential, and range checks promoted between Bronze, Silver, and Gold.

POST · LinkedIn

IAM, RBAC & HIPAA in Cloud Data Platforms

How least-privilege, audit logging, and field-level encryption come together for healthcare data.

Interactive

Talk to My Resume

AI assistant trained on Puja's complete background.

Recruiter mode:
DATA ENGINEER
Puja is a Data Engineer with experience building enterprise cloud data platforms. She owns a $40B Medallion Lakehouse with 5,000+ tables across 42 databases, migrated Thomson Reuters, Moody's, and Bloomberg databases, built 35+ production ETL/ELT pipelines (batch, streaming, and CDC), and converted 100+ SAS programs to PySpark. She has shipped 1TB+ migrations, 2M+ patient record loads, and sub-5-minute reporting.
AGENTIC AI / GENAI
Puja ships production AI agents. She's built 5 complete systems: an Interview Prep Agent with RAG + voice AI, a Marketing Agent with LLM + n8n, an AI Dictation tool for telehealth (60% doc time saved), a Loan Document Intelligence app processing 100+ PDFs, and 1 more agentic system at Ogha. All were built full-stack: data pipeline, AI backend, and React frontend.
FULL STACK + DATA
Puja brings rare full-stack + data + AI depth. She builds React frontends, Flask backends, and REST APIs, and connects them to intelligent data pipelines and LLM agents. At Ogha and Farmer Mac she delivered complete systems end-to-end, from Kafka streams and PySpark to Azure OpenAI APIs and interactive React UIs.
Contact

Let's Connect

Open to Data Engineer, Agentic AI, and ML Engineering roles, remote or hybrid.

EMAIL
pujaankitha.uta@gmail.com
LINKEDIN
linkedin.com/in/pujai
PHONE
682-699-1060
PREFERRED ROLES
Data Engineer (Primary)
Agentic AI / ML Engineer
GenAI / LLM Engineer
Full Stack + Data Engineer