Emerging Technologies – Page 19 – C4: Container, Code, Cloud & Context

Edge AI with ONNX Runtime: Running Models On-Device

Posted on January 10, 2025 by Nithin Mohan TK · 6 min read

Last year, I deployed an AI model to a mobile device. The first attempt failed: the model was too large, inference was too slow, and battery drain was unacceptable. After optimizing 15+ models for edge deployment using ONNX Runtime, I’ve learned what works. Here’s the complete guide to running AI models on-device with ONNX Runtime. […]

Read more →

Vector Database Comparison: Pinecone vs Weaviate vs Qdrant vs Chroma – Choosing the Right One for Your RAG Application

Posted on January 9, 2025 by Nithin Mohan TK · 4 min read

Last March, a 3 AM alert changed everything. Our Pinecone bill had tripled overnight, and I spent the next three months migrating between vector databases, learning hard lessons about what actually matters. Let me share what I discovered, and what I wish someone had told me. […]

Figure 1: Comprehensive comparison of vector database options

Read more →

Embracing the DevSecOps Landscape in Azure: A Comprehensive Guide

Posted on January 7, 2025 by Nithin Mohan TK · 5 min read

Introduction: The world of software development is continuously evolving, and one of the key drivers of this evolution is the need for speed, agility, and security. The DevSecOps approach is gaining traction because it integrates security practices into the DevOps pipeline, ensuring that applications are developed and deployed in a secure and compliant manner. Microsoft […]

Read more →

RAG Optimization: Query Rewriting, Hybrid Search, and Re-ranking

Posted on January 1, 2025 by Nithin Mohan TK · 9 min read

Introduction: Retrieval-Augmented Generation (RAG) grounds LLM responses in factual data, but naive implementations often retrieve irrelevant content or miss important information. Optimizing RAG requires attention to every stage: query understanding, retrieval strategies, re-ranking, and context integration. This guide covers practical optimization techniques: query rewriting and expansion, hybrid search combining dense and sparse retrieval, re-ranking with […]

Read more →

LLM Routing and Model Selection: Optimizing Cost and Quality in Production

Posted on December 24, 2024 by Nithin Mohan TK · 9 min read

Introduction: Not every query needs GPT-4. Routing simple questions to cheaper, faster models while reserving expensive models for complex tasks can cut costs by 70% or more without sacrificing quality. Smart LLM routing is the difference between a $10,000/month AI bill and a $3,000 one. This guide covers implementing intelligent model selection: classifying query complexity, […]

Read more →

Designing Enterprise VPC Networks on Google Cloud: From Zero Trust to Global Scale

Posted on December 20, 2024 by Nithin Mohan TK · 11 min read

Enterprise VPC design on Google Cloud requires balancing security, performance, and operational simplicity. This comprehensive guide covers Zero Trust architecture, global network design, VPC Service Controls, and hybrid connectivity patterns that meet the demands of modern enterprise workloads. Zero Trust Network Architecture: Zero Trust assumes no implicit trust; every access request must be authenticated and authorized […]

Read more →
