About
I started in silicon. The early part of my career was spent designing ASICs at startups (packet processing engines, storage switches, Ethernet controllers), working from micro-architecture through tapeout, post-silicon validation, and production bring-up. That foundation still shapes how I think: build systems from the physics up, treat constraints as real, and remember that timing and power matter.
From there I moved into leading the teams that build those systems. At 3PAR I managed the ASIC development org through two chip generations and an acquisition by HP. At HPE I ran the platform software organization: ASIC drivers, BIOS, and Linux kernel, with 30+ engineers across the storage stack. A stint at Amazon showed me how software organizations operate at hyperscale. Then back to startups, where the pace is different.
At Flex Logix I ran the compiler and solutions teams for an inference accelerator — connecting the hardware architecture to the ML models running on it. At ZaiNar I built a 30-engineer organization from the ground up spanning cloud, edge, algorithms, and UI — and drove a system that goes from RF signal at the antenna to asset position on a map, deployed in production.
AI was a thread long before it was practical. I was writing neural networks in LISP at Purdue in the mid-90s, when they were an academic curiosity. Over the past two years I have built a production multi-agent development platform at ZaiNar: specialized agents operating in distinct roles (code generation, dependency management, automated testing, and live system monitoring), coordinated as a development team. This is not AI tooling adoption; it is systems architecture applied to agent orchestration. The result is an organization that ships at a scope and velocity that previously required far more headcount.
What I've learned is that the scarce thing now isn't depth — agents can go deep. The scarce thing is someone who can direct across domains, knows enough of each to catch when the output is wrong, and can architect the system that spans them. That's the role I've been building toward my whole career without knowing it had a name.
Experience
ZaiNar · Belmont, CA
Built and led a 30-engineer organization spanning cloud platform, UI/backend, edge/embedded, algorithms, and SQA — delivering a full indoor location SaaS from RF signal to asset position on a map.
- Organized software group into five functional teams: cloud, UI/backend, edge, algorithms, SQA
- Re-architected cloud from Lambda microservices to modular monolith (NestJS on ECS) — improved velocity and product alignment
- Deployed location SaaS on AWS with data pipelines, Apache Flink streaming, authentication, and geofencing for asset tracking
- Built a system of AI agents operating as a development team — shipped a full company portal with MCP integrations (Jira, Confluence, Slack, Athena) and agentic infrastructure monitoring in ~30 days
- On joining, conducted a detailed memory movement analysis of the 900 MHz edge product and refactored it for efficiency; this was one of the first technical contributions, made while standing up the organization
- Rewrote DSP algorithms in C++ and deployed them to i.MX6 edge hardware
- Built .dae → USD → Mitsuba XML → GPU pipeline for Sionna RF simulation; used output to improve LOS classification and annotate USD for physical AI ingestion
Flex Logix Technologies · Mountain View, CA
Managed three teams totaling 20 engineers — Compiler, DevOps, and Solutions Architecture — for an eFPGA/inference accelerator company. Hands-on across compiler infrastructure, driver development, and customer-facing tooling. Role concluded when Flex Logix discontinued its inferencing product line and significantly reduced headcount in October 2022.
- Developed the Thunderbolt driver enabling PCIe-over-Thunderbolt attach for the inference accelerator — the primary customer attach path for the development platform
- Developed compiler infrastructure to multiplex compile and place-and-route jobs across the toolchain; containerized for cloud deployment
- Led team to build a vision ML-Ops pipeline; expanded model runtime support from 2 to 30 models including YOLOv3/YOLOv4 CNN operator optimization
- Built operator kernel configurations targeting eFPGA RTL macros — Add, CONCAT, convolutional layers, ReLU/leaky ReLU activations, and Winograd convolution optimizations for the inference accelerator
- Developed a model validation UI integrating with Roboflow and other vision SDK containers for loading, running, and validating inference results on device
- Oversaw GPU cluster deployment and runtime environment scaling; developed CNN integrity check tooling
Amazon Web Services · Cupertino, CA
Led a 12-person firmware, infrastructure, and DevOps team developing AWS's custom in-house BMC stack — built to deprecate IPMI and the legacy backend console network across Amazon's entire data center fleet.
- Developed a custom BMC stack on Amazon Linux Carbon OS, networked to servers via the onboard 100Gb NIC; data center operators access it via Secure CoAP through the top-of-rack router — replacing the legacy IPMI console network entirely
- Led the security ingestion initiative: personally made companion FPGA modifications that enforced the security posture during server ingestion, replacing vendor BMC firmware with the custom AWS stack at hardware intake across all new server types
- Developed data center tooling to replace IPMI management tools, built systems for test rack management, and developed hardware QA testing plans for the data center
- Built microservices and Lambdas for hardware installation tracking, firmware update orchestration, and fleet health monitoring; integrated with security and data center teams
Hewlett Packard Enterprise (3PAR) · San Jose, CA
After Gen6 ASIC completion in 2015, stepped into software leadership during the massive usermode re-architecture — formally took the Software Section Manager role in 2017 and drove the project to product release in 2019. HPE Primera shipped entirely in usermode.
- Led the architectural transition made possible by the Gen6 ASIC: moved 3PAR's monolithic kernel-mode storage application to a fully usermode application with userspace drivers and direct processor memory access to the NUMA cache; HPE Primera shipped in 2019 as a fully usermode product
- Served as architecture decision-maker across driver interfaces, kernel integration, memory access patterns, and release architecture — grounded in direct ownership of the hardware those layers ran on
- Established development process, code review standards, CI/CD pipeline, and release discipline for a 30+ engineer organization across four functionally distinct teams
- Set IP ownership policy and served as internal technical authority on novel implementation versus vendor-supplied components
- Managed test strategy and release process through the HPE Primera hardware launch, coordinating firmware, software, and hardware teams under a fixed delivery schedule
- Served as final escalation point for customer-impacting issues — hardware background enabled root-cause analysis that crossed the ASIC/driver boundary
Hewlett Packard Enterprise (3PAR) · Fremont, CA
Led 18-person ASIC development and validation team through two full chip generations (Gen5, Gen6) while driving a decade-long architectural transformation — from an ASIC that held the storage cache locally in DDR3 to a lean ordered-semantics engine over unified processor memory. Co-inventor on three patents across both generations.
- Gen5: Two ASICs per node controller, each with 2x DDR3 interfaces (up to 256GB cache); PCIe Gen1/Gen2 x8 to both processor complex (3 links) and backplane (4 links); drove the NUMA synchronization work captured in US8244930
- Gen6: Led architectural redesign to 7 ASICs per node — one per backplane link — enabling a fully connected 8-node storage mesh; cache relocated entirely to unified processor memory; each ASIC: PCIe Gen3 x8 to processor memory + proprietary backplane at PCIe Gen3 equivalent bandwidth
- Gen6 DMA engine was a full micro-architecture overhaul — virtualized DMA pipeline scaling from 8 to 256 parallel engines, a significant departure from the Gen4/Gen5 approach; MSI-X interrupt mechanism and flush barriers retained for ordering correctness
- Co-inventor on US10467162 — Interrupt based on a last interrupt request indicator and a work acknowledgement — the interrupt ordering mechanism that was the key to safely migrating the storage cache from ASIC-local DDR3 to processor memory
- Co-inventor on US20180373653 — Commitment of acknowledged data in response to request to commit — write commitment ordering mechanism complementing the interrupt architecture
- Co-inventor on US8244930 — Mechanisms for synchronizing data transfers between non-uniform memory architecture computers — NUMA data transfer synchronization from the Gen4/Gen5 era
- Drove most of the architectural decisions across both generations while building and growing the team; owned bring-up and verification plans. Gen6 early integration used Xilinx development kits with a scaled-down Gen6 ASIC (1-lane Gen1 PCIe and backplane, 2 DMA engines vs. 256); when production silicon arrived, the full system was integrated and fully functional within a week
- Gen6 completed 2015: zero ECOs, zero respins — the culmination of LEC/ECO discipline and architectural rigor built across four prior chip generations; completion opened the path to leading the usermode re-architecture of the software stack
- Served as primary architecture decision-maker at the hardware/software boundary — defined interfaces and memory access patterns that shaped the entire software stack above
- Managed customer escalations at the ASIC/platform interface — root-caused hardware bugs under production pressure, translating field symptoms to chip-level root cause
- Negotiated silicon vendor contracts with Open Silicon and Broadcom
- Led team through 3PAR IPO and HP acquisition, maintaining continuity and delivery through significant organizational change
3PAR / Hewlett Packard Enterprise · Fremont, CA
Joined 3PAR as ASIC Design Lead in 2005 — the start of a 14-year tenure through the company's IPO, HP acquisition, and two subsequent management roles. Led design and implementation of the Gen4 ASIC: the chip that held the storage cache in ASIC-local DDR3 and connected the backplane, Fibre Channel, and SAS line cards through a single fabric.
- Designed the application layer for backplane and PCIe host connect — established the architectural foundation that Gen5 carried forward
- Gen4 was the DDR3-cache era: ASIC as the central fabric holding the storage cache locally and connecting all front-end (FC, SAS) and backplane traffic — the architecture Gen5 and Gen6 would progressively transform
- Set up and ran synthesis and formal verification for Gen4 ASIC
- Owned the system management FPGA — implemented SPI, I2C, and other low-level serial interfaces along with a management CPU interface, providing the system-level control plane for the storage platform
- Applied the LEC/ECO automation practice to Gen4 — implemented gate/wire-level ECOs across the development cycle using formal equivalence flows built at Aarohi and Mistletoe
- Built ASIC-specific scripts and hooks for in-system performance diagnostics in Perl
Mistletoe Technologies (later Gigafin Networks) · San Jose, CA
Owned the PCI-X host interface and TCP/IP input buffer parsing on a Gigabit Ethernet ASIC through tapeout and bring-up.
- Designed and owned the complete PCI-X host interface implementation
- Implemented TCP/IP packet input buffer parsing for the packet processing pipeline
- Designed an Ethernet PHY converter FPGA on a Lattice Semiconductor device
- Owned formal boolean equivalence checking (LEC/Conformal) and ECO automation: refined the practice started at Aarohi, implementing and verifying hundreds to thousands of gate/wire-level changes with automated flows; became the go-to engineer for late-stage netlist changes
- Delivered through a complete ASIC tapeout and successful silicon bring-up
- Co-inventor on patent application US20070019661 — Packet output buffer for semantic processor
Aarohi Communications · San Jose, CA
Took over micro-architecture and RTL design of the input buffer for a Fibre Channel switch ASIC through tapeout and bring-up. Company later acquired by Emulex.
- Owned micro-architecture and RTL design of the input buffer parser for Fibre Channel and Fibre Channel over Ethernet
- Designed buffer credit manager for the FC switch fabric
- Developed expertise in formal boolean equivalence checking (LEC/Conformal) and wrote automation to implement gate/wire-level ECOs, applied to several hundred to a thousand gate/wire changes during development; a practice that carried through four subsequent ASIC generations
- Delivered through a complete ASIC tapeout and successful silicon bring-up for the Aarohi FC switch
Caspian Networks · San Jose, CA
Designed core components of a 10Gb IP packet processor ASIC — including the central forwarding lookup engine — and built the complete verification environment through one tapeout and bring-up.
- Designed the ingress and egress packet processing pipeline on a high-speed IP packet processor ASIC
- Designed an inline data structure processor to parse TCP/IP and Ethernet header fields into a hash key for 'flow block' memory lookup — the core forwarding decision engine
- Built the complete verification environment using Perl socket interfaces to Verilog for end-to-end packet processing validation
- Designed an MPLS conversion FPGA on an Altera device as part of the packet processing pipeline for label switching
- Carried the design through one full tapeout and silicon bring-up
iReady Corporation · San Jose, CA
First role after graduation. Built the verification and post-silicon validation environment for a hardware TCP/IP silicon stack, then deployed to the field to integrate it into a commercial wireless product.
- Developed the complete verification environment for a hardware TCP/IP stack including TCP/IP, ARP, and UDP protocol engines
- Built regression suite, C behavioral models, and Verilog testbench infrastructure
- Created Perl socket interface to the Verilog simulator enabling real-time protocol injection and checking
- Developed post-silicon validation environment for hardware operation in both simulation and real time
- Deployed to the field to integrate iReady's TCP/IP stack FPGA with Sony's Airboard base station — a wireless internet appliance with a dial-up uplink distributing content over wireless Ethernet to handheld display devices; one of the first consumer wireless internet products
Cisco Systems · San Jose, CA
Continuous internship throughout college on the ASIC tools team, building internal infrastructure for monitoring, coordinating, and supporting the ASIC design flow across engineering teams.
- Developed internal web-based tools and company sites for tracking and monitoring ASIC tool usage across design teams
- Built and maintained mailing lists and discussion forums for ASIC tool users
- Designed and implemented a real-time monitoring system for ASIC backend and simulation processes dispatched to compute clusters
- Worked continuously from summer 1996 through graduation in December 1998
Patents
Interrupt based on a last interrupt request indicator and a work acknowledgement
US10467162 · Granted Nov 2019 · Hewlett Packard Enterprise
Co-inventor with Gregory Lee Dykema and Michael T. Longenbach. The interrupt ordering mechanism that enabled safe migration of the 3PAR storage cache from ASIC-local DDR3 to unified processor memory — ensures correct ordering semantics when work completion signals are generated after cache becomes processor-attached.
Commitment of acknowledged data in response to request to commit
US20180373653 · Application filed Jun 2017 · Hewlett Packard Enterprise
Co-inventor with Gregory Lee Dykema, Siamak Nazari, and Michael T. Longenbach. Write commitment ordering mechanism complementing the interrupt architecture; together, this application and US10467162 capture the ordering IP at the core of the Gen6 ASIC architectural transition.
Mechanisms for synchronizing data transfers between non-uniform memory architecture computers
US8244930 · Granted Aug 2012 · 3PAR / Hewlett-Packard
Co-inventor with Greg L. Dykema and David H. Bassett. NUMA data transfer synchronization from the Gen4/Gen5 dual-ASIC architecture — when the node controller used two ASICs per node with ASIC-local DDR3 storage cache.
Packet output buffer for semantic processor
US20070019661 · Application filed Jul 2005 · Gigafin Networks
Co-inventor with Kevin Rowett, Rajesh Nair, and Caveh Jalali. Packet output buffer architecture for the Gigabit Ethernet ASIC semantic processor.