Conducted compute-disturb analysis and proposed mitigation techniques for 6T SRAM-based digital IMC
Designed the IMC architecture, including the decoder and peripheral circuits
Developed compute circuits for bit-serial addition, multiplication, and MAC operations
Designed a compute-enabled sense amplifier robust to process variation
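For readers unfamiliar with the bit-serial addition mentioned above, the following is a minimal behavioral sketch in Python. It assumes unsigned operands processed LSB first with a one-bit carry held between steps. The function name bit_serial_add is hypothetical, and the sketch illustrates only the arithmetic, not the actual circuit implementation.

```python
def bit_serial_add(a, b, width):
    """Add two unsigned integers one bit per step, LSB first.

    Returns (sum modulo 2**width, carry_out). Illustrative only:
    this models the arithmetic, not any specific hardware design.
    """
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1                      # bit i of the first operand
        y = (b >> i) & 1                      # bit i of the second operand
        s = x ^ y ^ carry                     # full-adder sum bit
        carry = (x & y) | (carry & (x ^ y))   # full-adder carry-out
        result |= s << i
    return result, carry
```

Each loop iteration corresponds to one compute step on a single bit position, which is why bit-serial latency grows linearly with operand width.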
Analog In-Memory Computing in 6T SRAM
Duration: Dec 2020 - Present
Highlights:
Proposed a column-major analog IMC for multi-bit multiplication with minimal changes to peripheral circuits
Developed a compute-disturb free IMC architecture with high dynamic range and resilience to process variations
Identified wordline degradation as a major source of non-linearity in analog and digital IMC
Implemented four wordline shaping techniques to improve access time and reduce inconsistency in bitline discharge
Proposed a novel process-variation tracking method that compensates for variation in analog IMC
Simulator Design for Digital IMC
Duration: May 2019 - Present
Highlights:
Designed the i-CACTI simulator for application-level analysis of digital IMC
Built the FastMem simulator for fast and accurate SRAM modeling, integrated with digital IMC compute logic for both bit-serial and bit-parallel computation
Built NeuroCACTI-IMC to calculate performance metrics for in-memory computation in an SRAM-based memory subsystem, achieving an 8x improvement in energy-delay product (EDP) for BPA and BSA by modifying the memory array structure
Automatic Analog Circuit Design Tool using Machine Learning
Duration: Aug 2020 - Present
Highlights:
Utilized machine learning to model technology parameters for automatic analog circuit design
Created a design space of circuits (common-source amplifier and two-stage operational amplifier) using the machine learning model
Employed Bayesian optimization to find the optimum design within the design space
Implemented an active-learning plugin to generate datasets when none were readily available
Developed SRAM Memory Compiler
Duration: Feb 2020 - Sep 2020
Highlights:
Analyzed the performance metrics of SRAM arrays, ranging from 8B to 4KB, generated by the OpenRAM compiler in NCSU 45nm
Ported the OpenRAM memory compiler to UMC 65nm and UMC 28nm at both schematic and layout levels, performing a detailed analysis across multiple technologies
Optimized the performance of the complete memory subsystem by properly sizing transistors in custom library cells and implementing circuit-level changes
Developed VosTrOF: A Unified Tool for Design Space Exploration of Approximate Arithmetic Circuits
Duration: Mar 2019 - May 2020
Highlights:
Designed VosTrOF, a tool that unifies four approximation strategies (Truncation, Overclocking, Voltage Overscaling, and Functional Approximation) for adders, multipliers, and dividers, enabling holistic analysis and design space exploration
Applied VosTrOF to eight image-processing applications to demonstrate optimized designs
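As a small illustration of one of the four strategies named above (truncation), the sketch below shows an adder that ignores the k least-significant bits of each operand. This is a generic textbook form of truncation, not VosTrOF's implementation, and the function name truncated_add is hypothetical.

```python
def truncated_add(a, b, k):
    """Approximate unsigned addition that drops the k LSBs of each operand.

    Illustrative assumption: truncation is modeled by masking the low
    k bits to zero before an exact add.
    """
    mask = ~((1 << k) - 1)          # clears the k least-significant bits
    return (a & mask) + (b & mask)


def truncation_error(a, b, k):
    """Absolute error of the truncated sum relative to the exact sum."""
    return (a + b) - truncated_add(a, b, k)
```

Increasing k shrinks the adder at the cost of a larger worst-case error, which is the kind of trade-off a design space exploration would sweep.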
Compression and Quantization of Convolutional Neural Networks
Duration: May 2019 - Jul 2019
Highlights:
Surveyed compression and quantization techniques that reduce the memory and compute requirements of neural networks during inference
Implemented logarithmic quantization in Python for various neural networks and analyzed its effects
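For context, logarithmic quantization maps each weight to a signed power of two, so that multiplications can be replaced by shifts. The following is a minimal sketch of the general idea, not the project's code. The function name log2_quantize and the exponent range (min_exp, max_exp) are assumptions chosen for illustration.

```python
import math


def log2_quantize(w, min_exp=-8, max_exp=0):
    """Quantize a weight to a signed power of two.

    The exponent is rounded in the log2 domain and clamped to
    [min_exp, max_exp] (an assumed range). Zero stays zero; very
    small nonzero weights are clamped to 2**min_exp rather than
    flushed to zero.
    """
    if w == 0:
        return 0.0
    exp = round(math.log2(abs(w)))
    exp = max(min_exp, min(max_exp, exp))
    return math.copysign(2.0 ** exp, w)


def quantize_all(weights, min_exp=-8, max_exp=0):
    """Apply log2_quantize element-wise to a list of weights."""
    return [log2_quantize(w, min_exp, max_exp) for w in weights]
```

For example, under these assumptions a weight of 0.3 maps to 0.25 and a weight of -0.7 maps to -0.5.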
A Deep Learning based Automatic Image Colorization Framework
Duration: Feb 2019 - Apr 2019
Highlights:
Used deep learning to colorize grayscale images
Utilized class rebalancing at training time to increase the diversity of colors in the output image
Implemented the system as a feed-forward pass through a convolutional neural network at test time, trained on the ImageNet dataset
Xilinx Vivado Automation Using Python
Duration: Dec 2018 - Jan 2019
Highlights:
Automated Xilinx Vivado's synthesis and simulation flow using Python, Bash, and Tcl
Designed a wrapper to simulate and synthesize a large number of RTL files and collect the resulting metrics in a single file
Heterogeneous SRAM Cell Sizing for Low-Power Image Processing Applications
Duration: Oct 2018 - Nov 2018
Highlights:
Designed an 8x8 array of 6T SRAM cells in which the 8 cells in a row have heterogeneous sizes
Simulated read/write operations with the help of peripheral circuitry in Cadence Virtuoso using UMC 65nm technology
Developed a dynamic-programming algorithm that finds the optimum sizes for heterogeneous SRAM cells 1000x faster than the proposed algorithm