Machine Learning Engineer, AIML
Convert AIML research into real-world use cases. Implement ML models in Apple frameworks and release sample ML applications.
ML framework development on macOS and iOS. Swift API design.
Released iOS sample app: Counting Human Body Actions in a Live Video Feed.
Released iOS sample app: Detect Hand Poses with Vision.
Released iOS sample app: Detecting Human Actions in a Live Video Feed.
Released iOS sample app: Build a Feature-Rich App for Sports Analysis.
WWDC 2022 project DRI (human body action repetition counting): What's New in Create ML. Compose Advanced Models with Create ML Components.
WWDC 2021 project DRI (hand gesture recognition): Classify Hand Poses and Actions with Create ML.
WWDC 2020 project DRI and speaker (human body pose and action recognition): Build an Action Classifier with Create ML.
Apple Worldwide Developers Conference (WWDC) 2019 project DRI and speaker (activity recognition with motion sensors and Apple Watch): Building Activity Classification Models in Create ML.
Apple SpotlightOn Software University 2019 speaker: Data Manipulation and Model Creation powered by Create ML.
Performance and Power Software Engineer, CoreOS
Apply machine learning to systems (System ML) and develop predictive energy technologies.
OS kernel intelligent resource management: system energy efficiency, performance, power, and battery life.
(US Patent) Using software to prevent unexpected iPhone shutdowns in cold weather and with aged batteries: https://support.apple.com/en-us/HT208387.
Parallel Computing Lab (PCL) Software Engineering Intern, 2014.
Focused on System ML, the Linux kernel, system modeling, and system control.
Project areas:
Design and implementation of a machine-learning-based power management governor in the Linux kernel (the SMART governor) that learns and predicts the most energy-efficient operating point for each core.
Analytical modeling of Intel CPU performance and power across different types of compute-bound and memory-bound workloads and benchmarks, and across different types of on-chip voltage regulators.
Parallel Computing Lab (PCL) Software Engineering Intern, 2015.
Focused on applied ML in systems, and system architecture innovation.
Project areas:
Design of a "Shadow Core" architecture that offloads memory-bound workload threads from the main CPU cores to a "Shadow Core" accelerator to improve overall performance and energy efficiency.
Design of a machine-learning-based, energy-efficient mechanism that learns and detects workload characteristics and transparently migrates threads to the most suitable heterogeneous cores, including CPU, GPU, and accelerator cores.
Parallel Computing Lab (PCL), 2015
Project: SMART Linux Governor
• Developed a Linux kernel module, the SMART Linux governor, for online power management in the operating system. Its intelligence comes from a machine learning model that is trained offline with Caffe and takes online performance counters as inputs.
Project: Voltage Regulator Aware Performance & Power Modeling
• Constructed a performance and power model that integrates voltage regulator parameters into DVFS power management. The model can guide more energy-efficient workload optimizations under different types of voltage regulators, as well as better voltage regulator designs under different types of workloads.
Parallel Computing Lab (PCL), 2014
Project: Shadow Core Architecture
Project: Intelligent Power Management
• Designed machine-learning-based and heuristics-based intelligent power management mechanisms that detect workload characteristics and transparently migrate memory-bound threads onto the Shadow Core, a hidden, specialized memory core optimized for energy efficiency.
Computer Systems Architecture Lab, 2010 - 2016
Project: Voltage Regulator Aware Power Management
• Developed a reinforcement-learning-based online DVFS policy and a hybrid framework based on linear voltage regulators (LDOs) for more energy-efficient power management in multi-core processors.
Project: Near-Threshold Computing
• Designed an aggressively fast, deeply pipelined microprocessor in MOS Current-Mode Logic (MCML) for near-threshold computing. This architecture mitigates the static power limitation of MCML and results in 3x energy, 4x speed, and 10x noise improvement over CMOS multi-core processors.
Project: MOS Current Mode Logic (MCML) Microprocessors
• Designed a low-noise (40×), high-frequency (13 GHz at 22 nm), energy-efficient (1.6× over an eight-core static CMOS processor) single-core processor in MOS Current Mode Logic (MCML) for throughput-demanding parallel applications. The static power is compensated by a low-cost deep pipelining scheme: C-slow retiming and interleaved multithreading.
Project: 3D Memory Architecture in GPUs
• Designed a 3D memory architecture for frame buffers in GPUs to speed up multi-dimensional memory access (especially z-buffer processing) with negligible hardware overhead. The number of memory misses is significantly reduced by a flexible banking and memory remapping scheme.
Project: Ternary Content Addressable Memory (TCAM)
• Designed a PCM-based resistive ternary content addressable memory (TCAM) accelerator that improves performance (4×) and reduces power (10×) for data-intensive applications. It eliminates the data movement overhead of conventional RAM-based systems by processing data directly on the TCAM chip.
Advanced System-on-Chip (SoC) and Integrated System Research Center, 2009-2010
Project: Task Clustering and Mapping on Network-on-Chips (NoCs)
• Designed an algorithm to reduce communication overhead among tasks on homogeneous NoCs. It heuristically groups dependent tasks into the same cluster, duplicates a task when it is shared by multiple clusters, and maps independent clusters onto different processor nodes.
Department of Electrical and Computer Engineering, 2010-2011
Teaching Assistant for Introduction to Signals & Circuits, 2010
Teaching Assistant for Circuits and Signals, 2011
• Supported faculty instructors by independently leading labs and recitations and by providing individual and small-group instruction to students. Reinforced topics covered in lecture and lab, with an emphasis on the fundamentals of circuit theory, signal processing, and electronic devices.
ISCA (The International Symposium on Computer Architecture), 2012 - 2016.
MICRO (IEEE/ACM International Symposium on Microarchitecture), 2011 - 2016.
HPCA (The IEEE International Symposium on High-Performance Computer Architecture), 2018, 2019.
Supercomputing (The International Conference on Supercomputing), 2018.
ICPP-EMS (The International Workshop on Embedded Multicore Systems), 2017, 2018, 2019.
Computing Frontiers (ACM International Conference on Computing Frontiers), 2018.
IISWC (The IEEE International Symposium on Workload Characterization), 2018.
International Journal of Electronics and Communications, 2017.
Journal of Signal Processing, 2017.
TCAD (IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems), 2017, 2018.
ACM Transactions on Design Automation of Electronic Systems, 2017, 2018.
TC (IEEE Transactions on Computers), 2017.
ICCD (The IEEE International Conference on Computer Design), 2017, 2018.
Microelectronics Journal, 2017, 2018, 2019.
...