Opening
8:30-9:30 Keynote I: Madan Musuvathi, Microsoft, Beyond the embarrassingly parallel – New languages, compilers, and runtimes for big-data processing
Session 1A – Hardware Accelerators


Session chair: David Kaeli, Northeastern

Memristive Boltzmann Machine: A Hardware Accelerator for Combinatorial Optimization and Deep Learning
Mahdi Nazm Bojnordi and Engin Ipek (University of Rochester)

TABLA: A Unified Template-based Architecture for Accelerating Statistical Machine Learning
Divya Mahajan, Jongse Park, Emmanuel Amaro, Hardik Sharma, Amir Yazdanbakhsh, Joon Kim, and Hadi Esmaeilzadeh (Georgia Institute of Technology)

Pushing the Limits of Accelerator Efficiency While Retaining General-Purpose Programmability
Tony Nowatzki, Vinay Gangadhar, and Karthikeyan Sankaralingam (University of Wisconsin – Madison) and Greg Wright (Qualcomm)

Session 1B – Mobile/IoT


Session chair: Xuehai Qian, USC

A Low Power Software-Defined-Radio Baseband Processor for the Internet of Things
Yajing Chen, Shengshuo Lu, Hun-Seok Kim, David Blaauw, Ronald Dreslinski Jr, and Trevor Mudge (University of Michigan)

Improving Smartphone User Experience by Balancing Performance and Energy with Probabilistic QoS Guarantee
Benjamin Gaudette, Carole-Jean Wu, and Sarma Vrudhula (Arizona State University)

Mobile CPU’s Rise to Power: Quantifying the Impact of Generational Mobile CPU Design Trends on Performance, Energy, and User Satisfaction
Matthew Halpern, Yuhao Zhu, and Vijay Janapa Reddi (UT Austin)

Session 2A – Non-volatile Memories


Session chair: Daniel Jiménez, Texas A&M

Atomic Persistence for SCM with a Non-intrusive Backend Controller
Kshitij Doshi (Intel Corporation) and Ellis Giles and Peter Varman (Rice University)

CompEx: Compression-Expansion Coding for Energy, Latency, and Lifetime Improvements in MLC/TLC NVM
Poovaiah M. Palangappa and Kartik Mohanram (University of Pittsburgh)

A Low-Power Hybrid Reconfigurable Architecture For Resistive Random-Access Memories
Miguel Angel Lastras Montaño, Amirali Ghofrani, and Kwang-Ting Cheng (UCSB)

Session 2B – Reconfigurable Architectures


Session chair: Murali Annavaram, U. Southern California

A Performance Analysis Framework for Optimizing OpenCL Applications on FPGAs
Zeke Wang and Bingsheng He (Nanyang Technological University), Wei Zhang (HKUST), and Shunning Jiang (Nanyang Technological University)

HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing
Mingyu Gao and Christos Kozyrakis (Stanford University)

Software Transparent Dynamic Binary Translation for Coarse-Grain Reconfigurable Architectures
Matthew Watkins (Lafayette College), Anthony Carno (Bucknell University), and Tony Nowatzki (University of Wisconsin-Madison)

Session 3A – GPUs


Session chair: Carole-Jean Wu, Arizona State

Core Tunneling: Variation-Aware Voltage Noise Mitigation in GPUs
Renji Thomas, Kristin Barber, Naser Sedaghati, Li Zhou, and Radu Teodorescu (The Ohio State University)

Warped-Preexecution: A GPU Pre-execution Approach for Improving Latency Hiding
Keunsoo Kim, Sangpil Lee, and Myung Kuk Yoon (Yonsei University), Gunjae Koo (University of Southern California), Won Woo Ro (Yonsei University), and Murali Annavaram (University of Southern California)

Approximating Warps with Intra-warp Operand Value Similarity
Daniel Wong (University of California, Riverside), Nam Sung Kim (University of Illinois at Urbana–Champaign), and Murali Annavaram (University of Southern California)

A Case for Toggle-Aware Compression for GPU Systems
Gennady Pekhimenko (CMU), Evgeny Bolotin (NVIDIA), Nandita Vijaykumar, Onur Mutlu, and Todd C. Mowry (CMU), and Stephen W. Keckler (NVIDIA / UT-Austin)

Session 3B – Caches


Session chair: Jun Yang, U. Pittsburgh

Minimal Disturbance Placement and Promotion
Elvira Teran and Daniel A. Jiménez (Texas A&M), Zhe Wang (Intel Labs), and Yingying Tian (AMD)

Revisiting Virtual L1 Caches: A Practical Design Using Dynamic Synonym Remapping
Hongil Yoon and Gurindar S. Sohi (University of Wisconsin–Madison)

Modeling Cache Performance Beyond LRU
Nathan Beckmann and Daniel Sanchez (MIT)

Efficient Footprint Caching for Tagless DRAM Caches
Hakbeom Jang (Sungkyunkwan University), Yongjun Lee (Sungkyunkwan University and Samsung Electronics), Jongwon Kim (Sungkyunkwan University), Youngsok Kim and Jangwoo Kim (POSTECH), and Jinkyu Jeong and Jae W. Lee (Sungkyunkwan University)

Session 4A – Coherence and Consistency


Session chair: Daniel Sanchez, MIT

SCsafe: Logging Sequential Consistency Violations Continuously and Precisely
Yuelu Duan and Josep Torrellas (University of Illinois) and David Koufaty (Intel Corporation)

LASER: Light, Accurate Sharing dEtection and Repair
Liang Luo, Akshitha Sriraman, and Brooke Fugate (University of Pennsylvania), Shiliang Hu, Gilles Pokam, and Chris Newburn (Intel), and Joseph Devietti (University of Pennsylvania)

Improving GPU Hardware Transactional Memory Performance via Conflict and Contention Reduction
Sui Chen and Lu Peng (Louisiana State University)

PleaseTM: Enabling Transaction Conflict Management in Requester-wins Hardware Transactional Memory
Sunjae Park and Milos Prvulovic (Georgia Institute of Technology) and Christopher J Hughes (Intel)

Session 4B – Interconnects


Session chair: José Flich, U. Politècnica de Valencia

Efficient Synthetic Traffic Models for Large, Complex SoCs
Jieming Yin, Onur Kayiran, and Matthew Poremba (AMD Research), Natalie Enright Jerger (AMD Research, University of Toronto), and Gabriel H. Loh (AMD Research)

DVFS for NoCs in CMPs: A Thread Voting Approach
Yuan Yao and Zhonghai Lu (KTH Royal Institute of Technology, Sweden)

SLaC: Stage Laser Control for a Flattened Butterfly Network
Yigit Demir (Intel) and Nikos Hardavellas (Northwestern University)

The Runahead Network-On-Chip
Zimo Li and Joshua San Miguel (University of Toronto) and Natalie Enright Jerger (University of Toronto/AMD)

8:30-9:30 Keynote II: Keshav Pingali, U. Texas, 50 Years of Parallel programming: Ieri, Oggi, Domani
Session 5A – GPGPUs


Session chair: Jangwoo Kim, POSTECH

Towards High Performance Paged Memory for GPUs
Tianhao Zheng (The University of Texas at Austin & NVIDIA), David Nellans, Arslan Zulfiqar, and Mark Stephenson (NVIDIA), and Stephen W. Keckler (NVIDIA / UT-Austin)

Simultaneous Multikernel GPU: Multi-tasking Throughput Processors via Fine-Grained Sharing
Zhenning Wang (Shanghai Jiao Tong University), Jun Yang, Rami Melhem, Bruce Childers, and Youtao Zhang (University of Pittsburgh), and Minyi Guo (Shanghai Jiao Tong University)

iPAWS: Instruction-Issue Pattern-based Adaptive Warp Scheduling for GPGPUs
Minseok Lee (KAIST), Gwangsun Kim (KAIST / NVIDIA), John Kim (KAIST), and Woong Seo, Yeongon Cho, and Soojung Ryu (Samsung Electronics)

Session 5B – Security


Session chair: Drew Hilton, Duke

Lattice Priority Scheduling: Low-Overhead Timing Channel Protection for a Shared Memory Controller
Andrew Ferraiuolo, Yao Wang (Cornell University), Danfeng Zhang (Penn State University), Andrew Myers, and Ed Suh (Cornell University)

A Complete Key Recovery Timing Attack on a GPU
Zhen Jiang, Yunsi Fei, and David Kaeli (Northeastern University)

CATalyst: Defeating Last Level Cache Side Channel Attacks in Cloud Computing
Fangfei Liu (Princeton University), Qian Ge (NICTA and UNSW), Yuval Yarom (University of Adelaide and NICTA), Frank McKeen and Carlos Rozas (Intel), Gernot Heiser (NICTA and UNSW), and Ruby Lee (Princeton University)

Session 6A – Large-Scale Systems


Session chair: Jason Mars, U. Michigan

Predicting the Memory Bandwidth and Optimal Core Allocations for Multi-threaded Applications on Large-scale NUMA Machines
Wei Wang, Jack Davidson, and Mary Lou Soffa (University of Virginia)

A Market Approach for Handling Power Emergencies in Multi-Tenant Data Center
Mohammad A. Islam (UC Riverside), Xiaoqi Ren (Caltech), Shaolei Ren (UC Riverside), Adam Wierman (Caltech), and Xiaorui Wang (The Ohio State University)

SizeCap: Efficiently Handling Power Surges in Fuel Cell Powered Data Centers
Yang Li (Carnegie Mellon University), Di Wang (Microsoft Corporation), Saugata Ghose (Carnegie Mellon University), Jie Liu, Sriram Govindan, Sean James, Eric Peterson, and John Siegler (Microsoft Corporation), and Rachata Ausavarungnirun and Onur Mutlu (Carnegie Mellon University)

Session 6B – Potpourri


Session chair: Trevor Mudge, U. Michigan

MaPU: A Novel Mathematical Computing Architecture
Donglin Wang, Shaolin Xie, Zhiwei Zhang, Xueliang Du, Lei Wang, Zijun Liu, Xiao Lin, Jie Hao, Chen Lin, Hong Ma, Zhonghua Pu, Guangxin Ding, Wenqin Sun, Fabiao Zhou, Weili Ren, Huijuan Wang, Mengchen Zhu, Lipeng Yang, NuoZhou Xiao, Qian Cui, Xingang Wang, Ruoshan Guo, Xiaoqin Wang (Institute of Automation, Chinese Academy of Sciences), Leizu Yin (Spreadtrum Comm), Tao Wang, and Yongyong Yang (Huawei)

Best-Offset Hardware Prefetching
Pierre Michaud (Inria)

DUANG: Fast and Lightweight Page Migration in Asymmetric Memory Systems
Hao Wang (University of Wisconsin-Madison), Jie Zhang (Yonsei University), Gieseo Park (UT-Dallas), Sharmila Shridhar (University of Wisconsin-Madison), Myoungsoo Jung (Yonsei University), and Nam Sung Kim (University of Illinois at Urbana-Champaign)

Session 7A – Industry Session


Session chair: Jian Li, Huawei

Selective GPU Caches to Eliminate CPU–GPU HW Cache Coherence
Neha Agarwal (University of Michigan), David Nellans, Eiman Ebrahimi (NVIDIA), Thomas F. Wenisch (University of Michigan), John Danskin, and Stephen W. Keckler (NVIDIA)

Venice: Exploring Server Architectures for Effective Resource Sharing
Jianbo Dong, Rui Hou (Institute of Computing Technology, Chinese Academy of Sciences), Michael Huang (University of Rochester), Tao Jiang, Boyan Zhao (Institute of Computing Technology, Chinese Academy of Sciences), Sally A. McKee (Chalmers University of Technology), Haibin Wang, Xiaosong Cui (Huawei Technologies Co., Ltd), and Lixin Zhang (Institute of Computing Technology, Chinese Academy of Sciences)

A Large-Scale Study of Soft-Errors on GPUs in the Field
Bin Nie (College of William and Mary), Devesh Tiwari, Saurabh Gupta (Oak Ridge National Laboratory), Evgenia Smirni (College of William and Mary), and James H. Rogers (Oak Ridge National Laboratory)

Design and Implementation of a Mobile Storage Leveraging the DRAM Interface
Sungyong Seo, Youngjin Cho, Youngkwang Yoo, Otae Bae, Jaegeun Park, Heehyun Nam, Sunmi Lee, Yongmyung Lee, Seungdo Chae, Moonsang Kwon, Jin-Hyeok Choi, Sangyeun Cho, Jaeheon Jeong, and Duckhyun Chang (Samsung Electronics Co., Ltd.)

Session 7B – Memory Technology


Session chair: Engin Ipek, U. Rochester

Restore Truncation for Performance Improvement in Future DRAM Systems
Xianwei Zhang, Youtao Zhang, and Bruce R. Childers (Computer Science Department, University of Pittsburgh) and Jun Yang (Electrical and Computer Engineering Department, University of Pittsburgh)

Parity Helix: Efficient Protection for Single-Dimensional Faults in Multi-dimensional Memory Systems
Xun Jian and Rakesh Kumar (University of Illinois at Urbana-Champaign) and Vilas Sridharan (AMD)

Low-Cost Inter-Linked Subarrays (LISA): Enabling Fast Inter-Subarray Data Movement in DRAM
Kevin K. Chang (Carnegie Mellon University), Prashant J. Nair (Georgia Institute of Technology), Saugata Ghose and Donghyuk Lee (Carnegie Mellon University), Moinuddin K. Qureshi (Georgia Institute of Technology), and Onur Mutlu (Carnegie Mellon University)

ChargeCache: Reducing DRAM Latency by Exploiting Row Access Locality
Hasan Hassan (Carnegie Mellon University, TOBB University of Economics & Technology), Gennady Pekhimenko, Nandita Vijaykumar, Vivek Seshadri, and Donghyuk Lee (Carnegie Mellon University), Oguz Ergin (TOBB University of Economics & Technology), and Onur Mutlu (Carnegie Mellon University)

8:30-9:30 Keynote III: Avinash Sodani, Intel, Knights Landing Intel Xeon Phi CPU: Path to Parallelism with General Purpose Programming
Session 8A – Best of IEEE Computer Architecture Letters


Session chair: José Martínez, Cornell

Resistive Associative Processor
Leonid Yavits, Shahar Kvatinsky, Amir Morad, and Ran Ginosar (Technion-Israel Institute of Technology)

Comparing Stochastic and Deterministic Computing
Rajit Manohar (Cornell Tech)

A Graph-Based Program Representation for Analyzing Hardware Specialization Approaches
Tony Nowatzki, Venkatraman Govindaraju, and Karthikeyan Sankaralingam (University of Wisconsin)

Leveraging Heterogeneous Power for Improving Datacenter Efficiency and Resiliency
Longjun Liu (Xi’an Jiaotong University), Chao Li (Shanghai Jiao Tong University), Hongbin Sun (Xi’an Jiaotong University), Yang Hu (University of Florida), Jingmin Xin, Nanning Zheng (Xi’an Jiaotong University), and Tao Li (University of Florida)

Session 8B – Modeling and Testing


Session chair: Brad Beckmann, AMD

Amdahl’s Law for Lifetime Reliability Scaling in Heterogeneous Multicore Processors
William Song, Saibal Mukhopadhyay, and Sudhakar Yalamanchili (Georgia Tech)

LiveSim: Going Live with Microarchitecture Simulation
Sina Hassani, Gabriel Southern, and Jose Renau (UC Santa Cruz)

McVerSi: A Test Generation Framework for Fast Memory Consistency Verification in Simulation
Marco Elver and Vijay Nagarajan (University of Edinburgh)

Session 9A – Caches and TLB


Session chair: Joe Devietti, UPenn

Energy-Efficient Address Translation
Vasileios Karakostas (Barcelona Supercomputing Center, Universitat Politècnica de Catalunya), Jayneel Gandhi (University of Wisconsin – Madison), Adrián Cristal (Barcelona Supercomputing Center, Universitat Politècnica de Catalunya, IIIA-CSIC), Mark D. Hill (University of Wisconsin – Madison), Kathryn S. McKinley (Microsoft Research), Mario Nemirovsky (Barcelona Supercomputing Center, ICREA), Michael M. Swift (University of Wisconsin – Madison), and Osman Ünsal (Barcelona Supercomputing Center)

RADAR: Runtime-Assisted Dead Region Management for Last-Level Caches
Madhavan Manivannan, Vassilis Papaefstathiou, Miquel Pericas, and Per Stenstrom (Chalmers University of Technology)

Cache QoS: From Concept to Reality in the Intel Xeon E5-2600 v3 Server Processor Family
Andrew Herdrich, Edwin Verplanke, Chris Gianos, Ronak Singhal, Ravi Iyer, and Priya Autee (Intel)

Session 9B – Microarchitecture


Session chair: Jose Renau, UC Santa Cruz

Symbiotic Job Scheduling on the IBM POWER8
Josué Feliu (Universitat Politècnica de València), Stijn Eyerman (Ghent University), and Julio Sahuquillo and Salvador Petit (Universitat Politècnica de València)

ScalCore: Designing a Core for Voltage Scalability
Bhargava Gopireddy (University of Illinois), Choungki Song (University of Wisconsin), Josep Torrellas and Nam Sung Kim (University of Illinois), Aditya Agrawal (NVIDIA), and Asit Mishra (Intel)

Cost Effective Physical Register Sharing
Arthur Perais and André Seznec (INRIA)

