Conference Program
Sunday
18:00-20:00 | Welcome reception |
Monday
8:00-8:30 | Opening |
8:30-9:30 | Keynote I: Madan Musuvathi, Microsoft, Beyond the embarrassingly parallel – New languages, compilers, and runtimes for big-data processing |
9:30-10:00 | Break |
10:00-11:15 |
Session 1A – Hardware Accelerators
Session chair: David Kaeli, Northeastern
Memristive Boltzmann Machine: A Hardware Accelerator for Combinatorial Optimization and Deep Learning
TABLA: A Unified Template-based Architecture for Accelerating Statistical Machine Learning
Pushing the Limits of Accelerator Efficiency While Retaining General-Purpose Programmability
Session 1B – Mobile/IoT
Session chair: Xuehai Qian, USC
A Low Power Software-Defined-Radio Baseband Processor for the Internet of Things
Improving Smartphone User Experience by Balancing Performance and Energy with Probabilistic QoS Guarantee
Mobile CPU’s Rise to Power: Quantifying the Impact of Generational Mobile CPU Design Trends on Performance, Energy, and User Satisfaction |
11:15-11:35 | Break |
11:35-12:50 |
Session 2A – Non-volatile Memories
Session chair: Daniel Jiménez, Texas A&M
Atomic Persistence for SCM with a Non-intrusive Backend Controller
CompEx: Compression-Expansion Coding for Energy, Latency, and Lifetime Improvements in MLC/TLC NVM
A Low-Power Hybrid Reconfigurable Architecture For Resistive Random-Access Memories
Session 2B – Reconfigurable Architectures
Session chair: Murali Annavaram, U. Southern California
A Performance Analysis Framework for Optimizing OpenCL Applications on FPGAs
HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing
Software Transparent Dynamic Binary Translation for Coarse-Grain Reconfigurable Architectures |
12:50-14:20 | Lunch |
14:20-16:00 |
Session 3A – GPUs
Session chair: Carole-Jean Wu, Arizona State
Core Tunneling: Variation-Aware Voltage Noise Mitigation in GPUs
Warped-Preexecution: A GPU Pre-execution Approach for Improving Latency Hiding
Approximating Warps with Intra-warp Operand Value Similarity
A Case for Toggle-Aware Compression for GPU Systems
Session 3B – Caches
Session chair: Jun Yang, U. Pittsburgh
Minimal Disturbance Placement and Promotion
Revisiting Virtual L1 Caches: A Practical Design Using Dynamic Synonym Remapping
Modeling Cache Performance Beyond LRU
Efficient Footprint Caching for Tagless DRAM Caches |
16:00-16:20 | Break |
16:20-18:00 |
Session 4A – Coherence and Consistency
Session chair: Daniel Sanchez, MIT
SCsafe: Logging Sequential Consistency Violations Continuously and Precisely
LASER: Light, Accurate Sharing dEtection and Repair
Improving GPU Hardware Transactional Memory Performance via Conflict and Contention Reduction
PleaseTM: Enabling Transaction Conflict Management in Requester-wins Hardware Transactional Memory
Session 4B – Interconnects
Session chair: José Flich, U. Politècnica de Valencia
Efficient Synthetic Traffic Models for Large, Complex SoCs
DVFS for NoCs in CMPs: A Thread Voting Approach
SLaC: Stage Laser Control for a Flattened Butterfly Network
The Runahead Network-On-Chip |
18:15-20:00 | Business meetings |
Tuesday
8:30-9:30 | Keynote II: Keshav Pingali, U. Texas, 50 Years of Parallel programming: Ieri, Oggi, Domani |
9:30-10:00 | Break |
10:00-11:15 |
Session 5A – GPGPUs
Session chair: Jangwoo Kim, POSTECH
Towards High Performance Paged Memory for GPUs
Simultaneous Multikernel GPU: Multi-tasking Throughput Processors via Fine-Grained Sharing
iPAWS: Instruction-Issue Pattern-based Adaptive Warp Scheduling for GPGPUs
Session 5B – Security
Session chair: Drew Hilton, Duke
Lattice Priority Scheduling: Low-Overhead Timing Channel Protection for a Shared Memory Controller
A Complete Key Recovery Timing Attack on a GPU
CATalyst: Defeating Last Level Cache Side Channel Attacks in Cloud Computing |
11:15-11:35 | Break |
11:35-12:50 |
Session 6A – Large-Scale Systems
Session chair: Jason Mars, U. Michigan
Predicting the Memory Bandwidth and Optimal Core Allocations for Multi-threaded Applications on Large-scale NUMA Machines
A Market Approach for Handling Power Emergencies in Multi-Tenant Data Center
SizeCap: Efficiently Handling Power Surges in Fuel Cell Powered Data Centers
Session 6B – Potpourri
Session chair: Trevor Mudge, U. Michigan
MaPU: A Novel Mathematical Computing Architecture
Best-Offset Hardware Prefetching
DUANG: Fast and Lightweight Page Migration in Asymmetric Memory Systems |
12:50-14:20 | Lunch |
14:20-16:00 |
Session 7A – Industry Session
Session chair: Jian Li, Huawei
Selective GPU Caches to Eliminate CPU–GPU HW Cache Coherence
Venice: Exploring Server Architectures for Effective Resource Sharing
A Large-Scale Study of Soft-Errors on GPUs in the Field
Design and Implementation of A Mobile Storage Leveraging the DRAM Interface
Session 7B – Memory Technology
Session chair: Engin Ipek, U. Rochester
Restore Truncation for Performance Improvement in Future DRAM Systems
Parity Helix: Efficient Protection for Single-Dimensional Faults in Multi-dimensional Memory Systems
Low-Cost Inter-Linked Subarrays (LISA): Enabling Fast Inter-Subarray Data Movement in DRAM
ChargeCache: Reducing DRAM Latency by Exploiting Row Access Locality |
16:15-22:00 | Excursion followed by banquet dinner |
Wednesday
8:30-9:30 | Keynote III: Avinash Sodani, Intel, Knights Landing Intel Xeon Phi CPU: Path to Parallelism with General Purpose Programming |
9:30-10:00 | Break |
10:00-11:15 |
Session 8A – Best of IEEE Computer Architecture Letters
Session chair: José Martínez, Cornell
Resistive Associative Processor
Comparing Stochastic and Deterministic Computing
A Graph-Based Program Representation for Analyzing Hardware Specialization Approaches
Leveraging Heterogeneous Power for Improving Datacenter Efficiency and Resiliency
Session 8B – Modeling and Testing
Session chair: Brad Beckmann, AMD
Amdahl’s Law for Lifetime Reliability Scaling in Heterogeneous Multicore Processors
LiveSim: Going Live with Microarchitecture Simulation
McVerSi: A Test Generation Framework for Fast Memory Consistency Verification in Simulation |
11:15-11:35 | Break |
11:35-12:50 |
Session 9A – Caches and TLB
Session chair: Joe Devietti, UPenn
Energy-Efficient Address Translation
RADAR: Runtime-Assisted Dead Region Management for Last-Level Caches
Cache QoS: From Concept to Reality in the Intel Xeon E5-2600 v3 Server Processor Family
Session 9B – Microarchitecture
Session chair: Jose Renau, UC Santa Cruz
Symbiotic Job Scheduling on the IBM POWER8
ScalCore: Designing a Core for Voltage Scalability
Cost Effective Physical Register Sharing |
12:50-13:00 | Closing |