Marc Casas Guix

Education
Ph.D. in Computer Sciences, March 2010
Department of Computer Architecture, Technical University of Catalonia (UPC), Barcelona,
Spain.
Five-year degree in Applied Mathematics (equivalent to M.S.), June 2004.
Faculty of Mathematics and Statistics (FME), UPC, Barcelona, Spain.
Honors and
Awards
• Best Paper Finalist, International Conference for High Performance Computing, Networking,
Storage and Analysis (SC’15).
• Marie Curie Fellow (Beatriu de Pinós - Marie Curie COFUND 7FP Award), September 2014 - Now.
• Scholarship for Research Staff Training (Beca de Formación de Personal Investigador, FPI) from
the Spanish Government. September 2005 - August 2009.
• Best Paper Award at the International Euro-Par Conference on Parallel and Distributed Computing (Euro-Par), August 2007.
• Scholarship from Computer Architecture Department to develop analytical models of parallel
programs. September 2004 - August 2005.
• Award from Bank of Manresa (Caixa de Manresa) for placement among the top 300 undergraduates in Catalonia during the academic year 2000 - 2001.
Employment
Barcelona Supercomputing Center, Spain.
Senior Researcher, June 2013 - Now
Lawrence Livermore National Laboratory, USA.
Postdoctoral Researcher, May 2010 - May 2013
Barcelona Supercomputing Center, Spain.
PhD Student, July 2006 - April 2010
Journals
1. M. Casas, G. Bronevetsky, Evaluation of HPC applications Memory Resource Consumption
via Active Measurement, accepted in IEEE Transactions on Parallel and Distributed
Systems (TPDS), 2015.
2. D. Chasapis, M. Casas, M. Moreto, R. Vidal, E. Ayguade, J. Labarta and M. Valero, PARSECSs: Evaluating the Impact of Task Parallelism in the PARSEC Benchmark Suite, accepted
in Transactions on Architecture and Code Optimization (TACO), 2015.
3. S. Chen, G. Bronevetsky, B. Li, M. Casas, L. Peng, A framework for evaluating comprehensive fault resilience mechanisms in numerical programs, The Journal of Supercomputing,
Volume 71, Issue 8, pages 2963-2984, 2015.
4. J. González, J. Giménez, M. Casas, M. Moretó, A. Ramírez, J. Labarta, M. Valero, Simulating
Whole Supercomputer Applications, IEEE Micro, Volume 31, Number 3, pages 32 - 45, 2011.
5. M. Casas, H. Servat, R. M. Badia, J. Labarta, Extracting the Optimal Sampling Frequency
of Applications Using Spectral Analysis, Concurrency and Computation: Practice and
Experience, Volume 23, Number 3, pages 237-259, 2011.
6. M. Casas, R. M. Badia, J. Labarta, Automatic Phase Detection and Structure Extraction of
MPI Applications, International Journal of High Performance Computing Applications (IJHPCA), Volume 24, Number 3, pages 335 - 360, 2010.
International
Conferences
1. E. Castillo, M. Moreto, M. Casas, L. Alvarez, E. Vallejo, K. Chronaki, R. M. Badia, J. L.
Bosque, R. Beivide, E. Ayguade, J. Labarta, M. Valero, CATA: Criticality Aware Task Acceleration for Multicore Processors, accepted in the 30th IEEE International Parallel &
Distributed Processing Symposium (IPDPS), 2016.
2. L. Alvarez, M. Moreto, M. Casas, E. Castillo, X. Martorell, J. Labarta, E. Ayguade, M. Valero,
Runtime-Guided Management of Scratchpad Memories in Multicore Architectures, accepted
in the 24th International Conference on Parallel Architectures and Compilation
Techniques (PACT), 2015.
3. L. Jaulmes, M. Casas, M. Moreto, E. Ayguade, J. Labarta, M. Valero, Exploiting Asynchrony
from Exact Forward Recovery for DUE in Iterative Solvers, Proceedings of the International
Conference for High Performance Computing, Networking, Storage and Analysis
(SC’15), pages 53:1-53:12, 2015.
4. L. Alvarez, L. Vilanova, M. Moreto, M. Casas, M. Gonzalez, X. Martorell, N. Navarro, E.
Ayguade, M. Valero, Coherence protocol for transparent management of scratchpad memories
in shared memory manycore architectures, proceedings of the International Symposium on
Computer Architecture (ISCA), pages 720-732, 2015.
5. M. Casas, M. Moreto, L. Alvarez, E. Castillo, D. Chasapis, T. Hayes, L. Jaulmes, O. Palomar, O. Unsal, A. Cristal, E. Ayguade, J. Labarta, M. Valero, Runtime-Aware Architectures,
proceedings of the International Euro-Par Conference on Parallel and Distributed
Computing (Euro-Par), pages 16-27, 2015.
6. M. Casas, G. Bronevetsky, Active Measurement of Memory Resource Consumption, proceedings of the 28th IEEE International Parallel & Distributed Processing Symposium
(IPDPS), pages 995-1004, 2014.
7. M. Casas, G. Bronevetsky, Active Measurement of the Impact of Network Switch Utilization
on Application Performance, proceedings of the 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS), pages 165-174, 2014.
8. M. Schulz, J. Belak, A. Bhatele, P.-T. Bremer, G. Bronevetsky, M. Casas, T. Gamblin, K.
Isaacs, I. Laguna, J. Levine, V. Pascucci, D. Richards, B. Rountree, Performance Analysis
Techniques for the Exascale Co-Design Process, proceedings of PARCO 2013, Munich, Germany, September 2013.
9. M. Casas, B. R. de Supinski, G. Bronevetsky, M. Schulz, Fault Resilience of the Algebraic
Multi-Grid Solver, 26th International Conference on Supercomputing (ICS), pages
91-100, 2012.
10. G. Llort, M. Casas, H. Servat, K. Huck, J. Giménez, J. Labarta, Trace Spectral Analysis
Toward Dynamic Levels of Detail, Proceedings of the IEEE International Conference
on Parallel and Distributed Systems (ICPADS), pages 332 - 339, 2011.
11. M. Casas, R. M. Badia, J. Labarta, Prediction of Behavior of MPI Applications, 2008 IEEE
International Conference on Cluster Computing (Cluster), pages 242-251, 2008.
12. M. Casas, R. M. Badia, J. Labarta, Automatic analysis of speedup of MPI applications, 22nd
ACM International Conference on Supercomputing (ICS), pages 349-358, 2008.
13. M. Casas, R. M. Badia, J. Labarta, Automatic Phase Detection of MPI Applications, Parallel
Computing: Architectures, Algorithms and Applications (ParCo), Volume 15, pages
129-136, 2007.
14. M. Casas, R. M. Badia, J. Labarta, Automatic Extraction of Structure of MPI Applications
Tracefiles, International Euro-Par Conference on Parallel and Distributed Computing (Euro-Par), pages 3-12, 2007.
Workshops and
Tech Reports
1. R. Vidal, M. Casas, M. Moreto, D. Chasapis, R. Ferrer, X. Martorell, E. Ayguade, J. Labarta,
M. Valero, Evaluating the Impact of OpenMP 4.0 Extensions on Relevant Parallel Workloads,
International Workshop on OpenMP (IWOMP), Volume 9342 of the series Lecture
Notes in Computer Science, pages 60-72, 2015.
2. D. Prat, C. Ortega, M. Casas, M. Moretó and M. Valero, Adaptive and application dependent
runtime guided hardware prefetcher reconfiguration on the IBM POWER7, in the 5th International Workshop on Adaptive Self-tuning Computing Systems (co-located with
HiPEAC 2015), 21 January 2015, Amsterdam, The Netherlands.
3. M. Valero, M. Moreto, M. Casas, E. Ayguade, J. Labarta, Runtime-Aware Architectures: A
First Approach, Supercomputing Frontiers and Innovations, Volume 1, Number 1, 2014.
4. T. Grass, A. Rico, M. Casas, M. Moretó, A. Ramírez, Evaluating Execution Time Predictability
of Task-Based Programs on Multi-Core Processors, in the 7th International Workshop
on Multi-/Many-Core Computing Systems (MuCoCoS-2014). August, 2014, Porto,
Portugal.
5. Adolfy Hoisie, Kevin J Barker, Greg Bronevetsky, Laura Carrington, Marc Casas, Daniel
Chavarria, Roberto Gioiosa, Darren J Kerbyson, Gokcen Kestor, Nathan R Tallent, Ananta
Tiwari, Progress report for the X-Stack Meeting.
Invited Talks
4th Workshop of the Joint Laboratory for Exascale Computing (JLESC), December 2015, Bonn,
Germany. Invitational talk on “The Hybrid OmpSs + Charm++ Programming Model”.
3rd Workshop of the Joint Laboratory for Exascale Computing (JLESC), June 2015, Barcelona,
Spain. Invitational talk on “Asynchronous algorithms to mitigate faults recoveries and enable approximate computing”.
2015 SIAM Conference on Computational Science and Engineering, Mini-Symposia on Resilience
in Numerical Simulations and Algorithms at Extreme Scale, March 2015, Salt Lake City, talk on
“Runtime Systems for Fault Tolerant Computing”.
2nd Workshop of the Joint Laboratory for Exascale Computing (JLESC), November 2014, Chicago,
Illinois. Invitational talk on “Exploiting Asynchronous Programming Models to Reduce Faults
Impact in Iterative Solvers”.
HiPEAC Computing Systems Week (CSW), October 8 - October 10 2014, Athens, Greece. Invited
to be a member of a panel on “Handling Errors at Multiple Levels: Opportunities and Challenges”.
8th International Workshop on Parallel Matrix Algorithms and Applications (PMAA14), June 2 - June 4 2014, Lugano, Switzerland. Invitational talk on “Dealing with Faults in HPC systems”.
28th IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 2014, Phoenix, Arizona. Research presentation.
26th ACM International Conference on Supercomputing (ICS), June 25-29, San Servolo Island,
Venice, Italy, 2012. Research presentation.
Technische Universität Dresden - ZIH Colloquium, May 24, Dresden, Germany, 2012. Invitational
talk on “Automatic Phase Detection and Structure Extraction of Parallel Applications”.
Schloss Dagstuhl Seminars, “Program Developing for Extreme Scale Computing”, May 2 - May 6,
Dagstuhl, Germany, 2010. Invited to give a talk on the Applications of Spectral Analysis in Data
Acquisition, Multiplexing Hardware Counters and Architecture Simulation.
IEEE International Conference on Cluster Computing (Cluster), September 29 - October 1, Tsukuba,
Japan, 2008. Research presentation.
22nd ACM International Conference on Supercomputing (ICS), June 7-12, Island of Kos, Aegean
Sea, Greece, 2008. Research presentation.
International Conference on Parallel Computing (ParCo), September 3-7, Jülich, Germany, 2007.
Research presentation.
International Euro-Par Conference on Parallel and Distributed Computing (Euro-Par), August 28-31, Rennes, France, 2007. Research presentation.
Research
Projects
Mont-Blanc 3 The main target of the Mont-Blanc 3 project is the creation of a new high-end
HPC platform (SoC and node) that is able to deliver a new level of performance / energy ratio
whilst executing real applications. The technical objectives are: 1. To design a well-balanced
architecture and to deliver the design for an ARM based SoC or SoP (System on Package) capable
of providing pre-exascale performance when implemented in the time frame of 2019-2020. The
predicted performance target must be measured using real HPC applications. 2. To maximise the
benefit for HPC applications with new high-performance ARM processors and throughput-oriented
compute accelerators designed to work together within the well-balanced architecture. 3. To develop
the necessary software ecosystem for the future SoC.
Work Package 4 (Compute Efficiency) Leader: Marc Casas
IBM-BSC Joint Study Agreement (JSA) on OmpSs for Asynchronous Algorithms This
JSA focuses on applications that are likely to benefit from the locality awareness of OmpSs, and the
irregular or asynchronous forms of parallelism it supports. These characteristics allow for additional
asynchronicity in the execution of parallel tasks (compared to OpenMP) and lower bandwidth requirements. As a result, an application’s tolerance for network or memory latency increases, which
is an interesting property for the target platform. PI: Marc Casas, Costas Bekas
IBM-BSC Joint Study Agreement (JSA) on Adaptive resource management for Power
architectures This research focuses on adaptive resource management for improvement of power-performance metrics associated with current and future POWER-series microprocessors. Both
hardware-only and runtime-aided adaptive control systems will be pursued. The collaboration
will pursue the development of new adaptive algorithms to exploit prefetching enhancements in
current and future POWER architectures and generalized concepts in cross-layer co-optimization
for improving power-performance metrics in future POWER systems. Proposals on new hardware
requirements to support the development of a new generation of hardware-software co-managed
adaptive systems will be pursued in this collaboration. PI: Miquel Moreto, Marc Casas,
Alper Buyuktosunoglu, Pradip Bose
Riding on Moore’s Law (RoMoL), 2013-2018 RoMoL is a 5-year project funded by an ERC
Advanced Grant awarded to Prof. Mateo Valero (GA 321253). RoMoL involves research in microarchitecture, runtime systems, compilers and programming languages, and has the objective to
maximize positive synergies between current research activities in the Computer Sciences Department at BSC. PI: Mateo Valero
Institute for Sustained Performance, Energy and Resilience (SUPER), 2010-2016 The
SUPER project is a broadly-based SciDAC institute with expertise in compilers and other system
tools, performance engineering, energy management, and resilience. The goal of the project is
to ensure that DOE’s computational scientists can successfully exploit the emerging generation
of high performance computing (HPC) systems. This goal will be met by providing application
scientists with strategies and tools to productively maximize performance, conserve energy, and
attain resilience. PI: Bob Lucas
Exascale Computing Technologies (ExaCT) 2009 - 2012 ExaCT is a project focused on
creating essential capabilities to overcome exascale barriers to predictive simulation. It integrates
research on these barriers into 3 key LLNL applications using LLNL’s most advanced multicore
systems. Specifically, the main targets of the project are: Scalable Multicore Algorithms, Application
Level Fault Tolerance, Asynchronous Load Analyzer and Adaptor and Next Generation Debugging
Methodologies. PI: Bronis R. de Supinski
Reliable High Performance Peta- and Exa-Scale Computing 2010-2015: This project
will develop a detailed understanding of the effects of faults on real HPC systems, producing fault
models that will enable new tools to identify the root causes of failures and help developers to
create more reliable applications and systems by characterizing their vulnerabilities to errors. The
ultimate goal of this project is to help create a new generation of highly productive and cost-efficient
supercomputers to enable novel DOE science applications. PI: Greg Bronevetsky
IBM/BSC MareIncognito Project (MI) 2006-2011: MI is a bilateral project between BSC and
IBM. The project considers several fields that define the technical characteristics and the components
design for a new generation of Petascale supercomputers for the year 2011, involving all aspects
related to that machine: applications, programming models, performance tools, interconnection and
processor architecture, etc. PI: Jesús Labarta
High Performance Computing V: Architectures, compilers, operating systems, tools
and applications. (Computacion de Altas Prestaciones V TIN2007-60625) 2008-2012.
An R&D project funded by the Spanish Interministerial Commission of Science and Technology
(Comision Interministerial de Ciencia y Tecnologia CICYT), a Spanish government agency focused
on research and development. PI: Mateo Valero.
High Performance Computing IV: Architectures, compilers, operating systems, tools
and applications. (Computacion de Altas Prestaciones IV TIN2004-07739C0201) 2004-2008. An R&D project funded by the Spanish Interministerial Commission of Science and Technology
(Comision Interministerial de Ciencia y Tecnologia CICYT), a Spanish government agency focused
on research and development. PI: Mateo Valero.
Directed
Bachelor Thesis
• “Analysis of Adaptive Prefetcher Configuration in Advanced Server-Class Processors”, Calvin
Bulla, June 2015
• “Parallelization of the Facesim simulator”, Raúl Vidal, January 2015
• “Exploring Scalability Techniques of OmpSs”, Iulian Brumar, June 2014
• “Parallelization techniques of the x264 video encoder”, Daniel Ruiz, June 2014
Directed Master
Thesis
• “Adaptive and Application Dependant Runtime Guided Hardware Reconfiguration for the IBM
POWER7”, David Prat, September 2014
• “Runtime Assisted Cache Memory Optimizations”, Vladimir Dimic, July 2015
Directed PhD
Thesis
• “Transparent Management of Scratchpad Memories in Shared Memory Programming Models”,
Lluc Àlvarez, December 2015
Program
Committees
• Program Committee (PC) member of the ACM International Conference on Supercomputing
(ICS), 2016.
• Program Committee (PC) member of the 16th IEEE/ACM International Symposium on Cluster,
Cloud and Grid Computing (CCGRID) 2016.
• Program Committee (PC) member of the 1st IEEE Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware (IPDRM2016)
• External Review Committee (ERC) member of the ACM International Conference on Supercomputing (ICS) 2014.
Service
• Reviewer of the Concurrency and Computation: Practice and Experience Journal, 2015.
• Reviewer for the Euro-Par conference 2015.
• Reviewer for the CCGRID conference 2015.
• Organizer of the “European Initiative on Runtime Systems and Architecture Co-Design” thematic
session in the HiPEAC Computing Systems Week (CSW). Oslo, May 2015.
• Reviewer for the Parallel Computing Journal 2014.
• Reviewer for the ACM Transactions on Architecture and Code Optimization Journal (TACO)
2014.
• Reviewer for the International Conference on Parallel Architectures and Compilation Techniques
(PACT) 2012.
• Reviewer for IEEE Transactions on Parallel and Distributed Systems Journal 2012.
• Reviewer for PARA 2010: State of the Art in Scientific and Parallel Computing Conference.
• Organizer of the Managing Large-Scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques Workshop (SLAML), held in conjunction with the 23rd
ACM Symposium on Operating Systems Principles (SOSP), 2011.
PhD Committees
• Vladimir Marjanovic, Technical University of Catalonia (UPC), January 2016.
• Stojce Nakov, University of Bordeaux, December 2015.
• Judit Planas, Technical University of Catalonia (UPC), November 2015.