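I have shared OFA workshop materials on my public account twice before:
"RDMA and HPC Materials: 2019 OpenFabrics Alliance Workshop"
"OpenFabrics Alliance Workshop Materials (2021, 2020)"
As for the history of the OFA alliance itself, it is covered better in an earlier article by Tang Jie, "SoC Part 3: AWS Elastic Fabric Adapter":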
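“RDMA technology is driven by OFA[7], an industry organization led by Mellanox... OFA is an industry organization founded in 2004, at the time the whole HPC industry was moving from Myrinet[8] to IB. In 2005, Myrinet held a 28% share of the TOP500; after that it declined steadily and was replaced by IB. For a field born out of HPC, usability has always been a big problem: HPC is all about performance, with no virtualization and no general-purpose operating system or architecture, and every supercomputer all but insists on being a system unto itself. Just look at the family of Mellanox Linux drivers to see how complicated this gets.”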
BTW, Intel has been one of the more active members of OFA in recent years; for example, CXL 3.1 is also covered in this workshop.
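2024 OFA Virtual Workshop materials: network drive download
Link: https://pan.baidu.com/s/1WPIT1LqEegAlAEjcoNZTuQ?pwd=4dfy
Extraction code: 4dfy
Official site: https://www.openfabrics.org/2024-ofa-virtual-workshop-agenda/ (videos are available there too, but the site is outside the Great Firewall)
Presentation topics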
Session 1
“OFI 2.0 Update”
Jianxin Xiong, Intel
Session 2
“Status of OpenFabrics Interfaces (OFI) Support in MPICH”
Yanfei Guo, Argonne National Laboratory
Session 3
"Accelerating MPI AllReduce Communication with Efficient GPU-Based Compression Schemes on Modern GPU Clusters"
Hari Subramoni and Qinghua Zhou, The Ohio State University
Session 4
"High Performance & Scalable MPI library over Broadcom RoCE"
Mustafa Abduljabbar, The Ohio State University; Hemal Shah, Broadcom Inc; and Shulei Xu, The Ohio State University
Session 5
"Scaling Large Language Model Training using Hybrid GPU-based Compression in MVAPICH"
Speakers: Aamir Shafi and Lang Xu, The Ohio State University
Session 6
"OFI Integrated Shared Memory Offload"
Speakers: Alexia Ingerson, Intel; Shi Jin, Amazon; and Amir Shehata, Oak Ridge National Laboratory
Session 7
"Managing Composable Disaggregated Infrastructure With OFA Sunfish"
Christian Pinto, IBM Research Europe; Michael Aguilar, Sandia National Laboratories; Phil Cayton, Intel; Russ Herrell, Hewlett Packard Enterprise; and Brian Pan, H3 Platform
Session 8
"An Integrated Deep Reinforcement Learning Agent for Sunfish and HPC Workload Manager Composable Disaggregated Resource Scheduling"
Speakers: Catherine Appleby and Michael Aguilar, Sandia National Laboratories
Session 9
"Cornelis Networks CN5000 Adapter and Software Update"
Dennis Dalessandro, Cornelis Networks
Session 10
"System Composability Using CXL"
Kurtis Bowman, CXL Consortium MWG Co-Chair
Session 11
"Optimized All-to-all Connection Establishment for High-Performance MPI Libraries over InfiniBand"
Mustafa Abduljabbar and Dhabaleswar Panda, The Ohio State University
Session 12
"RecoNIC: RDMA-enabled Compute Offloading on FPGA-based SmartNIC"
Speaker: Guanwen Zhong, AMD
Session 13
"Designing In-Network Computing Aware Reduction Collectives in MPI"
Speakers: Dhabaleswar Panda and Bharath Ramesh, The Ohio State University
Tutorial
"How to setup RDMA CI using the FSDP cluster" and "How to do manual RDMA testing using the FSDP cluster"
Doug Ledford, Red Hat and Jeremy Spewock, UNH InterOperability Lab (IOL)
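The materials for the following talk are currently missing; I will try to add them to the network drive folder once I manage to download them.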
KEYNOTE
Pavan Balaji, Meta
Hope this helps :)
Original source: 企业存储技术 (Enterprise Storage Technology)