Sharing the 2024 OpenFabrics Alliance Virtual Workshop Materials


I have shared OFA workshop materials on my WeChat public account twice before:

RDMA and HPC materials: 2019 OpenFabrics Alliance Workshop

OpenFabrics Alliance Workshop materials (2021, 2020)

For the history of the OFA itself, though, an earlier article by Tang Jie, "SoC Part 3: AWS Elastic Fabric Adapter", gives a better account:

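"RDMA technology is driven by OFA [7], an industry organization led by Mellanox... OFA is an industry organization founded in 2004, at the time the HPC industry as a whole was moving from Myrinet [8] to IB. In 2005, Myrinet held a 28% share of the TOP500; after that it declined steadily and was replaced by IB. For a field born in HPC, usability has always been a major problem: HPC does everything for performance, with no virtualization and no general-purpose operating systems or architectures, and every supercomputer all but wants to be a system unto itself. Take a look at the family of Mellanox Linux drivers and you will see how complicated this gets."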

BTW, Intel has been one of the more active members of the OFA in recent years; CXL 3.1, for example, is also among the topics at this workshop.

2024 OFA Virtual Workshop materials: cloud drive download

Link: https://pan.baidu.com/s/1WPIT1LqEegAlAEjcoNZTuQ?pwd=4dfy
Extraction code: 4dfy

Official site: https://www.openfabrics.org/2024-ofa-virtual-workshop-agenda/ (the page also has videos; it is hosted outside the Great Firewall)

Talk topics

Session 1

"OFI 2.0 Update"
Jianxin Xiong, Intel

Session 2
"Status of OpenFabrics Interfaces (OFI) Support in MPICH"
Yanfei Guo, Argonne National Laboratory

Session 3
"Accelerating MPI AllReduce Communication with Efficient GPU-Based Compression Schemes on Modern GPU Clusters"
Hari Subramoni and Qinghua Zhou, The Ohio State University

Session 4
"High Performance & Scalable MPI library over Broadcom RoCE"
Mustafa Abduljabbar, The Ohio State University; Hemal Shah, Broadcom Inc; and Shulei Xu, The Ohio State University

Session 5
"Scaling Large Language Model Training using Hybrid GPU-based Compression in MVAPICH"
Speakers: Aamir Shafi and Lang Xu, The Ohio State University

Session 6
"OFI Integrated Shared Memory Offload"
Speakers: Alexia Ingerson, Intel; Shi Jin, Amazon; and Amir Shehata, Oak Ridge National Laboratory

Session 7
"Managing Composable Disaggregated Infrastructure With OFA Sunfish"
Christian Pinto, IBM Research Europe; Michael Aguilar, Sandia National Laboratories; Phil Cayton, Intel; Russ Herrell, Hewlett Packard Enterprise; and Brian Pan, H3 Platform

Session 8
"An Integrated Deep Reinforcement Learning Agent for Sunfish and HPC Workload Manager Composable Disaggregated Resource Scheduling"
Speakers: Catherine Appleby and Michael Aguilar, Sandia National Laboratories

Session 9
"Cornelis Networks CN5000 Adapter and Software Update"
Dennis Dalessandro, Cornelis Networks

Session 10
"System Composability Using CXL"
Kurtis Bowman, CXL Consortium MWG Co-Chair

Session 11
"Optimized All-to-all Connection Establishment for High-Performance MPI Libraries over InfiniBand"
Mustafa Abduljabbar and Dhabaleswar Panda, The Ohio State University

Session 12
"RecoNIC: RDMA-enabled Compute Offloading on FPGA-based SmartNIC"
Speaker: Guanwen Zhong, AMD

Session 13
"Designing In-Network Computing Aware Reduction Collectives in MPI"
Speakers: Dhabaleswar Panda and Bharath Ramesh, The Ohio State University

Tutorial
"How to setup RDMA CI using the FSDP cluster" and "How to do manual RDMA testing using the FSDP cluster"
Doug Ledford, Red Hat and Jeremy Spewock, UNH InterOperability Lab (IOL)

The following presentation is still missing for now; I will try to add it to the cloud drive folder once I manage to download it.

KEYNOTE 
Pavan Balaji, Meta

I hope this is helpful to everyone :)

Original source: 企业存储技术 (Enterprise Storage Technology)

File name, size
session-1_OFI 2.0 Update.pdf, 1.31MB
session-2_Status of OpenFabrics Interfaces (OFI) Support in MPICH.pdf, 1.79MB
session-3_Accelerating MPI AllReduce Communication with Efficient GPU-Based Compression Schemes on Modern GPU Clusters.pdf, 2.6MB
session-4_High Performance & Scalable MPI library over Broadcom RoCE.pdf, 2.24MB
session-5_Scaling Large Language Model Training using Hybrid GPU-based Compression in MVAPICH.pdf, 3.15MB
session-6_OFI Integrated Shared Memory Offload.pdf, 3.86MB
session-7_Managing Composable Disaggregated Infrastructure With OFA Sunfish.pdf, 4.14MB
session-8_An Integrated Deep Reinforcement Learning Agent for Sunfish and HPC Workload Manager Composable Disaggregated Resource Scheduling.pdf, 1.01MB
session-9_Cornelis Networks CN5000 Adapter and Software Update.pdf, 580.96KB
session-10_System Composability Using CXL.pdf, 767.38KB