Abstract: Large Language Models (LLMs) require substantial computational resources, making cost-efficient inference challenging. Scaling out with mid-tier GPUs (e.g., NVIDIA A10) appears attractive ...
The Prime Collective Communications Library (PCCL) implements efficient, fault-tolerant collective communication operations such as reductions over IP, and provides shared-state synchronization ...
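To make the idea of a collective reduction concrete, here is a conceptual sketch of a ring all-reduce, the classic pattern behind distributed sum reductions. This is not PCCL's actual API: peers are simulated with threads and in-process queues instead of IP sockets, and fault tolerance is omitted. Each rank forwards values around a ring for n-1 steps, so every rank ends up holding the global sum.

```python
import threading
from queue import Queue

def ring_allreduce(values):
    """Conceptual ring all-reduce over scalars (hypothetical helper,
    not part of PCCL). Each rank starts with values[i] and ends with
    sum(values). Queues stand in for point-to-point IP connections."""
    n = len(values)
    inboxes = [Queue() for _ in range(n)]  # inboxes[i] receives messages for rank i
    results = [None] * n

    def rank(i):
        acc = values[i]    # running partial sum at this rank
        send = values[i]   # value to forward to the next rank
        # In each step, forward the most recently received value around
        # the ring; after n-1 steps every original value has visited
        # every rank exactly once.
        for _ in range(n - 1):
            inboxes[(i + 1) % n].put(send)
            recv = inboxes[i].get()
            acc += recv
            send = recv
        results[i] = acc

    threads = [threading.Thread(target=rank, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(ring_allreduce([1, 2, 3, 4]))  # every rank holds the sum: [10, 10, 10, 10]
```

The ring topology keeps per-step traffic constant regardless of the number of participants, which is why it is a common building block for bandwidth-efficient all-reduce; a production library over IP would additionally handle peer failure and reconnection.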