BarraCUDA: Open-Source Compiler That Runs CUDA Code on AMD GPUs
Explore BarraCUDA, an open-source compiler that compiles CUDA .cu files directly to AMD GPU machine code without LLVM, challenging NVIDIA's GPU computing monopoly.
Overview
In the world of GPU computing, NVIDIA’s CUDA is the de facto standard. But this also means vendor lock-in: code written in CUDA can run only on NVIDIA GPUs.
BarraCUDA is an open-source compiler born to break down this wall. It takes .cu files and compiles them directly to AMD RDNA 3 (GFX11) machine code. Written in roughly 15,000 lines of C99, it has zero LLVM dependency and needs no HIP translation layer.
BarraCUDA’s Architecture
BarraCUDA’s compilation pipeline follows traditional compiler structure while directly targeting AMD GPUs.
```mermaid
graph TD
    A[CUDA Source .cu] --> B[Preprocessor]
    B --> C[Lexer → Tokens]
    C --> D[Parser → AST]
    D --> E[Semantic Analysis]
    E --> F[BIR Intermediate Rep<br/>SSA Form]
    F --> G[mem2reg Optimization]
    G --> H[Instruction Selection<br/>AMDGPU Instructions]
    H --> I[Register Allocation<br/>VGPR/SGPR]
    I --> J[Binary Encoding<br/>GFX11]
    J --> K[ELF Output .hsaco]
```
Key characteristics include:
- Zero LLVM dependency: ~1,700 lines of hand-written instruction selection logic
- SSA-based IR: Uses its own intermediate representation called BIR (BarraCUDA IR)
- Full preprocessor: Supports `#include`, `#define`, macros, and conditional compilation
- Verified encoding: All instruction encodings validated against `llvm-objdump`
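To make the pipeline concrete, a kernel like the following is the kind of input the front end parses into BIR and lowers to GFX11 machine code. This is a standard SAXPY written for illustration here, not a sample from the BarraCUDA repository:

```cuda
// SAXPY: y = a*x + y. Illustrative input, not from the BarraCUDA repo.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    // Compute this thread's global index from the CUDA builtins.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}
```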
Supported CUDA Features
BarraCUDA already supports a substantial set of CUDA features:
Core Language Features
- `__global__`, `__device__`, `__host__` function qualifiers
- `threadIdx`, `blockIdx`, `blockDim`, `gridDim` builtins
- Structs, enums, typedefs, namespaces
- Pointers, arrays, pointer arithmetic
- All C control flow: if/else, for, while, switch/case, goto
- Basic template instantiation
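Several of these features can appear in a single small kernel. The sketch below combines template instantiation, pointers, builtins, and C control flow; the identifiers are this article's own, not taken from BarraCUDA's tests:

```cuda
// Basic template instantiation plus C control flow (illustrative).
template <typename T>
__device__ T clamp_val(T v, T lo, T hi) {
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

__global__ void clamp_array(float *data, int n) {
    // Grid-stride loop using the supported builtins.
    for (int i = threadIdx.x + blockIdx.x * blockDim.x; i < n;
         i += blockDim.x * gridDim.x) {
        data[i] = clamp_val(data[i], 0.0f, 1.0f);
    }
}
```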
CUDA-Specific Features
- `__shared__` memory: Allocated from LDS, properly tracked
- `__syncthreads()`: Translates to `s_barrier`
- Atomic operations: `atomicAdd`, `atomicSub`, `atomicMin`, `atomicMax`, etc.
- Warp intrinsics: `__shfl_sync`, `__shfl_up_sync`, `__shfl_down_sync`
- Vector types: `float2`, `float3`, `float4` with `.x`/`.y`/`.z`/`.w` access
- Half precision: `__half`, `__float2half()`, `__half2float()`
- Cooperative Groups: `this_thread_block()` with `.sync()`, `.thread_rank()`
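A block-level reduction exercises several of these features at once: `__shared__` memory backed by LDS, `__syncthreads()` lowering to `s_barrier`, and `atomicAdd`. This is an illustrative sketch, not a sample from the project:

```cuda
// Block-level sum reduction (assumes launch with blockDim.x == 256).
__global__ void block_sum(const float *in, float *out, int n) {
    __shared__ float buf[256];            // allocated from LDS
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    buf[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                      // lowers to s_barrier

    // Tree reduction within the block.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) buf[tid] += buf[tid + s];
        __syncthreads();
    }
    if (tid == 0) atomicAdd(out, buf[0]); // one atomic per block
}
```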
Usage
Building is remarkably simple:
```sh
# Build — just need a C99 compiler
make

# Compile to AMD GPU binary
./barracuda --amdgpu-bin kernel.cu -o kernel.hsaco

# Dump IR (for debugging)
./barracuda --ir kernel.cu

# Output AST
./barracuda --ast kernel.cu
```
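Since the project validates its encodings against `llvm-objdump`, the same tool can be used to inspect the compiled code object. This assumes an LLVM build with the AMDGPU target; exact flags can vary by LLVM version:

```sh
# Disassemble the generated GFX11 code object for inspection.
llvm-objdump -d --mcpu=gfx1100 kernel.hsaco
```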
What GPU Democratization Means
BarraCUDA’s emergence goes beyond a technical achievement — it signals structural change in the GPU ecosystem.
Challenging NVIDIA’s Monopoly
CUDA’s position in the current GPU computing market is dominant. The vast majority of AI/ML workloads are CUDA-based, effectively tying them to NVIDIA GPUs. BarraCUDA cracks this structure.
Expanding AMD GPU Potential
AMD’s ROCm/HIP ecosystem is growing, but friction still exists when converting existing CUDA code. BarraCUDA minimizes this friction by compiling directly without code conversion.
The Power of Open Source
The fact that a CUDA compiler was implemented in 15,000 lines of C99 demonstrates the capability of the open-source community. It garnered 66 points on Hacker News, drawing attention from the developer community.
Current Limitations and Outlook
Being in its early stages, there are naturally some limitations:
- GFX11 (RDNA 3) only: Currently supports only AMD’s latest architecture
- No runtime included: Host APIs like `cudaMalloc` and `cudaMemcpy` need a separate implementation
- Limited optimization: Not yet at nvcc-level optimization
- Tenstorrent support in progress: Plans to expand beyond AMD architectures
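Because no host runtime ships with the compiler, one possible workaround is to load the produced `.hsaco` through an existing runtime such as AMD's HIP, whose module API accepts raw code objects. This is a hedged sketch: whether a BarraCUDA code object loads unmodified this way is an assumption, the kernel name `saxpy` is this article's example, and error checking and device allocation are elided:

```cpp
// Hedged sketch: launching a BarraCUDA-compiled kernel via the HIP
// module API, since BarraCUDA itself ships no host runtime.
#include <hip/hip_runtime.h>

int main() {
    hipModule_t mod;
    hipFunction_t fn;
    hipModuleLoad(&mod, "kernel.hsaco");     // code object from BarraCUDA
    hipModuleGetFunction(&fn, mod, "saxpy"); // kernel symbol name assumed

    int n = 0; float a = 1.0f;
    float *x = nullptr, *y = nullptr;        // device buffers (allocation elided)
    void *args[] = { &n, &a, &x, &y };

    hipModuleLaunchKernel(fn, /*grid*/ 1, 1, 1, /*block*/ 256, 1, 1,
                          /*sharedMem*/ 0, /*stream*/ nullptr, args, nullptr);
    hipDeviceSynchronize();
    return 0;
}
```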
However, the project’s direction is clear: increase CUDA code portability and give developers freedom to choose their GPUs.
Conclusion
BarraCUDA is a project that offers a glimpse into the future of GPU computing. As an open-source alternative to NVIDIA’s CUDA monopoly, it demonstrates the possibility of running CUDA code on AMD GPUs without any code changes.
With AI/ML workloads growing explosively, GPU choice diversity is crucial for cost reduction and supply chain stability. If projects like BarraCUDA mature, the competitive landscape of the GPU ecosystem will become much healthier.