BarraCUDA: Open-Source Compiler That Runs CUDA Code on AMD GPUs

Explore BarraCUDA, an open-source compiler that compiles CUDA .cu files directly to AMD GPU machine code without LLVM, challenging NVIDIA's GPU computing monopoly.

Overview

In GPU computing, NVIDIA's CUDA is the de facto standard. But that standard also means vendor lock-in: code written in CUDA can only run on NVIDIA GPUs.

BarraCUDA is an open-source compiler built to break down this wall. It takes .cu files and compiles them directly to AMD RDNA 3 (GFX11) machine code. It is written in roughly 15,000 lines of C99, has zero LLVM dependency, and requires no HIP translation layer.

BarraCUDA’s Architecture

BarraCUDA’s compilation pipeline follows traditional compiler structure while directly targeting AMD GPUs.

graph TD
    A[CUDA Source .cu] --> B[Preprocessor]
    B --> C[Lexer → Tokens]
    C --> D[Parser → AST]
    D --> E[Semantic Analysis]
    E --> F[BIR Intermediate Rep<br/>SSA Form]
    F --> G[mem2reg Optimization]
    G --> H[Instruction Selection<br/>AMDGPU Instructions]
    H --> I[Register Allocation<br/>VGPR/SGPR]
    I --> J[Binary Encoding<br/>GFX11]
    J --> K[ELF Output .hsaco]

Key characteristics include:

  • Zero LLVM dependency: ~1,700 lines of hand-written instruction selection logic
  • SSA-based IR: Uses its own intermediate representation called BIR (BarraCUDA IR)
  • Full preprocessor: Supports #include, #define, macros, conditional compilation
  • Verified encoding: All instruction encodings validated against llvm-objdump
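To make the mem2reg step concrete (a hypothetical illustration; the actual BIR dump format is not shown here): a loop accumulator such as `acc` below starts out as a stack slot in the initial SSA construction and is promoted to an SSA value, so after register allocation it lives in a VGPR instead of scratch memory.

```cuda
// Illustrative kernel: after mem2reg, `acc` lives entirely in registers
// (a VGPR) rather than being loaded and stored through a stack slot
// on each loop iteration.
__global__ void row_sum(const float *m, float *out, int cols) {
    float acc = 0.0f;                       // candidate for promotion to SSA
    const float *row = m + blockIdx.x * cols;
    for (int j = threadIdx.x; j < cols; j += blockDim.x) {
        acc += row[j];
    }
    atomicAdd(&out[blockIdx.x], acc);
}
```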

Supported CUDA Features

BarraCUDA already supports a substantial set of CUDA features:

Core Language Features

  • __global__, __device__, __host__ function qualifiers
  • threadIdx, blockIdx, blockDim, gridDim builtins
  • Structs, enums, typedefs, namespaces
  • Pointers, arrays, pointer arithmetic
  • All C control flow: if/else, for, while, switch/case, goto
  • Basic template instantiation
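A minimal kernel exercising several of these features at once (a hypothetical example, not taken from the BarraCUDA test suite): function qualifiers, thread/block builtins, a struct, pointer arithmetic, and C control flow.

```cuda
// Hypothetical example combining __global__/__device__ qualifiers,
// builtin index variables, a struct, and pointer arithmetic.
struct Params {
    int n;
    float scale;
};

__device__ float scale_value(float v, float s) {
    return v * s;
}

__global__ void saxpy_like(const Params *p, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < p->n) {
        y[i] = scale_value(x[i], p->scale) + y[i];
    }
}
```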

CUDA-Specific Features

  • __shared__ memory: Allocated from LDS, properly tracked
  • __syncthreads(): Translates to s_barrier
  • Atomic operations: atomicAdd, atomicSub, atomicMin, atomicMax, etc.
  • Warp intrinsics: __shfl_sync, __shfl_up_sync, __shfl_down_sync
  • Vector types: float2, float3, float4 with .x/.y/.z/.w access
  • Half precision: __half, __float2half(), __half2float()
  • Cooperative Groups: this_thread_block() with .sync(), .thread_rank()
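A sketch of how several of these features combine in a typical block-level reduction (hypothetical kernel; names and the block size of 256 are illustrative): shared memory, barriers, warp shuffles, and a final atomic.

```cuda
// Hypothetical block-sum reduction using __shared__ memory (LDS on AMD),
// __syncthreads() (lowered to s_barrier), a warp shuffle, and a final
// atomicAdd into a global accumulator.
__global__ void block_sum(const float *in, float *out, int n) {
    __shared__ float partial[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    partial[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction in shared memory down to one warp's worth of values.
    for (int stride = blockDim.x / 2; stride >= 32; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }

    // Final 32 lanes reduce with shuffles; lanes of one warp need no barrier.
    if (threadIdx.x < 32) {
        float v = partial[threadIdx.x];
        for (int offset = 16; offset > 0; offset /= 2)
            v += __shfl_down_sync(0xffffffff, v, offset);
        if (threadIdx.x == 0)
            atomicAdd(out, v);
    }
}
```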

Usage

Building is remarkably simple:

# Build — just need a C99 compiler
make

# Compile to AMD GPU binary
./barracuda --amdgpu-bin kernel.cu -o kernel.hsaco

# Dump IR (for debugging)
./barracuda --ir kernel.cu

# Output AST
./barracuda --ast kernel.cu

What GPU Democratization Means

BarraCUDA’s emergence goes beyond a technical achievement — it signals structural change in the GPU ecosystem.

Challenging NVIDIA’s Monopoly

CUDA’s position in the current GPU computing market is dominant. The vast majority of AI/ML workloads are CUDA-based, which in practice ties them to NVIDIA GPUs. BarraCUDA cracks this structure.

Expanding AMD GPU Potential

AMD’s ROCm/HIP ecosystem is growing, but friction still exists when converting existing CUDA code. BarraCUDA minimizes this friction by compiling directly without code conversion.

The Power of Open Source

The fact that a CUDA compiler was implemented in 15,000 lines of C99 demonstrates the capability of the open-source community. It garnered 66 points on Hacker News, drawing attention from the developer community.

Current Limitations and Outlook

The project is in its early stages, so some limitations naturally remain:

  • GFX11 (RDNA 3) only: Currently supports a single AMD architecture generation
  • No runtime included: Host APIs like cudaMalloc, cudaMemcpy need separate implementation
  • Limited optimization: Not yet at nvcc-level optimization
  • Tenstorrent support in progress: Plans to expand beyond AMD architectures

However, the project’s direction is clear: increase CUDA code portability and give developers freedom to choose their GPUs.

Conclusion

BarraCUDA is a project that offers a glimpse into the future of GPU computing. As an open-source alternative to NVIDIA’s CUDA monopoly, it demonstrates the possibility of running CUDA code on AMD GPUs without any code changes.

With AI/ML workloads growing explosively, GPU choice diversity is crucial for cost reduction and supply chain stability. If projects like BarraCUDA mature, the competitive landscape of the GPU ecosystem will become much healthier.


About the Author

Kim Jangwook

Full-Stack Developer specializing in AI/LLM

Building AI agent systems, LLM applications, and automation solutions with 10+ years of web development experience. Sharing practical insights on Claude Code, MCP, and RAG systems.