Tachyum calls Prodigy the world’s first “universal processor” and says it was designed from the ground up as a multi-purpose CPU capable of running the world’s most compute-intensive applications. According to the company, Prodigy not only handles all of these different tasks on a single chip, it does so at one-tenth the power of traditional hardware and at one-third the cost.
Tachyum boldly claims the Prodigy supercomputer chip offers four times the performance of Intel’s fastest Xeon on the market and triple the raw performance of Nvidia’s H100 in high-performance computing applications, all while being 10 times more power efficient.
To deliver that performance from a single core architecture, Tachyum says it built matrix and vector processing into Prodigy from the ground up rather than adding them as an afterthought. Each CPU core natively supports a range of data types, including FP64, FP32, TF32, BF16, Int8, FP8, and TAI.
The Prodigy processors could be game-changers when they arrive in 2023. The latest server hardware from AMD, Intel, and Nvidia relies on separate, specialized hardware blocks, even within a single CPU or GPU, to perform these different workloads. An example of this is Nvidia’s RTX series GPUs, which use dedicated Tensor cores for AI and dedicated RT cores for ray tracing.
Prodigy, on the other hand, will be able to run ray tracing and AI applications on its general-purpose cores, and won’t need to divert data to a separate specialized block inside the processor.
Running all of these different HPC workloads inside a single chip could drastically change the server landscape: Companies would be able to pack many more chips into a server farm with lower power requirements and less cooling.
The Prodigy T16128 is built on a 5nm process from an undisclosed foundry, and fits within a very small (for the power it provides) 64mm x 84mm FCLGA package. Tachyum says the chip is capable of 12 AI petaFLOPS and 90 teraFLOPS in HPC workloads. For some perspective, Nvidia’s DGX A100 system, which combines eight A100 GPUs, is rated at 5 AI petaFLOPS. The Prodigy chip can also run binaries for x86, ARM, and RISC-V in addition to its own native ISA.
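To put those quoted numbers side by side, here is a minimal Python sketch that computes the ratios implied by the figures above. It uses only the vendor-quoted values cited in this article (12 AI petaFLOPS and 90 HPC teraFLOPS for Prodigy, 5 AI petaFLOPS for the Nvidia comparison); the dictionary keys and the `speedup` helper are illustrative names, and none of this reflects measured benchmarks.

```python
# Vendor-quoted peak figures from the article, expressed in teraFLOPS.
# These are marketing numbers, not benchmark results.
QUOTED_TFLOPS = {
    "prodigy_ai": 12_000,   # 12 AI petaFLOPS (Tachyum claim)
    "prodigy_hpc": 90,      # 90 teraFLOPS for HPC workloads (Tachyum claim)
    "nvidia_ai": 5_000,     # 5 AI petaFLOPS (Nvidia figure cited for comparison)
}


def speedup(numerator: str, denominator: str) -> float:
    """Return the ratio between two quoted peak figures."""
    return QUOTED_TFLOPS[numerator] / QUOTED_TFLOPS[denominator]


ai_vs_nvidia = speedup("prodigy_ai", "nvidia_ai")
ai_vs_hpc = speedup("prodigy_ai", "prodigy_hpc")

print(f"Prodigy AI vs. Nvidia AI figure: {ai_vs_nvidia:.1f}x")   # 2.4x
print(f"Prodigy AI vs. Prodigy HPC peak: {ai_vs_hpc:.0f}x")      # 133x
```

The second ratio is a reminder that "AI FLOPS" are typically counted at low precision, so they are not directly comparable to the HPC figure.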
Each core has two 1024-bit vector units, supports 4096-bit matrix operations, and can execute 4 out-of-order instructions per clock. Virtualization and advanced RAS (reliability, availability, and serviceability) features are also supported. The chip also includes over 128MB of combined L2 and L3 cache with error correction capabilities. To feed all of its cores, the chip comes with 16 DDR5 memory controllers rated for up to 7200MT/s, with a maximum capacity of 8TB per socket.
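For a rough sense of what that memory configuration could deliver, the sketch below estimates theoretical peak bandwidth from the controller count and transfer rate quoted above. It assumes each controller drives a 64-bit (8-byte) data bus, which is a common DDR5 channel width but is not stated by Tachyum; the function name and the `bus_bytes` parameter are hypothetical, and real-world throughput would be lower than this ceiling.

```python
def peak_ddr5_bandwidth_gbs(controllers: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    controllers: number of memory controllers (16, per the article)
    mt_per_s:    transfer rate in megatransfers per second (7200, per the article)
    bus_bytes:   bytes moved per transfer per controller; 8 is an ASSUMPTION
                 (64-bit data bus), not a figure published in the article
    """
    # MT/s * bytes per transfer = MB/s per controller; divide by 1000 for GB/s.
    return controllers * mt_per_s * bus_bytes / 1000


bandwidth = peak_ddr5_bandwidth_gbs(controllers=16, mt_per_s=7200)
print(f"Peak theoretical bandwidth: {bandwidth:.1f} GB/s")  # 921.6 GB/s
```

Under that assumption, the flagship chip would top out at a little over 0.9 TB/s of memory bandwidth per socket.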
The T16128 is the flagship model in Tachyum’s Prodigy lineup, with the 64-core T864 and the 32-core T832 filling the mid-range and entry-level slots, respectively, in the product stack. Production starts in 2023, so we should see actual benchmarks of these chips sometime next year.