How many GPUs are in your cluster?
How long did you spend building your GPU cluster?
Was it on schedule or delayed?
Survey
Comments
2 responses to “Survey”
- I have 128 H100s in my cluster. It took me 6 months to get it up and running. It was delayed for 3 months.
- We own more than 500 H100 nodes. It took us more than half a year to get them online. InfiniBand caused a lot of trouble, and our training stopped every few hours.