RkBlog

Hardware, programming and astronomy tutorials and reviews.

Benchmarking of astro-processing apps

Astrophotography can produce a lot of data, whether it's Solar System imaging that creates multiple large AVI or SER files, or lucky imaging with deep-sky (DS) CMOS cameras, where short exposures generate lots of large FITS files. Processing time can then become a big problem. Let's benchmark various processing tasks.

Benchmarking setup

I tested 3 systems running the latest version of Windows 10 and the latest versions of the tested apps. The hardware was as follows:

  • CPU: Intel i5-9400F on Asus PRIME Z390-P (6C/6T), AMD Ryzen 5 3500X on Gigabyte B450M DS3H (6C/6T) and AMD Threadripper 1920X (12C/24T)
  • RAM: 2x8GB G.SKILL Ripjaws V 3200MHz CL15 (dual channel), 4x8 for Threadripper (quad channel)
  • Storage: Apacer AS350 SATA SSD (512GB), Kingston SA2000 1TB NVMe SSD, HGST HTS721010A SATA HDD 7200 RPM
  • GPU: GTX 1070/Vega 64 (not used in benchmarks)

Applications tested:

  • Autostakkert 3.1.4 64-bit
  • PIPP 2.5.9 64-bit
  • Nebulosity 4 4.4.3

Benchmarks measured the time it takes to finish a given task (lower is better):

  • Analyzing and stacking a 3.1GB SER file in Autostakkert
  • Stacking 40 4644x3506 FITS dark frames in Nebulosity 4
  • Debayering those frames in PIPP and saving each as FITS in a subfolder
Apacer SSD performance
SATA HDD performance
NVMe SSD performance
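
The article does not say how the times were taken, and the tested apps are GUI programs, so the following is only a minimal sketch of the "time to finish a task" idea for anything that can be started from a script. The names `time_task` and `summarize` and the stand-in workload are hypothetical, not part of any tested app.

```python
import statistics
import time


def time_task(task, runs=3):
    """Run `task` several times and return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Split the first run (closest to a cold run) from the median of the later runs."""
    first = durations[0]
    rest = durations[1:] or durations
    return {"first": first, "median_rest": statistics.median(rest)}


if __name__ == "__main__":
    # Hypothetical stand-in workload; replace it with a call that starts the real task.
    durations = time_task(lambda: sum(i * i for i in range(200_000)), runs=4)
    print(summarize(durations))
```

Keeping the first run separate from the later ones matters because caching can make repeated runs look faster than a real first pass over new data.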

Results

The i5-9400F and Ryzen 3500X are both 6-core, 6-thread CPUs and show very similar performance. Where they differ is CPU architecture. Intel has a monolithic design that offers very low latency for RAM access and inter-core communication. Ryzen can scale to higher core counts more easily, but in specific latency-limited workloads it will lose to the Intel design.

The 1920X is a first-generation Threadripper with 12 cores and 24 threads. It has a quad-channel memory configuration (the other two CPUs are dual-channel) but still suffers on latency because it's not a monolithic design.

Image, video and data stream processing can easily be limited by storage (how quickly data can be read and written), then by memory (whether all required data can be loaded into memory for processing), and finally by the CPU (some processing can leverage all CPU cores). With memory, latency can also come into play.
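
To get a feel for the memory side, here is a rough sketch that estimates how much RAM the pixel data of the 40 FITS frames from the benchmark would need. It assumes 16-bit (2 bytes per pixel) uncompressed data and ignores FITS headers; the bit depth is an assumption, as the article does not state it.

```python
def frame_bytes(width, height, bytes_per_pixel=2):
    """Pixel data size of one uncompressed frame in bytes (header ignored)."""
    return width * height * bytes_per_pixel


def stack_bytes(width, height, frames, bytes_per_pixel=2):
    """Pixel data size of a whole set of equally sized frames in bytes."""
    return frame_bytes(width, height, bytes_per_pixel) * frames


if __name__ == "__main__":
    # 4644x3506 and 40 frames come from the benchmark; 2 bytes per pixel is assumed.
    one = frame_bytes(4644, 3506)
    total = stack_bytes(4644, 3506, 40)
    print(f"one frame: {one / 2**20:.1f} MiB")
    print(f"40 frames: {total / 2**30:.2f} GiB")
```

Under those assumptions one frame is about 31 MiB and the full set about 1.2 GiB, so it fits comfortably in 16GB of RAM. An app may still hold several working copies at once.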

Autostakkert analysis benchmark
Autostakkert stacking benchmark

When it comes to frame analysis, faster storage gives better results. Cold results are from runs where the file wasn't yet cached by the drive and the system (which is what happens when you process a series of different files one after another). Hot results are from runs where the file had already been cached.
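
The cold versus hot difference can be observed with a simple sequential read timed twice. This is a sketch, not the method used for the charts; `timed_read` is a hypothetical helper and the demo file is a stand-in.

```python
import os
import tempfile
import time


def timed_read(path, chunk_size=8 * 1024 * 1024):
    """Read the whole file sequentially; return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical demo file; point `path` at a real SER/AVI file for meaningful numbers.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(16 * 1024 * 1024))
        path = tmp.name
    try:
        for label in ("first read", "second read"):
            size, seconds = timed_read(path)
            print(f"{label}: {size / 1e6:.0f} MB in {seconds:.4f} s")
    finally:
        os.remove(path)
```

Note that the freshly written demo file is most likely already in the system cache, so both reads will be hot. A truly cold read needs a file that has not been touched since boot or since the cache was evicted.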

Autostakkert resource usage

Autostakkert uses RAM in proportion to the file size and shows peaks of high CPU utilization. CPUs with high turbo clocks and a good number of cores should perform best in this scenario.

Nebulosity benchmark

Nebulosity dark frame creation did not really depend on storage type.

PIPP storage benchmark
PIPP storage 1920X benchmark

PIPP processing and saving of 40 FITS files did depend on storage speed. Here we can see that the NVMe SSD is faster than the SATA SSD, while a RAM disk (a virtual filesystem created in RAM to which the files were copied) is the fastest.

The outlier for the HDD is the cold run, where the data has to be read from the drive (during subsequent runs there is no read activity), so that's more of a real-world performance figure. For the Apacer SATA SSD the processing time increased after 3-4 runs and then stayed consistently at that level. After leaving the SSD idle for a while the performance returned, only to drop again after a few runs. This is very likely related to the SSD cache filling up and the controller not managing to flush all the data in time. Cheaper SSDs rely on this cache to maintain good performance. If you process larger data sets this could become a problem on such lower-performing drives.
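
A drop like the one seen on the SATA SSD can be looked for by writing a long stream in blocks and recording the throughput of each block. The sketch below shows the idea; `write_throughput` is a hypothetical helper, and the small default sizes are only for demonstration.

```python
import os
import tempfile
import time


def write_throughput(path, chunks=8, chunk_size=4 * 1024 * 1024):
    """Write `chunks` blocks to `path`, syncing each one; return MB/s per block."""
    payload = os.urandom(chunk_size)
    rates = []
    with open(path, "wb") as handle:
        for _ in range(chunks):
            start = time.perf_counter()
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # push the data out of the OS cache to the drive
            elapsed = max(time.perf_counter() - start, 1e-9)
            rates.append(chunk_size / 1e6 / elapsed)
    return rates


if __name__ == "__main__":
    # Hypothetical small demo; a real test must write more data than the drive's cache holds.
    with tempfile.TemporaryDirectory() as folder:
        rates = write_throughput(os.path.join(folder, "probe.bin"))
        for index, rate in enumerate(rates):
            print(f"block {index}: {rate:.0f} MB/s")
```

If the per-block rate falls sharply partway through a long write and recovers after an idle period, that matches the cache-exhaustion behavior described above.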

Autostakkert CPU benchmark

Here we can see that the i5-9400F and 3500X perform similarly during SER file analysis. During stacking the Ryzen is a bit faster. The Threadripper is noticeably slower. This could indicate that it's limited by latency or a similar problem related to the first Threadripper designs.

Nebulosity CPU benchmark

For Nebulosity 4 we see it more clearly: this task is likely limited by RAM latency. If so, the newer Zen 3 CPUs should be somewhat faster, while the latest Intel CPUs should still be the best or very close.

Nebulosity resource usage

Nebulosity CPU utilization

Nebulosity CPU utilization on 1920X

Here we see that Nebulosity uses all CPU cores but the load isn't high. This can imply that the 1920X, even with twice as many cores, is handicapped by inter-core communication. Newer Threadrippers and Zen designs should be better at this, although it's hard to tell whether they are better than the monolithic Intel design.

PIPP CPU benchmark

PIPP shows quite different results here. The Ryzen is fastest while the i5 and Threadripper are a bit slower. It's hard to tell what the limiting factor is in this case.

PIPP resource usage

PIPP shows very little CPU and RAM activity. As each of the 40 files is saved, the write activity is high. Notice how the files are read only once even though the workload was started 4 times: the first run is a cold run, while subsequent runs are hot and less indicative of real-world performance.

Image capture

I also used an ASI178MM with ASI Studio as well as with the latest FireCapture. I made a 30-second recording at max USB traffic, reaching ~30 FPS on full-frame capture. With the HDD a few frames were dropped during capture. Neither SSD experienced this. Data capture uses very little of the system's resources.
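
The write speed a capture needs can be estimated from frame size and frame rate. The sketch below assumes a 3096x2080 full frame (the vendor-published resolution for this sensor, not stated in the article) and 8-bit mono output. Both are assumptions, so check your own camera settings.

```python
def capture_rate_mb_s(width, height, fps, bytes_per_pixel=1):
    """Raw data rate of an uncompressed capture in MB/s (1 MB = 10**6 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e6


def recording_size_gb(width, height, fps, seconds, bytes_per_pixel=1):
    """Total size of an uncompressed recording in GB (1 GB = 10**9 bytes)."""
    return capture_rate_mb_s(width, height, fps, bytes_per_pixel) * seconds / 1e3


if __name__ == "__main__":
    # Assumed full frame of 3096x2080; ~30 FPS and 30 seconds come from the test above.
    for depth in (1, 2):  # 8-bit and 16-bit output
        rate = capture_rate_mb_s(3096, 2080, 30, depth)
        size = recording_size_gb(3096, 2080, 30, 30, depth)
        print(f"{depth * 8}-bit: {rate:.0f} MB/s, {size:.1f} GB per 30 s")
```

Under these assumptions the stream is roughly 193 MB/s at 8 bits and twice that at 16 bits. A drive that cannot sustain that write speed will drop frames, which would be consistent with the HDD result.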

Performance of consumer image processing software

Photoshop, PixInsight and the like can have their own quirks and preferred hardware. PixInsight does not use GPU acceleration while Photoshop can, as you can see in the Puget Systems analysis. Some apps use Nvidia CUDA or the GPU-agnostic OpenCL and, for example, excel on a Radeon VII (true for some video processing apps). If you want better performance in such apps, try finding benchmarks and recommendations, as those consumer non-astro apps have a lot of them.

Hardware choices

From the results it looks like core astro-processing apps favor low-latency systems as well as fast, really fast storage. Data capture can be done on a decent and cheap SATA SSD, while the fastest processing does require NVMe drives or a RAID setup. If you want to buy an M.2 SSD for a laptop or older PC, do check what type of M.2 drives it supports: there are NVMe and SATA M.2 drives and not every slot supports both.

  • CPU: a modern 6-8 core CPU. Some apps prefer the low latency of Intel (and maybe Zen 3, the 5000 series). A 4-core CPU should also work without many problems.
  • RAM: 16GB base, 32GB optimal if you process large frames or large planetary clips. For Intel I would recommend a motherboard with a chipset that allows memory OC (XMP profiles); AMD AM4 boards have that by default. Memory frequency and CL (CAS latency) do affect performance. 3200MHz at CL14-15 is a good starting point.
  • Storage: NVMe SSD or a good SATA SSD. RAID 0 if you want the best possible performance. In edge cases you could use a RAM disk if you have excess RAM. Check benchmarks and reviews before purchase to see whether the performance is good and is not lost during larger data streams.
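
To compare memory kits, the CAS latency can be converted from clock cycles to nanoseconds. A minimal sketch, treating the "3200MHz" rating as DDR4-3200 (3200 MT/s); `cas_latency_ns` is a hypothetical helper name.

```python
def cas_latency_ns(transfer_rate_mts, cas_cycles):
    """Convert a CAS latency in clock cycles to nanoseconds for DDR memory."""
    # DDR transfers data twice per clock, so the clock in MHz is half the MT/s rating.
    clock_mhz = transfer_rate_mts / 2
    return cas_cycles / clock_mhz * 1000


if __name__ == "__main__":
    for cl in (14, 15):
        print(f"3200 MT/s CL{cl}: {cas_latency_ns(3200, cl):.3f} ns")
```

This gives 8.75 ns for CL14 and 9.375 ns for CL15 at 3200 MT/s. It covers only the CAS component, not the full memory latency a CPU sees.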

Once I finally get one, I'll also test the R9 5900X.

Astronomy and Astrophotography, 13 November 2020, Piotr Maliński