FR: Better SSD support (command queues) - with reasoning and explanation - Fxpansion.com

FR: Better SSD support (command queues) - with reasoning and explanation

Product Support for BFD3

The Navigator
Posts: 13
Joined: Wed Dec 12, 2012 6:23 am

Postby The Navigator » Mon Jul 28, 2014 5:14 am

I think that all sample-based FXpansion products should get a (rather easy to implement!) SSD performance mode.

1. Reasoning

Here are my benchmark results for my Samsung 840 Pro, 256 GB:

-----------------------------------------------------------------------
CrystalDiskMark 3.0.3 x64 (C) 2007-2013 hiyohiyo
                           Crystal Dew World : http://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

           Sequential Read :   516.964 MB/s
          Sequential Write :   487.559 MB/s
         Random Read 512KB :   443.380 MB/s
        Random Write 512KB :   464.603 MB/s
    Random Read 4KB (QD=1) :    33.631 MB/s [  8210.6 IOPS]
   Random Write 4KB (QD=1) :    95.699 MB/s [ 23363.9 IOPS]
   Random Read 4KB (QD=32) :   259.869 MB/s [ 63444.6 IOPS]
  Random Write 4KB (QD=32) :   273.092 MB/s [ 66672.9 IOPS]

  Test : 1000 MB [C: 49.2% (105.2/214.0 GB)] (x5)
  Date : 2014/07/26 16:52:24
    OS : Windows 8.1 Pro [6.3 Build 9600] (x64)


So, what I see here is that, at a queue depth of 32, random 4 KB reads are almost 8 times as fast as at a queue depth of 1 (259.9 MB/s vs. 33.6 MB/s).
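
As a sanity check on these figures: the MB/s column for the 4 KB tests is just IOPS times the block size (4096 bytes, with 1 MB = 1,000,000 bytes as stated in the benchmark header). A small Python sketch, with the IOPS values copied from the results and the names being my own:

```python
BLOCK_BYTES = 4096  # 4 KB random-read block size

# IOPS figures copied from the benchmark results
iops_qd1 = 8210.6
iops_qd32 = 63444.6

def throughput_mb_s(iops, block_bytes=BLOCK_BYTES):
    """Throughput in MB/s, where 1 MB = 1,000,000 bytes."""
    return iops * block_bytes / 1_000_000

mb_qd1 = throughput_mb_s(iops_qd1)
mb_qd32 = throughput_mb_s(iops_qd32)
speedup = mb_qd32 / mb_qd1

print(f"QD=1 : {mb_qd1:.3f} MB/s")
print(f"QD=32: {mb_qd32:.3f} MB/s")
print(f"speedup: {speedup:.2f}x")
```

The computed throughputs match the reported MB/s values, so the gain from the deeper queue comes entirely from completing more requests per second, not from larger transfers.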

SSDs are bottlenecked not so much by their raw throughput (which is insane nowadays) as by their IOPS at a low queue depth.

This is because of AHCI/SATA overhead (waiting for one I/O to complete before issuing the next is a waste of time) and the lack of parallelization opportunities when the queue is not filled.

I'm well aware that the actual reads are larger, but the principle is always the same: a higher queue depth means increased performance.

SSDs like:

+ Sequential reads of large blocks
+ High queue depths

SSDs dislike:

- Low queue depths

2. The solution

A deeper command queue is easy to accomplish: multithreading.

Create a thread for each file you want to load, drawn from a pool of about 32 threads. This will fill the SATA command queue and allow for insane sample loading speeds. An option for a pool size of 64 could be added for those among us who use a RAID that improves read performance (which can be a RAID 1 as well, by the way).
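
To make the idea concrete, here is a minimal sketch in Python (only to keep it short; it is not how any FXpansion product loads samples). The function names, the demo files and the default pool size of 32 are my own assumptions, and whether the reads actually reach the drive as 32 outstanding commands is up to the OS and the driver:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 32  # assumed default; 64 could be offered for RAID setups

def read_file(path):
    """Read one sample file fully into memory."""
    with open(path, "rb") as f:
        return f.read()

def load_samples(paths, pool_size=POOL_SIZE):
    """Read all files using up to pool_size concurrent reads.

    Returns a dict mapping each path to its bytes.
    """
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() preserves input order, so zip pairs each path with its data
        return dict(zip(paths, pool.map(read_file, paths)))

def make_demo_files(directory, count=100, size=4096):
    """Create small stand-in files for the demo (not real samples)."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"sample_{i:03d}.bin")
        with open(path, "wb") as f:
            f.write(bytes([i % 256]) * size)
        paths.append(path)
    return paths

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = make_demo_files(tmp)
        samples = load_samples(paths)
        print(f"loaded {len(samples)} files")
```

A fixed-size pool keeps the number of in-flight reads bounded regardless of how many files are requested. To measure a real difference, the files would have to be read cold from the SSD, since anything already in the OS file cache never touches the drive.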

SKoT_FX
Promulgator of Beats
Posts: 2375
Joined: Tue Sep 21, 2004 9:51 am
Location: FX Australia, Perth

Postby SKoT_FX » Wed Jul 30, 2014 5:01 am

BFD3's access pattern is more like the 512 KB block random read case. A typical BFD disk streaming audio block is 16384 samples x 24 bit x 12 channels = 576 KB uncompressed, or roughly 192 KB on disk at the BFDLAC compression ratio of ~3.
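
Spelling out the block-size arithmetic in a few lines of Python, assuming 16384 samples per block (a power of two): the 576 KB figure is the size before compression, and dividing by the ~3 ratio gives about 192 KB actually read from disk. The constant names are only illustrative:

```python
SAMPLES_PER_BLOCK = 16384   # assumed power-of-two block length
BYTES_PER_SAMPLE = 24 // 8  # 24-bit audio
CHANNELS = 12
COMPRESSION_RATIO = 3       # approximate BFDLAC ratio

raw_bytes = SAMPLES_PER_BLOCK * BYTES_PER_SAMPLE * CHANNELS
on_disk_bytes = raw_bytes / COMPRESSION_RATIO

print(f"raw block: {raw_bytes / 1024:.0f} KB")
print(f"on disk  : {on_disk_bytes / 1024:.0f} KB")
```

Either way, each read is far larger than the 4 KB blocks used in the queue depth tests.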

With reads this large, we should be looking at the read performance of 512 KB blocks (443 MB/s), which is close to that of the ideal one huge sequential read (517 MB/s). Reads are already queued within the application layer to minimize I/O downtime, so I suspect that queuing up further reads at the I/O subsystem level will only produce a marginal benefit: that final ~70 MB/s difference.

I have downloaded CrystalDiskMark and CrystalDiskInfo (great utilities, thanks for the heads-up) and will perform further experiments.

Simply multi-threading a bunch of reads won't load up the SATA command queue efficiently, I suspect, as the operating system could decide to interleave the reads. I will need to investigate what low-level SATA APIs are available. It looks like Crystal's source code is available, which is intriguing.

I have plans to introduce further multi-threading into the disk streaming engine after 3.0.3 is blessed as "release" - 3.0.3 has seen a lot of AVX and SSE vector optimization of the BFDLAC decoder, and farming out the decodes to spare cores is on the cards. This won't massively improve I/O performance, but every little bit helps.
SKoT McDonald
CTO FXpansion

The Navigator
Posts: 13
Joined: Wed Dec 12, 2012 6:23 am

Postby The Navigator » Wed Jul 30, 2014 6:51 am

Great! :-)

I'd like to help - I'm a senior software architect and developer in my professional life and I like to min/max the performance of applications as well.

