For those who don't remember, I'm currently working on adding SVE2 instructions to an open-source project.
In the last part, I was looking for a good candidate project to which we could add SVE2 instructions.
In this part, I will look into the open-source repository and see how we can implement the Scalable Vector Extension there.
First of all, I previously decided to work with the Vc library for this project: https://github.com/VcDevel/Vc
After searching around, I found that the same organization has a better project for this, one that goes beyond SSE and Neon.
std-simd is almost the same project I was working on, but a bit more advanced: https://github.com/VcDevel/std-simd
Here you can follow my discussion with the maintainers and the progress I'm making:
Issue-1: https://github.com/VcDevel/Vc/issues/320
Issue-2: https://github.com/VcDevel/std-simd/issues/34
Pull-Request: https://github.com/VcDevel/std-simd/pull/35
I started by cloning the repo and searching through the code:
Are there SIMD instructions? Yes.
There is also AVX.
Next, let's check which architectures are supported.
X86. Yes.
ARM/ARM64. No.
After I checked all the files, I decided to start implementing the code:
There were no .s or .S assembly files, which made it a bit scary, but I have a plan.
It will be a set of #ifdef blocks that switch between the existing x86 functions and arm/arm64 ones. I'm going to try to implement SVE2 on arm64.
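As a rough illustration of that plan, here is a minimal sketch of compile-time switching. The function name backend_name() is made up for this example and is not part of std-simd. It assumes the compiler defines __ARM_FEATURE_SVE2 when SVE2 is enabled (as the Arm C Language Extensions specify) and the usual __aarch64__ / __x86_64__ target macros from GCC and Clang.

```cpp
// Sketch only: backend_name() is a hypothetical helper, not a std-simd API.
#include <string>

// Pick a backend at compile time from macros the compiler predefines.
inline std::string backend_name() {
#if defined(__ARM_FEATURE_SVE2)
    // Defined when building with SVE2 enabled, e.g. -march=armv8-a+sve2
    return "sve2";
#elif defined(__aarch64__)
    // 64-bit Arm without SVE2
    return "arm64";
#elif defined(__x86_64__) || defined(__i386__)
    // Existing x86 code path
    return "x86";
#else
    // Anything else falls back to plain scalar code
    return "generic";
#endif
}
```

In a real change, each branch would select a different implementation rather than return a string, but the structure of the #if chain would be the same.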
What options do I have?
- auto-vectorization
- intrinsics
- inline-assembly
Let's see what we can do:
For auto-vectorization, only the build instructions need to change, but the result isn't targeted, and as our professor said, the C compiler can't always work out what to do.
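To make the auto-vectorization option concrete, here is a small sketch of the kind of loop a compiler can usually vectorize by itself. The saxpy function is my own example and does not come from std-simd. The __restrict qualifier (a compiler extension supported by GCC and Clang) promises that the two arrays don't overlap, which is one of the things the compiler otherwise can't know.

```cpp
// Sketch only: saxpy() is an illustrative function, not taken from std-simd.
#include <cstddef>

// Computes y[i] = a * x[i] + y[i] for every element.
// __restrict tells the compiler that x and y never alias, which
// removes one common obstacle to auto-vectorization.
inline void saxpy(float a, const float* __restrict x,
                  float* __restrict y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With GCC, adding -fopt-info-vec to the build prints which loops were vectorized, which is a handy way to check whether the compiler did what we hoped.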
Intrinsics and inline assembly are more complicated to add because they require changes to the source code, not just to the build instructions. Looking through the code, I saw that it already uses intrinsics, defined as macros.
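For comparison, here is a minimal sketch of what the intrinsics option looks like in source code. The add_arrays function is hypothetical and is not part of std-simd. The intrinsics come from the Arm C Language Extensions header arm_sve.h; they are plain SVE intrinsics, which SVE2 builds on. When the code is built without SVE, a scalar fallback loop is used instead.

```cpp
// Sketch only: add_arrays() is a hypothetical example, not a std-simd function.
#include <cstddef>
#include <cstdint>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>  // ACLE header for SVE/SVE2 intrinsics
#endif

// Element-wise out[i] = a[i] + b[i].
inline void add_arrays(const float* a, const float* b, float* out,
                       std::size_t n) {
#if defined(__ARM_FEATURE_SVE)
    const std::uint64_t count = static_cast<std::uint64_t>(n);
    // svcntw() is the number of 32-bit lanes in one vector, known only at
    // run time, so the loop does not hard-code a vector width.
    for (std::uint64_t i = 0; i < count; i += svcntw()) {
        // Predicate with only the lanes that are still inside the array.
        svbool_t pg = svwhilelt_b32_u64(i, count);
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, out + i, svadd_f32_x(pg, va, vb));
    }
#else
    // Scalar fallback for x86 or Arm builds without SVE.
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

The predicate takes care of the tail of the array, so no separate cleanup loop is needed in the SVE branch.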
Also, I found the #if chain that selects the header for each instruction set:
simd.c
#if _GLIBCXX_SIMD_X86INTRIN
#include "bits/simd_x86.h"
#elif _GLIBCXX_SIMD_HAVE_NEON
#include "bits/simd_neon.h"
#elif __ALTIVEC__
#include "bits/simd_ppc.h"
#endif
#include "bits/simd_math.h"
I've decided to start with auto-vectorization and work a bit with arm/arm64 inline assembly or intrinsics if I have time.
I want to note that we won't be able to measure any speedup because we don't have the hardware yet, but let's at least prepare our code and build it successfully.
I just read through the library instructions for auto-vectorization and tried to build on different machines, simply adding the flags to my build.
gcc -O3 -march=armv8-a+sve2 ... // -O3 flag to enable auto-vectorization
gcc -O2 -march=armv8-a+sve2 -ftree-vectorize ... // or -ftre...
I have both an arm64 and an x86 machine locally, so I checked on both, and everything built. As mentioned above, we can't see the difference without the hardware, but at least we know the build will work on the new architecture in the future.
In addition to this, I decided to dive deeper into the code and at least try adding intrinsics for SVE2.
You can see all the progress in the pull request below. I have already changed a few files, starting with a new include:
#include <arm_sve.h>
I also wrote an #ifdef block to switch between the different architectures and instruction sets.
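A guard around the new include could look roughly like the sketch below. The macro EXAMPLE_HAVE_SVE2 and the function sve_vector_bytes() are names invented for this post; the actual names in the pull request may differ. The only real pieces are the ACLE macro __ARM_FEATURE_SVE2, the header arm_sve.h, and the intrinsic svcntb(), which returns the vector length in bytes.

```cpp
// Sketch only: EXAMPLE_HAVE_SVE2 and sve_vector_bytes() are hypothetical names.
#include <cstddef>

#if defined(__ARM_FEATURE_SVE2)
#include <arm_sve.h>          // only pulled in when the build targets SVE2
#define EXAMPLE_HAVE_SVE2 1
#else
#define EXAMPLE_HAVE_SVE2 0   // x86 and non-SVE2 Arm builds never see the header
#endif

// Returns the SVE vector length in bytes, or 0 when built without SVE2.
inline std::size_t sve_vector_bytes() {
#if EXAMPLE_HAVE_SVE2
    return static_cast<std::size_t>(svcntb());
#else
    return 0;
#endif
}
```

Keeping the include inside the guard is what lets the same source build on my x86 machine. Note that a binary built with the SVE2 branch still needs SVE2 hardware (or an emulator) to run.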
I will keep working on this through part three and beyond, so that the maintainers might accept my changes into their code.
Pull-Request: https://github.com/VcDevel/std-simd/pull/35
Conclusion
References:
https://developer.arm.com/documentation/100987/0000/
P.S. This post was made for my Software Portability and Optimization class, project part 2.