Franck Pachot for AWS Heroes

Originally published at blog.dbi-services.com

AWS: PostgreSQL on Graviton2 with newer GCC

In the previous post, I ran PostgreSQL on an AWS m6gd.2xlarge instance (ARM Graviton2 processor).
I didn't specify the compilation options, so this post gives more details, following this feedback:
https://twitter.com/N_B__N_B/status/1369180884608315398

First, the PostgreSQL ./configure correctly detected ARM and compiled with the following flags: -march=armv8-a+crc
This is ARM v8. However, the LSE (Large System Extensions) atomic instructions were added later, in ARM v8.1, and they can make a huge difference for PostgreSQL, especially with spinlocks under high CPU usage.
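
Whether the CPU itself offers LSE can be checked from Linux: on aarch64 the kernel reports it as the "atomics" flag in the Features line of /proc/cpuinfo. Here is a minimal sketch of that check. The function name has_lse and its optional file argument are hypothetical, introduced only for illustration and to make the check easy to try on a saved copy of cpuinfo.

```shell
# Report whether a cpuinfo file advertises LSE atomics.
# Assumption of this sketch: the optional first argument is a path to a
# cpuinfo-style file; it defaults to /proc/cpuinfo.
has_lse() {
  cpuinfo="${1:-/proc/cpuinfo}"
  # Take the first "Features" line and look for the whole word "atomics".
  if grep -m1 '^Features' "$cpuinfo" | grep -qw atomics; then
    echo "LSE supported"
  else
    echo "LSE not reported"
  fi
}
```

Calling has_lse without arguments reads /proc/cpuinfo. On Graviton2 (ARM v8.2) the flag is expected to be present, while on the first-generation Graviton (ARM v8.0) it should be absent.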

I followed the information in https://github.com/aws/aws-graviton-getting-started/blob/master/c-c++.md to check the binaries after compilation.


for i in $(find postgres/src/backend -name "*.o") ; do objdump -d "$i" | awk '/:$/{w=$2}/aarch64_(cas|casp|swp|ldadd|stadd|ldclr|stclr|ldeor|steor|ldset|stset|ldsmax|stsmax|ldsmin|stsmin|ldumax|stumax|ldumin|stumin)/{printf "%-27s %-20s %-30s %-60s\n","(LSE instructions)",$NF,w,f}' f="$i" ; done | sort | uniq -c | sort -rnk1,4


      8 (LSE instructions)          <__aarch64_swp4_acq> <StartupXLOG>:                 postgres/src/backend/access/transam/xlog.o
      7 (LSE instructions)          <__aarch64_swp4_acq> <BitmapHeapNext>:              postgres/src/backend/executor/nodeBitmapHeapscan.o
      6 (LSE instructions)          <__aarch64_ldclr4_acq_rel> <LWLockDequeueSelf>:           postgres/src/backend/storage/lmgr/lwlock.o
      6 (LSE instructions)          <__aarch64_cas8_acq_rel> <shm_mq_send_bytes>:           postgres/src/backend/storage/ipc/shm_mq.o
      5 (LSE instructions)          <__aarch64_swp4_acq> <WalReceiverMain>:             postgres/src/backend/replication/walreceiver.o
      5 (LSE instructions)          <__aarch64_cas8_acq_rel> <shm_mq_receive_bytes.isra.0>: postgres/src/backend/storage/ipc/shm_mq.o
      4 (LSE instructions)          <__aarch64_swp4_acq> <ProcessRepliesIfAny>:         postgres/src/backend/replication/walsender.o
      4 (LSE instructions)          <__aarch64_swp4_acq> <hash_search_with_hash_value>: postgres/src/backend/utils/hash/dynahash.o
      4 (LSE instructions)          <__aarch64_swp4_acq> <copy_replication_slot>:       postgres/src/backend/replication/slotfuncs.o
      4 (LSE instructions)          <__aarch64_ldadd4_acq_rel> <parallel_vacuum_index>:       postgres/src/backend/access/heap/vacuumlazy.o
      4 (LSE instructions)          <__aarch64_cas4_acq_rel> <LWLockAcquire>:               postgres/src/backend/storage/lmgr/lwlock.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <xlog_redo>:                   postgres/src/backend/access/transam/xlog.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <XLogInsertRecord>:            postgres/src/backend/access/transam/xlog.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <SaveSlotToPath>:              postgres/src/backend/replication/slot.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <RequestCheckpoint>:           postgres/src/backend/postmaster/checkpointer.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <LogicalRepSyncTableStart>:    postgres/src/backend/replication/logical/tablesync.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <LogicalConfirmReceivedLocation>: postgres/src/backend/replication/logical/logical.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <InvalidateObsoleteReplicationSlots>: postgres/src/backend/replication/slot.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <CreateInitDecodingContext>:   postgres/src/backend/replication/logical/logical.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <CreateCheckPoint>:            postgres/src/backend/access/transam/xlog.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <CheckpointerMain>:            postgres/src/backend/postmaster/checkpointer.o
      3 (LSE instructions)          <__aarch64_ldclr4_acq_rel> <LWLockQueueSelf>:             postgres/src/backend/storage/lmgr/lwlock.o
      3 (LSE instructions)          <__aarch64_ldadd4_acq_rel> <tbm_prepare_shared_iterate>:  postgres/src/backend/nodes/tidbitmap.o
      3 (LSE instructions)          <__aarch64_ldadd4_acq_rel> <tbm_free_shared_area>:        postgres/src/backend/nodes/tidbitmap.o
      3 (LSE instructions)          <__aarch64_cas8_acq_rel> <ProcessProcSignalBarrier>:    postgres/src/backend/storage/ipc/procsignal.o
      3 (LSE instructions)          <__aarch64_cas8_acq_rel> <ExecParallelHashIncreaseNumBatches>: postgres/src/backend/executor/nodeHash.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <XLogWrite>:                   postgres/src/backend/access/transam/xlog.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <XLogSendPhysical>:            postgres/src/backend/replication/walsender.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <XLogBackgroundFlush>:         postgres/src/backend/access/transam/xlog.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <WalRcvStreaming>:             postgres/src/backend/replication/walreceiverfuncs.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <WalRcvRunning>:               postgres/src/backend/replication/walreceiverfuncs.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <WalRcvDie>:                   postgres/src/backend/replication/walreceiver.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <TransactionIdLimitedForOldSnapshots>: postgres/src/backend/utils/time/snapmgr.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <StrategyGetBuffer>:           postgres/src/backend/storage/buffer/freelist.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <shm_mq_wait_internal>:        postgres/src/backend/storage/ipc/shm_mq.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotReserveWal>:   postgres/src/backend/replication/slot.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotRelease>:      postgres/src/backend/replication/slot.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <ProcKill>:                    postgres/src/backend/storage/lmgr/proc.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <process_syncing_tables>:      postgres/src/backend/replication/logical/tablesync.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <pg_get_replication_slots>:    postgres/src/backend/replication/slotfuncs.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <exec_replication_command>:    postgres/src/backend/replication/walsender.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <CreateRestartPoint>:          postgres/src/backend/access/transam/xlog.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <ConditionVariableBroadcast>:  postgres/src/backend/storage/lmgr/condition_variable.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <BarrierArriveAndWait>:        postgres/src/backend/storage/ipc/barrier.o
      2 (LSE instructions)          <__aarch64_ldset4_acq_rel> <LWLockWaitListLock>:          postgres/src/backend/storage/lmgr/lwlock.o
      2 (LSE instructions)          <__aarch64_ldclr4_acq_rel> <LWLockWaitForVar>:            postgres/src/backend/storage/lmgr/lwlock.o
      2 (LSE instructions)          <__aarch64_ldclr4_acq_rel> <LWLockUpdateVar>:             postgres/src/backend/storage/lmgr/lwlock.o
      2 (LSE instructions)          <__aarch64_ldadd4_acq_rel> <vacuum_delay_point>:          postgres/src/backend/commands/vacuum.o
      2 (LSE instructions)          <__aarch64_ldadd4_acq_rel> <StrategyGetBuffer>:           postgres/src/backend/storage/buffer/freelist.o
      2 (LSE instructions)          <__aarch64_ldadd4_acq_rel> <LWLockRelease>:               postgres/src/backend/storage/lmgr/lwlock.o
      2 (LSE instructions)          <__aarch64_ldadd4_acq_rel> <lazy_parallel_vacuum_indexes>: postgres/src/backend/access/heap/vacuumlazy.o
      2 (LSE instructions)          <__aarch64_cas8_acq_rel> <WalReceiverMain>:             postgres/src/backend/replication/walreceiver.o
      2 (LSE instructions)          <__aarch64_cas8_acq_rel> <WaitForProcSignalBarrier>:    postgres/src/backend/storage/ipc/procsignal.o
      2 (LSE instructions)          <__aarch64_cas8_acq_rel> <shm_mq_receive>:              postgres/src/backend/storage/ipc/shm_mq.o
      2 (LSE instructions)          <__aarch64_cas8_acq_rel> <ResolveRecoveryConflictWithLock>: postgres/src/backend/storage/ipc/standby.o
      2 (LSE instructions)          <__aarch64_cas8_acq_rel> <ProcSignalInit>:              postgres/src/backend/storage/ipc/procsignal.o
      2 (LSE instructions)          <__aarch64_cas8_acq_rel> <ExecParallelHashTableInsert>: postgres/src/backend/executor/nodeHash.o
      2 (LSE instructions)          <__aarch64_cas8_acq_rel> <ExecParallelHashTableInsertCurrentBatch>: postgres/src/backend/executor/nodeHash.o
      2 (LSE instructions)          <__aarch64_cas8_acq_rel> <ExecParallelHashIncreaseNumBuckets>: postgres/src/backend/executor/nodeHash.o
      2 (LSE instructions)          <__aarch64_cas4_acq_rel> <TransactionIdSetTreeStatus>:  postgres/src/backend/access/transam/clog.o
      2 (LSE instructions)          <__aarch64_cas4_acq_rel> <ProcArrayEndTransaction>:     postgres/src/backend/storage/ipc/procarray.o
      2 (LSE instructions)          <__aarch64_cas4_acq_rel> <LWLockAcquireOrWait>:         postgres/src/backend/storage/lmgr/lwlock.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <XLogWalRcvFlush.part.4>:      postgres/src/backend/replication/walreceiver.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <XLogSetReplicationSlotMinimumLSN>: postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <XLogSetAsyncXactLSN>:         postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <XLogSendLogical>:             postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <XLogPageRead>:                postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <XLogNeedsFlush>:              postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <XLogGetLastRemovedSegno>:     postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <XLogFlush>:                   postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <worker_freeze_result_tape>:   postgres/src/backend/utils/sort/tuplesort.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <WalSndWakeup>:                postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <WalSndWaitStopping>:          postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <WalSndSetState>:              postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <WalSndRqstFileReload>:        postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <WalSndKill>:                  postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <WalSndInitStopping>:          postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <WalRcvForceReply>:            postgres/src/backend/replication/walreceiver.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <WaitXLogInsertionsToFinish>:  postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <UpdateMinRecoveryPoint.part.10>: postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <tuplesort_performsort>:       postgres/src/backend/utils/sort/tuplesort.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <tuplesort_begin_common>:      postgres/src/backend/utils/sort/tuplesort.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <table_block_parallelscan_startblock_init>: postgres/src/backend/access/table/tableam.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SyncRepInitConfig>:           postgres/src/backend/replication/syncrep.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SyncRepGetCandidateStandbys>: postgres/src/backend/replication/syncrep.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <StrategySyncStart>:           postgres/src/backend/storage/buffer/freelist.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <StrategyNotifyBgWriter>:      postgres/src/backend/storage/buffer/freelist.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <StrategyFreeBuffer>:          postgres/src/backend/storage/buffer/freelist.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SnapshotTooOldMagicForTest>:  postgres/src/backend/utils/time/snapmgr.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <s_lock>:                      postgres/src/backend/storage/lmgr/s_lock.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SIInsertDataEntries>:         postgres/src/backend/storage/ipc/sinvaladt.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SIGetDataEntries>:            postgres/src/backend/storage/ipc/sinvaladt.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ShutdownWalRcv>:              postgres/src/backend/replication/walreceiverfuncs.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <shm_toc_insert>:              postgres/src/backend/storage/ipc/shm_toc.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <shm_toc_freespace>:           postgres/src/backend/storage/ipc/shm_toc.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <shm_toc_allocate>:            postgres/src/backend/storage/ipc/shm_toc.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <shm_mq_set_sender>:           postgres/src/backend/storage/ipc/shm_mq.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <shm_mq_set_receiver>:         postgres/src/backend/storage/ipc/shm_mq.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <shm_mq_sendv>:                postgres/src/backend/storage/ipc/shm_mq.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <shm_mq_get_sender>:           postgres/src/backend/storage/ipc/shm_mq.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <shm_mq_get_receiver>:         postgres/src/backend/storage/ipc/shm_mq.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <shm_mq_detach_internal>:      postgres/src/backend/storage/ipc/shm_mq.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ShmemAllocRaw>:               postgres/src/backend/storage/ipc/shmem.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SharedFileSetOnDetach>:       postgres/src/backend/storage/file/sharedfileset.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SharedFileSetAttach>:         postgres/src/backend/storage/file/sharedfileset.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SetWalWriterSleeping>:        postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SetRecoveryPause>:            postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SetPromoteIsTriggered>:       postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SetOldSnapshotThresholdTimestamp>: postgres/src/backend/utils/time/snapmgr.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <RequestXLogStreaming>:        postgres/src/backend/replication/walreceiverfuncs.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotsDropDBSlots>: postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotsCountDBSlots>: postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotsComputeRequiredXmin>: postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotsComputeRequiredLSN>: postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotsComputeLogicalRestartLSN>: postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotPersist>:      postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotMarkDirty>:    postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotDropPtr>:      postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotCreate>:       postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotCleanup>:      postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotAcquireInternal>: postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <RemoveOldXlogFiles>:          postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <RemoveLocalLock>:             postgres/src/backend/storage/lmgr/lock.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <RecoveryRestartPoint>:        postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <RecoveryIsPaused>:            postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReadRecord>:                  postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <PublishStartupProcessInformation>: postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <PromoteIsTriggered>:          postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ProcSendSignal>:              postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ProcessWalSndrMessage>:       postgres/src/backend/replication/walreceiver.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <PhysicalReplicationSlotNewXmin>: postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <pg_stat_get_wal_senders>:     postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <pg_stat_get_wal_receiver>:    postgres/src/backend/replication/walreceiver.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <pg_replication_slot_advance>: postgres/src/backend/replication/slotfuncs.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ParallelWorkerReportLastRecEnd>: postgres/src/backend/access/transam/parallel.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <MaintainOldSnapshotTimeMapping>: postgres/src/backend/utils/time/snapmgr.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <LWLockNewTrancheId>:          postgres/src/backend/storage/lmgr/lwlock.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <LogicalIncreaseXminForSlot>:  postgres/src/backend/replication/logical/logical.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <LogicalIncreaseRestartDecodingForSlot>: postgres/src/backend/replication/logical/logical.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <lock_twophase_recover>:       postgres/src/backend/storage/lmgr/lock.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <LockRefindAndRelease>:        postgres/src/backend/storage/lmgr/lock.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <LockAcquireExtended>:         postgres/src/backend/storage/lmgr/lock.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <KnownAssignedXidsSearch>:     postgres/src/backend/storage/ipc/procarray.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <KnownAssignedXidsGetAndSetXmin>: postgres/src/backend/storage/ipc/procarray.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <KnownAssignedXidsAdd>:        postgres/src/backend/storage/ipc/procarray.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <KeepLogSeg>:                  postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <InitWalSender>:               postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <InitProcess>:                 postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <InitAuxiliaryProcess>:        postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <HotStandbyActive>:            postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <HaveNFreeProcs>:              postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetXLogWriteRecPtr>:          postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetXLogReplayRecPtr>:         postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetXLogInsertRecPtr>:         postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetWalRcvFlushRecPtr>:        postgres/src/backend/replication/walreceiverfuncs.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetSnapshotCurrentTimestamp>: postgres/src/backend/utils/time/snapmgr.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetReplicationTransferLatency>: postgres/src/backend/replication/walreceiverfuncs.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetReplicationApplyDelay>:    postgres/src/backend/replication/walreceiverfuncs.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetRedoRecPtr>:               postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetRecoveryState>:            postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetOldSnapshotThresholdTimestamp>: postgres/src/backend/utils/time/snapmgr.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetLatestXTime>:              postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetInsertRecPtr>:             postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetFlushRecPtr>:              postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetFakeLSNForUnloggedRel>:    postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetCurrentChunkReplayStartTime>: postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <FirstCallSinceLastCheckpoint>: postgres/src/backend/postmaster/checkpointer.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <element_alloc>:               postgres/src/backend/utils/hash/dynahash.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <do_pg_stop_backup>:           postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <do_pg_start_backup>:          postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <DecodingContextFindStartpoint>: postgres/src/backend/replication/logical/logical.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ConditionVariableTimedSleep>: postgres/src/backend/storage/lmgr/condition_variable.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ConditionVariableSignal>:     postgres/src/backend/storage/lmgr/condition_variable.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ConditionVariablePrepareToSleep>: postgres/src/backend/storage/lmgr/condition_variable.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ConditionVariableCancelSleep>: postgres/src/backend/storage/lmgr/condition_variable.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ComputeXidHorizons>:          postgres/src/backend/storage/ipc/procarray.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <CheckXLogRemoved>:            postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <CheckRecoveryConsistency.part.11>: postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <_bt_parallel_seize>:          postgres/src/backend/access/nbtree/nbtree.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <_bt_parallel_scan_and_sort>:  postgres/src/backend/access/nbtree/nbtsort.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <btparallelrescan>:            postgres/src/backend/access/nbtree/nbtree.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <_bt_parallel_release>:        postgres/src/backend/access/nbtree/nbtree.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <_bt_parallel_done>:           postgres/src/backend/access/nbtree/nbtree.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <_bt_parallel_advance_array_keys>: postgres/src/backend/access/nbtree/nbtree.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <btbuild>:                     postgres/src/backend/access/nbtree/nbtsort.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <BarrierParticipants>:         postgres/src/backend/storage/ipc/barrier.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <BarrierDetach>:               postgres/src/backend/storage/ipc/barrier.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <BarrierAttach>:               postgres/src/backend/storage/ipc/barrier.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <BarrierArriveAndDetach>:      postgres/src/backend/storage/ipc/barrier.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <BarrierArriveAndDetachExceptLast>: postgres/src/backend/storage/ipc/barrier.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <AuxiliaryProcKill>:           postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <AdvanceXLInsertBuffer>:       postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <AbortStrongLockAcquire>:      postgres/src/backend/storage/lmgr/lock.o
      1 (LSE instructions)          <__aarch64_ldset4_acq_rel> <ProcessProcSignalBarrier>:    postgres/src/backend/storage/ipc/procsignal.o
      1 (LSE instructions)          <__aarch64_ldset4_acq_rel> <LWLockWaitForVar>:            postgres/src/backend/storage/lmgr/lwlock.o
      1 (LSE instructions)          <__aarch64_ldset4_acq_rel> <LWLockQueueSelf>:             postgres/src/backend/storage/lmgr/lwlock.o
      1 (LSE instructions)          <__aarch64_ldset4_acq_rel> <LWLockDequeueSelf>:           postgres/src/backend/storage/lmgr/lwlock.o
      1 (LSE instructions)          <__aarch64_ldset4_acq_rel> <LWLockAcquire>:               postgres/src/backend/storage/lmgr/lwlock.o
      1 (LSE instructions)          <__aarch64_ldset4_acq_rel> <LockBufHdr>:                  postgres/src/backend/storage/buffer/bufmgr.o
      1 (LSE instructions)          <__aarch64_ldset4_acq_rel> <EmitProcSignalBarrier>:       postgres/src/backend/storage/ipc/procsignal.o
      1 (LSE instructions)          <__aarch64_ldclr4_acq_rel> <LWLockReleaseClearVar>:       postgres/src/backend/storage/lmgr/lwlock.o
      1 (LSE instructions)          <__aarch64_ldadd8_acq_rel> <table_block_parallelscan_nextpage>: postgres/src/backend/access/table/tableam.o
      1 (LSE instructions)          <__aarch64_ldadd8_acq_rel> <EmitProcSignalBarrier>:       postgres/src/backend/storage/ipc/procsignal.o
      1 (LSE instructions)          <__aarch64_ldadd4_acq_rel> <find_or_make_matching_shared_tupledesc>: postgres/src/backend/utils/cache/typcache.o
      1 (LSE instructions)          <__aarch64_ldadd4_acq_rel> <ExecParallelHashJoin>:        postgres/src/backend/executor/nodeHashjoin.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <table_block_parallelscan_reinitialize>: postgres/src/backend/access/table/tableam.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <ProcWakeup>:                  postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <ProcSleep>:                   postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <pg_stat_get_wal_receiver>:    postgres/src/backend/replication/walreceiver.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <InitProcess>:                 postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <InitAuxiliaryProcess>:        postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <GetWalRcvWriteRecPtr>:        postgres/src/backend/replication/walreceiverfuncs.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <GetLockStatusData>:           postgres/src/backend/storage/lmgr/lock.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <ExecParallelScanHashBucket>:  postgres/src/backend/executor/nodeHash.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <CleanupProcSignalState>:      postgres/src/backend/storage/ipc/procsignal.o
      1 (LSE instructions)          <__aarch64_cas4_acq_rel> <UnpinBuffer.constprop.11>:    postgres/src/backend/storage/buffer/bufmgr.o
      1 (LSE instructions)          <__aarch64_cas4_acq_rel> <StrategySyncStart>:           postgres/src/backend/storage/buffer/freelist.o
      1 (LSE instructions)          <__aarch64_cas4_acq_rel> <StrategyGetBuffer>:           postgres/src/backend/storage/buffer/freelist.o
      1 (LSE instructions)          <__aarch64_cas4_acq_rel> <ProcessProcSignalBarrier>:    postgres/src/backend/storage/ipc/procsignal.o
      1 (LSE instructions)          <__aarch64_cas4_acq_rel> <PinBuffer>:                   postgres/src/backend/storage/buffer/bufmgr.o
      1 (LSE instructions)          <__aarch64_cas4_acq_rel> <MarkBufferDirty>:             postgres/src/backend/storage/buffer/bufmgr.o
      1 (LSE instructions)          <__aarch64_cas4_acq_rel> <LWLockRelease>:               postgres/src/backend/storage/lmgr/lwlock.o
      1 (LSE instructions)          <__aarch64_cas4_acq_rel> <LWLockConditionalAcquire>:    postgres/src/backend/storage/lmgr/lwlock.o

So, this confirms that it was compiled with -march=armv8-a and -moutline-atomics (which is the default in GCC >= 10 and also in the GCC 7 packaged in Amazon Linux 2). The calls to the outline-atomics helpers (__aarch64_*), which use LSE (Large System Extensions) instructions when the CPU supports them, are there, and we can see where the atomic operations are used: WAL and buffer lightweight locks that protect access to shared memory.
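
The listing above is long. To see at a glance which helpers dominate, the counts can be summed per helper. This is a minimal sketch, assuming the output of the previous command was saved to a file and that the helper name is the fourth whitespace-separated field, as in the listing. The function name summarize_helpers and the file name lse.txt are hypothetical.

```shell
# Sum the per-function counts for each outline-atomics helper.
# Input: lines in the format printed above (count, "(LSE instructions)",
# helper, calling function, object file), from files or from stdin.
summarize_helpers() {
  awk '$2 == "(LSE" { calls[$4] += $1 }
       END { for (h in calls) print calls[h], h }' "$@" | sort -rn
}
```

For example, summarize_helpers lse.txt prints one line per helper, with the highest total first.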

for i in /usr/local/pgsql/bin/postgres $(find postgres/src/backend -name "*.o") ; do objdump -d "$i" | awk '/:$/{w=$2}/aarch64_(cas|casp|swp|ldadd|stadd|ldclr|stclr|ldeor|steor|ldset|stset|ldsmax|stsmax|ldsmin|stsmin|ldumax|stumax|ldumin|stumin)/{printf "%-27s %-40s %-40s %-60s\n","(LSE instructions)",$NF,w,f}/\t(ldxr|ldaxr|stxr|stlxr)\t/{printf "%-27s %-40s %-40s %-60s\n","(load and store exclusives)",$3,w,f}' f="$i" ; done | sort | uniq -c | sort -rn

      1 (load and store exclusives) stxr                                     <__aarch64_swp4_acq>:                    /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) stlxr                                    <__aarch64_ldset4_acq_rel>:              /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) stlxr                                    <__aarch64_ldclr4_acq_rel>:              /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) stlxr                                    <__aarch64_ldadd8_acq_rel>:              /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) stlxr                                    <__aarch64_ldadd4_acq_rel>:              /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) stlxr                                    <__aarch64_cas8_acq_rel>:                /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) stlxr                                    <__aarch64_cas4_acq_rel>:                /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) ldaxr                                    <__aarch64_swp4_acq>:                    /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) ldaxr                                    <__aarch64_ldset4_acq_rel>:              /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) ldaxr                                    <__aarch64_ldclr4_acq_rel>:              /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) ldaxr                                    <__aarch64_ldadd8_acq_rel>:              /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) ldaxr                                    <__aarch64_ldadd4_acq_rel>:              /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) ldaxr                                    <__aarch64_cas8_acq_rel>:                /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) ldaxr                                    <__aarch64_cas4_acq_rel>:                /usr/local/pgsql/bin/postgres

This confirms that the outlined helpers in the PostgreSQL binary also contain load and store exclusives, the fallback for CPUs without LSE, so that the same binary can run on both Graviton and Graviton2.


[ec2-user@ip-172-31-11-116 ~]$ nm /usr/local/pgsql/bin/postgres | grep -E "aarch64(_have_lse_atomics)?"

00000000008fb460 t __aarch64_cas4_acq_rel
00000000008fb490 t __aarch64_cas8_acq_rel
0000000000bbe640 b __aarch64_have_lse_atomics
00000000008fb4f0 t __aarch64_ldadd4_acq_rel
00000000008fb580 t __aarch64_ldadd8_acq_rel
00000000008fb520 t __aarch64_ldclr4_acq_rel
00000000008fb550 t __aarch64_ldset4_acq_rel
00000000008fb4c0 t __aarch64_swp4_acq

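This is the run-time detection: the __aarch64_have_lse_atomics flag lets each helper choose between the LSE instruction and the load/store-exclusive loop. As it was compiled for ARM v8 with atomics outlined, the same binary can run on v8 or >= v8.1.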
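To know which of the two paths a given host would take, one option is to look at the CPU features that Linux reports. On aarch64, the "Features" line of /proc/cpuinfo lists "atomics" when LSE is available. Here is a minimal sketch; the function name has_lse is my own, and the optional file argument only exists to make it easy to try on a saved copy of cpuinfo:

```shell
# Hypothetical helper: report whether a cpuinfo-style file advertises LSE.
# On Linux/aarch64 the "Features" line contains "atomics" when LSE is present.
has_lse() {
  cpuinfo="${1:-/proc/cpuinfo}"
  # Look at the first "Features" line only and match "atomics" as a whole word.
  if grep -m1 '^Features' "$cpuinfo" | grep -qw atomics; then
    echo "LSE available"
  else
    echo "no LSE"
  fi
}
```

Calling has_lse without argument reads /proc/cpuinfo of the current host.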


[ec2-user@ip-172-31-11-116 ~]$ gcc --version
gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

This is GCC 7, but on Amazon Linux 2 it has been patched to enable -moutline-atomics by default.

Install latest version of GCC (version 11 experimental)

Here is how I compiled the latest GCC available:


gcc --version
sudo yum -y install bzip2 git gcc gcc-c++ gmp-devel mpfr-devel libmpc-devel make flex bison
git clone https://github.com/gcc-mirror/gcc.git
cd gcc
make distclean
./configure --enable-languages=c,c++
make
sudo make install

This basically gets the latest GCC from source, compiles and installs it (please remember this is a lab - use stable versions elsewhere).

[ec2-user@ip-172-31-38-254 ~]$ gcc --version
gcc (GCC) 11.0.1 20210309 (experimental)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Here we are: gcc 11.0.1 20210309 (experimental)

PGIO LIOPS

I'm running the same PGIO as in the previous post:


Date: Wed Mar 10 14:39:38 UTC 2021
Database connect string: "pgio".
Shared buffers: 8500MB.
Testing 4 schemas with 1 thread(s) accessing 1024M (131072 blocks) of each schema.
Running iostat, vmstat and mpstat on current host--in background.
Launching sessions. 4 schema(s) will be accessed by 1 thread(s) each.
pg_stat_database stats:
          datname| blks_hit| blks_read|tup_returned|tup_fetched|tup_updated
BEFORE:  pgio    | 38262338086 |    562443 |  37644815538 | 37635763756 |          24
AFTER:   pgio    | 49691750429 |    562449 |  48890461241 | 48878858651 |          49
DBNAME:  pgio. 4 schemas, 1 threads(each). Run time: 3600 seconds. RIOPS >793709<

This is a little higher than what I had: 793709 LIOPS / CPU, about 1.7% more than the 780651 I had with GCC 7, but still about 11% lower than the 896280 I had on x86.
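The number reported by PGIO can be cross-checked from the pg_stat_database lines: blks_read barely moved, so nearly all logical reads are buffer hits, and the blks_hit delta divided by the run time and the number of sessions gives the rate per thread. A small sketch of that arithmetic (the function name liops_per_thread is hypothetical, and it uses integer division):

```shell
# Hypothetical helper: per-thread logical reads per second, computed from
# the blks_hit counter before and after the run (integer division).
liops_per_thread() {
  before=$1; after=$2; seconds=$3; threads=$4
  echo $(( (after - before) / seconds / threads ))
}

# Values copied from the PGIO output above: 3600 seconds, 4 sessions.
liops_per_thread 38262338086 49691750429 3600 4   # prints 793709
```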

Of course, there can be more optimisations, as mentioned in https://github.com/aws/aws-graviton-getting-started/blob/master/c-c++.md
I'll recompile with the recommended flags:

(
cd postgres
CFLAGS="-march=armv8.2-a+fp16+rcpc+dotprod+crypto -mtune=neoverse-n1 -fsigned-char" ./configure
make clean
make
make install
)

It didn't make any difference in the PGIO run. Of course, this may change with a read-write workload (more spinlocks) with checksums enabled.
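With -march=armv8.2-a, LSE is part of the target architecture, so GCC is expected to emit the atomic instructions inline instead of calling the __aarch64_* helpers. One way to verify what a build relies on is to look for the run-time detection symbol in the nm output. This is only a sketch: the function name atomics_mode is hypothetical, and it only checks for the presence of that one symbol:

```shell
# Hypothetical helper: classify a binary from its symbol list (nm output on stdin).
# The __aarch64_have_lse_atomics symbol is what the outlined helpers test at run time.
atomics_mode() {
  if grep -q '__aarch64_have_lse_atomics'; then
    echo "outline atomics (run-time detection)"
  else
    echo "no outline helpers"
  fi
}
```

It can be used as: nm /usr/local/pgsql/bin/postgres | atomics_mode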

Note that I compiled with the default (empty) CFLAGS, so gcc was called with -march=armv8-a+crc (and -moutline-atomics is the default). I was therefore already in the same situation, with run-time detection, because the GCC >= 10 behaviour has been backported by Amazon to the GCC 7 in Amazon Linux 2. This was not clear to me initially (I got this clarified here).

By the way, Aurora on Graviton2 is still compiled with GCC 7.4.

Update 15-MAY-2021: I have rephrased a few things here which were not clear (even for myself) but I'll write more on PostgreSQL on ARM, and on benchmarks in general. http://blog.pachot.net should send to the right place (or @FranckPachot twitter of course)
