DEV Community

Johny
Johny

Posted on

Dynamic Allocation Issues On Spark 2.4.8 (Possible Issue with External Shuffle Service?)

Hey Team,

I am having some issue with dynamic Allocation for spark 2.4.8. I have setup a cluster using your clemlab distribution (https://www.clemlab.com/) . Spark jobs are now running fine. The issue is when I try to use dynamicAllocation options. I am thinking the problems could be due to External Shuffle Service but I feel like it should be setup properly from what I have.

From the resource manager logs we can see that the container goes from ACQUIRED to RELEASED resources which is weird. It does not go to RUNNING state.

I am out of ideas at this point how to make the dynamic Allocation work. So I am turning to you in hope that you may have some insight in the matter.

There are no issues if I do not use dynamic Allocation and spark jobs work just fine but I really want to make dynamic allocation work.

Thank you for the assistance and apologies for the long message but just wanted to supply all details possible.

Here are setting I have in ambari related to it:

Yarn:

Image description

Checking the directories here I can find necessary jar on all nodemanager hosts in the right directory:
/usr/odp/1.2.2.0-138/spark2/yarn/spark-2.4.8.1.2.2.0-138-yarn-shuffle.jar
/usr/odp/current/spark2-cient/yarn/spark-2.4.8.1.2.2.0-138-yarn-shuffle.jar ( I believe there is symbolic link to the above jar)

Spark2:

Image description

 In the spark log I can see this message continuously spamming:

24/10/13 16:38:16 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:38:31 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:38:46 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:39:01 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:39:16 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:39:31 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Top comments (1)

Collapse
 
mr_boom_boom profile image
Johny

I found the issue in one of the container logs:
Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist
It looks like when using dynamic allocation it was looking for spark_shuffle while I had spark2_shuffle.
Also need to adjust these settings in ambari:

yarn.nodemanager.aux-services: spark_shuffle (remove spark2_shuffle if it is present)
yarn.nodemanager.aux-services.spark_shuffle.classpath: {{stack_root}}/current/spark2-client/yarn/* (point to correct jar location

yarn.nodemanager.aux-services.spark_shuffle.class: org.apache.spark.network.yarn.YarnShuffleService
And that fixed the issue, now I can use dynamic allocation.