You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have searched in the issues and found no similar issues.
What would you like to be improved?
Background
In our internal uniffle cluster, when using the storage type of MEMORY_LOCALFILE, the partial single huge partitions make disk reach high watermark, which make whole shuffle-server out-of-service due to full memory occupation.
Introduce partition size based strategy to flush single huge partition data to HDFS
Currently, I prefer 3th solution. In which, we could use cold storage(HDFS) when cumulative size of a particular partition is above a specific threshold(like 50g?). Actually, if exclude the partial huge partitions, the disk free ratio of whole shuffle-server is 20%-30%.
Reasons
We could avoid hotspot of a single shuffle server, which could use HDFS to distribute pressure.
Especially useful when disk spaces of shuffle servers are limited, which cannot hold a super large shuffle partition.
Compared with 1th solution, it is more restrained for HDFS use and will not cause greater pressure. (Actually we don't want to accept much shuffle-data to HDFS with 3 replicas)
Compared with 2th solution, it is more effective and have better performance.
Only downgrade storage of partial huge partition jobs to HDFS, which is effective, especially when the HDFS cluster performance is not good.
Final solution
The solution of handling huge partitions is to make it flush to HDFS directly and limit memory usage, all subtasks are as follows.
Code of Conduct
Search before asking
What would you like to be improved?
Background
In our internal uniffle cluster, when using the storage type of
MEMORY_LOCALFILE,the partial single huge partitions make disk reach high watermark, which make whole shuffle-server out-of-service due to full memory occupation.Possible solutions
Using the storage type ofMEMORY_LOCALFILE_HDFSwith the fallback strategy ofLocalStorageManagerFallbackStrategy. [ISSUE-163][FEATURE] Write to hdfs when local disk can't be write #235Introduce the new strategy of dynamic disk selection [Improvement] Optimize local disk selection strategy #373Introduce partition size based strategy to flush single huge partition data to HDFSCurrently, I prefer3thsolution. In which, we could use cold storage(HDFS) when cumulative size of a particular partition is above a specific threshold(like 50g?). Actually, if exclude the partial huge partitions, the disk free ratio of whole shuffle-server is 20%-30%.Reasons
We could avoid hotspot of a single shuffle server, which could use HDFS to distribute pressure.Especially useful when disk spaces of shuffle servers are limited, which cannot hold a super large shuffle partition.Compared with1thsolution, it is more restrained for HDFS use and will not cause greater pressure. (Actually we don't want to accept much shuffle-data to HDFS with 3 replicas)Compared with 2th solution, it is more effective and have better performance.Only downgrade storage of partial huge partition jobs to HDFS, which is effective, especially when the HDFS cluster performance is not good.Final solution
The solution of handling huge partitions is to make it flush to HDFS directly and limit memory usage, all subtasks are as follows.
How should we improve?
No response
Are you willing to submit PR?