Differences

habrok:additional_information:course_material:advanced_exercises_solutions [2023/09/27 07:33] – [Exercise 1.5: Handle spaces in file names] pedro
habrok:additional_information:course_material:advanced_exercises_solutions [2024/10/28 14:40] (current) – [Exercise 0] Changed to GitLab link aurel
Line 20: Line 20:
 ===== Exercise 0 ===== ===== Exercise 0 =====
  
-The files needed to complete these exercises are on [[https://github.com/rug-cit-hpc/cluster_course.git|GitHub]]. Get a copy of the exercise files by running:+The files needed to complete these exercises are on [[https://gitrepo.service.rug.nl/cit-hpc/habrok/cluster_course.git|GitLab]]. Get a copy of the exercise files by running:
  
 <code> <code>
-git clone https://github.com/rug-cit-hpc/cluster_course.git+git clone https://gitrepo.service.rug.nl/cit-hpc/habrok/cluster_course.git
 </code> </code>
  
Line 331: Line 331:
 **Run these commands on the imagefile to see what they do.** **Run these commands on the imagefile to see what they do.**
  
-As you have seen these commands result in numbers that we could use in our script. In order to store a number into a variable we can use the ''$( )'' operator. E.g.:+As you have seen these commands result in numbers that we could use in our script. In order to store the output into a variable we can use the ''$( )'' operator. E.g.:
 <code> <code>
 myvar=$( command ) myvar=$( command )
Line 524: Line 524:
  
 # Check if filename has been supplied # Check if filename has been supplied
-if [ -z $1 ]+if [ -z "$1" ]
 then then
     echo "ERROR: No input specified"     echo "ERROR: No input specified"
Line 531: Line 531:
  
 # Check if file exists # Check if file exists
-if [ -e $1 ]+if [ -e "$1" ]
 then then
     echo "Processing image: " $1     echo "Processing image: " $1
Line 540: Line 540:
     module load ImageMagick/7.1.0-53-GCCcore-12.2.0     module load ImageMagick/7.1.0-53-GCCcore-12.2.0
     # Load the compilers     # Load the compilers
-    module load foss/2022b     +    module load foss/2022b
-    # Compile the program +
-    make+
  
     # Get the directory in which the file is stored     # Get the directory in which the file is stored
-    dirname=$(dirname $1) +    dirname=$(dirname "$1")
-    filename=$(basename $1)+    filename=$(basename "$1")
          
     # Determine width and height of the image     # Determine width and height of the image
-    width=$(identify -format "%w" $dirname/$filename) +    width=$(identify -format "%w" "$dirname/$filename")
-    height=$(identify -format "%h" $dirname/$filename)+    height=$(identify -format "%h" "$dirname/$filename")
     echo "Width: " $width     echo "Width: " $width
     echo "Height: " $height     echo "Height: " $height
 +    
 +    # Compile the program
 +    make
  
     # Convert the jpg file to the rgb format for easy processing     # Convert the jpg file to the rgb format for easy processing
-    convert $dirname/$filename $filename.rgb+    convert "$dirname/$filename" "$filename.rgb"
     # Run the convolution filter program on the image     # Run the convolution filter program on the image
-    ./mpi_omp_conv $filename.rgb $width $height 1 rgb+    ./mpi_omp_conv "$filename.rgb" $width $height 1 rgb
     # Convert the resulting file back to jpg format     # Convert the resulting file back to jpg format
-    convert -size ${width}x${height}  -depth 8 conv_$filename.rgb conv_$filename+    convert -size ${width}x${height}  -depth 8 "conv_$filename.rgb" "conv_$filename"
          
     # Remove the intermediate files     # Remove the intermediate files
-    rm $filename.rgb conv_$filename.rgb+    rm "$filename.rgb" "conv_$filename.rgb"
 else else
     echo "ERROR: File $1 does not exist"     echo "ERROR: File $1 does not exist"
Line 585: Line 586:
 module purge module purge
 # Load the compilers # Load the compilers
-module load foss/2020a+module load foss/2022b
 # Compile the program # Compile the program
 make make
Line 624: Line 625:
 module purge module purge
 # Load the compilers # Load the compilers
-module load foss/2020a+module load foss/2022b
 # Compile the program # Compile the program
 make make
Line 648: Line 649:
 ** Since we are going to submit all work from a single job script, we need to make sure that the executable is in place, before we can run the script. ** This can be achieved by running the following commands: ** Since we are going to submit all work from a single job script, we need to make sure that the executable is in place, before we can run the script. ** This can be achieved by running the following commands:
 <code> <code>
-module load foss/2020a+module load foss/2022b
 make make
 </code> </code>
Line 654: Line 655:
 First we need a list of all images to be processed, and this time we are going to store this list in a text file instead of in a variable. First we need a list of all images to be processed, and this time we are going to store this list in a text file instead of in a variable.
  
-**Run a command (you can use ''ls'' for this) that lists all images in ''../images'', and redirect (using ''>'') the output to a file, e.g. ''images.txt''.** Make sure that each image gets printed on a separate line, check this by inspecting the file with cat or less.+**Run a command (you can use ''ls'' for this) that lists all images in ''../images'', and redirect (using ''>'') the output to a file, e.g. ''images.txt''.** Make sure that each image gets printed on a separate line, check this by inspecting the file with ''cat'' or ''less''.
  
 Now open the script ''jobscript.sh''. Instead of passing the filename of an image, we are going to pass the filename of the file that you just created and that contains the filenames of all images to be processed as an argument to our jobscript. Now open the script ''jobscript.sh''. Instead of passing the filename of an image, we are going to pass the filename of the file that you just created and that contains the filenames of all images to be processed as an argument to our jobscript.
Line 667: Line 668:
     echo "Processing image: " $image     echo "Processing image: " $image
 </file> </file>
-**Insert the right command between the parentheses: it needs to print (use ''cat'') the file and take the n-th line, where n is ''$SLURM_ARRAY_TASK_ID'', using the head and tail commands. See the example in the slides to find out how you can get the n-th line of a file.**+**Insert the right command between the parentheses: it needs to print (use ''cat'') the file and take the n-th line, where n is ''$SLURM_ARRAY_TASK_ID'', using the ''head'' and ''tail'' commands. See the example in the slides to find out how you can get the n-th line of a file.**
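
The ''head'' and ''tail'' combination mentioned in this step can be sketched as follows. This is a minimal illustration rather than the course's reference solution: the helper name ''nth_line'' and the sample file names are hypothetical, and the line number is passed in explicitly where a job array would use ''$SLURM_ARRAY_TASK_ID''.

```shell
#!/bin/bash
# Print line number $2 of file $1 (hypothetical helper, for illustration only).
nth_line() {
    # head keeps the first n lines; tail then keeps only the last of those
    head -n "$2" "$1" | tail -n 1
}

# Build a small sample list; these file names are made up for the demo
listfile=$(mktemp)
printf '%s\n' "first image.jpg" "second.jpg" "third.jpg" > "$listfile"

# In a job array, the second argument would be ${SLURM_ARRAY_TASK_ID}
image=$(nth_line "$listfile" 2)
echo "Processing image: $image"
```

If the requested line number is larger than the number of lines in the file, this pipeline silently returns the last line, so the array range should not extend past the length of the list.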
  
 Finally, to make this a job array, we have to set a range that defines how many times this job is going to be run. Since we use ''$SLURM_ARRAY_TASK_ID'' to refer to a line number, we want the range to go from 1 to N, where N is the number of images. But since we are still testing our script, let's start with a smaller range: Finally, to make this a job array, we have to set a range that defines how many times this job is going to be run. Since we use ''$SLURM_ARRAY_TASK_ID'' to refer to a line number, we want the range to go from 1 to N, where N is the number of images. But since we are still testing our script, let's start with a smaller range:
Line 691: Line 692:
 #SBATCH --mem=4GB #SBATCH --mem=4GB
 #SBATCH --time=00:10:00 #SBATCH --time=00:10:00
-#SBATCH --partition=short+#SBATCH --partition=regular
 #SBATCH --job-name=Edge_Detection #SBATCH --job-name=Edge_Detection
 #SBATCH --array=1-31 #SBATCH --array=1-31
Line 708: Line 709:
     image=$(cat "$1" | head -n ${SLURM_ARRAY_TASK_ID} | tail -n 1)     image=$(cat "$1" | head -n ${SLURM_ARRAY_TASK_ID} | tail -n 1)
     echo "Processing image: " "$image"     echo "Processing image: " "$image"
 +
 +    # Clean up the module environment
 +    module purge
 +    # Load the conversion and identification tools
 +    module load ImageMagick/7.1.0-53-GCCcore-12.2.0
  
     # Get the directory in which the file is stored     # Get the directory in which the file is stored
Line 719: Line 725:
     echo "Height: " $height     echo "Height: " $height
          
-    # Clean up the module environment 
-    module purge 
-    # Load the compilers 
-    module load foss/2020a 
- 
     # Convert the jpg file to the rgb format for easy processing     # Convert the jpg file to the rgb format for easy processing
     convert "$dirname/$filename" "$filename.rgb"     convert "$dirname/$filename" "$filename.rgb"
Line 757: Line 758:
 You can study the resulting image using the command: You can study the resulting image using the command:
 <code> <code>
 +module load ImageMagick/7.1.0-53-GCCcore-12.2.0
 display -resize 30% conv_Microcrystals.jpg display -resize 30% conv_Microcrystals.jpg
 </code> </code>
Line 769: Line 771:
 #SBATCH --mem=4GB #SBATCH --mem=4GB
 #SBATCH --time=00:10:00 #SBATCH --time=00:10:00
-#SBATCH --partition=short+#SBATCH --partition=regular
 #SBATCH --job-name=Blurring #SBATCH --job-name=Blurring
 #SBATCH --output=singlecpu.out #SBATCH --output=singlecpu.out
Line 776: Line 778:
 module purge module purge
 # Load the compilers # Load the compilers
-module load foss/2020a+module load foss/2022b 
 +# Load the conversion tool 
 +module load ImageMagick/7.1.0-53-GCCcore-12.2.0
 # Compile the program # Compile the program
 make make
Line 803: Line 807:
  
 After performing this exercise, you should obtain something like the following: After performing this exercise, you should obtain something like the following:
-{{:peregrine:additional_information:course_material:openmp_times.png?nolink |}}+{{:habrok:additional_information:course_material:openmp_times.png?nolink |}}
  
 The ''Ideal Performance'' shows the case where the scaling is perfect. The work is fully parallelizable, and the walltime is halved with doubling the number of CPUs. The real case is not as efficient: the ''CPU Time'' is consistently larger than the ''Ideal Performance'' suggesting that there is some inefficiency in parallelization; furthermore, the ''Walltime'' is somewhat larger still, which means that some overhead is introduced by adding additional CPUs to the computation. The ''Ideal Performance'' shows the case where the scaling is perfect. The work is fully parallelizable, and the walltime is halved with doubling the number of CPUs. The real case is not as efficient: the ''CPU Time'' is consistently larger than the ''Ideal Performance'' suggesting that there is some inefficiency in parallelization; furthermore, the ''Walltime'' is somewhat larger still, which means that some overhead is introduced by adding additional CPUs to the computation.
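
To make the comparison concrete: the ideal walltime on N CPUs is the single-CPU walltime divided by N, and the parallel efficiency is the ideal walltime divided by the measured one. The sketch below assumes hypothetical timings; the numbers are placeholders, not measurements from this course, and should be replaced with your own.

```shell
#!/bin/bash
# Hypothetical measurements, one "ncpus walltime_seconds" pair per line.
# The first line must be the single-CPU run; replace the values with your own.
timings="1 80
2 44
4 25
8 16"

# Use the single-CPU walltime as the baseline for ideal scaling
report=$(echo "$timings" | awk '
    NR == 1 { t1 = $2 }
    {
        ideal = t1 / $1            # perfect scaling: walltime halves when CPUs double
        efficiency = ideal / $2    # 1.00 means perfect scaling
        printf "%d CPUs: ideal %.1f s, measured %.1f s, efficiency %.2f\n", $1, ideal, $2, efficiency
    }')
echo "$report"
```

An efficiency that drops as CPUs are added reflects the parallelization overhead described above.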
Line 838: Line 842:
  
 **Run the blurring app with 2, 4, 8, and 16 MPI tasks, each using one core and running on a separate node.** Make note of the runtimes, as well as the overall wallclock time. How does this differ from the previous exercise? **Run the blurring app with 2, 4, 8, and 16 MPI tasks, each using one core and running on a separate node.** Make note of the runtimes, as well as the overall wallclock time. How does this differ from the previous exercise?
 +
 +You can try to resubmit the job with 4 nodes to a ''parallel'' partition in which the nodes have a faster low-latency interconnect. Does this make a difference? Note that using more nodes will result in a long waiting time as there are only 24 nodes in this partition.
 +
 +The "low-latency" means that the time it takes for the first byte of a message to reach the other node is very small. It only takes 1.2 μs on our 100 Gb/s Omni-Path network, whereas on our 25 Gb/s ethernet the latency is 19.7 μs.
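
As a rough illustration of why latency matters, the time to deliver a message can be approximated as latency plus message size divided by bandwidth. This simple model is an assumption made for illustration (it ignores protocol overhead and congestion). The latency and bandwidth figures are the ones quoted above, and the message sizes are arbitrary examples.

```shell
#!/bin/bash
# Estimate transfer time in microseconds as latency + size / bandwidth.
# Arguments: latency in microseconds, bandwidth in Gb/s, message size in bytes.
transfer_time_us() {
    awk -v lat="$1" -v gbps="$2" -v bytes="$3" 'BEGIN {
        # 1 Gb/s corresponds to 1000 bits per microsecond
        printf "%.2f\n", lat + (bytes * 8) / (gbps * 1000)
    }'
}

# Compare a tiny message with a 1 MB message on both networks
for size in 8 1000000; do
    echo "$size bytes: Omni-Path $(transfer_time_us 1.2 100 "$size") us," \
         "ethernet $(transfer_time_us 19.7 25 "$size") us"
done
```

Under this model, small messages are dominated by latency, while large messages are dominated by bandwidth, so a code that exchanges many small messages benefits most from the low-latency interconnect.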
 +
  
 <hidden Solution> <hidden Solution>
Line 846: Line 855:
 After performing this exercise, you should get something like this: After performing this exercise, you should get something like this:
  
-{{:peregrine:additional_information:course_material:mpi_times_nodes.png?nolink |}}+{{:habrok:additional_information:course_material:mpi_times_nodes.png?nolink |}}
  
 It is interesting to compare this graph with the one from exercise 2.2. The main difference is in ''Walltime'', which does not scale the same way with the number of CPUs. When all the CPUs were on the same machine, as in the previous exercise, the ''Walltime'' scaling was similar to that for ''CPU Time'' and ''Ideal Performance'', though less steep. When the CPUs are distributed over many machines, however, we see that, even though the ''CPU Time'' scales the same way as previously, and close to ''Ideal Performance'', the ''Walltime'' eventually levels off and remains constant, not decreasing with an increasing number of CPUs. This points to a fundamental limitation of MPI, which stems from the fact that memory is not shared among the CPUs, and data needs to be copied over the network between machines, which limits the scaling. It is interesting to compare this graph with the one from exercise 2.2. The main difference is in ''Walltime'', which does not scale the same way with the number of CPUs. When all the CPUs were on the same machine, as in the previous exercise, the ''Walltime'' scaling was similar to that for ''CPU Time'' and ''Ideal Performance'', though less steep. When the CPUs are distributed over many machines, however, we see that, even though the ''CPU Time'' scales the same way as previously, and close to ''Ideal Performance'', the ''Walltime'' eventually levels off and remains constant, not decreasing with an increasing number of CPUs. This points to a fundamental limitation of MPI, which stems from the fact that memory is not shared among the CPUs, and data needs to be copied over the network between machines, which limits the scaling.
Line 868: Line 877:
 So far, you've been using either multiple CPUs on a single machine (OpenMP), or on multiple machines (MPI), to parallelize and speed up the image processing. In this exercise, you will have a look at how to use GPUs to achieve parallelization. In the folder named ''2.4_gpu/'' you will find a simple jobscript which you need to modify.  So far, you've been using either multiple CPUs on a single machine (OpenMP), or on multiple machines (MPI), to parallelize and speed up the image processing. In this exercise, you will have a look at how to use GPUs to achieve parallelization. In the folder named ''2.4_gpu/'' you will find a simple jobscript which you need to modify. 
  
-We've reserved one GPU node with 2 k40 GPUs for this course. The reservation is called ''advanced_course''. **Make use of it by adding the following line to your script:**+We've reserved one GPU node with 2 V100 GPUs for this course. The reservation is called ''advanced_course''. **Make use of it by adding the following line to your script:**
 <code> <code>
 #SBATCH --reservation=advanced_course #SBATCH --reservation=advanced_course
 </code> </code>
-**Request 1 (one) GPU of type k40 on the ''gpu'' partition**. You can find out how to do this on this wiki and in the slides. +**Request 1 (one) GPU of type V100 on the ''gpu'' partition**. You can find out how to do this on this wiki and in the slides. 
  
 **Compare the times from exercise 2.2 with what you obtain now**. You might want to make the problem a bit harder by processing all the files from the ''images'' folder. **Compare the times from exercise 2.2 with what you obtain now**. You might want to make the problem a bit harder by processing all the files from the ''images'' folder.
Line 878: Line 887:
 Programming the GPU is not for the faint of heart, though OpenACC makes it relatively easy. If you read C code, **study the code and try to figure out where is the GPU used**. If you plan to use an existing application with the GPU, you needn't worry about the implementation. Programming the GPU is not for the faint of heart, though OpenACC makes it relatively easy. If you read C code, **study the code and try to figure out where is the GPU used**. If you plan to use an existing application with the GPU, you needn't worry about the implementation.
  
-<hidden Solution>''#SBATCH --gres=gpu:k40:1'' +<hidden Solution> 
- +<code> 
-''#SBATCH --partition=gpu'' +#SBATCH --gpus-per-node=v100:1 
- +#SBATCH --reservation=advanced_course 
-''#SBATCH --reservation=advanced_course''+</code>
 </hidden> </hidden>