ipyparallel
This tutorial will extend the previous one by introducing parallel computing using the ipyparallel
library to render large cosmological datasets more efficiently. We will split the computational load across multiple processes (or cores), using MPI engines to distribute the work across different nodes or processors. This allows the rendering to be done in parallel, drastically speeding up the computation.
Note: For issues setting up the cluster environment, consider checking the official ipyparallel
documentation.
Before starting, make sure you have the following installed:
ipyparallel (for parallel computing)
h5py (for reading HDF5 files)
sphviewer2 (for SPH rendering)
mpi4py (for MPI-based parallel computing)
Install them via pip if needed:
pip install ipyparallel h5py mpi4py sphviewer2
import ipyparallel as ipp
# Start a cluster with 8 engines using MPI for communication
cluster = ipp.Cluster(n=8, engines='mpi')
# Await cluster startup
await cluster.start_cluster(n=8)
# Connect the client to the cluster
rc = cluster.connect_client_sync()
Here, we initialize a cluster using 8 MPI engines. This will allow the parallel execution of tasks across 8 processes (or more if you specify). The ipyparallel library will manage distributing tasks to the different workers (engines).
ipp.Cluster: Initializes a cluster object with a specified number of processes (n=8), using MPI engines for parallel computing. MPI is often used for parallel tasks across distributed systems, making it a good fit for high-performance tasks like rendering.
await cluster.start_cluster(n=8): Starts the cluster with 8 parallel processes.
rc = cluster.connect_client_sync(): Connects the client to the cluster, allowing for task submission.
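Engines can take a moment to register with the controller after the cluster starts. As a small optional check (using the standard ipyparallel client API, also used in the complete script below), you can block until all engines are ready and list their IDs before submitting work:
# Block until all 8 engines have registered with the controller
rc.wait_for_engines(n=8)
# Engine IDs should now be [0, 1, ..., 7]
print(rc.ids)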
The make_image Function
def make_image(xx, yy, mm, hh, xmin, xmax, BoxSize, LevelMin, LevelMax):
    from sphviewer2 import sphviewer2
    sph = sphviewer2()
    image = sph.render(xx, yy, mm, hh, xmin, xmax, BoxSize, LevelMin, LevelMax)
    return image
make_image: This function performs the same SPH rendering task as before. It takes in subsets of the particle data (positions, masses, smoothing lengths) and returns an image for that subset of particles. Since the rendering will be performed in parallel, each worker handles a subset of the full dataset, processes it, and returns an image corresponding to that subset.
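Before dispatching work to the engines, it can help to run make_image once, serially, on a small slice of the data. This is only an optional sanity-check sketch, reusing the arrays and parameters defined elsewhere in this tutorial; the 10,000-particle slice and the 1e-10 floor for the log plot are arbitrary choices:
# Optional sanity check: render a small subset serially before going parallel
test_image = make_image(
    pgas[:10000, 0], pgas[:10000, 1],
    mgas[:10000], hgas[:10000],
    0.0, 0.0, BoxSize, 8, 11)
# Floor empty pixels so log10 does not produce -inf
plt.imshow(np.log10(np.maximum(test_image, 1e-10)))
plt.show()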
nproc = 8  # Number of processes for parallel rendering
async_results = []
LevelMin = 8
LevelMax = 11
for i in range(nproc):
    # Dispatch the parallel task to one of the engines
    async_result = rc[i].apply_async(
        make_image,
        pgas[i::nproc, 0], pgas[i::nproc, 1],
        mgas[i::nproc], hgas[i::nproc], 0.0, 0.0,
        BoxSize, LevelMin, LevelMax)
    # Store the result
    async_results.append(async_result)
In this part of the code, we:
Split the data: We distribute the particle data across 8 processes. Each process gets every 8th particle, so pgas[i::nproc] picks out every 8th element starting from index i (see the short slicing example after this list).
Dispatch tasks: Using rc[i].apply_async(), we submit the rendering task to one of the available engines. Each engine receives a subset of the data to process, and the rendering is done in parallel.
Collect results: The async_results list stores the result of each parallel rendering task. These results are retrieved later.
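To make the stride-based split concrete, here is a tiny standalone NumPy example (illustrative only, using a made-up 10-element array split into 4 chunks) showing that the slices are disjoint and together cover every element exactly once:
import numpy as np

a = np.arange(10)  # stand-in for a particle array
nproc = 4
chunks = [a[i::nproc] for i in range(nproc)]
print(chunks)  # [array([0, 4, 8]), array([1, 5, 9]), array([2, 6]), array([3, 7])]
# Concatenating and sorting the chunks recovers the original array
assert np.array_equal(np.sort(np.concatenate(chunks)), a)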
final_image = np.zeros([1<<LevelMax, 1<<LevelMax])  # Empty image with 2^LevelMax (= 2048) pixels per side
# Collect the results from the engines and sum the images
for i in range(nproc):
    final_image += async_results[i].get()
# Display the final combined image
plt.imshow(np.log10(final_image))
final_image: We initialize an empty 2048x2048 image (1<<11 is a bit-shift operation equivalent to 2048).
async_results[i].get(): We retrieve the result from each engine using the .get() method, which waits for the task to complete and returns the image generated by that engine.
Summing the images: The images from each process are summed to generate the final density field. This is done because each process renders a portion of the particles, so combining them gives the full picture.
Display the result: Finally, the combined image is displayed using imshow with a logarithmic scale to better visualize the density contrast.
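One practical note on the display step: pixels that received no particles remain exactly zero, and np.log10(0) yields -inf along with a runtime warning. A small optional sketch for a more robust plot is to floor the image at a tiny positive value before taking the log; the 1e-10 floor here is an arbitrary choice, not part of the original code:
# Optional: floor empty pixels to avoid log10(0) warnings
floor = 1e-10  # arbitrary small value; adjust to your data
plt.imshow(np.log10(np.maximum(final_image, floor)))
plt.colorbar()
plt.show()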
Here is the complete code for rendering the image in parallel. As before, you can download a simulation file from the CAMELS project:
https://users.flatironinstitute.org/~camels/Sims/IllustrisTNG/CV/CV_0/snapshot_090.hdf5
import ipyparallel as ipp
import h5py
import numpy as np
import matplotlib.pyplot as plt
# Initialize and start the cluster with MPI engines
cluster = ipp.Cluster(n=8, engines='mpi')
await cluster.start_cluster(n=8)
rc = cluster.connect_client_sync()
# Ensure that all 8 engines are ready
rc.wait_for_engines(n=8)
# Load the simulation snapshot (HDF5 file)
snap = h5py.File("path/to/snapshot_090.hdf5", 'r')
pgas = snap['PartType0/Coordinates'][()]
rho = snap['PartType0/Density'][()] * 1e10
mgas = snap['PartType0/Masses'][()] * 1e10
hgas = (32 * mgas / (4./3.*np.pi * rho))**(1./3.)
BoxSize = snap['Header'].attrs['BoxSize']
# Define the SPH rendering function
def make_image(xx, yy, mm, hh, xmin, xmax, BoxSize, LevelMin, LevelMax):
    from sphviewer2 import sphviewer2
    sph = sphviewer2()
    image = sph.render(xx, yy, mm, hh, xmin, xmax, BoxSize, LevelMin, LevelMax)
    return image
nproc = 8 # Number of processes for parallel rendering
async_results = []
LevelMin = 8
LevelMax = 11
for i in range(nproc):
    # Dispatch the parallel task to one of the engines
    async_result = rc[i].apply_async(
        make_image,
        pgas[i::nproc, 0], pgas[i::nproc, 1],
        mgas[i::nproc], hgas[i::nproc], 0.0, 0.0,
        BoxSize, LevelMin, LevelMax)
    # Store the result
    async_results.append(async_result)
# Combine the results
final_image = np.zeros([1<<LevelMax, 1<<LevelMax])  # Initialize an empty 2048x2048 image (1<<11 = 2048)
# Collect the results from the engines and sum the images
for i in range(nproc):
    final_image += async_results[i].get()
# Display the final combined image
plt.imshow(np.log10(final_image))
plt.show()
In this tutorial, we extended the single-process SPH rendering example to run in parallel using ipyparallel and MPI engines. The key steps were starting an MPI-backed cluster, splitting the particle data across the engines with stride slicing, dispatching the make_image tasks asynchronously, and summing the partial images into the final density field.
Parallel processing is a powerful tool when dealing with large datasets like cosmological simulations, allowing for significant speed improvements when rendering complex fields like SPH gas densities.
By adapting this method, you can scale the computation for larger datasets or more complex simulations using a higher number of processes.
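As a closing sketch (illustrative only, not part of the original script), one way to make the code scale automatically is to derive nproc from however many engines actually registered, instead of hard-coding 8:
# Let the number of chunks follow the number of registered engines
nproc = len(rc.ids)
async_results = [
    rc[i].apply_async(
        make_image,
        pgas[i::nproc, 0], pgas[i::nproc, 1],
        mgas[i::nproc], hgas[i::nproc], 0.0, 0.0,
        BoxSize, LevelMin, LevelMax)
    for i in range(nproc)
]
# Sum the partial images into the final density field
final_image = sum(ar.get() for ar in async_results)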