jCUDA 1.1 released

jCUDA version 1.1 has been released to the public. This version adds many improvements over the previous 1.0.1 release.

Additions:

  • Added object-oriented support for CUDA, OpenGL and CUFFT functionality
  • Split the FFT and CUDA native libraries to operate as standalone libraries
  • Extended the native interface to provide more functionality (the NativeUtiles.getPointerSize method)

You may download the new release from: jCUDA.

Using Hoopoe File System (HoopoeFS)

The Hoopoe File System service reference can be found at: http://www.hoopoe-cloud.com/HoopoeFS.asmx

This post presents the File System interface to the Hoopoe distribution engine. The File System (FS) interface can be used to transfer data files to be processed by Hoopoe with CUDA computing kernels. After processing completes, the same interface can be used to read back the computed results.

Topics

  • Features
  • General terms
  • API description
  • API examples
    • Creating a new instance
    • Authenticating
    • Creating a file
    • Creating a directory
    • Creating a file under a sub-directory
    • Deleting a file
    • Writing data into files
    • Reading data from files

1. Features

HoopoeFS exposes a simple interface for data and file management. In general, most features available in common OS file systems are provided by the HoopoeFS service, giving users high flexibility.

Taking security into consideration, every user is provided with a completely isolated environment, so no special security functions need to be used or exposed: every user sees, and is able to access, only the files he generated or uploaded.

To keep the API simple, HoopoeFS is generic, but there are a few limitations on user operations and capabilities. For example, a user is allowed to place files in the root directory or under sub-directories, but may create only one level of sub-directories, each able to contain additional files.

2. General terms

As previously mentioned, HoopoeFS provides all general constructs for working with files and directories.

A file is simply a container for data, either raw or compressed, and can be named using any supported character.

A directory is a container for files. The root directory is provided, and further sub-directories can be created by the user.

3. API description

For the data constructs (File, Directory), the following management functions are provided:

  • CreateFile/CreateDirectory – creates a new file or directory, respectively. Calling these functions is required before a file or directory can be accessed.
  • DeleteFile/DeleteDirectory – deletes a previously created file or directory.
  • RenameFile/RenameDirectory – renames an existing file or directory (see the short sketch after this list).
  • WriteData/ReadData – modifies the content of a file (write) or reads content from a specific file.
  • GetFileSize – returns the number of bytes in a file.
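
For illustration, renaming might look like this (a minimal sketch; the parameter order of old name followed by new name is an assumption, not taken from the service reference):

// Hypothetical sketch: the (old name, new name) parameter
// order is assumed
hfs.RenameFile("test.dat", "test_old.dat");
hfs.RenameDirectory("test_data", "archive_data");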

For general operation, a few more functions are provided:

  • Authenticate – returns a value indicating whether the user is registered and recognized by HoopoeFS (see the sketch after this list).
  • IsUserOverQuota – returns a value indicating whether the user has exceeded the allowed storage space. In that case the user cannot create new files or directories, but can still delete files and read their contents.
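
Before doing any real work, these functions can be combined into a quick status check (a minimal sketch; the boolean return types and the long return type of GetFileSize are assumptions):

// Minimal sketch: return types are assumed, and Authenticate
// is assumed to use the AuthenticationValue already set on
// the instance (see section 4.2)
HoopoeFS hfs = new HoopoeFS();
Authentication a = new Authentication();
a.User = "test@company_alias";
a.Password = "my_password";
hfs.AuthenticationValue = a;

if (hfs.Authenticate() && !hfs.IsUserOverQuota())
{
    long size = hfs.GetFileSize("test.dat");
}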

4. API examples

4.1 Creating a new instance

In order to work with HoopoeFS, it is necessary to create a new instance of the HoopoeFS class:

HoopoeFS hfs = new HoopoeFS();

4.2 Authenticating

It is good practice to check with HoopoeFS that we are authenticated before performing further operations. Every operation performed must carry these authentication details:

Authentication a = new Authentication();
a.User = "test@company_alias";
a.Password = "my_password";
hfs.AuthenticationValue = a;

4.3 Creating a file 

Creating a file is a simple task with HoopoeFS API:

hfs.CreateFile("test.dat");

4.4 Creating a directory

Following the previous example, a similar API can be used to create a new sub-directory (all directories are created under the root):

hfs.CreateDirectory("test_data");

4.5 Creating a file under a sub-directory

Once a sub-directory has been created, any number of files can be created under it.

To do that, the following operations are necessary:

Directory d = new Directory();
d.Name = "test_data";
hfs.DirectoryValue = d;
hfs.CreateFile("child_temp.dat");

Note that once hfs.DirectoryValue is set, all file-related operations (creating new files, deleting, modifying, etc.) apply to that directory, so when work with the directory ends, hfs.DirectoryValue should be set back to null.
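
For example, returning to the root directory afterwards (a small sketch; the file name below is illustrative):

// Back to the root directory; subsequent file operations
// no longer target "test_data"
hfs.DirectoryValue = null;
hfs.CreateFile("root_temp.dat");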

4.6 Deleting a file

A very straightforward operation:

hfs.DeleteFile("test.dat");

4.7 Writing data into files

The concept of writing data to files in Hoopoe mirrors conventional file I/O, with a simplified API.

byte[] data = new byte[512*1024];
// Load/generate data
....

// Write the data, starting at offset 0 of the file
long offset = 0;
hfs.WriteFile("temp.dat", data, offset);

// To write more data, advance the offset accordingly
offset += data.Length;
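
Putting the offset logic together, a large buffer can be uploaded in fixed-size chunks (a minimal sketch; the buffer and chunk size are illustrative):

// Upload "buffer" to "temp.dat" in 512 KB chunks (sketch)
byte[] buffer = new byte[4 * 1024 * 1024];
int chunk = 512 * 1024;
long pos = 0;
while (pos < buffer.Length)
{
    int count = (int)Math.Min(chunk, buffer.Length - pos);
    byte[] part = new byte[count];
    Array.Copy(buffer, pos, part, 0, count);
    hfs.WriteFile("temp.dat", part, pos);
    pos += count;
}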

4.8 Reading data from files

The same rules that apply to writing data also apply to reading it from files.

// Read the data, starting at offset 0 of the file
long offset = 0;
// The number of bytes to read
int length = 512*1024;
byte[] data = hfs.ReadFile("temp.dat", offset, length);

// Past this point, data will contain the bytes
// that were read. If fewer bytes than requested
// were read, the size of data will match the
// actual number of bytes read.

// To read more data, advance the offset accordingly
offset += data.Length;
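
Combining this with GetFileSize, an entire file can be read in chunks (a minimal sketch; it assumes GetFileSize returns the file size in bytes):

// Read a whole file in 512 KB chunks (sketch)
long total = hfs.GetFileSize("temp.dat");
long pos = 0;
while (pos < total)
{
    byte[] part = hfs.ReadFile("temp.dat", pos, 512 * 1024);
    // ... process part ...
    pos += part.Length;
}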

jCUDA 1.0.1 released

We are pleased to announce the availability of jCUDA version 1.0.1 to the public.

New in this version:

  • Support for the Windows operating system (XP/Vista) in 32/64 bit
  • Fixed issues with the native layer

You may download it from: jCUDA.

Announcing jCUDA

Hello to everyone,

We are glad to introduce jCUDA, a Java library that allows Java applications to exploit the full power of CUDA and NVIDIA GPUs.

The library is supported under Linux, Windows, Mac OS X and Solaris 10, in both 32 and 64 bit versions.

Currently, functionality is provided for the CUDA API and CUFFT routines. Future versions will add support for CUBLAS as well.

A link: jCUDA.

It will be released for public use soon.

CUDA.NET 2.1 Released

We are pleased to announce that a new version of CUDA.NET is out, following the release of CUDA 2.1.

The new release of CUDA.NET provides support for the new DirectX 10 API interoperability and the JIT compiler.

To download: CUDA.NET.

DirectX 10 interoperability

The new API by NVIDIA makes it possible to integrate existing DirectX 10 applications with CUDA, providing another level of computing, whether for post-processing, image processing or other computations.

DirectX 9 API is still supported.

JIT Compiler

New compiler support is provided by NVIDIA through the API. It allows generating CUDA kernel code at runtime and compiling it on demand using this new facility.

In addition, it allows attaching kernel source code to an application and compiling it on site, using a specific configuration: maximum register usage, specific hardware support and more.
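
As a rough illustration of the idea, here is a hypothetical sketch that goes through the raw CUDA driver entry points (cuModuleLoadDataEx and CU_JIT_MAX_REGISTERS are the documented driver names; the exact CUDA.NET wrapper names and signatures may differ):

// Hypothetical sketch of runtime (JIT) compilation via the
// raw CUDA driver API; CUDA.NET wraps this functionality,
// possibly under different names.
using System;
using System.Runtime.InteropServices;

class JitSketch
{
    [DllImport("nvcuda")] static extern int cuInit(uint flags);
    [DllImport("nvcuda")] static extern int cuDeviceGet(out int dev, int ordinal);
    [DllImport("nvcuda")] static extern int cuCtxCreate(out IntPtr ctx, uint flags, int dev);
    [DllImport("nvcuda")] static extern int cuModuleLoadDataEx(
        out IntPtr module, byte[] image, uint numOptions,
        uint[] options, IntPtr[] optionValues);

    const uint CU_JIT_MAX_REGISTERS = 0; // from cuda.h

    static void Main()
    {
        cuInit(0);
        int dev; cuDeviceGet(out dev, 0);
        IntPtr ctx; cuCtxCreate(out ctx, 0, dev);

        // PTX is NUL-terminated text; append the terminator
        byte[] ptx = System.Text.Encoding.ASCII.GetBytes(
            System.IO.File.ReadAllText("kernel.ptx") + "\0");

        // Ask the JIT to cap register usage at 32 per thread
        uint[] options = { CU_JIT_MAX_REGISTERS };
        IntPtr[] values = { (IntPtr)32 };
        IntPtr module;
        int rc = cuModuleLoadDataEx(out module, ptx, 1, options, values);
        Console.WriteLine("JIT load result: " + rc);
    }
}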

Fixes

This release of CUDA.NET 2.1 fixes an issue with the CUDAExecution class: when running a computation on the GPU using the class and then calling the Clear method, the parameter state was not cleared. As of this release, the issue is fixed.

Security in Hoopoe

Security is always a major part of a cloud system, and requires great effort to handle and develop.
It usually becomes an issue when users are given access to actual machines, so they can run applications on the operating system, whether Windows or Linux based.

Security models in Hoopoe

Hoopoe provides several features to overcome this problem.

Isolated user environment

Hoopoe provides each user with a unique, isolated environment. This way, only the user can access his files and computations, using the specific mechanisms provided for file management and related operations.

Hiding the “metal”

Hoopoe hides the “metal” from the user, providing access only through a web service interface to communicate with the system.
Thus, the flexibility of the code a user can run is limited.
There is no direct access to machines; the user submits his task to Hoopoe for further processing by the system. After submission, the user waits for the task to finish and copies the results back.

Independent data management

User data is managed by Hoopoe as files, either raw or compressed (using GZip).
Buffers are then read in a fully managed (.NET) environment, reducing the risk posed by malformed or “bad” files.
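
As an illustration of reading such a buffer in managed code, here is a sketch using the standard .NET GZipStream class (this is illustrative, not Hoopoe's actual internal code):

using System.IO;
using System.IO.Compression;

class GZipSketch
{
    // Decompress a GZip-compressed buffer entirely in
    // managed code; malformed input raises an exception
    // rather than corrupting memory.
    static byte[] Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gz = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = gz.Read(buf, 0, buf.Length)) > 0)
                output.Write(buf, 0, n);
            return output.ToArray();
        }
    }
}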

Running computations

Hoopoe is meant to run computations, not to serve as an operating system. As such, user tasks are compiled on demand for the platform they will be processed on (32/64 bit, specific hardware support).

Computations run on the GPU itself, and this is where the interaction with the GPU ends: copying the relevant data, performing the computations and placing the results back in the appropriate buffer.

Using CUDA FFT from FORTRAN

In this post we will demonstrate how to call CUDA FFT routines (CUFFT) from a FORTRAN application, using the native CUDA interface and our bindings.

CUFFT usage

NVIDIA’s CUFFT library follows the conventions of the FFTW library for running FFTs.
For example, executing a 2D FFT over a 256×256 data set involves the following steps.

General GPU steps:

  1. Select the GPU device to work with
  2. Allocate enough device memory to store data
  3. Transfer input data to device

FFT steps:

  1. Create FFT plan with specific dimensions
  2. Execute FFT on device with input and output parameters
  3. Destroy FFT plan

After computing steps:

  1. Copy results back to CPU memory (RAM)
  2. Release device memory

Let’s code

General GPU steps

To select the device we want to work with, we can take one of two routes: the driver interface or the runtime interface.

Selecting a device with the CUDA driver is a bit more complicated, but adds more flexibility.


! Initialize CUDA, default flags
call cuInit(0)
! Get a reference to the 1st device in the system
! recognized by CUDA
call cuDeviceGet(idev, 0)
! Now, create a new context and bind it to the
! device we got before
call cuCtxCreate(ictx, 0, idev)

This code fragment corresponds to step 1 of the general GPU steps: we selected the first device in the system to work with.

Allocating device memory can be done using CUDA’s cuMemAlloc function.
For example:


! Allocate memory for an array of inx * iny
! single-precision complex elements (4 bytes * 2 per element)
call cuMemAlloc(iptr, inx * iny * 4 * 2)

This maps to step 2 of the general GPU steps.

To copy memory from the CPU to the GPU (device), we issue cuMemcpyHtoD, meaning Host->Device copy.


! Assume that data was defined as COMPLEX data(inx, iny)
call cuMemcpyHtoD(iptr, data, inx*iny * 4 * 2)

This maps to step 3 of the general GPU steps.

With that we have finished preparing the data on the GPU, and we are ready to run the FFT routine.

FFT steps

Using the CUFFT library is relatively easy, as the following example shows.


! Here we create the FFT plan. Note that the dimensions
! of the FFT are specified at this stage, so the plan
! can be reused later.
! The last parameter denotes the type of FFT to perform:
! Real->Complex, Complex->Real or Complex->Complex.
! The value 41 (hex 0x29) represents Complex->Complex
! (CUFFT_C2C); it is possible to define a constant
! for this purpose.
call cufftPlan2d(iplan, inx, iny, 41)

This maps to step 1 of the FFT steps: creating an FFT plan.

Once we have the plan, we can simply execute the requested FFT and get back the results.


! Execute the FFT according to our plan. Specifying
! iptr for both input and output means an in-place FFT.
! It is possible to store the results in a different buffer.
! The value -1 denotes the direction of the FFT:
! -1 is forward and 1 is inverse.
call cufftExecC2C(iplan, iptr, iptr, -1)

This maps to step 2 of the FFT steps.

After executing our FFT and finishing our work with it, it is time to release the resources consumed by the FFT library.


! Destroy the FFT plan
call cufftDestroy(iplan)

This completes the FFT steps.

After computing steps

The computations on the GPU are now over; we can copy the results back to CPU memory for further processing.


! Use the Device->Host function to copy the
! computed data from the GPU to the CPU.
call cuMemcpyDtoH(data, iptr, inx*iny * 4 * 2)

This maps to step 1 of the after-computing steps. After this copy command, the data computed by the GPU will be available in the “data” array variable.

Now we shall release the GPU resources used during our computation:


! Free the GPU memory we allocated previously
call cuMemFree(iptr)
! Destroy the CUDA context. This happens in any case
! when the process exits, but it's a good habit
! to do it explicitly
call cuCtxDestroy(ictx)

That is it; our code is complete, and we have used the GPU to compute an FFT.

Final words

This example showed how to run FFT computations on the GPU with NVIDIA’s CUDA framework. FFT is a very important tool for many applications and scientific computations, and the GPU can improve FFT performance significantly, often by large factors compared to the CPU.

Compiling

If using gfortran, g77, g95 or ifort under Linux, compile the above FORTRAN code by simply issuing:


gfortran fft.f cuda.o cufft.o -lcufft -lcuda

where gfortran can be replaced by your compiler of choice. The libraries libcufft.so and libcuda.so come as part of the NVIDIA CUDA Toolkit and driver releases, so they are present on any machine with those installed. The files cuda.o and cufft.o contain the bridge code needed for FORTRAN-to-C communication.

Announcing Hoopoe – Cloud Services for GPU Computing

We are happy to introduce to you “Hoopoe”, a cloud solution for GPU computing.

You may have all expected it to be available sometime, and indeed it is.

Hoopoe provides a web service interface to communicate with. In the near future it will also provide machine-level access to run specific applications, as with regular CPU-based clouds.

Partial feature list of the system:

  • CUDA Support
    • Executing CUDA kernels, FFT and BLAS routines
  • OpenCL Support
    • Executing OpenCL kernels
  • Fully secure – see the “Security in Hoopoe” post

Take a further look at: Hoopoe™. The system will be open for alpha testing very soon, so you are invited to register.

Next CUDA.NET release (2.1)

CUDA 2.1 is already in beta and is expected to add some of the most interesting features to date.

The first feature is the newly added support for DirectX 10 API interoperability through CUDA.

Another interesting feature is the ability to compile (JIT – Just In Time) CUDA PTX code to match a specific GPU architecture. This allows generating CUDA kernels on demand and compiling them directly for the GPU architecture available in the running system, just like using a compiler.

CUDA.NET 2.1 is expected to be released in early February, adding support for all the new features of CUDA 2.1 and extended functionality as we add it incrementally.

CASS Blogs

Welcome to our new blog system.

We intend to publish in these blogs material and resources for programmers and other users who are interested in HPC or GPU computing technologies.

The blogs will be used by us on a regular basis to share and discuss issues raised by users of our products, such as CUDA.NET, OpenCL.NET, FORTRAN and more.

We hope you will find it useful!