matthew ephraim

Archive for January, 2009

A Simple C# Wrapper for Ghostscript

Tuesday, January 6th, 2009
Update:

This post has become somewhat popular (relative to my other posts anyway), so I decided to take the code and release it as an open source library. More information here

PDF thumbnails with Ghostscript

I’ve been looking for a while now for a simple solution for generating thumbnail images from PDF files. I wanted something that would let me programmatically load in a PDF file, choose a page, and generate a thumbnail from that page. As far as I can tell, there are only a few open source options and of those options I haven’t been able to find one that I could get working with C#.

After seeing it recommended a few times, I decided to take a look at Ghostscript. Ghostscript is an open source interpreter for PostScript and PDF files. Among other things, Ghostscript allows you to generate images from PDF pages, which is exactly what I needed.

Ghostscript is a tool that can be used from the command line, which is how most of the examples I’ve found online have used it. Unfortunately, this is what a call to Ghostscript looks like:

gs -q -dQUIET -dPARANOIDSAFER -dBATCH -dNOPAUSE \
-dNOPROMPT -dMaxBitmap=500000000 -dFirstPage=1 \
-dAlignToPixels=0 -dGridFitTT=0 -sDEVICE=jpeg \
-dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r100x100 \
-sOutputFile=output.jpg input.pdf

Not pretty. Luckily, I needed to automate the task of creating the thumbnails, so I wouldn’t need to manually generate the parameters to be passed to the command line tool. However, I still felt like there might be a better way to hook into Ghostscript’s functionality. So, I decided to take advantage of the API provided by Ghostscript by writing a simple C# wrapper for the API to use in my current ASP.NET project.

A simple Ghostscript wrapper

The first thing I needed was the Windows version of the Ghostscript DLL, which can be obtained here. Once I included the DLL in my project, I needed to expose the unmanaged API functions to my C# wrapper function.

C#
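using System;
using System.Runtime.InteropServices;

// These declarations go inside the wrapper class
[DllImport("gsdll32.dll", EntryPoint = "gsapi_new_instance")]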
private static extern int CreateAPIInstance(out IntPtr pinstance, 
                                        IntPtr caller_handle);

[DllImport("gsdll32.dll", EntryPoint = "gsapi_init_with_args")]
private static extern int InitAPI(IntPtr instance, int argc, IntPtr argv);

[DllImport("gsdll32.dll", EntryPoint = "gsapi_exit")]
private static extern int ExitAPI(IntPtr instance);

[DllImport("gsdll32.dll", EntryPoint = "gsapi_delete_instance")]
private static extern void DeleteAPIInstance(IntPtr instance);

Above, I complained about the long list of parameters that need to be passed to the Ghostscript command line tool. Those same parameters need to be passed to the API, so the next thing I did was create a function that wrapped up the functionality for building the list of parameters. For simplicity, I left in a lot of default parameters, but the function could be expanded later on to allow more specific parameters.

C#

private string[] GetArgs(string inputPath, string outputPath, 
                         int firstPage, int lastPage, int width, int height)
{
    return new[]
    {
        // Keep gs from writing information to standard output
        "-q",                     
        "-dQUIET",
       
        "-dPARANOIDSAFER", // Run this command in safe mode
        "-dBATCH", // Keep gs from going into interactive mode
        "-dNOPAUSE", // Do not prompt and pause for each page
        "-dNOPROMPT", // Disable prompts for user interaction           
        "-dMaxBitmap=500000000", // Set high for better performance
        
        // Set the starting and ending pages
        String.Format("-dFirstPage={0}", firstPage),
        String.Format("-dLastPage={0}", lastPage),   
        
        // Configure the output anti-aliasing, resolution, etc
        "-dAlignToPixels=0",
        "-dGridFitTT=0",
        "-sDEVICE=jpeg",
        "-dTextAlphaBits=4",
        "-dGraphicsAlphaBits=4",
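        // Note: -r sets the horizontal and vertical resolution in DPI,
        // so width and height here are DPI values, not pixel dimensions
        String.Format("-r{0}x{1}", width, height),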

        // Set the input and output files
        String.Format("-sOutputFile={0}", outputPath),
        inputPath
    };
}

Once I had a way of creating a list of parameters, I could start using the Ghostscript API functions. I created a function called CallAPI that would accept an array of parameters and use them to call the Ghostscript API.

The function I created for building a list of arguments returned an array of strings, but to use the API I needed to convert each of those parameters into an ANSI null-terminated byte array (I added the code I used to do this to the bottom of this post). Then I needed to pin each of those byte arrays in memory, so the garbage collector couldn’t move them, and get a pointer to each one.

C#
var argStrHandles = new GCHandle[args.Length];
var argPtrs = new IntPtr[args.Length];

// Create a handle for each of the arguments after 
// they've been converted to an ANSI null terminated
// string. Then store the pointers for each of the handles
for (int i = 0; i < args.Length; i++)
{
    argStrHandles[i] = GCHandle.Alloc(StringToAnsi(args[i]), GCHandleType.Pinned);
    argPtrs[i] = argStrHandles[i].AddrOfPinnedObject();
}

// Get a new handle for the array of argument pointers
var argPtrsHandle = GCHandle.Alloc(argPtrs, GCHandleType.Pinned);
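
To sanity-check the memory layout that Ghostscript receives, the pinned pointers can be read back as C strings. The sketch below is only an illustration: ArgvDemo and RoundTrip are hypothetical names, and StringToAnsi repeats the helper from the bottom of this post so the snippet stands on its own.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ArgvDemo // hypothetical name, for illustration only
{
    // Same conversion as the StringToAnsi helper from this post
    public static byte[] StringToAnsi(string original)
    {
        var strBytes = new byte[original.Length + 1];
        for (int i = 0; i < original.Length; i++)
            strBytes[i] = (byte)original[i];
        return strBytes; // the last byte is already 0
    }

    // Pins the arguments as shown above, then reads them back through
    // the argv-style pointer to confirm what native code would see
    public static string[] RoundTrip(string[] args)
    {
        var argStrHandles = new GCHandle[args.Length];
        var argPtrs = new IntPtr[args.Length];
        for (int i = 0; i < args.Length; i++)
        {
            argStrHandles[i] = GCHandle.Alloc(StringToAnsi(args[i]), GCHandleType.Pinned);
            argPtrs[i] = argStrHandles[i].AddrOfPinnedObject();
        }
        var argPtrsHandle = GCHandle.Alloc(argPtrs, GCHandleType.Pinned);

        try
        {
            IntPtr argv = argPtrsHandle.AddrOfPinnedObject();
            var result = new string[args.Length];
            for (int i = 0; i < args.Length; i++)
            {
                // argv[i] is a pointer-sized slot holding a char* to one argument
                IntPtr argPtr = Marshal.ReadIntPtr(argv, i * IntPtr.Size);
                result[i] = Marshal.PtrToStringAnsi(argPtr);
            }
            return result;
        }
        finally
        {
            // Release the pins so the garbage collector can move the arrays again
            for (int i = 0; i < argStrHandles.Length; i++)
                argStrHandles[i].Free();
            argPtrsHandle.Free();
        }
    }
}
```

Calling ArgvDemo.RoundTrip with a few Ghostscript switches should hand back the same strings, which is a cheap way to test the marshalling without loading the DLL.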

Then, to use the newly converted parameters, I needed to create an instance of the Ghostscript API and pass them into the initialization function.

C#
// Get a pointer to an instance of the Ghostscript API 
// and run the API with the current arguments
IntPtr gsInstancePtr;
CreateAPIInstance(out gsInstancePtr, IntPtr.Zero);
InitAPI(gsInstancePtr, args.Length, argPtrsHandle.AddrOfPinnedObject());

The call to InitAPI runs Ghostscript and generates any requested files at the output path.
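
The snippet above ignores the integer that each API call returns. As far as I can tell from the Ghostscript API documentation, failures come back as negative return codes, with a few negative values (such as gs_error_Quit) that are not real errors. The sketch below shows one way the codes could be checked. GhostscriptCheck and ThrowIfFailed are hypothetical names, and the value -101 for the quit code is an assumption that should be verified against the headers of the Ghostscript version in use.

```csharp
using System;

public static class GhostscriptCheck // hypothetical helper, not part of the Ghostscript API
{
    // Assumption: the numeric value of gs_error_Quit, which signals
    // that "quit" was executed and is not treated as a failure
    private const int QuitCode = -101;

    // Throws when a gsapi_* return code looks like a real error
    public static void ThrowIfFailed(int returnCode, string apiCall)
    {
        if (returnCode < 0 && returnCode != QuitCode)
        {
            throw new InvalidOperationException(
                String.Format("{0} failed with Ghostscript return code {1}",
                              apiCall, returnCode));
        }
    }
}
```

With a helper like this, the call to InitAPI could be wrapped as GhostscriptCheck.ThrowIfFailed(InitAPI(...), "gsapi_init_with_args"), so a bad input path surfaces as an exception instead of a silently missing output file.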

Now the only remaining thing I needed to do was clean up the memory that was allocated for the API. To handle this, I wrote a cleanup function that takes in the items that need to be cleaned up. The API provides some cleanup functions, so I called those in the cleanup function as well.

C#
private void Cleanup(GCHandle[] argStrHandles, GCHandle argPtrsHandle, 
                                       IntPtr gsInstancePtr)
{
    for (int i = 0; i < argStrHandles.Length; i++)
        argStrHandles[i].Free();

    argPtrsHandle.Free();

    ExitAPI(gsInstancePtr);
    DeleteAPIInstance(gsInstancePtr);
}

One last thing I added to the wrapper was a simple function for generating thumbnails from a source PDF file. Technically, I could have just used the CallAPI function to do that, but I wanted to hide the details of working with the API from code outside of the wrapper class.

C#

public void GeneratePageThumbs(string inputPath, string outputPath, 
                              int firstPage, int lastPage, int width, int height)
{
    CallAPI(GetArgs(inputPath, outputPath, firstPage, lastPage, width, height));
}

The GeneratePageThumbs function doesn’t do anything other than call the CallAPI function. However, in the future, I’d like to provide other functions that use the Ghostscript API as well. If anyone has any ideas for improving the code, drop me a line.

Update: Here is the code I used to convert the arguments to null-terminated byte arrays. There might be a better way to do this in .NET; this is just the quick solution I’m using.

C#
public static byte[] StringToAnsi(string original)
{
    // One extra byte for the null terminator
    var strBytes = new byte[original.Length + 1];
    for (int i = 0; i < original.Length; i++)
        strBytes[i] = (byte)original[i];

    strBytes[original.Length] = 0;
    return strBytes;
}
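
One possible alternative is to let an Encoding do the conversion. This is only a sketch: AnsiStrings is a hypothetical class name, and the choice of Encoding.Default is an assumption (it maps to the system ANSI code page on the .NET Framework, but to UTF-8 on .NET Core and later). The difference from the cast above is that characters that don’t fit in a single byte are handled by the encoding rather than being truncated.

```csharp
using System;
using System.Text;

public static class AnsiStrings // hypothetical name, for illustration only
{
    public static byte[] StringToAnsi(string original)
    {
        // Assumption: the platform default encoding is what the native side expects
        Encoding encoding = Encoding.Default;

        // Allocate one extra byte; new arrays are zero-filled,
        // so the final byte acts as the null terminator
        var strBytes = new byte[encoding.GetByteCount(original) + 1];
        encoding.GetBytes(original, 0, original.Length, strBytes, 0);
        return strBytes;
    }
}
```

For plain ASCII switches such as "-q", both versions should produce the same bytes, so this only matters for file paths that contain non-ASCII characters.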

Update: This code has been open sourced

Treating C# Like A Scripting Language

Friday, January 2nd, 2009

Creating code on the fly

One thing that I like about scripting languages is their ability to dynamically modify code during runtime. Ruby and JavaScript, for example, both give you the ability to load in code directly from a string and execute it as part of your program. While that sort of thing can be dangerous, it also gives you access to some really fun metaprogramming techniques.

While working on a simple DSL for one of my ASP.NET sites, I started to wonder if C# had some similar functionality that I could take advantage of. In particular, I wanted to load in C# code from a file and execute the code inside of it. What I found was that C# does indeed have the ability to accomplish this task, albeit in sort of an ugly way.

Using C# to compile C#

Most scripting languages give you a function that allows you to directly evaluate a block or raw string of code as soon as it’s encountered. Because C# is a compiled language, it’s a little bit more complicated. The C# code needs to be compiled into an assembly before it can be used, and then classes from the compiled code can be instantiated directly from the assembly.

C# code can be compiled on the fly with an instance of the CSharpCodeProvider class (there’s a similar class for VB.NET as well). Additionally, you can create an instance of the CompilerParameters class, which contains a collection of parameters that will be used when compiling your code. In the example below, I’m creating a new C# compiler and a set of parameters that will tell it not to create an assembly file, but to instead compile the new assembly in memory. I also tell the compiler to include System.dll as a reference assembly.

C#
using Microsoft.CSharp; // namespace that contains CSharpCodeProvider

// Create a new instance of the C# compiler
var compiler = new CSharpCodeProvider();

// Create some parameters for the compiler
var parms    = new System.CodeDom.Compiler.CompilerParameters
{
    GenerateExecutable      = false,
    GenerateInMemory        = true
};
parms.ReferencedAssemblies.Add("System.dll");

Once a C# compiler has been created, you can use it to compile raw source into an assembly. CSharpCodeProvider allows you to compile code from a variety of sources. In the example below, I’m using the CompileAssemblyFromSource method to compile my code directly from an array of strings. CompileAssemblyFromSource will look at the code provided and return an instance of the CompilerResults class.

C#
// Try to compile the string into an assembly
var results = compiler.CompileAssemblyFromSource(parms, new string[]
{@" using System;

    class MyClass
    {
        public void Message(string message)
        {
            Console.Write(message);
        }               
    }"});

One thing to note is that the compilation method will complete regardless of whether or not the code has compiled successfully. To make sure your code has compiled, you need to check the Errors collection that is part of the CompilerResults instance returned by CompileAssemblyFromSource. If there were no errors, the code was compiled successfully and you can begin using the assembly.
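
As a rough sketch of what checking that collection could look like, the snippet below compiles a source string and prints each entry in Errors. CompileCheck and Compile are hypothetical names. It assumes the .NET Framework, since as far as I know CodeDom compilation is not supported on .NET Core and later. Note that the Errors collection also holds warnings, so the HasErrors property is a stricter test than counting entries.

```csharp
using System;
using System.CodeDom.Compiler;
using Microsoft.CSharp;

public static class CompileCheck // hypothetical name, for illustration only
{
    public static CompilerResults Compile(string source)
    {
        var compiler = new CSharpCodeProvider();
        var parms = new CompilerParameters
        {
            GenerateExecutable = false,
            GenerateInMemory = true
        };
        parms.ReferencedAssemblies.Add("System.dll");

        var results = compiler.CompileAssemblyFromSource(parms, source);

        // Each entry describes one warning or error, with its location in the source
        foreach (CompilerError error in results.Errors)
        {
            Console.WriteLine("{0} {1} at line {2}, column {3}: {4}",
                error.IsWarning ? "Warning" : "Error",
                error.ErrorNumber, error.Line, error.Column, error.ErrorText);
        }

        return results;
    }
}
```

A caller can then branch on results.Errors.HasErrors before touching results.CompiledAssembly, which is only usable when the compilation succeeded.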

Using the compiled code

Once your code is compiled into an assembly, you can use that assembly to create instances of classes from your source code and use reflection to invoke methods and get and set properties of those classes. In the example below, I’m creating an instance of MyClass and storing it as an object. I’m then using reflection to invoke the Message method on the class.

C#
// If there weren't any errors get an instance of "MyClass" and invoke
// the "Message" method on it
if (results.Errors.Count == 0)
{
    var myClass = results.CompiledAssembly.CreateInstance("MyClass");
    myClass.GetType().
            GetMethod("Message").
            Invoke(myClass, new []{ "Hello World!" });
}

It’s not exactly pretty, but it gets the job done. Scripting languages make it much easier to accomplish this sort of task, but it’s still nice to see that it can be done in C#.