Three basic concepts are common to all forms of digital imaging: computers, detectors, and analog-to-digital conversion. All present digital imaging systems are based on the modern computer. The computer serves as the host mechanism to control the acquisition, storage, processing, retrieval and display of the digital image. A detector is needed to convert the transmitted light of a conventional radiograph or the remnant x-ray beam into an electronic signal. Finally, the electronic signal must be converted from an analog form to a digital form. Because these three concepts are so critical to all digital imaging, we will review some fundamentals.
Several basic components are required for any computer architecture. Input devices are required to collect information from the outside world. These may be a keyboard or an electronic detector system. Memory is required to store both instructions and data for processing. A central processing unit (CPU) is needed to execute the instructions on the data; it is the CPU which actually performs the operations on the information. Long-term storage is required because computer memory, often called random access memory (RAM), is usually volatile: its contents are lost once power is removed. Storage may be in the form of magnetic floppy or hard disks. Recently, the development of optical disk technology has greatly increased the amount of storage space available, while dramatically reducing the cost. Output devices such as printers, plotters and video monitors are used to present the information in a form which can be interpreted by biologic systems such as human beings. Finally, the computer "bus" allows all these components to communicate and interact as a functioning device.
Most CCD imagers are made of high-grade pure silicon. In the crystalline form, each atom of silicon is covalently bonded to its neighbors. Energy greater than the gap energy, about 1.1 eV, is required to break a bond and create an electron-hole pair. Incident electromagnetic radiation in the form of photons of wavelength shorter than about 1 µm can break the bonds and generate electron-hole pairs. In order to measure the electronic charge produced by incident photons, it is necessary to provide a means for collecting this charge. A potential well method is used. Figure 3 illustrates the potential well concept. As incoming light or x-ray photons interact with the silicon, covalent bonds are broken and an electric charge is created in each individual well, or pixel. The CCD array can integrate and collect charge over a prescribed period of time, with the total charge collected at an individual pixel being proportional to the incident light striking the detector. The serial register, shown at the top of Figure 2, is itself a one-dimensional CCD and plays an important role during CCD readout. The CCD is read out by changing potentials across the array. The charge is passed to an adjacent pixel in a manner analogous to a bucket brigade. In this way the charge is transferred from pixel to pixel toward the serial register, where it is passed to the output amplifier or node. The output amplifier produces a measurable signal that is proportional to the quantity of charge in each charge packet. This analog signal is converted to a standard video format. The video signal comprises a series of analog television lines. The format is read out on a line-by-line basis from left to right, top to bottom. Additionally, a technique known as interlacing is employed. Interlacing refers to the reading of all even-numbered lines, top to bottom, followed by all odd-numbered lines. Interlacing is used to produce an apparent update of the entire frame in half the time that a full update actually takes.
The eye's integration of sequential fields gives the impression that the frame is updated twice as often as it really is. This results in a television monitor image with less apparent flicker. An entire frame is made up of 525 lines. Figure 4 represents the output voltage of a single line of the RS-170 standard video signal from a video camera or CCD detector. As you can see, the voltage ranges from the black level to the white level and is proportional to the light striking the detector. The accuracy with which the analog voltage amplitude represents brightness along the horizontal line dictates the brightness resolution contained in the RS-170 video signal. This resolution is equivalent to roughly seven or eight bits when quantized. It is this video signal which is digitized to create the digital image.
Typically, a video camera with an analog electronic output is used to obtain an electronic image of the radiograph. Next, an analog-to-digital ("A to D" or A/D) converter is used to convert the analog voltage signal from the camera to a numerical representation. The A/D converter measures the voltage at discrete intervals and gives a number corresponding to the intensity. These numbers are then stored in memory called a frame buffer. Figure 5 shows the voltage output on the left and the corresponding digital information on the right. In this way an analog image is converted to a digital form. Two parameters, contrast resolution and spatial resolution, directly affect the image quality of the digital image.
The concept of contrast resolution is concerned with how accurately the digital pixel brightness compares to the brightness at the same location in the original image. This value is dependent upon the output voltage of the video camera and the quantization of that voltage by the A/D converter. In quantizing the brightness of a pixel we must define the level of accuracy required. For instance, conversion to a two-bit binary number would allow for only four levels of gray. The four levels of brightness comprise what is called a gray scale. Increasing the number of bits representing the brightness expands the gray scale so that the digital image more closely resembles the original image. Image processor equipment manufacturers have generally adopted 8-bit quantization, or 256 levels of gray, as a standard.
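The effect of bit depth on the gray scale can be sketched in a few lines of Python. This is only an illustration: the function name quantize and the 0.0 to 1.0 V black-to-white range are assumptions made for the example, not values taken from any particular camera or A/D converter.

```python
def quantize(voltage, v_black, v_white, bits):
    """Map an analog voltage to an integer gray level at the given bit depth."""
    levels = 2 ** bits  # 2 bits -> 4 levels, 8 bits -> 256 levels
    # Normalize to the range 0..1 and clamp anything outside black/white.
    fraction = (voltage - v_black) / (v_white - v_black)
    fraction = min(max(fraction, 0.0), 1.0)
    # Scale to the highest level and round to the nearest integer level.
    return round(fraction * (levels - 1))


# Hypothetical voltage range (0.0 V = black, 1.0 V = white), for illustration only.
samples = [0.0, 0.25, 0.5, 0.75, 1.0]
two_bit = [quantize(v, 0.0, 1.0, 2) for v in samples]
eight_bit = [quantize(v, 0.0, 1.0, 8) for v in samples]
print(two_bit)    # only the values 0..3 can occur
print(eight_bit)  # values span 0..255
```

With two bits, neighboring voltages collapse onto the same gray level; with eight bits they remain distinct, which is why the larger gray scale resembles the original more closely.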
Spatial resolution is determined by the number of pixels which compose the digital image. The higher the number of pixels used to define the image, the closer we approach the spatial appearance of the original image. With enough pixels, a properly displayed digital image will appear identical to the original to an observer. In order to determine the appropriate spatial resolution required, we use the classical Nyquist criterion. This criterion states that to fully represent the rate of brightness change, or detail, in an original image we must sample it at a rate at least twice as high as the highest spatial frequency of the detail. In practice this criterion is not applied, because we are limited to the hardware that is available. Most framegrabber hardware provides for 512 x 512 or 640 x 480 spatial resolution. The cost of higher-resolution framegrabbers precludes their use in dental radiography. The two numbers define the size of the image matrix, that is to say the number of pixels per line and the number of lines contained in the digital image.
The levels of contrast and spatial resolution greatly affect the amount of computer memory that is required to store or transmit the image. For example, a 4-bit image with a spatial resolution of 512 x 512 requires 131,072 bytes to store, compared with 262,144 bytes for an 8-bit image with the same spatial resolution. The contrast and spatial resolution of most systems are defined by the equipment manufacturers and not by the specific image processing application. Research is needed to address the required resolution for relevant dental diagnostic tasks.
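The storage figures above follow directly from multiplying the matrix size by the bit depth. The short Python sketch below reproduces them; the function name image_bytes is hypothetical, and the calculation assumes pixels are packed with no header or padding.

```python
def image_bytes(columns, rows, bits_per_pixel):
    """Bytes needed to store an uncompressed image, assuming packed pixels."""
    total_bits = columns * rows * bits_per_pixel
    return (total_bits + 7) // 8  # round up to a whole number of bytes


four_bit = image_bytes(512, 512, 4)
eight_bit = image_bytes(512, 512, 8)
print(four_bit)   # 131072
print(eight_bit)  # 262144
```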
A typical imaging system is composed of a video camera, a framegrabber with both A/D and D/A converters, a host computer with optical disk storage and image processing software or hardware, and a video monitor. The cost of these systems can range from 5,000 dollars to over a quarter of a million dollars depending upon the sophistication of the hardware.
Image processing comprises those processes that visually enhance or quantitatively evaluate some aspect of an image not readily apparent in its original form. Historically, image processing developed as a discipline during the early 1960s. It was at that time that NASA was involved in mapping the lunar surface in preparation for the Apollo space missions. Conventional television transmissions were degraded and distorted due to long transmission distances and interference. In order to solve this problem the analog television signal was converted to a digital form. Once the image was in digital form, mathematical algorithms were developed to correct the noise and distortion. Since those early beginnings the use of image processing has expanded to almost every field.
Image processing can be divided into three basic types of operations: analysis (operations that produce numeric information based on an image), enhancement (operations that subjectively or objectively modify the appearance or qualities of an image), and encoding (operations that code an image into a new form).
Image analysis operations serve to describe some aspect of the image which is not readily obtained by visual means. The most common analysis operation is the histogram. An image histogram is a graphic representation of how many pixels have a specific gray value. The gray value is plotted along the horizontal axis and the number of pixels with that specific gray value is plotted along the vertical axis (Figure 6). From this analysis, brightness, contrast and dynamic range can be readily obtained. This provides a starting point for determining appropriate enhancement operations that will produce the desired result. Density analysis is the determination of the intensity or gray value at a specific point in the image. This analysis can be used to compare two images taken over a period of time to determine if some change has occurred. Finally, dimensional analysis such as length, width, angle, area or perimeter is greatly facilitated by having the image in digital format.
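A histogram is simple to compute once the image is a matrix of gray values. The following Python sketch counts the pixels at each gray value and derives the darkest and brightest values present; the tiny 3 x 3 image and the function name histogram are made up for illustration.

```python
def histogram(image, levels=256):
    """Count how many pixels have each gray value (index = gray value)."""
    counts = [0] * levels
    for row in image:
        for gray in row:
            counts[gray] += 1
    return counts


# Hypothetical 3 x 3 image of 8-bit gray values.
image = [
    [10, 10, 200],
    [10, 50, 200],
    [50, 50, 255],
]
counts = histogram(image)

# Gray values actually present give the dynamic range of the image.
used = [gray for gray, n in enumerate(counts) if n > 0]
darkest, brightest = used[0], used[-1]
# Mean gray value is one simple measure of overall brightness.
mean_brightness = sum(gray * n for gray, n in enumerate(counts)) / sum(counts)
print(darkest, brightest, mean_brightness)
```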
Image enhancement operations are designed to improve the perceptibility of some feature within the image. The most common enhancement operations used in dentistry are: contrast manipulations, spatial filtering, subtraction and pseudo-color. Contrast manipulations can increase or decrease the contrast as well as alter the brightness of an image. To alter the original image a pixel mapping function is used. The gray value I(x,y) of the pixel at coordinates (x,y) in the input image is modified and written to the output image as O(x,y) at the same coordinates. The general equation for this process is:
O(x,y) = M[I(x,y)] where M is the mapping function.
This function can be implemented using a Look Up Table or LUT. This is a mapping function in which all pixels having a specific gray value are changed to a new gray value. The LUT can be graphically represented by plotting the input values (ranging from 0 to 255) on the horizontal axis and the new output gray values (ranging from 0 to 255) on the vertical axis. Figure 7 shows a linear LUT in which no change has been made to the input image. Figure 8 shows an inverse or "negative" image. Notice that 0 in the input image has been mapped to 255 and 255 has been mapped to 0. The resulting image is a "photographic negative" of the original image. By manipulating the slope of the line in the LUT the contrast can be increased or decreased. The brightness can be modified by changing the "y" intercept. Non-linear functions can also be implemented with a look-up table, for example for photometric correction of sensor nonlinearities.
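A look-up table of this kind can be sketched as a 256-entry list indexed by the input gray value. In the Python example below, the function names and the clipping of out-of-range results to 0 and 255 are assumptions made for the illustration.

```python
def make_linear_lut(slope=1.0, intercept=0.0):
    """Build a 256-entry LUT for output = slope * input + intercept."""
    lut = []
    for gray in range(256):
        value = round(slope * gray + intercept)
        lut.append(min(max(value, 0), 255))  # clip to the 8-bit range
    return lut


def apply_lut(image, lut):
    """Replace every pixel by the LUT entry for its gray value."""
    return [[lut[gray] for gray in row] for row in image]


identity = make_linear_lut()                             # no change
negative = make_linear_lut(slope=-1.0, intercept=255.0)  # 0 -> 255, 255 -> 0
brighter = make_linear_lut(intercept=40.0)               # raised y intercept

image = [[0, 64], [128, 255]]  # hypothetical 2 x 2 image
inverted = apply_lut(image, negative)
print(inverted)  # [[255, 191], [127, 0]]
```

Because the table is computed once and then only indexed, the cost per pixel is the same no matter how complicated the mapping function is.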
Spatial filtering is the accentuation or attenuation of selected frequencies within the image. An image is composed of many frequency subcomponents, ranging from high to low. Where rapid brightness transitions are prevalent, the spatial frequency is high. Slow transitions represent low frequency. In order to select which frequencies to pass, a convolution operation is performed. A convolution is a mathematical method whereby a pixel's gray value is determined by the pixels surrounding it. For every pixel in the input image, we calculate a value for the output image pixel by calculating a weighted average of it and its surrounding neighbors. The average is formed from a group of pixels, called a kernel, around and including the center pixel being processed. The kernel is square and may have the dimensions 3 x 3, 5 x 5, 7 x 7, and so on. A weighted average is formed by attaching a multiplicative weighting factor to each term in the average. By altering these weighting factors, or convolution coefficients, certain pixels will have more or less influence on the overall average. The mechanics of spatial convolution are fairly straightforward (Figure 9). In carrying out a 3 x 3 kernel convolution, nine weighting coefficients (A thru I) are defined in an array called the convolution mask. The center pixel (#5) and its eight neighbors (#1 thru #4 and #6 thru #9) are multiplied by their respective weighting coefficients and summed. The result is placed in the output image at the same center pixel location (#5*). Every pixel in the image is evaluated with its eight neighbors, using this mask to produce a resultant pixel value to be placed in the output image. Because each pixel processed requires nine multiplications and nine additions, a full image convolution of a 512 x 512 image requires several million mathematical operations.
The high pass filter accentuates high-frequency spatial components while leaving low-frequency components untouched. A common high pass mask consists of a 9 in the center location with -1s in the surrounding locations:
-1 -1 -1
-1 9 -1
-1 -1 -1
Note that the coefficients add to 1, and that smaller coefficients surround the large positive center coefficient. The overall visual effect is one of sharpening detail in the image. The low pass filter has basically the opposite effect of the high pass filter. It attenuates the high-frequency spatial components and passes the low-frequency components of the image. A common low pass convolution mask consists of all nine coefficients having the value of 1/9:
1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9
Notice that the sum of all the coefficients is equal to 1 and they are all positive numbers. This is true of all low pass filter masks. The overall visual effect is one of blurring the image.
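The mechanics of the 3 x 3 convolution and the two masks above can be sketched in Python as follows. Several details here are assumptions made for the example rather than anything prescribed above: border pixels are simply copied unchanged, results are rounded and clipped to the range 0 to 255, and the test images are invented.

```python
def convolve3x3(image, mask):
    """Apply a 3 x 3 mask to every interior pixel; borders are copied as-is."""
    rows, cols = len(image), len(image[0])
    output = [row[:] for row in image]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            total = 0.0
            # Multiply the pixel and its eight neighbors by their
            # respective weighting coefficients and sum the products.
            for j in range(3):
                for i in range(3):
                    total += mask[j][i] * image[y + j - 1][x + i - 1]
            output[y][x] = min(max(round(total), 0), 255)
    return output


HIGH_PASS = [[-1, -1, -1], [-1, 9, -1], [-1, -1, -1]]
LOW_PASS = [[1 / 9] * 3 for _ in range(3)]

# A uniform region: both masks sum to 1, so brightness is preserved.
flat = [[100] * 4 for _ in range(4)]

# A single bright pixel on a uniform background.
spot = [[100] * 5 for _ in range(5)]
spot[2][2] = 190

sharpened = convolve3x3(spot, HIGH_PASS)
blurred = convolve3x3(spot, LOW_PASS)
print(sharpened[2][2], blurred[2][2])  # 255 110
```

The high pass mask drives the bright pixel to the maximum value and darkens its neighbors, exaggerating the transition, while the low pass mask spreads the bright pixel's value over the neighborhood.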
Enhancement of edges in an image is a common operation in image processing. The operation typically reduces an image to display only its edge information. The gradient operation forms a directional edge enhancement. By using a 3 x 3 kernel, eight gradient images may be generated from an original. Each highlights edges oriented in one of the eight compass directions: N, NE, E, SE, S, SW, W, and NW. The mask oriented for the East direction is given as:
-1 1 1
-1 -2 1
-1 1 1
The Laplacian operation is an omnidirectional edge enhancement, highlighting all edges regardless of their orientation. The common Laplacian mask consists of an 8 in the center location with -1s in the surrounding locations:
-1 -1 -1
-1 8 -1
-1 -1 -1
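Unlike the high pass and low pass masks, the coefficients of the gradient and Laplacian masks sum to zero, which is why uniform regions drop out and only edges remain. The Python sketch below evaluates both masks on a single 3 x 3 neighborhood; the neighborhoods and the helper name mask_response are invented for illustration, and the raw responses are shown before any clipping to the displayable range.

```python
EAST_GRADIENT = [[-1, 1, 1], [-1, -2, 1], [-1, 1, 1]]
LAPLACIAN = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]


def mask_response(neighborhood, mask):
    """Weighted sum of one 3 x 3 neighborhood with a 3 x 3 mask."""
    return sum(mask[j][i] * neighborhood[j][i]
               for j in range(3) for i in range(3))


flat = [[100, 100, 100]] * 3       # uniform region, no edge
rising_east = [[50, 50, 200]] * 3  # brightness increases toward the east
rising_west = [[200, 50, 50]] * 3  # brightness increases toward the west

print(mask_response(flat, EAST_GRADIENT))         # 0
print(mask_response(rising_east, EAST_GRADIENT))  # 450
print(mask_response(rising_west, EAST_GRADIENT))  # -450
print(mask_response(flat, LAPLACIAN))             # 0
```

The East mask responds with opposite signs to the two edge orientations, so after clipping negative results to zero only one orientation survives. That is the sense in which the gradient is directional.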
In the human visual system, the eye-brain system applies a Laplacian-like enhancement to everything we view. Another type of filter which uses a 3 x 3 kernel, but is not based on a weighted average of the nine pixels in the neighborhood, is the median filter. Instead, the output of the median filter is the median value of the nine pixels in the kernel. The nine gray values from the 3 x 3 kernel are placed in ascending numerical order. The median is the fifth value in this ordering, so that four values are less than or equal to it and four are greater than or equal to it. The median becomes the output pixel. An example of a 3 x 3 median filter operation would be, given a 3 x 3 group of pixels:
8 18 4
12 52 18
12 8 18
The ascending order would be: 4, 8, 8, 12, 12, 18, 18, 18, 52. The median value would be 12. Therefore the original 52 in the input image would be replaced by 12 in the output image. This type of filter is excellent at removing random noise, particularly isolated outlier pixels such as the 52 in this example.
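The worked example can be checked with a few lines of Python. The helper name median3x3 is hypothetical; the pixel values are those of the 3 x 3 group given above.

```python
def median3x3(neighborhood):
    """Return the median (fifth of nine sorted values) of a 3 x 3 neighborhood."""
    values = sorted(gray for row in neighborhood for gray in row)
    return values[4]


pixels = [
    [8, 18, 4],
    [12, 52, 18],
    [12, 8, 18],
]
ordered = sorted(gray for row in pixels for gray in row)
print(ordered)            # [4, 8, 8, 12, 12, 18, 18, 18, 52]
print(median3x3(pixels))  # 12
```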
Another image enhancement, used to remove structured noise, is image subtraction. Digital image subtraction is a methodology for reducing the structured noise of normal anatomic detail and thereby increasing the signal-to-noise ratio. By increasing the signal-to-noise ratio, the pathology should be made more evident to the human observer. Figure 10 illustrates the basic principles of digital image subtraction. Digital image subtraction has been applied to almost every disease process which affects dental hard tissues. The dramatic improvement in diagnostic performance has been clearly demonstrated by numerous researchers.
A prerequisite for digital subtraction radiography is that the projections are identical or almost identical at the different examinations. Other prerequisites for subtraction are the ability to properly align the two images, which is referred to as registration, and the ability to correct for variations associated with exposure and processing that would obscure the changes in radiographic density associated with the pathology. These prerequisites have limited the direct clinical application of this technique. Recent developments in reproducible radiographic techniques and gamma correction algorithms may provide the answers to these limitations. The application of longitudinal radiographic assessment techniques such as subtraction radiography may hold the greatest promise for improving diagnostic accuracy.
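As a minimal sketch of the subtraction step itself, the Python example below assumes that the two images have already been registered and corrected for exposure and processing differences. Adding an offset of 128, so that "no change" appears as mid-gray, is a common display convention assumed here rather than something specified above; the function name and pixel values are also invented.

```python
def subtract_images(baseline, followup, offset=128):
    """Pixel-by-pixel difference of two registered images, shifted to mid-gray."""
    result = []
    for base_row, follow_row in zip(baseline, followup):
        result.append([
            min(max(follow - base + offset, 0), 255)  # clip to 8 bits
            for base, follow in zip(base_row, follow_row)
        ])
    return result


# Hypothetical registered images: identical anatomy, one pixel has lost density.
baseline = [[100, 120, 140], [100, 120, 140]]
followup = [[100, 120, 140], [100, 90, 140]]

difference = subtract_images(baseline, followup)
print(difference)  # [[128, 128, 128], [128, 98, 128]]
```

The unchanged anatomy cancels to a uniform gray, leaving only the changed pixel visible.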
The human visual system is much more tuned to the recognition of color than gray scale image features. While we can detect no more than 28 - 32 individual shades of gray, we are capable of detecting thousands of shades of different colors. Because of this characteristic of the human visual system, gray scale images can be represented by pseudo-colors. That is to say, each gray value from 0 - 255 can be arbitrarily assigned a specific color. This operation can be performed using a color look-up table, or CLUT, similar to a gray scale LUT. The major limitation to the use of pseudo-color is that we have no reference as to what the color means in these images. A standard color scale must be defined in order to take advantage of our color visual system.
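A color look-up table works like the gray scale LUT, except that each entry holds a red, green and blue triplet. The blue-to-green-to-red ramp in the Python sketch below is an arbitrary choice made for illustration. It is not a standard color scale, which is exactly the limitation noted above.

```python
def make_clut():
    """Build a hypothetical 256-entry CLUT: dark = blue, mid = green, bright = red."""
    clut = []
    for gray in range(256):
        red = gray                         # increases with brightness
        green = 255 - abs(2 * gray - 255)  # peaks near mid-gray
        blue = 255 - gray                  # decreases with brightness
        clut.append((red, green, blue))
    return clut


def apply_clut(image, clut):
    """Replace each gray value by its (red, green, blue) triplet."""
    return [[clut[gray] for gray in row] for row in image]


clut = make_clut()
colored = apply_clut([[0, 255]], clut)
print(colored)  # [[(0, 0, 255), (255, 0, 0)]]
```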
Image encoding comprises those operations which serve to reduce the amount of information necessary to describe an image. Coding images into reduced forms allows either more image data to be transmitted in a given period of time or more to be stored in a given segment of a storage device. Two basic types of encoding schemes exist: those in which no information is lost, called lossless algorithms, and those in which information is lost, called lossy algorithms. With lossless algorithms, the image reconstructed from the coded image is identical to the original. Two examples of lossless techniques are Huffman coding and run-length coding. These provide approximately a 2:1 compression ratio. Lossy techniques do not produce an image identical to the original; some degradation occurs. The degree to which this affects the use of the image is entirely task dependent. Some examples of lossy techniques are discrete cosine transform (DCT), vector quantization, and fractal formatting. These may provide for compression ratios ranging from 3:1 to as high as 250:1. It is clear that standards of "visually lossless" encoding must be developed to determine which encoding techniques are appropriate for dental images. Much research is needed in this area, considering the large amounts of data that will be required to store these images.
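Run-length coding, one of the lossless techniques mentioned above, can be sketched in Python as follows. The function names and the sample image line are invented for illustration, and the compression achieved on this short line says nothing about the ratios quoted above, which depend on the image content.

```python
def rle_encode(pixels):
    """Collapse consecutive equal gray values into (value, run length) pairs."""
    runs = []
    for gray in pixels:
        if runs and runs[-1][0] == gray:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([gray, 1])    # start a new run
    return [(gray, count) for gray, count in runs]


def rle_decode(runs):
    """Expand (value, run length) pairs back into the original pixels."""
    pixels = []
    for gray, count in runs:
        pixels.extend([gray] * count)
    return pixels


line = [0, 0, 0, 0, 255, 255, 0, 0, 0, 0, 0, 0]  # hypothetical image line
runs = rle_encode(line)
print(runs)                      # [(0, 4), (255, 2), (0, 6)]
print(rle_decode(runs) == line)  # True: nothing was lost
```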
With the development of new digital imaging modalities, the future of maxillofacial imaging has never looked brighter. New possibilities for research and clinical diagnosis are just beginning to be realized. Digital imaging in dentistry may have the most profound effect on the diagnosis and treatment of dental diseases since the development of the roentgen ray itself.
