Digital Imaging in Dentistry

Overview

Introduction

Digital imaging is the new frontier in dental radiology. It is probably the fastest growing area of our discipline. I believe we are at the beginning of a revolution in oral and maxillofacial radiology similar to the revolution that occurred in medical radiology in the early 1970s. With the advent of digital technologies such as computed tomography and magnetic resonance imaging, medical radiology entered a renaissance. Today radiology is considered one of the most sought-after specialties. The introduction of direct digital imaging techniques in maxillofacial radiology will likewise change our field forever. The new possibilities for improving the diagnostic capabilities of imaging technologies are endless. I hope that these lectures will serve as a basis for future understanding of, and research into, this new frontier.

Three basic concepts are common to all forms of digital imaging: computers, detectors and analog-to-digital conversion. All present digital imaging systems are based on the modern computer. The computer serves as the host mechanism to control the acquisition, storage, processing, retrieval and display of the digital image. A detector is needed to convert the transmitted light of a conventional radiograph or the remnant x-ray beam into an electronic signal. Finally, the electronic signal must be converted from an analog form to a digital form. Because these three concepts are so critical to all digital imaging, we will review some fundamentals.

Computers

Conceptually, computers can be thought of as having several basic functions. First, they provide for the input and output of data. Second, they provide a mechanism for performing the instructions that compose a program in order to act upon the data. Third, they provide for storage and retrieval of that data. Fourth, and most important, they perform all of these functions with "blinding speed". Approximately 10 million instructions per second can be performed with a personal computer. It is the speed at which a computer performs these tasks that makes it so useful. At the most fundamental level, computers are composed of nothing more than millions of individual transistors. Transistors are solid-state devices that function like a switch, having an ON and an OFF position or, more specifically, a high and a low state. Because of this, a computer is a binary computing machine: all information must be represented by either a zero or a one. A computer uses a language analogous to our spoken language. In the English language, we use 26 characters (A through Z) to represent information in the form of words. In the computer language only two characters (0 and 1), called bits for Binary digIT, are used to represent information. While words in the English language may be any number of characters long, computer information is managed in specific units also called words, typically 8, 16 or 32 bits in length. The word length is determined by the type of computer system. A byte is a common unit of information in this language and represents a word that is 8 bits in length. Figure 1 shows the relationship between the number of bits and the decimal numbers that can be represented in binary form. As you can see, a 3-bit number can represent decimal numbers from 0 to 7, whereas an 8-bit number can represent numbers from 0 to 255. Therefore an 8-bit system can represent 256 shades of gray ranging from 0 to 255 in value.
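
The relationship between word length and the range of representable values can be checked with a few lines of Python. This is only an illustrative sketch; the function name gray_levels is an assumption introduced here and does not come from any imaging system.

```python
def gray_levels(bits):
    """Return (number of levels, largest value) for an unsigned word of `bits` bits."""
    levels = 2 ** bits          # each added bit doubles the number of levels
    return levels, levels - 1   # values run from 0 to levels - 1

for bits in (1, 3, 8):
    levels, max_value = gray_levels(bits)
    # format(max_value, "b") writes the largest value in binary notation
    print(f"{bits}-bit word: {levels} levels, 0 to {max_value} "
          f"(largest value in binary: {format(max_value, 'b')})")
```

For a 3-bit word this reports 8 levels (0 to 7), and for an 8-bit word 256 levels (0 to 255), matching the ranges quoted above.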

Several basic components are required for any computer architecture. Input devices are required to collect information from the outside world. These may be a keyboard or an electronic detector system. Memory is required to store both instructions and data for processing. A central processing unit (CPU) is needed to perform the manipulation of the instructions on the data. It is the CPU which actually performs the operations on the information. Long term storage is required due to the fact that computer memory often called random access memory (RAM) is usually volatile and is lost once power is removed. Storage may be in the form of magnetic floppy or hard disks. Recently, development of optical disk technology has greatly increased the amount of storage space available, while dramatically reducing the cost. Output devices such as printers, plotters and video monitors are used to present the information in a form which can be interpreted by biologic systems like human beings. Finally, the computer "bus" allows for the communication and interaction of all these components into a functioning device.

Detectors

Image detectors, whether a video camera or an intraoral sensor, depend on solid-state electronic devices. These solid-state detectors can be either linear or area arrays. A linear array requires that the object to be imaged be scanned. While this has the advantage of excellent scatter rejection, it also has disadvantages such as motion artifacts and inefficient use of x-ray output. Area or two-dimensional arrays require no scanning and provide high spatial resolution with virtually no linear distortion and efficient use of x-ray output. The most common type of two-dimensional array detector is the charge-coupled device (CCD). This detector is used in video cameras as well as in all of the direct digital intraoral x-ray devices (Figure 2).

Most CCD imagers are made of high-grade pure silicon. In the crystalline form, each atom of silicon is covalently bonded to its neighbors. Energy greater than the gap energy, about 1.1 eV, is required to break a bond and create an electron-hole pair. Incident electromagnetic radiation in the form of photons of wavelength shorter than 1 µm can break the bonds and generate electron-hole pairs. In order to measure the electronic charge produced by incident photons, it is necessary to provide a means for collecting this charge. A potential well method is used. Figure 3 illustrates the potential well concept. As incoming light or x-ray photons interact with the silicon, covalent bonds are broken and an electric charge is created in each individual well or pixel. The CCD array can integrate and collect charge over a prescribed period of time, with the total charge collected at an individual pixel being proportional to the incident light striking the detector. The serial register, shown at the top of Figure 2, is itself a one-dimensional CCD and plays an important role during CCD readout. The CCD is read out by changing potentials across the array. The charge is passed to an adjacent pixel in a manner analogous to a bucket brigade. In this way the charge is transferred from pixel to pixel toward the serial register, where it is passed to the output amplifier or node. The output amplifier produces a measurable signal that is proportional to the quantity of charge in each charge packet. This analog signal is converted to a standard video format. The video signal comprises a series of analog television lines. The frame is read out on a line-by-line basis from left to right, top to bottom. Additionally, a technique known as interlacing is employed. Interlacing refers to the reading of all even-numbered lines, top to bottom, followed by all odd-numbered lines. Interlacing is used to produce an apparent update of the entire frame in half the time that a full update actually takes. The eye's integration of sequential fields gives the impression that the frame is updated twice as often as it really is. This results in a television monitor image with less apparent flicker. An entire frame is made up of 525 lines. Figure 4 represents the output voltage of a single line of the RS-170 standard video signal from a video camera or CCD detector. As you can see, the voltage ranges from the black level to the white level and is proportional to the light striking the detector. The accuracy with which the analog voltage amplitude represents brightness along the horizontal line dictates the brightness resolution contained in the RS-170 video signal. This resolution is equivalent to roughly seven or eight bits when quantized. It is this video signal which is digitized to create the digital image.

Digitization

A radiograph is composed of shades of gray spanning from black to white, and is known as a "continuous tone" image. This means that the shades of gray blend together with no noticeable interruptions. To convert this image into a discrete "digital" form, the image is chopped into individual pieces of information. Each piece describes the light intensity (brightness) and its location (x, y coordinates) within the image. The process of "chopping" is referred to as digitizing or sampling. The individual pieces of information are called picture elements, or pixels, because each represents a discrete element of the digital image.

Typically, a video camera with an analog electronic output is used to obtain an electronic image of the radiograph. Next, an analog-to-digital or "A to D" converter is used to convert the analog voltage signal from the camera to a numerical representation. The A/D converter measures the voltage at discrete intervals and gives a number corresponding to the intensity. These numbers are then stored in memory called a frame buffer. Figure 5 shows the voltage output on the left and the corresponding digital information on the right. In this way an analog image is converted to a digital form. Two parameters, contrast resolution and spatial resolution, directly affect the image quality of the digital image.
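
The sampling step performed by the A/D converter can be sketched in Python as follows. This is a simplified illustration and not the behavior of any particular framegrabber: the function name digitize_line, the normalized 0 to 1 volt range for the black and white levels, and the sample voltages are all assumptions chosen for the example.

```python
def digitize_line(voltages, v_black=0.0, v_white=1.0, bits=8):
    """Convert sampled analog voltages into integer gray values.

    v_black and v_white are assumed black and white reference levels;
    voltages outside that range are clamped.
    """
    max_code = 2 ** bits - 1
    codes = []
    for v in voltages:
        fraction = (v - v_black) / (v_white - v_black)
        fraction = min(max(fraction, 0.0), 1.0)        # clamp to 0..1
        codes.append(int(fraction * max_code + 0.5))   # round to nearest code
    return codes

# One video line sampled at four points; the frame buffer holds one list per line.
frame_buffer = [digitize_line([0.0, 0.25, 0.5, 1.0])]
print(frame_buffer)  # [[0, 64, 128, 255]]
```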

Contrast resolution is concerned with how accurately the digital pixel brightness compares to the brightness at the same location in the original image. This value is dependent upon the output voltage of the video camera and the quantization of that voltage by the A/D converter. In quantizing the brightness of a pixel we must define the level of accuracy required. For instance, conversion to a two-bit binary number would allow for only four levels of gray. The four levels of brightness comprise what is called a gray scale. Increasing the number of bits representing the brightness expands the gray scale so that the digital image more closely resembles the original image. Image processor equipment manufacturers have generally adopted 8-bit quantization, or 256 levels of gray, as a standard.
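
The effect of the number of bits on the gray scale can be illustrated with a short Python sketch. The helper names requantize and display_value are assumptions made for this example; dropping low-order bits is only one simple way to reduce the gray scale.

```python
def requantize(value_8bit, bits):
    """Reduce an 8-bit gray value (0-255) to `bits` bits by dropping low-order bits."""
    return value_8bit >> (8 - bits)

def display_value(code, bits):
    """Scale a `bits`-bit code back onto the 0-255 display range."""
    return code * 255 // (2 ** bits - 1)

ramp = [0, 60, 100, 140, 200, 255]                  # sample 8-bit brightness values
two_bit = [requantize(v, 2) for v in ramp]          # only four codes: 0, 1, 2, 3
shown = [display_value(c, 2) for c in two_bit]      # what a 2-bit image would display
print(two_bit)  # [0, 0, 1, 2, 3, 3]
print(shown)    # [0, 0, 85, 170, 255, 255]
```

With two bits, the distinct input values 0 and 60 collapse onto the same gray level, which is the loss of contrast resolution described above.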

Spatial resolution is determined by the number of pixels which compose the digital image. The higher the number of pixels used to define the image, the closer we approach the spatial appearance of the original image. With enough pixels, a properly displayed digital image will appear identical to the original to an observer. In order to determine the appropriate spatial resolution required, we use the classical Nyquist criterion. This criterion states that to fully represent the rate of brightness change or detail in an original image we must sample it at a rate at least twice as high as the highest spatial frequency of the detail. In practice this criterion is often not applied because we are limited to the hardware that is available. Most framegrabber hardware provides for 512 x 512 or 640 x 480 spatial resolution. The cost of higher resolution framegrabbers precludes their use in dental radiography. The two numbers define the size of the image matrix, that is to say the number of pixels per line and the number of lines contained in the digital image.
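
The Nyquist criterion can be turned into a small calculation. In the Python sketch below, spatial frequency is expressed in line pairs per millimeter, and the 40 mm field width and 5 line pairs per millimeter are hypothetical figures chosen only to illustrate the arithmetic; the function names are likewise assumptions.

```python
import math

def min_sampling_rate(highest_frequency):
    """Nyquist criterion: sample at least twice the highest spatial frequency."""
    return 2 * highest_frequency

def pixels_required(length_mm, highest_lp_per_mm):
    """Pixels needed along `length_mm` to capture detail up to `highest_lp_per_mm`."""
    return math.ceil(length_mm * min_sampling_rate(highest_lp_per_mm))

def highest_frequency_captured(pixels, length_mm):
    """Highest spatial frequency (lp/mm) that `pixels` samples across `length_mm` can represent."""
    return pixels / (2 * length_mm)

print(pixels_required(40, 5))               # 400 pixels across a 40 mm field
print(highest_frequency_captured(512, 40))  # 6.4 lp/mm for a 512-pixel line
```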

The levels of contrast and spatial resolution greatly affect the amount of computer memory that is required to store or transmit the image. For example, a 4-bit image with a spatial resolution of 512 x 512 requires 131,072 bytes to store, compared with 262,144 bytes for an 8-bit image with the same spatial resolution. The contrast and spatial resolution of most systems are defined by the equipment manufacturers rather than by the specific image processing application. Research is needed to address the required resolution for relevant dental diagnostic tasks.
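
The storage figures above follow directly from the image matrix size and the number of bits per pixel. A minimal Python sketch, assuming uncompressed storage with pixels packed tightly and no file header (the function name image_bytes is introduced here for illustration):

```python
def image_bytes(width, height, bits_per_pixel):
    """Bytes needed to store an uncompressed image, rounding up to whole bytes."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8

print(image_bytes(512, 512, 4))  # 131072
print(image_bytes(512, 512, 8))  # 262144
print(image_bytes(640, 480, 8))  # 307200
```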

A typical imaging system is composed of a video camera, a framegrabber with both A/D and D/A converter, a host computer with optical disk storage and image processor software or hardware and a video monitor. The cost of these systems can range from 5,000 dollars to over a quarter of a million dollars depending upon the sophistication of the hardware.

Image Processing

In order to better understand the field of image processing, a few basic characteristics of an image will be discussed. An image can be characterized by three different features: contrast, spatial frequency, and noise. Contrast can be defined as the difference in brightness between two regions in an image. Spatial frequency can be defined as a measure of the relative rate of change of brightness from one point in the image to another. Every image contains scene detail in varying degrees. An MR image contains very high spatial frequencies associated with the detail representing the brain, whereas the area representing the background has very low spatial frequencies. Finally, noise can be defined as any distracting information in the image that does not contribute to the diagnostic usefulness of the image. These image characteristics are fundamental to the field of image processing.

Image processing comprises those processes that visually enhance or quantitatively evaluate some aspect of an image not readily apparent in its original form. Historically, image processing developed as a discipline during the early 1960s. It was at that time that NASA was involved in mapping the lunar surface in preparation for the Apollo space missions. Conventional television transmissions were degraded and distorted due to long transmission distances and interference. In order to solve this problem the analog television signal was converted to a digital form. Once the image was in digital form, mathematical algorithms were developed to correct the noise and distortion. Since those early beginnings the use of image processing has expanded to almost every field.

Image processing can be divided into three basic types of operations: analysis - operations that produce numeric information based on an image, enhancement - operations that subjectively or objectively modify the appearance or qualities of an image and encoding - operations that code an image into a new form.

Image analysis operations serve to describe some aspect of the image which is not readily obtained by visual means. The most common analysis operation is the histogram. An image histogram is a graphic representation of how many pixels have a specific gray value. The gray value is plotted along the horizontal axis and the number of pixels with that specific gray value is plotted along the vertical axis (Figure 6). From this analysis, brightness, contrast and dynamic range can be readily obtained. This provides a starting point for determining appropriate enhancement operations that will produce the desired result. Density analysis is the determination of the intensity or gray value at a specific point in the image. This analysis can be used to compare two images taken over a period of time to determine if some change has occurred. Finally, dimensional analysis such as length, width, angle, area or perimeter is greatly facilitated by having the image in digital format.
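
A histogram is straightforward to compute once the image is in digital form. The Python sketch below uses a tiny made-up 3 x 3 image stored as a list of rows; the function name and the sample values are assumptions for illustration only.

```python
def histogram(image, levels=256):
    """Count how many pixels have each gray value (index = gray value)."""
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1
    return counts

image = [[10, 10, 50],
         [50, 50, 200],
         [200, 255, 10]]
counts = histogram(image)

# Simple measures read from the histogram
used = [gray for gray, n in enumerate(counts) if n > 0]
dynamic_range = (min(used), max(used))                       # darkest and brightest values present
total_pixels = sum(counts)
mean_brightness = sum(g * n for g, n in enumerate(counts)) / total_pixels

print(counts[10], counts[50], counts[200], counts[255])  # 3 3 2 1
print(dynamic_range)                                      # (10, 255)
print(round(mean_brightness, 1))                          # 92.8
```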

Image enhancement operations are designed to improve the perceptibility of some feature within the image. The most common enhancement operations used in dentistry are: contrast manipulations, spatial filtering, subtraction and pseudo-color. Contrast manipulations can increase or decrease the contrast as well as alter the brightness of an image. To alter the original image a pixel mapping function is used. The pixel at coordinates I(x,y) in the input image is modified and returned to the output image at coordinate O(x,y). The general equation for this process is:

O(x,y) = M[I(x,y)] where M is the mapping function.

This function can be implemented using a Look Up Table or LUT. This is a mapping function in which all pixels having a specific gray value are changed to a new gray value. The LUT can be graphically represented by plotting the input values (ranging from 0 to 255) on the horizontal axis and the new output gray values (ranging from 0 to 255) on the vertical axis. Figure 7 shows a linear LUT in which no change has been made to the input image. Figure 8 shows an inverse or "negative" image. Notice that 0 in the input image has been mapped to 255 and 255 has been mapped to 0. The resulting image is a "photographic negative" of the original image. By manipulating the slope of the line in the LUT the contrast can be increased or decreased. The brightness can be modified by changing the "y" intercept. Non-linear functions can also be implemented with a look-up table, for example for photometric correction of sensor nonlinearities.
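
A linear look-up table of the kind described above can be sketched in Python as follows. The function names build_lut and apply_lut are assumptions made for this illustration, and output values are clamped to the 0 to 255 range.

```python
def build_lut(slope=1.0, intercept=0.0):
    """Build a 256-entry LUT for output = slope * input + intercept, clamped to 0-255."""
    lut = []
    for gray in range(256):
        out = int(round(slope * gray + intercept))
        lut.append(min(max(out, 0), 255))
    return lut

def apply_lut(image, lut):
    """Map every pixel through the LUT: O(x,y) = M[I(x,y)]."""
    return [[lut[value] for value in row] for row in image]

identity = build_lut()                             # slope 1, intercept 0: no change
negative = build_lut(slope=-1.0, intercept=255.0)  # 0 -> 255 and 255 -> 0
brighter = build_lut(intercept=40.0)               # raising the intercept adds brightness

image = [[0, 100], [200, 255]]
print(apply_lut(image, identity))  # [[0, 100], [200, 255]]
print(apply_lut(image, negative))  # [[255, 155], [55, 0]]
print(apply_lut(image, brighter))  # [[40, 140], [240, 255]]
```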

Spatial filtering is the accentuation or attenuation of selected frequencies within the image. An image is composed of many frequency subcomponents, ranging from high to low. Where rapid brightness transitions are prevalent, the spatial frequency is high. Slow transitions represent low frequencies. In order to select which frequencies to pass, a convolution operation is performed. A convolution is a mathematical method whereby a pixel's gray value is determined by the pixels surrounding it. For every pixel in the input image, we calculate a value for the output image pixel by calculating a weighted average of it and its surrounding neighbors. The average is formed from a group of pixels, called a kernel, around and including the center pixel being processed. The kernel is square and may have dimensions of 3 x 3, 5 x 5, 7 x 7, and so on. A weighted average is formed by attaching a multiplicative weighting factor to each term in the average. By altering these weighting factors, or convolution coefficients, certain pixels will have more or less influence on the overall average. The mechanics of spatial convolution are fairly straightforward (Figure 9). In carrying out a 3 x 3 kernel convolution, nine weighting coefficients (A through I) are defined in an array called the convolution mask. The center pixel (#5) and its eight neighbors (#1 through #4 and #6 through #9) are multiplied by their respective weighting coefficients and summed. The result is placed in the output image at the same center pixel location (#5*). Every pixel in the image is evaluated with its eight neighbors, using this mask to produce a resultant pixel value to be placed in the output image. Because each pixel processed requires nine multiplications and nine additions, a full image convolution of a 512 x 512 image may require several million mathematical operations.
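
The mechanics of a 3 x 3 kernel operation can be sketched in Python. This is a minimal illustration, not an optimized implementation: the function name convolve3x3 is an assumption, border pixels are simply copied unchanged (one of several possible conventions), results are clamped to 0 to 255, and the mask is applied as written without being flipped, which follows the description above.

```python
def convolve3x3(image, mask):
    """Apply a 3 x 3 weighting mask to every interior pixel of `image`."""
    height, width = len(image), len(image[0])
    output = [row[:] for row in image]      # border pixels are left unchanged (assumption)
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            total = 0.0
            for j in range(3):              # nine multiplications and nine additions
                for i in range(3):
                    total += mask[j][i] * image[y + j - 1][x + i - 1]
            output[y][x] = min(max(int(round(total)), 0), 255)
    return output

image = [[10, 10, 10],
         [10, 50, 10],
         [10, 10, 10]]

# Illustrative mask whose weights sum to 1: the center pixel counts for half
weighted = [[0.0625, 0.0625, 0.0625],
            [0.0625, 0.5,    0.0625],
            [0.0625, 0.0625, 0.0625]]
result = convolve3x3(image, weighted)
print(result[1][1])  # 30  (0.5 * 50 + 0.0625 * 80)
```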

The high pass filter accentuates high-frequency spatial components while leaving low-frequency components untouched. A common high pass mask is composed of a 9 in the center location with -1s in the surrounding locations:

-1 -1 -1
-1 9 -1
-1 -1 -1

Note that the coefficients add to 1, and that smaller coefficients surround the large positive center coefficient. The overall visual effect is one of sharpening detail in the image. The low pass filter has basically the opposite effect of the high pass filter. It attenuates the high-frequency spatial components and passes the low-frequency components of the image. A common low pass convolution mask is composed of nine coefficients that each have the value 1/9:

1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9

Notice that the sum of all the coefficients is equal to 1 and they are all positive numbers. This is true of all low pass filter masks. The overall visual effect is one of blurring the image.
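
The behavior of the two masks can be checked on a single 3 x 3 neighborhood. In the Python sketch below, the helper name weighted_sum and the sample pixel values are assumptions for illustration; exact fractions are used so that the 1/9 coefficients introduce no rounding error.

```python
from fractions import Fraction

HIGH_PASS = [[-1, -1, -1],
             [-1,  9, -1],
             [-1, -1, -1]]
LOW_PASS = [[Fraction(1, 9)] * 3 for _ in range(3)]

def weighted_sum(neighborhood, mask):
    """Multiply each pixel by its mask coefficient and sum the nine products."""
    return sum(mask[j][i] * neighborhood[j][i] for j in range(3) for i in range(3))

flat = [[100] * 3 for _ in range(3)]   # uniform region: low spatial frequency
spike = [[100, 100, 100],
         [100, 120, 100],
         [100, 100, 100]]              # one brighter pixel: high spatial frequency

print(weighted_sum(flat, HIGH_PASS))         # 100  (flat areas pass unchanged)
print(weighted_sum(flat, LOW_PASS))          # 100
print(weighted_sum(spike, HIGH_PASS))        # 280  (difference exaggerated; clips to 255 on display)
print(float(weighted_sum(spike, LOW_PASS)))  # about 102.2 (difference smoothed away)
```

Both masks sum to 1, so a uniform region keeps its brightness under either filter; only the response to local detail differs.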

Enhancement of edges in an image is a common operation in image processing. The operation typically reduces an image to display only its edge information. The gradient operation forms a directional edge enhancement. By using a 3 x 3 kernel, eight gradient images may be generated from an original. Each highlights edges oriented in one of the eight compass directions: N, NE, E, SE, S, SW, W, and NW. The mask oriented for the East direction is given as:

-1 1 1
-1 -2 1
-1 1 1

The Laplacian operation is an omnidirectional edge enhancement, highlighting all edges regardless of their orientation. The common Laplacian mask is composed of an 8 in the center location with -1s in the surrounding locations:

-1 -1 -1
-1 8 -1
-1 -1 -1
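
The edge masks above can be tried on small neighborhoods in Python. As before, this is only a sketch: the helper name weighted_sum and the sample pixel values are assumptions. Because the coefficients of each mask sum to zero, uniform regions produce zero output and only edges remain.

```python
EAST_GRADIENT = [[-1,  1, 1],
                 [-1, -2, 1],
                 [-1,  1, 1]]
LAPLACIAN = [[-1, -1, -1],
             [-1,  8, -1],
             [-1, -1, -1]]

def weighted_sum(neighborhood, mask):
    """Multiply each pixel by its mask coefficient and sum the nine products."""
    return sum(mask[j][i] * neighborhood[j][i] for j in range(3) for i in range(3))

flat = [[100] * 3 for _ in range(3)]           # no edge
bright_to_east = [[10, 10, 200]] * 3           # brightness increases toward the east
bright_to_west = [[200, 10, 10]] * 3           # same edge, opposite orientation

print(weighted_sum(flat, EAST_GRADIENT))            # 0
print(weighted_sum(bright_to_east, EAST_GRADIENT))  # 570  (strong response)
print(weighted_sum(bright_to_west, EAST_GRADIENT))  # -570 (negative; shows as black once clamped to 0)
print(weighted_sum(flat, LAPLACIAN))                # 0
print(weighted_sum(bright_to_east, LAPLACIAN))      # -570 (nonzero at an edge)
```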

In the human visual system, the eye-brain system applies a Laplacian-like enhancement to everything we view. Another type of filter that uses a 3 x 3 kernel, but is not based on a weighted average of the nine pixels in the neighborhood, is the median filter. Instead, the output of the median filter is the median value of the nine pixels in the kernel. The nine gray values from the 3 x 3 kernel are placed in ascending numerical order. The median is the middle value of this ordered list, that is, the value for which four of the remaining values are less than or equal to it and four are greater than or equal to it. This middle value becomes the output pixel. As an example of a 3 x 3 median filter operation, consider the following 3 x 3 group of pixels:

8 18 4
12 52 18
12 8 18

The ascending order would be: 4, 8, 8, 12, 12, 18, 18, 18, 52. The median value, the fifth of the nine, would be 12. Therefore the original 52 in the input image would be replaced by 12 in the output image. This type of filter is excellent at removing random noise that appears as isolated bright or dark pixels.
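
The worked example above can be reproduced with a few lines of Python. The function name median3x3 is an assumption introduced for this sketch.

```python
def median3x3(neighborhood):
    """Return the median (fifth of nine sorted values) of a 3 x 3 neighborhood."""
    values = sorted(v for row in neighborhood for v in row)
    return values[4]

pixels = [[8, 18, 4],
          [12, 52, 18],
          [12, 8, 18]]

print(sorted(v for row in pixels for v in row))  # [4, 8, 8, 12, 12, 18, 18, 18, 52]
print(median3x3(pixels))                         # 12
```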

Another image enhancement used to remove structured noise is image subtraction. Digital image subtraction is a methodology for reducing the structured noise of normal anatomic detail, thereby increasing the signal-to-noise ratio. By increasing the signal-to-noise ratio, the pathology should be made more evident to the human observer. Figure 10 illustrates the basic principles of digital image subtraction. Digital image subtraction has been applied to almost every disease process which affects dental hard tissues. The dramatic improvement in diagnostic performance has been clearly demonstrated by numerous researchers.

A prerequisite for digital subtraction radiography is that the projections are identical or almost identical at the different examinations. Other prerequisites for subtraction are the ability to properly align the two images, which is referred to as registration, and the ability to correct for variations associated with exposure and processing that would obscure the changes in radiographic density associated with the pathology. These prerequisites have limited the direct clinical application of the technique. Recent developments in reproducible radiographic techniques and gamma correction algorithms may provide the answers to these limitations. The application of longitudinal radiographic assessment techniques such as subtraction radiography may hold the greatest promise for improving diagnostic accuracy.
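
The subtraction step itself is simple once the prerequisites are met. The Python sketch below assumes the two images are already registered and corrected for exposure differences; the function name, the sample pixel values, and the choice to add an offset of 128 so that unchanged areas appear mid-gray are all assumptions for illustration.

```python
def subtract_images(baseline, followup, offset=128):
    """Pixel-by-pixel difference (followup - baseline), shifted by `offset` and clamped to 0-255."""
    return [
        [min(max(f - b + offset, 0), 255) for b, f in zip(base_row, follow_row)]
        for base_row, follow_row in zip(baseline, followup)
    ]

baseline = [[100, 150],
            [200, 50]]
followup = [[100, 150],
            [170, 50]]   # one pixel has become darker between examinations

print(subtract_images(baseline, followup))  # [[128, 128], [98, 128]]
```

The unchanged anatomy cancels to a uniform gray, while the single pixel that changed stands out against it.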

The human visual system is much more attuned to the recognition of color than to gray scale image features. While we can detect no more than 28 to 32 individual shades of gray, we are capable of detecting thousands of shades of different colors. Because of this characteristic of the human visual system, gray scale images can be represented by pseudo-colors. That is to say, each gray value from 0 to 255 can be arbitrarily assigned a specific color. This operation can be performed using a color look-up table or CLUT, similar to a gray scale LUT. The major limitation to the use of pseudo-color is that we have no reference as to what the colors mean in these images. A standard color scale must be defined in order to take advantage of our color visual system.
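
A color look-up table can be sketched in Python in the same way as a gray scale LUT. The blue-to-red scale used here is purely hypothetical, chosen for simplicity; as noted above, no standard color scale exists, and the function names are assumptions.

```python
def build_clut():
    """Hypothetical CLUT: one (red, green, blue) triple per gray value, blue for 0 to red for 255."""
    return [(gray, 0, 255 - gray) for gray in range(256)]

def apply_clut(image, clut):
    """Replace each gray value with its assigned color."""
    return [[clut[value] for value in row] for row in image]

clut = build_clut()
image = [[0, 128, 255]]
print(apply_clut(image, clut))  # [[(0, 0, 255), (128, 0, 127), (255, 0, 0)]]
```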

Image encoding comprises those operations which serve to reduce the amount of information necessary to describe an image. Coding images into reduced forms allows either more image data to be transmitted in a given period of time or more to be stored in a given segment of a storage device. Two basic types of encoding schemes exist: those in which no information is lost, called lossless algorithms, and those in which information is lost, called lossy algorithms. Using lossless algorithms, the image reconstructed from the coded image is identical to the original. Two examples of lossless techniques are Huffman coding and run-length coding. These provide approximately a 2:1 compression ratio. Lossy techniques do not produce an image identical to the original; some degradation occurs. The degree to which this affects the use of the image is entirely task dependent. Some examples of lossy techniques are discrete cosine transform (DCT), vector quantization, and fractal coding. These may provide compression ratios ranging from 3:1 to as high as 250:1. It is clear that standards of "visually lossless" encoding must be developed to determine which encoding techniques are appropriate for dental images. Much research is needed in this area considering the large amounts of data that will be required to store these images.
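
Run-length coding, one of the lossless techniques mentioned above, can be illustrated with a short Python sketch. The function names and the sample row of pixels are assumptions for illustration; practical coders add details such as limits on run length that are omitted here.

```python
def rle_encode(row):
    """Encode a row of pixels as (gray value, run length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1            # extend the current run
        else:
            runs.append([value, 1])     # start a new run
    return [tuple(run) for run in runs]

def rle_decode(runs):
    """Rebuild the original row from (gray value, run length) pairs."""
    row = []
    for value, count in runs:
        row.extend([value] * count)
    return row

row = [0, 0, 0, 0, 255, 255, 0, 0]
encoded = rle_encode(row)
print(encoded)                     # [(0, 4), (255, 2), (0, 2)]
print(rle_decode(encoded) == row)  # True: the reconstruction is identical
```

The decoded row matches the original exactly, which is what makes the scheme lossless; how much space is saved depends on how long the runs of identical pixels are.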

With the development of new digital imaging modalities, the future of maxillofacial imaging has never looked brighter. New possibilities for research and clinical diagnosis are just beginning to be realized. Digital imaging in dentistry may have the most profound effect on the diagnosis and treatment of dental diseases since the discovery of the roentgen ray itself.


ddsweb@uthscsa.edu
Dental Diagnostic Science
Last revised August 1, 1995
Copyright UTHSCSA 1995 All rights reserved