Attrasoft ImageFinder

7.   BioFilters
7.1   Test-Driving the Label Matching Problem, Unsupervised N:N Matching
7.2   BioFilter Overview
7.3   Unsupervised 1:N Matching
7.4   Training
7.5   Supervised N:N Matching
7.6   Supervised 1:N Matching
7.7   Checking the Results
7.8   File Input
7.9   Analysis
7.10   Summary

7.   BioFilters
Before the ABM Engine compares images directly (Input Space Matching), each image goes through a pre-matching step called Feature Space Matching. The purpose of this Feature Recognition step is to eliminate images that do not match. The Feature Recognition sub-layer has two filters:
BioFilter; and
NeuralFilter.
Additional Feature Space Filters (for example, the filter used in the software Attrasoft DecisionMaker) can be ordered in a customized version.

Figure 7.1   Selecting BioFilter.
This chapter will demonstrate how the BioFilter works using a Label-Matching example. The operation of the NeuralFilter is identical to that of the BioFilter. This chapter also introduces the input mode, learning mode, and matching mode, and shows how these settings work.
Input Mode.
The images are entered into the ImageFinder in two ways:

Search-Directory

Search-File

Where:

Search-Directory is a folder containing the images to be searched;

Search-File is a file listing the images to be searched.

Learning Mode.
Image Recognition has two learning modes:

Supervised Learning;

Unsupervised Learning.

Supervised Learning requires training, which teaches the ImageFinder what to look for. Unsupervised Learning does not require training.
Matching Mode.
The matching can be 1:N or N:N.

1:N Matching compares one key image with the images in a search-directory or search-file; the key image is specified in the “Key Segment” textbox or selected by the “Key Segment” button.

N:N Matching compares each image specified in the search-directory or search-file with every image in the search-directory or search-file. N:N Matching is further divided into N:N Matching and “N : (N-1)/2” Matching.

Let a and b be two images; N:N Matching is {aa, ab, ba, bb} and “N : (N-1)/2” Matching is {ab}. N:N Matching has N * N comparisons, and “N : (N-1)/2” Matching has N * (N-1)/2 comparisons.
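The two comparison sets can be enumerated in a few lines. The following is only an illustrative Python sketch, not part of the ImageFinder; the function names nn_pairs and half_pairs are hypothetical.

```python
from itertools import combinations, product


def nn_pairs(images):
    """All ordered pairs, including each image with itself: N * N comparisons."""
    return list(product(images, repeat=2))


def half_pairs(images):
    """Each unordered pair of distinct images once: N * (N - 1) / 2 comparisons."""
    return list(combinations(images, 2))


images = ["a", "b"]
print(nn_pairs(images))    # [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]
print(half_pairs(images))  # [('a', 'b')]

n = 304  # number of images in the Label Recognition example
print(n * n, n * (n - 1) // 2)  # 92416 46056
```

For the 304 images of the Label Recognition example, this gives 92,416 comparisons for N:N Matching and 46,056 for “N : (N-1)/2” Matching.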
Throughout this chapter, we will use a Label Recognition example. There are 304 images in this example, forming 152 matching pairs. They are located in the directory “.\biofilterex1”, where “.\” is the ImageFinder software location.
7.1     Test-Driving the Label Matching Problem, Unsupervised N:N Matching
In this section, we will help you get some experience in using the software. We will use:

Directory Input

Unsupervised Learning

N : N Matching

We will go through the following steps in this section:

Initialization

Converting Images to Records

Template Matching

Results

Initialization sets the ImageFinder parameters. The ImageFinder then maps an image into a record in a Feature Space. Finally, we will do a matching.

Figure 7.2   BioFilter Menu.

Figure 7.3   BioFilter/Drop Down List.
There are 304 images in this example; they are located at the directory “.\biofilterex1”, where “.\” is the ImageFinder software location. The N:N match will compare each of the 304 images with all 304 images.

Figure 7.4 An image in the “.\biofilterex1” folder.

Figure 7.5 BioFilter Parameter.
Initialization

The first drop down list is “Edge Filter”; select “Sobel 1” (first choice).

The second drop down list is “Threshold Filter”; select “Dark Background 128” (first choice).

The third drop down list is “CleanUp Filter”; select “Small” (first choice).

The fourth drop down list is “Reduction Filter”; use the default setting.

The fifth drop down list is “BioFilter”; use the “CL” setting (tenth choice). (Figure 7.3)

To set the BioFilter parameter, click the “Parameter” button next to the BioFilter and set the “BioFilter Scale” to 50. (Figure 7.5)

Converting Images to Records

Click the “Search Dir” button to specify the search-directory, which contains the search images. After the “Open Dialog” dialog box comes up, go to “biofilterex1” and select any file in the folder. This specifies the input directory.

Click menu item “BioFilter/Scan Images - Directory Input” to convert images to records. You should see the ImageFinder scan through the images at this point.

Template Matching

Click menu item “BioFilter/BioFilter N:N Matching (Untrained)” to match.

Results
The result is written to a file, b1.txt, which will be opened at this time:
C:\…\L01008gi_r90.jpg
C:\…\L01008gi_r90.jpg
C:\…\L01008gi-020501.jpg
C:\…\L01008KEY_m.jpg
C:\…\L01008KEY_m.jpg
…

The result file contains many blocks. The number of blocks is the same as the number of images in the search-directory, i.e. each image has a block. Line 1 in each block is the input and the remaining lines are the output; i.e. the first line is the image matched against all images in the search-directory, and the remaining lines are the matched images. For example, “C:\…\L01008gi_r90.jpg” is matched against all 304 images in the search-directory, and there are two matches, listed in the next two lines. We will continue this example in Section 7.3, Unsupervised 1:N Matching.
Unsupervised Learning is not as accurate as Supervised Learning, which will be introduced later.
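If you want to process b1.txt outside the ImageFinder, the block structure described above can be read with a short script. This is a minimal Python sketch under one assumption that this chapter does not state: blocks are separated by blank lines. The function name parse_result_blocks and the sample file names are hypothetical.

```python
def parse_result_blocks(text):
    """Split result text into (input_image, [matched_images]) blocks.

    Assumption: blocks are separated by blank lines. A line starting with
    'Total Number of Matches' is returned separately as an integer.
    """
    blocks = []
    total = None
    current = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Total Number of Matches"):
            total = int(line.split("=")[1])
            continue
        if not line:
            # A blank line closes the current block, if any.
            if current:
                blocks.append((current[0], current[1:]))
                current = []
            continue
        current.append(line)
    if current:
        blocks.append((current[0], current[1:]))
    return blocks, total


# Hypothetical two-image result, not taken from the example data.
sample = "a.jpg\na.jpg\nb.jpg\n\nb.jpg\nb.jpg\na.jpg\n\nTotal Number of Matches = 4\n"
blocks, total = parse_result_blocks(sample)
print(blocks)  # [('a.jpg', ['a.jpg', 'b.jpg']), ('b.jpg', ['b.jpg', 'a.jpg'])]
print(total)   # 4
```

The first entry of each block is the input image and the list holds its matches, mirroring the description of b1.txt.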
7.2   BioFilter Overview
The image matching will be done in several steps:

Initialization

Converting Images to Records

Training

Template Matching

Let us look at each phase.
Initialization
Initialization sets the ImageFinder parameters.

Converting Images to Records:

An image is mapped into a record in a Feature Space. This step is slow (several images per second); however:

This step needs to be done only once; and

This step is linear, i.e. the time is directly proportional to the number of images. Therefore, this step does not have much impact on the operating speed.

The result will be stored in “.\a1.txt”, where “.\” is the ImageFinder directory.

Training
Training uses the data collected in advance to teach the BioFilter how to match. Training requires two files, a1.txt and match.txt:

A1.txt is the record file, which contains many records. Each image is converted into a record. A1.txt is produced automatically in the previous step. A record represents features of an image in a feature space.

Match.txt is a list of matching pairs. This file teaches the ImageFinder which images match each other. You must prepare this file. We will discuss the format of match.txt later.

Template Matching
The matching speed will be between 100,000 and 1,000,000 comparisons per second.
Both the BioFilter and the NeuralFilter perform Template Matching. There are several matching commands, and all of them will be used in this chapter.
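To get a feel for what this speed range means, the time for an N:N match can be estimated from the number of comparisons. The sketch below is plain arithmetic in Python; the function name matching_time_range is hypothetical, and the estimate covers the matching step only, not the conversion of images to records.

```python
def matching_time_range(num_images, slow_rate=100_000, fast_rate=1_000_000):
    """Return (best, worst) time in seconds for an N:N match.

    The rates are the comparisons-per-second bounds quoted above.
    """
    comparisons = num_images * num_images
    return comparisons / fast_rate, comparisons / slow_rate


best, worst = matching_time_range(304)
print(f"{best:.3f} s to {worst:.3f} s")  # 0.092 s to 0.924 s
```

For the 304 images of the Label Recognition example, the 92,416 comparisons should therefore take under one second.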

7.3   Unsupervised 1:N Matching
1:N Matching compares one key image with the images in a search-directory or search-file; the key image is specified in the “Key Segment” textbox or selected by the “Key Segment” button.
To continue the Label Recognition problem for Unsupervised 1:N Matching:

Click the “Key Segment” button, go to the “biofilterex1” folder, and select the first image, “L01008gi-020501.jpg”;

Click “BioFilter/BioFilter 1:N Match (Untrained)”.

The result is written to the file b1.txt, which will be opened at this point:
C:\…\L01008gi-020501.jpg
C:\…\L01008gi_r90.jpg
C:\…\L01008gi-020501.jpg
Total Number of Matches = 2
This result is correct. We will continue this example in the next section, Training.
7.4   Training
Training teaches the ImageFinder what to look for. Supervised Learning requires training; Unsupervised Learning does not.
Each filter is trained differently. For the BioFilter, training requires two files, a1.txt and match.txt:

A1.txt is the record file, which contains many records. Each image is converted into a record. A record represents features of an image in a feature space.

Match.txt is a list of matching pairs. This file teaches the ImageFinder which images match each other.

The names of these two training files are fixed; you cannot change them. You obtain a1.txt automatically when you convert images to records. You have to prepare match.txt for each problem.
The match.txt file looks like this:
152
1   L01008gi_r90      L01008gi-020501
2   L01008KEY_m   L01008key-082301_m
3   L010103C           L010103C-081502_m
4   L01010co_m       L01010CODE_m
5   L010163C_m      L010163C-083100_m
…
Line 1 is the number of matches in this file. Each following line lists one matching pair; for example, the first pair indicates that image L01008gi_r90 matches image L01008gi-020501. Each line has the following format:
Number, tab, filename1, tab, filename2, tab.
Note:
(1) You must have a tab at the end of each line;
(2) The file names do not contain “.jpg”.
There are two common errors:
(1) The last Tab is missing;
(2) The number of rows is less than the first number in the file.
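Because both common errors are easy to make by hand, you may prefer to generate and check match.txt with a script. The following Python sketch follows the format described above; the function names format_match_file and find_match_file_errors are hypothetical and are not part of the ImageFinder.

```python
def format_match_file(pairs):
    """Return match.txt content: a count line, then 'n<TAB>name1<TAB>name2<TAB>'."""
    lines = [str(len(pairs))]
    for number, (first, second) in enumerate(pairs, start=1):
        # File names carry no ".jpg"; every row ends with a tab.
        lines.append(f"{number}\t{first}\t{second}\t")
    return "\n".join(lines) + "\n"


def find_match_file_errors(text):
    """Report the two common errors: too few rows, or a missing trailing tab."""
    errors = []
    lines = text.splitlines()
    declared = int(lines[0].strip())
    rows = [line for line in lines[1:] if line.strip()]
    if len(rows) < declared:
        errors.append(f"expected {declared} rows, found {len(rows)}")
    for index, row in enumerate(rows, start=1):
        if not row.endswith("\t"):
            errors.append(f"row {index}: missing trailing tab")
    return errors


text = format_match_file([
    ("L01008gi_r90", "L01008gi-020501"),
    ("L01008KEY_m", "L01008key-082301_m"),
])
print(find_match_file_errors(text))                       # []
print(find_match_file_errors("3\n1\ta\tb\n2\tc\td\t\n"))
```

The second call reports both problems: the file declares 3 rows but has 2, and row 1 lacks its trailing tab.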
Once you have the two files prepared, click “BioFilter/Train (match.txt required)” to train the BioFilter. (Figure 7.2) After training, you can use these commands:
“BioFilter/BioFilter 1:N Match (Trained)”
“BioFilter/BioFilter N:N Match (Trained)”
To continue the Label Recognition example, we must prepare the match.txt file now. A prepared version is provided; we will simply open it and save it as match.txt. The steps are:

Go to the ImageFinder folder (the default folder is “C:\program files\Attrasoft\ImageFinder 6.0\”) and open the file biofilterex1_match.txt. This file lists 152 matching pairs. Save it as match.txt (overwrite the existing file). Now the training file is prepared.

Click “BioFilter/Train” to train the BioFilter.

Now the ImageFinder is trained for the Label Recognition problem. We will continue this example in the next section, Supervised N:N Matching.
7.5   Supervised N:N Matching
N:N Matching compares each image specified in the search-directory or search-file with every image in the search-directory or search-file. N:N Matching is further divided into N:N Matching and “N : (N-1)/2” Matching.
Let a and b be two images; N:N Matching is {aa, ab, ba, bb} and “N : (N-1)/2” Matching is {ab}. N:N Matching has N * N comparisons, and “N : (N-1)/2” Matching has N * (N-1)/2 comparisons. The purpose of “N : (N-1)/2” Matching is to reduce the number of comparisons.
Returning to our example, click “BioFilter/BioFilter N:N Match (Trained)” to make an N:N Match. The result goes to a file, b1.txt, which is opened right after the click. The file will look like this:
C:\…\L01008gi_r90.jpg
C:\…\L01008gi_r90.jpg
C:\…\L01008gi-020501.jpg
C:\…\L01008KEY_m.jpg
C:\…\L01008KEY_m.jpg
C:\…\L01008key-082301_m.jpg
C:\…\L010103C.jpg
C:\…\L010103C.jpg
C:\…\L010103C-081502_m.jpg
Again, line 1 in each block is the input and the rest of the lines are output. Go all the way to the end of the file; the last line is:
Total Number of Matches = 850
There are 152 pairs, or 304 images. Each image will match itself and its partner in the pair, giving a total of 608 matches. As we will see in Section 7.7, all 608 matches are identified. There is a small amount of false acceptance: 850 – 608 = 242 matches. We will analyze the results further in Section 7.9, Analysis.
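The counts in this paragraph follow directly from the number of pairs. As a quick check, here is the arithmetic as a small Python sketch; the variable names are illustrative only.

```python
pairs = 152                       # matching pairs in the Label Recognition example
images = 2 * pairs                # two images per pair
expected_matches = 2 * images     # each image matches itself and its partner
retrieved_matches = 850           # "Total Number of Matches" at the end of b1.txt
false_acceptances = retrieved_matches - expected_matches

print(images, expected_matches, false_acceptances)  # 304 608 242
```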
7.6   Supervised 1:N Matching
Again, 1:N Matching compares one key image with the images in a search-directory or search-file; the key image is specified in the “Key Segment” textbox or selected by the “Key Segment” button.
To continue the Label Recognition problem for Supervised 1:N Matching:

Click the “Key Segment” button, go to the “biofilterex1” folder, and select the first image, “L01008gi-020501.jpg”;

Click “BioFilter/BioFilter 1:N Match (Trained)”.

The results are written to the file b1.txt, which will be opened at this point:
C:\…\L01008gi-020501.jpg
C:\…\L01008gi_r90.jpg
C:\…\L01008gi-020501.jpg
Total Number of Matches = 2
These results are correct. We will continue this example in the next section, Checking the Results.

7.7   Checking the Results
If this is a test run (i.e., you know the correct answers), you can check the matching results in seconds. You must prepare a file that lists the matching pairs. To test the results in b1.txt, you must prepare the file b1_matchlist.txt.
An example of b1_matchlist.txt is:
608
1    L01008gi_r90       L01008gi-020501
2    L01008KEY_m    L01008key-082301_m
3    L010103C            L010103C-081502_m
4    L01010co_m        L01010CODE_m
5    L010163C_m       L010163C-083100_m
…

Line 1 is the number of matches in this file. The format is exactly the same as match.txt.

Number, tab, filename, tab, filename, tab.

Note:
You must have a tab at the end of each line;
The file names do not contain “.jpg”.

There are two common errors:

The last Tab is missing;

The number of rows is less than the first number in the file.

To continue the Label Recognition example, we must prepare the b1_matchlist.txt file now. A prepared version is provided; we will simply open it and save it as b1_matchlist.txt. The steps are:

Go to the ImageFinder folder (the default folder is “C:\program files\Attrasoft\ImageFinder 6.0\”) and open the file biofilterex1_matchlist.txt. This file lists 608 matching pairs. Save it as b1_matchlist.txt (overwrite the existing file). Now this file is prepared.

Now generate the result file, b1.txt, for the N:N Matching by clicking the “BioFilter/BioFilter N:N Match (Trained)” menu item.

Click “BioFilter/Check (b1_matchlist.txt required)” to check the results of the BioFilter.

You will see something like the following in the text window:
Checking Template Matching Results!
Get b1.txt...
Character = 68045
Lines = 1459
Blocks = 305
Get b1_matchlist.txt...

Check...
Total Matches = 608
The message indicates that b1.txt has 305 blocks: the 304 image blocks plus the last line, which gives the total number of matches retrieved. The message “Total Matches = 608” indicates that 608 matches in b1.txt agree with those in b1_matchlist.txt.
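This chapter does not describe how the Check command works internally. One plausible reading is that it counts the retrieved matches that also appear in the match list. The Python sketch below illustrates that idea only; the function name count_confirmed_matches, the data layout, and the treatment of pairs as ordered (input, match) tuples are all assumptions.

```python
def count_confirmed_matches(blocks, expected_pairs):
    """Count retrieved (input, match) pairs that also appear in expected_pairs.

    blocks: list of (input_image, [matched_images]) as found in b1.txt.
    expected_pairs: iterable of (input_image, matched_image) tuples.
    """
    retrieved = {(key, match) for key, matches in blocks for match in matches}
    return len(retrieved & set(expected_pairs))


# Hypothetical data: images a and b form a pair, c has no partner.
blocks = [("a", ["a", "b", "c"]), ("b", ["b", "a"]), ("c", ["c"])]
expected = [("a", "a"), ("a", "b"), ("b", "a"), ("b", "b"), ("c", "c")]
print(count_confirmed_matches(blocks, expected))  # 5
```

In this small example six matches are retrieved and five of them are confirmed; the extra match (a, c) is a false acceptance.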
7.8   File Input
The Directory Input has two limits:
(1) The images are limited to the Input Directory only;
(2) The number of images is limited to 1,000.
File Input has no such limits; however, you must prepare the input file. The input file lists one image per line, and each line specifies an absolute path. For example:
C:\xyz1\0001.jpg
C:\xyz1\0002.jpg
C:\xyz2\0003.jpg
C:\xyz2\0004.jpg
…
The only difference between the Directory Input and the File Input is how images are entered into the ImageFinder; after that, all other steps (Initialization, Training, Matching) are the same.
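An input file with many images is tedious to type by hand. The following Python sketch shows one way to generate such a file and to verify that every listed image exists; the function names build_input_file and missing_images are hypothetical, and the sketch assumes the images are “.jpg” files.

```python
from pathlib import Path


def build_input_file(directories, output_path, pattern="*.jpg"):
    """Write one absolute image path per line; return the list of paths."""
    paths = []
    for directory in directories:
        # resolve() makes the directory absolute, so every listed path is absolute.
        paths.extend(sorted(Path(directory).resolve().glob(pattern)))
    Path(output_path).write_text("".join(f"{path}\n" for path in paths))
    return paths


def missing_images(input_file):
    """Return the lines of the input file that do not point to an existing file."""
    lines = Path(input_file).read_text().splitlines()
    return [line for line in lines if line.strip() and not Path(line.strip()).is_file()]
```

For example, build_input_file([r"C:\xyz1", r"C:\xyz2"], "input.txt") would list the images of both folders in one file, which Directory Input cannot do.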
To continue the Label Recognition example, we need an input file. One is provided for the Label Recognition problem: biofilterex1_input.txt. Click the “File Input” button and select the file biofilterex1_input.txt. This file lists 304 images, and the software will go through each one of them to make sure that each image in the file exists. This will take a few seconds, and at the end of this process you should see:
Number of blocks = 1
Block 0. Length = 304
C:\…\L01008gi_r90.jpg
C:\…\L01008KEY_m.jpg
C:\…\L010103C.jpg
To see the original content of this file, click the “ShowFile” button. To clear the text window, click the “Clear” button.
To convert the images into records, click menu item “BioFilter/Scan Images - File Input”. You should see the ImageFinder scan through the images at this point.

7.9   Analysis
Possible Matches
Let N be the total number of images in the input file; the number of Possible Matches is then N * N. In our example, N * N = 304 * 304 = 92,416.
Attrasoft Matches
The number of retrieved matches is listed in the last line of b1.txt. At the end of b1.txt, you will see something like this:

Total Number of Matches = 850

Actual Match
This number depends on your problem. It should be the first number in b1_matchlist.txt. In our example, it is 608.

Attrasoft Found Duplicates
Click the “BioFilter/Check (b1_matchlist.txt required)” menu item to get this number, as discussed in Section 7.7. Now that you have all of the numbers, you can make an analysis.
Positive Identification Rate
Positive Identification Rate = the number reported by the “BioFilter/Check (b1_matchlist.txt required)” menu item divided by the first number in the file b1_matchlist.txt.
In our example, the Positive Identification Rate is 608/608 = 100%.
Elimination Rate
The Absolute Elimination Rate is 1 minus the number at the end of b1.txt divided by the number of possible matches. The Elimination Rate normalizes this number so that it is 100% when all mismatches are eliminated. In our example:
Absolute Elimination Rate = 1 – 850/92,416 = 99.08%.
Maximum Absolute Elimination Rate = 1 – 608/92,416 = 99.34%.
Elimination Rate = (1 – 850/92,416) / (1 – 608/92,416) = 99.74%.
Hit Ratio
The Hit Ratio is the number indicated by “BioFilter/Check (b1_matchlist.txt required)” menu item divided by the number at the end of b1.txt. In our example, this number is: 608/850 = 71.53%.

Composite Index
Finally, the overall identification quality is measured by the Composite Index, which is the product Positive Identification Rate * Elimination Rate * Hit Ratio. In our example, this number is 100% * 99.74% * 71.53% = 71.34%.
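All of the rates in this section can be computed from four numbers: the number of images, the matches retrieved, the actual matches, and the matches confirmed by the Check command. The Python sketch below simply restates the formulas above; the function name analyze and the dictionary keys are hypothetical.

```python
def analyze(num_images, retrieved, actual, confirmed):
    """Compute the rates of this section from the four basic counts."""
    possible = num_images * num_images
    positive_identification = confirmed / actual
    absolute_elimination = 1 - retrieved / possible
    maximum_elimination = 1 - actual / possible
    elimination = absolute_elimination / maximum_elimination
    hit_ratio = confirmed / retrieved
    return {
        "possible_matches": possible,
        "positive_identification_rate": positive_identification,
        "absolute_elimination_rate": absolute_elimination,
        "elimination_rate": elimination,
        "hit_ratio": hit_ratio,
        "composite_index": positive_identification * elimination * hit_ratio,
    }


result = analyze(num_images=304, retrieved=850, actual=608, confirmed=608)
for name, value in result.items():
    if name == "possible_matches":
        print(f"{name}: {value}")
    else:
        print(f"{name}: {value:.2%}")
```

With the numbers of the Label Recognition example, this reproduces 100.00%, 99.08%, 99.74%, 71.53%, and 71.34%.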

7.10   Summary
Summary of the steps for N:N matching:
I. Preparations
Data
Data is stored at “.\biofilterex1”. There are 304 images.
Training File
1. The training file must be named “match.txt”;
2. Find the file “.\biofilterex1_match.txt” and save it as match.txt.
Checking File
1. The checking file must be named “b1_matchlist.txt”;
2. Find the file “.\biofilterex1_matchlist.txt” and save it as b1_matchlist.txt.
II. Operation

Start the ImageFinder;

Edge Filter: select Sobel 1;

Threshold Filter: select Dark Background 128;

CleanUp Filter: select Small;

Reduction Filter: do nothing;

BioFilter: select CL;

BioFilter Parameter: set “BioFilter Scales:” to 50;

Entering Data: click “Search Dir” and select any file in the “biofilterex1” directory;

Records: click menu item “BioFilter/Scan Images - Directory Input”;

Training: click “BioFilter/Train (match.txt required)”;

N:N Matching: click “BioFilter/BioFilter N:N Match (Trained)”;

Attrasoft Matches: go to the last line of b1.txt and see: “Total Number of Matches = 850”;

Checking: click “BioFilter/Check (b1_matchlist.txt required)” and see “Total Matches = 608”;

Results: out of 304 * 304 = 92,416 possible comparisons, there are 608 actual matches. The ImageFinder retrieved 850 matches, including all 608 actual matches.

Congratulations on your very first Image Recognition example with the ImageFinder!
