7.   BioFilters
7.1   Test-Driving the Label Matching Problem, Unsupervised N:N Matching
7.2   BioFilter Overview
7.3   Unsupervised 1:N Matching
7.4   Training
7.5   Supervised N:N Matching
7.6   Supervised 1:N Matching
7.7   Checking the Results
7.8   File Input
7.9   Analysis
7.10   Summary



7.   BioFilters

Before the ABM Engine matches an image directly (Input Space Matching), the image goes through a pre-matching step called Feature Space Matching. The purpose of this Feature Recognition step is to eliminate unmatched images. The Feature Recognition sub-layer has two filters:

BioFilter; and
NeuralFilter.
Additional Feature Space Filters (for example, the filter used in the Attrasoft DecisionMaker software) can be ordered in a customized version.

Figure 7.1   Selecting BioFilter.

This chapter demonstrates how the BioFilter works using a Label-Matching example. The NeuralFilter operates in the same way as the BioFilter. This chapter also introduces the input mode, learning mode, and matching mode settings and shows how they work.

Input Mode.

Images are entered into the ImageFinder in two ways: Directory Input and File Input. File Input is covered in section 7.8.


Learning Mode.

Image Recognition has two learning modes: Supervised Learning and Unsupervised Learning. Supervised Learning requires training, which teaches the ImageFinder what to look for. Unsupervised Learning does not require training.

Matching Mode.

Matching can be 1:N or N:N.
Let a and b be two images; N:N Matching is {aa, ab, ba, bb} and “N:(N-1)/2” Matching is {ab}. N:N Matching has N * N comparisons, and “N:(N-1)/2” Matching has N * (N-1)/2 comparisons.

Throughout this chapter, we will use a Label Recognition example. It contains 304 images, forming 152 matching pairs, located in the directory “.\biofilterex1”, where “.\” is the ImageFinder software location.

7.1     Test-Driving the Label Matching Problem, Unsupervised N:N Matching

In this section, you will get some hands-on experience with the software. We will use Directory Input, Unsupervised Learning, and N:N Matching.


We will go through the following steps in this section:

Initialization;
Converting Images to Records; and
Template Matching.

Initialization sets the ImageFinder parameters. The ImageFinder then maps each image into a record in a Feature Space. Finally, the records are matched against each other.
 

Figure 7.2   BioFilter Menu.

Figure 7.3   BioFilter/Drop Down List.

The 304 images in this example are located in the directory “.\biofilterex1”, where “.\” is the ImageFinder software location. The N:N Match compares each of the 304 images with all 304 images, including itself.

Figure 7.4  An image in the “.\biofilterex1” folder.

Figure 7.5  BioFilter Parameter.

Initialization


Converting Images to Records


Template Matching


Results

The result is written to the file b1.txt, which is opened at this point.

C:\…\L01008gi_r90.jpg
C:\…\L01008gi_r90.jpg
C:\…\L01008gi-020501.jpg

C:\…\L01008KEY_m.jpg
C:\…\L01008KEY_m.jpg


The result file contains many blocks, one for each image in the search-directory. Line 1 in each block is the input, i.e. the image that is matched against all images in the search-directory; the remaining lines are the output, i.e. the matched images. For example, “C:\…\L01008gi_r90.jpg” is matched against all 304 images in the search-directory, and there are two matches, listed in the next two lines. We will continue this example in section 7.3, Unsupervised 1:N Matching.
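The block layout described above can also be read programmatically. The following is a minimal Python sketch, not part of the ImageFinder. It assumes that blocks are separated by blank lines and that the file may end with a “Total Number of Matches = …” line, as in the listings in this chapter. The function name parse_results is hypothetical, and the sample text is shortened (directory prefixes omitted, total chosen to fit the two sample blocks).

```python
import re

# Matches the summary line that closes a result file.
TOTAL_RE = re.compile(r"^Total Number of Matches\s*=\s*(\d+)\s*$")

def parse_results(text):
    """Split result text into (blocks, total).

    blocks is a list of (key_image, matched_images) tuples.
    total is the number on the summary line, or None if there is none.
    """
    blocks, current, total = [], [], None
    for raw in text.splitlines():
        line = raw.strip()
        found = TOTAL_RE.match(line)
        if found:
            total = int(found.group(1))
        elif line:
            current.append(line)          # still inside a block
        elif current:
            blocks.append((current[0], current[1:]))  # blank line ends a block
            current = []
    if current:                           # block not followed by a blank line
        blocks.append((current[0], current[1:]))
    return blocks, total

# Shortened, illustrative sample in the layout shown above.
sample = """\
L01008gi_r90.jpg
L01008gi_r90.jpg
L01008gi-020501.jpg

L01008KEY_m.jpg
L01008KEY_m.jpg

Total Number of Matches = 3
"""

blocks, total = parse_results(sample)
for key, matches in blocks:
    print(key, "->", matches)
print("total:", total)
```

Keeping the key image separate from its matches makes it easy to count matches per image or to compare the sum of all matches with the reported total.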

Unsupervised Learning is not as accurate as Supervised Learning, which will be introduced later.

7.2   BioFilter Overview

The image matching will be done in several steps:
 


Let us look at each phase.

Initialization

Initialization sets the ImageFinder parameters.


Converting Images to Records:


Training

Training uses the data collected in advance to teach the BioFilter how to match. Training requires two files, a1.txt and match.txt:


Template Matching

The matching speed is between 100,000 and 1,000,000 comparisons per second.

Both the BioFilter and the NeuralFilter perform Template Matching. There are several matching commands, and all of them are used in this chapter.

7.3   Unsupervised 1:N Matching

1:N Matching compares one key image with the images in a search-directory or search-file; the key image is specified in the “Key Segment” textbox or selected by the “Key Segment” button.

To continue the Label Recognition problem for Unsupervised 1:N Matching:

The result is written to the file b1.txt, which is opened at this point:
C:\…\L01008gi-020501.jpg
C:\…\L01008gi_r90.jpg
C:\…\L01008gi-020501.jpg

Total Number of Matches = 2

This result is correct. We will continue this example in the next section, Training.

7.4   Training

Training teaches the ImageFinder what to look for. Supervised Learning requires training; Unsupervised Learning does not.

Each filter is trained differently. For the BioFilter, training requires two files, a1.txt and match.txt:

The names of these two training files are fixed and cannot be changed. The file a1.txt is created automatically when you convert images to records. You must prepare match.txt yourself for each problem.

The match.txt looks like this:

152
1   L01008gi_r90      L01008gi-020501
2   L01008KEY_m   L01008key-082301_m
3   L010103C           L010103C-081502_m
4   L01010co_m       L01010CODE_m
5   L010163C_m      L010163C-083100_m

Line 1 is the number of matches in this file (152 in this example; only the first five rows are shown above). The first row indicates that image L01008gi_r90 matches image L01008gi-020501. Each row has the following format:

Number, tab, filename1, tab, filename2, tab.

Note:
(1) You must have a tab at the end of each line;
(2) The file names do not contain “.jpg”.

There are two common errors:

(1) The last Tab is missing;
(2) The number of rows is less than the first number in the file.
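The format rules and common errors above can be checked before training. The sketch below is a hypothetical helper, not an ImageFinder feature: format_match_text writes rows in the “number, tab, filename1, tab, filename2, tab” layout, and check_match_text reports the problems listed above. Both function names are assumptions made for illustration.

```python
def format_match_text(pairs):
    """Build match.txt-style text from (filename1, filename2) pairs."""
    rows = [f"{i}\t{a}\t{b}\t" for i, (a, b) in enumerate(pairs, start=1)]
    # Line 1 holds the number of matches; every row ends with a tab.
    return "\n".join([str(len(rows))] + rows) + "\n"

def check_match_text(text):
    """Return a list of problems found in match.txt-style text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ["file is empty"]
    try:
        declared = int(lines[0].strip())
    except ValueError:
        return ["line 1 must be the number of matches"]
    rows = lines[1:]
    problems = []
    if len(rows) < declared:
        problems.append(f"only {len(rows)} rows, but line 1 says {declared}")
    for i, row in enumerate(rows, start=1):
        if not row.endswith("\t"):
            problems.append(f"row {i}: the last tab is missing")
        fields = row.rstrip("\t").split("\t")
        if len(fields) != 3:
            problems.append(f"row {i}: expected number, filename1, filename2")
            continue
        if any(name.lower().endswith(".jpg") for name in fields[1:]):
            problems.append(f"row {i}: file names must not contain .jpg")
    return problems

# The first two pairs from the example above.
pairs = [
    ("L01008gi_r90", "L01008gi-020501"),
    ("L01008KEY_m", "L01008key-082301_m"),
]
text = format_match_text(pairs)
print(check_match_text(text))  # []
```

Generating the file from a list of pairs avoids both common errors, because the trailing tab and the row count are produced automatically.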

Once the two files are prepared, click “BioFilter/Train (match.txt required)” (Figure 7.2) to train the BioFilter. After training, you can use these commands:

“BioFilter/BioFilter 1:N Match (Trained)”
“BioFilter/BioFilter N:N Match (Trained)”
To continue the Label Recognition example, we must now prepare the match.txt file. This file has already been prepared for you as “.\biofilterex1_match.txt”; simply open it and save it as match.txt, then train the BioFilter.


Now the ImageFinder is trained for the Label Recognition problem. We will continue this example in the next section, Supervised N:N Matching.

7.5   Supervised N:N Matching

N:N Matching compares each image in the search-directory or search-file with every image in the same search-directory or search-file. N:N Matching is further divided into N:N Matching and “N:(N-1)/2” Matching.

Let a and b be two images; N:N Matching is {aa, ab, ba, bb} and “N:(N-1)/2” Matching is {ab}. N:N Matching has N * N comparisons, and “N:(N-1)/2” Matching has N * (N-1)/2 comparisons. The purpose of “N:(N-1)/2” Matching is to reduce the number of comparisons.
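The two comparison counts can be illustrated with a short Python sketch. It only enumerates pairs and does not perform any image matching; the function names nn_pairs and half_pairs are made up for this illustration.

```python
from itertools import combinations, product

def nn_pairs(images):
    """All ordered pairs, including each image with itself: N * N pairs."""
    return list(product(images, repeat=2))

def half_pairs(images):
    """Each pair of distinct images once: N * (N - 1) / 2 pairs."""
    return list(combinations(images, 2))

images = ["a", "b"]
print(["".join(p) for p in nn_pairs(images)])    # ['aa', 'ab', 'ba', 'bb']
print(["".join(p) for p in half_pairs(images)])  # ['ab']

n = 304  # number of images in the Label Recognition example
print(n * n)             # 92416
print(n * (n - 1) // 2)  # 46056
```

For the 304 images in this chapter, the reduced mode needs roughly half as many comparisons as full N:N Matching.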

Returning to our example, click “BioFilter/BioFilter N:N Match (Trained)” to make an N:N Match. The result is written to the file b1.txt, which is opened right after the click. The file will look like this:

C:\…\L01008gi_r90.jpg
C:\…\L01008gi_r90.jpg
C:\…\L01008gi-020501.jpg

C:\…\L01008KEY_m.jpg
C:\…\L01008KEY_m.jpg
C:\…\L01008key-082301_m.jpg

C:\…\L010103C.jpg
C:\…\L010103C.jpg
C:\…\L010103C-081502_m.jpg

Again, line 1 in each block is the input and the rest of the lines are output. Go all the way to the end of the file; the last line is:

Total Number of Matches = 850

There are 152 pairs, or 304 images. Each image matches itself and its partner in the pair, giving a total of 304 * 2 = 608 correct matches. As we will see next, all 608 matches are identified. There is a small amount of false acceptance: 850 – 608 = 242 extra matches. We will analyze the results further in section 7.9, Analysis.

7.6   Supervised 1:N Matching

Again, 1:N Matching compares one key image with the images in a search-directory or search-file; the key image is specified in the “Key Segment” textbox or selected by the “Key Segment” button.

To continue the Label Recognition problem for Supervised 1:N Matching:

The results are written to the file b1.txt, which is opened at this point:
C:\…\L01008gi-020501.jpg
C:\…\L01008gi_r90.jpg
C:\…\L01008gi-020501.jpg

Total Number of Matches = 2

These results are correct. We will continue this example in the next section, Checking the Results.
 

7.7   Checking the Results

If this is a test run (i.e., you know the correct answers), you can check the matching results in seconds. To do so, you must prepare a file that lists the matching pairs. To test the results in b1.txt, this file must be named b1_matchlist.txt.

An example of b1_matchlist.txt is:

608
1    L01008gi_r90       L01008gi-020501
2    L01008KEY_m    L01008key-082301_m
3    L010103C            L010103C-081502_m
4    L01010co_m        L01010CODE_m
5    L010163C_m       L010163C-083100_m

 

Line 1 is the number of matches in this file. The format is exactly the same as match.txt.
 

Number, tab, filename, tab, filename, tab.


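Note:
(1) You must have a tab at the end of each line;
(2) The file names do not contain “.jpg”.

As with match.txt, there are two common errors:

(1) The last Tab is missing;
(2) The number of rows is less than the first number in the file.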

To continue the Label Recognition example, we must now prepare the b1_matchlist.txt file. This file has already been prepared for you as “.\biofilterex1_matchlist.txt”; simply open it and save it as b1_matchlist.txt (overwriting the existing file). Then click the “BioFilter/Check (b1_matchlist.txt required)” menu item.


You will see something like the following in the text window:

Checking Template Matching Results!
Get b1.txt...
Character = 68045
Lines = 1459
Blocks = 305
Get b1_matchlist.txt...
 

Check...
Total Matches = 608

The message indicates that b1.txt has 305 blocks: the 304 image blocks plus the last line, which gives the total number of matches retrieved. The message “Total Matches = 608” indicates that 608 matches in b1.txt agree with those in b1_matchlist.txt.
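The same kind of count can be cross-checked independently. The sketch below is not the implementation of the Check command; it is a minimal illustration under stated assumptions: the result blocks have already been read into (key image, matched images) tuples, each row of the match list is read as (key image, expected match), and names are compared without directory or extension and without regard to case. The function names and the directory C:\labels are hypothetical.

```python
from pathlib import PureWindowsPath

def base_name(path):
    """Strip directory and extension from a Windows path and lowercase it."""
    return PureWindowsPath(path).stem.lower()

def count_confirmed(blocks, expected_pairs):
    """Count expected (key, match) pairs that appear in the result blocks.

    blocks: iterable of (key_path, [matched_path, ...]) tuples.
    expected_pairs: iterable of (name1, name2) rows without ".jpg".
    """
    found = {
        (base_name(key), base_name(hit))
        for key, hits in blocks
        for hit in hits
    }
    return sum((a.lower(), b.lower()) in found for a, b in expected_pairs)

# Illustrative data: one result block and three expected rows.
blocks = [
    (r"C:\labels\L01008gi_r90.jpg",
     [r"C:\labels\L01008gi_r90.jpg", r"C:\labels\L01008gi-020501.jpg"]),
]
expected = [
    ("L01008gi_r90", "L01008gi_r90"),
    ("L01008gi_r90", "L01008gi-020501"),
    ("L01008gi-020501", "L01008gi_r90"),  # no block for this key image
]
print(count_confirmed(blocks, expected))  # 2
```

PureWindowsPath is used so that backslash-separated paths are split correctly on any operating system.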

7.8   File Input

The Directory Input has two limits:

(1) The images are limited to the Input Directory only;
(2) The number of images is limited to 1,000.

File Input has no such limits; however, you must prepare the Input File yourself. An Input File lists one image per line, and each line specifies an absolute path. For example:

C:\xyz1\0001.jpg
C:\xyz1\0002.jpg
C:\xyz2\0003.jpg
C:\xyz2\0004.jpg
The only difference between the Directory Input and the File Input is how images are entered into the ImageFinder; after that, all other steps (Initialization, Training, Matching) are the same.
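An Input File in the format shown above can be generated rather than typed by hand. The following Python sketch is one possible way to do this and is not part of the ImageFinder. It assumes that only “.jpg” files are wanted, that only files directly inside each directory are listed, and that the default text encoding is acceptable to the software; the function names are hypothetical.

```python
from pathlib import Path

def build_input_list(directories, suffixes=(".jpg",)):
    """Return absolute paths of image files found directly in each directory."""
    paths = []
    for directory in directories:
        for entry in sorted(Path(directory).iterdir()):
            # Skip sub-directories and files with other extensions.
            if entry.is_file() and entry.suffix.lower() in suffixes:
                paths.append(str(entry.resolve()))
    return paths

def write_input_file(directories, output_path):
    """Write one absolute image path per line; return the number of images."""
    lines = build_input_list(directories)
    Path(output_path).write_text("\n".join(lines) + "\n")
    return len(lines)
```

For example, calling write_input_file with the directories C:\xyz1 and C:\xyz2 and an output file name of your choice would list the images of both directories in a single Input File, which Directory Input cannot do.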

To continue the Label Recognition example, we must now prepare the input file. There is one file for the Label Recognition problem, biofilterex1_input.txt. Click the “File Input” button and select the file biofilterex1_input.txt. This file lists 304 images, and the software checks that each image in the file exists. This takes a few seconds; at the end of the process, you should see:

Number of blocks = 1
Block 0. Length = 304
C:\…\L01008gi_r90.jpg
C:\…\L01008KEY_m.jpg
C:\…\L010103C.jpg
To see the original content of this file, click the “ShowFile” button. To clear the text window, click the “Clear” button.

To convert the images into records, click the menu item “BioFilter/Scan Images  - File Input”. You should see the ImageFinder scan through the images at this point.
 

7.9   Analysis

Possible Matches

Let N be the total number of images in the input file; the number of Possible Matches is then N * N. In our example, N * N = 304 * 304 = 92,416.

Attrasoft Matches

The number of retrieved matches is listed in the last line of b1.txt. At the end of b1.txt, you will see something like this:

Total Number of Matches = 850


Actual Match

This number depends on your problem. It should be the first number in b1_matchlist.txt. In our example, it is 608.


Attrasoft Found Duplicates

Click the “BioFilter/Check (b1_matchlist.txt required)” menu item to get this number, as discussed in section 7.7. Now that you have all of the numbers, you can make an analysis.

Positive Identification Rate

The Positive Identification Rate is the number reported by the “BioFilter/Check (b1_matchlist.txt required)” menu item divided by the first number in the file b1_matchlist.txt.

In our example, the Positive Identification Rate is 608/608 = 100%.

Elimination Rate
The Elimination Rate is 1 minus the number at the end of b1.txt divided by the number of possible matches. This number is then normalized so that it equals 100% when all mismatches are eliminated. In our example:

Absolute Elimination Rate = 1 – 850/92,416 = 99.08%.
Maximum Absolute Elimination Rate = 1 – 608/92,416 = 99.34%.
Elimination Rate = (1 – 850/92,416) / (1 – 608/92,416) = 99.74%.

Hit Ratio
The Hit Ratio is the number reported by the “BioFilter/Check (b1_matchlist.txt required)” menu item divided by the number at the end of b1.txt. In our example, this number is 608/850 = 71.53%.


Composite Index

Finally, an Identification is measured by the Composite Index, which is the product Positive Identification Rate * Elimination Rate * Hit Ratio. In our example, this number is 100% * 99.74% * 71.53% = 71.34%.
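The figures in this section can be reproduced with a few lines of Python. This is a sketch of the arithmetic only; the function name analyze and its parameter names are chosen for illustration and do not correspond to any ImageFinder command.

```python
def analyze(num_images, retrieved, actual, confirmed):
    """Compute the rates defined in this section.

    num_images: total images in the input (N)
    retrieved:  number at the end of b1.txt (Attrasoft Matches)
    actual:     first number in b1_matchlist.txt (Actual Match)
    confirmed:  number reported by the Check menu item
    """
    possible = num_images * num_images
    positive_id = confirmed / actual
    absolute_elimination = 1 - retrieved / possible
    max_elimination = 1 - actual / possible
    elimination = absolute_elimination / max_elimination  # normalized
    hit_ratio = confirmed / retrieved
    return {
        "possible_matches": possible,
        "positive_identification_rate": positive_id,
        "absolute_elimination_rate": absolute_elimination,
        "elimination_rate": elimination,
        "hit_ratio": hit_ratio,
        "composite_index": positive_id * elimination * hit_ratio,
    }

# Values from the Label Recognition example.
result = analyze(num_images=304, retrieved=850, actual=608, confirmed=608)
for name, value in result.items():
    if name == "possible_matches":
        print(f"{name}: {value}")
    else:
        print(f"{name}: {value:.2%}")
```

Passing the four counts as arguments makes it easy to rerun the analysis for another data set or after retraining.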


7.10   Summary

Summary of the steps for N:N matching:

I.  Preparations

Data

Data is stored at “.\biofilterex1”. There are 304 images.
Training File
1. The training file must be named “match.txt”;
2. Find the file “.\biofilterex1_match.txt” and save it as match.txt.
Checking File
1. The checking file must be named “b1_matchlist.txt”;
2. Find the file “.\biofilterex1_matchlist.txt” and save it as b1_matchlist.txt.
II. Operation

Congratulations on your very first Image Recognition example with the ImageFinder!