COM**************************************************************************** COM COM This code was written in the Sept of 2006 by CAATS (AKA David Coderre) and originally posted to the ACL USER FORUM. COM This code is designed to compare two the results from two tables and only identifies duplicates if the COM match is from the second table. It was later uploaded onto the TexasACL.COM. COM COM Levenshtein distance used to search for duplicates. The code is designed to be usable on any table. COM**************************************************************************** COM ACL does not readily support a two dimensional array or matrix. However, it can be done. The following 8 scripts (Leven_Start; Leven_Var_S; Leven_Var_T; Leven_Mat_Col1; Leven_Mat_Row1; Leven_Mat_Rows; Leven_Mat_Rows2a; Leven_Mat_Rows2b) show how ACL can be used to calculate the Levenshtein Distance between two strings: The final answer is stored in the variable Levenshtein_Distance. COM The script should be modified to compare two fields in the same table - but ... I am too busy to do it right now. Maybe someone else can make this modification. Note: see above post for modified script. COM Levenshtein distance (LD) is a measure of the similarity between two strings, which we will refer to as the source string (s) and the target string (t). The distance is the number of deletions, insertions, or substitutions required to transform s into t or t into s. COM Refer to web-site: http://www.merriampark.com/ld.htm which shows how a 2 dimentional array is used to perform the Levenshtein distance calculation. Set echo none DELETE ALL OKDIALOG (DIALOG TITLE "User Dialog" WIDTH 493 HEIGHT 234 ) (BUTTONSET TITLE "&OK;&Cancel" AT 384 12 DEFAULT 1 ) (BUTTONSET TITLE "cc. CAATS" AT 372 180 DEFAULT 1 ) (TEXT TITLE "Enter values" AT 36 40 ) (EDIT TO "v_text1a" AT 48 108 DEFAULT "Gumbo" ) (EDIT TO "v_text2a" AT 240 108 DEFAULT "gambol") Comment ****** set up variables ******** Col1=T V_text1=UPPER("%V_text1a%") V_text2=UPPER("%V_text2a%") ROW1=BLANKS(1000) Min_Val=BLANKS(2) S_Cnt=1 T_Cnt=1 Val1=0 Val2=0 Val3=0 Cost=0 R_Cnt=0 C_Cnt=0 Sub_var=0 Sub_var2=0 Ctr=0 LD=0 N = LEN("%v_text1%") M = LEN("%v_text2%") No_Rows = LEN("v_text2%") Comment ******* Create variables for each letter of text1 and text2 ******** DO Leven_Var_S while S_Cnt<=N DO Leven_Var_T while T_Cnt<=M Comment ***** Create first Row of matrix ************************* S_Cnt=0 T_Cnt=1 DO Leven_Mat_Row1 while S_Cnt<=N Comment ****** Create first column of matrix** T_Cnt=2 Do Leven_Mat_Col1 while T_Cnt-1<=M Comment ****** Create other rows - column by column ************ S_Cnt=2 T_Cnt=2 DO Leven_Mat_Rows while T_Cnt-1<=M Comment ****** Determine Levenshtein Distance********** LD=M+1 Levenshtein_Distance = Split(Row%LD%,';',N+1) ************* Script LEVEN_VAR_S ******************** S%S_Cnt%=substr("%v_text1%",S_Cnt,1) S_Cnt=S_Cnt+1 ************* Script LEVEN_VAR_T ******************** T%T_Cnt%=substr("%v_text2%",T_Cnt,1) T_Cnt=T_Cnt+1 ************* Script LEVEN_MAT_COL1 ******************** Row%T_Cnt%=blanks(1000) Row%T_Cnt%=ALLTRIM(Row%T_Cnt%)+ALLTRIM(STR(T_Cnt-1,2))+';' T_Cnt=T_Cnt+1 ************* Script LEVEN_MAT_ROW1 ******************** Row1=ALLTRIM(Row1)+ALLTRIM(STR(S_Cnt,2))+';' S_Cnt=S_Cnt+1 ************* Script LEVEN_MAT_ROWS ******************** DO Leven_Mat_Rows2a while CtrT%Sub_var2% DO Leven_Mat_Rows2b S_Cnt=Ctr+3 Ctr=Ctr+1 ************* Script LEVEN_MAT_ROWS2b ******************** R_Cnt=Sub_var2 C_Cnt=S_Cnt-1 Val1=val(Split(Row%T_cnt%,';',C_Cnt),0)+1 Val2=val(Split(Row%R_cnt%,';',S_Cnt),0)+1 Val3=val(Split(Row%R_cnt%,';',C_Cnt),0)+ cost Min_val=ALLTRIM(str(MIN(MIN(Val1,Val2),Val3),2)) Row%T_Cnt%=ALLTRIM(Row%T_Cnt%)+Min_Val+';'