Introduction to Data Analytics
Often times new users to ACL do not know how to perform some basic data analytics. They do not even have an idea as to where to begin, this is a short essay exploring some of the more common approaches one can use when presented with a selection of data:
1) Use ACL's built in Data Verify Command. ACL has a powerful tool that will review all of the fields or just the selected ones to ensure that the contents of data fields is valid. This is a powerful way to detect in the inputted data is reliable. If data errors are identified, the user should determine if the anomalies are found in the original data or whether it was corrupted while loading into ACL. If the data anomalies exist within the original data, then one has identified a problem. Depending on the nature of the field this problem can be minor or critical.
2) Profile date fields. First, look at the earliest and latest dates, are there dates that fall outside of the expected date rang? In each field you might find data anomalies. For instance, if you are looking at transactions that occurred during the past year, but an associated date field has a date from 2025 or 1978, you know that there might be control issues related to the date field. Again, the severity of the issue may be dictated based upon what the field is and how it impacts other fields. A date field that is used for determining interest paid or rates may result in significant over/under payments if the data in the field cannot be relied upon. Second, look for date distributions throughout the time period. Do you expect dates to be evenly distributed throughout the period? Do you anticipate that activity might increase/decrease at the beginning or end of a month? Quarter? Year? If so, does it? Third, look at the days of the transactions. Do you expect activity to occur only from Monday through Friday? How about Saturday and Sunday? Anytime the data deviates from expected values, you have something you might want to explore.
3) Profile numeric fields. Again, you are looking for values outside of the expected norm. Do you have negative values where you expect only positive values? Do you have a lot of values that are round numbers? Specifically, multiples of 10 or 100? Have you considered a Benford Analysis?
4) Classify on key fields. Many fields you will discover have expected values in them. Suppose you are dealing with a field that is supposed to contain one for three key codes "Accept" "Reject" or "Review". If you classify on the field and discover unexpected values, then those are the records that you should investigate further. First, unexpected results may indicate data load/extraction/transformation or integrity errors. Or, if the data was loaded correctly, might indicate a control weakness. If the system was written to act based upon the three expected values, how will the system respond to another value? Will it crash the system or simply fail to report a potential problem?
5) Use the MAP() function on key fields. Similar to the classifing key fields, performing a MAP() is designed to identify data issues that might indicate data anomalies that can indicate data issues and/or control issues.
6) Look for duplicated data. ACL has a built in DUPLICATE search function, this function is powerful, but many people do not know how to utilize it properly. The suppose, we