Software Evaluation of Data Quality Management Tools

Data quality management (DQM) tools are growing in importance as data volumes have increased and as more automated processes depend on highly accurate data to avoid exceptions and delays. As customers' and other trading partners' expectations of automation and speed increase, organizations depend more and more on good-quality data to execute these processes, with a direct impact on both revenues and costs.

What are the evaluation criteria for a data quality tool, and what are the gaping holes that still cause data cleansing and quality projects to fail even after these kinds of tools are implemented? From a technical perspective, a DQM application should support the following:

(1) Extraction, parsing and data connectivity
The first step for this kind of application is either to connect to the data or to load it into the application. There are multiple ways to load data into the application, or to connect to it and view it. This step also includes the ability to parse or split data fields.
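
As a minimal sketch of the parse/split capability, assuming the data arrives as comma-delimited text, the following Python uses the standard `csv` module to load records and split a hypothetical `full_name` field into `first_name` and `last_name`. The function name `load_and_split` and all field names are illustrative assumptions, not part of any particular DQM product.

```python
import csv
import io

def load_and_split(raw_text, split_field="full_name"):
    """Parse delimited text and split one field into first/last name parts."""
    reader = csv.DictReader(io.StringIO(raw_text))
    records = []
    for row in reader:
        # Split on the first run of whitespace only: "A B C" -> ["A", "B C"].
        parts = row.pop(split_field, "").strip().split(None, 1)
        row["first_name"] = parts[0] if parts else ""
        row["last_name"] = parts[1] if len(parts) > 1 else ""
        records.append(row)
    return records

sample = (
    "id,full_name,email\n"
    "1,Ada Lovelace,ada@example.com\n"
    "2,Grace Brewster Hopper,grace@example.com\n"
)
rows = load_and_split(sample)
```

Splitting on the first whitespace is deliberately naive; names with prefixes, suffixes or multiple given names would need richer parsing rules.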

(2) Data profiling
Once the application holds the data or has access to it, the next step of the DQM process is data profiling, which includes running statistics on the data (min/max, average, number of missing attributes) and determining relationships within the data. Profiling should also include the ability to verify the validity of specific columns, such as e-mail addresses and phone numbers, and the availability of reference libraries such as postal codes and spelling dictionaries.
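
The profiling statistics described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions: missing values are represented as `None` (numeric) or empty strings (text), and the e-mail pattern is a deliberately loose format check, not a full RFC 5322 validator. `profile_numeric` and `invalid_emails` are hypothetical helper names.

```python
import re
import statistics

# Loose format check only: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def profile_numeric(values):
    """Basic column statistics; None counts as a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": statistics.mean(present) if present else None,
    }

def invalid_emails(values):
    """Non-empty values failing the format check (empty values are missing, not invalid)."""
    return [v for v in values if v and not EMAIL_RE.match(v)]

ages = [34, None, 51, 28, None]
emails = ["ada@example.com", "not-an-email", "", "grace@example.org"]
age_profile = profile_numeric(ages)
bad_emails = invalid_emails(emails)
```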

(3) Cleansing and standardization
Data cleansing involves both seeded, automated cleansing functions and user-defined rules. Seeded functions include date standardization, eliminating extra spaces, transform functions (such as mapping the code 1 to F and 2 to M), calculating values, and identifying incorrect location names by referencing external libraries. Defining standard rule sets and normalizing the data help identify missing or incorrect information. Cleansing also includes the ability to adjust information manually.
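
A sketch of a few seeded cleansing functions, assuming records arrive as Python dictionaries: collapsing extra whitespace, mapping the coded values 1/2 to F/M as in the example above, and standardizing dates to ISO 8601. The accepted input date formats and the field names `gender` and `birth_date` are assumptions; note that `%d/%m/%Y` reads `10/12/1815` as day-first, which is exactly the kind of ambiguity a real rule set must settle explicitly. Dates that match no format become `None` so they can be routed to manual adjustment.

```python
from datetime import datetime

# Code mapping from the example in the text: 1 -> F, 2 -> M.
GENDER_CODES = {"1": "F", "2": "M"}
# Assumed input formats, tried in order; day-first is an explicit assumption.
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d %b %Y")

def standardize_date(text):
    """Return an ISO 8601 date string, or None if no known format matches."""
    text = (text or "").strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unparseable: leave for manual adjustment

def cleanse(record):
    """Apply seeded cleansing rules to one record (a dict of field -> value)."""
    # Collapse runs of whitespace and trim leading/trailing spaces in text fields.
    out = {k: " ".join(v.split()) if isinstance(v, str) else v for k, v in record.items()}
    if "gender" in out:
        out["gender"] = GENDER_CODES.get(out["gender"], out["gender"])
    if "birth_date" in out:
        out["birth_date"] = standardize_date(out["birth_date"])
    return out

clean = cleanse({"name": "  Ada   Lovelace ", "gender": "1", "birth_date": "10/12/1815"})
```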

(4) De-duplication
Deduping records involves using a variety or combination of fields and algorithms to identify, merge and clean up records. Duplicate records can result from poor data entry procedures, merging of applications, company mergers or many other causes. You should ensure that not only addresses but any data can be assessed for duplication. Once a suspected duplicate record is identified, the process for actually merging the records needs to be clarified; it could include automated rules that select which attributes take priority and/or a manual process to clean up the duplication.
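
One common approach, sketched here under assumptions, is to compare normalized field values pairwise with a fuzzy similarity score and then apply a survivorship rule when merging. It uses `difflib.SequenceMatcher` from the Python standard library; the 0.85 threshold, the compared fields and the "newest record wins, gaps filled from the older one" rule are illustrative choices, not a recommended configuration.

```python
from difflib import SequenceMatcher

def normalize(value):
    """Lower-case and collapse whitespace before comparing."""
    return " ".join(value.lower().split())

def similarity(a, b):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def find_suspects(records, fields=("name", "city"), threshold=0.85):
    """Return index pairs whose every compared field scores at or above threshold."""
    suspects = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            scores = [similarity(records[i][f], records[j][f]) for f in fields]
            if min(scores) >= threshold:
                suspects.append((i, j))
    return suspects

def merge(a, b):
    """Assumed survivorship rule: keep the newer record, fill its empty values from the older."""
    newer, older = (a, b) if a["updated"] >= b["updated"] else (b, a)
    return {k: newer.get(k) or older.get(k) for k in newer.keys() | older.keys()}

customers = [
    {"name": "Acme Corp", "city": "Boston", "phone": "", "updated": "2023-01-10"},
    {"name": "ACME Corp.", "city": "boston", "phone": "555-0100", "updated": "2022-06-01"},
    {"name": "Globex", "city": "Springfield", "phone": "555-0199", "updated": "2023-03-02"},
]
pairs = find_suspects(customers)
merged = merge(customers[0], customers[1])
```

Comparing every pair is O(n²); larger data sets usually restrict comparisons to records that share a blocking key (for example, the same postal code) before scoring.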

(5) Load and export
The application should be able to export the data in a variety of formats and to connect to databases or data stores to load either the full data set or incremental changes.
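
As a hedged sketch of full and incremental output, the code below writes a full CSV export with `csv.DictWriter` and loads only records changed since a given timestamp into a SQLite table using `INSERT OR REPLACE`. The `customer` table, its columns and the ISO-string `updated` field are assumptions for illustration; a production load would typically track the last successful run rather than pass `since` by hand.

```python
import csv
import io
import sqlite3

def export_csv(records, fieldnames):
    """Full export: render all records as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

def load_incremental(conn, records, since):
    """Incremental load: write only records whose `updated` value is after `since`."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT, updated TEXT)"
    )
    # ISO 8601 date strings sort chronologically, so string comparison is safe here.
    changed = [r for r in records if r["updated"] > since]
    # INSERT OR REPLACE overwrites any existing row with the same primary key.
    conn.executemany(
        "INSERT OR REPLACE INTO customer (id, name, updated) VALUES (:id, :name, :updated)",
        changed,
    )
    conn.commit()
    return len(changed)

records = [
    {"id": 1, "name": "Acme Corp", "updated": "2023-01-10"},
    {"id": 2, "name": "Globex", "updated": "2022-06-01"},
]
conn = sqlite3.connect(":memory:")
loaded = load_incremental(conn, records, since="2023-01-01")
csv_text = export_csv(records, ["id", "name", "updated"])
```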

Emerging capabilities in DQM applications

DQM tools are typically designed and built by engineers. However, making a data quality project successful depends not only on the technical work of analyzing and cleaning the data but on several other aspects as well. A few newer DQM applications are adding capabilities related to managing the project and its processes, whether on a one-time or continuing basis. These capabilities can be just as important for successfully completing a data cleansing or quality project:

(1) Automated task management of stakeholders and data owners
These processes or projects usually involve a large set of internal as well as external stakeholders. Managing them through spreadsheets and e-mails can be a daunting and complex affair. Applications that can automate parts of this process can add significant value and make the project's success more predictable. This can range from simple things, such as monitoring adherence to defined standards and raising exceptions or tasks for specific users or data owners when those standards are violated, to coordinating large-scale validation directly with external parties, such as requesting updated tax exemption certificates or addresses.
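
The adherence monitoring described above might look like the following sketch, in which rules are declared as data and each violation becomes a task queued for the responsible data owner. The rule list, owner names (`crm-team`, `tax-team`) and record fields are hypothetical; a real application would push these tasks into a workflow or ticketing system rather than return a dictionary.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

@dataclass
class Task:
    owner: str       # data owner who must resolve the exception
    record_id: int
    message: str

# Hypothetical rules: (owner, message, violation test). Dates are ISO strings,
# so plain string comparison orders them correctly; a missing date counts as expired.
RULES = [
    ("crm-team", "invalid e-mail address",
     lambda rec, today: not EMAIL_RE.match(rec.get("email") or "")),
    ("tax-team", "request updated tax exemption certificate",
     lambda rec, today: (rec.get("tax_cert_expiry") or "") < today),
]

def raise_tasks(records, today):
    """Create one task per rule violation, grouped by the responsible owner."""
    queues = defaultdict(list)
    for rec in records:
        for owner, message, violated in RULES:
            if violated(rec, today):
                queues[owner].append(Task(owner, rec["id"], message))
    return dict(queues)

records = [
    {"id": 1, "email": "ap@acme.example", "tax_cert_expiry": "2025-06-30"},
    {"id": 2, "email": "billing at globex", "tax_cert_expiry": "2023-12-31"},
    {"id": 3, "email": "ops@initech.example", "tax_cert_expiry": None},
]
queues = raise_tasks(records, today="2024-05-01")
```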

(2) Data flexibility – ability to handle any data
Some DQM applications are highly specialized and handle only address verification or part/SKU cleansing. A DQM application should be able to handle any type of master data (MDM) or transactional data, with flexible rule definitions.

(3) Big Data cleansing
Big Data files can come in structured, semi-structured and completely unstructured formats. Standardizing and automating the cleansing of this data may be necessary on a continuous basis. This emerging process of cleaning up large amounts of data requires automated transformation rules that can be applied to semi-structured and unstructured formats.
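
For the semi-structured case, a minimal sketch might flatten JSON-lines records into dotted field names and apply a table of transformation rules keyed by field. The rules (`customer.country`, `amount`) and the JSON-lines input format are assumptions; genuinely unstructured data such as free text would need extraction steps that this sketch does not attempt.

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

# Hypothetical transformation rules, keyed by flattened field name.
COUNTRY_ALIASES = {"usa": "US", "united states": "US"}
TRANSFORMS = {
    "customer.country": lambda v: COUNTRY_ALIASES.get(str(v).strip().lower(), v),
    "amount": lambda v: round(float(v), 2),
}

def cleanse_stream(lines):
    """Yield cleansed records from JSON-lines input, skipping blank or unparseable lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            continue  # a real pipeline would send this line to a reject queue
        if not isinstance(parsed, dict):
            continue
        record = flatten(parsed)
        for field, transform in TRANSFORMS.items():
            if field in record:
                record[field] = transform(record[field])
        yield record

raw = [
    '{"customer": {"name": "Acme", "country": "USA"}, "amount": "19.999"}',
    "not json",
    '{"customer": {"country": "Canada"}, "amount": 5}',
]
cleansed = list(cleanse_stream(raw))
```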

(4) Data governance and adherence monitoring
Data governance and the monitoring of adherence are key to maintaining the accuracy and cleanliness of the data. Many applications cannot structurally enforce all of the desired business rules. Some DQM applications can be used to monitor data governance processes, such as requests for new attributes or values, and to monitor exceptions, achieving a higher level of information quality.

(5) Project status reporting
A typical data quality management or conversion project goes through a series of steps and phases involving a large set of stakeholders. Allocating responsibility appropriately, tracking cleansing progress and managing the interdependencies between tasks is complex, and some applications are starting to offer these collaborative functions as well.
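
Task interdependency is essentially a dependency graph. As a sketch under assumptions, Python's `graphlib.TopologicalSorter` (Python 3.9+) can order a hypothetical project plan, detect circular dependencies, and report which tasks are ready to start given those already completed. The task names and the count-based completion percentage are illustrative only.

```python
from graphlib import TopologicalSorter

# Hypothetical project plan: each task maps to the set of tasks it depends on.
PLAN = {
    "profile data": set(),
    "define rules": {"profile data"},
    "cleanse addresses": {"define rules"},
    "cleanse SKUs": {"define rules"},
    "de-duplicate customers": {"cleanse addresses"},
    "load to target": {"de-duplicate customers", "cleanse SKUs"},
}

def status_report(plan, done):
    """Return (percent complete, tasks ready to start) for a dependency plan."""
    # static_order() raises graphlib.CycleError if the plan has circular dependencies.
    order = list(TopologicalSorter(plan).static_order())
    # A task is ready when it is not done and all of its prerequisites are done.
    ready = [task for task in order if task not in done and plan[task] <= done]
    percent = round(100 * len(done) / len(plan))
    return percent, ready

percent, ready = status_report(PLAN, done={"profile data", "define rules"})
```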

Why Use an Access Database Library?

Maximising code re-use should be the goal of the professional programmer. All too frequently, the same code is repeated in numerous forms and procedures, and often in many other Access databases as well.

From the developer’s point of view, using less code means greater productivity and faster development cycles. Forms, Reports and procedures can be built faster – and will run faster.

The best way to remove repeated code in a Microsoft Access database is through the use of an ACCDE Library. The ACCDE file is a compiled and executable version of an Access database file. It does not allow the user to read or modify the Visual Basic source code.

The ACCDE file should be regarded as the equivalent of the DLL file – without the complexity.

Why an ACCDE Library is a necessity

There are many reasons why an ACCDE Library should be used to remove redundant code:

  • Improves performance – the application loads and runs faster
  • Optimises memory usage – the application cannot become uncompiled and cause database bloat
  • Efficiency – memory is better utilised with code re-use
  • Security – intellectual property is protected
  • A common resource – the ACCDE may be shared between different database projects
  • Stability – the Front-End database becomes more robust and reliable
  • Easier maintenance – due to a smaller Front-End database code size and reduced complexity
  • Front-End Access limits – less likely to be reached
  • End-user productivity – less training is needed with standardised and consistent software routines
  • Crashes – errors are less likely to occur with shared and re-usable code
  • Change management – simplified, as only one Library modification is needed

When a database is saved as an ACCDE file, Access compiles all code modules, including those behind Forms and Reports, removes all editable source code and compacts the database. The resulting ACCDE Library file is fast, memory efficient and small.

Hopefully the above will convince you that an Access Library of re-usable procedures is essential.

Eliminate Repeated Code

Start by searching the Access Front-End for repeated code. Likely candidates are modules with:

  • Error Handling
  • ADO and DAO database retrieval and updating
  • Microsoft Word functions
  • File Handling
  • Consistent ActiveX Control colours
  • Security control
  • Validation and Formatting

Where variations of the same procedure are found, select the best version or handle the variation with Optional arguments. This allows a single Library procedure to serve many callers.

Setting up a Library Database

This is easy to do in Access 2010:

  • Create a blank ACCDB Access database
  • Add code modules
  • Compile
  • Save the ACCDB file
  • Create an ACCDE from the File, Save & Publish menu

Some suggestions

  • Prefix all the Library Modules with “lib”, so that the Modules can be easily distinguished.
  • It is possible to deploy an Access database to the users as an ACCDE file.
  • The ACCDE Library must reside in the same folder path in both the development and user environments. Alternatively, the Reference folder path must be set in Visual Basic code at start-up.
  • Make sure that each Module has error handling on all procedures, and that all errors are logged to a central folder.

The Roles of Libraries in Teaching and Learning


Libraries have long served crucial roles in learning. The first great library, in Alexandria two thousand years ago, was really the first university. It consisted of a zoo and various cultural artifacts in addition to much of the ancient world’s written knowledge, and it attracted scholars from around the Mediterranean who lived and worked in a scholarly community for years at a time. Today, the rhetoric associated with the National/Global Information Infrastructure (N/GII) always includes examples of how the vast quantities of information that global networks provide (i.e., digital libraries) will be used in educational settings. An important aspect of the library’s educational mission is to promote and develop information literacy in its users. Information literacy, in general, is the ability to identify, locate, use and interpret information effectively.

Role of Modern Libraries:

A library is defined by three fundamental functions:

(1) selection to create a “collection”;
(2) organization to enable access; and
(3) preservation for ongoing use.

Although technologies may evolve to add the second function to the Web, the first and third functions are antithetical to the very nature of today’s Web. The Web’s successor will become more “library-like,” and libraries will continue to become more “Web-like,” but each will retain some essential differences from the other.

The Web is most definitely not a library now, and it probably never will be. But the Web provides a wonderful mechanism for collaboration between and among scholars and librarians who want to create “libraries” of high-quality resources on a particular topic for scholarship and teaching. Another great concern about Web resources is that they are ephemeral. Libraries select and preserve information resources for generations to come, whereas the longevity of Web-based resources is often measured in days.

How do libraries support teaching and learning?

A library is fundamentally an organized set of resources, which include human services as well as the entire spectrum of media (e.g., text, video, hypermedia). Libraries have physical components such as space, equipment, and storage media; intellectual components such as collection policies that determine what materials will be included and organizational schemes that determine how the collection is accessed; and people who manage the physical and intellectual components and interact with users to solve information problems.

Libraries serve at least three roles in learning.

First, they serve a practical role in sharing expensive resources. Physical resources such as books and periodicals, films and videos, software and electronic databases, and specialized tools such as projectors, graphics equipment and cameras are shared by a community of users. As human resources, librarians (also called media specialists or information specialists) support instructional programs by responding to the requests of teachers and students (responsive services) and by initiating activities for teachers and students (proactive services). Responsive services include maintaining reserve materials, answering reference questions, providing bibliographic instruction, developing media packages, recommending books or films, and teaching users how to use materials. Proactive services include selective dissemination of information to faculty and students, initiating thematic events, collaborating with instructors to plan instruction, and introducing new instructional methods and tools. In these ways, libraries allow instructors and students to share expensive materials and expertise.

Second, libraries serve a cultural role in preserving and organizing artifacts and ideas. Great works of literature, art, and science must be preserved and made accessible to future learners. Although libraries have traditionally been viewed as facilities for printed artifacts, primary and secondary school libraries often also serve as museums and laboratories. Libraries preserve objects through careful storage procedures, policies of borrowing and use, and repair and maintenance as needed. In addition to preservation, libraries ensure access to materials through indexes, catalogs, and other finding aids that allow learners to locate items appropriate to their needs.

Third, libraries serve social and intellectual roles in bringing together people and ideas. This is distinct from the practical role of sharing resources in that libraries provide a physical place for teachers and learners to meet outside the structure of the classroom, thus allowing people with different perspectives to interact in a knowledge space that is both larger and more general than that shared by any single discipline or affinity group. Browsing a catalog in a library provides a global view for people engaged in specialized study and offers opportunities for serendipitous insights or alternative views. In many respects, libraries serve as centers of interdisciplinarity: places shared by learners from all disciplines.

Formal learning is systematic and guided by instruction. Formal learning takes place in courses offered at schools of various kinds and in training courses or programs on the job. The important roles that libraries serve in formal learning are illustrated by their physical prominence on university campuses and the number of courses that make direct use of library services and materials. Most of the information resources in schools are tied directly to the instructional mission. Students or teachers who wish to find information outside this mission have in the past had to travel to other libraries. By making the broad range of information resources discussed below available to students and teachers in schools, digital libraries open new learning opportunities for global rather than strictly local communities.

Much learning in life is informal: opportunistic and strictly under the control of the learner. Learners take advantage of other people, mass media, and the immediate environment during informal learning. The public library system that developed in the U.S. in the late nineteenth century has been called the “free university”, since public libraries were created to provide free access to the world’s knowledge. Public libraries provide classic nonfiction books, a wide range of periodicals, reference sources, and audio and video tapes so that patrons can learn about topics of their own choosing at their own pace and in their own style. Just as computing technology and worldwide telecommunications networks are beginning to change what is possible in formal classrooms, they are changing how individuals pursue personal learning missions.

Professional learning refers to the ongoing learning adults engage in to do their work and to improve their work-related knowledge and skills. In fact, for many professionals, learning is the central aspect of their work. Like informal learning, it is mainly self-directed, but unlike formal or informal learning, it is focused on a specific field closely linked to job performance, aims to be comprehensive, and is acquired and applied longitudinally. Since professional learning affects job performance, corporations and government agencies support libraries (often called information centers) with information resources specific to the goals of the organization.

The main information resources for professional learning, however, are personal collections of books, reports, and files; subscriptions to journals; and the human networks of colleagues nurtured through professional meetings and various communications. Many of the data sets and computational tools of digital libraries were originally developed to enhance professional learning. The information resources–both physical and human–that support these types of learning are customized for specific missions and have traditionally been physically separated, although common technologies such as printing, photography, and computing are found across all settings.

Role of Digital Libraries:

Digital libraries extend such interdisciplinarity by making diverse information resources available beyond the physical space shared by groups of learners. One of the greatest benefits of digital libraries is bringing together people with formal, informal, and professional learning missions.

Digital libraries combine technology and information resources to allow remote access, breaking down the physical barriers between resources. Although these resources will remain specialized to meet the needs of specific communities of learners, digital libraries will allow teachers and students to take advantage of wider ranges of materials and communicate with people outside the formal learning environment. This will allow more integration of the different types of learning. Although not all students or teachers in formal learning settings will use information resources beyond their circumscribed curriculum, and not all professionals will want to interact even occasionally with novices, digital libraries will allow learners of all types to share resources, time and energy, and expertise to their mutual benefit. The following sections illustrate some of the types of information resources that are defining digital libraries.

As research and teaching increasingly rely on global networks for the creation, storage and dissemination of knowledge, the need to educate information-literate students has become more widely recognized. Students often lack the skills necessary to succeed in this rapidly changing environment, and faculty need training and support to make use of new technologies for effective teaching and learning. The current environment provides an opportunity for librarians to play a key role in the evolution of integrated information literacy. Thus, technology itself may provide a positive impetus, as “developments in education and technology are beginning to help academic librarians achieve new breakthroughs in integrating information and technology skills into the curriculum”.

Technology allows library services to be available to students and faculty whenever and wherever they need such services. Technology makes possible round-the-clock library services without increasing investment in human resources. In addition, research materials increasingly exist only in digital form. Such resources are available only with the application of technology. Libraries will continue to exploit the inevitable technological innovation to improve productivity, control costs, enrich services, and deliver the high-quality content that is demanded.