A Bridge to Extract, Clean, Transform and Load
Blog Article - Sunday, June 26, 2011
The trouble with today’s growing investment in IT is that everything else is integrated, which leaves the HP 3000 an island of data in your organization. To bridge between this platform and others, several things need to happen. The obvious questions concern what data needs to be moved, when, and how often. The technology stack in such a solution will need to address these issues:
- How to connect to and select from the database
- How to move the data across a network to a target database
- How to add to or create the database on the target platform
- How to deal with data type differences (precision of numbers, item name length, table name length)
- How to handle endian differences (the byte order for integers differs between platforms)
- What to do with dates (SQL has strict rules around date and date-time fields)
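Two of the trickier items above, byte order and dates, can be sketched in a few lines. This is a hypothetical illustration in Python (the field layouts are invented), not a description of any particular tool:

```python
import struct
from datetime import date

# Hypothetical example: suppose an HP 3000 IMAGE record stores a big-endian
# 32-bit integer, and a date packed as YYYYMMDD in an integer item. The
# target platform (say, x86 Linux) is little-endian, and SQL targets expect
# a proper DATE value.

def read_big_endian_int(raw: bytes) -> int:
    """Decode a big-endian 32-bit signed integer regardless of host byte order."""
    return struct.unpack(">i", raw)[0]

def yyyymmdd_to_date(value: int) -> date:
    """Convert an integer like 20110626 into a SQL-friendly date."""
    return date(value // 10000, (value // 100) % 100, value % 100)

# 5000 stored big-endian is the byte sequence 00 00 13 88.
print(read_big_endian_int(b"\x00\x00\x13\x88"))  # 5000
print(yyyymmdd_to_date(20110626))                # 2011-06-26
```

The point of the sketch is that byte order must be fixed at decode time, before any transformation, and dates must be promoted to real date types before a strict SQL target will accept them.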
For a one-time move — perhaps a special project to get certain information on a group of customers who bought a product over the past two years — the scope is easy and the target can just have fields that are selectively populated.
For a continuous feed of data, perhaps to an ODS (Operational Data Store), a datamart, or another application, the problem becomes more complex. After all, the need to move data between platforms is becoming a business driver. We have customers taking advantage of our J2EE technology to integrate into a Java environment with JINI, EJB, JTS and SSL support. All of this has allowed the HP 3000 to play as an equal in the Enterprise Data Bus Architecture. But deciding where to start your bridging raises a good set of questions.
You can follow these questions to outline your requirements:
- Does there need to be a start date?
- What data needs to be captured, and how do we identify the data required?
- Is it transactional data, or updates to a file selected by timestamp?
- Is it synchronous data, or just an hourly, daily, weekly or monthly data sweep that is required?
- Once we have some candidate data, how will it be checked for integrity before it is sent to the application or database?
- Is there a way to make sure (via audit) that all the transactions were correctly posted to the target database and none were missed?
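As a hypothetical illustration of the timestamp and audit questions above, a sweep might select rows changed since the last run, post them, and then verify that none were missed. All names here (`sweep`, `updated_at`, `post`) are invented for the sketch:

```python
from datetime import datetime

# Hypothetical sketch of a timestamp-driven data sweep with a simple audit.
# Not a real API: a production version would read from the source database
# and post to the target over a network connection.

def sweep(source_rows, last_run, post):
    """Select rows changed since last_run, post each one, and audit the counts."""
    candidates = [r for r in source_rows if r["updated_at"] > last_run]
    posted = sum(1 for row in candidates if post(row))  # post() returns True on success
    # Audit: every selected transaction must reach the target; none may be missed.
    if posted != len(candidates):
        raise RuntimeError(f"audit failed: {len(candidates) - posted} rows missed")
    return posted

rows = [
    {"id": 1, "updated_at": datetime(2011, 6, 25)},
    {"id": 2, "updated_at": datetime(2011, 6, 26)},
]
print(sweep(rows, datetime(2011, 6, 25, 12), post=lambda r: True))  # 1
```

Answering the questions first tells you whether `last_run` is a start date, how often the sweep fires, and what the audit must prove.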
We have been helping customers with syncing data for over 15 years, and with moving data for over 25 years. We have helped customers with the ECTL process (Extract, Clean, Transform and Load), as well as with creating a data-quality focus to clean up the data before a project is implemented. We often get to discuss the history, policy and lifecycle of the data.
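A toy sketch of the four ECTL stages, with invented record layouts (a real pipeline would extract from IMAGE via ODBC and load a SQL target), might look like this:

```python
# A minimal ECTL (Extract, Clean, Transform, Load) sketch in plain Python.
# All field names are illustrative assumptions, not any product's schema.

def extract(source):
    """Extract: read raw records from the source."""
    yield from source

def clean(rows):
    """Clean: drop rows missing a customer id and strip stray whitespace."""
    for row in rows:
        if row.get("cust-id", "").strip():
            yield {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}

def transform(rows):
    """Transform: map short IMAGE-style item names to SQL column names."""
    for row in rows:
        yield {"customer_id": row["cust-id"], "customer_name": row["name"].upper()}

def load(rows, target):
    """Load: append the finished rows to the target table (a list here)."""
    target.extend(rows)
    return len(target)

source = [{"cust-id": "C01 ", "name": " Acme "}, {"cust-id": "", "name": "ghost"}]
warehouse = []
load(transform(clean(extract(source))), warehouse)
print(warehouse)  # [{'customer_id': 'C01', 'customer_name': 'ACME'}]
```

Keeping the stages separate is what makes a data-quality focus possible: the clean step can be audited and tightened without touching extraction or loading.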
We ask when and how many transactions of what kinds are produced, and how long full details must be kept. We want to know when and how summaries play into the trending and decision-making process. Customers need to know what data they want to share with suppliers and customers. Who needs the data internally, and what else do they need to do their jobs?
Whether you plan to synchronize data, move it one time, or do periodic refreshes, a framework is required before you can start the project.
We have been evolving our solutions to help customers with these problems since we first took data from an IMAGE database and built Oracle “loader” files in 1985. Our UDA (Universal Data Access) series was built with the philosophy that we should be database- and operating system-agnostic. We have evolved beyond the HP 3000 to include SQL Server, DB2, Sybase, Ingres, Cache, Eloquence, PostgreSQL and MySQL, all working with Unix, Linux, Windows, AS400 and more.
The objective is to allow “drag and drop” data transformation between any of the databases regardless of source and target platform. We typically pull or push data at the rate of 5-10 million records per hour.
We still support the HP 3000 with all of its file types: IMAGE, Allbase, KSAM and flat files. UDALink, which includes ODBC, JDBC and the easy-to-use MBFReporter capability, is used daily by thousands of users at hundreds of sites. We add new copies as customers discover that they need 64-bit clients to support ODBC access to the HP 3000.
For many customers we have also been replacing ODBCLink/SE, a product we licensed to HP from 1996 to 2006 for bundling into MPE/iX. Now that we are five years past supporting that product for HP, we find that customers moving to new versions of Windows Server or SQL Server need a new client to connect to the HP 3000 data source, or occasionally to an HP 9000 running Allbase. We continue to evolve the solution, and so have added XML, XLS, PDF, CSV and several self-describing file types as report output formats.
For the past 10 years, our 3000 customers have been able to use .NET applications with ODBC and with our RPC mechanism. The RPC mechanism makes XLs on an HP 3000 available to a Microsoft environment as if they were libraries (both COM and .NET work). It takes code compiled on the HP 3000 (in COBOL, C, C++, Pascal, Fortran and so on) and allows a Microsoft-based development environment to leverage the tried-and-true business logic without duplicating it. This goes beyond data, giving the 3000 more of a role in the architecture of new and current systems.
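To illustrate only the general idea of RPC — not MBFoster's mechanism, whose interfaces are not described here — the following is a minimal round trip using Python's standard library, where a client calls server-side logic as if it were a local function:

```python
# Illustrative only: a generic RPC round trip with Python's stdlib, standing in
# for the idea of reusing server-side business logic from a client environment.
# The function name and numbers are invented for the example.

import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def net_price(amount, discount_pct):
    """Stand-in for proven business logic that lives only on the server."""
    return round(amount * (1 - discount_pct / 100.0), 2)

# Bind to an ephemeral port and serve in a background thread.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(net_price)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client calls the remote logic as if it were a local library.
client = ServerProxy(f"http://127.0.0.1:{port}")
print(client.net_price(100.0, 15))  # 85.0
```

The design point is the same as in the paragraph above: the business logic stays where it was written and tested, and the newer environment calls into it rather than reimplementing it.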
The HP 3000 may be gone from the supported platform list for HP, but there exists a small cadre of dedicated companies who know the HP 3000 and will help customers who must homestead to get the most from their systems. Over the past 10 years since HP’s announcement of its plan to phase out 3000 support, MBFoster has continued to support its solutions for data access and delivery. We have added products and services that help the HP 3000 application environment. Beyond the data, MBFoster is helping customers with application support — we have expertise to help write reports or modify business logic in COBOL, Fortran, Powerhouse, C, C++, and other legacy languages.
If a customer does decide to move from an HP 3000, we have those services, too. We have helped customers move data since 1985 and transition applications since 2001. We also do a lot of work on planning the transition (contact us for our “build, buy or migrate” webinar) as well as on the decommissioning process: transferring data to the new application, first for testing and then for production cutover, and finally preserving data for historical and compliance purposes.
A legacy is a treasure handed down, and in the case of the HP 3000 the treasure is huge: a highly reliable system that rarely fails (mean time between reboots is most often measured in years) and reliably runs millions upon millions of transactions across a wide range of industries, from education through local government, healthcare, manufacturing, transportation, pharmaceuticals and retail. At MBFoster we are striving to sustain the HP 3000, and its legacy applications and data, as assets for our customers.
Whether it is a software product, a migration project, data services, or project management, MBFoster makes it easy to deliver the right information to the right person at the right time. We work with our customers to streamline IT business operations, reduce costs, improve delivery and grow revenues. Call us with questions at 800-ANSWERS (800-267-9377), find us on Facebook at https://www.facebook.com/MBFosterAssociates, or visit us on the Web at www.MBFoster.com.