The big trend in databases is the effort to XML-ize semi-structured data. This is like a big SOAP project in both senses of the word. Many of the control structures and procedural settings borrow from SOAP. But it is also an attempt by the major relational database vendors, such as IBM, Microsoft, Oracle and Sybase among others, to make their SQL-based silos more transparent and better integrated with the huge volume of semi-structured data that is rapidly accumulating (29% annual growth rate according to InfoWorld) in corporate data centers. The vision is to make XML the lingua franca between relational and semi-structured data sources such as Office files, email, plus huge gobs of text and numbers in various formats including backups, image files that can be scanned, application data that sits in specialized non-relational data formats (think statistical, CAD, architectural) and huge chunks of raw data from process control, logging and other realtime systems.
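To make the lingua franca idea concrete, here is a minimal sketch in Python using only the standard library. It is not any vendor's actual XML feature. The element names (sources, source, record, field), the orders table and the sample email are all hypothetical, chosen only for illustration. The sketch wraps one relational row and one email in the same XML envelope so that a single query can run over both.

```python
import sqlite3
import xml.etree.ElementTree as ET
from email import message_from_string


def rows_to_xml(conn, table):
    """Wrap every row of a relational table in <record> elements."""
    root = ET.Element("source", {"kind": "relational", "name": table})
    # The table name is trusted here; this is a sketch, not production code.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    for row in cursor:
        record = ET.SubElement(root, "record")
        for column, value in zip(columns, row):
            ET.SubElement(record, "field", {"name": column}).text = str(value)
    return root


def email_to_xml(raw_message):
    """Wrap one email (semi-structured: headers plus free text) the same way."""
    root = ET.Element("source", {"kind": "email"})
    msg = message_from_string(raw_message)
    record = ET.SubElement(root, "record")
    for header in ("From", "Subject"):
        ET.SubElement(record, "field", {"name": header.lower()}).text = msg[header]
    ET.SubElement(record, "field", {"name": "body"}).text = msg.get_payload().strip()
    return root


def build_demo():
    """Build a hypothetical envelope holding both kinds of source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
    conn.execute("INSERT INTO orders VALUES (1, 'Acme')")
    raw = "From: buyer@example.com\nSubject: Order 1\n\nPlease ship order 1 today.\n"
    envelope = ET.Element("sources")
    envelope.append(rows_to_xml(conn, "orders"))
    envelope.append(email_to_xml(raw))
    conn.close()
    return envelope


def find_records(envelope, text):
    """Return the kind of every source with a field that mentions text."""
    hits = []
    for source in envelope.findall("source"):
        if any(text in (field.text or "") for field in source.iter("field")):
            hits.append(source.get("kind"))
    return hits


if __name__ == "__main__":
    demo = build_demo()
    print(ET.tostring(demo, encoding="unicode"))
    print(find_records(demo, "1"))
```

Once both sources share one XML shape, the same search function works on the SQL row and the email alike, which is the integration the vendors are after. Real products would layer schemas, indexing and security on top of this.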
Other Semi-Structured Players
While the relational database players try to structure and integrate semi-structured data via XML, Autonomy has been working with semi-structured data directly and offers a wide range of services built on it: Web Content Management and eCommerce, Rich Media Resource+Asset Control, Archiving and Records Management, Business PaperFlow and Process Management, plus Pan-Enterprise Search with IDOL. Autonomy has added Cardiff, Interwoven and Telium, among other semi-structured software vendors, to broaden its portfolio. Like Google, Autonomy sits on a superfast and secure indexing and searching engine, its IDOL hardware/software layer. This is the point of integration and departure for its services. Worth a look if content management, search, discovery, and government-mandated data recovery on huge quantities of information are among your requirements.
Google has been in the massive semi-structured search business using heavily modified and optimized hardware and software. It has released various black boxes for sale which provide search engine facilities for medium to extremely large corporations. But of equal interest are the software tools that keep sprouting at an astonishing rate:
GWT (Google Web Toolkit) – gives Java coders the means to produce JavaScript UI runtimes, and it works
Google Gears – allows users to run Web pages offline with near-online functionality and then resync
Android – a mobile phone and netbook operating system with a snazzy UI and an app store
Google App Engine – allows you to run your Web applications on Google infrastructure
Google Language API – one of many APIs that allow access to Google tools and services
Google Base – an API that gives users access to Google Base, Google's semi-structured database engine
These are only six of literally four dozen APIs and products, many of which Google itself uses to manage and operate its asset base. Ignore these at your peril.
Microsoft has SharePoint Server, its fastest growing server product, which allows users to integrate Microsoft and some other applications and file types in a shared, collaborative environment accessible from the Internet and intranets. You cannot bypass the fastest growing piece of MS software.
Summary
The flood of semi-structured and unstructured data rising almost rampantly in data centers everywhere has become the major IT task of the past 2-4 years. IT managers can be assured it will stay at the top of the list as long as the data continues to grow at such rapid rates. The challenge will be not just to cope with the data but to manage it. That means being able not only to search and access the data but also to either retire it in a disciplined way (condense and back up) or effectively mine it (actively plumb through it for management and operational insights and information). So far the database community (read IBM, Microsoft, Oracle, Sybase, etc.) has delivered mixed results in managing structured data, so be careful whom you choose to manage the incoming waves of semi-structured and unstructured data.