Intelligent Enterprise

Better Insight for Business Decisions




December 05, 2000



Solving The Mystery

The technology behind the Human Genome Project may do more than unravel the secret of life. B2B exchanges will also benefit from these tools.

By Hank Simon

For that knowledge, biotech organizations are creating data warehouses that include other parameters along with the genetic database. And as complex as the genetic database is, the surrounding parameters are orders of magnitude more complicated. To collect the new knowledge about these parameters, biotech companies will build a data warehouse that scientists can use for knowledge discovery.

IT experts in the biotech industry will build a data warehouse that ideally contains DNA sequences and genetic information such as gene locations and functions, when genes are active, and genes' metabolic environment across different species. These experts will then define metadata, that is, information about the data in the databases, that biotech researchers can exploit for knowledge discovery.
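
To make the idea of a warehouse record with attached metadata concrete, here is a minimal Python sketch. The record type, its field names, and the descriptions are all assumptions made for illustration; they are not a schema used by any biotech organization.

```python
from dataclasses import dataclass, field, fields


@dataclass
class GeneRecord:
    """Hypothetical warehouse row; every field name here is an assumption."""
    species: str = field(metadata={"description": "organism the gene belongs to"})
    name: str = field(metadata={"description": "gene identifier"})
    sequence: str = field(metadata={"description": "DNA sequence"})
    location: str = field(metadata={"description": "position of the gene in the genome"})
    function: str = field(metadata={"description": "what the gene does"})
    active_when: str = field(metadata={"description": "conditions under which the gene is active"})
    environment: str = field(metadata={"description": "metabolic environment of the gene"})


def describe(record_type):
    """Return the metadata (data about the data) for each field of a record type."""
    return {f.name: f.metadata["description"] for f in fields(record_type)}


# Placeholder values only; this is not real genetic data.
record = GeneRecord(
    species="example species",
    name="example-gene",
    sequence="ATGGCC",
    location="chromosome 1 (placeholder)",
    function="placeholder function",
    active_when="placeholder condition",
    environment="placeholder environment",
)
catalog = describe(GeneRecord)
```

Keeping the descriptions next to the fields means a researcher's tool can list what the warehouse holds without reading the data itself.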

One potential solution is the use of a variable pattern match. Variable matches can range from simple wildcard searches that match any character or element at a specific location in a search string to complex routines that match elements based on the coexistence of other predefined elements. A complex routine might allow a search request such as "find all street corners that have traffic lights and do not allow right-on-red turns." With the appropriate data representation, a similar genetic pattern match can discover functional analogs among different genes from different species.
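
As a rough illustration of both kinds of variable match, the Python sketch below uses the standard IUPAC nucleotide ambiguity codes as wildcards and then combines motif searches into a coexistence rule. The function names, the motifs, and the sample sequence are assumptions invented for this example.

```python
import re

# IUPAC nucleotide ambiguity codes: each letter stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_motif(pattern, sequence):
    """Return the start index of every match, including overlapping ones."""
    body = "".join(IUPAC[ch] for ch in pattern)
    # A lookahead lets the regex engine report overlapping matches.
    regex = re.compile("(?=(" + body + "))")
    return [m.start() for m in regex.finditer(sequence)]


def matches_rule(sequence, required, forbidden):
    """Coexistence rule: every required motif occurs and no forbidden one does."""
    has_required = all(find_motif(p, sequence) for p in required)
    has_forbidden = any(find_motif(p, sequence) for p in forbidden)
    return has_required and not has_forbidden


# Made-up sequence for illustration; not real genetic data.
sequence = "GGTATAAAGCTATATAGC"
hits = find_motif("TATAWA", sequence)  # W matches A or T
ok = matches_rule(sequence, required=["TATAWA", "GCN"], forbidden=["CCCC"])
```

Here `hits` is `[2, 10]`, and `ok` is `True` because both required motifs occur and the forbidden one does not. The rule plays the same role as "has traffic lights and does not allow right-on-red turns" in the street-corner example.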

One pattern-matching method resembles how a spelling checker works. The old spelling checkers used a method called the Soundex algorithm, which assigns the same code to letters that sound alike, such as c, k, and q. The checker reduces a misspelled word to its sound code, matches that code against the codes of the words in a dictionary, and then returns the resulting list to the user. A similar routine for genetic pattern matching can map the various DNA sequences to the genetic equivalent of letter combinations.
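
The following Python sketch shows the general idea using the classic American Soundex rules: keep the first letter, encode the remaining consonants as digits, collapse adjacent repeats, and pad to four characters. The `suggest` helper and the word list are assumptions for illustration, not how any particular spelling checker was built.

```python
# Letters that sound alike share a digit; vowels and Y get no code.
CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex code of a word."""
    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break a run of identical codes
        code = CODES.get(ch, "")
        if code and code != previous:
            digits.append(code)
        previous = code  # a vowel resets the run, so a repeat after it counts
    return (first + "".join(digits) + "000")[:4]


def suggest(misspelled, dictionary):
    """Hypothetical helper: dictionary words that share the misspelling's code."""
    key = soundex(misspelled)
    return [word for word in dictionary if soundex(word) == key]


words = ["receive", "relieve", "recipe", "separate"]
candidates = suggest("recieve", words)
```

With this word list, `candidates` is `["receive", "recipe"]`: the code is coarse, so it returns a short list of sound-alike candidates rather than a single answer. A genetic analog would need its own table of equivalent elements in place of `CODES`.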

Imagine being able to use genetic information to design a genetic treatment for cancer or heart disease that is specific to you, or gaining immunity from diseases by eating a potato or banana rather than undergoing a series of shots. Genetic knowledge discovery will support these developments and others. With these goals in mind, knowledge-management techniques can create a general, genetic data warehouse model for medical, pharmaceutical, and agricultural knowledge discovery.

XML for Data Representation

Another technology to support the creation of a functional genetic data warehouse is XML. XML, like HTML, is a markup language that can be used to build Web pages. Unlike HTML, however, XML tags data according to what it means rather than how it should look, so any application can use it. It provides a general language for representing data in a standard format.

More flexible and robust than HTML, XML provides a method for defining the meaning, or semantics, of a document's contents. XML is already being used in genetics:

  • Bioinformatic Sequence Markup Language (BSML) graphically describes genetic sequences and methods for storing and transmitting encoded sequence and graphic information.
  • Biopolymer Markup Language (BIOML) is a document type definition (DTD) for the annotation of molecular biopolymer sequence information and structure data.
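
To show what tagging genetic data with XML could look like in practice, here is a small Python sketch using the standard library's ElementTree parser. The element and attribute names are invented for this example and are not taken from BSML or BIOML; the sequences are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these tags are NOT the BSML or BIOML vocabulary.
document = """
<genome organism="fruit fly">
  <gene name="hox-example" function="body-plan">
    <sequence>ATGGCC</sequence>
  </gene>
  <gene name="other-example" function="metabolism">
    <sequence>ATGTTT</sequence>
  </gene>
</genome>
"""

root = ET.fromstring(document)

# Because the data is tagged by meaning, a program can query it by meaning.
body_plan_genes = [
    gene.get("name") for gene in root.findall("gene[@function='body-plan']")
]
first_sequence = root.find("gene/sequence").text
```

Any application that understands the tags can run the same query, which is the property that makes XML attractive for sharing warehouse data.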

These markup languages, and similar ones, can represent the data needed to define a genetic data warehouse used for knowledge discovery. Tim Berners-Lee's description of an intelligent XML application is similar to a functional genetic data warehouse that can learn, remember, and make associations. For example, consider the gene function "hox," which contains the instructions for placement of body parts in different animal species. If the intelligent XML-based genetic data warehouse application has been taught that a "fruit fly" has "genes" and that "hox" is a "gene," then in the future, it should be able to determine that a "fruit fly" with "genes" of "hox" characteristics is a related concept, but with a slightly different syntax.

The data warehouse can learn to extend that concept to include other functions and genomes, such as humans, nematodes, and yeast. Now, the data warehouse understands not only the relationship between genome, gene, and hox, but also the relationship in meaning between "gene" and "hox" or other gene functions. Imagine this circle of meaning extending to all the genes that have been discovered and mapped - from humans and animals to plants and bacteria. It becomes a massive database of functional genetic knowledge, with associated information that is organized and machine-readable.
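
The kind of association described above can be sketched as a tiny knowledge base that stores "is a" and "has" facts and follows them to answer questions it was never told directly. The class, its method names, and the facts loaded into it are assumptions for illustration only; a real warehouse would need a far richer model.

```python
class KnowledgeBase:
    """Toy store of 'is a' and 'has' facts (hypothetical design)."""

    def __init__(self):
        self.broader = {}  # term -> broader concept, e.g. "hox" -> "gene"
        self.has = {}      # organism -> set of terms it is known to have

    def teach_is_a(self, term, concept):
        self.broader[term] = concept

    def teach_has(self, organism, term):
        self.has.setdefault(organism, set()).add(term)

    def has_concept(self, organism, concept):
        """True if the organism has the concept directly or through 'is a' links."""
        for term in self.has.get(organism, ()):
            seen = set()
            while term is not None and term not in seen:
                if term == concept:
                    return True
                seen.add(term)
                term = self.broader.get(term)
        return False

    def organisms_with(self, concept):
        return sorted(o for o in self.has if self.has_concept(o, concept))


kb = KnowledgeBase()
kb.teach_is_a("hox", "gene")
for organism in ("fruit fly", "human", "nematode"):
    kb.teach_has(organism, "hox")

inferred = kb.has_concept("fruit fly", "gene")  # never stated directly
sharing_hox = kb.organisms_with("hox")
```

The base was told only that each organism has "hox" and that "hox" is a "gene", yet it can conclude that a fruit fly has genes and list the organisms that share the function.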

These innovative uses of XML and knowledge discovery for functional genomics may well point the way for other industries facing the same challenge: discovering relationships when information is missing. As a comparison, B2B exchanges rely on an XML foundation. The "science" behind these massive billion-dollar efforts is still new. Knowledge discovery can help uncover profitable buyer and supplier trends in the procurement cycle. The biotechnology industry may unlock more than the secrets of your genetic makeup; it may be developing the tools that help you realize the full potential of your business.



Hank Simon (hank.simon@lmco.com) has more than 20 years of experience in IT. He has a Ph.D. in AI, has worked as a chemist, and is currently consulting and writing about XML, WAP, and Bluetooth Web technologies.




RESOURCES

Bioinformatic Sequence Markup Language (BSML)
Biopolymer Markup Language (BIOML)
Celera Genomics Group (a PE Corp. business)
Human Genome Project information
IBiomatics (a SAS Institute Inc. company)







