Solving The Mystery

The technology behind the human genome project may do more than unravel the secret of life. B2B exchanges will also benefit from these tools.

By Hank Simon

For that knowledge, biotech organizations are creating data warehouses that include other parameters along with the genetic database. As complex as the genetic database is, the surrounding parameters are orders of magnitude more complicated. To collect new knowledge about these parameters, biotech companies will build data warehouses that scientists can use for knowledge discovery.

IT experts in the biotech industry will build a data warehouse that ideally contains DNA sequences and genetic information such as gene locations and functions, when genes are active, and the genes' metabolic environment across different species. The IT expert will then define metadata that describes the data in the databases, which biotech researchers can exploit for knowledge discovery.

One potential solution is the use of a variable pattern match. Variable matches range from simple wildcard searches, which match any character or element at a specific location in a search string, to complex routines that match elements based on the coexistence of other predefined elements. A complex routine might allow a search request such as "find all street corners that have traffic lights and do not allow right-on-red turns." With the appropriate data representation, a similar genetic pattern match can discover functional analogs among different genes from different species.

One pattern matching method relates to how a spelling checker works. Older spelling checkers used a phonetic method called the Soundex algorithm, which groups letters that tend to sound alike, such as c, k, and q, under a shared code.
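As a rough illustration of this phonetic grouping, the sketch below implements the standard American Soundex coding in Python and uses it to find dictionary words that share a code with a misspelled word. The word list and the `build_index` and `suggest` helpers are hypothetical names chosen for illustration; this is not the routine any particular spelling checker used.

```python
from collections import defaultdict

# Standard American Soundex letter groups: letters in a group share one digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex code for word ('' if no letters)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = [letters[0]]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")    # its digit still suppresses repeats
    for ch in letters[1:]:
        digit = CODES.get(ch, "")       # vowels, H, W, Y have no digit
        if digit and digit != prev:
            code.append(digit)
        if ch not in "HW":              # H and W do not separate repeats
            prev = digit
    return ("".join(code) + "000")[:4]  # pad with zeros, cut to length 4


def build_index(words):
    """Group dictionary words by Soundex code (hypothetical helper)."""
    index = defaultdict(list)
    for w in words:
        index[soundex(w)].append(w)
    return index


def suggest(misspelled, index):
    """Return dictionary words sharing the misspelled word's code."""
    return index.get(soundex(misspelled), [])


# Illustrative word list, not taken from the article.
index = build_index(["gene", "genome", "protein", "sequence", "enzyme"])
print(suggest("protien", index))  # ['protein']
```

Because Soundex keeps the first letter unchanged, a misspelling with a wrong first letter would not be matched. A genetic analog would presumably swap the letter groups for groups of sequence elements treated as equivalent.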
The checker maps the letters of a misspelled word to various combinations of letters with the same sound value, matches those against a list of words in a dictionary, and then returns the resulting list to the user. A similar routine for genetic pattern matching can map the various DNA sequences to the genetic equivalent of letter combinations.

Imagine being able to use genetic information to design a treatment for cancer or heart disease that is specific to you, or gaining immunity from diseases by eating a potato or banana rather than undergoing a series of shots. Genetic knowledge discovery will support these developments and others. With these goals in mind, knowledge-management techniques can create a general genetic data warehouse model for medical, pharmaceutical, and agricultural knowledge discovery.

XML for Data Representation

Another technology that supports the creation of a functional genetic data warehouse is XML. XML, like HTML, can build Web pages. However, XML tags data in a way that any application can use. It provides a general language for representing data in a standard format. More flexible and robust than HTML, XML provides a method for defining the meaning, or semantics, of a document. XML is already being used in genetics, for example in the Bioinformatic Sequence Markup Language (BSML) and the Biopolymer Markup Language (BIOML) listed under Resources.
These markup languages, and similar ones, can represent the data needed to define a genetic data warehouse used for knowledge discovery. Tim Berners-Lee's description of an intelligent XML application resembles a functional genetic data warehouse that can learn, remember, and make associations.

For example, consider the gene function "hox," which contains the instructions for placement of body parts in different animal species. If the intelligent XML-based genetic data warehouse application has been taught that a "fruit fly" has "genes" and that "hox" is a "gene," then in the future it should be able to determine that a "fruit fly" with "genes" of "hox" characteristics is a related concept, but with a slightly different syntax. The data warehouse can learn to extend that concept to include other functions and genomes, such as humans, nematodes, and yeast. Now the data warehouse understands not only the relationship between genome, gene, and hox, but also the relationship in meaning between "gene" and "hox" or other gene functions.

Imagine this circle of meaning extending to all the genes that have been discovered and mapped, from humans and animals to plants and bacteria. It becomes a massive database of functional genetic knowledge, with associated information that is organized and machine-readable.

These innovative uses of XML and knowledge discovery for functional genomics may well point the way for other industries facing the same challenge: discovering relationships when information is missing. As a comparison, B2B exchanges rely on an XML foundation. The "science" behind these massive billion-dollar efforts is still new. Knowledge discovery can help uncover profitable buyer and supplier trends in the procurement cycle. The biotechnology industry may unlock more than the secrets of your genetic makeup; it may be developing the tools that help you realize the full potential of your business.
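The fruit fly and hox example can be made concrete with a small sketch. It assumes a made-up XML layout (the `warehouse`, `genome`, and `gene` tags, their attributes, and the gene identifiers are all hypothetical, not BSML or BIOML) and shows how facts of the form "organism has gene" and "gene has function" can be chained to answer which organisms carry a given function. It is a minimal illustration of the association idea, not a description of any existing warehouse.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; identifiers are placeholders, not real gene names.
SAMPLE = """<warehouse>
  <genome organism="fruit fly">
    <gene id="gene-a" function="hox"/>
    <gene id="gene-b" function="other"/>
  </genome>
  <genome organism="human">
    <gene id="gene-c" function="hox"/>
  </genome>
  <genome organism="yeast">
    <gene id="gene-d" function="other"/>
  </genome>
</warehouse>"""


def load_facts(xml_text):
    """Turn the XML into (subject, relation, object) facts."""
    facts = set()
    root = ET.fromstring(xml_text)
    for genome in root.findall("genome"):
        organism = genome.get("organism")
        for gene in genome.findall("gene"):
            gene_id = gene.get("id")
            facts.add((organism, "has_gene", gene_id))
            facts.add((gene_id, "has_function", gene.get("function")))
    return facts


def organisms_with_function(facts, function):
    """Chain has_gene and has_function to find organisms with a function."""
    genes = {s for s, rel, o in facts
             if rel == "has_function" and o == function}
    return sorted({s for s, rel, o in facts
                   if rel == "has_gene" and o in genes})


facts = load_facts(SAMPLE)
print(organisms_with_function(facts, "hox"))  # ['fruit fly', 'human']
```

Storing the relationships as separate facts, rather than as one fixed record per organism, is what lets new genomes and functions be added without changing the query.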
Hank Simon (hank.simon@lmco.com) has more than 20 years of experience in IT. He has a Ph.D. in AI, has worked as a chemist, and is currently consulting and writing about XML, WAP, and Bluetooth Web technologies.

RESOURCES

Bioinformatic Sequence Markup Language (BSML)
Biopolymer Markup Language (BIOML)
Celera Genomics Group (a PE Corp. business)
Human Genome Project information
IBiomatics (a SAS Institute Inc. company)