W3C Semantic Web Frequently Asked Questions (2)
3. How do I participate in the Semantic Web?
The Semantic Web is about a web of data. The data itself can reside in databases, spreadsheets, Wiki pages, or indeed traditional web pages.
The challenge is to develop tools that can “export” these data into RDF form: RDF plays the role of a common model, as a kind of “glue” to integrate the data. That does not mean that the data must be physically converted into RDF form and stored in, say, RDF/XML. Instead, automatic procedures (for example, SQL to RDF converters for relational databases, or GRDDL processors for XHTML files with microformats) can produce RDF data on the fly in answer to, e.g., queries. RDF data may also be included in the data via other tools (e.g., Adobe’s XMP data that gets automatically added to JPEG images by Photoshop). Authoring tools also exist to develop, e.g., ontologies on a high level instead of editing the ontology files directly. Of course, direct editing of RDF data is sometimes necessary, but it can be expected to become less and less prevalent as smarter editors come to the fore.
Clearly, lots of development is still to be done in this area, and it is a subject of active Research and Development. The goal is to reuse, as much as possible, existing data in its existing form, and minimize the RDF data that has to be created manually.
The Semantic Web provides an application framework that extends the current Web rather than replacing it. That also means that the current infrastructure of firewalls, various levels of protection, encryption, etc., remains in place. If, for whatever reason (privacy, business, etc.), the data should be kept behind the firewall on the Intranet rather than being in the open, this just means that that particular Semantic Web application operates on the Intranet. This is not unlike the development of the traditional Web, the usage of Web Services, etc.: a number of applications were developed to be used behind corporate firewalls; some of them migrated later to the full Web, while others stayed behind the firewall. The same is valid for Semantic Web applications.
There are several lists on the Web that give a more-or-less comprehensive overview of the various available tools. There is a Wiki page on the W3C ESW Wiki site that is maintained by the W3C staff as well as the community at large. This page includes references to programming environments, validators that can be used to validate RDF/XML data or OWL ontologies, SPARQL endpoints, specialized editors, and triple databases. It also includes references to other lists, like Dave Beckett's Resource Description Framework (RDF) Resource Guide or the tool list maintained at the Freie Universität Berlin.
In general, most of the tools are already of good quality. In the open source domain, Jena or Redland, for example, can easily be compared to Xerces in their widespread usage and richness of features; databases like Sesame are also in widespread use and have undergone very thorough development in the past few years. There are more and more commercial tools, including editors, specialized databases, content management systems, ontology creation and validation tools, etc. The Wiki page on the W3C ESW Wiki site gives a good overview of most of those.
Obviously, there is room for improvement. The Semantic Web is a younger technology than XML, and it still needs time to catch up and have tools of the same maturity and efficiency as those of the XML World. However, huge improvements have already been made in the past few years in all areas, and large-scale enterprise deployment is also happening already. In general, availability of tools is no longer a reason for not developing Semantic Web applications…
Unfortunately, it is currently not possible to incorporate full RDF into XHTML without violating the validity of the resulting XHTML, except for the usage of the meta and the link elements in the header.
The best solution is to store the RDF separately and use the URIs to refer to the XHTML page and the link element in the XHTML page to refer to the RDF content. This technique is often called an RDF autodiscovery link and is used by a number of tools already.
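As a rough illustration of how a tool might follow such an autodiscovery link, the sketch below scans a page header for link elements whose type is application/rdf+xml and collects their targets. The sample page, its URL, and the names RDFLinkFinder and find_rdf_links are made up for this example; real tools may also check the rel attribute or accept other media types.

```python
from html.parser import HTMLParser


class RDFLinkFinder(HTMLParser):
    """Collect href values of <link> elements that advertise RDF/XML content."""

    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link":
            return
        attributes = dict(attrs)
        if attributes.get("type") == "application/rdf+xml" and attributes.get("href"):
            self.rdf_links.append(attributes["href"])


def find_rdf_links(page):
    """Return the list of RDF autodiscovery targets found in an (X)HTML string."""
    finder = RDFLinkFinder()
    finder.feed(page)
    return finder.rdf_links


# Hypothetical page: the RDF lives in a separate file and is only referenced here.
SAMPLE_PAGE = """<html><head>
<title>Example</title>
<link rel="stylesheet" type="text/css" href="style.css" />
<link rel="meta" type="application/rdf+xml" href="http://example.org/page.rdf" />
</head><body><p>Hello</p></body></html>"""

print(find_rdf_links(SAMPLE_PAGE))
```

The page itself stays valid XHTML because it carries only a reference; the RDF content is fetched separately from the discovered URI.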
However, work is going on for a better integration of RDF into documents. The GRDDL Working Group has recently developed a “bridge” to the microformats approach, and the Semantic Web Deployment group’s work on RDFa develops an additional XHTML 1.1 module that makes it possible to use virtually any RDF vocabulary to annotate the XHTML content. Finally, eRDF (developed by Talis) offers a formalism somewhere between the two: one can add general RDF data to an (X)HTML page without problems with validity, although with restrictions on the type of RDF vocabularies that can be used this way.
This is one of the active areas of R&D, and no final answer is yet available. In general, methods exist to convert RDF queries (e.g., in SPARQL) into SQL queries on the fly; i.e., the RDB looks like an RDF store when queried by an RDF tool. The details of the mapping from relational tables to RDF notions are usually described for a specific database using a small ontology and/or a set of rules; this is the only manual information to be generated for the conversion. General solutions are beginning to emerge, but work still has to be done (and is part of the future plans of W3C). See the W3C Wiki page for further details.
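To make the idea of a hand-written mapping concrete, here is a minimal sketch that exposes the rows of a relational table as RDF-style triples on the fly. It covers only the mapping step, not the rewriting of SPARQL into SQL. The table layout, the example.org namespace, and the COLUMN_TO_PROPERTY dictionary are assumptions made for illustration, not part of any W3C mapping language.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace

# The only manual information: which column corresponds to which property URI.
COLUMN_TO_PROPERTY = {
    "title": BASE + "vocab/title",
    "author": BASE + "vocab/author",
}


def rows_to_triples(connection):
    """Yield (subject, predicate, object) triples for every row of the 'book' table."""
    cursor = connection.execute("SELECT id, title, author FROM book ORDER BY id")
    for book_id, title, author in cursor:
        subject = f"{BASE}book/{book_id}"  # one URI per row, built from the primary key
        yield (subject, COLUMN_TO_PROPERTY["title"], title)
        yield (subject, COLUMN_TO_PROPERTY["author"], author)


def build_sample_database():
    """Create a small in-memory database with made-up content."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
    connection.execute("INSERT INTO book VALUES (1, 'Example Title', 'Example Author')")
    connection.commit()
    return connection


triples = list(rows_to_triples(build_sample_database()))
for triple in triples:
    print(triple)
```

The data stays in the relational store; the triples are generated only when they are asked for.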
Dave Beckett's Resource Description Framework (RDF) Resource Guide gives a quite comprehensive list of references to Semantic Web related articles. The home page of the Semantic Web Activity lists all the recommendations, gives references to some of the presentations, articles, etc, that have been given by the W3C staff or the members of the working groups on the subject. A separate page lists a number of tutorials that might be of interest.
The (now defunct) Semantic Web Best Practices and Deployment Working Group has produced a number of notes that might be useful when developing ontologies, setting up servers to serve RDF data, using XML Schema datatypes with RDF, etc. The newly chartered Semantic Web Deployment Working Group will continue developing similar documents.
A number of books have also been published. A list of books is given on W3C’s Wiki site, comprising (at this moment) over 40 books in different languages, published by major publishers like O’Reilly, MIT Press, Cambridge University Press, Springer Verlag, …
There are a number of conference series that are either dedicated to the Semantic Web or which always have a significant Semantic Web track. The best known are:
- The “International Semantic Web Conference” series is a yearly event that publishes its proceedings with Springer (the proceedings have been online since 2006). While these conferences typically circulate around the globe, the “European Semantic Web Conference” and the “Asian Semantic Web Conference” series are held in Europe and in Asia, respectively.
- The “International World Wide Web Conference” is a major yearly conference on World Wide Web Technologies in general, which always has a strong Semantic Web track both for the academic and the developers’ communities. Look at the page of the organizing committee for further details on these conferences and links to their proceedings.
There are several portals that collect information on existing ontologies. A good example is SchemaWeb. Another one is the “PingTheSemanticWeb” service, which collects information about new RDF documents on the Web based on “pings” sent by applications generating data and on RDF autodiscovery links found by people browsing the Web. It currently contains information about ~7 million RDF files. There is also a search engine, called Swoogle, which specializes in searching Semantic Web documents.
You can have a human-readable display of RDF data by using RDF data browsers like the Tabulator, Disco, or the OpenLink RDF Browser, and web browser extensions like PiggyBank or the Semantic Radar. While end users will not have a need to see Semantic Web data (instead they will benefit from better information systems built on top of it) it may be helpful to developers to be aware of Semantic Web data directly so that they can use this information in their applications.
The W3C Semantic Web Interest Group is one of those and probably the best place to join first. It is a public mailing list and is also active on the #swig IRC channel on Freenode.
There are also various grass-root communities that concentrate on some specific aspects or goal around the Semantic Web. As an example:
- DOAP: a project to describe information about open-source software projects
- FOAF: a project to describe information about people and their social relations (see also the #foaf IRC channel on Freenode)
- SIOC: a project to describe information about online community sites (blogs, bulletin boards, …) and use this information to connect these sites together.
- Linking Open Data on the Semantic Web: a project whose goal is to make various open data sources available on the Web as RDF and to set RDF links between data items from different data sources.
Another source is the PlanetRDF Blog aggregator, which aggregates the blogs of a number of active Semantic Web developers from around the World.
4. Questions on RDF, Ontologies, SPARQL, Rules…
RDF—the Resource Description Framework—is a standard model for data interchange on the Web. RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed.
RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a “triple”). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.
This linking structure forms a directed, labelled graph, where the edges represent the named link between two resources, represented by the graph nodes. This graph view is the easiest possible mental model for RDF and is often used in easy-to-understand visual explanations.
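The graph view can be sketched in a few lines of code. The example below stores triples as plain tuples and groups them into a directed, labelled graph; the example.org URIs and the names TRIPLES and build_graph are made up for illustration.

```python
from collections import defaultdict

EX = "http://example.org/"  # hypothetical namespace

# Each triple is (subject, predicate, object); the predicate URI names the link.
TRIPLES = [
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "alice", EX + "name", "Alice"),
    (EX + "bob", EX + "name", "Bob"),
]


def build_graph(triples):
    """Map each subject node to its outgoing (edge label, target node) pairs."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(TRIPLES)
for node, edges in graph.items():
    for label, target in edges:
        print(f"{node} --[{label}]--> {target}")
```

Every printed line corresponds to one edge of the graph, that is, to one triple.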
The “RDF Primer” is good material for further reading on RDF.
RDF statements (or triples) can be encoded in a number of different formats, whether XML based (e.g., RDF/XML) or not (Turtle, N-Triples, …). In general, it does not really matter which of these formats (or serializations) is used to express data: the information is represented in RDF triples and the particular format is only the “syntactic sugar”. Most RDF tools can parse several of these serialization formats.
Compare to “numbers” as opposed to “numerals”. Numbers are mathematical concepts; numerals are a representation thereof using Roman, Arabic, hexadecimal, octal, etc, representations. Some of those representations (like Roman) may be very complicated, some of those may be simpler or more familiar, but they all represent the same abstract concept.
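The point that the triples stay the same while the syntax varies can be illustrated with a small sketch that writes one set of triples in two notations, N-Triples and Turtle. The example.org namespace, the helper names, and the rule "anything starting with http:// is a URI, everything else is a plain literal" are simplifications assumed for this example; a real serializer handles datatypes, language tags, blank nodes, and full escaping.

```python
EX = "http://example.org/"  # hypothetical namespace

TRIPLES = [
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "alice", EX + "name", "Alice"),
]


def is_uri(term):
    # Simplification for this sketch only.
    return term.startswith("http://")


def escape_literal(text):
    # Minimal escaping: backslashes and double quotes.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(triples):
    """One full statement per line, every URI written out in angle brackets."""
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if is_uri(o) else f'"{escape_literal(o)}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def to_turtle(triples, prefix="ex", namespace=EX):
    """Same statements, with URIs in the namespace shortened to prefixed names."""
    def short(term):
        if is_uri(term) and term.startswith(namespace):
            local_name = term[len(namespace):]
            return f"{prefix}:{local_name}"
        if is_uri(term):
            return f"<{term}>"
        return f'"{escape_literal(term)}"'

    lines = [f"@prefix {prefix}: <{namespace}> ."]
    for s, p, o in triples:
        lines.append(f"{short(s)} {short(p)} {short(o)} .")
    return "\n".join(lines)


print(to_ntriples(TRIPLES))
print()
print(to_turtle(TRIPLES))
```

Both outputs describe exactly the same two triples; only the "numerals" differ.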
No. The fundamental model of RDF is independent of XML. RDF is a model describing qualified (or named) relationships between two (Web) resources, or between a Web resource and a literal. At that fundamental level, the only commonality between RDF and the XML World is the usage of the XML Schema datatypes to characterize literals in RDF.
Note that one of the serialization formats of RDF is indeed based on XML (RDF/XML), and this is probably the most widely used format today. But others exist, see the separate question on RDF representation.
The Semantic Web standards follow the design principles of the Web in order to allow the growth of a planet-wide collection of semantically-rich data. The key element of this design is the use of Web addresses (URIs) to name things. Because the meaning of a term in a language without central control becomes established by its consistent use to achieve the same effect, and URIs are used around the World to access web pages, the Web is used to establish globally-shared meaning for URIs in the Semantic Web. (This is what people mean when they say RDF URIs are “grounded” in the Web.)
As with the Web in general, this approach allows the Semantic Web to grow and evolve without any central control or authority, but while still maintaining as much consistency and authorial control as needed for particular applications or particular enterprises. The techniques for doing all this are still evolving, but ideally whenever anyone sees a Semantic Web URI they can use it in their browser and see authoritative documentation about its use. Moreover, whenever some software encounters a URI in a Semantic Web context, it can dereference it and find an ontology which precisely specifies how the term is related to other terms. The software may thus learn and exploit new terms which are synonymous with terms it already knows, or related in more complex and useful (but logically precise) ways.
All this results in the ability to find and correctly merge data from multiple sources, sometimes even when they are provided with different ontologies.
“In the Semantic Web, it is not the Semantic which is new, it is the Web which is new” Chris Welty, IBM
The W3C Data Access Working Group has developed the SPARQL Query Language. SPARQL defines queries in terms of graph patterns that are matched against the directed graph representing the RDF data. SPARQL contains capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions. The result of the match can also be used to construct new RDF graphs using separate graph patterns.
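A toy matcher can show what "matching a graph pattern against the data" means. The sketch below handles only a conjunction of required triple patterns, with variables written as strings starting with "?"; optional patterns, disjunctions, filters, and graph construction are left out. The data, the example.org namespace, and the function names are assumptions for illustration, and this is not how a real SPARQL engine is implemented.

```python
EX = "http://example.org/"  # hypothetical namespace

TRIPLES = [
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "bob", EX + "name", "Bob"),
    (EX + "alice", EX + "name", "Alice"),
]


def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`; return None on failure."""
    new_binding = dict(binding)
    for pattern_term, value in zip(pattern, triple):
        if pattern_term.startswith("?"):
            # A variable must keep the value it was bound to earlier, if any.
            if new_binding.setdefault(pattern_term, value) != value:
                return None
        elif pattern_term != value:
            return None
    return new_binding


def query(patterns, triples):
    """Return every variable binding that satisfies all patterns together."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Roughly: SELECT ?friend ?name WHERE { ex:alice ex:knows ?friend . ?friend ex:name ?name }
PATTERNS = [
    (EX + "alice", EX + "knows", "?friend"),
    ("?friend", EX + "name", "?name"),
]

results = query(PATTERNS, TRIPLES)
print(results)
```

The shared variable ?friend is what joins the two patterns: only bindings consistent across both survive.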
SPARQL can be used as part of a general programming environment, like Jena, but queries can also be sent as messages to remote SPARQL endpoints using the companion technologies SPARQL Protocol and SPARQL Query Results in XML. Using such SPARQL endpoints, applications can query remote RDF data and even construct new RDF graphs, without any local processing or programming burden. For more questions on SPARQL, see also the separate FAQ on SPARQL.
SPARQL is a query language developed for the RDF data model; queries themselves look and act like RDF. That is, the queries are independent of the physical representation of the RDF data (the structure of the databases, their representation in an RDF/XML file, etc.). If a query were done via, for example, XQuery, the application would have to know exactly how that particular RDF data is represented in RDF/XML (and RDF/XML is only one of the possible serializations of RDF data).
Ontologies define the concepts and relationships used to describe and represent an area of knowledge. Ontologies are used to classify the terms used in a particular application, characterize possible relationships, and define possible constraints on using those relationships. In practice, ontologies can be very complex (with several thousands of terms) or very simple (describing one or two concepts only).
Not all relationships can be expressed in terms of ontologies, however. The goal of the current work on rules at W3C is to provide an alternative framework to express logical constraints on relationships.
An example for the role of ontologies or rules on the Semantic Web is to help data integration when, for example, ambiguities may exist on the terms used in the different data sets, or when a bit of extra knowledge may lead to the discovery of new relationships.
A general example may help. A bookseller may want to integrate data coming from different publishers, possibly from different countries. The data can be imported into a common RDF model, e.g., by using converters on the publishers’ databases. However, one database may use the term “author”, whereas the other may use the French term “auteur”. To make the integration complete, an extra “glue” should be added to the RDF data, describing the fact that the relationship described as “author” is the same as “auteur”. This extra piece of information is, in fact, an ontology, albeit an extremely simple one.
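The bookseller scenario can be sketched as follows. The two publisher namespaces, the book URIs, and the author names are invented for the example, and the "glue" is reduced to a dictionary stating that one property is the same as another. In an OWL-based setup this kind of statement could be expressed with owl:equivalentProperty; the dictionary here is only a stand-in.

```python
EN = "http://publisher-a.example/terms/"  # hypothetical vocabulary of publisher A
FR = "http://publisher-b.example/terms/"  # hypothetical vocabulary of publisher B

DATA = [
    ("http://example.org/book/1", EN + "author", "Author One"),
    ("http://example.org/book/2", FR + "auteur", "Auteur Deux"),
]

# The "glue": a one-statement ontology saying that "auteur" means the same as "author".
SAME_PROPERTY = {FR + "auteur": EN + "author"}


def integrate(triples, same_property):
    """Rewrite every predicate to its preferred equivalent, if one is declared."""
    return [(s, same_property.get(p, p), o) for s, p, o in triples]


def authors(triples):
    """All values of the 'author' property, in sorted order."""
    return sorted(o for s, p, o in triples if p == EN + "author")


print(authors(DATA))                            # only publisher A's book is found
print(authors(integrate(DATA, SAME_PROPERTY)))  # both books are found
```

Without the glue a query on “author” misses the French data; with it, the two data sets behave as one.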
Broadly speaking, inference on the Semantic Web can be characterized by discovering new relationships. As described elsewhere in this FAQ, the data is modeled as a set of (named) relationships between resources. “Inference” means that automatic procedures can generate new relationships based on the data and based on some additional information in the form of an ontology or a set of rules. Whether the new relationships are explicitly added to the set of data, or are returned at query time, is simply an implementation issue.
A simple example may help. The data set to be considered may include the relationship (Flipper isA Dolphin). An ontology may declare that “every Dolphin is also a Mammal”. That means that a Semantic Web program understanding the notion of “X is also Y” can add to the set of relationships the statement (Flipper isA Mammal), although that was not part of the original data. One can also say that the new relationship was “discovered”.
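The Flipper example can be written as a tiny forward-chaining sketch: one rule, applied repeatedly until no new relationship appears. The predicate names isA and subClassOf are plain strings chosen for this illustration (a real system would use URIs such as those of RDF Schema), and the extra class Animal is added here only to show that the rule chains.

```python
IS_A = "isA"
SUBCLASS_OF = "subClassOf"

DATA = {("Flipper", IS_A, "Dolphin")}

ONTOLOGY = {
    ("Dolphin", SUBCLASS_OF, "Mammal"),
    ("Mammal", SUBCLASS_OF, "Animal"),  # extra statement, assumed for the example
}


def infer(data, ontology):
    """Apply (x isA C) and (C subClassOf D) => (x isA D) until nothing new appears."""
    known = set(data)
    changed = True
    while changed:
        changed = False
        for instance, predicate, cls in list(known):
            if predicate != IS_A:
                continue
            for sub, relation, sup in ontology:
                if relation == SUBCLASS_OF and sub == cls:
                    new_triple = (instance, IS_A, sup)
                    if new_triple not in known:
                        known.add(new_triple)
                        changed = True
    return known


inferred = infer(DATA, ONTOLOGY)
for triple in sorted(inferred):
    print(triple)
```

Here the new relationships are materialized into a set; computing them at query time instead would give the same answers, which is why the text calls the choice an implementation issue.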
It depends on the application. The answer on the role of ontologies and/or rules includes a very simple ontology example. Some applications may decide not to use even such small ontologies, and rely on the logic of the application program. Some applications may choose to use very simple ontologies like the one described, and let a general Semantic Web environment use that extra information to identify the terms. Some applications need an agreement on common terminologies, without any rigor imposed by a logic system. Finally, some applications may need more complex ontologies with complex reasoning procedures. It all depends on the requirements and the goals of the applications.
The current Semantic Web technologies offer a large palette of languages to describe simple or complex terminologies: RDF Schemas, SKOS, or various dialects of OWL (OWL Lite, OWL DL, OWL Full). These technologies differ in expressiveness but also in complexity, so applications have a choice: RDF Schemas represent the simplest ontology level and OWL Full the most complex one, while SKOS suits less rigorous terminologies such as glossaries. Applications also have the choice not to use any of those; the usage of ontologies is not a requirement for Semantic Web applications.
Note that there is active development in defining other “dialects” of ontology languages (see acronyms such as pD*, OWL Tiny, OWL Lite-, …) targeting a minimal level of ontology that might be just a little bit more expressive than RDF Schemas. The general goal is to minimize the weight of using ontologies in Semantic Web applications. Also, the current work on rules at W3C may lead, eventually, to the alternative of using some simple rules instead of (or in addition to) ontologies.
No. What the Semantic Web technologies do is define the “languages” with well understood rules and internal semantics, i.e., RDF Schemas, various dialects of OWL, or SKOS. Which of those formalisms is used (if any) and what is “expressed” in those languages is entirely up to the applications. Ontologies may be developed by small communities, from “below”, so to speak, and shared with other communities.
Obviously, that would not be feasible. If ontologies are used, they can come from anywhere and be mixed freely. In fact the “ethos” of the Semantic Web is to share and reuse as much as possible, and a lot of work is being done to semi-automatically bridge different vocabularies. Typical Semantic Web applications mix ontologies developed by different communities on the Web, like the Dublin Core metadata, FOAF (friend-of-a-friend) terms, etc.
The Semantic Web’s attitude to ontologies is no more than a rationalization of actual data-sharing practice. Applications can and do interact without achieving or attempting to achieve global consistency and coverage. A system that presents a retailer’s wares to customers will harvest information from suppliers’ databases (themselves likely to use heterogeneous formats) and map it onto the retailer’s preferred data format for re-presentation. Automatic tax return software takes bank data, in the bank’s preferred format, and maps them onto the tax form. There is no requirement for global ontologies here. There isn’t even a requirement for agreement or global translations between the specific ontologies being used except in the subset of terms relevant for the particular transaction. Agreement need only be local, but adoption of vocabulary from existing ontologies facilitates data sharing and integration.
The real difficulty, when developing an ontology, is to understand the problem that has to be modeled and to reach agreement at the community level. RDF Schemas and/or OWL provide a framework to formalize those ontologies in a specific language; the time and energy needed to learn and use them is only a fraction of the time needed to develop the ontology itself, i.e., to understand the terms and the relationships of a given area of knowledge and agree on them with your peers. Ontology development tools, like Protégé or SWOOP, hide most of the syntax complexity and let the user concentrate on the real representation issues.
In general, ontologies should be created and maintained by various, specialized communities. The preference of W3C is to let these other communities develop their own ontologies; this is the case for well known ontologies like the Dublin Core, FOAF, DOAP, etc.
There are cases, however, when ontologies are developed at W3C. This is the case when, for example, another W3C technology needs its own, specialized ontology (CC/PP or EARL are good examples), when W3C feels that the existence of a particular ontology is crucial for the advancement of the Semantic Web, or when the community prefers to use, for example, the facilities offered by the Incubator Activity of W3C.
Major datasets (or access to existing datasets) are created quite often these days. Just some examples:
- The DBpedia community effort to query Wikipedia like a database (see also a more detailed blog entry on this project)
- IngentaConnect bibliographic metadata (around 200 million triples!)
- RDF/OWL representation of Wordnet
- eBusiness ontology for products and services: eClassOwl
- the Gene Ontology, to describe gene and gene product attributes in any organism
- protein sequence and annotation data: UniProt
- Geonames Ontology and associated RDF data: geographical features (e.g., information on the city of Berlin) encoded in RDF
Note also that one of the “Community Projects” sponsored by the W3C Semantic Web Education and Outreach Interest Group, namely the “Linking Open Data on the Semantic Web” project, aims at making various open data sources available on the Web as RDF and to set RDF links between data items from different data sources.