DDC and automatic classification

In response to news that the Deutsche Nationalbibliothek (DNB) will be replacing manual Dewey classification with automatic Dewey classification, EDUG sent the following letter. The response from the DNB is published below it. Both letters may be useful to other institutions investigating the use of automatic classification.

 

Dear Dr. Niggemann,

I am writing on behalf of the European DDC Users Group (EDUG) in reaction to the announcement [1] regarding the future subject cataloguing procedure for publications at the German National Library. EDUG represents DDC users in over 30 European institutions, including nine national libraries, and we work to promote efficient usage and leverage of DDC classification. It is our contention that a uniform use of the system across boundaries allows for more reuse of numbers and leads to more efficient use of cataloguing resources. To this end, we recommend the use of full DDC numbers and the registration of DDC editions in bibliographic records to facilitate copy-cataloguing of DDC classification and to allow for better leverage of DDC numbers for search and navigation in end-user discovery systems.

We recognize, however, that the massive increase in digital resources has made the provision of full DDC numbers for all resources a near impossible task. We applaud your work in automatic classification and your efforts to provide DDC short numbers to all resources regardless of format, but hope you will consider the following points before discontinuing deep classificatory cataloguing with full DDC numbers for print resources.

 

1. The use of both DDC short numbers and full numbers is fully compatible

The underlying purpose of any library classification is to save the time of the reader by clustering resources according to their aboutness and displaying them in a helpful sequence. One of the key strengths of the DDC is the ability to cluster resources with varying degrees of specificity. Full DDC numbers are nested within increasingly shorter numbers representing broader subject categories. In other words, the use of both DDC short numbers and full numbers in the same catalog is fully compatible. This provides an excellent groundwork for the machine-readable navigation of resources in online public access catalogs, which can serve to counteract the heterogeneity caused by varying degrees of specificity: broad searches can be supplemented by navigating to specific subjects (where recall will be lower but precision higher) and vice versa. The discontinuation of providing full numbers, on the other hand, will only serve to eliminate the use of the DDC for the search and navigation of specific subjects.

For example, at the time of writing, a search for the DDC full number 618.92238 (asthma from a pediatric perspective) in the German National Library catalog yields 10 resources. This cluster of 10 belongs to a broader cluster represented by the DDC short number 618.92 (pediatrics), which yields over 13,000 resources. A discontinuation of full numbers would render the DDC practically useless to users looking to navigate and retrieve resources on specific subjects such as asthma and other respiratory-related illnesses within pediatrics.
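The nesting of full numbers within short numbers can be illustrated with a minimal Python sketch. It assumes that hierarchy is expressed purely by notation prefix, which holds for the 618.92 example above but not for every part of the DDC, and it ignores the zero padding of numbers shorter than three digits. The record titles and the function names `normalize`, `is_within` and `cluster` are hypothetical and do not come from any catalog system.

```python
def normalize(ddc: str) -> str:
    """Drop the decimal point so that nesting becomes a plain string prefix."""
    return ddc.replace(".", "")


def is_within(number: str, broader_number: str) -> bool:
    """Return True if `number` is equal to or nested under `broader_number`.

    Assumption: hierarchy is expressed by notation prefix only.
    """
    return normalize(number).startswith(normalize(broader_number))


def cluster(records, number):
    """Return the titles of all records classed at or below `number`."""
    return [title for title, ddc in records if is_within(ddc, number)]


# Hypothetical records: (title, DDC number). Not real catalog data.
records = [
    ("Record A", "618.92238"),  # full number: asthma in pediatrics
    ("Record B", "618.92"),     # short number: pediatrics
    ("Record C", "618.922"),    # intermediate level
    ("Record D", "371.1024"),   # unrelated: classroom management
]

broad = cluster(records, "618.92")      # Records A, B and C
narrow = cluster(records, "618.92238")  # Record A only
```

A record carrying the full number is found by both the broad and the narrow query, whereas a record carrying only the short number can never be found by the narrow one. This is the asymmetry the letter describes.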

2. DDC full numbers include built numbers, which provide additional subject access points

The DDC is a synthetic system in which numbers may be built upon in order to express secondary facets of a subject. This allows for the potential clustering and navigation of documents across other axes than just hierarchy, such as by place or historical period. Indeed, with DDC short numbers, historical periods are usually not represented.

The building of numbers is a time-consuming process for classifiers, and we have recently begun to discuss methods for making it more resource efficient while retaining the potential built numbers provide for end-user discovery. For example, some institutions now rely on the crowdsourcing of built numbers to facilitate their re-use, while others have begun to register full unbuilt numbers with independent number components in bibliographic records. We are grateful for the German National Library’s inspiring efforts to register and take advantage of number components. The EDUG will continue to explore the ramifications of the various techniques on search and retrieval. 

3. Full DDC numbers may serve as a multilingual hub in the Semantic Web

We are at a precarious time in bibliographic cataloguing as we slowly move away from MARC and look towards Linked Data solutions. The members of EDUG believe that the DDC has a central role to play in the bibliographic representation of subjects in the Semantic Web. To this end, EDUG continues to fight for a central solution for the representation of the DDC as Linked Data and has promoted the mapping of subject vocabulary systems to the DDC with the idea that the DDC may serve as a hub for subject navigation across languages.

So far, all external vocabularies (including the GND) that have been mapped to the DDC have been mapped to full numbers. Indeed, we are investigating whether the more specific a DDC number, the more likely it is that all the terms mapped to the same number are equivalent or near equivalent. We already know that terms mapped to the same number create a cloud of terms representing similar concepts. For example, as of today there are 3 German-language GND terms, 2 English-language LCSH terms, 1 English-language BISAC term and 4 Norwegian-language Humord terms mapped to 371.1024 (classroom management), all of which represent concepts closely related to the management of classrooms in education systems. In addition, the Relative Index terms connected to the number are translated into at least eight languages. In our view, this shows the potential of the DDC as a powerful multilingual knowledge organization tool and opens up the possibility of discovering resources outside library catalog environments, using verbal starting points in various languages, provided the resources have full DDC numbers.
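The hub idea can be sketched in a few lines of Python. The term labels below are placeholders, not the actual mapped terms; only the counts per vocabulary (3 GND, 2 LCSH, 1 BISAC, 4 Humord) and the number 371.1024 are taken from the paragraph above. The function names `build_hub` and `terms_in_language` are hypothetical.

```python
from collections import defaultdict

# Placeholder mappings: (vocabulary, language code, term label, DDC number).
# The labels are invented stand-ins; only the counts follow the letter.
mappings = (
    [("GND", "de", f"gnd-term-{i}", "371.1024") for i in range(1, 4)]
    + [("LCSH", "en", f"lcsh-term-{i}", "371.1024") for i in range(1, 3)]
    + [("BISAC", "en", "bisac-term-1", "371.1024")]
    + [("Humord", "no", f"humord-term-{i}", "371.1024") for i in range(1, 5)]
)


def build_hub(mappings):
    """Group all mapped terms under the DDC number they are mapped to."""
    hub = defaultdict(list)
    for vocabulary, language, term, ddc in mappings:
        hub[ddc].append((vocabulary, language, term))
    return hub


def terms_in_language(hub, ddc, language):
    """Return the terms attached to `ddc` in the requested language."""
    return [term for _, lang, term in hub.get(ddc, []) if lang == language]


hub = build_hub(mappings)
norwegian_terms = terms_in_language(hub, "371.1024", "no")
```

A user starting from any one of these terms can reach the DDC number and, from there, the terms in every other language. The sketch also shows why the mapping depends on full numbers: a record classed only with a shorter number would not be reachable through the 371.1024 key.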

 

As you have stated, cataloguing at the German National Library is henceforth to be seen as a cyclical process. We hope that we can work together to explore good practices for automatic classification and to promote a standard use of short numbers where applicable. We also hope that you retain the practice of deep classification of print resources with full DDC numbers for all the reasons stated above. With more time and experience in automatic classification, DDC short numbers may become longer; discontinuing the use of full numbers is far more difficult to reverse.

 

Kind regards on behalf of EDUG,

 

Elise Conradi

EDUG Secretary

Oslo, 31 July 2017

 

European DDC Users Group (EDUG)

edug.pansoft.de

 

[1] http://www.dnb.de/EN/Erwerbung/Inhaltserschliessung/grundzuegeInhaltserschliessungMai2017.html

 

EDUG statement regarding future subject cataloguing at the DNB

Dear Elise Conradi, dear Members of EDUG,


Thank you for your mail with EDUG’s statement regarding future subject cataloguing at the DNB.

I must first apologize for the very delayed reply. Elisabeth Niggemann, the Director General of the DNB, asked me to take care of it, since subject cataloguing is one of the areas of work within the domain of acquisition and cataloguing, which I head.

In your letter you addressed the topics of compatibility of DDC notations, specificity of subject search and the Semantic Web, but let me first state that the DNB will not abandon the use of DDC but will apply it in a different way, one considered more appropriate to meet the challenges the DNB currently faces. The most pressing of these challenges is the vastly increasing number of digital publications to be collected and catalogued by the DNB. In the next 5 years we expect our digital collection to sextuple to 26 million documents. At the same time, the physical collection continues to grow at about the same pace as before.

The DNB aims to create subject access points not only for a sector of its collection, but for as many documents as possible, and to do this in a homogeneous way for all classes of documents. It seems obvious that this can only be achieved by implementing novel ways of creating metadata. The DNB has therefore started to develop and implement automated procedures to assign subject metadata to documents.

In recent years the DNB has had positive experiences with the automated assignment of DDC subject categories (used to structure the German National Bibliography) as well as of subject headings. This encouraged us to proceed on this path.

While it is generally assumed that complex notations cannot be assigned automatically, this should be feasible for short DDC notations. The short notations the DNB has used for classifying documents in medicine since it started applying DDC have by now been assigned automatically to more than 132,000 documents, with acceptable results.

The set of short notations that is currently devised for all subjects will be fully compatible with the DDC and reflects the literary warrant of the DNB. The resulting subject data in catalogue records are of course provided via DNB’s metadata services for potential re-use.

The use of short DDC notations has to be seen in conjunction with the application of subject headings. It is true that short notations do not provide aspects like time or place. But if these are relevant in a document they could also be represented by a heading. A primary use of classificatory elements in our concept is to structure a set of documents, disambiguation being another one.

In the spring of 2017 the DNB conducted a survey among customers using its metadata services. Some of the questions concerned the re-use and relevance of certain elements in the catalogue records provided by the DNB. It turned out that customers considered DDC notations to be of little relevance, whereas subject headings were ranked 4th out of 20 elements and are widely re-used.

This result underpins DNB’s decision to expand indexing using subject headings from the Integrated Authority File (Gemeinsame Normdatei/GND). We view the GND as an important contribution to the Semantic Web, because of the appropriate structure of GND authority records, which also allows linking to other systems of knowledge organisation, be they classifications or standardized vocabularies in German or in other languages. This includes bridges to DDC.

The DNB recognizes the value of DDC as one of the world’s most successful universal classification systems, as well as the value of the international cooperation of DDC users. It intends to contribute further to this network, and it will continue to operate the German DDC agency, providing the continuous translation of DDC into German and supporting products and tools such as WebDewey Deutsch.
DNB’s current concept of subject cataloguing aims at combining the two roads of subject cataloguing – indexing and classification – in an efficient way in order to achieve comprehensive subject access.

We hope our motivations are comprehensible. Please do not hesitate to contact us with further inquiries.

With best regards
Ulrike Junger