Adds support for `dcterms:tableOfContents`.

In rare situations a `dcterms:creator` contains no `pgterms:agent`, resulting in a `0` value for Agent IDs. The issue is with a few RDFs where author data is instead added to an associated `marcrel` node. When this happens, the Agent ID for a creator can be extracted from its `rdf:resource` attribute.

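
As a rough illustration of that fallback, the sketch below pulls the trailing number out of an `rdf:resource` value such as `2009/agents/1`. The helper name `agentIDFromResource` is hypothetical and is not part of the library's API; it only assumes that the Agent ID is the last path segment.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
)

// agentIDFromResource is a hypothetical helper (not part of pgrdf's API).
// It returns the trailing numeric path segment of an rdf:resource value such
// as "2009/agents/1", or an error when that segment is not a number.
func agentIDFromResource(resource string) (int, error) {
	id, err := strconv.Atoi(path.Base(resource))
	if err != nil {
		return 0, fmt.Errorf("no agent ID in %q: %w", resource, err)
	}
	return id, nil
}

func main() {
	id, err := agentIDFromResource("2009/agents/1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("agent ID:", id)
}
```
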
Rename various `RDF` and `Ebook` struct fields to align with the naming used in the official Project Gutenberg gitenberg Python tooling:
- `SrcPublicationInfo` -> `PublicationNote`
- `Edition` -> `EditionNote`
- `Credits` -> `ProductionNotes`
- `SourceDescription` -> `PhysicalDescriptionNote`
- `LOC` -> `LCCN`

Many of the previously added `marc` tags have now been made available in the `Ebook` struct, for example, MARC 546 (language notes), MARC 250 (edition), and MARC 904 (source links).

Various tags in the RDFs may have multiple entries but they were previously handled as single tags. The un/marshalers have been updated to handle these and the `Ebook` struct updated to support slices where needed. Some fields have also been renamed.
- `OtherTitles` has been renamed to `AlternateTitles`.
- `Language` is now a slice named `Languages` and includes only an alpha 2 or alpha 3 language code string, meaning the dialect and language notes have been moved to their own struct fields.
- `BookCoverFilename` is now a slice named `BookCovers`.
- `Note` is now a slice named `Notes`.
- `WebPage` in `Creator` is now a slice named `WebPages`.

- Handle the case when an RDF uses `Various` in `marc906` instead of a year number.
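
One way to tolerate a non-numeric `marc906` value is sketched below. The function `parsePublishedYear` is hypothetical, and treating `Various` as "no year" (zero plus a `false` flag) is an assumption about the desired behaviour, not a description of what the library does.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePublishedYear is a hypothetical helper. It returns the year and true
// for a positive numeric marc906 value, or 0 and false for anything else,
// including the literal "Various".
func parsePublishedYear(value string) (int, bool) {
	year, err := strconv.Atoi(strings.TrimSpace(value))
	if err != nil || year <= 0 {
		return 0, false
	}
	return year, true
}

func main() {
	for _, v := range []string{"1917", "Various"} {
		year, ok := parsePublishedYear(v)
		fmt.Printf("value=%q year=%d ok=%t\n", v, year, ok)
	}
}
```
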
- Use a custom type for the `BookType` for all known types in the PG collection.
- Improve unmarshaller to handle both CR and LF for `dcterms:title`/`dcterms:alternative`.
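
The sketch below shows one possible reading of "handle both CR and LF": splitting a multi-line title on either character so that CR, LF, and CRLF all behave the same. The helper `splitTitleLines` is hypothetical, and whether the library splits, joins, or normalises these lines is an assumption made only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// splitTitleLines is a hypothetical helper. It splits a title on any run of
// CR and/or LF characters and drops the empty pieces, so "\r", "\n", and
// "\r\n" are all treated as a single line break.
func splitTitleLines(title string) []string {
	return strings.FieldsFunc(title, func(r rune) bool {
		return r == '\r' || r == '\n'
	})
}

func main() {
	for _, line := range splitTitleLines("Main Title\r\nA Subtitle") {
		fmt.Println(line)
	}
}
```
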
- Renamed `MarcRelatorCode` to `MarcRelator`.
- Added the remaining `marcrel` tags: `marcrel:pbl`, `marcrel:adp`, `marcrel:pht`, etc.
  - These match what is found in the current PG collection.
  - This also means supporting empty tags, e.g. `<marcrel:adp rdf:resource="2009/agents/1" />`.
- The order of the marshalled RDF XML tags has been changed to make it a little easier for humans to find information about the work.
- The RDF marshaller structs now have some useful comments.
- General cleanup and improvements related to the above topics.
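
To illustrate the empty `marcrel` tags mentioned above, here is a minimal sketch using Go's standard `encoding/xml` package. The `relator` struct and `decodeRelator` function are hypothetical and do not mirror the library's own marshaller structs; the attribute is matched by its local name only, which is a simplification that ignores XML namespaces.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// relator is a hypothetical struct for illustration only. XMLName records
// the element name; Resource captures the "resource" attribute by local name.
type relator struct {
	XMLName  xml.Name
	Resource string `xml:"resource,attr"`
}

// decodeRelator unmarshals a single, possibly empty, relator element.
func decodeRelator(tag string) (relator, error) {
	var r relator
	err := xml.Unmarshal([]byte(tag), &r)
	return r, err
}

func main() {
	r, err := decodeRelator(`<marcrel:adp rdf:resource="2009/agents/1" />`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(r.XMLName.Local, r.Resource)
}
```
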
IMPORTANT: this release contains breaking changes!
- `pgrdf.ToRDF()` has been renamed to `WriteRDF()`
- `pgrdf.NewEbook()` has been renamed to `ReadRDF()`
- The `Language` field on `Ebook` has changed from a `string` type to `pgrdf.Language`.

Additional changes:
- RDF unmarshalling now processes all MARC codes used by PG
  - that's all codes found in the 202-11-05 `rdf_files.tar.bz2` archive
- RDF marshalling now includes:
  - all missing tags, such as the contributors, and the new marc tags.
  - the generated XML is now tested against the `pg11.rdf` sample file.
- `WriteRDF` function now includes the XML declaration header and fixes the self-closing tags.
- Updated the sample RDF with more fake data
  - its number was also changed to a value PG will never use
- Extract MARC Relators data for compilers (`marcrel:com`)
- Extract MARC Relators data for contributors (`marcrel:ctb`)
- Extract book Series from `pgterms:marc440`
- Extract MARC Relators (`marcrel`) data:
  - editors (`edt`), illustrators (`ill`), and translators (`trl`).
  - adding them as `creators` in the JSON output.
  - this adds a new `role` field to the `Creator` object.
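
The effect of the new `role` field on the JSON output might look roughly like the sketch below. The `creator` struct, its `id` and `name` keys, the sample values, and the choice of storing the relator code (`trl`) as the role are all assumptions made for illustration; only the `role` key and the `creators` grouping come from the notes above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// creator is a hypothetical shape for one entry in the "creators" list.
// Only the "role" key is taken from the release notes; the rest is assumed.
type creator struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
	Role string `json:"role,omitempty"`
}

// encodeCreator marshals a single creator entry to JSON.
func encodeCreator(c creator) (string, error) {
	out, err := json.Marshal(c)
	return string(out), err
}

func main() {
	// Made-up sample data: a translator identified by the "trl" relator code.
	out, err := encodeCreator(creator{ID: 1, Name: "Jane Doe", Role: "trl"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```
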
- Extract `marc901` book cover filename.
- Extract `marc907` language locale.
  - appending the locale to the language field, e.g. `"language": "en-GB"`.
- Refactoring of `ebook.go`, in particular `mapUnmarshalled()`.
- Extract published year from the `marc906` tag.
- Add missing LICENSE file
- First release.