Toolchain: CONTENTdm compound PDFs #492
Comments
Thanks for submitting the issue, @xing93111. Further detail: If MIK is run instead with the class CdmCompound, compound objects are generated with the directory structure of a Book, except each page is a PDF (instead of a TIFF). These PDFs are OK (not corrupt). As far as we understand, the CdmPdfDocuments class is supposed to merge these page-level PDFs into a single aggregated PDF. The result is a corrupted PDF. Is there anything wrong with the configuration? Or is there a flaw in the toolchain? |
I can't see anything wrong with the configuration. This particular toolchain relies on CONTENTdm's internal functionality to merge the PDF pages into a single document. It used to work fine - for example the PDFs in https://ecuad.arcabc.ca/islandora/object/ecuad%3Acals were generated using it, with this .ini file: https://github.com/MarcusBarnes/mik/blob/master/extras/samples/calendars_config.ini That said, the filegetter has probably not been tested since the major code cleanup that happened after SFU used the toolchain. The code that fetches the assembled PDF content is here. I suggest dumping the value of the URL generated here and then running it using |
The configuration file here uses |
@xing93111, sorry, that config file was an early one and predates #223. The configuration should use |
... and I've just updated https://github.com/MarcusBarnes/mik/wiki/Toolchain:-CONTENTdm-compound-PDFs. Very sorry about that. |
I used a text editor to open the generated PDF file and found it is not a PDF at all but an XML file.
|
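A quick way to confirm this kind of finding without opening each file in a text editor is to look at the first bytes: a real PDF begins with the signature `%PDF-`. The following is a minimal shell sketch, not part of MIK; the function names are made up for illustration.

```shell
# is_pdf FILE: succeed only if FILE starts with the PDF signature "%PDF-".
is_pdf() {
    [ "$(dd if="$1" bs=5 count=1 2>/dev/null)" = "%PDF-" ]
}

# describe_download FILE: print a rough guess at what the file really is.
# "markup" means the first line starts with "<", as an XML or HTML
# response saved under a .pdf name would.
describe_download() {
    if is_pdf "$1"; then
        echo "pdf"
    elif head -n 1 "$1" 2>/dev/null | grep -q '^[[:space:]]*<'; then
        echo "markup"
    else
        echo "unknown"
    fi
}
```

Running `describe_download` on each file in the MIK output directory would show at a glance whether the server returned PDFs or markup saved with a `.pdf` extension.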
We need to establish that CONTENTdm still supports the ability to join PDF pages into a single multipage PDF file (it may have changed since this code was written). To do that we need to create a request URL using the code below (from here):

    $get_file_url = $this->utilsUrl . 'getdownloaditem/collection/'
        . $this->alias . '/id/' . $pointer . '/type/compoundobject/show/1/cpdtype/document-pdf/filename/'
        . $document_structure['page'][0]['pagefile'] . '/width/0/height/0/mapsto/pdf/filesize/0/title/'
        . urlencode($document_structure['page'][0]['pagetitle']);

and see if we get a PDF from the server. So that would look like:
If you use |
If you don't mind sharing your CONTENTdm API URL with me I can take a look. |
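For anyone who wants to build the same request by hand, here is a rough shell sketch that mirrors the PHP concatenation quoted above. It is an illustration only: the function names are hypothetical, the encoder is an ASCII-only approximation of PHP's `urlencode()`, and the base URL is assumed to end in `/utils/` with a trailing slash, like `$this->utilsUrl`.

```shell
# php_urlencode STRING: ASCII-only approximation of PHP's urlencode().
# Letters, digits, "-", "_" and "." pass through, a space becomes "+",
# and every other character becomes %XX. Runs in a subshell so its
# variables do not leak into the caller.
php_urlencode() (
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [A-Za-z0-9._-]) out="$out$c" ;;
            ' ') out="$out+" ;;
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)

# build_download_url UTILS_URL COLL_ALIAS POINTER PAGEFILE PAGETITLE
# Assembles the getdownloaditem URL in the same order as the PHP code.
build_download_url() (
    utils_url=$1 coll_alias=$2 pointer=$3 pagefile=$4 pagetitle=$5
    printf '%s\n' "${utils_url}getdownloaditem/collection/${coll_alias}/id/${pointer}/type/compoundobject/show/1/cpdtype/document-pdf/filename/${pagefile}/width/0/height/0/mapsto/pdf/filesize/0/title/$(php_urlencode "$pagetitle")"
)
```

The resulting URL can then be fetched with something like `curl -sS -o test.pdf "$(build_download_url ...)"`, and the saved file inspected to see whether the server returned a PDF.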
@bondjimbond has the URL but it requires a VPN connection. URL: |
@mjordan Here is the output:
|
You need the 'utils' subdirectory. Try:
|
This is the response:
|
At http://digicon.athabascau.ca/cdm/ref/collection/auarchives/id/499, if I wanted to download the entire document as a single PDF, how would I do that? I don't see a link that will allow me to do that. Is there an admin option that turns off that feature, and if so, do you have it turned off? |
I don't see a button for downloading the entire compound object as a single PDF file, and I can't find an option in the backend to turn it on or off. However, this object has a download link: http://digicon.athabascau.ca/cdm/ref/collection/auriver/id/454. But I think it is a single object rather than a compound one. |
Correct, that is a single-file object, not a compound. |
I think the manipulator has some problems. If I configure it like:
It does not work because the output of the MIK is:
It just filtered out the two records in the collection. Then, I changed the manipulator like this:
because I found the object types are
It does work but again I get corrupted PDF files because they are indeed XML files. So I am thinking the manipulators section on this page: https://github.com/MarcusBarnes/mik/wiki/Toolchain:-CONTENTdm-compound-PDFs should not be restricted to
|
@xing93111 can you test compound PDF documents with MIK as it stood prior to #223 and the work that brought MIK in line with coding standards? Try commit 9c6b8c5. The compound PDF toolchain code at that commit is essentially how it stood when SFU migrated its compound PDFs (as far as the compound PDF document code anyway). You will need to adjust your .ini file to use If this works for you, then there is a problem with the current MIK code that we need to fix; if it doesn't, then we need to confirm that your CONTENTdm can produce a single multipage PDF from single-page PDFs (which we have not done yet) and go from there. @MarcusBarnes does this seem like a reasonable way of narrowing down the problem? Does anyone know of another CONTENTdm instance that we can test against? |
@mjordan I don't see a class named CdmPhpDocuments on this page: https://github.com/MarcusBarnes/mik/tree/9c6b8c537f477fd82f20f3c6ba2563fcd30bd7f5/src/filegetters. I assume this is the commit you would like me to check out. If there is no such class, the command will definitely fail. |
@xing93111 You're looking at the current code rather than the code from the earlier commit. In your MIK directory:
Then you'll need to This will take you to the earlier commit... Look in src/filegetters to see what the filename is. |
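As an aside, the contents of src/filegetters at a given commit can also be listed without switching the working tree. This is only a sketch of that idea using `git ls-tree`; the function name is invented here, and the commit and directory are whatever applies in your checkout.

```shell
# list_files_at_commit REPO_DIR COMMIT SUBDIR
# Prints the paths stored under SUBDIR in COMMIT, without checking it out.
list_files_at_commit() {
    git -C "$1" ls-tree --name-only "$2" "$3/"
}
```

For example, `list_files_at_commit . <commit> src/filegetters` run from the MIK directory prints the filegetter class files that exist at that commit, which makes it easy to see what the class was called at the time.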
I still don't see the class. Here are my commands:
Anything wrong? |
I gave you the wrong commit hash. Try |
@mjordan It seems
|
When I check that commit out, |
Also good to run |
@MarcusBarnes got the
|
Do you still get that error after running |
After running |
What do you see if you run |
@xing93111 Following up on @mjordan's comment, double check if it's in your composer.json file (it might have been added after the commit that we're working from). If it's not there, overwrite your existing composer.json file with a copy of the latest composer.json file and then run |
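Checking whether a package is mentioned in composer.json can be done with a plain text search. The sketch below is an assumption-laden shortcut, not a real JSON parse: the function name is hypothetical, and `vendor/package` stands in for whichever package the error message names.

```shell
# has_composer_dependency COMPOSER_JSON PACKAGE
# Succeeds if the quoted package name appears anywhere in the file.
# This is a fixed-string search, so it cannot tell "require" from
# "require-dev" or from any other place the name might occur.
has_composer_dependency() {
    grep -qF "\"$2\"" "$1"
}
```

Usage would look like `has_composer_dependency composer.json vendor/package && echo present || echo missing`.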
The command line works now @MarcusBarnes. However, it still outputs corrupted PDFs as I mentioned above: #492 (comment) |
@xing93111 we could go back further in time until it works, but I am not convinced that your CONTENTdm is the same as SFU's was. Is there any way we can confirm that it can in fact allow a user to download a single multipage PDF from a compound PDF object? |
@xing93111 More specifically... are there any objects where this is the case? And/or, can you try creating a compound PDF object in your CDM with the option to download a full one? If this is just a problem with the Athabasca CDM instance, it may be more productive to close the issue and just use the Automator scripts we discussed to convert the PDF pages to TIFFs. |
Sorry, that is exactly what I think we need to confirm before looking closer at the MIK code. If we can confirm that the Athabasca CDM instance can produce multipage PDFs from compound single-page PDF documents, we will have narrowed the issue down to the MIK code, which we can then fix. |
On this page: https://github.com/MarcusBarnes/mik/wiki/Toolchain:-CONTENTdm-compound-PDFs I read that for compound PDFs, the CdmPhpDocuments class should be used. However, when I run mik
Then, I went to mik/src/filegetters and mik/src/writers. I found a class named CdmPdfDocuments. So I thought maybe there are typos in the document, and changed the class name to CdmPdfDocuments. However, it still does not work. The output gives corrupted PDFs.
This is the collection: http://digicon.athabascau.ca/cdm/landingpage/collection/AUebooks
The following is my configuration ini file: