Showing 2 changed files with 82 additions and 19 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,27 +1,54 @@ | ||
# DigiPres Toolbox | ||
# A Docker image with some tools pre-installed | ||
FROM python:3.10-bullseye | ||
|
||
RUN pip install --no-cache notebook bash_kernel opf-fido | ||
# | ||
# Note that some blocks are commented out to keep the image size down and launches fast. | ||
# | ||
|
||
# Core Jupyter support: | ||
RUN pip install --no-cache notebook jupyterlab bash_kernel | ||
RUN python -m bash_kernel.install | ||
|
||
RUN apt-get update && apt-get install -y mediainfo default-jre ffmpeg cloc && \ | ||
apt-get install -y cmake pkg-config libicu-dev zlib1g-dev libcurl4-openssl-dev libssl-dev ruby-dev && \ | ||
# Some lightweight tools and support for installing more: | ||
RUN apt-get update && apt-get install -y sudo mediainfo cloc && \ | ||
apt-get clean && rm -rf /var/lib/apt/lists/* | ||
|
||
RUN gem install github-linguist | ||
|
||
RUN curl -s -L -O https://github.com/richardlehane/siegfried/releases/download/v1.11.1/siegfried_1.11.1-1_amd64.deb && \ | ||
dpkg -i siegfried_1.11.1-1_amd64.deb && \ | ||
# Install Siegfried: | ||
ENV SF_VERSION=1.11.1 | ||
ENV SF_DEB=siegfried_${SF_VERSION}-1_amd64.deb | ||
RUN curl -s -L -O https://github.com/richardlehane/siegfried/releases/download/v${SF_VERSION}/${SF_DEB} && \ | ||
dpkg -i ${SF_DEB} && \ | ||
rm -f ${SF_DEB} && \ | ||
sf -update | ||
|
||
RUN curl -s -L -o /usr/share/java/tika-app-2.9.2.jar https://dlcdn.apache.org/tika/2.9.2/tika-app-2.9.2.jar && \ | ||
ln -s /usr/share/java/tika-app-2.9.2.jar /usr/share/java/tika-app.jar | ||
|
||
COPY droid /usr/share/java/droid | ||
RUN ln -s /usr/share/java/droid/droid.sh /usr/local/bin/droid.sh | ||
COPY tika.sh /usr/local/bin/tika.sh | ||
|
||
# Install TRiD: | ||
RUN curl -s -L -O http://mark0.net/download/trid_linux_64.zip && \ | ||
curl -s -L -O http://mark0.net/download/triddefs.zip && \ | ||
unzip trid_linux_64.zip && unzip triddefs.zip && chmod +x ./trid && \ | ||
cp ./trid /usr/local/bin/trid && cp triddefs.trd /usr/local/bin/ | ||
curl -s -L -O http://mark0.net/download/triddefs.zip && \ | ||
unzip trid_linux_64.zip && unzip triddefs.zip && chmod +x ./trid && \ | ||
mv ./trid /usr/local/bin/trid && mv triddefs.trd /usr/local/bin/ && \ | ||
rm -f trid_linux_64.zip triddefs.zip | ||
|
||
# Install Fido: | ||
RUN pip install --no-cache opf-fido | ||
|
||
# Install JRE for Java programs and ffmpeg for a/v formats (c. 0.6GB!): | ||
#RUN apt-get update && apt-get install -y default-jre ffmpeg && \ | ||
# apt-get clean && rm -rf /var/lib/apt/lists/* | ||
|
||
# Install GitHub Linguist and its build dependencies (c. 0.2GB): | ||
#RUN apt-get update && \ | ||
# apt-get install -y cmake pkg-config libicu-dev zlib1g-dev libcurl4-openssl-dev libssl-dev ruby-dev && \ | ||
# apt-get clean && rm -rf /var/lib/apt/lists/* | ||
#RUN gem install github-linguist | ||
|
||
# Install Apache Tika (needs Java): | ||
#ENV TIKA_VERSION=2.9.2 | ||
#RUN curl -s -L -o /usr/share/java/tika-app-${TIKA_VERSION}.jar https://dlcdn.apache.org/tika/${TIKA_VERSION}/tika-app-${TIKA_VERSION}.jar && \ | ||
# ln -s /usr/share/java/tika-app-${TIKA_VERSION}.jar /usr/share/java/tika-app.jar | ||
#COPY tika.sh /usr/local/bin/tika.sh | ||
|
||
# Install DROID (needs Java) | ||
#COPY droid /usr/share/java/droid | ||
#RUN ln -s /usr/share/java/droid/droid.sh /usr/local/bin/droid.sh | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,38 @@ | ||
# format-id-toolbox | ||
A Docker container with various format identification tools installed. | ||
DigiPres Toolbox | ||
---------------- | ||
|
||
A Docker image designed to make it easy to experiment with tools for Digital Preservation. It is intended to be used via the [DigiPres Sandbox](https://github.com/digipres/sandbox) and the [DigiPres Workbench](https://github.com/digipres/workbench). | ||
|
||
## Supported Tools | ||
|
||
### Pre-installed | ||
|
||
Only light-weight tools are pre-installed, so the Docker image size (and hence Sandbox launch times) can be kept low. | ||
|
||
- [Siegfried](https://www.itforarchivists.com/siegfried) (using the 'deluxe' format signatures, which include multiple sources). | ||
- [File](https://www.darwinsys.com/file/) | ||
- [TrID](http://mark0.net/soft-trid-e.html) | ||
- [MediaInfo](https://github.com/MediaArea/MediaInfo) | ||
- [CLOC](https://github.com/AlDanial/cloc) | ||
|
||
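As a quick sanity check, the pre-installed tools listed above can be probed from a shell inside the container. This is a minimal sketch, not part of the commit: the command names (`sf`, `file`, `trid`, `mediainfo`, `cloc`) are assumptions based on the binaries the Dockerfile installs, and `check_tool` is a hypothetical helper.

```shell
#!/bin/sh
# Report whether each pre-installed tool is available on the PATH.
# Assumption: the tools are exposed under these command names.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

for tool in sf file trid mediainfo cloc; do
    check_tool "$tool"
done
```

A tool reported as missing would suggest that the corresponding Dockerfile block was commented out or failed during the build.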
### Verified Installable | ||
|
||
These aren't installed by default, but the [Sandbox](https://github.com/digipres/sandbox) shows how to install them. | ||
|
||
- [Apache Tika](https://tika.apache.org/) | ||
- [DROID](http://digital-preservation.github.io/droid/) | ||
- [Fido](https://github.com/openpreserve/fido) | ||
- [ffmpeg](https://ffmpeg.org) including [ffprobe](https://ffmpeg.org/ffprobe.html) | ||
- [GitHub Linguist](https://github.com/github/linguist) | ||
|
||
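For illustration, the Java and ffmpeg layer could be added at runtime along the following lines. This is a hedged sketch rather than the Sandbox's documented procedure: the package names come from the commented-out Dockerfile block, it assumes the session user is allowed to use `sudo` (which the image installs), and the `run` helper and `DRY_RUN` variable are hypothetical conveniences.

```shell
#!/bin/sh
# Sketch: install the optional JRE and ffmpeg packages at runtime.
# Package names are taken from the commented-out Dockerfile block.
set -e

PACKAGES="default-jre ffmpeg"

# By default the commands are only printed; set DRY_RUN=0 to execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run sudo apt-get update
# PACKAGES is left unquoted on purpose so it splits into separate arguments.
run sudo apt-get install -y $PACKAGES
```

The dry-run default makes it possible to review the commands before committing to the roughly 0.6GB download noted in the Dockerfile comment.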
### To Consider | ||
|
||
- VeraPDF | ||
- JHOVE | ||
- Handbrake | ||
- MediaConch | ||
|
||
## Inspirations | ||
|
||
- The PLANETS Testbed ([briefing paper](https://www.dcc.ac.uk/guidance/briefing-papers/technology-watch-papers/planets-testbed), [article](https://journal.code4lib.org/articles/83)) | ||
- [VIPER](https://viper.openpreservation.org/) |