v0.3 release, fixes parsing failure and bumps dependencies #15

Merged: 11 commits, merged on Oct 20, 2024
8 changes: 3 additions & 5 deletions .github/workflows/ci.yaml
@@ -21,20 +21,18 @@ jobs:
strategy:
fail-fast: true
matrix:
python-version: ["3.10"]
poetry-version: ["latest"]
os: [ubuntu-latest, macos-latest, windows-latest]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
id: setup-python-step
with:
python-version: ${{ matrix.python-version }}
python-version-file: .python-version
- name: Python Poetry Action
uses: abatilo/actions-poetry@v2
uses: abatilo/actions-poetry@v3
with:
poetry-version: ${{ matrix.poetry-version }}
poetry-version: latest
# Enable tmate debugging of manually-triggered workflows if the input option was provided
- name: Setup tmate session
uses: mxschmitt/action-tmate@v3
9 changes: 5 additions & 4 deletions .pre-commit-config.yaml
@@ -3,21 +3,22 @@ default_language_version:
python: python3.10
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: 2c9f875913ee60ca25ce70243dc24d5b6415598c # frozen: v4.6.0
rev: cef0300fd0fc4d2a87a85fa2093c6b283ea36f4b # frozen: v5.0.0
hooks:
- id: check-yaml
- id: end-of-file-fixer
- id: mixed-line-ending
- id: trailing-whitespace
exclude: ^md2enex/xml_cache/
- repo: https://github.com/rhysd/actionlint
rev: c6bd06256dd700a45e483869bcdcf304239393a6 # frozen: v1.6.27
rev: 4e683ab8014a63fafa117492a0c6053758e6d593 # frozen: v1.7.3
hooks:
- id: actionlint-system
- repo: https://github.com/executablebooks/mdformat
rev: 08fba30538869a440b5059de90af03e3502e35fb # frozen: 0.7.17
rev: 86542e37a3a40974eb812b16b076220fe9bb4278 # frozen: 0.7.18
hooks:
- id: mdformat
exclude: ^tests/test4/markdown-with-yaml.md$
exclude: ^tests/test4/test4.markdownyaml.md$
additional_dependencies:
- mdformat-gfm
- mdformat-toc
2 changes: 1 addition & 1 deletion .python-version
@@ -1 +1 @@
3.10.14
3.12.7
8 changes: 4 additions & 4 deletions Makefile
@@ -21,14 +21,14 @@ help: ## Prints out documentation for available commands

install: $(INSTALL_STAMP) ## Install dependencies
$(INSTALL_STAMP): pyproject.toml poetry.lock
@echo $(POETRY)
@if [[ -z $(POETRY) ]]; then echo "Poetry could not be found. See https://python-poetry.org/docs/"; exit 2; fi
@echo "$(POETRY)"
@if [[ -z "$(POETRY)" ]]; then echo "Poetry could not be found. See https://python-poetry.org/docs/"; exit 2; fi
"$(POETRY)" --version
"$(POETRY)" install
touch $(INSTALL_STAMP)

.PHONY: test
test: $(INSTALL_STAMP) unit-test ## Runs all tests
test: $(INSTALL_STAMP) lint unit-test ## Runs all linting and unit tests

.PHONY: unit-test
unit-test: $(INSTALL_STAMP) ## Runs python unit tests
@@ -48,7 +48,7 @@ format: $(INSTALL_STAMP) ## Format code base
clean: ## Delete any directories, files or logs that are auto-generated
find . -type d -name "__pycache__" | xargs rm -rf {};
rm -f .install.stamp .coverage
rm -rf results dist .ruff_cache .pytest_cache
rm -rf results dist .ruff_cache .pytest_cache export.enex

.PHONY: deepclean
deepclean: clean ## Delete all poetry environments
9 changes: 5 additions & 4 deletions README.md
@@ -2,7 +2,7 @@

[![CI](https://github.com/karloskalcium/md2enex/actions/workflows/ci.yaml/badge.svg?branch=master)](https://github.com/karloskalcium/md2enex/actions/workflows/ci.yaml)
[![LICENSE](https://img.shields.io/badge/license-MIT-blue.svg)](https://raw.githubusercontent.com/karloskalcium/md2enex/master/LICENSE)
[![PYTHON](https://img.shields.io/badge/python-3.10-orange.svg)](https://docs.python.org/3.10/index.html)
[![PYTHON](https://img.shields.io/badge/python-3.12-orange.svg)](https://docs.python.org/3.12/index.html)

`md2enex` is a command-line tool that converts a directory of markdown files to an Evernote `.enex` export format, that can then be imported into Evernote.

@@ -17,7 +17,7 @@

### Install python and pipx

1. Install `python` version 3.10 or later: [Instructions](https://www.python.org/downloads/)
1. Install `python` version 3.12 or later: [Instructions](https://www.python.org/downloads/)
1. Install `pipx`: [Instructions](https://pipx.pypa.io/stable/installation/)

### Install md2enex
@@ -41,7 +41,7 @@ The resultant `.enex` file can be imported into Evernote using the import featur
You can get additional help by running:

```commandline
md2enex --help
md2enex -h
```

## Markdown formatting notes
@@ -64,6 +64,7 @@ You can also ask a question in the [discussions section](https://github.com/karl

## Other tools

- [md2evernote](https://github.com/rxrw/md2evernote)
- [Exporter - Apple Notes to markdown exporter](http://falcon.star-lord.me/exporter/)
- [Yarle - The ultimate converter of Evernote notes to Markdown](https://github.com/akosbalasko/yarle)
- [Evernote2md - Convert Evernote .enex files to Markdown](https://github.com/wormi4ok/evernote2md)
@@ -75,5 +76,5 @@ This project uses [Poetry](https://python-poetry.org/) for packaging and depende
Most of the things you need to do are targets in the makefile.

- `make` to get a list of targets
- `make test` to run the tests
- `make unit-test` to run the tests
- `make lint` to run the ruff linter
121 changes: 92 additions & 29 deletions md2enex/md2enex.py
@@ -11,6 +11,7 @@
import os.path
import pathlib
import platform
import subprocess
from enum import Enum
from inspect import getsourcefile
from pathlib import Path
@@ -29,19 +30,71 @@ class Appconfig(Enum):

class Doctypes(Enum):
ENEX_DOCTYPE = '<!DOCTYPE en-export SYSTEM "http://xml.evernote.com/pub/evernote-export4.dtd">'
ENML_DOCTYPE = '<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">'


IMPORT_TAG_WITH_DATETIME = (
Appconfig.APP_NAME.value + "-import" + ":" + datetime.datetime.now().isoformat(timespec="seconds")
)
ENML_DOCTYPE = '<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml3.dtd">'


# taken from here https://dev.evernote.com/doc/articles/enml.php
INVALID_TAGS = [
"applet",
"base",
"basefont",
"bgsound",
"blink",
"body",
"button",
"dir",
"embed",
"fieldset",
"form",
"frame",
"frameset",
"head",
"html",
"iframe",
"ilayer",
"input",
"isindex",
"label",
"layer",
"legend",
"link",
"marquee",
"menu",
"meta",
"noframes",
"noscript",
"object",
"optgroup",
"option",
"param",
"plaintext",
"script",
"select",
"style",
"textarea",
"xml",
]

INVALID_ATTRIBUTES = [
"id",
"class",
"onclick",
"ondblclick",
"on*",
"accesskey",
"data",
"data-cites",
"data-emoji",
"dynsrc",
"tabindex",
]

app = typer.Typer(add_completion=False)


# stolen from https://stackoverflow.com/a/39501288/4907881
# returns creation date in seconds since Jan 1 1970 for a file in a platform-agnostic fashion
def creation_date_seconds(path_to_file):
def creation_date_seconds(path_to_file) -> float:
"""
Try to get the date that a file was created, falling back to when it was
last modified if that isn't possible.
@@ -54,9 +107,12 @@ def creation_date_seconds(path_to_file):
try:
return stat.st_birthtime
except AttributeError:
# We're probably on Linux. No easy way to get creation dates here,
# so we'll settle for when its content was last modified.
return stat.st_mtime
# We're probably on Linux. Try a system call
result = subprocess.run(["stat", "-c", "%W", path_to_file], capture_output=True)
if result.returncode == 0 and float(result.stdout) > 0:
return float(result.stdout)
else:
return stat.st_mtime


def create_title(file: str) -> etree.Element:
@@ -69,32 +125,33 @@ def create_title(file: str) -> etree.Element:

def create_creation_date(file: str) -> etree.Element:
creation_date_ts = creation_date_seconds(file)
creation_date = enex_date_format(datetime.datetime.fromtimestamp(creation_date_ts, tz=datetime.timezone.utc))
creation_date = enex_date_format(datetime.datetime.fromtimestamp(creation_date_ts, tz=datetime.UTC))
created_el = etree.Element("created")
created_el.text = creation_date
return created_el


def create_updated_date(file: str) -> etree.Element:
modification_date_ts = os.path.getmtime(file)
modification_date = enex_date_format(
datetime.datetime.fromtimestamp(modification_date_ts, tz=datetime.timezone.utc)
)
modification_date = enex_date_format(datetime.datetime.fromtimestamp(modification_date_ts, tz=datetime.UTC))
updated_el = etree.Element("updated")
updated_el.text = modification_date
return updated_el


def create_tag() -> etree.Element:
tag_el = etree.Element("tag")
tag_el.text = IMPORT_TAG_WITH_DATETIME
tag_with_datetime = (
Appconfig.APP_NAME.value + "-import" + ":" + datetime.datetime.now().isoformat(timespec="seconds")
)
tag_el.text = tag_with_datetime
return tag_el


def create_note_attributes() -> etree.Element:
note_attributes_el = etree.Element("note-attributes")
# to make format match standard Evernote export
note_attributes_el.text = os.linesep
note_attributes_el.text = "\n"
return note_attributes_el


@@ -108,20 +165,22 @@ def set_xml_catalog_var():
os.environ["XML_CATALOG_FILES"] = catalog_path


def strip_note_el(en_note_el: etree.Element) -> etree.Element:
etree.strip_attributes(en_note_el, "id", "class", "data", "data-cites")
def strip_note_el(en_note_el: etree.Element):
"""Strips out invalid attributes and tags per https://dev.evernote.com/doc/articles/enml.php"""
etree.strip_attributes(en_note_el, *INVALID_ATTRIBUTES)
etree.strip_tags(en_note_el, *INVALID_TAGS)


def validate_note_xml(note_xml: bytes):
# For speed, access all XML from local XML CATALOG
working_directory = os.getcwd()
parser = etree.XMLParser(dtd_validation=True, no_network=True)
try:
# Due to a libxml2 bug on windows, we need to use a relative path for the DTDs that are packaged with the tool
# As such before doing the parsing, we change directory to the module location then change back afterwards
working_directory = os.getcwd()
# Get the path of where this module lives
script_abs_path = os.path.dirname(os.path.abspath(getsourcefile(lambda: 0)))
os.chdir(script_abs_path)
parser = etree.XMLParser(dtd_validation=True, no_network=True)
etree.fromstring(note_xml, parser=parser)
except etree.XMLSyntaxError as err:
for error in parser.error_log:
@@ -141,7 +200,7 @@ def create_note_content(file: str) -> etree.Element:
content_text = ""
# set hard_line_breaks here b/c the Exporter on OSX doesn't add proper line breaks in the Markdown export
html_text = pypandoc.convert_file(
file, to="html", format="markdown+hard_line_breaks-smart-auto_identifiers", extra_args=["--wrap=none"]
file, to="html", format="markdown+emoji+hard_line_breaks-smart-auto_identifiers", extra_args=["--wrap=none"]
)
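Pandoc input formats take extension modifiers: `+name` enables an extension and `-name` disables it, so `markdown+emoji+hard_line_breaks-smart-auto_identifiers` turns on `emoji` and `hard_line_breaks` and turns off `smart` and `auto_identifiers`. In recent pandoc versions, the HTML writer wraps each converted emoji in a `<span>` with `class="emoji"` and a `data-emoji` attribute, which would explain why `class` and `data-emoji` appear in `INVALID_ATTRIBUTES`. The sketch below (hypothetical helper `parse_pandoc_format`, not part of pypandoc) simply splits such a string into its parts:

```python
import re


def parse_pandoc_format(spec: str) -> tuple[str, set[str], set[str]]:
    """Split 'base+ext1-ext2...' into (base, enabled, disabled)."""
    # The base format name is everything before the first + or -
    match = re.match(r"[^+-]+", spec)
    if match is None:
        raise ValueError(f"no base format in {spec!r}")
    base = match.group(0)
    enabled: set[str] = set()
    disabled: set[str] = set()
    # Each modifier is a sign followed by an extension name; later ones win in this sketch
    for sign, name in re.findall(r"([+-])(\w+)", spec[match.end():]):
        if sign == "+":
            enabled.add(name)
            disabled.discard(name)
        else:
            disabled.add(name)
            enabled.discard(name)
    return base, enabled, disabled
```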
for index, line in enumerate(html_text.splitlines()):
line_trimmed = line.strip()
@@ -194,7 +253,7 @@ def enex_date_format(date: datetime) -> str:

# header material for enex format
def create_en_export() -> etree.Element:
now = datetime.datetime.now(datetime.timezone.utc)
now = datetime.datetime.now(datetime.UTC)
now_str = enex_date_format(now)
en_export = etree.Element("en-export")
en_export.set("export-date", now_str)
@@ -207,7 +266,7 @@ def write_enex(target_directory: pathlib.Path, output_file: str):
files = sorted(target_directory.glob("*.md"), key=lambda fn: str.lower(fn.name))
# Ensure at least one markdown file in directory
if len(files) <= 0:
typer.echo("No markdown files found in " + target_directory.name, err=True)
typer.secho("No markdown files found in " + target_directory.name, err=True, fg="red")
raise typer.Exit(code=1)

# ElementTree object that will contain our xml
@@ -218,7 +277,8 @@ def write_enex(target_directory: pathlib.Path, output_file: str):
for file in files:
filename = str(file)
try:
root.append(process_note(filename))
note_xml = process_note(filename)
root.append(note_xml)
count += 1
except (etree.LxmlError, ValueError) as e:
error_list.append(filename)
@@ -236,14 +296,17 @@ def write_enex(target_directory: pathlib.Path, output_file: str):
)

if len(error_list) > 0:
logging.warning(
"Some files were skipped - these need to be cleaned up manually and reimported: " + str(error_list)
typer.secho(
"Some files were skipped - these need to be cleaned up manually and reimported: " + str(error_list),
err=True,
fg="red",
)
raise typer.Exit(code=1)

if count > 0:
typer.echo("Successfully wrote " + str(count) + " markdown files to " + output_file)
typer.secho("Successfully wrote " + str(count) + " markdown files to " + output_file, err=True)
else:
logging.error("Error - no files written.")
typer.secho("Error - no files written.", err=True, fg="red")
raise typer.Exit(code=2)
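After this change, every outcome of `write_enex` is reported on stderr through `typer.secho` (failures in red) and mapped to an exit code: 1 when no markdown files are found or some files were skipped, 2 when nothing was written. A stdlib-only sketch of that summary step, with ANSI escape codes standing in for `fg="red"`; `report` and its colour handling are assumptions for illustration, not md2enex's code:

```python
import sys
from typing import TextIO

RED = "\033[31m"
RESET = "\033[0m"


def report(count: int, skipped: list[str], output_file: str, stream: TextIO = sys.stderr) -> int:
    """Print a summary to stream and return the exit code the CLI would use."""

    def say(message: str, red: bool = False) -> None:
        # Only colourize when writing to a terminal (roughly what click/typer do)
        if red and stream.isatty():
            message = f"{RED}{message}{RESET}"
        print(message, file=stream)

    if skipped:
        say("Some files were skipped - these need to be cleaned up manually and reimported: " + str(skipped), red=True)
        return 1
    if count > 0:
        say(f"Successfully wrote {count} markdown files to {output_file}")
        return 0
    say("Error - no files written.", red=True)
    return 2
```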


@@ -253,7 +316,7 @@ def version_callback(value: bool):
raise typer.Exit(code=0)


@app.command()
@app.command(context_settings={"help_option_names": ["-h", "--help"]})
def cli(
directory: Annotated[Path, typer.Argument(exists=True, file_okay=False, dir_okay=True, path_type=pathlib.Path)],
output: Annotated[