Sync upstream #1

Status: Open. Wants to merge 32 commits into base: master. Changes shown from 28 commits.

Commits (32):
d5f5653
Fix default Unicode alternate skip count
joniles Oct 2, 2015
d67da79
Added support for Cp1254.
joniles Jan 11, 2016
b50177a
Renamed RtfDumpListener, added RtfDump utility.
joniles Feb 2, 2016
26874e5
Added missing Greek encoding
joniles Feb 2, 2016
a3eb954
Added Korean encoding
joniles Feb 2, 2016
910dd52
Documentation updates
joniles Apr 18, 2016
073ab18
Update README.md
joniles Apr 18, 2016
e50d7e4
Update README.md
joniles Apr 18, 2016
ce5a41c
Update README.md
joniles Apr 18, 2016
c3ba2b9
Added support for mac encoding
joniles May 25, 2016
e1c70b2
Merge branch 'master' of ssh://git@github.com/joniles/rtfparserkit.git
joniles May 25, 2016
4becf02
Handle signed values.
joniles Jun 8, 2016
a2ab5fb
Add support for UTF-8 encoding.
joniles Jun 8, 2016
87c1892
Update symbol code page
joniles Jun 15, 2016
49da4bf
Add support for additional code pages
joniles Jan 5, 2017
823ca9a
Update README.md
joniles Jan 13, 2017
de42a30
Correctly handle default encoding.
joniles Feb 8, 2017
97c9f27
Update README.md
joniles Feb 8, 2017
98abe9a
Gracefully handle malformed hex bytes.
joniles Jun 14, 2018
12415ce
Build using Maven
joniles Nov 5, 2018
16c286a
Target Java 1.6
joniles Nov 8, 2018
d609b5c
Handle cpg command. Handle implicit use of font 0.
joniles Mar 29, 2020
9f175a8
Merge branch 'master' of ssh://github.com/joniles/rtfparserkit
joniles Mar 29, 2020
c5f98df
Record when the font has been set explicitly.
joniles Mar 31, 2020
5d6f909
Bump version
joniles Mar 31, 2020
a3ae640
Update JUnit version
joniles Feb 10, 2021
3119eef
Use MS932 to support NEC special characters
joniles Feb 10, 2021
cd420e7
Update to 1.16.0
joniles Feb 10, 2021
5dbd607
Add IDEA project
joniles Apr 21, 2022
2f015b9
Add FUNDING.yml
joniles Apr 21, 2022
4f76c66
Add image dump example
joniles Oct 20, 2022
26c80c8
Tidy up example
joniles Oct 20, 2022
33 changes: 33 additions & 0 deletions .classpath
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
<?xml version="1.0" encoding="UTF-8"?>
<classpath>
<classpathentry kind="src" output="target/classes" path="src/main/java">
<attributes>
<attribute name="optional" value="true"/>
<attribute name="maven.pomderived" value="true"/>
</attributes>
</classpathentry>
<classpathentry kind="src" output="target/test-classes" path="src/test/java">
<attributes>
<attribute name="optional" value="true"/>
<attribute name="maven.pomderived" value="true"/>
<attribute name="test" value="true"/>
</attributes>
</classpathentry>
<classpathentry excluding="**" kind="src" output="target/test-classes" path="src/test/resources">
<attributes>
<attribute name="maven.pomderived" value="true"/>
<attribute name="test" value="true"/>
</attributes>
</classpathentry>
<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/JavaSE-1.6">
<attributes>
<attribute name="maven.pomderived" value="true"/>
</attributes>
</classpathentry>
<classpathentry kind="con" path="org.eclipse.m2e.MAVEN2_CLASSPATH_CONTAINER">
<attributes>
<attribute name="maven.pomderived" value="true"/>
</attributes>
</classpathentry>
<classpathentry kind="output" path="target/classes"/>
</classpath>
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
/target/
40 changes: 23 additions & 17 deletions RTF Parser Kit/.project → .project
@@ -1,17 +1,23 @@
<?xml version="1.0" encoding="UTF-8"?>
<projectDescription>
<name>RTF Parser Kit</name>
<comment></comment>
<projects>
</projects>
<buildSpec>
<buildCommand>
<name>org.eclipse.jdt.core.javabuilder</name>
<arguments>
</arguments>
</buildCommand>
</buildSpec>
<natures>
<nature>org.eclipse.jdt.core.javanature</nature>
</natures>
</projectDescription>
<?xml version="1.0" encoding="UTF-8"?>
<projectDescription>
<name>rtfparserkit</name>
<comment></comment>
<projects>
</projects>
<buildSpec>
<buildCommand>
<name>org.eclipse.jdt.core.javabuilder</name>
<arguments>
</arguments>
</buildCommand>
<buildCommand>
<name>org.eclipse.m2e.core.maven2Builder</name>
<arguments>
</arguments>
</buildCommand>
</buildSpec>
<natures>
<nature>org.eclipse.jdt.core.javanature</nature>
<nature>org.eclipse.m2e.core.maven2Nature</nature>
</natures>
</projectDescription>
6 changes: 6 additions & 0 deletions .settings/org.eclipse.jdt.core.prefs
@@ -0,0 +1,6 @@
eclipse.preferences.version=1
org.eclipse.jdt.core.compiler.codegen.targetPlatform=1.6
org.eclipse.jdt.core.compiler.compliance=1.6
org.eclipse.jdt.core.compiler.problem.forbiddenReference=warning
org.eclipse.jdt.core.compiler.release=disabled
org.eclipse.jdt.core.compiler.source=1.6
4 changes: 4 additions & 0 deletions .settings/org.eclipse.m2e.core.prefs
@@ -0,0 +1,4 @@
activeProfiles=
eclipse.preferences.version=1
resolveWorkspaceProjects=true
version=1
78 changes: 56 additions & 22 deletions README.md
@@ -1,22 +1,56 @@
RTF Parser Kit
==============

I have often been frustrated by the lack of comprehensive support for working with RTF in Java, and the need to use RTF parsers which are incomplete and form part of larger projects whose libraries I don't want to import just to use the RTF parser. The RTF Parser Kit project is an attempt to address these points.

The idea is to provide a "kit" of components which can either be used "as-is", for example to extract plain text or HTML from an RTF file, or can be used as a component in a larger application which requires the capability to parse RTF documents.

What's currently included?
--------------------------
* Raw RTF Parser - parses RTF, sends events representing content to a listener. Performs minimal processing - you get the RTF commands and data exactly as they appear in the file.
* Standard RTF Parser - parses RTF, sends events representing content to a listener. Handles character encoding, Unicode and so on, so you don't have to. This is probably the parser you want to use.
* Text Converter - demonstrates very simple text extraction from an RTF file

What's planned?
---------------
* HTML converter
* Parsing to an RTF document object model
* RTF generation from an RTF document object model

That's a lot of stuff!
----------------------
Yes it is! It'll take me a while to work my way through the list of things I want to achieve, so I'd love for you to send me some code which extends what I've done or makes it better!
RTF Parser Kit
==============

I have often been frustrated by the lack of comprehensive support for working with RTF in Java, and the need to use RTF parsers which are incomplete and form part of larger projects whose libraries I don't want to import just to use the RTF parser. The RTF Parser Kit project is an attempt to address these points.

The idea is to provide a "kit" of components which can either be used "as-is", for example to extract plain text or HTML from an RTF file, or can be used as a component in a larger application which requires the capability to parse RTF documents.

What's currently included?
--------------------------
* Raw RTF Parser - parses RTF, sends events representing content to a listener. Performs minimal processing - you get the RTF commands and data exactly as they appear in the file.
* Standard RTF Parser - parses RTF, sends events representing content to a listener. Handles character encoding, Unicode and so on, so you don't have to. This is probably the parser you want to use.
* Text Converter - demonstrates very simple text extraction from an RTF file
* RTF Dump - another demonstration, this time writing the RTF file contents as XML

Getting Started
===============

To install the library, you can either download the latest JAR directly from the GitHub releases page,
or you can add RTF Parser Kit as a dependency using Maven:

```xml
<dependency>
<groupId>com.github.joniles</groupId>
<artifactId>rtfparserkit</artifactId>
<version>1.16.0</version>
</dependency>
```

Once you have the library, you have a choice of two parsers to work with: the standard parser and the raw parser. The raw parser carries out minimal processing on the RTF, whereas the standard parser handles character encodings and translates commands which represent special characters into their Unicode equivalents. Most people will want to use the standard parser.

The parser is invoked like this:
```java
InputStream is = new FileInputStream("/path/to/my/file.rtf");
IRtfSource source = new RtfStreamSource(is);
IRtfParser parser = new StandardRtfParser();
MyRtfListener listener = new MyRtfListener();
parser.parse(source, listener);
```
You provide input to the parser via a class that implements the `IRtfSource` interface. Two implementations are provided for you: `RtfStreamSource`, for reading RTF from a stream, and `RtfStringSource`, for reading RTF from a string.

The other thing you need to provide the parser with is a listener class, which implements the `IRtfListener` interface. The interface consists of a set of methods which the parser calls to inform you when it encounters different parts of the document structure. The set of methods, along with some comments describing their purpose, can be seen [here](https://github.com/joniles/rtfparserkit/blob/master/RTF%20Parser%20Kit/src/com/rtfparserkit/parser/IRtfListener.java).

You don't need to implement all of the `IRtfListener` interface yourself: if you wish, you can subclass `RtfListenerAdaptor`, which provides empty methods for all of the `IRtfListener` methods. You can then override just the methods you are interested in.
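
The adaptor pattern described above can be sketched without the library on the classpath. In the minimal example below, `Listener`, `ListenerAdaptor`, `TextCollector` and `collect` are hypothetical stand-ins written for illustration only. They are not the library's `IRtfListener` or `RtfListenerAdaptor`, and the three callback names are assumptions, so check the linked `IRtfListener` source for the real method set before adapting this.

```java
// Illustrative sketch only: these types are hypothetical stand-ins,
// not the classes shipped with RTF Parser Kit.
class ListenerAdaptorSketch {

    // Stand-in for a listener interface: one callback per kind of content.
    interface Listener {
        void processGroupStart();
        void processGroupEnd();
        void processString(String text);
    }

    // Stand-in for an adaptor: every callback has an empty body,
    // so subclasses override only the ones they care about.
    static class ListenerAdaptor implements Listener {
        @Override public void processGroupStart() { }
        @Override public void processGroupEnd() { }
        @Override public void processString(String text) { }
    }

    // A listener interested only in text; group events are ignored
    // because the inherited empty methods handle them.
    static class TextCollector extends ListenerAdaptor {
        private final StringBuilder text = new StringBuilder();

        @Override public void processString(String fragment) {
            text.append(fragment);
        }

        String getText() {
            return text.toString();
        }
    }

    // Simulates the callbacks a parser might make for one group of text.
    static String collect(String... fragments) {
        TextCollector collector = new TextCollector();
        Listener listener = collector;
        listener.processGroupStart();
        for (String fragment : fragments) {
            listener.processString(fragment);
        }
        listener.processGroupEnd();
        return collector.getText();
    }

    public static void main(String[] args) {
        System.out.println(collect("Hello, ", "world")); // prints "Hello, world"
    }
}
```

With the real library, the equivalent of `TextCollector` would presumably extend `RtfListenerAdaptor` and be passed to `parser.parse(source, listener)` as in the snippet earlier in this section.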

An example text extractor is provided; you can invoke it like this:
```java
new StreamTextConverter().convert(new RtfStreamSource(inputStream), outputStream, "UTF-8");
```
This code reads an RTF file from the `inputStream` and writes the resulting text to the `outputStream` in the encoding specified by the last argument.

A second example text extractor is also provided; this one extracts text from the RTF file into a string:
```java
StringTextConverter converter = new StringTextConverter();
converter.convert(new RtfStreamSource(inputStream));
String extractedText = converter.getText();
```
8 changes: 0 additions & 8 deletions RTF Parser Kit/.classpath

This file was deleted.