Commit
Showing 21 changed files with 70 additions and 15 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
<!doctype html><html lang=en dir=ltr><meta name=viewport content="width=device-width,initial-scale=1"><meta charset=utf-8><meta charset=UTF-8><link rel=stylesheet href=/preview/css/style.min.d2318b68476f24cbf3792f1a3bf81da83f1aafcaea0d063b7fc8e6aa4440fb1d.css integrity="sha256-0jGLaEdvJMvzeS8aO/gdqD8ar8rqDQY7f8jmqkRA+x0=" crossorigin=anonymous><script src=https://htw-imi-showtime.github.io/preview/js/title-animate.js></script><script src=https://htw-imi-showtime.github.io/preview/js/semester-select.js></script><script src=https://htw-imi-showtime.github.io/preview/js/secret.js></script><title>IMI Showtime - HTW Berlin</title> | ||
<body><nav class="background hugo"><a href=https://htw-imi-showtime.github.io/preview/ class=animate-trigger><strong>IMI<span class=light>×</span><span class=animate>ST</span></strong> <span class=light>2024</span></a><div class=warn>Preview Site! | ||
(without drafts)</div><ul><li><a href=https://htw-imi-showtime.github.io/preview/projects>Projects</a></li><li><a href=https://htw-imi-showtime.github.io/preview/schedule>Schedule</a></li><li><a href=https://htw-imi-showtime.github.io/preview/dates>Dates</a></li><li><a href=/preview/ws23>Archive</a></li><li><a href=https://htw-imi-showtime.github.io/preview/contact>Contact</a></li><li><a href=/preview/ss24/bachelor/b1-infinite-zoom/>Browse Projects</a></li></ul></nav><header><h1>Showtime Website History | ||
<span class=underscore-spacer> </span></h1><section class=underscore></section></header><main><p>For now, this is a gitinfo tryout:</p><p>(see <a href=https://gohugo.io/methods/page/gitinfo/>https://gohugo.io/methods/page/gitinfo/</a>)</p><h2 id=gitinfo>GitInfo:</h2><p>Gitinfo:</p><p>AbbreviatedHash: 29266a4</p><p>AuthorDate: 2024-07-26</p><p>AuthorEmail: 167428823+PaulSchiffner@users.noreply.github.com</p><p>AuthorName: PaulSchiffner</p><p>CommitDate: 2024-07-26</p><p>Hash: 29266a41512c7abf4f46a2b5e7becb935cbf3923</p><p>Subject: B4 sempy add images,gif and fix grammar, spelling (#388)</p></main><footer><section><a href=https://htw-imi-showtime.github.io/preview/ class=animate-trigger><strong>IMI<span class=light>⨯</span><span class=animate>ST</span></strong> | ||
<span class=underscore-spacer> </span></h1><section class=underscore></section></header><main><p>For now, this is a gitinfo tryout:</p><p>(see <a href=https://gohugo.io/methods/page/gitinfo/>https://gohugo.io/methods/page/gitinfo/</a>)</p><h2 id=gitinfo>GitInfo:</h2><p>Gitinfo:</p><p>AbbreviatedHash: 4ede622</p><p>AuthorDate: 2024-07-26</p><p>AuthorEmail: 95168446+lauralgh@users.noreply.github.com</p><p>AuthorName: lauralgh</p><p>CommitDate: 2024-07-26</p><p>Hash: 4ede6221ca812b165fa305c78a22f16ed9a54d48</p><p>Subject: M4 - final website version (#389)</p></main><footer><section><a href=https://htw-imi-showtime.github.io/preview/ class=animate-trigger><strong>IMI<span class=light>⨯</span><span class=animate>ST</span></strong> | ||
2024</a><ul><li><a href=https://htw-imi-showtime.github.io/preview/contact>Contact</a></li><li><a href=https://htw-imi-showtime.github.io/preview/privacy-policy>Privacy Policy</a></li><li><a href=https://htw-imi-showtime.github.io/preview/imprint>Imprint</a></li></ul></section><img src=https://htw-imi-showtime.github.io/preview/img/htw-logo.png alt="HTW Logo"></footer></body></html> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hugoes on IMI Showtime</title><link>https://htw-imi-showtime.github.io/preview/hugo/</link><description>Recent content in Hugoes on IMI Showtime</description><generator>Hugo</generator><language>de-de</language><atom:link href="https://htw-imi-showtime.github.io/preview/hugo/index.xml" rel="self" type="application/rss+xml"/><item><title>Showtime Website History</title><link>https://htw-imi-showtime.github.io/preview/hugo/history/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://htw-imi-showtime.github.io/preview/hugo/history/</guid><description>For now, this is a gitinfo tryout: | ||
(see https://gohugo.io/methods/page/gitinfo/) | ||
GitInfo: Gitinfo: AbbreviatedHash: 29266a4 AuthorDate: 2024-07-26 AuthorEmail: 167428823&#43;PaulSchiffner@users.noreply.github.com AuthorName: PaulSchiffner CommitDate: 2024-07-26 Hash: 29266a41512c7abf4f46a2b5e7becb935cbf3923 Subject: B4 sempy add images,gif and fix grammar, spelling (#388)</description></item></channel></rss> | ||
GitInfo: Gitinfo: AbbreviatedHash: 4ede622 AuthorDate: 2024-07-26 AuthorEmail: 95168446&#43;lauralgh@users.noreply.github.com AuthorName: lauralgh CommitDate: 2024-07-26 Hash: 4ede6221ca812b165fa305c78a22f16ed9a54d48 Subject: M4 - final website version (#389)</description></item></channel></rss> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
<!doctype html><html lang=en dir=ltr><meta name=viewport content="width=device-width,initial-scale=1"><meta charset=utf-8><meta charset=UTF-8><link rel=stylesheet href=/preview/css/style.min.d2318b68476f24cbf3792f1a3bf81da83f1aafcaea0d063b7fc8e6aa4440fb1d.css integrity="sha256-0jGLaEdvJMvzeS8aO/gdqD8ar8rqDQY7f8jmqkRA+x0=" crossorigin=anonymous><script src=https://htw-imi-showtime.github.io/preview/js/title-animate.js></script><script src=https://htw-imi-showtime.github.io/preview/js/semester-select.js></script><script src=https://htw-imi-showtime.github.io/preview/js/secret.js></script><title>IMI Showtime - HTW Berlin</title> | ||
<body><nav class="background master"><a href=https://htw-imi-showtime.github.io/preview/ class=animate-trigger><strong>IMI<span class=light>×</span><span class=animate>ST</span></strong> <span class=light>2024</span></a><div class=warn>Preview Site! | ||
(without drafts)</div><ul><li><a href=https://htw-imi-showtime.github.io/preview/projects>Projects</a></li><li><a href=https://htw-imi-showtime.github.io/preview/schedule>Schedule</a></li><li><a href=https://htw-imi-showtime.github.io/preview/dates>Dates</a></li><li><a href=/preview/ws23>Archive</a></li><li><a href=https://htw-imi-showtime.github.io/preview/contact>Contact</a></li><li><a href=/preview/ws23/bachelor/b1-peek/><</a> | ||
<a href=/preview/ss24/>ss24</a> | ||
<a href=/preview/ss20/bachelor/b1-hexarcade/>></a> | ||
<a href=/preview/ss24/master/m3-mindcraft/><</a> | ||
M4 | ||
<a href=/preview/ss24/master/m5-capsearch/>></a></li></ul></nav><div class="background master"><header class=project-header><h1><span class=type>M4 Master</span> | ||
OrgXtract | ||
<span class="subtitle master"></span> | ||
<span class=underscore-spacer></span></h1><section style=text-align:right><div class=spacer></div><h4>Team</h4><ul><li>Lorenzo Battiston</li><li>Laura Langhauser</li><li>Niklas Lengert</li><li>Sao Chi Pham</li><li>Viet Anh Jimmy Tran</li></ul><h4>Supervision</h4>Stefan Wehrmeyer</section></header></div><div class="project-menu desktop-menu background master"><ul><li><a href=/preview/ss24/master/m4-orgxtract/>Overview</a></li><li><a href=/preview/ss24/master/m4-orgxtract/process/>Process</a></li><li class=active><a href=/preview/ss24/master/m4-orgxtract/challenges/>Challenges</a></li><li><a href=/preview/ss24/master/m4-orgxtract/features/>Features</a></li><li><a href=/preview/ss24/master/m4-orgxtract/techstack/>Tech Stack</a></li><li><a href=/preview/ss24/master/m4-orgxtract/future/>Future</a></li></ul></div><div class="project-menu mobile-menu background master"><ul><li><a href=/preview/ss24/master/m4-orgxtract/>Overview</a></li></ul><input type=checkbox id=project-menu-button> | ||
<label for=project-menu-button><span></span></label><div class=dropdown-menu><ul><li><a href=/preview/ss24/master/m4-orgxtract/process/>Process</a></li><li class=active><a href=/preview/ss24/master/m4-orgxtract/challenges/>Challenges</a></li><li><a href=/preview/ss24/master/m4-orgxtract/features/>Features</a></li><li><a href=/preview/ss24/master/m4-orgxtract/techstack/>Tech Stack</a></li><li><a href=/preview/ss24/master/m4-orgxtract/future/>Future</a></li></ul></div></div><main class=project><section><h3>Large Amounts of Data</h3><div class=spacer></div><div class=content>According to the Federal Office of Administration, there are a total of <strong>966 federal authorities and institutions in Germany</strong> (<a href=https://de.statista.com/statistik/daten/studie/1128113/umfrage/bundesbehoerden-in-deutschland-nach-behoerdenart/>Statista, 20.07.2024</a>). However, there are many more, as the statistics do not include the offices at state level. One of the challenges of the project was to create a tool that would fit as many different organizational charts as possible in order to capture as much data as feasible.</div></section><section><h3>Extracting Text from PDF</h3><div class=spacer></div><div class=content>PDF is a format for displaying documents, not for extracting structural data from them. So, extracting all the data you need from the documents is a challenge. When text is extracted, the output is unstructured, often out of order. Semantic information is completely missing. There are also a large number of <strong>edge cases</strong>, which makes it difficult to write an algorithm that covers every edge case and extracts all the text in a meaningful way.</div></section><section><h3>No Content Standards</h3><div class=spacer></div><div class=content>Every organizational chart is different. Not only do the people who work in different departments differ, but so do their job titles. 
However, even similar roles may have different titles in different departments. For example, it is not a given that the Ministry of Education in Berlin will use the same job titles as its counterpart in Baden-Württemberg. Abbreviations are also common, but they too vary from authority to authority.</div></section><section><h3>Semantic Analysis</h3><div class=spacer></div><div class=content>The challenge for semantic analysis is that the text in organizational charts is not natural language, so natural language processing (NLP) models alone are not sufficient. NLP models can extract meaning from coherent sentences, but not from individual words. This requires the use of Large Language Models (LLMs). Therefore, a <strong>combination of LLMs and NLP models</strong> was used.</div></section></main><footer><section><a href=https://htw-imi-showtime.github.io/preview/ class=animate-trigger><strong>IMI<span class=light>⨯</span><span class=animate>ST</span></strong> | ||
2024</a><ul><li><a href=https://htw-imi-showtime.github.io/preview/contact>Contact</a></li><li><a href=https://htw-imi-showtime.github.io/preview/privacy-policy>Privacy Policy</a></li><li><a href=https://htw-imi-showtime.github.io/preview/imprint>Imprint</a></li></ul></section><img src=https://htw-imi-showtime.github.io/preview/img/htw-logo.png alt="HTW Logo"></footer></body></html> |
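Reviewer note on the "Extracting Text from PDF" section added in the file above: it says extracted text is unstructured and often out of order. The following is a minimal sketch of one common way to restore a rough reading order, assuming each extracted fragment comes with the position of its bounding box. The `Fragment` type, the `reading_order` function, and the `line_tolerance` value are hypothetical names chosen for illustration; they are not taken from the OrgXtract code base, and the project may handle this differently.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A piece of extracted text with the top-left corner of its box (hypothetical structure)."""
    text: str
    x: float  # distance from the left edge of the page
    y: float  # distance from the top edge of the page


def reading_order(fragments, line_tolerance=3.0):
    """Group fragments into lines by similar y, then sort each line left to right."""
    lines = []
    for frag in sorted(fragments, key=lambda f: f.y):
        # A fragment joins the current line if it sits close to that line's first fragment.
        if lines and abs(lines[-1][0].y - frag.y) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    return [" ".join(f.text for f in sorted(line, key=lambda f: f.x)) for line in lines]


# Fragments arrive in arbitrary order, as they might from a PDF text extractor.
fragments = [
    Fragment("Referat", 120, 51),
    Fragment("Abteilung", 10, 10),
    Fragment("Z", 80, 11),
    Fragment("Leitung:", 10, 50),
]
ordered = reading_order(fragments)
```

This heuristic only recovers line order. Organizational charts are laid out as boxes rather than running text, so a real pipeline would likely need to cluster fragments per box first, which is one of the edge cases the section describes.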