
Commit

bkleinen committed Jul 26, 2024
1 parent 4a29cd1 commit 18728b8
Showing 21 changed files with 70 additions and 15 deletions.
2 changes: 1 addition & 1 deletion hugo/history/index.html
@@ -1,5 +1,5 @@
<!doctype html><html lang=en dir=ltr><meta name=viewport content="width=device-width,initial-scale=1"><meta charset=utf-8><meta charset=UTF-8><link rel=stylesheet href=/preview/css/style.min.d2318b68476f24cbf3792f1a3bf81da83f1aafcaea0d063b7fc8e6aa4440fb1d.css integrity="sha256-0jGLaEdvJMvzeS8aO/gdqD8ar8rqDQY7f8jmqkRA+x0=" crossorigin=anonymous><script src=https://htw-imi-showtime.github.io/preview/js/title-animate.js></script><script src=https://htw-imi-showtime.github.io/preview/js/semester-select.js></script><script src=https://htw-imi-showtime.github.io/preview/js/secret.js></script><title>IMI Showtime - HTW Berlin</title>
<body><nav class="background hugo"><a href=https://htw-imi-showtime.github.io/preview/ class=animate-trigger><strong>IMI<span class=light>×</span><span class=animate>ST</span></strong> <span class=light>2024</span></a><div class=warn>Preview Site!
(without drafts)</div><ul><li><a href=https://htw-imi-showtime.github.io/preview/projects>Projects</a></li><li><a href=https://htw-imi-showtime.github.io/preview/schedule>Schedule</a></li><li><a href=https://htw-imi-showtime.github.io/preview/dates>Dates</a></li><li><a href=/preview/ws23>Archive</a></li><li><a href=https://htw-imi-showtime.github.io/preview/contact>Contact</a></li><li><a href=/preview/ss24/bachelor/b1-infinite-zoom/>Browse Projects</a></li></ul></nav><header><h1>Showtime Website History
<span class=underscore-spacer> </span></h1><section class=underscore></section></header><main><p>For now, this is a gitinfo tryout:</p><p>(see <a href=https://gohugo.io/methods/page/gitinfo/>https://gohugo.io/methods/page/gitinfo/</a>)</p><h2 id=gitinfo>GitInfo:</h2><p>Gitinfo:</p><p>AbbreviatedHash: 29266a4</p><p>AuthorDate: 2024-07-26</p><p>AuthorEmail: 167428823+PaulSchiffner@users.noreply.github.com</p><p>AuthorName: PaulSchiffner</p><p>CommitDate: 2024-07-26</p><p>Hash: 29266a41512c7abf4f46a2b5e7becb935cbf3923</p><p>Subject: B4 sempy add images,gif and fix grammar, spelling (#388)</p></main><footer><section><a href=https://htw-imi-showtime.github.io/preview/ class=animate-trigger><strong>IMI<span class=light></span><span class=animate>ST</span></strong>
<span class=underscore-spacer> </span></h1><section class=underscore></section></header><main><p>For now, this is a gitinfo tryout:</p><p>(see <a href=https://gohugo.io/methods/page/gitinfo/>https://gohugo.io/methods/page/gitinfo/</a>)</p><h2 id=gitinfo>GitInfo:</h2><p>Gitinfo:</p><p>AbbreviatedHash: 4ede622</p><p>AuthorDate: 2024-07-26</p><p>AuthorEmail: 95168446+lauralgh@users.noreply.github.com</p><p>AuthorName: lauralgh</p><p>CommitDate: 2024-07-26</p><p>Hash: 4ede6221ca812b165fa305c78a22f16ed9a54d48</p><p>Subject: M4 - final website version (#389)</p></main><footer><section><a href=https://htw-imi-showtime.github.io/preview/ class=animate-trigger><strong>IMI<span class=light></span><span class=animate>ST</span></strong>
2024</a><ul><li><a href=https://htw-imi-showtime.github.io/preview/contact>Contact</a></li><li><a href=https://htw-imi-showtime.github.io/preview/privacy-policy>Privacy Policy</a></li><li><a href=https://htw-imi-showtime.github.io/preview/imprint>Imprint</a></li></ul></section><img src=https://htw-imi-showtime.github.io/preview/img/htw-logo.png alt="HTW Logo"></footer></body></html>
2 changes: 1 addition & 1 deletion hugo/index.xml
@@ -1,3 +1,3 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hugoes on IMI Showtime</title><link>https://htw-imi-showtime.github.io/preview/hugo/</link><description>Recent content in Hugoes on IMI Showtime</description><generator>Hugo</generator><language>de-de</language><atom:link href="https://htw-imi-showtime.github.io/preview/hugo/index.xml" rel="self" type="application/rss+xml"/><item><title>Showtime Website History</title><link>https://htw-imi-showtime.github.io/preview/hugo/history/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://htw-imi-showtime.github.io/preview/hugo/history/</guid><description>For now, this is a gitinfo tryout:
(see https://gohugo.io/methods/page/gitinfo/)
GitInfo: Gitinfo: AbbreviatedHash: 29266a4 AuthorDate: 2024-07-26 AuthorEmail: 167428823&amp;#43;PaulSchiffner@users.noreply.github.com AuthorName: PaulSchiffner CommitDate: 2024-07-26 Hash: 29266a41512c7abf4f46a2b5e7becb935cbf3923 Subject: B4 sempy add images,gif and fix grammar, spelling (#388)</description></item></channel></rss>
GitInfo: Gitinfo: AbbreviatedHash: 4ede622 AuthorDate: 2024-07-26 AuthorEmail: 95168446&amp;#43;lauralgh@users.noreply.github.com AuthorName: lauralgh CommitDate: 2024-07-26 Hash: 4ede6221ca812b165fa305c78a22f16ed9a54d48 Subject: M4 - final website version (#389)</description></item></channel></rss>
7 changes: 4 additions & 3 deletions index.xml

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion projects/index.html
@@ -21,7 +21,7 @@
<a href=/preview/ss24/master/m3-mindcraft/><h3>MindCraft
<span class=project-subtitle>Entwicklung einer Serious Game-Suite zur Förderung der kognitiven Leistungsfähigkeit bei psychischen Erkrankungen</span></h3></a><p>Ein in Kooperation mit Psycholog*innen entwickeltes Serious Game für die Verbesserung kognitiver Fähigkeiten.</p><a href=/preview/ss24/master/m3-mindcraft/>-> Details</a></div><img src=/preview/ss24/master/m3-mindcraft/brain.png alt=MindCraft></section><hr class=alternate><p id=M4></p><section class=projects-list-entry><div><span class=type>M4 Master</span>
<a href=/preview/ss24/master/m4-orgxtract/><h3>OrgXtract
<span class=project-subtitle></span></h3></a><p>In our quest to challenge German bureaucracy, our project transforms organizational charts stored as PDFs into machine-readable open data formats, enhancing research capabilities and promoting transparency within German authorities.</p><a href=/preview/ss24/master/m4-orgxtract/>-> Details</a></div><img src=/preview/ss24/master/m4-orgxtract/logo_orgxtract.jpg alt=OrgXtract></section><hr class=alternate><p id=M5></p><section class=projects-list-entry><div><span class=type>M5 Master</span>
<span class=project-subtitle></span></h3></a><p>In our quest to challenge German bureaucracy, our project transforms organizational charts stored as PDFs into machine-readable open data formats, enhancing research capabilities and promoting transparency of German authorities.</p><a href=/preview/ss24/master/m4-orgxtract/>-> Details</a></div><img src=/preview/ss24/master/m4-orgxtract/logo_orgxtract.jpg alt=OrgXtract></section><hr class=alternate><p id=M5></p><section class=projects-list-entry><div><span class=type>M5 Master</span>
<a href=/preview/ss24/master/m5-capsearch/><h3>Capsearch: Bridging Talent and Career
<span class=project-subtitle></span></h3></a><p>CapSearch, an AI-powered platform, assists employees with project assignments. By analyzing profiles, skills, and goals, it provides personalized job recommendations, streamlines project assignments, and enhances efficiency.</p><a href=/preview/ss24/master/m5-capsearch/>-> Details</a></div><img src=/preview/ss24/master/m5-capsearch/logo.png alt="Capsearch: Bridging Talent and Career"></section><hr class=alternate></main><footer><section><a href=https://htw-imi-showtime.github.io/preview/ class=animate-trigger><strong>IMI<span class=light></span><span class=animate>ST</span></strong>
2024</a><ul><li><a href=https://htw-imi-showtime.github.io/preview/contact>Contact</a></li><li><a href=https://htw-imi-showtime.github.io/preview/privacy-policy>Privacy Policy</a></li><li><a href=https://htw-imi-showtime.github.io/preview/imprint>Imprint</a></li></ul></section><img src=https://htw-imi-showtime.github.io/preview/img/htw-logo.png alt="HTW Logo"></footer></body></html>
2 changes: 1 addition & 1 deletion sitemap.xml

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion ss24/index.html
@@ -22,7 +22,7 @@
<a href=/preview/ss24/master/m3-mindcraft/><h3>MindCraft
<span class=project-subtitle>Entwicklung einer Serious Game-Suite zur Förderung der kognitiven Leistungsfähigkeit bei psychischen Erkrankungen</span></h3></a><p>Ein in Kooperation mit Psycholog*innen entwickeltes Serious Game für die Verbesserung kognitiver Fähigkeiten.</p><a href=/preview/ss24/master/m3-mindcraft/>-> Details</a></div><img src=/preview/ss24/master/m3-mindcraft/brain.png alt=MindCraft></section><hr class=alternate><p id=M4></p><section class=projects-list-entry><div><span class=type>M4 Master</span>
<a href=/preview/ss24/master/m4-orgxtract/><h3>OrgXtract
<span class=project-subtitle></span></h3></a><p>In our quest to challenge German bureaucracy, our project transforms organizational charts stored as PDFs into machine-readable open data formats, enhancing research capabilities and promoting transparency within German authorities.</p><a href=/preview/ss24/master/m4-orgxtract/>-> Details</a></div><img src=/preview/ss24/master/m4-orgxtract/logo_orgxtract.jpg alt=OrgXtract></section><hr class=alternate><p id=M5></p><section class=projects-list-entry><div><span class=type>M5 Master</span>
<span class=project-subtitle></span></h3></a><p>In our quest to challenge German bureaucracy, our project transforms organizational charts stored as PDFs into machine-readable open data formats, enhancing research capabilities and promoting transparency of German authorities.</p><a href=/preview/ss24/master/m4-orgxtract/>-> Details</a></div><img src=/preview/ss24/master/m4-orgxtract/logo_orgxtract.jpg alt=OrgXtract></section><hr class=alternate><p id=M5></p><section class=projects-list-entry><div><span class=type>M5 Master</span>
<a href=/preview/ss24/master/m5-capsearch/><h3>Capsearch: Bridging Talent and Career
<span class=project-subtitle></span></h3></a><p>CapSearch, an AI-powered platform, assists employees with project assignments. By analyzing profiles, skills, and goals, it provides personalized job recommendations, streamlines project assignments, and enhances efficiency.</p><a href=/preview/ss24/master/m5-capsearch/>-> Details</a></div><img src=/preview/ss24/master/m5-capsearch/logo.png alt="Capsearch: Bridging Talent and Career"></section><hr class=alternate></main><footer><section><a href=https://htw-imi-showtime.github.io/preview/ class=animate-trigger><strong>IMI<span class=light></span><span class=animate>ST</span></strong>
2024</a><ul><li><a href=https://htw-imi-showtime.github.io/preview/contact>Contact</a></li><li><a href=https://htw-imi-showtime.github.io/preview/privacy-policy>Privacy Policy</a></li><li><a href=https://htw-imi-showtime.github.io/preview/imprint>Imprint</a></li></ul></section><img src=https://htw-imi-showtime.github.io/preview/img/htw-logo.png alt="HTW Logo"></footer></body></html>
Binary file added ss24/master/m4-orgxtract/architecture.png
13 changes: 13 additions & 0 deletions ss24/master/m4-orgxtract/challenges/index.html
@@ -0,0 +1,13 @@
<!doctype html><html lang=en dir=ltr><meta name=viewport content="width=device-width,initial-scale=1"><meta charset=utf-8><meta charset=UTF-8><link rel=stylesheet href=/preview/css/style.min.d2318b68476f24cbf3792f1a3bf81da83f1aafcaea0d063b7fc8e6aa4440fb1d.css integrity="sha256-0jGLaEdvJMvzeS8aO/gdqD8ar8rqDQY7f8jmqkRA+x0=" crossorigin=anonymous><script src=https://htw-imi-showtime.github.io/preview/js/title-animate.js></script><script src=https://htw-imi-showtime.github.io/preview/js/semester-select.js></script><script src=https://htw-imi-showtime.github.io/preview/js/secret.js></script><title>IMI Showtime - HTW Berlin</title>
<body><nav class="background master"><a href=https://htw-imi-showtime.github.io/preview/ class=animate-trigger><strong>IMI<span class=light>×</span><span class=animate>ST</span></strong> <span class=light>2024</span></a><div class=warn>Preview Site!
(without drafts)</div><ul><li><a href=https://htw-imi-showtime.github.io/preview/projects>Projects</a></li><li><a href=https://htw-imi-showtime.github.io/preview/schedule>Schedule</a></li><li><a href=https://htw-imi-showtime.github.io/preview/dates>Dates</a></li><li><a href=/preview/ws23>Archive</a></li><li><a href=https://htw-imi-showtime.github.io/preview/contact>Contact</a></li><li><a href=/preview/ws23/bachelor/b1-peek/>&lt;</a>
<a href=/preview/ss24/>ss24</a>
<a href=/preview/ss20/bachelor/b1-hexarcade/>></a>
<a href=/preview/ss24/master/m3-mindcraft/>&lt;</a>
M4
<a href=/preview/ss24/master/m5-capsearch/>></a></li></ul></nav><div class="background master"><header class=project-header><h1><span class=type>M4 Master</span>
OrgXtract
<span class="subtitle master"></span>
<span class=underscore-spacer></span></h1><section style=text-align:right><div class=spacer></div><h4>Team</h4><ul><li>Lorenzo Battiston</li><li>Laura Langhauser</li><li>Niklas Lengert</li><li>Sao Chi Pham</li><li>Viet Anh Jimmy Tran</li></ul><h4>Supervision</h4>Stefan Wehrmeyer</section></header></div><div class="project-menu desktop-menu background master"><ul><li><a href=/preview/ss24/master/m4-orgxtract/>Overview</a></li><li><a href=/preview/ss24/master/m4-orgxtract/process/>Process</a></li><li class=active><a href=/preview/ss24/master/m4-orgxtract/challenges/>Challenges</a></li><li><a href=/preview/ss24/master/m4-orgxtract/features/>Features</a></li><li><a href=/preview/ss24/master/m4-orgxtract/techstack/>Tech Stack</a></li><li><a href=/preview/ss24/master/m4-orgxtract/future/>Future</a></li></ul></div><div class="project-menu mobile-menu background master"><ul><li><a href=/preview/ss24/master/m4-orgxtract/>Overview</a></li></ul><input type=checkbox id=project-menu-button>
<label for=project-menu-button><span></span></label><div class=dropdown-menu><ul><li><a href=/preview/ss24/master/m4-orgxtract/process/>Process</a></li><li class=active><a href=/preview/ss24/master/m4-orgxtract/challenges/>Challenges</a></li><li><a href=/preview/ss24/master/m4-orgxtract/features/>Features</a></li><li><a href=/preview/ss24/master/m4-orgxtract/techstack/>Tech Stack</a></li><li><a href=/preview/ss24/master/m4-orgxtract/future/>Future</a></li></ul></div></div><main class=project><section><h3>Large Amounts of Data</h3><div class=spacer></div><div class=content>According to the Federal Office of Administration, there are a total of <strong>966 federal authorities and institutions in Germany</strong> (<a href=https://de.statista.com/statistik/daten/studie/1128113/umfrage/bundesbehoerden-in-deutschland-nach-behoerdenart/>Statista, 20.07.2024</a>). The actual number of authorities is considerably higher, since these statistics do not include offices at state level. One challenge of the project was therefore to build a tool that works with as many different organizational charts as possible, in order to capture as much data as feasible.</div></section><section><h3>Extracting Text from PDF</h3><div class=spacer></div><div class=content>PDF is a format for displaying documents, not for extracting structured data from them, so retrieving all the required data from these documents is a challenge. Extracted text is unstructured and often out of order, and semantic information is lost entirely. There is also a large number of <strong>edge cases</strong>, which makes it difficult to write a single algorithm that handles all of them and extracts the text in a meaningful way.</div></section><section><h3>No Content Standards</h3><div class=spacer></div><div class=content>Every organizational chart is different. Not only do the people working in different departments differ, but so do their job titles.
Even equivalent positions may carry different titles in different departments. For example, the Ministry of Education in Berlin does not necessarily use the same job titles as its counterpart in Baden-Württemberg. Abbreviations are also common, but they are not standardized either and may vary from authority to authority.</div></section><section><h3>Semantic Analysis</h3><div class=spacer></div><div class=content>The challenge for semantic analysis is that the text in organizational charts is not natural language: it consists largely of individual words and abbreviations rather than full sentences, so natural language processing (NLP) models alone are not sufficient. NLP models can extract meaningful information from coherent sentences, but not from individual words. This gap required Large Language Models (LLMs), so a <strong>combination of LLMs and NLP models</strong> was used.</div></section></main><footer><section><a href=https://htw-imi-showtime.github.io/preview/ class=animate-trigger><strong>IMI<span class=light></span><span class=animate>ST</span></strong>
2024</a><ul><li><a href=https://htw-imi-showtime.github.io/preview/contact>Contact</a></li><li><a href=https://htw-imi-showtime.github.io/preview/privacy-policy>Privacy Policy</a></li><li><a href=https://htw-imi-showtime.github.io/preview/imprint>Imprint</a></li></ul></section><img src=https://htw-imi-showtime.github.io/preview/img/htw-logo.png alt="HTW Logo"></footer></body></html>
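The "Extracting Text from PDF" section of the Challenges page added in this commit points out that text extracted from PDFs often comes out of order. The Python sketch below illustrates one common reading-order heuristic: group text fragments into lines by vertical position, then sort each line left to right. It is not OrgXtract's implementation, which is not part of this commit. The `Fragment` type, the `reading_order` function, and the default `line_tolerance` of 3.0 are hypothetical; real extraction libraries report positions in their own formats.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Fragment:
    """Hypothetical extracted text fragment: its text plus top-left position.

    Coordinates are assumed to grow downward (y) and rightward (x).
    """
    text: str
    x: float
    y: float


def reading_order(fragments: Iterable[Fragment], line_tolerance: float = 3.0) -> List[str]:
    """Rebuild reading order: group fragments into lines by y, sort each line by x.

    A fragment joins the current line if its y lies within line_tolerance of
    the y of the fragment that started that line; otherwise it starts a new line.
    """
    lines: List[Tuple[float, List[Fragment]]] = []
    # Visit fragments top to bottom so each line is assembled contiguously.
    for frag in sorted(fragments, key=lambda f: f.y):
        if lines and abs(frag.y - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append((frag.y, [frag]))
    # Within each line, read left to right and join the pieces with spaces.
    return [" ".join(f.text for f in sorted(line, key=lambda f: f.x)) for _, line in lines]


if __name__ == "__main__":
    # Fragments in the scrambled order an extractor might return them.
    scrambled = [
        Fragment("Budget", 80, 51),
        Fragment("Division Z", 10, 20),
        Fragment("Unit Z 1", 10, 52),
        Fragment("Head: N.N.", 90, 21),
    ]
    print(reading_order(scrambled))
    # ['Division Z Head: N.N.', 'Unit Z 1 Budget']
```

A fixed tolerance is the simplest possible choice. It breaks down on multi-column layouts and on chart boxes whose text wraps across several lines, which are exactly the kind of edge cases the section describes.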
Binary file modified ss24/master/m4-orgxtract/chi.jpg
