ARCHDEVS
Digital Heritage · Data Pipeline · Search

INTELLIV — Romanian Literary Heritage

An intelligent digital platform providing structured access to academic reference collections of Romanian literary heritage — built from raw InDesign exports to Elasticsearch-powered discovery across multiple corpuses.

Type: Digital Humanities
Corpuses: 5+ Literary Collections
Search: Elasticsearch
Status: Production

The Challenge

Centuries of literary scholarship locked in print.

The Problem

Academic reference works in Romanian literary studies existed only as print publications and InDesign files, with no unified digital access, no editorial workflow, and no way to search or update them collaboratively. Their rich content, including formatted text, images, and bibliographies, had to be preserved in the move to digital.

Our Solution

A complete data pipeline: scrape InDesign HTML exports, extract semantic fields via CSS class mapping, serialize to JSONL, bulk-index into Elasticsearch, and serve through a FastAPI backend with React admin frontend — supporting collaborative editing, audit logging, and multi-corpus search.
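
The serialize-and-index step of the pipeline can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the entry fields (`id`, `headword`, `body`) and the index name `eliv` are assumptions made for the example. The only fixed part is the Elasticsearch Bulk API body format, which alternates an action line with a document line and ends with a newline.

```python
import json
from typing import Iterable


def to_jsonl(entries: Iterable[dict]) -> str:
    """Serialize entries as JSONL: one JSON object per line."""
    # ensure_ascii=False keeps Romanian diacritics readable in the file.
    return "".join(json.dumps(e, ensure_ascii=False) + "\n" for e in entries)


def to_bulk_body(jsonl: str, index: str) -> str:
    """Turn JSONL into a Bulk API body: action line, then document line."""
    lines = []
    # Split on "\n" only, so other Unicode line separators inside a
    # JSON string cannot break a record apart.
    for raw in jsonl.split("\n"):
        if not raw.strip():
            continue
        doc = json.loads(raw)
        # "id" is a hypothetical field used here as the document _id.
        action = {"index": {"_index": index, "_id": doc["id"]}}
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The Bulk API requires the body to end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample = [{"id": "eliv-1", "headword": "Țară", "body": "Sample text."}]
    print(to_bulk_body(to_jsonl(sample), index="eliv"), end="")
```

The resulting string would be sent to the `_bulk` endpoint with the `application/x-ndjson` content type; batching, retries, and error handling are left out of the sketch.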

Key Features

From raw HTML to scholarship.

Data Extraction Pipeline

BeautifulSoup4 scraper maps CSS classes to semantic fields, reconstructs hierarchical text from flat HTML, and extracts image references.
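
To make the class-mapping idea concrete, here is a small sketch. Note the substitution: it uses Python's standard-library `html.parser` instead of BeautifulSoup4 so that it runs without dependencies, and the CSS class names and field names in `CLASS_TO_FIELD` are hypothetical, since the real InDesign export classes are not listed here.

```python
from html.parser import HTMLParser

# Hypothetical mapping from exported CSS classes to semantic fields.
CLASS_TO_FIELD = {
    "Titlu-articol": "headword",
    "Text-articol": "body",
    "Bibliografie": "bibliography",
}


class EntryExtractor(HTMLParser):
    """Groups flat <p> elements under semantic fields and collects image paths."""

    def __init__(self):
        super().__init__()
        self.fields = {}      # field name -> list of paragraph texts
        self.images = []      # src values of <img> tags, in document order
        self._current = None  # field of the <p> being read, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "p":
            classes = (attrs.get("class") or "").split()
            self._current = next(
                (CLASS_TO_FIELD[c] for c in classes if c in CLASS_TO_FIELD), None
            )
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested inline tags (span, i, b) still lands here.
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._current is not None:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if text:
                self.fields.setdefault(self._current, []).append(text)
            self._current = None


def extract_entry(html: str) -> dict:
    parser = EntryExtractor()
    parser.feed(html)
    parser.close()
    return {"fields": parser.fields, "images": parser.images}
```

Paragraphs whose class is not in the mapping are skipped, which keeps layout-only elements out of the structured output.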


Multi-Corpus Search

Elasticsearch full-text search across 5+ corpuses (ELIV, CLRV, HLRV, TLVR, DCLR) with Romanian diacritics handling and alphabetical navigation.
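
One common way to handle Romanian diacritics is to normalize text before indexing and querying. The sketch below is an assumption about how that could be done, not a description of this platform's analyzer: it maps the legacy cedilla letters (Ş, Ţ) to the correct comma-below forms (Ș, Ț) and builds a folded form so that a query typed without diacritics still matches.

```python
import unicodedata

# Legacy cedilla code points mapped to the Romanian comma-below letters.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015e": "\u0218",  # Ş -> Ș
    "\u015f": "\u0219",  # ş -> ș
    "\u0162": "\u021a",  # Ţ -> Ț
    "\u0163": "\u021b",  # ţ -> ț
})


def normalize_ro(text: str) -> str:
    """Return NFC text with cedilla variants replaced by comma-below letters."""
    return unicodedata.normalize("NFC", text).translate(CEDILLA_TO_COMMA)


def fold_ro(text: str) -> str:
    """Return a lowercase, diacritic-free form suitable for loose matching."""
    decomposed = unicodedata.normalize("NFD", normalize_ro(text))
    # Drop combining marks (breve, circumflex, comma below).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()
```

In this scheme `normalize_ro` would feed the displayed and stored text, while `fold_ro` would feed a secondary search field, so "tara" and "Ţară" both resolve to the same folded key.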

Rich Text Editing

CKEditor5 and React-Quill for collaborative editing of scholarly entries with image management, captions, and formatting preservation.

Chronological Browsing

Timeline-based navigation for historical corpuses such as DCLR, plus alphabetical A-Z navigation that includes the Romanian-specific letters (Ă, Â, Î, Ș, Ț).
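
Alphabetical navigation with Romanian letters needs a sort order in which Ă and Â follow A, Î follows I, Ș follows S, and Ț follows T. The following is a simplified sketch of that idea; the function names are hypothetical, and it assumes headwords already use the comma-below forms of Ș and Ț. A production system might rely on a locale-aware collation instead.

```python
from collections import defaultdict

# The 31-letter Romanian alphabet in dictionary order.
RO_ALPHABET = "AĂÂBCDEFGHIÎJKLMNOPQRSȘTȚUVWXYZ"
_RANK = {ch: i for i, ch in enumerate(RO_ALPHABET)}


def sort_key(headword: str) -> tuple:
    """Letter-by-letter key; characters outside the alphabet are ignored."""
    return tuple(_RANK[ch] for ch in headword.upper() if ch in _RANK)


def build_index(headwords) -> dict:
    """Group headwords under their first letter, in Romanian order."""
    groups = defaultdict(list)
    for hw in headwords:
        first = hw.strip()[:1].upper()
        # Anything not starting with a Romanian letter goes under "#".
        groups[first if first in _RANK else "#"].append(hw)

    ordered = {}
    for letter in RO_ALPHABET:
        if letter in groups:
            ordered[letter] = sorted(groups[letter], key=sort_key)
    if "#" in groups:
        ordered["#"] = sorted(groups["#"], key=sort_key)
    return ordered
```

The keys of the returned dictionary can drive the A-Z navigation bar directly, since only letters that actually have entries appear.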

Audit Logging

Full audit trail of all modifications — who changed what, when, from which IP — maintaining academic integrity across collaborative workflows.
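
An audit record of this kind could look roughly like the sketch below. The class, field names, and diff format are illustrative assumptions rather than the platform's actual schema; the point is simply that each record captures who, what, when, and from which IP.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    user: str        # who made the change
    entry_id: str    # which entry was changed
    ip: str          # originating IP address
    timestamp: str   # ISO 8601, UTC
    changes: dict    # field -> {"old": ..., "new": ...}


def diff_fields(old: dict, new: dict) -> dict:
    """Return only the fields whose values differ between two versions."""
    keys = sorted(old.keys() | new.keys())
    return {
        k: {"old": old.get(k), "new": new.get(k)}
        for k in keys
        if old.get(k) != new.get(k)
    }


def make_audit_record(user: str, entry_id: str, ip: str, old: dict, new: dict,
                      now: Optional[datetime] = None) -> AuditRecord:
    """Build an immutable record of one modification."""
    now = now or datetime.now(timezone.utc)
    return AuditRecord(
        user=user,
        entry_id=entry_id,
        ip=ip,
        timestamp=now.isoformat(),
        changes=diff_fields(old, new),
    )
```

Storing only the changed fields keeps records small while still allowing a reviewer to reconstruct what an edit did.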

Bilingual Support

Romanian and English interface with role-based permissions ensuring only authorized scholars can edit entries.

Tech Stack

Pipeline to platform.

Data Pipeline

Python 3 · BeautifulSoup4 · Regex · JSONL

Search

Elasticsearch · Bulk API · Elastic APM

Backend

FastAPI · SQLAlchemy · MySQL · JWT · bcrypt

Frontend

React 18 · Material-UI v5 · CKEditor5 · React-Quill · Chart.js

Have a similar challenge?

Let's build your next platform together.
