Corpus Filtergraph is a free and open source support toolbox for statistical machine translation. It extracts, filters, aligns and transforms text data from multilingual documents into parallel training data.
Here are some key features of "Corpus Filtergraph":
· Media filter graph metaphor
· Workflow manager for parallel data
· Configuration-driven, modular filters
· Reusable plug-in architecture
· Standardized base-classes
· GIZA++, Moses Decoder, Joshua Decoder compatibility
· extract-tmx-corpus compatibility
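
The filter graph idea behind the features above can be illustrated with a minimal Python sketch. It is not taken from the Corpus Filtergraph code base: the class names (Filter, StripWhitespace, DropEmpty, MaxLengthRatio), the REGISTRY table, the configuration layout and the build_graph and run_graph helpers are all hypothetical, and the graph is simplified to a linear chain of filters over sentence pairs.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, List, Tuple, Type

# A (source sentence, target sentence) pair. Hypothetical data model.
SegmentPair = Tuple[str, str]


class Filter(ABC):
    """Hypothetical common base class: consume pairs, yield pairs."""

    @abstractmethod
    def process(self, pairs: Iterable[SegmentPair]) -> Iterator[SegmentPair]:
        ...


class StripWhitespace(Filter):
    """Transform step: trim leading and trailing whitespace on both sides."""

    def process(self, pairs):
        for src, tgt in pairs:
            yield src.strip(), tgt.strip()


class DropEmpty(Filter):
    """Filter step: discard pairs where either side is empty."""

    def process(self, pairs):
        for src, tgt in pairs:
            if src and tgt:
                yield src, tgt


class MaxLengthRatio(Filter):
    """Filter step: discard pairs whose token counts differ too much."""

    def __init__(self, max_ratio: float = 3.0):
        self.max_ratio = max_ratio

    def process(self, pairs):
        for src, tgt in pairs:
            n_src, n_tgt = len(src.split()), len(tgt.split())
            shorter, longer = min(n_src, n_tgt), max(n_src, n_tgt)
            if shorter > 0 and longer / shorter <= self.max_ratio:
                yield src, tgt


# Plug-in lookup table: configuration names map to filter classes (assumed).
REGISTRY: Dict[str, Type[Filter]] = {
    "strip": StripWhitespace,
    "drop_empty": DropEmpty,
    "max_length_ratio": MaxLengthRatio,
}


def build_graph(config: List[dict]) -> List[Filter]:
    """Instantiate the filters named in a configuration list, in order."""
    return [REGISTRY[step["filter"]](**step.get("options", {})) for step in config]


def run_graph(filters: List[Filter], pairs: Iterable[SegmentPair]) -> List[SegmentPair]:
    """Connect each filter's output to the next filter's input and drain it."""
    stream: Iterable[SegmentPair] = pairs
    for f in filters:
        stream = f.process(stream)
    return list(stream)


config = [
    {"filter": "strip"},
    {"filter": "drop_empty"},
    {"filter": "max_length_ratio", "options": {"max_ratio": 2.0}},
]
pairs = [
    ("  Hello world ", "Hallo Welt"),
    ("", "leer"),
    ("a b c d e f", "x"),
    ("Good morning", "Guten Morgen"),
]
clean = run_graph(build_graph(config), pairs)
```

In this sketch a new processing step only needs to subclass Filter and be added to REGISTRY, which is one plausible reading of a configuration-driven plug-in design with standardized base classes. The surviving pairs could then be written as two line-aligned plain text files, one sentence per line, which is the input layout Moses training expects.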
Requirements:
· Python