Workflow Type: Galaxy

V 20 Renamed several output datasets in workflow

Associated Tutorial

This workflow is part of the tutorial Text-Mining Differences in Chinese Newspaper Articles, available in the GTN.

Features

Thanks to...

Workflow Author(s): Daniela Schneider

Tutorial Author(s): Daniela Schneider

Tutorial Contributor(s): Björn Grüning, Daniela Schneider, Saskia Hiltemann, Teresa Müller

Funder(s): German Competence Center Cloud Technologies for Data Management and Processing, Ministry of Science, Research and Arts, German Network for Bioinformatics Infrastructure Service, Training, Cooperations & Cloud Computing


Inputs

ID Name Description Type
Input censored text #main/Input censored text Upload the censored text containing replacement characters like ‘×’.
  • File
Input uncensored text #main/Input uncensored text Upload the uncensored text without replacement characters.
  • File

Steps

ID Name Description
2 Preprocessing of censored text This step uses regular expressions to delete all whitespace (\s) and place one character per line (\1\n). The result is a cleaned and reformatted text with exactly one character per line. toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/9.5+galaxy3
3 Preprocessing of uncensored text This step uses regular expressions to delete all whitespace (\s) and place one character per line (\1\n). The result is a cleaned and reformatted text with exactly one character per line. toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/9.5+galaxy3
4 Comparison with diff - user version The diff tool compares the two cleaned texts. This version (HTML version) creates an HTML file that colour-codes differences as additions (green) or deletions (red) when comparing the texts. toolshed.g2.bx.psu.edu/repos/bgruening/diff/diff/3.10+galaxy1
5 Comparison with diff - computer version The diff tool compares the two cleaned texts. This version of the output (raw output) is used for the further steps of the analysis. Because it is less intuitive for users, the other diff step provides a more visual version of the output (HTML). toolshed.g2.bx.psu.edu/repos/bgruening/diff/diff/3.10+galaxy1
6 Select only censored lines This step selects all lines from the diff file that contain the censorship symbol ×. Grep1
7 Compute This step unifies the formatting and adds potentially missing columns in case lines extracted earlier come up empty in the second text. This ensures the proper number of columns so that the next steps run smoothly. toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/2.1+galaxy0
8 Cut This step selects only column 9, which contains the uncensored characters from text two. The result is a single column with one Chinese character per row. This allows the word cloud in a later step to scale characters by frequency, meaning characters that appear more often appear bigger, making the results evident at first sight. Cut1
9 Datamash This step counts how often each character appears in the preceding table. toolshed.g2.bx.psu.edu/repos/iuc/datamash_ops/datamash_ops/1.9+galaxy0
10 Generate a word cloud This step shows which characters were censored in the first text. The bigger the character, the more often it appeared in the text. toolshed.g2.bx.psu.edu/repos/bgruening/wordcloud/wordcloud/1.9.6+galaxy0
11 Sort Sorts the quantified results from those appearing most to those appearing least. sort1
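
The logic of the steps above can be approximated outside Galaxy with a short Python sketch. This is not the workflow's implementation: it swaps in Python's difflib.SequenceMatcher for the Galaxy diff tool, so the column layout of the real diff output (including "column 9") is not reproduced. The find pattern (.) is an assumption, since the step description only names the replacement \1\n, and the function names are hypothetical.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

CENSOR_MARK = "×"  # replacement character named in the Inputs table


def one_char_per_line(text):
    """Steps 2-3: remove all whitespace, then put each character on its own line."""
    cleaned = re.sub(r"\s", "", text)
    # The find pattern (.) is an assumption; the source only gives the replacement \1\n.
    return re.sub(r"(.)", r"\1\n", cleaned).splitlines()


def censored_pairs(censored, uncensored):
    """Steps 5-8 (approximated): pair each censor mark with the character it replaced."""
    a = one_char_per_line(censored)
    b = one_char_per_line(uncensored)
    # difflib stands in for the Galaxy diff tool here (an assumption of this sketch).
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag not in ("replace", "delete"):
            continue  # unchanged or inserted-only regions hold no censor marks
        for offset, char in enumerate(a[i1:i2]):
            if char != CENSOR_MARK:
                continue  # step 6: keep only entries containing the censor mark
            j = j1 + offset
            # Step 7: pad with an empty value when the second text has no counterpart.
            pairs.append((char, b[j] if j < j2 else ""))
    return pairs


def count_censored(censored, uncensored):
    """Steps 9 and 11: count each uncensored character, most frequent first."""
    counts = Counter(u for _, u in censored_pairs(censored, uncensored) if u)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Calling count_censored with the contents of the two input files would yield a list of (character, count) tuples comparable in spirit to the sorted table from step 11. The same counts could feed a word cloud, which this sketch leaves out.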

Outputs

ID Name Description Type
output_csv #main/output_csv n/a
  • File
output_graphic #main/output_graphic n/a
  • File

Version History

3.0 (latest) Created 11th May 2026 at 14:58 by GTN Bot

Added/updated 4 files


Open master d649876

2.0 Created 4th May 2026 at 14:31 by GTN Bot

Added/updated 4 files


Frozen 2.0 6d411f2

1.0 (earliest) Created 2nd Jun 2025 at 11:01 by GTN Bot

Added/updated 4 files


Frozen 1.0 3f39b6d
Creators and Submitter
Creators
Not specified
Activity

Views: 2924   Downloads: 391   Runs: 0

Created: 2nd Jun 2025 at 11:01

Last updated: 11th May 2026 at 14:58

Annotated Properties
Scientific disciplines
Computer Science
Attributions

None

Total size: 103 KB