Subpackage: Tokenization Refinement (src.tokenization.refine_results)

Module: src.tokenization.refine_results.absolute_duration

Module: tokenization.refine_results.absolute_duration.py

This module, a part of the tokenization.refine_results package, refines MIDI data by converting string representations of duration to absolute numerical values.

Functions

corpus_tokenization_refine_data_absolute_duration: Handles a workflow for refining tokenized data by converting duration values to numerical format.
convert_duration_to_numerical: Helper function that converts a duration string to a numerical format.
refine_data_function_absolute_duration: Applies the convert_duration_to_numerical function to the ‘Duration’ column of a DataFrame.

Notes

The module expects CSV files to have a specific structure, including a ‘Duration’ column. Please refer to the individual function docstrings for more detailed descriptions and examples of usage.

iMaT.src.tokenization.refine_results.absolute_duration.convert_duration_to_numerical(duration)

Converts a duration from a string format (“a.b.c”) to a numerical value if the format matches.

The function checks whether the duration starts with the prefix “Duration_”, and if so, removes the prefix before performing the conversion. If the format of the duration doesn’t match the expected format, the function returns the original duration.

Parameters: duration (str) – The duration value to be converted, in “a.b.c” format.
Returns: The converted duration as a numerical value, if the format matches. The original duration, if the format doesn’t match.
Return type: float or str
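
Example (illustrative sketch)

The source does not state how the three parts of “a.b.c” are combined into a number. The sketch below assumes the convention common to MIDI tokenizers, where a is a count of whole beats and b/c is a fraction of a beat. The function name duration_to_number is hypothetical and only stands in for convert_duration_to_numerical; the real implementation may differ.

```python
def duration_to_number(duration):
    """Hypothetical stand-in for convert_duration_to_numerical."""
    # Non-string input cannot match the "a.b.c" format, so it is returned as-is.
    if not isinstance(duration, str):
        return duration

    # Remove the optional "Duration_" prefix before parsing.
    prefix = "Duration_"
    text = duration[len(prefix):] if duration.startswith(prefix) else duration

    parts = text.split(".")
    if len(parts) != 3:
        return duration
    try:
        beats, ticks, resolution = (int(part) for part in parts)
    except ValueError:
        # At least one part is not an integer: the format does not match.
        return duration
    if resolution == 0:
        return duration

    # ASSUMPTION: value = whole beats + fraction of a beat.
    return beats + ticks / resolution


print(duration_to_number("Duration_1.2.4"))
print(duration_to_number("Pitch_60"))
```

Under these assumptions the first call yields 1.5, and the second returns the input unchanged because it does not match the expected format.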

iMaT.src.tokenization.refine_results.absolute_duration.corpus_tokenization_refine_data_absolute_duration()

Executes a workflow for refining tokenized and cleaned CSV data.

This function guides the user to select a CSV file with a predefined naming pattern, performs data refining operations on the data within, and displays a table with the refined results. The user then has an option to save the refined data into a new CSV file.

Parameters: None

Returns: None

See also

select_csv_file_2d_token_representation: Opens a file dialog allowing the user to select a CSV file.
refine_data_function_absolute_duration: Applies the convert_duration_to_numerical function to the ‘Duration’ column of a pandas DataFrame.

iMaT.src.tokenization.refine_results.absolute_duration.refine_data_function_absolute_duration(df)

Applies the convert_duration_to_numerical function to the ‘Duration’ column of a pandas DataFrame.

This function first checks if the ‘Duration’ column exists in the DataFrame. If so, it applies the convert_duration_to_numerical function to each entry in the column, converting string representations of duration into numerical values where possible.

Parameters: df (pandas.DataFrame) – The DataFrame to refine.
Returns: The DataFrame with converted ‘Duration’ column.
Return type: pandas.DataFrame

See also

convert_duration_to_numerical: Converts a duration from a string format (“a.b.c”) to a numerical value if the format matches.
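
Example (illustrative sketch)

A minimal sketch of applying a converter to the ‘Duration’ column, as described for refine_data_function_absolute_duration. The names to_number and refine_duration_column are hypothetical, the beats-plus-fraction reading of “a.b.c” is an assumption, and working on a copy rather than modifying the input in place is a choice made here, not something the source confirms.

```python
import pandas as pd

PREFIX = "Duration_"


def to_number(duration):
    """Hypothetical converter; ASSUMES "a.b.c" means a + b / c."""
    if not isinstance(duration, str):
        return duration
    text = duration[len(PREFIX):] if duration.startswith(PREFIX) else duration
    try:
        beats, ticks, resolution = (int(part) for part in text.split("."))
        return beats + ticks / resolution
    except (ValueError, ZeroDivisionError):
        # Wrong number of parts, non-integer parts, or c == 0: keep the original.
        return duration


def refine_duration_column(df):
    """Convert the 'Duration' column if it exists; otherwise return df unchanged."""
    if "Duration" not in df.columns:
        return df
    refined = df.copy()  # assumption: leave the caller's DataFrame untouched
    refined["Duration"] = refined["Duration"].apply(to_number)
    return refined


tokens = pd.DataFrame(
    {
        "Pitch": [60, 64, 62],
        "Duration": ["Duration_1.0.8", "Duration_0.4.8", "unknown"],
    }
)
refined = refine_duration_column(tokens)
print(refined)
```

Entries that do not match the format, such as "unknown" above, are left as they were, so the resulting column can hold a mix of numbers and strings.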

Module: src.tokenization.refine_results.calculate_pitch_intervals

Module: tokenization.refine_results.calculate_pitch_intervals.py

This module, part of the tokenization.refine_results package, refines tokenized MIDI data by calculating pitch intervals.

Functions

tokenization_calculate_pitch_intervals: Handles a workflow for refining tokenized MIDI data by calculating pitch intervals.
calculate_pitch_intervals_function: Helper function used within tokenization_calculate_pitch_intervals to add a pitch interval column to a DataFrame.

Notes

The module expects CSV files to have a specific structure, including a ‘filename’ column, and pitches should be represented as MIDI pitch values. Please refer to the individual function docstrings for more detailed descriptions and examples of usage.

iMaT.src.tokenization.refine_results.calculate_pitch_intervals.calculate_pitch_intervals_function(df)

Refines a pandas DataFrame by calculating pitch differences between the current row and the next row.

This function first checks if the ‘Pitch’ column exists in the DataFrame. If so, it calculates the pitch differences between the current row and the next row. If a ‘filename’ column exists in the DataFrame, the differences are computed separately for each unique filename. If the ‘Pitch’ column contains non-numeric entries (i.e., entries with a prefix), the prefix is added back to the calculated difference values.

Parameters: df (pandas.DataFrame) – The DataFrame to refine.
Returns: The DataFrame with added ‘PitchDifferenceToNextPitch’ column.
Return type: pandas.DataFrame

See also

pandas.DataFrame.diff: Calculates the difference of a DataFrame element compared with another element in the DataFrame (default is the element in the same column of the previous row).
pandas.DataFrame.shift: Shifts index by desired number of periods with an optional time freq.
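
Example (illustrative sketch)

A rough sketch of the per-file interval calculation described above, for numeric pitches only. The function name add_pitch_intervals is hypothetical. The sign convention (next pitch minus current pitch) is an assumption, since the source only says “differences between the current row and the next row”. Prefix handling is left out.

```python
import pandas as pd


def add_pitch_intervals(df):
    """Hypothetical sketch: add 'PitchDifferenceToNextPitch' for numeric pitches."""
    if "Pitch" not in df.columns:
        return df

    out = df.copy()
    pitch = pd.to_numeric(out["Pitch"], errors="coerce")

    # Look up the pitch of the following row, staying inside each file when
    # a 'filename' column is available.
    if "filename" in out.columns:
        next_pitch = pitch.groupby(out["filename"]).shift(-1)
    else:
        next_pitch = pitch.shift(-1)

    # ASSUMPTION: interval = next pitch - current pitch.
    out["PitchDifferenceToNextPitch"] = next_pitch - pitch
    return out


tokens = pd.DataFrame(
    {
        "filename": ["a.mid", "a.mid", "a.mid", "b.mid", "b.mid"],
        "Pitch": [60, 64, 62, 70, 65],
    }
)
intervals = add_pitch_intervals(tokens)
print(intervals)
```

The last note of each file has no successor, so its interval is NaN. With the sample data the new column is 4, -2, NaN, -5, NaN.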

iMaT.src.tokenization.refine_results.calculate_pitch_intervals.tokenization_calculate_pitch_intervals()

Executes a workflow for refining CSV data by calculating pitch intervals.

This function guides the user to select a CSV file, performs data refining operations to calculate pitch differences between the current row and the next row (grouped by filename if available), and displays a table with the results. The user then has an option to save the refined data into a new CSV file.

Parameters: None

Returns: None

See also

select_csv_file_2d_token_representation: Opens a file dialog allowing the user to select a CSV file.
calculate_pitch_intervals_function: Refines a pandas DataFrame by calculating pitch differences between the current row and the next row, grouping by filename if available.

Module: src.tokenization.refine_results.remove_prefixes

Module: tokenization.refine_results.remove_prefixes.py

This module, part of the tokenization.refine_results package, refines tokenized data in a CSV file by removing unwanted prefixes.

Functions

corpus_tokenization_refine_data_remove_prefixes: Handles a workflow for refining tokenized data by removing unwanted prefixes.
remove_prefixes_function: Helper function used within corpus_tokenization_refine_data_remove_prefixes to refine a DataFrame.

Notes

The module expects CSV files to have a specific structure, including a ‘filename’ column, and is designed to remove prefixes like ‘Ignore_’. Please refer to the individual function docstrings for more detailed descriptions and examples of usage.

iMaT.src.tokenization.refine_results.remove_prefixes.corpus_tokenization_refine_data_remove_prefixes()

Refines CSV data by removing unwanted prefixes from the data.

This function guides the user to select a CSV file, performs data refining operations to remove unwanted prefixes (specifically “Ignore_” prefixes and any prefixes that match column names), and displays a table with the results. The user then has an option to save the refined data into a new CSV file.

Parameters: None

Returns: None

See also

select_csv_file_2d_token_representation: Opens a file dialog allowing the user to select a CSV file.
remove_prefixes_function: Refines the pandas DataFrame by removing unwanted prefixes.

iMaT.src.tokenization.refine_results.remove_prefixes.remove_prefixes_function(df)

Refines a pandas DataFrame by removing unwanted prefixes from the data.

This function removes “Ignore_” prefixes from all entries in the DataFrame. Then, it iterates over each column in the DataFrame and removes any prefixes that match the column name.

Parameters: df (pandas.DataFrame) – The DataFrame to refine.
Returns: The DataFrame with removed unwanted prefixes.
Return type: pandas.DataFrame
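
Example (illustrative sketch)

A small sketch of the two-step prefix removal described above. The function names are hypothetical. It assumes that a column-name prefix is the column name followed by an underscore (for example “Pitch_” in the ‘Pitch’ column); the source does not specify the separator.

```python
import pandas as pd


def _strip_prefix(value, prefix):
    """Remove prefix from value when value is a string that starts with it."""
    if isinstance(value, str) and value.startswith(prefix):
        return value[len(prefix):]
    return value


def strip_prefixes(df):
    """Hypothetical sketch: drop 'Ignore_' and column-name prefixes."""
    out = df.copy()
    for column in out.columns:
        # Step 1: remove the generic "Ignore_" prefix.
        out[column] = out[column].map(lambda v: _strip_prefix(v, "Ignore_"))
        # Step 2: remove the prefix matching the column name.
        # ASSUMPTION: the prefix has the form "<column>_".
        column_prefix = f"{column}_"
        out[column] = out[column].map(lambda v: _strip_prefix(v, column_prefix))
    return out


tokens = pd.DataFrame(
    {
        "Pitch": ["Pitch_60", "Ignore_Pitch_62"],
        "Velocity": ["Velocity_90", "Ignore_None"],
    }
)
cleaned = strip_prefixes(tokens)
print(cleaned)
```

Note that the remaining values are still strings ("60", not 60); converting them to numbers would be a separate step.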

Module: src.tokenization.refine_results.tokens_to_txt

Module: tokenization.refine_results.tokens_to_txt.py

This module, a part of the tokenization.refine_results package, handles the conversion of tokenized data from a CSV file to individual text files.

Functions

tokenization_export_csv_columns_to_txt_file: Exports a CSV file’s contents to individual text files, with directories for each column.
save_txt_files_to_directory: Saves data from a dictionary into text files, organizing the files into directories based on keys.

Notes

The module expects CSV files to have a specific structure, including a ‘filename’ column. Please refer to the individual function docstrings for more detailed descriptions and examples of usage.

iMaT.src.tokenization.refine_results.tokens_to_txt.save_txt_files_to_directory(data, file_path)

Saves a dictionary of data into text files in directories named after each key.

Each key-value pair in the data dictionary represents a filename and its associated data respectively. For each filename, the function creates a directory and within that directory, it creates a text file for each data column and writes the corresponding data into it. These directories are then bundled into a single directory named ‘extracted_data_[current date and time]’.

Parameters:

data (dict) – The dictionary of data to be saved. Each key-value pair represents a filename and its associated data.
file_path (str) – The original file path used to generate the new directory’s name.

Returns:

The path to the newly created directory.

Return type:

str

See also

os.path.dirname: Returns the directory component of a pathname.
os.makedirs: Recursively creates directories.
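
Example (illustrative sketch)

One possible reading of the directory layout described above, using only os.path.dirname and os.makedirs from the standard library. Several details are assumptions not confirmed by the source: that each dictionary value is itself a mapping of column name to text, that the output directory is created next to the original file, the timestamp format, and the “.txt” file naming. The function name save_txt_files is hypothetical.

```python
import os
import tempfile
from datetime import datetime


def save_txt_files(data, file_path):
    """Hypothetical sketch: write data[key][column] to <root>/<key>/<column>.txt."""
    # ASSUMPTION: timestamp format and placement next to the original file.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    parent = os.path.dirname(os.path.abspath(file_path))
    root = os.path.join(parent, f"extracted_data_{stamp}")

    for key, columns in data.items():
        target = os.path.join(root, str(key))
        os.makedirs(target, exist_ok=True)
        # ASSUMPTION: each value maps a column name to its text content.
        for column, text in columns.items():
            path = os.path.join(target, f"{column}.txt")
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(str(text))
    return root


# Usage inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "tokens.csv")
    out_dir = save_txt_files({"song_a.mid": {"Pitch": "60 64 62"}}, csv_path)
    written = os.path.join(out_dir, "song_a.mid", "Pitch.txt")
    with open(written, encoding="utf-8") as handle:
        content = handle.read()
    print(out_dir, content)
```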

iMaT.src.tokenization.refine_results.tokens_to_txt.tokenization_export_csv_columns_to_txt_file()

Exports the columns of a CSV file to individual text files.

This function asks the user to select a CSV file, and then groups the DataFrame by filename. For each group, it concatenates the values of each column into a string. Finally, it saves each column’s concatenated string into individual text files in directories named after each column. These directories are then bundled into a single directory named ‘extracted_data_[current date and time]’.

Parameters: None

Returns: None

See also

select_csv_file_2d_token_representation: Opens a file dialog allowing the user to select a CSV file.
save_txt_files_to_directory: Saves the refined data into text files in directories named after each column.
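
Example (illustrative sketch)

The grouping and concatenation step described for tokenization_export_csv_columns_to_txt_file can be sketched as follows, without the file dialog or the file output. The function name columns_to_strings and the space separator are assumptions; the source does not say how values are joined.

```python
import pandas as pd


def columns_to_strings(df, separator=" "):
    """Hypothetical sketch: per filename, join each column's values into one string."""
    result = {}
    for name, group in df.groupby("filename", sort=False):
        result[name] = {
            column: separator.join(group[column].astype(str))
            for column in group.columns
            if column != "filename"
        }
    return result


tokens = pd.DataFrame(
    {
        "filename": ["a.mid", "a.mid", "b.mid"],
        "Pitch": [60, 64, 70],
        "Duration": ["1.0.8", "0.4.8", "2.0.8"],
    }
)
grouped = columns_to_strings(tokens)
print(grouped)
```

The result is a nested dictionary keyed first by filename and then by column name, which is one plausible shape for the data argument of save_txt_files_to_directory.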