Subpackage: Tokenization Refinement (src.tokenization.refine_results)
Module: src.tokenization.refine_results.absolute_duration
Module: tokenization.refine_results.absolute_duration.py
This module, a part of the tokenization.refine_results package, refines MIDI data by converting string representations of duration to absolute numerical values.
Functions
corpus_tokenization_refine_data_absolute_duration: Handles a workflow for refining tokenized data by converting duration values to numerical format.
convert_duration_to_numerical: Helper function that converts a duration string to a numerical format.
refine_data_function_absolute_duration: Applies the convert_duration_to_numerical function to the ‘Duration’ column of a DataFrame.
Notes
The module expects CSV files to have a specific structure, including a ‘Duration’ column. Please refer to the individual function docstrings for more detailed descriptions and examples of usage.
- iMaT.src.tokenization.refine_results.absolute_duration.convert_duration_to_numerical(duration)[source]
Converts a duration from a string format (“a.b.c”) to a numerical value if the format matches.
The function checks whether the duration starts with the prefix “Duration_”, and if so, removes the prefix before performing the conversion. If the format of the duration doesn’t match the expected format, the function returns the original duration.
- Parameters:
duration (str) – The duration value to be converted, in “a.b.c” format, optionally prefixed with “Duration_”.
- Returns:
The converted duration as a numerical value, if the format matches. The original duration, if the format doesn’t match.
- Return type:
float or str
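The behaviour described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the name convert_duration_sketch is hypothetical, and the arithmetic is an assumption, since the source does not state how “a.b.c” maps to a number. The sketch reads it as a whole beats plus b/c of a beat, the convention used by some MIDI tokenizers.

```python
import re

PREFIX = "Duration_"
# Assumed shape of the payload: three dot-separated non-negative integers.
_PATTERN = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def convert_duration_sketch(duration):
    """Return a float for "a.b.c" strings, otherwise the input unchanged."""
    if not isinstance(duration, str):
        return duration
    # Strip the optional prefix before matching.
    text = duration[len(PREFIX):] if duration.startswith(PREFIX) else duration
    match = _PATTERN.fullmatch(text)
    if match is None:
        return duration  # format does not match: hand back the original
    beats, ticks, resolution = (int(group) for group in match.groups())
    if resolution == 0:
        return duration  # avoid dividing by zero; treat as non-matching
    # Assumption: a whole beats plus b/c of a beat.
    return beats + ticks / resolution


print(convert_duration_sketch("Duration_1.4.8"))
print(convert_duration_sketch("Velocity_64"))
```

Under this assumed interpretation, a non-matching value such as "Velocity_64" passes through untouched, which mirrors the documented "float or str" return type.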
- iMaT.src.tokenization.refine_results.absolute_duration.corpus_tokenization_refine_data_absolute_duration()[source]
Executes a workflow for refining tokenized and cleaned CSV data.
This function guides the user to select a CSV file with a predefined naming pattern, performs data refining operations on the data within, and displays a table with the refined results. The user then has an option to save the refined data into a new CSV file.
Parameters: None
Returns: None
See also
select_csv_file_2d_token_representation: Opens a file dialog allowing the user to select a CSV file.
refine_data_function_absolute_duration: Applies the convert_duration_to_numerical function to the ‘Duration’ column of a pandas DataFrame.
- iMaT.src.tokenization.refine_results.absolute_duration.refine_data_function_absolute_duration(df)[source]
Applies the convert_duration_to_numerical function to the ‘Duration’ column of a pandas DataFrame.
This function first checks if the ‘Duration’ column exists in the DataFrame. If so, it applies the convert_duration_to_numerical function to each entry in the column, converting string representations of duration into numerical values where possible.
- Parameters:
df (pandas.DataFrame) – The DataFrame to refine.
- Returns:
The DataFrame with converted ‘Duration’ column.
- Return type:
pandas.DataFrame
See also
convert_duration_to_numerical: Converts a duration from a string format (“a.b.c”) to a numerical value if the format matches.
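A column-level refinement of this kind might look like the sketch below. The names to_number and refine_duration_column are hypothetical; to_number stands in for convert_duration_to_numerical and assumes that “a.b.c” means a + b/c, which the source does not confirm. Whether the real function modifies the DataFrame in place or returns a copy is also not stated; the sketch returns a copy.

```python
import pandas as pd


def to_number(value):
    """Hypothetical stand-in for convert_duration_to_numerical."""
    text = str(value)
    if text.startswith("Duration_"):
        text = text[len("Duration_"):]
    try:
        beats, ticks, resolution = (int(part) for part in text.split("."))
        return beats + ticks / resolution  # assumed meaning of "a.b.c"
    except (ValueError, ZeroDivisionError):
        return value  # not in "a.b.c" form: keep the original entry


def refine_duration_column(df):
    """Convert the 'Duration' column when it exists; otherwise return df as is."""
    if "Duration" not in df.columns:
        return df
    refined = df.copy()
    refined["Duration"] = refined["Duration"].map(to_number)
    return refined


frame = pd.DataFrame(
    {
        "Pitch": [60, 62, 64],
        "Duration": ["Duration_1.0.8", "Duration_0.4.8", "rest"],
    }
)
result = refine_duration_column(frame)
print(result)
```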
Module: src.tokenization.refine_results.calculate_pitch_intervals
Module: tokenization.refine_results.calculate_pitch_intervals.py
This module, part of the tokenization.refine_results package, refines tokenized MIDI data by calculating pitch intervals.
Functions
tokenization_calculate_pitch_intervals: Handles a workflow for refining tokenized MIDI data by calculating pitch intervals.
calculate_pitch_intervals_function: Helper function used within tokenization_calculate_pitch_intervals to add a pitch interval column to a DataFrame.
Notes
The module expects CSV files to have a specific structure, including a ‘filename’ column, and pitches should be represented as MIDI pitch values. Please refer to the individual function docstrings for more detailed descriptions and examples of usage.
- iMaT.src.tokenization.refine_results.calculate_pitch_intervals.calculate_pitch_intervals_function(df)[source]
Refines a pandas DataFrame by calculating pitch differences between the current row and the next row.
This function first checks if the ‘Pitch’ column exists in the DataFrame. If so, it calculates the pitch differences between the current row and the next row. If a ‘filename’ column exists in the DataFrame, the calculation is performed separately for each unique filename. If the ‘Pitch’ column contains non-numeric entries (i.e., values with a prefix), the function adds the prefix to the calculated difference values.
- Parameters:
df (pandas.DataFrame) – The DataFrame to refine.
- Returns:
The DataFrame with added ‘PitchDifferenceToNextPitch’ column.
- Return type:
pandas.DataFrame
See also
pandas.DataFrame.diff: Calculates the difference of a DataFrame element compared with another element in the DataFrame (default is the element in the same column of the previous row).
pandas.DataFrame.shift: Shifts index by desired number of periods with an optional time freq.
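The per-file difference to the next row can be sketched with groupby and shift. Everything named here is an assumption for illustration: add_pitch_intervals is a hypothetical name, the “Pitch_” prefix is assumed, and the sign convention (next pitch minus current pitch) is a guess, since the source only says "difference". The sketch leaves the result numeric and does not re-attach a prefix to the differences.

```python
import pandas as pd

PREFIX = "Pitch_"  # assumed token prefix; not confirmed by the source


def add_pitch_intervals(df):
    """Add 'PitchDifferenceToNextPitch', computed within each filename."""
    if "Pitch" not in df.columns:
        return df
    out = df.copy()
    # Strip the assumed prefix, then coerce to numbers (bad values become NaN).
    text = out["Pitch"].astype(str).str.replace(PREFIX, "", regex=False)
    pitch = pd.to_numeric(text, errors="coerce")
    if "filename" in out.columns:
        # shift(-1) inside each group, so the last note of a file gets NaN
        # instead of borrowing the first note of the next file.
        next_pitch = pitch.groupby(out["filename"]).shift(-1)
    else:
        next_pitch = pitch.shift(-1)
    # Assumed sign convention: next pitch minus current pitch.
    out["PitchDifferenceToNextPitch"] = next_pitch - pitch
    return out


frame = pd.DataFrame(
    {
        "filename": ["a.mid", "a.mid", "a.mid", "b.mid", "b.mid"],
        "Pitch": ["Pitch_60", "Pitch_64", "Pitch_62", "Pitch_70", "Pitch_67"],
    }
)
result = add_pitch_intervals(frame)
print(result)
```

Grouping before shifting is the important design point: without it, the last row of one file would be compared against the first row of the next.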
- iMaT.src.tokenization.refine_results.calculate_pitch_intervals.tokenization_calculate_pitch_intervals()[source]
Executes a workflow for refining CSV data by calculating pitch intervals.
This function guides the user to select a CSV file, performs data refining operations to calculate pitch differences between the current row and the next row (grouped by filename if available), and displays a table with the results. The user then has an option to save the refined data into a new CSV file.
Parameters: None
Returns: None
See also
select_csv_file_2d_token_representation: Opens a file dialog allowing the user to select a CSV file.
calculate_pitch_intervals_function: Refines a pandas DataFrame by calculating pitch differences between the current row and the next row, grouping by filename if available.
Module: src.tokenization.refine_results.remove_prefixes
Module: tokenization.refine_results.remove_prefixes.py
This module, part of the tokenization.refine_results package, refines tokenized data in a CSV file by removing unwanted prefixes.
Functions
corpus_tokenization_refine_data_remove_prefixes: Handles a workflow for refining tokenized data by removing unwanted prefixes.
remove_prefixes_function: Helper function used within corpus_tokenization_refine_data_remove_prefixes to refine a DataFrame.
Notes
The module expects CSV files to have a specific structure, including a ‘filename’ column, and is designed to remove prefixes like ‘Ignore_’. Please refer to the individual function docstrings for more detailed descriptions and examples of usage.
- iMaT.src.tokenization.refine_results.remove_prefixes.corpus_tokenization_refine_data_remove_prefixes()[source]
Refines CSV data by removing unwanted prefixes from the data.
This function guides the user to select a CSV file, performs data refining operations to remove unwanted prefixes (specifically “Ignore_” prefixes and any prefixes that match column names), and displays a table with the results. The user then has an option to save the refined data into a new CSV file.
Parameters: None
Returns: None
See also
select_csv_file_2d_token_representation: Opens a file dialog allowing the user to select a CSV file.
remove_prefixes_function: Refines the pandas DataFrame by removing unwanted prefixes.
- iMaT.src.tokenization.refine_results.remove_prefixes.remove_prefixes_function(df)[source]
Refines a pandas DataFrame by removing unwanted prefixes from the data.
This function removes “Ignore_” prefixes from all entries in the DataFrame. Then, it iterates over each column in the DataFrame and removes any prefixes that match the column name.
- Parameters:
df (pandas.DataFrame) – The DataFrame to refine.
- Returns:
The DataFrame with removed unwanted prefixes.
- Return type:
pandas.DataFrame
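A two-pass prefix removal of this kind could be sketched as below. The name strip_prefixes is hypothetical, and the sketch assumes that a column-name prefix takes the form "<column>_" (for example “Pitch_60” in the ‘Pitch’ column); the source only says the prefix "matches the column name". Cleaned values are left as strings.

```python
import pandas as pd


def strip_prefixes(df):
    """Remove 'Ignore_' and '<column>_' prefixes from string entries."""
    out = df.copy()
    for column in out.columns:
        column_prefix = f"{column}_"  # assumed form of a column-name prefix

        def clean(value, column_prefix=column_prefix):
            if not isinstance(value, str):
                return value  # leave numbers and missing values alone
            if value.startswith("Ignore_"):
                value = value[len("Ignore_"):]
            if value.startswith(column_prefix):
                value = value[len(column_prefix):]
            return value

        out[column] = out[column].map(clean)
    return out


frame = pd.DataFrame(
    {
        "filename": ["a.mid", "a.mid"],
        "Pitch": ["Pitch_60", "Ignore_None"],
        "Duration": ["Duration_1.0.8", "Ignore_None"],
    }
)
result = strip_prefixes(frame)
print(result)
```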
Module: src.tokenization.refine_results.tokens_to_txt
Module: tokenization.refine_results.tokens_to_txt.py
This module, a part of the tokenization.refine_results package, handles the conversion of tokenized data from a CSV file to individual text files.
Functions
tokenization_export_csv_columns_to_txt_file: Exports a CSV file’s contents to individual text files, with directories for each column.
save_txt_files_to_directory: Saves data from a dictionary into text files, organizing the files into directories based on keys.
Notes
The module expects CSV files to have a specific structure, including a ‘filename’ column. Please refer to the individual function docstrings for more detailed descriptions and examples of usage.
- iMaT.src.tokenization.refine_results.tokens_to_txt.save_txt_files_to_directory(data, file_path)[source]
Saves a dictionary of data into text files in directories named after each key.
Each key-value pair in the data dictionary represents a filename and its associated data respectively. For each filename, the function creates a directory and within that directory, it creates a text file for each data column and writes the corresponding data into it. These directories are then bundled into a single directory named ‘extracted_data_[current date and time]’.
- Parameters:
data (dict) – The dictionary of data to be saved. Each key-value pair represents a filename and its associated data.
file_path (str) – The original file path used to generate the new directory’s name.
- Returns:
The path to the newly created directory.
- Return type:
str
See also
os.path.dirname: Returns the directory component of a pathname.
os.makedirs: Recursively creates directories.
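One possible reading of this layout is sketched below. Several details are assumptions rather than documented behaviour: the name save_txt_files_sketch is hypothetical, data is assumed to be a nested dictionary of the form {filename: {column: text}}, the bundle directory is assumed to be created next to file_path, and the timestamp format is invented for the example.

```python
import os
import tempfile
from datetime import datetime


def save_txt_files_sketch(data, file_path):
    """Write one text file per column inside one directory per key."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")  # assumed format
    # Assumption: the bundle directory sits beside the original CSV file.
    root = os.path.join(os.path.dirname(file_path), f"extracted_data_{stamp}")
    for name, columns in data.items():
        target = os.path.join(root, str(name))
        os.makedirs(target, exist_ok=True)
        for column, text in columns.items():
            path = os.path.join(target, f"{column}.txt")
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(str(text))
    return root


# Demonstration in a throwaway directory so nothing real is overwritten.
demo_dir = tempfile.mkdtemp()
demo_data = {"song_a.mid": {"Pitch": "60 62 64", "Duration": "1.0 0.5 0.5"}}
created = save_txt_files_sketch(demo_data, os.path.join(demo_dir, "tokens.csv"))
print(created)
```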
- iMaT.src.tokenization.refine_results.tokens_to_txt.tokenization_export_csv_columns_to_txt_file()[source]
Exports the columns of a CSV file to individual text files.
This function asks the user to select a CSV file, and then groups the DataFrame by filename. For each group, it concatenates the values of each column into a string. Finally, it saves each column’s concatenated string into individual text files in directories named after each column. These directories are then bundled into a single directory named ‘extracted_data_[current date and time]’.
Parameters: None
Returns: None
See also
select_csv_file_2d_token_representation: Opens a file dialog allowing the user to select a CSV file.
save_txt_files_to_directory: Saves the refined data into text files in directories named after each column.
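The grouping and concatenation step described for tokenization_export_csv_columns_to_txt_file can be sketched as follows, leaving out the file dialog and the writing of files. The name columns_to_strings is hypothetical, and the single-space separator is an assumption; the source does not say how values are joined.

```python
import pandas as pd


def columns_to_strings(df, separator=" "):
    """Map each filename to {column: concatenated column values}."""
    data = {}
    for name, group in df.groupby("filename", sort=False):
        data[name] = {
            column: separator.join(group[column].astype(str))
            for column in group.columns
            if column != "filename"
        }
    return data


frame = pd.DataFrame(
    {
        "filename": ["a.mid", "a.mid", "b.mid"],
        "Pitch": [60, 62, 64],
        "Duration": ["1.0", "0.5", "2.0"],
    }
)
grouped = columns_to_strings(frame)
print(grouped)
```

A dictionary in this shape is one plausible input for a helper that writes text files per column, though the exact structure the module passes along is not documented here.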