Transformation Utilities

Data transformation utilities extract visual features from images. Currently, they use the DINOv2 model and store the results as CSV and pickle files. VisArchPy provides one transformation utility: dino.

Dino

Utility functions to extract visual features using the DINOv2 model and the Hugging Face transformers package.

Examples

The following examples show how to extract visual features of images using the facebook/dinov2-small model.

visarch dino from-file <path-image-file>
import os
from visarchpy.dino.transformer import (transform_to_dinov2,
                                        save_csv_dinov2,
                                        save_pickle_dinov2)

image_file = './test-image.png'
model = 'facebook/dinov2-small'
output_dir = './dinov2'  # directory to save outputs
os.makedirs(output_dir, exist_ok=True)

# use the image file name, without its extension, as the base name for outputs
filename = os.path.splitext(os.path.basename(image_file))[0]

# extract visual features
results = transform_to_dinov2(image_file, model)

# save features (a PyTorch tensor) as a pandas data frame in a CSV file
save_csv_dinov2(os.path.join(output_dir, filename + '.csv'), results['tensor'])

# save the full model outputs to a pickle file
save_pickle_dinov2(os.path.join(output_dir, filename + '.pickle'), results['object'])

Tip

Use visarch dino [SUBCOMMAND] -h to see the options available in the CLI, or consult the Python Reference if using Python.

Outputs

The dino transformation tools convert each image in a directory into a tensor and a Python object, saved as a CSV file and a pickle file respectively. The results are organized as follows.

dinov2  # default output directory
 └── pdf-001  # directory named after the input directory
     ├── 00001-page1-Im0.csv  # PyTorch tensor as a pandas DataFrame
     ├── 00001-page1-Im0.pickle  # Hugging Face object with full model outputs
     ├── 00001-page1-Im1.csv
     ├── 00001-page1-Im1.pickle
     ├── 00001-page1-Im2.csv
     └── 00001-page1-Im2.pickle
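To check which outputs a run should produce, or to spot images that still lack them, the layout above can be reproduced in a few lines. This is a minimal sketch, not part of VisArchPy: `expected_outputs` is a hypothetical helper, and the set of image extensions it recognizes is an assumption.

```python
from pathlib import Path

# Assumption: image extensions of interest; the set VisArchPy accepts may differ.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def expected_outputs(input_dir, output_dir="./dinov2"):
    """Map each image in input_dir to its expected (CSV, pickle) output paths.

    Hypothetical helper: mirrors the layout
    <output_dir>/<input dir name>/<image stem>.csv and .pickle.
    """
    input_dir = Path(input_dir)
    target = Path(output_dir) / input_dir.name  # directory named after the input directory
    outputs = {}
    for image in sorted(input_dir.iterdir()):
        # skip subdirectories and non-image files
        if image.is_file() and image.suffix.lower() in IMAGE_SUFFIXES:
            outputs[image.name] = (target / f"{image.stem}.csv",
                                   target / f"{image.stem}.pickle")
    return outputs
```

For example, images without results could be listed with `[name for name, paths in expected_outputs('./pdf-001').items() if not all(p.exists() for p in paths)]`.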

Important

  • The dino transformation tools overwrite existing files in the output directory.

  • The CSV files contain a PyTorch tensor converted to a pandas data frame. The pickle files contain a Hugging Face object with the full model outputs. See the Hugging Face documentation for more information.
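Because existing files are overwritten, it can help to move the results of a previous run aside before starting a new one. The sketch below is an assumption-based illustration: `backup_existing` is a hypothetical helper, not a VisArchPy function, and the `<name>-backup-<timestamp>` naming is an arbitrary choice.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_existing(output_dir):
    """Rename a non-empty output directory so a new run cannot overwrite it.

    Hypothetical helper (not part of VisArchPy). Returns the backup path,
    or None if the directory is missing or empty.
    """
    output_dir = Path(output_dir)
    if not output_dir.exists() or not any(output_dir.iterdir()):
        return None  # nothing to protect
    # microsecond timestamp keeps repeated backups from colliding
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    backup = output_dir.with_name(f"{output_dir.name}-backup-{stamp}")
    shutil.move(str(output_dir), str(backup))
    return backup
```

Calling `backup_existing('./dinov2')` before running `visarch dino` again leaves the default output directory free for the new results.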