Enterprise-only

When working with multimodal data at scale, LanceDB Enterprise makes it easy to define, extract, and transform raw data into useful information and features for your AI applications.

LanceDB Enterprise’s Multimodal Feature Engineering package is designed to improve the productivity of AI engineers working on very large datasets. With an API designed to leverage LanceDB’s optimized data storage and retrieval, it streamlines prototyping extraction and transformation tasks, performing experiments, exploring your data, scaling up execution, and moving to production. It lets researchers move from experiments in local notebooks to fully managed distributed job execution on datasets with billions of rows.

Feature Engineering and the geneva Python package are currently only available as part of LanceDB Enterprise. Please contact us if you’re interested in scaling up your feature engineering workloads for your AI and multimodal use cases.
The geneva package uses Python User Defined Functions (UDFs) to define features as columns in a Lance dataset. Adding a feature is straightforward:
1. Prototype your Python function in your favorite environment.
2. Wrap the function with a small UDF decorator (see UDFs).
3. Register the UDF as a virtual column using Table.add_columns().
4. (Optional, advanced) Override where the job runs (see Advanced Execution Contexts). On LanceDB Enterprise, distributed job execution is fully managed, so most users can skip this step.
5. Trigger a backfill operation (see Backfilling).
You can build your Python feature generator function in an IDE or a notebook using your project’s Python versions and dependencies. geneva will automate much of the dependency and version management needed to move from prototype to scale and production.
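As a rough illustration of how the steps fit together, the sketch below prototypes a small function (step 1) and then shows, inside a helper that is not called here, how steps 2, 3, and 5 might look. Only geneva.connect() and Table.add_columns() are named on this page; the `from geneva import udf` import path, applying the decorator as `udf(area)`, `open_table()`, the `backfill("area")` call, and the `width`, `height`, and `area` column names are assumptions for illustration and should be checked against the API Reference.

```python
# Step 1: prototype a plain Python function. No geneva import is needed yet,
# so this part can be developed and tested in any notebook or IDE.
def area(width: int, height: int) -> int:
    """Return the pixel area computed from two existing columns."""
    return width * height


def register_area_feature(db_uri: str, table_name: str) -> None:
    """Steps 2, 3 and 5 for the prototype above.

    Requires the Enterprise-only geneva package. The call shapes below are
    assumptions, not confirmed signatures.
    """
    import geneva            # imported lazily so step 1 runs without it
    from geneva import udf   # assumed import path for the UDF decorator

    # Step 2: wrap the function. Assumes the decorator can infer the output
    # type from the type hints on `area`, and that the argument names map to
    # existing columns called "width" and "height" (hypothetical).
    area_udf = udf(area)

    # Step 3: register the UDF as a virtual column named "area" (hypothetical).
    db = geneva.connect(db_uri)
    table = db.open_table(table_name)
    table.add_columns({"area": area_udf})

    # Step 5: trigger a backfill to compute values for existing rows.
    table.backfill("area")


# Sanity-check the prototype locally before registering it.
print(area(640, 480))
```

Keeping the prototype as an ordinary function means it can be unit-tested without a database connection. The geneva-specific calls stay in one place that is easy to adjust once the real signatures are confirmed.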
Ready to write your first feature? Head to Getting Started.

Continue learning

Visit the following pages to learn more about feature engineering in LanceDB Enterprise:

API Reference

  • geneva.connect() — connect to a Geneva database
  • Connection — manage tables, views, jobs, clusters, and manifests
  • Table — add columns, backfill, search, and manage table data
  • UDF — define user-defined functions for feature computation