Skip to main content
Redhat Developers  Logo
  • AI

    Get started with AI

    • Red Hat AI
      Accelerate the development and deployment of enterprise AI solutions.
    • AI learning hub
      Explore learning materials and tools, organized by task.
    • AI interactive demos
      Click through scenarios with Red Hat AI, including training LLMs and more.
    • AI/ML learning paths
      Expand your OpenShift AI knowledge using these learning resources.
    • AI quickstarts
      Focused AI use cases designed for fast deployment on Red Hat AI platforms.
    • No-cost AI training
      Foundational Red Hat AI training.

    Featured resources

    • OpenShift AI learning
    • Open source AI for developers
    • AI product application development
    • Open source-powered AI/ML for hybrid cloud
    • AI and Node.js cheat sheet

    Red Hat AI Factory with NVIDIA

    • Red Hat AI Factory with NVIDIA is a co-engineered, enterprise-grade AI solution for building, deploying, and managing AI at scale across hybrid cloud environments.
    • Explore the solution
  • Learn

    Self-guided

    • Documentation
      Find answers, get step-by-step guidance, and learn how to use Red Hat products.
    • Learning paths
      Explore curated walkthroughs for common development tasks.
    • Guided learning
      Receive custom learning paths powered by our AI assistant.
    • See all learning

    Hands-on

    • Developer Sandbox
      Spin up Red Hat's products and technologies without setup or configuration.
    • Interactive labs
      Learn by doing in these hands-on, browser-based experiences.
    • Interactive demos
      Click through product features in these guided tours.

    Browse by topic

    • AI/ML
    • Automation
    • Java
    • Kubernetes
    • Linux
    • See all topics

    Training & certifications

    • Courses and exams
    • Certifications
    • Skills assessments
    • Red Hat Academy
    • Learning subscription
    • Explore training
  • Build

    Get started

    • Red Hat build of Podman Desktop
      A downloadable, local development hub to experiment with our products and builds.
    • Developer Sandbox
      Spin up Red Hat's products and technologies without setup or configuration.

    Download products

    • Access product downloads to start building and testing right away.
    • Red Hat Enterprise Linux
    • Red Hat AI
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform
    • See all products

    Featured

    • Red Hat build of OpenJDK
    • Red Hat JBoss Enterprise Application Platform
    • Red Hat OpenShift Dev Spaces
    • Red Hat Developer Toolset

    References

    • E-books
    • Documentation
    • Cheat sheets
    • Architecture center
  • Community

    Get involved

    • Events
    • Live AI events
    • Red Hat Summit
    • Red Hat Accelerators
    • Community discussions

    Follow along

    • Articles & blogs
    • Developer newsletter
    • Videos
    • Github

    Get help

    • Customer service
    • Customer support
    • Regional contacts
    • Find a partner

    Join the Red Hat Developer program

    • Download Red Hat products and project builds, access support documentation, learning content, and more.
    • Explore the benefits

Build modular AI pipelines with OpenShift AI and reusable components

June 3, 2026
Ana Biazetti Nelesh Singla Matt Prahl
Related topics:
AI inferenceArtificial intelligenceData science
Related products:
Red Hat AIRed Hat OpenShift AI

    Stop reinventing the wheel each time you build a gen AI workflow. You might be manually coding data prep workflows, fine-tuning infrastructure, or evaluation frameworks from scratch. If so, you are spending time on solved problems instead of differentiating your AI applications. Teams often repeat the same work because they don't have a shared component library.

    Red Hat OpenShift AI helps you build modular AI pipelines. This post covers how to use our existing component catalog as a baseline and extend it to support your organization's custom pipelines. You will find out why these components matter and how to contribute to the community. This path allows you to speed up AI development and focus on what differentiates your applications.

    What are reusable components in Red Hat OpenShift AI?

    Reusable components are standardized, modular building blocks for Kubeflow Pipelines (KFP) workflows. These workflows run on Red Hat OpenShift AI and are called AI pipelines. Think of them as functions you can plug into your AI workflows without writing code from scratch.

    Each component handles a specific task. These tasks can include:

    • Data preprocessing
    • Model training
    • Evaluation
    • Model optimization
    • Deployment

    You might need to load data from Amazon S3 or preprocess text for a large language model (LLM). You might also need to run inference on a fine-tuned model. As the reusable components repository grows, more and more components are becoming available, which allows you to use a verified component instead of spending days writing and debugging your own implementation. This modular architecture allows you to focus on what makes your workflow unique. By selecting from the existing catalog, you can build a flexible pipeline tailored to your use case.

    Figure 1 illustrates how these modular stages assemble into a unified workflow layout.

    Pipeline workflow where Component A flows to B, branching to C, D, and E. All components connect to an Object Store.
    Figure 1: A modular AI pipeline workflow showing sequential data flow from a dataset through components A and B to parallel components C, D, and E, with centralized object storage integration.

    The benefits extend beyond time savings. Reusable components promote:

    • Standardization: Teams use consistent logic across different projects.
    • Collaboration: Shared implementations allow teams to work together more easily.
    • Discovery: A centralized hub helps engineers and scientists find existing building blocks.

    Common components include data loaders and text preprocessing utilities. They also include model fine-tuning runners, evaluation calculators, and deployment helpers. When teams use the same component for tokenization, you eliminate inconsistencies. This standardization also reduces the maintenance burden for your team.

    Why this approach matters for gen AI workflows

    Gen AI workflows are complex. A typical fine-tuning pipeline involves many steps:

    • Data collection and cleaning
    • Format conversion for training frameworks
    • Distributed training orchestration
    • Checkpoint management
    • Evaluation across multiple metrics
    • Deployment with inference optimization

    Without reusable components, teams face painful challenges. They deal with duplicated code across projects. They see inconsistent implementations because different AI engineers make different choices. Teams also test in isolation without community validation. Finally, onboarding takes longer as new members struggle to understand custom code instead of documented components.

    The value proposition differs by audience. AI engineers get tested building blocks that integrate with OpenShift AI infrastructure. They spend less time on plumbing and more on model performance. Data scientists who contribute components gain visibility for their work. They also help others solve similar problems. Technical leaders benefit from reduced development costs and faster time to production. They also see increased consistency across AI initiatives.

    Consider a real-world scenario. Your team needs to fine-tune a language model for customer support. You can assemble a pipeline from existing components quickly. You do not need to write custom code for data loading, tokenization, or training. When you need a specialized preprocessing step, you build one new component. You do not have to rebuild the entire pipeline.

    How composable AI pipelines work

    AI pipelines in Red Hat OpenShift AI are used to build and deploy portable AI workflows. They help data scientists use existing OpenShift infrastructure without becoming Kubernetes experts. The platform provides a Python-centric experience through the KFP SDK. You define pipelines in familiar code rather than YAML configuration files.

    At runtime, each component execution becomes a single container execution. This isolation ensures components do not interfere with each other. They can take advantage of different resources in the cluster, such as GPUs or CPUs. They can also run in parallel when possible. Components can also offload execution to multiple container jobs or distributed workload systems like Kubeflow Trainer or Ray.

    The platform provides a rich user interface (UI) to:

    • Visualize your pipeline structure.
    • Monitor runs in real time.
    • Support scheduled runs.
    • Compare experiments and analyze results.
    • Drill into container logs to see where a failure occurred.

    It also provides a consistent API for automation to trigger runs based on custom events as part of the KFP SDK.

    The composable approach means you write less code and maintain less infrastructure. You chain together focused components that each do one thing well. You no longer need to build monolithic training scripts. This modularity makes pipelines easier to understand, test, and modify. You might need to swap out a data loader or try a different evaluation metric. If so, you change one component instead of refactoring an entire code base.

    The two-repository approach

    Red Hat OpenShift AI uses a tiered repository approach. This tiering strategy balances community contribution with stability.

    The primary upstream repository contains standard, generic components for the Kubeflow community. These items should be generic components that can be used across many platforms and environments.

    The secondary repository serves as a more specialized space. Components go here when they have internal dependencies on Red Hat Data Services functionality or are specifically aligned with an OpenShift AI release. They might also be used for internal product enablement.

    Component anatomy: What is inside?

    Every component and pipeline follows a consistent structure. This tiered layout makes them easy to understand and maintain. The standardized architecture follows the following blueprint:

    components/<category>/<subcategory>/
    ├── __init__.py          # Subcategory package
    ├── OWNERS               # Subcategory maintainers
    ├── README.md            # Auto-generated subcategory index (lists components)
    ├── shared/              # Optional shared utilities package
    │   ├── __init__.py
    │   └── training_utils.py # Common code for components in this subcategory
    └── <component_name>/    # Individual component
        ├── __init__.py
        ├── component.py
        ├── metadata.yaml
        ├── OWNERS
        ├── README.md        # Auto-generated from metadata.yaml
        └── tests/

    The component.py file contains the Python-based KFP component definition. This is where you define inputs, outputs, and the actual logic. You use the KFP SDK decorators to specify component metadata. One important rule enforced by continuous integration (CI): only standard library and KFP imports are allowed at the top level of component files. Heavy dependencies like pandas, torch, or datasets must be imported inside the function body. Each component runs in its own container with its own dependencies installed at runtime via packages_to_install.

    For more details on the required files, check out the contribution guidelines, which cover:

    • Required files for components or pipelines
    • metadata.yaml schema
    • OWNERS file format
    • Dependency management
    • Pre-commit validation
    • Testing and quality (import guard, unit tests, LocalRunner tests)
    • Base image policy
    • README generation
    • Workflow and review process (/lgtm and /approve)

    Here are two sample components for illustration purposes.

    component_one/component.py:

    component_one/component.py                                                                                                      
    from kfp import dsl
    @dsl.component(base_image="python:3.11")                                                                                     
      def component_one(message: str = "Component 1") -> str:                                                                      
          """Print a message and pass it downstream."""                                                                            
          print(message)                                                                                                           
          return message  

    component_two/component.py:

    from kfp import dsl                                                                                                                                 
    @dsl.component(base_image="python:3.11")
    def component_two(input_message: str, message: str = "Component 2") -> str:
          """Receive upstream output, print own message, return combined result."""                                                
          print(f"Received: {input_message}")                                                                                      
          print(message)                                                                                                           
          return f"{input_message} -> {message}" 

    Pipelines organize workflow resources using a similar folder hierarchy:

    pipelines/<category>/<subcategory>/
    ├── __init__.py          # Subcategory package
    ├── OWNERS               # Subcategory maintainers
    ├── README.md            # Auto-generated subcategory index (lists pipelines)
    ├── shared/              # Optional shared utilities package
    │   ├── __init__.py
    │   └── workflow_utils.py # Common code for pipelines in this subcategory
    └── <pipeline_name>/     # Individual pipeline
        ├── __init__.py
        ├── pipeline.py
        ├── metadata.yaml
        ├── OWNERS
        ├── README.md        # Auto-generated from metadata.yaml
        └── tests/
    

    Here is an example of a pipeline using the two components described previously.

    sample_pipeline/pipeline.py:

    from kfp import dsl                                                                                                                                 
    from kfp_components.components.sample.component_one import component_one                                                     
    from kfp_components.components.sample.component_two import component_two                                                     
                                                                                                                                                                                                                                                    @dsl.pipeline(name="sample-pipeline", description="Chain two components together")                                           
      def sample_pipeline(                                                                                                         
          component_one_message: str = "Component 1",                                                                              
          component_two_message: str = "Component 2",                                                                              
      ):                                                                                                                           
          """Run component_one, pass its output to component_two."""                                                               
          task_one = component_one(message=component_one_message)                                                                  
          component_two(input_message=task_one.output, message=component_two_message)

    The repositories are coding-agent-ready with tailored AGENTS.md files that allow AI coding assistants to contribute, use, and maintain components autonomously. The guide defines three agent modes: contributing new components, building pipelines from existing ones, and maintaining repository infrastructure. Each mode includes example prompts you can use directly.

    For instance, you can ask your coding agent to search components for similar functionality and reuse if possible, then scaffold a new component using make component CATEGORY=training NAME=my_model. The agent will then follow the repository conventions, generate the right files, and run validations.

    Recommended practices for using components

    Selecting which existing components to use begins with understanding your workflow requirements. Browse the repository to find components that match your specific requirements. Read the README to understand inputs and limitations. You can also ask your coding agent for guidance after you clone the repositories.

    Check the stability level to assess production readiness. Alpha components work but might change. Beta components are more stable but still evolving. Stable components are safe for production use.

    Start simple when building your first pipeline. A single-component pipeline helps you understand the execution model. You define a pipeline function and instantiate a component with parameters. Then, you compile it to YAML. Run it through the OpenShift AI Pipelines UI to verify it works.

    Figure 2 shows the result of the sample pipeline and components introduced in the previous section.

    OpenShift AI user interface showing a successful pipeline run with two sequential nodes, component-one and component-two.
    Figure 2: A successful sample pipeline run within the Red Hat OpenShift AI console displaying the execution graph and parameters of two sequential components.

    Multicomponent pipelines chain together multiple operations. You pass outputs from one component as inputs to the next. The KFP SDK handles data passing automatically through artifacts. You can extend a pipeline by importing reusable components. You then call them like functions in your code.

    Parallelize pipelines when you have independent operations. You might need to preprocess multiple datasets or evaluate multiple models. If so, you can run components in parallel. The platform schedules containers efficiently across your cluster. This parallelization dramatically reduces total execution time.

    You can run an entire pipeline as a component in another pipeline. This nesting capability enables hierarchical composition. Complex workflows are built from smaller, tested subworkflows. This abstraction promotes reuse at the pipeline level, not just the component level.

    Implement conditional logic to skip specific steps. This logic is based on runtime conditions. If your data is already preprocessed, you can skip that step. If a model meets accuracy thresholds, you can skip additional training. The KFP SDK provides conditional decorators to make this easy.

    Always test with small datasets first. This reduces iteration time and execution costs while you debug logic. Once the pipeline runs successfully, scale up to full data. This practice catches configuration errors early. You will avoid spending resources on large runs that might fail. For more information on how to get started with Kubeflow Pipelines, see the Kubeflow Pipelines Getting Started guide.

    Recommended practices for contributing components

    When you contribute components, you make them more valuable to the community. You should follow established quality standards. In addition to following the contribution guidelines, consider the following practices:

    Start by using the built-in scaffolding commands to generate the required directory structure, boilerplate files, tests, and README:

    # Scaffold a new component                                                                                                   make component CATEGORY=data_processing NAME=my_preprocessor                                                                                                      
    
    # Scaffold a new pipeline                                                                                                    make pipeline CATEGORY=training NAME=my_training_pipeline                                                                                                                                                                           
    
    # Generate or update the README from code                                                                                    make readme TYPE=component CATEGORY=data_processing NAME=my_preprocessor     

    These commands create all required files (component.py, __init__.py, metadata.yaml, OWNERS, README.md, and test stubs) following the exact repository conventions. This automation avoids missing files that will fail CI.

    Input validation and error handling

    Validate input parameters at the start of your component. This validation check prevents long-running failures. Nothing is more frustrating than a pipeline that runs for hours before failing. Check that file paths exist before you start processing. Verify that parameter combinations make sense together. You should also validate data formats early.

    For example, your component might expect a model checkpoint directory. Check that the directory exists and contains the required files. Do this within the first few seconds of execution. This fail-fast approach saves time and compute resources.

    Parameter naming

    Use common prefixes for related parameters. This naming convention improves UI readability for the user. Your component might have multiple model-related parameters. If so, prefix them all with model_. You might use model_name, model_path, or model_version. This common prefix groups them visually in the pipeline UI.

    Choose descriptive names over abbreviations. Use model_name instead of mdl. Use learning_rate instead of lr. Clarity is more important than brevity. Someone reading your pipeline should understand each parameter without checking documentation.

    Support multiple input types

    Components should accept artifact inputs, S3 URLs, and file paths. This multiple-input architecture maximizes flexibility for different users. Some users pass artifacts between pipeline components. Others reference datasets in S3. Some work with local file paths during development.

    Your component should detect the input type and handle it correctly. If you receive an S3 URL, download the data first. If you receive a file path, verify it exists. This flexibility makes components work for many different user scenarios.

    Packages and images

    New components can have dependencies on upstream libraries. You must account for these in your design. The component should override the base image with a specialized image. Alternatively, it can install the required packages at runtime.

    You must use custom images published to the GitHub image registry for upstream repos. For the red-hat-data-services repository, only approved Red Hat images are allowed.

    Quality standards

    Components are tagged with stability levels to clarify their development status. Use alpha for initial development and experiments. Use beta once core functionality works but you are still refining details. Move components to stable after community validation.

    Components are flagged for verification if they are not validated within nine months. This validation prevents component rot as dependencies or SDKs evolve. Plan to revalidate your components periodically.

    All components must pass linting and formatting checks. They must also pass documentation checks. This mandatory review process ensures consistency across the entire repository. Run linters locally before you submit your work.

    Components must compile with the latest kfp.compiler. This catches compatibility issues before they reach users. The CI pipeline runs these checks automatically. Finally, include thorough tests. Cover normal operation, edge cases, and error conditions.

    Start building composable AI pipelines today

    Composable AI pipelines and reusable components accelerate gen AI development. They let you focus on what makes your workflows unique. You no longer have to rebuild common infrastructure.

    An effective way to understand how they work is to start building. Explore existing components in the Kubeflow and Red Hat repositories. Read through the READMEs to understand what is possible.

    Build your first pipeline on Red Hat OpenShift AI. Install the KFP SDK and select a few components for a basic task. Run it locally with small test data first. Then, deploy it to your OpenShift AI instance and scale up.

    Contribute your own components when you need new functionality. Share your work back with the community. Start in the Kubeflow repository and follow the contribution guidelines. Write clear documentation and thorough tests.

    Join the community to connect with other AI engineers and data scientists. Ask questions and share your experiences. Your lessons learned will help others build more efficient workflows.

    Ship your first gen AI workflow faster than you thought possible. Start today.

    Related Posts

    • Fine-tune AI pipelines in Red Hat OpenShift AI 3.3

    • SDG Hub: Building synthetic data pipelines with modular blocks

    • Preventing GPU waste: A guide to JIT checkpointing with Kubeflow Trainer on OpenShift AI

    • Fine-tune a RAG model with Feast and Kubeflow Trainer

    • Fine-tune LLMs with Kubeflow Trainer on OpenShift AI

    • Implement MLOps with Kubeflow Pipelines

    Recent Posts

    • Build modular AI pipelines with OpenShift AI and reusable components

    • Kafka Monthly Digest: May 2026

    • UBI 9 and 10 builders on Paketo Buildpacks with multi-arch support

    • Deploy Hermes Agent on OpenShift AI with vLLM model serving

    • Evaluation-driven development with EvalHub

    What’s up next?

    Learning Path intro-to-OS-LP-feature-image

    Introduction to OpenShift AI

    Learn how to use Red Hat OpenShift AI to quickly develop, train, and deploy...
    Red Hat Developers logo LinkedIn YouTube Twitter Facebook

    Platforms

    • Red Hat AI
    • Red Hat Enterprise Linux
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform
    • See all products

    Build

    • Developer Sandbox
    • Developer tools
    • Interactive tutorials
    • API catalog

    Quicklinks

    • Learning resources
    • E-books
    • Cheat sheets
    • Blog
    • Events
    • Newsletter

    Communicate

    • About us
    • Contact sales
    • Find a partner
    • Report a website issue
    • Site status dashboard
    • Report a security problem

    RED HAT DEVELOPER

    Build here. Go anywhere.

    We serve the builders. The problem solvers who create careers with code.

    Join us if you’re a developer, software engineer, web designer, front-end designer, UX designer, computer scientist, architect, tester, product manager, project manager or team lead.

    Sign me up

    Red Hat legal and privacy links

    • About Red Hat
    • Jobs
    • Events
    • Locations
    • Contact Red Hat
    • Red Hat Blog
    • Inclusion at Red Hat
    • Cool Stuff Store
    • Red Hat Summit
    © 2026 Red Hat

    Red Hat legal and privacy links

    • Privacy statement
    • Terms of use
    • All policies and guidelines
    • Digital accessibility

    Chat Support

    Please log in with your Red Hat account to access chat support.