Education
Technology
Oracle Corporation
Oracle University Podcast delivers convenient, foundational training on popular Oracle technologies such as Oracle Cloud Infrastructure, Java, Autonomous Database, and more to help you jump-start or advance your career in the cloud.
Total 95 episodes
1
2
Go to
03/12/2024

Best of 2024: Preparing to Extend Oracle Fusion Apps Using Visual Builder Studio

What do you need to start customizing the next generation of Oracle Fusion Apps? How do you create new pages for business processes? What level of expertise do you require for this?   Join Lois Houston and Nikita Abraham as they get answers to all these questions and more from Senior Principal OCI Instructor Joe Greenwald.   Survey: https://customersurveys.oracle.com/ords/surveys/t/oracle-university-gtm/survey?k=focus-group-2-link-share-5   Develop Fusion Applications Using Visual Builder Studio: https://mylearn.oracle.com/ou/course/develop-fusion-applications-using-visual-builder-studio/138392/   Build Visual Applications Using Oracle Visual Builder Studio: https://mylearn.oracle.com/ou/course/build-visual-applications-using-oracle-visual-builder-studio/137749/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started. 00:26 Lois: Welcome to the Oracle University Podcast! I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Team Lead of Editorial Services. Nikita: Hi there! You’re listening to our Best of 2024 series, where over the next few weeks, we’ll be revisiting four of our most popular episodes of the year. Lois: Today’s episode is #2 of 4, and we’re throwing it back to another episode with our friend and Senior Principal OCI Instructor Joe Greenwald. This episode is all about extending Oracle Cloud Applications that are being built using Visual Builder for the front-end. 01:04 Nikita: Right, Lois. We began by asking Joe to explain what’s happening with the redesign and re-architecture of Oracle Cloud Applications using Visual Builder Studio, or VBS.  Joe: That’s right, Niki. Oracle is redesigning and rebuilding its entire suite of Fusion Cloud Applications, over 330 different products, utilizing over 60,000 engineers — that is “60,” not “16”—at Oracle to develop the next generation of Oracle Fusion Applications. What’s most exciting is that the same tools the engineers are using to accomplish this are available to our partners and our customers to use to extend the functionality and capabilities of Fusion Applications to meet their custom needs and processes.  01:45 Lois: That’s pretty awesome! We want to use this time today to ask you about extensions, the types of extensions you can create, and how to use Visual Builder Studio to create those extensions. Nikita: Yeah, can we start with you telling us what an extension is? I’ve gotten the sense that Oracle uses the term extension as both a noun and a verb and that’s a bit confusing to me. Joe: Yeah, good catch, Niki. Yes, Oracle does use the term extension in two ways: both as a noun and a verb. As a noun, an extension is a container for the code changes that you make to your applications. Basically, it’s a Git repository that Oracle creates and manages for you. So, the extension container holds the code changes you make to your page layouts: the fields, their positioning, showing and hiding fields, that sort of thing, as well as page functionality. These code changes you make are stored in the extension and it is this extension with your code changes that is merged with the main Git branch eventually and then deployed using continuous integration/continuous deployment jobs defined in Visual Builder Studio, which manages the project and its assets. Your extension is a Git branch that is an asset of the project. Once your extension code is merged with the main branch and deployed, then the next time someone brings up the application, they’ll see the changes you’ve made in the app. 02:59 Lois: And as a verb? Joe: As a verb, extension means to extend the functionality and the look and feel of the application, though I prefer the term customization or configuration to describe this aspect, as the documentation does, and to avoid confusion, though I’ll admit I’m not always consistent about the terms I use. 03:16 Lois: What types of customizations, or extensions, and I’m using the verb now, are available for Fusion Apps in Visual Builder Studio? Joe: There are three different ways Fusion Apps can be customized effectively, configured, or extended. The first way is what we call a basic extension, where you’re rearranging hiding, or showing, or moving around fields and sections on the page that have been set up to be extendable by the Fusion Application development teams. Things like hiding fields, showing fields, hiding sections, showing sections…  03:45 Nikita: So fairly basic actions… Joe: Yeah exactly and they can be done in Visual Builder Studio Designer by people with minimal VB training, Visual Builder training. And, most recently, if you have access to it, you can do it in the new Express mode, where the page shows you just those things you can work with and just the tools you need to work with the page. This is new and makes it much easier for folks who are not highly technical to make basic changes to the page layout. 04:09 Lois: People like me! That sounds easy enough. Joe: And the next type of extension is more of an intermediate change and requires some training with Visual Builder Studio because you’re creating rules that govern the display of layouts based on certain conditions on the page. These are highly flexible, powerful, and useful for creating customized page layouts based on a variety of factors from page size and orientation to the role of the person using it to values in the actual fields on the page itself. These rules can be combined to create complex rule-based conditions that display exactly what the user should see, given the conditions of the page and their role. I would also include making changes to action chains, which execute sequences of behaviors and navigation, and the actual structure of the application, but this is more advanced.  Lastly, is creating mashup applications, which are stand-alone Visual Builder visual applications, which use data from Fusion apps, and customer data sources, like their own database tables, and potentially third-party APIs to create brand new pages and applications with new functionality, new processes, new procedures, new displays, all of which look just like Fusion Applications and use the same data as Fusion applications. 05:18 Lois: Joe, how do I get started if I want to extend a page?  Joe: The easiest way to do it is to open a page in Fusion Applications and then select Edit Page in Visual Builder Studio from the Profile menu. You’re then prompted for a project to hold the Git repository for the extension container. And since there’s probably already one that exists, after you select the project, an extension Git container is assigned to you. Unless this is the very first time the application has been extended in which case it creates an extension for you. When creating customizations or configurations, we recommend that each application be done in its own separate project. So, for example, if you’re working on Customer Experience Sales, you might do it in Project A and if you’re working on extensions with HCM, you might do it in Project B. And if you decide to create your own pages and flows in your own app, you might do that in Project C.  06:04 Nikita: But why do you need to do this? Joe: That’s just to keep things nice and separate and organized. The tool, Visual Builder Studio, doesn’t really care, but it makes for cleaner development and can help with the management of the development teams. 06:14 Nikita: Ok, Joe, I have a question. How do I know if the page I’m on in Fusion Apps can be edited in Visual Builder? I know there are a lot of legacy pages still out there and they can co-exist with the new VB-based pages. Joe: If the URL of the page you’re on has the word /Redwood in it instead of /faces, then you know this is a page that was created using Visual Builder Studio and you’ll be able to extend it and make changes to it using the Edit in Visual Builder Studio option. So, if you select Edit in Visual Builder Studio, then the page you are on opens inside Visual Builder Studio Designer and you can make changes to any part of the page that has been explicitly enabled for extension by the development team. 06:53 Lois: That’s an important part, right? The application is not extendable by default.  Joe: That’s right, Lois. It is all locked down and you can’t make any changes to it by default. The development team must specifically enable certain parts of the page: sections, fields, layouts, variables, types, action chains, etc. as extendable for you to be able to make changes to it. This ensures the changes the development team makes to the application in the future won’t break your extensions. And conversely, the development team can choose to not extend portions that they do not want you to touch or mess with. Then if they do change that bit of the app in the future, it won’t break the application and you won’t get a big surprise. So, using the Edit page in Visual Builder Studio, you can make both basic changes, like moving, showing, and hiding fields and sections, as well as the more intermediate types of configurations, like using dynamic components to create rule-based layouts that change dynamically based on several conditions such as page size, roles of the user, and field values on the page itself. 07:51 Nikita: What happens if two developers make changes and essentially overwrite each other’s customizations — say one hides a field and another later exposes it? Joe: Well, whoever commits their changes and deploys last wins. The other developer’s changes get overwritten. So, this is something the team would want to consider carefully. It is possible to roll back to an earlier version if one must. And this can be done in Visual Builder Studio — the part that manages project assets like Git repositories. And there are Oracle blog posts about how to do that if you’re interested in learning more. 08:20 Lois: Joe, earlier you mentioned creating new pages and flows, but so far you’ve only talked about modifying existing extendable pages. How do I create new pages and flows? Joe: In a Visual Builder extension, a set of pages and flows is called an App UI. When I use the terms pages and flows, what I’m talking about is a set of pages that are logically related—whatever logical means to the designer and developer—in a group called a flow that you can navigate between. But you can also navigate between flows and even between applications. So, without getting too technical, each application has a default flow, which has a default page where that flow starts when the app first comes up. So, you can think of an App UI as a collection of flows and their pages, and a URL that accesses the default flow and its default page. That’s the page you would see first when accessing that URL. Of course, this can be configured and changed by the developer, as needed. Now, when Oracle creates the original application (for example, digital sales, helpdesk, or something like that), we create an App UI, which contains the pages and flows for that application and is the “entry point” into the app, accessing that App UI’s default flow and its default page and then things flow on from there. Partners and customers can create their own application extensions that are dependent on an Oracle application and even create their own App UI – their own sets of pages and flows to accommodate their own processing and workflow needs. This gives them the ability to add their own processes and rules, and still leverage and navigate to the core application that Oracle built. For example, say Oracle delivered digital sales as an Oracle Cloud Application built using Visual Builder to a customer and the customer needs to add a few pages to do some validation or other type of business processing before entering the digital sales application. What the customer does, in this case, is create a new extension of the Oracle Digital Sales app and an App UI of their own, which would be the set of pages and flows that contain the processing they want to start with before then navigating into the digital sales app to use Oracle’s application. 10:22 Nikita: Wait, did I hear that correctly? We’re creating an extension of an extension or creating an extension on an existing extension? Joe: I know, right? I realize this can sound confusing the first time you hear it or the second time or even the third time. It took me a while to get my head around what they’re talking about. Let’s start with a Fusion application. In a Fusion application, everything is an extension of something. This is just how the code base and the architecture are organized and how they manage the Git repositories and the code base itself. So, Oracle created a base application called the Unified App. The Unified Application contains the basic page structure and common functionality needed for all applications. For example, it contains the header at the top that has the profile and the footer at the bottom of the page that has that little Ask Oracle icon. Within that page, between the header and the footer, are the pages that are created by the developers, whether they be Oracle engineers or partners or customers. They display the contents of the page with the data and the layouts and all of that. In a sense, you can think of the Unified App as an index page, the starting page of the web application. Though that’s not completely true technically, it’s good enough for illustrative purposes. So, Oracle starts with the Unified App and then a development team extends that Unified App to build their product. This is how digital sales did it. This is how customer experience did it. This is how helpdesk did it. They start with the Unified App and they extend that and create an App UI that contains the flows and pages for their specific application, and then add functionality for all the pages and flows, as needed for the design. Partners and customers can then create a new extension that extends the Oracle Application and add their own App UI and their own URL if they want their pages accessed first, before navigating to the Oracle application. For example, if the digital sales application has functionality you’d like to leverage, like it has data services or fragments or page layouts that you want to reuse or other things, you extend the digital sales application, and this extension holds your code changes. You could then create a new App UI, and once deployed, users can use that URL for the new App UI to access your new pages. And your page can then navigate to the Oracle app when it needs to. Though I will say to date, we’re really not seeing much demand for this particular use case, but it is possible. 12:33 Lois: Is that the only option available to customers and partners—to extend an existing Oracle application? Joe: No, Lois. We’re seeing customers and partners create brand new Fusion applications of their own, based on the Unified App Oracle created. In a sense, doing the same thing that our development teams here are doing.  Remember, I said an Oracle development team starts with the Unified App, which has common functionality and look and feel for all applications, and then extends that to add business rules processing, flows, App UI, whatever they need for their specific Oracle application. We’re seeing our partners and customers wanting to build their own applications. Maybe a customer or partner wants to create a Time & Expense application and leverage the Fusion application data and the APIs available, but define their own flows, their own pages, their own processing. This is very easy to do. They’d start by extending the Unified App just like the Oracle development teams do, and then build their own App UI and within that, their own flows, pages, and custom processing. The nice thing about it is that the application looks and works and feels just like a Fusion application and it appears alongside other Fusion applications, because it is a Fusion application. 13:43 Did you know that the Oracle University Learning Community regularly holds live events hosted by Oracle expert instructors. Find out how to prepare for your certification exams. Learn about the latest technology advances and features. Ask questions in real time and learn from an Oracle subject matter expert. From Ask Me Anything about certification to Ask the Instructor coaching sessions, you’ll be able to achieve your learning goals for 2024 in no time. Join a live event today and witness firsthand the transformative power of the Oracle University Learning Community. Visit mylearn.oracle.com to get started.  14:24 Nikita: Welcome back! So Joe, it sounds like there are two different paths or life cycles to create extensions for future applications in Visual Builder Studio. Is that correct? Joe: Yes, exactly. So one path to extending the functionality of Fusion apps is to edit the page in Visual Builder Studio, which opens the page in Visual Builder Designer, and you then make changes to the existing pages, depending on what the development team has made extendable.  14:49 Nikita: But you can’t create new pages and flows in this scenario, right? Joe: This is strictly about modifying an existing page. The other path is creating a new application extension, which is a new application from scratch or extending an existing Oracle application or even an existing partner or customer application. Again, we’re not seeing this typically being done too much. Most partners and customers create new applications or make customizations to existing pages. But the architecture does support it. So, your partner might create a new application based on the production app released by Oracle, and you could extend their application. Or a development team at your site could extend Oracle’s application and you could then extend that team’s application. This is mechanically possible, although I question the use case behind that. Usually, we see our apps being extended – becoming a dependency when there’s code that can be leveraged or reused for a new app and its new App UI. 15:40 Lois: Joe, what did you mean when you say one extension is a dependency of another? Can you talk a bit about dependencies, what that means, how it looks to the developer? Joe: When you extend an application, it becomes a dependency to your application, and you get access to all the resources within that dependency that are marked as extendable by the developer who created that extension. Most useful are things like service connections to REST APIs from Fusion apps data sources, reusable code fragments, and layouts that you can leverage in those cases where you want to create a new App UI. When an extension is listed as a dependency, you’ll see this graphically in Visual Builder Studio Designer. When you see an extension listed as a dependency, it means you can reference any of that extension’s resources that have been marked extendable by the developer. Recall all resources are closed off or hidden by default, but development teams can mark resources as open to being extended and reused, and then you can see and use those resources. So, you can easily add and remove extensions as dependencies in Visual Builder Designer as needed. Now, this can be a nice way to modularize and reuse your resources and assets. To summarize: I can modify an existing page – this is most common, extend an existing application and create a new App UI, which is not common, or I can extend the unified app to create a new app and a new App UI and add other extensions as dependences, as needed, to leverage their services, fragments, and layouts when building my own pages – this is pretty common as well. 17:04 Nikita: There’s one thing I’d like to come back to, Joe. You mentioned something called a mashup application earlier. Can you tell us a little more about that? Joe: To recap: I mentioned a couple of different ways that you can extend Fusion applications. One is changing layouts or creating rule-based layouts. You can also extend existing apps and create your own App UI on top of them or create your own Fusion app from scratch. But these are Fusion apps and they have restrictions.  These can only run within the Fusion applications ecosystem, which means they can only be accessed by people who are registered in the Fusion application ecosystem, and there are some other restrictions (for example, in terms of the APIs you can access). And you also have no access to customer data tables. Mashup applications use the stand-alone Visual Builder Cloud Service, which enables you to create custom visual applications. These are visual applications that run outside the Fusion apps ecosystem. Users only need to be identified to the Identity Cloud Service, IDCS, and then they can get access to these mashup apps, depending on the roles and privileges given to them, of course. These mashup applications can access Fusion apps API data, as well as customer database tables, Excel spreadsheet data, CSV files, and third-party APIs. And all this data can appear on the same page, in the same app, using the same Redwood components, so they look and work just like Fusion applications. 18:22 Lois: I know in the past there’s been some friction to making changes in Fusion applications. Partner and customer developers use different tools than the ones Oracle engineers use and there have been some deployment issues. To wrap up things, can you tell us why customers should use Visual Builder Studio to customize Fusion apps? Joe: Glad to, Lois. The big benefit to customers is that they are using the exact same tools, Visual Builder Designer for page design work and Visual Builder Studio for project and code management, to build the customizations and extensions that Oracle is using to create the applications and extensions that are delivered to them. I can’t emphasize enough how big a deal this is and how wonderful it is for the customer. We’re constantly making the Visual Builder Designer interface easier and easier to work with. We’re currently releasing a new version of Visual Builder Designer—the Express mode version. This version of Designer is lightweight and has only the necessary features required to allow you to make changes to pages and layouts, and create and manage dynamic rule-based layouts. If you need more (for example, you need to create service connections, fragments, and do a lot more of that type of advanced work), then use the advanced version of the Designer. Both are available to you, assuming that your user has the appropriate permission and the Fusion app you are using has implemented Express Designer. 19:37 Lois: OK Joe, what courses does Oracle University offer for me if I wanted to learn more about developing extensions for Fusion apps and creating mashup apps using Visual Builder Studio? Joe: Oracle University has several courses. We have the Develop Visual Applications Using Visual Builder Studio, which focuses on creating the stand-alone custom bespoke mashup visual applications. We also have our Design and Develop Redwood Applications course, which goes into detail about working with the Redwood page templates and components. All these courses are free and available today. And all you need to do is log in to mylearn.oracle.com to get started. 20:10 Nikita: We hope you enjoyed that conversation. Just a quick reminder before we close about the short survey we’ve put together to get your thoughts on the podcast. It’ll take just a few minutes and will help us make the podcast even better. Just click the link in the show notes to participate. Join us next week for another throwback episode. Until then, this is Nikita Abraham... Lois: And Lois Houston, signing off! 20:33 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
21m
26/11/2024

Best of 2024: Introduction to Visual Builder Studio, Visual Builder Cloud Service, Stand-Alone, and JET

The next generation of front-end user interfaces for Oracle Fusion Applications is being built using Visual Builder Studio and Oracle JavaScript Extension Toolkit. However, many of the terms associated with these tools can be confusing.   In this episode, Lois Houston and Nikita Abraham are joined by Senior Principal OCI Instructor Joe Greenwald. Together, they take you through the different terminologies, how they relate to each other, and how they can be used to deliver the new Oracle Fusion Applications as well as stand-alone, bespoke visual web applications.   Survey: https://customersurveys.oracle.com/ords/surveys/t/oracle-university-gtm/survey?k=focus-group-2-link-share-5   Develop Fusion Applications Using Visual Builder Studio: https://mylearn.oracle.com/ou/course/develop-fusion-applications-using-visual-builder-studio/138392/   Build Visual Applications Using Oracle Visual Builder Studio: https://mylearn.oracle.com/ou/course/build-visual-applications-using-oracle-visual-builder-studio/137749/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started. 00:26 Nikita: Hello and welcome to the Oracle University Podcast! I’m Nikita Abraham, Team Lead of Editorial Services with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! If you’ve been following along with us, you’ll know that we’ve had some really interesting seasons this year. We covered Autonomous Database, Artificial Intelligence, Visual Builder Studio and Redwood, OCI Container Engine for Kubernetes, and Oracle Database 23ai New Features. Nikita: And we’ve had some pretty awesome special guests. Do go back and check out those episodes if any of those topics interest you. 01:04 Lois: As we close out the year, we thought this would be a good time to revisit some of our best episodes. Over the next few weeks, you’ll be able to listen to four of our most popular episodes of the year.  Nikita: Right, this is the best of the best–according to you–our listeners.   Lois: Today’s episode is #1 of 4 and is a throwback to a discussion with Senior Principal OCI Instructor Joe Greenwald on Visual Builder Studio. Nikita: We asked Joe about Visual Builder Studio and Oracle JavaScript Extension Toolkit, also known as JET. Together, they form the basis of the technology for the next generation of front-end user interfaces for Oracle Fusion Applications, as well as many other Oracle applications, including most Oracle Cloud Infrastructure (OCI) interfaces. 01:48 Lois: We looked at the different terminologies and technologies, how they relate to each other, and how they deliver the new Oracle Fusion Applications and stand-alone, bespoke visual web applications.  So, let’s dive right in. Nikita: Joe, I’m somewhat thrown by the terminology around Visual Builder, Visual Studio, and JET. Can you help streamline that for us? Lois: Yeah, things that are named the same sometimes refer to different things, and sometimes things with a different name refer to the same thing. 02:18 Joe: Yeah, I know where you’re coming from. So, let’s start with Visual Builder Studio. It’s abbreviated as VBS and can go by a number of different names. Some of the most well-known ones are Visual Builder Studio, VBS, Visual Builder, Visual Builder Stand-Alone, and Visual Builder Cloud Service. Clearly, this can be very confusing. For the purposes of these episodes as well as the training courses I create, I use certain definitions.  02:42 Lois: Can you take us through those? Joe: Absolutely, Lois. Visual Builder Studio refers to a product that comes free with an OCI account and allows you to manage your project-related assets. This includes the project itself, which is a container for all of its assets. You can assign teams to your projects, as well as secure the project and declare roles for the different team members. You manage GIT repositories with full graphical and command-line GIT support, define package, build, and deploy jobs, and create and run continuous integration/continuous deployment graphical and code-managed pipelines for your applications. These can be visual applications, created using the Visual Builder Integrated Development Environment, the IDE, or non-visual apps, such as Java microservices, docker builds, NPM apps, and things like that. And you can define environments, which determine where your build jobs can be deployed. You can also define issues, which allow you to identify, track, and manage things like bugs, defects, and enhancements. And these can be tracked in code review merge requests and build jobs, and be mapped to agile sprints and scrum boards. There’s also support for wikis for team collaboration, code snippets, and the management of the repository and the project itself. So, VBS supports code reviews before code is merged into GIT branches for package, build, and deploy jobs using merge requests. 04:00 Nikita: OK, what exactly do you mean by that? Joe: Great. So, for example, you could have developers working in one GIT branch and when they’re done, they would push their private code changes into that remote branch. Then, they’d submit a merge request and their changes would be reviewed. Once the changes are approved, their code branch is merged into the main branch and then automatically runs a CI/CD package (continuous integration/continuous deployment) package, build, and deploy job on the code. Also, the CI/CD package, build, and deploy jobs can run against any branches, not just the main branch. So Visual Builder Studio is intended for managing the project and all of its assets. 04:37 Lois: So Joe, what are the different tools used in developing web applications? Joe: Well, Visual Builder, Visual Builder Studio Designer, Visual Builder Designer, Visual Builder Design-Time, Visual Builder Cloud Service, Visual Builder Stand-Alone all kind of get lumped together. You can kinda see why. What I’m referring to here are the tools that we use to build a visual web application composed of HTML5, CSS3, JavaScript, and JSON (JavaScript Object Notation) for metadata. I call this Visual Builder Designer. This is an Integrated Development Environment, it’s the “IDE” which runs in your browser. You use a combination of drag and drop, setting properties, and writing and modifying custom and generated code to develop your web applications. You work within a workspace, which is your own private copy of a remote Git branch. When you’re ready to start development work, you open an existing workspace or create a new one based on a clone of the remote branch you want to work on. Typically, a new branch would be created for the development work or you would join an existing branch. 05:38 Nikita: What’s a workspace, Joe? Is it like my personal laptop and drive? Joe: A workspace is your own private code area that stores any changes you make on the Oracle servers, so your code changes are never lost—even when working in a browser-based, network-based tool. A good analogy is, say I was working at home on my own machine. And I would make a copy of a remote GIT branch and then copy that code down to my local machine, make my code changes, do my testing, etc. and then commit my work—create a logical save point periodically—and then when I’m ready, I’d push that code up into the remote branch so it can be reviewed and merged with the main branch. My local machine is my workspace. However, since this code is hosted up by Oracle on our servers, and the code and the IDE are all running in your browser, the workspace is a simulation of a local work area on your own computer. So, the workspace is a hosted allocation of resources for you that’s private. Other people can’t see what’s going on in your workspace. Your workspace has a clone of the remote branch that you’re working with and the changes you make are isolated to your cloned code in your workspace. 06:41 Lois: Ok… the code is actually hosted on the server, so each time you make a change in the browser, the change is written back to the server? Is it possible that you might lose your edits if there’s a networking interruption? Joe: I want to emphasize that while I started out not personally being a fan of web-based integrated development environments, I have been using these tools for over three years and in all that time, while I have lost a connection at times—networks are still subject to interruptions—I’ve never lost any changes that I’ve made. Ever. 07:11 Nikita: Is there a way to save where you are in your work so that you could go back to it later if you need to? Joe: Yes, Niki, you’re asking about commits and savepoints, like in a Git repository or a Git branch. When you reach a logical stopping or development point in your work, you would create a commit or a savepoint. And when you’re ready, you would push that committed code in your workspace up to the remote branch where it can be reviewed and then eventually merged, usually with the main Git branch, and then continuous integration/continuous package and deployment build jobs are run. Now, I’m only giving you a high-level overview, but we cover all this and much more in detail with hands-on practices in our Visual Builder developer courses. Right now, I’m just trying to give you a sense of how these different tools are used. 07:52 Lois: Yeah, that makes sense, Joe. It’s a lot to cover in a short amount of time. Now, we’ve discussed the Visual Builder Designer IDE and workspace. But can you tell us more about Visual Builder Cloud Service and stand-alone environments? What are they used for? What features do they provide? Are they the same or different things? Joe: Visual Builder Cloud Service or Visual Builder Stand-Alone, as it’s sometimes called, is a service that Oracle hosts on its servers. It provides hosting for the deployed web application source code as well as database tables for business objects that we build and maintain to store your customer data. This data can come from XLS or CSV files, or even your own Oracle database customer table data. A custom REST proxy makes calls to external third-party REST services on your behalf and supports several popular authentication mechanisms. There is also integration with the Identity Cloud Service (IDCS) to manage users and their access to your web apps. Visual Builder Cloud Service is a for-fee product. You pay licensing fees for how much you use because it’s a hosted service. Visual Builder Studio, the project asset management aspect I discussed earlier, is free with a standard OCI license. Now, keep in mind these are separate from something like Visual Builder Design Time and the service that’s running in Fusion application environments. What I’m talking about now is creating stand-alone, bespoke, custom visual applications. These are applications that are built using industry-standard HTML5, CSS3, JavaScript, and JSON for metadata and are hosted on the Oracle servers.  09:30 Are you looking for practical use cases to help you plan and apply configurations that solve real-world challenges?  With the new Applied Learning courses for Cloud Applications, you'll be able to practically apply the concepts learned in our implementation courses and work through case studies featuring key decisions and configurations encountered during a typical Oracle Cloud Applications implementation. Applied learning scenarios are currently available for General Ledger, Payables, Receivables, Accounting Hub, Global Human Resources, Talent Management, Inventory, and Procurement, with many more to come!  Visit mylearn.oracle.com to get started. 10:12 Nikita: Welcome back! Joe, you said Visual Builder Cloud Service or Stand-Alone is a for-fee service. Is there a way I can learn about using Visual Builder Designer to build bespoke visual applications without a fee? Joe: Yes. Actually, we’ve added an option where you can run the Visual Builder Designer and learn how to create web apps without using the app hosting or the business object database that stores your customer data or the REST proxy for authentication or the Identity Cloud Service. So you don’t get those features, but you can still learn the fundamentals of developing with Visual Builder Designer. You can call third-party APIs, you can download the source, and run it locally, for example, in a Tomcat server. This is a great and free way to learn how to develop with the Visual Builder Designer. 10:55 Lois: Joe, I want to know more about the kinds of apps you can build in VB Designer and the capabilities that VB Cloud Service provides. Joe: Visual Builder Designer allows you to build custom, bespoke web applications made of interactive webpages; flows of pages for navigation; events that respond when things happen in the app, for example, GUI events like a button is clicked or values are entered into a text field; variables to store the state of the application and the ability to make REST calls, all from your browser. These applications have full access to the Oracle Fusion Applications APIs, given that you have the right security permissions and credentials of course. They can access your customer business data as business objects in our internally hosted database tables or your own customer database tables. They can access third-party APIs, and all these different data sources can appear in the same visual application, on the same page, at the same time. They use the Identity Cloud Service to identify which users can log in and authenticate against the application. And they all use the new Redwood graphical user interface components and page templates, so they have the same look and feel of all Oracle applications. 12:02 Nikita: But what if you’re building or extending Oracle Fusion Applications? Don’t things change a little bit? Joe: Good point, Niki. Yes. While you still work within Visual Builder Studio, that doesn’t change, VBS maintains your project and all your project-related assets, that is still the same. However, in this case, there is no separate hosted Visual Builder Cloud Service or Stand-Alone instance. In this case, Visual Builder is hosted inside of Fusion apps itself as part of the installation. I won’t go into the details of how the architecture works, but the Visual Builder instance that you’re running your code against is part of Fusion applications and is included in the architecture as well as the billing. All your code changes are maintained and stored within a single container called an extension. And this extension is a Git repository that is created for you, or you can create it yourself, depending on how you choose to work within Visual Builder Studio. You create an extension to hold the source code changes that provide a customization or configuration. This means making a change to an existing page or a set of pages or even adding new pages and flows to your Oracle Fusion Applications. You use Visual Builder Studio and Visual Builder Designer in a similar way as to how you would use them for bespoke stand-alone visual applications. 13:12 Lois: I’m trying to envision how this workflow is used. How is it different from bespoke VB app development? Or is it different at all? Joe: So, recall that the Visual Builder Designer is effectively the Integrated Development Environment, the IDE, where you make your code changes by working with both the raw HTML5, CSS3, and JavaScript code, if need be, or the Page Designer for drag and drop, and setting properties and then Live mode to test your work. You use a version of VB Designer to view and modify your customizations, and the code is stored in a Git repository called an extension. So, in that sense, the work of developing pages and flows and such is the same. You still start by creating or, more typically, joining a project and then either create a new extension from scratch or base it on an existing application, or go directly to the page that you want to edit and, on that page, select from your profile menu to edit in Visual Builder Studio. Now, this is a different lifecycle path from bespoke visual applications. With them, you’re not extending an app or modifying individual pages in the same way. You get a choice of which project you want to add your extension to when you’re working with Fusion apps and potentially which repository to store your customizations, unless one already exists and then it’s assigned automatically to hold your code changes. So you make your changes and edits to the portions of the application that have been opened for extensibility by the development team. This is another difference. Once you make your code changes, the workflow is pretty much the same as for a bespoke visual application: do your development work, commit your changes, push your changes to the remote branch. And then typically, your code is reviewed and if the code passes and is approved, it’s merged with the main branch. Then, the package and deploy jobs run to deploy the main code to the production environment or whatever environment you’re targeting. And once the package and deploy jobs complete, the code base is updated and users who log in see the changes that you’ve made. 15:03 Nikita: You mentioned creating apps that combine data from Fusion cloud, applications, customer data, and third-party APIs into one page. Why is it necessary? Why can’t you just do all that in one Fusion Applications extension? Joe: When you create extensions, you are working within the Oracle Fusion Applications ecosystem, that’s what they actually call it, which includes a defined a set of users who have been predefined and are, therefore, known to Fusion Applications. So, if you’re a user and you’re not part of that Fusion Apps ecosystem, you can’t access the pages. Period. That’s how Fusion Apps works to maintain its security and integrity. Secondly, you’re working pretty much solely with the Fusion Applications APIs data sources coming directly from Fusion Applications, which are also available to you when you’re creating bespoke visual apps. When you’re working with Fusion Applications in Visual Builder, you don’t have access to these business objects that give you access to your own customer database data through Visual Builder-generated REST APIs. Business objects are available only to bespoke visual applications in the hosted VB Cloud Service instance. So, your data sources are restricted to the Oracle Fusion Applications APIs and some third-party APIs that work within a narrow set of authentication mechanisms currently, although there are plans to expand this in the future. A mashup app that allows you now to access all these data sources while creating apps that leverage the Redwood Component System, so they look and work like Fusion Apps. They’re a highly popular option for our partners and customers. 16:28 Lois: So, to review, we have two different approaches. You can create a visual application using the for-fee, hosted Visual Builder Cloud Service/Stand-Alone or the one that comes with Oracle Integration Cloud, or you can use the extension architecture for Fusion applications, where you use the designer and create your extensions, and the code is delivered and deployed to Fusion applications code. You haven’t talked about JET yet though, Joe. What is that? Joe: So, JET is an abbreviation. It stands for Oracle JavaScript Extension Toolkit and JET is the underlying technology that makes Visual Builder, visual applications, and Visual Builder Extensions for Fusion Applications possible. Oracle JavaScript Extension Toolkit provides a module-based, open-source toolkit that leverages modern JavaScript, TypeScript, CSS3, and HTML5 to deliver web applications. It’s targeted at JavaScript developers working on client-side applications. It is not for backend development.  It’s a collection of popular, powerful JavaScript libraries and a set of Oracle-contributed JavaScript libraries that make it very simple, easy, and efficient to build front-end applications that can consume and interact with Oracle products and services, especially Oracle Cloud services, but of course it can work with any type of third-party API. 17:44 Nikita: How are JET applications architected, Joe, and how does that relate to Visual Builder pages and flows? Joe: The architecture of JET applications is what’s called a single page architecture. We’ve all seen these. These are where you have a single webpage—think of your index page that provides the header and footer for your webpage—and then the middle portion or the middle content of the page, represented by modules, allow you to navigate from one page or module to another. It also provides the data mapping so that the data elements in the variables and the state of the application, as well as the graphical user interface elements that provide the fields and functionality for the interface for the application, these are all maintained on the client side. If you’re working in pure JET, then you work with these modules at the raw JavaScript code level. And there are a lot of JavaScript developers who want to work like this and create their custom applications from the code up, so to speak. However, it also provides the basis for Visual Builder visual applications and Fusion Apps visual extensions in Visual Builder. 18:41 Lois: How does JET support VB Apps? You didn’t talk much about having to write a bunch of JavaScript and HTML5, so I got the impression that this is all done for you by VB Designer? Joe: Visual Builder applications are composed of HTML5, CSS3, and JavaScript code that is usually generated by the developer when she drags and drops components on to the page designer canvas or sets properties or creates action chains to respond to events. But there’s also a lot of JavaScript object notation (JSON) metadata created at the time that describes the pages, the flows, the navigation, the REST services, the variables, their data types, and other assets needed for the app to function. This JSON metadata is translated at runtime using a large JavaScript extension toolkit library called the Visual Builder Runtime that runs in the browser and real time translates the metadata and other assets in the Visual Builder source code into JET code and assets, which are actually executed at runtime. And it’s very quick, very fast, very efficient, and provides a layer of abstraction between the raw JET code and the Visual Builder architecture of pages, flows, action chains for executing code and events to handle things that occur in the user interface, including saving the state in variables that are mapped to GUI components. For example, if you have an Input text component, you need to have a variable to store the value that was entered into that Input text component between page refreshes. The data can move from the Input text component to the variable, and from the variable to that Input text component if it’s changed programmatically, for example. So, JET manages binding these data values to variables and the UI components on the page. So, a change to a variable value or a change to the contents of the component causes the others to change automatically. Now, this is only a small part of what JET and the frameworks and libraries it uses do for the applications. JET also provides more complex GUI components like lists and tables, and selection lists, and check boxes, and all the sorts of things you would expect in a modern GUI application. 20:37 Nikita: You mentioned a layer of abstraction between Visual Builder Studio Designer and JET. What’s the benefit of working in Visual Builder Designer versus JET itself? Joe: The benefit of Visual Builder is that you work at a higher level of abstraction than having to get down into the more detailed levels of deep JavaScript code, working with modules, data mappings, HTML code, single page architecture navigation, and the related functionalities. You can work at a higher level, a graphical level, where you can drag and drop things onto a design canvas and set properties. The VB architecture insulates you from the more technical bits of JET. Now, this frees the developer to concentrate more on application and page design, implementing logic and business rules, and creating a pleasing workflow and look and feel for the user. This keeps them from having to get caught up in the details of getting this working at the code level. Now if needed, you can write custom JavaScript, HTML5, and CSS3 code, though much less than in a JET app, and all that is part of the VB application source, which becomes part of the code used by JET to execute the application itself. And yet it all works seamlessly together. 21:38 Lois: Joe, I know we have courses in JavaScript, HTML, and CSS. But does a developer getting ready to work in Visual Builder Designer have to go take those courses first or can they start working in VB Designer right away? Joe: Yeah, that question does often comes up: Do I need to learn JET to work with Visual Builder? No, you don’t. That’s all taken care for you in the products themselves. I don’t really think it helps that much to learn JET if you are going to be a VB developer. In some ways, it could even be a bit distracting since some of things you learn to do in JET, you would have to unlearn or not do so much because of what VB does it for you. The things you would have to do manually in code in JET are done for you. This is why we call VB a low code development tool. I mean, you certainly can if you want to, but I would spend more time learning about the different GUI components, page templates, the Visual Builder architecture — events, action chains, and the data provider variables and types. Now, I know JET myself. I started with that before learning Visual Builder, but I use very little of my JET knowledge as a VB developer. Visual Builder Designer provides a nice, abstracted, clean layer of modern visual development on top of JET, while leveraging the power and flexibility of JET and keeping the lower-level details out of my way. 22:49 Nikita: Joe, where can I go to get started with Visual Builder? Joe: Well, for more information, I recommend you take a look at our Develop Fusion Applications course if you’re working with Fusion Applications and Visual Builder Studio. The other course is Develop Visual Applications with Visual Builder Studio and that’s if you’re creating stand-alone bespoke applications. Both these courses are free. We also have a comprehensive course that covers JavaScript, HTML5, and CSS3, and while it’s not required that you take that to be successful, it can be helpful down the road. I would also say that some basic knowledge of HTML5, CSS3, and JavaScript will certainly support you and serve you well when working with Visual Builder. You learn more as you go along and you find that you need to create more sophisticated applications. I would also mention that a lot of the look and feel of the applications in Visual Builder visual applications and Fusion apps extensions and customizations come through JET components, JET styles, and JET variables, and CSS variables, so that’s something that you would want to pursue at some point. There’s a JET cookbook out there. You can search for Oracle JET and look for the JET cookbook and that’s a good introduction to all of that. 23:50 Nikita: We hope you enjoyed that conversation. To learn about some of the courses Joe mentioned, visit mylearn.oracle.com to get started. Lois: Before we wrap up, we’ve got a favor to ask. We’ve created a short survey to capture your thoughts on the podcast. It’ll only take a few minutes of your time. Just click the link in the show notes and share your feedback. We want to make sure we’re delivering the best experience possible so don't hesitate to let us know what's on your mind! Thanks for your support. Join us next week for another throwback episode. Until then, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 24:30 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
24m
19/11/2024

Oracle AI in Fusion Cloud Human Capital Management

In this special episode of the Oracle University Podcast, Lois Houston and Nikita Abraham, along with Principal HCM Instructor Jeff Schuster, delve into the intersection of HCM and AI, exploring the practical applications and implications of this technology in human resources. Jeff shares his insights on bias and fairness, the importance of human involvement, and the need for explainability and transparency in AI systems. The discussion also covers the various AI features embedded in HCM and their impact on talent acquisition, performance management, and succession planning.  Oracle AI in Fusion Cloud Human Capital Management: https://mylearn.oracle.com/ou/learning-path/oracle-ai-in-fusion-cloud-human-capital-management-hcm/136722 Oracle Fusion Cloud HCM: Dynamic Skills: https://mylearn.oracle.com/ou/course/oracle-fusion-cloud-hcm-dynamic-skills/116654/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ Twitter: https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!  00:26 Lois: Hello and welcome to the Oracle University Podcast! I’m Lois Houston, Director of Innovation Programs here at Oracle University, and with me, is Nikita Abraham, Team Lead of Editorial Services. Nikita: Hi everyone! Last week’s conversation was all about Oracle Database 23ai backup and recovery, where we dove into instance recovery and effective recovery strategies. Today’s episode is a really special one, isn’t it, Lois? 00:53 Lois: It is, indeed, Niki. Of course, all of our AI episodes are special. But today, we have our friend and colleague Jeff Schuster with us. I think our listeners are really going to enjoy what Jeff has to share with us. Nikita: Yeah definitely! Jeff is a Principal HCM Instructor at Oracle University. He recently put together this really fantastic course on MyLearn, all about the intersection of HCM and AI, and that’s what we want to pick his brain about today. Hi Jeff! We’re so excited to have you here.  01:22 Jeff: Hey Niki! Hi Lois! I feel special already. Thanks you guys so much for having me. Nikita: You’ve had a couple of busy months, haven’t you?  01:29 Jeff: I have! It’s been a busy couple of months with live classes. I try and do one on AI in HCM at least once a month or so so that we can keep up with the latest/greatest stuff in that area. And I also got to spend a few days at Cloud World teaching a few live classes (about artificial intelligence in HCM, as a matter of fact) and meeting our customers and partners. So yeah, absolutely great week. A good time was had by me.  01:55 Lois: I’m sure. Cloud World is such a great experience. And just to clarify, do you think our customers and partners also had a good time, Jeff? It wasn’t just you, right? Jeff: Haha! I don’t think it was just me, Lois. But, you know, HCM is always a big deal, and now with all the embedded AI functionality, it really wasn’t hard to find people who wanted to spend a little extra time talking about AI in the context of our HCM apps. So, there are more than 30 separate AI-powered features in HCM. AI features for candidates to find the right jobs; for hiring managers to find the right candidates; skills, talent, performance management, succession planning— all of it is there and it really covers everything across the Attract/Grow/Keep buckets of the things that HR professionals do for a living. So, anyway, yeah, lots to talk about with a lot of people! There’s the functional part that people want to know about—what are these features and how do they work? But obviously, AI carries with it all this cultural significance these days. There’s so much uncertainty that comes from this pace of development in that area. So in fact, my Cloud World talk always starts with this really silly intro that we put in place just to knock down that anxiety and get to the more practical, functional stuff. 03:11 Nikita: Ok, we’re going to need to discuss the functional stuff, but I feel like we’re getting a raw deal if we don’t also get that silly intro. Lois: She makes a really good point.  Jeff: Hahaha! Alright, fair enough. Ok, but you guys are gonna have to imagine I’ve got a microphone and a big room and a lot of echo. AI is everywhere. In your home. In your office. In your homie’s home office. 03:39 Lois: I feel like I just watched the intro of a sci-fi movie. Jeff: Yeah. I’m not sure it’s one I’d watch, but I think more importantly it’s a good way to get into discussing some of the overarching things we need to know about AI and Oracle’s approach before we dive into the specific features, so you know, those features will make more sense when we get there?  03:59  Nikita: What are these “overarching” things?  Jeff: Well, the things we work on anytime we’re touching AI at Oracle. So, you know, it starts with things like Bias and Fairness. We usually end up in a pretty great conversation about things like how we avoid bias on the front end by making sure we don’t ingest things like bias-generating content, which is to say data that doesn’t necessarily represent bias by itself, but could be misused. And that pretty naturally leads us into a talk about guardrails. Nikita: Guardrails? Jeff:  Yeah, you can think of those as checkpoints. So, we’ve got rules about ingestion and bias. And if we check the output coming out of the LLM to ensure it complied with the bias and fairness rules, that’s a guardrail. So, we do that. And we do it again on the apps side. And so that’s to say, even though it’s already been checked on the AI side, before we bring the output into the HCM app, it’s checked again. So another guardrail.  04:58 Lois: How effective is that? The guardrails, and not taking in data that’s flagged as bias-generating? Jeff: Well, I’ll say this: It’s both surprisingly good, and also nowhere near good enough.  Lois: Ok, that’s as clear as mud. You want to elaborate on that?  Jeff: Haha! I think all it means is that approach does a great job, but our second point in the whole “standards” discussion is about the significance of having a human in the loop. Sometimes more than one, but the point here is that, particularly in HCM, where we’re handling some really important and sensitive data, and we’re introducing really powerful technology, the H in HCM gets even more important. So, throughout the HCM AI course, we talk about opportunities to have a human in the loop. And it’s not just for reviewing things. It’s about having the AI make suggestions, and not decisions, for example. And that’s something we always have a human in the loop for all the time. In fact, when I started teaching AI for HCM, I always said that I like to think of it is as a great big brain, without any hands.  06:00 Nikita: So, we’re not talking about replacing humans in HCM with AI.                                                                         Jeff: No, but we’re definitely talking about changing what the humans do and why it’s more important than ever what the humans do. So, think of it this way, we can have our embedded AI generate this amazing content, or create really useful predictions, whatever it is that we need. We can use whatever tools we want to get there, but we can still expect people to ask us, “Where did that come from?” or “Does this account for [whatever]?”. So we still have to be able to answer that. So that’s another thing we talk about as kind of an overarching important concept: Explainability and Transparency. 06:41 Nikita: I’m assuming that’s the part about showing our work, right? Explaining what's being considered, how it's being processed, and what it is that you're getting back. Jeff: That’s exactly it. So we like to have that discussion up front, even before we get to things like Gen and Non-Gen AI, because it’s great context to have in mind when you start thinking about the technology. Whenever we’re looking at the tech or the features, we’re always thinking about whether people are appropriately involved, and whether people can understand the AI product as well as they need to.  07:11 Lois: You mentioned Gen and Non-Gen AI. I’ve also heard people use the term “Classic AI.” And lately, a lot more about RAG and Agents. When you're teaching the course, does everybody manage to keep all the terminology straight? Jeff: Yeah, people usually do a great job with this. I think the trick is, you have to know that you need to know it, if that makes sense.  Lois: I think so, but why don’t you spell it out for us. Jeff: Well, the temptation is sometimes to leave that stuff to the implementers or product developers, who we know need to have a deep understanding of all of that. But I think what we’ve learned is, especially because of all the functional implications, practitioners, product owners, everybody needs to know it too. If for no other reason so they can have more productive conversations with their implementers. You need to know that Classic or Non-Generative AI leverages machine learning, and that that’s all you need in order to do some incredibly powerful things like predictions and matching. So in HCM, we’re talking about things like predicting time to hire, identifying suggested candidates for job openings, finding candidates similar to ones you already like, suggesting career paths for employees, and finding recommended successors. All really powerful matching stuff. And all of that stuff uses machine learning and it’s certainly AI, but none of that uses Generative AI to do that because it doesn’t need to. 08:38 Nikita: So how does that fit in with all the hype we’ve been hearing for a long time now about Gen AI and how it’s such a transformative technology that’s going to be more impactful than anything else? Jeff: Yeah, and that can be true too. And this is what we really lean into when we do the AI in HCM course live. It’s much more of a “right AI for the right job” kind of proposition. Lois: So, just like you wouldn’t use a shovel to mix a cake. Use the right tool for the job. I think I’ve got it. So, the Classic AI is what’s driving those kinds of features in HCM? The matching and recommendations?  Jeff: Exactly right. And where we need generative content, that’s where we add on the large language model capability. With LLMs, we get the ability to do natural language processing. So it makes sense that that’s the technology we’d use for tasks like “write me a job description” or “write me performance development tips for my employee”. 09:33 Nikita: Ok, so how does that fit in with what Lois was asking about RAG and Agents? Is that something people care about, or need to? Jeff: I think it’s easiest to think about those as the “what’s next” pieces, at least as it relates to the embedded AI. They kind of deal with the inherent limitations of Gen and Non-Gen components. So, RAG, for example - I know you guys know, but your listeners might not...so what’s RAG stand for? Lois & Nikita: Retrieval. Augmented. Generation. Jeff: Hahaha! Exactly. Obviously. But I think everything an HCM person needs to know about that is in the name. So for me, it’s easiest to read that one backwards. Retrieval Augmented Generation. Well, the Generation just means it’s more generative AI. Augmented means it’s supplementing the existing AI. And Retrieval just tells you that that’s how it’s doing it. It’s going out and fetching something it didn’t already have in order to complete the operation. 10:31 Lois: And this helps with those limitations you mentioned? Nikita: Yeah, and what are they anyway?  Jeff: I think an example most people are familiar with is that large language models are trained on this huge set of information. To a certain point. So that model is trained right up to the point where it stopped getting trained. So if you’re talking about interacting with ChatGPT, as an example, it’ll blow your doors off right up until you get to about October of 2023 and then, it just hasn’t been trained on things after that. So, if you wanted to have a conversation about something that happened after that, it would need to go out and retrieve the information that it needed. For us in HCM, what that means is taking the large language model that you get with Oracle, and using retrieval to augment the AI generation for the things that the large language model wouldn’t have had.  11:22 Nikita: So, things that happened after the model was trained? Company-specific data? What kind of augmenting are you talking about? Jeff: It’s all of that. All those things happen and it’s anything that might be useful, but it’s outside the LLM’s existing scope. So, let’s do an example. Let’s say you and Lois are in the market to hire someone. You’re looking for a Junior Podcast Assistant. We’d like the AI in HCM to help, and in order to do that, it would be great if it could not just generate a generic job description for the posting, but it could really make it specific to Oracle. Even better, to Oracle University.  So, you’d need the AI to know a few more things in order to make that happen. If it knows the job level, and the department, and the organization—already the job posting description gets a lot better. So what other things do you think it might need to know? 12:13 Lois: Umm I’m thinking…does it need to account for our previous hiring decisions? Can it inform that at all? Jeff: Yes! That’s actually a key one. If the AI is aware not only of all the vacancies and all of the transactional stuff that goes along with it (like you know who posted it, what’s its metadata, what business group it was in, and all that stuff)...but it also knows who we hired, that’s huge. So if we put all that together, we can start doing the really cool stuff—like suggesting candidates based not only on their apparent match on skills and qualifications, but also based on folks that we’ve hired for similar positions. We know how long it took to make those hires from requisition open to the employee’s first start date. So we can also do things like predicting time to hire for each vacancy we have with a lot more accuracy. So now all of a sudden, we’re not just doing recruiting, but we have a system that accounts for “how we do it around here,” if that makes any sense.  But the point is, it’s the augmented data, it’s that kind of training that we do throughout ingestion, going out to other sources for newer or better information, whatever it is we need. The ability to include it alongside everything that’s already in the LLM, that’s a huge deal.  13:31  Nikita: Ok, so I think the only one we didn’t get to was Agents. Jeff: Yeah, so this one is maybe a little less relevant in HCM—for now anyway. But it’s something to keep an eye on. Because remember earlier when I described our AI as having a great big brain but no hands?  Lois: Yeah... Jeff: Well, agents are a way of giving it hands. At least for a very well-defined, limited set of purposes. So routine and repetitive tasks. And for obvious reasons, in the HCM space, that causes some concerns. You don’t want, for example, your AI moving people forward in the recruiting process or changing their status to “not considered” all by itself. So going forward, this is going to be a balancing act. When we ask the same thing of the AI over and over again, there comes a point where it makes sense to kind of “save” that ask. When, for example, we get the “compare a candidate profile to a job vacancy” results and we got it working just right, we can create an agent. And just that one AI call that specializes in getting that analysis right. It does the analysis, it hands it back to the LLM, and when the human has had what they need to make sure they get what they need to make a decision out of it, you’ve got automation on one hand and human hands on the other...hand. 14:56 Have you mastered the basics of AI? Are you ready to take your skills to the next level? Unlock the potential of advanced AI with our OCI Generative AI Professional course and certification that covers topics like large language models, the OCI Generative AI Service, and building Q&A chatbots for real-world applications. Head over to mylearn.oracle.com to find out more. 15:26 Nikita: Welcome back! Jeff, you’ve mentioned the “Time to Hire” feature a few times? Is that a favorite with people who take your classes? Jeff: The recruiting folks definitely seem to enjoy it, but I think it’s just a great example for a couple of reasons. First, it’s really powerful non-generative AI. So it helps emphasize the point around the right AI for the right job. And if we’re talking about things in chronological order, it’s something that shows up really early in the hire-to-retire cycle. And, you know, just between us learning nerds, I like to use Time to Hire as an early example because it gets folks in the habit of working through some use cases. You don’t really know if a feature is going to get you what you need until you’ve done some of that. So, for example, if I tell you that Time to Hire produces an estimated number of days to your first hire. And you’re still Lois, and you’re still Niki, and you’re hiring for a Junior Podcast Assistant. So why do you care about time to hire? And I’m asking you for real—What would you do with that prediction if you had it?  16:29 Nikita: I guess I’d know how long it is before I can expect help to arrive, and I could plan my work accordingly. Jeff: Absolutely. What else. What could you do with a prediction for Time to Hire? Lois: Think about coverage? Jeff: Yeah! Exactly the word I was looking for. Say more about that.  Lois: Well, if I know it’s gonna be three months before our new assistant starts, I might be able to plan for some temporary coverage for that work. But if I had a prediction that said it’s only going to be two weeks before a new hire could start, it probably wouldn’t be worth arranging temporary coverage. Niki can hold things down for a couple of weeks. Jeff: See, I’m positive she could! That’s absolutely perfect! And I think that’s all you really need to have in terms of prerequisites to understand any of the AI features in HCM. When you know what you might want to do with it, like predicting the need for temp cover, and you’ve got everything we talked about in the foundation part of the course—the Gen and the Classic, all that stuff, you can look at a feature like Time to Hire and then you can probably pick that up in 30 seconds. 17:29 Nikita: Can we try it? Jeff: Sure! I mean, you know, we’re not looking at screens for this conversation, but we can absolutely try it. You’re a recruiter. If I tell you that Time to Hire is a feature that you run into on the job requisition and it shows you just a few editable fields, and then of course, the prediction of the number of days to hire—tell me how you think that feature is going to work when you get there. Lois: So, what are the fields? And does it matter? Jeff: Probably not really, but of course you can ask. So, let me tell you. Ready? The fields—they are these. Requisition Title, Location, and Education Level.  Nikita: Ok, well, I have to assume that as I change those things… like from a Junior Podcast Assistant to a Senior Podcast Assistant, or change the location from Redwood Shores to Detroit, or change the required education, the time to hire is going to change, right?  Jeff: 100%, exactly. And it does it in real time as you make those changes to those values. So when you pick a new location, you immediately get a new number of days, so it really is a useful tool. But how does it work? Well, we know it’s using a few fields from the job requisition, but that’s not enough. Besides those fields, what else would you need in order to make this prediction work? 18:43 Lois: The part where it translates to a number of days. So, this is based on our historic hiring data? How long it took us to hire a podcast assistant the last time? Jeff: Yep! And now you have everything you need. We call that “historic data from our company” bit “ingestion,” by the way. And there’s always a really interesting discussion around that when it comes up in the course. But it’s the process we use to bring in the HCM data to the AI so it can be considered or predictions exactly like this. Lois: So it’s the HCM data making the AI smarter and more powerful. Nikita: And tailored. Jeff: Exactly, it’s all of that. And obviously, the HCM is better because we’ve given it the AI. But the AI is also better because it has the HCM in it. But look, I was able to give you a quick description of Time to Hire, and you were able to tell me what it does, which data it uses, and how it works in just a few seconds. So, that’s kind of the goal when we teach this stuff. It’s getting everybody ready to be productive from moment #1 because what is it and how does it work stuff is already out of the way, you know?  19:52 Lois: I do know! Nikita: Can we try it with another one? Jeff: Sure! How about we do...Suggested Candidates. Lois: And you’re going to tell us what we get on the screen, and we have to tell you how it works, right? Jeff: Yeah, yeah, exactly. Ok—Suggested Candidates. You’re a recruiter or a hiring manager. You guys are still looking for your Junior Podcast Assistant. On the requisition, you’ve got a section called Suggested Candidates. And you see the candidate’s name and some scores. Those scores are for profile match, skills match, experience match. And there’s also an overall match score, and the highest rated people you notice are sorted to the top of the list. So, you with me so far?  Lois: Yes! Jeff: So you already know that it’s suggesting candidates. But if you care about explainability and transparency like we talked about at the start, then you also care about where these suggested candidates came from. So let’s see if we can make progress against that. Let’s think about those match scores. What would you need in order to come up with match scores like that? 20:54 Nikita: Tell me if I’m oversimplifying this, but everything about the job on the requisition, and everything about the candidate? Their skills and experience? Jeff: Yeah, that’s actually simplified pretty perfectly. So in HCM, the candidate profile has their skills and experience, and the req profile has the req requirements.  Lois: So we’re comparing the elements of the job profile and the person/candidate profile. And they’re weighted, I assume? Jeff: That’s exactly how it works. See, 30 seconds and you guys are nailing these! In fairness, when we discuss these things in the course, we go into more detail. And I think it’s helpful for HCM practitioners to know which data from the person and the job profiles is being considered (and sometimes just as important, which is not being considered). And don’t forget we’re also considering our ingested data. Our previously selected candidates. 21:45 Lois: Jeff, can I change the weighting? If I care more about skills than experience or education, can I adjust the weighting and have it re-sort the candidates? Jeff: Super important question. So let me give you the answer first, which is “no.” But because it’s important, I want to tell you more. This is a discussion we have in the class around Oracle’s Embedded vs. Custom AI. And they’re both really important offerings. With Embedded, what we’re talking about are the features that come in HCM like any other feature. They might have some enablement steps like profile options, and there’s an activation panel. But essentially, that’s it. There’s no inspection panel for you to open up and start sticking your screwdriver in there and making changes. Believe it or not, that’s a big advantage with Embedded AI, if you ask me anyway.  Nikita: It’s an advantage to not be able to configure it? Jeff: In this context, I think you can say that it is. You know, we talk about the advantages about the baked-in, Embedded AI in this course, but one of the key things is that it’s pre-built and pre-tested. And the big one: that it’s ready to use on day one. But one little change in a prompt can have a pretty big butterfly effect across all of your results. So, Oracle provides the Embedded AI because we know it works because we’ve already tested it, and it’s, therefore, ready on day one. And I think that story maybe changes a little bit when you open up the inspection panel and bust out that screwdriver. Now you’re signing up to be a test pilot. And that’s just fundamentally different than “pre-built and ready on day one.” Not that it’s bad to want configuration. 23:24 Lois: That’s what the Custom AI path and OCI are about though, right? For when customers have hyper-specific needs outside of Oracle’s business processes within the apps, or for when that kind of tuning is really required. And your AI for HCM course—that focuses on the Embedded AI instead of Custom, yes? Jeff: That is exactly it, yes. Nikita: You said there are about 30 of these AI features across HCM. So, when you teach the course, do you go through all of them or are there favorites? Ones that people want to spend more time on so you focus on those? Jeff: The professional part of me wants to tell you that we do try to cover all of them, because that explainability and transparency business we talked about at the beginning. That’s for real, so I want our customers to have that for the whole scope.  24:12  Nikita: The professional part? What’s the other part?  Jeff: I guess that’s the part that says sure, we need to hit all of them. But some of them are just inherently more fun to work on. So, it’s usually the learners who drive that in the live classes when they get into something, that’s where we spend the most time. So, I have my favorites too. The learners have their favorites. And we spend time where it’s everybody’s favorite. Lois: Like where? Jeff: Ok, so one is far from the most complex one, but I think it’s really elegant in its simplicity. And it’s the Celebrate feature, where we do employee recognition. There’s an AI Assist available there. So when it’s time to recognize a colleague, you just need to enter the headline or the title, and the AI takes it from there and just writes up the recognition. 24:56 Lois: What about that makes it a good example, Jeff? You said it’s elegant. What do you mean?  Jeff: I think it’s a few things. So, start with the prompt. It’s just the one line—just the headline. And that’s your one input. So, type in the headline, get the recognition below. It’s a great demonstration of not just the simplicity, but the power we get out of that simplicity. I always ask it to recognize my employees for implementing AI features in Oracle HCM, just to see what it comes up with. When it tells the employee that they’re helping the company by automating routine tasks, bringing efficiency to the HR department, and then launches into specific examples of how AI features help in HCM, it really is pretty incredible. So, it’s a simple demo, but it explains a lot about how the Gen AI works. Lois: That’s really cool. 25:45 Nikita: So this one is generative AI. It’s using the large language model to create the recognition based on the prompt, which is basically just whatever you entered in the headline. But how does that help explain how Gen AI works in HCM? Jeff: Well, let’s take our simple prompt for example. There’s a lot happening behind the scenes. It’s taking our prompt, it’s doing its LLM thing, but before it’s done, it’s creating the results in a very specific way. An employee recognition reads really differently than a job description. So, I usually describe this as the hidden part of our prompt. The visible part is what we typed. But it needs to know things like our desired output format. Make sure to use the person’s name, summarize the benefits, and be sure to thank them for their contribution, that kind of stuff. So, those things are essentially hard-coded into the page. And that’s to say, this is another area where we don’t get an inspection panel that lets us go in and tweak the prompt.  26:42 Nikita: And that’s generally how generative AI works? Jeff: Pretty much. Wherever you see an AI Assist button in HCM, that’s more or less what’s going on. And so when you get to some of the other more complex features, it’s helpful to know that that is what’s going on.  Lois: Like where? Jeff: Well, it works that way for the About Me part of your employee profile, for goal creation in performance, and I think a really great example is in performance, where managers are providing the competency development tips. So the prompt there is a little more complex there because it involves the employee’s proficiency rating instead of free text. But still, pretty straightforward. You’re gonna click AI Assist and it’s gonna generate all the development tips for any specific competency listed for that employee. Good development tips. Five of them. Nicely formatted with bullet points. And these aren’t random words assembled by an AI. So they conform to best practices in the development of competencies. So, something is telling the LLM to give us results that are that good, in that particular way.  So, it’s just another good example of the work AI is doing while protected behind the inspection panel that doesn’t exist. So, the coding of that page, in combination with what the LLM generates and the agent that it uses, is what produces the result. That’s generally the approach. In the class, we always have a good time digging into what must be going on behind that inspection panel. Generally speaking, the better feel we have for what’s going on on these pages, the better we’re able to get the results we want, even without having that screwdriver out. 28:21  Nikita: So it’s time well-spent, looking at all the individual features? Jeff: I think so, especially if you’re anticipating really using any of them. So, the good news is, once you learn a few of them and how they work, and what they’re best at, you stop being surprised after a while. But there are always tips and tricks. And like we talked about at the top, explainability and transparency are absolutely key. So, as much as I’m not a fan of the phrase, I do think this is kind of a “knowledge is power” kind of situation. 28:51 Nikita: Sadly, we’re just about out of time for this episode.  Lois: That’s too bad, I was really enjoying this. Jeff, you were just talking about knowledge—where can we get more?  Jeff: Well, like you mentioned at the start, check out the AI in HCM course on MyLearn. It’s about an hour and a half, but it really is time well spent. And we get into detail on everything the three of us discussed here today, and then we have demoscussions of every feature where we show them and how they work and which data they’re using and a whole bunch more. So, there’s that. Plus, I hear the instructor is excellent. Lois: I can vouch for that! Jeff: Well, then you should definitely look into Dynamic Skills. Different instructor. But we have another course, and again I think about an hour and a half, but when you’re done with the AI course, I always feel like Dynamic Skills is where you really wanna go next to really flesh out all the Talent Management ideas that got stirred up while you were having a great time in the AI course.  And then finally, the live classes. It’s always really fun to take live questions while we talk about AI in HCM.   29:54 Nikita: Thanks, Jeff! This has been really interesting.  Lois: Yeah, thanks for being here, Jeff. We’ve loved having you on. Jeff: Thank you guys so much for having me. It’s been a pleasure.  Lois: If you want to learn more about what we discussed, go to the show notes for today’s episode. You’ll find links to the AI for Human Capital Management and Dynamic Skills courses that Jeff mentioned so you can check them out. You can also head over to mylearn.oracle.com to find the live sessions for MyLearn subscribers that Jeff conducts. Nikita: Join us next week as we kick off our “Best of 2024” season, where we’ll be revisiting some of our most popular episodes of the year. Until then, this is Nikita Abraham…  Lois: And Lois Houston, signing off!   30:35 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
31m
12/11/2024

Oracle Database 23ai: Backup and Recovery - Part 2

Lois Houston and Nikita Abraham continue their deep dive into Oracle Database 23ai backup and recovery strategies with Senior Principal Database & MySQL Instructor Bill Millar.   Picking up from Part 1, they explore critical concepts such as instance recovery, checkpoint processes, and the role of redo log files. Bill shares insights into complete and incomplete recovery, flashback technologies, and lots more.   Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-backup-and-recovery/141127/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   Twitter: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 
00:26 Nikita: Welcome back to the Oracle University Podcast! I’m Nikita Abraham, Team Lead of Editorial Services with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi everyone! Last week, we had a fantastic chat with Bill Millar, our Senior Principal Database & MySQL Instructor. We dug into the basics of backup and recovery. We touched on everything from a DBA’s role in preventing data loss to handling different types of failures, and even some common mistakes that tend to pop up when managing a database. Nikita: Yeah, if you missed that episode, definitely go back and check it out. It’s packed with useful info, especially if you’re in charge of keeping databases safe. 01:10 Lois: Today, we’re picking up where we left off. We’re going to ask Bill about instance recovery and recovery strategies. Bill, can you kick things off by explaining what instance recovery is? Bill: You can understand instant recovery by becoming familiar with the checkpoint process, the redo log files, and the role of the log writer with the redo log files. Automatically instance or crash recovery. What is it doing? What are the phases of instance recovery? How we possibly can tune that instance recovery. We can use the mean time to recovery advisor that can help us determine how we might tune the instance recovery. 01:51 Nikita: OK, so let’s go through some of these concepts and procedures you mentioned. What is the checkpoint process responsible for exactly? Bill: The checkpoint process itself, it's responsible for updating the data file headers with checkpoint information. When a checkpoint is taken, it is going to write into the controlfiles. It tells the DB writer to write. DB writer writes to the data files, and the checkpoint is also annotated in the data files. So updating controlfiles with that checkpoint information also, controlfiles and database files. It signals that DB writer at full check points again, hey, it's time to write. So that way, it has the latest data written to the data files. The controlfile and datafiles, those are in sync with that. 02:40 Lois: Bill, what about the log writer process and the redo log files? Bill: With the log writer process and the redo log files, the redo log files record the changes to the database himself. It should be multiplexed. 02:53 Nikita: What do you mean by that? Bill: More than one redo log group. Now, the redo log groups, it is recommended that they should be multiplexed. Each group member should be on a different disk or in a different disk group if you're using ASM. 03:10 Nikita: And why is that, Bill? Bill: Because if I lose one, if I lose one redo log group, one member, I can continue to operate with just the one. If I only have one redo log group member and the system comes around and tries to write to it, then my system is going to come to a halt. So the log writer is going to write to those redo logs whenever somebody does a commit. When that redo log buffer is 1/3 full or every three seconds and before DB writer writes. So those are the four mechanism that tells log writer to write from that log buffer to the redo log files. And it'll also write, when we do a shut down, all the buffers will be flushed. And so that way, everything will be in sync when the system is shut down. 04:01 Lois: What are the different modes of operation for a database, Bill? And how do these modes impact the recovery capabilities of the database? Bill: So we have two different modes we can operate in. One is called NOARCHIVELOG mode. It is the default. ARCHIVELOG mode, highly encouraged. But not every environment has to be in ARCHIVELOG mode. 04:21 Nikita: So with ARCHIVELOG mode… Bill: Closed database. You have to close it, recover to the last backup. That's as far as I can go. Actually, I could, depending on what happens, I might be able to apply some redo. Suitable for training and test environments or for data warehouses, we don't have a lot of frequent changes. It's mainly bulk loading data at night and querying during the day. So it might be appropriate for that. Because ARCHIVELOG mode, it is a little overhead. Yes. So with that database, it goes down while it's open. The system, when it comes up, it can recover to the last committed transaction. And this is usually the mode we want to operate in for production environments. So we have that data in the buffer cache. We have that redo being buffered. We have the undo tablespace, keeping track of what the data was before a change. The redo keeps track of what was the change. And if we're in ARCHIVELOG mode, as we switch from one redo log to another, we will generate what's referred to as archived log files, and that's what allows us to do a complete recovery. 05:33 Lois: What happens in the case of automatic instance recovery? Bill: For an automatic instance recovery or crash recovery, our system went down unexpectedly. Because it did not do a clean shutdown, the buffers were not flushed. Everything was not synchronized. So the datafile, controlfile, everything is out of sync. 05:53 Nikita: So, how do the files get synchronized then? Bill: It uses the redo log groups to synchronize the files. It's going to roll forward. It rolls forward the changes that were made. So due to different distinct operations. Roll forward applies committed and uncommitted data. And the redo does not keep track of what was committed and uncommitted. It'll keep track of, hey, I had this transaction, hey, here's a commit for that transaction. But hey, I have a transaction. That was never uncommitted. That's the job of the undo. But rolls forward all those changes. And then anything that did not actually receive a commit, it will roll back the uncommitted data, return to the original state. And that is the job of the undo tablespace. 06:37 Lois: Bill, is it possible to tune instance recovery for better performance? Bill: You can try to tune this instance recovery. Tuning it is touchy. Be careful because you can cause more harm than what you think you might be doing good. The instance recovery, what we're doing, we're trying to-- the transactions between checkpoints. When was the last checkpoint? Because the items between the checkpoints, that's what has to be reapplied. So the last checkpoint to the last redo log, what is that time frame there between those? Well, what we're going to do, we're going to try to control that. We're going to try to control the difference between the checkpoint and the end of the redo log. There is a mean-time recovery advisor. You specify the desired times in seconds or minutes that how often you want that checkpoint to occur. There is a parameter, FAST_START_MTTR parameter that you can set. The default value is zero saying, hey, I'm going to let the system take care of it. And the maximum you can set it is to one hour. 07:46 Nikita: And why 1 hour?
 Bill: The reason being, if I set that to one hour and I have a lot of activity, how long is it going to take? How many transactions can happen within that hour? Yeah, I'm not doing a checkpoint as often, so I'm eliminating that workload. But if it has to recover, how long is it going to take? If I set it too small, the system says, hey, right now, it's going to take me 19 seconds based off statistics. If I said, OK, I want it in five seconds. So what does that mean? Every five seconds, I'm saying do a checkpoint. So what is it doing? OK, time to do a checkpoint. OK, time to go ahead and OK, DB writer write. OK, log writer write. OK, let me update the datafiles and the controlfiles. So you're just thrashing your system. So be careful if you decide to try to manually tune it. And when you go out and look at this mean time to recover, and even if you do it through the command line, you'll see that, that value is most likely going to change throughout the day, depending on the workload that you have. 08:46 Lois: How does the process of restoring and recovering data typically work? Bill: So when we restore, we're restoring our datafiles. All the datafiles, tablespace, controlfiles, archived redo log, server parameter file. Then when we recover, it involves depending on the backup that we use and other factors in there, it is going to apply the redo. So automatically done by RMAN. So I tell it, this is what I want to do. Hey, I want to restore a database. OK, RMAN says, all right, what backup are you going to use? What is it I need to restore? And then we tell it to recover. OK, I know what I need to use to recover. So RMAN can do the work for you. So when we restore and recover due to a manual process and there's different methods that we can use, and depending on the failure, we'll drive what type of restore and recovery we might perform. 09:40 Are you looking for practical use cases to help you plan and apply configurations that solve real-world challenges? With the new Applied Learning courses for Cloud Applications, you'll be able to practically apply the concepts learned in our implementation courses and work through case studies featuring key decisions and configurations encountered during a typical Oracle Cloud Applications implementation. Applied learning scenarios are currently available in General Ledger, Payables, Receivables, Accounting Hub, Global Human Resources, Talent Management, Inventory, and Procurement, with many more to come! Visit mylearn.oracle.com to get started. 10:22 Nikita: Welcome back! Can you talk about the different types of recovery scopes, Bill? How do they compare? Bill: Recovery can have two kinds of scope. All right. One is the complete recovery. We are getting the database back to the current time of the crash with no loss of data. We're going to again bring everything back to the present. Incomplete or point-in-time recovery. We're going to take a database or maybe a tablespace or even a table back to a point-in-time in the past. So from the time that we select to take it to recover, everything that was done after that is null and void, is gone missing. That's why it's called incomplete recovery, because it's not complete. 11:09 Lois: What are the steps that take place during complete recovery? Bill: We restore the datafiles. Changes are applied. We're applying the redo. The datafiles contained committed and uncommitted transactions. The undo is applied. Anything that did not receive an actual commit will take back to the original value. And we have our datafiles recovered. 11:33 Nikita: And what about point-in-time recovery? Bill: Point-in-time recovery, very similar. We're going to restore the datafiles from as far back as necessary. Changes are applied. So the data files are going to contain the committed and uncommitted up to that point-in-time. Database is open, that redo, that undo, anything that did not actually receive a commit. The undo is applied. The point-in-time recovered is complete. We're not applying all the redo, all the changes, only up to the time that we specify. 12:08 Lois: Are there any features that can make point-in-time recovery quicker? Bill: We also have the ability to use flashback database. It is an optional feature. And it can be a quick way to do that point-in-time recovery. It is an alternative to that database point-in-time recovery we just looked at. Faster. No restore is required. It's going to rewind the database. It does require some configuration in the environment. We do have to set up in order to use flashback database. 12:41 Nikita: I want to talk about Oracle’s data protection solutions, particularly when it comes to backup and recovery or disaster recovery. Bill: So for physical data protection-- backup and recovery objective. Yep, that works for both physical and logical. My recovery time, hours to days. Possibly minutes to hours for the logical. And Oracle solution, we have the Recovery Manager that's out of the box, RMAN. Oracle Secure Backup, that is Oracle's media management library system backing up to tape. The logical protection, yes, flashback technologies can help me take care of that very easily. For disaster recovery, physical data protection, recovery time objective, seconds to minutes. We're not going to accomplish that with RMAN. You're going to want to use our Data Guard with the Active Data Guard feature to be able to switch over to a standby database within seconds of a failure. 13:41 Lois: Why would someone choose to use flashback technologies for recovery, Bill? Bill: With the flashback technologies, we can use it for viewing data as past dates. What did it look like? We can wind the database back and forth in time. Assist users in an error analysis and recovery, because we have different technologies. This flashback query, version query, transaction query, those allow me to view what was the value of a row at a time. I can even see what were the changes to a row over a period of time? I can also view the query that caused that change. For error recovery, I can back out a transaction. I can take a table back to a non-current time. I can also flashback a table that was dropped. And I can also take an entire database by using flashback. So the different recovery options I might have with the flashback technology. 14:44 Lois: Thank you so much, Bill. These last two episodes have been so insightful, right Niki? Nikita: I couldn’t agree more, Lois! If you want to know more about backup and recovery configuration and other concepts, visit mylearn.oracle.com and search for the Oracle Database 23ai: Backup and Recovery course. Our upcoming episode is a very special one, where we’ll be discussing Oracle AI in Fusion Cloud Human Capital Management. So, watch out for that! Until next week, this is Nikita Abraham… Lois: And Lois Houston, signing off!
 15:16 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
15m
05/11/2024

Oracle Database 23ai: Backup and Recovery - Part 1

In this two-part special, Lois Houston and Nikita Abraham delve into the critical topic of backup and recovery in Oracle Database 23ai.   Together with Bill Millar, Senior Principal Database & MySQL Instructor, they discuss the role of database administrators, strategies for protecting data, and dealing with various types of data failure.   Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-backup-and-recovery/141127/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   Twitter: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Nikita: Welcome to the Oracle University Podcast! I’m Nikita Abraham, Team Lead of Editorial Services with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! For the last two weeks, we’ve been having really exciting discussions on Oracle AI Vector Search. We covered the fundamentals, benefits, the vector workflow, and lots more. Today, we’re going to talk about backup and recovery in Oracle Database 23ai with Bill Millar. If you’ve been listening this season, you’ll know that Bill is a Senior Principal Database & MySQL Instructor with Oracle University. Nikita: In this two-part special, we’ll dive into some of the things you need to know about backup and recovery, especially if you’re a database and backup admin. So, if you're the person in charge of keeping data safe and handling disaster recovery, this is definitely worth your time. 01:20 Lois: That’s right, Niki. Hi Bill, thanks for joining us again. What’s the role of a Database Administrator, or DBA, when it comes to backup and recovery? Bill: The DBA is typically responsible for ensuring the database is open and available when needed and at times you need to work with system administrators and other people within your organization to achieve that. But we want to try to protect the database from failure wherever possible. We want to increase the mean time between failures. Hopefully, we don't have failures, and we have to increase that time. But it might mean that we need to ensure we have redundant hardware and that in place, again, maybe out of the realm of the DBA, but people within your organization can help with that. We want to protect those critical components by using the redundancy. And we want to decrease the mean time to recover. Failures happen, but how fast can we get access back to that data after that failure. The faster we can do it, the happier customers are. Minimize the loss of data. It's never good to lose data, especially in a critical environment, but maybe in test and development, maybe not so bad.  02:39 Nikita: How do we ensure a separation of duties for backup and recovery processes? Bill: For a separation of duties, we do have a user called SYSBACKUP. It has the privileges that's required to perform backup and recoveries, the privilege to connect and execute the commands in what we refer to as RMAN, our Recovery Manager. As I said, it has permissions for backup and recovery because you do need to shut down the database, start up the database, those type of things. We're able to connect to that closed database to try to troubleshoot it, to get it to the open state again. It does not include any privileges to access data. The SYSBACKUP user is created when we install the database, when we create the database. We can use it explicitly for privileged user connection. It allows us to connect to the database. So RMAN connects as SYSBACKUP. 03:37 Lois: Bill, what should people keep in mind when figuring out what’s considered critical data? Bill: You want to try to identify your critical data. Some data might be highly required to access and make sure we don't lose don't lose data, but then you might have some environments. OK, I don't need to have them up and running as fast. If we lose a little data, it may not hurt, but we want to identify the difference in the different data that we have on different environments. So we want to also prioritize that critical data, which data do we need access to first because how much will the company lose per hour of downtime because we can't do business. We want to make sure the access data protection requirements. Not everybody has access to everything. And there are different types of disaster that can happen that are going to be totally out of your control. There's the physical disaster, a hurricane or tornado, outages, power outages, component failures, failures within the building itself, corruption of data because of some of these failures. And then, the most dreaded one, the one that happens most often, usually those human errors, the logical errors, where the data is just bad, we are able to access and everything. It's just that something has changed that shouldn't have been changed. We want to make sure we access our recovery requirements.  05:04 Lois: So, what are they? What are those requirements? Bill: We want to base that requirement based on how critical is that data, how soon do we need to have access to that? What is our recovery point objective? Do we have a tolerance for any type of data loss? How frequently should we backups? How often they should be taken? What type of backups will be another thing we'll want to figure out? Is point-in-time recovery required? Are we able to or do we ever need to go back to a previous point in time to do something? It's not always just recovery for a database failure. We might need to do a recovery point in time to a different system so we can investigate something. What is my recovery time objective? Again, what is the tolerance for the downtime? How long can I be down? The downtime, the biggest part of when a system goes down is trying to identify what is the problem, then next is what is going to be my plan to recover, and then perform in the recovery. We might have a tiered required time objective based off of critical data, and then depending on the failure. Is that failure at the entire database? Is it just a tablespace? Is it just a table? Is it just a row? That also determines how long it takes to recover and what type of recovery we might try to perform. What is my backup retention policy? Do I have a requirement to where I have to have my backups off site? And it doesn't mean like back in the old days of mainframe computers, you'd back up to tape and you'd take those tapes off site. You might still do that today. Or, am I backing up to a cloud environment? So what do I need to have for that? What about long-term backups? We work with our day-to-day backups, but there's those backups that require for longer, archives like end of year backups. Some places require to keep their end of year backup for like 10 years. How are we going to handle that? So these are some of the things that we have to think about when we start talking about backup and recovery. 07:23 Did you know that the Oracle University Learning Community regularly holds live events hosted by Oracle expert instructors. Find out how to prepare for your certification exams. Learn about the latest technology advances and features. Ask questions in real time and learn from an Oracle subject matter expert. From Ask Me Anything about certification to Ask the Instructor coaching sessions, you’ll be able to achieve your learning goals for 2024 in no time. Join a live event today and witness firsthand the transformative power of the Oracle University Learning Community. Visit mylearn.oracle.com to get started. 08:04 Nikita: Welcome back! Bill, I want to talk about the different failures that can occur in an Oracle database. How would you categorize them? Bill: There are different category of failure. This is not an all-inclusive list by any means. It's just something that possibly can happen. So they can usually be divided into different categories like statement failure. All right. When doing a select and insert, update, delete, the statement itself fails. A user process fails. Single database session fails for some reason. Network failure, connectivity is lost. The user error, probably one of the most common ones we have to deal with. A user successfully completes an operation, but that operation was erroneous. They dropped the wrong table, updated the wrong row. Then there's the instance failure. The database itself shuts down unexpectedly. And then media failure, usually a hard failure of our disk. Something of memory, something failed and caused an error. 09:12 Lois: Ok. I want to dive a little deeper into each of these categories that you mentioned. Let’s start with statement failures. What are typical problems that one might face? Bill: Attempts to enter invalid data into a table. They're trying to put a numeric field in a date field, and usually just working with the user is going to correct that. Is that the DBA responsible? Yes, no, maybe. They attempt to form operations with insufficient privileges. Attempts to allocate space that fails, well, that depends on are they going-- do they have unlimited storage or do they have a limit? Logic errors in the application. Well, that's where we're going to have to work with those developers to try to correct those type of errors. 09:59 Nikita: What about user process failures? Bill: User performs an abnormal disconnect, doesn't close out properly. It can cause something to hang up or even possibly erroneous data to be updated. A user session is abnormally terminated. Well, usually, we don't have to try to resolve those user type errors, but something we might need to look into. A user experiences a program error that terminates the session. Again, usually it's the application developers, but it's something as a DBA, we might want to keep an eye on. Is it the same person? Is it from the same location? Is it the same module within that application? Maybe there's some things we can help to identify what the possible problem can be. 10:43 Nikita: Bill, tell us about common issues that can lead to network failures. What can we do to mitigate these problems and ensure network resilience? Bill: The listener fails. Well, we can connect a backup listener and configure how it can connect time failover can work. A network interface card fails. Well, again, we're not the hardware people, but can we work with our network, our server team, whatever, to possibly have redundant network cards? The network connection fails itself. Can we configure a backup network connection? 11:18 Lois: And what about user errors? How can we recover from those types of scenarios? Bill: The user inadvertently deletes or modifies data. Well, we have some things we'll look at as far as like rollback a transaction along with the dependent transactions. Rewind that table back to where it should have been. You're also can use LogMiner. You can look at our redo logs to try to figure out where that bad transaction was. User drops a table inadvertently. Well, we can recover the table from the recycle bin if we have the recycle bin on or we may need to recover from a backup. 11:56 Nikita: What are common causes of instance failures, Bill? Bill: The dreaded power outage. Well, hopefully, we have some type of up system to keep us running, even if it's not for continuous operation. Maybe if it's just to allow us to gracefully take a system down. The dreaded hardware failure. If you have a way to predict a hardware failure, you can make a lot of money. Always happens at the most inopportune times. But then again, do we have redundant hardware? Do we have something in place to help allow us to continue to operate in case of a hardware failure? Failure of one of the critical background processes. Why did it fail? We can go out. We can look at our alert log, we have trace files. And then we have, you have the Enterprise Manager Cloud Control. We can do the same thing as looking at the alert log and trace files. But the Enterprise Manager Cloud Control gives us a GUI interface to allow us to do that. 12:53 Lois: Before we let you go, Bill, can you tell us about media and data failures? Bill: Failure of a disk drive, failure of a disk controller, deletion or corruption of a file needed for database operation, well, this is the dreaded media failure. So we're going to restore from a backup. If we need to move, we can move a data file to a different location. We can notify, hey, here's that new location. And then recover by applying any of the incremental backups, any of the redo to get it back to where it should be. And then we have the data failures. We can't access the component, missing data files at OS level. And maybe our system administrators deleted something thinking it wasn't needed, or maybe even a developer on a development type system. Don't have the right permissions. Tablespace is offline. Well, why is it offline? Did somebody took the wrong tablespace offline? We have physical corruptions, block checksum failures. It's inconsistent between the header and footer. Invalid block header field values, like all of them are zeroed out. Then we have the logical corruptions, inconsistent dictionary, corrupt row piece, the inconsistencies, a control file not synchronized with the data files, usually because we recovered something and didn't do it the right way. I/O failures, maybe we just exceeded the number of open files that we're allowed to have. Maybe it's just a network or an I/O error itself. And these are different types of failures that you might experience. Again, it's not an all-inclusive list. It's just a few examples. 14:41 Nikita: I know you said it’s not an all-inclusive list and you were just giving us a few examples, but that seemed quite thorough! Thank you so much, Bill, for walking us through all of that today! Lois: Yeah, I totally agree! Thanks Bill! For more on what we discussed today, visit mylearn.oracle.com. Search for the Oracle Database 23ai: Backup and Recovery course. Next week, we’ll get into instance recovery and recovery strategies. Until then, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 15:15 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
15m
29/10/2024

Oracle AI Vector Search: Part 2

This week, Lois Houston and Nikita Abraham continue their exploration of Oracle AI Vector Search with a deep dive into vector indexes and memory considerations.   Senior Principal APEX and Apps Dev Instructor Brent Dayley breaks down what vector indexes are, how they enhance the efficiency of search queries, and the different types supported by Oracle AI Vector Search.   Oracle Database 23ai: Oracle AI Vector Search Fundamentals: https://mylearn.oracle.com/ou/course/oracle-database-23ai-oracle-ai-vector-search-fundamentals/140188/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   Twitter: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!   00:26 Nikita: Welcome back to the Oracle University Podcast! I’m Nikita Abraham, Team Lead of Editorial Services at Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi everyone! Last week was Part 1 of our discussion on Oracle AI Vector Search. We talked about what it is, its benefits, the new vector data type, vector embedding models, and the overall workflow. In Part 2, we’re going to focus on vector indices and memory. 00:56 Nikita: And to help us break it all down, we’ve got Brent Dayley back with us. Brent is a Senior Principal APEX and Apps Dev Instructor with Oracle University. Hi Brent! Thanks for being with us today. So, let’s jump right in! What are vector indexes and how are they useful? Brent: Now, vector indexes are specialized indexing data structures that can make your queries more efficient against your vectors. They use techniques such as clustering, and partitioning, and neighbor graphs. Now, they greatly reduce the search space, which means that your queries happen quicker. They're also extremely efficient. They do require that you enable the vector pool in the SGA. 01:42 Lois: Brent, walk us through the different types of vector indices that are supported by Oracle AI Vector Search. How do they integrate into the overall process? Brent: So Oracle AI Vector Search supports two types of indexes, in-memory neighbor graph vector index. HNSW is the only type of in-memory neighbor graph vector index that is supported. These are very efficient indexes for vector approximate similarity search. HNSW graphs are structured using principles from small world networks along with layered hierarchical organization. And neighbor partition vector index, inverted file flat index, is the only type of neighbor partition index supported. It is a partition-based index which balances high search quality with reasonable speed. 02:35 Nikita: Brent, you mentioned that enabling the vector pool in the SGA is a requirement when working with vector indexes. Can you explain that process for us? Brent: In order for you to be able to use vector indexes, you do need to enable the vector pool area. And in order to do that, what you need to do is set the vector memory size parameter. You can set it at the container database level. And the PDB inherits it from the CDB. Now bear in mind that the database does have to be balanced when you set the vector pool. 03:12 Lois: Ok. Are there any other considerations to keep in mind when using vector indices? Brent: Vector indexes are stored in this pool, and vector metadata is also stored here. And you do need to restart the database. So large vector indexes do need lots of RAM, and RAM constrains the vector index size. You should use IVF indexes when there is not enough RAM. IVF indexes use both the buffer cache as well as disk. 03:42 Nikita: And what about memory considerations? Brent: So to remind you, a vector is a numerical representation of text, images, audio, or video that encodes the features or semantic meaning of the data, instead of the actual contents, such as the words or pixels of an image. So the vector is a list of numerical values known as dimensions with a specified format. Now, Oracle does support the int8 format, the float32 format, and the float64 format. Depending on the format depends on the number of bytes. For instance, int8 is one byte, float32 is four bytes. Now, Oracle AI Vector Search supports vectors with up to 65,535 dimensions. 04:34 Lois: What should we know about creating a table with a vector column? Brent: Now, Oracle Database 23ai does have a new vector data type. The new data type was created in order to support vector search. The definition can include the number of dimensions and can include the format. Bear in mind that either one of those are optional when you define your column. The possible dimension formats are int, float 32, and float 64. Float 32 and float 64 are IEEE standards, and Oracle Database will automatically cast the value if needed. 05:18 Nikita: Can you give us a few declaration examples? Brent: Now, if we just do a vector type, then the vectors can have any arbitrary number of dimensions and formats. If we describe the vector type as vector * , *, then that means that vectors can have an arbitrary number of dimensions and formats. Vector and vector * , * are equivalent. Vector with the number of dimensions specified, followed by a comma, and then an asterisk, is equivalent to vector number of dimensions. Vectors must all have the specified number of dimensions, or an error will be thrown. Every vector will have its dimension stored without format modification. And if we do vector asterisk common dimension element format, what that means is that vectors can have an arbitrary number of dimensions, but their format will be up-converted or down-converted to the specified dimension element format, either INT8, float 32, or float 64. 06:25 Working towards an Oracle Certification this year? Take advantage of the Certification Prep live events in the Oracle University Learning Community. Get tips from OU experts and hear from others who have already taken their certifications. Once you’re certified, you’ll gain access to an exclusive forum for Oracle-certified users. What are you waiting for? Visit mylearn.oracle.com to get started.   06:52 Nikita: Welcome back! Brent, what is the vector constructor and why is it useful? Brent: Now, the vector constructor is a function that allows us to create vectors without having to store those in a column in a table. These are useful for learning purposes. You use these usually with a smaller number of dimensions. Bear in mind that most embedding models can contain thousands of different dimensions. You get to specify the vector values, and they usually represent two-dimensional like xy coordinates. The dimensions are optional, and the format is optional as well. 07:29 Lois: Right. Before we wrap up, can you tell us how to calculate vector distances? Brent: Now, vector distance uses the function VECTOR_DISTANCE as the main function. This allows you to calculate distances between two vectors and, therefore, takes two vectors as parameters. Optionally, you can specify a metric. If you do not specify a metric, then the default metric, COSINE, would be used. You can optionally use other shorthand functions, too. These include L1 distance, L2 distance, cosine distance, and inner product. All of these functions also take two vectors as input and return the distance between them. Now the VECTOR_DISTANCE function can be used to perform a similarity search. If a similarity search query does not specify a distance metric, then the default cosine metric will be used for both exact and approximate searches. If a similarity search does specify a distance metric in the VECTOR_DISTANCE function, then an exact search with that distance metric is used if it conflicts with the distance metric specified in a vector index. If the two distance metrics are the same, then this will be used for both exact as well as approximate searches. 08:58 Nikita: I was wondering Brent, what vector distance metrics do we have access to? Brent: We have Euclidean and Euclidean squared distances. We have cosine similarity, dot product similarity, Manhattan distance, and Hamming similarity. Let's take a closer look at the first of these metrics, Euclidean and Euclidean squared distances. This gives us the straight-line distance between two vectors. It does use the Pythagorean theorem. It is sensitive to both the vector size as well as the direction. With Euclidean distances, comparing squared distances is equivalent to comparing distances. So when ordering is more important than the distance values themselves, the squared Euclidean distance is very useful as it is faster to calculate than the Euclidean distance, which avoids the square root calculation. 09:58 Lois: And the cosine similarity metrics? Brent: It is one of the most widely used similarity metrics, especially in natural language processing. The smaller the angle means they are more similar. While cosine distance measures how different two vectors are, cosine similarity measures how similar two vectors are. Dot product similarity allows us to multiply the size of each vector by the cosine of their angle. The corresponding geometrical interpretation of this definition is equivalent to multiplying the size of one of the vectors by the size of the projection of the second vector onto the first one or vice versa. Larger means that they are more similar. Smaller means that they are less similar. Manhattan distance is useful for describing uniform grids. You can imagine yourself walking from point A to point B in a city such as Manhattan. Now, since there are buildings in the way, maybe we need to walk down one street and then turn and walk down the next street in order to get to our result. As you can imagine, this metric is most useful for vectors describing objects on a uniform grid such as city blocks, power grids, or perhaps a chessboard. 11:27 Nikita: And finally, we have Hamming similarity, right? Brent: This describes where vector dimensions differ. They are binary vectors, and it tells us the number of bits that require change to match. It compares the position of each bit in the sequence. Now, these are usually used in order to detect network errors. 11:53 Nikita: Brent, thanks for joining us these last two weeks and explaining what Oracle AI Vector Search is. If you want to learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai: Oracle AI Vector Search Fundamentals course.   Lois: This concludes our season on Oracle Database 23ai New Features for administrators. In our next episode, we’re going to talk about database backup and recovery, but more on that later! Until then, this is Lois Houston… Nikita: And Nikita Abraham signing off! 12:29 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
12m
22/10/2024

Oracle AI Vector Search: Part 1

In this episode, Senior Principal APEX and Apps Dev Instructor Brent Dayley joins hosts Lois Houston and Nikita Abraham to discuss Oracle AI Vector Search. Brent provides an in-depth overview, shedding light on the brand-new vector data type, vector embeddings, and the vector workflow.   Oracle Database 23ai: Oracle AI Vector Search Fundamentals: https://mylearn.oracle.com/ou/course/oracle-database-23ai-oracle-ai-vector-search-fundamentals/140188/   Oracle Database 23ai: SQL Workshop: https://mylearn.oracle.com/ou/course/oracle-database-23ai-sql-workshop/137830/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   Twitter: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode.   ---------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!   00:26 Lois: Hello and welcome to the Oracle University Podcast! I’m Lois Houston, Director of Innovation Programs here at Oracle University. Joining me as always is our Team Lead of our Editorial Services, Nikita Abraham. Nikita: Hi everyone! Thanks for tuning in over the last few months as we’ve been discussing all the Oracle Database 23ai new features. We’re coming to the end of the season, and to close things off, in this episode and the next one, we’re going to be talking about the fundamentals of Oracle AI Vector Search. In today’s episode, we’ll try to get an overview of what vector search is, why Oracle Vector Search stands out, and dive into the new vector data type. We’ll also get insights into vector embedding models and the vector workflow. 01:11 Lois: To take us through all of this, we’re joined by Brent Dayley, who is a Senior Principal APEX and Apps Development Instructor with Oracle University. Hi Brent! Thanks for joining us today. Can you tell us about the new vector data type? Brent: So this data type was introduced in Oracle Database 23ai. And it allows you to store vector embeddings alongside other business data. Now, the vector data type allows a foundation to store vector embeddings. 01:42 Lois: And what are vector embeddings, Brent? Brent: Vector embeddings are mathematical representations of data points. They assign mathematical representations based on meaning and context of your unstructured data. You have to generate vector embeddings from your unstructured data either outside or within the Oracle Database. In order to get vector embeddings, you can either use ONNX embedding machine learning models or access third-party REST APIs. Embeddings can be used to represent almost any type of data, including text, audio, or visual, such as pictures. And they are used in proximity searches. 02:28 Nikita: Hmmm, proximity search. And similarity search, right? Can you break down what similarity search is and how it functions? Brent: So vector data tends to be unevenly distributed and clustered into groups that are semantically related. Doing a similarity search based on a given query vector is equivalent to retrieving the k nearest vectors to your query vector in your vector space. What this means is that basically you need to find an ordered list of vectors by ranking them, where the first row is the closest or most similar vector to the query vector. The second row in the list would be the second closest vector to the query vector, and so on, depending on your data set. What we need to do is to find the relative order of distances. And that's really what matters rather than the actual distance. Now, similarity searches tend to get data from one or more clusters, depending on the value of the query vector and the fetch size. Approximate searches using vector indexes can limit the searches to specific clusters. Exact searches visit vectors across all clusters. 03:44 Lois: Ok. I want to move on to vector embedding models. What are they and why are they valuable? Brent: Vector embedding models allow you to assign meaning to what a word, or a sentence, or the pixels in an image, or perhaps audio. It allows you to quantify features or dimensions. Most modern vector embeddings use a transformer model. Bear in mind that convolutional neural networks can also be used. Depending on the type of your data, you can use different pretrained open source models to create vector embeddings. As an example, for textual data, sentence transformers can transform words, sentences, or paragraphs into vector embeddings. 04:33 Nikita: And what about visual data? Brent: For visual data, you can use residual network also known as ResNet to generate vector embeddings. You can also use visual spectrogram representation for audio data. And that allows us to use the audio data to fall back into the visual data case. Now, these can also be based on your own data set. Each model also determines the number of dimensions for your vectors. 05:02 Lois: Can you give us some examples of this, Brent? Brent: Cohere's embedding model, embed English version 3.0, has 1,024 dimensions. Open AI's embedding model, text-embedding-3-large, has 3,072 dimensions. 05:24 Want to get the inside scoop on Oracle University? Head over to the Oracle University Learning Community. Attend exclusive events. Read up on the latest news. Get first-hand access to new products. Read the OU Learning Blog. Participate in Challenges. And stay up-to-date with upcoming certification opportunities. Visit mylearn.oracle.com to get started.  05:50 Nikita: Welcome back! Let’s now get into the practical side of things. Brent, how do you import embedding models? Brent: Although you can generate vector embeddings outside the Oracle Database using pre-trained open source embeddings or your own embedding models, you also have the option of doing those within the Oracle Database. In order to use those within the Oracle Database, you need to use models that are compatible with the Open Neural Network Exchange Standard, or ONNX, also known as Onyx. Oracle Database implements an Onyx runtime directly within the database, and this is going to allow you to generate vector embeddings directly inside the Oracle Database using SQL. 06:35 Lois: Brent, why should people choose to use Oracle AI Vector Search? Brent: Now one of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data, all in one single system. This is very powerful, and also a lot more effective because you don't need to add a specialized vector database. And this eliminates the pain of data fragmentation between multiple systems. It also supports Retrieval Augmented Generation, also known as RAG. Now this is a breakthrough generative AI technique that combines large language models and private business data. And this allows you to deliver responses to natural language questions. RAG provides higher accuracy and avoids having to expose private data by including it in the large language model training data. 07:43 Nikita: In the last part of our conversation today, I want to ask you about the Oracle AI Vector Search workflow, starting with generating vector embeddings. Brent: Generate vector embeddings from your data, either outside the database or within the database. Now, embeddings are a mathematical representation of what your data meaning is. So what does this long sentence mean, for instance? What are the main keywords out of it? You can also generate embeddings not only on your typical string type of data, but you can also generate embeddings on other types of data, such as pictures or perhaps maybe audio wavelengths. 08:28 Lois: Could you give us some examples? Brent: Maybe we want to convert text strings to embeddings or convert files into text. And then from text, maybe we can chunk that up into smaller chunks and then generate embeddings on those chunks. Maybe we want to convert files to embeddings, or maybe we want to use embeddings for end-to-end search. Now you have to generate vector embeddings from your unstructured data, either outside or within the Oracle Database. You can either use the ONNX embedding machine learning models or you can access third-party REST APIs. You can import pre-trained models in ONNX format for vector generation within the database. You can download pre-trained embedding machine learning models, convert them into the ONNX format if they are not already in that format. Then you can import those models into the Oracle Database and generate vector embeddings from your data within the database. Oracle also allows you to convert pre-trained models to the ONNX format using Oracle machine learning for Python. This enables the use of text transformers from different companies. 09:51 Nikita: Ok, so that was about generating vector embeddings. What about the next step in the workflow—storing vector embeddings? Brent: So you can create one or more columns of the vector data type in your standard relational data tables. You can also store those in secondary tables that are related to the primary tables using primary key foreign key relationships. You can store vector embeddings on structured data and relational business data in the Oracle Database. You do store the resulting vector embeddings and associated unstructured data with your relational business data inside the Oracle Database. 10:30 Nikita: And the third step is creating vector indexes? Brent: Now you may want to create vector indexes in the event that you have huge vector spaces. This is an optional step, but this is beneficial for running similarity searches over those huge vector spaces. So once you have generated the vector embeddings and stored those vector embeddings and possibly created the vector indexes, you can then query your data with similarity searches. This allows for Native SQL operations and allows you to combine similarity searches with relational searches in order to retrieve relevant data. 11:15 Lois: Ok. I think I’ve got it. So, Step 1, generate the vector embeddings from your unstructured data. Step 2, store the vector embeddings. Step 3, create vector indices. And Step 4, combine similarity and keyword search. Brent: Now there is another optional step. You could generate a prompt and send it to a large language model for a full RAG inference. You can use the similarity search results to generate a prompt and send it to your generative large language model in order to complete your RAG pipeline. 11:59 Lois: Thank you for sharing such valuable insights about Oracle AI Vector Search, Brent. We can’t wait to have you back next week to talk about vector indices and memory. Nikita: And if you want to know more about Oracle AI Vector Search, visit mylearn.oracle.com and check out the Oracle Database 23ai: Oracle AI Vector Search Fundamentals course. Lois: Yes, and if you're serious about advancing in your development journey, we recommend taking the Oracle Database 23ai SQL workshop. It’s designed for those who might be familiar with SQL from other database platforms or even those completely new to SQL. Nikita: Yeah, we’ll add the link to the workshop in the show notes so you can find it easily. Until next week, this is Nikita Abraham… Lois: And Lois Houston signing off! 12:45 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
13m
15/10/2024

Automatic Transaction Quarantine

In this episode, Lois Houston and Nikita Abraham explore the Automatic Transaction Quarantine feature with Senior Principal Database & MySQL Instructor, Bill Millar. Bill explains that this feature isolates transactions that could potentially cause system crashes, preventing them from impacting the entire container database. They also discuss the key advantages of automatic transaction quarantine in maintaining database stability and availability.   Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/140830/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Nikita: Welcome to the Oracle University Podcast! I’m Nikita Abraham, Team Lead: Editorial Services with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! In our last episode, we looked at an Oracle Database 23ai new feature called Automatic Transaction Rollback, and we spoke about why it is such an important feature for database administrators. 00:51 Nikita: Today, we’re going to talk about another new feature called Automatic Transaction Quarantine. We’ll discuss what it is, go through the steps to monitor and identify quarantine transactions, explore how an issue is resolved once a quarantined transaction has been identified, and end by looking at quarantined transaction escalation, and how it helps to protect not only your PDB, but also your container database. Lois: Back with us is Bill Millar, our Senior Principal Database & MySQL Instructor with Oracle University. Hi Bill! What is automatic transaction quarantine and why do we need it? 01:27 Bill: The good news is that starting in 23c with the database quarantines, it's going to isolate a transaction or transactions that could possibly cause a system crash, so you can avoid crashes. It's going to isolate those transactions that potentially could cause a problem. However, those transactions must be manually resolved by the DBA so that the row locks are released from those bad transactions. A transaction recovery basically is going to isolate failure and also identify what is the cause of that corruption. So when a system restarts, transaction can fail to recover while the other transactions can be recovered. So with the transaction recovery, basically, we know when the system recovers, the SMON is going to use the redo and the undo. 02:27 Nikita: Can you explain that in a little more detail? How does transaction recovery work and why is it so critical for database stability? Bill: It does the redo to roll forward the database. However, at that point, it'll go ahead and open the database, allow it to start being used while it is applying the undo. And when it cannot apply that undo, that's when the system is going to mark that transaction as bad for that. That is what is transaction recovery. Whereas instance recovery is basically the same thing, except now you're in a RAC environment. And it's unable to be recovered on one of the instances within your RAC environment. Because it can be, it'll have those rows locked, and it can affect the other instances. So SMON might be unable to perform that recovery, so it could cause that PDB or the CDB to crash. OK, now, nobody can access any information. So once if that entire container crashes, recovery is going to stop. If it has a bad transaction, recovery stops. So it might be because of physical data, might be because of the index is corrupt, might be logical corruption. So it stops that interactive transaction recovery process. So not only does it stop the recovery of the transaction that is trying to be recovered by SMON, it's going to stop the rest of the inactive transactions. Those row locks are held. And it can impact critical operations. Yeah, if my system can't do anything, yes, it's going to have an impact. The DBAs must resolve what is that bad transaction, how to get rid of it, how we're going to get around it? 04:12 Lois: Bill, what’s the workflow a DBA would follow when a transaction is quarantined? Bill: So in the system, when that transaction recovery failure is, OK, I've found this dead transaction. I'm going to quarantine. I'm going to say, hey, you have something you need to take care of for that. So it's not recovered by the SMON. So what's going to happen? So there is also is going to be a limit. So if it does reach that limit and the limit is three, then you're going to have to step in and try to take care of that very quickly. The shut down abort will be performed on the PDB. So the good news there is that it's going to keep it from impacting the entire container. If the limit isn't reached, well, then, OK, hey, we have this bad transaction that's going to quarantine, is going to populate. There's a couple of views that you can go out and look at. There's a CDB quarantine transactions or a DBA quarantine transactions. Those views you can look at. And then once we determine that, what are we going to do to try to recover it? If we're going to try to recover it, then we can go ahead and drop that bad transaction. It'll help free up the rows. That way, everything can start working again. That PDB can be opened. 05:30 Nikita: What can you tell us about monitoring quarantined transactions? What specific views or logs should DBAs monitor? Bill: So you can view. You'll see these quarantine transactions in several different places. One is the alert queue. It's going to be sent to the alert queue. That is what is going to notify Enterprise Manager Cloud Control, also populates it within the AWR. Back in 21c, we added the attention log. It shows critical events. Hey, you need to take a look at this. It also can populate it. It will populate it to the alert log. So remember you have the V$DIAG_ALERT that you can look at. Or, if you're familiar with or you use the ADRCI, automatic diagnostic repair recovery advisor, so you can also look at the alert log there. So there are two new views, the CDB_QUARANTINED_TRANSACTION, the DBA_QUARANTINE_TRANSACTIONS working with multi-tenant. The CDB, I can see all the quarantine transactions from the root container, the DBA_QUARANTINE_TRANSACTIONS what I see if I'm in a specific PDB. But it's going to give me the information. 06:52 Lois: What about resolving quarantined transactions? Bill: Monitoring is a must to be able to identify, hey, we have bad transactions that we need to-- quarantine transactions we need to take care of. You can apply the appropriate MOS note if you're not sure what to do. Like anything else, if something happens-- and hopefully, you're not getting quarantined transactions daily or anything like that. But once we start doing a few things, we remember how to do them. 07:21 Lois: And, how do we take care of this? Bill: Well, you always have the ability to go to My Oracle Support. There is a view called-- that CDB quarantine transaction that we talked about that we can look at, hey, here's the reason. And we might use that to go out there and search My Oracle Support and/or contact Oracle Support. 07:49 Do you have an idea for a new course or learning opportunity? We’d love to hear it! Visit the Oracle University Learning Community and share your thoughts with us on the Idea Incubator. Your suggestion could find a place in future development projects! Visit mylearn.oracle.com to get started.  08:09 Nikita: Welcome back! Bill, what are some of the common causes of quarantined transactions? Could you share some examples with us? And how do you resolve them? Bill: One could be physical corruptions. It could either be logical or physical. So maybe because media failed. Hardware bits get flipped. So that might be able to be easily fixed by using the RMAN Block Media Recovery. And that's the lowest level of recovery that we can apply. And then there's logical corruptions. This is the recommended order when trying to resolve logical corruptions. First level is the Block Media Recovery. And then, after that, if the Block Media Recovery fails, then possibly, how about re-creating that data segment? So either truncate or drop it, and then recover it from another source. So once you drop the segment, the transaction then is going to skip trying to recover it. It's no longer there. So it's, OK, hey, I'm successful now. And then, the last resort type method is to drop that undo segment. There's an offline rollback segment that you can use. But it's recommended-- it's best to avoid that-- again, kind of a last-ditch effort to try to fix something. There are other options that you might try. However, these options do end up being a loss of data. Why? Because we're going to do a point-in-time recovery. So we can go back to a table point-in-time recovery. So we start with the Block Media Recovery. OK, we can't. OK, so how about if we go back before that transaction and try to recover the table at that time? So it will be a loss of data. Then, the next level is, we can't do the table. Can we do the entire tablespace? That might be an option. Might flashback the database if we are using-- if we have Flashback Database on. Again, that's just another method of point-in-time recovery. And then also do a database point-in-time recovery. If we can do the database point-in-time recovery flashback at the PDB level, so that way it's not impacting the entire container, hopefully, we don't have to try to do a point-in-time recovery at the database level. So we wouldn't want to do that. That would something really drastic would have to happen to force us to do the entire container. But we want to do that at the PDB level. 10:54 Lois: Ok. So the issue is resolved. What happens next? Bill: So once we have the issue resolved that caused that, SMON is still going to try to do transaction recovery because why? That quarantined transaction says, hey, I've still got this bad transaction there. So once that transaction has been fixed, we need to drop that quarantined transaction. So that way, SMON says, hey, I have this transaction. I need to recover. SMON will keep from trying to do that. So there is a DDL command to drop that quarantined transaction. So remember, from the views, the quarantined transaction views, that's where we saw the undo segment. We saw the slot number. We saw the quarantined transaction slot number. So that way, we can drop that transaction by using that. 11:51 Nikita: How does the escalation process work for quarantined transactions? And why is it important to protect the PDB and the container database? Bill: So quarantined transaction escalation-- we might have multiple transactions fail, depending on the corruption level. It might have multiple blocks for that that have failed. So just to quarantine a bad transaction may not help whatsoever. It depends on what the root cause is for the failures and how many are happening at that time. So the database with these bad transactions will continuously run in an inconsistent state. So it could be dangerous if we have multiples of the same issue and that. So with that system running in an inconsistent state, things will continue to spread. Things will continue to get worse. That's why, once that level of 3 is reached, we go ahead, and we do a shut down abort on that PDB. Because if a transaction can't be recovered, there's no need in trying to do any other type of shutdown. So with this escalation process, it does benefit us because, again, SMON is going to continuously try to recover that bad transaction for that. OK, SMON's going to keep trying. It's not going to work. And at some point, it might cause it to crash. So by stopping it before it continues getting worse, damaging more, we're going to go ahead and say we're escalating this issue to where we're shutting down the PDB. Fault tolerance, so meaning that we have higher availability of the rest of the container. So it's not going to crash the entire container. So the PDB can continue to operate when we are trying to resolve transactions except in the case where it exceeds the amount, and it does a shutdown abort on that PDB. So with that escalation, we reach that limit of 3 for that. We do a Shutdown Abort on that PDB. That transaction recovery is disabled. OK. Don't try to recover any transactions. Why? Because we know we have a few of them. So it's shut down, so we're going to go out and look at our quarantine transactions views, what's the reason for that, how many do we have? And then, once we resolve the issue, we are going to enable recovery again because it turns off the recovery option before it allows us to open that PDB. It's not going to be in a consistent state, though. So now we can go ahead and alter the system and, OK, go ahead and allow recovery of transactions again. 14:42 Lois: Thank you, Bill, for walking us through the details of automatic transaction quarantine and telling us how to manage and resolve these complex scenarios. Nikita: Yeah, thanks Bill! To learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Nikita Abraham… Lois: And Lois Houston signing off! 15:13 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
15m
08/10/2024

Automatic Transaction Rollback

Join Lois Houston and Nikita Abraham as they discuss the Automatic Transaction Rollback feature with Senior Principal Database & MySQL Instructor, Bill Millar. Bill explains that in the 23ai release, transactions blocking other transactions can now be automatically rolled back, depending on certain parameters. Bill highlights the advantages of using automatic transaction rollback, which eliminates the time-consuming process of manually terminating blocking transactions. They also cover the workload reduction benefits for database administrators.   Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/140830/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.   -------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Lois: Hello and welcome back to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Team Lead of Editorial Services. Nikita: Hi everyone! Last week, we looked at two Oracle Database 23ai new features related to Data Manipulation Language, or DML. One was Unrestricted Parallel DMLs and the other was Unrestricted Direct Loads. Do check out that episode if you missed it. 00:56 Lois: Today, we have Senior Principal Database & MySQL Instructor, Bill Millar, with us. He’s been on several times this season taking us through all the different 23ai new features. In this episode, we’re going to ask him about the Automatic Transaction Rollback feature. Hi Bill! What is automatic transaction rollback and why is it an important feature for database administrators? 01:22 Bill: We can now have transactions that are blocking other transactions, depending on some settings, to automatically roll back. It does require some parameters to be set. Rows basically get locked in a single row. Each row is locked based off of what type of activity is being performed on that row, such as inserts, updates, deletes, merge, select for updates. 01:52 Nikita: And how were things before this feature? Bill: Traditionally, the database administrator had to research and manually terminate blocking transactions, or there are some things that resource manager might have been able to do. 02:05 Lois: This seems like such a game-changer for DBAs, Bill. So, how does it work? Bill: So there are some parameters that control the automatic rollback. One is the transaction priority. We're going to set that priority for a transaction either to medium, high, or low. We have the high priority wait target and a medium priority wait target that we can set. The high wait target will terminate if a medium transaction is blocking that high target based off of the values that we set, the medium transaction can be terminated. A medium transaction will terminate a low priority. So if a transaction designated as low exceeds the blocking time that we set for the medium priority wait time, then it'll be terminated. Whereas, the high priority will terminate both medium and low transactions. We have the rollback mode. We're either going to roll back or we're going to track, depending on what we're trying to do. 03:10 Nikita: So, if I decide that I want to use automatic transaction rollback… if I decide to implement it…I’ll need to set those parameters, right? Bill: So we can set those at a session level. We also have some system level wait targets. What are the wait times for the medium, high transactions? How long they are going to wait for those lower transactions? And then we also have the rollback mode. Are we actually going to roll back or are we just going to track for right now? We have to determine what is going to be the wait times for those transactions that we want to wait before those lower transactions, priority transactions are rolled back? At that session level, we're going to set the session. High is the default. So if we want transactions to run at a lower, we have to set those. So we can set the medium or low because that's going to determine how they're rolled back. So, what is that rollback order? Again the low, we'll roll back any low that's blocking mediums. High, we'll roll back any mediums or lows that are blocking. So you do need to have the understanding of that application, and how critical are the different transactions, because if you start rolling back transactions, what? It does-- If you roll back the transactions, it does generate a little research, a little bit more work on why did that happen. 04:38 Lois: Yeah… you don't want to set it without really understanding what you’re doing. Ok, so, what else do I need to know? Bill: So we do have the system level wait targets again. How long is the high priority transaction going to wait for a lower transaction before it rolls it back? How long that medium priority is going to wait? We use the ALTER SYSTEM SET command. It does have a range of values from one second to 2,147,483,647 seconds. That's like 68 years. Might not want to wait 68 years for a transaction to be rolled back. We can set it at the PDB level. Each pluggable can have a different value. And it can have a different value in the different RAC instances. We have those system level wait targets that we want to set. Automatic rollback. In order for it to function, all the parameters have to be set properly. What is that transaction priority? We saw the medium, high, low. What is the wait target? How long is the medium is going to wait? How long is the low is going to wait? We set that in seconds. The order of those transactions determine how they are terminated. 05:53 Lois: Earlier on, you mentioned rollback mode. Can you tell us a little more about it? Bill: So with that automatic rollback mode, there's only two valid values. It is considered advanced parameter. We can either set it in rollback, which is the default, or we can put it in track mode. Track mode gives us the ability to try it out. I guess you can say. It will say, hey, if I would have been running, if I would have been used, I would have terminated this transaction. It'll show me the number of times it would have happened for high priority, the number of times it would happen for a medium priority. It is modifiable in the PDB, but however, the track mode must be the same in each instance. So that rollback mode, again, that is the default value for that. So statistics are going to be available. So how many high priority rollbacks occurred? How many medium rollbacks occurred? In that track mode, I have to set that value. I do have to have the time set for how long is it going to wait for those, so the high and medium. And those priorities has to be set in the session. So statistics are available for the high and the medium in the track mode. Not only when we're actually rolling back, but also tracking. Again, this gives us the ability, by having it in the track mode, gives us the ability to do a little testing with it first. 07:27 The Oracle University Learning Community is an excellent place to collaborate and learn with Oracle experts and fellow learners. Grow your skills, inspire innovation, and celebrate your successes. All your activities, from liking a post to answering questions and sharing with others, will help you earn a valuable reputation, badges, and ranks to be recognized in the community. Visit mylearn.oracle.com to get started. 07:55 Nikita: Welcome back! Bill, when it comes to monitoring, how do you keep track of these rollbacks? Bill: For monitoring our rollback transactions, the data dictionary information is available to assist with monitoring our transaction priority. So from the V$TRANSACTIONS, there are columns available allowing us to do that. Based off that transaction priority shows what is the wait target for that. And then also each of the priority of those transactions. We can view this information, it will be populated to the alert log. So we can see that session ID, what was session ID of that? What was the transaction ID? What was the priority? What was the system identifier for that? It tells you-- even tells you the parameter and tells you what that wait time was set at. If it was a medium transaction that was terminated, it shows, OK, it was a medium. So we can view the alert log. And we can look for these terminations. Gives an idea of what's being done. 09:01 Nikita: And finally, what are the key advantages of using automatic transaction rollback? Bill: It eliminates a very manual process. It can be very time-consuming for the DBA to go out there and try to find what's the blocking session. Yep, I'll go ahead and do an ALTER SYSTEM. I'll kill that session trying to track it down, finding the views to look at it to say, OK, Yeah, this is the blocking one. I want to go ahead and take care of it. Resource manager doesn't really fully address blocking transactions. Some things that can do for that. We have the maximum estimate execution time. So that's the number in CPU seconds allowed for that call. It's terminated. It doesn't matter whether it's blocking another session or not in that case or even another transaction. It just says, OK, you exceeded this time. I'm going to terminate you. Then we also have the max idle time again. That's maximum session idle time. All right. You haven't been doing anything to session, we're going to terminate you. And then we have the MAX_IDLE_BLOCKER. That's the time duration of an idle session can block another session. Again, it's going to check OK, is the session actually idle? But these don't really address the issue of, hey, I have a higher priority transaction waiting for a lower transaction that's blocking it. 10:27 Lois: Thank you, Bill, for that breakdown. This feature is such a time saver. Nikita: Yeah, and such good way to reduce the manual workload for DBAs. Thanks Bill! Lois: To learn more about what we discussed today and view some of the demonstrations of this feature, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Lois Houston… Nikita: And Nikita Abraham signing off! 11:02 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
11m
01/10/2024

Unrestricted Parallel DMLs and Direct Loads

In this episode, hosts Lois Houston and Nikita Abraham discuss new features in Oracle Database 23ai related to Data Manipulation Language (DML). They are joined by Senior Principal Database & MySQL Instructor, Bill Millar, who explains the concept of unrestricted parallel DMLs and their importance in speeding up large operations and maintaining summary tables. The discussion then turns to unrestricted direct loads, examining the evolution of direct loads with 23ai and the broader impact of these changes.   Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/140830/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00  Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Nikita: Welcome to the Oracle University Podcast! I’m Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! In our last episode, we discussed a ground-breaking caching solution in Oracle Database 23ai, known as True Cache. We spoke about its configuration and deployment, and explored how to apply True Cache to our applications. Nikita: Today, we’re going to talk about two Oracle Database 23ai new features related to Data Manipulation Language, or DML. The first is Unrestricted Parallel DMLs and then we’ll move on to Unrestricted Direct Loads. We’ll talk about the situation prior to 23ai, identify the improvements that have been made, and look at their benefits. 01:15 Lois: And returning for another episode is Bill Millar, our Senior Principal Database & MySQL Instructor with Oracle University. Hi Bill! So, to start, can you explain what unrestricted parallel DMLs are and why they are important, especially in the context of Oracle Database? Bill: The Oracle Database allows DML statements such as inserts, updates, deletes, merge to be executed in parallel by breaking those statements into smaller task. These transactions can contain multiple DML statements. And they can modify multiple different tables. So transactions with the parallel DML is going to use the execution method by breaking up those large operations to execute the transaction in parallel. It helps speed up the large operations. And it's advantageous to data warehouse environments where we're maintaining like summary tables, historical tables. And even in OLTP systems, it can be beneficial for long-running batch jobs. The scale up. Well, it's basically dividing the executing SQL against those large tables and indexes into those smaller units of work. 02:36 Nikita: So, what were the limitations prior to 23ai? Bill: So once that object was modified by APLL statement, the object cannot be read or modified later in the same transaction. After that parallel DML modifies a table, there is no follow-on DML or query on the same table within that same transaction. If any attempt to access a table modified by that parallel statement, the transaction would be rejected. You're only allowed to query on those tables prior to that DML on that object itself. 03:16 Lois: Ok… So with these new improvements, I’m guessing some of these restrictions have been removed? Bill: In this case, in the same session, you can query the table multiple times. You can perform conventional DML on the same table within the same session. And you can also have multiple direct loads in the same session without having to do that commit. But there are still some restrictions with it. Heap tables only. You can't do it with any clustered tables or IOT, Index Organized Tables. Non-ASSM, the Automatic Segment Space Management tables. The temp table is not under ASSM. Why? Because it has to have uniform extents or any other tablespaces that you created with the uniform extents. So those restrictions still apply. So some of the improvements are some of the restrictions can help reduce the overhead. We can enable Parallel DML within that session. Allows the multiple operations on the same object. And it doesn't require that commit for each separate operation. Makes it a little bit easier to use by removing some of these limitations. Now users can run parallel DMLs and any combination of statements within that same transaction. And it can help simplify and speed up data loading analytic processes by making the database, the parallel execution and parallel queries, at the same time within that same session, again, eliminating having to do commits. 04:58 Nikita: Thanks for that summary of all the improvements, Bill. Now, how do you enable this? Is it enabled by default? Bill: To enable the Parallel DML mode, it is required for a session. It is disabled by default. That's because the Parallel DML and Serial DML, they have different locking, different ways to handle the transactions, different disk space requirements. When Parallel DML is enabled in a session, all DML statements are considered for parallel execution. Only a statement is considered for parallel execution when the Enable Parallel DML hint is used if I don't set it for a session. The sessions DML mode does not influence any parallelism of DDL statements. When the Parallel DML is disabled, no DML is executed in parallel, even if the hint is used. 05:59 Lois: Bill, I would like to dig a little deeper into the benefits. How do these lifted restrictions improve the overall performance and reduce overhead? Bill: There's no longer that requirement to commit everything separately. So that's going to reduce the overhead, not having to do the commit all the time. The scalability of accessing those large objects, executing parallel makes the decision support systems, those data warehouses and batch OLTP jobs or any other larger DML operation execute faster. By removing that one touch limitation, it allows the parallel DML statements to be read or modified by later statements of the same transaction in the same session. It's very similar to the non-parallel statements. And even OLTP systems can also benefit, for example, maintaining a larger operation, such as the creation of indexes, refreshing tables, or even creating summary tables. 07:14 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You’ll find training on everything from cloud computing, database, and security to artificial intelligence and machine learning, all free to subscribers. So, what are you waiting for? Pick a topic, leverage the Oracle University Learning Community to ask questions, and then sit for your certification. Visit mylearn.oracle.com to get started. 07:42 Nikita: Welcome back! Let's move on to the next new feature, which is unrestricted direct loads. Bill, what was the situation with direct loads like, prior to 23ai? Bill: After a direct load and prior-- it was always prior to a commit, queries in additional DMLs were not allowed on that same table. You might encounter the ORA error, the 12838, saying, hey, you can't read or modify this in parallel. That's because the DML on that direct load had access to that and that session for that. So you might have received that error. The enq contention, the wait event for the direct load issue in a different session from the other sessions during the direct load is having to wait, because of that queuing that-- because a transaction gets that table, locks that information to keep that table from being modified until that direct load has actually committed. Within the same transaction, within the same session, trying to do multiple DMLs with the-- while it is being modified with the direct loads itself. Unlike conventional loads, the direct loads, as the new blocks and extents are added to the segment, the high water mark does not actually get moved until the actual commit itself. So that's why there is restrictions in the same session or even in other sessions to be able to do anything. So to prevent the errors, the applications had to do a commit immediately after that direct load to prevent those errors from happening. Well now, there are restrictions when that direct load was done prior to that commit for that. The same table in the same session, couldn't query, couldn't do any additional DMLs, couldn't do any additional parallel DMLs. And even in other sessions, queries were not allowed on the same tables that was in use by the other session. So no additional conventional DMLs, no additional parallel DMLs were allowed. 10:09 Lois: Ok.. it was restrictive in what could be done. So, how have direct loads evolved with the 23ai release? Bill: Some of those previous restrictions have been lifted in that same session with that same table. So now you can immediately-- and notice that we're talking here, same session, same table. All right. So you can query multiple times within that same session. You can perform additional DML and you can also do multiple direct loads in the same session without having to do that commit. However, there still are restrictions. It has to be a heap table. It does not work with index organized tables or clustered tables. And the tablespace, if it's not using the automatic segment space management, it cannot-- it does not apply to those either, or if tables with a uniform extents-- tablespace with uniform extents. That's why anything in the temporary table is also included. Why? Because the temporary tablespace has to be uniform extents. 11:17 Nikita: So, what are the restrictions lifted for different sessions on the same table? Bill: Sessions can query that table, can perform conventional DML on that, able to also concurrently perform a direct load, and I can roll back to a save point. So you can see those added features can be very beneficial. But there's still restrictions that apply. It still applies to heap tables only, and it still applies to only tablespaces that are using the automatic segment space management for that. Of course, that includes the temporary tablespace and it doesn't work with tablespaces that have uniform extents. Your application DML might need to query the data after that direct load without committing, applications that might need to modify data within that same transaction as that direct load. You can enable multiple append hint. So you can specify the hint in addition to pending hint to disable. You can specify the no multi-append hint to disable it. 12:27 Lois: Bill, what’s the broader impact of these changes. How do these improvements make things more development-friendly? Bill: So changes to the direct load make things a little bit more development friendly by removing those directions after that direct load itself. So previous restrictions when loading-- querying the data kept us from doing multiple things at the same time. So now I can query on that table direct load from the same session, from a different session. I can do conventional DMLs on the table within the same session. It allows me to do a rollback on it. I can do direct loads on the same table within the same session. Again, I can also allow rollback to a save point. As long as my compatibility is set to 21.0.0.0, I will be able to go ahead and benefit from this feature. And there is no increase with it as far as the space usage or causing any fragmentation to the table. So that will not be an issue. 13:35 Nikita: Well, that’s the end of our time together, but I want to thank you, Bill, for sharing your expertise with us. Lois: To learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Lois Houston… Nikita: And Nikita Abraham signing off! 14:03 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
14m
24/09/2024

True Cache

Hosts Lois Houston and Nikita Abraham are joined by Senior Principal Database & MySQL Instructor Bill Millar who explains Oracle's newest caching solution called True Cache. Available in Oracle Database 23ai, True Cache is an automatically managed, in-memory, read-only cache that improves application performance dramatically. Bill provides an overview of its features and highlights the benefits of using True Cache.   Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/140830/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hi everyone! Last week, we had quite a power-packed episode. We discussed the 23ai new feature for Automatic SQL Plan Management. We also looked at the 23ai automatic feature that enhances SecureFiles LOB Write Performance as well as the update to Wide Columns. 00:59 Lois: Yeah, and in today’s episode, we will look at True Cache, another 23ai new feature. To tell us all about it, we have Bill Millar back with us. Bill is a Senior Principal Database & MySQL Instructor with Oracle University. We'll ask Bill to give us an overview of True Cache, talk about its configuration and deployment, and discuss how to apply True Cache to our applications.  Nikita: To kick things off, Bill, can you give us a high-level overview of what True Cache is? How does it differ from other caching solutions like Redis or Memcached? 01:35 Bill: True Cache is an in-memory cache. It is read-only. True Cache is deployed in front of a primary database, and it is automatically managed. It keeps the most frequently accessed data in the cache, and it keeps the cache consistent with the primary database. They call it diskless, but it's not. It does require some space for SP file, redo logs, control files, and such. But it's very similar to Active Data Guard.  The queries can be offloaded to the True Cache for faster query response. And the data in the query cache is consistent. Unlike other mid-tier caches like Redis or Memcached, a query to the True Cache returns only committed data, and the data is always consistent. It's secure. Why? Because we implement our Oracle database security policies and you can control access to the cache.  02:33 Lois: So, why should we use True Cache?  Bill: Improve application performance without having to rewrite any applications. That can save considerable amount of time, effort, and expense. Reduces the application response time. So the closer the True Cache is to the application, the faster the response. Now, you do need a large amount of memory. We're talking memory here. It's an in-memory storage area, and depending on how you configure it, you can have it shared, you can have it divided. We mentioned it's automatically maintained. So there's no application changes required, and it is transparent to the application. Again, simplifies that development and maintenance.  03:15 Nikita: How does it impact application performance, and what kind of scenarios would benefit the most from implementing True Cache? Bill: So at a high-level view, True Cache or primary database, the application configuration serves as other things that are going to decide where is it going to query the data from, from the True Cache or from the primary database.  The True Cache satisfies that query. And that's where the data will be fetched from. If not, then from the primary database. On start up, True Cache is empty. So it starts reading large chunks of data to populate the True Cache. So after a block is cached, then again, it can be automatically updated, apply the redo to it-- very similar to the Oracle Active Data Guard. In the data returned, it is always going to be consistent.  04:04 Lois: Is it going to be current data? Bill: Maybe, maybe not. If it's been updated in the primary, if they redo apply hasn't occurred yet, then it's not the most consistent. But as far as the query cache is concerned, it is the most current because we only display consistent. You can have multiple True Caches. You can save the same database application service to the True Cache as you can partition it.  04:28 Nikita: I'm curious about the memory requirements, Bill. How crucial is memory for True Cache's performance? Bill: You need to have significant amount of memory. Memory, memory, memory. So True Cache is completely memory, memory. So I want to have all my data possible in there. The more memory you have, the less likely something is going to age out. And of course, just like with the standard caching, you can also pin objects to stay in the True Cache.  Yeah, like I said, there are some requirements for storage, even though it's called diskless because of, again, redo log files, configuration files like the control files, SP file. And again it is read only.  05:11 Lois: Can you explain the differences between using physical and logical connections with True Cache? How does this impact the way applications interact with the database? Bill: So with using the True Cache, we have two physical connections, and we can have one to the primary database and one to the True Cache. Each connection has a database application service associated with it, and it's going to choose which connection to use based whether it's going to go to the True Cache or to the primary database.  The second way is the application maintains one logical connection that uses the application service for the primary database. It's the JDBC thin driver, starting with Oracle Database 23 is available. It's going to maintain the physical connections to the primary database and the True Cache itself. Now, the logical connection, the one logical and one physical, is for Java applications only.  Applications that work with JSON, we extend the HTTP entity tag support for that. So a database GET request to the True Cache is going to compute the ETag, insert it into the return document.  06:27 Nikita: But what happens if there’s a mismatch when the modified document is put back into the primary database?  Bill: Well, then the database is going to verify. OK, what happens with that?  It's going to verify the document row still matches that ETag for that. If with that put command, let's say, I have new data here, the row is going to match that ETag that was automatically updated. If there's no match, another user has changed the data and the PUT request is rejected. So the PUT request can be retired using the new data.  07:05 Are you planning to become an Oracle Certified Professional this year? Whether you're a seasoned IT pro or just starting your career, getting certified can give you a significant boost. And don't worry, we've got your back! Join us at one of our cert prep live events in the Oracle University Learning Community. You'll get insider tips from seasoned experts and learn from other professionals' experiences. Plus, once you've earned your certification, you'll become part of our exclusive forum for Oracle-certified users. So, what are you waiting for? Head over to mylearn.oracle.com and create an account to jump-start your journey towards certification today! 07:48 Nikita: Welcome back! Now, how do you configure True Cache, Bill? Bill: You can configure True Cache one of two ways. You can either use the Database Configuration Assistant, which actually makes it a little simpler to configure it, and you can also manually create it.  You have some environment options. One is a uniform configuration where you can deploy identical True Cache that use the same database application service. Another way is partition configuration. The data is going to be divided across multiple True Caches, which, each cache is a different subset of the data. You can also deploy True Cache in a RAC environment. As one might expect, there are some additional configuration steps for a RAC environment.  You want to make sure you verify your configuration, that the database application services are working as expected after you configure it. And then, optionally, you can enable DML redirection. What that will do, it writes data to the primary database, and that data is automatically updated in the cache. It's very similar how to the Oracle Active Data Guard works. Because the DML redirection uses more resources, it's not recommended for update-intensive applications.  There is a parameter, a ADG_REDIRECT_DML initialization parameter, that you will set to True in order to do that.  09:18 Lois: Bill, what are the specific challenges or considerations that administrators should be aware of during the configuration process?  Bill: You do need to make sure your network is configured for True Cache in the primary database. So optionally, you can create a remote listener for high availability. But you create your True Cache. You go ahead, and make sure that you have your primary database. You want the network configuration for both of those. And then you create the True Cache. Once the True Cache is created, you're going to create the application services associated with the database. And then, you're going to start the database application services on the True Cache.  When it comes to naming the application service names, each primary database application is going to be associated with a corresponding True Cache application service. To help simplify things a little bit, in the naming convention, you'll notice in our examples-- for example, if we have SALES as the primary database service, then we have the True Cache, we have SALES_TC, standing for True Cache, so it's easily identified.  You don't have to do that, but it's kind of recommended to do that, some way that you're going to identify it. So we're going to start our True Cache services. And you only start the True Cache services on the True Cache instances. Because it's the database services on the database that you need to make sure are started. And they are read-only.  10:46 Lois: Are there some best practices for maximum availability architecture? Bill: Uniform configuration seems to be a popular one. Why? Because I am going to have the both True Caches can be shared. That way, hopefully, I'm getting full usage out of both. And maybe if I have one service going to one, it might be minimally used. Whereas, the other one might be over. Hey, I could use more memory over here.  We'll also recommend use the JDBC 23ai UCP, Universal Connection Pool, for the application. So that can lessen the impact. If one True Cache becomes unavailable, as far as, OK, I need to reroute over here-- benefit of uniform configuration also. Prepopulate the cache. You want to go ahead and run the critical workload for that. If you have a planned outage, and you need to shut down the True Cache, you want to make sure you stop the database application service on that True Cache.  And then, how are you going to design your True Cache? Are you going to partition it? Are you going to have uniform? Which partition option are you going to use? So you can try to design that to help minimize the number of fetches it has to do from the primary database. And the more you can keep in the True Cache, the better the performance is going to be.  12:09 Nikita: What do I need to keep in mind when it comes to managing True Cache? Bill: One thing you might need to do for managing the True Cache is to monitor the True Cache. There's a couple different ways that we can do it. One, you can use the V$ view, the V$TRUE_CACHE view. And, of course, you can always use the Automatic Workload Repository.  12:30 Lois: Bill, we already spoke about this a bit, but can you tell us more about using True Cache in an application? Bill: There's two ways of using True Cache, as we've seen, physical and logical. Physical, it's going to maintain two connections, front one to the primary database and one to the True Cache. The application can decide which connection to use, based off of what it is trying to do.  If it's just reading, long as it's for a service that's configured with True Cache, it can read the True Cache. If it's going to write something, it's going to update, insert, whatever the case might be, it's for the primary database. And you can use any existing client driver as long as you're using the physical connection method. Any programming language will also work.  With the one logical connection method, it uses the application service for the primary database. You're going to use the JDBC Thin driver, starting with 23ai. You can use it and it maintains the connection to the primary database and True Cache. This model only works with Java applications, though. It maintains the physical connections.  We're going to enable the driver connection. And then, we're going to set the read only. We're going to set it to read only, true. Read only, false, whatever the case might be. And the read only mode is false for a connection by default. False is the default. Java applications only.  14:14 Nikita: What are some best practices for load balancing in a uniform configuration? Bill: You have multiple--multiple True Caches. They're going to service the same database application. They're going to cache the same data. It's the listener that's going to distribute the load balances. So the listener will automatically distribute and load each session to each cache. It will do it randomly and it will do it based off a load. Where can it configure? Where can it send for the best performance.  To route the request to the best performing True Cache, you want to make sure that you are using the same listener. So that remote listener parameter should point to the same listener, which is also the primary database listener. Single instance primary database local listener or scan listener, whichever one you're using, points to the primary. For the application for the JDBC URL, should point to the primary database.  You'll remember that Thin driver is going to create that logical connection, and it's going to create the physical connection to the primary database into each True Cache. To simplify things and possibly avoid connection issues, you might consider using the LISTENER_NETWORK, so the initialization parameter instead of specifying the remote and local listener separately. Because with the local--with the listener networks, all listeners within the same network name will cross register.  15:44 Lois: Before we wrap up, are there any complementary features that you would recommend using alongside True Cache to further enhance performance or simplify management? Bill: There are features that can complement True Cache-- the server-side result set cache.  So you can create--you can go ahead and create the result set that's part of the library cache set aside, a portion of that. You're going to go in, you're going to configure what objects will use that. You can still use that even with True Cache.  There's also the KEEP Buffer Pool that can be used. It's a separate pool that you set aside as part of the buffer cache. And you want to make sure you size it so the object that you want to keep in memory in the buffer cache that you size it appropriately. But again, some configuration, you configure the key pool, plus also you go in and alter the objects to use it.  And then lastly, there's the database smart flash cache. So again, if your data doesn't fit into memory, you can expand the capacity of by adding flash devices. When you configure the flash cache, if you are using transparent data encryption data, the local flash devices is not supported. So if it's TD encrypted on the primary database, it's going to stay in the buffer cache of the primary database.  17:11 Nikita: Ok! I think we can close the episode with that. Thank you, once again, for joining us, Bill.  Lois: Yes thanks! We’re learning so much from you. To learn more about what we discussed today, including the various configuration options that are available, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Lois Houston… Nikita: And Nikita Abraham signing off! 17:46 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
18m
17/09/2024

Enhancements in SQL Plan Management, SecureFiles LOB Write Performance, and Column Width

Join Lois Houston and Nikita Abraham, along with Senior Principal Database & Security Instructor Ron Soltani, as they discuss how the new Automatic SQL Plan Management feature in Oracle Database 23ai improves performance consistency and simplifies management. Then, Senior Principal Database & MySQL Instructor Bill Millar shares insights into two new features: one that enhances SecureFiles LOB Write Performance, improving read and write speeds, and another that increases the column limit in a table to 4,096, making it easier to handle complex data.   Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/137192/207062   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Nikita: Welcome to the Oracle University Podcast! I’m Nikita Abraham, Principal Technical Editor with Oracle University, and joining me is Lois Houston, Director of Innovation Programs. Lois: Hi there! Last week, we looked at the Oracle Database 23ai enhancements that have been made to Hybrid Columnar Compression and Fast Ingest. In today’s episode, we’ll talk about the 23ai new feature for Automatic SQL Plan Management with Ron Soltani, a Senior Principal Database & Security Instructor with Oracle University.  01:01 Nikita: And later on, we’ll be joined by Bill Millar, another Senior Principal Database & MySQL Instructor, who will tell us about the 23ai automatic feature that enhances SecureFiles LOB Write Performance. We’ll also get him to talk about the Wide Columns update. So, let’s get started. Hi Ron! What have been the common challenges with SQL plans and database performance? Ron: One of the problems that we have always had, if you remember, was when data changes, database setting configuration, parameter changes, SQL that were operating very well could now behave badly using the SQL plan that were associated to them. And remember, the same SQL plan generally Oracle likes to continuously reuse.  So the SQL plans were put in the baseline in the past, and we could have those SQL plan baseline, which are a set of approved plans to be used for a SQL from the SQL history stored in AWR, then could be used for the optimizer to choose from. However, which plan to choose and which one would be the best one to use, this is what the problem has been in managing the SQL plan baselines, and a lot of the operation would have been done manually.  02:22 Lois: And what have we done to overcome this?  Ron: So now this new system will going to perform all of those operations automatically for us. Now it can search the Automatic Workload Repository. It can find SQL plans for a particular SQL statement, then look for any alternative plans that may available in alternate sources like SQL tuning sets. And then validate those plans and see if those plans are going to be good and to be used as SQL plan baseline for executing SQL statement by the optimizer. 03:00 Nikita: So we now have the Automatic SQL Plan Management Evolve Advisor to help manage operations automatically, right? Can you tell us a little more about it? How does it ensure optimal performance? Ron: This is an automatic advisor that is created that can go look for different plans and validate the plans by examining them, making sure that they are not causing any regression compared to the previous operation, and then evolve that plan into a good baseline.  This simplifies management of the baseline repository for a SQL statement. So as data changes, as parameters changes, optimizer could come up with different type of plans that are set within this baseline that has been validated to be good baseline for each situational operation. So this way you reduce a lot of hard parsing operations.  04:00 Lois: And how does the SQL Evolve Advisor work, Ron? Ron: First, it will check the AWR to find what are the top SQLs that has been found. Then it will look to see if these top SQLs who did not perform well with the plan that they have, that's why they're top SQL, have other alternative plans that are stored in the SQL plan history, in AWR, or available in any other sources.  Then if it finds any additional plans, it will go ahead and add all of those plans into the plan history. So in the plan history, now you have accumulation of all the plans available in AWR and anything that has been brought from other sources. Then it will test every one of those plans and validate that by use of the plan, the SQL statement will not deprivate and get slower. The performance is either similar or actually better. So normally, there is a percentage that the SQL should improve. So we will then validate these baselines.  And finally, once the baselines or those plans have been validated, they will be accepted, and then they will be added as SQL plan baselines. They will remain in the statement history, in the AWR, and will be available for optimizer for the future use.  05:28 Nikita: What are the benefits of this? Ron: Number one is Autonomous Database. As you know, they want to automate all management, including management of the SQL execution due to changes that are happening for the application, for the data, or the database and its environment.  It totally eliminates any manual intervention for management of the statement, and it can transparently repair any statement that had been affected by a major change.  06:00 Lois: What sort of problems does this feature solve for us? Ron: Of course, this is a performance consistency. We want to make sure that every statement performed to its best performance and any specific changes that may impact those SQL statements would be taken into an account, and a better plan, if available, would then be available for use.  It also improves the application performance level, therefore database service level will get much improvement. And the SQL execution plans will be automatically managed behind the scene by expanding these baselines, by managing all of these baseline history and all of that that is managed by this automatic SQL plan management environment automatically.  06:50 Nikita: And when do we use this?  Ron: If there is a change in a database environment, like you add SGA, the change into the shared pool, change in the size of the buffer cache or any type of storage effects. So all of those can actually affect the SQL execution.  Now all of those changes, including data changes, can cause a SQL plan to not behave very well or behave as well as it was doing before. Therefore, if particular plans do not perform as well as they did before, that affects the performance of the application. This also affects the performance of the database and the instance.  07:35 Lois: So, how do we use this environment?  Ron: Well, best news that I have for you in that is that there is nothing manual needs to be done. All we need to do is, number one, make sure that we enable foreground automatic SQL plan management that we done through the package for the DBMS SPM for SQL plan management.  You will use the package with the configure option, and you enable the auto SPM evolve task, and you set it to auto. Once this is done, now the SQL evolve plan management and advisor are enabled, and they will then monitor your statements, review all of the top SQLs as they are found with all of the ADDM operation, and then do their work in looking for better plans and being able to maintain the SQL plan baselines we talked about.  Now for you to be able to view, monitor, and see how these operations are going, if it is enabled, you can take a look at the DBA SQL plan baseline's view. There are many, many columns in that particular baseline, and there are also columns that has been added that tell you where is the plan generated from, if a plan is approved, and any other user interaction with the plan or settings can then be verified using that DBA SQL plan baseline view.  09:13 Are you looking for practical use cases to help you plan and apply configurations that solve real-world challenges? With the new Applied Learning courses for Cloud Applications, you'll be able to practically apply the concepts learned in our implementation courses and work through case studies featuring key decisions and configurations encountered during a typical Oracle Cloud Applications implementation. Applied learning scenarios are currently available for General Ledger, Payables, Receivables, Accounting Hub, Global Human Resources, Talent Management, Inventory, and Procurement, with many more to come! Visit mylearn.oracle.com to get started. 09:54 Nikita: Welcome back! Let’s bring Bill into the conversation. Hi  Bill! Can you tell us about the 23ai automatic feature that enhances SecureFiles LOB Write Performance?  Bill: The key here is that it is automatic and transparent. There's no parameters set. Nothing to configure in table, no hints, and nothing that you have to do with these improvements. It is tightly integrated with SecureFiles LOB infrastructure.  So now, multiple LOBs can be handled in a single transaction and can be buffered simultaneously. This will help with mixed workloads, switching between the LOBs that are writing in a single transaction. The PGA will adaptively resize based off the size for these large writes for the LOBs if you're using the No Cache option. Remember, no cache is going to bypass the buffer cache and does direct reads and writes from the PGA.  JSON type will be transformed into the OSON Oracle data type. It is an optimized native binary storage format for JSON data.  11:15 Lois: Ok. So, going forward, there will be better read and write performance for LOBs. Bill: Multiple LOBs in a single transaction can be buffered simultaneously, improving mixed workloads. We just talked about the PGA. Automatically, the buffer is automatically resized.  And the improved JSON support. The reason it will recognize, hey, this is a JSON data type. But traditionally, JSON data types were small.  So they were small to medium size. So the range from 32k to 32 meg was considered small to medium whereas LOBs were designed for data types larger than 100 meg. So by recognizing this a JSON data type, it can take advantage of the LOB architecture.  Other enhancements will also include the acceleration of compressed LOBs, the pen and compression caching, and improves the poor performance of your reads and writes to compressed LOBs. It's faster than previously.  12:24 Nikita: Bill, what do you think about the recent increase in the column limit? Previously, the limit was 1000 columns per table, which sometimes posed issues when migrating from other systems that allowed more than 1,000 columns, right?  Bill: Maybe because of workload requirements, the whole machine learning, the internet of things workloads, IOTs can have hundreds of thousands of attributes, dimensional attribute columns for that. And even our very own blockchain tables reserves up to 40 hidden virtual columns, so that takes away from the total amount.  Virtual columns count towards the column limits and some applications as they drop columns, what it does, it just converts them to unused, and it still applies towards the limit the number of columns that you can have to that limit. There were workarounds. However, they were most likely not the best way to do it, like column switching, table splitting for that. But big data really use cases, really saw where files have or required more than 1,000 columns.  13:42 Lois: So, now that we can have 4,096 columns in a table, I’m sure it’s made handling complex data a lot easier.  Bill: So by increasing this, since other systems do support higher column limits, it can-- the increase can make migration from other systems easier and possibly even a little bit more attractive while it can make applications a little bit simpler because the 1,000 column limit was not always optimal for analytics. Where 1,000 might have been plenty for OLTP type environments, but not for the analytics, especially when it comes to machine learning and those internet of things that we talked about, where the previous workarounds, like splitting the tables, really caused more performance issue than anything else.  So we want to avoid those suboptimal workarounds. And the nice thing is there's no change to the SQL. So once you have that-- well, if we were doing SQL, if we had tables that were split and we're trying to do things that is actually going to help improve that SQL, now, we don't have multiple objects that we're dealing with.  14:57 Nikita: How do we actually go about increasing the column limit to 4,096? Bill: You do have to have the compatibility set to 23c. Why? Because it's a new feature. There is a new initialization parameter called Max columns, and you do set that. There's two different ways, two different values. We can set it to standard or we can set it to extended.  It is dynamic. When it's set to standard, it's only 1,000. When we set it to extended, it's going to allow the 4,096. It is modifiable at the PDB level. However, it will inherit what's at the root level, if it's not explicitly set at a PDB. It can't alter it in a session for that. And multiple instances of the RAC environment must use the same value.  Now one thing, notice that it cannot be set to standard if I created a table that had more than 1,000 columns. One thing that might get you, when you drop a table that has more 1,000 columns and you try to set it back to standard, it might tell you, hey, you have tables that have more than 1,000 columns. Don't forget your recycle bin unless you did a drop table purge.  16:09 Lois: Are there any performance considerations to keep in mind, Bill? Bill: There's really no DML or query performance degradation for the tables. However, it might require, as you would expect, the increase in memory when we have the new column limits. It might require additional shared pool, additional SGA with the additional columns, more buffer cache as we're bringing blocks in.  So that's shared pool along with the PGA. And also we can add in buffer cache in there, because that increased column count is going to be increase in the total PGA memory usage. And those are kind of expected for that. But the big advantage is it gives us the ability to eliminate some of these suboptimal workarounds that we had in the past.  17:02 Nikita: Ok! We covered a lot today so thank you Bill and Ron.  Lois: To learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Lois Houston… Nikita: And Nikita Abraham signing off! 17:27 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
17m
10/09/2024

Hybrid Columnar Compression & Fast Ingest

In this episode, hosts Lois Houston and Nikita Abraham speak with Senior Principal Database & MySQL Instructor Bill Millar about the enhanced performance of Hybrid Columnar Compression, the different compression levels, and how to achieve the best compression for your tables. Then, they delve into Fast Ingest, what’s new in Oracle Database 23ai, and the benefits of these improvements.   Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/137192/207062   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hi everyone! In our last episode, we spoke about the 23ai improvements in time and data handling and data storage with Senior Principal Instructor Serge Moiseev. Today, we’re going to discuss the enhancements that have been made to the performance of Hybrid Columnar Compression. We'll look at how Hybrid Columnar Compression was prior to 23ai, learn about the changes that have been made, talk about how to use this compression in 23ai, and look at some performance factors. After that, we’ll move on to Fast Ingest, the improvements in 23ai, and how it is managed. 01:15 Lois: Yeah, this is a packed episode and to take us through all this, we have Bill Millar back on the podcast. Bill is a Senior Principal Database & MySQL Instructor with Oracle University. Hi Bill! Thanks for joining us. So, let’s start with how Hybrid Columnar Compression was prior to 23ai. What can you tell us about it? Bill: We support all kinds of platforms from the Database Enterprise Edition on up to the high engineered systems for that and even the Exadata Cloud at the Customer. We have four different levels of compression. One is considered the warehouse compression where we do a COLUMN STORE COMPRESS FOR QUERY LOW and COLUMN STORE COMPRESS FOR QUERY HIGH. The COLUMN STORE COMPRESS FOR QUERY HIGH is the default, unless another compression level is specified. With the archive compression, we have the COLUMN STORE COMPRESSED FOR ARCHIVE LOW and also COLUMN STORE COMPRESS FOR ARCHIVE HIGH. With the Hybrid Columnar Compression warehouse and archive, the array inserts are compressed immediately. But, however, some conditions have to be met. It has to be a locally-- to use these, it has to be a locally managed tablespace, the automatic segment space management. And compatibility level, at least 12 too or higher when these values have been introduced. There are different compressors that are used for the compression hidden from the customer. It just depends on what is selected as to what is going to be the compression that's going to be used for--  notice that with the COLUMN STORE FOR QUERY HIGH and for ARCHIVE LOW, the zlib compression method is used, whereas if you select the ARCHIVE HIGH, the Bzip2. And in 19C, we added the Zstandard. And it's available for the MEMORY COMPRESS FOR CAPACITY HIGH.  03:30 Nikita: So, what’s happened in 23ai? Bill: When in 23c, to take advantage of the changes in compression, the compatibility level has to be set at least to 23.0.0 or higher. When a table is created or altered with the hybrid column compression, the Zstandard will automatically be selected. So it doesn't matter which one of the four you select, that will be the one that is selected. It is internally set transparent to the user. There is no new SQL format that has to be used in order for the Zstandard compression to be applied. And the Database Compatibility Mode has to be at least at 23.0.0 or higher. Only then can the format of the Hybrid Column Compression storage use that Zstandard compression. If we already have compressed data blocks in existing tables, they're going to remain in their original format.  04:31 Lois: And are the objects regenerated? Bill: If the objects are-- they might be regenerated if they were deleted in another operation. If you want to completely take advantage of the new compression, all you have to do is alter table move. And that's going to go ahead and trigger the recompression of that, whereas any newly created tables that are created will use the Zstandard by default. 05:00 Nikita: What are the performance factors we need to think about, Bill? Bill: There are some performance factors that we do need to consider, the ratio, the amount of space reduction in storage that we're going to achieve, the time spent compressing the data, the CPU cost to compress that data, and also, is there any decompression rate, time spent decompressing the data when we're doing queries on it? 05:24 Lois: And not all tables are equal, are they? Bill: Not all tables are equal. Some might get better performance by different compression level than others for that. So how we can basically have to test our results, there is a compression advisor that's available, that you can use to give you a recommendation on what compression to use. But only through testing can we really see the availability, the benefits of using that compression for an application. So best compression, just as in previous versions, the higher the compression levels, the more CPU it's going to use. The higher the compression level, the more space savings that we're going to achieve for that as we are doing those direct path inserts. So there's always that cost. 06:20 Did you know that the Oracle University Learning Community regularly holds live events hosted by Oracle expert instructors. Find out how to prepare for your certification exams. Learn about the latest technology advances and features. Ask questions in real time and learn from an Oracle subject matter expert. From Ask Me Anything about certification to Ask the Instructor coaching sessions, you’ll be able to achieve your learning goals for 2024 in no time. Join a live event today and witness firsthand the transformative power of the Oracle University Learning Community. Visit mylearn.oracle.com to get started.  07:01 Nikita: Welcome back! Let’s now move on to the enhancements that have been made to fast ingest. We’ll begin with an overview of fast ingest, how to use it, and the improvements and benefits. And then we’ll look at some features for managing fast ingest. Bill, why don’t you start by defining fast ingest for us?  Bill: Traditionally the fast ingest, also referred to as deferred inserts, is faster than processing a single row at a time. It can support high-volume transactions like from the Internet of Things applications, where you have hundreds of thousands of items coming in trying to write to the database. They are faster, because the inserts don't use the traditional buffer cache. They use a pool that will size out of the large pool. And then they're later written to disk using the SMCO, the space management coordinator. Instead of using the buffer cache, they're going to write into an area of the large pool. The space management coordinator, it has these helper threads, however many-- that's just a number for that-- that will buffer. And as buffer is filled based off size of that algorithm, it will then write those deferred inserts into the database itself. 08:24 Lois: So, do deferred inserts support constraints? Bill: Deferred writes do support constraints in index just as for regular inserts. However, performance benchmarks that have been done recommend that you disable constraints, if you're going to use the fast ingest. 08:41 Lois: Can you tell us a bit about the streaming and ingest mechanism?  Bill: We declare a table with the memoptimize for write. We can do that in the create table statement, or we can alter the table for that. The data is written to the large pool, unlike traditionally writing items to the buffer cache. It's going to write to the ingest buffer, the large pool. And it's going to be drained. It's going to be written from that area by using those background processes to write to the actual database itself. So the very high throughput, since drainers issues deferred writes in large batches. So we're not having to wait especially for the buffer cache. OK, I need space. OK, I need to write. I need to free up blocks. Very ideal for these streaming inserts, sensor readings, alarms, door locks. Those type of things. 09:33 Nikita: How does performance improve with this? Bill: With the benchmarks we have done, we have found that the performance can be up to 75% faster by going ahead and doing the fast ingest versus traditional inserts. The 23 million inserts per second on a single X6-2 server with the benchmarks that we have. 09:58 Nikita: Are there any considerations to keep in mind? Bill: With the fast ingest, some things to consider for that. The written data, you might need to validate to make sure it's there. So you might have input files that are writing to that that are loading it. You might want to hang on to those, before that data destroyed. Have some kind of mechanism to validate, yes, it was written. There is a possible loss of data. Why? Because unlike the buffer cache that has the recovery mechanism with the redo and the undo, there is none with that large pool. So that's why if the system crashes, and the buffers haven't been flushed yet, then it's possible loss of data. There's no queries from the large pool meaning that if I want to query the information that the fast ingest is loading into the table, it doesn't go and see what's sitting in the buffer in the large pool like it does with the buffer cache. Index and constraints are checked but only at flush time. And the memoptimize pool size is a fixed amount of space that we're going to allocate-- of memory that we're going to allocate to use for the memoptimize write. We can enable a table for the fast ingest, enable with the memoptimize for write. We can create a table and do it. We can also alter a table. We already have a table existing. All we have to do is alter it. And we want to use that, the fast ingest, for these tables. 11:21 Lois: Do we have options for the writing operation, Bill? Bill: You do have options for the writing operation. We have the parameters, the memoptimize write where we can turn that on. We can also use it in a hint. It is set at the root level, it. Is not modifiable at the PDB level. It's set at the root level, It is a static parameter. We can also do things in our session. We want to verify, OK, is the memoptimize write on? We can verify a table is enabled. So with the fast ingest, the data inserts, you can also use a hint. You can also set this at a session level.  If you decide there's something that you don't want to use the memoptimize write for, then you can disable it for a table.  12:11 Nikita: Bill, what are some of the benefits of the enhancements made in 23ai? Bill: With some of the enhancements-- so now, some table attributes are now supported-- we can now have common default values for a column. We can use transparent data encryption. We can also use the fast inserts, any inline LOBs, along with virtual columns. We've also added partitioning support. We can do subpartitioning and we can also do interval partitioning, along with auto list. So we've added some items that previously prevented us from doing the fast inserts. It does provide additional flexibility, especially with the enhancements and the restrictions that we have removed. It allows to use that fast insert, especially in a data warehouse-type environment. It can also use-- in the Cloud, it can use encrypted tablespaces, because remember, in the Cloud, we always encrypt, by default, users' data. So now, it also gives us the ability to use it in that Cloud environment because of that change. We have faster background flushing for the loads.  13:36 Lois: And how is it faster now?  Bill: Because we bypassed the traditional buffer cache. Faster ingest for those direct ingest. So again, bypassing the traditional inserts and using the buffer cache gives the ability to bulk load into large pool, then flush to the database so that way, we have access to that data for possible faster analytics of those internet of things, especially when it comes to the temperature of the temperature sensors. We need to know when a temperature of something is out of bounds very quickly. Or maybe it's sensors for security. We need to know when there's a problem with the security. 14:20 Nikita: How difficult is it to manage this? Bill: Management is fairly simple. We have the MEMOPTIMIZE_WRITE_AREA_SIZE parameter that we're going to say-- it is dynamic. It does not require a restart. However, all instances in a RAC environment must have the same value. So we have the write area. What are we going to set? And then the MEMOPTIMIZE_WRITE, by default, it uses a hint. Or we can go ahead and we can just set that to all. It is allocated from the large pool. You manually set it. And we can see how much is actually being allocated to the pool. We can go out and look at our alert log for that information.  There's also a view. The MEMOPTIMIZE_WRITE_AREA has some columns. What is the total memory allocated for the large pool? How much is currently used by the fast ingest? How much free space? As you're using it, you might want to go out and do a little checking, or do you have enough space? Are you not allocating enough space? Or have you allocated too much?  It'll also show the total number of writes, and also, the number-- the writers is currently the users that are using it.  And the container ID, what is the container within that container database? What's the pluggable or pluggables that's using the fast ingest? There is a subprogram, the DBMS_MEMOPTIMIZE that we have access to that we possibly can use. So there are some procedures. Here, we can return the rows of the low and high water mark of the sequence numbers. And the key here is across all the sessions. We can see the high water mark, sequence number of the rows written to the large pool for the current session. And we can also flush all the ingest data from the large pool to disk for the current session. 16:26 Lois: What if I want to flush them all for all sessions?  Bill: Well, that's where we have the WRITE_FLUSH procedure. So it's going to flush the fast ingest data of the Memoptimize Rowstore from the large pool for all the sessions. As a DBA, that's one that you most likely will want to be using, especially if it's going to be before I do a shutdown or something along that line. 16:49 Nikita: Ok! On that note, I think we can end this episode. Thank you so much for taking us through all that, Bill. Lois: Yes, thanks Bill. If you want to learn more about what we discussed today, visit mylearn.oracle.com and search for Oracle Database 23ai New Features for Administrators. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Lois Houston… Nikita: And Nikita Abraham signing off! 17:21 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
17m
03/09/2024

Time & Data Handling & Data Storage

In this episode, hosts Lois Houston and Nikita Abraham discuss improvements in time and data handling and data storage in Oracle Database 23ai. They are joined by Senior Principal Instructor Serge Moiseev, who explains the benefit of allowing databases to have their own time zones, separate from the host operating system. Serge also highlights two data storage improvements: Automatic SecureFiles Shrink, which optimizes disk space usage, and Automatic Storage Compression, which enhances database performance and efficiency. These features aim to reduce the reliance on DBAs and improve overall database management.   Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/137192/207062   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.   ---------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Nikita: Welcome to the Oracle University Podcast! I’m Nikita Abraham, Principal Technical Editor with Oracle University, and joining me is Lois Houston, Director of Innovation Programs. Lois: Hi there! Over the past two weeks, we've delved into database sharding, exploring what it is, Oracle Database Sharding, its benefits, and architecture. We’ve also examined each new feature in Oracle Database 23ai related to sharding. If that sounds intriguing to you, make sure to check out those episodes. And just to remind you, even though most of you already know, 23ai was previously known as 23c. 01:04  Nikita: That’s right, Lois. In today’s episode, we’re going to talk about the 23ai improvements in time and data handling and data storage with one of our Senior Principal Instructors at Oracle University, Serge Moiseev. Hi Serge! Thanks for joining us today. Let’s start with time and data handling. I know there are two new changes here in 23ai: the enhanced time zone data upgrade and the improved system data and system timestamp data handling. What are some challenges associated with time zone data in databases? 01:37 Serge: Time zone definitions change from time to time due to legislative reasons. There are certain considerations. Changes include daylight savings time when we switch, include the activity that affects the Oracle Database time zone files.  Time zone files are modified and used by the administrators. Customers select the time zone file to use whenever it's appropriate. And customers can manage the upgrade whenever it happens.  The upgrades affect columns of type TIMESTAMP with TIME ZONE. Now, the upgrades can be online or offline.  02:24 Lois: And how have we optimized this process now?  Serge: Oracle Database 23c improves the upgrade by reducing the resources used, by selectively using the updates and minimizing the application impact. And only the data that has dependencies on the time zone would be impacted by the upgrade.  The optimization of the time zone file upgrade does not really change the upgrade process, so upgrade can be done offline. Database would be unavailable for a prolonged period of time, which is not optimal for today's database availability requirements.  Online upgrade, in this case, we want to minimize the application impact while the data is being upgraded. With the 23c database enhancement for time zone file change handling, the modified data is minimized, which means that the database updates only impacted rows. And it reduces the impact to the applications and other database operations.  03:40 Nikita: Serge, how does updating only the impacted rows improve the efficiency of the upgrade process? Serge: The benefits of enhanced timezone update include customers who manage large fleet of databases. They will benefit tremendously with a lower downtime. The DBAs will benefit due to the faster updates and less resource consumption needed to apply those updates. And that improves the efficiency of the update process.  Tables with no affected data are simply skipped and not touched. All results in the significant resource savings on the upgrade of the time zone files. It applies to all customers that utilize timestamp with time zone columns for their data storage.  04:32 Lois: Excellent! Now, what can you tell us about the improved system data and system timestamp data handling?  Serge: Date and time in Oracle databases depends on the system time as well as the database settings. System time now can be set as the local time zone for an individual database.  04:53 Nikita: How was it before this update? Serge: Before 23c, the time has always matched the time zone of the database host operating system. Now, imagine that we use either multitenant environments or cloud-based environments when the host OS system time zone is not really the same as the application that runs in a different geographic locality or affects data from other locations.  And system time obviously applies not only to the data stored and updated in the database rows but also to the scheduler, the flashback, to a place to materialized view refresh, Recovery Manager, and other time-sensitive features in the database itself.  Now, with the database time versus operating system time, there is a need to be more selective. It is desired that the applications use the same database time in the same time zone as the applications are actually being used in.  And multitenant and cloud databases will certainly experience a mismatch between the host operating system time zone, which is not local for the applications that run in some other geographical locations or not recognizing some, for example, daylight savings time.  So migration challenge is obviously present. If you want to migrate from a specific on-premises database to either multitenant or cloud, you would experience the host operating system time zone by default.  06:38 Lois: And that’s obviously not convenient for the applications, right? Serge: Well, the database-specific time in Oracle Database 23c, any cloud database can set local time zone to whatever the customer's requirements are explicitly. And any pluggable database can also set its own local time zone to customer's requirements, not inheriting the time zone from the container database it is currently running in.  This simplifies migration to multitenant or cloud for applications that are time-sensitive. And it offers more intuitive, easier database monitoring, and development.  07:23 Working towards an Oracle Certification this year? Take advantage of the Certification Prep live events in the Oracle University Learning Community. Get tips from OU experts and hear from others who have already taken their certifications. Once you’re certified, you’ll gain access to an exclusive forum for Oracle-certified users. What are you waiting for? Visit mylearn.oracle.com to get started.   07:51 Nikita: Welcome back! Let’s move on to the data storage improvements. We have two updates here as well, automatic secure file shrink and automatic storage compression. Let’s start with the first one. But before we get into it, Serge, can you explain what SecureFiles are?  Serge: SecureFiles are the default storage mechanism for large objects in Oracle Database. They are strongly recommended by Oracle to store and manage large object data.  The LOBs are stored in segments. Those segments may incur large amounts of free space over time. Because of the updates to the LOB data, the fragmentation of the space used is growing depending, of course, on the frequency and the scope of the updates.  The storage efficiency could be improved by shrinking segments with the free space removed. And manual secure files shrinking has become available since Oracle Database 21c, requiring administrators to perform these tasks manually.  Traditional SecureFiles required the time-consuming DBA activities. DBAs would need to manually identify eligible LOB segments either using Segment Advisor or PL/SQL or built-in database views.  Once identified, the administrators would manually execute shrink operations on very large LOBs which takes too much time and may result in excessive disk space consumption. For example, code to operate this shrinking would look like ALTER TABLE some table SHRINK SPACE CASCADE.  That would shrink all LOB segments in a particular table. If you want to scope the shrinking to a single column, the code would be required to ALTER TABLE some table MODIFY LOB, followed by the column name SHRINK SPACE.  This affects only a single column in a table with LOBs.  10:01 Lois: So, how has automatic secure shrinking made things better? Serge: Automatic SecureFile shrink removes the emphasis from the DBAs to manually perform these tasks. And it results in the more optimal use of space over time.  It is integrated into the automated database maintenance tasks. The automation once enabled runs every 30 minutes, collects eligible LOB segments, and shrinks them offline. The execution time and freed space would vary depending on the fragmentation and the size of the LOBs. Each shrink execution may reclaim up to 5 gigabytes of unused disk space from each LOB segment that is idle.  On the high level, automatic SecureFile shrink improves the Oracle Database 23c storage usage efficiency. It is part of the ongoing Oracle Database improvement effort and transparently reclaims the free space with negligible to no impact on performance of the database operations.  Again, this is done in the background without affecting the running processes. It makes Oracle database 23c less dependent on the DBA activities while reducing the disk space required to store SecureFiles, reducing the usage of LOB segments.  Automatic securefile shrink runs incrementally in small steps over time. Some of the features are tunable. And it is supported for all types of large objects, storage, compressed, encrypted, and duplicated the object segments.  11:50 Nikita: Right, and note that this feature is turned on out-of-the-box in the Autonomous Database 23ai in Oracle Cloud. Now, let’s talk about Automatic Storage Compression, Serge.  Serge: With Automatic Storage Compression and Automatic Clustering, the storage compression gives you the background compression functionality. Directly loaded data is first uncompressed to speed up the actual load process. Rows are then moved into hybrid columnar compression format in the background asynchronously.  The automatic clustering applies advanced heuristic algorithms to cluster the stored data depending on the workload and data access patterns and the data access is optimized to more efficiently make use of database table indices, zone maps, and join zone maps.  Automatic Storage Compression advantages include the improvements to Oracle Database 23c storage efficiency as well. It is part of the continuous improvement, part of the ongoing Oracle Database improvement effort. And it brings performance gains, speeds up uncompressed data loads while compressing in the background.  The latencies to load and compress data are because of that also reduced. With the hybrid columnar compression in particular, this works in combination.  And it results in less DBA activities, makes the Database Management less dependent on the DBA time and availability and effort.  Automatic Storage Compression performs operations asynchronously on the data that has already been loaded. To control Automatic Storage Compression on-premises, it must be enabled explicitly. And you have to have heatmap enabled on your Oracle Database objects.  Table must use hybrid columnar compression and be placed on the tablespace with the SEGMENT SPACE MANAGEMENT AUTO and allowing autoallocation. And this feature, again, is transparent for the Autonomous Database 23c in the Oracle Cloud. 14:21 Lois: Thanks for that quick rundown of the new features, Serge. We really appreciate you for taking us through them. To learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Lois Houston… Nikita: And Nikita Abraham signing off! 14:50 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
15m
27/08/2024

Database Sharding: Part 2

Join hosts Lois Houston and Nikita Abraham in Part 2 of the discussion on database sharding with Ron Soltani, a Senior Principal Database & Security Instructor. They talk about sharding native replication, directory-based sharding, and coordinated backup and restore for sharded databases, explaining how these features work and their benefits. Additionally, they explore the automatic bulk data move on sharding keys and the ability to split and move partition sets, highlighting the flexibility and efficiency they bring to data management.   Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/137192/207062   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative  podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hi everyone! In our last episode, we dove into database sharding and Oracle Database Sharding in particular. If you haven’t listened to it yet, I’d suggest you go back and do so before you listen to this episode because it will give you a lot of context.  00:53 Lois: Right, Niki. Today, we will discuss all the 23ai new features related to database sharding. We will cover sharding native replication, directory-based sharding, coordinated backup and restore for sharded databases, and a few more.  Nikita: And we’re so happy to have Ron Soltani back on the podcast. If you don’t already know him, Ron is a Senior Principal Database & Security Instructor with Oracle University. Hi Ron! Let’s talk about sharding native replication, which is RAFT-based, meaning that it is reliable and fault tolerant-based, usually providing subzero or subsecond zero data loss replication support. Tell us more about it, please.  01:33 Ron: This is completely transparent replication built in within Oracle sharding that duplicates data across the different shards. So data are generally put into chunks. And then the chunks are replicated either between three or five different shards, depending on how much of the fault tolerance is required.  This is completely provided by the Oracle sharding database, and does not require use of any other component like GoldenGate and Data Guard. So if you remember when we talked about the architecture, we said that each shard, each database can have a Data Guard component, whether through GoldenGate or whether through Data Guard to have a standby.  And that way support high availability with the sharding native replication, you don't rely on the secondary database. You actually-- the shards will back each other up by holding replicas and being able to globally manage the replica, make sure everything is preserved, and manage all of the fault operations.  Now this is a logical replication, generally consensus-based, kind of like different components all aware of each other. They know which component is good, depending on the load, depending on the failure. The sharded databases behind the scene decide who is actually serving the data to the client. That can provide subsecond failovers with zero data loss.  03:15 Lois: And what are the benefits of this? Ron: Major benefits for having sharding native replication is that it is completely transparent to the application or any of the structures. You just identify that you want to go ahead and use this replication and identify the replication factor. The rest is managed by the Oracle sharded database behind the scene.  It supports fast failover with zero data loss, usually subsecond failovers. And depending on the number of replicas, it can even tolerate multiple failures like two server failures.  And when the loads are submitted, the loads are also load-balanced across all of these shards based on where the data is located, based on the replicas. So this way, it can also provide you with a little bit of a better utilization of the hardware and load administration.  So generally, it's designed to help you keep your regular SQL-based databases without having to resolve to FauxSQL or NoSQL environment getting into other databases. 04:33 Nikita: So next is directory-based sharding. Can you tell us what directory-based sharding is, Ron?  Ron: Directory-based sharding basically allows the user to define the values that are used and combined for different partition, so better control, location of the data, in what partition, what shard. So this allows you to set up a good configuration.  Now, many times we may have a key that may not be large enough for hash partitioning to distribute the data enough. Sometimes we may not even know what keys are going to come in the future. And these need to be built in the future. So having to build these, you really don't want to have to go reorganize the whole data based on new hash functions, and so when data cannot be managed and distributed using hash partitioning or when we need full control over combination of where data exists.  05:36 Lois: Can you give us a practical example of how this works? Ron: So let's say our company is very small in three different countries. So I can combine those three countries into one single shard. And then have three other big countries, each one sitting in their own individual shards. So all of this done through this directory-based sharding. However, what is good about this is the directory is created, which is a table, created behind the scene, stored in the catalog, available to the client that is cached with them, used for connection mapping, used for data access. So it can give you a lot of very high-level benefits.  06:24 Nikita: Speaking of benefits, what are the key advantages of using directory-based sharding? Ron: First benefit allow you to group the data together based on the whatever values you want, depending on what location you want to put them as far as across the shards are concerned. So all of that is much better and easier controlled by us or by the designers. Now, this is when there is not enough values available. So when you're going to use hash-based partition, that would result into an uneven distribution of the data.  Therefore, we may be able to use this directory for better distribution of the data since we understand the data structure better than just the hash function. And having a specification where you can go ahead and create future component, future partitions, depending on how large they're going to be. Maybe you're creating them with an existing shard, later put them in another shard. So capability of having all of those controls become essential for management of this specific type of data.  If a shard value, the key value is required, for example, as we said, client getting too big or can use the key value, split it or get multiple key value. Combine them. Move data from one location to another. So all of these components maintain automatically behind the scene by us providing the changes. And then the directory sharding and then the sharded database manages all of the data structure, movement, everything behind the scene using some of the future functionalities. And finally, large chunk of data, all of that can then be moved from one location to another. This is part of the automatic chunk data move and whatnot, but utilized within the directory-based sharding to allow us the control of this data and how we're going to move and manage the data based on the load as the load or the size of the data changes.  08:50 Lois: Ron, what is the purpose of the coordinated backup and restore system in Oracle Database Sharding? Ron: So, basically when we talk about a coordinated backup and restore, remember in a sharded database, I have different databases. Each database is a shard. When you take a backup, each database creates its own backup.  So to have consistent data across all of the shards for the whole schema, it is extremely important for these databases to be coordinated when the backup is taken, when the restore is being done. So you have consistency of the data maintained across all of the shards.  09:28 Nikita: So, how does this coordination actually happen? Ron: You don't submit this through our main. You submit this through the Global Management tool that is used for the sharded database. And it's the Global Management tool that is actually submit your request to each database, but maintains the consistency of when the actual backup is taken, what SCN.  So that SCN coordination across all of the shards is then maintained for the backup so you can create a consistent backup or restore to a consistent point in time across the sharded database. So now this system was enhanced in 23C to support multiple destinations.  So you can now send your backup to an object store. You can send it to ZDLRA. You can send it to Amazon S3. So multiple locations can now be defined where you can send these backups to. You can also use multiple recovery catalogs.  So let's say I have data that is located on different countries and we have requirement that data for each country must stay in that country. So I need to also use a separate catalog to maintain that partition.  So now I can use multiple catalog and define which catalog is maintaining which partition to satisfy those type of requirements or any data administration requirement when it comes to backup recovery. In addition, you can also now specify different type of encryption to be used, whether you want to have different type of encryption algorithm for each of the databases that you're backing up that is maintained. It can be identified, and then set up for each one of those components.  So these advancements now allow you to manage this coordinated backup and restore with all of the various specific configuration that may be required based on the data organization. So the encryption, now can also be done across that, as I mentioned, for different algorithms. And you can define different components.  Finally, there is much better error handling and response available through this global system. Since things have been synchronized, you get much better information into diagnosing any issues.  12:15 Want to get the inside scoop on Oracle University? Head over to the Oracle University Learning Community. Attend exclusive events. Read up on  the latest news. Get first-hand access to new products. Read the OU Learning Blog. Participate in challenges. And stay up-to-date with upcoming certification opportunities. Visit www.mylearn.oracle.com to get started.  12:41 Nikita: Welcome back! Continuing with the updates… next up is the automatic bulk data move on sharding keys. Ron, can you explain how this works and why it's significant? Ron: And by the way, this doesn't have to be a bulk data. This could be just an individual row or it could be bulk data, a huge piece of data that is going to be moved.  Now, in the past, when the shard key of an existing record was going to be updated, we basically had to remove that row from the table, so moving it to a temporary table or moving it to another location. Basically, you're deleting the row, and then change the value and reinsert the row so the row would then be inserted into the proper location.  That causes a lot of work and requires specific code-writing and whatnot to manage those specific type of situations. And of course, if there is a lot of data, now, you're moving those bulk data in twice.  13:45 Lois: Yeah… you’re moving it to one location and then moving it back in. That’s a lot of double work, not to mention that it all needs to be managed manually, right? So, how has this process been improved? Ron: So now, basically, you can just go ahead and update the value of the partition key, and then data will then automatically move to the new location. So this gives you complete flexibility of the shard key values.  This is also completely transparent, and again, completely managed behind the scenes. All you do is identify what is going to be changed. Then the database will maintain the actual data location and movement behind the scenes.  14:31 Lois: And what are some of the specific benefits of this feature? Ron: Basically, it allows you to now be flexible, be able to update the shard key without having to worry about, oh, which location does this value have to exist? Do I have to delete it, reinsert it? And all of those different operations.  And this is done automatically by Oracle database, but it does require for you to enable row movement at the table level. So for tables that are expected to have partition key updates kind of without knowing when that happens, can happen, any time it happens by the clients directly or something, then we may need to enable row movement at the table level and leave it enabled. It does have tiny bit of overhead of maintaining these row locations behind the scenes when enabled, as it maintains some metadata behind the scenes.  But for cases that, let's say I know when the shard key is going to be changed, and we can use, let's say, a written procedure or something for that when the particular shard key is going to be changed. Then when the shard key is updated, the data will then automatically move to the new location based on that shard key operation. So we don't need to move the data manually in and out or to different locations.  16:03 Nikita: In our final segment, I want to bring up the update on splitting and moving a partition set, or basically subpartitioning tables and then being able to move all of the data associated with that in a bulk data move to a new location. Ron, can you explain how this process works?  Ron: This gives us a lot of flexibility for data management based on future requirements, size of the data, key changes, or key management requirements.  So generally when we use a composite sharding, remember, this is a combination of user-defined partitioning plus the system partitioning put together. That kind of defines a little bit more control over how the shards are, where the data is distributed evenly across the shards.  So sometimes based on this type of configuration, we may actually need to split partition and that can cause the shard key values to be now assigned to a new shard space based on the partitioning reconfiguration. So data, this needs to be automatically managed. So when you go ahead and split partition or partitionsets, then the data based on your configuration, based on your identification can automatically move to the new location automatically between those shard spaces.  17:32 Lois: What are some of the key advantages of this for clients? Ron: This provides a huge benefit to clients because it allows them flexibility of better managing their configuration, expanding both configuration servers, the structures for better management of the data and the load. Data is completely online during all of this data move. Since this is being done behind the scenes by the database, it does not impact the availability of the data for anyone who is actually using the data.  And then, data is generally moved using transportable tablespaces in big bulk and big chunks. So it's almost like copying portions of the files. If you remember in Oracle database, we could take a backup of big files as image copy in pieces. This is kind of similar where chunks of data can then be moved and then transported if possible depending on the organization of the data itself for those particular partitions.  18:48 Lois: So, what does it look like in practice? Ron: Well, clients now can go ahead and rearrange their data structure based on the adjustments of the partitioning that already exists within the sharded database. The bulk data move then automatically triggers once the customer execute the statement to go ahead and restructure the partitioning. And then all of the client, they're still accessing data. All of the data operation are completely maintained behind the scene.  19:28 Nikita: Thank you for joining us today, Ron. If you want to learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Nikita Abraham… Lois: And Lois Houston signing off! 19:51 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
20m
20/08/2024

Database Sharding: Part 1

In this two-part episode, hosts Lois Houston and Nikita Abraham are joined by Ron Soltani, a Senior Principal Database & Security Instructor, to discuss the ins and outs of database sharding. In Part 1, they delve into the fundamentals of database sharding, including what it is and how it works, specifically looking at Oracle Database Sharding and its benefits. They also explore the architecture of a sharded database, examining components such as shards, shard catalogs, and shard directors.   Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/137192/207062   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Nikita: Hello and welcome to the Oracle University Podcast. I’m Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! The last two weeks of the podcast have been dedicated to all things database security. We discussed why it’s so important and looked at all the new features related to database security that have been released in Oracle Database 23ai, previously known as 23c.  00:55 Nikita: Today’s episode is also going to be the first of two parts, and we’re going to explore database sharding with Ron Soltani. Ron is a Senior Principal Database & Security Instructor with Oracle University. We’ll ask Ron about what database sharding is and then talk specifically about Oracle Database Sharding. We’ll look at the benefits of it and also discuss the architecture.  Lois: All this will help us to prepare for next week’s episode when we dive into each 23ai new feature related to Oracle Database Sharding. So, let’s get to it. Hi Ron! What’s database sharding?  01:32 Ron: This is basically an architecture to allow you to divide data for better computing and scaling across multiple environments instead of having a single system performing the work. So this allows you to do hyperscale computing and other different technologies that are included that will allow you to distribute your queries and all other requests across these multiple components to be able to get a very fast response.  Now many times with this distributed segment across each kind of database that is called a shard allow you to have some geographical location component while you are not really sharing any of the servers or the components. So it allows you separation and data management for each of the shards separately. However, when it comes to the application, the sharded database is totally invisible. So as far as the application is concerned, they connect to a global service, submit their statements. Everything else is managed then by the sharded database underneath.  With sharded tables, basically it gets distributed across each shard. Normally, this is done through horizontal partitioning. And then the data depending on the partitioning scheme will be distributed across like server A, server B, server C, which are independent servers that are running independent databases.  03:18 Nikita: And what about Oracle Database Sharding specifically? Ron: The Oracle Database Sharding allows you to automate how the data is distributed, replicated, and maintain the kind of a directory that defines the complete sharding scheme, while everything is distributed across many servers with no sharing whether the hardware or software. It allows you to have a very good scaling to be able to scale based on this partitioning across all of these independent servers.  And based on the subset and the discrete data configuration, you can go ahead and distribute this data across these components where each shard is an independent data location or data component, a subset of data that can be used, whether individually on its own or globally across all of the shards together. And as we said to the application, the Oracle Database Sharding also looks as a single component.  04:35 Lois: Ron, what are some of the benefits of Oracle Database Sharding? Ron: With Oracle Database, you basically have linear scaling capability across as many shards as you like. And all of the different database configurations are supported with this. So you can have rack databases across the shards, Oracle Data Guard, GoldenGate. So all of the different components are still used to give you all of the high availability and every other kind of functionality that we generally used to having a single database with.  It provides you with fault toleration. So each component could be down. It could have its own replicated data. It doesn't affect other location and availability of the data in those other locations.  And finally, depending on data sovereignty and configuration, you could actually distribute data geographically across the different locations based on requirements and also data access to provide a higher speed for local data management.  05:46 Lois: I’d like to understand more about the architecture of Oracle Database Sharding. Ron, can you first give us a broad overview of how Oracle Database Sharding is structured? Ron: When it comes to dealing with Oracle Database architecture, the components include, first, your shards. The shards-- each one is an independent Oracle Database depending on the partitioning you decide on a partition key and then how the actual data is divided across those shards.  06:18 Nikita: So, these shards are like separate pieces of the database puzzle…Ok. What’s next in the architecture? Ron: Then you have shard catalog. Shard catalog is a catalog of your sharding configuration, is aware of all of the components in the shard, and any kind of replicated object that master object exists in the shard catalog to be maintained from there.  And it also manages the global queries acting as a proxy. So queries can be distributed across multiple shards. The data from the shards returned back to the catalog to group together and then sent back to the client.  Now, this shard catalog is basically another version of an Oracle Database that is created independently of the shards that include the actual data, and its job is to maintain this catalog functionality.  07:19 Nikita: Got it. And what about the shard director? Ron: The shard director is like another form of a global service manager.  So it understands the sharding by being able to access the catalog, knows where everything exists. The client connection pool will hit the shard director. In general, communication and then whether it's being distributed to the shard catalog to be able to proxy it, or, if the key is available, then the director can send the query directly to the shard based on the key where the data exists. So the shard can then respond to the client directly. So all of the connection pool and the components for global administration, generally managed by the shard director.  08:11 Nikita: Can we dive into each of these components in a little more detail? Let’s go backwards and start with the shard director. Ron: The shard director, as we said, this is like a global service manager. It acts as a regional listener where all of the connection requests will be coming to the shard director and then distributed from that depending on the type of connection that is being used.  Now the director understands the topology--maintains the complete understanding of the mapping of the data against the shards. And based on the shard key, if the request are specified on the specific key, it can then route the connection request directly to the shard that is appropriate where the data resides for the direct response.  09:03 Lois: And what can you tell us about the shard catalog? Ron: The shard catalog, this is another Oracle Database that is created for special purpose of holding the topology of the sharded database. And have all of the centralized information metadata about your sharded database. It also act as a proxy.  So, if a client request comes in without providing a shard key, then the request would go to the catalog. It can be distributed to all of the shards. So the shards that you actually have the data can respond, but the data can then be combined and sent back to the client. So, it also creates the master copy of all the duplicate tables that are created in the shard database.  09:56 Lois: Ok. I’ve got it. Now, let’s talk more about the shards themselves. Ron: Each shard is basically a database. And data is horizontally partitioned to be placed on each of these shards. So, this physical database is called the shard. And depending on the topology of your sharding, there could be user sharding, for example, where multiple keys are in a single shard or could be a system sharding that based on the hash value data is distributed whether singly or multiple data components across each shard.  Now, this is completely transparent to the application. So, as far as application is concerned, this is a single database and the response everything that they do is generally just operating as a single database interaction.  However, when it comes to the administrators, each shard is a separate database. Each shard can be managed independently and can have its own standby and other components that is then set up for high availability and management of the data operations.  11:21 Do you have an idea for a new course or learning opportunity? We’d love to hear it! Visit the Oracle University Learning Community and share your thoughts with us on the Idea Incubator.  Your suggestion could find a place in future development projects! Visit mylearn.oracle.com to  get started.  11:41 Nikita: Welcome back! Let’s move on to global services and the various sharding methods. Ron, can you explain what global services are and how they function in a sharded database? Ron: Global services is generally the service that is used for the application to be able to connect to the sharded database. This is provided and supported through the shard director. So clients are routed using this global service.  12:11 Lois: What are the different sharding methods that are available?  Ron: When it comes to sharding methods that were available, originally we started with the system sharding, which is a hash partition, basically data is distributed evenly across the shards. Then we needed to allow for the user-defined sharding because sometimes it's not about just distributing the data evenly, it's also about controlling where the data goes to be able to control individual query execution based on the keys. And even for data sovereignty and position of the data itself.  And then a composite sharding, which provides you kind of a combination of the user-defined sharding and the system hash sharding that gives you a little bit of a combination of the two to better distribute your data across the shard.  And finally, sub-partitioning all types of sub-partitionings are supported to provide a better structure of the data depending on the application schema design.  13:16 Nikita: Ron, how do clients typically connect to a sharded database? Ron: When it comes to the client connections, all the client connections are generally routed to the director and then managed from there. So there are multiple ways that clients can connect. One could be a direct connect. With a direct connect, they're providing the shard key in the request. Therefore, the director knowing the topology can route the client directly to the shard that has the data.  The proxy routing is done by the catalog. This is when generally a shard key is not provided or data is requested from many shards. So data will then request is then sent to the catalog. The catalog database will then distribute the query to the shards, collects the results, and then combine sending it back to the client acting as a proxy sitting in the middle.  And the middle tier routing, this is when you can expose the middle tier to the structure of your sharding. So when the middle tier send the request, the request identifies which shard the data is going to. So take advantage of that from the middle tier. So the data is then routed properly. But that requires exposing the structures and everything in the middle tier.  14:40 Lois: Let’s dive a bit deeper into direct routing. What are the advantages of using this method? Ron: With the client request routing, as we talked about the direct routing, this allows the applications to get very quick data access when they know the key that is used for the distribution of the data. And that is used to access the data from the shard.  This provides you a direct connection to a shard from the shard director. And once the connection is established, then the queries can get data directly across the shard with the key that is supplied. So the RAC respond for that particular subset of data with the data request.  Now with the direct routing again, you get some advantages. The advantage is you have much better performance for capturing subset of the data because you don't have to wait for every shard to respond for a particular query.  If you want to distribute data geographically or based on the specific key, of course, all of that is perfectly supported. And kind of allows you to now distribute your query to actually the location where the data exists.  So for example, data that is in Canada can then be locally accessed in Canada through this direct access. And of course, when it comes to management of your client connection, load balancing of those connections. And of course, supporting all types of queries and application requests.  16:18 Nikita: And what about routing by proxy? Ron: The proxy routing is when queries do not supply the actual sharding key, where identifies which shard the data reside. Or the actual routing cannot be properly identified. Then the shard director will send the request to the catalog performing the work as the proxy.  So proxy will then send a request to all of the shards. If any shards can be eliminated, would be. But generally all of the shards that could have any portion of the data will then get the request. The requests are then sent back to the proxy. And then the proxy will then coordinate the data going back and forth between the client.  And the shard catalog basically hands this type of data access to the catalog to act as the proxy. And then the catalog is-- the shard director is no longer part of the connection management since everything is then handled by the shard catalog itself.  17:37 Lois: Can you explain middle tier routing, Ron? Ron: This generally allows you to use the middle tier to define which shard your data is being routed into. This is a type of routing that can be used where the data geographically have some sovereignty or the application is aware of the structure.  So the middle tier is exposed to the sharded database topology. So understand exactly what these components are based on the specific request on the shard key, then the middle tier can then route the application to the appropriate location for the connection.  And then the middle tier, and then the either one shard or the subset of shard will maintain those connections for the data access going back and forth since the topology is now being managed by the middle tier. Of course, all of the work that is done here still is known in the catalog, will be registered in the catalog. So catalog is fully aware of any operations that are going on, whether connection is done through middle tier or through direct routing.  18:54 Nikita: Ron, can you tell us how query execution and DDL operations work in a sharded database? Ron: When it comes to the query execution of the application, there are no changes, no requirement for identifying specifically how the data is distributed. All of that is maintained behind the scene based on your sharding topology.  For the DDL, most of your tables, most of the structures work exactly the same way as it did before. There are some general structures that are associated to the sharded database that we will originally create and set up with mapping. Once the mappings are configured, then the rest of the components are created just like a regular database.  19:43 Lois: Ok. What about the deployment process? Is it complicated to set up a sharded database? Ron: The deployment for the sharded database is fully automated using Terraform, Kubernetes, and scripts that are put together. Basically what you do is you provide some of your configuration information, structure of your topology through an input file, like a parameter file type of a thing.  And then you execute the scripts and then it will build everything else based on the structure that you have provided.  20:19 Nikita: What if someone wants to migrate from a non-sharded database to a sharded database? Is there support for that? Ron: If you are going to migrate from a regular database to a sharded database, there are two components that are fully shard aware.  First, you have the Shard Advisor. This can look at your current structure, the schema, how the data is distributed. And the workload and how the data is used to give you recommendation in what type of sharding would work best based on the workload.  And then Data Pump is fully aware of the sharding component. Normally, we use Data Pump and load into each of the databases individually on its own. So instead of one job having to read all the data and move data across many shards, data can be loaded individually across each shard using Data Pump for much faster operations.  21:18 Lois: Ron, thank you for joining us today. Now that we’ve had a good understanding of Oracle Database Sharding, we’ll talk about the new 23ai features related to this topic next week. Nikita: And if you want to learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Until next week, this is Nikita Abraham… Lois: And Lois Houston signing off! 21:45 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
22m
13/08/2024

Database Security: Part 2

In this episode, hosts Lois Houston and Nikita Abraham continue their exploration of Oracle Database 23ai's database security capabilities. They are joined once again by Ron Soltani, a Senior Principal Database & Security Instructor, who delves into the intricacies of the new hybrid read-only mode for pluggable databases, the flexibility of read-only users and sessions, and the newly introduced developer role. They also discuss simplified schema-level privileges and the integration of Azure Active Directory with Oracle Database.   Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/137192/207062   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.   ---------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative  podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me today is Nikita Abraham, Principal Technical Editor. Nikita: Hi everyone! In our last episode, we discussed database security, why it is so important, and all its different components. Today, we’re going to be continuing that conversation by looking at all the new features related to database security that have been released in Oracle Database 23ai, previously known as 23c.  00:59 Lois: And we’re so happy to have Ron Soltani back as our guide. Ron is a Senior Principal Database & Security Instructor with Oracle University. Hi Ron! Thanks for joining us again! We have a list of the new features related to database security and we’d like to ask you about them one by one, starting with the new mode for pluggable databases. What’s that about?  01:21 Ron: With the hybrid read-only mode for pluggable database, the database could be in the read/write mode or read-only mode, depending on the user that is actually connected. So one of the things we have to realize is the regular read-only mode has one major issue. The major issue is everything, including data dictionary, including SysAux and all of the other elements are also locked up read-only.  So we cannot do any database maintenance. We cannot collect statistics to monitor anything. So you pretty much have to hard tune everything for the load you want and maintain everything. And this happens in many warehouse environments, in environments where the data itself is generally loaded. And then just heavily read. So it requires to be in a read-only mode to protect it.  So with a hybrid read-only mode, if you are a local user in the PDB, even a PDB administrator-- so I can create a local user in the PDB as a PDB administrator. And grant that PDB administrator even sysdba privilege. But once the PDB is open hybrid read-only mode, even for that user, the PDB is read-only. However, if a common user connect, who is, as you know, is a CDB user. Generally, CDB-level privileges granted and considered CDB administrators.  If they connect to the PDB, then the PDB is actually in read/write mode. So now, they can take snapshots. They can use all of the database tools to monitor how things are going. They can perform maintenance. So this allows us to be able to perform patching, maintenance, and other database-related operation. 03:17 Nikita: So you don’t have to flip back and forth between read-only, read/write, read-only, read/write…  Ron: Because you know if we have database read/write to go to read-only, generally, we would have to shut down the database, then go to read-only. Then from read-only, we can go to read/write. But then going back to read-only, we have to shut down again.  Lois: Which was the issue with the normal read-only on the pluggable database, right? I’m glad that’s been made easier. Ok… Moving on to the next new feature, which is read-only users and sessions. What can you tell us about this one, Ron? 03:51 Ron: As we previously discussed, you can put the PDB in the hybrid read-only mode. But then now the PDB is read-only for all local application users. However, let's say we have an environment where you have multiple application users.  One needs to be able to perform maintenance and perform updates where other sessions who are just reading the data to protect against all security element, and then better performance and operation management. We are going to set up read-only.  So setting up read-only at the pluggable database, that can be very high level depending on the application need. So with the read-only users and session, this will give you capability of setting read-only either for a particular user.  So when the user connects, all the user can do is read-only process. We do a lot of testing, for example. And we have users that may have read/write privilege in the test environment, then we want to go ahead and perform other operation.  So we would have to take privileges away, set the read-only, then go back and change again to read/write. So performing all of those different type of tests and even with the development has always been an issue.  So having granular capability of managing at a user or a session level can give us a major benefit of better granularly managing all application needs without sacrificing either security or having extra components that would have to be done by administrators.  05:33 Nikita: Yeah, this gives you a lot of flexibility and you don’t have to keep temporarily changing privileges or configuring specific types of sessions. It’s also an easy way to control user behavior, right? Ron: An application, as we said, have the schema owner that today we want to have a schema-only user for the schema owner. That is usually nobody connect us.  But then we have multiple schema users that one may be used for performing updates, one is used for administration, and one can be used for read-only. So this can give me a mechanism to manage that, or if a particular operation needs to run and for security purposes, that particular session needs to be set to read-only. So that gives us major control over it.  And in the cloud environment, this can be a very, very good component for better managing all of the security levels, where you can enable very fine-grained control while supporting all functionality of the application.  06:39 Lois: Ok. So, can you tell us about this new developer role in the database?  Ron: If we think about application administration, usually we create a schema owner. And we start by giving that the schema owner privileges-- grant them a resource role. By having resource role, they can create simple objects. But when you design an application, you need to implement it, test that, and then deploy it.  Today, there are many, many complex objects that can be used at the application level to manage the application. So today, we grant the resource role to the schema owner. Then we wait until they complain. They don't have privilege for certain object they want to create. Then we're going to have to grant them privileges as needed, and that used to be the way the security had worked.  But today since we have a schema only account where we can only enable the account when we want to do any type of schema work, and then it's locked up so the schema is protected, giving the schema owner the application role, the DB application role, now that has all the privileges in it, should not cause any security issue when managed properly, and will provide them with all of the privileges that they need to perform their work, including there are many complex schema structure like analytical views, hierarchies, dimensions, data-specific types that you can create.  And many of these type of privileges are not just assigned through a regular privilege assignment. Some of them are assigned through procedures.  08:21 Lois: And could you give us some examples of how this feature could be used? Ron: So there are many different ways of granting all of these granule privileges. So at the time that we go ahead and perform development of the schema and all of that depending on what's available, we don't know really what privileges do we need.  And as we said, there are many packages that we may be able to use to create complex objects that then gradually have to go ahead and get privileges on executing those packages and to be able to use them. And as we said at the time we actually performed the application, many of these objects, we may not even know we're going to use them until later on becomes evident or it may be a better structure to represent what we want.  So having to add and continuously deal with these type of changes can become extremely kind of cumbersome and tedious. It also delays all of the operations, especially now that the application schema owner can be secured. So we can grant this developer role to the schema owner, give the schema owner all privileges that is needed very quickly that they can now manage their schemas and manage all complex objects for that schema operation.  So the role is called db developer role. And just like any other role, you would connect as an administrator, grant db developer role to the schema owner. Now, we don't need to grant the resource role and all other things, because everything here is included in the db developer role.  10:01 The Oracle University Learning Community is an excellent place to collaborate and learn with Oracle experts and fellow learners. Grow your skills, inspire innovation, and celebrate your  successes. All your activities, from liking a post to answering questions and sharing with others, will help you earn a valuable reputation, badges, and ranks to be recognized in the community. Visit www.mylearn.oracle.com to get started. 10:28 Nikita: Welcome back! Ron, how have schema-level privileges been simplified in 23ai?  Ron: To be able to understand this, first we can review the privilege assignment in Oracle Database. First, you can be granted a privilege at an object level, so you can perform certain work on a particular object.  However, let's say I have a user account that I'm going to use an app user who's going to have to read from multiple objects within a particular schema. Now this granting at the object level is too low because I have to go at each object and assign the privileges needed on that particular object to the user.  Or we had our system privilege, for example, grant create any table to a user. The problem with that is now you can create any table within the schema that I want you to work with. But that privilege goes across all the schemas in the database, of course, not the database schemas itself-- those are protected, but across all user schemas.  11:34 Lois: Right. So, you're getting that privilege on other schemas that you may not really need that privilege for... Ron: So now the gap is kind of met with creating a schema-level privilege that allows you to grant the same any privilege but on all objects of a particular schema and not granted across all the schemas.  So this now allows us to much better be able to manage schemas, have schema user accounts with different level privileges on all the objects that they need to perform the type of work that they need to, without having to granularly assign each one of those privileges as we used to create many different roles with different privileges needed, then try to control the users by granting them those roles. Here, these are much better simplified by going through the schema-level privilege.  12:34 Nikita: Ron, I want to ask you about the new feature on creating audit policies at the column level. Ron: So if you remember, in the past, we talked about we can create audit policies with the old system where you would identify what to audit. But then you had to manage a whole bunch of parameters and security. And protecting audit even from the administrator were major issues.  In 12, Oracle identified or added the unified audit, which gives you protection on the audit schema. Even administrators cannot access it. You manage it through privileges that are assigned specifically to users who are going to manage the audit.  And it also allow you to audit Oracle operations, tools like Data Pump, like RMAN. So you can create a really secure audit environment monitoring everything in the database using unified audit and then maintain and manage those audits.  One of the important aspect of auditing is generating the minimal amount of audits. So this way, audits can be reviewed because if you generate too much audit, it is very hard to automate either using an automated system to review the audits or having users to review those audits.  Furthermore, if we wanted to then audit specific columns and different operation like SELECT, DML, we would have had to use the row-level security and build additional policies to be able to then individually monitor those columns, which not very simple to use and manage. And then the audits are put in different tables. Having to maintain all of those, relate them has always caused major issue overall.  So the benefit of having now this column-level audit added to the normal unified audit policies is that you can go ahead and build now your audits instead of at the table level, only for a particular column. This is going to reduce the false positive results that are generated because if I'm going to put update on a table, not updating any column can generate an audit.  But if I put update on the column salary, then only if the salary is updated, the audit is generated. So that can give me just the audits that are needed without the additional false positive audits that are generally generated.  15:08 Lois: Ron, can you talk to us about the management of authorization for Unified Audit administration, especially when using Database Vault? Ron: So first as we know for the Unified Audit, you have audit admin privilege and audit viewer privilege.  If you want to be able to create and administer and manage all of the audit information, including the audit purging and time periods and all of that, you have to have audit admin privilege. If you want to be able to read and generate the reports or things like that from the audits that have been created, you have to have audit viewer privilege.  Now we also have Oracle Database Vault. Database Vault kind of uses a row level security, but not on the end user data. It applies this row level security and administration on Oracle data dictionary. And allows you to control when particular object can be used, at what level can they be used?  And give you complete control over how the actual database and the objects are used and become available to other users in the database, including other administrators, even schema owners. So when the Database Vault is then applied and enabled, in the past, we could have managed the Unified Audit, which was kind of very funky to put one of the major security functions outside the main security Administration utility of the database.  So now, the Unified Audit has been incorporated into the Database Vault. So you can now use Database Vault to go ahead and set up the privileges and configuration for the authorizations required for managing Unified Audit. This also controls all the high-level users, including SYS, SYSTEM, and anyone who may have DBA roles or other high-level privileges.  So this allows us to now enable the Database Vault, and then manage the authorizations for the Unified Audit through Database Vault. Therefore, all authorization administration is unified under the same security tool, which is Database Vault.  17:28 Nikita: The final new feature to discuss is the integration of Microsoft Azure Active Directory with the Oracle database environment. What can you tell us about it, Ron? Ron: This has been requested by many of the clients who use other platforms and active directories and then need to access either the Oracle OCI, Oracle Cloud where the databases are running or having Oracle databases even in a local environment.  So wanted to be able to now allow this to happen. So if you remember, originally we had capability of mapping users from the database into Oracle Active Directory. So this way the user's role privileges can be centrally managed and the user does not inherit any privileges in the database. So if the user directly connect to database, has no privileges. Connect properly through Active Directory, everything enabled.  Then in Database 18, they created the commonly managed users, the CMUs. Where we could now map a third party Active Directory and then be able to use that into connecting to Oracle database for authentication and user administration. However, many of our clients use Microsoft Azure Active Directory. And they wanted to be able to integrate that particular Active Directory into Oracle environment, especially in the Oracle OCI Database as a service environment.  So to be able to do that, Oracle has multiple components that they have built to allow this to be able to now be configured and used. So the client can use these Active Directory for their user administration centrally.  19:20 Lois: With that, I think we’ve covered all the new features related to database security in 23ai. Thanks so much for taking us through all of them and giving us some context. Nikita: Yeah, it’s really been so helpful. To learn more about these new features and watch some demonstrations on them, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Nikita Abraham… Lois: And Lois Houston signing off! 19:54 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
20m
06/08/2024

Database Security: Part 1

Join hosts Lois Houston and Nikita Abraham, along with Senior Principal Database & Security Instructor Ron Soltani, as they dive into the critical topic of database security. In the first of a two-part series on database security in Oracle Database 23ai, they discuss the importance of protecting data against external and internal threats, common security risks like phishing and SQL injection, and the principle of least privilege.   Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/137192/207062   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative  podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Nikita: Hello and welcome to the Oracle University Podcast. I’m Nikita Abraham, Principal Technical Editor with Oracle University, and joining me is Lois Houston, Director of Innovation Programs. Lois: Hi there! In case you missed last week’s episode, we’ve begun a new season of the podcast, talking about all the new features in Oracle Database 23ai. We covered blockchain tables and new features, and today’s episode is going to be one of two that will be dedicated to database security. Nikita: Right, Lois. So, in Part 1, we want to set the scene, so to speak, by looking at an overview of database security so that when we discuss some of the new features, we’ll know exactly where they actually fit into the process. Joining us for these two episodes is Ron Soltani. Ron is a Senior Principal Database & Security Instructor with Oracle University.  01:16 Lois: Hi Ron! Thanks for being with us today. To start off, let's discuss the importance of database security. Why is database security so critical today? Ron: Security requirements, describes the need for keeping things private and make sure that we protect against threat, against data destruction. We also have, today, data that is global. Therefore, there is consolidation of the data. There is globalization.  There is data sourcing, locational, where the data is actually located, rules opposed by different governments, and guidelines that enforce a certain type of security administration on the data. And finally, there are many different companies or organizations that actually come up with either guidelines or rules that must be followed for security aspect that we must set up and build compliance.  02:24 Nikita: Ron, what are some of the common security risks that databases face?  Ron: Security risk can include external threats that could be unauthorized users trying to use phishing, get privileged user information, and get in as a privileged user to do whatever damage they want. Denial-of-service attack, one of the most common attacks out there where the attackers just create or attack the components, like a listener, for example, in a database, and cause a situation where the listener can no longer establish connection to the database. So now no client can connect to the database to get data, which is that denial-of-service attacks.  Having unauthorized access to the data-- so again, this is generally done through phishing or sometimes even SQL injection. SQL injection also allows you to insert SQL statement in the application where it's not expected, where it can then convert into an executable in the database and then have unwanted data returned for the user.  03:42 Nikita: Sorry, can you explain that? Ron: For example, when you go to Google, you want to run a search. They expect you to say, meaning of a particular word.  Now, what if I knew the structure of the data organization in Google? And instead of just putting in meaning of whatever word, I actually plug in a SQL statement that then passed along to the Google system to be executed. And then that SQL, if the components and everything exist and within the privileges of what is being executed, could expose some information to me. So that's the idea with being able to perform that type of operation.  04:24 Lois: Ok. So, those are external threats. But, could you also have internal threats? Ron: Internal threat could be abused by someone who is privileged, could be sabotage of the system and the data. It could be data complexity that creates an environment where data is not properly being secured and even accidental damage. It's a security issue.  And then finally, if there is a damage, we do need to be able to perform recovery. So we create backups and data access in those. Therefore, those recovery information must be properly secured. And finally, the omission, being able to block access or cause issues with the data. Then having external threats coming in through the internal abuse, so internal abuse could actually open door to allow external threats to get in.  Now, the final type of security risk could be coming in from partners who have privilege to be able to load or access and get data. For example, I may sell a particular product. But the product description is actually coming from the product distributor.  05:47 Nikita: Yeah, so they have access to push that product information into your system. So, what are the typical points of attack for a database? I’m familiar with phishing. Ron: People send you emails or do something to be able to get information from the pieces and things that come back. For example, this is one of the reason for many operation. We would return false error messages. Like in Oracle database, if you don't have privilege on the table and you try to select it, we tell you a table or a view does not exist. So this way, you don't know if it's a table, you don't know if it's a view. And as far as you know, it doesn't exist. So the name you have does not correspond to any particular data.  06:32 Nikita: That’s clever! Ron: If we would tell you don't have privilege, now you know the name of this table exists. So now I just got to find a way of hacking the table. So this is basically phishing means, extracting different pieces of information through different channels, being able to put them together.  Then in database, we have some privilege known accounts that if not protected can be a vulnerable access. The back doors into the database. For example, somebody being able to get to the operating system DBA group, and then connect to the database without user ID and password.  That's why we have to protect every layer. Any debug codes that may be available that could reference how the operation of the system is actually going. Creating cross-scripting between the different data and then operations that goes on. And as we talked about, SQL injection.  07:28 Lois: Can you dive a little deeper into SQL injection, Ron? Ron: With SQL injection, you kind of have to understand that, in general, SQL injection means somebody, like we said, knows the structure of something, knows the structure of the way the application is operating, and then be able to inject a SQL statement where they would generally put a condition or pass some parameters or some information to the application.  So, then that SQL statement becomes part of that statement and submitted to the database. Now, we need to understand SQL injection is not about the person, is not about generally your overall configuration of the database.  The most important aspect of SQL injection is about the session that is actually doing the work. For example, if I am a DBA and I am going to collect statistics for a table. If I connect a SYS DBA to collect that statistics and somebody hacks into my session and inject a SQL drop database. Database gone, because the session has SYS DBA privilege.  But if I have a user that only has create session privilege and execute a script. And in this script, I write the statement to collect statistics and I give that script only the privilege to collect stats. So now I can connect as that user with that minimal privilege, just execute the script.  So now anyone inject any SQL into the session, that will never be executed because the session has no privilege. So this is the important of SQL injection for us to understand that the importance is what happens at the session level.  And many of the security element we will see, like read-only session, hybrid read-only PDB and things like that are related into this type of SQL injection or abuse.  09:30 Lois: Yeah, we are looking forward to talking through those new features in the next episode. Ron: So the common vulnerabilities can be exploited, and also any of the users that are part of the operations that can be set up into the string and supplied into middle of statements and things like that.  09:56 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You’ll find training on everything from cloud computing, database, and security, artificial intelligence and machine learning, all free to subscribers. So, what are you waiting for? Pick a topic, leverage the Oracle University Learning Community to ask questions, and then sit for your certification. Visit www.mylearn.oracle.com to get started. 10:25 Nikita: Welcome back! One of the concepts I wanted to ask you about, Ron, is the principle of least privilege. Can you explain what it really means? Ron: Principles of least privilege, again, means the work that needs to be done has to have minimal privilege.  10:40 Nikita: But, we’ve always thought about that, right? Giving a user minimal privileges… Ron: Well, back in the old days, we used to execute everything as a schema owner.  Therefore, we had privilege on all the data. Then we said, OK, let's create schema users and only give them like a read privilege or this privilege. So they can only do the type of work they need to do, which is fine.  But at the same time, that can be very complex. Now I need a lot of different users and whatnot. So when it comes to principles of least privilege, this is generally about only installing whatever software that is required. Only enable or turn on whatever machines and segments that is going to be used.  Have proper operating system level users and privileges configured for all of the software that is installed at the operating system level. Have proper administrator account that are properly maintained. Set up privilege user account for each operation.  So when we do maintenance and database administration, we are not creating very high-level privileged session. That's why some of the differences privileges was created in database 12 and up, like Sys backup, Sys DG, Sys RAC. So you don't inherit the privilege as Sys DBA to actually do the work. You only have privileges for what you need.  And, of course, limit the user's access to particular object and things that they need to do. However, as I mentioned, this is not just about the user level. This is also about the session level. If I'm going to do maintenance and I'm connecting as a schema owner, somebody inject a SQL drop table, table gone. So that's why it is very important for us to be able to have control over how sessions can also operate within the database.  12:34 Lois: Right, so, what about the strategy of defense in depth? Ron: Defense in depth. That means we have to strengthen and apply security at every level, whether it being at the securities applied at the operating system in the database, in the application, in the network.  So we have to have policies defining all the different security levels. Most important, train users. So no mistakable damages. Harden every component, including the operating system. Set up proper firewalls.  Set up proper network security, like use of the Oracle firewall that protects against unwanted SQL statements. We can compare SQL statement to a whitelist of acceptable statements. And then other database security features like VPD, the auditing as we will talk about, and other components to give you an overall very secure environment.  13:35 Nikita: Ron, what are the fundamental aspects of managing security only within the database… not including the operating system or the application?  Ron: So first, we have to have confidentiality. Confidentiality means that we need to make sure that all of the data is properly secured at a data level, whether it be both at the storage level, in the database for data usage, and we have many different ways of doing confidentiality management.  Number one, properly creating users, maintaining users with the proper password through proper authentication. And then setting up authorization that privileges may not be enough because if I give you select privilege, you can see every column, every row.  So I may need row level security, data redaction, data masking for duplication, and other mechanism to help us manage even subset of data for that particular security.  14:35 Lois: Ok… so that’s confidentiality. What’s next? Ron: Data integrity means that we need to make sure that data is not destroyed, whether it being addressed in the database, in memory, in data file, in backup, in exports, or during transmission in the network.  So we usually apply encryption and check-summing not only to protect the data, but also to validate, make sure it's not corrupted.  Next data availability, which means today, especially, we are 24/7 operation. And remember we talked about denial of service attack on a database. That usually attacks on the listener, because if the listener is crashed, nobody can connect.  We have to then utilize available tools and components like RAC to have multiple instances in case a particular host crashes. And I lose a particular instance. Data Guard in case my storage and a whole database crashes. The PDB and real-time PDB management with duplication, having a PDB standbys that are maintained and managed behind the scene. Using PDB snapshots, which are point in time. Preserve data that I can use it for restoring data at those particular point in time.  Backup recovery through RMAN or other backup recovery processes. So in case data is damaged, I can restore it and recover it. And finally, auditing. Auditing historically was always known as after effect.  16:09 Nikita: That’s what I was wondering… You only see what’s going on after something happens, right?  Ron: It also can be a deterrent when people know they are being audited, they're more careful, don't make mistakes. Try not to, of course, do anything you get caught. And today, this auditing can also be set up in a way that it cannot only catch what is going on. It can actually help us better secure data and have much better responses.  Now the problem with auditing has always been the overhead. That's why the unified audits that provides us with much less overhead for management can give us an extreme detailed audits. And then the new features allows us to even more reduce the amount of audits that are generated by only auditing at the column level and better protection for those audits.  By the way, in the older days, most auditing was done at the app, because we never knew who the end user the app is. But today with being able to have Active Directory mapped into the database and information passed between the two, all audits can actually come back centrally to the database.  17:21 Lois: So, to wrap up today’s conversation, Ron, can you just summarize database security for us? All the things we need to think about. Ron: So database security starts with making sure, number one, our network is secure and we are accessing the data through a very secure connection coming in from the user.  If required to be, could have a three-tier environment where the clients go through a first external firewall to get to the middle tier. Then from the middle tier go through internal firewall to get to the database, or if this is like a direct access and things like that, setting up secure network coming in through that like for administration, remote administrations, and operations.  Then, setting up proper authentication and authentication management, configuring detail, access control and setting up multiple level of security for data accesses, not just at the table level, even at the row and column level.  And building a complete data confidentiality by not only adding in storage encryption and all of the management of the data, even on components sitting outside the database, of course, we have Oracle components that can manage some of those for you, like sample backups, RMAN, and things like that.  And to get this complete data confidentiality, you also add in, as we said, an efficient auditing that can then describe any issues, tell us where the problem is, how it happened. And if we set up an audit system that is very focused, then we can even tie it up to triggers, to notifications.  So they're very quickly responded to, because the problem with audits has always been there is just way too much of them, therefore nobody ever reviews them to see exactly what has happened. So many vulnerabilities may go on detected until a major damage happens.  And that's how you can know that this is common out there in a lot of businesses when you hear in the news. And so and so company was broken in and so much data was stolen. Well, if proper security were set up and the network is being hacked in and proper alert system and automated system were configured to be able to catch these in a proper auditing real-time, then maybe corrective action could have stopped a lot of those damages.  20:04 Nikita: Thanks for that wonderful overview, Ron. In our next episode, we’re going to go through each of the new security features and try to understand how Oracle is tightening the screws around security. Lois: And if you want to learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Until next week, this is Lois Houston… Nikita: And Nikita Abraham signing off! 20:31 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
21m
30/07/2024

Blockchain Tables

In this episode of the Oracle University Podcast, hosts Lois Houston and Nikita Abraham kick off a new season with a deep dive into the latest features of Oracle Database 23ai. Joined by Bill Millar, a Senior Principal Database & MySQL Instructor, they explore the new enhancements to blockchain tables, such as row versions, user chains, delegate signer, and countersignature. So, if you're curious about harnessing the power of blockchain tables for your database needs, this is the perfect episode for you!   Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/137192/207062   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.     --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative  podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hi everyone! Thank you for joining us as we begin a new season of the podcast. For the next few weeks, we’re going to explore all the new features in Oracle Database 23ai, previously known as 23c. These episodes will be great for you if you’re a database administrator, a developer, or even a database architect. Lois: Right Niki, and while anyone can listen to the podcast, you’re probably going to get the most out of this season if you have prior knowledge or experience with the previous versions of Oracle Database and have used SQL to manage Oracle Databases. Throughout this season, we’ll discuss new features in database availability, architecture, manageability, performance, and security.  01:21 Nikita: Exactly. Today, we're diving into the world of blockchain tables and the new features introduced. First, we'll try to get an overview of blockchain tables that were introduced in 21c. Then, we'll discuss the new features in 23ai, including row versions, user chains, delegate signer, and countersignature.  Lois: So, let’s get started. To take us through all this, we are joined today by Bill Millar. Bill is a Senior Principal Database & MySQL Instructor with Oracle University. Hi Bill! Thanks for joining us. To begin, what is a blockchain table? 01:59 Bill: Well, a blockchain table provides the means for recording transactions where only insert operations are allowed. And rows are protected or restricted based on time as defined when the table is created. This makes the rows tamper-resistant with their chaining algorithms.  02:16 Nikita: Bill, take us through some common attributes of a blockchain table. Bill: They are append only, protects the current data in the table. Made tamper-resistant with their hashing algorithm. And optionally, they can be digitally signed. However, they are mandatory in blockchain platform transactions. Transaction logs, audit trails, compliance information, they can most benefit from using blockchain tables.  02:44 Lois: Bill, let’s talk for a minute about the blockchain tables being tamper-resistant. What makes a blockchain table tamper-proof?  Bill: Well, with the insert only tables, each row is going to be chained to the previous row, except the first row. There's nothing to change it to. So once a row is added, it changes it to the previous row, to the previous row. Rows are linked when the transaction commits. We don't link them beforehand because you might roll back.  03:13 Nikita: Do we have some considerations or guidelines for managing blockchain tables? Bill: One, they may be partitioned. You can specify retention at a table level, the blockchain table itself. You can use the no drop clause. And you can also define it blockchain tables at the row level when you create that blockchain table. Defining a retention period for the table itself or a retention period for the rows.  03:41 Nikita: And are there any restrictions when using blockchain tables? Bill: There are several restrictions for the blockchain table. Some of them are… There are some data types that are not supported. The row ID, long, timestamp with time zone, and so forth. And there are other operations not allowed. A few of them are updating rows, merging rows, truncating, dropping them partitions. Converting a regular table to a blockchain table or vice versa.  So you do want to make sure that you understand the restrictions if you decide that you're going to use a blockchain table. There are some things you can alter in a blockchain table. One is you can modify a retention period.  It cannot be reduced. However, you can make it longer.  04:30 Lois: Ok, I think I’ve got it. So, coming to the 23ai features, what’s new with blockchain tables? Could you give us a brief overview of them before we dive into each one? Bill: So we have the user chain, just a chain of rows based off to three user-defined columns. Previously, the system defined the chain.  The row versions…it allows me to have multiple historical views of a row that's going to be-- that is maintained with the blockchain table. We have the log history. The flashback data archive history tables are now blockchain tables.  And there's also a countersignature. So you can request the time of signing a row that it has a signature for that. That signature metadata is going to be stored within the row, within some hidden columns. And then you can also have a delegate signer. It's an alternate to the user who is allowed to sign rows inserted by that primary user.  05:31 Nikita: What are some advantages of using blockchain tables? Bill: There are benefits of using the blockchain tables in transparent from fraud protection and users don't know as they're inserting the data. You can detect it by verifying the rows in the blockchain table. They are not part of the database itself. It can be more secure when you're validating them. And it is easier than distributed blockchains where multiple blockchains with identical data is being maintained across multiple different platforms.  06:03 Lois: And what about benefits specifically from the 23ai new features?  Bill: We have allowed increased flexibility. Just the user-defined itself, instead of having it just rely on the system-defined. It can guarantee row versioning. The blockchain log history to record and protect the changes. The counter signature, along with the digital signature, can help protect it even more.  So you must specify a version. There is no default version, so you must specify whether either it's going to be version 1 or version 2 and create the table. Version 1 is the version from 21c. You have to specify version 2 if you're going to take advantage of some of the new features in 23c.  And with these two different versions, it does reduce the number of columns that you are going to have accessible. Version 1 uses 20 additional columns to maintain that blockchain information, whereas a version 2 blockchain table is going to use 40 additional columns. So that reduces the number of columns that you can use by 40.  Even though version 2 does use more columns for the hidden information, it does have its benefit. It does allow you to add, drop columns. You can drop partitions with version 2. You have distributed transactions. And you can also use with replication, such as Oracle Golden Gate and Active Data Guard.  07:32 Nikita: Are there restrictions when it comes to using blockchain tables? Bill: Again, make sure that you understand the requirements of your tables when determining if blockchain table is going to be appropriate for your application or not.  XMLTypes are not supported. Can't truncate. Doesn't work with sharded tables. Can't work with different policies such as the automatic data optimization, virtual private database, label security. Cannot use the DBMS_REDEFINITION package on a blockchain table.  08:10 Are you planning to become an Oracle Certified Professional this year? Whether you're a seasoned IT pro or just starting your career, getting certified can give you a significant boost. And don't worry, we've got your back! Join us at one of our cert prep live events in the Oracle University Learning Community. You'll get insider tips from seasoned experts and learn from other professionals' experiences. Plus, once you've earned your certification, you'll become part of our exclusive forum for Oracle-certified users. So, what are you waiting for? Head over to www.mylearn.oracle.com and create an account to jump-start your journey towards certification today! 08:53 Nikita: Welcome back! Let’s get into each of those 23ai new features, Bill. What can you tell us about the row versions feature? Bill: With the row version option, it allows you to have multiple historic views of a row corresponding to a set of user-defined columns. Previously, only the system would define the columns.  When you create these, it automatically creates a view to allow you to view information about that blockchain table with the row version. The system is going to create the view with the same columns. However, the name of that view, it's going to take whatever that table name that you create and it's going to append the _Las$ onto it for that.  And it has not only the same columns of your table, but it also has additional columns in there. One of them be that last row version. This is going to allow you to see, what is the latest version of that row? In order to use the row versions, you must specify with the row version clause when you create the table.  It is also supported with or without primary key. The primary key column must not be identical to the set of the row version column. There are some restrictions, though. So you must specify-- you must specify a row version name with it. And remember, three columns is the maximum. You don't have to have three. You can have one, two, or three. And then the fields that are restricted to the types-- number, char, varchar, and raw.  And it cannot be used with version 1 blockchain tables, meaning blockchain tables came out in 21C. So if you have 21C, you cannot create it. It's a 23C feature. That's why that is like that. So you're going to specify with the row version. And then you're going to give it that row version name because that is required. And then up to three different columns that you want to use.  10:58 Lois: What about user chains? How do they enhance blockchain tables? Bill: So with the user chains, previously again, only the system chains were available. It randomly selected how to change the tables, what columns to chain it with.  Well now a user chain can be defined by the end user. And set up one, two, or three. Well, how many rows do you want to chain? Have that chain apply to. Again, the column types that we just talked about that are only supported. The number of the char, varchar, and raw. But with the user chains and you being able to identify the columns, it adds that additional flexibility to allow you to have this tamper-resistant table to be used by your applications.  So to create that blockchain table, user chain is defined when you create the table. So you're going to define when you create the table what is going to be that chain for that. When you do create that, any rows that have the same change values will be grouped together.  For example, let's say a banking application. I have an account. I make deposits. I make withdrawals. I do balance inquiries because that's all based off of that same field, that account, it'll group those together within the chain.   It does apply the hashing value to the columns that are stored within that chain.  12:27 Lois: Bill, can you explain the blockchain table delegate signer feature?  Bill: What it is, optionally, a signature that can be applied to provide additional security against tampering for that. However, if you do use it, it does require a digital certificate when adding a signature to a row.  Signatures are validated using that digital certificate and any signature algorithm for that. The delegate is an alternate. And it can be used instead of addition to just a user signature. So when I am the user, I create a row, it adds my signature, I can add my certificate to it or now I can have a delegate to do that for me.  So it can be digitally signed by the delegate. It can be signed by the delegate instead of the user itself. So that way, it's verified. Yes, that is good. Well, maybe users are not able to sign the rows they created, but they trust the delegate.  13:32 Nikita: And the last new feature to discuss is a blockchain table countersignature. Bill: A countersignature is going to provide additional guarantees that, hey, this data has been securely stored within our table itself. You can request a countersignature. It is requested at the time of signing a row. So what it's going to do is it's going to record that signature metadata in that row and the counting signature in the signed bytes that can be returned to the caller to verify, yes, that I might want to retrieve that information to use in another source for that.  So we can use that. As we said here, that candidate signature and the sign bytes, we might put it in another data store, might put it in our Oracle blockchain platform. For this non-repudiation purposes, basically what that means is that, hey, it's proof of the origin, the authenticity of it, the integrity of that data. Well, I want to pass that information to something else, another application or source or whatever. So yes, this is trusted information for that. So it gives that additional security. So it assures that the sender that their message was delivered plus gives proof of that sender's identity.  Countersignatures are saved in the blockchain table, that happens to be a blockchain table itself. The countersignature is computed using the bytes, using that hashing algorithm. It's going to include that end user signature, the delegate, or both. Remember, the end user can sign, a delicate can, or it can use both of that information for that. Even though we do save that information in the blockchain table, we recommend if you're going to use this, you might want to store that information outside of the database for those non-repudiation purposes.  15:37 Lois: Thank you so much, Bill, for taking us though all these updates. We look forward to having you back soon to talk us through some more of these new features. Nikita: To learn more about blockchain tables, visit mylearn.oracle.com and search for the Oracle Database 23ai: New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Nikita Abraham… Lois: And Lois Houston signing off! 16:06 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
16m
23/07/2024

Database Essentials

Join hosts Lois Houston and Nikita Abraham, along with Hope Fisher, Oracle’s Product Manager for Database Technologies, as they break down the basics of databases, explore different database management systems, and delve into database development.   Whether you're a newcomer or just need a refresher, this quick, informative episode is sure to offer you some valuable insights.   Oracle MyLearn: https://mylearn.oracle.com/ou/course/database-essentials/133032/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Nikita: Hello and welcome to the Oracle University Podcast. I’m Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! For the last seven weeks, we’ve been exploring the world of OCI Container Engine for Kubernetes with our senior instructor Mahendra Mehra. We covered key aspects of OKE to help you create, manage, and optimize Kubernetes clusters in Oracle Cloud Infrastructure. So, be sure you check out those episodes if you’re interested in Kubernetes. 01:00 Nikita: Today, we’re doing something a little different. We’ve had a lot of episodes on different aspects of Oracle Database, but what if you’re just getting started in this world? We wanted you to have something that you could listen to as well. And so we have Hope Fisher with us today. Hope is a Product Manager for Database Technologies at Oracle, and we’re going to ask her to take us through the basics of database, the different database management systems, and database development.  Lois: Hi Hope! Thanks for joining us for this episode. Before we dive straight into terminologies and concepts, I want to take a step back and really get down to the basics. We sometimes use the terms data and information interchangeably, but they’re not the same, right? 01:43 Hope: Data is raw material or a set of facts and observations. Information is the meaning derived from the facts. The difference between data and information can be explained by using an example, such as test scores. In one class, if every student receives a numbered score and the scores can be calculated to determine a class average, the class average can be calculated to determine the school average. So in this scenario, each student's test score is one piece of data. And information is the class’s average score or the school's average score. There is no value in data until you actually do something with it. 02:24 Nikita: Right, so then how do we make all this data useful? Do we create a database system?  Hope: A database system provides a simple function—treat data as a collection of information, organize it, and make the data usable by providing easy access to it and giving you a place where that data can be stored. Every organization needs to collect and maintain data to meet its requirements. Most organizations today use a database to automate their information systems. An information system can be defined as a formal system for storing and processing data. A database is an organized collection of data put together as a unit. The rationale of a database is to collect, store, and retrieve related data for use by database applications. A database application is a software program that interacts with the database to access and manipulate data. A database is usually managed by a Database Administrator, also known as a DBA. 03:25 Nikita: Hope, give us some examples of database systems. Hope: Popular examples of database systems include Oracle Database, MySQL, which is also owned by Oracle, Microsoft SQL server, Postgres, and others. There are relational database management systems. The acronym is DBMS. Some of the strengths of a DBMS include flexibility and scalability. Given the huge amounts of information that modern businesses need to handle, these are important factors to consider when surveying different types of databases. 03:59 Lois: This may seem a little bit silly, but why not just use spreadsheets, Hope? Why use databases? Hope: The easy answer is that spreadsheets are designed for specific problems, relatively small amounts of data and individual users. Databases are designed for lots of data, shared information use, and complex data analysis. Spreadsheets are typically used for specific problems or small amounts of data. Individual users generally use spreadsheets. In a database, cells contain records that come from external tables. Databases are designed for lots of data. They are intended to be shared and used for more complex data analysis. They need to be scalable, secure, and available to many users. This differentiation means that spreadsheets are static documents, while databases can be relational. 04:51 Nikita: Hope, what are some common database applications?  Hope: Database applications are used in far and wide use cases that most commonly can be grouped into three areas. Applications that run companies called enterprise applications. Enterprise applications are designed to integrate computer systems that run all phases of an enterprise's operations to facilitate cooperation and coordination of work across the enterprise. The intent is to integrate core business processes, like sales, accounting, finance, human resources, inventory, and manufacturing. Applications that do something very specific, like healthcare applications-- specialized software is software that's written for a specific task rather than for a broad application area.  And then there are also applications that are used to examine data and turn it into information, like a data warehouse, analytics, and data lake. 05:54 Lois: We’ve spoken about data lakes before. But since this is an episode about the basics of database, can you briefly tell us what a data lake is? Hope: A data lake is a place to store your structured and unstructured data as well as a method for organizing large volumes of highly diverse data from diverse sources. Data lakes are becoming increasingly important as people, especially in businesses and technology, want to perform broad data exploration and discovery. Bringing data together into a single place or most of it into a single place makes that simpler. 06:29 Nikita: Thanks for that, Hope. So, what kind of organizations use databases? And, who within these organizations uses databases the most? Hope: Almost every enterprise uses databases. Enterprises use databases for a variety of reasons and in a variety of ways. Data and databases are part of almost any process of the enterprise. Data is being collected to help solve business needs and drive value. Many people in an organization work with databases. These include the application developers who create applications that support and drive the business. The database administrator or DBA maintains and updates the database. And the end user uses the data as needed. 07:19 Do you want to stay ahead of the curve in the ever-evolving AI  landscape? Look no further than our brand-new OCI Generative AI Professional course and certification. For a limited time only, we’re offering both the course and certification for free. So, don’t miss out on this exclusive opportunity to get certified on Generative AI at no cost. Act fast because this offer is valid only until July 31, 2024. Visit https://education.oracle.com/genai to get started. That’s https://education.oracle.com/genai. 07:57 Nikita: Welcome back. Now that we’ve discussed foundational database concepts, I want to move on to database management systems. Take us through what a database management system is, Hope. Hope: A Database Management System, DBMS, has the following elements. The kernel code manages memory and storage for the DBMS. The repository of metadata is called a data dictionary. The query language enables applications to access the data. Oracle database functions include data definitions, storage, structure, and security. Additional functionality also provides for user access control, backup and recovery, integrity, and communications. There are many different database types and management systems. The most common is the relational database management system. 08:51 Nikita: And how do relational databases store data?  Hope: Essentially and very simplistically, there are key elements of the relational database. Database table containing rows and columns; the data in the table, which is stored a row at a time; and the columns which contain attributes or related information. And then the different tables in a database relate to one another and share a column. 09:17 Lois: Customers usually have a mix of applications and data structures, and ideally, they should be able to implement a data management strategy that effectively uses all of their data in applications, right? How does Oracle approach this?  Hope: Oracle's approach to this enterprise data management strategy and architecture is converged database to all different data types and workloads. The converged database is a database that has native support for all modern data types and, of course, traditional relational data.  By providing support for all of these data types, a converged database can run all sorts of workloads, from transaction processing to analytics and machine learning to blockchain to support the applications and systems. Oracle provides a single database engine that supports all data models, process types, and development environments. It also addresses many kinds of workloads against the same data sets. And there's no need to use dozens of specialized databases. Deploying several single-purpose databases would increase costs, complexity, and risk. 10:25 Nikita: In the final part of our conversation today, I want to bring up database development. Hope, how are databases developed?  Hope: Data modeling is the first part of the database development process. Conceptual data modeling is the examination of a business and business data to determine the structure of business information and the rules that govern it. This structure forms the basis for database design. A conceptual model is relatively stable over long periods of time. Physical data modeling, or database building, is concerned with implementation in each technical software and hardware environment. The physical implementation is highly dependent on the current state of technology and is subject to change as available technologies rapidly change. Conceptual model captures the functional and informational needs of a business and is used to identify important entities and their relationships.  A logical model includes the entities and relationships. This is also called an entity relationship model and provides the details of the relationships.  11:34 Lois: I think that’s a good place to wrap up our episode. To know more about the Oracle Database architecture, offerings, and so on, visit mylearn.oracle.com. Thanks for joining us today, Hope.  Nikita: Join us next week for another episode of the Oracle University Podcast. Until then, this is Nikita Abraham… Lois: And Lois Houston, signing off! 11:55 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
12m
16/07/2024

Container Engine for Kubernetes: Security Practices

In the season's final episode, hosts Lois Houston and Nikita Abraham interview senior OCI instructor Mahendra Mehra about the security practices that are vital for OKE clusters on OCI.   Mahendra shares his expert insights on the importance of Kubernetes security, especially in today's digital landscape where the integrity of data and applications is paramount.   OCI Container Engine for Kubernetes Specialist: https://mylearn.oracle.com/ou/course/oci-container-engine-for-kubernetes-specialist/134971/210836   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode.   ---------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Nikita: Welcome to the Oracle University Podcast! I’m Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! In our last episode, we spoke about self-managed nodes and how you can manage Kubernetes deployments. Nikita: Today is the final episode of this series on OCI Container Engine for Kubernetes. We’re going to look at the security side of things and discuss how you can implement vital security practices for your OKE clusters on OCI, and safeguard your infrastructure and data.  00:59 Lois: That’s right, Niki! We can’t overstate the importance of Kubernetes security, especially in today's digital landscape, where the integrity of your data and applications is paramount. With us today is senior OCI instructor, Mahendra Mehra, who will take us through Kubernetes security and compliance practices. Hi Mahendra! It’s great to have you here. I want to jump right in and ask you, how can users add a service account authentication token to a kubeconfig file? Mahendra: When you set up the kubeconfig file for a cluster, by default, it contains an Oracle Cloud Infrastructure CLI command to generate a short-lived, cluster-scoped, user-specific authentication token. The authentication token generated by the CLI command is appropriate to authenticate individual users accessing the cluster using kubectl and the Kubernetes Dashboard. However, the generated authentication token is not appropriate to authenticate processes and tools accessing the cluster, such as continuous integration and continuous delivery tools. To ensure access to the cluster, such tools require long-lived non-user-specific authentication tokens. One solution is to use a Kubernetes service account. Having created a service account, you bind it to a cluster role binding that has cluster administration permissions. You can create an authentication token for this service account, which is stored as a Kubernetes secret. You can then add the service account as a user definition in the kubeconfig file itself. Other tools can then use this service account authentication token when accessing the cluster. 02:47 Nikita: So, as I understand it, adding a service account authentication token to a kubeconfig file enhances security and enables automated tools to interact seamlessly with your Kubernetes cluster. So, let’s talk about the permissions users need to access clusters they have created using Container Engine for Kubernetes. Mahendra: For most operations on Container Engine for Kubernetes clusters, IAM leverages the concept of groups. A user's permissions are determined by the IAM groups they belong to, including dynamic groups. The access rights for these groups are defined by policies. IAM provides granular control over various cluster operations, such as the ability to create or delete clusters, add, remove, or modify node pool, and dictate the Kubernetes object create, delete, view operations a user can perform. All these controls are specified at the group and policy levels. In addition to IAM, the Kubernetes role-based access control authorizer can enforce additional fine-grained access control for users on specific clusters via Kubernetes RBAC roles and ClusterRoles.  04:03 Nikita: What are Kubernetes RBAC roles and ClusterRoles, Mahendra? Mahendra: Roles here defines permissions for resources within a specific namespace and ClusterRole is a global object that will provide access to global objects as well as non-resource URLs, such as API version and health endpoints on the API server. Kubernetes RBAC also includes RoleBindings and ClusterRoleBindings. RoleBinding grants permission to subjects, which can be a user, service, or group interacting with the Kubernetes API. It specified an allowed operation for a given subject in the cluster. RoleBinding is always created in a specific namespace. When associated with a role, it provides users permission specified within that role related to the objects within that namespace. When associated with a ClusterRole, it provides access to namespaced objects only defined within that cluster rule and related to the roles namespace. ClusterRoleBinding, on the other hand, is a global object. It associates cluster roles with users, groups, and service accounts. But it cannot be associated with a namespaced role. ClusterRoleBinding is used to provide access to global objects, non-namespaced objects, or to namespaced objects in all namespaces. 05:36 Lois: Mahendra, what’s IAM’s role in this? How do IAM and Kubernetes RBAC work together? Mahendra: IAM provides broader permissions, while Kubernetes RBAC offers fine-grained control. Users authorized either by IAM or Kubernetes RBAC can perform Kubernetes operations. When a user attempts to perform any operation on a cluster, except for create role and create cluster role operations, IAM first determines whether a group or dynamic group to which the user belongs has the appropriate and sufficient permissions. If so, the operation succeeds. If the attempted operation also requires additional permissions granted via a Kubernetes RBAC role or cluster role, the Kubernetes RBAC authorizer then determines whether the user or group has been granted the appropriate Kubernetes role or Kubernetes ClusterRoles. 06:41 Lois: OK. What kind of permissions do users need to define custom Kubernetes RBAC rules and ClusterRoles?  Mahendra: It's common to define custom Kubernetes RBAC rules and ClusterRoles for precise control. To create these, a user must have existing roles or ClusterRoles with equal or higher privileges. By default, users don't have any RBAC roles assigned. But there are default roles like cluster admin or super user privileges. 07:12 Nikita: I want to ask you about securing and handling sensitive information within Kubernetes clusters, and ensuring a robust security posture. What can you tell us about this? Mahendra: When creating Kubernetes clusters using OCI Container Engine for Kubernetes, there are two fundamental approaches to store application secrets. We can opt for storing and managing secrets in an external secrets store accessed seamlessly through the Kubernetes Secrets Store CSI driver. Alternatively, we have the option of storing Kubernetes secret objects directly in etcd.  07:53 Lois: OK, let’s tackle them one by one. What can you tell us about the first method, storing secrets in an external secret store? Mahendra: This integration allows Kubernetes clusters to mount multiple secrets, keys, and certificates into pods as volumes. The Kubernetes Secrets Store CSI driver facilitates seamless integration between our Kubernetes clusters and external secret stores. With the Secrets Store CSI driver, our Kubernetes clusters can mount and manage multiple secrets, keys, and certificates from external sources. These are accessible as volumes, making it easy to incorporate them into our application containers. OCI Vault is a notable external secrets store. And Oracle provides the Oracle Secrets Store CSI driver provider to enable Kubernetes clusters to seamlessly access secrets stored in Vault. 08:54 Nikita: And what about the second method? How can we store secrets as Kubernetes secret objects in etcd? Mahendra: In this approach, we store and manage our application secrets using Kubernetes secret objects. These objects are directly managed within etcd, the distributed key value store used for Kubernetes cluster coordination and state management. In OKE, etcd reads and writes data to and from block storage volumes in OCI block volume service. By default, OCI ensures security of our secrets and etcd data by encrypting it at rest. Oracle handles this encryption automatically, providing a secure environment for our secrets. Oracle takes responsibility for managing the master encryption key for data at rest, including etcd and Kubernetes secrets. This ensures the integrity and security of our stored secrets. If needed, there are options for users to manage the master encryption key themselves. 10:06 Lois: OK. We understand that managing secrets is a critical aspect of maintaining a secure Kubernetes environment, and one that users should not take lightly. Can we talk about OKE Container Image Security? What essential characteristics should container images possess to fortify the security posture of a user’s applications? Mahendra: In the dynamic landscape of containerized applications, ensuring the security of containerized images is paramount.  It is not uncommon for the operating system packages included in images to have vulnerabilities. Managing these vulnerabilities enables you to strengthen the security posture of your system and respond quickly when new vulnerabilities are discovered. You can set up Oracle Cloud Infrastructure Registry, also known as Container Registry, to scan images in a repository for security vulnerabilities published in the publicly available Common Vulnerabilities and Exposures Database. 11:10 Lois: And how is this done? Is it automatic? Mahendra: To perform image scanning, Container Registry makes use of the Oracle Cloud Infrastructure Vulnerability Scanning Service and Vulnerability Scanning REST API. When new vulnerabilities are added to the CVE database, the container registry initiates automatic rescanning of images in repositories that have scanning enabled. 11:41 Do you want to stay ahead of the curve in the ever-evolving AI landscape? Look no further than our brand-new OCI Generative AI Professional course and certification. For a limited time only, we’re offering both the course and certification for free! So, don’t miss out on this exclusive opportunity to get certified on Generative AI at no cost. Act fast because this offer is valid only until July 31, 2024. Visit https://education.oracle.com/genai to get started. That’s https://education.oracle.com/genai. 12:20 Nikita: Welcome back! Mahendra, what are the benefits of image scanning? Mahendra: You can gain valuable insights into each image scan conducted over the past 13 months. This includes an overview of the number of vulnerabilities detected and an overall risk assessment for each scan. Additionally, you can delve into comprehensive details of each scan featuring descriptions of individual vulnerabilities, their associated risk levels, and direct links to the CVE database for more comprehensive information. This historical and detailed data empowers you to monitor, compare, and enhance image security over time. You can also disable image scanning on a particular repository by removing the image scanner.  13:11 Nikita: Another characteristic that container images should have is unaltered integrity, right?  Mahendra: For compliance and security reasons, system administrators often want to deploy software into a production system. Only when they are satisfied that the software has not been modified since it was published compromising its integrity. Ensuring the unaltered integrity of software is paramount for compliance and security in production environment. 13:41 Lois: Mahendra, what are the mechanisms that guarantee this integrity within the context of Oracle Cloud Infrastructure? Mahendra: Image signatures play a pivotal role in not only verifying the source of an image but also ensuring its integrity. Oracle's Container Registry facilitates this process by allowing users or systems to push images and sign them using a master encryption key sourced from the OCI Vault.  It's worth noting that an image can have multiple signatures, each associated with a distinct master encryption key. These signatures are uniquely tied to an image OCID, providing granularity to the verification process. Furthermore, the process of image signing mandates the use of an RSA asymmetric key from the OCI Vault, ensuring a robust and secure validation of the image's unaltered integrity. 14:45 Nikita: In the context of container images, how can users ensure the use of trusted sources within OCI? Mahendra: System administrators need the assurance that the software being deployed in a production system originates from a source they trust. Signed images play a pivotal role, providing a means to verify both the source and the integrity of the image. To further strengthen this, administrators can create image verification policies for clusters, specifying which master encryption keys must have been used to sign images. This enhances security by configuring container engine for Kubernetes clusters to allow the deployment of images signed with specific encryption keys from Oracle Cloud Infrastructure Registry. Users or systems retrieving signed images from OCIR can trust the source and be confident in the image's integrity. 15:46 Lois: Why is it imperative for users to use signed images from Oracle Cloud Infrastructure Registry when deploying applications to a Container Engine for Kubernetes cluster?  Mahendra: This practice is crucial for ensuring the integrity and authenticity of the deployed images.  To achieve this enforcement. It's important to note that an image in OCIR can have multiple signatures, each linked to a different master encryption key. This multikey association adds layers of security to the verification process. A cluster's image verification policy comes into play, allowing administrators to specify up to five master encryption keys. This policy serves as a guideline for the cluster, dictating which keys are deemed valid for image signatures.  If a cluster's image verification policy doesn't explicitly specify encryption keys, any signed image can be pulled regardless of the key used. Any unsigned image can also be pulled potentially compromising the security measures. 16:56 Lois: Mahendra, can you break down the essential permissions required to bolster security measures within a user’s OKE clusters? Mahendra: To enable clusters to include master encryption key in image verification policies, you must give clusters permission to use keys from OCI Vault. For example, to grant this permission to a particular cluster in the tenancy, we must use the policy—allow any user to use keys in tenancy where request.user.id is set to the cluster's OCID. Additionally, for clusters to seamlessly pull signed images from Oracle Cloud Infrastructure Registry, it's vital to provide permissions for accessing repositories in OCIR. 17:43 Lois: I know this may sound like a lot, but OKE container image security is vital for safeguarding your containerized applications. Thank you so much, Mahendra, for being with us through the season and taking us through all of these important concepts. Nikita: To learn more about the topics covered today, visit mylearn.oracle.com and search for the OCI Container Engine for Kubernetes Specialist course. Join us next week for another episode of the Oracle University Podcast. Until then, this is Nikita Abraham… Lois Houston: And Lois Houston, signing off! 18:16 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
18m
09/07/2024

Working with Self-Managed Nodes and Managing Kubernetes Deployments

In this episode, hosts Lois Houston and Nikita Abraham speak with senior OCI instructor Mahendra Mehra about the capabilities of self-managed nodes in Kubernetes, including how they offer complete control over worker nodes in your OCI Container Engine for Kubernetes environment.   They also explore the various options that are available to effectively manage your Kubernetes deployments.   OCI Container Engine for Kubernetes Specialist: https://mylearn.oracle.com/ou/course/oci-container-engine-for-kubernetes-specialist/134971/210836   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Nikita: Hello and welcome to the Oracle University Podcast! I’m Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi everyone! Last week, we discussed how OKE virtual nodes can offer you a complete serverless Kubernetes experience. Nikita: Yeah, and in today’s episode, we’ll focus on self-managed nodes, where you get complete control over the worker nodes within your OKE environment. We’ll also talk about how you can manage your Kubernetes deployments. 00:57 Lois: To tell us more about this, we have Mahendra Mehra, a senior OCI instructor with Oracle University. Hi Mahendra! Welcome back! Let’s get started with self-managed nodes. Can you tell us what they are? Mahendra: In Container Engine for Kubernetes, a self-managed node is essentially a worker node that you personally create and host on a compute instance or instance pool within the compute service. Unlike managed nodes or virtual nodes, self-managed nodes are not grouped into node pools by default. They are often referred to as Bring Your Own Nodes, also abbreviated as BYON. If you wish to streamline administration and manage multiple self-managed nodes collectively, you can utilize the compute service to create a compute instance pool for hosting these nodes. This allows for greater flexibility and customization in your Kubernetes environment. 01:58 Nikita: Mahendra, what are some practical usage scenarios for OKE self-managed nodes? Mahendra: These nodes offer a range of advantages for specific use cases. Firstly, for specialized workloads, leveraging the compute service allows you to configure compute instances with shapes and image combination that may not be available for managed nodes or virtual nodes. This includes options like GPU shapes for hardware accelerated workloads or high frequency processor cores for demanding high-performance computing tasks. Secondly, if you require complete control over your compute instance configuration, self-managed nodes are the ideal choice. This gives you the flexibility to tailor each node to your specific requirements. Additionally, self-managed nodes are particularly well suited for Oracle Cloud Infrastructure cluster networks. These nodes provide high bandwidth, low latency RDMA connectivity, making them a preferred option for certain networking setups. Lastly, the use of compute instance pools with self-managed nodes enables the creation of infrastructure for handling complex distributed computing tasks. This can greatly enhance the efficiency of your Kubernetes environment. Consider these points carefully to determine the optimal use of OKE self-managed nodes in your deployments. 03:30 Lois: What do we need to consider before creating a self-managed node and integrating it into a cluster? Mahendra: There are two crucial aspects to address. Firstly, you need to confirm that the cluster to which you plan to add a self-managed node is configured appropriately.  Secondly, it's essential to choose the right image for the compute instance hosting the self-managed node.  03:53 Nikita: Can you dive a little deeper into these prerequisites? Mahendra: To successfully integrate a self-managed node into your cluster, you must ensure that the cluster is an enhanced cluster. This is a crucial prerequisite for the addition of self-managed nodes. The flannel CNI plugin for pod networking should be utilized, not the VCN-native pod networking CNI plugin. This ensures optimal pod networking for your self-managed nodes. The control plane nodes of the cluster must be running Kubernetes version 1.25 or later. This is essential for compatibility and optimal performance. Lastly, maintain compatibility between the Kubernetes version on control plane nodes and worker nodes with a maximum allowable difference of two minor versions. This ensures a smooth and stable operation of your Kubernetes environment. Keep these cluster requirements in mind as you prepare to add self-managed nodes to your OKE cluster. 04:55 Lois: What about the image requirements when creating self-managed nodes? Mahendra: Choose either Oracle Linux 7 or Oracle Linux 8 image, for your self-managed nodes. Ensure that the selected image has a release date of March 28, 2023 or later. Obtain the image OCID, also known as Oracle Cloud Identifier, from the respective sources. When specifying an image, be mindful of the Kubernetes version it contains. It's your responsibility to select an image with a Kubernetes version that aligns with the Kubernetes version skew support policy. Keep in mind that the Container Engine for Kubernetes does not automatically check the compatibility. So it's up to you to ensure harmony between the Kubernetes version on the self-managed node and the cluster's control plane nodes. These considerations will help you make informed choices when configuring images for your self-managed nodes. 05:57 Nikita: I really like the flexibility and customization OKE self-managed nodes offer. Now I want to switch gears a little and ask you about OCI Service Operator for Kubernetes. Can you tell us a bit about it? Mahendra: OCI Service Operator for Kubernetes is an open-source Kubernetes add-on that transforms the way we manage and connect OCI resources within our Kubernetes clusters. This powerful operator enables you to effortlessly create, configure, and interact with OCI resources directly from your Kubernetes environment, eliminating the need for constant navigation between the Oracle Cloud Infrastructure Console, CLI, or other tools. With the OCI Service Operator, you can seamlessly leverage kubectl to call the operator framework APIs, providing a streamlined and efficient workflow. 06:53 Lois: On what framework is the OCI Service Operator built? Mahendra: OCI Service Operator for Kubernetes is built using the open-source Operator Framework toolkit. The Operator Framework manages Kubernetes-native applications called operators in an effective, automated, and scalable way. The Operator Framework comprises essential components like Operator SDK. This leverages the Kubernetes controller-runtime library, providing high-level APIs and abstractions for writing operational logic. Additionally, it offers tools for scaffolding and code generation. 07:35 Do you want to stay ahead of the curve in the ever-evolving AI landscape? Look no further than our brand-new OCI Generative AI Professional course and certification. For a limited time only, we’re offering both the course and certification for free! So, don’t miss out on this exclusive opportunity to get certified on Generative AI at no cost. Act fast because this offer is valid only until July 31, 2024. Visit https://education.oracle.com/genai to get started. That’s https://education.oracle.com/genai. 08:14 Nikita: Welcome back! Mahendra, are there any other components within OCI Service Operator to manage Kubernetes deployments? Mahendra: The other essential component is Operator Lifecycle Manager, also abbreviated as OLM. OLM extends Kubernetes by introducing a declarative approach to install, manage, and upgrade operators within a cluster. The OCI Service Operator for Kubernetes is intelligently packaged as an Operator Lifecycle Manager bundle, simplifying the installation process on Kubernetes clusters. This comprehensive bundle encapsulates all necessary objects and definitions, including CRDs, RBACs, ConfigMaps, and deployments, making it effortlessly deployable on a cluster. 09:02 Lois: So much that users can take advantage of! What about OCI Service Operator’s integration with other OCI services?  Mahendra: One of its standout features is its seamless integration with a range of OCI services. The first one is Autonomous Database, specifically tailored for transaction processing, mixed workloads, analytics, and data warehousing. Enjoy automated patching, upgrades, and tuning, allowing routine maintenance tasks to be performed without human intervention. The next on the list is MySQL HeatWave, a fully-managed Database Service designed for developing and deploying secure cloud-native applications using widely adopted MySQL open-source database. Third on the list is OCI Streaming service. Experience a fully managed, scalable, and durable solution for ingesting and consuming high-volume data streams in real time. Next is Service Mesh. This service offers a set of capabilities to facilitate communication among microservices within a cloud-native application. The communication is centrally managed and secured, ensuring a smooth and secure interaction. The OCI Service Operator for Kubernetes serves as a versatile bridge, seamlessly connecting your Kubernetes clusters with these powerful Oracle Cloud Infrastructure services. 10:31 Nikita: That’s awesome! I’ve also heard about Ingress Controllers. Can you tell us what they are? Mahendra: A Kubernetes Ingress Controller serves as the enforcer of rules defined in a Kubernetes Ingress. Its primary role is to manage, load balance, and route incoming traffic to specific service pods residing on worker nodes within the cluster. At the heart of this process is the Kubernetes Ingress Resource. Think of it as a blueprint, a rich configuration holding routing rules and options, specifically crafted for handling HTTP and HTTPS traffic. It serves as a powerful orchestrator for managing external communication with services inside the cluster. 11:15 Lois: Mahendra, how do Ingress Controllers bring about efficiency? Mahendra: Efficiency comes with consolidation. With a single ingress resource, you can neatly gather routing rules for multiple services. This eliminates the need to create a Kubernetes service of type LoadBalancer for each service seeking external or private network traffic. The OCI native ingress controller is a powerhouse. It crafts an OCI Flexible Load Balancer, your gateway to efficient request handling. The OCI native ingress controller seamlessly adapts to changes in routing rules with real-time updates. 11:53 Nikita: And what about integration with an OKE cluster? Mahendra: Absolutely. It harmonizes with the cluster for streamlined traffic management. Operating as a single pod on a randomly selected worker node, it ensures a balanced workload distribution. 12:08 Lois: Moving on, let’s talk about running applications on ARM-based nodes and GPU nodes. We’ll start with ARM-based nodes.  Mahendra: Typically, developers use ARM-based worker nodes in Kubernetes cluster to develop and test applications. Selecting the right infrastructure is crucial for optimal performance.  12:28 Nikita: What kind of options do developers have when running applications on ARM-based nodes? Mahendra: When it comes to running applications on ARM-based nodes, you have a range of options at your fingertips. First up, consider the choice between ARM-based bare metal shapes and flexible VM shapes. Each comes with its own unique advantages. Now, let's talk about the heart of it all, the Ampere A1 Compute instances. These instances are driven by the cutting edge Ampere Altra processor, ensuring high performance and efficiency for your workloads. You must specify the ARM-based node pool shapes during cluster or node pool creation, whether you choose to navigate through the user-friendly console, leverage the flexibility of the API, or command with precision through the CLI, the process remains seamless. 13:23 Lois: Can you define pods to run exclusively on ARM-based nodes within a heterogeneous cluster setup? Mahendra: In scenarios where a cluster comprises node pools with ARM-based shapes alongside other shapes, such as AMD64, you can employ a powerful tool called node selector in the pod specification. This allows you to precisely dictate that an application should exclusively run on ARM-based worker nodes, ensuring your workloads aligns with the desired architecture. 13:55 Nikita: And before we end this episode, can you explain why developers must run applications on GPU nodes? Mahendra: Originally designed for graphics manipulations, GPUs prove highly efficient in parallel data processing. This makes them a top choice for deploying data-intensive applications. Our GPU nodes utilize cutting edge NVIDIA graphics cards ensuring efficient and powerful data processing. Seamless access to this computing prowess is made possible through CUDA libraries. To ensure smooth integration, be sure to select a GPU shape and opt for an Oracle Linux GPU image preloaded with the essential CUDA libraries. CUDA here is Compute Unified Device Architecture, which is a parallel computing platform and application-programming interface model created by NVIDIA. It allows developers to use NVIDIA graphics-processing units for general-purpose processing, rather than just rendering graphics. 14:57 Nikita: Thank you, Mahendra, for another insightful session. We appreciate you joining us today. Lois: For more information on everything we discussed, go to mylearn.oracle.com and search for the OCI Container Engine for Kubernetes Specialist course. You’ll find plenty of demos and skill checks to supplement your learning. Join us next week when we’ll discuss vital security practices for your OKE clusters on OCI. Until next time, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 15:28 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
15m
02/07/2024

Working with OKE Virtual Nodes

Want to gain insights into how virtual nodes provide a serverless Kubernetes experience?   Join hosts Lois Houston and Nikita Abraham, along with senior OCI instructor Mahendra Mehra, as they compare managed nodes and virtual nodes. Continuing from the previous episode, they explore how virtual nodes enhance Kubernetes deployments in Oracle Cloud Infrastructure.   OCI Container Engine for Kubernetes Specialist: https://mylearn.oracle.com/ou/course/oci-container-engine-for-kubernetes-specialist/134971/210836   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:25 Lois: Welcome to the Oracle University Podcast! I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hey everyone! In our last episode, we examined OCI Container Engine for Kubernetes, including its key features and benefits. Lois: Yeah, that was an interesting one. Today, we’re going to discuss virtual nodes and their role in enhancing Kubernetes deployments in Oracle Cloud Infrastructure. Nikita: We’re going to compare virtual nodes and managed nodes, and look at their differences and advantages. To take us through all this, we have Mahendra Mehra with us. Mahendra is a senior OCI instructor with Oracle University.  01:09 Lois: Hi Mahendra! From our discussion last week, we know that when creating a node pool with Container Engine for Kubernetes, we have the option of specifying the type of Oracle nodes as either managed nodes or virtual nodes. But I’m sure there are some key differences in the features supported by each type, right?  Mahendra: The primary point of differentiation between virtual nodes and managed nodes is in their management approach. When it comes to managed nodes, users are responsible for managing the nodes. They have the flexibility to configure them to meet the specific requirements. Users are also responsible for upgrading Kubernetes on managed nodes and for managing cluster capacity. You can create managed nodes and node pools in both basic clusters and enhanced clusters, whereas in virtual nodes, virtual nodes provide a serverless Kubernetes, experience, enabling users to run containerized applications at scale. The Kubernetes software is upgraded and security patches are applied while respecting application's availability requirements.  You can only create virtual nodes and virtual node pools in enhanced clusters. 02:17 Nikita: What about differences in terms of resource allocation? Are there any differences we should be aware of? Mahendra: When it comes to managed nodes, the resource allocation is at the node pool level and the users specify CPU and memory resource requirements for a given node pool. In the virtual nodes, the resource allocation is done at the pod level, where you can specify the CPU and memory resource requirements, but this time, as requests and limits in the pod specification.  02:45 Lois: What about differences in the approach to load balancing? Mahendra: When it comes to managed nodes, load balancing is between the worker nodes, whereas in virtual nodes, load balancing is between pods.  Also, load balancer security list management is never enabled, and you always must manually configure security rules. When using virtual nodes, load balances distribute traffic among pods' IP addresses and then assign node port.  03:12 Lois: And when it comes to pod networking? Mahendra: Under managed nodes, both the VCN-Native Pod Networking CNI plugin and the flannel CNI plugin are supported. When it comes to virtual nodes, only VCN-Native Pod Networking is supported. Also, only one VNIC is attached to each virtual node. Remember, IP addresses are not pre-allocated before pods are created. And the VCN-Native Pod Networking CNI plugin is not shown as running in the kube-system namespace. Pod subnet route tables must have route rules defined for a NAT gateway and a service gateway. 03:48 Nikita: OK… I have a question, Mahendra. When it comes to scaling Kubernetes clusters and node pools, can users adjust the cluster capacity in response to their changing requirements? Mahendra: When it comes to managed nodes, customers can scale the cluster and node pool up and down by changing the number of managed node pools and nodes respectively. They also have an option to enable autoscaling to automatically scale managed node pools and pods. When it comes to virtual nodes, operational overhead of cluster capacity management is handled for you by OCI. A virtual node pool scales automatically and can support up to 1000 pods per virtual node. Users also have an option to increase the number of virtual node pools or virtual nodes to scale up the cluster or node pool respectively. 04:37 Lois: And what about the pricing for each? Mahendra: Under managed nodes, you pay for the compute instances that execute applications, whereas under virtual nodes, you pay for the exact compute resources consumed by each Kubernetes pod. 04:55 Do you want to stay ahead of the curve in the ever-evolving AI landscape? Look no further than our brand-new OCI Generative AI Professional course and certification. For a limited time only, we’re offering both the course and certification for free! So, don’t miss out on this exclusive opportunity to get certified on Generative AI at no cost. Act fast because this offer is valid only until July 31, 2024. Visit https://education.oracle.com/genai to get started. That’s https://education.oracle.com/genai. 05:34 Nikita: Welcome back! We were just discussing how when you have to choose between virtual nodes and managed nodes for your Kubernetes cluster, you need to consider several key points of differentiation, like the management approach, resource allocation, load balancing, pod networking, scaling, and pricing.  Lois: Yeah, it’s important to understand the benefits and drawbacks of each approach to make informed decisions. Mahendra, now let’s talk about the prerequisites to configure clusters with virtual nodes and the IAM policies that are required to use virtual nodes. Mahendra: Before you can use virtual nodes, you always have to set up at least one IAM policy, which is required in all circumstances by both tenancy administrators and non-administrator users. This basically means, to create and use clusters with virtual nodes and virtual node pools, you must endorse Container Engine for Kubernetes service to allow virtual nodes to create container instances in the Container Engine for Kubernetes service tenancy with a VNIC connected to a subnet of a VCN in your tenancy. All you need to do is create a policy in the root compartment with policy statements from the official documentation page. You will find them under the Working with Virtual Nodes section within the Container Engine topic.  06:55 Lois: Mahendra, how do you create and configure virtual nodes and virtual node pools? Mahendra: Creating virtual nodes is a pivotal step and it involves setting up a virtual node pool in a new cluster. This is exclusively applicable to enhanced clusters. You can initiate this process using the console, the CLI, or the API. Configuring your virtual node pools involves defining critical parameters. Firstly, we have the node count. This represents the number of virtual nodes you wish to create within your virtual node pool. These nodes will be strategically placed in the availability domains that you specify. Now, it's important to carefully consider the placement of these nodes. You can distribute them across different availability domains, ensuring high availability for your applications. Additionally, you have the option to place these nodes in a regional subnet, which is the recommended approach for optimal performance. 07:53 Nikita: Isn’t the pod shape another important parameter? Can you tell us a bit about it? Mahendra: Pod shape refers to the type of shape you want for pods running on your virtual nodes within the virtual node pool. The pod shape is crucial as it determines the processor type on which you want your pods to run. It is important to note that only shapes available in your tenancy and supported by Container Engine for Kubernetes will be shown. So choose a shape that aligns with the requirements of your applications and services. A noteworthy point is that you explicitly specify the CPU and memory resource requirements for virtual nodes in the pod specification file. This ensures that your virtual nodes have the necessary resources to handle the workloads of your applications. Precision in specifying these requirements is key to achieving optimal performance. 08:49 Lois: What is the network setup for virtual nodes?  Mahendra: The pod running on virtual nodes utilize VCN-native pod networking, and it's crucial to specify how these pods in the node pool communicate with each other. This involves setting up a pod subnet, which is a regional subnet configured specially to host pods. The pod subnet you specify for virtual nodes must be private. Oracle recommends that the pod subnet and the virtual node subnets are the same. In addition to subnet configurations, you have the option to use security rules in network security group to control access to the pod subnet. This involves defining security rules within one or more NSGs that you specify with a maximum limit of five network security groups. Also, it is worth noting that using network security group is recommended over using security list. Now, let's shift our focus to virtual node communication. For this, you will configure a virtual node subnet. This subnet can be either a regional subnet, which is recommended, or an availability domain-specific subnet. And it's designed to host your virtual nodes. 10:02 Nikita: What are some key considerations for virtual node subnets? Mahendra: If you've specified load balancer subnets, ensure that the virtual node subnets are different. As with pod communication, Oracle recommends that the pod subnet and the virtual node subnet are the same, with the added condition that the virtual node subnet must be private. 10:23 Lois: Mahendra, can you take us through the fundamental tasks involved in managing virtual nodes and virtual node pools? Mahendra: Whether you're creating a new enhanced cluster using the Console, or looking to scale up an existing one, the creation process is versatile.  Creating virtual nodes involves establishing a virtual node pool. Virtual nodes can only be created within enhanced clusters. Listing virtual nodes task offers visibility into virtual nodes within a virtual node pool. Whether you prefer Console, CLI, or the API, you have the flexibility to choose the method that suits your workflow best. For a comprehensive understanding of your virtual node pools, navigate to the Cluster List page, and click on the name of the cluster. This will unveil the specifics of the virtual node pool you are interested in. Now let's talk about updating virtual node pools. Whether your initiating a new enhanced cluster, or expanding an existing one, the update process ensures your cluster aligns with your evolving requirements. You can easily update the virtual node pool’s name for clarity. You can also dynamically change the number of virtual nodes to meet the workload demands, and you can fine tune the Node Placement using options like Availability Domain and Fault Domain settings. Moving on to an essential aspect of node pool management, that is deletion. It's crucial to understand that deleting a node pool is a permanent action. Once deleted, the node pool cannot be recovered.  12:04 Lois: Before we wrap up, Mahendra, can you talk about the critical factors when allocating CPU, memory, and storage resources to pods provisioned by virtual nodes within your OKE cluster? Mahendra: To ensure optimal performance, OKE calculates CPU and memory allocations at the pod level, a distinctive feature when using virtual nodes. This approach stands in contrast to the traditional worker node-level allocation. The allocation process takes into account several factors. First one is the CPU and memory requests and limits. These are specified for each container in the pod spec file, if present. Secondly, number of containers in the pod. The total number of containers impacts the overall resource requirements. And kube-proxy and container runtime requirements. A small but essential consideration taking up 0.25 GB of memory and negligible CPU. Pod CPU and memory requests must meet a minimum of 0.125 OCPUs and 0.5 GB of memory. 13:12 Nikita: Thank you, Mahendra, for this really insightful session. If you’re interested in learning more about the topics we discussed today, head over to mylearn.oracle.com and search for the OCI Container Engine for Kubernetes Specialist course.  Lois: You’ll find demos that you watch as well as skill checks that you can attempt to better your understanding. In our next episode, we’ll journey into the world of self-managed nodes and discuss how to manage Kubernetes deployments. Until then, this is Lois Houston…  Nikita: And Nikita Abraham, signing off! 13:45 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
14m
25/06/2024

Introduction to OCI Container Engine for Kubernetes

Curious about how OCI Container Engine for Kubernetes (OKE) can transform the way your development team builds, deploys, and manages cloud-native applications? Listen to hosts Lois Houston and Nikita Abraham explore OKE's key features and benefits with senior OCI instructor Mahendra Mehra.   Mahendra breaks down complex concepts into digestible bits, making it easy for you to understand the magic behind OKE.   OCI Container Engine for Kubernetes Specialist: https://mylearn.oracle.com/ou/course/oci-container-engine-for-kubernetes-specialist/134971/210836   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:25 Nikita: Hello and welcome to the Oracle University Podcast. I’m Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! If you’ve been listening to us these last few weeks, you’ll know we’ve been discussing containerization, the Oracle Cloud Infrastructure Registry, and the basics of Kubernetes. Today, we’ll dive into the world of OCI Container Engine for Kubernetes, also referred to as OKE.  Nikita: We’re joined by Mahendra Mehra, a senior OCI instructor with Oracle University, who will take us through the key features and benefits of OKE and also talk about working with managed nodes. Hi Mahendra! Thanks for joining us today. 01:09 Lois: So, Mahendra, what is OKE exactly? Mahendra: Oracle Cloud Infrastructure Container Engine for Kubernetes is a fully managed, scalable, and highly available service that empowers you to effortlessly deploy your containerized applications to the cloud. But that's just the beginning. OKE can transform the way you and your development team build, deploy, and manage cloud native applications. 01:36 Nikita: What would you say are some of its most defining features?    Mahendra: One of the defining features of OKE is the flexibility it offers. You can specify whether you want to run your applications on virtual nodes or opt for managed nodes. Regardless of your choice, Container Engine for Kubernetes will efficiently provision them within your existing OCI tenancy on Oracle Cloud Infrastructure. Creating OKE cluster is a breeze, and you have a couple of fantastic tools at your disposal-- the console and the rest API. These make it super easy to get started. OKE relies on Kubernetes, which is an open-source system that simplifies the deployment, scaling, and management of containerized applications across clusters of hosts. Kubernetes is an incredible system that groups containers into logical units known as pods. And these pods make managing and discovering your applications very simple. Not to mention, Container Engine for Kubernetes uses Kubernetes versions that are certified as conformant by the Cloud Native Computing Foundation, also abbreviated as CNCF. And here's the icing on the cake. Container Engine for Kubernetes is ISO-compliant. The other two ISO-IEC standards—27001, 27017, and 27018. That's your guarantee of a secure and reliable platform.   03:08 Lois: That’s great. But how do you access all this power? Mahendra: You can define and create your Kubernetes cluster using the intuitive console and the robust rest API. Once your clusters are up and running, you can manage them using the Kubernetes command line, also known as kubectl, the user-friendly Kubernetes dashboard, and the powerful Kubernetes API. 03:32 Nikita: I love the idea of an intuitive console and being able to manage everything from a centralized place. Lois: Yeah, that’s fantastic! Mahendra, can you talk us through the magic that happens behind the scenes? What’s Oracle’s role in all this? Mahendra: All the master nodes or control plane nodes are managed by Oracle. This includes components like etcd, the API server, and the controller manager among others. To ensure reliability, we make sure multiple copies of these master components are distributed across different availability domains. And we don't stop there. We also manage the Kubernetes dashboard and even handle the self-healing mechanism of both the cluster and the worker nodes. All of these are meticulously created and managed within your Oracle tenancy. 04:19 Lois: And what happens at the user’s end? What is their responsibility? Mahendra: At your end, you have the power to manage your worker nodes. Using different compute shapes, you can create and control them in your own user tenancy. So, as you can see, it's a perfect blend of Oracle's expertise and your control. 04:38 Nikita: So, in your opinion, why should users consider OKE their go-to solution for all things Kubernetes? Mahendra: Imagine a world where building and maintaining Kubernetes environments, be it master nodes or worker nodes, is no longer complex, costly, or even time-consuming. OKE is here to make your life easier by seamlessly integrating Kubernetes with various container life cycle management products, which includes container registries, CI/CD frameworks, networking solutions, storage options, and top-notch security features. And speaking of security, OKE gives you the tools you need to manage and control team access to production clusters, ensuring granular access to Kubernetes cluster in a straightforward process. It empowers developers to deploy containers quickly, provides devops teams with visibility and control for seamless Kubernetes management, and brings together Kubernetes container orchestration with Oracle's advanced cloud infrastructure. This results in robust control, top tier security, IAM, and consistent performance. 05:50 Nikita: OK…a lot of benefits! Mahendra, I know there have been ongoing enhancements to the OKE service. So, when creating a new cluster with Container Engine for Kubernetes, what is the cluster type we should specify?  Mahendra: The first type is the basic clusters. Basic clusters support all the core functionality provided by Kubernetes and Container Engine for Kubernetes. Basic clusters come with a service-level objective, but not a financially backed service level agreement. This means that Oracle guarantees a certain level of availability for the basic cluster, but there is no monetary compensation if that level is not met. On the other hand, we have the enhanced clusters. Enhanced clusters support all available features, including features not supported by basic clusters.  06:38 Lois: OK. So, can you tell us more about the features supported by enhanced clusters? Mahendra: As we move towards a more digitized world, the demand for infrastructure continues to rise. However, with virtual nodes, managing the infrastructure of your cluster becomes much simpler. The burden of manually scaling, upgrading, or troubleshooting worker nodes is removed, giving you more time to focus on your applications rather than the underlying infrastructure. Virtual nodes provide a great solution for managing large clusters with a high number of nodes that require frequent updates or scaling. With this feature, you can easily simplify the management of your cluster and focus on what really matters, that is your applications. Managing cluster add-ons can be a daunting task. But with enhanced clusters, you can now deploy and configure them in a more granular way. This means that you can manage both essential add-ons like CoreDNS and kube-proxy as well as a growing portfolio of optional add-ons like the Kubernetes Dashboard.  With enhanced clusters, you have complete control over the add-ons you install or disable, the ability to select specific add-on versions, and the option to opt-in or opt-out of automatic updates by Oracle. You can also manage add-on specific customizations to tailor your cluster to meet the needs of your application. 08:05 Lois: Do users need to worry about deploying add-ons themselves? Mahendra: Oracle manages the lifecycle of add-ons so that you don't have to worry about deploying them yourself. This level of control over add-ons gives you the flexibility to customize your cluster to meet the unique needs of your applications, making managing your cluster a breeze. 08:25 Lois: What about scaling? Mahendra: Scaling your clusters to meet the demands of your workload can be a challenging task. However, with enhanced clusters, you can now provision more worker nodes in a single cluster, allowing you to deploy larger workloads on the same cluster which can lead to better resource utilization and lower operational overhead. Having fewer larger environments to secure, monitor, upgrade, and manage is generally more efficient and can help you save on cost. Remember, there are limits to the number of worker nodes supported on an enhanced cluster, so you should review the Container Engine for Kubernetes limits documentation and consider the additional considerations when defining enhanced clusters with large number of managed nodes.  09:09  Nikita: Ensuring the security of my cluster would be of utmost importance to me, right? How would I do that with enhanced clusters? Mahendra: With enhanced clusters, you can now strengthen cluster security through the use of workload identity. Workload identity enables you to define OCI IAM policies that authorize specific pods to make OCI API calls and access OCI resources. By scoping the policies to Kubernetes service account associated with application pods, you can now allow the applications running inside those pods to directly access the API based on the permissions provided by the policies. 09:48 Nikita: Mahendra, what type of uptime and server availability benefits do enhanced clusters provide? Mahendra: You can now rely on a financially backed service level agreement tied to Kubernetes API server uptime and availability. This means that you can expect a certain level of uptime and availability for your Kubernetes API server, and if it degrades below the stated SLA, you'll receive compensation. This provides an extra level of assurance and helps ensure that your cluster is highly available and performant. 10:20 Lois: Mahendra, do you have any tips for us to remember when creating basic and enhanced clusters? Mahendra: When using the console to create a cluster, a new cluster is created as an enhanced cluster by default unless you explicitly choose to create a basic cluster. If you don't select any enhanced features during cluster creation, you have the option to create the new cluster as a basic cluster. When using CLI or API to create a cluster, you can specify whether to create a basic cluster or an enhanced cluster. If you don't explicitly specify the type of cluster to create, a new cluster is created as a basic cluster by default. Creating a new cluster as an enhanced cluster enables you to easily add enhanced features later even if you didn't select any enhanced features initially. If you do choose to create a new cluster as a basic cluster, you can still choose to upgrade the basic cluster to an enhanced cluster later on. However, you cannot downgrade an enhanced cluster to a basic cluster. These points are really important while you consider selection of a basic cluster or an enhanced cluster for your usage. 11:34 Do you want to stay ahead of the curve in the ever-evolving AI landscape? Look no further than our brand-new OCI Generative AI Professional course and certification. For a limited time only, we’re offering both the course and certification for free! So, don’t miss out on this exclusive opportunity to get certified on Generative AI at no cost. Act fast because this offer is valid only until July 31, 2024. Visit https://education.oracle.com/genai to get started. That’s https://education.oracle.com/genai. 12:13 Nikita: Welcome back! I want to move on to serverless Kubernetes with virtual nodes. But I think before we do that, we first need to have a basic understanding of what managed nodes are.   Mahendra: Managed nodes run on compute instances within your tenancy, and are at least partly managed by you. In the context of Kubernetes, a node is a compute host that can be either a virtual machine or a bare metal host. As you are responsible for managing managed nodes, you have the flexibility to configure them to meet your specific requirements. You are responsible for upgrading Kubernetes on managed nodes and for managing cluster capacity. Nodes are responsible for running a collection of pods or containers, and they are comprised of two system components: the kubelet, which is the host brain, and the container runtime such as CRI-O, or containerd.  13:07 Nikita: Ok… so what are virtual nodes, then? Mahendra: Virtual nodes are fully managed and highly available nodes that look and act like real nodes to Kubernetes. They are built using the open source CNCF Virtual Kubelet Project, which provides the translation layer between OCI and Kubernetes. 13:25 Lois: So, what makes Oracle’s managed virtual Kubernetes product different? Mahendra: OCI is the first major cloud provider to offer a fully managed virtual Kubelet product that provides a serverless Kubernetes experience through virtual nodes. Virtual nodes are configured by customers and are located within a single availability and fault domain within OCI. Virtual nodes have two main components: port management and container instance management. Virtual nodes delegates all the responsibility of managing the lifecycle of pods to virtual Kubernetes while on a managed node, the kubelet is responsible for managing all the lifecycle state. The key distinction of virtual nodes is that they support up to a 1,000 pods per virtual node with the expectation of supporting more in the future. 14:15 Nikita: What are the other benefits of virtual nodes? Mahendra: Virtual nodes offer a fully managed experience where customers don't have to worry about managing the underlying infrastructure of their containerized applications. Virtual nodes simplifies scaling patterns for customers. Customers can scale their containerized application up or down quickly without worrying about the underlying infrastructure, and they can focus solely on their applications. With virtual nodes, customers only pay for the resources that their containerized application use. This allows customers to optimize their costs and ensures that they are not paying for any unused resources. Virtual nodes can support over 10 times the number of pods that a normal node can. This means that customer can run more containerized applications on virtual nodes, which reduces operational burden and makes it easier to scale applications. Customers can leverage container instances in serverless offering from OCI to take advantage of many OCI functionalities natively. These functionalities include strong isolation and ultimate elasticity with respect to compute capacity. 15:26 Lois: When creating a cluster using Container Engine for Kubernetes, we have the flexibility to customize the worker nodes within the cluster, right? Could you tell us more about this customization? Mahendra: This customization includes specifying two key elements. Firstly, you can select the operating system image to be used for worker nodes. This image serves as a template for the worker node's virtual hard drive, and determines the operating system and other software installed. Secondly, you can choose the shape for your worker nodes. The shape defines the number of CPUs and the amount of memory allocated to each instance, ensuring it meets your specific requirements. This customization empowers you to tailor your OKE cluster to your exact needs. It is important to note that you can define and create OKE clusters using both the console and the REST API. This level of control is specially valuable for your development team when building, deploying, and managing cloud native applications.  You have the option to specify whether applications should run on virtual nodes or managed nodes. And Container Engine for Kubernetes efficiently provisions them on Oracle Cloud Infrastructure within your existing OCI tenancy. This flexibility ensures that you can adapt your OKE cluster to suit the specific requirements of your projects and workloads. 16:56 Lois: Thank you so much, Mahendra, for giving us your time today. For more on the topics we discussed, visit mylearn.oracle.com and look for the OCI Container Engine for Kubernetes Specialist course. Join us next week as we dive deeper into working with OKE virtual nodes. Until then, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 17:18 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
17m
18/06/2024

Basics of Kubernetes

In this episode, Lois Houston and Nikita Abraham, along with senior OCI instructor Mahendra Mehra, dive into the fundamentals of Kubernetes. They talk about how Kubernetes tackles challenges in deploying and managing microservices, and enhances software performance, flexibility, and availability.   OCI Container Engine for Kubernetes Specialist: https://mylearn.oracle.com/ou/course/oci-container-engine-for-kubernetes-specialist/134971/210836   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Lois: Hello and welcome to another episode of the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hi everyone! We’ve spent the last two episodes getting familiar with containerization and the Oracle Cloud Infrastructure Registry. Today, it’s going to be all about Kubernetes. So if you've heard of Kubernetes but you don't know what it is, or you've been playing with Docker and containers and want to know how to take it to the next level, you’ll want to stay with us. Lois: That’s right, Niki. We’ll be chatting with Mahendra Mehra, a senior OCI instructor with Oracle University, about the challenges in containerized applications within a complex business setup and how Kubernetes facilitates container orchestration and improves its effectiveness, resulting in better software performance, flexibility, and availability. 01:20 Nikita: Hi Mahendra. To start, can you tell us when you would use Kubernetes?  Mahendra: While deploying and managing microservices in a distributed environment, you may run into issues such as failures or container crashes. Issues such as scheduling containers to specific machines depending upon the configuration. You also might face issues while upgrading or rolling back the applications which you have containerized. Scaling up or scaling down containers across a set of machines can be troublesome.  01:50 Lois: And this is where Kubernetes helps automate the entire process?  Mahendra: Kubernetes is a portable, extensible, open source platform for managing containerized workloads and services that facilitates both declarative configuration and automation.  You can think of a Kubernetes as you would a conductor for an orchestra. Similar to how a conductor would say how many violins are needed, which one play first, and how loud they should play, Kubernetes would say, how many webserver front-end containers or back-end database containers are needed, what they serve, and how many resources are to be dedicated to each one. 02:27 Nikita: That’s so cool! So, how does Kubernetes work?  Mahendra: In Kubernetes, there is a master node, and there are multiple worker nodes. Each worker node can handle multiple pods. Pods are just a bunch of containers clustered together as a working unit. If a worker node goes down, Kubernetes starts new pods on the functioning worker node. 02:47 Lois: So, the benefits of Kubernetes are… Mahendra: Kubernetes can containerize applications of any scale without any downtime. Kubernetes can self-heal containerized applications, making them resilient to unexpected failures.  Kubernetes can autoscale containerized applications as for the workload and ensure optimal utilization of cloud resources. Kubernetes also greatly simplifies the process of deployment operations. With Kubernetes, however complex an operation is, it could be performed reliably by executing a couple of commands at the most. 03:19 Nikita: That’s great. Mahendra, can you tell us a bit about the architecture and main components of Kubernetes? Mahendra: The Kubernetes cluster has two main components. One is the control plane, and one is the data plane. The control plane hosts the components used to manage the Kubernetes cluster. And the data plane basically hosts all the worker nodes that can be virtual machines or physical machines. These worker nodes basically host pods which run one or more containers. The containers running within these pods are making use of Docker images, which are managed within the image registry. In case of OCI, it is the container registry. 03:54 Lois: Mahendra, you mentioned nodes and pods. What are nodes? Mahendra: It is the smallest unit of computing hardware within the Kubernetes. Its work is to encapsulate one or more applications as containers. A node is a worker machine that has a container runtime environment within it. 04:10 Lois: And pods? Mahendra: A pod is a basic object of Kubernetes, and it is in charge of encapsulating containers, storage resources, and network IPs. One pod represents one instance of an application within Kubernetes. And these pods are launched in a Kubernetes cluster, which is composed of nodes. This means that a pod runs on a node but can easily be instantiated on another node. 04:32 Nikita: Can you run multiple containers within a pod? Mahendra: A pod can even contain more than one container if these containers are relatively tightly coupled. Pod is usually meant to run one application container inside of it, but you can run multiple containers inside one pod. Usually, it is only the case if you have one main application container and a helper container or some sidecar containers that has to run inside of that pod. Every pod is assigned a unique private IP address, using which the pods can communicate with one another. Pods are meant to be ephemeral, which means they die easily. And if they do, upon re-creation, they are assigned a new private IP address. In fact, Kubernetes can scale a number of these pods to adapt for the incoming traffic, consequently creating or deleting pods on demand. Kubernetes guarantees the availability of pods and replicas specified, but not the liveliness of each individual pod. This means that other pods that need to communicate with this application or component cannot rely on the underlying individual pod's IP address. 05:35 Lois: So, how does Kubernetes manage traffic to this indecisive number of pods with changing IP addresses? Mahendra: This is where another component of Kubernetes called services comes in as a solution. A service gets allocated a virtual IP address and lives until explicitly destroyed. Requests to the services get redirected to the appropriate pods, thus the services of a stable endpoint used for inter-component or application communication. And the best part here is that the lifecycle of service and the pods are not connected. So even if the pod dies, the service and the IP address will stay, so you don't have to change their endpoints anymore. 06:13 Nikita: What types of services can you create with Kubernetes? Mahendra: There are two types of services that you can create. The external service is created to allow external users to connect the containerized applications within the pod. Internal services can also be created that restrict the communication within the cluster. Services can be exposed in different ways by specifying a particular type. 06:33 Nikita: And how do you define these services? Mahendra: There are three types in which you can define services. The first one is the ClusterIP, which is the default service type that exposes services on an internal IP within the cluster. This type makes the service only reachable from within the cluster. You can specify the type of service as NodePort. NodePort basically exposes the service on the same port of each selected node in the cluster using a network address translation and makes the service accessible from the outside of the cluster using the node IP and the NodePort combination. This is basically a superset of ClusterIP. You can also go for a LoadBalancer type, which basically creates an external load balancer in the current cloud. OCI supports LoadBalancer types. It also assigns a fixed external IP to the service. And the LoadBalancer type is a superset of NodePort. 07:25 Lois: There’s another component called ingress, right? When do you used that? Mahendra: An ingress is used when we have multiple services on our cluster, and we want the user requests routed to the services based on their pod, and also, if you want to talk to your application with a secure protocol and a domain name. Unlike NodePort or LoadBalancer, ingress is not actually a type of service. Instead, it is an entry point that sits in front of the multiple services within the cluster. It can be defined as a collection of routing rules that govern how external users access services running inside a Kubernetes cluster. Ingress is most useful if you want to expose multiple services under the same IP address, and these services all use the same Layer 7 protocol, typically HTTP. 08:10 Lois: Mahendra, what about deployments in Kubernetes?  Mahendra: A deployment is an object in Kubernetes that lets you manage a set of identical pods. Without a deployment, you will need to create, update, and delete a bunch of pods manually. With the deployment, you declare a single object in a YAML file, and the object is responsible for creating the pods, making sure they stay up-to-date and ensuring there are enough of them running. You can also easily autoscale your applications using a Kubernetes deployment. In a nutshell, the Kubernetes deployment object lets you deploy a replica set of your pods, update the pods and the replica sets. It also allows you to roll back to your previous deployment versions. It helps you scale a deployment. It also lets you pause or continue a deployment. 08:59 Do you want to stay ahead of the curve in the ever-evolving AI landscape? Look no further than our brand-new OCI Generative AI Professional course and certification. For a limited time only, we’re offering both the course and certification for free! So, don’t miss out on this exclusive opportunity to get certified on Generative AI at no cost. Act fast because this offer is valid only until July 31, 2024. Visit https://education.oracle.com/genai to get started. That’s https://education.oracle.com/genai. 09:37 Nikita: Welcome back! We were talking about how useful a Kubernetes deployment is in scaling operations. Mahendra, how do pods communicate with each other?  Mahendra: Pods communicate with each other using a service. For example, my application has a database endpoint. Let's say it's a MySQL service that it uses to communicate with the database. But where do you configure this database URL or endpoints? Usually, you would do it in the application properties file or as some kind of an external environment variable. But usually, it's inside the build image of the application. So for example, if the endpoint of the service or the service name, in this case, changes to something else, you would have to adjust the URL in the application. And this will cause you to rebuild the entire application with a new version, and you will have to push it to the repository. You'll then have to pull that new image into your pod and restart the whole thing. For a small change like database URL, this is a bit tedious. So for that purpose, Kubernetes has a component called ConfigMap. ConfigMap is a Kubernetes object that maintains a key value store that can easily be used by other Kubernetes objects, such as pods, deployments, and services. Thus, you can define a ConfigMap composed of all the specific variables for your environment. In Kubernetes, now you just need to connect your pod to the ConfigMap, and the pod will read all the new changes that you have specified within the ConfigMap, which means you don't have to go on to build a new image every time a configuration changes. 11:07 Lois: So then, I’m just wondering, if we have a ConfigMap to manage all the environment variables and URLs, should we be passing our username and password in the same file? Mahendra: The answer is no. Password or other credentials within a ConfigMap in a plain text format would be insecure, even though it's an external configuration. So for this purpose, Kubernetes has another component called secret. Kubernetes secrets are secure objects which store sensitive data, such as passwords, OAuth tokens, and SSH keys with the encryption within your cluster. Using secrets gives you more flexibility in a pod lifecycle definition and control over how sensitive data is used. It reduces the risk of exposing the data to unauthorized users. 11:50 Nikita: So, you’re saying that the secret is just like ConfigMap or is there a difference? Mahendra: Secret is just like ConfigMap, but the difference is that it is used to store secret data credentials, for example, database username and passwords, and it's stored in the base64 encoded format. The kubelet service stores this secret into a temporary file system. 12:11 Lois: Mahendra, how does data storage work within Kubernetes? Mahendra: So let's say we have this database pod that our application uses, and it has some data or generates some data. What happens when the database container or the pod gets restarted? Ideally, the data would be gone, and that's problematic and inconvenient, obviously, because you want your database data or log data to be persisted reliably for long term. To achieve this, Kubernetes has a solution called volumes. A Kubernetes volume basically is a directory that contains data accessible to containers in a given pod within the Kubernetes platform. Volumes provide a plug-in mechanism to connect ephemeral containers with persistent data stores elsewhere. The data within a volume will outlast the containers running within the pod. Containers can shut down and restart because they are ephemeral units. Data remains saved in the volume even if a container crashes because a container crash is not enough to cut off a pod from a node. 13:10 Nikita: Another main component of Kubernetes is a StatefulSet, right? What can you tell us about it?  Mahendra: Stateful applications are applications that store data and keep tracking it. All databases such as MySQL, Oracle, and PostgreSQL are examples of Stateful applications. In a modern web application, we see stateless applications connecting with Stateful application to serve the user request. For example, a Node.js application is a stateless application that receives new data on each request from the user. This application is then connected with a Stateful application, such as MySQL database, to process the data. MySQL stores the data and keeps updating the database on the user's request.  Now, assume you deployed a MySQL database in the Kubernetes cluster and scaled this to another replica, and a frontend application wants to access the MySQL cluster to read and write data. The read request will be forwarded to both these pods. However, the write request will only be forwarded to the first primary pod. And the data will be synchronized with other pods. You can achieve this by using the StatefulSets. Deleting or scaling down a StatefulSet will not delete the volumes associated with the Stateful applications. This gives you your data safety. If you delete the MySQL pod or if the MySQL pod restarts, you can have access to the data in the same volume.  So overall, a StatefulSet is a good fit for those applications that require unique network identifiers; stable persistent storage; ordered, graceful deployment and scaling; as well as ordered, automatic rolling updates. 14:43 Lois: Before we wrap up, I want to ask you about the features of Kubernetes. I’m sure there are countless, but can you tell us the most important ones? Mahendra: Health checks are used to check the container's readiness and liveness status. Readiness probes are intended to let Kubernetes know if the app is ready to serve the traffic.  Networking plays a significant role in container orchestration to isolate independent containers, connect coupled containers, and provide access to containers from the external clients. Service discovery allows containers to discover other containers and establish connections to them. Load balancing is a dedicated service that knows which replicas are running and provides an endpoint that is exposed to the clients. Logging allows us to oversee the application behavior.  The rolling update allows you to update a deployed containerized application with minimal downtime using different update scenarios. The typical way to update such an application is to provide new images for its containers. Containers, in a production environment, can grow from few to many in no time. Kubernetes makes managing multiple containers an easy task. And lastly, resource usage monitoring-- resources such as CPU and RAM must be monitored within the Kubernetes environment. Kubernetes resource usage looks at the amount of resources that are utilized by a container or port within the Kubernetes environment. It is very important to keep an eye on the resource usage of the pods and containers as more usage translates to more cost. 16:18 Nikita: I think we can wind up our episode with that. Thank you, Mahendra, for joining us today. Kubernetes sure can be challenging to work with, but we covered a lot of ground in this episode.  Lois: That’s right, Niki! If you want to learn more about the rich features Kubernetes offers, visit mylearn.oracle.com and search for the OCI Container Engine for Kubernetes Specialist course. Remember, all the training is free, so you can dive right in! Join us next week when we'll take a look at the fundamentals of Oracle Cloud Infrastructure Container Engine for Kubernetes. Until then, Lois Houston… Nikita: And Nikita Abraham, signing off! 16:57 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
17m
11/06/2024

Oracle Cloud Infrastructure Registry

In this episode, hosts Lois Houston and Nikita Abraham, along with senior OCI instructor Mahendra Mehra, discuss how Oracle Cloud Infrastructure Registry simplifies the development-to-production workflow for developers.   Listen to Mahendra explain important container registry concepts, such as images, repositories, image tags, and image paths, as well as how they relate to each other.   OCI Container Engine for Kubernetes Specialist: https://mylearn.oracle.com/ou/course/oci-container-engine-for-kubernetes-specialist/134971/210836   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Nikita: Hello and welcome to the Oracle University Podcast. I’m Nikita Abraham, Principal Technical Editor with Oracle University, and I’m joined by Lois Houston, Director of Innovation Programs. Lois: Hi there! This is our second episode on OCI Container Engine for Kubernetes, and today we’re going to spend time discussing container registries with our colleague and senior OCI instructor, Mahendra Mehra. Nikita: We’ll talk about how you can become proficient in managing Oracle Cloud Infrastructure Registry, a vital component in your container workflow.  00:58 Lois: Hi Mahendra, can you explain what Oracle Cloud Infrastructure Registry, or OCIR, is and how it simplifies the container image management process? Mahendra: OCIR is an Oracle-managed registry designed to simplify the development-to-production workflow for developers. It offers a range of functionalities, serving as a private docker registry for internal use where developers can easily store, share, and manage container images.  The strength of OCIR lies in its high available and scalable architecture. Leveraging OCI to ensure reliable deployment of applications, developers can use OCIR not only as a private registry but also as a public registry, facilitating the pulling of images from public repositories for users with internet access. 01:55 Lois: But what sets OCIR apart? Mahendra: What sets OCIR apart is its compliance with the Open Container Initiative standards, allowing the storage of container images conforming to the OCI specifications. It goes a step further by supporting manifest lists, sometimes known as multi-architecture images, accommodating diverse architectures like ARM and AMD64. Additionally, OCIR extends its support to Helm charts. Security is a priority with OCIR, offering private access through a service gateway. This means that OCI resources within a VCN in the same region can securely access OCIR without exposing them to the public internet. 02:46 Nikita: OK. What are some other key advantages of OCIR?  Mahendra: Firstly, OCIR seamlessly integrates with the Container Engine for Kubernetes, ensuring a cohesive container management experience. In terms of security, OCIR provides flexibility by allowing registries to be either private or public, giving administrators control over accessibility. It is intricately integrated with IAM, offering straightforward authentication through OCI Identity. Another notable benefit is regional availability. You can efficiently pull container images from the same region as your deployments. For high-performance, availability, and low-latency image operations, OCIR leverages the robust infrastructure of OCI, enhancing the overall reliability of image push and pull operations. OCIR ensures anywhere access, allowing you to utilize container CLI for image operations from various locations, be it on the cloud, on-premises, or even from personal laptops.  03:57 Lois: I believe OCIR has repository quotas? Is there a cap on them?  Mahendra: In each enabled region for your tenancy, you can establish up to 500 repositories with a cumulative storage limit of 500 GB. Each repository is capable of holding up to 100,000 images. Importantly, charges apply only for stored images. 04:21 Nikita: That’s good to know, Mahendra. I want to move on to basic container registry concepts. Maybe we can start with what an image is. Mahendra: Image is basically a read-only template with instructions for creating a container. It holds the application that you want to run as a container, along with any dependencies that are required. Container registry is an Open Container Initiative-compliant registry. As a result, you can store any artifacts that conform to Open Container Initiative specifications, such as Docker images, manifest lists, sometimes also known as multi-architecture images, and Helm charts. 05:02 Lois: And what’s a repository then? Mahendra: It's a meaningfully named collection of related images which are grouped together for convenience in a container registry. There are different versions of the same source image, which are grouped together into the same repository.  You can have multiple images stored under this repository. The only thing that you need to keep changing is the image version. Every image version is given a tag. And the tag uniquely identifies the image. 05:33 Lois: Is it possible to make the repository public or private? Mahendra: Depending upon your need, a repository can be made private or public. One important thing to note is that the user needs to have an OCI username and authentication token before being able to push/pull an image from the OCIR. 05:52 Nikita: There are so many terms that you come across when working with repositories and container registry, right? Could you take us through them and explain  how they relate to each other? I’ve heard of the region key and tenancy namespace. Mahendra: The region key identifies the container registry region that you are using. A tenancy namespace is an auto-generated random and immutable string of alphanumeric characters. The tenancy namespace can be retrieved from the value of your object storage namespace field. Repository name is the name of a repository in container registry, to and from which you can push and pull images. Repository names can include one or more slash characters and are unique across all the compartments in the entire tenancy. You should note that although a repository name can include slash characters, the slash does not represent a hierarchical directory structure. It is simply one character in the string of characters. As a convenience, you might choose to start the name of different repositories with the same string. A registry identifier is the combination of your container registry region key and the tenancy namespace.  07:07 Lois: What about an image tag and an image path? How do they differ from each other? Mahendra: A tag or an image tag is a string used to refer to a particular image in a known registry. The term "image name" is sometimes used as a shorthand way to refer to a particular image in a particular repository. A tag can be a numerical value or it can be a string. An image path is a fully qualified path to a particular image in a registry. It extends the repository path by adding tags associated with the image.  07:46 Do you want to stay ahead of the curve in the ever-evolving AI landscape? Look no further than our brand-new OCI Generative AI Professional course and certification. For a limited time only, we’re offering both the course and certification for free. So, don’t miss out on this exclusive opportunity to get certified on Generative AI at no cost. Act fast because this offer is valid only until July 31, 2024. Visit https://education.oracle.com/genai to get started. That’s https://education.oracle.com/genai. 08:24 Nikita: Welcome back! Mahendra, from what you’ve told us, OCIR seems like such a pivotal tool for modern containerized workflows, with its seamless integration, robust security measures, regional accessibility, efficient image management. So, how do we actually manage OCIR?  Mahendra: Managing OCIR can be done in three ways. Starting with managing the repository itself, followed by managing the images within the repository, and, last but not the least, managing the overall security of your repository alongside the images. 08:58 Nikita: Can we dive into each of these approaches in a little more detail? How does managing the repository itself work? Mahendra: You can create an empty repository in a compartment and give it a name that's unique across all the compartments in the entire tenancy. There is a limit to the number of repositories you can have in a given region in a tenancy. So, when you no longer need a repository, it makes sense to delete it from the Oracle Cloud Infrastructure registry. Make a note that when you delete a repository, it can take up to 48 hours for the deletion to take effect and for the storage to actually be released. When you create a new repository in Oracle Cloud Infrastructure Registry, you specify the compartment in which you want to create it. Having created the repository in one compartment, you can subsequently move it to a different compartment. The reasons can be many. It can be to change the users who are authorized to use the repository or to change how the billing for a repository is charged. 09:52 Lois: OK. And what about managing images within the repository?  Mahendra: You can view the images stored on OCIR using the OCI Console or using Docker images command from your Docker client after logging in to the OCIR repo. To push an image, you first used the Docker tag command to create a copy of the local source image as a new image. As a name for the new image, you specify the fully-qualified path to the target location in your container registry where you want to push the image, including the name of a repository. In order to pull an image, you must be logged in into the OCIR registry using the auth token and use the Docker pull command followed by a fully-qualified name of the image you wish to download on your Docker client. 10:36 Nikita: What happens when you no longer need an old image or you simply want to clean up the list of image tags in a repository? Mahendra: You can delete images from the Oracle Cloud Infrastructure Registry. You can undelete an image you've previously deleted for up to 48 hours after you deleted it. After that time, the image is permanently removed from the container registry. You can set up image retention policies to automatically delete images that meet particular selection criteria. 11:02 Lois: What sort of selection criteria? Mahendra: Criterias can be images that have not been pulled for a certain number of days or images that have not been tagged for a certain number of days. It can also be images that have not been given particular Docker tags specified as exempt from the automatic deletion. There's an hourly process that checks images against the selection criteria, and any that meet the selection criteria are automatically deleted. In each region in a tenancy, there's a global image retention policy. The default criteria of the policy is to retain all images so that no images are automatically deleted. However, you can change the global image retention policy so that the images are deleted if they meet certain criteria that you specify. A region's global image retention policy applies to all the repository within that region unless it is explicitly overridden by one or more custom image retention policies. Only one custom image retention policy at a time can be applied to a repository. If a repository has already been added to a custom retention policy and you want to add repository to a different custom retention policy, you have to remove the policy from the first retention policy before adding it to the second one. 12:15 Lois: Mahendra, what should we keep in mind when we’re dealing with the global image retention policy? Mahendra: The global image retention policy are specific to a particular region. To delete images consistently in different regions in your tenancy, you need to set up image retention policies in each region with identical selection criteria.  If you want to prevent images from being deleted on the basis of Docker tags they've been given, you need to specify those tags as exempt in a comma-separated list. When you want to clean up the list of images in a repository without actually deleting the images, you can remove the tags from the images in OCIR. Removing images is referred to as untagging. 12:53 Nikita: OK…and the last approach was managing the overall security of your repository alongside the images, right?  Mahendra: While managing security, you are given fine grained control over the operations that users are allowed to perform on repositories within the Container Registry. Using the concept of users and groups, you can control repository access by setting up identity access management policies at the tenancy and at the compartment level.  You can write policies to allow inspect, read, use, and manage operations on the repository based on the requirements. You can set up Oracle Cloud Infrastructure Registry to scan images in a repository for security vulnerabilities published in the publicly available common vulnerabilities and exposure databases. To perform image scanning, container registry makes use of the Oracle Cloud Infrastructure vulnerability-scanning service and vulnerability scanning REST API. 13:46 Nikita: What do I need to have in place before I can push and pull Docker images to and from Oracle Cloud Infrastructure Registry? Mahendra: The first thing is, your tenancy must be subscribed to one or more of the regions in which the container registry is available. You can check the same within the Oracle documentation. The next thing is, you need to have access to the Docker command line interface to push and pull images on your local machine. The third thing is, users must belong to a group to which a policy grants the appropriate permission or belong to a tenancies administrator group, which by default have access permissions on the container registry. Lastly, user must already have an Oracle Cloud Infrastructure username and an authentication token, which enables them to perform operations on the container registry. 14:29 Lois: Thank you, Mahendra, for sharing your insights on OCIR with us. To watch demos on managing OCIR, visit mylearn.oracle.com and search for the OCI Container Engine for Kubernetes Specialist course. Nikita: Mahendra will be back next week to walk us through the basics of Kubernetes. Until then, this is Nikita Abraham… Lois: And Lois Houston, signing off! 14:53 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
15m
04/06/2024

What is Containerization?

Welcome to a new season of the Oracle University Podcast, where we delve deep into the world of OCI Container Engine for Kubernetes. Join hosts Lois Houston and Nikita Abraham as they ask senior OCI instructor Mahendra Mehra about the transformative power of containers in application deployment and why they're so crucial in today's software ecosystem.   Uncover key differences between virtualization and containerization, and gain insights into Docker components and commands.   Getting Started with Oracle Cloud Infrastructure: https://oracleuniversitypodcast.libsyn.com/getting-started-with-oracle-cloud-infrastructure-1   Networking in OCI: https://oracleuniversitypodcast.libsyn.com/networking-in-oci   OCI Identity and Access Management: https://oracleuniversitypodcast.libsyn.com/oci-identity-and-access-management   OCI Container Engine for Kubernetes Specialist: https://mylearn.oracle.com/ou/course/oci-container-engine-for-kubernetes-specialist/134971/210836   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode.   ---------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Lois: Hello and welcome to the Oracle University Podcast! I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor.  Nikita: Hi everyone! Welcome to a new season of the Oracle University Podcast. This time around, we’re going to delve into the world of OCI Container Engine for Kubernetes, or OKE. For the next couple of weeks, we’ll cover key aspects of OKE to help you create, manage, and optimize Kubernetes clusters in Oracle Cloud Infrastructure. 00:58 Lois: So, whether you’re a cloud native developer, Kubernetes administrator and developer, a DevOps engineer, or site reliability engineer who wants to enhance your expertise in leveraging the OCI OKE service for cloud native application solutions, you’ll want to tune in to these episodes for sure. And if that doesn’t sound like you, I’ll bet you will find the season interesting even if you’re just looking for a deep dive into this service. Nikita: That’s right, Lois. In today’s episode, we’ll focus on concepts of containerization, laying the foundation for your journey into the world of containers. And taking us through all this is Mahendra Mehra, a senior OCI instructor with Oracle University. 01:38 Lois: Hi Mahendra! We’re so glad to start our look at containerization with you today. Could you give us an overview? Why is it important in today's software world? Mahendra: Containerization is a form of virtualization, operates by running applications in isolated user spaces known as containers.  All these containers share the same underlying operating system. The container engine, pivotal in containerization technologies and container orchestration platforms, serves as the container runtime environment. It effectively manages the creation, deployment, and execution of containers. 02:18 Lois: Can you simplify this for a novice like me, maybe by giving us an analogy?  Mahendra: Imagine a container as a fully packaged and portable computing environment. It's like a digital suitcase that holds everything an application needs to run—binaries, libraries, configuration files, dependencies, you name it. And the best part, it's all encapsulated and isolated within container. 02:46 Nikita: Mahendra, how is containerization making our lives easier today?  Mahendra: In olden days, running an application meant matching it with your machine's operating system. For example, Windows software required a Windows machine. However, containerization has rewritten this narrative. Now, it's ancient history. With containerization, you create a single software package, a container that gracefully runs on any device or operating systems. What's fascinating is that these containers seamlessly run while sharing the host operating system. The container engine is like a shadow abstracted from the host operating system with limited access to underlying resources. Think of it as a super lightweight virtual machine. The beauty of this, the containerized application becomes a globetrotter, seamlessly running on bare metal within VMs or on the cloud platforms without needing tweaks for each environment. 03:52 Nikita: How is containerization different from traditional virtualization? Mahendra: On one side, we have traditional virtualization. It’s like having multiple houses on a single piece of land, and each house or virtual machine has its complete setup—wall, roofs, and utilities. This setup, while providing isolation, can be resource-intensive with each virtual machine carrying its entire operating system. Now, let's shift gears to containerization, the modern day superhero. Imagine a high-rise building where each floor represents a container. These containers share the same building or host operating system, but have their private space or isolated user space. Here's the magic. They are super lightweight, don't carry extra baggage of a full operating system and can swiftly move between different floors. 04:50 Lois: Ok, gotcha. That sounds pretty efficient! So, what are the direct benefits of containerization?  Mahendra: With containerization technology, there's less overhead during startup and no need to set up a separate guest OS for each application since they all share the same OS kernel. Because of this high efficiency, containerization is commonly used for packing up the many individual microservices that make up modern applications. Containerization unfolds a spectrum of benefits, delivering unparalleled portability as containers run uniformly across diverse platforms. This agility, fostered by open source container engines, empowers developers with cross-platform flexibility. The speed of containerized applications known for their lightweight nature reduces cost, boosts efficiency, and accelerates start times. Fault isolation ensures robustness, allowing independent operations without affecting others. Efficiency thrives as containers share the OS kernel and reusable layers, optimizing server utilization. The ease of management is achieved through orchestration platforms like Kubernetes automating essential tasks. Security remains paramount as container isolation and defined permissions fortify the infrastructure against malicious threats. Containerization emerges not just as a technology but as a transformative force, redefining how we build, deploy, and manage applications in the digital landscape. 06:37 Lois: It sure makes deployment efficient, scalability, and seamless! Mahendra, various components of Docker architecture work together to achieve containerization goals, right? Can you walk us through them? Mahendra: A developer or a DevOps professional communicates with Docker engine through the Docker client, which may be run on the same computer as Docker engine in case of development environments or through a remote shell. So whenever a developer fires a Docker command, the client sends them to the Docker Daemon which carries them out. The communication between the Docker client and the Docker host is usually taken place through REST APIs. The Docker clients can communicate with more than one Daemon at a time.  Docker Daemon is a persistent background process that manages Docker images, containers, networks, and storage volumes. The Docker Daemon constantly listens to the Docker API request from the Docker clients and processes them. Docker registries are services that provide locations from where you can store and download Docker images. In other words, a Docker registry contains repositories that host one or more Docker images. Public registries include Docker Hub and Docker Cloud and private registries can also be used. Oracle Cloud Infrastructure offers you services like OCIR, which is also called a container registry, where you can host your own private or public registry. 08:02 Do you want to stay ahead of the curve in the ever-evolving AI landscape? Look no further than our brand-new OCI Generative AI Professional course and certification. For a limited time only, we’re offering both the course and certification for free. So, don’t miss out on this exclusive opportunity to get certified on Generative AI at no cost. Act fast because this offer is valid only until July 31, 2024. Visit https://education.oracle.com/genai to get started. That’s https://education.oracle.com/genai. 08:39 Nikita: Welcome back! Mahendra, I’m wondering how virtual machines are different from containers. How do virtual machines work? Mahendra: A hypervisor or a virtual machine monitor is a software, firmware, or hardware that creates and runs virtual machines. It is placed between the hardware and the virtual machines, and is necessary to virtualize the server. Within each virtual machine runs a unique guest operating system. VMs with different operating systems can run on the same physical server. A Linux VM can sit alongside a Windows VM and so on. Each VM has its own binaries, libraries, and application that it services. And the VM may be many gigabytes in size. 09:22 Lois: What kind of benefits do we see from virtual machines? Mahendra: This technique provides a variety of benefits like the ability to consolidate applications into a single system, cost savings through reduced footprints, and faster server provisioning. But this approach has its own drawbacks. Each VM includes a separate operating system image, which adds overhead in memory and storage footprint. As it turns out, this issue adds complexity to all the stages of software development lifecycle, from development and test to production and disaster recovery as well. It also severely limits the portability of applications between different cloud providers and traditional data centers. And this is where containers come to the rescue.  10:05 Lois: OK…how do containers help in this situation? Mahendra: Containers sit on top of a physical server and its host operating system—typically, Linux or Windows. Each container shares the host OS kernel and usually the binaries and libraries as well. But the shared components are read only. Sharing OS resources such as libraries significantly reduces the need to reproduce the operating system code. A server can run multiple workloads with a single operating system installation. Containers are thus exceptionally lightweight. They are only megabytes in size and take just seconds to start. What this means in practice is you can put two or three times as many applications on a single server with containers than you can put on a virtual machine. Compared to containers, virtual machines take minutes to run and are order of magnitude larger than an equivalent container measured in gigabytes versus megabytes. 11:01 Nikita: So then, is there ever a time you should use a virtual machine? Mahendra: You should use a virtual machine when you want to run applications that specifically require a new OS, also when isolation and security are your priority over everything else. In most scenarios, a container will provide a lighter, faster, and more cost-effective solution than the virtual machines. 11:22 Lois: Now that we’ve discussed containerization and the different Docker components, can you tell us more about working with Docker images? We first need to know what a Dockerfile is, right?  Mahendra: A Dockerfile is a text file that defines a Docker image. You’ll use a Dockerfile to create your own custom Docker image. In other words, you use it to define your custom environment to be used in a Docker container. You'll want to create your own Dockerfile when existing images won't meet your project needs to different runtime requirements, which means that learning about Docker files is an essential part of working with Docker. Dockerfile is a step-by-step definition of building up a Docker image. It provides a set of standard instructions to be used in Dockerfile that Docker will execute when you issue a Docker build command. 12:09 Nikita: Before we wrap up, can you walk us through some Docker commands? Mahendra: Every Dockerfile must start with a FROM instruction. The idea behind this is that you need a starting point to build your image. It can be from scratch or from an existing image available in the Docker registry.  The RUN command is used to execute a command and will wait till the command finishes its execution. Since most of the images are Linux-based, a good practice is to set up a directory you will work in. That's the purpose of work directory line. It defines a directory and moves you in. The COPY instruction helps you to copy your source code into the image. ENV provides default values for variables that can be accessed within the containers. If your app needs to be reached from outside the container, you must open its listening port using the EXPOSE command. Once your application is ready to run, the last thing to do is to specify how to execute it. You must add the CMD line with the same command with all the arguments you used locally to launch your application. This command can also be used to execute commands at runtime for the containers, but we can be more flexible using the ENTRYPOINT command. Labels are used in Dockerfile to help organize your Docker images.   13:20 Lois: Thank you, Mahendra, for joining us today. I learned a lot! And if you want to learn more about working with Docker images, go to mylearn.oracle.com and search for the OCI Container Engine for Kubernetes Specialist course. The course is free so you can get started right away. Nikita: Yeah, a fundamental understanding of core OCI services, like Identity and Access Management, networking, compute, storage, and security, is a prerequisite to the course and will certainly serve you well when leveraging the OCI OKE service. And the quickest way to gain this knowledge is by completing the OCI Foundations Associate learning path on MyLearn and getting certified. You can also listen to episodes from our first season, called OCI Made Easy, where we discussed these topics. We’ll put a few links in the show notes so you can easily find them.  Lois: We’re looking forward to having Mahendra join us again next week when we’ll talk about container registries. Until next time, this is Lois Houston… Nikita: And Nikita Abraham signing off! 14:24 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
14m
28/05/2024

Encore Episode: OCI AI Services

Listen to Lois Houston and Nikita Abraham, along with Senior Principal Product Manager Wes Prichard, as they explore the five core components of OCI AI services: language, speech, vision, document understanding, and anomaly detection, to help you make better sense of all that unstructured data around you.   Oracle MyLearn: https://mylearn.oracle.com/ou/learning-path/become-an-oci-ai-foundations-associate-2023/127177   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Himanshu Raj, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 The world of artificial intelligence is vast and everchanging. And with all the buzz around it lately, we figured it was the perfect time to revisit our AI Made Easy series. Join us over the next few weeks as we chat about all things AI, helping you to discover its endless possibilities. Ready to dive in? Let’s go! 00:33 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:46 Nikita: Welcome to the Oracle University Podcast! I’m Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! In our last episode, we spoke about OCI AI Portfolio, including AI and ML services, and the OCI AI infrastructure. Nikita: Yeah, and in today’s episode, we’re going to continue down a similar path and take a closer look at OCI AI services. 01:16 Lois: With us today is Senior Principal Product Manager, Wes Prichard. Hi Wes! It’s lovely to have you here with us. Hemant gave us a broad overview of the various OCI AI services last week, but we’re really hoping to get into each of them with you. So, let’s jump right in and start with the OCI Language service. What can you tell us about it? Wes: OCI Language analyzes unstructured text for you. It provides models trained on industry data to perform language analysis with no data science experience needed.  01:48 Nikita: What kind of big things can it do? Wes: It has five main capabilities. First, it detects the language of the text. It recognizes 75 languages, from Afrikaans to Welsh.  It identifies entities, things like names, places, dates, emails, currency, organizations, phone numbers--14 types in all. It identifies the sentiment of the text, and not just one sentiment for the entire block of text, but the different sentiments for different aspects.  02:17 Nikita: What do you mean by that, Wes? Wes: So let's say you read a restaurant review that said, the food was great, but the service sucked. You'll get food with a positive sentiment and service with a negative sentiment. And it also analyzes the sentiment for every sentence.  Lois: Ah, that’s smart. Ok, so we covered three capabilities. What else? Wes: It identifies key phrases in the text that represent the important ideas or subjects. And it classifies the general topic of the text from a list of 600 categories and subcategories.  02:48 Lois: Ok, and then there’s the OCI Speech service...  Wes: OCI Speech is very straightforward. It locks the data in audio tracks by converting speech to text. Developers can use Oracle's time-tested acoustic language models to provide highly accurate transcription for audio or video files across multiple languages.  OCI Speech automatically transcribes audio and video files into text using advanced deep learning techniques. There's no data science experience required. It processes data directly in object storage. And it generates timestamped, grammatically accurate transcriptions.  03:22 Nikita: What are some of the main features of OCI Speech? Wes: OCI Speech supports multiple languages, specifically English, Spanish, and Portuguese, with more coming in the future. It has batching support where multiple files can be submitted with a single call. It has blazing fast processing. It can transcribe hours of audio in less than 10 minutes. It does this by chunking up your audio into smaller segments, and transcribing each segment, and then joining them all back together into a single file. It provides a confidence score, both per word and per transcription. It punctuates transcriptions to make the text more readable and to allow downstream systems to process the text with less friction.  And it has SRT file support.  04:06 Lois: SRT? What’s that? Wes: SRT is the most popular closed caption output file format. And with this SRT support, users can add closed captions to their video. OCI Speech makes transcribed text more readable to resemble how humans write. This is called normalization. And the service will normalize things like addresses, times, numbers, URLs, and more.  It also does profanity filtering, where it can either remove, mask, or tag profanity and output text, where removing replaces the word with asterisks, and masking does the same thing, but it retains the first letter, and tagging will leave the word in place, but it provides tagging in the output data.  04:49 Nikita: And what about OCI Vision? What are its capabilities? Wes: Vision is a computed vision service that works on images, and it provides two main capabilities-- image analysis and document AI. Image analysis analyzes photographic images. Object detection is the feature that detects objects inside an image using a bounding box and assigning a label to each object with an accuracy percentage. Object detection also locates and extracts text that appears in the scene, like on a sign.  Image classification will assign classification labels to the image by identifying the major features in the scene. One of the most powerful capabilities of image analysis is that, in addition to pretrained models, users can retrain the models with their own unique data to fit their specific needs.  05:40 Lois: So object detection and image classification are features of image analysis. I think I got it! So then what’s document AI?  Wes: It's used for working with document images. You can use it to understand PDFs or document image types, like JPEG, PNG, and Tiff, or photographs containing textual information.  06:01 Lois: And what are its most important features? Wes: The features of document AI are text recognition, also known as OCR or optical character recognition.  And this extracts text from images, including non-trivial scenarios, like handwritten texts, plus tilted, shaded, or rotated documents. Document classification classifies documents into 10 different types based on visual appearance, high-level features, and extracted keywords. This is useful when you need to process a document, based on its classification, like an invoice, a receipt, or a resume.  Language detection analyzes the visual features of text to determine the language rather than relying on the text itself. Table extraction identifies tables in docs and extracts their content in tabular form. Key value extraction finds values for 13 common fields and line items in receipts, things like merchant name and transaction date.  07:02 Want to get the inside scoop on Oracle University? Head over to the Oracle University Learning Community. Attend exclusive events. Read up on the latest news. Get first-hand access to new products. Read the OU Learning Blog. Participate in Challenges. And stay up-to-date with upcoming certification opportunities. Visit mylearn.oracle.com to get started.  07:27 Nikita: Welcome back! Wes, I want to ask you about OCI Anomaly Detection. We discussed it a bit last week and it seems like such an intelligent and efficient service. Wes: Oracle Cloud Infrastructure Anomaly Detection identifies anomalies in time series data. Equipment sensors generate time series data, but all kinds of business metrics are also time-based. The unique feature of this service is that it finds anomalies, not just in a single signal, but across many signals at once. That's important because machines often generate multiple signals at once and the signals are often related.  08:03 Nikita: Ok you need to give us an example of this! Wes: Think of a pump that has an output pressure, a flow rate, an RPM, and an electrical current draw. When a pump's going to fail, anomalies may appear across several of those signals but at different times. OCI Anomaly Detection helps you to identify anomalies in a multivariate data set by taking advantage of the interrelationship among signals.  The service contains algorithms for both multi-signal, as in multivariate, single signal, as in univariate anomaly detection, and it automatically determines which algorithm to use based on the training data provided. The multivariate algorithm is called MSET-2, which stands for Multivariate State Estimation technique, and it's unique to Oracle.  08:49 Lois: And the 2? Wes: The 2 in the name refers to the patented enhancements by Oracle labs that automatically identify and fix data quality issues resulting in fewer false alarms and more accurate results.  Now unlike some of the other AI services, OCI Anomaly Detection is always trained on the customer's data. It's trained using actual historical data with no anomalies, and there can be as many different trained models as needed for different sets of signals.  09:18 Nikita: So where would one use a service like this? Wes: One of the most obvious applications of this service is for predictive maintenance. Early warning of a problem provides the opportunity to deploy maintenance resources and schedule downtime to minimize disruption to the business.  09:33 Lois: How would you train an OCI Anomaly Detection model? Wes: It's a simple four-step process to prepare a model that can be used for anomaly detection. The first step is to obtain training data from the system to be monitored. The data must contain no anomalies and should cover the normal range of values that would be experienced in a full business cycle.  Second, the training data file is uploaded to an object storage bucket.  Third, a data set is created for the training data. So a data set in this context is an object in the OCI Anomaly Detection service to manage data used for training and testing models.  And fourth, the model is trained. A wizard in the user interface steps the user through the required inputs, such as the training data set and some training parameters like the target false alarm probability.  10:23 Lois: How would this service know about the data and whether the trained model is univariate or multivariate? Wes: When training OCI Anomaly Detection models, the user does not need to specify whether the intended model is for multivariate or univariate data. It does this detection automatically.  For example, if a model is trained with 10 signals and 5 of those signals are determined to be correlated enough for multivariate anomaly detection, it will create an internal multivariate model for those signals. If the other five signals are not correlated with each other, it will create an internal univariate model for each one.  From the user's perspective, the result will be a single OCI anomaly detection model for the 10 signals. But internally, the signals are treated differently based on the training. A user can also train a model on a single signal and it will result in a univariate model.  11:16 Lois: What does this OCI Anomaly Detection model training entail? How does it ensure that it does not have any false alarms? Wes: Training a model requires a single data file with no anomalies that should cover a complete business cycle, which means it should represent all the normal variations in the signal. During training, OCI Anomaly Detection will use a portion of the data for training and another portion for automated testing. The fraction used for each is specified when the model is trained.  When model training is complete, it's best practice to do another test of the model with a data set containing anomalies to see if the anomalies are detected and if there are any false alarms. Based on the outcome, the user may want to retrain the model and specify a different false alarm probability, also called F-A-P or FAP. The FAP is the probability that the model would produce a false alarm. The false alarm probability can be thought of as the sensitivity of the model. The lower the false alarm probability, the less likelihood of it reporting a false alarm, but the less sensitive it will be to detecting anomalies. Selecting the right FAP is a business decision based on the need for sensitive detections balanced by the ability to tolerate false alarms.  Once a model has been trained and the user is satisfied with its detection performance, it can then be used for inferencing.  12:44 Nikita: Inferencing? Is that what I think it is?  Wes: New data is submitted to the model and OCI Anomaly Detection will respond with anomalies that are detected. The input data must contain the same signals that the model was trained on. So, for example, if the model was trained on signals A, B, C, and D, then for detection inferencing, the same four signals must be provided. No more, no less. 13:07 Lois: Where can I find the features of OCI Anomaly Detection that you mentioned?  Wes: The training and inferencing features of OCI Anomaly Detection can be accessed through the OCI console. However, a human-driven interface is not efficient for most business scenarios.  In most cases, automating the detection of anomalies through software is preferred to be able to process hundreds or thousands of signals using many trained models. The service provides multiple software interfaces for this purpose.  Each trained model is accessible through a REST API and an HTTP endpoint. Additionally, programming language-specific SDKs are available for multiple languages, including Python. Using the Python SDK, data scientists can work with OCI Anomaly Detection for both training and inferencing in an OCI Data Science notebook.  13:58 Nikita: How can a data scientist take advantage of these capabilities?  Wes: Well, you can write code against the REST API or use any of the various language SDKs. But for data scientists working in OCI Data Science, it makes sense to use Python.  14:12 Lois: That’s exciting! What does it take to use the Python SDK in a notebook… to be able to use the AI services? Wes: You can use a Notebook session in OCI Data Science to invoke the SDK for any of the AI services.  This might be useful to generate new features for a custom model or simply as a way to consume the service using a familiar Python interface. But before you can invoke the SDK, you have to prepare the data science notebook session by supplying it with an API Signing Key.  Signing Key is unique to a particular user and tenancy and authenticates that user to OCI when invoking the SDK. So therefore, you want to make sure you safeguard your Signing Key and never share it with another user.  14:55 Nikita: And where would I get my API Signing Key? Wes: You can obtain an API Signing Key from your user profile in the OCI Console. Then you save that key as a file to your local machine.  The API Signing Key also provides commands to be added to a config file that the SDK expects to find in the environment, where the SDK code is executing. The config file then references the key file. Once these files are prepared on your local machine, you can upload them to the Notebook session, where you will execute SDK code for the AI service.  The API Signing Key and config file can be reused with any of your notebook sessions, and the same files also work for all of the AI services. So, the files only need to be created once for each user and tenancy combination.  15:48 Lois: Thank you so much, Wes, for this really insightful discussion. To learn more about the topics covered today, you can visit mylearn.oracle.com and search for the Oracle Cloud Infrastructure AI Foundations course. Nikita: And remember, that course prepares you for the Oracle Cloud Infrastructure AI Foundations Associate certification that you can take for free! So, don’t wait too long to check it out. Join us next week for another episode of the Oracle University Podcast. Until then, this is Nikita Abraham… Lois Houston: And Lois Houston, signing off! 16:23 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
16m
21/05/2024

Encore Episode: The OCI AI Portfolio

Oracle has been actively focusing on bringing AI to the enterprise at every layer of its tech stack, be it SaaS apps, AI services, infrastructure, or data.   In this episode, hosts Lois Houston and Nikita Abraham, along with senior instructors Hemant Gahankari and Himanshu Raj, discuss OCI AI and Machine Learning services. They also go over some key OCI Data Science concepts and responsible AI principles.   Oracle MyLearn: https://mylearn.oracle.com/ou/learning-path/become-an-oci-ai-foundations-associate-2023/127177   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Himanshu Raj, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 The world of artificial intelligence is vast and everchanging. And with all the buzz around it lately, we figured it was the perfect time to revisit our AI Made Easy series. Join us over the next few weeks as we chat about all things AI, helping you to discover its endless possibilities. Ready to dive in? Let’s go! 00:33 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:46 Lois: Welcome to the Oracle University Podcast! I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hey everyone! In our last episode, we dove into Generative AI and Language Learning Models.  Lois: Yeah, that was an interesting one. But today, we’re going to discuss the AI and machine learning services offered by Oracle Cloud Infrastructure, and we’ll look at the OCI AI infrastructure. Nikita: I’m also going to try and squeeze in a couple of questions on a topic I’m really keen about, which is responsible AI. To take us through all of this, we have two of our colleagues, Hemant Gahankari and Himanshu Raj. Hemant is a Senior Principal OCI Instructor and Himanshu is a Senior Instructor on AI/ML. So, let’s get started! 01:36 Lois: Hi Hemant! We’re so excited to have you here! We know that Oracle has really been focusing on bringing AI to the enterprise at every layer of our stack.  Hemant: It all begins with data and infrastructure layers. OCI AI services consume data, and AI services, in turn, are consumed by applications.  This approach involves extensive investment from infrastructure to SaaS applications. Generative AI and massive scale models are the more recent steps. Oracle AI is the portfolio of cloud services for helping organizations use the data they may have for the business-specific uses.  Business applications consume AI and ML services. The foundation of AI services and ML services is data. AI services contain pre-built models for specific uses. Some of the AI services are pre-trained, and some can be additionally trained by the customer with their own data.  AI services can be consumed by calling the API for the service, passing in the data to be processed, and the service returns a result. There is no infrastructure to be managed for using AI services.  02:58 Nikita: How do I access OCI AI services? Hemant: OCI AI services provide multiple methods for access. The most common method is the OCI Console. The OCI Console provides an easy to use, browser-based interface that enables access to notebook sessions and all the features of all the data science, as well as AI services.  The REST API provides access to service functionality but requires programming expertise. And API reference is provided in the product documentation. OCI also provides programming language SDKs for Java, Python, TypeScript, JavaScript, .Net, Go, and Ruby. The command line interface provides both quick access and full functionality without the need for scripting.  03:52 Lois: Hemant, what are the types of OCI AI services that are available?  Hemant: OCI AI services is a collection of services with pre-built machine learning models that make it easier for developers to build a variety of business applications. The models can also be custom trained for more accurate business results. The different services provided are digital assistant, language, vision, speech, document understanding, anomaly detection.  04:24 Lois: I know we’re going to talk about them in more detail in the next episode, but can you introduce us to OCI Language, Vision, and Speech? Hemant: OCI Language allows you to perform sophisticated text analysis at scale. Using the pre-trained and custom models, you can process unstructured text to extract insights without data science expertise. Pre-trained models include language detection, sentiment analysis, key phrase extraction, text classification, named entity recognition, and personal identifiable information detection.  Custom models can be trained for named entity recognition and text classification with domain-specific data sets. In text translation, natural machine translation is used to translate text across numerous languages.  Using OCI Vision, you can upload images to detect and classify objects in them. Pre-trained models and custom models are supported. In image analysis, pre-trained models perform object detection, image classification, and optical character recognition. In image analysis, custom models can perform custom object detection by detecting the location of custom objects in an image and providing a bounding box.  The OCI Speech service is used to convert media files to readable texts that's stored in JSON and SRT format. Speech enables you to easily convert media files containing human speech into highly exact text transcriptions.  06:12 Nikita: That’s great. And what about document understanding and anomaly detection? Hemant: Using OCI document understanding, you can upload documents to detect and classify text and objects in them. You can process individual files or batches of documents. In OCR, document understanding can detect and recognize text in a document. In text extraction, document understanding provides the word level and line level text, and the bounding box, coordinates of where the text is found.  In key value extraction, document understanding extracts a predefined list of key value pairs of information from receipts, invoices, passports, and driver IDs. In table extraction, document understanding extracts content in tabular format, maintaining the row and column relationship of cells. In document classification, the document understanding classifies documents into different types.  The OCI Anomaly Detection service is a service that analyzes large volume of multivariate or univariate time series data. The Anomaly Detection service increases the reliability of businesses by monitoring their critical assets and detecting anomalies early with high precision. Anomaly Detection is the identification of rare items, events, or observations in data that differ significantly from the expectation.  07:55 Nikita: Where is Anomaly Detection most useful? Hemant: The Anomaly Detection service is designed to help with analyzing large amounts of data and identifying the anomalies at the earliest possible time with maximum accuracy. Different sectors, such as utility, oil and gas, transportation, manufacturing, telecommunications, banking, and insurance use Anomaly Detection service for their day-to-day activities.  08:23 Lois: Ok…and the first OCI AI service you mentioned was digital assistant… Hemant: Oracle Digital Assistant is a platform that allows you to create and deploy digital assistants, which are AI driven interfaces that help users accomplish a variety of tasks with natural language conversations. When a user engages with the Digital Assistant, the Digital Assistant evaluates the user input and routes the conversation to and from the appropriate skills.  Digital Assistant greets the user upon access. Upon user requests, list what it can do and provide entry points into the given skills. It routes explicit user requests to the appropriate skills. And it also handles interruptions to flows and disambiguation. It also handles requests to exit the bot.  09:21 Nikita: Excellent! Let’s bring Himanshu in to tell us about machine learning services. Hi Himanshu! Let’s talk about OCI Data Science. Can you tell us a bit about it? Himanshu: OCI Data Science is the cloud service focused on serving the data scientist throughout the full machine learning life cycle with support for Python and open source.  The service has many features, such as model catalog, projects, JupyterLab notebook, model deployment, model training, management, model explanation, open source libraries, and AutoML.  09:56 Lois: Himanshu, what are the core principles of OCI Data Science?  Himanshu: There are three core principles of OCI Data Science. The first one, accelerated. The first principle is about accelerating the work of the individual data scientist. OCI Data Science provides data scientists with open source libraries along with easy access to a range of compute power without having to manage any infrastructure. It also includes Oracle's own library to help streamline many aspects of their work.  The second principle is collaborative. It goes beyond an individual data scientist’s productivity to enable data science teams to work together. This is done through the sharing of assets, reducing duplicative work, and putting reproducibility and auditability of models for collaboration and risk management.  Third is enterprise grade. That means it's integrated with all the OCI Security and access protocols. The underlying infrastructure is fully managed. The customer does not have to think about provisioning compute and storage. And the service handles all the maintenance, patching, and upgrades so user can focus on solving business problems with data science.  11:11 Nikita: Let’s drill down into the specifics of OCI Data Science. So far, we know it’s cloud service to rapidly build, train, deploy, and manage machine learning models. But who can use it? Where is it? And how is it used? Himanshu: It serves data scientists and data science teams throughout the full machine learning life cycle.  Users work in a familiar JupyterLab notebook interface, where they write Python code. And how it is used? So users preserve their models in the model catalog and deploy their models to a managed infrastructure.  11:46 Lois: Walk us through some of the key terminology that’s used. Himanshu: Some of the important product terminology of OCI Data Science are projects. The projects are containers that enable data science teams to organize their work. They represent collaborative work spaces for organizing and documenting data science assets, such as notebook sessions and models.  Note that tenancy can have as many projects as needed without limits. Now, this notebook session is where the data scientists work. Notebook sessions provide a JupyterLab environment with pre-installed open source libraries and the ability to add others. Notebook sessions are interactive coding environment for building and training models.  Notebook sessions run in a managed infrastructure and the user can select CPU or GPU, the compute shape, and amount of storage without having to do any manual provisioning. The other important feature is Conda environment. It's an open source environment and package management system and was created for Python programs.  12:53 Nikita: What is a Conda environment used for? Himanshu: It is used in the service to quickly install, run, and update packages and their dependencies. Conda easily creates, saves, loads, and switches between environments in your notebooks sessions. 13:07 Nikita: Earlier, you spoke about the support for Python in OCI Data Science. Is there a dedicated library? Himanshu: Oracle's Accelerated Data Science ADS SDK is a Python library that is included as part of OCI Data Science.  ADS has many functions and objects that automate or simplify the steps in the data science workflow, including connecting to data, exploring, and visualizing data. Training a model with AutoML, evaluating models, and explaining models. In addition, ADS provides a simple interface to access the data science service mode model catalog and other OCI services, including object storage.  13:45 Lois: I also hear a lot about models. What are models? Himanshu: Models define a mathematical representation of your data and business process. You create models in notebooks, sessions, inside projects.  13:57 Lois: What are some other important terminologies related to models? Himanshu: The next terminology is model catalog. The model catalog is a place to store, track, share, and manage models.  The model catalog is a centralized and managed repository of model artifacts. A stored model includes metadata about the provenance of the model, including Git-related information and the script. Our notebook used to push the model to the catalog. Models stored in the model catalog can be shared across members of a team, and they can be loaded back into a notebook session.  The next one is model deployments. Model deployments allow you to deploy models stored in the model catalog as HTTP endpoints on managed infrastructure.  14:45 Lois: So, how do you operationalize these models? Himanshu: Deploying machine learning models as web applications, HTTP API endpoints, serving predictions in real time is the most common way to operationalize models. HTTP endpoints or the API endpoints are flexible and can serve requests for the model predictions. Data science jobs enable you to define and run a repeatable machine learning tasks on fully managed infrastructure.  Nikita: Thanks for that, Himanshu.  15:18 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You’ll find training on everything from cloud computing, database, and security, artificial intelligence, and machine learning, all free to subscribers. So, what are you waiting for? Pick a topic, leverage the Oracle University Learning Community to ask questions, and then sit for your certification. Visit mylearn.oracle.com to get started.  15:46 Nikita: Welcome back! The Oracle AI Stack consists of AI services and machine learning services, and these services are built using AI infrastructure. So, let’s move on to that. Hemant, what are the components of OCI AI Infrastructure? Hemant: OCI AI Infrastructure is mainly composed of GPU-based instances. Instances can be virtual machines or bare metal machines. High performance cluster networking that allows instances to communicate to each other. Super clusters are a massive network of GPU instances with multiple petabytes per second of bandwidth. And a variety of fully managed storage options from a single byte to exabytes without upfront provisioning are also available.  16:35 Lois: Can we explore each of these components a little more? First, tell us, why do we need GPUs? Hemant: ML and AI needs lots of repetitive computations to be made on huge amounts of data. Parallel computing on GPUs is designed for many processes at the same time. A GPU is a piece of hardware that is incredibly good in performing computations.  GPU has thousands of lightweight cores, all working on their share of data in parallel. This gives them the ability to crunch through extremely large data set at tremendous speed.  17:14 Nikita: And what are the GPU instances offered by OCI? Hemant: GPU instances are ideally suited for model training and inference. Bare metal and virtual machine compute instances powered by NVIDIA GPUs H100, A100, A10, and V100 are made available by OCI.  17:35 Nikita: So how do we choose what to train from these different GPU options?  Hemant: For large scale AI training, data analytics, and high performance computing, bare metal instances BM 8 X NVIDIA H100 and BM 8 X NVIDIA A100 can be used.  These provide up to nine times faster AI training and 30 times higher acceleration for AI inferencing. The other bare metal and virtual machines are used for small AI training, inference, streaming, gaming, and virtual desktop infrastructure.  18:14 Lois: And why would someone choose the OCI AI stack over its counterparts? Hemant: Oracle offers all the features and is the most cost effective option when compared to its counterparts.  For example, BM GPU 4.8 version 2 instance costs just $4 per hour and is used by many customers.  Superclusters are a massive network with multiple petabytes per second of bandwidth. It can scale up to 4,096 OCI bare metal instances with 32,768 GPUs.  We also have a choice of bare metal A100 or H100 GPU instances, and we can select a variety of storage options, like object store, or block store, or even file system. For networking speeds, we can reach 1,600 GB per second with A100 GPUs and 3,200 GB per second with H100 GPUs.  With OCI storage, we can select local SSD up to four NVMe drives, block storage up to 32 terabytes per volume, object storage up to 10 terabytes per object, file systems up to eight exabyte per file system. OCI File system employs five replicated storage located in different fault domains to provide redundancy for resilient data protection.  HPC file systems, such as BeeGFS and many others are also offered. OCI HPC file systems are available on Oracle Cloud Marketplace and make it easy to deploy a variety of high performance file servers.  20:11 Lois: I think a discussion on AI would be incomplete if we don’t talk about responsible AI. We’re using AI more and more every day, but can we actually trust it? Hemant: For us to trust AI, it must be driven by ethics that guide us as well. Nikita: And do we have some principles that guide the use of AI? Hemant: AI should be lawful, complying with all applicable laws and regulations. AI should be ethical, that is it should ensure adherence to ethical principles and values that we uphold as humans. And AI should be robust, both from a technical and social perspective. Because even with the good intentions, AI systems can cause unintentional harm. AI systems do not operate in a lawless world. A number of legally binding rules at national and international level apply or are relevant to the development, deployment, and use of AI systems today. The law not only prohibits certain actions but also enables others, like protecting rights of minorities or protecting environment. Besides horizontally applicable rules, various domain-specific rules exist that apply to particular AI applications. For instance, the medical device regulation in the health care sector.  In AI context, equality entails that the systems’ operations cannot generate unfairly biased outputs. And while we adopt AI, citizens right should also be protected.  21:50 Lois: Ok, but how do we derive AI ethics from these? Hemant: There are three main principles.  AI should be used to help humans and allow for oversight. It should never cause physical or social harm. Decisions taken by AI should be transparent and fair, and also should be explainable. AI that follows the AI ethical principles is responsible AI.  So if we map the AI ethical principles to responsible AI requirements, these will be like, AI systems should follow human-centric design principles and leave meaningful opportunity for human choice. This means securing human oversight. AI systems and environments in which they operate must be safe and secure, they must be technically robust, and should not be open to malicious use.  The development, and deployment, and use of AI systems must be fair, ensuring equal and just distribution of both benefits and costs. AI should be free from unfair bias and discrimination. Decisions taken by AI to the extent possible should be explainable to those directly and indirectly affected.  23:21 Nikita: This is all great, but what does a typical responsible AI implementation process look like?  Hemant: First, a governance needs to be put in place. Second, develop a set of policies and procedures to be followed. And once implemented, ensure compliance by regular monitoring and evaluation.  Lois: And this is all managed by developers? Hemant: Typical roles that are involved in the implementation cycles are developers, deployers, and end users of the AI.  23:56 Nikita: Can we talk about AI specifically in health care? How do we ensure that there is fairness and no bias? Hemant: AI systems are only as good as the data that they are trained on. If that data is predominantly from one gender or racial group, the AI systems might not perform as well on data from other groups.  24:21 Lois: Yeah, and there’s also the issue of ensuring transparency, right? Hemant: AI systems often make decisions based on complex algorithms that are difficult for humans to understand. As a result, patients and health care providers can have difficulty trusting the decisions made by the AI. AI systems must be regularly evaluated to ensure that they are performing as intended and not causing harm to patients.  24:49 Nikita: Thank you, Hemant and Himanshu, for this really insightful session. If you’re interested in learning more about the topics we discussed today, head on over to mylearn.oracle.com and search for the Oracle Cloud Infrastructure AI Foundations course.  Lois: That’s right, Niki. You’ll find demos that you watch as well as skill checks that you can attempt to better your understanding. In our next episode, we’ll get into the OCI AI Services we discussed today and talk about them in more detail. Until then, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 25:25 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
16m
14/05/2024

Encore Episode: Generative AI and Large Language Models

In this week’s episode, Lois Houston and Nikita Abraham, along with Senior Instructor Himanshu Raj, take you through the extraordinary capabilities of Generative AI, a subset of deep learning that doesn’t make predictions but rather creates its own content.   They also explore the workings of Large Language Models.   Oracle MyLearn: https://mylearn.oracle.com/ou/learning-path/become-an-oci-ai-foundations-associate-2023/127177   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.   ---------------------------------------------------------   Episode Transcript:   00:00 The world of artificial intelligence is vast and everchanging. And with all the buzz around it lately, we figured it was the perfect time to revisit our AI Made Easy series. Join us over the next few weeks as we chat about all things AI, helping you to discover its endless possibilities. Ready to dive in? Let’s go! 00:33 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:46 Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor.  Nikita: Hi everyone! In our last episode, we went over the basics of deep learning. Today, we’ll look at generative AI and large language models, and discuss how they work. To help us with that, we have Himanshu Raj, Senior Instructor on AI/ML. So, let’s jump right in. Hi Himanshu, what is generative AI?  01:21 Himanshu: Generative AI refers to a type of AI that can create new content. It is a subset of deep learning, where the models are trained not to make predictions but rather to generate output on their own.  Think of generative AI as an artist who looks at a lot of paintings and learns the patterns and styles present in them. Once it has learned these patterns, it can generate new paintings that resembles what it learned. 01:48 Lois: Let's take an example to understand this better. Suppose we want to train a generative AI model to draw a dog. How would we achieve this? Himanshu: You would start by giving it a lot of pictures of dogs to learn from. The AI does not know anything about what a dog looks like. But by looking at these pictures, it starts to figure out common patterns and features, like dogs often have pointy ears, narrow faces, whiskers, etc. You can then ask it to draw a new picture of a dog.  The AI will use the patterns it learned to generate a picture that hopefully looks like a dog. But remember, the AI is not copying any of the pictures it has seen before but creating a new image based on the patterns it has learned. This is the basic idea behind generative AI. In practice, the process involves a lot of complex maths and computation, and there are different techniques and architectures that can be used, such as variational autoencoders (VAs) and Generative Adversarial Networks (GANs).  02:48 Nikita: Himanshu, where is generative AI used in the real world? Himanshu: Generative AI models have a wide variety of applications across numerous domains. For the image generation, generative models like GANs are used to generate realistic images. They can be used for tasks, like creating artwork, synthesizing images of human faces, or transforming sketches into photorealistic images.  For text generation, large language models like GPT 3, which are generative in nature, can create human-like text. This has applications in content creation, like writing articles, generating ideas, and again, conversational AI, like chat bots, customer service agents. They are also used in programming for code generation and debugging, and much more.  For music generation, generative AI models can also be used. They create new pieces of music after being trained on a specific style or collection of tunes. A famous example is OpenAI's MuseNet. 03:42 Lois: You mentioned large language models in the context of text-based generative AI. So, let’s talk a little more about it. Himanshu, what exactly are large language models? Himanshu: LLMs are a type of artificial intelligence models built to understand, generate, and process human language at a massive scale. They were primarily designed for sequence to sequence tasks such as machine translation, where an input sequence is transformed into an output sequence.  LLMs can be used to translate text from one language to another. For example, an LLM could be used to translate English text into French. To do this job, LLM is trained on a massive data set of text and code which allows it to learn the patterns and relationships that exist between different languages. The LLM translates, “How are you?” from English to French, “Comment allez-vous?”  It can also answer questions like, what is the capital of France? And it would answer the capital of France is Paris. And it will write an essay on a given topic. For example, write an essay on French Revolution, and it will come up with a response like with a title and introduction. 04:53 Lois: And how do LLMs actually work? Himanshu: So, LLM models are typically based on deep learning architectures such as transformers. They are also trained on vast amount of text data to learn language patterns and relationships, again, with a massive number of parameters usually in order of millions or even billions. LLMs have also the ability to comprehend and understand natural language text at a semantic level. They can grasp context, infer meaning, and identify relationships between words and phrases.  05:26 Nikita: What are the most important factors for a large language model? Himanshu: Model size and parameters are crucial aspects of large language models and other deep learning models. They significantly impact the model’s capabilities, performance, and resource requirement. So, what is model size? The model size refers to the amount of memory required to store the model's parameter and other data structures. Larger model sizes generally led to better performance as they can capture more complex patterns and representation from the data.  The parameters are the numerical values of the model that change as it learns to minimize the model's error on the given task. In the context of LLMs, parameters refer to the weights and biases of the model's transformer layers. Parameters are usually measured in terms of millions or billions. For example, GPT-3, one of the largest LLMs to date, has 175 billion parameters making it extremely powerful in language understanding and generation.  Tokens represent the individual units into which a piece of text is divided during the processing by the model. In natural language, tokens are usually words, subwords, or characters. Some models have a maximum token limit that they can process and longer text can may require truncation or splitting. Again, balancing model size, parameters, and token handling is crucial when working with LLMs.  06:49 Nikita: But what’s so great about LLMs? Himanshu: Large language models can understand and interpret human language more accurately and contextually. They can comprehend complex sentence structures, nuances, and word meanings, enabling them to provide more accurate and relevant responses to user queries. This model can generate human-like text that is coherent and contextually appropriate. This capability is valuable for context creation, automated writing, and generating personalized response in applications like chatbots and virtual assistants. They can perform a variety of tasks.  Large language models are very versatile and adaptable to various industries. They can be customized to excel in applications such as language translation, sentiment analysis, code generation, and much more. LLMs can handle multiple languages making them valuable for cross-lingual tasks like translation, sentiment analysis, and understanding diverse global content.  Large language models can be again, fine-tuned for a specific task using a minimal amount of domain data. The efficiency of LLMs usually grows with more data and parameters. 07:55 Lois: You mentioned the “sequence to sequence tasks” earlier. Can you explain the concept in simple terms for us? Himanshu: Understanding language is difficult for computers and AI systems. The reason being that words often have meanings based on context. Consider a sentence such as Jane threw the frisbee, and her dog fetched it.  In this sentence, there are a few things that relate to each other. Jane is doing the throwing. The dog is doing the fetching. And it refers to the frisbee. Suppose we are looking at the word “it” in the sentence. As a human, we understand easily that “it” refers to the frisbee. But for a machine, it can be tricky. The goal in sequence problems is to find patterns, dependencies, or relationships within the data and make predictions, classification, or generate new sequences based on that understanding. 08:48 Lois: And where are sequence models mostly used? Himanshu: Some common example of sequence models includes natural language processing, which we call NLP, tasks such as machine translation, text generation sentiment analysis, language modeling involve dealing with sequences of words or characters.  Speech recognition. Converting audio signals into text, involves working with sequences of phonemes or subword units to recognize spoken words. Music generation. Generating new music involves modeling musical sequences, nodes, and rhythms to create original compositions.  Gesture recognition. Sequences of motion or hand gestures are used to interpret human movements for applications, such as sign language recognition or gesture-based interfaces. Time series analysis. In fields such as finance, economics, weather forecasting, and signal processing, time series data is used to predict future values, detect anomalies, and understand patterns in temporal data. 09:56 The Oracle University Learning Community is an excellent place to collaborate and learn with Oracle experts and fellow learners. Grow your skills, inspire innovation, and celebrate your successes. All your activities, from liking a post to answering questions and sharing with others, will help you earn a valuable reputation, badges, and ranks to be recognized in the community. Visit mylearn.oracle.com to get started.  10:23 Nikita: Welcome back! Himanshu, what would be the best way to solve those sequence problems you mentioned? Let’s use the same sentence, “Jane threw the frisbee, and her dog fetched it” as an example. Himanshu: The solution is transformers. It's like model has a bird's eye view of the entire sentence and can see how all the words relate to each other. This allows it to understand the sentence as a whole instead of just a series of individual words. Transformers with their self-attention mechanism can look at all the words in the sentence at the same time and understand how they relate to each other.  For example, transformer can simultaneously understand the connections between Jane and dog even though they are far apart in the sentence. 11:13 Nikita: But how? Himanshu: The answer is attention, which adds context to the text. Attention would notice dog comes after frisbee, fetched comes after dog, and it comes after fetched.  Transformer does not look at it in isolation. Instead, it also pays attention to all the other words in the sentence at the same time. But considering all these connections, the model can figure out that “it” likely refers to the frisbee.  The most famous current models that are emerging in natural language processing tasks consist of dozens of transformers or some of their variants, for example, GPT or Bert. 11:53 Lois: I was looking at the AI Foundations course on MyLearn and came across the terms “prompt engineering” and “fine tuning.” Can you shed some light on them? Himanshu: A prompt is the input or initial text provided to the model to elicit a specific response or behavior. So, this is something which you write or ask to a language model. Now, what is prompt engineering? So prompt engineering is the process of designing and formulating specific instructions or queries to interact with a large language model effectively.  In the context of large language models, such as GPT 3 or Burt, prompts are the input text or questions given to the model to generate responses or perform specific tasks.  The goal of prompt engineering is to ensure that the language model understands the user's intent correctly and provide accurate and relevant responses. 12:47 Nikita: That sounds easy enough, but fine tuning seems a bit more complex. Can you explain it with an example? Himanshu: Imagine you have a versatile recipe robot named chef bot. Suppose that chef bot is designed to create delicious recipes for any dish you desire.  Chef bot recognizes the prompt as a request for a pizza recipe, and it knows exactly what to do. However, if you want chef bot to be an expert in a particular type of cuisine, such as Italian dishes, you fine-tune chef bot for Italian cuisine by immersing it in a culinary crash course filled with Italian cookbooks, traditional Italian recipes, and even Italian cooking shows.  During this process, chef bot becomes more specialized in creating authentic Italian recipes, and this option is called fine tuning. LLMs are general purpose models that are pre-trained on large data sets but are often fine-tuned to address specific use cases.  When you combine prompt engineering and fine tuning, and you get a culinary wizard in chef bot, a recipe robot that is not only great at understanding specific dish requests but also capable of following a specific dish requests and even mastering the art of cooking in a particular culinary style. 14:08 Lois: Great! Now that we’ve spoken about all the major components, can you walk us through the life cycle of a large language model? Himanshu: The life cycle of a Large Language Model, LLM, involves several stages, from its initial pre-training to its deployment and ongoing refinement.  The first of this lifecycle is pre-training. The LLM is initially pre-trained on a large corpus of text data from the internet. During pre-training, the model learns grammar, facts, reasoning abilities, and general language understanding. The model predicts the next word in a sentence given the previous words, which helps it capture relationships between words and the structure of language.  The second phase is fine tuning initialization. After pre-training, the model's weights are initialized, and it's ready for task-specific fine tuning. Fine tuning can involve supervised learning on labeled data for specific tasks, such as sentiment analysis, translation, or text generation.  The model is fine-tuned on specific tasks using a smaller domain-specific data set. The weights from pre-training are updated based on the new data, making the model task aware and specialized. The next phase of the LLM life cycle is prompt engineering. So this phase craft effective prompts to guide the model's behavior in generating specific responses.  Different prompt formulations, instructions, or context can be used to shape the output.  15:34 Nikita: Ok… we’re with you so far. What’s next? Himanshu: The next phase is evaluation and iteration. So models are evaluated using various metrics to access their performance on specific tasks. Iterative refinement involves adjusting model parameters, prompts, and fine tuning strategies to improve results.  So as a part of this step, you also do few shot and one shot inference. If needed, you further fine tune the model with a small number of examples. Basically, few shot or a single example, one shot for new tasks or scenarios.  Also, you do the bias mitigation and consider the ethical concerns. These biases and ethical concerns may arise in models output. You need to implement measures to ensure fairness in inclusivity and responsible use.  16:28 Himanshu: The next phase in LLM life cycle is deployment. Once the model has been fine-tuned and evaluated, it is deployed for real world applications. Deployed models can perform tasks, such as text generation, translation, summarization, and much more. You also perform monitoring and maintenance in this phase.  So you continuously monitor the model's performance and output to ensure it aligns with desired outcomes. You also periodically update and retrain the model to incorporate new data and to adapt to evolving language patterns. This overall life cycle can also consist of a feedback loop, whether you gather feedbacks from users and incorporate it into the model’s improvement process.  You use this feedback to further refine prompts, fine tuning, and overall model behavior. RLHF, which is Reinforcement Learning with Human Feedback, is a very good example of this feedback loop. You also research and innovate as a part of this life cycle, where you continue to research and develop new techniques to enhance the model capability and address different challenges associated with it. 17:40 Nikita: As we’re talking about the LLM life cycle, I see that fine tuning is not only about making an LLM task specific. So, what are some other reasons you would fine tune an LLM model? Himanshu: The first one is task-specific adaptation. Pre-trained language models are trained on extensive and diverse data sets and have good general language understanding. They excel in language generation and comprehension tasks, though the broad understanding of language may not lead to optimal performance in specific task.  These models are not task specific. So the solution is fine tuning. The fine tuning process customizes the pre-trained models for a specific task by further training on task-specific data to adapt the model's knowledge.  The second reason is domain-specific vocabulary. Pre-trained models might lack knowledge of specific words and phrases essential for certain tasks in fields, such as legal, medical, finance, and technical domains. This can limit their performance when applied to domain-specific data.  Fine tuning enables the model to adapt and learn domain-specific words and phrases. These words could be, again, from different domains.  18:56 Himanshu: The third reason to fine tune is efficiency and resource utilization. So fine tuning is computationally efficient compared to training from scratch.  Fine tuning reuses the knowledge from pre-trained models, saving time and resources. Fine tuning requires fewer iterations to achieve task-specific competence. Shorter training cycles expedite the model development process. It conserves computational resources, such as GPU memory and processing power.  Fine tuning is efficient in quicker model deployment. It has faster time to production for real world applications. Fine tuning is, again, scalable, enabling adaptation to various tasks with the same base model, which further reduce resource demands, and it leads to cost saving for research and development.  The fourth reason to fine tune is of ethical concerns. Pre-trained models learns from diverse data. And those potentially inherit different biases. Fine tune might not completely eliminate biases. But careful curation of task-specific data ensures avoiding biased or harmful vocabulary. The responsible uses of domain-specific terms promotes ethical AI applications.  20:14 Lois: Thank you so much, Himanshu, for spending time with us. We had such a great time learning from you. If you want to learn more about the topics discussed today, head over to mylearn.oracle.com and get started on our free AI Foundations course. Nikita: Yeah, we even have a detailed walkthrough of the architecture of transformers that you might want to check out. Join us next week for a discussion on the OCI AI Portfolio. Until then, this is Nikita Abraham… Lois: And Lois Houston signing off! 20:44 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
21m
07/05/2024

Encore Episode: Deep Learning

Did you know that the concept of deep learning goes way back to the 1950s? However, it is only in recent years that this technology has created a tremendous amount of buzz (and for good reason!). A subset of machine learning, deep learning is inspired by the structure of the human brain, making it fascinating to learn about.   In this episode, Lois Houston and Nikita Abraham interview Senior Principal OCI Instructor Hemant Gahankari about deep learning concepts, including how Convolution Neural Networks work, and help you get your deep learning basics right.   Oracle MyLearn: https://mylearn.oracle.com/ou/learning-path/become-an-oci-ai-foundations-associate-2023/127177   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Himanshu Raj, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 The world of artificial intelligence is vast and everchanging. And with all the buzz around it lately, we figured it was the perfect time to revisit our AI Made Easy series. Join us over the next few weeks as we chat about all things AI, helping you to discover its endless possibilities. Ready to dive in? Let’s go! 00:33 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular  Oracle technologies. Let’s get started! 00:47 Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hi everyone!  Lois: Today, we’re going to focus on the basics of deep learning with our Senior Principal OCI Instructor, Hemant Gahankari. Nikita: Hi Hemant! Thanks for being with us today. So, to get started, what is deep learning? 01:14 Hemant: Hi Niki and hi Lois. So, deep learning is a subset of machine learning that focuses on training Artificial Neural Networks, abbreviated as ANN, to solve a task at hand. Say, for example, image classification. A very important quality of the ANN is that it can process raw data like pixels of an image and extract patterns from it. These patterns are treated as features to predict the outcomes.  Let us say we have a set of handwritten images of digits 0 to 9. As we know, everyone writes the digits in a slightly different way. So how do we train a machine to identify a handwritten digit? For this, we use ANN.  ANN accepts image pixels as inputs, extracts patterns like edges and curves and so on, and correlates these patterns to predict an outcome. That is what digit does the image has in this case.  02:17 Lois: Ok, so what you’re saying is given a bunch of pixels, ANN is able to process pixel data, learn an internal representation of the data, and predict outcomes. That’s so cool! So, why do we need deep learning? Hemant: We need to specify features while we train machine learning algorithm. With deep learning, features are automatically extracted from the data. Internal representation of features and their combinations is built to predict outcomes by deep learning algorithms. This may not be feasible manually.  Deep learning algorithms can make use of parallel computations. For this, usually data is split into small batches and processed parallelly. So these algorithms can process large amount of data in a short time to learn the features and their combinations. This leads to scalability and performance. In short, deep learning complements machine learning algorithms for complex data for which features cannot be described easily.  03:21 Nikita: What can you tell us about the origins of deep learning? Hemant: Some of the deep learning concepts like artificial neuron, perceptron, and multilayer perceptron existed as early as 1950s. One of the most important concept of using backpropagation for training ANN came in 1980s.  In 1990s, convolutional neural networks were also introduced for image analysis tasks. Starting 2000, GPUs were introduced. And 2010 onwards, GPUs became cheaper and widely available. This fueled the widespread adoption of deep learning uses like computer vision, natural language processing, speech recognition, text translation, and so on.  In 2012, major networks like AlexNet and Deep-Q Network were built. 2016 onward, generative use cases of the deep learning also started to come up. Today, we have widely adopted deep learning for a variety of use cases, including large language models and many other types of generative models.  04:32 Lois: Hemant, what are various applications of deep learning algorithms?  Hemant: Deep learning algorithms are targeted at a variety of data and applications. For data, we have images, videos, text, and audio. For images, applications can be image classification, object detection, and segmentation. For textual data, applications are to translate the text or detect a sentiment of a text. For audio, the applications can be music generation, speech to text, and so on.  05:05 Lois: It's important that we select the right deep learning algorithm based on the data and application, right? So how do we do that?  Hemant: For image tasks like image classification, object detection, image segmentation, or facial recognition, CNN is a suitable architecture. For text, we have a choice of the latest transformers or LSTM or even RNN. For generative tasks like text summarization or question answering, transformers is a good choice. For generating images, text to image generation, transformers, GANs, or diffusion models are available choices. 05:45 Nikita: Let’s dive a little deeper into Artificial Neural Networks. Can you tell us more about them, Hemant? Hemant: Artificial Neural Networks are inspired by the human brain. They are made up of interconnected nodes called as neurons.  Nikita: And how are inputs processed by a neuron?  Hemant: In ANN, we assign weights to the connection between neurons. Weighted inputs are added up. And if the sum crosses a specified threshold, the neuron is fired. And the outputs of a layer of neuron become an input to another layer.  06:16 Lois: Hemant, tell us about the building blocks of ANN so we understand this better. Hemant: So first building block is layers. We have input layer, output layer, and multiple hidden layers. The input layer and output layer are mandatory. And the hidden layers are optional. The layers consist of neurons. Neurons are computational units, which accept an input and produce an output.  Weights determine the strength of connection between neurons. So the connections could be between input and a neuron, or it could be between a neuron and another neuron. Activation functions work on the weighted sum of inputs to the neuron and produce an output. Additional input to the neuron that allows a certain degree of flexibility is called as a bias.  07:05 Nikita: I think we’ve got the components of ANN straight but maybe you should give us an example. You mentioned this example earlier…of needing to train ANN to recognize handwritten digits from images. How would we go about that? Hemant: For that, we have to collect a large number of digit images, and we need to train ANN using these images.  So, in this case, the images consist of 28 by 28 pixels, which act as input layer. For the output, we have 10 neurons which represent digits 0 to 9. And we have multiple hidden layers. So, for example, we have two hidden layers which are consisting of 16 neurons each.  The hidden layers are responsible for capturing the internal representation of the raw images. And the output layer is responsible for producing the desired outcomes. So, in this case, the desired outcome is the prediction of whether the digit is 0 or 1 or up to digit 9.  So how do we train this particular ANN? So the first thing we use is the backpropagation algorithm. During training, we show an image to the ANN. Let’s say it is an image of digit 2. So we expect output neuron for digit 2 to fire. But in real, let’s say output neuron of a digit 6 fired.  08:28 Lois: So, then, what do we do?  Hemant: We know that there is an error. So we correct an error. We adjust the weights of the connection between neurons based on a calculation, which we call as backpropagation algorithm. By showing thousands of images and adjusting the weights iteratively, ANN is able to predict correct outcomes for most of the input images. This process of adjusting weights through backpropagation is called as model training.  09:01 Do you have an idea for a new course or learning opportunity? We’d love to hear it! Visit the Oracle University Learning Community and share your thoughts with us on the Idea Incubator. Your suggestion could find a place in future development projects! Visit mylearn.oracle.com to get started.  09:22 Nikita: Welcome back! Let’s move on to CNN. Hemant, what is a Convolutional Neural Network?  Hemant: CNN is a type of deep learning model specifically designed for processing and analyzing grid-like data, such as images and videos. In the ANN, the input image is converted to a single dimensional array and given as an input to the network.   But that does not work well with the image data because image data is inherently two dimensional. CNN works better with the two dimensional data. The role of the CNN is to reduce the images into a form, which is easier to process and without losing features, which are critical for getting a good prediction.  10:10 Lois: A CNN has different layers, right? Could you tell us a bit about them?  Hemant: The first one is input layer. Input layer is followed by feature extraction layers, which is a combination and repetition of convolutional layer with ReLu activation and a pooling layer.  And this is followed by a classification layer. These are the fully connected output layers, where the classification occurs as output classes. The class with the highest probability is the predicted class. And finally, we have the dropout layer. This layer is a regularization technique used to prevent overfitting in the network.  10:51 Nikita: And what are the top applications of CNN? Hemant: One of the most widely used applications of CNNs is image classification. For example, classifying whether an image contains a specific object, say cat or a dog.  CNNs are also used for object detection tasks. The goal here is to draw bounding boxes around objects in an image. CNNs can perform pixel-level segmentation, where each pixel in the image is labeled to represent different objects or regions. CNNs are employed for face recognition tasks as well, identifying and verifying individuals based on facial features.  CNNs are widely used in medical image analysis, helping with tasks like tumor detection, diagnosis, and classification of various medical conditions. CNNs play an important role in the development of self-driving cars, helping them to recognize and understand the road traffic signs, pedestrians, and other vehicles.  12:02 Nikita: Hemant, let’s talk about sequence models. What are they and what are they used for? Hemant: Sequence models are used to solve problems, where the input data is in the form of sequences. The sequences are ordered lists of data points or events.  The goal in sequence models is to find patterns and dependencies within the data and make predictions, classifications, or even generate new sequences.  12:31 Lois: Can you give us some examples of sequence models?  Hemant: Some common examples of the sequence models are in natural language processing, deep learning models are used for tasks, such as machine translation, sentiment analysis, or text generation. In speech recognition, deep learning models are used to convert a recorded audio into text.  Deep learning models can generate new music or create original compositions. Even sequences of hand gestures are interpreted by deep learning models for applications like sign language recognition. In fields like finance or weather prediction, time series data is used to predict future values.  13:15 Nikita: Which deep learning models can be used to work with sequence data?  Hemant: Recurrent Neural Networks, abbreviated as RNNs, are a class of neural network architectures specifically designed to handle sequential data. Unlike traditional feedforward neural network, RNNs have a feedback loop that allows information to persist across different timesteps.  The key features of RNNs is their ability to maintain an internal state often referred to as a hidden state or memory, which is updated as the network processes each element in the input sequence. The hidden state is used as input to the network for the next time step, allowing the model to capture dependencies and patterns in the data that are spread across time.  14:07 Nikita: Are there various types of RNNs? Hemant: There are different types of RNN architectures based on application.  One of them is one to one. This is like feed forward neural network and is not suited for sequential data. A one to many model produces multiple output values for one input value. Music generation or sequence generation are some applications using this architecture.  A many to one model produces one output value after receiving multiple input values. Example is sentiments analysis based on the review. Many to many model produces multiple output values for multiple input values. Examples are machine translation and named entity recognition.  RNN does not perform well when it comes to capturing long-term dependencies. This is due to the vanishing gradients problem, which is overcome by using LSTM model.  15:11 Lois: Another acronym. What is LSTM, Hemant? Hemant: Long Short-Term memory, abbreviated as LSTM, works by using a specialized memory cell and a gating mechanism to capture long term dependencies in the sequential data.  The key idea behind LSTM is to selectively remember or forget information over time, enabling the model to maintain relevant information over long sequences, which helps overcome the vanishing gradients problem.  15:45 Nikita: Can you take us, step-by-step, through the working of LSTM?  Hemant: At each timestep, the LSTM takes an input vector representing the current data point in the sequence. The LSTM also receives the previous hidden state and cell state. These represent what the LSTM has remembered and forgotten up to the current point in the sequence.  The core of the LSTM lies in its gating mechanisms, which include three gates: the input gate, the forget gate, and the output gate. These gates are like the filters that control the flow of information within the LSTM cell. The input gate decides what new information from the current input should be added to the memory cell.  The forget gate determines what information in the current memory cell should be discarded or forgotten. The output gate regulates how much of the current memory cell should be exposed as the output of the current time step.  16:52 Lois: Thank you, Hemant, for joining us in this episode of the Oracle University Podcast. I learned so much today. If you want to learn more about deep learning, visit mylearn.oracle.com and search for the Oracle Cloud Infrastructure AI Foundations course.  And remember, the AI Foundations course and certification are free. So why not get started now? Nikita: Right, Lois. In our next episode, we will discuss generative AI and language learning models. Until then, this is Nikita Abraham… Lois: And Lois Houston signing off! 17:26 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
17m
30/04/2024

Encore Episode: Machine Learning

Does machine learning feel like too convoluted a topic? Not anymore!   Listen to hosts Lois Houston and Nikita Abraham, along with Senior Principal OCI Instructor Hemant Gahankari, talk about foundational machine learning concepts and dive into how supervised learning, unsupervised learning, and reinforcement learning work.   Oracle MyLearn: https://mylearn.oracle.com/ou/learning-path/become-an-oci-ai-foundations-associate-2023/127177   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Himanshu Raj, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 The world of artificial intelligence is vast and everchanging. And with all the buzz around it lately, we figured it was the perfect time to revisit our AI Made Easy series. Join us over the next few weeks as we chat about all things AI, helping you to discover its endless possibilities. Ready to dive in? Let’s go! 00:33 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:47 Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal  Technical Editor.  Nikita: Hi everyone! Last week, we went through the basics of artificial intelligence and we’re going to take it a step further today by talking about some foundational machine learning concepts. After that, we’ll discuss the three main types of machine learning models: supervised learning, unsupervised learning, and reinforcement learning.  01:18 Lois: Hemant Gahankari, a Senior Principal OCI Instructor, joins us for this episode. Hi Hemant! Let’s dive right in. What is machine learning? How does it work? Hemant: Machine learning is a subset of artificial intelligence that focuses on creating computer systems that can learn and predict outcomes from given examples without being explicitly programmed. It is powered by algorithms that incorporate intelligence into machines by automatically learning from a set of examples usually provided as data.  01:54 Nikita: Give us a few examples of machine learning… so we can see what it can do for us. Hemant: Machine learning is used by all of us in our day-to-day life.  When we shop online, we get product recommendations based on our preferences and our shopping history. This is powered by machine learning.  We are notified about movies recommendations based on our viewing history and choices of other similar viewers. This too is driven by machine learning.  While browsing emails, we are warned of a spam mail because machine learning classifies whether the mail is spam or not based on its content. In the increasingly popular self-driving cars, machine learning is responsible for taking the car to its destination.  02:45 Lois: So, how does machine learning actually work? Hemant: Let us say we have a computer and we need to teach the computer to differentiate between a cat and a dog. We do this by describing features of a cat or a dog.  Dogs and cats have distinguishing features. For example, the body color, texture, eye color are some of the defining features which can be used to differentiate a cat from a dog. These are collectively called as input data.  We also provide a corresponding output, which is called as a label, which can be a dog or a cat in this case. By describing a specific set of features, we can say that it is a cat or a dog.  Machine learning model is first trained with the data set. Training data set consists of a set of features and output labels, and is given as an input to the machine learning model.  During the process of training, machine learning model learns the relation between input features and corresponding output labels from the provided data. Once the model learns from the data, we have a trained model.  Once the model is trained, it can be used for inference. Inference is a process of getting a prediction by giving a data point. In this example, we input features of a cat or a dog, and the trained model predicts the output that is a cat or a dog label.  The types of machine learning models depend on whether we have a labeled output or not.  04:28 Nikita: Oh, there are different types of machine learning models? Hemant: In general, there are three types of machine learning approaches.  In supervised machine learning, labeled data is used to train the model. Model learns the relation between features and labels.  Unsupervised learning is generally used to understand relationships within a data set. Labels are not used or are not available.  Reinforcement learning uses algorithms that learn from outcomes to make decisions or choices.  05:06 Lois: Ok…supervised learning, unsupervised learning, and reinforcement learning. Where do we use each of these machine learning models? Hemant: Some of the popular applications of supervised machine learning are disease detection, weather forecasting, stock price prediction, spam detection, and credit scoring. For example, in disease detection, the patient data is input to a machine learning model, and machine learning model predicts if a patient is suffering from a disease or not.  For unsupervised machine learning, some of the most common real-time applications are to detect fraudulent transactions, customer segmentation, outlier detection, and targeted marketing campaigns. So for example, given the transaction data, we can look for patterns that lead to fraudulent transactions.  Most popular among reinforcement learning applications are automated robots, autonomous driving cars, and playing games.  06:12 Nikita: I want to get into how each type of machine learning works. Can we start with supervised learning? Hemant: Supervised learning is a machine learning model that learns from labeled data. The model learns the mapping between the input and the output.  As a house price predictor model, we input house size in square feet and model predicts the price of a house. Suppose we need to develop a machine learning model for detecting cancer, the input to the model would be the person's medical details, the output would be whether the tumor is malignant or not.  06:50 Lois: So, that mapping between the input and output is fundamental in supervised learning.  Hemant: Supervised learning is similar to a teacher teaching student. The model is trained with the past outcomes and it learns the relationship or mapping between the input and output.  In supervised machine learning model, the outputs can be either categorical or continuous. When the output is continuous, we use regression. And when the output is categorical, we use classification.  07:26 Lois: We want to keep this discussion at a high level, so we’re not going to get into regression and classification. But if you want to learn more about these concepts and look at some demonstrations, visit mylearn.oracle.com. Nikita: Yeah, look for the Oracle Cloud Infrastructure AI Foundations course and you’ll find a lot of resources that you can make use of.  07:51 The Oracle University Learning Community is an excellent place to collaborate and learn with Oracle experts and fellow learners. Grow your skills, inspire innovation, and celebrate your successes. All your activities, from liking a post to answering questions and sharing with others, will help you earn a valuable reputation, badges, and ranks to be recognized in the community. Visit mylearn.oracle.com to get started.  08:19 Nikita: Welcome back! So that was supervised machine learning. What about unsupervised machine learning, Hemant? Hemant: Unsupervised machine learning is a type of machine learning where there are no labeled outputs. The algorithm learns the patterns and relationships in the data and groups similar data items. In unsupervised machine learning, the patterns in the data are explored explicitly without being told what to look for. For example, if you give a set of different-colored LEGO pieces to a child and ask to sort it, it may the LEGO pieces based on any patterns they observe. It could be based on same color or same size or same type. Similarly, in unsupervised learning, we group unlabeled data sets.  One more example could be-- say, imagine you have a basket of various fruits-- say, apples, bananas, and oranges-- and your task is to group these fruits based on their similarities. You observe that some fruits are round and red, while others are elongated and yellow. Without being told explicitly, you decide to group the round and red fruits together as one cluster and the elongated and yellow fruits as another cluster. There you go. You have just performed an unsupervised learning task.  09:42 Lois: Where is unsupervised machine learning used? Can you take us through some use cases? Hemant: The first use case of unsupervised machine learning is market segmentation. In market segmentation, one example is providing the purchasing details of an online shop to a clustering algorithm. Based on the items purchased and purchasing behavior, the clustering algorithm can identify customers based on the similarity between the products purchased. For example, customers with a particular age group who buy protein diet products can be shown an advertisement of sports-related products.  The second use case is on outlier analysis. One typical example for outlier analysis is to provide credit card purchase data for clustering. Fraudulent transactions can be detected by a bank by using outliers. In some transaction, amounts are too high or recurring. It signifies an outlier.  The third use case is recommendation systems. An example for recommendation systems is to provide users' movie viewing history as input to a clustering algorithm. It clusters users based on the type or rating of movies they have watched. The output helps to provide personalized movie recommendations to users. The same applies for music recommendations also.  11:14 Lois: And finally, Hemant, let’s talk about reinforcement learning. Hemant: Reinforcement learning is like teaching a dog new tricks. You reward it when it does something right, and over time, it learns to perform these actions to get more rewards. Reinforcement learning is a type of Machine Learning that enables an agent to learn from its interaction with the environment, while receiving feedback in the form of rewards or penalties without any labeled data.  Reinforcement learning is more prevalent in our daily lives than we might realize. The development of self-driving cars and autonomous drones rely heavily on reinforcement learning to make real time decisions based on sensor data, traffic conditions, and safety considerations.  Many video games, virtual reality experiences, and interactive entertainment use reinforcement learning to create intelligent and challenging computer-controlled opponents. The AI characters in games learn from player interactions and become more difficult to beat as the game progresses.  12:25 Nikita: Hemant, take us through some of the terminology that’s used with reinforcement learning. Hemant: Let us say we want to train a self-driving car to drive on a road and reach its destination. For this, it would need to learn how to steer the car based on what it sees in front through a camera. Car and its intelligence to steer on the road is called as an agent.  More formally, agent is a learner or decision maker that interacts with the environment, takes actions, and learns from the feedback received. Environment, in this case, is the road and its surroundings with which the car interacts. More formally, environment is the external system with which the agent interacts. It is the world or context in which the agent operates and receives feedback for its actions.  What we see through a camera in front of a car at a moment is a state. State is a representation of the current situation or configuration of the environment at a particular time. It contains the necessary information for the agent to make decisions. The actions in this example are to drive left, or right, or keep straight. Actions are a set of possible moves or decisions that the agent can take in a given state.  Actions have an impact on the environment and influence future states. After driving through the road many times, the car learns what action to take when it views a road through the camera. This learning is a policy. Formally, policy is a strategy or mapping that the agent uses to decide which action to take in a given state. It defines the agent's behavior and determines how it selects actions.  14:13 Lois: Ok. Say we’re talking about the training loop of reinforcement learning in the context of training a dog to learn tricks. We want it to pick up a ball, roll, sit… Hemant: Here the dog is an agent, and the place it receives training is the environment. While training the dog, you provide a positive reward signal if the dog picks it right and a warning or punishment if the dog does not pick up a trick. In due course, the dog gets trained by the positive rewards or negative punishments.  The same tactics are applied to train a machine in the reinforcement learning. For machines, the policy is the brain of our agent. It is a function that tells what actions to take when in a given state. The goal of reinforcement learning algorithm is to find a policy that will yield a lot of rewards for the agent if the agent follows that policy referred to as the optimal policy.  Through a process of learning from experiences and feedback, the agent becomes more proficient at making good decisions and accomplishing tasks. This process continues until eventually we end up with the optimal policy. The optimal policy is learned through training by using algorithms like Deep Q Learning or Q Learning.  15:40 Nikita: So through multiple training iterations, it gets better. That’s fantastic. Thanks, Hemant, for joining us today. We’ve learned so much from you. Lois: Remember, the course and certification are free, so if you’re interested, make sure you log in to mylearn.oracle.com and get going. Join us next week for another episode of the Oracle University Podcast. Until then, I’m Lois Houston… Nikita: And Nikita Abraham signing off! 16:09 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
25m
23/04/2024

Encore Episode: Introduction to Artificial Intelligence (AI)

You probably interact with artificial intelligence (AI) more than you realize. So, there’s never been a better time to start figuring out how it all works.   Join Lois Houston and Nikita Abraham as they decode the fundamentals of AI so that anyone, irrespective of their technical background, can leverage the benefits of AI and tap into its infinite potential.   Together with Senior Cloud Engineer Nick Commisso, they take you through key AI concepts, common AI tasks and domains, and the primary differences between AI, machine learning, and deep learning.   Oracle MyLearn: https://mylearn.oracle.com/ou/learning-path/become-an-oci-ai-foundations-associate-2023/127177   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Himanshu Raj, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 The world of artificial intelligence is vast and everchanging. And with all the buzz around it lately, we figured it was the perfect time to revisit our AI Made Easy series. Join us over the next few weeks as we chat about all things AI, helping you to discover its endless possibilities. Ready to dive in? Let’s go! 00:33 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:46 Nikita: Hello and welcome to the Oracle University Podcast. I’m Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! Welcome to a new season of the Oracle University Podcast. I’m so excited about this season because we’re going to delve into the world of artificial intelligence. In upcoming episodes, we’ll talk about the fundamentals of artificial intelligence and machine learning. And we’ll discuss neural network architectures, generative AI and large language models, the OCI AI stack, and OCI AI services. 01:27 Nikita: So, if you’re an IT professional who wants to start learning about AI and ML or even if you’re a student who is familiar with OCI or similar cloud services, but have no prior exposure to this field, you’ll want to tune in to these episodes. Lois: That’s right, Niki. So, let’s get started. Today, we’ll talk about the basics of artificial intelligence with Senior Cloud Engineer Nick Commisso. Hi Nick! Thanks for joining us today. So, let’s start right at the beginning. What is artificial intelligence? 01:57 Nick: Well, the ability of machines to imitate the cognitive abilities and problem solving capabilities of human intelligence can be classified as artificial intelligence or AI.  02:08 Nikita: Now, when you say capabilities and abilities, what are you referring to? Nick: Human intelligence is the intellectual capability of humans that allows us to learn new skills through observation and mental digestion, to think through and understand abstract concepts and apply reasoning, to communicate using a language and understand the nonverbal cues, such as facial recognition, tone variation, and body language.  You can handle objections in real time, even in a complex setting. You can plan for short and long-term situations or projects. And, of course, you can create music and art or invent something new like an original idea.  If you can replicate any of these human capabilities in machines, this is artificial general intelligence or AGI. So in other words, AGI can mimic human sensory and motor skills, performance, learning, and intelligence, and use these abilities to carry out complicated tasks without human intervention.  When we apply AGI to solve problems with specific and narrow objectives, we call it artificial intelligence or AI.  03:16 Lois: It seems like AI is everywhere, Nick. Can you give us some examples of where AI is used? Nick: AI is all around us, and you've probably interacted with AI, even if you didn't realize it. Some examples of AI can be viewing an image or an object and identifying if that is an apple or an orange. It could be examining an email and classifying it spam or not. It could be writing computer language code or predicting the price of an older car.  So let's get into some more specifics of AI tasks and the nature of related data. Machine learning, deep learning, and data science are all associated with AI, and it can be confusing to distinguish.  03:57 Nikita: Why do we need AI? Why’s it important?  Nick: AI is vital in today's world, and with the amount of data that's generated, it far exceeds the human ability to absorb, interpret, and actually make decisions based on that data. That's where AI comes in handy by enhancing the speed and effectiveness of human efforts.  So here are two major reasons why we need AI. Number one, we want to eliminate or reduce the amount of routine tasks, and businesses have a lot of routine tasks that need to be done in large numbers. So things like approving a credit card or a bank loan, processing an insurance claim, recommending products to customers are just some example of routine tasks that can be handled.  And second, we, as humans, need a smart friend who can create stories and poems, designs, create code and music, and have humor, just like us.  04:54 Lois: I’m onboard with getting help from a smart friend! There are different domains in AI, right, Nick?  Nick: We have language for language translation; vision, like image classification; speech, like text to speech; product recommendations that can help you cross-sell products; anomaly detection, like detecting fraudulent transactions; learning by reward, like self-driven cars. You have forecasting with weather forecasting. And, of course, generating content like image from text.  05:24 Lois: There are so many applications. Nick, can you tell us more about these commonly used AI domains like language, audio, speech, and vision? Nick: Language-related AI tasks can be text related or generative AI. Text-related AI tasks use text as input, and the output can vary depending on the task. Some examples include detecting language, extracting entities in a text, or extracting key phrases and so on.  Consider the example of translating text. There's many text translation tools where you simply type or paste your text into a given text box, choose your source and target language, and then click translate.  Now, let's look at the generative AI tasks. They are generative, which means the output text is generated by a model. Some examples are creating text like stories or poems, summarizing a text, answering questions, and so on. Let's take the example of ChatGPT, the most well-known generative chat bot. These bots can create responses from their training on large language models, and they continuously grow through machine learning.  06:31 Nikita: What can you tell us about using text as data? Nick: Text is inherently sequential, and text consists of sentences. Sentences can have multiple words, and those words need to be converted to numbers for it to be used to train language models. This is called tokenization. Now, the length of sentences can vary, and all the sentences lengths need to be made equal. This is done through padding.  Words can have similarities with other words, and sentences can also be similar to other sentences. The similarity can be measured through dot similarity or cosine similarity. We need a way to indicate that similar words or sentences may be close by. This is done through representation called embedding.  07:17 Nikita: And what about language AI models? Nick: Language AI models refer to artificial intelligence models that are specifically designed to understand, process, and generate natural language. These models have been trained on vast amounts of textual data that can perform various natural language processing or NLP tasks.  The task that needs to be performed decides the type of input and output. The deep learning model architectures that are typically used to train models that perform language tasks are recurrent neural networks, which processes data sequentially and stores hidden states, long short-term memory, which processes data sequentially that can retain the context better through the use of gates, and transformers, which processes data in parallel. It uses the concept of self-attention to better understand the context.  08:09 Lois: And then there’s speech-related AI, right? Nick: Speech-related AI tasks can be either audio related or generative AI. Speech-related AI tasks use audio or speech as input, and the output can vary depending on the task. For example, speech-to-text conversion or speaker recognition, voice conversion, and so on. Generative AI tasks are generative in nature, so the output audio is generated by a model. For example, you have music composition and speech synthesis.  Audio or speech is digitized as snapshots taken in time. The sample rate is the number of times in a second an audio sample is taken. Most digital audio have a sampling rate of 44.1 kilohertz, which is also the sampling rate for audio CDs.  Multiple samples need to be correlated to make sense of the data. For example, listening to a song for a fraction of a second, you won't be able to infer much about the song, and you'll probably need to listen to it a little bit longer.  Audio and speech AI models are designed to process and understand audio data, including spoken language. These deep-learning model architectures are used to train models that perform language with tasks-- recurrent neural networks, long short-term memory, transformers, variational autoencoders, waveform models, and Siamese networks. All of the models take into consideration the sequential nature of audio.  09:42 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You’ll find training on everything from cloud computing, database, and security to artificial intelligence and machine learning, all free to subscribers. So, what are you waiting for? Pick a topic, leverage the Oracle University Learning Community to ask questions, and then sit for your certification.  Visit mylearn.oracle.com to get started.  10:10 Nikita: Welcome back! Now that we’ve covered language and speech-related tasks, let’s move on to vision-related tasks. Nick: Vision-related AI tasks could be image related or generative AI. Image-related AI tasks will use an image as an input, and the output depends on the task. Some examples are classifying images, identifying objects in an image, and so on. Facial recognition is one of the most popular image-related tasks that is often used for surveillance and tracking of people in real time, and it's used in a lot of different fields, including security, biometrics, law enforcement, and social media.  For generative AI tasks, the output image is generated by a model. For example, creating an image from a contextual description, generating images of a specific style or a high resolution, and so on. It can create extremely realistic new images and videos by generating original 3D models of an object, machine components, buildings, medication, people, and even more.  11:14 Lois: So, then, here again I need to ask, how do images work as data? Nick: Images consist of pixels, and pixels can be either grayscale or color. And we can't really make out what an image is just by looking at one pixel.  The task that needs to be performed decides the type of input needed and the output produced. Various architectures have evolved to handle this wide variety of tasks and data. These deep-learning model architectures are typically used to train models that perform vision tasks-- convolutional neural networks, which detects patterns in images; learning hierarchical representations of visual features; YOLO, which is You Only Look Once, processes the image and detects objects within the image; and then you have generative adversarial networks, which generates real-looking images.  12:04 Nikita: Nick, earlier you mentioned other AI tasks like anomaly detection, recommendations, and forecasting. Could you tell us more about them? Nick: Anomaly detection. This is time-series data, which is required for anomaly detection, and it can be a single or multivariate for fraud detection, machine failure, etc.  Recommendations. You can recommend products using data of similar products or users. For recommendations, data of similar products or similar users is required.  Forecasting. Time-series data is required for forecasting and can be used for things like weather forecasting and predicting the stock price.  12:43 Lois: Nick, help me understand the difference between artificial intelligence, machine learning, and deep learning. Let’s start with AI.  Nick: Imagine a self-driving car that can make decisions like a human driver, such as navigating traffic or detecting pedestrians and making safe lane changes. AI refers to the broader concept of creating machines or systems that can perform tasks that typically require human intelligence. Next, we have machine learning or ML. Visualize a spam email filter that learns to identify and move spam emails to the spam folder, and that's based on the user's interaction and email content. Now, ML is a subset of AI that focuses on the development of algorithms that enable machines to learn from and make predictions or decisions based on data.  To understand what an algorithm is in the context of machine learning, it refers to a specific set of rules, mathematical equations, or procedures that the machine learning model follows to learn from data and make predictions on. And finally, we have deep learning or DL. Think of an image recognition software that can identify specific objects or animals within images, such as recognizing cats in photos on the internet. DL is a subfield of ML that uses neural networks with many layers, deep neural networks, to learn and make sense of complex patterns in data.  14:12 Nikita: Are there different types of machine learning? Nick: There are several types of machine learning, including supervised learning, unsupervised learning, and reinforcement learning. Supervised learning where the algorithm learns from labeled data, making predictions or classifications. Unsupervised learning is an algorithm that discovers patterns and structures in unlabeled data, such as clustering or dimensionality reduction. And then, you have reinforcement learning, where agents learn to make predictions and decisions by interacting with an environment and receiving rewards or punishments.  14:47 Lois: Can we do a deep dive into each of these types you just mentioned? We can start with the supervised machine learning algorithm. Nick: Let's take an example of how a credit card company would approve a credit card. Once the application and documents are submitted, a verification is done, followed by a credit score check and another 10 to 15 days for approval. And how is this done? Sometimes, purely manually or by using a rules engine where you can build rules, give new data, get a decision.  The drawbacks are slow. You need skilled people to build and update rules, and the rules keep changing. The good thing is that the businesses had a lot of insight as to how the decisions were made. Can we build rules by looking at the past data?  We all learn by examples. Past data is nothing but a set of examples. Maybe reviewing past credit card approval history can help. Through a process of training, a model can be built that will have a specific intelligence to do a specific task. The heart of training a model is an algorithm that incrementally updates the model by looking at the data samples one by one.  And once it's built, the model can be used to predict an outcome on a new data. We can train the algorithm with credit card approval history to decide whether to approve a new credit card. And this is what we call supervised machine learning. It's learning from labeled data.  16:13 Lois: Ok, I see. What about the unsupervised machine learning algorithm? Nick: Data does not have a specific outcome or a label as we know it. And sometimes, we want to discover trends that the data has for potential insights. Similar data can be grouped into clusters. For example, retail marketing and sales, a retail company may collect information like household size, income, location, and occupation so that the suitable clusters could be identified, like a small family or a high spender and so on. And that data can be used for marketing and sales purposes.  Regulating streaming services. A streaming service may collect information like viewing sessions, minutes per session, number of unique shows watched, and so on. That can be used to regulate streaming services. Let's look at another example. We all know that fruits and vegetables have different nutritional elements. But do we know which of those fruits and vegetables are similar nutritionally?  For that, we'll try to cluster fruits and vegetables' nutritional data and try to get some insights into it. This will help us include nutritionally different fruits and vegetables into our daily diets. Exploring patterns and data and grouping similar data into clusters drives unsupervised machine learning.  17:34 Nikita: And then finally, we come to the reinforcement learning algorithm.  Nick: How do we learn to play a game, say, chess? We'll make a move or a decision, check to see if it's the right move or feedback, and we'll keep the outcomes in your memory for the next step you take, which is learning. Reinforcement learning is a machine learning approach where a computer program learns to make decisions by trying different actions and receiving feedback. It teaches agents how to solve tasks by trial and error.  This approach is used in autonomous car driving and robots as well.  18:06 Lois: We keep coming across the term “deep learning.” You’ve spoken a bit about it a few times in this episode, but what is deep learning, really? How is it related to machine learning? Nick: Deep learning is all about extracting features and rules from data. Can we identify if an image is a cat or a dog by looking at just one pixel? Can we write rules to identify a cat or a dog in an image? Can the features and rules be extracted from the raw data, in this case, pixels?  Deep learning is really useful in this situation. It's a special kind of machine learning that trains super smart computer networks with lots of layers. And these networks can learn things all by themselves from pictures, like figuring out if a picture is a cat or a dog.  18:49 Lois: I know we’re going to be covering this in detail in an upcoming episode, but before we let you go, can you briefly tell us about generative AI? Nick: Generative AI, a subset of machine learning, creates diverse content like text, audio, images, and more. These models, often powered by neural networks, learn patterns from existing data to craft fresh and creative output. For instance, ChatGPT generates text-based responses by understanding patterns in text data that it's been trained on. Generative AI plays a vital role in various AI tasks requiring content creation and innovation.  19:28 Nikita: Thank you, Nick, for sharing your expertise with us. To learn more about AI, go to mylearn.oracle.com and search for the Oracle Cloud Infrastructure AI Foundations course. As you complete the course, you’ll find skill checks that you can attempt to solidify your learning.  Lois: And remember, the AI Foundations course on MyLearn also prepares you for the Oracle Cloud Infrastructure 2023 AI Foundations Associate certification. Both the course and the certification are free, so there’s really no reason NOT to take the leap into AI, right Niki? Nikita: That’s right, Lois! Lois: In our next episode, we will look at the fundamentals of machine learning. Until then, this is Lois Houston… Nikita: And Nikita Abraham signing off! 20:13 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
20m
16/04/2024

Using ORDS to Make Your ADB Data Available in VBS

Visual Builder Studio requires its data sources to connect to the webpage it produces using REST calls. Therefore, the data source has to provide a REST interface. A simple, easy, secure, and free way to do that is with Oracle REST Data Services (ORDS).   In this episode, hosts Lois Houston and Nikita Abraham chat with Senior Principal OCI Instructor Joe Greenwald about what ORDS can do, how to easily set it up, how to work with it, and how to use it within Visual Builder Studio.   Develop Fusion Applications Using Visual Builder Studio: https://mylearn.oracle.com/ou/course/develop-fusion-applications-using-visual-builder-studio/122614/   Build Visual Applications Using Visual Builder Studio: https://mylearn.oracle.com/ou/course/build-visual-applications-using-oracle-visual-builder-studio/110035/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started. 00:26   Nikita: Hello and welcome to the Oracle University Podcast! I’m Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs.   Lois: Hi there! In our last episode, we took a look at model-based development tools, their start as CASE tools, what they morphed into, and how they’re currently used in Oracle software development. We’re wrapping up the season with this episode today, which will be about how to access Oracle database data through a REST interface created and managed by Oracle REST Data Services, or ORDS, and how to access this data in Visual Builder Studio.   01:03 Nikita: Being able to access Oracle database data through a REST interface over the web is highly useful, but sometimes it can be complicated to create that interface in a programming language. Joe Greenwald, our Senior OCI Learning Solutions Architect and Principal Instructor is back with us one last time this season to tell us more about ORDS, and how it makes it much simpler and easier for us to REST-enable our database for use in tools like Visual Builder Studio. Hi Joe! Tell us a little about what Visual Builder Studio is and why we must REST-enable our data for VBS to be able to use it.   01:40 Joe: Hi Niki, hi Lois! Ok, so, Visual Builder Studio is Oracle’s low-code software development and project asset management product for creating graphical webpage front-ends for web applications. It’s the tool of choice for designing, building, and implementing all of Oracle Fusion Cloud Applications and is being used by literally tens of thousands of engineers at Oracle now to bring the next generation of Fusion Applications to our customers and the market. It’s based on standards like HTML5, CSS3, and JavaScript. It’s highly performant and combined with the Redwood graphical design system and components that we talked about previously, delivers a world-class experience for users. One thing about Visual Builder Studio though: it only works with data sources that have a REST interface. This is unusual. I like to think I’ve worked with every software development tool that Oracle’s created since I joined Oracle in 1992, including some unreleased ones, and all of them allowed you to talk to the database directly. This is the first time that we’ve released a tool that I know of where we don’t do that. Now at first, I was a little put off and wondered how’s it going to do this and how much work I would have to do to create a REST interface for some simple tables in the Oracle database. Like, here’s one more thing I must do just to create a page that displays data from the database. As it turns out, it’s a wise design decision on the part of the designers. It simplifies the data access parts of Visual Builder Studio and makes the data access model common across the different data sources. And, thanks to ORDS, REST-enabling data in Oracle database couldn’t be easier! 03:13 Lois: That’s cool. We don’t want to focus too much on Visual Builder Studio today. We have free courses that teach you how to create service connections to REST services to access the data and all of that. What we actually want to talk with you about is working with Oracle REST Data Service. How easy is it to work with Oracle REST Data Service to add REST support, what we call REST-enable your Oracle Database, and why it is important?  Nikita: Yeah, I could use a bit of a refresher on REST myself. Could you describe what REST is, how it works for both the client and server, and what ORDS is doing for us? 03:50 Joe: Sure. So, REST is a way to make a request to a server for a resource using the HTTP web protocol from a client, like your browser, to a web server, which hands off the request to code that handles the request and sends the response back to your client/browser, which then uses it, displays or whatever. So, you can see we have two parts. We have the client, which makes the request, and the server, which handles the request and figures out what the response should be (static, dynamic, or a combination of both) and sends that back to the client. For example, a visual application built with Visual Builder Studio acts as a client making the request, just as your browser makes a request. It’s really just a web app built with HTML5, CSS3, and JavaScript, and the JavaScript makes a request to the server on your behalf. Let’s say you wish to access your student record within Oracle University. And now, this is a contrived example and it won’t actually work, but it’s good for illustrative purposes.  Oracle University, let’s say, publishes the URL for your student data as something like https://oracle.com/oracleuniversity/student/{studentnumber} you put in some kind of number, like the number 23, and if you enter that into your browser address bar and press Enter, then your browser, on your behalf, sends a GET request—what we call an HTTP GET operation—to the web server. When the web server receives the request, it will somehow read the record for student 23, format a response, and send the response back to the client. 05:16 Joe: That’s a GET or a READ request. Now, what if you are creating a new student? Well, you fill out a form on the webpage and you click the Submit button. And it sends a POST request, which tells the server to create a new record in its storage mechanism, most likely a database of some form. If you do an update, you change certain fields on the webpage and click the Submit button and this time, an update request is made. If you wanted to delete the record, you’d find the record you want to delete and press the Submit button, and this time a delete request is made. This is the general idea, though there are different ways to do creates and updates that are really irrelevant here. Those requests to the server I mentioned are called HTTP operations and there are several of them. But the four most popular are GET to retrieve data, POST to create a new record on the server, PUT to update a record, and DELETE to remove a record.  On the client side, we just need to specify where the record is that we want to retrieve—that’s the oracle.com/oracleuniversity/student part of the URL and an identifying value, which makes it unique. So, when I do a GET request on customer or student 23, I’m going to get back a representation of the student data that exists in the student database for a student with ID 23. There should not be more than one of these or that would indicate an error. The response typically comes back in a format of a key:value pair called JavaScript Object Notation (JSON), but it could also be in a Text format, HTML, Excel, PDF, or whatever the server implements and is requested.  06:42 Nikita: OK great! That’s on the client side making the request. But what’s happening on the server side? Do I need to worry about that if I’m the client? Joe: No, that’s the great part. As a client, I don’t know, and frankly I’d rather not know what the server’s doing or how it does it. I don’t want to be dependent on the server implementation at all. I simply want to make a request and the server handles the request and sends a response. Now, just a word about what’s on the server. Some data on the server is static like a PDF file or an image or an audio file, for example, and sometimes you’ll see that in the URL the file type as an extension, like .pdf, and you get back a PDF file that your browser displays or that you can download to your machine. But with dynamic data, like student data coming out of a database based on the student number, a query is made against a database. The database responds with the data, and that’s formatted into some type of data format—typically JSON—and sent back to the client, which then does something with it, like displaying it on a webpage. So, as we can see, the client is fairly simple in the sense that it makes a request, receives the data response, and displays it or does something with it. And that’s one of the reasons why the choice to use REST and only REST in Visual Builder Studio is such a wise one. 07:54 Joe: Regardless of the different data sources or the different server implementations or how the data is stored on the server, or any of that, Visual Builder Studio doesn’t know and doesn’t care. What it sees is the REST request it sends and the response it gets back and then it deals with the response data regardless of how it’s implemented on the server. I mentioned the server sends back a representation of the resource, in this case, for example, the student record. That’s really where the abbreviation REST comes from: REpresentational State Transformation, which is a long way of saying, bring me back a representation of the resource—the thing—that I’m requesting. Now, of course, the server is a little more complex. On the server side, we would need software that is going to take the request from the web server using some programming language like Java, C#, C++, Python, or maybe even JavaScript in a Node.js application. You have a program that receives a request from the web server, executes the request (typically by connecting to the database if it’s a database call), makes the request, receives a data response from the database, formats that into some form, and passes it back to the web server, which then sends it back to the client that requested it. 09:01 Lois: Ok… I think I see. I’m guessing that ORDS gets involved somehow between the client and the server. Joe: Yes, exactly. We can see that the implementation on the server side is where the complexity is. For example, if I implement a student management service in Java, I have to write a bunch of Java code, a lot of which is boilerplate, housekeeping, boring code. For simple database access, it’s tedious to have to do this over and over, and if the database changes, it can be even more tedious to maintain that code to handle simple to moderately complex requests. Writing and maintaining software code to just read and write data from the database to pass to a client for a web request is cool the very first time you do it and then gets boring very quickly and it’s prone to errors because it’s so manual. So, it would be nice if we had a piece of software that could handle the tedious, boring, manual bits of this service. It would receive the request that our client, the browser or Visual Builder Studio for example, is sending, take that request, execute the request against the database for us, receive the response from the database, and then format it for us and send it back to us, without a developer having to write custom code on the server side. And that is what Oracle REST Data Services (ORDS) does. 10:13 Joe: ORDS contains a lightweight web server based on the Jetty web server that receives the request from the client, like a browser or Visual Builder Studio or whatever, in the form of a URL, parses the request, generates a query or an update, or an insert or delete, depending on the nature of the HTTP operation sent or requested, and sends it to the database on our behalf. The database executes the request from ORDS, sends back a response to ORDS, and ORDS formats the response for us in the JSON and sends it back to our client. In nutshell, that’s it. 10:45 Lois: So ORDS does all that? And it’s free? How does it work? Uhm, remember I’m not as technical as you are. Joe: Of course. ORDS is free. It’s a lightweight, highly performant Java app that can run in many different modes, from stand-alone on a server to embedded in an application server like WebLogic, to running in the Oracle Cloud with the Oracle Autonomous Database (ADB). When you REST-enable your tables, your web requests are intercepted by ORDS running in ADB. It’s optimized for the purpose of handling web requests, connecting to the Oracle database, and sending back formatted responses as JSON. It can also handle more complex requests as well in the form of queries with special parameters.  So, you can see what ORDS does for us. It handles the request coming from the client, which could be a browser or Visual Builder Studio or APEX or whatever client—pretty much any client today can make an HTTP call—it handles the call, parses the request, makes the request to the server on our behalf, and of course security is built-in and all of that, and so we don’t get to data we’re not supposed to see. It receives a response from the database, formats it into the JSON key:value pair format, and sends it back to our client. 12:00 Are you planning to become an Oracle Certified Professional this year? Whether you're a seasoned IT pro or just starting your career, getting certified can give you a significant boost. And don't worry, we've got your back. Join us at one of our cert prep live events in the Oracle University Learning Community. You'll get insider tips from seasoned experts and learn from other professionals' experiences. Plus, once you've earned your certification, you'll become part of our exclusive forum for Oracle-certified users. So, what are you waiting for? Head over to mylearn.oracle.com and create an account to jump-start your journey towards certification today! 12:43 Nikita: Welcome back. So, Joe, then the next question is, what do we do to REST-enable our database? Does that only work for ADB? Joe: This can be done in a couple of different ways. It can be done implicitly, called AutoREST, or explicitly. AutoREST is very convenient. In the case of an ADB database, you log in as the user who owns the structures, select your tables, views, packages, procedures, or functions that you want to REST-enable. Choose REST and then Enable from the menu for the table, view, stored package, procedure, or function and a URL is generated using your POST, GET, PUT, and DELETE for the standard database create, retrieve, update, delete operations. And it’s not just for ADB. You can do this in SQL Developer Desktop as well. Then, when you invoke the URL for the service, if you include just the name of the resource, like students, you get the entire collection back. If you add an ID at the end of the URL, like student/23, you get back the data for that specific student back, or whatever the structure is. You can add more complex filter parameters as well. And that’s it! Very easy. And, of course, you can apply appropriate security and ORDS enforces it.  But you also can create custom code to handle more complex requests.  13:53 Lois: Joe, what if there’s custom logic or processing that you want to do when the REST call comes in and you need to write custom code to handle it? Joe: Remember, I said on the server side, we use custom code to retrieve data as well as apply business rules, validations, edits, whatever needs to be done to appropriately handle the REST call. So it’s a great question, Lois. When using ORDS, you can write a REST service handler in PL/SQL and SQL, just like if you were writing a stored procedure or a function or a package in the database, which is exactly what you’re doing. ORDS exposes your PL/SQL code wrapped in a REST interface with, of course, the necessary security. And since it’s PL/SQL, it runs in the database, so it’s highly performant, fast, and uses code you’re likely already familiar with or maybe already have. Your REST service handler can call existing PL/SQL packages, procedures, and functions. For example, if you created packages with stored procedures and functions that wrap access to your database tables and views, you can REST-enable those stored procedures, functions, and packages, and call them over the web. And maintain the package access you already created. I do want to point out that the recommended way to access your tables and views is through packages, stored procedures, and functions. While you can expose your tables and views directly to REST, should you really do that? In practice, it’s generally not a recommended way to do it. Do you want to expose your data in tables and views directly through a REST interface? Ideally, no, access should be through a PL/SQL wrapper, same as it’s—hopefully—done today for your client-server applications. 15:26 Nikita: I understand it’s easy to generate a simple REST interface for tables and so on to do basic create, retrieve, update, and delete operations. But what’s required to create custom code to handle more complex business operations?  Joe: The process to create your own custom handlers is a little bit more involved as you would expect. It uses your skills as a PL/SQL programmer, while hiding the details of the REST implementation to let you focus on the logic and processing. Mechanically, you’d begin by creating a module that has a URL associated with it. So, for example, you would create a URL like https://oracle.com/oracleuniversity/studentregistry. Then, within that module, you create a template that names the specific resource—or thing—that you want to work with. For example, student, or course, or registration. 16:15 Joe: Then you create the handler for it. You have a handler to do the read, another handler for the insert, another handler for an update, another handler for a delete, and even possibly multiple handlers for more complex APIs based on your needs and the parameters being passed in.  You can create complex URLs with multiple parameters for passing needed information into the PL/SQL procedure, which is going to do the actual programming work for you. There are predefined implicit variables about the message itself that you can use, as well as all the parameters from the URL itself. Now, this is all done in a nice developer interface on the web if you’re using SQL Developer Web with ADB or in SQL Developer for the desktop. Either one can do this because under the covers, ORDS is generating and executing the PL/SQL calls necessary to create and expose your web services. It’s very easy to work with and test immediately. 17:06 Lois: Joe, how much REST knowledge do I need to use ORDS properly to create REST services? Joe: Well, you should have some basic knowledge of REST, HTTP operations, request and response messages, and JSON, since this is the data format ORDS produces. The developer interface is really not designed for somebody who knows nothing about REST at all; it’s not designed to take them step-by-step through everything that needs to be done. It’s not wizard-based. Rather, it’s an efficient, minimal interface that can be used quickly and easily by someone who has at least some experience building REST services. But, if you have a little knowledge and you understand how REST works and how a REST interface is used and you understand PL/SQL and SQL, you could do quite a lot with only minimal knowledge. It’s easy to get started and it’s fun to see your data start appearing in webpages formatted for you, with very little or even no code at all as in the case of AutoREST enabling. And ORDS is free and comes as part of the database in ADB as SQL Developer Web and SQL Developer Desktop, both of which are free as well. And SQL Developer Web and SQL Developer Desktop both have a data modeler built into them so you can model your database tables, columns, and keys, and generate and execute the code necessary to create the structures immediately, and they can create graphical models of your database to aid in understanding and communication. Now, while this is not required, modeling your database structures before you build them is most definitely a best practice. 18:29 Nikita: Ok, so now that I have my REST-enabled database tables and all, how do I use them in VBS Designer? Joe: In Visual Builder Studio Designer, you define a service connection by its endpoint and paste the URL for the REST-enabled resource into the wizard, and it generates everything for you by introspecting the REST service. You can test it, see the data shape of the response, and see data returned. You access your REST-enabled data from your database from Visual Builder Studio Designer and use it to populate lists, tables, and forms using the quick start wizards built in. I’ll also mention that ORDS provides other capabilities in addition to handling REST calls for the database tables and views. It also exposes over 500 different endpoints for managing your Oracle Database, things like Pluggable Database Management (PDBs), Data Pump, Data Dictionary, Performance, and Monitoring. It’s very easy to use and get started with. A great place to start is to create a free, autonomous database in Oracle Cloud, start it up, and then access the database actions. You can start creating tables, columns, and keys, and loading data, or you can load your own scripts, if you’ve got them, to produce the tables and columns and load them. You can upload the script and run it and it will create your tables and other needed structures. You can then REST-enable them by selecting simple menu options. It’s a lot of fun and easy to get started with. 19:47 Lois: So much good stuff today. Thank you, Joe, for being with us today and in the past few weeks and sharing your knowledge with us. Nikita: Yeah, it’s been so nice to have you around. Joe: Thank you both! It’s been great being here with you. 19:59 Lois: And remember, our Visual Builder courses, Develop Visual Applications with VBS and Develop Fusion Apps with VBS, both show you how to work with a third-party REST service. And our data modeling and design course teaches the fundamentals of data modeling. You can access all these of courses, for free, on mylearn.oracle.com. Join us next week for another episode of the Oracle University Podcast. Until then, I’m Lois Houston… Nikita: And Nikita Abraham signing off! 20:30 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
20m
09/04/2024

Forgotten, But Not Gone: How Model-Based Development Is Still Alive and Well Today

Computer Aided Software Engineering (CASE) tools, which helped make the analysis, design, and implementation phases of software development better, faster, and cheaper, fell out of favor in the mid-'90s. Yet much of what they have to offer remains and is in active use within different Oracle tools.   Listen to Lois Houston and Nikita Abraham interview Senior Principal OCI Instructor Joe Greenwald about the origins of CASE tools and model-based development, as well as how they evolved into their current forms.   Develop Fusion Applications Using Visual Builder Studio: https://mylearn.oracle.com/ou/course/develop-fusion-applications-using-visual-builder-studio/122614/   Build Visual Applications Using Visual Builder Studio: https://mylearn.oracle.com/ou/course/build-visual-applications-using-oracle-visual-builder-studio/110035/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.   ---------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started. 00:26   Nikita: Hello and welcome to the Oracle University Podcast! I’m Nikita Abraham, Principal Technical Editor with Oracle University, and joining me is Lois Houston, Director of Innovation Programs.   Lois: Hi there! In our last episode, we looked at Oracle’s Redwood design system and how it helps create world-class apps and user experiences. Today, Joe Greenwald, our Senior Principal OCI Instructor, is back on our podcast. We’re going to focus on where model-based development tools came from: their start as CASE tools, how they morphed into today’s model-based development tools, and how these tools are currently used in Oracle software development to make developers’ lives better.   01:08 Nikita: That’s right. It’s funny how things that fell out of favor years ago come back and are used to support our app development efforts today. Hi Joe!   Joe: Haha! Hi Niki. Hi Lois.
 01:18 Lois: Joe, how did you get started with CASE tools?    Joe: I was first introduced to computer-aided software engineering tools, called CASE tools, in the late 1980s when I began working with them at Arthur Young consulting and then Knowledgeware corporation in Atlanta, helping customers improve and even automate their software development efforts using structured analysis and design techniques, which were popular and in high use at that time. But it was a pain to have to draw diagrams by hand, redraw them as specifications changed, and then try to maintain them to represent the changes in understanding what we were getting from our analysis and design phase work. CASE tools were used to help us draw the pictures as well as enforce rules and provide a repository so we could share what we were creating with other developers. I was immediately attracted to the idea of using diagrams and graphical images to represent requirements for computer systems.  02:08 Lois: Yeah, you’re like me. You’re a visual person. Joe: Yes, exactly. So, the idea that I could draw a picture and a computer could turn that into executable code was fascinating to me. Pictures helped us understand what the analysts told us the users wanted, and helped us communicate amongst the teams, and they also helped us validate our understanding with our users. This was a critical aspect because there was a fundamental cognitive disconnect between what the users told the analysts they needed, what the analysts told us the users needed, and what we understood was needed, and what the user actually wanted. There’s a famous cartoon, you can probably find this on the web, that shows what the users wanted, what was delivered, and then all the iterations that the different teams go through trying to represent the simple original request.   I started using entity relationship diagrams, data flow diagrams, and structure charts to support the structured analysis, design, and information engineering methods that we were using at the time for our clients. Used correctly, these were powerful tools that resulted in higher quality systems because it forced us to answer questions earlier on and not wait until later in the project life cycle, when it’s more expensive and difficult to make changes. 03:16 Nikita: So, the idea was to try to get it wrong sooner. Joe: That’s right, Niki. We wanted to get our analysis and designs in front of the customer as soon as possible to find out what was wrong with our models and then change the code as early in the life cycle as possible where it was both easier and, more importantly, cheaper to make changes before solidifying it in code.   Of course, the key words here are “used correctly,” right? I saw the tools misused many times by those who weren’t trained properly or, more typically, by those whose software development methodology, if there even was one, didn’t use the tools properly—and of course the tools took the blame. CASE tools at the time held a lot of promise, but one could say vendors were overpromising and under delivering, although I did have a number of clients who were successful with them and could get useful support for their software development life cycle from the use of the tools. Since then, I’ve been very interested in using tools to make it easier for us to build software.   04:09 Nikita: So, let me ask you Joe, what is your definition of a CASE tool?
 Joe: I’m glad you asked, Niki, because I think many people have many different definitions. I’m precise about it, and maybe even a bit pedantic with the definition. The definition I use for a CASE tool comprises four things. One, it uses graphics, graphical symbols, and diagrams to represent requirements and business rules for the application. Two, there is a repository, either private, or shared, or both, of models, definitions, objects, requirements, rules, diagrams, and other assets that can be shared, reused, and almost more importantly, tracked. Three, there’s a rule-base that prevents you from drawing things that can’t be implemented. For example, Visio was widely regarded as a CASE tool, but it really wasn’t because it had no rules behind it. You could wire together anything you wanted, but that didn’t mean it could be built.  Fourth, it generates useful code, and it should do two-way engineering, where code, typically code changed outside the model, can be reverse engineered back into the model and apply updates to the model, and to keep the model and the source code in synchronization.   05:13 Joe: I came up with a good slogan for CASE tools years ago: a good CASE tool should automate the tedious, manual portions of software development. I’d add that one also needs to be smarter than the tools they’re using. Which reminds me, interestingly enough, of clients who would pick up CASE tools, thinking that they would make their software development life cycle shorter. But if they weren’t already building models for analysis or design, then automating the building of things that they weren’t building already was not going to save them time and effort.  And some people adopted CASE tools because they were told to or worse, forced to, or they read an article on an airplane, or had a Eureka moment, and they would try to get their entire software development staff to use this new tool, overnight literally, in some cases. Absolutely sheer madness!  Tools like this need to be brought into the enterprise in a slow, measured fashion with a pilot project and build upon small successes until people start demanding to use the tools in their own projects once they see the value. And each group, each team would use the CASE tool differently and to a different degree. One size most definitely does not fit all and identifying what the teams’ needs are and how the tool can automate and support those needs is an important aspect of adopting a CASE tool. It’s funny, almost everyone would agree there’s value in creating models and, eventually, generating code from them to get better systems and it should be faster and cheaper, etc. But CASE tools never really penetrated the market more than maybe about 18 to 25%, tops.   06:39 Lois: Huh, why? Why do you think CASE tools were not widely accepted and used?   Joe: Well, I don’t think it was an issue with the tools so much as it was with a company’s software development life cycle, and the culture and politics in the company. And I imagine you’re shocked to hear that.  Ideally, switching to or adopting automated tools like CASE tools would reduce development time and costs, and improve quality. So it should increase reusability too. But increasing the reusability of code elements and software assets is difficult and requires discipline, commitment, and investment. Also, there can be a significant amount of training required to teach developers, analysts, project managers, and senior managers how to deal with these different forms of life cycles and artifacts: how they get created, how to manage them, and how to use them. When you have project managers or senior managers asking where’s the code and you try telling them, “Well, it’s gonna take a little while. We’re building models and will press the button to generate the code.” That’s tough. And that’s also another myth. It was never a matter of build all the models, press the button, generate all the code, and be done. It’s a very iterative process. 07:40 Joe: I’ve also found that developers find it very psychologically reinforcing to type code into the keyboard, see it appear on the screen, see it execute, and models were not quite as satisfying in the same way. There was kind of a disconnect. And are still not today. Coders like to code.   So using CASE tools and the discipline that went along with them often created issues for customers because it could shine a bright light on the, well let’s say, less positive aspects of their existing software development process. And what was seen often wasn’t pretty. I had several clients who stopped using CASE tools because it made their poor development process highly visible and harder to ignore. It was actually easier for them to abandon the CASE tools and the benefits of CASE tools than to change their internal processes and culture. CASE tools require discipline, planning, preparation, and thoughtful approaches, and some places just couldn’t or wouldn’t do that. Now, for those who did have discipline and good software development practices, CASE tools helped them quite a bit—by creating documentation and automating the niggly little manual tasks they were spending a lot of time on.   08:43 Nikita: You’ve mentioned in the past that CASE tools are still around today, but we don’t call them that. Have they morphed into something else? And if so, what?   Joe: Ah, so the term Computer Aided Software Engineering morphed into something more acceptable in the ‘90s as vendors overpromised and under-delivered, because many people still saw value and do today see value in creating models to help with understanding, and even automating some aspects of software code development.  The term model-based development arose with the idea that you could build small models of what you want to develop and then use that to guide and help with manual code development. And frankly just not using the word CASE was a benefit. “Oh we’re not doing CASE tools, but we’ll still build pictures and do stuff.” So, it could be automated and generate useful code as well as documentation. And this was both easy to use and easier to manage, and I think the industry and the tools themselves were maturing. 09:35 Joe: So, model-based development took off and the idea of building a model to represent your understanding of the system became popular. And it was funny because people were saying that these were not CASE tools, this was something different, oh for sure, when of course it was pretty much the same thing: rule-based graphical modeling with a repository that created and read code—just named differently. And as I go through this, it reminds me of an interesting anecdote that’s given about US President Abraham Lincoln. He once asked someone, “If you call a dog’s tail a leg, how many legs does a dog have?” Now, while you’re thinking about that, I’ll go ahead and give you the correct answer. It’s four. You can call a dog’s tail anything you want, but it still has four legs. You can call your tools whatever you want, but you still have the idea of building graphical representations of requirements based on rules, and generating code and engineering in both directions. 10:29 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You’ll find training on everything from cloud computing, database, and security to artificial intelligence and machine learning, all free to subscribers. So, what are you waiting for? Pick a topic, leverage the Oracle University Learning Community to ask questions, and then sit for your certification. Visit mylearn.oracle.com to get started. 10:58 Nikita: Welcome back! Joe, how did you come to Oracle and its CASE tools?   Joe: I joined Oracle in 1992 teaching the Oracle CASE tool Designer. It was focused on structured analysis and design, and could generate database Data Definition Language (DDL) for creating databases. And it was quite good at it and could reverse engineer databases as well. And it could generate Oracle Forms and Reports – character mode at first, and then GUI. But it was in the early days of the tool and there was definitely room for improvement, or as we would say opportunities for enhancement, and it could be hard to learn and work with. It didn’t do round-trip engineering of reading Oracle Forms code and updating the Designer models, though some of that came later. So now you had an issue where you could generate an application as a starting point, but then you had to go in and modify the code, and the code would get updated, but the models wouldn’t get updated and so little by little they’d go out of sync with the code, and it just became a big mess. But a lot of people saw that you could develop parts of the application and data definition in models and save time, and that led to what we call model-based development, where we use models for some aspects but not all. We use models where it makes sense and hand code where we need to code.    12:04 Lois: Right, so the two can coexist. Joe, how have model-based development tools been used at Oracle? Are they still in use?   Joe: Absolutely! And I’ll start with my favorite CASE tool at Oracle, uhm excuse me, model-based development tool. Oracle SOA Suite is my idea of a what a model-based development tool should be. We create graphical diagrams to represent the flow of messages and message processing in web services—both SOAP and REST interfaces—with logic handled by other diagrammers and models. We have models for logic, human interaction, and rules processing. All this is captured in XML metadata and displayed as nice, colored diagrams that are converted to source code once deployed to the server. The reason I like it so much is Oracle SOA Suite addressed a fundamental problem and weakness in using modeling tools that generated code. It doesn’t let the developer touch the generated code. I worked with many different CASE tools over the years, and they all suffered from a fundamental flaw. Analysts and developers would create the models, generate the code, eventually put it into production, and then, if there was a bug in the code, the developer would fix the code rather than change the model. For example, if a bug was found at 10:30 at night, people would get dragged out of bed to come down and fix things. What they should have done is update the model and then generate the new code. But late at night or in a crunch, who’s going to do that, right? They would fix the code and say they’d go back and update the model tomorrow. But as we know, tomorrow never comes, and so little by little, the model goes out of synchronization with the actual source code, and eventually people just stopped doing models. 13:33 Joe: And this just happened more and more until the use of CASE tools started diminishing—why would I build a model and have to maintain it to just maintain the code? Why do two separate things? Time is too valuable. So, the problem of creating models and generating code, and then maintaining the code and not the model was a problem in the industry. And I think it certainly hurt the adoption and progress of CASE tool adoption. This is one of the reasons why Oracle SOA Suite is my favorite CASE tool…because you never have access to the actual generated code. You are forced to change the model to get the new code deployed. Period. Problem solved. Well, SOA Suite does allow post- deployment changes, of course, and that can introduce consistency issues and while they’re easier to handle, we still have them! So even there, there’s an issue.   14:15 Nikita: How and where are modeling tools used in current Oracle software development applications?   Joe: While the use of CASE tools and even the name CASE fell out of favor in the early to mid-90s, the idea of using graphical diagrams to capture requirements and generate useful code does live on through to today. If you know what to look for, you can see elements of model-based design throughout all the Oracle tools. Oracle tools successfully use diagrams, rules, and code generation, but only in certain areas where it clearly makes sense and in well-defined boundaries. Let’s start with the software development environment that I work with most often, which is Visual Builder Studio. Its design environment uses a modeling tool to model relationships between Business Objects, which is customer data that can have parent-child relationships, and represent and store customer data in tables. It uses a form of entity relationship diagram with cardinality – meaning how many of these are related to how many of those – to model parent-child relationships, including processing requirements like deleting children if a parent is deleted.  The Business Object diagrammer displays your business objects and their relationships, and even lets you create new relationships, modify the business objects, and even create new business objects. You can do all your work in the diagram and the correct code is generated. And you can display the diagram for the code that you created by hand. And the two stay in sync. There’s also a diagramming tool to design the page and page flow navigation between the pages in the web application itself. You can work in code or you can work in the diagram (either one or both), and both are updated at the same time. Visual Builder Studio uses a lot of two-way design and engineering.   15:48 Joe: Visual Builder Studio Page Designer allows you to work in code if you want to write HTML, JavaScript, and JSON code, or work in Design mode and drag and drop components onto the page designer canvas, set properties, and both update each other. It’s very well done. Very well integrated.   Now, oddly enough, even though I am a model-based developer, I find I do most of my work in Visual Builder Studio Designer in the text-based interface because it’s so easy to use. I use the diagrammers to document and share my work, and communicate with other team members and customers. While I think it’s not being used quite so much anymore, Oracle’s JDeveloper and application development framework, ADF, includes built-in tools for doing Unified Modeling Language (UML) modeling. You can create object-oriented class models, generate Java code, reverse engineer Java code, and it updates the model for you. You can also generate the code for mapping Java objects to relational tables. And this has been the heart of data access for ADF Business Components (ADFBC), which is the data layer of Oracle Fusion Apps, for 20 years, although that is being replaced these days. 16:51 Lois: So, these are application development tools for crafting web applications. But do we have any tools like this for the database? Joe: Yes, Lois. We do. Another Oracle tool that uses model-based development functionality is the OCI automated database actions. Here you can define tables, columns, and keys. You can also REST-enable your tables, procedures, and functions.   Oracle SQL Developer for the web is included with OCI or Oracle SQL Developer on the desktop has a robust and comprehensive data modeler that allows you to do full blown entity relationship diagramming and generate code that can be implemented through execution in the database. Now that’s actually the desktop version that has the full-blown diagrammer but you also have some of that in the OCI database actions as well. But the desktop version goes further than that. You can reverse engineer the existing database, generate models from it, modify the models, and then generate the delta, the difference code, to allow you to update an existing database structure based on the change in the model. It is very powerful and highly sophisticated, and I do strongly recommend looking at it.  And Oracle’s APEX (Application Express) has SQL workshop, where you can see a graphic representation of the tables and the relationships between the tables, and even build SQL statements graphically.    18:05 Nikita: It’s time for us to wrap up today but I think it’s safe to say that model-based development tools are still with us. Any final thoughts, Joe?   Joe: Well, actually today I wonder why more people don’t model. I’ve been on multiple projects and worked with multiple clients where there’s no graphical modeling whatsoever—not even a diagram of the database design and the relationships between tables and foreign keys. And I just don’t understand that.   One thing I don’t see very much in current CASE or model-based tools is enabling impact analysis. This is another thing I don’t see a lot. I’ve learned, in many years of working with these tools, to appreciate performing impact analysis. Meaning if I make a change to this thing here, how many other places are going to be impacted? How many other changes am I going to have to make? Something like Visual Builder Studio Designer is very good at this. If you make a change to the spelling of a variable let’s say in one place, it’ll change everywhere that it is referenced and used. And you can do a Find in files to find every place something is used, but it’s still not quite going the full hundred percent and allowing me to do a cross-application impact analysis. If I want to change this one thing here, how many other things will be impacted across applications? But it’s a start. And I will say in talking to the Visual Builder Studio Architect, he understands the value of impact analysis. We’ll see where the tool goes in the future. And this is not a commitment of future direction, of course. It would appear the next step is using AI to listen to our needs and generate the necessary code from it, maybe potentially bypassing models entirely or creating models as a by-product to aid in communication and understanding. We know a picture’s worth a 1000 words and it’s as true today as it’s ever been, and I don’t see that going away anytime soon.   19:41 Lois: Thanks a lot, Joe! It’s been so nice to hear about your journey and learn about the history of CASE tools, where they started and where they are now.  Joe: Thanks Lois and Niki. Nikita: Join us next week for our final episode of this series on building the next generation of Oracle Cloud Apps with Visual Builder Studio. Until then, this is Nikita Abraham… Lois: And Lois Houston, signing off! 20:03 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
20m
02/04/2024

Developing Redwood Applications

Redwood is a state-of-the-art graphical interface that defines the look and feel of the new Oracle Cloud Redwood Applications.  In this episode, hosts Lois Houston and Nikita Abraham, along with Senior Principal OCI Instructor Joe Greenwald, take a closer look at the intent behind the design and development aspects of the new Redwood experience. They also explore Redwood page templates and components.  Developing Redwood Applications with Visual Builder: https://mylearn.oracle.com/ou/learning-path/developing-redwood-applications-with-visual-builder/112791 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started. 00:26 Lois: Hello and welcome to the Oracle University Podcast! I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hi everyone! Last week, we discussed how Visual Builder Studio can be used to extend Oracle Fusion Apps.  Lois: That’s right, Niki. In today’s episode, we will focus on Oracle’s Redwood design system and how it helps us create stunning, world-class enterprise applications and user experiences.  00:56 Nikita: Yeah, Redwood is the basis for all the new Oracle Cloud Applications being re-designed, developed, and delivered. To tell us more, we have Senior OCI Learning Solutions Architect and Principal Instructor Joe Greenwald, who’s been working with Oracle software development tools since the early 90s and is responsible for OU’s Visual Builder Studio and Redwood course content. Lois: Hi Joe! Thanks for being with us today. 01:21 Joe: Hi Lois. Hi Niki. I am excited to join you on this episode because with the release of 24A Fusion applications, we are encouraging all our customers to adopt the new Redwood design system and components, and take advantage of the world-class look and feel of the new Redwood user experience. Redwood represents a new approach and direction for us at Oracle, and we’re excited to have our customers benefit from it. 01:44 Nikita: Joe, you’ve been working with Oracle user interface development tools and frameworks for a long time. How and why is Redwood different? Joe: I joined Oracle in 1992, and the first Oracle user interface I experienced was Oracle Forms. And that was the character mode. I came from a background of Smalltalk and its amazing, pioneering graphical user interface (GUI) design capabilities. I worked at Apple and I developed my own GUIs for a few years on PCs and Macs. So, Character Mode Forms, what we used to call DMV (Department of Motor Vehicles) screens, was a shock, to say the least. Since then, I’ve worked with almost every user interface and development platform Oracle has created: Character Mode Forms, GUI Forms, Power Objects, HyperCard on the Macintosh, that was pre-OS X by the way, Sedona, written in native C++ and ActiveX and OLE, which didn’t make it to a product but appeared in other things later, ADF Faces, which uses Java to generate HTML pages, and APEX, which uses PL/SQL to generate HTML pages. And I’ve worked with and wrote training classes for Java Swing, an excellent GUI framework for event-driven desktop and enterprise applications, but it wasn’t designed for the web. So, it’s with pleasure that I introduce you to the Redwood design system, easily the best effort I’ve ever seen, from the look and feel of holistic user-goal-centered design philosophy and approach to the cutting-edge WYSIWYG design tools.  03:11 Lois: Joe, is Redwood just another set of styles, colors, and fonts, albeit very nice-looking ones? Joe: The Redwood platform is new for Oracle, and it represents a significant change, not just in the look and feel, colors, fonts, and styles, I mean that too, but it’s also a fundamental change in how Oracle is creating, designing, and imagining user interfaces. As you may be aware, all Oracle Cloud Applications are being re-designed, re-engineered, and re-rebuilt from the ground up, with significant changes to both back-end and front-end architectures. The front end is being redesigned, re-developed, and re-created in pure HTML5, CSS3, and JavaScript using Visual Builder Studio and its design-time browser-based Integrated Development Environment. The back end is being re-architected, re-designed, and implemented in a modern microservice architecture for Oracle Cloud using Kubernetes and other modern technologies that improve performance and work better in the cloud than our current legacy architecture. The new Oracle Cloud Applications platform uses Redwood for its design system—its tools, its patterns, its components, and page templates. Redwood is a richer and more productive platform to create solutions while still being cost-effective for Oracle. It encourages a transformation of the fundamental user experience, emphasizing identifying, meeting, and understanding end users’ goals and how the applications are used.  04:34 Nikita: Joe, do you think Oracle's user interface has been improved with Redwood? In what ways has the UI changed? Joe: Yes, absolutely. Redwood has changed a lot of things. When I joined Oracle back in the '90s, there was effectively no user interface division or UI team. There was no user interface lab—and that was started in the mid-‘90s—and I was asked to give product usability feedback and participate in UI tests and experiments in those labs. I also helped test the products I was teaching at the time. I actually distinctly remember having to take a week to train users on Oracle’s Designer CASE tool product just to prep the participants enough to perform usability testing. I can still hear the UI lab manager shaking her head and saying any product that requires a week of training to do usability testing has usability issues! And if you’re like me and you’ve been around Oracle long enough, you know that Oracle’s not always been known for its user interfaces and been known to release products that look like they were designed by two or more different companies. All that has changed with Redwood. With Redwood, there’s a new internal design group that oversees the design choices of all development teams that develop products. This includes a design system review and an ongoing audit process to ensure that all the products being released, whether Fusion apps or something else, all look and feel similar so it looks like it’s designed by a single company with a single thought in mind. Which it is. 06:00 Joe: There’s a deeper, consistent commitment in identifying user needs, understanding how the applications are being used, and how they meet those user needs through things like telemetry: gathering metrics from the actual components and the Redwood system itself to see how the applications are being used, what’s working well, and what isn’t.  This telemetry is available to us here at Oracle, and we use it to fine tune the applications’ usability and purpose.  06:25 Lois: That’s really interesting, Joe. So, it’s a fundamental change in the way we’re doing things. What about the GUI components themselves? Are these more sophisticated than simple GUI components like buttons and text fields? Joe: The graphical components themselves are at a much higher level, more comprehensive, and work better together. And in Redwood, everything is a component. And I’m not just talking about things like input text fields and buttons, though it applies to these more fine-grained components as well. Leveraging Oracle’s deep experience in building enterprise applications, we’ve incorporated that knowledge into creating page templates so that the structure and look and feel of the page is fixed based on our internal design standards. The developer has control over certain portions of it, but the overall look and feel of the page is controlled by Oracle. So there is consistency of look and feel within and across applications. These page templates come with predefined functionalities: headers, titles, properties, and variables to manipulate content and settings, slots for other components to hold like search fields, collections, contextual information, badges, and images, as well as primary and secondary actions, and variables for events and event handling through Visual Builder action chains, which handle the various actions and processing of the request on the page. And all these page templates and components are responsive, meaning they respond to the change in the size of the page and the orientation. So, when you move from a desktop to a handheld mobile device or a tablet, they respond appropriately and consistently to deliver a clean, easy-to-use interface and experience. 07:58 Nikita: You mentioned WYSIWYG design tools and their integration with Visual Builder Studio’s integrated development environment. How does Redwood work with Visual Builder Studio? Joe: This is easily one of my most favorite aspects about Redwood and the integration with Visual Builder Studio Designer. The components and page templates are responsive at runtime as well as responsive at design time! In over 30 years of working with Oracle software development products, this is the first development system and integrated development environment I’ve seen Oracle produce where what you see is what you get at design time. Now, with products such as Designer and JDeveloper ADF Faces and even APEX—all those page-generation types of products—you have to generate the page, deploy it, and only then can you view the final page to see whether it meets the needs of your user interface. For example, with Designer, there were literally hundreds of configuration parameters that you could set to control how forms and reports looked when they were generated —down to how many buttons on a row or how many rows to a page, that sort of thing, all done in text mode. Then you’d generate and run the page to see what the result was and then go back and modify things until you got what you wanted. 09:05 Joe: I remember hearing the product managers for Oracle ADF Faces being asked…well, a customer asked, “What happens if I put this component here and this component here? What will the page look like?” and they’d say, “I don’t know. Render the page and let’s see.” That’s just crazy talk. With Redwood and its integration with Visual Builder Studio Designer, what you see on the page at design time is literally what you get. And if I make the page narrower or I even convert it to a mobile display while in the Designer itself, I immediately see what the page looks like in that new mode. Everything just moves accordingly, at design time. For example, when changing to a mobile UI, everything stacks up nicely; the components adjust to the page size and change right there in the design environment. Again, I can’t emphasize enough the simple luxury of being able to see exactly what the user is going to see on my page and having the ability to change the resolution, orientation, and screen size, and it changes right there immediately in my design environment. 10:01 Lois: I’m intrigued by the idea of page templates that are managed by Oracle but still leave room for the developer to customize aspects of the look and feel and functionality. How does that work? Joe: Well, the page templates themselves represent the typical pages you would most likely use in an enterprise application. Things like a welcome page, a search page, and edit and create pages, and a couple of different ways to display summary information, including foldout pages, though this is not an exhaustive list of course. Not only do they provide a logical and complete starting point for the layout of the page itself, but they also include built-in functionality. These templates include functionality for buttons, primary and secondary actions, and areas for holding contextual information, badges, avatars, and images. And this is all built right into the page, and all of them use variables to describe the contents for the various parts, so the contents can change programmatically as the variables’ contents change, if necessary. 10:59 Do you have an idea for a new course or learning opportunity? We’d love to hear it! Visit the Oracle University Learning Community and share your thoughts with us on the Idea Incubator. Your suggestion could find a place in future development projects. Visit mylearn.oracle.com to get started.  11:19 Nikita: Welcome back! So, Joe, let’s say I’m a developer. How do I get started working with Redwood? Joe: One of the easiest ways to do it is to use Visual Builder Studio Designer and create a new visual application. If you’re creating a standalone, bespoke custom application, you can choose a Redwood starter template, which will include all the Redwood components and page templates automatically. Or, if you’re extending and customizing an Oracle Fusion application, Redwood is already included.  Either way, when you create a new page, you have a choice of different page templates—welcome page templates, edit pages, search pages, etc. —and all you have to do is choose a page that you want and begin configuring it. And actually if you make a mistake, it’s easy to switch page templates. All the components, page templates, and so on have documentation right there inside Visual Builder Studio Designer, and we do recommend that you read through the documentation first to get an understanding of what the use case for that template is and how to use it. And some components are more granular, like a collection container which holds a collection of rows of a list or a table and provides capabilities like toolbars and other actions that are already built and defined. You decide what actions you want and then use predefined event listeners that are triggered when an event occurs in the application—like a button being clicked or a row being selected—which kicks off a series of actions to be performed. 12:38 Lois: That sounds easy enough if you know what you’re doing. Joe, what are some of the more common pages and what are they used for?  Joe: Redwood page templates can be broken down into categories. There are overview templates like the welcome page template, which has a nice banner, colors, and illustrations that can be used for a welcoming page—like for entering a new application or a new logical section of the application. The dashboard landing page template displays key information values and their charts and graphs, which can come from Oracle Analytics, and automatically switches the display depending on which set of data is selected.  The detail templates include a general overview, which presents read-only information related to a single record or resource. The item overview gives you a small panel to view summary information (for example, information on a customer) and in the main section, you can view details like all the orders for that customer. And you can even navigate through a set of customers, clicking arrows for next-previous navigation. And that’s all built in. There’s no programming required. The fold-out page template folds out horizontally to show you individual panels with more detail that can be displayed about the subject being retrieved as well as overflow and drill-down areas. And there’s a collection detail template that will display a list with additional details about the selected item (for example, an order and its order line items). 13:52 Joe: The smart search page does exactly what it says. It has a search component that you use to filter or search the data coming back from the REST data sources and then display the results in a list or a table. You define the filter yourself and apply it using different kinds of comparators, so you can look for strings that start with certain values or contain values, or numerical values that are equal to or less than, depending on what you’re filtering for. And then there are the transactional templates, which are meant to make changes. This includes both the simple create and edit and advanced create and edit templates.  The simple create and edit page template edits a single record or creates a single record. And the advanced page template works well if you’re working with master-detail, parent-child type relationships. Let’s say you want to view the parent and create children for it or even create a parent and the children at the same time. And there’s a Gantt chart page for project management–type tracking and a guided process page for multiple-step processes and there’s a data management page template specifically for viewing and editing data collections like Excel spreadsheets. 14:50 Nikita: You mentioned that there’s a design system behind all this. How is this used, and how does the customer benefit from it? Joe: Redwood comprises both a design system and a development system. The design system has a series of steps that we follow here at Oracle and can suggest that you, our customers and partners, can follow as well. This includes understanding the problem, articulating the vision for the page and the application (what it should do), identifying the proper Redwood page templates to use, adding detail and refining the design and then using a number of different mechanisms, including PowerPoint or Figma design tools to specify the design for development, and then monitor engagement in the real world. These are the steps that we follow here at Oracle. The Redwood development process starts with learning how to use Redwood components and templates using the documentation and other content from redwood.oracle.com and Visual Builder Studio. Then it’s about understanding the design created by the design team, learning more about components and templates for your application, specifically the ones you’re going to use, how they work, and how they work together. Then developing your application using Visual Builder Studio Designer, and finally improving and refining your application. Now, right now, as I mentioned, telemetry is available to us here at Oracle so we can get a sense of the feedback on the pages of how components are being used and where time is being spent, and we use that to tune the designs and components being used. That telemetry data may be available to customers in the future. Now, when you go to redwood.oracle.com, you can access the Redwood pattern book that shows you in detail all the different page templates that are available: smart search page, data grid, welcome page, dashboard landing page, and so on, and you can select these and read more about them as well as the actual design specifications that were used to build the pages—defining what they do and what they respond to. They provide a lot of detailed information about the templates and components, how they work and how they’re intended to be used. 16:45 Lois: That’s a lot of great resources available. But what if I don’t have access to Visual Builder Studio Designer? Can I still see how Redwood looks and behaves? Joe: Well, if you go to redwood.oracle.com, you can log in and work with the Redwood reference application, which is a live application working with live data. It was created to show off the various page templates and components, their look and feel and functionality from the Redwood design and development systems. This is an order management application, so you can do things like view filtered pending orders, create new orders, manage orders, and view information about customers and inventory. It uses the different page templates to show you how the application can perform. 17:23 Nikita: I assume there are common aspects to how these page templates are designed, built, and intended to be used. Is that a good way to begin understanding how to work with them? Becoming familiar with their common properties and functionality? Joe: Absolutely! Good point! All pages have titles, and most have primary and secondary actions that can be triggered through a variety of GUI events, like clicking a button or a link or selecting something in a list or a table. The transactional page templates include validation groups that validate whether the data is correct before it is submitted, as well as a message dialog that can pop up if there are unsaved changes and someone tries to leave the page. All the pages can use variables to display information or set properties and can easily display specific contextual information about records that have been retrieved, like adding the Order Number or Customer Name and Number to the page title or section headers. 18:13 Lois: If I were a developer, I’d be really excited to get started! So, let’s say I’m a developer. What’s the best way to begin learning about Redwood, Joe? Joe: A great place to start learning about the Redwood design and development system is at the redwood.oracle.com page I mentioned. We have many different pages that describe the philosophy and fundamental basis for Redwood, the ideas and intent behind it, and how we’re using it here at Oracle. It also has a list of all the different page templates and components you can use and a link to the Redwood reference application where you can sign in and try it yourself. In addition, we at Oracle University offer a course called Design and Develop Redwood Applications, and in there, we have both lecture content as well as hands-on practices where you build a lightweight version of the Redwood reference application using data from the Fusion apps application, as well as the pages that I talked about: the welcome page, detail pages, transactional pages, and the dashboard landing page.  And you’ll see how those pages are designed and constructed while building them yourself.  It’s very important though to take one of the free Visual Builder Developer courses first: either Build Visual Applications Using Visual Builder Studio and/or Develop Fusion Applications Using Visual Builder Studio before you try to work through the practices in the Redwood course because it uses a lot of Visual Builder Designer technology.  You’ll get a lot more out of the Redwood practices if you understand the basics of Visual Builder Studio first. The Build Visual Applications Using Visual Builder Studio course is probably a better place to start unless you know for a fact you will be focusing on extending Oracle Fusion Applications using Visual Builder Studio. Now, a lot of the content is the same between the two courses as they share much of the same technology and architectures. 19:53 Lois: Ok, so Build Visual Applications Using Visual Builder Studio and Develop Fusion Applications Using Visual Builder Studio…all on mylearn.oracle.com and all free for anyone who wants to take them, right? Joe: Yes, exactly. And the free Redwood learning path leads to an Associate certification. While our courses are a great place to start in preparing for your certification exam, they are not, of course, by themselves sufficient to pass and you will want to study and be familiar with the redwood.oracle.com content as well. The learning path is free, but you do have to pay for the certification exam. 20:29 Nikita: Thanks for those tips, Joe, and we appreciate you joining us today. Joe: Thanks for having me! Lois: Join us next week when we’ll discuss how model-based development is still alive and well today. Until next time, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 20:45 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
21m
26/03/2024

Preparing to Extend Oracle Fusion Apps Using Visual Builder Studio

What do you need to start customizing the next generation of Oracle Fusion Apps? How do you create new pages for business processes? What level of expertise do you require for this? Join Lois Houston and Nikita Abraham as they get answers to all these questions and more from Senior Principal OCI Instructor Joe Greenwald. Develop Fusion Applications Using Visual Builder Studio: https://mylearn.oracle.com/ou/course/develop-fusion-applications-using-visual-builder-studio/122614/ Build Visual Applications Using Visual Builder Studio: https://mylearn.oracle.com/ou/course/build-visual-applications-using-oracle-visual-builder-studio/110035/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this  series of informative podcasts, we’ll bring you foundational training on the most popular  Oracle technologies. Let’s get started. 00:26 Lois: Hello and welcome to the Oracle University Podcast! I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hi everyone! Last week, we were introduced to Visual Builder Studio and the Oracle JavaScript Extension Toolkit, also known as JET. Lois: Our friend and Senior Principal OCI Instructor Joe Greenwald is back with us today to talk about how to extend Oracle Cloud Applications that are being built using Visual Builder for its front-end. Nikita: That’s right. All Fusion Applications are being redesigned and rebuilt using Visual Builder. And we’ll find out more about that from Joe. Hi Joe! Thanks for being with us today. Joe: Hi Lois! Hi Niki! My pleasure to be here. 01:09 Nikita: Joe, tell us a little about what’s happening with the redesign and re-architecture of Oracle Cloud Applications using Visual Builder Studio, or VBS. I hear some very exciting changes are coming that are important for our customers and partners. Joe: That’s right, Niki. Oracle is redesigning and rebuilding its entire suite of Fusion Cloud Applications, over 330 different products, utilizing over 60,000 engineers — that is “60,” not “16” — at Oracle to develop the next generation of Oracle Fusion Applications. What’s most exciting is that the same tools the engineers are using to accomplish this are available to our partners and our customers to use to extend the functionality and capabilities of Fusion Applications to meet their custom needs and processes.  01:54 Lois: That’s pretty awesome! We want to use this time today to ask you about extensions, the types of extensions you can create, and how to use Visual Builder Studio to create those extensions. Nikita: Yeah, can we start with you telling us what an extension is? I’ve gotten the sense that Oracle uses the term extension as both a noun and a verb and that’s a bit confusing to me. 02:15 Joe: Yeah, good catch, Niki. Yes, Oracle does use the term extension in two ways: both as a noun and a verb. As a noun, an extension is a container for the code changes that you make to your applications. Basically, it’s a Git repository that Oracle creates and manages for you. So, the extension container holds the code changes you make to your page layouts: the fields, their positioning, showing and hiding fields, that sort of thing, as well as page functionality. These code changes you make are stored in the extension and it is this extension with your code changes that is merged with the main Git branch eventually and then deployed using continuous integration/continuous deployment jobs defined in Visual Builder Studio, which manages the project and its assets. Your extension is a Git branch that is an asset of the project. Once your extension code is merged with the main branch and deployed, then the next time someone brings up the application, they’ll see the changes you’ve made in the app. 03:08 Lois: And as a verb? Joe: As a verb, extension means to extend the functionality and the look and feel of the application, though I prefer the term customization or configuration to describe this aspect, as the documentation does, and to avoid confusion, though I’ll admit I’m not always consistent about the terms I use. 03:26 Lois: What types of customizations, or extensions, and I’m using the verb now, are available for Fusion Apps in Visual Builder Studio? Joe: There are three different ways Fusion Apps can be customized effectively, configured, or extended. The first way is what we call a basic extension, where you’re rearranging hiding, or showing, or moving around fields and sections on the page that have been set up to be extendable by the Fusion Application development teams. Things like hiding fields, showing fields, hiding sections, showing sections…  Nikita: So fairly basic actions… Joe: Yeah exactly and they can be done in Visual Builder Studio Designer by people with minimal VB training, Visual Builder training. And, most recently, if you have access to it, you can do it in the new Express mode, where the page shows you just those things you can work with and just the tools you need to work with the page. This is new and makes it much easier for folks who are not highly technical to make basic changes to the page layout. 04:18 Lois: People like me! That sounds easy enough. Joe: And the next type of extension is more of an intermediate change and requires some training with Visual Builder Studio because you’re creating rules that govern the display of layouts based on certain conditions on the page. These are highly flexible, powerful, and useful for creating customized page layouts based on a variety of factors from page size and orientation to the role of the person using it to values in the actual fields on the page itself. These rules can be combined to create complex rule-based conditions that display exactly what the user should see, given the conditions of the page and their role. I would also include making changes to action chains, which execute sequences of behaviors and navigation, and the actual structure of the application, but this is more advanced.  Lastly, is creating mashup applications, which are stand-alone Visual Builder visual applications, which use data from Fusion apps, and customer data sources, like their own database tables, and potentially third-party APIs to create brand new pages and applications with new functionality, new processes, new procedures, new displays, all of which look just like Fusion Applications and use the same data as Fusion applications. 05:27 Lois: Joe, how do I get started if I want to extend a page?  Joe: The easiest way to do it is to open a page in Fusion Applications and then select Edit Page in Visual Builder Studio from the Profile menu. You’re then prompted for a project to hold the Git repository for the extension container. And since there’s probably already one that exists, after you select the project, an extension Git container is assigned to you. Unless this is the very first time the application has been extended in which case it creates an extension for you. When creating customizations or configurations, we recommend that each application be done in its own separate project. So, for example, if you’re working on Customer Experience Sales, you might do it in Project A and if you’re working on extensions with HCM, you might do it in Project B. And if you decide to create your own pages and flows in your own app, you might do that in Project C.  06:13 Nikita: But why do you need to do this? Joe: That’s just to keep things nice and separate and organized. The tool, Visual Builder Studio, doesn’t really care, but it makes for cleaner development and can help with the management of the development teams. 06:23 Nikita: Ok, Joe, I have a question. How do I know if the page I’m on in Fusion Apps can be edited in Visual Builder? I know there are a lot of legacy pages still out there and they can co-exist with the new VB-based pages. Joe: If the URL of the page you’re on has the word /Redwood in it instead of /faces, then you know this is a page that was created using Visual Builder Studio and you’ll be able to extend it and make changes to it using the Edit in Visual Builder Studio option. So, if you select Edit in Visual Builder Studio, then the page you are on opens inside Visual Builder Studio Designer and you can make changes to any part of the page that has been explicitly enabled for extension by the development team. 07:02 Lois: That’s an important part, right? The application is not extendable by default.  Joe: That’s right, Lois. It is all locked down and you can’t make any changes to it by default. The development team must specifically enable certain parts of the page: sections, fields, layouts, variables, types, action chains, etc. as extendable for you to be able to make changes to it. This ensures the changes the development team makes to the application in the future won’t break your extensions. And conversely, the development team can choose to not extend portions that they do not want you to touch or mess with. Then if they do change that bit of the app in the future, it won’t break the application and you won’t get a big surprise. So, using the Edit page in Visual Builder Studio, you can make both basic changes, like moving, showing, and hiding fields and sections, as well as the more intermediate types of configurations, like using dynamic components to create rule-based layouts that change dynamically based on several conditions such as page size, roles of the user, and field values on the page itself. 08:00 Nikita: What happens if two developers make changes and essentially overwrite each other’s customizations — say one hides a field and another later exposes it? Joe: Well, whoever commits their changes and deploys last wins. The other developer’s changes get overwritten. So, this is something the team would want to consider carefully. It is possible to roll back to an earlier version if one must. And this can be done in Visual Builder Studio — the part that manages project assets like Git repositories. And there are Oracle blog posts about how to do that if you’re interested in learning more. 08:29 Lois: Joe, earlier you mentioned creating new pages and flows, but so far you’ve only talked about modifying existing extendable pages. How do I create new pages and flows? Joe: In a Visual Builder extension, a set of pages and flows is called an App UI. When I use the terms pages and flows, what I’m talking about is a set of pages that are logically related—whatever logical means to the designer and developer—in a group called a flow that you can navigate between. But you can also navigate between flows and even between applications. So, without getting too technical, each application has a default flow, which has a default page where that flow starts when the app first comes up. So, you can think of an App UI as a collection of flows and their pages, and a URL that accesses the default flow and its default page. That’s the page you would see first when accessing that URL. Of course, this can be configured and changed by the developer, as needed. Now, when Oracle creates the original application (for example, digital sales, helpdesk, or something like that), we create an App UI, which contains the pages and flows for that application and is the “entry point” into the app, accessing that App UI’s default flow and its default page and then things flow on from there. 09:40 Joe: Partners and customers can create their own application extensions that are dependent on an Oracle application and even create their own App UI – their own sets of pages and flows to accommodate their own processing and workflow needs. This gives them the ability to add their own processes and rules, and still leverage and navigate to the core application that Oracle built. For example, say Oracle delivered digital sales as an Oracle Cloud Application built using Visual Builder to a customer and the customer needs to add a few pages to do some validation or other type of business processing before entering the digital sales application. What the customer does, in this case, is create a new extension of the Oracle Digital Sales app and an App UI of their own, which would be the set of pages and flows that contain the processing they want to start with before then navigating into the digital sales app to use Oracle’s application. 10:31 Nikita: Wait, did I hear that correctly? We’re creating an extension of an extension or creating an extension on an existing extension? Joe: I know, right? I realize this can sound confusing the first time you hear it or the second time or even the third time. It took me a while to get my head around what they’re talking about. Let’s start with a Fusion application. In a Fusion application, everything is an extension of something. This is just how the code base and the architecture are organized and how they manage the Git repositories and the code base itself. So, Oracle created a base application called the Unified App. The Unified Application contains the basic page structure and common functionality needed for all applications. For example, it contains the header at the top that has the profile and the footer at the bottom of the page that has that little Ask Oracle icon. 11:16 Joe: Within that page, between the header and the footer, are the pages that are created by the developers, whether they be Oracle engineers or partners or customers. They display the contents of the page with the data and the layouts and all of that. In a sense, you can think of the Unified App as an index page, the starting page of the web application. Though that’s not completely true technically, it’s good enough for illustrative purposes. So, Oracle starts with the Unified App and then a development team extends that Unified App to build their product. This is how digital sales did it. This is how customer experience did it. This is how helpdesk did it. They start with the Unified App and they extend that and create an App UI that contains the flows and pages for their specific application, and then add functionality for all the pages and flows, as needed for the design. Partners and customers can then create a new extension that extends the Oracle Application and add their own App UI and their own URL if they want their pages accessed first, before navigating to the Oracle application. For example, if the digital sales application has functionality you’d like to leverage, like it has data services or fragments or page layouts that you want to reuse or other things, you extend the digital sales application, and this extension holds your code changes. You could then create a new App UI, and once deployed, users can use that URL for the new App UI to access your new pages. And your page can then navigate to the Oracle app when it needs to. Though I will say to date, we’re really not seeing much demand for this particular use case, but it is possible. 12:42 Lois: Is that the only option available to customers and partners—to extend an existing Oracle application? Joe: No, Lois. We’re seeing customers and partners create brand new Fusion applications of their own, based on the Unified App Oracle created. In a sense, doing the same thing that our development teams here are doing.  Remember, I said an Oracle development team starts with the Unified App, which has common functionality and look and feel for all applications, and then extends that to add business rules processing, flows, App UI, whatever they need for their specific Oracle application. We’re seeing our partners and customers wanting to build their own applications. Maybe a customer or partner wants to create a Time & Expense application and leverage the Fusion application data and the APIs available, but define their own flows, their own pages, their own processing. This is very easy to do. They’d start by extending the Unified App just like the Oracle development teams do, and then build their own App UI and within that, their own flows, pages, and custom processing. The nice thing about it is that the application looks and works and feels just like a Fusion application and it appears alongside other Fusion applications, because it is a Fusion application. 13:52 Did you know that the Oracle University Learning Community regularly holds live events hosted by Oracle expert instructors. Find out how to prepare for your certification exams. Learn about the latest technology advances and features. Ask questions in real time and learn from an Oracle subject matter expert. From Ask Me Anything about certification to Ask the Instructor coaching sessions, you’ll be able to achieve your learning goals for 2024 in no time. Join a live event today and witness firsthand the transformative power of the Oracle University Learning Community. Visit mylearn.oracle.com to get started.  14:33 Nikita: Welcome back! So Joe, it sounds like there are two different paths or life cycles to create extensions for future applications in Visual Builder Studio. Is that correct? Joe: Yes, exactly. So one path to extending the functionality of Fusion apps is to edit the page in Visual Builder Studio, which opens the page in Visual Builder Designer, and you then make changes to the existing pages, depending on what the development team has made extendable.  14:58 Nikita: But you can’t create new pages and flows in this scenario, right? Joe: This is strictly about modifying an existing page. The other path is creating a new application extension, which is a new application from scratch or extending an existing Oracle application or even an existing partner or customer application. Again, we’re not seeing this typically being done too much. Most partners and customers create new applications or make customizations to existing pages. But the architecture does support it. So, your partner might create a new application based on the production app released by Oracle, and you could extend their application. Or a development team at your site could extend Oracle’s application and you could then extend that team’s application. This is mechanically possible, although I question the use case behind that. Usually, we see our apps being extended – becoming a dependency when there’s code that can be leveraged or reused for a new app and its new App UI. 15:49 Lois: Joe, what did you mean when you say one extension is a dependency of another? Can you talk a bit about dependencies, what that means, how it looks to the developer? Joe: When you extend an application, it becomes a dependency to your application, and you get access to all the resources within that dependency that are marked as extendable by the developer who created that extension. Most useful are things like service connections to REST APIs from Fusion apps data sources, reusable code fragments, and layouts that you can leverage in those cases where you want to create a new App UI. When an extension is listed as a dependency, you’ll see this graphically in Visual Builder Studio Designer. When you see an extension listed as a dependency, it means you can reference any of that extension’s resources that have been marked extendable by the developer. Recall all resources are closed off or hidden by default, but development teams can mark resources as open to being extended and reused, and then you can see and use those resources. So, you can easily add and remove extensions as dependencies in Visual Builder Designer as needed. Now, this can be a nice way to modularize and reuse your resources and assets. To summarize: I can modify an existing page – this is most common, extend an existing application and create a new App UI – which is not common, or I can extend the unified app to create a new app and a new App UI and add other extensions as dependencies, as needed, to leverage their services, fragments, and layouts when building my own pages – this is pretty common as well. 17:14 Nikita: There’s one thing I’d like to come back to, Joe. You mentioned something called a mashup application earlier. Can you tell us a little more about that? Joe: To recap: I mentioned a couple of different ways that you can extend Fusion applications. One is changing layouts or creating rule-based layouts. You can also extend existing apps and create your own App UI on top of them or create your own Fusion app from scratch. But these are Fusion apps and they have restrictions.  These can only run within the Fusion applications ecosystem, which means they can only be accessed by people who are registered in the Fusion application ecosystem, and there are some other restrictions (for example, in terms of the APIs you can access). And you also have no access to customer data tables.   Mashup applications use the stand-alone Visual Builder Cloud Service, which enables you to create custom visual applications. These are visual applications that run outside the Fusion apps ecosystem. Users only need to be identified to the Identity Cloud Service, IDCS, and then they can get access to these mashup apps, depending on the roles and privileges given to them, of course. These mashup applications can access Fusion apps API data, as well as customer database tables, Excel spreadsheet data, CSV files, and third-party APIs. And all this data can appear on the same page, in the same app, using the same Redwood components, so they look and work just like Fusion applications. 18:32 Lois: I know in the past there’s been some friction to making changes in Fusion applications. Partner and customer developers use different tools than the ones Oracle engineers use and there have been some deployment issues. To wrap up things, can you tell us why customers should use Visual Builder Studio to customize Fusion apps? Joe: Glad to, Lois. The big benefit to customers is that they are using the exact same tools, Visual Builder Designer for page design work and Visual Builder Studio for project and code management, to build the customizations and extensions that Oracle is using to create the applications and extensions that are delivered to them. I can’t emphasize enough how big a deal this is and how wonderful it is for the customer. We’re constantly making the Visual Builder Designer interface easier and easier to work with. We’re currently releasing a new version of Visual Builder Designer—the Express mode version. This version of Designer is lightweight and has only the necessary features required to allow you to make changes to pages and layouts, and create and manage dynamic rule-based layouts. If you need more (for example, you need to create service connections, fragments, and do a lot more of that type of advanced work), then use the advanced version of the Designer. Both are available to you, assuming that your user has the appropriate permission and the Fusion app you are using has implemented Express Designer. 19:46 Lois: OK Joe, what courses does Oracle University offer for me if I wanted to learn more about developing extensions for Fusion apps and creating mashup apps using Visual Builder Studio? Joe: Oracle University has several courses. We have the Develop Visual Applications Using Visual Builder Studio, which focuses on creating the stand-alone custom bespoke mashup visual applications. We also have our Design and Develop Redwood Applications course, which goes into detail about working with the Redwood page templates and components. All these courses are free and available today. And all you need to do is log in to mylearn.oracle.com to get started. 20:19 Nikita: Thank you so much, Joe, for joining us today. This has been so educational. Joe: It’s been lovely talking to you both. Thank you. Lois: Yeah, my brain is full. Thanks Joe. Until next week, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 20:32 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
21m
19/03/2024

Introduction to Visual Builder Studio, Visual Builder Cloud Service, Stand-Alone, and JET

The next generation of front-end user interfaces for Oracle Fusion Applications is being built using Visual Builder Studio and Oracle JavaScript Extension Toolkit. However, many of the terms associated with these tools can be confusing. In this episode, Lois Houston and Nikita Abraham are joined by Senior Principal OCI Instructor Joe Greenwald. Together, they take you through the different terminologies, how they relate to each other, and how they can be used to deliver the new Oracle Fusion Applications as well as stand-alone, bespoke visual web applications. Develop Fusion Applications Using Visual Builder Studio: https://mylearn.oracle.com/ou/course/develop-fusion-applications-using-visual-builder-studio/122614/ Build Visual Applications Using Visual Builder Studio: https://mylearn.oracle.com/ou/course/build-visual-applications-using-oracle-visual-builder-studio/110035/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. --------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this  series of informative podcasts, we’ll bring you foundational training on the most popular  Oracle technologies. Let’s get started. 00:26 Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hi everyone! Today, we’re starting a new season on building the next generation of Oracle Cloud Apps with Visual Builder Studio. 00:45 Lois: And I’m so excited that we have someone really special to take us through the next few episodes. Joe Greenwald is joining us. Joe is a Senior Principal OCI Instructor with Oracle University. He joined Oracle in 1992 with an extensive background in CASE tools. Since then, he has used and taught all of Oracle's software development tools, including Oracle Forms, APEX, JDeveloper ADF, as well as all the Fusion Middleware courses. Currently, Joe is responsible for the Visual Builder Studio and Redwood development courses, including extending Fusion Applications with Visual Builder. 01:22 Nikita: In today’s episode, we’re going to ask Joe about Visual Builder Studio and Oracle JavaScript Extension Toolkit, also known as JET. Together, they form the basis of the technology for the next generation of front-end user interfaces for Oracle Fusion Applications, as well as many other Oracle applications, including most Oracle Cloud Infrastructure (OCI) interfaces.  Lois: We’ll look at the different terminologies and technologies, how they relate to each other, and how they deliver the new Oracle Fusion applications and stand-alone, bespoke visual web applications. Hi Joe! Thanks for being with us today. 01:57 Joe: Hi Lois! Hi Niki! I’m glad to be here. Nikita: Joe, I’m somewhat thrown by the terminology around Visual Builder, Visual Studio, and JET. Can you help streamline that for us?  Lois: Yeah, things that are named the same sometimes refer to different things, and sometimes things with a different name refer to the same thing.  02:15 Joe: Yeah, I know where you’re coming from. So, let’s start with Visual Builder Studio. It’s abbreviated as VBS and can go by a number of different names. Some of the most well-known ones are Visual Builder Studio, VBS, Visual Builder, Visual Builder Stand-Alone, and Visual Builder Cloud Service. Clearly, this can be very confusing. For the purposes of these episodes as well as the training courses I create, I use certain definitions.  02:39 Lois: Can you take us through those? Joe: Absolutely, Lois. Visual Builder Studio refers to a product that comes free with an OCI account and allows you to manage your project-related assets. This includes the project itself, which is a container for all of its assets. You can assign teams to your projects, as well as secure the project and declare roles for the different team members. You manage GIT repositories with full graphical and command-line GIT support, define package, build, and deploy jobs, and create and run continuous integration/continuous deployment graphical and code-managed pipelines for your applications. These can be visual applications, created using the Visual Builder Integrated Development Environment, the IDE, or non-visual apps, such as Java microservices, docker builds, NPM apps, and things like that. And you can define environments, which determine where your build jobs can be deployed. You can also define issues, which allow you to identify, track, and manage things like bugs, defects, and enhancements. And these can be tracked in code review merge requests and build jobs, and be mapped to agile sprints and scrum boards. There’s also support for wikis for team collaboration, code snippets, and the management of the repository and the project itself. So, VBS supports code reviews before code is merged into GIT branches for package, build, and deploy jobs using merge requests.  03:57 Nikita: OK, what exactly do you mean by that? Joe: Great. So, for example, you could have developers working in one GIT branch and when they’re done, they would push their private code changes into that remote branch. Then, they’d submit a merge request and their changes would be reviewed.  Once the changes are approved, their code branch is merged into the main branch and then automatically runs a CI/CD package (continuous integration/continuous deployment) package, build, and deploy job on the code. Also, the CI/CD package, build, and deploy jobs can run against any branches, not just the main branch. So Visual Builder Studio is intended for managing the project and all of its assets. 04:37 Lois: So Joe, what are the different tools used in developing web applications? Joe: Well, Visual Builder, Visual Builder Studio Designer, Visual Builder Designer, Visual Builder Design-Time, Visual Builder Cloud Service, Visual Builder Stand-Alone all kind of get lumped together. You can kinda see why. What I’m referring to here are the tools that we use to build a visual web application composed of HTML5, CSS3, JavaScript, and JSON (JavaScript Object Notation) for metadata. I call this Visual Builder Designer. This is an Integrated Development Environment, it’s the “IDE” which runs in your browser. You use a combination of drag and drop, setting properties, and writing and modifying custom and generated code to develop your web applications. You work within a workspace, which is your own private copy of a remote Git branch. When you’re ready to start development work, you open an existing workspace or create a new one based on a clone of the remote branch you want to work on. Typically, a new branch would be created for the development work or you would join an existing branch. 05:35 Nikita: What’s a workspace, Joe? Is it like my personal laptop and drive?  Joe: A workspace is your own private code area that stores any changes you make on the Oracle servers, so your code changes are never lost—even when working in a browser-based, network-based tool. A good analogy is, say I was working at home on my own machine. And I would make a copy of a remote GIT branch and then copy that code down to my local machine, make my code changes, do my testing, etc. and then commit my work—create a logical save point periodically—and then when I’m ready, I’d push that code up into the remote branch so it can be reviewed and merged with the main branch. My local machine is my workspace. However, since this code is hosted up by Oracle on our servers, and the code and the IDE are all running in your browser, the workspace is a simulation of a local work area on your own computer. So, the workspace is a hosted allocation of resources for you that’s private. Other people can’t see what’s going on in your workspace. Your workspace has a clone of the remote branch that you’re working with and the changes you make are isolated to your cloned code in your workspace. 06:38 Lois: Ok… the code is actually hosted on the server, so each time you make a change in the browser, the change is written back to the server? Is it possible that you might lose your edits if there’s a networking interruption? Joe: I want to emphasize that while I started out not personally being a fan of web-based integrated development environments, I have been using these tools for over three years and in all that time, while I have lost a connection at times—networks are still subject to interruptions—I’ve never lost any changes that I’ve made. Ever. 07:08 Nikita: Is there a way to save where you are in your work so that you could go back to it later if you need to? Joe: Yes, Niki, you’re asking about commits and savepoints, like in a Git repository or a Git branch. When you reach a logical stopping or development point in your work, you would create a commit or a savepoint. And when you’re ready, you would push that committed code in your workspace up to the remote branch where it can be reviewed and then eventually merged, usually with the main Git branch, and then continuous integration/continuous package and deployment build jobs are run. Now, I’m only giving you a high-level overview, but we cover all this and much more in detail with hands-on practices in our Visual Builder developer courses. Right now, I’m just trying to give you a sense of how these different tools are used. 07:49 Lois: Yes, that makes sense, Joe. It’s a lot to cover in a short amount of time. Now, we’ve discussed the Visual Builder Designer IDE and workspace. But can you tell us more about Visual Builder Cloud Service and stand-alone environments? What are they used for? What features do they provide? Are they the same or different things? Joe: Visual Builder Cloud Service or Visual Builder Stand-Alone, as it’s sometimes called, is a service that Oracle hosts on its servers. It provides hosting for the deployed web application source code as well as database tables for business objects that we build and maintain to store your customer data. This data can come from XLS or CSV files, or even your own Oracle database customer table data.  A custom REST proxy makes calls to external third-party REST services on your behalf and supports several popular authentication mechanisms. There is also integration with the Identity Cloud Service (IDCS) to manage users and their access to your web apps. 08:47 Joe: Visual Builder Cloud Service is a for-fee product. You pay licensing fees for how much you use because it’s a hosted service. Visual Builder Studio, the project asset management aspect I discussed earlier, is free with a standard OCI license. Now, keep in mind these are separate from something like Visual Builder Design Time and the service that’s running in Fusion application environments. What I’m talking about now is creating standalone, bespoke, custom visual applications. These are applications that are built using industry-standard HTML5, CSS3, JavaScript, and JSON for metadata and are hosted on the Oracle servers. 09:27 Are you looking for practical use cases to help you plan and apply configurations that solve real-world challenges?  With the new Applied Learning courses for Cloud Applications, you'll be able to practically apply the concepts learned in our implementation courses and work through case studies featuring key decisions and configurations encountered during a typical Oracle Cloud Applications implementation. Applied learning scenarios are currently available for General Ledger, Payables, Receivables, Accounting Hub, Global Human Resources, Talent Management, Inventory, and Procurement, with many more to come!  Visit mylearn.oracle.com to get started. 10:09 Nikita: Welcome back! Joe, you said Visual Builder Cloud Service or Stand-Alone is a for-fee service. Is there a way I can learn about using Visual Builder Designer to build bespoke visual applications without a fee? Joe: Yes. Actually, we’ve added an option where you can run the Visual Builder Designer and learn how to create web apps without using the app hosting or the business object database that stores your customer data or the REST proxy for authentication or the Identity Cloud Service. So you don’t get those features, but you can still learn the fundamentals of developing with Visual Builder Designer. You can call third-party APIs, you can download the source, and run it locally, for example, in a Tomcat server. This is a great and free way to learn how to develop with the Visual Builder Designer. 10:52 Lois: Joe, I want to know more about the kinds of apps you can build in VB Designer and the capabilities that VB Cloud Service provides. Joe: Visual Builder Designer allows you to build custom, bespoke web applications made of interactive webpages; flows of pages for navigation; events that respond when things happen in the app, for example, GUI events like a button is clicked or values are entered into a text field; variables to store state and the ability to make REST calls, all from your browser. These applications have full access to the Oracle Fusion Applications APIs, given that you have the right security permissions and credentials of course. They can access your customer business data as business objects in our internally hosted database tables or your own customer database tables. They can access third-party APIs, and all these different data sources can appear in the same visual application, on the same page, at the same time. They use the identity cloud service to identify which users can log in and authenticate against the application. And they all use the new Redwood graphical user interface components and page templates, so they have the same look and feel of all Oracle applications. 11:59 Nikita: But what if you’re building or extending Oracle Fusion Applications? Don’t things change a little bit? Joe: Good point, Niki. Yes. While you still work within Visual Builder Studio, that doesn’t change, VBS maintains your project and all your project-related assets, that is still the same. However, in this case, there is no separate hosted Visual Builder Cloud Service or Stand-Alone instance. In this case, Visual Builder is hosted inside of Fusion apps itself as part of the installation. I won’t go into the details of how the architecture works, but the Visual Builder instance that you’re running your code against is part of Fusion applications and is included in the architecture as well as the billing. All your code changes are maintained and stored within a single container called an extension. And this extension is a Git repository that is created for you, or you can create it yourself, depending on how you choose to work within Visual Builder Studio. You create an extension to hold the source code changes that provide a customization or configuration. This means making a change to an existing page or a set of pages or even adding new pages and flows to your Oracle Fusion Applications. You use Visual Builder Studio and Visual Builder Designer in a similar way as to how you would use them for bespoke stand-alone visual applications.  13:10 Lois: I’m trying to envision how this workflow is used. How is it different from bespoke VB app development? Or is it different at all? Joe: So, recall that the Visual Builder Designer is effectively the Integrated Development Environment, the IDE, where you make your code changes by working with both the raw HTML5, CSS3, and JavaScript code, if need be, or the Page Designer for drag and drop, and setting properties and then Live mode to test your work. You use a version of VB Designer to view and modify your customizations, and the code is stored in a Git repository called an extension. So, in that sense, the work of developing pages and flows and such is the same.  You still start by creating or, more typically, joining a project and then either create a new extension from scratch or base it on an existing application, or go directly to the page that you want to edit and, on that page, select from your profile menu to edit in Visual Builder Studio. Now, this is a different lifecycle path from bespoke visual applications. With them, you’re not extending an app or modifying individual pages in the same way. 14:11 Joe: You get a choice of which project you want to add your extension to when you’re working with Fusion apps and potentially which repository to store your customizations, unless one already exists and then it’s assigned automatically to hold your code changes. So you make your changes and edits to the portions of the application that have been opened for extensibility by the development team. This is another difference. Once you make your code changes, the workflow is pretty much the same as for a bespoke visual application: do your development work, commit your changes, push your changes to the remote branch. And then typically, your code is reviewed and if the code passes and is approved, it’s merged with the main branch. Then, the package and deploy jobs run to deploy the main code to the production environment or whatever environment you’re targeting. And once the package and deploy jobs complete, the code base is updated and users who log in see the changes that you’ve made. 15:00 Nikita: You mentioned creating apps that combine data from Fusion cloud, applications, customer data, and third-party APIs into one page. Why is it necessary? Why can’t you just do all that in one Fusion Applications extension? Joe: When you create extensions, you are working within the Oracle Fusion Applications ecosystem, that’s what they actually call it, which includes a defined a set of users who have been predefined and are, therefore, known to Fusion Applications. So, if you’re a user and you’re not part of that Fusion Apps ecosystem, you can’t access the pages. Period. That’s how Fusion Apps works to maintain its security and integrity. Secondly, you’re working pretty much solely with the Fusion Applications APIs data sources coming directly from Fusion Applications, which are also available to you when you’re creating bespoke visual apps. When you’re working with Fusion Applications in Visual Builder, you don’t have access to these business objects that give you access to your own customer database data through Visual Builder-generated REST APIs. Business objects are available only to bespoke visual applications in the hosted VB Cloud Service instance.  So, your data sources are restricted to the Oracle Fusion Applications APIs and some third-party APIs that work within a narrow set of authentication mechanisms currently, although there are plans to expand this in the future. A mashup app that allows you now to access all these data sources while creating apps that leverage the Redwood Component System, so they look and work like Fusion Apps. They’re a highly popular option for our partners and customers. 16:25 Lois: So, to review, we have two different approaches. You can create a visual application using the for-fee, hosted Visual Builder Cloud Service/Stand-Alone or the one that comes with Oracle Integration Cloud, or you can use the extension architecture for Fusion applications, where you use the designer and create your extensions, and the code is delivered and deployed to Fusion applications code.  You haven’t talked about JET yet though, Joe. What is that? Joe: So, JET is an abbreviation. It stands for Oracle JavaScript Extension Toolkit and JET is the underlying technology that makes Visual Builder, visual applications, and Visual Builder Extensions for Fusion Applications possible. Oracle JavaScript Extension Toolkit provides a module-based, open-source toolkit that leverages modern JavaScript, TypeScript, CSS3, and HTML5 to deliver web applications. It’s targeted at JavaScript developers working on client-side applications. It is not for backend development.  It’s a collection of popular, powerful JavaScript libraries and a set of Oracle-contributed JavaScript libraries that make it very simple, easy, and efficient to build front-end applications that can consume and interact with Oracle products and services, especially Oracle Cloud services, but of course it can work with any type of third-party API. 17:42 Nikita: How are JET applications architected, Joe, and how does that relate to Visual Builder pages and flows? Joe: The architecture of JET applications is what’s called a single page architecture. We’ve all seen these. These are where you have a single web page—think of your index page that provides the header and footer for your web page—and then the middle portion or the middle content of the page, represented by modules, allow you to navigate from one page or module to another. It also provides the data mapping so that the data elements in the variables and the state of the application, as well as the graphical user interface elements that provide the fields and functionality for the interface for the application, these are all maintained on the client side. If you’re working in pure JET, then you work with these modules at the raw JavaScript code level. And there are a lot of JavaScript developers who want to work like this and create their custom applications from the code up, so to speak. However, it also provides the basis for Visual Builder visual applications and Fusion Apps visual extensions in Visual Builder. 18:38 Lois: How does JET support VB Apps? You didn’t talk much about having to write a bunch of JavaScript and HTML5 so I got the impression that this is all done for you by VB Designer? Joe: Visual Builder applications are composed of HTML5, CSS3, and JavaScript code that is usually generated by the developer when she drags and drops components on to the page designer canvas or sets properties or creates action chains to respond to events. But there’s also a lot of JavaScript object notation (JSON) metadata created at the time that describes the pages, the flows, the navigation, the REST services, the variables, their data types, and other assets needed for the app to function. This JSON metadata is translated at runtime using a large JavaScript extension toolkit library called the Visual Builder Runtime that runs in the browser and real time translates the metadata and other assets in the Visual Builder source code into JET code and assets, which are actually executed at runtime. And it’s very quick, very fast, very efficient, and provides a layer of abstraction between the raw JET code and the Visual Builder architecture of pages, flows, action chains for executing code and events to handle things that occur in the user interface, including saving the state in variables that are mapped to GUI components. For example, if you have an Input text component, you need to have a variable to store the value that was entered into that Input text component between page refreshes. The data can move from the Input text component to the variable, and from the variable to that Input text component if it’s changed programmatically, for example. So, JET manages binding these data values to variables and the UI components on the page. So, a change to a variable value or a change to the contents of the component causes the others to change automatically. Now, this is only a small part of what JET and the frameworks and libraries it uses do for the applications.  JET also provides more complex GUI components like lists and tables, and selection lists, and check boxes, and all the sorts of things you would expect in a modern GUI application. 20:34 Nikita: You mentioned a layer of abstraction between Visual Builder Studio Designer and JET. What’s the benefit of working in Visual Builder Designer versus JET itself? Joe: The benefit of Visual Builder is that you work at a higher level of abstraction than having to get down into the more detailed levels of deep JavaScript code, working with modules, data mappings, HTML code, single page architecture navigation, and the related functionalities. You can work at a higher level, a graphical level, where you can drag and drop things onto a design canvas and set properties. The VB architecture insulates you from the more technical bits of JET. Now, this frees the developer to concentrate more on application and page design, implementing logic and business rules, and creating a pleasing workflow and look and feel for the user. This keeps them from having to get caught up in the details of getting this working at the code level.  Now if needed, you can write custom JavaScript, HTML5, and CSS3 code, though much less than in a JET app, and all that is part of the VB application source, which becomes part of the code used by JET to execute the application itself. And yet it all works seamlessly together. 21:35 Lois: Joe, I know we have courses in JavaScript, HTML, and CSS. But does a developer getting ready to work in Visual Builder Designer have to go take those courses first or can they start working in VB Designer right away? Joe: Yeah, that question does often comes up: Do I need to learn JET to work with Visual Builder? No, you don’t. That’s all taken care for you in the products themselves. I don’t really think it helps that much to learn JET if you are going to be a VB developer. In some ways, it could even be a bit distracting since some of things you learn to do in JET, you would have to unlearn or not do so much because of what VB does it for you. The things you would have to do manually in code in JET are done for you. This is why we call VB a low code development tool.  I mean, you certainly can if you want to, but I would spend more time learning about the different GUI components, page templates, the Visual Builder architecture — events, action chains, and the data provider variables and types. Now, I know JET myself. I started with that before learning Visual Builder, but I use very little of my JET knowledge as a VB developer. Visual Builder Designer provides a nice, abstracted, clean layer of modern visual development on top of JET, while leveraging the power and flexibility of JET and keeping the lower-level details out of my way. 22:46 Nikita: Joe, where can I go to get started with Visual Builder? Joe: Well, for more information, I recommend you take a look at our Develop Fusion Applications course if you’re working with Fusion Applications and Visual Builder Studio. The other course is Develop Visual Applications with Visual Builder Studio and that’s if you’re creating stand-alone bespoke applications. Both these courses are free. We also have a comprehensive course that covers JavaScript, HTML5, and CSS3, and while it’s not required that you take that to be successful, it can be helpful down the road. I would say that some basic knowledge of HTML5, CSS3, and JavaScript will certainly support you and serve you well when working with Visual Builder. You learn more as you go along and you find that you need to create more sophisticated applications. I would also mention that a lot of the look and feel of the applications in Visual Builder visual applications and Fusion apps extensions and customizations come through JET components, JET styles, and JET variables, and CSS variables, so that’s something that you would want to pursue at some point. There’s a JET cookbook out there. You can search for Oracle JET and look for the JET cookbook and that’s a good introduction to all of that. 23:47 Lois: Joe, thank you so much for joining us today. We’re really looking forward to having you back next week to discuss extending Oracle Fusion Applications with Visual Builder Studio. Joe: Thanks for having me. Nikita: And if you want to learn about some of the courses Joe mentioned, visit mylearn.oracle.com to get started. Until next time, this is Nikita Abraham… Lois: And Lois Houston signing off! 24:09 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
24m
12/03/2024

OCI AI Services

Listen to Lois Houston and Nikita Abraham, along with Senior Principal Product Manager Wes Prichard, as they explore the five core components of OCI AI services: language, speech, vision, document understanding, and anomaly detection, to help you make better sense of all that unstructured data around you. Oracle MyLearn: https://mylearn.oracle.com/ou/learning-path/become-an-oci-ai-foundations-associate-2023/127177 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Himanshu Raj, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Nikita: Welcome to the Oracle University Podcast! I’m Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! In our last episode, we spoke about OCI AI Portfolio, including AI and ML services, and the OCI AI infrastructure. Nikita: Yeah, and in today’s episode, we’re going to continue down a similar path and take a closer look at OCI AI services. 00:55 Lois: With us today is Senior Principal Product Manager, Wes Prichard. Hi Wes! It’s lovely to have you here with us. Hemant gave us a broad overview of the various OCI AI services last week, but we’re really hoping to get into each of them with you. So, let’s jump right in and start with the OCI Language service. What can you tell us about it? Wes: OCI Language analyzes unstructured text for you. It provides models trained on industry data to perform language analysis with no data science experience needed.  01:27 Nikita: What kind of big things can it do? Wes: It has five main capabilities. First, it detects the language of the text. It recognizes 75 languages, from Afrikaans to Welsh.  It identifies entities, things like names, places, dates, emails, currency, organizations, phone numbers--14 types in all. It identifies the sentiment of the text, and not just one sentiment for the entire block of text, but the different sentiments for different aspects.  01:56 Nikita: What do you mean by that, Wes? Wes: So let's say you read a restaurant review that said, the food was great, but the service sucked. You'll get food with a positive sentiment and service with a negative sentiment. And it also analyzes the sentiment for every sentence.  Lois: Ah, that’s smart. Ok, so we covered three capabilities. What else? Wes: It identifies key phrases in the text that represent the important ideas or subjects. And it classifies the general topic of the text from a list of 600 categories and subcategories.  02:27 Lois: Ok, and then there’s the OCI Speech service...  Wes: OCI Speech is very straightforward. It locks the data in audio tracks by converting speech to text. Developers can use Oracle's time-tested acoustic language models to provide highly accurate transcription for audio or video files across multiple languages.  OCI Speech automatically transcribes audio and video files into text using advanced deep learning techniques. There's no data science experience required. It processes data directly in object storage. And it generates timestamped, grammatically accurate transcriptions.  03:01 Nikita: What are some of the main features of OCI Speech? Wes: OCI Speech supports multiple languages, specifically English, Spanish, and Portuguese, with more coming in the future. It has batching support where multiple files can be submitted with a single call. It has blazing fast processing. It can transcribe hours of audio in less than 10 minutes. It does this by chunking up your audio into smaller segments, and transcribing each segment, and then joining them all back together into a single file. It provides a confidence score, both per word and per transcription. It punctuates transcriptions to make the text more readable and to allow downstream systems to process the text with less friction.  And it has SRT file support.  03:45 Lois: SRT? What’s that? Wes: SRT is the most popular closed caption output file format. And with this SRT support, users can add closed captions to their video. OCI Speech makes transcribed text more readable to resemble how humans write. This is called normalization. And the service will normalize things like addresses, times, numbers, URLs, and more.  It also does profanity filtering, where it can either remove, mask, or tag profanity and output text, where removing replaces the word with asterisks, and masking does the same thing, but it retains the first letter, and tagging will leave the word in place, but it provides tagging in the output data.  04:29 Nikita: And what about OCI Vision? What are its capabilities? Wes: Vision is a computed vision service that works on images, and it provides two main capabilities-- image analysis and document AI. Image analysis analyzes photographic images. Object detection is the feature that detects objects inside an image using a bounding box and assigning a label to each object with an accuracy percentage. Object detection also locates and extracts text that appears in the scene, like on a sign.  Image classification will assign classification labels to the image by identifying the major features in the scene. One of the most powerful capabilities of image analysis is that, in addition to pretrained models, users can retrain the models with their own unique data to fit their specific needs.  05:20 Lois: So object detection and image classification are features of image analysis. I think I got it! So then what’s document AI?  Wes: It's used for working with document images. You can use it to understand PDFs or document image types, like JPEG, PNG, and Tiff, or photographs containing textual information.  05:40 Lois: And what are its most important features? Wes: The features of document AI are text recognition, also known as OCR or optical character recognition.  And this extracts text from images, including non-trivial scenarios, like handwritten texts, plus tilted, shaded, or rotated documents. Document classification classifies documents into 10 different types based on visual appearance, high-level features, and extracted keywords. This is useful when you need to process a document, based on its classification, like an invoice, a receipt, or a resume.  Language detection analyzes the visual features of text to determine the language rather than relying on the text itself. Table extraction identifies tables in docs and extracts their content in tabular form. Key value extraction finds values for 13 common fields and line items in receipts, things like merchant name and transaction date.  06:41 Want to get the inside scoop on Oracle University? Head over to the Oracle University Learning Community. Attend exclusive events. Read up on the latest news. Get first-hand access to new products. Read the OU Learning Blog. Participate in Challenges. And stay up-to-date with upcoming certification opportunities. Visit mylearn.oracle.com to get started.  07:06 Nikita: Welcome back! Wes, I want to ask you about OCI Anomaly Detection. We discussed it a bit last week and it seems like such an intelligent and efficient service. Wes: Oracle Cloud Infrastructure Anomaly Detection identifies anomalies in time series data. Equipment sensors generate time series data, but all kinds of business metrics are also time-based. The unique feature of this service is that it finds anomalies, not just in a single signal, but across many signals at once. That's important because machines often generate multiple signals at once and the signals are often related.  07:42 Nikita: Ok you need to give us an example of this! Wes: Think of a pump that has an output pressure, a flow rate, an RPM, and an electrical current draw. When a pump's going to fail, anomalies may appear across several of those signals but at different times. OCI Anomaly Detection helps you to identify anomalies in a multivariate data set by taking advantage of the interrelationship among signals.  The service contains algorithms for both multi-signal, as in multivariate, single signal, as in univariate anomaly detection, and it automatically determines which algorithm to use based on the training data provided. The multivariate algorithm is called MSET-2, which stands for Multivariate State Estimation technique, and it's unique to Oracle.  08:28 Lois: And the 2? Wes: The 2 in the name refers to the patented enhancements by Oracle labs that automatically identify and fix data quality issues resulting in fewer false alarms and more accurate results.  Now unlike some of the other AI services, OCI Anomaly Detection is always trained on the customer's data. It's trained using actual historical data with no anomalies, and there can be as many different trained models as needed for different sets of signals.  08:57 Nikita: So where would one use a service like this? Wes: One of the most obvious applications of this service is for predictive maintenance. Early warning of a problem provides the opportunity to deploy maintenance resources and schedule downtime to minimize disruption to the business.  09:12 Lois: How would you train an OCI Anomaly Detection model? Wes: It's a simple four-step process to prepare a model that can be used for anomaly detection. The first step is to obtain training data from the system to be monitored. The data must contain no anomalies and should cover the normal range of values that would be experienced in a full business cycle.  Second, the training data file is uploaded to an object storage bucket.  Third, a data set is created for the training data. So a data set in this context is an object in the OCI Anomaly Detection service to manage data used for training and testing models.  And fourth, the model is trained. A wizard in the user interface steps the user through the required inputs, such as the training data set and some training parameters like the target false alarm probability.  10:02 Lois: How would this service know about the data and whether the trained model is univariate or multivariate? Wes: When training OCI Anomaly Detection models, the user does not need to specify whether the intended model is for multivariate or univariate data. It does this detection automatically.  For example, if a model is trained with 10 signals and 5 of those signals are determined to be correlated enough for multivariate anomaly detection, it will create an internal multivariate model for those signals. If the other five signals are not correlated with each other, it will create an internal univariate model for each one.  From the user's perspective, the result will be a single OCI anomaly detection model for the 10 signals. But internally, the signals are treated differently based on the training. A user can also train a model on a single signal and it will result in a univariate model.  10:55 Lois: What does this OCI Anomaly Detection model training entail? How does it ensure that it does not have any false alarms? Wes: Training a model requires a single data file with no anomalies that should cover a complete business cycle, which means it should represent all the normal variations in the signal. During training, OCI Anomaly Detection will use a portion of the data for training and another portion for automated testing. The fraction used for each is specified when the model is trained.  When model training is complete, it's best practice to do another test of the model with a data set containing anomalies to see if the anomalies are detected and if there are any false alarms. Based on the outcome, the user may want to retrain the model and specify a different false alarm probability, also called F-A-P or FAP. The FAP is the probability that the model would produce a false alarm. The false alarm probability can be thought of as the sensitivity of the model. The lower the false alarm probability, the less likelihood of it reporting a false alarm, but the less sensitive it will be to detecting anomalies. Selecting the right FAP is a business decision based on the need for sensitive detections balanced by the ability to tolerate false alarms.  Once a model has been trained and the user is satisfied with its detection performance, it can then be used for inferencing.  12:23 Nikita: Inferencing? Is that what I think it is?  Wes: New data is submitted to the model and OCI Anomaly Detection will respond with anomalies that are detected. The input data must contain the same signals that the model was trained on. So, for example, if the model was trained on signals A, B, C, and D, then for detection inferencing, the same four signals must be provided. No more, no less. 12:46 Lois: Where can I find the features of OCI Anomaly Detection that you mentioned?  Wes: The training and inferencing features of OCI Anomaly Detection can be accessed through the OCI console. However, a human-driven interface is not efficient for most business scenarios.  In most cases, automating the detection of anomalies through software is preferred to be able to process hundreds or thousands of signals using many trained models. The service provides multiple software interfaces for this purpose.  Each trained model is accessible through a REST API and an HTTP endpoint. Additionally, programming language-specific SDKs are available for multiple languages, including Python. Using the Python SDK, data scientists can work with OCI Anomaly Detection for both training and inferencing in an OCI Data Science notebook.  13:37 Nikita: How can a data scientist take advantage of these capabilities?  Wes: Well, you can write code against the REST API or use any of the various language SDKs. But for data scientists working in OCI Data Science, it makes sense to use Python.  13:51 Lois: That’s exciting! What does it take to use the Python SDK in a notebook… to be able to use the AI services? Wes: You can use a Notebook session in OCI Data Science to invoke the SDK for any of the AI services.  This might be useful to generate new features for a custom model or simply as a way to consume the service using a familiar Python interface. But before you can invoke the SDK, you have to prepare the data science notebook session by supplying it with an API Signing Key.  Signing Key is unique to a particular user and tenancy and authenticates that user to OCI when invoking the SDK. So therefore, you want to make sure you safeguard your Signing Key and never share it with another user.  14:34 Nikita: And where would I get my API Signing Key? Wes: You can obtain an API Signing Key from your user profile in the OCI Console. Then you save that key as a file to your local machine.  The API Signing Key also provides commands to be added to a config file that the SDK expects to find in the environment, where the SDK code is executing. The config file then references the key file. Once these files are prepared on your local machine, you can upload them to the Notebook session, where you will execute SDK code for the AI service.  The API Signing Key and config file can be reused with any of your notebook sessions, and the same files also work for all of the AI services. So, the files only need to be created once for each user and tenancy combination.  15:27 Lois: Thank you so much, Wes, for this really insightful discussion. To learn more about the topics covered today, you can visit mylearn.oracle.com and search for the Oracle Cloud Infrastructure AI Foundations course. Nikita: And remember, that course prepares you for the Oracle Cloud Infrastructure AI Foundations Associate certification that you can take for free! So, don’t wait too long to check it out. Join us next week for another episode of the Oracle University Podcast. Until then, this is Nikita Abraham… Lois Houston: And Lois Houston, signing off! 16:03 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
16m
05/03/2024

The OCI AI Portfolio

Oracle has been actively focusing on bringing AI to the enterprise at every layer of its tech stack, be it SaaS apps, AI services, infrastructure, or data. In this episode, hosts Lois Houston and Nikita Abraham, along with senior instructors Hemant Gahankari and Himanshu Raj, discuss OCI AI and Machine Learning services. They also go over some key OCI Data Science concepts and responsible AI principles. Oracle MyLearn: https://mylearn.oracle.com/ou/learning-path/become-an-oci-ai-foundations-associate-2023/127177 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Himanshu Raj, and the OU Studio Team for helping us create this episode. ------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Lois: Welcome to the Oracle University Podcast! I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hey everyone! In our last episode, we dove into Generative AI and Language Learning Models.  Lois: Yeah, that was an interesting one. But today, we’re going to discuss the AI and machine learning services offered by Oracle Cloud Infrastructure, and we’ll look at the OCI AI infrastructure. Nikita: I’m also going to try and squeeze in a couple of questions on a topic I’m really keen about, which is responsible AI. To take us through all of this, we have two of our colleagues, Hemant Gahankari and Himanshu Raj. Hemant is a Senior Principal OCI Instructor and Himanshu is a Senior Instructor on AI/ML. So, let’s get started! 01:16 Lois: Hi Hemant! We’re so excited to have you here! We know that Oracle has really been focusing on bringing AI to the enterprise at every layer of our stack.  Hemant: It all begins with data and infrastructure layers. OCI AI services consume data, and AI services, in turn, are consumed by applications.  This approach involves extensive investment from infrastructure to SaaS applications. Generative AI and massive scale models are the more recent steps. Oracle AI is the portfolio of cloud services for helping organizations use the data they may have for the business-specific uses.  Business applications consume AI and ML services. The foundation of AI services and ML services is data. AI services contain pre-built models for specific uses. Some of the AI services are pre-trained, and some can be additionally trained by the customer with their own data.  AI services can be consumed by calling the API for the service, passing in the data to be processed, and the service returns a result. There is no infrastructure to be managed for using AI services.  02:37 Nikita: How do I access OCI AI services? Hemant: OCI AI services provide multiple methods for access. The most common method is the OCI Console. The OCI Console provides an easy to use, browser-based interface that enables access to notebook sessions and all the features of all the data science, as well as AI services.  The REST API provides access to service functionality but requires programming expertise. And API reference is provided in the product documentation. OCI also provides programming language SDKs for Java, Python, TypeScript, JavaScript, .Net, Go, and Ruby. The command line interface provides both quick access and full functionality without the need for scripting.  03:31 Lois: Hemant, what are the types of OCI AI services that are available?  Hemant: OCI AI services is a collection of services with pre-built machine learning models that make it easier for developers to build a variety of business applications. The models can also be custom trained for more accurate business results. The different services provided are digital assistant, language, vision, speech, document understanding, anomaly detection.  04:03 Lois: I know we’re going to talk about them in more detail in the next episode, but can you introduce us to OCI Language, Vision, and Speech? Hemant: OCI Language allows you to perform sophisticated text analysis at scale. Using the pre-trained and custom models, you can process unstructured text to extract insights without data science expertise. Pre-trained models include language detection, sentiment analysis, key phrase extraction, text classification, named entity recognition, and personal identifiable information detection.  Custom models can be trained for named entity recognition and text classification with domain-specific data sets. In text translation, natural machine translation is used to translate text across numerous languages.  Using OCI Vision, you can upload images to detect and classify objects in them. Pre-trained models and custom models are supported. In image analysis, pre-trained models perform object detection, image classification, and optical character recognition. In image analysis, custom models can perform custom object detection by detecting the location of custom objects in an image and providing a bounding box.  The OCI Speech service is used to convert media files to readable texts that's stored in JSON and SRT format. Speech enables you to easily convert media files containing human speech into highly exact text transcriptions.  05:52 Nikita: That’s great. And what about document understanding and anomaly detection? Hemant: Using OCI document understanding, you can upload documents to detect and classify text and objects in them. You can process individual files or batches of documents. In OCR, document understanding can detect and recognize text in a document. In text extraction, document understanding provides the word level and line level text, and the bounding box, coordinates of where the text is found.  In key value extraction, document understanding extracts a predefined list of key value pairs of information from receipts, invoices, passports, and driver IDs. In table extraction, document understanding extracts content in tabular format, maintaining the row and column relationship of cells. In document classification, the document understanding classifies documents into different types.  The OCI Anomaly Detection service is a service that analyzes large volume of multivariate or univariate time series data. The Anomaly Detection service increases the reliability of businesses by monitoring their critical assets and detecting anomalies early with high precision. Anomaly Detection is the identification of rare items, events, or observations in data that differ significantly from the expectation.  07:34 Nikita: Where is Anomaly Detection most useful? Hemant: The Anomaly Detection service is designed to help with analyzing large amounts of data and identifying the anomalies at the earliest possible time with maximum accuracy. Different sectors, such as utility, oil and gas, transportation, manufacturing, telecommunications, banking, and insurance use Anomaly Detection service for their day-to-day activities.  08:02 Lois: Ok.. and the first OCI AI service you mentioned was digital assistant… Hemant: Oracle Digital Assistant is a platform that allows you to create and deploy digital assistants, which are AI driven interfaces that help users accomplish a variety of tasks with natural language conversations. When a user engages with the Digital Assistant, the Digital Assistant evaluates the user input and routes the conversation to and from the appropriate skills.  Digital Assistant greets the user upon access. Upon user requests, list what it can do and provide entry points into the given skills. It routes explicit user requests to the appropriate skills. And it also handles interruptions to flows and disambiguation. It also handles requests to exit the bot.  09:00 Nikita: Excellent! Let’s bring Himanshu in to tell us about machine learning services. Hi Himanshu! Let’s talk about OCI Data Science. Can you tell us a bit about it? Himanshu: OCI Data Science is the cloud service focused on serving the data scientist throughout the full machine learning life cycle with support for Python and open source.  The service has many features, such as model catalog, projects, JupyterLab notebook, model deployment, model training, management, model explanation, open source libraries, and AutoML.  09:35 Lois: Himanshu, what are the core principles of OCI Data Science?  Himanshu: There are three core principles of OCI Data Science. The first one, accelerated. The first principle is about accelerating the work of the individual data scientist. OCI Data Science provides data scientists with open source libraries along with easy access to a range of compute power without having to manage any infrastructure. It also includes Oracle's own library to help streamline many aspects of their work.  The second principle is collaborative. It goes beyond an individual data scientist’s productivity to enable data science teams to work together. This is done through the sharing of assets, reducing duplicative work, and putting reproducibility and auditability of models for collaboration and risk management.  Third is enterprise grade. That means it's integrated with all the OCI Security and access protocols. The underlying infrastructure is fully managed. The customer does not have to think about provisioning compute and storage. And the service handles all the maintenance, patching, and upgrades so user can focus on solving business problems with data science.  10:50 Nikita: Let’s drill down into the specifics of OCI Data Science. So far, we know it’s cloud service to rapidly build, train, deploy, and manage machine learning models. But who can use it? Where is it? And how is it used? Himanshu: It serves data scientists and data science teams throughout the full machine learning life cycle.  Users work in a familiar JupyterLab notebook interface, where they write Python code. And how it is used? So users preserve their models in the model catalog and deploy their models to a managed infrastructure.  11:25 Lois: Walk us through some of the key terminology that’s used. Himanshu: Some of the important product terminology of OCI Data Science are projects. The projects are containers that enable data science teams to organize their work. They represent collaborative work spaces for organizing and documenting data science assets, such as notebook sessions and models.  Note that tenancy can have as many projects as needed without limits. Now, this notebook session is where the data scientists work. Notebook sessions provide a JupyterLab environment with pre-installed open source libraries and the ability to add others. Notebook sessions are interactive coding environment for building and training models.  Notebook sessions run in a managed infrastructure and the user can select CPU or GPU, the compute shape, and amount of storage without having to do any manual provisioning. The other important feature is Conda environment. It's an open source environment and package management system and was created for Python programs.  12:33 Nikita: What is a Conda environment used for? Himanshu: It is used in the service to quickly install, run, and update packages and their dependencies. Conda easily creates, saves, loads, and switches between environments in your notebooks sessions. 12:46 Nikita: Earlier, you spoke about the support for Python in OCI Data Science. Is there a dedicated library? Himanshu: Oracle's Accelerated Data Science ADS SDK is a Python library that is included as part of OCI Data Science.  ADS has many functions and objects that automate or simplify the steps in the data science workflow, including connecting to data, exploring, and visualizing data. Training a model with AutoML, evaluating models, and explaining models. In addition, ADS provides a simple interface to access the data science service mode model catalog and other OCI services, including object storage.  13:24 Lois: I also hear a lot about models. What are models? Himanshu: Models define a mathematical representation of your data and business process. You create models in notebooks, sessions, inside projects.  13:36 Lois: What are some other important terminologies related to models? Himanshu: The next terminology is model catalog. The model catalog is a place to store, track, share, and manage models.  The model catalog is a centralized and managed repository of model artifacts. A stored model includes metadata about the provenance of the model, including Git-related information and the script. Our notebook used to push the model to the catalog. Models stored in the model catalog can be shared across members of a team, and they can be loaded back into a notebook session.  The next one is model deployments. Model deployments allow you to deploy models stored in the model catalog as HTTP endpoints on managed infrastructure.  14:24 Lois: So, how do you operationalize these models? Himanshu: Deploying machine learning models as web applications, HTTP API endpoints, serving predictions in real time is the most common way to operationalize models. HTTP endpoints or the API endpoints are flexible and can serve requests for the model predictions. Data science jobs enable you to define and run a repeatable machine learning tasks on fully managed infrastructure.  Nikita: Thanks for that, Himanshu.  14:57 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You’ll find training on everything from cloud computing, database, and security, artificial intelligence, and machine learning, all free to subscribers. So, what are you waiting for? Pick a topic, leverage the Oracle University Learning Community to ask questions, and then sit for your certification. Visit mylearn.oracle.com to get started.  15:25 Nikita: Welcome back! The Oracle AI Stack consists of AI services and machine learning services, and these services are built using AI infrastructure. So, let’s move on to that. Hemant, what are the components of OCI AI Infrastructure? Hemant: OCI AI Infrastructure is mainly composed of GPU-based instances. Instances can be virtual machines or bare metal machines. High performance cluster networking that allows instances to communicate to each other. Super clusters are a massive network of GPU instances with multiple petabytes per second of bandwidth. And a variety of fully managed storage options from a single byte to exabytes without upfront provisioning are also available.  16:14 Lois: Can we explore each of these components a little more? First, tell us, why do we need GPUs? Hemant: ML and AI needs lots of repetitive computations to be made on huge amounts of data. Parallel computing on GPUs is designed for many processes at the same time. A GPU is a piece of hardware that is incredibly good in performing computations.  GPU has thousands of lightweight cores, all working on their share of data in parallel. This gives them the ability to crunch through extremely large data set at tremendous speed.  16:54 Nikita: And what are the GPU instances offered by OCI? Hemant: GPU instances are ideally suited for model training and inference. Bare metal and virtual machine compute instances powered by NVIDIA GPUs H100, A100, A10, and V100 are made available by OCI.  17:14 Nikita: So how do we choose what to train from these different GPU options?  Hemant: For large scale AI training, data analytics, and high performance computing, bare metal instances BM 8 X NVIDIA H100 and BM 8 X NVIDIA A100 can be used.  These provide up to nine times faster AI training and 30 times higher acceleration for AI inferencing. The other bare metal and virtual machines are used for small AI training, inference, streaming, gaming, and virtual desktop infrastructure.  17:53 Lois: And why would someone choose the OCI AI stack over its counterparts? Hemant: Oracle offers all the features and is the most cost effective option when compared to its counterparts.  For example, BM GPU 4.8 version 2 instance costs just $4 per hour and is used by many customers.  Superclusters are a massive network with multiple petabytes per second of bandwidth. It can scale up to 4,096 OCI bare metal instances with 32,768 GPUs.  We also have a choice of bare metal A100 or H100 GPU instances, and we can select a variety of storage options, like object store, or block store, or even file system. For networking speeds, we can reach 1,600 GB per second with A100 GPUs and 3,200 GB per second with H100 GPUs.  With OCI storage, we can select local SSD up to four NVMe drives, block storage up to 32 terabytes per volume, object storage up to 10 terabytes per object, file systems up to eight exabyte per file system. OCI File system employs five replicated storage located in different fault domains to provide redundancy for resilient data protection.  HPC file systems, such as BeeGFS and many others are also offered. OCI HPC file systems are available on Oracle Cloud Marketplace and make it easy to deploy a variety of high performance file servers.  19:50 Lois: I think a discussion on AI would be incomplete if we don’t talk about responsible AI. We’re using AI more and more every day, but can we actually trust it? Hemant: For us to trust AI, it must be driven by ethics that guide us as well. Nikita: And do we have some principles that guide the use of AI? Hemant: AI should be lawful, complying with all applicable laws and regulations. AI should be ethical, that is it should ensure adherence to ethical principles and values that we uphold as humans. And AI should be robust, both from a technical and social perspective. Because even with the good intentions, AI systems can cause unintentional harm. AI systems do not operate in a lawless world. A number of legally binding rules at national and international level apply or are relevant to the development, deployment, and use of AI systems today. The law not only prohibits certain actions but also enables others, like protecting rights of minorities or protecting environment. Besides horizontally applicable rules, various domain-specific rules exist that apply to particular AI applications. For instance, the medical device regulation in the health care sector.  In AI context, equality entails that the systems’ operations cannot generate unfairly biased outputs. And while we adopt AI, citizens right should also be protected.  21:30 Lois: Ok, but how do we derive AI ethics from these? Hemant: There are three main principles.  AI should be used to help humans and allow for oversight. It should never cause physical or social harm. Decisions taken by AI should be transparent and fair, and also should be explainable. AI that follows the AI ethical principles is responsible AI.  So if we map the AI ethical principles to responsible AI requirements, these will be like, AI systems should follow human-centric design principles and leave meaningful opportunity for human choice. This means securing human oversight. AI systems and environments in which they operate must be safe and secure, they must be technically robust, and should not be open to malicious use.  The development, and deployment, and use of AI systems must be fair, ensuring equal and just distribution of both benefits and costs. AI should be free from unfair bias and discrimination. Decisions taken by AI to the extent possible should be explainable to those directly and indirectly affected.  23:01 Nikita: This is all great, but what does a typical responsible AI implementation process look like?  Hemant: First, a governance needs to be put in place. Second, develop a set of policies and procedures to be followed. And once implemented, ensure compliance by regular monitoring and evaluation.  Lois: And this is all managed by developers? Hemant: Typical roles that are involved in the implementation cycles are developers, deployers, and end users of the AI.  23:35 Nikita: Can we talk about AI specifically in health care? How do we ensure that there is fairness and no bias? Hemant: AI systems are only as good as the data that they are trained on. If that data is predominantly from one gender or racial group, the AI systems might not perform as well on data from other groups.  24:00 Lois: Yeah, and there’s also the issue of ensuring transparency, right? Hemant: AI systems often make decisions based on complex algorithms that are difficult for humans to understand. As a result, patients and health care providers can have difficulty trusting the decisions made by the AI. AI systems must be regularly evaluated to ensure that they are performing as intended and not causing harm to patients.  24:29 Nikita: Thank you, Hemant and Himanshu, for this really insightful session. If you’re interested in learning more about the topics we discussed today, head on over to mylearn.oracle.com and search for the Oracle Cloud Infrastructure AI Foundations course.  Lois: That’s right, Niki. You’ll find demos that you watch as well as skill checks that you can attempt to better your understanding. In our next episode, we’ll get into the OCI AI Services we discussed today and talk about them in more detail. Until then, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 25:05 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
25m
27/02/2024

Generative AI and Large Language Models

In this week’s episode, Lois Houston and Nikita Abraham, along with Senior Instructor Himanshu Raj, take you through the extraordinary capabilities of Generative AI, a subset of deep learning that doesn’t make predictions but rather creates its own content. They also explore the workings of Large Language Models. Oracle MyLearn: https://mylearn.oracle.com/ou/learning-path/become-an-oci-ai-foundations-associate-2023/127177 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal  Technical Editor.  Nikita: Hi everyone! In our last episode, we went over the basics of deep learning. Today, we’ll look at generative AI and large language models, and discuss how they work. To help us with that, we have Himanshu Raj, Senior Instructor on AI/ML. So, let’s jump right in. Hi Himanshu, what is generative AI?  01:00 Himanshu: Generative AI refers to a type of AI that can create new content. It is a subset of deep learning, where the models are trained not to make predictions but rather to generate output on their own.  Think of generative AI as an artist who looks at a lot of paintings and learns the patterns and styles present in them. Once it has learned these patterns, it can generate new paintings that resembles what it learned. 01:27 Lois: Let's take an example to understand this better. Suppose we want to train a generative AI model to draw a dog. How would we achieve this? Himanshu: You would start by giving it a lot of pictures of dogs to learn from. The AI does not know anything about what a dog looks like. But by looking at these pictures, it starts to figure out common patterns and features, like dogs often have pointy ears, narrow faces, whiskers, etc. You can then ask it to draw a new picture of a dog.  The AI will use the patterns it learned to generate a picture that hopefully looks like a dog. But remember, the AI is not copying any of the pictures it has seen before but creating a new image based on the patterns it has learned. This is the basic idea behind generative AI. In practice, the process involves a lot of complex maths and computation, and there are different techniques and architectures that can be used, such as variational autoencoders (VAs) and Generative Adversarial Networks (GANs).  02:27 Nikita: Himanshu, where is generative AI used in the real world? Himanshu: Generative AI models have a wide variety of applications across numerous domains. For the image generation, generative models like GANs are used to generate realistic images. They can be used for tasks, like creating artwork, synthesizing images of human faces, or transforming sketches into photorealistic images.  For text generation, large language models like GPT 3, which are generative in nature, can create human-like text. This has applications in content creation, like writing articles, generating ideas, and again, conversational AI, like chat bots, customer service agents. They are also used in programming for code generation and debugging, and much more.  For music generation, generative AI models can also be used. They create new pieces of music after being trained on a specific style or collection of tunes. A famous example is OpenAI's MuseNet. 03:21 Lois: You mentioned large language models in the context of text-based generative AI. So, let’s talk a little more about it. Himanshu, what exactly are large language models? Himanshu: LLMs are a type of artificial intelligence models built to understand, generate, and process human language at a massive scale. They were primarily designed for sequence to sequence tasks such as machine translation, where an input sequence is transformed into an output sequence.  LLMs can be used to translate text from one language to another. For example, an LLM could be used to translate English text into French. To do this job, LLM is trained on a massive data set of text and code which allows it to learn the patterns and relationships that exist between different languages. The LLM translates, “How are you?” from English to French, “Comment allez-vous?”  It can also answer questions like, what is the capital of France? And it would answer the capital of France is Paris. And it will write an essay on a given topic. For example, write an essay on French Revolution, and it will come up with a response like with a title and introduction. 04:33 Lois: And how do LLMs actually work? Himanshu: So, LLM models are typically based on deep learning architectures such as transformers. They are also trained on vast amount of text data to learn language patterns and relationships, again, with a massive number of parameters usually in order of millions or even billions. LLMs have also the ability to comprehend and understand natural language text at a semantic level. They can grasp context, infer meaning, and identify relationships between words and phrases.  05:05 Nikita: What are the most important factors for a large language model? Himanshu: Model size and parameters are crucial aspects of large language models and other deep learning models. They significantly impact the model’s capabilities, performance, and resource requirement. So, what is model size? The model size refers to the amount of memory required to store the model's parameter and other data structures. Larger model sizes generally led to better performance as they can capture more complex patterns and representation from the data.  The parameters are the numerical values of the model that change as it learns to minimize the model's error on the given task. In the context of LLMs, parameters refer to the weights and biases of the model's transformer layers. Parameters are usually measured in terms of millions or billions. For example, GPT-3, one of the largest LLMs to date, has 175 billion parameters making it extremely powerful in language understanding and generation.  Tokens represent the individual units into which a piece of text is divided during the processing by the model. In natural language, tokens are usually words, subwords, or characters. Some models have a maximum token limit that they can process and longer text can may require truncation or splitting. Again, balancing model size, parameters, and token handling is crucial when working with LLMs.  06:29 Nikita: But what’s so great about LLMs? Himanshu: Large language models can understand and interpret human language more accurately and contextually. They can comprehend complex sentence structures, nuances, and word meanings, enabling them to provide more accurate and relevant responses to user queries. This model can generate human-like text that is coherent and contextually appropriate. This capability is valuable for context creation, automated writing, and generating personalized response in applications like chatbots and virtual assistants. They can perform a variety of tasks.  Large language models are very versatile and adaptable to various industries. They can be customized to excel in applications such as language translation, sentiment analysis, code generation, and much more. LLMs can handle multiple languages making them valuable for cross-lingual tasks like translation, sentiment analysis, and understanding diverse global content.  Large language models can be again, fine-tuned for a specific task using a minimal amount of domain data. The efficiency of LLMs usually grows with more data and parameters. 07:34 Lois: You mentioned the “sequence to sequence tasks” earlier. Can you explain the concept in simple terms for us? Himanshu: Understanding language is difficult for computers and AI systems. The reason being that words often have meanings based on context. Consider a sentence such as Jane threw the frisbee, and her dog fetched it.  In this sentence, there are a few things that relate to each other. Jane is doing the throwing. The dog is doing the fetching. And it refers to the frisbee. Suppose we are looking at the word “it” in the sentence. As a human, we understand easily that “it” refers to the frisbee. But for a machine, it can be tricky. The goal in sequence problems is to find patterns, dependencies, or relationships within the data and make predictions, classification, or generate new sequences based on that understanding. 08:27 Lois: And where are sequence models mostly used? Himanshu: Some common example of sequence models includes natural language processing, which we call NLP, tasks such as machine translation, text generation sentiment analysis, language modeling involve dealing with sequences of words or characters.  Speech recognition. Converting audio signals into text, involves working with sequences of phonemes or subword units to recognize spoken words. Music generation. Generating new music involves modeling musical sequences, nodes, and rhythms to create original compositions.  Gesture recognition. Sequences of motion or hand gestures are used to interpret human movements for applications, such as sign language recognition or gesture-based interfaces. Time series analysis. In fields such as finance, economics, weather forecasting, and signal processing, time series data is used to predict future values, detect anomalies, and understand patterns in temporal data. 09:35 The Oracle University Learning Community is an excellent place to collaborate and learn with Oracle experts and fellow learners. Grow your skills, inspire innovation, and celebrate your successes. All your activities, from liking a post to answering questions and sharing with others, will help you earn a valuable reputation, badges, and ranks to be recognized in the community. Visit mylearn.oracle.com to get started.  10:03 Nikita: Welcome back! Himanshu, what would be the best way to solve those sequence problems you mentioned? Let’s use the same sentence, “Jane threw the frisbee, and her dog fetched it” as an example. Himanshu: The solution is transformers. It's like model has a bird's eye view of the entire sentence and can see how all the words relate to each other. This allows it to understand the sentence as a whole instead of just a series of individual words. Transformers with their self-attention mechanism can look at all the words in the sentence at the same time and understand how they relate to each other.  For example, transformer can simultaneously understand the connections between Jane and dog even though they are far apart in the sentence. 10:52 Nikita: But how? Himanshu: The answer is attention, which adds context to the text. Attention would notice dog comes after frisbee, fetched comes after dog, and it comes after fetched.  Transformer does not look at it in isolation. Instead, it also pays attention to all the other words in the sentence at the same time. But considering all these connections, the model can figure out that “it” likely refers to the frisbee.  The most famous current models that are emerging in natural language processing tasks consist of dozens of transformers or some of their variants, for example, GPT or Bert. 11:32 Lois: I was looking at the AI Foundations course on MyLearn and came across the terms “prompt engineering” and “fine tuning.” Can you shed some light on them? Himanshu: A prompt is the input or initial text provided to the model to elicit a specific response or behavior. So, this is something which you write or ask to a language model. Now, what is prompt engineering? So prompt engineering is the process of designing and formulating specific instructions or queries to interact with a large language model effectively.  In the context of large language models, such as GPT 3 or Burt, prompts are the input text or questions given to the model to generate responses or perform specific tasks.  The goal of prompt engineering is to ensure that the language model understands the user's intent correctly and provide accurate and relevant responses. 12:26 Nikita: That sounds easy enough, but fine tuning seems a bit more complex. Can you explain it with an example? Himanshu: Imagine you have a versatile recipe robot named chef bot. Suppose that chef bot is designed to create delicious recipes for any dish you desire.  Chef bot recognizes the prompt as a request for a pizza recipe, and it knows exactly what to do. However, if you want chef bot to be an expert in a particular type of cuisine, such as Italian dishes, you fine-tune chef bot for Italian cuisine by immersing it in a culinary crash course filled with Italian cookbooks, traditional Italian recipes, and even Italian cooking shows.  During this process, chef bot becomes more specialized in creating authentic Italian recipes, and this option is called fine tuning. LLMs are general purpose models that are pre-trained on large data sets but are often fine-tuned to address specific use cases.  When you combine prompt engineering and fine tuning, and you get a culinary wizard in chef bot, a recipe robot that is not only great at understanding specific dish requests but also capable of following a specific dish requests and even mastering the art of cooking in a particular culinary style. 13:47 Lois: Great! Now that we’ve spoken about all the major components, can you walk us through the life cycle of a large language model? Himanshu: The life cycle of a Large Language Model, LLM, involves several stages, from its initial pre-training to its deployment and ongoing refinement.  The first of this lifecycle is pre-training. The LLM is initially pre-trained on a large corpus of text data from the internet. During pre-training, the model learns grammar, facts, reasoning abilities, and general language understanding. The model predicts the next word in a sentence given the previous words, which helps it capture relationships between words and the structure of language.  The second phase is fine tuning initialization. After pre-training, the model's weights are initialized, and it's ready for task-specific fine tuning. Fine tuning can involve supervised learning on labeled data for specific tasks, such as sentiment analysis, translation, or text generation.  The model is fine-tuned on specific tasks using a smaller domain-specific data set. The weights from pre-training are updated based on the new data, making the model task aware and specialized. The next phase of the LLM life cycle is prompt engineering. So this phase craft effective prompts to guide the model's behavior in generating specific responses.  Different prompt formulations, instructions, or context can be used to shape the output.  15:13 Nikita: Ok… we’re with you so far. What’s next? Himanshu: The next phase is evaluation and iteration. So models are evaluated using various metrics to access their performance on specific tasks. Iterative refinement involves adjusting model parameters, prompts, and fine tuning strategies to improve results.  So as a part of this step, you also do few shot and one shot inference. If needed, you further fine tune the model with a small number of examples. Basically, few shot or a single example, one shot for new tasks or scenarios.  Also, you do the bias mitigation and consider the ethical concerns. These biases and ethical concerns may arise in models output. You need to implement measures to ensure fairness in inclusivity and responsible use.  16:07 Himanshu: The next phase in LLM life cycle is deployment. Once the model has been fine-tuned and evaluated, it is deployed for real world applications. Deployed models can perform tasks, such as text generation, translation, summarization, and much more. You also perform monitoring and maintenance in this phase.  So you continuously monitor the model's performance and output to ensure it aligns with desired outcomes. You also periodically update and retrain the model to incorporate new data and to adapt to evolving language patterns. This overall life cycle can also consist of a feedback loop, whether you gather feedbacks from users and incorporate it into the model’s improvement process.  You use this feedback to further refine prompts, fine tuning, and overall model behavior. RLHF, which is Reinforcement Learning with Human Feedback, is a very good example of this feedback loop. You also research and innovate as a part of this life cycle, where you continue to research and develop new techniques to enhance the model capability and address different challenges associated with it. 17:19 Nikita: As we’re talking about the LLM life cycle, I see that fine tuning is not only about making an LLM task specific. So, what are some other reasons you would fine tune an LLM model? Himanshu: The first one is task-specific adaptation. Pre-trained language models are trained on extensive and diverse data sets and have good general language understanding. They excel in language generation and comprehension tasks, though the broad understanding of language may not lead to optimal performance in specific task.  These models are not task specific. So the solution is fine tuning. The fine tuning process customizes the pre-trained models for a specific task by further training on task-specific data to adapt the model's knowledge.  The second reason is domain-specific vocabulary. Pre-trained models might lack knowledge of specific words and phrases essential for certain tasks in fields, such as legal, medical, finance, and technical domains. This can limit their performance when applied to domain-specific data.  Fine tuning enables the model to adapt and learn domain-specific words and phrases. These words could be, again, from different domains.  18:35 Himanshu: The third reason to fine tune is efficiency and resource utilization. So fine tuning is computationally efficient compared to training from scratch.  Fine tuning reuses the knowledge from pre-trained models, saving time and resources. Fine tuning requires fewer iterations to achieve task-specific competence. Shorter training cycles expedite the model development process. It conserves computational resources, such as GPU memory and processing power.  Fine tuning is efficient in quicker model deployment. It has faster time to production for real world applications. Fine tuning is, again, a scalable enabling adaptation to various tasks with the same base model, which further reduce resource demands, and it leads to cost saving for research and development.  The fourth reason to fine tune is of ethical concerns. Pre-trained models learns from diverse data. And those potentially inherit different biases. Fine tune might not completely eliminate biases. But careful curation of task specific data ensures avoiding biased or harmful vocabulary. The responsible uses of domain-specific terms promotes ethical AI applications.  19:53 Lois: Thank you so much, Himanshu, for spending time with us. We had such a great time learning from you. If you want to learn more about the topics discussed today, head over to mylearn.oracle.com and get started on our free AI Foundations course. Nikita: Yeah, we even have a detailed walkthrough of the architecture of transformers that you might want to check out. Join us next week for a discussion on the OCI AI Portfolio. Until then, this is Nikita Abraham… Lois: And Lois Houston signing off! 20:24 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
20m
20/02/2024

Deep Learning

Did you know that the concept of deep learning goes way back to the 1950s? However, it is only in recent years that this technology has created a tremendous amount of buzz (and for good reason!). A subset of machine learning, deep learning is inspired by the structure of the human brain, making it fascinating to learn about. In this episode, Lois Houston and Nikita Abraham interview Senior Principal OCI Instructor Hemant Gahankari about deep learning concepts, including how Convolution Neural Networks work, and help you get your deep learning basics right. Oracle MyLearn: https://mylearn.oracle.com/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Himanshu Raj, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hi everyone! Last week, we covered the new MySQL HeatWave Implementation Associate certification. So do go check out that episode if it interests you. Lois: That was a really interesting discussion for sure. Today, we’re going to focus on the basics of deep learning with our Senior Principal OCI Instructor, Hemant Gahankari. 00:58 Nikita: Hi Hemant! Thanks for being with us today. So, to get started, what is deep learning? Hemant: Deep learning is a subset of machine learning that focuses on training Artificial Neural Networks to solve a task at hand. Say, for example, image classification. A very important quality of the ANN is that it can process raw data like pixels of an image and extract patterns from it. These patterns are treated as features to predict the outcomes.  Let us say we have a set of handwritten images of digits 0 to 9. As we know, everyone writes the digits in a slightly different way. So how do we train a machine to identify a handwritten digit? For this, we use ANN.  ANN accepts image pixels as inputs, extracts patterns like edges and curves and so on, and correlates these patterns to predict an outcome. That is what digit does the image has in this case.  02:04 Lois: Ok, so what you’re saying is given a bunch of pixels, ANN is able to process pixel data, learn an internal representation of the data, and predict outcomes. That’s so cool! So, why do we need deep learning? Hemant: We need to specify features while we train machine learning algorithm. With deep learning, features are automatically extracted from the data. Internal representation of features and their combinations is built to predict outcomes by deep learning algorithms. This may not be feasible manually.  Deep learning algorithms can make use of parallel computations. For this, usually data is split into small batches and process parallelly. So these algorithms can process large amount of data in a short time to learn the features and their combinations. This leads to scalability and performance. In short, deep learning complements machine learning algorithms for complex data for which features cannot be described easily.  03:13 Nikita: What can you tell us about the origins of deep learning? Hemant: Some of the deep learning concepts like artificial neuron, perceptron, and multilayer perceptron existed as early as 1950s. One of the most important concept of using backpropagation for training ANN came in 1980s.  In 1990s, convolutional neural network were also introduced for image analysis task. Starting 2000, GPUs were introduced. And 2010 onwards, GPUs became cheaper and widely available. This fueled the widespread adoption of deep learning uses like computer vision, natural language processing, speech recognition, text translation, and so on.  In 2012, major networks like AlexNet and Deep-Q Network were built. 2016 onward, generative use cases of the deep learning also started to come up. Today, we have widely adopted deep learning for a variety of use cases, including large language models and many other types of generative models.  04:29 Lois: Hemant, what are various applications of deep learning algorithms?  Hemant: Deep learning algorithms are targeted at a variety of data and applications. For data, we have images, videos, text, and audio. For images, applications can be image classification, object detection, and so on. For textual data, applications are to translate the text or detect a sentiment of a text. For audio, the applications can be music generation, speech to text, and so on.  05:08 Lois: It's important that we select the right deep learning algorithm based on the data and application, right? So how do we do that?  Hemant: For image task like image classification, object detection, image segmentation, or facial recognition, CNN is a suitable architecture. For text, we have a choice of the latest transformers or LSTM or even RNN. For generative tasks like text summarization, question answering, transformers is a good choice. For generating images, text to image generation, transformers, GANs, or diffusion models are available choice. 05:51 Nikita: Let’s dive a little deeper into Artificial Neural Networks. Can you tell us more about them, Hemant? Hemant: Artificial Neural Networks are inspired by the human brain. They are made up of interconnected nodes called as neurons.  Nikita: And how are inputs processed by a neuron?  Hemant: In ANN, we assign weights to the connection between neurons. Weighted inputs are added up. And if the sum crosses a specified threshold, the neuron is fired. And the outputs of a layer of neuron become an input to another layer.  06:27 Lois: Hemant, tell us about the building blocks of ANN so we understand this better. Hemant: So first, building block is layers. We have input layer, output layer, and multiple hidden layers. The input layer and output layer are mandatory. And the hidden layers are optional. The second unit is neurons. Neurons are computational units, which accept an input and produce an output.  Weights determine the strength of connection between neurons. So the connection could be between input and a neuron, or it could be between a neuron and another neuron. Activation functions work on the weighted sum of inputs to a neuron and produce an output. Additional input to the neuron that allows a certain degree of flexibility is called as a bias.  07:27 Nikita: I think we’ve got the components of ANN straight but maybe you should give us an example. You mentioned this example earlier…of needing to train ANN to recognize handwritten digits from images. How would we go about that? Hemant: For that, we have to collect a large number of digit images, and we need to train ANN using these images.  So, in this case, the images consist of 28 by 28 pixels which act as input layer. For the output, we have neurons-- 10 neurons which represent digits 0 to 9. And we have multiple hidden layers. So, in this case, we have two hidden layers which are consisting of 16 neurons each.  The hidden layers are responsible for capturing the internal representation of the raw image data. And the output layer is responsible for producing the desired outcomes. So, in this case, the desired outcome is the prediction of whether the digit is 0 or 1 or up to digit 9.  So how do we train this particular ANN? So the first thing we use the backpropagation algorithm. During training, we show an image to the ANN. Let us say it is an image of digit 2. So we expect output neuron for digit 2 to fire. But in real, let us say output neuron of a digit 6 fired.  09:12 Lois: So, then, what do we do?  Hemant: We know that there is an error. So to correct an error, we adjust the weights of the connection between neurons based on a calculation, which we call as backpropagation algorithm. By showing thousands of images and adjusting the weights iteratively, ANN is able to predict correct outcome for most of the input images. This process of adjusting weights through backpropagation is called as model training.  09:48 Do you have an idea for a new course or learning opportunity? We’d love to hear it! Visit the Oracle University Learning Community and share your thoughts with us on the Idea Incubator. Your suggestion could find a place in future development projects! Visit mylearn.oracle.com to get started.  10:09 Nikita: Welcome back! Let’s move on to CNN. Hemant, what is a Convolutional Neural Network?  Hemant: CNN is a type of deep learning model specifically designed for processing and analyzing grid-like data, such as images and videos. In the ANN, the input image is converted to a single dimensional array and given as an input to the network.   But that does not work well with the image data because image data is inherently two dimensional. CNN works better with two dimensional data. The role of the CNN is to reduce the image into a form, which is easier to process and without losing features, which are critical for getting a good prediction.  10:53 Lois: A CNN has different layers, right? Could you tell us a bit about them?  Hemant: The first one is input layer. Input layer is followed by feature extraction layers, which is a combination and repetition of multiple feature extraction layers, including convolutional layer with ReLu activation and a pooling layer.  And this is followed by a classification layer. These are the fully connected output layers, where the classification occurs as output classes. The feature extraction layers play a vital role in image classification.   11:33 Nikita: Can you explain these layers with an example? Hemant: Let us say we have a robot to inspect a house and tell us what type of a house it is. It uses many tools for this purpose. The first tool is a blueprint detector. It scans different parts of the house, like walls, floors, or windows, and looks for specific patterns or features.  The second tool is a pattern highlighter. This tool marks areas detected by the blueprint detector. The next tool is a summarizer. It tries to capture the most significant features of every room. The next tool is house expert, which looks at all the highlighted patterns and features, and tries to understand the house.  The next tool is a guess maker. It assigns probabilities to the different possible house types. And finally, the quality checker randomly checks different parts of the analysis to make sure that the robot doesn't rely too much on any single piece of information.  12:40 Nikita: Ok, so how are you mapping these to the feature extraction layers?  Hemant: Similar to blueprint detector, we have a convolutional layer. This layer applies convolutional operations to the input image using small filters known as kernels.  Each filter slides across the input image to detect specific features, such as edges, corners, or textures. Similar to pattern highlighter, we have a activation function. The activation function allows the network to learn more complex and non-linear relationships in the data. Pooling layer is similar to room summarizer.  Pooling helps reduce the spatial dimensions of the feature maps generated by the convolutional layers. Similar to house expert, we have a fully connected layer, which is responsible for making final predictions or classifications based on the learned features. Softmax layer converts the output of the last fully connected layers into probability scores.  The class with the highest probability is the predicted class. This is similar to the guess maker. And finally, we have the dropout layer. This layer is a regularization technique used to prevent overfitting in the network. This has the same role as that of a quality checker.  14:05 Lois: Do CNNs have any limitations that we need to be aware of? Hemant: Training CNNs on large data sets can be computationally expensive and time consuming. CNNs are susceptible to overfitting, especially when the training data is limited or imbalanced. CNNs are considered black box models making it difficult to interpret.  And CNNs can be sensitive to small changes in the input leading to unstable predictions.  14:33 Nikita: And what are the top applications of CNN? Hemant: One of the most widely used applications of CNNs is image classification. For example, classifying whether an image contains a specific object, say cat or a dog.  CNNs are used for object detection tasks. The goal here is to draw bounding boxes around objects in an image. CNNs can perform pixel level segmentation, where each pixel in the image is labeled to represent different objects or regions. CNNs are employed for face recognition tasks as well, identifying and verifying individuals based on facial features.  CNNs are widely used in medical image analysis, helping with tasks like tumor detection, diagnosis, and classification of various medical conditions. CNNs play an important role in the development of self-driving cars, helping them to recognize and understand the road traffic signs, pedestrians, and other vehicles. And CNNs are applied in analyzing satellite images and remote sensing data for tasks, such as land cover classification and environmental monitoring.  15:50 Nikita: Hemant, let’s talk about sequence models. What are they and what are they used for? Hemant: Sequence models are used to solve problems, where the input data is in the form of sequences. The sequences are ordered lists of data points or events.  The goal in sequence models is to find patterns and dependencies within the data and make predictions, classifications, or even generate new sequences.  16:17 Lois: Can you give us some examples of sequence models?  Hemant: Some common examples of the sequence models are in natural language processing, deep learning models are used for tasks, such as machine translation, sentiment analysis, or text generation. In speech recognition, deep learning models are used to convert a recorded audio into text.  In deep learning models, can generate new music or create original compositions. Even sequences of hand gestures are interpreted by deep learning models for applications like sign language recognition. In fields like finance or weather prediction, time series data is used to predict future values.  17:03 Nikita: Which deep learning models can be used to work with sequence data?  Hemant: Recurrent Neural Networks, abbreviated as RNNs, are a class of neural network architectures specifically designed to handle sequential data. Unlike traditional feedforward neural network, RNNs have a feedback loop that allows information to persist across different timesteps.  The key features of RNN is their ability to maintain an internal state often referred to as a hidden state or memory, which is updated as the network processes each element in the input sequence. The hidden state is then used as input to the network for the next time step, allowing the model to capture dependencies and patterns in the data that are spread across time.  17:58 Nikita: Are there various types of RNNs? Hemant: There are different types of RNN architecture based on application.  One of them is one to one. This is like feed forward neural network and is not suited for sequential data. A one to many model produces multiple output values for one input value. Music generation or sequence generation are some applications using this architecture.  A many to one model produces one output value after receiving multiple input values. Example is sentiment analysis based on the review. Many to many model produces multiple output values for multiple input values. Examples are machine translation and named entity recognition.  RNN does not perform that well when it comes to capturing long term dependencies. This is due to the vanishing gradients problem, which is overcome by using LSTM model.  19:07 Lois: Another acronym. What is LSTM, Hemant? Hemant: Long Short-Term memory, abbreviated as LSTM, works by using a specialized memory cell and a gating mechanisms to capture long term dependencies in the sequential data.  The key idea behind LSTM is to selectively remember or forget information over time, enabling the model to maintain relevant information over long sequences, which helps overcome the vanishing gradients problem.  19:40 Nikita: Can you take us, step-by-step, through the working of LSTM?  Hemant: At each timestep, the LSTM takes an input vector representing the current data point in the sequence. The LSTM also receives the previous hidden state and cell state. These represent what the LSTM has remembered and forgotten up to the current point in the sequence.  The core of the LSTM lies in its gating mechanisms, which include three gates: the input gate, the forget gate, and the output gate. These gates are like the filters that control the flow of information within the LSTM cell. The input gate decides what new information from the current input should be added to the memory cell.  The forget gate determines what information in the current memory cell should be discarded or forgotten. The output gate regulates how much of the current memory cell should be exposed as the output of the current time step. Using the information from the input gate and forget gate, the LSTM updates its cell state. The LSTM then uses the output gate to produce the current hidden state, which becomes the output of the LSTM for the next time step.  21:12 Lois: Thank you, Hemant, for joining us in this episode of the Oracle University Podcast. I learned so much today. If you want to learn more about deep learning, visit mylearn.oracle.com and search for the Oracle Cloud Infrastructure AI Foundations course. And remember, the AI Foundations course and certification are free. So why not get started now? Nikita: Right, Lois. In our next episode, we will discuss generative AI and language learning models. Until then, this is Nikita Abraham… Lois: And Lois Houston signing off! 21:45 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
22m
13/02/2024

Everything You Need to Know About the MySQL HeatWave Implementation Associate Certification

What is MySQL HeatWave? How do I get certified in it? Where do I start? Listen to Lois Houston and Nikita Abraham, along with MySQL Developer Scott Stroz, answer all these questions and more on this week's episode of the Oracle University Podcast. MySQL Document Store: https://oracleuniversitypodcast.libsyn.com/mysql-document-store Oracle MyLearn: https://mylearn.oracle.com/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this  series of informative podcasts, we’ll bring you foundational training on the most popular  Oracle technologies. Let’s get started! 00:26 Nikita: Welcome to the Oracle University Podcast! I’m Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! For the last two weeks, we’ve been having really exciting discussions on everything AI. We covered the basics of artificial intelligence and machine learning, and we’re taking a short break from that today to talk about the new MySQL HeatWave Implementation Associate Certification with MySQL Developer Advocate Scott Stroz. 00:59 Nikita: You may remember Scott from an episode last year where he came on to discuss MySQL Document Store. We’ll post the link to that episode in the show notes so you can listen to it if you haven’t already. Lois: Hi Scott! Thanks for joining us again. Before diving into the certification, tell us, what is MySQL HeatWave?  01:19 Scott: Hi Lois, Hi Niki. I’m so glad to be back. So, MySQL HeatWave Database Service is a fully managed database that is capable of running transactional and analytic queries in a single database instance. This can be done across data warehouses and data lakes. We get all the benefits of analytic queries without the latency and potential security issues of performing standard extract, transform, and load, or ETL, operations. Some other MySQL HeatWave database service features are automated system updates and database backups, high availability, in-database machine learning with AutoML, MySQL Autopilot for managing instance provisioning, and enhanced data security.  HeatWave is the only cloud database service running MySQL that is built, managed, and supported by the MySQL Engineering team. 02:14 Lois: And where can I find MySQL HeatWave? Scott: MySQL HeatWave is only available in the cloud. MySQL HeatWave instances can be provisioned in Oracle Cloud Infrastructure or OCI, Amazon Web Services (AWS), and Microsoft Azure. Now, some features though are only available in Oracle Cloud, such as access to MySQL Document Store. 02:36 Nikita: Scott, you said MySQL HeatWave runs transactional and analytic queries in a single instance. Can you elaborate on that? Scott: Sure, Niki. So, MySQL HeatWave allows developers, database administrators, and data analysts to run transactional queries (OLTP) and analytic queries (OLAP).  OLTP, or online transaction processing, allows for real-time execution of database transactions. A transaction is any kind of insertion, deletion, update, or query of data. Most DBAs and developers work with this kind of processing in their day-to-day activities.   OLAP, or online analytical processing, is one way to handle multi-dimensional analytical queries typically used for reporting or data analytics. OLTP system data must typically be exported, aggregated, and imported into an OLAP system. This procedure is called ETL as I mentioned – extract, transform, and load. With large datasets, ETL processes can take a long time to complete, so analytic data could be “old” by the time it is available in an OLAP system. There is also an increased security risk in moving the data to an external source. 03:56 Scott: MySQL HeatWave eliminates the need for time-consuming ETL processes. We can actually get real-time analytics from our data since HeatWave allows for OLTP and OLAP in a single instance. I should note, this also includes analytic from JSON data that may be stored in the database. Another advantage is that applications can use MySQL HeatWave without changing any of the application code. Developers only need to point their applications at the MySQL HeatWave databases. MySQL HeatWave is fully compatible with on-premise MySQL instances, which can allow for a seamless transition to the cloud. And one other thing. When MySQL HeatWave has OLAP features enabled, MySQL can determine what type of query is being executed and route it to either the normal database system or the in-memory database. 04:52 Lois: That’s so cool! And what about the other features you mentioned, Scott? Automated updates and backups, high availability… Scott: Right, Lois. But before that, I want to tell you about the in-memory query accelerator. MySQL HeatWave offers a massively parallel, in-memory hybrid columnar query processing engine. It provides high performance by utilizing algorithms for distributed query processing. And this query processing in MySQL HeatWave is optimized for cloud environments.  MySQL HeatWave can be configured to automatically apply system updates, so you will always have the latest and greatest version of MySQL. Then, we have automated backups. By this, I mean MySQL HeatWave can be configured to provide automated backups with point-in-time recovery to ensure data can be restored to a particular date and time. MySQL HeatWave also allows us to define a retention plan for our database backups, that means how long we keep the backups before they are deleted. High availability with MySQL HeatWave allows for more consistent uptime. When using high availability, MySQL HeatWave instances can be provisioned across multiple availability domains, providing automatic failover for when the primary node becomes unavailable. All availability domains within a region are physically separated from each other to mitigate the possibility of a single point of failure. 06:14 Scott: We also have MySQL Lakehouse. Lakehouse allows for the querying of data stored in object storage in various formats. This can be CSV, Parquet, Avro, or an export format from other database systems. And basically, we point Lakehouse at data stored in Oracle Cloud, and once it’s ingested, the data can be queried just like any other data in a database. Lakehouse supports querying data up to half a petabyte in size using the HeatWave engine. And this allows users to take advantage of HeatWave for non-MySQL workloads. MySQL AutoPilot is a part of MySQL HeatWave and can be used to predict the number of HeatWave nodes a system will need and automatically provision them as part of a cluster. AutoPilot has features that can handle automatic thread pooling and database shape predicting. A “shape” is one of the many different CPU, memory, and ethernet traffic configurations available for MySQL HeatWave. MySQL HeatWave includes some advanced security features such as asymmetric encryption and automated data masking at query execution. As you can see, there are a lot of features covered under the HeatWave umbrella! 07:31 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You’ll find training on everything from cloud computing, database, and security to artificial intelligence and machine learning, all free to subscribers. So, what are you waiting for? Pick a topic, leverage the Oracle University Learning Community to ask questions, and then sit for your certification. Visit mylearn.oracle.com to get started.  08:02 Nikita: Welcome back! Now coming to the certification, who can actually take this exam, Scott? Scott: The MySQL HeatWave Implementation Associate Certification Exam is designed specifically for administrators and data scientists who want to provision, configure, and manage MySQL HeatWave for transactions, analytics, machine learning, and Lakehouse. 08:22 Nikita: Can someone who’s just graduated, say an engineering graduate interested in data analytics, take this certification? Are there any prerequisites? What are the career prospects for them? Scott: There are no mandatory prerequisites, but anyone who wants to take the exam should have experience with MySQL HeatWave and other aspects of OCI, such as virtual cloud networks and identity and security processes. Also, the learning path on MyLearn will be extremely helpful when preparing for the exam, but you are not required to complete the learning path before registering for the exam. The exam focuses more on getting MySQL HeatWave running (and keeping it running) than accessing the data. That doesn’t mean it is not helpful for someone interested in data analytics. I think it can be helpful for data analysts to understand how the system providing the data functions, even if it is at just a high level. It is also possible that data analysts might be responsible for setting up their own systems and importing and managing their own data. 09:23 Lois: And how do I get started if I want to get certified on MySQL HeatWave? Scott: So, you’ll first need to go to mylearn.oracle.com and look for the “Become a MySQL HeatWave Implementation Associate” learning path. The learning path consists of over 10 hours of training across 8 different courses.  These courses include “Getting Started with MySQL HeatWave Database Service,” which offers an introduction to some Oracle Cloud functionality such as security and networking, as well as showing one way to connect to a MySQL HeatWave instance. Another course demonstrates how to configure MySQL instances and copy that configuration to other instances. Other courses cover how to migrate data into MySQL HeatWave, set up and manage high availability, and configure HeatWave for OLAP. You’ll find labs where you can perform hands-on activities, student and activity guides, and skill checks to test yourself along the way. And there’s also the option to Ask the Instructor if you have any questions you need answers to. You can also access the Oracle University Learning Community and discuss topics with others on the same journey. The learning path includes a practice exam to check your readiness to pass the certification exam. 10:33 Lois: Yeah, and remember, access to the entire learning path is free so there’s nothing stopping you from getting started right away. Now Scott, what does the certification test you on? Scott: The MySQL HeatWave Implementation exam, which is an associate-level exam, covers various topics. It will validate your ability to identify key features and benefits of MySQL HeatWave and describe the MySQL HeatWave architecture; identify Virtual Cloud Network (VCN) requirements and the different methods of connecting to a MySQL HeatWave instance; manage the automatic backup process and restore database systems from these backups; configure and manage read replicas and inbound replication channels; import data into MySQL HeatWave; configure and manage high availability and clustering of MySQL HeatWave instances. I know this seems like a lot of different topics. That is why we recommend anyone interested in the exam follow the learning path. It will help make sure you have the exposure to all the topics that are covered by the exam. 11:35 Lois: Tell us more about the certification process itself. Scott: While the courses we already talked about are valuable when preparing for the exam, nothing is better than hands-on experience. We recommend that candidates have hands-on experience with MySQL HeatWave with real-world implementations. The format of the exam is Multiple Choice. It is 90 minutes long and consists of 65 questions. When you’ve taken the recommended training and feel ready to take the certification exam, you need to purchase the exam and register for it. You go through the section on things to do before the exam and the exam policies, and then all that’s left to do is schedule the date and time of the exam according to when is convenient for you. 12:16 Nikita: And once you’ve finished the exam? Scott: When you’re done your score will be displayed on the screen when you finish the exam. You will also receive an email indicating whether you passed or failed. You can view your exam results and full score report in Oracle CertView, Oracle’s certification portal. From CertView, you can download and print your eCertificate and even share your newly earned badge on places like Facebook, Twitter, and LinkedIn. 12:38 Lois: And for how long does the certification remain valid, Scott? Scott: There is no expiration date for the exam, so the certification will remain valid for as long as the material that is covered remains relevant.  12:49 Nikita: What’s the next step for me after I get this certification? What other training can I take? Scott: So, because this exam is an associate level exam, it is kind of a stepping stone along a person’s MySQL training. I do not know if there are plans for a professional level exam for HeatWave, but Oracle University has several other training programs that are MySQL-specific. There are learning paths to help prepare for the MySQL Database Administrator and MySQL Database Developer exams. As with the HeatWave learning paths, the learning paths for these exams include video tutorials, hands-on activities, skill checks, and practice exams. 13:27 Lois: I think you’ve told us everything we need to know about this certification, Scott. Are there any parting words you might have? Scott: We know that the whole process of training and getting certified may seem daunting, but we’ve really tried to simplify things for you with the “Become a MySQL HeatWave Implementation Associate” learning path. It not only prepares you for the exam but also gives you experience with features of MySQL HeatWave that will surely be valuable in your career. 13:51 Lois: Thanks so much, Scott, for joining us today. Nikita: Yeah, we’ve had a great time with you. Scott: Thanks for having me. Lois: Next week, we’ll get back to our focus on AI with a discussion on deep learning. Until then, this is Lois Houston… Nikita: And Nikita Abraham, signing off. 14:07 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University  Podcast.
14m
06/02/2024

Machine Learning

Does machine learning feel like too convoluted a topic? Not anymore! Listen to hosts Lois Houston and Nikita Abraham, along with Senior Principal OCI Instructor Hemant Gahankari, talk about foundational machine learning concepts and dive into how supervised learning, unsupervised learning, and reinforcement learning work. Oracle MyLearn: https://mylearn.oracle.com/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Himanshu Raj, and the OU Studio Team for helping us create this episode. --------------------------------------------------------- Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this  series of informative podcasts, we’ll bring you foundational training on the most popular  Oracle technologies. Let’s get started!  00:26 Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal  Technical Editor. Nikita: Hi everyone! Last week, we went through the basics of artificial intelligence and we’re going to take it a step further today by talking about some foundational machine learning concepts. After that, we’ll discuss the three main types of machine learning models: supervised learning, unsupervised learning, and reinforcement learning. 00:57 Lois: Hemant Gahankari, a Senior Principal OCI Instructor, joins us for this episode. Hi Hemant! Let’s dive right in. What is machine learning? How does it work? Hemant: Machine learning is a subset of artificial intelligence that focuses on creating computer systems that can learn and predict outcomes from given examples without being explicitly programmed. It is powered by algorithms that incorporate intelligence into machines by automatically learning from a set of examples usually provided as data. 01:34 Nikita: Give us a few examples of machine learning… so we can see what it can do for us. Hemant: Machine learning is used by all of us in our day-to-day life. When we shop online, we get product recommendations based on our preferences and our shopping history. This is powered by machine learning. We are notified about movies recommendations based on our viewing history and choices of other similar viewers. This too is driven by machine learning. While browsing emails, we are warned of a spam mail because machine learning classifies whether the mail is spam or not based on its content. In the increasingly popular self-driving cars, machine learning is responsible for taking the car to its destination. 02:24 Lois: So, how does machine learning actually work? Hemant: Let us say we have a computer and we need to teach the computer to differentiate between a cat and a dog. We do this by describing features of a cat or a dog. Dogs and cats have distinguishing features. For example, the body color, texture, eye color are some of the defining features which can be used to differentiate a cat from a dog. These are collectively called as input data. We also provide a corresponding output, which is called as a label, which can be a dog or a cat in this case. By describing a specific set of features, we can say that it is a cat or a dog. Machine learning model is first trained with the data set. Training data set consists of a set of features and output labels, and is given as an input to the machine learning model. During the process of training, machine learning model learns the relation between input features and corresponding output labels from the provided data. Once the model learns from the data, we have a trained model. Once the model is trained, it can be used for inference. Inference is a process of getting a prediction by giving a data point. In this example, we input features of a cat or a dog, and the trained model predicts the output that is a cat or a dog label. The types of machine learning models depend on whether we have a labeled output or not. 04:08 Nikita: Oh, there are different types of machine learning models? Hemant: In general, there are three types of machine learning approaches. In supervised machine learning, labeled data is used to train the model. Model learns the relation between features and labels. Unsupervised learning is generally used to understand relationships within a data set. Labels are not used or are not available. Reinforcement learning uses algorithms that learn from outcomes to make decisions or choices. 04:45 Lois: Ok…supervised learning, unsupervised learning, and reinforcement learning. Where do we use each of these machine learning models? Hemant: Some of the popular applications of supervised machine learning are disease detection, weather forecasting, stock price prediction, spam detection, and credit scoring. For example, in disease detection, the patient data is input to a machine learning model, and machine learning model predicts if a patient is suffering from a disease or not. For unsupervised machine learning, some of the most common real-time applications are to detect fraudulent transactions, customer segmentation, outlier detection, and targeted marketing campaigns. So for example, given the transaction data, we can look for patterns that lead to fraudulent transactions. Most popular among reinforcement learning applications are automated robots, autonomous driving cars, and playing games. 05:51 Nikita: I want to get into how each type of machine learning works. Can we start with supervised learning? Hemant: Supervised learning is a machine learning model that learns from labeled data. The model learns the mapping between the input and the output. As a house price predictor model, we input house size in square feet and model predicts the price of a house. Suppose we need to develop a machine learning model for detecting cancer, the input to the model would be the person's medical details, the output would be whether the tumor is malignant or not. 06:29 Lois: So, that mapping between the input and output is fundamental in supervised learning. Hemant: Supervised learning is similar to a teacher teaching student. The model is trained with the past outcomes and it learns the relationship or mapping between the input and output. In supervised machine learning model, the outputs can be either categorical or continuous. When the output is continuous, we use regression. And when the output is categorical, we use classification. 07:05 Lois: We want to keep this discussion at a high level, so we’re not going to get into regression and classification. But if you want to learn more about these concepts and look at some demonstrations, visit mylearn.oracle.com. Nikita: Yeah, look for the Oracle Cloud Infrastructure AI Foundations course and you’ll find a lot of resources that you can make use of. 07:30 The Oracle University Learning Community is an excellent place to collaborate and learn with Oracle experts and fellow learners. Grow your skills, inspire innovation, and celebrate your successes. All your activities, from liking a post to answering questions and sharing with others, will help you earn a valuable reputation, badges, and ranks to be recognized in the community. Visit mylearn.oracle.com to get started.  07:58 Nikita: Welcome back! So that was supervised machine learning. What about unsupervised machine learning, Hemant? Hemant: Unsupervised machine learning is a type of machine learning where there are no labeled outputs. The algorithm learns the patterns and relationships in the data and groups similar data items. In unsupervised machine learning, the patterns in the data are explored explicitly without being told what to look for. For example, if you give a set of different-colored LEGO pieces to a child and ask to sort it, it may the LEGO pieces based on any patterns they observe. It could be based on same color or same size or same type. Similarly, in unsupervised learning, we group unlabeled data sets. One more example could be-- say, imagine you have a basket of various fruits-- say, apples, bananas, and oranges-- and your task is to group these fruits based on their similarities. You observe that some fruits are round and red, while others are elongated and yellow. Without being told explicitly, you decide to group the round and red fruits together as one cluster and the elongated and yellow fruits as another cluster. There you go. You have just performed an unsupervised learning task. 09:21 Lois: Where is unsupervised machine learning used? Can you take us through some use cases? Hemant: The first use case of unsupervised machine learning is market segmentation. In market segmentation, one example is providing the purchasing details of an online shop to a clustering algorithm. Based on the items purchased and purchasing behavior, the clustering algorithm can identify customers based on the similarity between the products purchased. For example, customers with a particular age group who buy protein diet products can be shown an advertisement of sports-related products. The second use case is on outlier analysis. One typical example for outlier analysis is to provide credit card purchase data for clustering. Fraudulent transactions can be detected by a bank by using outliers. In some transaction, amounts are too high or recurring. It signifies an outlier. The third use case is recommendation systems. An example for recommendation systems is to provide users' movie viewing history as input to a clustering algorithm. It clusters users based on the type or rating of movies they have watched. The output helps to provide personalized movie recommendations to users. The same applies for music recommendations also. 10:53 Lois: And finally, Hemant, let’s talk about reinforcement learning. Hemant: Reinforcement learning is like teaching a dog new tricks. You reward it when it does something right, and over time, it learns to perform these actions to get more rewards. Reinforcement learning is a type of Machine Learning that enables an agent to learn from its interaction with the environment, while receiving feedback in the form of rewards or penalties without any labeled data. Reinforcement learning is more prevalent in our daily lives than we might realize. The development of self-driving cars and autonomous drones rely heavily on reinforcement learning to make real time decisions based on sensor data, traffic conditions, and safety considerations. Many video games, virtual reality experiences, and interactive entertainment use reinforcement learning to create intelligent and challenging computer-controlled opponents. The AI characters in games learn from player interactions and become more difficult to beat as the game progresses. 12:05 Nikita: Hemant, take us through some of the terminology that’s used with reinforcement learning. Hemant: Let us say we want to train a self-driving car to drive on a road and reach its destination. For this, it would need to learn how to steer the car based on what it sees in front through a camera. Car and its intelligence to steer on the road is called as an agent. More formally, agent is a learner or decision maker that interacts with the environment, takes actions, and learns from the feedback received. Environment, in this case, is the road and its surroundings with which the car interacts. More formally, environment is the external system with which the agent interacts. It is the world or context in which the agent operates and receives feedback for its actions. What we see through a camera in front of a car at a moment is a state. State is a representation of the current situation or configuration of the environment at a particular time. It contains the necessary information for the agent to make decisions. The actions in this example are to drive left, or right, or keep straight. Actions are a set of possible moves or decisions that the agent can take in a given state. Actions have an impact on the environment and influence future states. After driving through the road many times, the car learns what action to take when it views a road through the camera. This learning is a policy. Formally, policy is a strategy or mapping that the agent uses to decide which action to take in a given state. It defines the agent's behavior and determines how it selects actions. 13:52 Lois: Ok. Say we’re talking about the training loop of reinforcement learning in the context of training a dog to learn tricks. We want it to pick up a ball, roll, sit… Hemant: Here the dog is an agent, and the place it receives training is the environment. While training the dog, you provide a positive reward signal if the dog picks it right and a warning or punishment if the dog does not pick up a trick. In due course, the dog gets trained by the positive rewards or negative punishments. The same tactics are applied to train a machine in the reinforcement learning. For machines, the policy is the brain of our agent. It is a function that tells what actions to take when in a given state. The goal of reinforcement learning algorithm is to find a policy that will yield a lot of rewards for the agent if the agent follows that policy referred to as the optimal policy. Through a process of learning from experiences and feedback, the agent becomes more proficient at making good decisions and accomplishing tasks. This process continues until eventually we end up with the optimal policy. The optimal policy is learned through training by using algorithms like Deep Q Learning or Q Learning. 15:19 Nikita: So through multiple training iterations, it gets better. That’s fantastic. Thanks, Hemant, for joining us today. We’ve learned so much from you. Lois: Remember, the course and certification are free, so if you’re interested, make sure you log in to mylearn.oracle.com and get going. Join us next week for another episode of the Oracle University Podcast. Until then, I’m Lois Houston… Nikita: And Nikita Abraham signing off! 15:48   That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
16m
30/01/2024

Introduction to Artificial Intelligence (AI)

You probably interact with artificial intelligence (AI) more than you realize. So, there’s never been a better time to start figuring out how it all works.   Join Lois Houston and Nikita Abraham as they decode the fundamentals of AI so that anyone, irrespective of their technical background, can leverage the benefits of AI and tap into its infinite potential.   Together with Senior Cloud Engineer Nick Commisso, they take you through key AI concepts, common AI tasks and domains, and the primary differences between AI, machine learning, and deep learning.   Oracle MyLearn: https://mylearn.oracle.com/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Himanshu Raj, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Nikita: Hello and welcome to the Oracle University Podcast. I’m Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! Welcome to a new season of the Oracle University Podcast. I’m so excited about this season because we’re going to delve into the world of artificial intelligence. In upcoming episodes, we’ll talk about the fundamentals of artificial intelligence and machine learning. And we’ll discuss neural network architectures, generative AI and large language models, the OCI AI stack, and OCI AI services. 01:06 Nikita: So, if you’re an IT professional who wants to start learning about AI and ML or even if you’re a student who is familiar with OCI or similar cloud services, but have no prior exposure to this field, you’ll want to tune in to these episodes. Lois: That’s right, Niki. So, let’s get started. Today, we’ll talk about the basics of artificial intelligence with Senior Cloud Engineer Nick Commisso. Hi Nick! Thanks for joining us today. So, let’s start right at the beginning. What is artificial intelligence? 01:36 Nick: Well, the ability of machines to imitate the cognitive abilities and problem solving capabilities of human intelligence can be classified as artificial intelligence or AI.  01:47 Nikita: Now, when you say capabilities and abilities, what are you referring to? Nick: Human intelligence is the intellectual capability of humans that allows us to learn new skills through observation and mental digestion, to think through and understand abstract concepts and apply reasoning, to communicate using a language and understand the nonverbal cues, such as facial recognition, tone variation, and body language.  You can handle objections in real time, even in a complex setting. You can plan for short and long-term situations or projects. And, of course, you can create music and art or invent something new like an original idea.  If you can replicate any of these human capabilities in machines, this is artificial general intelligence or AGI. So in other words, AGI can mimic human sensory and motor skills, performance, learning, and intelligence, and use these abilities to carry out complicated tasks without human intervention.  When we apply AGI to solve problems with specific and narrow objectives, we call it artificial intelligence or AI.  02:55 Lois: It seems like AI is everywhere, Nick. Can you give us some examples of where AI is used? Nick: AI is all around us, and you've probably interacted with AI, even if you didn't realize it. Some examples of AI can be viewing an image or an object and identifying if that is an apple or an orange. It could be examining an email and classifying it spam or not. It could be writing computer language code or predicting the price of an older car.  So let's get into some more specifics of AI tasks and the nature of related data. Machine learning, deep learning, and data science are all associated with AI, and it can be confusing to distinguish.  03:36 Nikita: Why do we need AI? Why’s it important?  Nick: AI is vital in today's world, and with the amount of data that's generated, it far exceeds the human ability to absorb, interpret, and actually make decisions based on that data. That's where AI comes in handy by enhancing the speed and effectiveness of human efforts.  So here are two major reasons why we need AI. Number one, we want to eliminate or reduce the amount of routine tasks, and businesses have a lot of routine tasks that need to be done in large numbers. So things like approving a credit card or a bank loan, processing an insurance claim, recommending products to customers are just some example of routine tasks that can be handled.  And second, we, as humans, need a smart friend who can create stories and poems, designs, create code and music, and have humor, just like us.  04:33 Lois: I’m onboard with getting help from a smart friend! There are different domains in AI, right, Nick?  Nick: We have language for language translation; vision, like image classification; speech, like text to speech; product recommendations that can help you cross-sell products; anomaly detection, like detecting fraudulent transactions; learning by reward, like self-driven cars. You have forecasting with weather forecasting. And, of course, generating content like image from text.  05:03 Lois: There are so many applications. Nick, can you tell us more about these commonly used AI domains like language, audio, speech, and vision? Nick: Language-related AI tasks can be text related or generative AI. Text-related AI tasks use text as input, and the output can vary depending on the task. Some examples include detecting language, extracting entities in a text, or extracting key phrases and so on.  Consider the example of translating text. There's many text translation tools where you simply type or paste your text into a given text box, choose your source and target language, and then click translate.  Now, let's look at the generative AI tasks. They are generative, which means the output text is generated by a model. Some examples are creating text like stories or poems, summarizing a text, answering questions, and so on. Let's take the example of ChatGPT, the most well-known generative chat bot. These bots can create responses from their training on large language models, and they continuously grow through machine learning.  06:10 Nikita: What can you tell us about using text as data? Nick: Text is inherently sequential, and text consists of sentences. Sentences can have multiple words, and those words need to be converted to numbers for it to be used to train language models. This is called tokenization. Now, the length of sentences can vary, and all the sentences lengths need to be made equal. This is done through padding.  Words can have similarities with other words, and sentences can also be similar to other sentences. The similarity can be measured through dot similarity or cosine similarity. We need a way to indicate that similar words or sentences may be close by. This is done through representation called embedding.  06:56 Nikita: And what about language AI models? Nick: Language AI models refer to artificial intelligence models that are specifically designed to understand, process, and generate natural language. These models have been trained on vast amounts of textual data that can perform various natural language processing or NLP tasks.  The task that needs to be performed decides the type of input and output. The deep learning model architectures that are typically used to train models that perform language tasks are recurrent neural networks, which processes data sequentially and stores hidden states, long short-term memory, which processes data sequentially that can retain the context better through the use of gates, and transformers, which processes data in parallel. It uses the concept of self-attention to better understand the context.  07:48 Lois: And then there’s speech-related AI, right? Nick: Speech-related AI tasks can be either audio related or generative AI. Speech-related AI tasks use audio or speech as input, and the output can vary depending on the task. For example, speech-to-text conversion or speaker recognition, voice conversion, and so on. Generative AI tasks are generative in nature, so the output audio is generated by a model. For example, you have music composition and speech synthesis.  Audio or speech is digitized as snapshots taken in time. The sample rate is the number of times in a second an audio sample is taken. Most digital audio have a sampling rate of 44.1 kilohertz, which is also the sampling rate for audio CDs.  Multiple samples need to be correlated to make sense of the data. For example, listening to a song for a fraction of a second, you won't be able to infer much about the song, and you'll probably need to listen to it a little bit longer.  Audio and speech AI models are designed to process and understand audio data, including spoken language. These deep-learning model architectures are used to train models that perform language with tasks-- recurrent neural networks, long short-term memory, transformers, variational autoencoders, waveform models, and Siamese networks. All of the models take into consideration the sequential nature of audio.  09:21 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You’ll find training on everything from cloud computing, database, and security to artificial intelligence and machine learning, all free to subscribers. So, what are you waiting for? Pick a topic, leverage the Oracle University Learning Community to ask questions, and then sit for your certification. Visit mylearn.oracle.com to get started.  09:49 Nikita: Welcome back! Now that we’ve covered language and speech-related tasks, let’s move on to vision-related tasks. Nick: Vision-related AI tasks could be image related or generative AI. Image-related AI tasks will use an image as an input, and the output depends on the task. Some examples are classifying images, identifying objects in an image, and so on. Facial recognition is one of the most popular image-related tasks that is often used for surveillance and tracking of people in real time, and it's used in a lot of different fields, including security, biometrics, law enforcement, and social media.  For generative AI tasks, the output image is generated by a model. For example, creating an image from a contextual description, generating images of a specific style or a high resolution, and so on. It can create extremely realistic new images and videos by generating original 3D models of an object, machine components, buildings, medication, people, and even more.  10:53 Lois: So, then, here again I need to ask, how do images work as data? Nick: Images consist of pixels, and pixels can be either grayscale or color. And we can't really make out what an image is just by looking at one pixel.  The task that needs to be performed decides the type of input needed and the output produced. Various architectures have evolved to handle this wide variety of tasks and data. These deep-learning model architectures are typically used to train models that perform vision tasks-- convolutional neural networks, which detects patterns in images; learning hierarchical representations of visual features; YOLO, which is You Only Look Once, processes the image and detects objects within the image; and then you have generative adversarial networks, which generates real-looking images.  11:43 Nikita: Nick, earlier you mentioned other AI tasks like anomaly detection, recommendations, and forecasting. Could you tell us more about them? Nick: Anomaly detection. This is time-series data, which is required for anomaly detection, and it can be a single or multivariate for fraud detection, machine failure, etc.  Recommendations. You can recommend products using data of similar products or users. For recommendations, data of similar products or similar users is required.  Forecasting. Time-series data is required for forecasting and can be used for things like weather forecasting and predicting the stock price.  12:22 Lois: Nick, help me understand the difference between artificial intelligence, machine learning, and deep learning. Let’s start with AI.  Nick: Imagine a self-driving car that can make decisions like a human driver, such as navigating traffic or detecting pedestrians and making safe lane changes. AI refers to the broader concept of creating machines or systems that can perform tasks that typically require human intelligence. Next, we have machine learning or ML. Visualize a spam email filter that learns to identify and move spam emails to the spam folder, and that's based on the user's interaction and email content. Now, ML is a subset of AI that focuses on the development of algorithms that enable machines to learn from and make predictions or decisions based on data.  To understand what an algorithm is in the context of machine learning, it refers to a specific set of rules, mathematical equations, or procedures that the machine learning model follows to learn from data and make predictions on. And finally, we have deep learning or DL. Think of an image recognition software that can identify specific objects or animals within images, such as recognizing cats in photos on the internet. DL is a subfield of ML that uses neural networks with many layers, deep neural networks, to learn and make sense of complex patterns in data.  13:51 Nikita: Are there different types of machine learning? Nick: There are several types of machine learning, including supervised learning, unsupervised learning, and reinforcement learning. Supervised learning where the algorithm learns from labeled data, making predictions or classifications. Unsupervised learning is an algorithm that discovers patterns and structures in unlabeled data, such as clustering or dimensionality reduction. And then, you have reinforcement learning, where agents learn to make predictions and decisions by interacting with an environment and receiving rewards or punishments.  14:27 Lois: Can we do a deep dive into each of these types you just mentioned? We can start with the supervised machine learning algorithm. Nick: Let's take an example of how a credit card company would approve a credit card. Once the application and documents are submitted, a verification is done, followed by a credit score check and another 10 to 15 days for approval. And how is this done? Sometimes, purely manually or by using a rules engine where you can build rules, give new data, get a decision.  The drawbacks are slow. You need skilled people to build and update rules, and the rules keep changing. The good thing is that the businesses had a lot of insight as to how the decisions were made. Can we build rules by looking at the past data?  We all learn by examples. Past data is nothing but a set of examples. Maybe reviewing past credit card approval history can help. Through a process of training, a model can be built that will have a specific intelligence to do a specific task. The heart of training a model is an algorithm that incrementally updates the model by looking at the data samples one by one.  And once it's built, the model can be used to predict an outcome on a new data. We can train the algorithm with credit card approval history to decide whether to approve a new credit card. And this is what we call supervised machine learning. It's learning from labeled data.  15:52 Lois: Ok, I see. What about the unsupervised machine learning algorithm? Nick: Data does not have a specific outcome or a label as we know it. And sometimes, we want to discover trends that the data has for potential insights. Similar data can be grouped into clusters. For example, retail marketing and sales, a retail company may collect information like household size, income, location, and occupation so that the suitable clusters could be identified, like a small family or a high spender and so on. And that data can be used for marketing and sales purposes.  Regulating streaming services. A streaming service may collect information like viewing sessions, minutes per session, number of unique shows watched, and so on. That can be used to regulate streaming services. Let's look at another example. We all know that fruits and vegetables have different nutritional elements. But do we know which of those fruits and vegetables are similar nutritionally?  For that, we'll try to cluster fruits and vegetables' nutritional data and try to get some insights into it. This will help us include nutritionally different fruits and vegetables into our daily diets. Exploring patterns and data and grouping similar data into clusters drives unsupervised machine learning.  17:13 Nikita: And then finally, we come to the reinforcement learning algorithm.  Nick: How do we learn to play a game, say, chess? We'll make a move or a decision, check to see if it's the right move or feedback, and we'll keep the outcomes in your memory for the next step you take, which is learning. Reinforcement learning is a machine learning approach where a computer program learns to make decisions by trying different actions and receiving feedback. It teaches agents how to solve tasks by trial and error. This approach is used in autonomous car driving and robots as well.  17:46 Lois: We keep coming across the term “deep learning.” You’ve spoken a bit about it a few times in this episode, but what is deep learning, really? How is it related to machine learning? Nick: Deep learning is all about extracting features and rules from data. Can we identify if an image is a cat or a dog by looking at just one pixel? Can we write rules to identify a cat or a dog in an image? Can the features and rules be extracted from the raw data, in this case, pixels?  Deep learning is really useful in this situation. It's a special kind of machine learning that trains super smart computer networks with lots of layers. And these networks can learn things all by themselves from pictures, like figuring out if a picture is a cat or a dog.  18:28 Lois: I know we’re going to be covering this in detail in an upcoming episode, but before we let you go, can you briefly tell us about generative AI? Nick: Generative AI, a subset of machine learning, creates diverse content like text, audio, images, and more. These models, often powered by neural networks, learn patterns from existing data to craft fresh and creative output. For instance, ChatGPT generates text-based responses by understanding patterns in text data that it's been trained on. Generative AI plays a vital role in various AI tasks requiring content creation and innovation.  19:07 Nikita: Thank you, Nick, for sharing your expertise with us. To learn more about AI, go to mylearn.oracle.com and search for the Oracle Cloud Infrastructure AI Foundations course. As you complete the course, you’ll find skill checks that you can attempt to solidify your learning.  Lois: And remember, the AI Foundations course on MyLearn also prepares you for the Oracle Cloud Infrastructure 2023 AI Foundations Associate certification. Both the course and the certification are free, so there’s really no reason NOT to take the leap into AI, right Niki? Nikita: That’s right, Lois! Lois: In our next episode, we will look at the fundamentals of machine learning. Until then, this is Lois Houston… Nikita: And Nikita Abraham signing off! 19:52 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
20m
23/01/2024

Everything You Need to Know to Get Certified on Oracle Autonomous Database

How do I get certified in Oracle Autonomous Database? What material can I use to prepare for it? What's the exam like? How long is the certification valid for?   If these questions have been keeping you up at night, then join Lois Houston and Nikita Abraham in their conversation with Senior Principal OCI Instructor Susan Jang to understand the process of getting certified and begin your learning adventure.   Oracle MyLearn: mylearn.oracle.com/ Oracle University Learning Community: education.oracle.com/ou-community LinkedIn: linkedin.com/showcase/oracle-university/ X (formerly Twitter): twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hi everyone! If you’ve listened to us these last few weeks, you’ll know we’ve been discussing Oracle Autonomous Database in detail. We looked at Autonomous Database on serverless and dedicated infrastructure. 00:51 Lois: That’s right, Niki. Then, last week, we explored Autonomous Database tools. Today, we thought we’d wrap up our focus on Autonomous Database by talking about the training offered by Oracle University, the associated certification, how to prepare for it, what you should do next, and more. Nikita: Yeah, we’ll get answers to all the big questions. And we’re going to get them from Susan Jang. Sue is a Senior Principal OCI Instructor with Oracle University. She has created and delivered training in Oracle databases and Oracle Cloud Infrastructure for over 20 years. Hi Sue! Thanks for joining us today. Sue: Happy to be here! 01:29 Lois: Sue, what training does Oracle have on Autonomous Database?   Sue: Oracle University offers a professional-level course called the Oracle Autonomous Database Administration Workshop. So, if you want to learn to deploy and administer autonomous databases, this is the one for you. You’ll explore the fundamentals of the autonomous databases, their features, and benefits. You’ll learn about the technical architecture, the tasks that are involved in creating an autonomous database on a shared and on a dedicated Exadata infrastructure. You’ll discover what is the Machine Learning, you’ll discover what is APEX, which is Application Express, and SQL Developer Web, which is all deployed with the Autonomous Database. So basically everything you need to take your skills to the next level and become a proficient database administrator is in this course. 02:28 Nikita: Who can take this course, Sue?    Sue: The course is really for anyone interested in Oracle Autonomous Database, whether you’re a database administrator, a cloud data management professional, or a consultant. The topics in the course include everything from the features of an Autonomous Database through provisioning, managing, and monitor of the database. Most people think that just because it is an Autonomous Database, Oracle will do everything for you, and there is nothing a DBA can do or needs to do. But that’s not true.   An Oracle Autonomous Database automates the day-to-day DBA tasks, like tuning the database to ensure it is running at performance level or that the backups are done successfully. By letting the Autonomous Database perform those tasks, it gives the database administrator time to fully understand the new features of an Oracle database and figure out how to implement the features that will benefit the DBA’s company. 03:30 Lois: Would a non-database administrator benefit from taking this course?   Sue: Yes, Lois. Oracle courses are designed in modules, so you can focus on the modules that meet your needs. For example, if you’re a senior technical manager, you may not need to manage and monitor the Autonomous Database. But still, it’s important to understand its features and architecture to know how other Oracle products integrate with the database. 03:57 Nikita: Right. Talking about the course itself, each module consists of videos that teach different concepts, right? Sue: Yes, Niki. Each video covers one topic. A group of topics, or I should say a group of related topics, makes up a module. We know your time is important to you, and your success is important to us. You don’t just want to spend time taking training. You want to know that you’re really understanding the concepts of what you are learning.  So to help you do this, we have skill checks at the end of most modules. You must successfully answer 80% of the questions to pass these knowledge checks. These checks are an excellent way to ensure that you’re on the right track and have the understanding of each module before you move on to the next one.  04:48   Lois: That’s great. And are there any other resources to help reinforce what’s been learned?   Sue: I grew up with this phrase from my Mom. Education was her career. I remember hearing, “I hear and I forget. I see and I remember. I do and I understand.” It’s important to us that you understand the concepts and can actually “do” or “perform” the tasks.  You'll find several demos in the different modules of the Autonomous Database Administration Workshop. These videos are where the instructor shows you how to perform the tasks so you can reinforce what you learned in the lessons. You’ll find demos on provisioning an autonomous database, creating an autonomous database clone, and configuring disaster recovery, and lots more.   Oracle also has what we call LiveLabs. These are a series of hands-on tutorials with step-by-step instructions to guide you through performing the tasks. 05:49 Nikita: I love the idea of LiveLabs. You can follow instructions on how to perform administrative tasks and then practice doing that on your own. Lois: Yeah, that’s fantastic. OK Sue, say I’ve taken the course. What do I do next?  Sue: Well, after you’ve taken the course, you’ll want to demonstrate your expertise with a certification. Because you want to get that better job. You want to increase your earning potential. You need to take the certification called the Oracle Autonomous Database Cloud Professional. We have a couple of resources to help you along the way to ensure you succeed in securing that certification. In MyLearn, the Oracle University online learning platform, you’ll see that the course, Oracle Autonomous Database Administration Workshop, falls within a learning path called Become an Oracle Autonomous Database Cloud Professional. The course is the first section of this learning path. The next section is a video describing the certification exam and how to prepare for it. The section after that is a practice exam. Now, though it doesn’t have the actual questions, you’ll find the exam will give you a good idea of the type of questions that will be asked in the exam.  07:10 Lois: OK, so now I’ve done all that, and I’m ready to validate my knowledge and expertise. Tell me more about the certification, Sue. Sue: To get the certification, you must take an online exam. The duration of the exam is 90 minutes. It’s a Multiple Choice format, and there are 60 questions to the exam.  By getting this certification, you’re demonstrating to the world that you have the knowledge to provision, manage, and monitor, as well as migrate workloads to the Autonomous Database, on both a shared as well as a dedicated Exadata infrastructure. You will show you have the understanding of the architect of the Autonomous Database and can successfully use the features as well as its workflow, and you are capable of using Autonomous Database tools in developing an Autonomous Database. 08:05 Nikita: Great! So what do I need to do to take the exam? Sue: We assume you’ve already taken the course (making sure that you’re up to date with the training), that you’ve taken the time to study the topics in depth rather than memorizing superficial information just to pass the exam, looked at the available preparation material, and you’ve also taken the practice exam. I highly recommend that you have the hands-on experience or practice on an Autonomous Database before you take the certification exam. 08:38 Nikita: Hold on, Sue. You said to make sure we’re up to date with the training. How do I do that? Sue: Technology is ever-changing, and at Oracle, we continually enhance our products to provide features that make them faster or more straightforward to use. So, if you’re taking a course, you may find a small tag that says “New” next to a topic. That indicates that there are some new training that’s been added to the course. So what I’m trying to say is if you’re looking to take some certification, check the course before you register for the exam and to see if there are any “New” tags. If you find them, you can learn what’s new and not have to go through the entire course again. This way, you’re up to date with the training! 09:25 Nikita: Ok. Got it. Tell us more about the certification, Sue. Sue: If you’re ready, search for the Become An Oracle Autonomous Database Cloud Professional learning path in MyLearn and scroll down to the Oracle Database Cloud Professional exam. Click on the “Register Now” button. You’ll be taken to a page where you’ll see the exam overview, the resources to help you prepare for the exam, a button to register for the exam, and things to do before your exam session. It will also describe what happens after the exam and some exam policies, like what to do if you need to reschedule your exam. When you’re ready to take the exam, you can schedule the date and time according to when it’s convenient for you.  10:15 Lois: What’s the actual experience of taking the exam like? Sue: It’s pretty straightforward. You want to prepare your system a day or two before the exam. You want to ensure you can connect successfully to the test site and that your laptop is plugged in and not running on battery. You want to make sure all other applications are closed before you perform the system test. Now, the system test is really with the test site and consists of testing your microphone, an internet speed test, and your video. You will also be asked to do a test-exam simulation. You will need to be able to download the simulation exam and answer a few simple true or false questions. Once you have successfully done that, you’re ready to take the test on your laptop on the actual day of the test. Now, on the day of the test, set up your test environment. For your test environment, what it really entails is that you have an environment that you do not have anything on your desk. You cannot have a second monitor. And it’s best to have a clear wall behind you so that the proctor can see there is nothing around you. And don’t forget to turn off your mobile device. 11:34 Lois: Ok, I’ve taken the test, and I passed. Wohoo! What happens now? Sue: When you pass the exam, you will receive an email from Oracle with your results as well as a link to Oracle CertView. This is the Oracle certification candidate portal. In CertView, you can download and print your eCertificate. You can share your newly earned badge on places like Facebook, Twitter, and LinkedIn, or even email your employer and others a secure link that they can use to confirm and validate your credentials. 12:11 Nikita: Can anyone take the certification? Sue: Yes, Niki. This certification is available to all candidates, including on-premise database administrators, cloud data management professionals, and consultants. 12:24 Lois: How long is the certification valid? What happens when it expires? Sue: Certain Oracle credentials require periodic recertification for Oracle to recognize them as "active." For such credentials, you must upgrade to a current version within 12 months following the Oracle credential retirement to keep your certification active. 12:51 Are you planning to become an Oracle Certified Professional this year? Whether you're a seasoned IT pro or just starting your career, getting certified can give you a significant boost. And don't worry, we've got your back. Join us at one of our cert prep live events in the Oracle University Learning Community. You'll get insider tips from seasoned experts and learn from other professionals' experiences. Plus, once you've earned your certification, you'll become part of our exclusive forum for Oracle-certified users. So, what are you waiting for? Head over to mylearn.oracle.com and create an account to jump-start your journey towards certification today! 13:35 Nikita: Welcome back. Sue, what other training can I take after Autonomous Database?    Sue: Now that you have a strong foundation in the database, there is so much more that you can learn in Oracle. You can consider Exadata if you work on a high-performance data workload that’s running mission-critical applications. Look for a learning path called Become an Exadata Service Cloud Administrator, in MyLearn, to help you with that. GoldenGate is also a good choice if you work with data that needs to be shared and replicated, both locally as well as globally. The course for this is called Oracle GoldenGate 19c: Administration/Implementation.   A hot topic in technology today is generative AI (Artificial Intelligence). You want to learn how to implement data security on different levels when it needs to be shared with large language model providers.  Perhaps venture beyond the database and learn about Oracle Cloud Infrastructure and how its components and the many cloud services work together. Just go to mylearn.oracle.com, and in the field where you see “What do you want to learn?” type in what interests you and let your learning adventure begin! 14:59 Lois: And since you brought up AI, Sue, this is the perfect time to mention that we’ll be focusing on it for the next couple of weeks. We’ll be speaking to some of our colleagues on topics like artificial intelligence, machine learning, deep learning, generative AI, the OCI AI portfolio and more, but we’ll talk more about that next week. Nikita: Yeah, can’t wait for that. Thank you so much, Sue, for giving us your time today. Sue: Thanks for having me! Lois: Until next time, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 15:29 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click  Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate  and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
15m
16/01/2024

Autonomous Database Tools

In this episode, hosts Lois Houston and Nikita Abraham speak with Oracle Database experts about the various tools you can use with Autonomous Database, including Oracle Application Express (APEX), Oracle Machine Learning, and more.   Oracle MyLearn: https://mylearn.oracle.com/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Tamal Chatterjee, and the OU Studio Team for helping us create this episode.   ---------------------------------------------------------   Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:26 Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hi everyone! We spent the last two episodes exploring Oracle Autonomous Database’s deployment options: Serverless and Dedicated. Today, it’s tool time! Lois: That’s right, Niki. We’ll be chatting with some of our Database experts on the tools that you can use with the Autonomous Database. We’re going to hear from Patrick Wheeler, Kay Malcolm, Sangeetha Kuppuswamy, and Thea Lazarova. Nikita: First up, we have Patrick, to take us through two important tools. Patrick, let’s start with Oracle Application Express. What is it and how does it help developers? 01:15 Patrick: Oracle Application Express, also known as APEX-- or perhaps APEX, we're flexible like that-- is a low-code development platform that enables you to build scalable, secure, enterprise apps with world-class features that can be deployed anywhere. Using APEX, developers can quickly develop and deploy compelling apps that solve real problems and provide immediate value. You don't need to be an expert in a vast array of technologies to deliver sophisticated solutions. Focus on solving the problem, and let APEX take care of the rest. 01:52 Lois: I love that it’s so easy to use. OK, so how does Oracle APEX integrate with Oracle Database? What are the benefits of using APEX on Autonomous Database? Patrick: Oracle APEX is a fully supported, no-cost feature of Oracle Database. If you have Oracle Database, you already have Oracle APEX. You can access APEX from database actions. Oracle APEX on Autonomous Database provides a preconfigured, fully managed, and secure environment to both develop and deploy world-class applications. Oracle takes care of configuration, tuning, backups, patching, encryption, scaling, and more, leaving you free to focus on solving your business problems. APEX enables your organization to be more agile and develop solutions faster for less cost and with greater consistency. You can adapt to changing requirements with ease, and you can empower professional developers, citizen developers, and everyone else. 02:56 Nikita: So you really don’t need to have a lot of specializations or be an expert to use APEX. That’s so cool! Now, what are the steps involved in creating an application using APEX?  Patrick: You will be prompted to log in as the administrator at first. Then, you may create workspaces for your respective users and log in with those associated credentials. Application Express provides you with an easy-to-use, browser-based environment to load data, manage database objects, develop REST interfaces, and build applications which look and run great on both desktop and mobile devices. You can use APEX to develop a wide variety of solutions, import spreadsheets, and develop a single source of truth in minutes. Create compelling data visualizations against your existing data, deploy productivity apps to elegantly solve a business need, or build your next mission-critical data management application. There are no limits on the number of developers or end users for your applications. 04:01 Lois: Patrick, how does APEX use SQL? What role does SQL play in the development of APEX applications?  Patrick: APEX embraces SQL. Anything you can express with SQL can be easily employed in an APEX application. Application Express also enables low-code development, providing developers with powerful data management and data visualization components that deliver modern, responsive end user experiences out-of-the-box. Instead of writing code by hand, you're able to use intelligent wizards to guide you through the rapid creation of applications and components. Creating a new application from APEX App Builder is as easy as one, two, three. One, in App Builder, select a project name and appearance. Two, add pages and features to the app. Three, finalize settings, and click Create. 05:00 Nikita: OK. So, the other tool I want to ask you about is Oracle Machine Learning. What can you tell us about it, Patrick? Patrick: Oracle Machine Learning, or OML, is available with Autonomous Database. A new capability that we've introduced with Oracle Machine Learning is called Automatic Machine Learning, or AutoML. Its goal is to increase data scientist productivity while reducing overall compute time. In addition, AutoML enables non-experts to leverage machine learning by not requiring deep understanding of the algorithms and their settings. 05:37 Lois: And what are the key functions of AutoML? Patrick: AutoML consists of three main functions: Algorithm Selection, Feature Selection, and Model Tuning. With Automatic Algorithm Selection, the goal is to identify the in-database algorithms that are likely to achieve the highest model quality. Using metalearning, AutoML leverages machine learning itself to help find the best algorithm faster than with exhaustive search. With Automatic Feature Selection, the goal is to denoise data by eliminating features that don't add value to the model. By identifying the most predicted features and eliminating noise, model accuracy can often be significantly improved with a side benefit of faster model building and scoring. Automatic Model Tuning tunes algorithm hyperparameters, those parameters that determine the behavior of the algorithm, on the provided data. Auto Model Tuning can significantly improve model accuracy while avoiding manual or exhaustive search techniques, which can be costly both in terms of time and compute resources. 06:44 Lois: How does Oracle Machine Learning leverage the capabilities of Autonomous Database? Patrick: With Oracle Machine Learning, the full power of the database is accessible with the tremendous performance of parallel processing available, whether the machine learning algorithm is accessed via native database SQL or with OML4Py through Python or R.  07:07 Nikita: Patrick, talk to us about the Data Insights feature. How does it help analysts uncover hidden patterns and anomalies? Patrick: A feature I wanted to call the electromagnet, but they didn't let me. An analyst's job can often feel like looking for a needle in a haystack. So throw the switch and all that metallic stuff is going to slam up onto that electromagnet. Sure, there are going to be rusty old nails and screws and nuts and bolts, but there are going to be a few needles as well. It's far easier to pick the needles out of these few bits of metal than go rummaging around in a pile of hay, especially if you have allergies. That's more or less how our Insights tool works. Load your data, kick off a query, and grab a cup of coffee. Autonomous Database does all the hard work, scouring through this data looking for hidden patterns, anomalies, and outliers. Essentially, we run some analytic queries that predict expected values. And where the actual values differ significantly from expectation, the tool presents them here. Some of these might be uninteresting or obvious, but some are worthy of further investigation. You get this dashboard of various exceptional data patterns. Drill down on a specific gauge in this dashboard and significant deviations between actual and expected values are highlighted. 08:28 Lois: What a useful feature! Thank you, Patrick. Now, let’s discuss some terms and concepts that are applicable to the Autonomous JSON Database with Kay. Hi Kay, what’s the main focus of the Autonomous JSON Database? How does it support developers in building NoSQL-style applications? Kay: Autonomous Database supports the JavaScript Object Notation, also known as JSON, natively in the database. It supports applications that use the SODA API to store and retrieve JSON data or SQL queries to store and retrieve data stored in JSON-formatted data.  Oracle AJD is Oracle ATP, Autonomous Transaction Processing, but it's designed for developing NoSQL-style applications that use JSON documents. You can promote an AJD service to ATP. 09:22 Nikita: What makes the development of NoSQL-style, document-centric applications flexible on AJD?  Kay: Development of these NoSQL-style, document-centric applications is particularly flexible because the applications use schemaless data. This lets you quickly react to changing application requirements. There's no need to normalize the data into relational tables and no impediment to changing the data structure or organization at any time, in any way. A JSON document has its own internal structure, but no relation is imposed on separate JSON documents. Nikita: What does AJD do for developers? How does it actually help them? Kay: So Autonomous JSON Database, or AJD, is designed for you, the developer, to allow you to use simple document APIs and develop applications without having to know anything about SQL. That's a win. But at the same time, it does give you the ability to create highly complex SQL-based queries for reporting and analysis purposes. It has built-in binary JSON storage type, which is extremely efficient for searching and for updating. It also provides advanced indexing capabilities on the actual JSON data. It's built on Autonomous Database, so that gives you all of the self-driving capabilities we've been talking about, but you don't need a DBA to look after your database for you. You can do it all yourself. 11:00 Lois: For listeners who may not be familiar with JSON, can you tell us briefly what it is?  Kay: So I mentioned this earlier, but it's worth mentioning again. JSON stands for JavaScript Object Notation. It was originally developed as a human readable way of providing information to interchange between different programs. So a JSON document is a set of fields. Each of these fields has a value, and those values can be of various data types. We can have simple strings, we can have integers, we can even have real numbers. We can have Booleans that are true or false. We can have date strings, and we can even have the special value null. Additionally, values can be objects, and objects are effectively whole JSON documents embedded inside a document. And of course, there's no limit on the nesting. You can nest as far as you like. Finally, we can have a raise, and a raise can have a list of scalar data types or a list of objects. 12:13 Nikita: Kay, how does the concept of schema apply to JSON databases? Kay: Now, JSON documents are stored in something that we call collections. Each document may have its own schema, its own layout, to the JSON. So does this mean that JSON document databases are schemaless? Hmmm. Well, yes. But there's nothing to fear because you can always use a check constraint to enforce a schema constraint that you wish to introduce to your JSON data. Lois: Kay, what about indexing capabilities on JSON collections? Kay: You can create indexes on a JSON collection, and those indexes can be of various types, including our flexible search index, which indexes the entire content of the document within the JSON collection, without having to know anything in advance about the schema of those documents.  Lois: Thanks Kay! 13:18 AI is being used in nearly every industry—healthcare, manufacturing, retail, customer service, transportation, agriculture, you name it! And, it’s only going to get more prevalent and transformational in the future. So it’s no wonder that AI skills are the most sought after by employers.  We’re happy to announce a new OCI AI Foundations certification and course that is available—for FREE! Want to learn about AI? Then this is the best place to start! So, get going! Head over to mylearn.oracle.com to find out more.  13:54 Nikita: Welcome back! Sangeetha, I want to bring you in to talk about Oracle Text. Now I know that Oracle Database is not only a relational store but also a document store. And you can load text and JSON assets along with your relational assets in a single database.  When I think about Oracle and databases, SQL development is what immediately comes to mind. So, can you talk a bit about the power of SQL as well as its challenges, especially in schema changes? Sangeetha: Traditionally, Oracle has been all about SQL development. And with SQL development, it's an incredibly powerful language. But it does take some advanced knowledge to make the best of it. So SQL requires you to define your schema up front. And making changes to that schema could be a little tricky and sometimes highly bureaucratic task. In contrast, JSON allows you to develop your schema as you go--the schemaless, perhaps schema-later model. By imposing less rigid requirements on the developer, it allows you to be more fluid and Agile development style. 15:09 Lois: How does Oracle Text use SQL to index, search, and analyze text and documents that are stored in the Oracle Database? Sangeetha: Oracle Text can perform linguistic analyses on documents as well as search text using a variety of strategies, including keyword searching, context queries, Boolean operations, pattern matching, mixed thematic queries, like HTML/XML session searching, and so on. It can also render search results in various formats, including unformatted text, HTML with term highlighting, and original document format. Oracle Text supports multiple languages and uses advanced relevance-ranking technology to improve search quality. Oracle Text also offers advantage features like classification, clustering, and support for information visualization metaphors. Oracle Text is now enabled automatically in Autonomous Database. It provides full-text search capabilities over text, XML, JSON content. It also could extend current applications to make better use of textual fields. It builds new applications specifically targeted at document searching. Now, all of the power of Oracle Database and a familiar development environment, rock-solid autonomous database infrastructure for your text apps, we can deal with text in many different places and many different types of text. So it is not just in the database. We can deal with data that's outside of the database as well. 17:03 Nikita: How does it handle text in various places and formats, both inside and outside the database? Sangeetha: So in the database, we can be looking a varchar2 column or LOB column or binary LOB columns if we are talking about binary documents such as PDF or Word. Outside of the database, we might have a document on the file system or out on the web with URLs pointing out to the document. If they are on the file system, then we would have a file name stored in the database table. And if they are on the web, then we should have a URL or a partial URL stored in the database. And we can then fetch the data from the locations and index it in the term documents format. We recognize many different document formats and extract the text from them automatically. So the basic forms we can deal with-- plain text, HTML, JSON, XML, and then formatted documents like Word docs, PDF documents, PowerPoint documents, and also so many different types of documents. All of those are automatically handled by the system and then processed into the format indexing. And we are not restricted by the English either here. There are various stages in the index pipeline. A document starts one, and it's taken through the different stages so until it finally reaches the index. 18:44 Lois: You mentioned the indexing pipeline. Can you take us through it? Sangeetha: So it starts with a data store. That's responsible for actually reaching the document. So once we fetch the document from the data store, we pass it on to the filter. And now the filter is responsible for processing binary documents into indexable text. So if you have a PDF, let's say a PDF document, that will go through the filter. And that will extract any images and return it into the stream of HTML text ready for indexing. Then we pass it on to the sectioner, which is responsible for identifying things like paragraphs and sentences. The output from the section is fed onto the lexer. The lexer is responsible for dividing the text into indexable words. The output of the lexer is fed into the index engine, which is responsible for laying out to the indexes on the disk. Storage, word list, and stop list are some additional inputs there. So storage tells exactly how to lay out the index on disk. Word list which has special preferences like desegmentation. And then stop is a list word that we don't want to index. So each of these stages and inputs can be customized. Oracle has something known as the extensibility framework, which originally was designed to allow people to extend capabilities of these products by adding new domain indexes. And this is what we've used to implement Oracle Text. So when kernel sees this phrase INDEXTYPE ctxsys.context, it knows to handle all of the hard work creating the index. 20:48 Nikita: Other than text indexing, Oracle Text offers additional operations, right? Can you share some examples of these operations? Sangeetha: So beyond the text index, other operations that we can do with the Oracle Text, some of which are search related. And some examples of that are these highlighting markups and snippets. Highlighting and markup are very similar. They are ways of fetching these results back with the search. And then it's marked up with highlighting within the document text. Snippet is very similar, but it's only bringing back the relevant chunks from the document that we are searching for. So rather than getting the whole document back to you, just get a few lines showing this in a context and the theme and extraction. So Oracle Text is capable of figuring out what a text is all about. We have a very large knowledge base of the English language, which will allow you to understand the concepts and the themes in the document. Then there's entity extraction, which is the ability to find out people, places, dates, times, zip codes, et cetera in the text. So this can be customized with your own user dictionary and your own user rules. 22:14 Lois: Moving on to advanced functionalities, how does Oracle Text utilize machine learning algorithms for document classification? And what are the key types of classifications? Sangeetha: The text analytics uses machine learning algorithms for document classification. We can process a large set of data documents in a very efficient manner using Oracle's own machine learning algorithms. So you can look at that as basically three different headings. First of all, there's classification. And that comes in two different types-- supervised and unsupervised. The supervised classification which means in this classification that it provides the training set, a set of documents that have already defined particular characteristics that you're looking for. And then there's unsupervised classification, which allows your system itself to figure out which documents are similar to each other. It does that by looking at features within the documents. And each of those features are represented as a dimension in a massively high dimensional feature space in documents, which are clustered together according to that nearest and nearness in the dimension in the feature space. Again, with the named entity recognition, we've already talked about that a little bit. And then finally, there is a sentiment analysis, the ability to identify whether the document is positive or negative within a given particular aspect. 23:56 Nikita: Now, for those who are already Oracle database users, how easy is it to enable text searching within applications using Oracle Text? Sangeetha: If you're already an Oracle database user, enabling text searching within your applications is quite straightforward. Oracle Text uses the same SQL language as the database. And it integrates seamlessly with your existing SQL. Oracle Text can be used from any programming language which has SQL interface, meaning just about all of them.  24:32 Lois: OK from Oracle Text, I’d like to move on to Oracle Spatial Studio. Can you tell us more about this tool? Sangeetha: Spatial Studio is a no-code, self-service application that makes it easy to access the sorts of spatial features that we've been looking at, in particular, in order to get that data prepared to use with spatial, visualizing results in maps and tables, and also doing the analysis and sharing results. Spatial Studios is encoded at no extra cost with Autonomous Database. The studio web application itself has no additional cost and it runs on the server. 25:13 Nikita: Let’s talk a little more about the cost. How does the deployment of Spatial Studio work, in terms of the server it runs on?  Sangeetha: So, the server that it runs on, if it's running in the Cloud, that computing node, it would have some cost associated with it. It can also run on a free tier with a very small shape, just for evaluation and testing.  Spatial Studio is also available on the Oracle Cloud Marketplace. And there are a couple of self-paced workshops that you can access for installing and using Spatial Studio. 25:47 Lois: And how do developers access and work with Oracle Autonomous Database using Spatial Studio? Sangeetha: Oracle Spatial Studio allows you to access data in Oracle Database, including Oracle Autonomous Database. You can create connections to Oracle Autonomous Databases, and then you work with the data that's in the database. You can also see Spatial Studio to load data to Oracle Database, including Oracle Autonomous Database. So, you can load these spreadsheets in common spatial formats. And once you've loaded your data or accessed data that already exists in your Autonomous Database, if that data does not already include native geometrics, Oracle native geometric type, then you can prepare the data if it has addresses or if it has latitude and longitude coordinates as a part of the data. 26:43 Nikita: What about visualizing and analyzing spatial data using Spatial Studio? Sangeetha: Once you have the data prepared, you can easily drag and drop and start to visualize your data, style it, and look at it in different ways. And then, most importantly, you can start to ask spatial questions, do all kinds of spatial analysis, like we've talked about earlier. While Spatial Studio provides a GUI that allows you to perform those same kinds of spatial analysis. And then the results can be dropped on the map and visualized so that you can actually see the results of spatial questions that you're asking. When you've done some work, you can save your work in a project that you can return to later, and you can also publish and share the work you've done. 27:34 Lois: Thank you, Sangeetha. For the final part of our conversation today, we’ll talk with Thea. Thea, thanks so much for joining us. Let's get the basics out of the way. How can data be loaded directly into Autonomous Database? Thea: Data can be loaded directly to ADB through applications such as SQL Developer, which can read data files, such as txt and xls, and load directly into tables in ADB. 27:59 Nikita: I see. And is there a better method to load data into ADB? Thea: A more efficient and preferred method for loading data into ADB is to stage the data cloud object store, preferably Oracle's, but also supported our Amazon S3 and Azure Blob Storage. Any file type can be staged in object store. Once the data is in object store, Autonomous Database can access a directly. Tools can be used to facilitate the data movement between object store and the database. 28:27 Lois: Are there specific steps or considerations when migrating a physical database to Autonomous? Thea: A physical database can simply be migrated to autonomous because database must be converted to pluggable database, upgraded to 19C, and encrypted. Additionally, any changes to an Oracle-shipped stored procedures or views must be found and reverted. All uses of container database admin privileges must be removed. And all legacy features that are not supported must be removed, such as legacy LOBs. Data Pump, expdp/impdp must be used for migrating databases versions 10.1 and above to Autonomous Database as it addresses the issues just mentioned. For online migrations, GoldenGate must be used to keep old and new database in sync. 29:15 Nikita: When you’re choosing the method for migration and loading, what are the factors to keep in mind? Thea: It's important to segregate the methods by functionality and limitations of use against Autonomous Database. The considerations are as follows. Number one, how large is the database to be imported? Number two, what is the input file format? Number three, does the method support non-Oracle database sources? And number four, does the methods support using Oracle and/or third-party object store? 29:45 Lois: Now, let’s move on to the tools that are available. What does the DBMS_CLOUD functionality do? Thea: The Oracle Autonomous Database has built-in functionality called DBMS_CLOUD specifically designed so the database can move data back and forth with external sources through a secure and transparent process. DBMS_CLOUD allows data movement from the Oracle object store. Data from any application or data source export to text-- .csv or JSON-- output from third-party data integration tools. DBMS_CLOUD can also access data stored on Object Storage from the other clouds, AWS S3 and Azure Blob Storage. DBMS_CLOUD does not impose any volume limit, so it's the preferred method to use. SQL*Loader can be used for loading data located on the local client file systems into Autonomous Database. There are limits around OS and client machines when using SQL*Loader. 30:49 Nikita: So then, when should I use Data Pump and SQL Developer for migration? Thea: Data Pump is the best way to migrate a full or part database into ADB, including databases from previous versions. Because Data Pump will perform the upgrade as part of the export/import process, this is the simplest way to get to ADB from any existing Oracle Database implementation. SQL Developer provides a GUI front end for using data pumps that can automate the whole export and import process from an existing database to ADB. SQL Developer also includes an import wizard that can be used to import data from several file types into ADB. A very common use of this wizard is for importing Excel files into ADW. Once a credential is created, it can be used to access a file as an external table or to ingest data from the file into a database table. DBMS_CLOUD makes it much easier to use external tables, and the organization external needed in other versions of the Oracle Database are not needed. 31:54 Lois: Thea, what about Oracle Object Store? How does it integrate with Autonomous Database, and what advantages does it offer for staging data? Thea: Oracle Object Store is directly integrated into Autonomous Database and is the best option for staging data that will be consumed by ADB. Any file type can be stored in object store, including SQL*Loader files, Excel, JSON, Parquet, and, of course, Data Pump DMP files. Flat files stored on object store can also be used as Oracle Database external tables, so they can queried directly from the database as part of a normal DML operation. Object store is a separate bin storage allocated to the Autonomous Database for database Object Storage, such as tables and indexes. That storage is part of the Exadata system Autonomous Database runs on, and it is automatically allocated and managed. Users do not have direct access to that storage. 32:50 Nikita: I know that one of the main considerations when loading and updating ADB is the network latency between the data source and the ADB. Can you tell us more about this? Thea: Many ways to measure this latency exist. One is the website cloudharmony.com, which provides many real-time metrics for connectivity between the client and Oracle Cloud Services. It's important to run these tests when determining with Oracle Cloud service location will provide the best connectivity. The Oracle Cloud Dashboard has an integrated tool that will provide real time and historic latency information between your existing location and any specified Oracle Data Center. When migrating data to Autonomous Database, table statistics are gathered automatically during direct-path load operations. If direct-path load operations are not used, such as with SQL Developer loads, the user can gather statistics manually as needed. 33:44 Lois: And finally, what can you tell us about the Data Migration Service? Thea: Database Migration Service is a fully managed service for migrating databases to ADB. It provides logical online and offline migration with minimal downtime and validates the environment before migration. We have a requirement that the source database is on Linux. And it would be interesting to see if we are going to have other use cases that we need other non-Linux operating systems. This requirement is because we are using SSH to directly execute commands on the source database. For this, we are certified on the Linux only. Target in the first release are Autonomous databases, ATP, or ADW, both serverless and dedicated. For agent environment, we require Linux operating system, and this is Linux-safe. In general, we're targeting a number of different use cases-- migrating from on-premise, third-party clouds, Oracle legacy clouds, such as Oracle Classic, or even migrating within OCI Cloud and doing that with or without direct connection. If you have any direct connection behind a firewall, we support offline migration. If you have a direct connection, we support both offline and online migration. For more information on all migration approaches are available for your particular situation, check out the Oracle Cloud Migration Advisor. 35:06 Nikita: I think we can wind up our episode with that. Thanks to all our experts for giving us their insights.  Lois: To learn more about the topics we’ve discussed today, visit mylearn.oracle.com and search for the Oracle Autonomous Database Administration Workshop. Remember, all of the training is free, so dive right in! Join us next week for another episode of the Oracle University Podcast. Until then, Lois Houston… Nikita: And Nikita Abraham, signing off! 35:35 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
36m
09/01/2024

Autonomous Database on Dedicated Infrastructure

The Oracle Autonomous Database Dedicated deployment is a good choice for customers who want to implement a private database cloud in their own dedicated Exadata infrastructure. That dedicated infrastructure can either be in the Oracle Public Cloud or in the customer's own data center via Oracle Exadata Cloud@Customer.   In a dedicated environment, the Exadata infrastructure is entirely dedicated to the subscribing customer, isolated from other cloud tenants, with no shared processor, storage, and memory resource.   In this episode, hosts Lois Houston and Nikita Abraham speak with Oracle Database experts about how Autonomous Database Dedicated offers greater control of the software and infrastructure life cycle, customizable policies for separation of database workload, software update schedules and versioning, workload consolidation, availability policies, and much more.   Oracle MyLearn: https://mylearn.oracle.com/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Tamal Chatterjee, and the OU Studio Team for helping us create this episode.   -------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started. 00:26 Nikita: Hello and welcome to the Oracle University Podcast. I’m Nikita Abraham, Principal Technical Editor with Oracle University, and I’m joined by Lois Houston, Director of Innovation Programs. Lois: Hi there! This is our second episode on Oracle’s Autonomous Database, and today we’re going to spend time discussing Autonomous Database on Dedicated Infrastructure. We’ll be talking with three of our colleagues: Maria Colgan, Kamryn Vinson, and Kay Malcolm. 00:53 Nikita: Maria is a Distinguished Product Manager for Oracle Database, Kamryn is a Database Product Manager, and Kay is a Senior Director of Database Product Management.  Lois: Hi Maria! Thanks for joining us today. We know that Oracle Autonomous Database offers two deployment choices: serverless and dedicated Exadata infrastructure. We spoke about serverless infrastructure last week but for anyone who missed that episode, can you give us a quick recap of what it is? 01:22 Maria: With Autonomous Database Serverless, Oracle automates all aspects of the infrastructure and database management for you. That includes provisioning, configuring, monitoring, backing up, and tuning. You simply select what type of database you want, maybe a data warehouse, transaction processing, or a JSON document store, which region in the Oracle Public Cloud you want that database deployed, and the base compute and storage resources necessary. Oracle automatically takes care of everything else. Once provisioned, the database can be instantly scaled through our UI, our APIs, or automatically based on your workload needs. All scaling activities happen completely online while the database remains open for business. 02:11 Nikita: Ok, so now that we know what serverless is, let’s move on to dedicated infrastructure. What can you tell us about it? Maria: Autonomous Database Dedicated allows customers to implement a private database cloud running on their own dedicated Exadata infrastructure. That dedicated infrastructure can be in Oracle’s Public Cloud or in the customer's own data center via Oracle Exadata Cloud@Customer. It makes an ideal platform to consolidate multiple databases regardless of their workload type or their size. And it also allows you to offer database as a service within your enterprise. 02:50 Lois: What are the primary benefits of Autonomous Database Dedicated infrastructure? Maria: With the dedicated deployment option, you must first subscribe to Dedicated Exadata Cloud Infrastructure that is isolated from other tenants with no shared processors, memory, network, or storage resources. This infrastructure choice offers greater control of both the software and the infrastructure life cycle. Customers can specify their own policies for workload separation, software update schedules, and availability. One of the key benefits of an autonomous database is a lower total cost of ownership through more automation and operational delegation to Oracle. Remember it’s a fully managed service. All database operations, such as backup, software updates, upgrades, OS maintenance, incident management, and health monitoring, will be automatically done for you by Oracle. Its maximum availability architecture protects you from any hardware failures and in the event of a full outage, the service will be automatically failed over to your standby site. Built-in application continuity ensures zero downtime during the standard software update or in the event of a failover.  04:09 Nikita: And how is this billed?  Maria: Autonomous Database also has true pay-per-use billing so even when autoscale is enabled, you’ll only pay for those additional resources when you use them. And we make it incredibly simple to develop on this environment with managed developer add-ons like our low code development environment, APEX, and our REST data services. This means you don’t need any additional development environments in order to get started with a new application. 04:40 Lois: Ok. So, it looks like the dedicated option offers more control and customization. Maria, how do we access a dedicated database over a network? Maria: The network path is through a VCN, or Virtual Cloud Network, and the subnet that's defined by the Exadata infrastructure hosting the database. By default, this subnet is defined as private, meaning, there's no public internet access to those databases. This ensures only your company can access your Exadata infrastructure and your databases. Autonomous Database Dedicated can also take advantage of network services provided by OCI, including subnets or VCN peering, as well as connections to on-prem databases through the IP secure VPN and FastConnect dedicated corporate network connections. 05:33 Maria: You can also take advantage of the Oracle Microsoft partnership that enables customers to connect their Oracle Cloud Infrastructure resources and Microsoft Azure resources through a dedicated private connection. However, for some customers, a move to the public cloud is just not possible. Perhaps it's due to industry regulations, performance concerns, or integration with legacy on-prem applications. For these types of customers, Exadata Cloud@Customer should meet their requirements for strict data sovereignty and security by delivering high-performance Exadata Cloud Services capabilities in their data center behind their own firewall. 06:16 Nikita: What are the benefits of Autonomous Database on Exadata Cloud@Customer? How’s it different? Maria: Autonomous Database on Exadata Cloud@Customer provides the same service as Autonomous Database Dedicated in the public cloud. So you get the same simplicity, agility, and performance, and elasticity that you get in the cloud. But it also provides a very fast and simple transition to an autonomous cloud because you can easily migrate on-prem databases to Exadata Cloud@Customer. Once the database is migrated, any existing applications can simply reconnect to that new database and run without any application changes being needed. And the data will leave your data center, so making it a very safe way to adopt a cloud model. 07:04 Lois: So, how do we manage communication to and from the public cloud? Maria: Each Cloud@Customer rack includes two local control plane servers to manage the communication to and from the public cloud. The local control plane acts on behalf of requests from the public cloud, keeping communications consolidated and secure. Platform control plane commands are sent to the Exadata Cloud@Customer system through a dedicated WebSocket secure tunnel.  Oracle Cloud operations staff use that same tunnel to monitor the autonomous database on Exadata Cloud@Customer both for maintenance and for troubleshooting. The two remote, control plane servers installed in the Exadata Cloud@Customer rack host that secure tunnel endpoint and act as a gateway for access to the infrastructure. They also host components that orchestrate the cloud automation, aggregates and routes telemetry messages from the Exadata Cloud@Customer platform to the Oracle Support Service infrastructure. And they also host images for server patching. 08:13 Maria: The Exadata Database Server is connected to the customer-managed switches via either 10 gigabit or 25 gigabit Ethernet. Customers have access to the customer Virtual Machine, or VM, via a pair of layer 2 network connections that are implemented as Virtual Network Interface Cards, or vNICs. They're also tagged VLAN. The physical network connections are implemented for high availability in an active standby configuration. Autonomous Database on Exadata Cloud@Customer provides the best of both worlds-- all of the automation including patching, backing up, scaling, and management of a database that you get with a cloud service, but without the data ever leaving the customer's data center. 09:01 Nikita: That's interesting. And, what happens if a dedicated database loses network connectivity to the OCI control plane? Maria: In the event an autonomous database on Exadata Cloud@Customer loses network connectivity to the OCI control plane, the Autonomous Database will actually continue to be available for your applications. And operations such as backups and autoscaling will not be impacted in that loss of network connectivity. However, the management and monitoring of the Autonomous Database via the OCI console and APIs as well as access by the Oracle Cloud operations team will not be available until that network is reconnected. 09:43 Maria: The capability suspended in the case of a lost network connection include, as I said, infrastructure management-- so that's the manual scaling of an Autonomous Database via the UI or our OCI CLI, or REST APIs, as well as Terraform scripts. They won't be available. Neither will the ability for Oracle Cloud ops to access and perform maintenance activities, such as patching. Nor will we be able to monitor the Oracle infrastructure during the time where the system is not connected. 10:20 Lois: That’s good to know, Maria. What about data encryption and backup options? Maria: All Oracle Autonomous Databases encrypt data at REST. Data is automatically encrypted as it's written to the storage. But this encryption is transparent to authorized users and applications because the database automatically decrypts the data when it's being read from the storage. There are several options for backing up the Autonomous Database Cloud@Customer including using a Zero Data Loss Recovery Appliance, or ZDLRA. You can back it up to locally mounted NFS storage or back it up to the Oracle Public Cloud. 10:57 Nikita: I want to ask you about the typical workflow for Autonomous Database Dedicated infrastructure. What are the main steps here? Maria: In the typical workflow, the fleet administrator role performs the following steps. They provision the Exadata infrastructure by specifying its size, availability domain, and region within the Oracle Cloud. Once the hardware has been provisioned, the fleet administrator partitions the system by provisioning clusters and container databases. Then the developers, DBAs, or anyone who needs a database can provision databases within those container databases. Billing is based on the size of the Exadata infrastructure that's provisioned. So whether that's a quarter rack, half rack, or full rack. It also depends on the number of CPUs that are being consumed. Remember, it's also possible for customers to use their existing Oracle database licenses with this service to reduce the cost. 11:53 Lois: And what Exadata infrastructure models and shapes does Autonomous Database Dedicated support? Maria: That's the X7, X8, and X8M and you can get all of those in either a quarter, half, or full Exadata rack. Currently, you can create a maximum of 12 VM clusters on an Autonomous Database Dedicated infrastructure. We also advise that you limit the number of databases you provision to meet your preferred SLA. To meet the high availability SLA, we recommend a maximum of 100 databases. To meet the extreme availability SLA, we recommend a maximum of 25 databases. 12:35 Nikita: Ok, so now that I know all this, how do I actually get started with Autonomous Database on dedicated infrastructure? Maria: You need to increase your service limit to include that Exadata infrastructure and then you need to create the fleet and DBA service roles. You also need to create the necessary network model, VM clusters, and container databases for your organization. Finally, you need to provide access to the end users who want to create and use those Autonomous databases. Autonomous Database requires a subscription to that Exadata infrastructure for a minimum of 48 hours. But once subscribed, you can test out ideas and then terminate the subscription with no ongoing costs. While subscribed, you can control where you place the resources to perhaps manage latency sensitive applications. 13:29 Maria: You can also have control over patching schedules, software versions, so you can be sure that you're testing exactly what you need to. You can also migrate databases to the Autonomous Database via our export, import capabilities via the object store or through Data Pump or Golden Gate. As with any Autonomous Database, once it's provisioned, you've got full access to both autoscaling and all our cloning capabilities.  13:57 Lois: Maria, I've heard you talk about the importance of clean role separation in managing a private cloud. Can you elaborate on that, please? Maria: A successful private cloud is set up and managed using clean role separation between the fleet administration group and the developers, or DBA groups. The fleet administration group establishes the governance constraints, including things like budgeting, capacity compliance, and SLAs, according to the business structure. The physical resources are also logically grouped to align with this business structure, and then groups of users are given self-service access to the resources within these groups. So a good example of this would be that the developers and DBA groups use self-service database resources within these constraints. 14:46 Nikita: I see. So, what exactly does a fleet administrator do? Maria: Fleet administrators allocate budget by department and are responsible for the creation, monitoring, and management of the autonomous exadata infrastructure, the autonomous exadata VM clusters, and the autonomous container databases. To perform these duties, the fleet administrators must have an Oracle Cloud account or user, and that user must have permissions to manage these resources and be permitted to use network resources that need to be specified when you create these other resources. 15:24 Nikita: And what about database administrators? Maria: Database administrators create, monitor, and manage autonomous databases. They, too, need to have an Oracle Cloud account or be an Oracle Cloud user. Now, those accounts need to have the necessary permissions in order to create and access databases. They also need to be able to access autonomous backups and have permission to access the autonomous container databases, inside which these autonomous databases will be created, and have all of the necessary permissions to be able to create those databases, as I said. While creating autonomous databases, the database administrators will define and gain access to an admin user account inside the database. It's through this account that they will actually get the necessary permissions to be able to create and control database users.  16:24 Lois: How do developers fit into the picture? Maria: Database users and developers who write applications that will use or access an autonomous database don't actually need Oracle Cloud accounts. They'll actually be given the network connectivity and authorization information they need to access those databases by the database administrators. 16:45 Lois: Maria, you mentioned the various ways to manage the lifecycle of an autonomous dedicated service. Can you tell us more about that? Maria: You can manage the lifecycle of an autonomous dedicated service through the Cloud UI, Command Line Interface, through our REST APIs, or through one of the several language SDKs. The lifecycle operations that you can manage include capacity planning and setup, the provisioning and partitioning of exadata infrastructure, the provisioning and management of databases, the scaling of CPU storage and other resources, the scheduling of updates for the infrastructure, the VMs, and the database, as well as monitoring through event notifications.  17:30 Lois: And how do policies come into play? Maria: OCI allows fine-grained control over resources through the application of policies to groups. These policies are applicable to any member of the group. For Oracle Autonomous Database on dedicated infrastructure, the resources in question are autonomous exadata infrastructure, autonomous container databases, autonomous databases, and autonomous backups.  Lois: Thanks so much, Maria. That was great information. 18:05 The Oracle University Learning Community is a great place for you to collaborate and learn with experts, peers, and practitioners. Grow your skills, inspire innovation, and celebrate your successes. The more you participate, the more recognition you can earn. All of your activities, from liking a post to answering questions and sharing with others, will help you earn badges and ranks, and be recognized within the community. If you are already an Oracle MyLearn user, go to MyLearn to join the community. You will need to log in first. If you have not yet accessed Oracle MyLearn, visit mylearn.oracle.com and create an account to get started. 18:44 Nikita: Welcome back! Hi Kamryn, thanks for joining us on the podcast. So, in an Autonomous Database environment where most DBA tasks are automated, what exactly does an application DBA do? Kamryn: While Autonomous Database automates most of the repetitive tasks that DBAs perform, the application DBA will still want to monitor and diagnose databases for applications to maintain the highest performance and the greatest security possible. Tasks the application DBA performs includes operations on databases, cloning, movement, monitoring, and creating alerts. When required, the application DBA performs low-level diagnostics for application performance and looks for insights on performance and capacity trends.  19:36 Nikita: I see. And which tools do they use for these tasks? Kamryn: There are several tools at the application DBA's disposal, including Enterprise Manager, Performance Hub, and the OCI Console. For Autonomous Dedicated, all the database operations are exposed through the console UI and available through REST API calls, including provisioning, stop/start, lifecycle operations for dedicated database types, unscheduled on-demand backups and restores, CPU scaling and storage management, providing connectivity information, including wallets, scheduling updates. 20:17 Lois: So, Kamryn, what tools can DBAs use for deeper exploration? Kamryn: For deeper exploration of the databases themselves, Autonomous Database DBAs can use SQL Developer Web, Performance Hub, and Enterprise Manager. 20:31 Nikita: Let’s bring Kay into the conversation. Hi Kay! With Autonomous Database Dedicated, I’ve heard that customers have more control over patching. Can you tell us a little more about that? Kay: With Autonomous Database Dedicated, customers get to determine the update or patching schedule if they wish. Oracle automatically manages all patching activity, but with the ADB-Dedicated service, customers have the option of customizing the patching schedule. You can specify which month in every quarter you want, which week in that month, which day in that month, and which patching window within that day. You can also dynamically change the scheduled patching date and time for a specific database if the originally scheduled time becomes inconvenient. 21:22 Lois: That's great! So, how often are updates published, and what options do customers have when it comes to applying these updates? Kay: Every quarter, updates are published to the console, and OCI notifications are sent out. ADB-Dedicated allows for greater control over updates by allowing you to choose to apply the current update or stay with the previous version and skip to the next release. And the latest update can be applied immediately. This provides fleet administrators with the option to maintain test and production systems at different patch levels. A fleet administrator or a database admin sets up the software version policy at the Autonomous Container Database level during provisioning, although the defaults can be modified at any time for an existing Autonomous Container Database. At the bottom of the Autonomous Exadata Infrastructure provisioning screen, you will see a Configure the Automatic Maintenance section, where you should click the Modify Schedule.  22:34 Nikita: What happens if a customer doesn't customize their patching schedule? Kay: If you do not customize a schedule, it behaves like Autonomous Serverless, and Oracle will set a schedule for you. ADB-Dedicated customers get to choose the patching schedule that fits their business.  22:52 Lois: Back to you, Kamryn, I know a bit about Transparent Data Encryption, but I'm curious to learn more. Can you tell me what it does and how it helps protect data? Kamryn: Transparent Data Encryption, TDE, enables you to encrypt sensitive data that you store in tables and tablespaces. After the data is encrypted, this data is transparently decrypted for authorized users or applications when they access this data. TDE helps protect data stored on media, also called data at rest. If the storage media or data file is stolen, Oracle database uses authentication, authorization, and auditing mechanisms to secure data in the database, but not in the operating system data files where data is stored. To protect these data files, Oracle database provides TDE.  23:45 Nikita: That sounds important for data security. So, how does TDE protect data files? Kamryn: TDE encrypts sensitive data stored in data files. To prevent unauthorized decryption, TDE stores the encryption keys in a security module external to the database called a keystore. You can configure Oracle Key Vault as part of the TDE implementation. This enables you to centrally manage TDE key stores, called TDE wallets, in Oracle Key Vault in your enterprise. For example, you can upload a software keystore to Oracle Key Vault and then make the contents of this keystore available to other TDE-enabled databases. 24:28 Lois: What about Oracle Autonomous Database? How does it handle encryption? Kamryn: Oracle Autonomous Database uses always-on encryption that protects data at rest and in transit. All data stored in Oracle Cloud and network communication with Oracle Cloud is encrypted by default. Encryption cannot be turned off. By default, Oracle Autonomous Database creates and manages all the master encryption keys used to protect your data, storing them in a secure PKCS 12 keystore on the same Exadata systems where the databases reside. If your company's security policies require, Oracle Autonomous Database can instead use keys you create and manage. Customers can control key generation and rotation of the keys. 25:19 Kamryn: The Autonomous databases you create automatically use customer-managed keys because the Autonomous container database in which they are created is configured to use customer-managed keys. Thus, those users who create and manage Autonomous databases do not have to worry about configuring their databases to use customer-managed keys. 25:41 Nikita: Thank you so much, Kamryn, Kay, and Maria for taking the time to give us your insights. To learn more about provisioning Autonomous Database Dedicated resources, head over to mylearn.oracle.com and search for the Oracle Autonomous Database Administration Workshop. Lois: In our next episode, we will discuss Autonomous Database tools. Until then, this is Lois Houston… Nikita: …and Nikita Abraham signing off. 26:07 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
26m
02/01/2024

Autonomous Database on Serverless Infrastructure

Want to quickly provision your autonomous database? Then look no further than Oracle Autonomous Database Serverless, one of the two deployment choices offered by Oracle Autonomous Database.   Autonomous Database Serverless delegates all operational decisions to Oracle, providing you with a completely autonomous experience.   Join hosts Lois Houston and Nikita Abraham, along with Oracle Database experts, as they discuss how serverless infrastructure eliminates the need to configure any hardware or install any software because Autonomous Database handles provisioning the database, backing it up, patching and upgrading it, and growing or shrinking it for you.   Oracle Autonomous Database Episode: https://oracleuniversitypodcast.libsyn.com/oracle-autonomous-database Oracle MyLearn: https://mylearn.oracle.com/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Rajeev Grover, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started. 00:26 Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hi everyone! Welcome back to a new season of the Oracle University Podcast. This time, our focus is going to be on Oracle Autonomous Database. We’ve got a jam-packed season planned with some very special guests joining us. 00:52 Lois: If you’re a regular listener of the podcast, you’ll remember that we’d spoken a bit about Autonomous Database last year. That was a really good introductory episode so if you missed it, you might want to check it out.  Nikita: Yeah, we’ll post a link to the episode in today’s show notes so you can find it easily. 01:07 Lois: Right, Niki. So, for today’s episode, we wanted to focus on Autonomous Database on Serverless Infrastructure and we reached out to three experts in the field: Hannah Nguyen,  Sean Stacey, and Kay Malcolm. Hannah is an Associate Cloud Engineer, Sean, a Director of Platform Technology Solutions, and Kay, who’s been on the podcast before, is Senior Director of Database Product Management. For this episode, we’ll be sharing portions of our conversations with them. So, let’s get started. 01:38 Nikita: Hi Hannah! How does Oracle Cloud handle the process of provisioning an Autonomous Database?   Hannah: The Oracle Cloud automates the process of provisioning an Autonomous Database, and it automatically provisions for you a highly scalable, highly secure, and a highly available database very simply out of the box. 01:56 Lois: Hannah, what are the components and architecture involved when provisioning an Autonomous Database in Oracle Cloud? Hannah: Provisioning the database involves very few steps. But it's important to understand the components that are part of the provisioned environment. When provisioning a database, the number of CPUs in increments of 1 for serverless, storage in increments of 1 terabyte, and backup are automatically provisioned and enabled in the database. In the background, an Oracle 19c pluggable database is being added to the container database that manages all the user's Autonomous Databases. Because this Autonomous Database runs on Exadata systems, Real Application Clusters is also provisioned in the background to support the on-demand CPU scalability of the service. This is transparent to the user and administrator of the service. But be aware it is there. 02:49 Nikita: Ok…So, what sort of flexibility does the Autonomous Database provide when it comes to managing resource usage and costs, you know… especially in terms of starting, stopping, and scaling instances? Hannah: The Autonomous Database allows you to start your instance very rapidly on demand. It also allows you to stop your instance on demand as well to conserve resources and to pause billing. Do be aware that when you do pause billing, you will not be charged for any CPU cycles because your instance will be stopped. However, you'll still be incurring charges for your monthly billing for your storage. In addition to allowing you to start and stop your instance on demand, it's also possible to scale your database instance on demand as well. All of this can be done very easily using the Database Cloud Console. 03:36 Lois: What about scaling in the Autonomous Database? Hannah: So you can scale up your OCPUs without touching your storage and scale it back down, and you can do the same with your storage. In addition to that, you can also set up autoscaling. So the database, whenever it detects the need, will automatically scale up to three times the base level number of OCPUs that you have allocated or provisioned for the Autonomous Database. 04:00 Nikita: Is autoscaling available for all tiers?  Hannah: Autoscaling is not available for an always free database, but it is enabled by default for other tiered environments. Changing the setting does not require downtime. So this can also be set dynamically. One of the advantages of autoscaling is cost because you're billed based on the average number of OCPUs consumed during an hour. 04:23 Lois: Thanks, Hannah! Now, let’s bring Sean into the conversation. Hey Sean, I want to talk about moving an autonomous database resource. When or why would I need to move an autonomous database resource from one compartment to another? Sean: There may be a business requirement where you need to move an autonomous database resource, serverless resource, from one compartment to another. Perhaps, there's a different subnet that you would like to move that autonomous database to, or perhaps there's some business applications that are within or accessible or available in that other compartment that you wish to move your autonomous database to take advantage of. 04:58 Nikita: And how simple is this process of moving an autonomous database from one compartment to another? What happens to the backups during this transition? Sean: The way you can do this is simply to take an autonomous database and move it from compartment A to compartment B. And when you do so, the backups, or the automatic backups that are associated with that autonomous database, will be moved with that autonomous database as well. 05:21 Lois: Is there anything that I need to keep in mind when I’m moving an autonomous database between compartments?  Sean: A couple of things to be aware of when doing this is, first of all, you must have the appropriate privileges in that compartment in order to move that autonomous database both from the source compartment to the target compartment. In addition to that, once the autonomous database is moved to this new compartment, any policies or anything that's defined in that compartment to govern the authorization and privileges of that said user in that compartment will be applied immediately to that new autonomous database that has been moved into that new compartment. 05:59 Nikita: Sean, I want to ask you about cloning in Autonomous Database. What are the different types of clones that can be created?  Sean: It's possible to create a new Autonomous Database as a clone of an existing Autonomous Database. This can be done as a full copy of that existing Autonomous Database, or it can be done as a metadata copy, where the objects and tables are cloned, but they are empty. So there's no rows in the tables. And this clone can be taken from a live running Autonomous Database or even from a backup. So you can take a backup and clone that to a completely new database. 06:35 Lois: But why would you clone in the first place? What are the benefits of this?  Sean: When cloning or when creating this clone, it can be created in a completely new compartment from where the source Autonomous Database was originally located. So it's a nice way of moving one database to another compartment to allow developers or another community of users to have access to that environment. 06:58 Nikita: I know that along with having a full clone, you can also have a refreshable clone. Can you tell us more about that? Who is responsible for this? Sean: It's possible to create a refreshable clone from an Autonomous Database. And this is one that would be synced with that source database up to so many days. The task of keeping that refreshable clone in sync with that source database rests upon the shoulders of the administrator. The administrator is the person who is responsible for performing that sync operation. Now, actually performing the operation is very simple, it's point and click. And it's an automated process from the database console. And also be aware that refreshable clones can trail the source database or source Autonomous Database up to seven days. After that period of time, the refreshable clone, if it has not been refreshed or kept in sync with that source database, it will become a standalone, read-only copy of that original source database. 08:00 Nikita: Ok Sean, so if you had to give us the key takeaways on cloning an Autonomous Database, what would they be?  Sean: It's very easy and a lot of flexibility when it comes to cloning an Autonomous Database. We have different models that you can take from a live running database instance with zero impact on your workload or from a backup. It can be a full copy, or it can be a metadata copy, as well as a refreshable, read-only clone of a source database. 08:33 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You’ll find training on everything from cloud computing, database, and security to artificial intelligence and machine learning, all of which is available free to subscribers. So, get going! Pick a course of your choice, get certified, join the Oracle University Learning Community, and network with your peers. If you are already an Oracle MyLearn user, go to MyLearn to begin your journey. If you have not yet accessed Oracle MyLearn, visit mylearn.oracle.com and create an account to get started.  09:12 Nikita: Welcome back! Thank you, Sean, and hi Kay! I want to ask you about events and notifications in Autonomous Database. Where do they really come in handy?  Kay: Events can be used for a variety of notifications, including admin password expiration, ADB services going down, and wallet expiration warnings. There's this service, and it's called the notifications service. It's part of OCI. And this service provides you with the ability to broadcast messages to distributed components using a publish and subscribe model. These notifications can be used to notify you when event rules or alarms are triggered or simply to directly publish a message. In addition to this, there's also something that's called a topic. This is a communication channel for sending messages to subscribers in the topic. You can manage these topics and their subscriptions really easy. It's not hard to do at all. 10:14 Lois: Kay, I want to ask you about backing up Autonomous Databases. How does Autonomous Database handle backups? Kay: Autonomous Database automatically backs up your database for you. The retention period for backups is 60 days. You can restore and recover your database to any point in time during this retention period. You can initiate recovery for your Autonomous Database by using the cloud console or an API call. Autonomous Database automatically restores and recovers your database to the point in time that you specify. In addition to a point in time recovery, we can also perform a restore from a specific backup set.  10:59 Lois: Kay, you spoke about automatic backups, but what about manual backups?  Kay: You can do manual backups using the cloud console, for example, if you want to take a backup say before a major change to make restoring and recovery faster. These manual backups are put in your cloud object storage bucket. 11:20 Nikita: Are there any special instructions that we need to follow when configuring a manual backup? Kay: The manual backup configuration tasks are a one-time operation. Once this is configured, you can go ahead, trigger your manual backup any time you wish after that. When creating the object storage bucket for the manual backups, it is really important-- so I don't want you to forget-- that the name format for the bucket and the object storage follows this naming convention. It should be backup underscore database name. And it's not the display name here when I say database name. 12:00 Kay: In addition to that, the object name has to be all lowercase. So three rules. Backup underscore database name, and the specific database name is not the display name. It has to be in lowercase. Once you've created your object storage bucket to meet these rules, you then go ahead and set a database property. Default_backup_bucket. This points to the object storage URL and it's using the Swift protocol. Once you've got your object storage bucket mapped and you've created your mapping to the object storage location, you then need to go ahead and create a database credential inside your database. You may have already had this in place for other purposes, like maybe you were loading data, you were using Data Pump, et cetera. If you don't, you would need to create this specifically for your manual backups. Once you've done so, you can then go ahead and set your property to that default credential that you created. So once you follow these steps as I pointed out, you only have to do it one time. Once it's configured, you can go ahead and use it from now on for your manual backups. 13:21 Lois: Kay, the last topic I want to talk about before we let you go is Autonomous Data Guard. Can you tell us about it? Kay: Autonomous Data Guard monitors the primary database, in other words, the database that you're using right now.  Lois: So, if ADB goes down… Kay: Then the standby instance will automatically become the primary instance. There's no manual intervention required. So failover from the primary database to that standby database I mentioned, it's completely seamless and it doesn't require any additional wallets to be downloaded or any new URLs to access APEX or Oracle Machine Learning. Even Oracle REST Data Services. All the URLs and all the wallets, everything that you need to authenticate, to connect to your database, they all remain the same for you if you have to failover to your standby database. 14:19 Lois: And what happens after a failover occurs? Kay: After performing a failover, a new standby for your primary will automatically be provisioned. So in other words, in performing a failover your standby does become your new primary. Any new standby is made for that primary. I know, it's kind of interesting. So currently, the standby database is created in the same region as the primary database. For better resilience, if your database is provisioned, it would be available on AD1 or Availability Domain 1. My secondary, or my standby, would be provisioned on a different availability domain. 15:10 Nikita: But there’s also the possibility of manual failover, right? What are the differences between automatic and manual failover scenarios? When would you recommend using each? Kay: So in the case of the automatic failover scenario following a disastrous situation, if the primary ADB becomes completely unavailable, the switchover button will turn to a failover button. Because remember, this is a disaster. Automatic failover is automatically triggered. There's no user action required. So if you're asleep and something happens, you're protected. There's no user action required, but automatic failover is allowed to succeed only when no data loss will occur.   15:57 Nikita: For manual failover scenarios in the rare case when an automatic failover is unsuccessful, the switchover button will become a failover button and the user can trigger a manual failover should they wish to do so. The system automatically recovers as much data as possible, minimizing any potential data loss. But you can see anywhere from a few seconds or minutes of data loss. Now, you should only perform a manual failover in a true disaster scenario, expecting the fact that a few minutes of potential data loss could occur, to ensure that your database is back online as soon as possible.  16:44 Lois: Thank you so much, Kay. This conversation has been so educational for us. And thank you once again to Hannah and Sean. To learn more about Autonomous Database, head over to mylearn.oracle.com and search for the Oracle Autonomous Database Administration Workshop. Nikita: Thanks for joining us today. In our next episode, we will discuss Autonomous Database on Dedicated Infrastructure. Until then, this is Nikita Abraham… Lois: …and Lois Houston signing off. 17:12 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
17m
26/12/2023

Best of 2023: Getting Started with Oracle Database

In today’s digital economy, data is a form of capital. Given the mission-critical role that it has, having a robust data management strategy is now more crucial than ever.   Join Lois Houston and Nikita Abraham, along with Kay Malcolm, as they talk about the various Oracle Database offerings and discuss how to actually use them to efficiently manage data across a diverse but unified data tier.   Oracle MyLearn: https://mylearn.oracle.com/ Oracle University Learning Community: https://education.oracle.com/ou-community X (formerly Twitter): https://twitter.com/Oracle_Edu LinkedIn: https://www.linkedin.com/showcase/oracle-university/   Special thanks to Arijit Ghosh, David Wright, Ranbir Singh, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started. 00:26 Lois: Welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor. Nikita: Hi there. If you’ve been following along with us these past few weeks, you’ll know we’ve been revisiting our most popular episodes of the year.  Lois: Right, and today’s episode is the last one of the Best of 2023 series. It’s a throwback to our conversation on Oracle’s Data Management strategy and offerings with Kay Malcolm, Senior Director of Database Product Management at Oracle. Nikita: We’d often heard Kay say that Oracle’s data management strategy is simply complete and completely simple. And so we began by asking her what she meant by that. 01:09 Kay: It's a fun play on words, right? App development paradigms are in a rapid state of transformation. Modern app development is simplifying and accelerating how you deploy applications. Also simplifying how data models and data analytics are used. Oracle data management embraces modern app development and transformations that go beyond technology changes. It presents a simply complete solution that is completely simple. Immediately you can see benefits of the easiest and most productive platform for developing and running modern app and analytics. 01:54 Kay: Oracle Database is a converged database that provides best of breed support for all different data models and workloads that you need. When you have converged support for application development, you eliminate data fragmentation. You can perform unique queries and transactions that span any data and create value across all data types and build into your applications.  02:24 Nikita: When you say all data types, this can include both structured and unstructured data, right? Kay: This also includes structured and unstructured data. The Oracle converged database has the best of breed for JSON, graph, and text while including other data types, relations, blockchain, spatial, and others. Now that we have the ability to access any data type, we have various workloads and converged data management that supports all modern transactional and analytical workloads. We have the unique ability to run any combination of workloads on any combination of data. Simply complete for analytics means the ability to include all of the transactions, including key value, IoT, or Internet of Things, along with operational data warehouse and lake and machine learning. 03:27 Kay: Oracle's decentralized database architecture makes decentralized apps simple to deploy and operate. This architecture makes it simple to use decentralized app development techniques like coding events, data events, API driven development, low code, and geo distribution. Autonomous Database or ADB now supports the Mongo database API adding more tools for architectural support. Autonomous Database or ADB has a set of automated tools to manage, provision, tune, and patch. It provides solutions for difficult database engineering with auto indexing and partitioning and is elastic. You can automatically scale up or down based on the workload. Autonomous Database is also very productive. It allows for focus on the data for solving business problems. ADB has self-service tools for analytics, data access, and it simplifies these difficult data engineering architectures. 04:43 Lois: OK…so can you tell us about running modern apps and analytics? Kay: Running applications means thinking about all the operational concerns and solving how to support mission-critical applications. Traditionally, this is where Oracle excels with high availability, security, operational solutions that have been proven over the years. Now, having developer tools and the ability to scale and reduce risk simplifies the development process without having to use complex sharding and data protection. Mission-critical capabilities that are needed for the applications are already provided in the functionality of the Oracle Data Management architecture. Disaster recovery, replication, backups, and security are all part of the Oracle Autonomous Database. 05:42 Kay: Even complex business-critical applications are supported by the operational security and availability of Oracle ADB. Transparently, it provides automated solutions for minimizing risk, dealing with complexity, and availability for all applications. Oracle's big picture data management strategy is simply complete and completely simple with the converged database, data management tools, and the best platform. It is focused on providing a platform that allows for modern app development across all data types, workloads, and development styles. It is completely scalable, available, and secure, leveraging the database technologies developed over several years. And it's available consistently across the environment. It is the simplest to use because of the available tools and running completely mission critical applications. 06:50 Nikita: Ah, so that’s how we come to… Kay: Simply complete and completely simple. Easy to remember and easy to incorporate into your existing architectures.  Lois: OK. So Kay, can you talk a little bit more about Autonomous Database? 07:04 Kay: Let's compare Autonomous Database to how you ran the database on premise. How you ran the database on the cloud using our earlier Cloud Services, Database Cloud Services, and Oracle Exadata Cloud Service. The key thing to understand is Autonomous Database, or ADB, is a fully managed service. We fully manage the infrastructure. We fully manage the database for you. In on premise, you manage everything-- the infrastructure, the database, everything. We also have a service in between that that we call a co-managed service. Here we manage the infrastructure, and you manage the database. That service is important for customers who are not yet up to 19c. Or they might be running a packaged application like E-Business Suite. But for the rest of you, ADB is really the place you want to go. 08:09 Nikita: And why is that? Kay: Because it's fully managed and, because it's fully managed, is a much, much lower cost way to go. So when you talk to your boss about why he wants to move to ADB, they often care about the bottom line. They want to know like, am I going to lower my costs? And with ADB, because we take care of a lot of the tedious chores that DBAs normally have to do and because we take care of best practices, configurations, we can do things at a really low cost.  08:49 Lois: Kay, what does it take for a customer to move to Oracle’s Autonomous Database?  Kay: We've got a tool that helps you look at your current database on prem. This tool will analyze what features you're using and let you know, hey, you know you're doing something that's not supported for ADB, for example. Like if you're running some release before 19c, we don't support it. If you're doing stuff like putting database tables in the system or sys schema, we don't support it. You know, there are a few things that very few customers do that we don't support. And this tool will flag those for you. And then the next step, it's pretty simple. You just use our Data Pump import/export tool to move your data out of your database on prem into the object store on the Cloud. And then you simply import-- you know how to use Data Pump to import-- the data off the file and the object store into the database. Then you're done. Pretty simple process. 09:57 Nikita: Do we assist our customers with data migration from on-prem to Cloud? Kay: More recently have come out with a new service on our Cloud called the Database Migration Service. With Autonomous Database Migration Service, you can just point us at your source database on prem or even on some other cloud. Whatever it is, we will take care of everything from there and move that, go through all the steps and move your database to ADB on the Cloud. Even better, we now are working with our Applications customers to make it really easy for them to move their packaged applications to Autonomous Database. The Oracle development teams that built JD Edwards, PeopleSoft, Siebel have now all certified that those packaged applications can run with Autonomous Database no problem. Our EBS team is working on it. And that'll be coming soon, sometime next year. 11:02 Lois: So, if I am an Apps customer, is there a special service for me? Kay: We have a fully managed service available on our Cloud that lets you take your entire application stack on the middle tier and the database tier, move it to our Cloud. Move the database part to Autonomous Database. And they will also manage your middle tier for you. 11:32 Want to get the inside scoop on Oracle University? Head on over to the all-new Oracle University Learning Community. Attend exclusive events. Read up on the latest news. Get first-hand access to new products and stay up-to-date with upcoming certification opportunities. If you are already an Oracle MyLearn user, go to MyLearn to join the community. You will need to log in first. If you have not yet accessed Oracle MyLearn, visit mylearn.oracle.com and create an account to get started. Join the community today! 12:11 Nikita: Welcome back! Kay, can you talk a bit about APEX?  Kay: We have this great tool called APEX or Application Express. We have a version of Autonomous Database just for any APEX application.  Well, APEX is a low-code tool. It is our low-code tool that lets you rapidly build data-driven applications where the data is in the Oracle Database, really easy and really rapidly. We estimate at least 10 times faster than doing traditional coding to build your applications. What we're seeing is much, much higher productivity than that. Sometimes 40, even 50 times faster coding. 13:01 Kay: Out of the box, it comes with really nice tools for building things-- your classical forms and reporting kinds of workloads. It gives you things like faceted search and capabilities to do things like see on an e-commerce website where you get to choose things like dimensions, like I want a product where the cost is in this range. And, you know, it might have some other attributes. And it can very quickly filter that data for you and return the best results. And it's a really nice tool for iterating. Now, if your user interface doesn't look quite right, it's very easy to tweak colors and backgrounds and themes. Another reason it's so productive is that the whole middle tier part of your application is fully automated for you. You don't have to do anything about connection management or state management. You don't have to worry about mapping data types from some other 3GL programming language to data types. All of that is done for you. The combination of ADB and APEX really rocks. 14:17 Lois: Do we have Extract, Transform, and Load capabilities in our ADB? Kay: We have ETL transformation tools. Again, they let you specify transformations in a drag-and-drop fashion on the screen. We have all sorts of other tools and, in the service, the full power of the converged analytic technologies, things like graph analytics, spatial analytics, machine learning. All of this is built into this new platform. Now, a big, new capability around machine learning is something that we call AutoML. That lets any data scientists give us a data set, tell us what the key feature is that they want to analyze, and what the predictions are. And we will come up with a machine learning model for them out of the box. Really that easy. Plus, we have the low-code tool APEX that I mentioned earlier. 15:17 Kay: So this environment is really powerful for doing more than traditional data warehouses. We can build data lakes. We are integrated with the object stores on Oracle Cloud and also on other clouds. And we can do massively parallel querying of data in the core database itself and the data lake. 15:38 Nikita: Beyond the database tech, there’s the business side, right? How easy do we make a customer’s path to ADB from a business standpoint, a decision-making standpoint? Kay: So if you're an existing Oracle customer, you have an existing Oracle Database license you're using on prem, we have something called BYOL, Bring Your Own License, to OCI. We have the Cloud Lift Service. This huge cloud engineering team across all regions of the world will help you move your existing on-prem database to ADB for free. 16:16 Kay: And then, finally, we announced fairly recently something called the Support Rewards Program. This is something our customers are really excited about. It lets them translate their spending on OCI to a reduction in their support bill. So if you're a customer using OCI, you get a $0.25 to $0.33 reward for every dollar you spend on Oracle's Cloud. You can then take that money from your rewards and apply it to your bill for customer support, for your technology support even, like the database. And this is exactly what customers want as they move their investment to the cloud. They want to lower the costs of paying for their on-prem support. Now, we've talked about money. This lowers costs greatly. So ADB has lots of value. But the big thing I think to think about is really that it lowers costs. It lowers that cost via automation, higher productivity, less downtime, all sorts of areas.   17:22 Lois: You make a very convincing case for ADB, Kay. Kay: ADB is a great place to go. Take those existing Oracle Databases you have. Move and modernize them to a modern cloud infrastructure that's going to give you all the benefits of cloud, including agility and lower cost. So on our Cloud, we have something called the Always Free Autonomous Database Service. This service lets you get your hands on ADB. Try it out for yourself. You don't have to believe what we claim about how great this technology is. And we have other technologies like Live Labs that you can find on developer.oracle.com/livelabs that lets you do all kinds of exercises on this Always Free ADB infrastructure. Really get your hands dirty. And see for yourself how productive it can be.  18:16 Nikita: Thanks, Kay, for telling us about ADB and our database offerings. To learn more about this, head over mylearn.oracle.com, create a profile if you don’t already have one, and get started on our free Oracle Cloud Data Management Foundations Workshop. Lois: We hope you’ve enjoyed revisiting some of our most popular episodes these past few weeks. We’re kicking off the new year with a new season of the Oracle University Podcast. And this time around, it’ll be on Oracle Autonomous Database so make sure you don’t miss it. Until next week, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 18:52 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
19m