Spark SQL: how do I set a variable within a query, to re-use it throughout? I feel like I must be missing something obvious here, but I can't seem to dynamically set a variable value in Spark SQL, or assign a derived value (such as the scalar result of a SELECT statement) to a variable or widget for reuse. Let's say I have two tables, tableSrc and tableBuilder, and I'm creating tableDest; this is entirely confusing to me, because the environment clearly supports it. Concretely, I would like something like df = sqlContext.sql("SELECT * FROM $SourceTableName where 1=2"), where $SourceTableName is a parameter. The "where 1=2" is only there to test that it works; the real query operates on a temp table that has already done all my filtering.

A closely related question is how to create a database from a variable in PySpark, using the Python APIs only (the same question arises in pure SQL, not just from Spark). Assume we have a variable holding the database name; using that variable, how do we create the database? Something along the lines of var a = "databasename"; CREATE DATABASE a does not do what I want: it creates a database literally named after the variable, for example a database called "myvar", rather than one named after the variable's value. All I can find is a SQL-based approach.
Several answers and comments address this. The SET command is for getting and setting Spark configuration (spark.conf), not for declaring variables inside SQL queries, and Databricks SQL does not support the DECLARE keyword; for query parameters in notebooks, use widgets instead (https://docs.databricks.com/notebooks/widgets.html). One commenter noted that with this method a blank textbox appears at the top of the notebook where the value of myVar must be filled in. If a Hive-style variable assignment fails to parse, check the statement terminator: a missing semicolon at the end of the variable assignment is a common cause.

From Python, assign the value of your parameter to a Python variable, SourceTableName, and then interpolate it into the SQL string: df = sqlContext.sql(f"SELECT * FROM {SourceTableName} where 1=2"). The same pattern answers the database question: build the CREATE DATABASE statement from the variable's value rather than its name. Two scenarios are worth distinguishing: running the statement from a SQL cell (create a SQL notebook in Databricks and add the command to a cell) and running it from Python. As another way to reuse a derived result, it is possible to create temporary views from a DataFrame with df.createOrReplaceTempView() and to create permanent views in Spark SQL. One answerer notes that their code can probably be polished, but it is the only working solution they managed to implement.
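A minimal sketch of that Python-side pattern in a Databricks notebook follows; the widget name, table name, and database name are illustrative placeholders rather than values from the original thread, and spark.sql is used in place of the older sqlContext.sql.

```python
# Assumes a Databricks notebook, where `spark` and `dbutils` are predefined.
dbutils.widgets.text("source_table", "default.events")  # notebook parameter (illustrative)
source_table = dbutils.widgets.get("source_table")       # read the widget value into Python

# Reuse the Python variable inside SQL via an f-string.
df = spark.sql(f"SELECT * FROM {source_table} WHERE 1 = 2")

# The same interpolation works for DDL: the database is named after the
# variable's value, not the variable itself.
db_name = "analytics_db"                                  # illustrative name
spark.sql(f"CREATE DATABASE IF NOT EXISTS {db_name}")
```

Interpolating names this way keeps the logic in Python; for values that come from end users, widgets plus explicit validation are safer than raw string formatting.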
It's fairly simple to work with databases and tables in Azure Databricks, and the DDL is documented under CREATE DATABASE (applies to Databricks SQL and Databricks Runtime), which is an alias for CREATE SCHEMA; while usage of SCHEMA and DATABASE is interchangeable, SCHEMA is preferred. The statement specifies the name of the database to be created and accepts optional clauses for a comment, a specific location, and database properties. If a database with the same name already exists, an exception will be thrown; with IF NOT EXISTS, nothing happens instead. For example, creating customer_db only if a database with the same name doesn't exist succeeds either way, whereas the plain form throws an exception if customer_db is already present. LOCATION is the path of the file system in which the specified database is to be created; if the location is not specified, the database is created in the default warehouse directory, whose path is configured by the static configuration spark.sql.warehouse.dir (on a local setup this resolves to a directory under the current working directory). You can change this behavior by setting spark.sql.warehouse.dir while creating the SparkSession. It is also worth checking whether a database with the specified name exists before creating it. Related articles: CREATE SCHEMA, DESCRIBE SCHEMA, DROP SCHEMA.
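A short sketch of these clauses, driven from a Python variable; the database name, comment, path, and property value are illustrative.

```python
# Illustrative names and paths; adjust to your workspace.
db_name = "customer_db"

spark.sql(f"""
    CREATE DATABASE IF NOT EXISTS {db_name}
    COMMENT 'customer data for reporting'
    LOCATION '/tmp/warehouse/{db_name}'
    WITH DBPROPERTIES (purpose = 'demo')
""")

# Check whether the database now exists.
existing = [db.name for db in spark.catalog.listDatabases()]
print(db_name in existing)
```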
This section provides a guide to developing notebooks and jobs in Databricks using the Python language, and the points below list key features and tips to help you begin developing in Databricks with Python. Apache Spark is written in the Scala programming language, and PySpark helps you interface with Spark's Resilient Distributed Datasets (RDDs) from Python; the spirit of map-reduce was brooding upon the surface of big data because the file system on a single machine had become limited and slow. To manage and run PySpark notebooks, you can employ one of two popular modern data warehouse platforms: Microsoft offers Azure Synapse Analytics, which is available only on Azure, whereas Databricks is a platform-independent offering that can run on Azure, AWS, or Google Cloud Platform.

Databricks notebooks support Python. Data scientists generally begin work either by creating a cluster or by using an existing shared cluster: run your code on a cluster of your own, or ensure you have permission to use a shared one, and customize cluster hardware and libraries according to your needs (for detailed tips, see Best practices: Cluster configuration). Start with the default libraries in the Databricks Runtime and install custom libraries as needed; detaching and reattaching the notebook restarts the Python process if a library change requires it. If you have existing code, just import it into Databricks to get started: either import your own code from files or Git repos, or try a tutorial such as Declare a data pipeline with Python in Delta Live Tables or End-to-end ML models on Databricks, and see Manage code with notebooks and Databricks Repos for details. Databricks Repos helps with code versioning and collaboration, and it can simplify importing a full repository of code into Databricks, viewing past notebook versions, and integrating with IDE development. Remote machine execution is also supported: you can run code from your local IDE for interactive development and testing, and the IDE communicates with Databricks to execute large computations on Databricks clusters (to set up this connection, see Use IDEs with Databricks). Jobs can run notebooks, Python scripts, and Python wheels, and the Jobs API allows you to create, edit, and delete jobs; more generally, you can use APIs to manage resources such as clusters and libraries, code and other workspace objects, and workloads. With Databricks Runtime 12.1 and above, you can use the variable explorer to track the current value of Python variables in the notebook UI, and to create a new dashboard you click the picture icon in the menu and then the last item. For machine learning, MLflow Tracking lets you record model development and save models in reusable formats, the MLflow Model Registry lets you manage and automate the promotion of models towards production, and Jobs and Model Serving allow hosting models as batch and streaming jobs and as REST endpoints; for general information, see the Introduction to Databricks Machine Learning.

For single-machine computing, you can use Python APIs and libraries as usual; for example, pandas and scikit-learn will just work. For distributed Python workloads, Databricks offers two popular APIs out of the box: the Pandas API on Spark and PySpark. The Pandas API on Spark is available on clusters that run Databricks Runtime 10.0 and above, and several features support interoperability between PySpark and pandas, including converting between PySpark and pandas DataFrames. You can also create a Spark DataFrame from a list or a pandas DataFrame, as in the following example:

```python
import pandas as pd

data = [[1, "Elia"], [2, "Teo"], [3, "Fang"]]
pdf = pd.DataFrame(data, columns=["id", "name"])

df1 = spark.createDataFrame(pdf)
df2 = spark.createDataFrame(data, schema="id LONG, name STRING")
```
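A small sketch of the pandas interoperability mentioned above; the column values are illustrative, and the pyspark.pandas import assumes Databricks Runtime 10.0+ (Spark 3.2+).

```python
import pandas as pd
import pyspark.pandas as ps  # Pandas API on Spark; assumes Spark 3.2+ / DBR 10.0+

pdf = pd.DataFrame({"id": [1, 2, 3], "name": ["Elia", "Teo", "Fang"]})

sdf = spark.createDataFrame(pdf)   # pandas -> PySpark DataFrame
pdf_back = sdf.toPandas()          # PySpark -> pandas (collects rows to the driver)

psdf = ps.from_pandas(pdf)         # distributed DataFrame with a pandas-like API
print(psdf.head())
```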
In this article, we shall also discuss how to create a table in Hive and Databricks. In order to create a Hive table from Spark or PySpark SQL, you need a SparkSession with enableHiveSupport(), so let's create the Spark session with Hive support enabled using its builder() method. By default, Hive creates a table as an internal table and owns the table structure and the files; internal tables are also known as managed tables, owned and managed by Hive. Note that by default saveAsTable() creates an internal (managed) table; to create an external table, supply a path of your choice using option(). Dropping an external table just drops the metadata, not the actual data, so the data is still accessible outside of Hive. In the original examples, a managed Spark table, sparkExamples.sampleTable, is created and a few records are inserted into it; two further examples create a DataFrame and write it out as the ct.sampletable2 table; and a temporary view, sampleView, is created as well.

On Databricks, all tables created use Delta Lake by default: tables are Delta tables with the underlying data stored in Parquet format, Delta Lake is the default for all reads, writes, and table creation commands in Databricks Runtime 8.0 and above, and on Databricks Runtime 7.3 LTS you can use the delta keyword to specify the format explicitly. To share an external Hive metastore between Azure Databricks and Synapse Spark, log in to the MySQL server using your preferred tool and create a database for the metastore with your chosen name. In short, using Apache Spark or PySpark you can create tables in Hive, Databricks, and many external storage systems.
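A minimal sketch of those steps; the column names, records, and external path are illustrative, while sparkExamples.sampleTable follows the naming used in the article.

```python
from pyspark.sql import SparkSession

# Hive support is required for Hive-backed (metastore) tables.
spark = (SparkSession.builder
         .appName("create-hive-table")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS sparkExamples")

df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "label"])

# Managed (internal) table: the metastore owns both metadata and files.
df.write.mode("overwrite").saveAsTable("sparkExamples.sampleTable")

# External table: supply a path of your choice; dropping the table removes
# only the metadata, and the files remain at the path.
(df.write.mode("overwrite")
   .option("path", "/tmp/external/sampleTable_ext")  # illustrative path
   .saveAsTable("sparkExamples.sampleTable_ext"))

# On Databricks Runtime 7.3 LTS you could add .format("delta") explicitly;
# on 8.0 and above, Delta is already the default table format.
```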
Stepping back to the underlying APIs: a DataFrame is an alias for an untyped Dataset[Row]. For primitive types in examples or demos, you can create Datasets within a Scala or Python notebook or in your sample Spark application, for instance over a range of 100 numbers; the more common way is to read a data file from an external data source such as HDFS, blob storage, NoSQL, an RDBMS, or the local filesystem, and Spark supports multiple formats: JSON, CSV, text, Parquet, ORC, and so on. In the device-data example, a case class is defined that represents the device data; as in the Person example, the case class encapsulates the Scala object. Upon reading JSON, Spark first creates a generic DataFrame, that is, a Dataset[Row]; explicitly converting the DataFrame into a Dataset results in type-specific rows, a collection of objects of type Person (or DeviceIoTData). This buys you two things. First, after an explicit conversion, all relational and query expressions using the Dataset API get compile-time safety. Second, the Dataset API provides high-order, domain-specific operations such as sum(), avg(), join(), select(), and groupBy(), which makes the code much easier to express, read, and develop; in the Process and visualize the Dataset section, notice how using Dataset typed objects makes the code easier to express and read. Having saved the Dataset of DeviceIoTData as a temporary table, you can issue SQL queries against it, and you can apply higher-level Dataset API methods such as groupBy() and avg(), or select a few columns, as in display(ds.select($"battery_level", $"c02_level", $"device_name")). To view the data in a tabular format instead of exporting it to a third-party tool, you can use the Databricks display() command; an additional benefit of display() is that you can quickly view the data with a number of embedded visualizations.
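That walkthrough is written in Scala; a rough PySpark equivalent of the same flow is sketched below, where the JSON path and column names mirror the device example but should be treated as assumptions.

```python
# Assumes a Databricks notebook; the dataset path and schema are illustrative.
devices = spark.read.json("/databricks-datasets/iot/iot_devices.json")

# Untyped DataFrame operations analogous to the typed Dataset methods above.
summary = (devices
           .select("device_name", "battery_level", "c02_level")
           .groupBy("device_name")
           .avg("battery_level", "c02_level"))

devices.createOrReplaceTempView("devices")  # temporary table for SQL queries
low_battery = spark.sql(
    "SELECT device_name, c02_level FROM devices WHERE battery_level < 2")

display(summary)  # Databricks notebook helper with built-in visualizations
```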
Finally, the topic modeling pipeline; this post is part of a series of posts on topic modeling. Let's start off by outlining a couple of concepts: topic modeling is the process of extracting topics from a set of documents; a document can be a line of text, a paragraph, or a chapter in a book; and the topics themselves are represented as a combination of words, with the distribution over the words representing their relevance to the topic. The workflow to extract topics from the tweets consists of the following steps. We start by loading the data using PySpark and extracting the fields required for extracting the topics, and duplicate records are dropped with dropDuplicates(); while further preprocessing is not done in this work, it is highly recommended to remove links and emoticons. The words in the corpus are then vectorized by word count, and the inverse document frequency (IDF) is computed; this transformation produces the data frame df_new, which is fed to the topic modeling algorithm. These are the extracted features in this model, and they can be saved and reused in the model building process: the general idea behind a feature store is that it acts as a central repository to store the features for different models, and it might be wasteful to run the entire ETL pipeline when the intent is model experimentation. The vectorized data was therefore saved as features using the Databricks Feature Store, so that it can enable reuse and experimentation by the data scientist; a feature store client object is created for interacting with this feature store, and when the stored features are read back, the persisted string representation is split and parsed back into vectors (type 0 for SparseVector and type 1 for DenseVector).

The vectorized documents were then fed to the PySpark LDA algorithm. The number of topics is a hyperparameter that either requires domain-level expertise or hyperparameter tuning, and there are two further hyperparameters that determine the extent of the mixture of topics: the alpha and beta priors, set using setDocConcentration (for example, setDocConcentration([0.1, 0.2])) and setTopicConcentration respectively, while a large learning_offset downweights early iterations. The extracted topics were visualized using Plotly, with the topic distribution shown as sets of bar charts in which each row corresponds to a topic. I would encourage you to try out the notebook and experiment with this pipeline by adjusting the hyperparameters, such as the number of topics, to see how it can work for you.
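A condensed sketch of such a pipeline in PySpark follows; the input data, column names, vocabulary size, and hyperparameter values are illustrative rather than taken from the original notebook.

```python
from pyspark.ml.feature import RegexTokenizer, CountVectorizer, IDF
from pyspark.ml.clustering import LDA

# Illustrative corpus: a DataFrame with a free-text column named "text".
docs = spark.createDataFrame(
    [(0, "spark makes distributed data processing simple"),
     (1, "topic models extract recurring themes from documents")],
    ["id", "text"])

tokens = RegexTokenizer(inputCol="text", outputCol="words",
                        pattern="\\W+").transform(docs)
cv_model = CountVectorizer(inputCol="words", outputCol="tf",
                           vocabSize=5000).fit(tokens)
tf = cv_model.transform(tokens)
tfidf = IDF(inputCol="tf", outputCol="features").fit(tf).transform(tf)

# k, the priors, and the learning offset are the hyperparameters discussed above.
lda = (LDA(k=3, maxIter=20, featuresCol="features")
       .setDocConcentration([0.1, 0.1, 0.1])  # alpha, one value per topic
       .setTopicConcentration(0.1)            # beta
       .setLearningOffset(1024.0))            # downweights early iterations

model = lda.fit(tfidf)
model.describeTopics(5).show(truncate=False)
```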