Friday, 4 August 2017

Moving Average Gretl


gretl_model_add_allocated_varnames() attaches an allocated set of variable names to be used when printing model results, for use in special cases where we cannot simply take the names from the list of regressors attached to the model. The number of strings must match the number of coefficients, given by the ncoeff member of pmod. Note that pmod takes ownership of the array vnames; it is freed when the model is destroyed. Parameters: a pointer to the target model, and an array of names of independent variables.

gretl_model_add_y_median() calculates the median of y using the valid observations within the model's sample range and attaches the median to the model as data under the key ymedian. Parameters: a pointer to the target model, and an array containing the dependent variable. Returns 0 on success or an error code on failure.

gretl_model_add_normality_test(), gretl_model_get_normality_test()

gretl_model_get_fitted_formula(): if pmod is a simple linear, quadratic or logistic model, and if xvar is in fact the x variable from the model, returns a string representing the formula for generating the fitted values as a function of x. This formula may be used in the context of a fitted versus actual plot.

Moving average. I will upload these questions in Word. The gretl and xls files that go with them will be there, and I will also upload material to help with this. Please save your work in the Excel files as well.

1. An ice cream shop owner wants to coordinate his inventory with his sales. He believes he could forecast his sales based on the day's temperature and on whether or not it is a weekend. He collects data from 40 different days. The data are stored in Ice Cream.gdt.
a) Plot the data using the appropriate graph (copy and paste it from Gretl).
b) Use gretl to estimate a linear regression model to explain ice cream sales. Show your final results (copy and paste them from Gretl).
c) Write out your final estimated regression model.
d) Why is this the best model you came up with?
e) Graph the residuals from your final model. What do they say about your model?
f) Use your model to forecast ice cream sales when the temperature is 28 degrees on a weekend.
g) Give a 95% confidence interval for your forecast.

2. The Ministry of Economy is interested in building a model to explain consumption behaviour among middle-class households. It takes a sample of 35 households and collects data on their consumption, income, wealth, size and age. The data are available in the file Consumption.gdt. The monetary data are in thousands of dollars.
a) Plot the data using the appropriate graph (copy and paste it from Gretl).
b) Use gretl to estimate a linear regression model to explain household consumption. Show your final results (copy and paste them from Gretl).
c) Write out your final estimated regression model.
d) Why is this the best model you came up with? (Explain)
e) Diagnose the model for multicollinearity.
f) Plot the residuals from your model. What do they say about your model?
g) Use your model to forecast consumption for the following two families: Family 1: income (50), wealth (10), size (5), age (46)

3. The file Morocco GDP.gdt contains data on Morocco's gross domestic product, agricultural value added, exports of goods and services, tourism receipts and remittances from Moroccans living abroad.
a) Use the appropriate graph to plot the variables.
(Copy and paste from Gretl.)
b) What should be done to these variables before we run a regression on them?
c) Estimate a linear regression model to explain Morocco's GDP.
d) Are the coefficients significant? Drop all insignificant coefficients and re-estimate the model. Show your final results (copy and paste them from Gretl).
e) Why is this the best model you came up with? (Explain)
f) Graph the residuals from your regression. What do they say about your model?

4. Google is one of the most prominent companies listed on the NASDAQ. The Excel file below shows its daily price from January to April 2015.
a) Use the appropriate graph to plot the price of Google stock (copy and paste it from Excel).
b) What kind of pattern does the stock display?
c) Use the different sheets in the Excel file to run the following univariate models: naive, and 5-day moving average.
d) Forecast the value of the stock on 1 May. (Save your work in the embedded Excel file.)
e) Which of the two models is better at forecasting this time series? (Use MSE.)

5. The Excel file below contains data on monthly tourist arrivals in Singapore from January 2009 to December 2014. As you can see from the time-series plot, the data are seasonal.
a) Develop a multiplicative decomposition model (Y = T × S × e) of the variable and use it to forecast tourist arrivals in each month of 2015. (Show all your work and save it in the embedded Excel file.)
b) Evaluate your model using R-squared and Theil's U statistics.

Gretl Command Reference

The options shown above and the discussion which follows pertain to the use of the adf command with regular time series data. For use of this command with panel data, see below.
Computes a set of Dickey–Fuller tests on each of the listed variables, the null hypothesis being that the variable in question has a unit root. (But if the --difference flag is given, the first difference of the variable is taken prior to testing, and the discussion below must be taken as referring to the transformed variable.)

By default, two variants of the test are shown: one based on a regression containing a constant and one using a constant and linear trend. You can control the variants that are presented by specifying one or more of the option flags.

The --gls option can be used in conjunction with one or other of the flags --c and --ct (the model with constant, or the model with constant and trend). The effect of this option is that the de-meaning or de-trending of the variable to be tested is done using the GLS procedure suggested by Elliott, Rothenberg and Stock (1996), which gives a test of greater power than the standard Dickey–Fuller approach. This option is not compatible with --nc, --ctt or --seasonals.

In all cases the dependent variable is the first difference of the specified variable, y, and the key independent variable is the first lag of y. The model is constructed so that the coefficient on lagged y equals the root in question minus 1. For example, the model with a constant may be written as

$$ (1 - L)\, y_t = \beta_0 + (\alpha - 1)\, y_{t-1} + \epsilon_t $$

Under the null hypothesis of a unit root the coefficient on lagged y equals zero; under the alternative that y is stationary this coefficient is negative.

If the order argument (henceforth, k) is greater than 0, then k lags of the dependent variable are included on the right-hand side of the test regressions. If the order is given as -1, k is set following the recommendation of Schwert (1989).
Namely, the integer part of $12 (T/100)^{0.25}$, where T is the sample size. In either case, however, if the --test-down option is given then k is taken as the maximum lag and the actual lag order used is obtained by testing down. The criterion for testing down can be selected using the option parameter, which should be one of AIC, BIC or tstat; AIC is the default.

When testing down via AIC or BIC, the final lag order for the ADF equation is that which optimizes the chosen information criterion (Akaike or Schwarz Bayesian). The exact procedure depends on whether or not the --gls option is given: when GLS detrending is specified, AIC and BIC are the "modified" versions described in Ng and Perron (2001); otherwise they are the standard versions. In the GLS case a refinement is available: if the additional option --perron-qu is given, the modified information criteria are computed according to the revised method recommended by Perron and Qu (2007).

When testing down via the t-statistic method, the procedure is as follows:

1. Estimate the Dickey–Fuller regression with k lags of the dependent variable.
2. Is the last lag significant? If so, execute the test with lag order k. Otherwise, let k = k - 1; if k equals 0, execute the test with lag order 0, else go to step 1.

In the context of step 2 above, "significant" means that the t-statistic for the last lag has an asymptotic two-sided p-value, against the normal distribution, of 0.10 or less.

P-values for the Dickey–Fuller tests are based on MacKinnon (1996). The relevant code is included by kind permission of the author. In the case of the test with linear trend using GLS these P-values are not applicable; critical values from Table 1 in Elliott, Rothenberg and Stock (1996) are shown instead.
Panel data

When the adf command is used with panel data, to produce a panel unit root test, the applicable options and the results shown are somewhat different. First, while you may give a list of variables for testing in the regular time-series case, with panel data only one variable may be tested per command. Second, the options governing the inclusion of deterministic terms become mutually exclusive: you must choose between no constant, constant only, and constant plus trend; the default is constant only. In addition, the --seasonals option is not available. Third, the --verbose option has a different meaning: it produces a brief account of the test for each individual time series (the default being to show only the overall result).

The overall test (null hypothesis: the series in question has a unit root for all the panel units) is calculated in one or both of two ways: using the method of Im, Pesaran and Shin (Journal of Econometrics, 2003) or that of Choi (Journal of International Money and Finance, 2001).

Menu path: /Variable/Unit root tests/Augmented Dickey-Fuller test

Options: see below for additional specialized options.

Opens a data file and appends the content to the current dataset, if the new data are compatible. The program will try to detect the format of the data file (native, plain text, CSV, Gnumeric, Excel, etc.).

The appended data may take the form of additional observations on series already present in the dataset, new series, or both. In the case of adding series, compatibility requires either (a) that the number of observations for the new data equals that for the current data, or (b) that the new data carry clear observation information so that gretl can work out how to place the values.
One case that is not supported is where the new data start earlier and also end later than the original data. To add new series in such a case you can use the --fix-sample option; this has the effect of suppressing the adding of observations, and so restricting the operation to the addition of new series.

A special feature is supported for appending to a panel dataset. Let n denote the number of cross-sectional units in the panel, T the number of time periods, and m the number of observations for the new data. If m = n the new data are taken to be time-invariant, and are copied into place for each time period. On the other hand, if m = T the data are treated as non-varying across the panel units, and are copied into place for each unit. If the panel is "square", and m equals both n and T, an ambiguity arises. The default in this case is to treat the new data as time-invariant, but you can force gretl to treat the new data as time series via the --time-series option. (This option is ignored in all other cases.)

When a data file is selected for appending, there may be an area of overlap with the existing dataset; that is, one or more series may have one or more observations in common across the two sources. If the option --update-overlap is given, the append operation will replace any overlapping observations with the values from the selected data file, otherwise the values currently in place are not affected.

The additional specialized options --sheet, --coloffset, --rowoffset and --fix-cols work in the same way as with open; see that command for explanations.

See also join for more sophisticated handling of multiple data sources.
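The panel rule above (m = n versus m = T) amounts to two different ways of replicating a short vector. The following Python sketch illustrates the idea only; it assumes the panel is stored as stacked time series (all periods for unit 1, then unit 2, and so on), and the function broadcast_to_panel and its as_time_series argument are hypothetical stand-ins for gretl's behaviour and its --time-series option.

```python
import numpy as np


def broadcast_to_panel(values, n_units, n_periods, as_time_series=False):
    """Replicate m new observations over an n_units x n_periods panel.

    Assumed layout: stacked time series, i.e. unit 1 periods 1..T,
    then unit 2 periods 1..T, and so on.
    """
    values = np.asarray(values, dtype=float)
    m = len(values)
    square_override = as_time_series and m == n_periods
    if m == n_units and not square_override:
        # Time-invariant data: one value per unit, copied to every period.
        return np.repeat(values, n_periods)
    if m == n_periods:
        # Data common to all units: one value per period, copied to every unit.
        return np.tile(values, n_units)
    raise ValueError("m must equal the number of units or the number of periods")


# Two units, three periods.
print(broadcast_to_panel([1, 2], 2, 3))        # unit-level values
print(broadcast_to_panel([10, 20, 30], 2, 3))  # period-level values
```

In the square case (m equal to both n and T) the default branch treats the data as time-invariant, and as_time_series=True switches to the per-period interpretation, mirroring the ambiguity rule described in the text.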
Menu path: /File/Append data

Option: --vcv (print covariance matrix)
Example: ar 1 3 4 ; y 0 x1 x2 x3

Computes parameter estimates using the generalized Cochrane–Orcutt iterative procedure; see Section 9.5 of Ramanathan (2002). Iteration is terminated when successive error sums of squares do not differ by more than 0.005 percent, or after 20 iterations.

lags is a list of lags in the residuals, terminated by a semicolon. In the above example, the error term is specified as

$$ u_t = \rho_1 u_{t-1} + \rho_3 u_{t-3} + \rho_4 u_{t-4} + e_t $$

Menu path: /Model/Time series/Autoregressive estimation

Options: --hilu (use Hildreth–Lu procedure), --pwe (use Prais–Winsten estimator), --vcv (print covariance matrix), --no-corc (do not fine-tune results with Cochrane–Orcutt), --loose (use looser convergence criterion)
Examples:
ar1 1 0 2 4 6 7
ar1 y 0 xlist --pwe
ar1 y 0 xlist --hilu --no-corc

Computes feasible GLS estimates for a model in which the error term is assumed to follow a first-order autoregressive process.

The default method is the Cochrane–Orcutt iterative procedure; see for example Section 9.4 of Ramanathan (2002). The criterion for convergence is that successive estimates of the autocorrelation coefficient do not differ by more than 1e-6, or if the --loose option is given, by more than 0.001. If this is not achieved within 100 iterations an error is flagged.

If the --pwe option is given, the Prais–Winsten estimator is used. This involves an iteration similar to Cochrane–Orcutt; the difference is that while Cochrane–Orcutt discards the first observation, Prais–Winsten makes use of it. See, for example, Chapter 13 of Greene (2000) for details.

If the --hilu option is given, the Hildreth–Lu search procedure is used. The results are then fine-tuned using the Cochrane–Orcutt method, unless the --no-corc flag is specified.
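To make the Cochrane–Orcutt iteration for the AR(1) case concrete, here is a minimal Python sketch. It is not gretl's code: it assumes X already contains a column of ones for the constant, uses NumPy least squares, and borrows only the convergence rule stated above (successive estimates of the autocorrelation coefficient within 1e-6, failure after 100 iterations). The function names are hypothetical.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def cochrane_orcutt(X, y, tol=1e-6, max_iter=100):
    """Iterative Cochrane-Orcutt estimation with AR(1) errors.

    Returns (beta, rho). The first observation is dropped by the
    quasi-differencing step, as in the Cochrane-Orcutt procedure.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = ols(X, y)
    rho = 0.0
    for _ in range(max_iter):
        u = y - X @ beta                                  # residuals in levels
        rho_new = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])    # AR(1) coefficient
        y_star = y[1:] - rho_new * y[:-1]                 # quasi-differenced data
        X_star = X[1:] - rho_new * X[:-1]
        beta = ols(X_star, y_star)
        if abs(rho_new - rho) <= tol:
            return beta, rho_new
        rho = rho_new
    raise RuntimeError("Cochrane-Orcutt did not converge")
```

Passing tol=0.001 would correspond to the looser criterion described for --loose. Prais–Winsten would differ by keeping a rescaled first observation rather than dropping it, which this sketch does not attempt.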
The --no-corc option is ignored for estimators other than Hildreth–Lu.

Menu path: /Model/Time series/AR(1)

Options: --quiet (don't show estimated model), --vcv (print covariance matrix), --two-step (perform 2-step GMM estimation), --time-dummies (add time dummy variables), --asymptotic (uncorrected asymptotic standard errors)
Examples:
arbond 2 ; y Dx1 Dx2
arbond 2 5 ; y Dx1 Dx2 ; Dx1
arbond 1 ; y Dx1 Dx2 ; Dx1 GMM(x2,2,3)

Carries out estimation of dynamic panel data models (that is, panel models including one or more lags of the dependent variable) using the GMM-DIF method set out by Arellano and Bond (1991). Please see dpanel for an updated and more flexible version of this command which handles GMM-SYS as well as GMM-DIF.

The parameter p represents the order of the autoregression for the dependent variable. The optional parameter q indicates the maximum lag of the level of the dependent variable to be used as an instrument. If this argument is omitted or given as 0, all available lags are used.

The dependent variable should be given in levels form; it will be automatically differenced (since this estimator uses differencing to cancel out the individual effects). The independent variables are not automatically differenced; if you want to use differences (which will generally be the case for ordinary quantitative variables, though perhaps not for, say, time dummy variables) you should create the differences first and then specify these as the regressors.

The last (optional) field in the command is for specifying instruments. If no instruments are given, it is assumed that all the independent variables are strictly exogenous. If you specify any instruments, you should include in the list any strictly exogenous independent variables.
For predetermined regressors, you can use the GMM function to include a specified range of lags in block-diagonal fashion. This is illustrated in the third example above. The first argument to GMM is the name of the variable in question, the second is the minimum lag to be used as an instrument, and the third is the maximum lag. If the third argument is given as 0, all available lags are used.

By default the results of 1-step estimation are reported (with robust standard errors). You may select 2-step estimation as an option. In both cases tests for autocorrelation of orders 1 and 2 are provided, as well as the Sargan overidentification test and a Wald test for the joint significance of the regressors. Note that in this differenced model first-order autocorrelation is not a threat to the validity of the model, but second-order autocorrelation violates the maintained statistical assumptions.

In the case of 2-step estimation, standard errors are by default computed using the finite-sample correction suggested by Windmeijer (2005). The standard asymptotic standard errors associated with the 2-step estimator are generally reckoned to be an unreliable guide to inference, but if for some reason you want to see them you can use the --asymptotic option to turn off the Windmeijer correction.

If the --time-dummies option is given, a set of time dummy variables is added to the specified regressors. The number of dummies is one less than the maximum number of periods used in estimation, to avoid perfect collinearity with the constant.
The dummies are entered in levels; if you wish to use time dummies in first-differenced form, you will have to define and add these variables manually.

Arguments: order ; depvar indepvars
Example: arima 0 1 1 ; 0 1 1 ; y --nc

If no indepvars list is given, estimates a univariate ARIMA (Autoregressive, Integrated, Moving Average) model. The values p, d and q represent the autoregressive (AR) order, the differencing order, and the moving average (MA) order respectively. These values may be given in numerical form, or as the names of pre-existing scalar variables. A d value of 1, for instance, means that the first difference of the dependent variable should be taken before estimating the ARMA parameters.

If you wish to include only specific AR or MA lags in the model (as opposed to all lags up to a given order) you can substitute for p and/or q either (a) the name of a pre-defined matrix containing a set of integer values or (b) an expression consisting of a set of lags separated by commas and enclosed in braces.

The optional integer values P, D and Q represent the seasonal AR order, the order for seasonal differencing, and the seasonal MA order, respectively. These are applicable only if the data have a frequency greater than 1 (for example, quarterly or monthly data). These orders may be given in numerical form or as scalar variables.

In the univariate case the default is to include an intercept in the model, but this can be suppressed with the --nc flag. If indepvars are added, the model becomes ARMAX; in this case the constant should be included explicitly if you want an intercept.
An alternative form of syntax is available for this command: if you do not want to apply differencing (either seasonal or non-seasonal), you may omit the d and D fields altogether, rather than explicitly entering 0. In addition, arma is a synonym or alias for arima. Thus, for example, a command of the form arma 2 1 ; y is a valid way to specify an ARMA(2, 1) model.

The default is to use the native gretl ARMA functionality, with estimation by exact ML using the Kalman filter; estimation via conditional ML is available as an option. (If X-12-ARIMA is installed you have the option of using it instead of native code.) For details regarding these options, see Chapter 25 of the Gretl User's Guide.

When the native exact ML code is used, estimated standard errors are by default based on a numerical approximation to the (negative inverse of the) Hessian, with a fallback to the outer product of the gradient (OPG) if calculation of the numerical Hessian should fail. Two (mutually exclusive) option flags can be used to force the issue: the --opg option forces use of the OPG method, with no attempt to compute the Hessian, while the --hessian flag disables the fallback to OPG. Note that failure of the numerical Hessian computation is generally an indicator of a misspecified model.

The --lbfgs option is specific to estimation using native ARMA code and exact ML: it calls for use of the limited-memory L-BFGS-B algorithm in place of the regular BFGS maximizer. This may help in some cases where convergence is difficult to achieve.

The --y-diff-only option is specific to estimation of ARIMAX models (models with a non-zero order of integration and including exogenous regressors), and applies only when gretl's native exact ML is used.
For such models the default behavior is to difference both the dependent variable and the regressors, but when this option is given only the dependent variable is differenced, the regressors remaining in level form.

The --save-ehat option is applicable only when using native exact ML estimation. The effect is to make available a vector holding the optimal estimate as of period t of the t-dated disturbance or innovation: this can be retrieved via the accessor $ehat. These values differ from the residual series ($uhat), which holds the one-step-ahead forecast errors.

The AIC value given in connection with ARIMA models is calculated according to the definition used in X-12-ARIMA, namely

$$ \mathrm{AIC} = -2\ell + 2k $$

where $\ell$ is the log-likelihood and k is the total number of parameters estimated. Note that X-12-ARIMA does not produce information criteria such as AIC when estimation is by conditional ML.

The AR and MA roots shown in connection with ARMA estimation are based on the following representation of an ARMA(p, q) process:

$$ (1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p)\, Y_t = c + (1 + \theta_1 L + \cdots + \theta_q L^q)\, \epsilon_t $$

The AR roots are therefore the solutions to

$$ 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p = 0 $$

and stability requires that these roots lie outside the unit circle.

The "frequency" figure printed in connection with the AR and MA roots is the $\lambda$ value that solves $z = r \exp(i 2\pi\lambda)$, where z is the root in question and r is its modulus.

Menu path: /Model/Time series/ARIMA
Other access: Main window pop-up menu (single selection)

Estimates a bivariate probit model, using the Newton–Raphson method to maximize the likelihood.

The argument list starts with the two (binary) dependent variables, followed by a list of regressors.
If a second list is given, separated by a semicolon, this is interpreted as a set of regressors specific to the second equation, with indepvars1 being specific to the first equation; otherwise indepvars1 is taken to represent a common set of regressors.

By default, standard errors are computed using a numerical approximation to the Hessian at convergence. But if the --opg option is given the covariance matrix is based on the Outer Product of the Gradient (OPG), or if the --robust option is given QML standard errors are calculated, using a "sandwich" of the inverse of the Hessian and the OPG.

After successful estimation, the accessor $uhat retrieves a matrix with two columns holding the generalized residuals for the two equations; that is, the expected values of the disturbances conditional on the observed outcomes and covariates. By default $yhat retrieves a matrix with four columns, holding the estimated probabilities of the four possible joint outcomes for (y1, y2), in the order (1,1), (1,0), (0,1), (0,0). Alternatively, if the option --save-xbeta is given, $yhat has two columns and holds the values of the index functions for the respective equations.

The output includes a likelihood ratio test of the null hypothesis that the disturbances in the two equations are uncorrelated.

Option: --output=filename (send output to specified file)

These plots display the distribution of a variable. The central box encloses the middle 50 percent of the data, i.e. it is bounded by the first and third quartiles. The "whiskers" extend from each end of the box for a range equal to 1.5 times the interquartile range. Observations outside that range are considered outliers and are represented by dots. A line is drawn across the box at the median.
A "+" sign is used to indicate the mean. If the option of showing a confidence interval for the median is selected, this is computed via the bootstrap method and shown in the form of dashed horizontal lines above and/or below the median.

The --factorized option allows you to examine the distribution of a chosen variable conditional on the value of some discrete factor. For example, if a data set contains wages and a gender dummy variable you can select the wage variable as target and gender as the factor, to see side-by-side boxplots of male and female wages. Note that in this case you must specify exactly two variables, with the factor given second.

If the current data set is a panel, and just one variable is specified, the --panel option produces a series of side-by-side boxplots, one for each panel "unit" or group.

Generally, the argument varlist is required, and refers to one or more series in the current dataset (given either by name or ID number). But if a named matrix is supplied via the --matrix option this argument becomes optional: by default a plot is drawn for each column of the specified matrix.

Gretl's boxplots are generated using gnuplot, and it is possible to specify the plot more fully by appending additional gnuplot commands, enclosed in braces. For details, please see the help for the gnuplot command.

In interactive mode the result is displayed immediately. In batch mode the default behavior is that a gnuplot command file is written in the user's working directory, with a name on the pattern gpttmpN.plt, starting with N = 01. The actual plots may be generated later using gnuplot (under MS Windows, wgnuplot). This behavior can be modified by use of the --output=filename option.
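The quantities that such a boxplot draws (quartiles, median, mean, the 1.5 × IQR whisker range and the outliers beyond it) are easy to compute directly. The Python sketch below is illustrative only: it uses NumPy's default linearly interpolated percentiles, which is an assumption and may not match the quantile rule gretl applies, and boxplot_summary is a hypothetical helper name.

```python
import numpy as np


def boxplot_summary(x):
    """Summary statistics behind a standard box-and-whisker plot."""
    x = np.asarray(x, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr       # whisker range below the box
    upper_fence = q3 + 1.5 * iqr       # whisker range above the box
    outliers = x[(x < lower_fence) | (x > upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": x.mean(),
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary["median"], summary["outliers"])
```

In this small sample the value 100 lies beyond the upper whisker range and would be drawn as an outlier, while the mean is pulled well above the median.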
For further information, see the gnuplot command.

Menu path: /View/Graph specified vars/Boxplots

Break out of a loop. This command can be used only within a loop; it causes command execution to break out of the current (innermost) loop. See also loop.

This is not a command in its own right but can be used as a prefix to most regular commands: the effect is to prevent termination of a script if an error occurs in executing the command. If an error does occur, this is registered in an internal error code which can be accessed as $error (a zero value indicates success). The value of $error should always be checked immediately after using catch, and appropriate action taken if the command failed.

The catch keyword cannot be used before if, elif or endif. In addition, it should not be used on calls to user-defined functions; it is intended for use only with gretl commands and calls to built-in functions or operators.

Example: chow female --dummy

Must follow an OLS regression. If an observation number or date is given, provides a test for the null hypothesis of no structural break at the given split point. The procedure is to create a dummy variable which equals 1 from the split point specified by obs to the end of the sample, 0 otherwise, and also interaction terms between this dummy and the original regressors.

If a dummy variable is given, tests the null hypothesis of structural homogeneity with respect to that dummy. Again, interaction terms are added. In either case an augmented regression is run including the additional terms.

By default an F statistic is calculated, taking the augmented regression as the unrestricted model and the original as restricted.
But if the original model used a robust estimator for the covariance matrix, the test statistic is a Wald chi-square value based on a robust estimator of the covariance matrix for the augmented regression.

The --limit-to option can be used to limit the set of interactions with the split dummy variable to a subset of the original regressors. The parameter for this option must be a named list, all of whose members are among the original regressors. The list should not include the constant.

Menu path: Model window, /Tests/Chow test

Option: --dataset (clear dataset only)

With no options, clears all saved objects, including the current dataset if any, out of memory. Note that opening a new dataset, or using the nulldata command to create an empty dataset, also has this effect, so use of clear is not normally necessary. If the --dataset option is given, then only the dataset is cleared (plus any named lists of series); other saved objects such as named matrices and scalars are preserved.

Example: coint2 3 y x1 x2 --rc

Carries out the Johansen test for cointegration among the variables in ylist for the given lag order. For details of this test see Chapter 27 of the Gretl User's Guide or Hamilton (1994), Chapter 20. P-values are computed via Doornik's gamma approximation (Doornik, 1998). Two sets of p-values are shown for the trace test: straight asymptotic values and values adjusted for the sample size. By default the $pvalue accessor gets the adjusted variant, but the --asy flag may be used to record the asymptotic values instead.

The inclusion of deterministic terms in the model is controlled by the option flags.
The default, if no option is specified, is to include an unrestricted constant, which allows for the presence of a non-zero intercept in the cointegrating relations as well as a trend in the levels of the endogenous variables. In the literature stemming from the work of Johansen (see, for example, his 1995 book) this is often referred to as case 3. The first four options given above, which are mutually exclusive, produce cases 1, 2, 4 and 5 respectively. The meaning of these cases and the criteria for selecting a case are explained in chapter 27 of the Gretl User's Guide. The optional lists xlist and rxlist allow you to control for specified exogenous variables: these enter the system either unrestrictedly (xlist) or restricted to the cointegration space (rxlist). These lists are separated from ylist and from each other by semicolons. The --seasonals option, which may be combined with any of the other options, specifies the inclusion of a set of centered seasonal dummy variables. This option is available only for quarterly or monthly data. The following table is offered as a guide to the interpretation of the results shown for the test, for the 3-variable case. H0 denotes the null hypothesis, H1 the alternative hypothesis, and c the number of cointegrating relations. See also the vecm command. Menu path: /Model/Time series/Cointegration test/Johansen

dataset

Performs various operations on the dataset as a whole, depending on the given keyword, which must be addobs, insobs, clear, compact, expand, transpose, sortby, dsortby, resample, renumber or pad-daily. Note: with the exception of clear, these actions are not available when the dataset is currently subsampled by selection of cases on some Boolean criterion.

addobs. Must be followed by a positive integer.
Adds the specified number of extra observations to the end of the working dataset. This is primarily intended for forecasting purposes. The values of most variables over the additional range are set to missing, but certain deterministic variables are recognized and extended, namely a simple linear trend and periodic dummy variables.

insobs. Must be followed by a positive integer no greater than the current number of observations. Inserts a single observation at the specified position. All subsequent data are shifted by one place and the dataset is extended by one observation. All variables apart from the constant are given missing values at the new observation. This action is not available for panel datasets.

clear. No parameter required. Clears out the current data, returning gretl to its initial empty state.

compact. Must be followed by a positive integer representing a new data frequency, which should be lower than the current frequency (for example, a value of 4 when the current frequency is 12 indicates compaction from monthly to quarterly). This command is available for time series data only; it compacts all the series in the dataset to the new frequency. A second parameter may be given, namely one of sum, first, last or spread, to specify, respectively, compaction using the sum of the higher-frequency values, start-of-period values, end-of-period values, or spreading of the higher-frequency values across multiple series (one per sub-period). The default is to compact by averaging.

expand. This command is available only for annual or quarterly time series data: annual data can be expanded to quarterly, and quarterly data to monthly frequency.
By default all the series in the dataset are padded out to the new frequency by repeating the existing values, but if the modifier interp is appended then the series are expanded using Chow–Lin interpolation (see Chow and Lin, 1971): the regressors are a constant and quadratic trend, and an AR(1) disturbance process is assumed.

transpose. No additional parameter required. Transposes the current dataset. That is, each observation (row) in the current dataset will be treated as a variable (column), and each variable as an observation. This command may be useful if data have been read from some external source in which the rows of the data table represent variables.

sortby. The name of a single series or list is required. If one series is given, the observations on all variables in the dataset are re-ordered by increasing value of the specified series. If a list is given, the sort proceeds hierarchically: if the observations are tied in sort order with respect to the first key variable then the second key is used to break the tie, and so on until the tie is broken or the keys are exhausted. Note that this command is available only for undated data.

dsortby. Works as sortby except that the re-ordering is by decreasing value of the key series.

resample. Constructs a new dataset by random sampling, with replacement, of the rows of the current dataset. One argument is required, namely the number of rows to include. This may be less than, equal to, or greater than the number of observations in the original data. The original dataset can be retrieved via the command smpl full.

renumber. Requires the name of an existing series followed by an integer between 1 and the number of series in the dataset minus one. Moves the specified series to the specified position in the dataset, renumbering the other series accordingly. (Position 0 is occupied by the constant, which cannot be moved.)

pad-daily.
Valid only if the current dataset contains dated daily data with an incomplete calendar. The effect is to pad the data out to a complete calendar by inserting blank rows (that is, rows containing nothing but NAs). This option requires an integer parameter, namely the number of days per week, which must be 5, 6 or 7, and must be greater than or equal to the current data frequency. On successful completion, the data calendar will be complete relative to this value. For example, if days-per-week is 5 then all weekdays will be represented, whether or not any data are available for those days.

debug

Experimental debugger for user-defined functions, available in the command-line program, gretlcli, and in the GUI console. The debug command should be invoked after the function in question is defined but before it is called. The effect is that execution pauses when the function is called and a special prompt is shown. At the debugging prompt you can type next to execute the next command in the function, or continue to allow execution of the function to continue unimpeded. These commands can be abbreviated as n and c respectively. You can also interpolate an instruction at this prompt, for example a print command to reveal the current value of some variable of interest.

difftest

Option: --verbose (print extra output)

Carries out a nonparametric test for a difference between two populations or groups, the specific test depending on the option selected. With the --sign option, the Sign test is performed. This test is based on the fact that if two samples, x and y, are drawn randomly from the same distribution, the probability that x_i > y_i, for each observation i, should equal 0.5. The test statistic is w, the number of observations for which x_i > y_i. Under the null hypothesis this follows the Binomial distribution with parameters (n, 0.5), where n is the number of observations. With the --rank-sum option, the Wilcoxon rank-sum test is performed.
This test proceeds by ranking the observations from both samples jointly, from smallest to largest, then finding the sum of the ranks of the observations from one of the samples. The two samples do not have to be of the same size, and if they differ the smaller sample is used in calculating the rank-sum. Under the null hypothesis that the samples are drawn from populations with the same median, the probability distribution of the rank-sum can be computed for any given sample sizes, and for reasonably large samples a close Normal approximation exists. With the --signed-rank option, the Wilcoxon signed-rank test is performed. This is designed for matched data pairs such as, for example, the values of a variable for a sample of individuals before and after some treatment. The test proceeds by finding the differences between the paired observations, x_i − y_i, ranking these differences by absolute value, then assigning to each pair a signed rank, the sign agreeing with the sign of the difference. One then calculates W, the sum of the positive signed ranks. As with the rank-sum test, this statistic has a well-defined distribution under the null that the median difference is zero, which converges to the Normal for samples of reasonable size. For the Wilcoxon tests, if the --verbose option is given then the ranking is printed. (This option has no effect if the Sign test is selected.)

dpanel

Carries out estimation of dynamic panel data models (that is, panel models including one or more lags of the dependent variable) using either the GMM-DIF or GMM-SYS method. The parameter p represents the order of the autoregression for the dependent variable. In the simplest case this is a scalar value, but a pre-defined matrix may be given for this argument, to specify a set of (possibly non-contiguous) lags to be used.
The dependent variable and regressors should be given in levels form; they will be differenced automatically (since this estimator uses differencing to cancel out the individual effects). The last (optional) field in the command is for specifying instruments. If no instruments are given, it is assumed that all the independent variables are strictly exogenous. If you specify any instruments, you should include in the list any strictly exogenous independent variables. For predetermined regressors, you can use the GMM function to include a specified range of lags in block-diagonal fashion. The first argument to GMM is the name of the variable in question, the second is the minimum lag to be used as an instrument, and the third is the maximum lag. The same syntax can be used with the GMMlevel function to specify GMM-type instruments for the equations in levels. By default the results of 1-step estimation are reported (with robust standard errors). You may select 2-step estimation as an option. In both cases tests for autocorrelation of orders 1 and 2 are provided, as well as the Sargan overidentification test and a Wald test for the joint significance of the regressors. Note that in this differenced model first-order autocorrelation is not a threat to the validity of the model, but second-order autocorrelation violates the maintained statistical assumptions. In the case of 2-step estimation, standard errors are by default computed using the finite-sample correction suggested by Windmeijer (2005). The standard asymptotic standard errors associated with the 2-step estimator are generally reckoned to be an unreliable guide to inference, but if for some reason you want to see them you can use the --asymptotic option to turn off the Windmeijer correction. If the --time-dummies option is given, a set of time dummy variables is added to the specified regressors.
The number of dummies is one less than the maximum number of periods used in estimation, to avoid perfect collinearity with the constant. The dummies are entered in differenced form unless the --dpdstyle option is given, in which case they are entered in levels. For further details and examples, please see chapter 19 of the Gretl User's Guide. Menu path: /Model/Panel/Dynamic panel model

dummify

Options:
--drop-first (omit lowest value from encoding)
--drop-last (omit highest value from encoding)

For any suitable variables in varlist, creates a set of dummy variables coding for the distinct values of that variable. Suitable variables are those that have been explicitly marked as discrete, or those that take on a fairly small number of values, all of which are fairly round (multiples of 0.25). By default a dummy variable is added for each distinct value of the variable in question. For example, if a discrete variable x has 5 distinct values, 5 dummy variables will be added to the dataset, with names Dx_1, Dx_2 and so on. The first dummy variable will have value 1 for observations where x takes on its smallest value, 0 otherwise; the next dummy will have value 1 when x takes on its second-smallest value, and so on. If one of the option flags --drop-first or --drop-last is added, then either the lowest or the highest value of each variable is omitted from the encoding (which may be useful for avoiding the dummy variable trap). This command can also be embedded in the context of a regression specification. For example, ols y dummify(x) specifies a model where y is regressed on the set of dummy variables coding for x. (Option flags cannot be passed to dummify in this context.)
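The encoding that dummify performs can be illustrated with a short Python sketch. This is a rough model of the behavior described above, not gretl's implementation: the function is a hypothetical stand-in, it keys each dummy by the value it codes for rather than by gretl's variable names, and it does not check whether the variable is "suitable".

```python
def dummify(values, drop_first=False, drop_last=False):
    """Return one 0/1 dummy per distinct value, keyed by that value.

    Hypothetical illustration only: drop_first omits the lowest value
    from the encoding, drop_last omits the highest.
    """
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]
    elif drop_last:
        levels = levels[:-1]
    return {lv: [1 if v == lv else 0 for v in values] for lv in levels}


x = [3, 1, 2, 1, 3]
full = dummify(x)                      # one dummy per distinct value
reduced = dummify(x, drop_first=True)  # lowest value omitted
print(full)
print(reduced)
```

Dropping one category is what avoids the dummy variable trap when the regression also contains a constant, since the full set of dummies sums to one at every observation.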
Other access: Main window pop-up menu (single selection)

duration

Arguments: depvar indepvars [ ; censvar ]

Options:
--exponential (use exponential distribution)
--loglogistic (use log-logistic distribution)
--lognormal (use log-normal distribution)
--medians (fitted values are medians)
--robust (robust (QML) standard errors)
--cluster=clustvar (see logit for explanation)
--vcv (print covariance matrix)
--verbose (print details of iterations)

Examples:
duration y 0 x1 x2
duration y 0 x1 x2 ; cens

Estimates a duration model: the dependent variable (which must be positive) represents the duration of some state of affairs, for example the length of spells of unemployment for a cross-section of respondents. By default the Weibull distribution is used, but the exponential, log-logistic and log-normal distributions are also available. If some of the duration measurements are right-censored (e.g. an individual's spell of unemployment has not come to an end within the period of observation) then you should supply the trailing argument censvar, a series in which non-zero values indicate right-censored cases. By default the fitted values obtained via the accessor $yhat are the conditional means of the durations, but if the --medians option is given then $yhat provides the conditional medians instead. Please see chapter 32 of the Gretl User's Guide for details. Menu path: /Model/Limited dependent variable/Duration data

else

See if. Note that else requires a line to itself, before the following conditional command. You can append a comment to the else line, but you cannot append a command.

end

Ends a block of commands of some sort. For example, end system terminates an equation system.

estimate

Example: estimate Sys1 method=sur --iterate

Calls for estimation of a system of equations, which must have been previously defined using the system command. The name of the system should be given first, surrounded by double quotes if the name contains spaces. The estimator, which must be one of ols, tsls, sur, 3sls, fiml or liml, is preceded by the string method=.
These arguments are optional if the system in question has already been estimated and occupies the place of the last model; in that case the estimator defaults to the previously used value. If the system in question has had a set of restrictions applied (see the restrict command), estimation will be subject to the specified restrictions. If the estimation method is sur or 3sls and the --iterate flag is given, the estimator will be iterated. In the case of SUR, if the procedure converges the results are maximum likelihood estimates. Iteration of three-stage least squares, however, does not in general converge on the full-information maximum likelihood results. The --iterate flag is ignored for other methods of estimation. If the equation-by-equation estimators ols or tsls are chosen, the default is to apply a degrees of freedom correction when calculating standard errors. This can be suppressed using the --no-df-corr flag. This flag has no effect with the other estimators; no degrees of freedom correction is applied in any case. By default, the formula used in calculating the elements of the cross-equation covariance matrix is

\[ \hat{\sigma}_{ij} = \frac{\hat{u}_i' \hat{u}_j}{T} \]

where \hat{u}_i is the residual vector from equation i and T is the number of observations. If the --geomean flag is given, a degrees of freedom correction is applied: the formula is

\[ \hat{\sigma}_{ij} = \frac{\hat{u}_i' \hat{u}_j}{\sqrt{(T - k_i)(T - k_j)}} \]

where the k's denote the number of independent parameters in each equation. If the --verbose option is given and an iterative method is specified, details of the iterations are printed.

fcast

Example: fcast 2004:1 2008:3 4 rfcast --rolling

Must follow an estimation command. Forecasts are generated for a certain range of observations: if startobs and endobs are given, for that range (if possible); otherwise, if the --out-of-sample option is given, for observations following the range over which the model was estimated; otherwise over the currently defined sample range. If an out-of-sample forecast is requested but no relevant observations are available, an error is flagged. Depending on the nature of the model, standard errors may also be generated; see below.
Also see below for the special effect of the --rolling option. If the last model estimated is a single equation, then the optional vname argument has the following effect: the forecast values are not printed, but are saved to the dataset under the given name. If the last model is a system of equations, vname has a different effect, namely selecting a particular endogenous variable for forecasting (the default being to produce forecasts for all the endogenous variables). In the system case, or if vname is not given, the forecast values can be retrieved using the accessor $fcast, and the standard errors, if available, via $fcerr. The choice between a static and a dynamic forecast applies only in the case of dynamic models, with an autoregressive error process or including one or more lagged values of the dependent variable as regressors. Static forecasts are one step ahead, based on realized values from the previous period, while dynamic forecasts employ the chain rule of forecasting. For example, if a forecast for y in 2008 requires as input a value of y for 2007, a static forecast is impossible without actual data for 2007. A dynamic forecast for 2008 is possible if a prior forecast can be substituted for y in 2007. The default is to give a static forecast for any portion of the forecast range that lies within the sample range over which the model was estimated, and a dynamic forecast (if relevant) out of sample. The --dynamic option requests a dynamic forecast from the earliest possible date, and the --static option requests a static forecast even out of sample. The --rolling option is presently available only for single-equation models estimated via OLS. When this option is given the forecasts are recursive. That is, each forecast is generated from an estimate of the given model using data from a fixed starting point (namely, the start of the sample range for the original estimation) up to the forecast date minus k,
where k is the number of steps ahead, which must be given in the steps-ahead argument. The forecasts are always dynamic if this is applicable. Note that the steps-ahead argument should be given only in conjunction with the --rolling option. The --plot option (available only in the case of single-equation estimation) calls for a plot file to be produced, containing a graphical representation of the forecast. The suffix of the filename argument to this option controls the format of the plot: eps for EPS, pdf for PDF, png for PNG, plt for a gnuplot command file. The dummy filename display can be used to force display of the plot in a window. For example, a filename ending in .pdf will generate a graphic in PDF format. Absolute pathnames are respected; otherwise files are written to the gretl working directory. The nature of the forecast standard errors (if available) depends on the nature of the model and the forecast. For static linear models, standard errors are computed using the method outlined by Davidson and MacKinnon (2004); they incorporate both uncertainty due to the error process and parameter uncertainty (summarized in the covariance matrix of the parameter estimates). For dynamic models, forecast standard errors are computed only in the case of a dynamic forecast, and they do not incorporate parameter uncertainty. For nonlinear models, forecast standard errors are not presently available. Menu path: Model window, /Analysis/Forecasts

flush

This simple command (no arguments, no options) is intended for use in time-consuming scripts that may be executed via the gretl GUI (it is ignored by the command-line program), to give the user a visual indication that things are moving along and gretl is not frozen. Ordinarily, if you launch a script in the GUI no output is shown until its execution is completed, but the effect of invoking flush is as follows: on the first invocation, gretl opens a window, displays the output so far, and appends the message "Processing...".
On subsequent invocations the text shown in the output window is updated, and a new processing message is appended. When execution of the script is completed any remaining output is automatically flushed to the text window. Please note that there is no point in using flush in scripts that take less than (say) 5 seconds to execute. Also note that this command should not be used at a point in the script where there is no further output to be printed, as the processing message will then be misleading to the user. The intended use is to call flush periodically inside a time-consuming loop, after some output has been printed.

freq

Example: freq x --min=0 --binwidth=0.10

With no options given, displays the frequency distribution for the series var (given by name or number), with the number of bins and their size chosen automatically. If the --matrix option is given, var (which must be an integer) is instead interpreted as a 1-based index that selects a column from the named matrix. If the matrix in question is in fact a column vector, the var argument may be omitted. To control the presentation of the distribution you may specify either the number of bins or the minimum value plus the width of the bins; the example above does the latter. The --min option sets the lower limit of the left-most bin. If the --normal option is given, the Doornik–Hansen chi-square test for normality is computed. If the --gamma option is given, the test for normality is replaced by Locke's nonparametric test for the null hypothesis that the variable follows the gamma distribution; see Locke (1976), Shapiro and Chen (2001). Note that the parameterization of the gamma distribution used in gretl is (shape, scale). By default, if the program is not in batch mode a plot of the distribution is shown. This can be adjusted via the --plot option. The acceptable parameters to this option are none (to suppress the plot), display (to display a plot even when in batch mode), or a file name.
The effect of providing a file name is as described for the --output option of the gnuplot command. The --silent flag suppresses the usual text output. This might be used in conjunction with one or other of the distribution test options: the test statistic and its p-value are recorded, and can be retrieved using the accessors $test and $pvalue. It might also be used along with the --plot option if you just want a histogram and don't care to see the accompanying text. Menu path: /Variable/Frequency distribution

garch

Example: garch 1 1 ; y 0 x1 x2 --robust

Estimates a GARCH model (GARCH = Generalized Autoregressive Conditional Heteroskedasticity), either a univariate model or, if indepvars are specified, including the given exogenous variables. The integer values p and q (which may be given in numerical form or as the names of pre-existing scalar variables) represent the lag orders in the conditional variance equation:

\[ h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j h_{t-j} \]

The parameter p therefore represents the Generalized (or AR) order, while q represents the regular ARCH (or MA) order. If p is non-zero, q must also be non-zero, otherwise the model is unidentified. However, you can estimate a regular ARCH model by setting q to a positive value and p to zero. The sum of p and q must be no greater than 5. Note that a constant is automatically included in the mean equation unless the --nc option is given. By default native gretl code is used in estimation of GARCH models, but you also have the option of using the algorithm of Fiorentini, Calzolari and Panattoni (1996). The former uses the BFGS maximizer while the latter uses the information matrix to maximize the likelihood, with fine-tuning via the Hessian. Several variant estimators of the covariance matrix are available with this command. By default, the Hessian is used unless the --robust option is given, in which case the QML (White) covariance matrix is used. Other possibilities (e.g.
the information matrix, or the Bollerslev–Wooldridge estimator) can be specified using the set command. By default, the estimates of the variance parameters are initialized using the unconditional error variance from initial OLS estimation for the constant, and small positive values for the coefficients on the past values of the squared error and the error variance. The flag --arma-init calls for the starting values of these parameters to be set using an initial ARMA model, exploiting the relationship between GARCH and ARMA set out in Chapter 21 of Hamilton's Time Series Analysis. In some cases this may improve the chances of convergence. The GARCH residuals and estimated conditional variance can be retrieved as $uhat and $h respectively. For example, the conditional variance can be saved by assigning $h to a series. If the --stdresid option is given, the $uhat values are divided by the square root of h_t. Menu path: /Model/Time series/GARCH

genr

NOTE: this command has undergone numerous changes and enhancements since the following help text was written, so for comprehensive and updated information on this command you'll want to refer to chapter 9 of the Gretl User's Guide. On the other hand, this help does not contain anything actually erroneous, so take the following as "you have this, plus more". In the appropriate context, series, scalar, matrix, string and bundle are synonyms for this command. Creates new variables, often via transformations of existing variables. See also diff, logs, lags, ldiff, sdiff and square for shortcuts. In the context of a genr formula, existing variables must be referenced by name, not ID number. The formula should be a well-formed combination of variable names, constants, operators and functions (described below). Note that further details on some aspects of this command can be found in chapter 9 of the Gretl User's Guide. A genr command may yield either a series or a scalar result.
For example, the formula x2 = x * 2 naturally yields a series if the variable x is a series and a scalar if x is a scalar. The formulae x = 0 and mx = mean(x) naturally return scalars. Under some circumstances you may want to have a scalar result expanded into a series or vector. You can do this by using series as an alias for the genr command. For example, series x = 0 produces a series all of whose values are set to 0. You can also use scalar as an alias for genr. It is not possible to coerce a vector result into a scalar, but use of this keyword indicates that the result should be a scalar: if it is not, an error occurs. When a formula yields a series result, the range over which the result is written to the target variable depends on the current sample setting. It is possible, therefore, to define a series piecewise using the smpl command in conjunction with genr. Supported arithmetical operators are, in order of precedence: ^ (exponentiation); *, / and % (modulus or remainder); and + and -. The available Boolean operators are (again, in order of precedence): ! (negation), && (logical AND), || (logical OR), >, <, ==, >= (greater than or equal), <= (less than or equal) and != (not equal). The Boolean operators can be used in constructing dummy variables: for instance (x > 10) returns 1 if x > 10, 0 otherwise. Built-in constants are pi and NA. The latter is the missing value code: you can initialize a variable to the missing value with scalar x = NA. The genr command supports a wide range of mathematical and statistical functions, including all the common ones plus several that are special to econometrics. In addition it offers access to numerous internal variables that are defined in the course of running regressions, doing hypothesis tests, and so on. For a listing of functions and accessors, see the Gretl Function Reference. Besides the operators and functions noted above there are some special uses of genr. genr time creates a time trend variable (1, 2, 3, ...) called time.
genr index does the same thing except that the variable is called index. genr dummy creates dummy variables up to the periodicity of the data. In the case of quarterly data (periodicity 4), the program creates dq1 = 1 for the first quarter and 0 in other quarters, dq2 = 1 for the second quarter and 0 in other quarters, and so on. With monthly data the dummies are named dm1, dm2, and so on. With other frequencies the names are dummy_1, dummy_2, etc. genr unitdum and genr timedum create sets of special dummy variables for use with panel data. The first codes for the cross-sectional units and the second for the time period of the observations.

Note: In the command-line program, genr commands that retrieve model-related data always reference the model that was estimated most recently. This is also true in the GUI program, if one uses genr in the gretl console or enters a formula using the Define new variable option under the Add menu in the main window. With the GUI, however, you have the option of retrieving data from any model currently displayed in a window (whether or not it's the most recent model). You do this under the Save menu in the model's window.

The special variable obs serves as an index of the observations. For instance, series dum = (obs==15) will generate a dummy variable that has value 1 for observation 15, 0 otherwise. You can also use this variable to pick out particular observations by date or name. For example, series d = (obs==1986:4), series d = (obs=="2008-04-01"), or series d = (obs=="CA"). If daily dates or observation labels are used in this context, they should be enclosed in double quotes. Quarterly and monthly dates (with a colon) may be used unquoted. Note that in the case of annual time series data, the year is not distinguishable syntactically from a plain integer; therefore, if you wish to compare observations against obs by year you must use the function obsnum to convert the year to a 1-based index value, as in series d = (obs==obsnum(1986)).
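The periodic dummies that genr dummy creates for quarterly data can be mimicked with a small Python sketch. It is an illustration under stated assumptions rather than gretl code: the function name and its start_period argument are hypothetical, and the dq prefix simply follows the quarterly naming given above.

```python
def periodic_dummies(n_obs, periodicity, start_period=1, prefix="dq"):
    """Build one 0/1 dummy per sub-period (hypothetical helper).

    n_obs        : number of observations in the sample
    periodicity  : sub-periods per cycle, e.g. 4 for quarterly data
    start_period : sub-period of the first observation (1-based)
    """
    dummies = {}
    for p in range(1, periodicity + 1):
        dummies[f"{prefix}{p}"] = [
            # the sub-period of observation t cycles through 1..periodicity
            1 if (start_period - 1 + t) % periodicity + 1 == p else 0
            for t in range(n_obs)
        ]
    return dummies


# Six quarterly observations starting in the first quarter
d = periodic_dummies(6, 4)
for name, series in d.items():
    print(name, series)
```

At every observation exactly one of the dummies equals 1, which is why the full set cannot be combined with a constant in a regression.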
Scalar values can be pulled from a series in the context of a genr formula, using the syntax varname[obs]. The obs value can be given by number or date. Examples: x[5], CPI[1996:01]. For daily data, the form YYYY-MM-DD should be used, e.g. ibm[1970-01-23]. An individual observation in a series can be modified via genr. To do this, a valid observation number or date, in square brackets, must be appended to the name of the variable on the left-hand side of the formula. For example, genr x[3] = 30 or genr x[1950:04] = 303.7. Menu path: /Add/Define new variable. Other access: Main window pop-up menu

gmm

Options:
--two-step (two step estimation)
--lbfgs (use L-BFGS-B instead of regular BFGS)

Performs Generalized Method of Moments (GMM) estimation using the BFGS (Broyden, Fletcher, Goldfarb, Shanno) algorithm. You must specify one or more commands for updating the relevant quantities (typically GMM residuals), one or more sets of orthogonality conditions, an initial matrix of weights, and a listing of the parameters to be estimated, all enclosed between the tags gmm and end gmm. Any options should be appended to the end gmm line. Please see chapter 22 of the Gretl User's Guide for details on this command. Here we just illustrate with a simple example. In the example we assume that y and X are data matrices, b is an appropriately sized vector of parameter values, W is a matrix of instruments, and V is a suitable matrix of weights. The statement orthog e ; W indicates that the residual vector e is in principle orthogonal to each of the instruments composing the columns of W.

Parameter names. In estimating a nonlinear model it is often convenient to name the parameters tersely. In printing the results, however, it may be desirable to use more informative labels. This can be achieved via the additional keyword param_names within the command block.
For a model with k parameters the argument following this keyword should be either a double-quoted string literal holding k space-separated names or the name of a string variable that holds k such names.

Menu path: /Model/GMM

Example: gnuplot y1 y2 x --with-lines=y2

The variables in the list yvars are graphed against xvar. For a time series plot you may either give time as xvar or use the option flag --time-series.

By default, data points are shown as points; this can be overridden by giving one of the options --with-lines, --with-lp or --with-impulses. If more than one variable is to be plotted on the y axis, the effect of these options may be confined to a subset of the variables by using the varspec parameter. This should take the form of a comma-separated listing of the names or numbers of the variables to be plotted with lines or impulses respectively. For instance, the example above shows how to plot y1 and y2 against x, such that y2 is represented by a line but y1 by points.

If the --dummy option is selected, exactly three variables should be given: a single y variable, an x variable, and dvar, a discrete variable. The effect is to plot yvar against xvar with the points shown in different colors depending on the value of dvar at the given observation.

Taking data from a matrix

Generally, the arguments yvars and xvar are required, and refer to series in the current dataset (given either by name or ID number). But if a named matrix is supplied via the --matrix option these arguments become optional: if the specified matrix has k columns, by default the first k - 1 columns are treated as the yvars and the last column as xvar. If the --time-series option is given, however, all k columns are plotted against time. If you wish to plot selected columns of the matrix, you should specify yvars and xvar in the form of 1-based column numbers.
For example, if you want a scatterplot of column 2 of matrix M against column 1, you can do: gnuplot 2 1 --matrix=M

Showing a line of best fit

The --fit option is applicable only for bivariate scatterplots and single time-series plots. The default behavior for a scatterplot is to show the OLS fit if the slope coefficient is significant at the 10 percent level, while the default behavior for time series is not to show any fitted line. You can call for different behavior by using this option along with one of the following fitspec parameter values. Note that if the plot is a single time series the place of x is taken by time.

linear: show the OLS fit regardless of its level of statistical significance.
none: don't show any fitted line.
inverse, quadratic, cubic, semilog or linlog: show a fitted line based on a regression of the specified type. By semilog we mean a regression of log y on x; the fitted line represents the conditional expectation of y, obtained by exponentiation. By linlog we mean a regression of y on the log of x.
loess: show the fit from a robust locally weighted regression (sometimes also known as lowess).

Plotting a band

The --band option can be used for plotting zero or more series along with a band of some sort (typically representing a confidence interval). This option requires two comma-separated parameters: the name or ID number of a series representing the center of the band, and the name or ID of a series giving the width of the band. The effect is to draw a band with y coordinates equal to center minus width and center plus width. An optional third parameter (again, comma-separated) can be used to give a multiplier for the width dimension, in the form of a numerical constant or the name of a scalar variable. So, for example, the option --band=y,sey,1.96 draws a band of plus or minus 1.96 times sey around y.

When the --band option is given, the companion option --band-style can be used to control the band's representation.
By default the upper and lower limits are shown as solid lines, but the parameters fill, dash or bars cause the band to be drawn as a shaded area, using dashed lines, or using error bars, respectively. In addition a color specification can be appended (following a comma) or substituted. For example: fill alone produces a shaded area in the default color; dash with a blue-gray color switches to dashed lines in that color; black on its own gives solid black lines; and bars with blue shows blue bars. Note that colors can be given either as hexadecimal RGB values or by name. You can access the list of color names recognized by gnuplot by issuing the command show colornames in gnuplot itself; the list can also be displayed from the gretl console.

Controlling the output

In interactive mode the plot is displayed immediately. In batch mode the default behavior is that a gnuplot command file is written in the user's working directory, with a name on the pattern gpttmpN.plt, starting with N = 01. The actual plots may be generated later using gnuplot (under MS Windows, wgnuplot). This behavior can be modified by use of the --output=filename option. This option controls the filename used, and at the same time allows you to specify a particular output format via the three-letter extension of the file name, as follows: .eps results in the production of an Encapsulated PostScript (EPS) file; .pdf produces PDF; .png produces PNG format; .emf calls for EMF (Enhanced MetaFile); .fig calls for an Xfig file; and .svg for SVG (Scalable Vector Graphics). If the dummy filename display is given then the plot is shown on screen as in interactive mode. If a filename with any extension other than those just mentioned is given, a gnuplot command file is written.
Adding gnuplot commands

A further option to this command is available: following the specification of the variables to be plotted and the option flag (if any), you may add literal gnuplot commands to control the appearance of the plot (for example, setting the plot title and/or the axis ranges). These commands should be enclosed in braces, and each gnuplot command must be terminated with a semicolon. A backslash may be used to continue a set of gnuplot commands over more than one line. Here is an example of the syntax: { set title 'My Title'; set yrange [0:1000]; }

Menu path: /View/Graph specified vars
Other access: Main window pop-up menu, graph button on toolbar

Option: --output=filename

The session graph page will work only if you have the LaTeX typesetting system installed, and are able to generate and view PDF or PostScript output.

In the session icon window, you can drag up to eight graphs onto the graph page icon. When you double-click on the graph page (or right-click and select Display), a page containing the selected graphs will be composed and opened in a suitable viewer. From there you should be able to print the page.

To clear the graph page, right-click on its icon and select Clear.

Note that on systems other than MS Windows, you may have to adjust the setting for the program used to view PDF or PostScript files. Find that under the Programs tab in the gretl Preferences dialog box (under the Tools menu in the main window).

It is also possible to operate on the graph page via script, or using the console (in the GUI program). The following commands and options are supported.

To add a graph to the graph page, issue the command graphpg add after saving a named graph, as in

    grf1 <- gnuplot Y X
    graphpg add

To display the graph page: graphpg show.

To clear the graph page: graphpg free.

To adjust the scale of the font used in the graph page, use graphpg fontscale scale, where scale is a multiplier (with a default of 1.0).
Thus to make the font size 50 percent bigger than the default you can do graphpg fontscale 1.5

To call for printing of the graph page to file, use the flag --output= plus a filename; the filename should have the suffix .pdf, .ps or .eps. For example: graphpg --output="myfile.pdf"

The output file will be written in the currently set workdir, unless the filename string contains a full path specification.

In this context the output uses colored lines by default; to use dot/dash patterns instead of colors you can append the --monochrome flag.

This test is available only after estimating an OLS model using panel data (see also setobs). It tests the simple pooled model against the principal alternatives, the fixed effects and random effects models.

The fixed effects model allows the intercept of the regression to vary across the cross-sectional units. An F-test is reported for the null hypothesis that the intercepts do not differ. The random effects model decomposes the residual variance into two parts, one part specific to the cross-sectional unit and the other specific to the particular observation. (This estimator can be computed only if the number of cross-sectional units in the data set exceeds the number of parameters to be estimated.) The Breusch-Pagan LM statistic tests the null hypothesis that the pooled OLS estimator is adequate against the random effects alternative.

The pooled OLS model may be rejected against both of the alternatives, fixed effects and random effects. Provided the unit- or group-specific error is uncorrelated with the independent variables, the random effects estimator is more efficient than the fixed effects estimator; otherwise the random effects estimator is inconsistent and the fixed effects estimator is to be preferred. The null hypothesis for the Hausman test is that the group-specific error is not so correlated (and therefore the random effects model is preferable). A low p-value for this test counts against the random effects model and in favor of fixed effects.
Menu path: Model window, /Tests/Panel diagnostics

Arguments: depvar indepvars selection equation

Option: --func (select functions help)

If no arguments are given, prints a list of available commands. If the single argument functions is given, prints a list of available functions (see genr).

help command describes command (e.g. help smpl). help function describes function (e.g. help ldet). Some functions have the same names as related commands (e.g. diff): in that case the default is to print help for the command, but you can get help on the function by using the --func option.

Options: --with-lines (plot with lines), --time-series (put time on x-axis), --output=filename (send output to specified file)

Provides a means of plotting a high-frequency series, possibly along with one or more series observed at the base frequency of the dataset. The first argument should be a MIDAS list; the optional additional lflist terms, following a semicolon, should be regular (low-frequency) series.

For details on the effect of the --output option, please see the gnuplot command.

Options: --no-squares (see below), --vcv (print covariance matrix)

This command is applicable where heteroskedasticity is present in the form of an unknown function of the regressors which can be approximated by a quadratic relationship. In that context it offers the possibility of consistent standard errors and more efficient parameter estimates as compared with OLS.

The procedure involves (a) OLS estimation of the model of interest, followed by (b) an auxiliary regression to generate an estimate of the error variance, then finally (c) weighted least squares, using as weight the reciprocal of the estimated variance.

In the auxiliary regression (b) we regress the log of the squared residuals from the first OLS on the original regressors and their squares (by default), or just on the original regressors (if the --no-squares option is given). The log transformation is performed to ensure that the estimated variances are all non-negative.
Call the fitted values from this regression u*. The weight series for the final WLS is then formed as 1/exp(u*).

Menu path: /Model/Other linear models/Heteroskedasticity corrected

Option: --plot=mode-or-filename (see below)

Calculates the Hurst exponent (a measure of persistence or long memory) for a time-series variable having at least 128 observations.

The Hurst exponent is discussed by Mandelbrot (1983). In theoretical terms it is the exponent, H, in the relationship

    RS(x) = a n^H

where RS is the rescaled range of the variable x in samples of size n and a is a constant. The rescaled range is the range (maximum minus minimum) of the cumulated value or partial sum of x over the sample period (after subtraction of the sample mean), divided by the sample standard deviation.

As a reference point, if x is white noise (zero mean, zero persistence) then the range of its cumulated wandering (which forms a random walk), scaled by the standard deviation, grows as the square root of the sample size, giving an expected Hurst exponent of 0.5. Values of the exponent significantly in excess of 0.5 indicate persistence, and values less than 0.5 indicate anti-persistence (negative autocorrelation). In principle the exponent is bounded by 0 and 1, although in finite samples it is possible to get an estimated exponent greater than 1.

In gretl, the exponent is estimated using binary sub-sampling: we start with the entire data range, then the two halves of the range, then the four quarters, and so on. For sample sizes smaller than the data range, the RS value is the mean across the available samples. The exponent is then estimated as the slope coefficient in a regression of the log of RS on the log of sample size.

By default, if the program is not in batch mode a plot of the rescaled range is shown. This can be adjusted via the --plot option. The acceptable parameters to this option are none (to suppress the plot), display (to display a plot even when in batch mode), or a file name.
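The binary sub-sampling estimator of the Hurst exponent described above can be sketched as follows. This is a rough Python illustration, not gretl's code: the function names, the smallest sub-sample size of 8, and the use of the population standard deviation (divisor n) are all assumptions made for the example.

```python
import math


def rescaled_range(x):
    """Range of the cumulated deviations from the mean, divided by the s.d."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)  # divisor n: assumed
    if sd == 0.0:
        raise ValueError("constant sample: rescaled range is undefined")
    partial, lo, hi = 0.0, float("inf"), float("-inf")
    for v in x:
        partial += v - mean          # running partial sum of deviations
        lo = min(lo, partial)
        hi = max(hi, partial)
    return (hi - lo) / sd


def hurst_exponent(x, min_size=8):
    """Slope of log(RS) on log(sample size) under binary sub-sampling."""
    n = len(x)
    if n < 128:
        raise ValueError("at least 128 observations are required")
    log_size, log_rs = [], []
    size = n
    while size >= min_size:          # min_size is an assumed cut-off
        blocks = [x[i * size:(i + 1) * size] for i in range(n // size)]
        rs_mean = sum(rescaled_range(b) for b in blocks) / len(blocks)
        log_size.append(math.log(size))
        log_rs.append(math.log(rs_mean))
        size //= 2                   # whole range, halves, quarters, ...
    mx = sum(log_size) / len(log_size)
    my = sum(log_rs) / len(log_rs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(log_size, log_rs))
    sxx = sum((a - mx) ** 2 for a in log_size)
    return sxy / sxx                 # OLS slope = estimated exponent
```

Applied to simulated white noise the estimate should land in the neighborhood of 0.5, while a cumulated (random walk) series should give a clearly larger value.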
The effect of providing a file name is as described for the --output option of the gnuplot command.

Menu path: /Variable/Hurst exponent

Flow control for command execution. Three sorts of construction are supported: a simple if ... endif block; a two-branch if ... else ... endif block; and a form with three or more branches, if ... elif ... else ... endif.

condition must be a Boolean expression, for the syntax of which see genr. More than one elif block may be included. In addition, if ... endif blocks may be nested.

Intended for use in a command script, primarily for including definitions of functions. Executes the commands in filename, then returns control to the main script. To include a packaged function, be sure to include the filename extension.

Prints out any supplementary information stored with the current datafile.

Menu path: /Data/Dataset info
Other access: Data browser windows

Options: --local (install from local file), --remove (see below), --purge (see below)
Example: install /path/to/myfile.gfn --local

Installer for gretl function packages (gfn or zip files).

If this command is given the plain name of a gretl function package, the action is to download the specified package from the gretl server and install it on the local machine. In this case it is not necessary to supply a filename extension.

If the --local option is given, the pkgname argument should be the path to an uninstalled package file on the local machine, with the correct extension. The action is to copy the file into place (gfn), or unzip it into place (zip), "into place" meaning where the include command will find it.

When no option is given, if pkgname begins with http:// the effect is to download a package file from a specified server and install it locally.

With the --remove or --purge option the inverse operation is performed; that is, an installed package is uninstalled. If just --remove is given, the specified package is unloaded from memory and is removed from the GUI menu to which it is attached, if any. If the --purge option is given then, in addition to the actions just mentioned, the package file is deleted.
(If the package is installed in its own subdirectory, the whole subdirectory is deleted.)

Menu path: /Tools/Function packages/On server

Arguments: minvar maxvar ; indepvars
Options: --quiet (suppress printing of results), --verbose (print details of iterations), --robust (robust standard errors), --cluster=clustvar (see logit for explanation)
Example: intreg lo hi const x1 x2

Estimates an interval regression model. This model arises when the dependent variable is imperfectly observed for some (possibly all) observations. In other words, the data generating process is assumed to be

    y*_t = x_t \beta + \epsilon_t

but we only observe bounds m_t and M_t such that m_t <= y*_t <= M_t. The series minvar and maxvar should contain NAs for left- and right-unbounded observations, respectively.

The model is estimated by maximum likelihood, assuming normality of the disturbance term.

By default, standard errors are computed using the negative inverse of the Hessian. If the --robust flag is given, then QML or Huber-White standard errors are calculated instead. In this case the estimated covariance matrix is a sandwich of the inverse of the estimated Hessian and the outer product of the gradient.

Menu path: /Model/Limited dependent variable/Interval regression

Options: --data=column-name (see below), --filter=expression (see below), --ikey=inner-key (see below), --okey=outer-key (see below), --aggr=method (see below), --tkey=column-name,format-string (see below), --verbose (report on progress)

This command imports a data series from the source filename (which must be either a delimited text data file or a native gretl data file) under the name varname. For details please see chapter 7 of the Gretl User's Guide; here we just give a brief summary of the available options.

The --data option can be used to specify the column heading of the data in the source file, if this differs from the name by which the data should be known in gretl.

The --filter option can be used to specify a criterion for filtering the source data (that is, selecting a subset of observations).
The --ikey and --okey options can be used to specify a mapping between observations in the current dataset and observations in the source data (for example, individuals can be matched against the household to which they belong).

The --aggr option is used when the mapping between observations in the current dataset and the source is not one-to-one.

The --tkey option is applicable only when the current dataset has a time-series structure. It can be used to specify the name of a column containing dates to be matched to the dataset and/or the format in which dates are represented in that column.

See also append for simpler joining operations.

Example: kpss 4 x1 --trend

For use of this command with panel data please see the final section in this entry.

Computes the KPSS test (Kwiatkowski et al., Journal of Econometrics, 1992) for stationarity, for each of the specified variables (or their first difference, if the --difference option is selected). The null hypothesis is that the variable in question is stationary, either around a level or, if the --trend option is given, around a deterministic linear trend.

The order argument determines the size of the window used for Bartlett smoothing. If a negative value is given this is taken as a signal to use an automatic window size of 4(T/100)^0.25, where T is the sample size.

If the --verbose option is chosen the results of the auxiliary regression are printed, along with the estimated variance of the random walk component of the variable.

The critical values shown for the test statistic are based on response surfaces estimated in the manner set out by Sephton (Economics Letters, 1995), which are more accurate for small samples than the values given in the original KPSS article. When the test statistic lies between the 10 percent and 1 percent critical values a p-value is shown; this is obtained by linear interpolation and should not be taken too literally.
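The mechanics of the KPSS statistic and the automatic window rule described above can be sketched in Python. This is an illustrative sketch of the textbook KPSS formula with Bartlett weights, not gretl's own code: the function names are hypothetical, truncating the automatic window to an integer is an assumption, and no critical values or p-values are computed.

```python
def auto_window(T):
    """Automatic Bartlett window 4(T/100)^0.25, truncated (truncation assumed)."""
    return int(4.0 * (T / 100.0) ** 0.25)


def kpss_stat(y, lag, trend=False):
    """KPSS statistic for the series y with Bartlett window size lag."""
    T = len(y)
    if lag < 0:                      # negative order: use the automatic window
        lag = auto_window(T)
    ybar = sum(y) / T
    if trend:
        # Residuals from an OLS regression on a constant and a linear trend
        tbar = (T + 1) / 2.0
        sxx = sum((t - tbar) ** 2 for t in range(1, T + 1))
        slope = sum((t - tbar) * (v - ybar)
                    for t, v in enumerate(y, start=1)) / sxx
        e = [v - ybar - slope * (t - tbar) for t, v in enumerate(y, start=1)]
    else:
        # Residuals around the sample mean (level stationarity)
        e = [v - ybar for v in y]
    # Long-run variance estimate with Bartlett weights
    s2 = sum(v * v for v in e) / T
    for s in range(1, lag + 1):
        w = 1.0 - s / (lag + 1.0)
        gamma = sum(e[t] * e[t - s] for t in range(s, T))
        s2 += 2.0 * w * gamma / T
    # Sum of squared partial sums of the residuals
    partial, ssq = 0.0, 0.0
    for v in e:
        partial += v
        ssq += partial * partial
    return ssq / (T * T * s2)
```

Large values of the statistic count against the null hypothesis of stationarity; judging significance still requires critical values such as those gretl reports.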
See the kpsscrit function for a means of obtaining these critical values programmatically.

Panel data

When the kpss command is used with panel data, to produce a panel unit root test, the applicable options and the results shown are somewhat different. While you may give a list of variables for testing in the regular time-series case, with panel data only one variable may be tested per command. And the --verbose option has a different meaning: it produces a brief account of the test for each individual time series (the default being to show only the overall result).

When possible, the overall test (null hypothesis: the series in question is stationary for all the panel units) is calculated using the method of Choi (Journal of International Money and Finance, 2001). This is not always straightforward, the difficulty being that while the Choi test is based on the p-values of the tests on the individual series, we do not currently have a means of calculating p-values for the KPSS test statistic; we must rely on a few critical values.

If the test statistic for a given series falls between the 10 percent and 1 percent critical values, we are able to interpolate a p-value. But if the test falls short of the 10 percent value, or exceeds the 1 percent value, we cannot interpolate and can at best place a bound on the global Choi test. If the individual test statistic falls short of the 10 percent value for some units but exceeds the 1 percent value for others, we cannot even compute a bound for the global test.

Menu path: /Variable/Unit root tests/KPSS test

Example: lags 4 ; x1 x2 x3 --bylag

Creates new series which are lagged values of each of the series in varlist. By default the number of lags created equals the periodicity of the data. For example, if the periodicity is 4 (quarterly), the command lags x creates x_1 = x(-1), x_2 = x(-2), x_3 = x(-3) and x_4 = x(-4). The number of lags created can be controlled by the optional first parameter (which, if present, must be followed by a semicolon).
The --bylag option is meaningful only if varlist contains more than one series and the maximum lag order is greater than 1. By default the lagged terms are added to the dataset by variable: first all lags of the first series, then all lags of the second series, and so on. But if --bylag is given, the ordering is by lags: first lag 1 of all the listed series, then lag 2 of all the listed series, and so on.

Menu path: /Add/Lags of selected variables

The first difference of the natural log of each series in varlist is obtained and the result stored in a new series with the prefix ld_. Thus ldiff x y creates the new variables ld_x = log(x) - log(x(-1)) and ld_y = log(y) - log(y(-1)).

Menu path: /Add/Log differences of selected variables

Options: --save (save the resulting series), --quiet (don't print results), --plot=mode-or-filename (see below)

Must follow an ols command. Calculates the leverage (h, which must lie in the range 0 to 1) for each data point in the sample on which the previous model was estimated. Displays the residual (u) for each observation along with its leverage and a measure of its influence on the estimates, uh/(1 - h). Leverage points for which the value of h exceeds 2k/n (where k is the number of parameters being estimated and n is the sample size) are flagged with an asterisk. For details on the concepts of leverage and influence see Davidson and MacKinnon (1993), Chapter 2.

DFFITS values are also computed: these are studentized residuals (predicted residuals divided by their standard errors) multiplied by sqrt(h/(1 - h)). For discussions of studentized residuals and DFFITS see chapter 12 of Maddala's Introduction to Econometrics or Belsley, Kuh and Welsch (1980).

Briefly, a predicted residual is the difference between the observed value of the dependent variable at observation t and the fitted value for observation t obtained from a regression in which that observation is omitted (or a dummy variable with value 1 for observation t alone has been added); the studentized residual is obtained by dividing the predicted residual by its standard error.

If the --save flag is given with this command, the leverage, influence and DFFITS values are added to the current data set; in this context the --quiet flag may be used to suppress the printing of results. The default names of the saved series are, respectively, lever, influ and dffits. However, if series of these names already exist, the names of the newly saved series will be adjusted to ensure uniqueness; in any case, they will be the highest-numbered three series in the dataset.

After execution, the $test accessor returns the cross-validation criterion, which is defined as the sum of squared deviations of the dependent variable from its forecast value, the forecast for each observation being based on a sample from which that observation is excluded. (This is known as the leave-one-out estimator.) For a broader discussion of the cross-validation criterion, see Davidson and MacKinnon's Econometric Theory and Methods, pages 685-686, and the references therein.

By default, if the program is not in batch mode a plot of the leverage and influence values is shown. This can be adjusted via the --plot option. The acceptable parameters to this option are none (to suppress the plot), display (to display a plot even when in batch mode), or a file name. The effect of providing a file name is as described for the --output option of the gnuplot command.

Menu path: Model window, /Analysis/Influential observations

Example: logistic y const x --ymax=50

Logistic regression: carries out an OLS regression using the logistic transformation of the dependent variable, log(y / (y* - y)). The dependent variable must be strictly positive.
If all its values lie between 0 and 1, the default is to use a y* value (the asymptotic maximum of the dependent variable) of 1; if its values lie between 0 and 100, the default y* is 100.

If you wish to set a different maximum, use the --ymax option. Note that the supplied value must be greater than all of the observed values of the dependent variable.

The fitted values and residuals from the regression are automatically transformed using

    y = y* / (1 + exp(-x))

where x represents either a fitted value or a residual from the OLS regression using the transformed dependent variable. The reported values are therefore comparable with the original dependent variable.

Note that if the dependent variable is binary, you should use the logit command instead.

Menu path: /Model/Limited dependent variable/Logistic

Option: --p-values (show p-values instead of slopes)

If the dependent variable is a binary variable (all values are 0 or 1), maximum likelihood estimates of the coefficients on indepvars are obtained via the Newton-Raphson method. As the model is nonlinear the slopes depend on the values of the independent variables. By default the slopes with respect to each of the independent variables are calculated (at the means of those variables) and these slopes replace the usual p-values in the regression output. This behavior can be suppressed by giving the --p-values option. The chi-square statistic tests the null hypothesis that all coefficients are zero apart from the constant.

By default, standard errors are computed using the negative inverse of the Hessian. If the --robust flag is given, then QML or Huber-White standard errors are calculated instead. In this case the estimated covariance matrix is a sandwich of the inverse of the estimated Hessian and the outer product of the gradient; see chapter 10 of Davidson and MacKinnon (2004). But if the --cluster option is given, then cluster-robust standard errors are produced; see chapter 17 of the Gretl User's Guide for details.
If the dependent variable is not binary but is discrete, then by default it is interpreted as an ordinal response, and Ordered Logit estimates are obtained. However, if the --multinomial option is given, the dependent variable is interpreted as an unordered response, and Multinomial Logit estimates are produced. (In either case, if the variable selected as dependent is not discrete an error is flagged.) In the multinomial case, the accessor $mnlprobs is available after estimation, to get a matrix containing the estimated probabilities of the outcomes at each observation (observations in rows, outcomes in columns).

If you want to use logit for analysis of proportions (where the dependent variable is the proportion of cases having a certain characteristic, at each observation, rather than a 1 or 0 variable indicating whether the characteristic is present or not) you should not use the logit command, but rather construct the logit variable, as in series lgt_p = log(p/(1 - p)), and use this as the dependent variable in an OLS regression. See chapter 12 of Ramanathan (2002).

Menu path: /Model/Limited dependent variable/Logit

Arguments: series1 series2
Option: --unequal-vars (assume variances are unequal)

Calculates the t statistic for the null hypothesis that the population means are equal for the variables series1 and series2, and shows its p-value.

By default the test statistic is calculated on the assumption that the variances are equal for the two variables. With the --unequal-vars option the variances are assumed to be different; in this case the degrees of freedom for the test statistic are approximated as per Satterthwaite (1946).
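The two variants of the equal-means test described above can be sketched in Python using the standard textbook formulas: a pooled variance when the variances are assumed equal, and the Satterthwaite approximation to the degrees of freedom when they are not. The function name mean_test and its unequal_vars argument are hypothetical, and the p-value is left out because the Python standard library has no Student t distribution.

```python
import math


def mean_test(a, b, unequal_vars=False):
    """Return (t statistic, degrees of freedom) for H0: equal population means."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)   # sample variances
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    if unequal_vars:
        # Separate variances, Satterthwaite (1946) degrees of freedom
        q1, q2 = v1 / n1, v2 / n2
        se = math.sqrt(q1 + q2)
        df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    else:
        # Pooled variance under the equal-variance assumption
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
        df = n1 + n2 - 2
    return (m1 - m2) / se, df
```

With equal sample sizes the two variants give the same t statistic and differ only in the degrees of freedom, which the Satterthwaite approximation reduces when the sample variances differ.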
Menu path: /Tools/Test statistic calculator

Arguments: depvar indepvars ; MIDAS-terms
Options: --vcv (print covariance matrix), --robust (robust standard errors), --quiet (suppress printing of results)
Examples:
    midasreg y 0 y(-1) ; mds(X, 1, 9, 1, theta)
    midasreg y 0 y(-1) ; mds(X, 1, 9, 0)
    midasreg y 0 y(-1) ; mdsl(XL, 2, theta)

Carries out least-squares estimation (either NLS or OLS, depending on the specification) of a MIDAS (Mixed Data Sampling) model. Such models include one or more independent variables that are observed at a higher frequency than the dependent variable; for a good brief introduction see Armesto, Engemann and Owyang (2010).

The variables in indepvars should be of the same frequency as the dependent variable. This list should usually include const or 0 (intercept) and typically includes one or more lags of the dependent variable. The high-frequency terms are given after a semicolon; each one takes the form of a number of comma-separated arguments within parentheses, prefixed by either mds or mdsl.

mds: this variant generally requires 5 arguments, as follows: the name of a MIDAS list, two integers giving the minimum and maximum high-frequency lags, an integer between 0 and 4 specifying the type of parameterization to use, and the name of a vector holding initial values of the parameters. For example, mds(X, 3, 11, 1, theta) calls for lags 3 to 11 of the high-frequency series represented by the list X, using parameterization type 1 (exponential Almon, see below) with initializer theta.

mdsl: generally requires 3 arguments: the name of a list of MIDAS lags, an integer to specify the type of parameterization, and the name of an initialization vector. In this case the minimum and maximum lags are implicit in the initial list argument. The list given as the first argument (Xlags, say) should already hold all the required lags; such a list can be constructed using the hflags function.
The supported types of parameterization are as follows: 0 unrestricted MIDAS or U-MIDAS (each lag has its own coefficient) 1 normalized exponential Almon requires at least one parameter, commonly uses two 2 normalized beta with a zero last lag requires exactly two parameters 3 normalized beta with non-zero last lag requires exactly three parameters 4 (non-normalized) Almon polynomial requires at least one parameter When the parameterization is U-MIDAS, the final initializer argument is not required with mds or mdsl. In other cases you can request an automatic initialization by substituting one or other of these two forms for the name of an initial parameter vector: The keyword null. this is acceptable only if the parameterization has a fixed number of terms (the beta cases, 2 or 3). An integer value giving the required number of parameters. Menu path: ModelTime seriesMIDAS Performs Maximum Likelihood (ML) estimation using either the BFGS (Broyden, Fletcher, Goldfarb, Shanno) algorithm or Newtons method. The user must specify the log-likelihood function. The parameters of this function must be declared and given starting values prior to estimation. Optionally, the user may specify the derivatives of the log-likelihood function with respect to each of the parameters if analytical derivatives are not supplied, a numerical approximation is computed. Simple example: Suppose we have a series X with values 0 or 1 and we wish to obtain the maximum likelihood estimate of the probability, p. that X 1. (In this simple case we can guess in advance that the ML estimate of p will simply equal the proportion of Xs equal to 1 in the sample.) The parameter p must first be added to the dataset and given an initial value. For example, scalar p 0.5. We then construct the MLE command block: The first line above specifies the log-likelihood function. It starts with the keyword mle. 
then a dependent variable is specified and an expression for the log-likelihood is given (using the same syntax as in the genr command). The next line (which is optional) starts with the keyword deriv and supplies the derivative of the log-likelihood function with respect to the parameter p. If no derivatives are given, you should include a statement using the keyword params which identifies the free parameters: these are listed on one line, separated by spaces, and can be either scalars, or vectors, or any combination of the two. For example, the deriv line could be replaced by params p, in which case numerical derivatives would be used.

Note that any option flags should be appended to the ending line of the MLE block.

By default, estimated standard errors are based on the Outer Product of the Gradient. If the --hessian option is given, they are instead based on the negative inverse of the Hessian (which is approximated numerically). If the --robust option is given, a QML estimator is used (namely, a sandwich of the negative inverse of the Hessian and the covariance matrix of the gradient).

If you supply analytical derivatives, by default gretl runs a numerical check on their plausibility. Occasionally this may produce false positives, instances where correct derivatives appear to be wrong and estimation is refused. To counter this, or to achieve a little extra speed, you can give the option --no-gradient-check. You should do this only if you are confident that the gradient you have specified is right.

Parameter names: In estimating a nonlinear model it is often convenient to name the parameters tersely. In printing the results, however, it may be desirable to use more informative labels. This can be achieved via the additional keyword paramnames within the command block.
For a model with k parameters the argument following this keyword should be either a double-quoted string literal holding k space-separated names or the name of a string variable that holds k such names.

For an in-depth description of mle, please refer to chapter 21 of the Gretl User's Guide.

Menu path: /Model/Maximum likelihood

Options:
--output=filename (send output to specified file)

The modprint command prints the coefficient table and optional additional statistics for a model estimated "by hand". It is mainly useful for user-written functions.

The argument coeffmat should be a k by 2 matrix containing k coefficients and k associated standard errors, and names should be a string containing at least k names for the coefficients, separated by commas or spaces. (The names argument may be either the name of a string variable or a literal string, enclosed in double quotes.)

The optional argument addstats is a vector containing p additional statistics to be printed under the coefficient table. If this argument is given, then names should contain k + p comma-separated strings, the additional p strings to be associated with the additional statistics.

To put the output into a file, use the flag --output plus a filename. If the filename has the suffix .tex, the output will be in TeX format; if the suffix is .rtf the output will be RTF; otherwise it will be plain text. In the case of TeX output the default is to produce a fragment, suitable for inclusion in a document; if you want a stand-alone document instead, use the --complete option.

The output file will be written in the currently set workdir, unless the filename string contains a full path specification.
Options:
--normality (normality of residual)
--logs (non-linearity, logs)
--autocorr (serial correlation)
--squares (non-linearity, squares)
--white (heteroskedasticity, White's test)
--white-nocross (White's test, squares only)
--breusch-pagan (heteroskedasticity, Breusch-Pagan)
--robust (robust variance estimate for Breusch-Pagan)
--panel (heteroskedasticity, groupwise)
--comfac (common factor restriction, AR1 models only)
--xdepend (cross-sectional dependence, panel data only)
--quiet (don't print details)
--silent (don't print anything)

The modtest command must immediately follow an estimation command. Depending on the option given, it carries out one of the following: the Doornik-Hansen test for the normality of the error term; a Lagrange Multiplier test for nonlinearity (logs or squares); White's test (with or without cross-products) or the Breusch-Pagan test (Breusch and Pagan, 1979) for heteroskedasticity; the LMF test for serial correlation (Kiviet, 1986); a test for ARCH (Autoregressive Conditional Heteroskedasticity; see also the arch command); a test of the common factor restriction implied by AR(1) estimation; or a test for cross-sectional dependence in panel-data models. With the exception of the normality, common factor and cross-sectional dependence tests, most of the options are available only for models estimated via OLS, but see below for details regarding two-stage least squares.

The optional order argument is relevant only when the --autocorr or --arch option is selected. The default is to run these tests using a lag order equal to the periodicity of the data, but this can be adjusted by supplying a specific lag order.

The --robust option applies only when the Breusch-Pagan test is selected; its effect is to use the robust variance estimator proposed by Koenker (1981), making the test less sensitive to the assumption of normality.
The --panel option is available only when the model is estimated on panel data: in this case a test for groupwise heteroskedasticity is performed (that is, for a differing error variance across the cross-sectional units).

The --comfac option is available only when the model is estimated via an AR(1) method such as Hildreth-Lu. The auxiliary regression takes the form of a relatively unrestricted dynamic model, which is used to test the common factor restriction implicit in the AR(1) specification.

The --xdepend option is available only for models estimated on panel data. The test statistic is that developed by Pesaran (2004). The null hypothesis is that the error term is independently distributed across the cross-sectional units or individuals.

By default, the program prints the auxiliary regression on which the test statistic is based, where applicable. This may be suppressed by using the --quiet flag (minimal printed output) or the --silent flag (no printed output). The test statistic and its p-value may be retrieved using the accessors $test and $pvalue respectively.

When a model has been estimated by two-stage least squares (see tsls), the LM principle breaks down and gretl offers some equivalents: the --autocorr option computes Godfrey's test for autocorrelation (Godfrey, 1994) while the --white option yields the HET1 heteroskedasticity test (Pesaran and Taylor, 1999).

Menu path: Model window, /Tests

Options:
--vcv (print covariance matrix)
--simple-print (do not print auxiliary statistics)
--quiet (suppress printing of results)

The mpols command computes OLS estimates for the specified model using multiple precision floating-point arithmetic, with the help of the GNU Multiple Precision (GMP) library. By default 256 bits of precision are used for the calculations, but this can be increased via the environment variable GRETL_MP_BITS. For example, when using the bash shell one could issue the following command, before starting gretl, to set a precision of 1024 bits: export GRETL_MP_BITS=1024
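As a rough illustration of why extra precision can matter for least squares, the following Python sketch runs a simple regression with the standard decimal module set to about 310 significant digits (1024 bits corresponds to roughly 308 decimal digits). This is not how gretl implements the command, which relies on GMP; the function simple_ols and the data are hypothetical and chosen only for the example.

```python
from decimal import Decimal, getcontext

# 1024 bits is roughly 308 decimal digits (1024 * log10(2)); leave a small margin.
getcontext().prec = 310

def simple_ols(x, y):
    """Return (intercept, slope) for a regression of y on a constant and x,
    with every intermediate quantity held as a Decimal."""
    xs = [Decimal(str(v)) for v in x]
    ys = [Decimal(str(v)) for v in y]
    n = Decimal(len(xs))
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((a - xbar) ** 2 for a in xs)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Regressor values with a huge common offset, a case that is awkward
# for ordinary double-precision arithmetic.
x = [10**12 + k for k in range(5)]
y = [3 + 2 * v for v in x]          # exact relation y = 3 + 2x
intercept, slope = simple_ols(x, y)
```

With the exact linear relation used here the sketch recovers the intercept and slope without rounding error, which is the kind of check high-precision OLS is typically used for.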
A rather arcane option is available for this command (primarily for testing purposes): if the indepvars list is followed by a semicolon and a further list of numbers, those numbers are taken as powers of x to be added to the regression, where x is the last variable in indepvars. These additional terms are computed and stored in multiple precision. For example, mpols y 0 x ; 2 3 4 regresses y on x and the second, third and fourth powers of x.

Menu path: /Model/Other linear models/High precision OLS

The nls command performs Nonlinear Least Squares (NLS) estimation using a modified version of the Levenberg-Marquardt algorithm. You must supply a function specification. The parameters of this function must be declared and given starting values prior to estimation. Optionally, you may specify the derivatives of the regression function with respect to each of the parameters. If you do not supply derivatives you should instead give a list of the parameters to be estimated (separated by spaces or commas), preceded by the keyword params. In the latter case a numerical approximation to the Jacobian is computed.

It is easiest to show what is required by example. The following is a complete script to estimate the nonlinear consumption function set out in William Greene's Econometric Analysis (Chapter 11 of the 4th edition, or Chapter 9 of the 5th). The numbers to the left of the lines are for reference and are not part of the commands. Note that any option flags, such as --vcv for printing the covariance matrix of the parameter estimates, should be appended to the final command, end nls.

It is often convenient to initialize the parameters by reference to a related linear model; that is accomplished here on lines 2 to 5. The parameters alpha, beta and gamma could be set to any initial values (not necessarily based on a model estimated with OLS), although convergence of the NLS procedure is not guaranteed for an arbitrary starting point.

The actual NLS commands occupy lines 6 to 10.
On line 6 the nls command is given: a dependent variable is specified, followed by an equals sign, followed by a function specification. The syntax for the expression on the right is the same as that for the genr command. The next three lines specify the derivatives of the regression function with respect to each of the parameters in turn. Each line begins with the keyword deriv, then gives the name of a parameter, an equals sign, and an expression whereby the derivative can be calculated. As an alternative to supplying analytical derivatives, you could replace lines 7 to 9 with the single line params alpha beta gamma. Line 10, end nls, completes the command and calls for estimation. Any options should be appended to this line.

If you supply analytical derivatives, by default gretl runs a numerical check on their plausibility. Occasionally this may produce false positives, instances where correct derivatives appear to be wrong and estimation is refused. To counter this, or to achieve a little extra speed, you can give the option --no-gradient-check. You should do this only if you are confident that the gradient you have specified is right.

Parameter names: In estimating a nonlinear model it is often convenient to name the parameters tersely. In printing the results, however, it may be desirable to use more informative labels. This can be achieved via the additional keyword paramnames within the command block. For a model with k parameters the argument following this keyword should be either a double-quoted string literal holding k space-separated names or the name of a string variable that holds k such names.

For further details on NLS estimation please see chapter 20 of the Gretl User's Guide.

Menu path: /Model/Nonlinear Least Squares

Example: ols y 0 x1 x2 x3 --quiet

The ols command computes ordinary least squares (OLS) estimates with depvar as the dependent variable and indepvars as the list of independent variables. Variables may be specified by name or number; use the number zero for a constant term.
Besides coefficient estimates and standard errors, the program also prints p-values for t (two-tailed) and F-statistics. A p-value below 0.01 indicates statistical significance at the 1 percent level and is marked with ***. ** indicates significance between 1 and 5 percent and * indicates significance between the 5 and 10 percent levels. Model selection statistics (the Akaike Information Criterion or AIC and Schwarz's Bayesian Information Criterion) are also printed. The formula used for the AIC is that given by Akaike (1974), namely minus two times the maximized log-likelihood plus two times the number of parameters estimated.

If the option --no-df-corr is given, the usual degrees of freedom correction is not applied when calculating the estimated error variance (and hence also the standard errors of the parameter estimates).

The option --print-final is applicable only in the context of a loop. It arranges for the regression to be run silently on all but the final iteration of the loop. See chapter 12 of the Gretl User's Guide for details.

Various internal variables may be retrieved following estimation. For example, series uh = $uhat saves the residuals under the name uh. See the accessors section of the gretl function reference for details.

The specific formula (HC version) used for generating robust standard errors when the --robust option is given can be adjusted via the set command. The --jackknife option has the effect of selecting an hc_version of 3a. The --cluster option overrides the selection of HC version, and produces robust standard errors by grouping the observations by the distinct values of clustvar; see chapter 17 of the Gretl User's Guide for details.

Menu path: /Model/Ordinary Least Squares
Other access: Beta-hat button on toolbar

The omit command must follow an estimation command. It calculates a Wald test for the joint significance of the variables in varlist, which should be a subset of the independent variables in the model last estimated.
The results of the test may be retrieved using the accessors $test and $pvalue.

By default the restricted model is estimated and it replaces the original as the current model for the purposes of, for example, retrieving the residuals as $uhat or doing further tests. This behavior may be suppressed via the --test-only option.

By default the F-form of the Wald test is recorded; the --chi-square option may be used to record the chi-square form instead.

If the restricted model is both estimated and printed, the --vcv option has the effect of printing its covariance matrix; otherwise this option is ignored.

Alternatively, if the --auto flag is given, sequential elimination is performed: at each step the variable with the highest p-value is omitted, until all remaining variables have a p-value no greater than some cutoff. The default cutoff is 10 percent (two-sided); this can be adjusted by appending = and a value between 0 and 1 (with no spaces), for example --auto=0.05. If varlist is given this process is confined to the listed variables; otherwise all variables are treated as candidates for omission. Note that the --auto and --test-only options cannot be combined.

Menu path: Model window, /Tests/Omit variables

Options:
--quiet (don't print list of series)
--preserve (preserve variables other than series)
--frompkg=pkgname (see below)
--www (use a database on the gretl server)
See below for additional specialized options.
Example: open fedbog --www

The open command opens a data file or database. If a data file is already open, it is replaced by the newly opened one. To add data to the current dataset, see append and (for greater flexibility) join.

If a full path is not given, the program will search some relevant paths to try to find the file, with workdir as a first choice. If no filename suffix is given, gretl assumes a native datafile with suffix .gdt.
Based on the name of the file and various heuristics, gretl will try to detect the format of the data file (native, plain text, CSV, MS Excel, Stata, SPSS, etc.).

If the --frompkg option is used, gretl will look for the specified data file in the subdirectory associated with the function package specified by pkgname.

If the filename argument takes the form of a URI starting with http://, then gretl will attempt to download the indicated data file before opening it.

By default, opening a new data file clears the current gretl session, which includes deletion of all named variables, including matrices, scalars and strings. If you wish to keep your currently defined variables (other than series, which are necessarily cleared out), use the --preserve option.

The open command can also be used to open a database (gretl, RATS 4.0 or PcGive) for reading. In that case it should be followed by the data command to extract particular series from the database. If the --www option is given, the program will try to access a database of the given name on the gretl server, for instance the Federal Reserve interest rates database (fedbog) in the example above.

When opening a spreadsheet file (Gnumeric, Open Document or MS Excel), you may give up to three additional parameters following the filename. First, you can select a particular worksheet within the file. This is done either by giving its (1-based) number, using the syntax, e.g., --sheet=2, or, if you know the name of the sheet, by giving the name in double quotes, as in --sheet="MacroData". The default is to read the first worksheet. You can also specify a column and/or row offset into the worksheet via, e.g., --coloffset=3 --rowoffset=2, which would cause gretl to ignore the first 3 columns and the first 2 rows. The default is an offset of 0 in both dimensions, that is, to start reading at the top-left cell.

With plain text files, gretl generally expects to find the data columns delimited in some standard manner.
But there is also a special facility for reading "fixed format" files, in which there are no delimiters but there is a known specification of the form, e.g., "variable k occupies 8 columns starting at column 24". To read such files, you should append a string --fixed-cols=colspec, where colspec is composed of comma-separated integers. These integers are interpreted as a set of pairs. The first element of each pair denotes a starting column, measured in bytes from the beginning of the line with 1 indicating the first byte, and the second element indicates how many bytes should be read for the given field. So, for example, if you say open fixed.txt --fixed-cols=1,6,20,3 then for variable 1 gretl will read 6 bytes starting at column 1, and for variable 2, 3 bytes starting at column 20. Lines that are blank, or that begin with #, are ignored, but otherwise the column-reading template is applied, and if anything other than a valid numerical value is found an error is flagged. If the data are read successfully, the variables will be named v1, v2, etc. It's up to the user to provide meaningful names and/or descriptions using the commands rename and/or setinfo.

Menu path: /File/Open data
Other access: Drag a data file onto gretl's main window

The orthdev command is applicable with panel data only. A series of forward orthogonal deviations is obtained for each variable in varlist and stored in a new variable with the prefix o_. Thus orthdev x y creates the new variables o_x and o_y.

The values are stored one step ahead of their true temporal location (that is, o_x at observation t holds the deviation that, strictly speaking, belongs at t - 1). This is for compatibility with first differences: one loses the first observation in each time series, not the last.

Arguments: filename option

The outfile command diverts output to filename, until further notice. Use the flag --append to append output to an existing file or --write to start a new file (or overwrite an existing one).

The --close flag is used to close an output file that was previously opened as above.
Output will then revert to the default stream. Note that since only one file can be opened via outfile at any given time (but see below), no filename argument need (nor should) be supplied with this variant of the command.

The output file will be written in the currently set workdir, unless the filename string contains a full path specification.

For example, outfile regress.txt --write opens the file regress.txt for writing, and outfile --close closes it. This would make sense as a sequence only if some commands were issued before the --close. For example, if an estimation command intervened, its output would go to regress.txt rather than the screen.

Three special variants on the above are available. If you give the keyword null in place of a real filename along with the --write option, the effect is to suppress all printed output until redirection is ended. If either of the keywords stdout or stderr is given in place of a regular filename, the effect is to redirect output to standard output or standard error output respectively.

The --quiet option is for use with --write or --append: its effect is to turn off the echoing of commands and the printing of auxiliary messages while output is redirected. It is equivalent to doing set echo off followed by set messages off, except that when redirection is ended the original values of the echo and messages variables are restored.

In general only one file can be opened in this way at any given time, so calls to this command cannot be nested. However, use of this command is permitted inside user-written functions (provided the output file is also closed from inside the same function) such that output can be temporarily diverted and then given back to an original output file, in case outfile is currently in use by the caller. For example, a function may divert its own output to a file called inner.txt while the caller's output, both before and after the function call, goes to a file called outer.txt.

As described above, the primary usage of this command is to divert output to a named file.
However, the --buffer option may be used to achieve a different effect, namely directing output to a named string variable. This option implies --write and is incompatible with --append. The position of the filename argument is occupied by the name of a string variable (which must, of course, conform to the requirements for a valid gretl identifier). If a string variable of the given name already exists, its value will be overwritten; if there is no such variable, it will be created automatically. For example, after the sequence outfile mybuf --buffer, labels, outfile --close, the variable mybuf holds the output of the labels command. This facility may be of use to writers of function packages.

Options:
--verbose (more verbose output)

The panel command estimates a panel model. By default the fixed effects estimator is used; this is implemented by subtracting the group or unit means from the original data.

If the --random-effects flag is given, random effects estimates are computed, by default using the method of Swamy and Arora (1972). In this case (only) the option --matrix-diff forces use of the matrix-difference method (as opposed to the regression method) for carrying out the Hausman test for the consistency of the random effects estimator. Also specific to the random effects estimator is the --nerlove flag, which selects the method of Nerlove (1971) as opposed to Swamy and Arora.

Alternatively, if the --unit-weights flag is given, the model is estimated via weighted least squares, with the weights based on the residual variance for the respective cross-sectional units in the sample. In this case (only) the --iterate flag may be added to produce iterative estimates: if the iteration converges, the resulting estimates are Maximum Likelihood.

As a further alternative, if the --between flag is given, the between-groups model is estimated (that is, an OLS regression using the group means).

The --robust option is available only for fixed effects models.
The default variant is the Arellano HAC estimator, but Beck-Katz Panel Corrected Standard Errors can be selected via the command set pcse on. When the robust option is specified the joint F test on the fixed effects is performed using the robust method of Welch (1951).

For more details on panel estimation, please see chapter 18 of the Gretl User's Guide.

Menu path: /Model/Panel

Options:
--quiet (don't print results)

The pca command performs Principal Components Analysis. Unless the --quiet option is given, it prints the eigenvalues of the correlation matrix (or the covariance matrix if the --covariance option is given) for the variables in varlist, along with the proportion of the joint variance accounted for by each component. It also prints the corresponding eigenvectors (or "component loadings").

If you give the --save-all option then all components are saved to the dataset as series, with names PC1, PC2 and so on. These artificial variables are formed as the sum of (component loading) times (standardized X_i), where X_i denotes the i-th variable in varlist.

If you give the --save option without a parameter value, components with eigenvalues greater than the mean (which means greater than 1.0 if the analysis is based on the correlation matrix) are saved to the dataset as described above. If you provide a value for n with this option then the most important n components are saved.

See also the princomp function.

Menu path: /View/Principal components
Other access: Main window pop-up (multiple selection)

Options:
--plot=mode-or-filename (see below)

The pergm command computes and displays the spectrum of the specified series. By default the sample periodogram is given, but optionally a Bartlett lag window is used in estimating the spectrum (see, for example, Greene's Econometric Analysis for a discussion of this). The default width of the Bartlett window is twice the square root of the sample size, but this can be set manually using the bandwidth parameter, up to a maximum of half the sample size.
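The Bartlett lag window can be sketched in a few lines of Python. This is an illustration under stated assumptions, not gretl's implementation: the function names, the truncation of the default bandwidth to an integer, the weight formula 1 - k/(m + 1), and the 1/(2*pi) scaling are textbook conventions chosen for this example and may differ in detail from what gretl does.

```python
import math

def default_bandwidth(n):
    """Default window width quoted in the text: twice the square root of n
    (truncated to an integer here, which is an assumption)."""
    return int(2 * math.sqrt(n))

def autocovariances(x, max_lag):
    """Sample autocovariances gamma_0..gamma_max_lag with divisor n."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t - k] for t in range(k, n)) / n
            for k in range(max_lag + 1)]

def bartlett_spectrum(x, omega, bandwidth=None):
    """Spectrum estimate at frequency omega (radians) using Bartlett weights."""
    n = len(x)
    if bandwidth is None:
        bandwidth = default_bandwidth(n)
    bandwidth = min(bandwidth, n // 2)       # cap at half the sample size
    gamma = autocovariances(x, bandwidth)
    total = gamma[0]
    for k in range(1, bandwidth + 1):
        weight = 1.0 - k / (bandwidth + 1)   # linearly declining lag weights
        total += 2.0 * weight * gamma[k] * math.cos(omega * k)
    return total / (2.0 * math.pi)

# A series that flips sign every period has its power at frequency pi.
series = [1.0, -1.0] * 32
high = bartlett_spectrum(series, math.pi)
low = bartlett_spectrum(series, 0.0)
```

For the alternating series the estimate at frequency pi is far larger than at frequency zero, which is what a plot of the spectrum would show as a peak at the right-hand end of the frequency axis.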
If the --log option is given the spectrum is represented on a logarithmic scale.

The (mutually exclusive) options --radians and --degrees influence the appearance of the frequency axis when the periodogram is graphed. By default the frequency is scaled by the number of periods in the sample, but these options cause the axis to be labeled from 0 to pi radians or from 0 to 180 degrees, respectively.

By default, if the program is not in batch mode a plot of the periodogram is shown. This can be adjusted via the --plot option. The acceptable parameters to this option are none (to suppress the plot), display (to display a plot even when in batch mode), or a file name. The effect of providing a file name is as described for the --output option of the gnuplot command.

Menu path: /Variable/Periodogram
Other access: Main window pop-up menu (single selection)

Options:
--with-lines=varspec (use lines, not points)
--with-lp=varspec (use lines and points)
--with-impulses=varspec (use vertical lines)
--time-series (plot against time)
--single-yaxis (force use of just one y-axis)
--dummy (see below)
--fit=fitspec (see below)
--band=bandspec (see below)
--band-style=style (see below)
--output=filename (send output to specified file)

The plot block provides an alternative to the gnuplot command which may be more convenient when you are producing an elaborate plot (with several options and/or gnuplot commands to be inserted into the plot file).

A plot block starts with the command-word plot followed by the required argument, data, which specifies the data to be plotted: this should be the name of a list, a matrix, or a single series. If a list or matrix is given, the last element (list) or column (matrix) is assumed to be the x-axis variable and the other(s) the y-axis variable(s), unless the --time-series option is given, in which case all the specified data go on the y axis.
The option of supplying a single series name is restricted to time-series data, in which case it is assumed that a time-series plot is wanted; otherwise an error is flagged.

The starting line may be prefixed with the "savename" apparatus to save a plot as an icon in the GUI program. The block ends with end plot.

Inside the block you have zero or more lines of these types, identified by an initial keyword:
option: specify a single option.
options: specify multiple options on a single line, separated by spaces.
literal: a command to be passed to gnuplot literally.
printf: a printf statement whose result will be passed to gnuplot literally.

Note that when you specify an option using the option or options keywords, it is not necessary to supply the customary double-dash before the option specifier. For details on the effects of the various options please see gnuplot (but see below for some specifics on using the --band option in the plot context).

A typical plot block takes as its data argument a name such as plotmat, which is assumed to be a matrix with at least 2 columns (or a list with at least two members). Note that it is considered good practice to place the --output option (only) on the last line of the block.

Plotting a band with matrix data: The --band and --band-style options mostly work as described in the help for gnuplot, with the following exception: when the data to be plotted are given in the form of a matrix, the first parameter to --band must be given as the name of a matrix with two columns (holding, respectively, the center and the width of the band). This parameter takes the place of the two values (series names or ID numbers, or matrix columns) required by the gnuplot version of this option.

Example: poisson y 0 x1 x2 ; S

The poisson command estimates a Poisson regression. The dependent variable is taken to represent the occurrence of events of some sort, and must take on only non-negative integer values.
If a discrete random variable Y follows the Poisson distribution, then Pr(Y = y) = exp(-v) v^y / y! for y = 0, 1, 2, .... The mean and variance of the distribution are both equal to v. In the Poisson regression model, the parameter v is represented as a function of one or more independent variables. The most common version (and the only one supported by gretl) has v = exp(b0 + b1 x1 + b2 x2 + ...), or in other words the log of v is a linear function of the independent variables.

Optionally, you may add an "offset" variable to the specification. This is a scale variable, the log of which is added to the linear regression function (implicitly, with a coefficient of 1.0). This makes sense if you expect the number of occurrences of the event in question to be proportional, other things equal, to some known factor. For example, the number of traffic accidents might be supposed to be proportional to traffic volume, other things equal, and in that case traffic volume could be specified as an "offset" in a Poisson model of the accident rate. The offset variable must be strictly positive.

By default, standard errors are computed using the negative inverse of the Hessian. If the --robust flag is given, then QML or Huber-White standard errors are calculated instead. In this case the estimated covariance matrix is a "sandwich" of the inverse of the estimated Hessian and the outer product of the gradient.

Menu path: /Model/Limited dependent variable/Count data

Example: print hflist --midas

Please note that print is a rather basic command (primarily intended for printing the values of series); see printf and eval for more advanced, and less restrictive, alternatives.

In the first variant (with a varlist argument), varlist should be a list of series (either a named list or a list specified via the names or ID numbers of series, separated by spaces). In that case this command prints the values of the listed series. By default the data are printed "by variable", but if the --byobs flag is added they are printed by observation.
When printing by observation, the default is to show the date (with time-series data) or the observation marker string (if any) at the start of each line. The --no-dates option suppresses the printing of dates or markers; a simple observation number is shown instead. See the final paragraph of this entry for the effect of the --midas option (which applies only to a named list of series).

If no argument is given (the second variant) then the action is similar to the first case except that all series in the current dataset are printed. The supported options are as described above.

The third variant (with the object-names argument) expects a space-separated list of names of primary gretl objects other than series (scalars, matrices, strings, bundles, arrays). The value(s) of these objects are displayed. No option flags are supported in this case.

In the fourth form, string-literal should be a string enclosed in double quotes (and there should be nothing else following on the command line). The string in question is printed, followed by a newline character.

The --midas option is specific to the printing of a list of series, and moreover it is specific to datasets that contain one or more high-frequency series, each represented by a MIDAS list. If one such list is given as argument and this option is appended, the series is printed by observation at its "native" frequency.

Menu path: /Data/Display values

The printf command prints scalar values, series, matrices, or strings under the control of a format string (providing a subset of the printf function in the C programming language). Recognized numeric formats are %e, %E, %f, %g, %G and %d, in each case with the various modifiers available in C. Examples: the format %.10g prints a value to 10 significant figures; %12.6f prints a value to 6 decimal places, with a width of 12 characters.
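Since these format specifiers follow the C conventions, their effect can be previewed with Python's % operator, which uses the same printf-style syntax. The snippet below is only an illustration in Python, not gretl code, and the variable names are arbitrary.

```python
value = 3.14159265358979

sig10 = "%.10g" % value      # 10 significant figures
fixed = "%12.6f" % value     # 6 decimal places, right-aligned in 12 characters
general = "%g" % value       # plain %g keeps 6 significant figures

print(repr(sig10))    # '3.141592654'
print(repr(fixed))    # '    3.141593'
print(repr(general))  # '3.14159'
```

The leading spaces in the second result come from the field width of 12: the number itself occupies 8 characters, so it is padded on the left with 4 spaces.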
Note, however, that in gretl the format %g is a good default choice for all numerical values; you don't need to get too complicated. The format %s should be used for strings.

The format string itself must be enclosed in double quotes. The values to be printed must follow the format string, separated by commas. These values should take the form of either (a) the names of variables, (b) expressions that yield some sort of printable result, or (c) the special functions varname() or date(). A single call can therefore print the values of variables together with that of a calculated expression. The varname and date functions respectively print the name of a variable, given its ID number, and a date string, given a 1-based observation number.

If a matrix argument is given in association with a numeric format, the entire matrix is printed using the specified format for each element. The same applies to series, except that the range of values printed is governed by the current sample setting.

The maximum length of a format string is 127 characters. The escape sequences \n (newline), \t (tab), \v (vertical tab) and \\ (literal backslash) are recognized. To print a literal percent sign, use %%.

As in C, numerical values that form part of the format (width and/or precision) may be given directly as numbers, as in %10.4f, or they may be given as variables. In the latter case, one puts asterisks into the format string and supplies corresponding arguments in order. For example, with the format %*.*f the width and the precision are supplied as the two arguments that precede the value to be printed.

If the dependent variable is a binary variable (all values are 0 or 1), maximum likelihood estimates of the coefficients on indepvars are obtained via the Newton-Raphson method. As the model is nonlinear, the slopes depend on the values of the independent variables. By default the slopes with respect to each of the independent variables are calculated (at the means of those variables) and these slopes replace the usual p-values in the regression output.
This behavior can be suppressed by giving the --p-values option. The chi-square statistic tests the null hypothesis that all coefficients are zero apart from the constant.

By default, standard errors are computed using the negative inverse of the Hessian. If the --robust flag is given, then QML or Huber-White standard errors are calculated instead. In this case the estimated covariance matrix is a sandwich of the inverse of the estimated Hessian and the outer product of the gradient. See chapter 10 of Davidson and MacKinnon for details.

If the dependent variable is not binary but is discrete, then Ordered Probit estimates are obtained. (If the variable selected as dependent is not discrete, an error is flagged.)

Probit for panel data

With the --random-effects option, the error term is assumed to be composed of two normally distributed components: one time-invariant term that is specific to the cross-sectional unit or individual (and is known as the individual effect) and one term that is specific to the particular observation.

Evaluation of the likelihood for this model involves the use of Gauss-Hermite quadrature for approximating the value of expectations of functions of normal variates. The number of quadrature points used can be chosen through the --quadpoints option (the default is 32). Using more points will increase the accuracy of the results, but at the cost of longer compute time; with many quadrature points and a large dataset, estimation may be quite time consuming.

Besides the usual parameter estimates (and associated statistics) relating to the included regressors, certain additional information is presented on estimation of this sort of model: lnsigma2, the maximum likelihood estimate of the log of the variance of the individual effect; sigma_u, the estimated standard deviation of the individual effect; and rho, the estimated share of the individual effect in the composite error variance (also known as the intra-class correlation).
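The three extra quantities are tied together by simple arithmetic. The sketch below is a minimal illustration in Python, assuming (as is usual for probit models, but not stated in the text above) that the observation-specific error component is normalized to unit variance; the function name re_probit_summary is hypothetical.

```python
import math

def re_probit_summary(lnsigma2):
    """Return (sigma_u, rho) implied by the log-variance of the individual effect.

    Assumption: the observation-specific error has variance 1, so the
    composite error variance is sigma_u^2 + 1.
    """
    sigma2_u = math.exp(lnsigma2)        # variance of the individual effect
    sigma_u = math.sqrt(sigma2_u)        # its standard deviation
    rho = sigma2_u / (sigma2_u + 1.0)    # share in the composite variance
    return sigma_u, rho

# With lnsigma2 = 0 the two components have equal variance.
print(re_probit_summary(0.0))
```

Under this normalization, rho approaches zero as lnsigma2 becomes large and negative, which is the case where the individual effect contributes nothing to the composite error.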
The Likelihood Ratio test of the null hypothesis that rho equals zero provides a means of assessing whether the random effects specification is needed. If the null is not rejected, that suggests that a simple pooled probit specification is adequate.

Menu path: /Model/Limited dependent variable/Probit

pvalue W shape scale x

Computes the area to the right of xval in the specified distribution (z for Gaussian, t for Student's t, X for chi-square, F for F, G for gamma, B for binomial, P for Poisson, or W for Weibull).

Depending on the distribution, the following information must be given, before the xval: for the t and chi-square distributions, the degrees of freedom; for F, the numerator and denominator degrees of freedom; for gamma, the shape and scale parameters; for the binomial distribution, the success probability and the number of trials; for the Poisson distribution, the parameter lambda (which is both the mean and the variance); and for the Weibull distribution, shape and scale parameters. As the example above shows, the numerical parameters may be given in numeric form or as the names of variables.

The parameters for the gamma distribution are sometimes given as mean and variance rather than shape and scale. The mean is the product of the shape and the scale; the variance is the product of the shape and the square of the scale. So the scale may be found as the variance divided by the mean, and the shape as the mean divided by the scale.

Menu path: /Tools/P-value finder

--limit-to list (limit test to subset of regressors)
--plot mode-or-filename (see below)
--quiet (suppress printed output)

For a model estimated on time-series data via OLS, performs the Quandt likelihood ratio (QLR) test for a structural break at an unknown point in time, with 15 percent trimming at the beginning and end of the sample period.

For each potential break point within the central 70 percent of the observations, a Chow test is performed.
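To make the trimming concrete, the following Python sketch lists the candidate break points that survive 15 percent trimming at each end of a sample. It is only an illustration: the function name qlr_candidates is hypothetical, and the rounding rule used here (integer division) is an assumption, since the text does not say how gretl rounds when the sample size is not a multiple of 20.

```python
def qlr_candidates(n_obs, trim_percent=15):
    """Return the 0-based indices of candidate break points.

    The first and last trim_percent of the sample are excluded, leaving
    (roughly) the central 100 - 2 * trim_percent percent of observations.
    Rounding by integer division is an assumption of this sketch.
    """
    cut = (n_obs * trim_percent) // 100
    return list(range(cut, n_obs - cut))

candidates = qlr_candidates(100)
print(len(candidates), candidates[0], candidates[-1])
```

With 100 observations this leaves 70 candidate points, one Chow test for each.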
See chow for details; as with the regular Chow test, this is a robust Wald test if the original model was estimated with the --robust option, an F-test otherwise. The QLR statistic is then the maximum of the individual test statistics.

An asymptotic p-value is obtained using the method of Bruce Hansen (1997).

Besides the standard hypothesis test accessors $test and $pvalue, $qlrbreak can be used to retrieve the index of the observation at which the test statistic is maximized.

The --limit-to option can be used to limit the set of interactions with the split dummy variable in the Chow tests to a subset of the original regressors. The parameter for this option must be a named list, all of whose members are among the original regressors. The list should not include the constant.

When this command is run interactively (only), a plot of the Chow test statistic is displayed by default. This can be adjusted via the --plot option. The acceptable parameters to this option are none (to suppress the plot); display (to display a plot even when not in interactive mode); or a file name. The effect of providing a file name is as described for the --output option of the gnuplot command.

Menu path: Model window, /Tests/QLR test

--output filename (send plot to specified file)

Given just one series argument, displays a plot of the empirical quantiles of the selected series (given by name or ID number) against the quantiles of the normal distribution. The series must include at least 20 valid observations in the current sample range. By default the empirical quantiles are plotted against quantiles of the normal distribution having the same mean and variance as the sample data, but two alternatives are available: if the --z-scores option is given the data are standardized, while if the --raw option is given the raw empirical quantiles are plotted against the quantiles of the standard normal distribution.
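The difference between the default plot and the two alternatives is easiest to see by computing the plotted pairs. The Python sketch below is an illustration only: the function name qq_pairs, the plotting positions (i + 0.5)/n and the use of the sample (n - 1) standard deviation are all assumptions, not details taken from gretl.

```python
from statistics import NormalDist, mean, stdev

def qq_pairs(data, mode="default"):
    """Return (theoretical quantile, empirical quantile) pairs.

    mode "default":  data vs. a normal with the sample mean and variance
    mode "z-scores": standardized data vs. the standard normal
    mode "raw":      raw data vs. the standard normal
    """
    xs = sorted(data)
    n = len(xs)
    if n < 20:
        raise ValueError("at least 20 valid observations are required")
    m, s = mean(xs), stdev(xs)
    if mode == "default":
        ref, emp = NormalDist(m, s), xs
    elif mode == "z-scores":
        ref, emp = NormalDist(), [(x - m) / s for x in xs]
    elif mode == "raw":
        ref, emp = NormalDist(), xs
    else:
        raise ValueError("unknown mode: " + mode)
    # Plotting positions (i + 0.5) / n are an assumption of this sketch.
    probs = [(i + 0.5) / n for i in range(n)]
    return [(ref.inv_cdf(p), e) for p, e in zip(probs, emp)]

sample = [float(i) for i in range(1, 21)]
print(qq_pairs(sample, "z-scores")[:2])
```

In all three modes a normally distributed sample should fall close to a straight line; only the scale of the axes differs.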
The option --output has the effect of sending the output to the specified file; use display to force output to the screen. See the gnuplot command for more detail on this option.

Given two series arguments, y and x, displays a plot of the empirical quantiles of y against those of x. The data values are not standardized.

Menu path: /Variable/Normal Q-Q plot
Menu path: /View/Graph specified vars/Q-Q plot

--robust (robust standard errors)
--intervals level (compute confidence intervals)
--vcv (print covariance matrix)
--quiet (suppress printing of results)

quantreg 0.25 y 0 xlist
quantreg 0.5 y 0 xlist --intervals
quantreg 0.5 y 0 xlist --intervals=.95
quantreg tauvec y 0 xlist --robust

Quantile regression. The first argument, tau, is the conditional quantile for which estimates are wanted. It may be given either as a numerical value or as the name of a pre-defined scalar variable; the value must be in the range 0.01 to 0.99. (Alternatively, a vector of values may be given for tau; see below for details.) The second and subsequent arguments compose a regression list on the same pattern as ols.

Without the --intervals option, standard errors are printed for the quantile estimates. By default, these are computed according to the asymptotic formula given by Koenker and Bassett (1978), but if the --robust option is given, standard errors that are robust with respect to heteroskedasticity are calculated using the method of Koenker and Zhao (1994).

When the --intervals option is chosen, confidence intervals are given for the parameter estimates instead of standard errors. These intervals are computed using the rank inversion method, and in general they are asymmetrical about the point estimates. The specifics of the calculation are affected by the --robust option: without this, the intervals are computed on the assumption of IID errors (Koenker, 1994); with it, they use the robust estimator developed by Koenker and Machado (1999).
By default, 90 percent confidence intervals are produced. You can change this by appending a confidence level (expressed as a decimal fraction) to the intervals option, as in --intervals=0.95.

Vector-valued tau: instead of supplying a scalar, you may give the name of a pre-defined matrix. In this case estimates are computed for all the given tau values and the results are printed in a special format, showing the sequence of quantile estimates for each regressor in turn.

Menu path: /Model/Robust estimation/Quantile regression

Exits from the program, giving you the option of saving the output from the session on the way out.

Menu path: /File/Exit

Changes the name of series (identified by name or ID number) to newname. The new name must be of 31 characters maximum, must start with a letter, and must be composed of only letters, digits, and the underscore character. In addition, it must not be the name of an existing object of any kind.

Menu path: /Variable/Edit attributes
Other access: Main window pop-up menu (single selection)

--quiet (don't print the auxiliary regression)
--full (OLS and VECMs only, see below)

Imposes a set of (usually linear) restrictions on either (a) the model last estimated or (b) a system of equations previously defined and named. In all cases the set of restrictions should be started with the keyword restrict and terminated with end restrict.

In the single equation case the restrictions are always implicitly to be applied to the last model, and they are evaluated as soon as the restrict block is closed.

In the case of a system of equations (defined via the system command), the initial restrict may be followed by the name of a previously defined system of equations. If this is omitted and the last model was a system then the restrictions are applied to the last model. By default the restrictions are evaluated when the system is next estimated, using the estimate command.
But if the --wald option is given the restriction is tested right away, via a Wald chi-square test on the covariance matrix. Note that this option will produce an error if a system has been defined but not yet estimated.

Depending on the context, the restrictions to be tested may be expressed in various ways. The simplest form is as follows: each restriction is given as an equation, with a linear combination of parameters on the left and a scalar value to the right of the equals sign (either a numerical constant or the name of a scalar variable).

In the single-equation case, parameters may be referenced in the form b[i], where i represents the position in the list of regressors (starting at 1), or b[varname], where varname is the name of the regressor in question. In the system case, parameters are referenced using b plus two numbers in square brackets. The leading number represents the position of the equation within the system and the second number indicates position in the list of regressors. For example b[2,1] denotes the first parameter in the second equation, and b[3,2] the second parameter in the third equation. The b terms in the equation representing a restriction may be prefixed with a numeric multiplier, for example 3.5*b[4].

When the restrictions are to be applied to a named system, the name of the system follows the restrict keyword in quotes. (If the name of the system does not contain spaces, the surrounding quotes are not required.)

In the single-equation case the restrictions are by default evaluated via a Wald test, using the covariance matrix of the model in question. If the original model was estimated via OLS then the restricted coefficient estimates are printed; to suppress this, append the --quiet option flag to the initial restrict command. As an alternative to the Wald test, for models estimated via OLS or WLS only, you can give the --bootstrap option to perform a bootstrapped test of the restriction.
In the system case, the test statistic depends on the estimator chosen: a Likelihood Ratio test if the system is estimated using a Maximum Likelihood method, or an asymptotic F-test otherwise.

There are two alternatives to the method of expressing restrictions discussed above. First, a set of g linear restrictions on a k-vector of parameters, $\beta$, may be written compactly as $R\beta - q = 0$, where R is a g x k matrix and q is a g-vector. You can specify a restriction by giving the names of pre-defined, conformable matrices to be used as R and q.

Secondly, if you wish to test a nonlinear restriction (this is currently available for single-equation models only) you should give the restriction as the name of a function, preceded by "rfunc = ". The constraint function should take a single const matrix argument; this will be automatically filled out with the parameter vector. And it should return a vector which is zero under the null hypothesis, non-zero otherwise. The length of the vector is the number of restrictions. This function is used as a callback by gretl's numerical Jacobian routine, which calculates a Wald test statistic via the delta method. A simple function suitable for testing one nonlinear restriction might, for instance, check that two pairs of parameter values have a common ratio.

On successful completion of the restrict command the accessors $test and $pvalue give the test statistic and its p-value.

When testing restrictions on a single-equation model estimated via OLS, or on a VECM, the --full option can be used to set the restricted estimates as the last model for the purposes of further testing or the use of accessors such as $coeff and $vcv. Note that some special considerations apply in the case of testing restrictions on Vector Error Correction Models. Please see chapter 27 of the Gretl User's Guide for details.
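Both alternative forms reduce to a vector that is zero when the restrictions hold. The Python sketch below illustrates that idea only; it is not gretl code, the function names are hypothetical, indices are 0-based (unlike gretl's 1-based b[i]), and the parameter values are made up so that both restrictions hold exactly.

```python
def linear_restriction_gap(R, q, beta):
    """Return the g-vector R*beta - q (all zeros when the restrictions hold)."""
    return [sum(r * b for r, b in zip(row, beta)) - q_i
            for row, q_i in zip(R, q)]

def common_ratio_gap(beta):
    """One nonlinear restriction: beta[0]/beta[1] equals beta[2]/beta[3]."""
    return [beta[0] / beta[1] - beta[2] / beta[3]]

# Illustrative parameter vector (k = 4), not estimates from any model.
beta = [0.25, 0.75, 2.0, 6.0]

# Two linear restrictions (g = 2): beta0 + beta1 = 1 and 3*beta2 - beta3 = 0.
R = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, -1.0]]
q = [1.0, 0.0]

print(linear_restriction_gap(R, q, beta))
print(common_ratio_gap(beta))
```

In a test, the size of these gaps at the unrestricted estimates is judged against their estimated sampling variability, which is where the covariance matrix and, for the nonlinear case, the Jacobian come in.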
Menu path: Model window, /Tests/Linear restrictions

scatters y1 y2 y3 ; x --with-lines

Generates pairwise graphs of yvar against all the variables in xvars, or of all the variables in yvars against xvar. For example, "scatters 1 ; 2 3 4 5" puts variable 1 on the y-axis and draws four graphs, the first having variable 2 on the x-axis, the second variable 3 on the x-axis, and so on, while "scatters 1 2 3 4 5 6 ; 7" plots each of variables 1 through 6 against variable 7 on the x-axis. Scanning a set of such plots can be a useful step in exploratory data analysis. The maximum number of plots is 16; any extra variable in the list will be ignored.

By default the graphs are scatterplots, but if you give the --with-lines flag they will be line graphs.

For details on usage of the --output option, please see the gnuplot command.

If a named matrix is specified as the data source the x and y lists should be given as 1-based column numbers; or alternatively, if no such numbers are given, all the columns are plotted against time or an index variable.

If the dataset is time-series, then the second sub-list can be omitted, in which case it will implicitly be taken as time, so you can plot multiple time series in separated sub-graphs.

Menu path: /View/Multiple graphs

The seasonal difference of each variable in varlist is obtained and the result stored in a new variable with the prefix sd_. This command is available only for seasonal time series.

Menu path: /Add/Seasonal differences of selected variables

The most common use of this command is its first variant, where it is used to set the value of a selected program parameter. This is discussed in detail below. The other uses are: with --to-file,
to write a script file containing all the current parameter settings; with --from-file, to read a script file containing parameter settings and apply them to the current session; with stopwatch, to zero the gretl stopwatch, which can be used to measure CPU time (see the entry for the $stopwatch accessor); or, if the word set is given alone, to print the current settings.

Values set via this command remain in force for the duration of the gretl session unless they are changed by a further call to set. The parameters that can be set in this way are enumerated below. Note that the settings of hc_version, hac_lag and hac_kernel are used when the --robust option is given to an estimation command.

The available settings are grouped under the following categories: program interaction and behavior, numerical methods, random number generation, robust estimation, filtering, time series estimation, and interaction with GNU R.

Program interaction and behavior

These settings are used for controlling various aspects of the way gretl interacts with the user.

workdir: path. Sets the default directory for writing and reading files, whenever full paths are not specified.

use_cwd: on or off (the default). Governs the setting of workdir at start-up: if it's on, the working directory is inherited from the shell, otherwise it is set to whatever was selected in the previous gretl session.

csv_delim: either comma (the default), space, tab or semicolon. Sets the column delimiter used when saving data to file in CSV format.

csv_write_na: the string used to represent missing values when writing data to file in CSV format. Maximum 7 characters; the default is NA.

csv_read_na: the string taken to represent missing values (NAs) when reading data in CSV format. Maximum 7 characters. The default depends on whether a data column is found to contain numerical data (mostly) or string values. For numerical data the following are taken as indicating NAs: an empty cell, or any of the strings NA, N.A., na, n.a., N/A, #N/A,
NaN, .NaN, -999, and -9999. For string-valued data only a blank cell, or a cell containing an empty string, is counted as NA. These defaults can be reimposed by giving default as the value for csv_read_na. To specify that only empty cells are read as NAs, give a value of "". Note that empty cells are always read as NAs regardless of the setting of this variable.

csv_digits: a positive integer specifying the number of significant digits to use when writing data in CSV format. By default up to 15 digits are used depending on the precision of the original data. Note that CSV output employs the C library's fprintf function with %g conversion, which means that trailing zeros are dropped.

mwrite_g: on or off (the default). When writing a matrix to file as text, gretl by default uses scientific notation with 18-digit precision, hence ensuring that the stored values are a faithful representation of the numbers in memory. When writing primary data with no more than 6 digits of precision it may be preferable to use %g format for a more compact and human-readable file; you can make this switch via set mwrite_g on.

echo: off or on (the default). Suppress or resume the echoing of commands in gretl's output.

force_decpoint: on or off (the default). Force gretl to use the decimal point character, in a locale where another character (most likely the comma) is the standard decimal separator.

loop_maxiter: one non-negative integer value (default 100000). Sets the maximum number of iterations that a while loop is allowed before halting (see loop). Note that this setting only affects the while variant; its purpose is to guard against inadvertently infinite loops. Setting this value to 0 has the effect of disabling the limit; use with caution.

max_verbose: on or off (the default). Toggles verbose output for the BFGSmax and NRmax functions (see the User's Guide for details).

messages: off or on (the default).
Suppress or resume the printing of non-error messages associated with various commands, for example when a new variable is generated or when the sample range is changed.

warnings: off or on (the default). Suppress or resume the printing of warning messages issued when arithmetical operations produce non-finite values.

debug: 1, 2 or 0 (the default). This is for use with user-defined functions. Setting debug to 1 is equivalent to turning messages on within all such functions; setting this variable to 2 has the additional effect of turning on max_verbose within all functions.

shell_ok: on or off (the default). Enable launching external programs from gretl via the system shell. This is disabled by default for security reasons, and can only be enabled via the graphical user interface (/Tools/Preferences/General). However, once set to on, this setting will remain active for future sessions until explicitly disabled.

shelldir: path. Sets the current working directory for shell commands issued from within gretl.

bfgs_verbskip: one integer. This setting affects the behavior of the --verbose option to those commands that use BFGS as an optimization algorithm and is used to compact output. If bfgs_verbskip is set to, say, 3, then the --verbose switch will only print iterations 3, 6, 9 and so on.

skip_missing: on (the default) or off. Controls gretl's behavior when constructing a matrix from data series: the default is to skip data rows that contain one or more missing values, but if skip_missing is set off, missing values are converted to NaNs.

matrix_mask: the name of a series, or the keyword null. Offers greater control than skip_missing when constructing matrices from series: the data rows selected for matrices are those with non-zero (and non-missing) values in the specified series. The selected mask remains in force until it is replaced, or removed via the null keyword.

huge: a large positive number (by default, 1.0E100). This setting controls the value returned by the accessor $huge.
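The row-selection rules described for skip_missing and matrix_mask can be mimicked in a few lines. The Python sketch below is an illustration of the logic only, not gretl's implementation: None stands in for a missing value, the function name build_matrix is hypothetical, and the handling of missing values inside rows selected by a mask (converted to NaN here) is an assumption.

```python
import math

NA = None  # stand-in for a missing observation

def build_matrix(columns, skip_missing=True, mask=None):
    """Assemble rows from equal-length series, mimicking the two settings.

    mask given:         keep rows whose mask value is non-zero and non-missing
    skip_missing True:  drop rows that contain any missing value
    skip_missing False: keep every row, converting missing values to NaN
    """
    rows = [list(r) for r in zip(*columns)]
    if mask is not None:
        rows = [r for r, m in zip(rows, mask) if m is not NA and m != 0]
    elif skip_missing:
        rows = [r for r in rows if NA not in r]
    # Any remaining missing values become NaN (an assumption for masked rows).
    return [[math.nan if v is NA else v for v in r] for r in rows]

x = [1.0, NA, 3.0]
y = [4.0, 5.0, 6.0]
print(build_matrix([x, y]))                      # second row dropped
print(build_matrix([x, y], skip_missing=False))  # second row kept with NaN
print(build_matrix([x, y], mask=[0, 1, 1]))      # rows chosen by the mask
```

The mask takes precedence in this sketch because the text presents it as the finer-grained control of the two.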
Numerical methods

These settings are used for controlling the numerical algorithms that gretl uses for estimation.

optimizer: either auto (the default), BFGS or newton. Sets the optimization algorithm used for various ML estimators, in cases where both BFGS and Newton-Raphson are applicable. The default is to use Newton-Raphson where an analytical Hessian is available, otherwise BFGS.

bhhh_maxiter: one integer, the maximum number of iterations for gretl's internal BHHH routine, which is used in the arma command for conditional ML estimation. If convergence is not achieved after bhhh_maxiter iterations, the program returns an error. The default is set at 500.

bhhh_toler: one floating point value, or the string default. This is used in gretl's internal BHHH routine to check if convergence has occurred. The algorithm stops iterating as soon as the increment in the log-likelihood between iterations is smaller than bhhh_toler. The default value is 1.0E-06; this value may be re-established by typing default in place of a numeric value.

bfgs_maxiter: one integer, the maximum number of iterations for gretl's BFGS routine, which is used for mle, gmm and several specific estimators. If convergence is not achieved in the specified number of iterations, the program returns an error. The default value depends on the context, but is typically of the order of 500.

bfgs_toler: one floating point value, or the string default. This is used in gretl's BFGS routine to check if convergence has occurred. The algorithm stops as soon as the relative improvement in the objective function between iterations is smaller than bfgs_toler. The default value is the machine precision to the power 3/4; this value may be re-established by typing default in place of a numeric value.

bfgs_maxgrad: one floating point value. This is used in gretl's BFGS routine to check if the norm of the gradient is reasonably close to zero when the bfgs_toler criterion is met.
A warning is printed if the norm of the gradient exceeds 1; an error is flagged if the norm exceeds bfgs_maxgrad. At present the default is the permissive value of 5.0.

bfgs_richardson: on or off (the default). Use Richardson extrapolation when computing numerical derivatives in the context of BFGS maximization.

initvals: either auto (the default) or the name of a pre-specified matrix. Allows manual setting of the initial parameter estimates for numerical optimization problems (such as ARMA estimation). For details see chapter 25 of the Gretl User's Guide.

lbfgs: on or off (the default). Use the limited-memory version of BFGS (L-BFGS-B) instead of the ordinary algorithm. This may be advantageous when the function to be maximized is not globally concave.

lbfgs_mem: an integer value in the range 3 to 20 (with a default value of 8). This determines the number of corrections used in the limited memory matrix when L-BFGS-B is employed.

nls_toler: a floating-point value. Sets the tolerance used in judging whether or not convergence has occurred in nonlinear least squares estimation using the nls command. The default value is the machine precision to the power 3/4; this value may be re-established by typing default in place of a numeric value.

svd: on or off (the default). Use SVD rather than Cholesky or QR decomposition in least squares calculations. This option applies to the mols function as well as various internal calculations, but not to the regular ols command.

force_qr: on or off (the default). This applies to the ols command. By default this command computes OLS estimates using Cholesky decomposition (the fastest method), with a fallback to QR if the data seem too ill-conditioned. You can use force_qr to skip the Cholesky step; in doubtful cases this may ensure greater accuracy.

fcp: on or off (the default). Use the algorithm of Fiorentini, Calzolari and Panattoni rather than native gretl code when computing GARCH estimates.

gmm_maxiter:
one integer, the maximum number of iterations for gretl's gmm command when in iterated mode (as opposed to one- or two-step). The default value is 250.

nadarwat_trim: one integer, the trim parameter used in the nadarwat function.

fdjac_quality: one integer between 0 and 2, the algorithm used by the fdjac function.

Random number generation

seed: an unsigned integer. Sets the seed for the pseudo-random number generator. By default this is set from the system time; if you want to generate repeatable sequences of random numbers you must set the seed manually.

Robust estimation

bootrep: an integer. Sets the number of replications for the restrict command with the --bootstrap option.

garch_vcv: unset, hessian, im (information matrix), op (outer product matrix), qml (QML estimator), bw (Bollerslev-Wooldridge). Specifies the variant that will be used for estimating the coefficient covariance matrix, for GARCH models. If unset is given (the default) then the Hessian is used unless the robust option is given for the garch command, in which case QML is used.

arma_vcv: hessian (the default) or op (outer product matrix). Specifies the variant to be used when computing the covariance matrix for ARIMA models.

force_hc: off (the default) or on. By default, with time-series data and when the --robust option is given with ols, the HAC estimator is used. If you set force_hc to on, this forces calculation of the regular Heteroskedasticity Consistent Covariance Matrix (HCCM), which does not take autocorrelation into account. Note that VARs are treated as a special case: when the --robust option is given the default method is regular HCCM, but the --robust-hac flag can be used to force the use of a HAC estimator.

robust_z: off (the default) or on. This controls the distribution used when calculating p-values based on robust standard errors in the context of least-squares estimators. By default gretl uses the Student t distribution, but if robust_z is turned on the normal distribution is used.
hac_lag: nw1 (the default), nw2, nw3 or an integer. Sets the maximum lag value or bandwidth, p, used when calculating HAC (Heteroskedasticity and Autocorrelation Consistent) standard errors using the Newey-West approach, for time series data. nw1 and nw2 represent two variant automatic calculations based on the sample size, T: for nw1, $p = 0.75 \times T^{1/3}$, and for nw2, $p = 4 \times (T/100)^{2/9}$. nw3 calls for data-based bandwidth selection. See also qs_bandwidth and hac_prewhiten below.

hac_kernel: bartlett (the default), parzen, or qs (Quadratic Spectral). Sets the kernel, or pattern of weights, used when calculating HAC standard errors.

hac_prewhiten: on or off (the default). Use Andrews-Monahan prewhitening and re-coloring when computing HAC standard errors. This also implies use of data-based bandwidth selection.

hc_version: 0 (the default), 1, 2, 3 or 3a. Sets the variant used when calculating Heteroskedasticity Consistent standard errors with cross-sectional data. The first four options correspond to the HC0, HC1, HC2 and HC3 discussed by Davidson and MacKinnon in Econometric Theory and Methods, chapter 5. HC0 produces what are usually called White's standard errors. Variant 3a is the MacKinnon-White jackknife procedure.

pcse: off (the default) or on. By default, when estimating a model using pooled OLS on panel data with the --robust option, the Arellano estimator is used for the covariance matrix. If you set pcse to on, this forces use of the Beck and Katz Panel Corrected Standard Errors (which do not take autocorrelation into account).

qs_bandwidth: Bandwidth for HAC estimation in the case where the Quadratic Spectral kernel is selected. (Unlike the Bartlett and Parzen kernels, the QS bandwidth need not be an integer.)

Time series

horizon: one integer (the default is based on the frequency of the data). Sets the horizon for impulse responses and forecast variance decompositions in the context of vector autoregressions.

vecm_norm: phillips (the default), diag, first or none.
Used in the context of VECM estimation via the vecm command for identifying the cointegration vectors. See chapter 27 of the Gretl User's Guide for details.

Interaction with R

R_lib: on (the default) or off. When sending instructions to be executed by R, use the R shared library by preference to the R executable, if the library is available.

R_functions: off (the default) or on. Recognize functions defined in R as if they were native functions (the namespace prefix "R." is required). See chapter 36 of the Gretl User's Guide for details on this and the previous item.

setinfo z --discrete

If the options --description or --graph-name are invoked the argument must be a single series; otherwise it may be a list of series, in which case it operates on all members of the list.

This command sets up to four attributes as follows.

If the --description flag is given followed by a string in double quotes, that string is used to set the variable's descriptive label. This label is shown in response to the labels command, and is also shown in the main window of the GUI program.

If the --graph-name flag is given followed by a quoted string, that string will be used in place of the variable's name in graphs.

If one or other of the --discrete or --continuous option flags is given, the variable's numerical character is set accordingly. The default is to treat all series as continuous; setting a series as discrete affects the way the variable is handled in frequency plots.

The --midas option sets a flag indicating that a given series holds data of a higher frequency than the base frequency of the dataset; for example, the dataset is quarterly and the series holds values for month 1, 2 or 3 of each quarter. (MIDAS = Mixed Data Sampling.)

Menu path: /Variable/Edit attributes
Other access: Main window pop-up menu

Get the program to interpret some specific numerical data value (the first parameter to the command) as a code for missing, in the case of imported data.
If this value is the only parameter, as in the first example above, the interpretation will be applied to all series in the data set. If value is followed by a list of variables, by name or number, the interpretation is confined to the specified variable(s). Thus in the second example the data value 100 is interpreted as a code for missing, but only for the variable x2.

Menu path: /Data/Set missing value code

setobs periodicity startobs
setobs unitvar timevar --panel-vars

Options:
--cross-section (interpret as cross section)
--time-series (interpret as time series)
--special-time-series (see below)
--stacked-cross-section (interpret as panel data)
--stacked-time-series (interpret as panel data)
--panel-vars (use index variables, see below)
--panel-time (see below)
--panel-groups (see below)

Examples:
setobs 4 1990:1 --time-series
setobs 12 1978:03
setobs 1 1 --cross-section
setobs 20 1:1 --stacked-time-series
setobs unit year --panel-vars

This command forces the program to interpret the current data set as having a specified structure. In the first form of the command the periodicity, which must be an integer, represents frequency in the case of time-series data (1 = annual; 4 = quarterly; 12 = monthly; 52 = weekly; 5, 6, or 7 = daily; 24 = hourly). In the case of panel data the periodicity means the number of lines per data block: this corresponds to the number of cross-sectional units in the case of stacked cross-sections, or the number of time periods in the case of stacked time series. In the case of simple cross-sectional data the periodicity should be set to 1. The starting observation represents the starting date in the case of time series data. Years may be given with two or four digits; subperiods (for example, quarters or months) should be separated from the year with a colon. In the case of panel data the starting observation should be given as 1:1, and in the case of cross-sectional data, as 1.
Starting observations for daily or weekly data should be given in the form YYYY-MM-DD (or simply as 1 for undated data). Certain time-series periodicities have standard interpretations: for example, 12 = monthly and 4 = quarterly. If you have unusual time-series data to which the standard interpretation does not apply, you can signal this by giving the --special-time-series option. In that case gretl will not (for example) report your frequency-12 data as being monthly. If no explicit option flag is given to indicate the structure of the data, the program will attempt to guess the structure from the information given.

The second form of the command (which requires the --panel-vars flag) may be used to impose a panel interpretation when the data set contains variables that uniquely identify the cross-sectional units and the time periods. The data set will be sorted as stacked time series, by ascending values of the units variable, unitvar.

Panel-specific options

The --panel-time and --panel-groups options can only be used with a dataset which has already been defined as a panel. The purpose of --panel-time is to set extra information regarding the time dimension of the panel. This should be given on the pattern of the first form of setobs noted above. For example, the following may be used to indicate that the time dimension of a panel is quarterly, starting in the first quarter of 1990:

setobs 4 1990:1 --panel-time

The purpose of --panel-groups is to create a string-valued series holding names for the groups (individuals, cross-sectional units) in the panel. (This will be used where appropriate in panel graphs.) With this option you supply either one or two arguments as follows. First case: the (single) argument is the name of a string-valued series. If the number of distinct values equals the number of groups in the panel, this series is used to define the group names.
If necessary, the numerical content of the series will be adjusted such that the values are all 1s for the first group, all 2s for the second, and so on. If the number of string values doesn't match the number of groups, an error is flagged. Second case: the first argument is the name of a series and the second is a string literal or variable holding a name for each group. The series will be created if it does not already exist. If the second argument is a string literal or string variable, the group names should be separated by spaces; if a name includes spaces it should be wrapped in backslash-escaped double-quotes. Alternatively the second argument may be an array of strings. For example, given a string variable or array cstrs holding the group names, setobs country cstrs --panel-groups will create a series named country in which the names in cstrs are each repeated T times, T being the time-series length of the panel.

Menu path: /Data/Dataset structure

smpl

Example: smpl 100 --random

Resets the sample range. The new range can be defined in several ways. In the first alternate form (and the first two examples) above, startobs and endobs must be consistent with the periodicity of the data. Either one may be replaced by a semicolon to leave the value unchanged. In the second form, the integers i and j (which may be positive or negative, and should be signed) are taken as offsets relative to the existing sample range. In the third form, dummyvar must be an indicator variable with values 0 or 1 at each observation; the sample will be restricted to observations where the value is 1. The fourth form, using --restrict, restricts the sample to observations that satisfy the given Boolean condition (which is specified according to the syntax of the genr command).

The options --no-missing and --no-all-missing may be used to exclude from the sample observations for which data are missing. The first variant excludes those rows in the dataset for which at least one variable has a missing value, while the second excludes just those rows on which all variables have missing values.
In each case the test is confined to the variables in varlist if this argument is given; otherwise it is applied to all series, with the qualification that in the case of --no-all-missing and no varlist, the generic variables index and time are ignored.

The --contiguous form of smpl is intended for use with time series data. The effect is to trim any observations at the start and end of the current sample range that contain missing values (either for the variables in varlist, or for all data series if no varlist is given). Then a check is performed to see if there are any missing values in the remaining range; if so, an error is flagged.

With the --random flag, the specified number of cases are selected from the current dataset at random (without replacement). If you wish to be able to replicate this selection you should set the seed for the random number generator first (see the set command).

The final form, smpl full, restores the full data range.

Note that sample restrictions are, by default, cumulative: the baseline for any smpl command is the current sample. If you wish the command to act so as to replace any existing restriction you can add the option flag --replace to the end of the command. (But this option is not compatible with the --contiguous option.)

The internal variable obs may be used with the --restrict form of smpl to exclude particular observations from the sample. For example, smpl obs!=4 --restrict will drop just the fourth observation. If the data points are identified by labels, smpl obs!="USA" --restrict will drop the observation with label USA.

One point should be noted about the --dummy, --restrict and --no-missing forms of smpl: structural information in the data file (regarding the time series or panel nature of the data) is likely to be lost when this command is issued. You may reimpose structure with the setobs command.
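The row-filtering rules behind --no-missing, --no-all-missing and --contiguous can be illustrated with a small Python sketch. This is not gretl's implementation: the function names, the list-of-rows layout and the use of None as the missing-value marker are all assumptions made for illustration, and the handling of a dataset with no complete rows is a guess.

```python
def no_missing(rows):
    """Keep rows in which no value is missing (cf. smpl --no-missing)."""
    return [r for r in rows if all(v is not None for v in r)]


def no_all_missing(rows):
    """Drop only rows in which every value is missing (cf. --no-all-missing)."""
    return [r for r in rows if any(v is not None for v in r)]


def contiguous(rows):
    """Trim incomplete rows at both ends, then insist that the remaining
    range has no missing values (cf. smpl --contiguous)."""
    complete = [all(v is not None for v in r) for r in rows]
    if True not in complete:
        # Assumption: treat "no complete row at all" as an error too.
        raise ValueError("no complete observations")
    start = complete.index(True)
    end = len(complete) - 1 - complete[::-1].index(True)
    if not all(complete[start:end + 1]):
        raise ValueError("missing values inside the trimmed range")
    return rows[start:end + 1]
```

The first two filters differ only in using all versus any over the values in a row, which mirrors the "at least one missing" versus "all missing" distinction in the text.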
Related to this loss of structure, the --balanced flag may be used with panel data: it requests that a balanced panel be reconstituted after sub-sampling, via the insertion of missing rows if need be. But note that it is not always possible to comply with this request.

By default, restrictions on the current sample range are undoable: by doing smpl full you can restore the unrestricted dataset. However, the --permanent flag can be used to substitute the restricted dataset for the original. This option is only available in conjunction with the --restrict, --dummy, --no-missing, --no-all-missing or --random forms of smpl. Please see chapter 5 of the Gretl User's Guide for further details.

Menu path: /Sample

store

Option: --comment=string (see below)

Save data to filename. By default all currently defined series are saved, but the optional varlist argument can be used to select a subset of series. If the dataset is sub-sampled, only the observations in the current sample range are saved. The output file will be written in the currently set workdir, unless the filename string contains a full path specification.

The format in which the data are written may be controlled in the first instance by the extension or suffix of filename, as follows:

.gdt, or no extension: gretl's native XML data format. (If no extension is provided, .gdt is added automatically.)
.gdtb: gretl's native binary data format.
.csv: comma-separated values (CSV).
.txt or .asc: space-separated values.
.m: GNU Octave format.
.dta: Stata dta format (version 113).

The format-related option flags shown above can be used to force the issue of the save format independently of the filename (or to get gretl to write in the formats of PcGive or JMulTi). However, if filename has extension .gdt or .gdtb this necessarily implies use of native format, and the addition of a conflicting option flag will generate an error.
When data are saved in native format (only), the --gzipped option may be used for data compression, which can be useful for large datasets. The optional parameter for this flag controls the level of compression (from 0 to 9): higher levels produce a smaller file, but compression takes longer. The default level is 1; a level of 0 means that no compression is applied.

The option flags --omit-obs and --no-header are applicable only when saving data in CSV format. By default, if the data are time series or panel, or if the dataset includes specific observation markers, the CSV file includes a first column identifying the observations (e.g. by date). If the --omit-obs flag is given this column is omitted. The --no-header flag suppresses the usual printing of the names of the variables at the top of the columns.

The option flag --decimal-comma is also confined to the case of saving data in CSV format. The effect of this option is to replace the decimal point with the decimal comma; in addition the column separator is forced to be a semicolon.

The option of saving in gretl database format is intended to help with the construction of large sets of series, possibly having mixed frequencies and ranges of observations. At present this option is available only for annual, quarterly or monthly time-series data. If you save to a file that already exists, the default action is to append the newly saved series to the existing content of the database. In this context it is an error if one or more of the variables to be saved has the same name as a variable that is already present in the database. The --overwrite flag has the effect that, if there are variable names in common, the newly saved variable replaces the variable of the same name in the original dataset.

The --comment option is available when saving data as a database or in CSV format. The required parameter is a double-quoted one-line string, attached to the option flag with an equals sign.
The string is inserted as a comment into the database index file or at the top of the CSV output.

The store command behaves in a special manner in the context of a progressive loop. See chapter 12 of the Gretl User's Guide for details.

Menu path: /File/Save data; /File/Export data

summary

Option: --by=byvar (see below)

In its first form, this command prints summary statistics for the variables in varlist, or for all the variables in the data set if varlist is omitted. By default, output consists of the mean, standard deviation (sd), coefficient of variation (sd/mean), median, minimum, maximum, skewness coefficient, and excess kurtosis. If the --simple option is given, output is restricted to the mean, minimum, maximum and standard deviation.

If the --by option is given (in which case the parameter byvar should be the name of a discrete variable), then statistics are printed for sub-samples corresponding to the distinct values taken on by byvar. For example, if byvar is a (binary) dummy variable, statistics are given for the cases byvar = 0 and byvar = 1. Note: at present, this option is incompatible with the --weight option.

If the alternative form is given, using a named matrix, then summary statistics are printed for each column of the matrix. The --by option is not available in this case.

Menu path: /View/Summary statistics
Other access: Main window pop-up menu

system method=estimator

Starts a system of equations. Either of two forms of the command may be given, depending on whether you wish to save the system for estimation in more than one way or just estimate the system once. To save the system you should assign it a name, as in the first example (if the name contains spaces it must be surrounded by double quotes). In this case you estimate the system using the estimate command. With a saved system of equations, you are able to impose restrictions (including cross-equation restrictions) using the restrict command.
Alternatively you can specify an estimator for the system using method= followed by a string identifying one of the supported estimators: ols (Ordinary Least Squares), tsls (Two-Stage Least Squares), sur (Seemingly Unrelated Regressions), 3sls (Three-Stage Least Squares), fiml (Full Information Maximum Likelihood) or liml (Limited Information Maximum Likelihood). In this case the system is estimated once its definition is complete.

An equation system is terminated by the line end system. Within the system four sorts of statement may be given, as follows.

equation: specify an equation within the system. At least two such statements must be provided.
instr: for a system to be estimated via Three-Stage Least Squares, a list of instruments (by variable name or number). Alternatively, you can put this information into the equation line using the same syntax as in the tsls command.
endog: for a system of simultaneous equations, a list of endogenous variables. This is primarily intended for use with FIML estimation, but with Three-Stage Least Squares this approach may be used instead of giving an instr list; then all the variables not identified as endogenous will be used as instruments.
identity: for use with FIML, an identity linking two or more of the variables in the system. This sort of statement is ignored when an estimator other than FIML is used.

After estimation using the system or estimate commands the following accessors can be used to retrieve additional information:

$uhat: the matrix of residuals, one column per equation.
$yhat: matrix of fitted values, one column per equation.
$coeff: column vector of coefficients (all the coefficients from the first equation, followed by those from the second equation, and so on).
$vcv: covariance matrix of the coefficients. If there are k elements in the $coeff vector, this matrix is k by k.
$sigma: cross-equation residual covariance matrix.
$sysGamma, $sysA and $sysB: structural-form coefficient matrices (see below).
If you want to retrieve the residuals or fitted values for a specific equation as a data series, select a column from the $uhat or $yhat matrix and assign it to a series, as in series uh1 = $uhat[,1].

The structural-form matrices correspond to the following representation of a simultaneous equations model:

$\Gamma y_t = A y_{t-1} + B x_t + \epsilon_t$

If there are n endogenous variables and k exogenous variables, $\Gamma$ is an n x n matrix and B is n x k. If the system contains no lags of the endogenous variables then the A matrix is not present. If the maximum lag of an endogenous regressor is p, the A matrix is n x np.

Menu path: /Model/Simultaneous equations

tabprint

Options:
--format="f1|f2|f3|f4" (specify a custom format)
--output=filename (send output to specified file)

Must follow the estimation of a model. Prints the estimated model in tabular form, by default as LaTeX, but as RTF if the --rtf flag is given or as CSV if the --csv flag is given. If a filename is specified using the --output option, output goes to that file; otherwise it goes to a file with a name of the form model_N followed by the extension tex, rtf or csv, where N is the number of models estimated to date in the current session. The output file will be written in the currently set workdir, unless the filename string contains a full path specification.

If CSV format is selected, values are comma-separated unless the decimal comma is in force, in which case the separator is the semicolon. Note that CSV output may be less complete than the other formats.

The further options discussed below are available only when printing the model as LaTeX. If the --complete flag is given the LaTeX file is a complete document, ready for processing; otherwise it must be included in a document. If you wish to alter the appearance of the tabular output, you can specify a custom row format using the --format flag. The format string must be enclosed in double quotes and must be tied to the flag with an equals sign. The pattern for the format string is as follows.
There are four fields, representing the coefficient, standard error, t-ratio and p-value respectively. These fields should be separated by vertical bars; they may contain a printf-type specification for the formatting of the numeric value in question, or may be left blank to suppress the printing of that column (subject to the constraint that you can't leave all the columns blank). Here are a few examples:

--format="%.4f|%.4f|%.4f|%.4f"
--format="%.4f|%.4f|%.3f|"
--format="%.5f|%.4f||%.4f"
--format="%.8g|%.8g||%.4f"

The first of these specifications prints the values in all columns using 4 decimal places. The second suppresses the p-value and prints the t-ratio to 3 places. The third omits the t-ratio. The last one again omits the t-ratio, and prints both coefficient and standard error to 8 significant figures.

Once you set a custom format in this way, it is remembered and used for the duration of the gretl session. To revert to the default format you can use the special variant --format=default.

Menu path: Model window, /LaTeX

textplot

Options:
--time-series (plot by observation)
--one-scale (force a single scale)
--tall (use 40 rows)

Quick and simple ASCII graphics. Without the --time-series flag, varlist must contain at least two series, the last of which is taken as the variable for the x axis, and a scatter plot is produced. In this case the --tall option may be used to produce a graph in which the y axis is represented by 40 rows of characters (the default is 20 rows).

With the --time-series flag, a plot by observation is produced. In this case the option --one-scale may be used to force the use of a single scale; otherwise, if varlist contains more than one series, the data may be scaled. Each line represents an observation, with the data values plotted horizontally.

tsls

Example: tsls y1 0 y2 y3 x1 x2 ; 0 x1 x2 x3 x4 x5 x6

Computes Instrumental Variables (IV) estimates, by default using two-stage least squares (TSLS), but see below for further options. The dependent variable is depvar.
indepvars is the list of regressors (which is presumed to include at least one endogenous variable) and instruments is the list of instruments (exogenous and/or predetermined variables). If the instruments list is not at least as long as indepvars, the model is not identified. In the above example, the ys are endogenous and the xs are the exogenous variables. Note that exogenous regressors should appear in both lists.

Output for two-stage least squares estimates includes the Hausman test and, if the model is over-identified, the Sargan over-identification test. In the Hausman test, the null hypothesis is that OLS estimates are consistent, or in other words estimation by means of instrumental variables is not really required. A model of this sort is over-identified if there are more instruments than are strictly required. The Sargan test is based on an auxiliary regression of the residuals from the two-stage least squares model on the full list of instruments. The null hypothesis is that all the instruments are valid, and suspicion is thrown on this hypothesis if the auxiliary regression has a significant degree of explanatory power. For a good explanation of both tests see chapter 8 of Davidson and MacKinnon (2004).

For both TSLS and LIML estimation, an additional test result is shown provided that the model is estimated under the assumption of i.i.d. errors (that is, the --robust option is not selected). This is a test for weakness of the instruments. Weak instruments can lead to serious problems in IV regression: biased estimates and/or incorrect size of hypothesis tests based on the covariance matrix, with rejection rates well in excess of the nominal significance level (Stock, Wright and Yogo, 2002). The test statistic is the first-stage F-test if the model contains just one endogenous regressor; otherwise it is the smallest eigenvalue of the matrix counterpart of the first-stage F.
Critical values based on the Monte Carlo analysis of Stock and Yogo (2003) are shown when available.

The R-squared value printed for models estimated via two-stage least squares is the square of the correlation between the dependent variable and the fitted values.

For details on the effects of the --robust and --cluster options, please see the help for ols.

As alternatives to TSLS, the model may be estimated via Limited Information Maximum Likelihood (the --liml option) or via the Generalized Method of Moments (the --gmm option). Note that if the model is just identified these methods should produce the same results as TSLS, but if it is over-identified the results will differ in general.

If GMM estimation is selected, the following additional options become available:

--two-step: perform two-step GMM rather than the default of one-step.
--iterate: iterate GMM to convergence.
--weights=Wmat: specify a square matrix of weights to be used when computing the GMM criterion function. The dimension of this matrix must equal the number of instruments. The default is an appropriately sized identity matrix.

Menu path: /Model/Instrumental variables

var

Example: var 12 x1 x2 x3 --lagselect

Sets up and estimates (using OLS) a vector autoregression (VAR). The first argument specifies the lag order, or the maximum lag order in case the --lagselect option is given (see below). The order may be given numerically, or as the name of a pre-existing scalar variable. Then follows the setup for the first equation. Do not include lags among the elements of ylist: they will be added automatically. The semicolon separates the stochastic variables, for which order lags will be included, from any exogenous variables in xlist. Note that a constant is included automatically unless you give the --nc flag, a trend can be added with the --trend flag, and seasonal dummy variables may be added using the --seasonals flag.
While a VAR specification usually includes all lags from 1 to a given maximum, it is possible to select a specific set of lags. To do this, substitute for the regular (scalar) order argument either the name of a predefined vector or a comma-separated list of lags, enclosed in braces. Two ways of specifying that a VAR should include lags 1, 2 and 4 (but not lag 3) are: var {1,2,4} ylist, or matrix p = {1,2,4} followed by var p ylist.

A separate regression is reported for each variable in ylist. Output for each equation includes F-tests for zero restrictions on all lags of each of the variables, an F-test for the significance of the maximum lag, and, if the --impulse-responses flag is given, forecast variance decompositions and impulse responses.

Forecast variance decompositions and impulse responses are based on the Cholesky decomposition of the contemporaneous covariance matrix, and in this context the order in which the (stochastic) variables are given matters. The first variable in the list is assumed to be most exogenous within-period. The horizon for variance decompositions and impulse responses can be set using the set command. For retrieval of a specified impulse response function in matrix form, see the irf function.

If the --robust option is given, standard errors are corrected for heteroskedasticity. Alternatively, the --robust-hac option can be given to produce standard errors that are robust with respect to both heteroskedasticity and autocorrelation (HAC). In general the latter correction should not be needed if the VAR includes sufficient lags.

If the --lagselect option is given, the first parameter to the var command is taken as the maximum lag order. Output consists of a table showing the values of the Akaike (AIC), Schwarz (BIC) and Hannan-Quinn (HQC) information criteria computed from VARs of order 1 to the given maximum. This is intended to help with the selection of the optimal lag order. The usual VAR output is not presented.
The table of information criteria may be retrieved as a matrix via the $test accessor.

Menu path: /Model/Time series/Vector autoregression

varlist

By default, prints a listing of the series in the current dataset (if any); ls may be used as an alias. If the --type option is given, it should be followed (after an equals sign) by one of the following typenames: series, scalar, matrix, list, string, bundle or accessor. The effect is to print the names of all currently defined objects of the named type. As a special case, if the typename is accessor, the names printed are those of the internal variables currently available as accessors, such as $nobs and $uhat (regardless of their specific type).

vartest

Calculates the F statistic for the null hypothesis that the population variances for the variables series1 and series2 are equal, and shows its p-value.

Menu path: /Tools/Test statistic calculator

vecm

A VECM is a form of vector autoregression or VAR (see var), applicable where the variables in the model are individually integrated of order 1 (that is, are random walks, with or without drift), but exhibit cointegration. This command is closely related to the Johansen test for cointegration (see coint2).

The order parameter to this command represents the lag order of the VAR system. The number of lags in the VECM itself (where the dependent variable is given as a first difference) is one less than order. The rank parameter represents the cointegration rank, or in other words the number of cointegrating vectors. This must be greater than zero and less than or equal to (generally, less than) the number of endogenous variables given in ylist. ylist supplies the list of endogenous variables, in levels. The inclusion of deterministic terms in the model is controlled by the option flags.
The default if no option is specified is to include an unrestricted constant, which allows for the presence of a non-zero intercept in the cointegrating relations as well as a trend in the levels of the endogenous variables. In the literature stemming from the work of Johansen (see for example his 1995 book) this is often referred to as case 3. The first four options given above, which are mutually exclusive, produce cases 1, 2, 4 and 5 respectively. The meaning of these cases and the criteria for selecting a case are explained in chapter 27 of the Gretl User's Guide.

The optional lists xlist and rxlist allow you to specify sets of exogenous variables which enter the model either unrestrictedly (xlist) or restricted to the cointegration space (rxlist). These lists are separated from ylist and from each other by semicolons.

The --seasonals option, which may be combined with any of the other options, specifies the inclusion of a set of centered seasonal dummy variables. This option is available only for quarterly or monthly data.

The first example above specifies a VECM with lag order 4 and a single cointegrating vector. The endogenous variables are Y1, Y2 and Y3. The second example uses the same variables but specifies a lag order of 3 and two cointegrating vectors; it also specifies a restricted constant, which is appropriate if the cointegrating vectors may have a non-zero intercept but the Y variables have no trend.

Following estimation of a VECM some special accessors are available: $jalpha, $jbeta and $jvbeta retrieve, respectively, the $\alpha$ and $\beta$ matrices and the estimated variance of $\beta$. For retrieval of a specified impulse response function in matrix form, see the irf function.

Menu path: /Model/Time series/VECM

vif

Must follow the estimation of a model which includes at least two independent variables. Calculates and displays the Variance Inflation Factors (VIFs) for the regressors.
The VIF for regressor j is defined as

$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}$

where $R_j$ is the coefficient of multiple correlation between regressor j and the other regressors. The factor has a minimum value of 1.0 when the variable in question is orthogonal to the other independent variables. Neter, Wasserman, and Kutner (1990) suggest inspecting the largest VIF as a diagnostic for collinearity; a value greater than 10 is sometimes taken as indicating a problematic degree of collinearity.

Menu path: Model window, /Analysis/Collinearity

xtab

Option: --matrix=matname (use frequencies from named matrix)

Displays a contingency table or cross-tabulation for each combination of the variables included in ylist; if a second list xlist is given, each variable in ylist is cross-tabulated by row against each variable in xlist (by column). Variables in these lists can be referenced by name or by number. Note that all the variables must have been marked as discrete. Alternatively, if the --matrix option is given, the named matrix is treated as a precomputed set of frequencies and displayed as a cross-tabulation.

By default the cell entries are given as frequency counts. The --row and --column options (which are mutually exclusive) replace the counts with the percentages for each row or column, respectively. By default, cells with a zero count are left blank; the --zeros option, which has the effect of showing zero counts explicitly, may be useful for importing the table into another program, such as a spreadsheet.

Pearson's chi-square test for independence is displayed if the expected frequency under independence is at least 1.0e-7 for all cells. A common rule of thumb for the validity of this statistic is that at least 80 percent of cells should have expected frequencies of 5 or greater; if this criterion is not met a warning is printed.

If the contingency table is 2 by 2, Fisher's Exact Test for independence is computed.
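As a rough illustration of Fisher's Exact Test for a 2 by 2 table, the following Python sketch computes left, right and two-tailed p-values from the hypergeometric distribution with fixed margins. The function name, argument order and the small relative tolerance used when comparing probabilities are assumptions of this sketch, not gretl's code; the two-tailed value sums the probabilities of all tables that are no more probable than the observed one.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Return (left, right, two_tailed) p-values for the table [[a, b], [c, d]],
    conditioning on the row and column totals."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the upper-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo = max(0, col1 - row2)   # smallest feasible upper-left count
    hi = min(row1, col1)       # largest feasible upper-left count
    p_obs = prob(a)

    left = sum(prob(x) for x in range(lo, a + 1))
    right = sum(prob(x) for x in range(a, hi + 1))
    # Tolerance (an assumption) guards against floating-point ties.
    two_tailed = sum(prob(x) for x in range(lo, hi + 1)
                     if prob(x) <= p_obs * (1 + 1e-7))
    return left, right, two_tailed
```

For the table [[3, 1], [1, 3]] the feasible upper-left counts 0 to 4 have probabilities 1/70, 16/70, 36/70, 16/70 and 1/70, so the right p-value is 17/70 and the two-tailed p-value is 34/70.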
Note that Fisher's test is based on the assumption that the row and column totals are fixed, which may or may not be appropriate depending on how the data were generated. The left p-value should be used when the alternative to independence is negative association (values tend to cluster in the lower left and upper right cells); the right p-value should be used if the alternative is positive association. The two-tailed p-value for this test is calculated by method (b) in section 2.1 of Agresti (1992): it is the sum of the probabilities of all possible tables having the given row and column totals and having a probability less than or equal to that of the observed table.

Identifying the numbers of AR or MA terms in an ARIMA model

ACF and PACF plots: After a time series has been stationarized by differencing, the next step in fitting an ARIMA model is to determine whether AR or MA terms are needed to correct any autocorrelation that remains in the differenced series. Of course, with software like Statgraphics, you could just try some different combinations of terms and see what works best. But there is a more systematic way to do this. By looking at the autocorrelation function (ACF) and partial autocorrelation (PACF) plots of the differenced series, you can tentatively identify the numbers of AR and/or MA terms that are needed. You are already familiar with the ACF plot: it is merely a bar chart of the coefficients of correlation between a time series and lags of itself. The PACF plot is a plot of the partial correlation coefficients between the series and lags of itself. In general, the "partial" correlation between two variables is the amount of correlation between them which is not explained by their mutual correlations with a specified set of other variables.
For example, if we are regressing a variable Y on other variables X1, X2, and X3, the partial correlation between Y and X3 is the amount of correlation between Y and X3 that is not explained by their common correlations with X1 and X2. This partial correlation can be computed as the square root of the reduction in variance that is achieved by adding X3 to the regression of Y on X1 and X2.

A partial autocorrelation is the amount of correlation between a variable and a lag of itself that is not explained by correlations at all lower-order lags. The autocorrelation of a time series Y at lag 1 is the coefficient of correlation between $Y_t$ and $Y_{t-1}$, which is presumably also the correlation between $Y_{t-1}$ and $Y_{t-2}$. But if $Y_t$ is correlated with $Y_{t-1}$, and $Y_{t-1}$ is equally correlated with $Y_{t-2}$, then we should also expect to find correlation between $Y_t$ and $Y_{t-2}$. In fact, the amount of correlation we should expect at lag 2 is precisely the square of the lag-1 correlation. Thus, the correlation at lag 1 "propagates" to lag 2 and presumably to higher-order lags. The partial autocorrelation at lag 2 is therefore the difference between the actual correlation at lag 2 and the expected correlation due to the propagation of correlation at lag 1.

Here is the autocorrelation function (ACF) of the UNITS series, before any differencing is performed:

The autocorrelations are significant for a large number of lags--but perhaps the autocorrelations at lags 2 and above are merely due to the propagation of the autocorrelation at lag 1. This is confirmed by the PACF plot:

Note that the PACF plot has a significant spike only at lag 1, meaning that all the higher-order autocorrelations are effectively explained by the lag-1 autocorrelation.

The partial autocorrelations at all lags can be computed by fitting a succession of autoregressive models with increasing numbers of lags.
In particular, the partial autocorrelation at lag k is equal to the estimated AR(k) coefficient in an autoregressive model with k terms--i.e. a multiple regression model in which Y is regressed on LAG(Y,1), LAG(Y,2), etc., up to LAG(Y,k). Thus, by mere inspection of the PACF you can determine how many AR terms you need to use to explain the autocorrelation pattern in a time series: if the partial autocorrelation is significant at lag k and not significant at any higher-order lags--i.e. if the PACF "cuts off" at lag k--then this suggests that you should try fitting an autoregressive model of order k.

The PACF of the UNITS series provides an extreme example of the cut-off phenomenon: it has a very large spike at lag 1 and no other significant spikes, indicating that in the absence of differencing an AR(1) model should be used. However, the AR(1) term in this model will turn out to be equivalent to a first difference, because the estimated AR(1) coefficient (which is the height of the PACF spike at lag 1) will be almost exactly equal to 1. Now, the forecasting equation for an AR(1) model for a series Y with no orders of differencing is:

$$\hat{Y}_t = \mu + \phi_1 Y_{t-1}$$

If the AR(1) coefficient $\phi_1$ in this equation is equal to 1, it is equivalent to predicting that the first difference of Y is constant--i.e. it is equivalent to the equation of the random walk model with growth:

$$\hat{Y}_t = \mu + Y_{t-1}$$

The PACF of the UNITS series is telling us that, if we don't difference it, then we should fit an AR(1) model which will turn out to be equivalent to taking a first difference. In other words, it is telling us that UNITS really needs an order of differencing to be stationarized.

AR and MA signatures: If the PACF displays a sharp cutoff while the ACF decays more slowly (i.e. has significant spikes at higher lags), we say that the stationarized series displays an "AR signature," meaning that the autocorrelation pattern can be explained more easily by adding AR terms than by adding MA terms.
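The AR signature described above can be checked numerically. The following is a minimal sketch in plain Python, not output from gretl or Statgraphics. It simulates a hypothetical AR(1) series with coefficient 0.6 (the series, seed, and sample size are all made up for illustration) and computes the PACF with the Durbin-Levinson recursion. That recursion is a swapped-in shortcut: it solves the Yule-Walker equations for AR(1), AR(2), and so on, and keeps the last coefficient of each fit, rather than running the successive regressions literally.

```python
import random

def acf(y, nlags):
    """Sample autocorrelations at lags 1..nlags."""
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]
    denom = sum(v * v for v in d)
    return [sum(d[t] * d[t - k] for t in range(k, n)) / denom
            for k in range(1, nlags + 1)]

def pacf(y, nlags):
    """Partial autocorrelations at lags 1..nlags via Durbin-Levinson.

    The value at lag k is the last coefficient of the AR(k) model
    implied by the sample autocorrelations (Yule-Walker fit).
    """
    r = [1.0] + acf(y, nlags)      # r[k] is the autocorrelation at lag k
    prev = []                      # coefficients of the AR(k-1) fit
    out = []
    for k in range(1, nlags + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the first k-1 coefficients, then append the new last one.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out

# Hypothetical AR(1) series: y[t] = 0.6 * y[t-1] + noise (illustration only).
random.seed(1)
series = [0.0]
for _ in range(4000):
    series.append(0.6 * series[-1] + random.gauss(0.0, 1.0))

sample_acf = acf(series, 5)
sample_pacf = pacf(series, 5)
print("ACF :", [round(v, 3) for v in sample_acf])
print("PACF:", [round(v, 3) for v in sample_pacf])
```

For a series like this the ACF should decay gradually, with the lag-2 value close to the square of the lag-1 value, while the PACF should show one large spike at lag 1 and values near zero afterwards.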
You will probably find that an AR signature is commonly associated with positive autocorrelation at lag 1--i.e. it tends to arise in series which are slightly underdifferenced. The reason for this is that an AR term can act like a "partial difference" in the forecasting equation. For example, in an AR(1) model, the AR term acts like a first difference if the autoregressive coefficient is equal to 1, it does nothing if the autoregressive coefficient is zero, and it acts like a partial difference if the coefficient is between 0 and 1. So, if the series is slightly underdifferenced--i.e. if the nonstationary pattern of positive autocorrelation has not completely been eliminated--it will "ask for" a partial difference by displaying an AR signature. Hence, we have the following rule of thumb for determining when to add AR terms:

Rule 6: If the PACF of the differenced series displays a sharp cutoff and/or the lag-1 autocorrelation is positive--i.e. if the series appears slightly "underdifferenced"--then consider adding an AR term to the model. The lag at which the PACF cuts off is the indicated number of AR terms.

In principle, any autocorrelation pattern can be removed from a stationarized series by adding enough autoregressive terms (lags of the stationarized series) to the forecasting equation, and the PACF tells you how many such terms are likely to be needed. However, this is not always the simplest way to explain a given pattern of autocorrelation: sometimes it is more efficient to add MA terms (lags of the forecast errors) instead. The autocorrelation function (ACF) plays the same role for MA terms that the PACF plays for AR terms--that is, the ACF tells you how many MA terms are likely to be needed to remove the remaining autocorrelation from the differenced series. If the autocorrelation is significant at lag k but not at any higher lags--i.e.
if the ACF "cuts off" at lag k--this indicates that exactly k MA terms should be used in the forecasting equation. In the latter case, we say that the stationarized series displays an "MA signature," meaning that the autocorrelation pattern can be explained more easily by adding MA terms than by adding AR terms.

An MA signature is commonly associated with negative autocorrelation at lag 1--i.e. it tends to arise in series which are slightly overdifferenced. The reason for this is that an MA term can "partially cancel" an order of differencing in the forecasting equation. To see this, recall that an ARIMA(0,1,1) model without constant is equivalent to a Simple Exponential Smoothing model. The forecasting equation for this model is

$$\hat{Y}_t = Y_{t-1} - \theta_1 e_{t-1}$$

where $e_{t-1}$ is the forecast error at period t-1 and the MA(1) coefficient $\theta_1$ corresponds to the quantity $1 - \alpha$ in the SES model. If $\theta_1$ is equal to 1, this corresponds to an SES model with $\alpha = 0$, which is just a CONSTANT model because the forecast is never updated. This means that when $\theta_1$ is equal to 1, it is actually cancelling out the differencing operation that ordinarily enables the SES forecast to re-anchor itself on the last observation. On the other hand, if the moving-average coefficient is equal to 0, this model reduces to a random walk model--i.e. it leaves the differencing operation alone. So, if $\theta_1$ is something greater than 0, it is as if we are partially cancelling an order of differencing. If the series is already slightly overdifferenced--i.e. if negative autocorrelation has been introduced--then it will "ask for" a difference to be partly cancelled by displaying an MA signature. (A lot of arm-waving is going on here. A more rigorous explanation of this effect is found in the Mathematical Structure of ARIMA Models handout.) Hence the following additional rule of thumb:

Rule 7: If the ACF of the differenced series displays a sharp cutoff and/or the lag-1 autocorrelation is negative--i.e.
if the series appears slightly "overdifferenced"--then consider adding an MA term to the model. The lag at which the ACF cuts off is the indicated number of MA terms.

A model for the UNITS series--ARIMA(2,1,0): Previously we determined that the UNITS series needed (at least) one order of nonseasonal differencing to be stationarized. After taking one nonseasonal difference--i.e. fitting an ARIMA(0,1,0) model with constant--the ACF and PACF plots look like this:

Notice that (a) the correlation at lag 1 is significant and positive, and (b) the PACF shows a sharper "cutoff" than the ACF. In particular, the PACF has only two significant spikes, while the ACF has four. Thus, according to Rule 6 above, the differenced series displays an AR(2) signature. If we therefore set the order of the AR term to 2--i.e. fit an ARIMA(2,1,0) model--we obtain the following ACF and PACF plots for the residuals:

The autocorrelation at the crucial lags--namely lags 1 and 2--has been eliminated, and there is no discernible pattern in higher-order lags. The time series plot of the residuals shows a slightly worrisome tendency to wander away from the mean:

However, the analysis summary report shows that the model nonetheless performs quite well in the validation period, both AR coefficients are significantly different from zero, and the standard deviation of the residuals has been reduced from 1.54371 to 1.4215 (nearly 10%) by the addition of the AR terms. Furthermore, there is no sign of a "unit root" because the sum of the AR coefficients (0.252254 + 0.195572) is not close to 1. (Unit roots are discussed in more detail below.) On the whole, this appears to be a good model.
The (untransformed) forecasts for the model show a linear upward trend projected into the future:

The trend in the long-term forecasts is due to the fact that the model includes one nonseasonal difference and a constant term: this model is basically a random walk with growth, fine-tuned by the addition of two autoregressive terms--i.e. two lags of the differenced series. The slope of the long-term forecasts (i.e. the average increase from one period to another) is equal to the mean term in the model summary (0.467566). The forecasting equation is:

$$\hat{Y}_t = \mu + Y_{t-1} + \phi_1 (Y_{t-1} - Y_{t-2}) + \phi_2 (Y_{t-2} - Y_{t-3})$$

where $\mu$ is the constant term in the model summary (0.258178), $\phi_1$ is the AR(1) coefficient (0.25224) and $\phi_2$ is the AR(2) coefficient (0.195572).

Mean versus constant: In general, the "mean" term in the output of an ARIMA model refers to the mean of the differenced series (i.e. the average trend if the order of differencing is equal to 1), whereas the "constant" is the constant term that appears on the right-hand side of the forecasting equation. The mean and constant terms are related by the equation:

CONSTANT = MEAN × (1 minus the sum of the AR coefficients).

In this case, we have 0.258178 = 0.467566 × (1 - 0.25224 - 0.195572).

Alternative model for the UNITS series--ARIMA(0,2,1): Recall that when we began to analyze the UNITS series, we were not entirely sure of the correct order of differencing to use. One order of nonseasonal differencing yielded the lowest standard deviation (and a pattern of mild positive autocorrelation), while two orders of nonseasonal differencing yielded a more stationary-looking time series plot (but with rather strong negative autocorrelation). Here are both the ACF and PACF of the series with two nonseasonal differences:

The single negative spike at lag 1 in the ACF is an MA(1) signature, according to Rule 7 above. Thus, if we were to use 2 nonseasonal differences, we would also want to include an MA(1) term, yielding an ARIMA(0,2,1) model.
According to Rule 5, we would also want to suppress the constant term. Here, then, are the results of fitting an ARIMA(0,2,1) model without constant:

Notice that the estimated white noise standard deviation (RMSE) is only very slightly higher for this model than the previous one (1.46301 here versus 1.45215 previously). The forecasting equation for this model is:

$$\hat{Y}_t = 2Y_{t-1} - Y_{t-2} - \theta_1 e_{t-1}$$

where $\theta_1$ is the MA(1) coefficient. Recall that this is similar to a Linear Exponential Smoothing model, with the MA(1) coefficient corresponding to the quantity $2(1 - \alpha)$ in the LES model. The MA(1) coefficient of 0.76 in this model suggests that an LES model with $\alpha$ in the vicinity of 0.72 would fit about equally well. Actually, when an LES model is fitted to the same data, the optimal value of $\alpha$ turns out to be around 0.61, which is not too far off. Here is a model comparison report that shows the results of fitting the ARIMA(2,1,0) model with constant, the ARIMA(0,2,1) model without constant, and the LES model:

The three models perform nearly identically in the estimation period, and the ARIMA(2,1,0) model with constant appears slightly better than the other two in the validation period. On the basis of these statistical results alone, it would be hard to choose among the three models. However, if we plot the long-term forecasts made by the ARIMA(0,2,1) model without constant (which are essentially the same as those of the LES model), we see a significant difference from those of the earlier model:

The forecasts have somewhat less of an upward trend than those of the earlier model--because the local trend near the end of the series is slightly less than the average trend over the whole series--but the confidence intervals widen much more rapidly. The model with two orders of differencing assumes that the trend in the series is time-varying, hence it considers the distant future to be much more uncertain than does the model with only one order of differencing.
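The link between the constant, the mean, and the slope of the long-term forecasts of the ARIMA(2,1,0) model can be checked by iterating its forecasting equation by hand. The sketch below is plain Python rather than Statgraphics or gretl output. The three coefficients are the values quoted in the text, while the three starting observations are made-up placeholders, since the UNITS data are not reproduced here.

```python
# Values quoted in the text for the ARIMA(2,1,0) model with constant.
MU = 0.258178      # constant term
PHI1 = 0.25224     # AR(1) coefficient
PHI2 = 0.195572    # AR(2) coefficient

def forecast_arima_210(history, steps):
    """Iterate the forecasting equation
    Y[t] = MU + Y[t-1] + PHI1*(Y[t-1] - Y[t-2]) + PHI2*(Y[t-2] - Y[t-3]),
    feeding each forecast back in as if it were an observation.
    """
    y = list(history)
    for _ in range(steps):
        y.append(MU + y[-1] + PHI1 * (y[-1] - y[-2]) + PHI2 * (y[-2] - y[-3]))
    return y[len(history):]

# Hypothetical last three observations (placeholders, not the UNITS data).
path = forecast_arima_210([100.0, 101.0, 101.5], steps=60)

# Far enough ahead, the step from one forecast to the next settles down.
long_run_slope = path[-1] - path[-2]

# CONSTANT = MEAN * (1 - sum of AR coefficients), solved for MEAN.
implied_mean = MU / (1.0 - PHI1 - PHI2)

print(round(long_run_slope, 6), round(implied_mean, 6))
```

Whatever starting values are used, the per-period increase of the forecasts should converge to the mean implied by the constant and the AR coefficients, which is why the long-term forecasts of this model form a straight line.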
Which model should we choose? That depends on the assumptions we are comfortable making with respect to the constancy of the trend in the data. The model with only one order of differencing assumes a constant average trend--it is essentially a fine-tuned random walk model with growth--and it therefore makes relatively conservative trend projections. It is also fairly optimistic about the accuracy with which it can forecast more than one period ahead. The model with two orders of differencing assumes a time-varying local trend--it is essentially a linear exponential smoothing model--and its trend projections are somewhat more fickle. As a general rule in this kind of situation, I would recommend choosing the model with the lower order of differencing, other things being roughly equal. In practice, random-walk or simple-exponential-smoothing models often seem to work better than linear exponential smoothing models.

Mixed models: In most cases, the best model turns out to be a model that uses either only AR terms or only MA terms, although in some cases a "mixed" model with both AR and MA terms may provide the best fit to the data. However, care must be exercised when fitting mixed models. It is possible for an AR term and an MA term to cancel each other's effects, even though both may appear significant in the model (as judged by the t-statistics of their coefficients). Thus, for example, suppose that the "correct" model for a time series is an ARIMA(0,1,1) model, but instead you fit an ARIMA(1,1,2) model--i.e. you include one additional AR term and one additional MA term. Then the additional terms may end up appearing significant in the model, but internally they may be merely working against each other. The resulting parameter estimates may be ambiguous, and the parameter estimation process may take very many (e.g. more than 10) iterations to converge.
Hence:

Rule 8: It is possible for an AR term and an MA term to cancel each other's effects, so if a mixed AR-MA model seems to fit the data, also try a model with one fewer AR term and one fewer MA term--particularly if the parameter estimates in the original model require more than 10 iterations to converge.

For this reason, ARIMA models cannot be identified by a "backward stepwise" approach that includes both AR and MA terms. In other words, you cannot begin by including several terms of each kind and then throwing out the ones whose estimated coefficients are not significant. Instead, you normally follow a "forward stepwise" approach, adding terms of one kind or the other as indicated by the appearance of the ACF and PACF plots.

Unit roots: If a series is grossly under- or overdifferenced--i.e. if a whole order of differencing needs to be added or cancelled--this is often signalled by a "unit root" in the estimated AR or MA coefficients of the model. An AR(1) model is said to have a unit root if the estimated AR(1) coefficient is almost exactly equal to 1. (By "exactly equal" I really mean not significantly different from 1, in terms of the coefficient's own standard error.) When this happens, it means that the AR(1) term is precisely mimicking a first difference, in which case you should remove the AR(1) term and add an order of differencing instead. (This is exactly what would happen if you fitted an AR(1) model to the undifferenced UNITS series, as noted earlier.) In a higher-order AR model, a unit root exists in the AR part of the model if the sum of the AR coefficients is exactly equal to 1. In this case you should reduce the order of the AR term by 1 and add an order of differencing. A time series with a unit root in the AR coefficients is nonstationary--i.e. it needs a higher order of differencing.

Rule 9: If there is a unit root in the AR part of the model--i.e.
if the sum of the AR coefficients is almost exactly 1--you should reduce the number of AR terms by one and increase the order of differencing by one.

Similarly, an MA(1) model is said to have a unit root if the estimated MA(1) coefficient is exactly equal to 1. When this happens, it means that the MA(1) term is exactly cancelling a first difference, in which case you should remove the MA(1) term and also reduce the order of differencing by one. In a higher-order MA model, a unit root exists if the sum of the MA coefficients is exactly equal to 1.

Rule 10: If there is a unit root in the MA part of the model--i.e. if the sum of the MA coefficients is almost exactly 1--you should reduce the number of MA terms by one and reduce the order of differencing by one.

For example, if you fit a linear exponential smoothing model (an ARIMA(0,2,2) model) when a simple exponential smoothing model (an ARIMA(0,1,1) model) would have been sufficient, you may find that the sum of the two MA coefficients is very nearly equal to 1. By reducing the MA order and the order of differencing by one each, you obtain the more appropriate SES model. A forecasting model with a unit root in the estimated MA coefficients is said to be noninvertible, meaning that the residuals of the model cannot be considered as estimates of the "true" random noise that generated the time series.

Another symptom of a unit root is that the forecasts of the model may "blow up" or otherwise behave bizarrely. If the time series plot of the longer-term forecasts of the model looks strange, you should check the estimated coefficients of your model for the presence of a unit root.

Rule 11: If the long-term forecasts appear erratic or unstable, there may be a unit root in the AR or MA coefficients.
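Rules 9 and 10 can be read as a small bookkeeping procedure on the model orders. The sketch below is one way to write that down in Python. The function name, the `tol` threshold, and the guard on `d` are assumptions made for illustration: the text asks for a sum "not significantly different from 1" judged against standard errors, which a fixed tolerance only approximates. Apart from the fitted ARIMA(2,1,0) coefficients quoted earlier, the coefficient values are hypothetical.

```python
def apply_unit_root_rules(p, d, q, ar_coefs, ma_coefs, tol=0.05):
    """Return adjusted ARIMA orders (p, d, q) following Rules 9 and 10.

    tol is a hypothetical stand-in for "almost exactly 1"; a proper check
    would compare the gap from 1 with the estimated standard error.
    """
    # Rule 9: AR unit root -> one fewer AR term, one more difference.
    if p > 0 and abs(sum(ar_coefs) - 1.0) < tol:
        p, d = p - 1, d + 1
    # Rule 10: MA unit root -> one fewer MA term, one fewer difference.
    # Assumption: only applied when there is a difference left to cancel.
    if q > 0 and d > 0 and abs(sum(ma_coefs) - 1.0) < tol:
        q, d = q - 1, d - 1
    return p, d, q

# The fitted ARIMA(2,1,0) model: AR sum is about 0.45, so nothing changes.
fitted = apply_unit_root_rules(2, 1, 0, [0.252254, 0.195572], [])

# AR(1) on an undifferenced series with a hypothetical coefficient near 1.
ar_unit_root = apply_unit_root_rules(1, 0, 0, [0.99], [])

# ARIMA(0,2,2) with hypothetical MA coefficients summing to 0.98.
ma_unit_root = apply_unit_root_rules(0, 2, 2, [], [1.30, -0.32])

print(fitted, ar_unit_root, ma_unit_root)
```

The third call mirrors the example in the text: a linear exponential smoothing model whose MA coefficients sum to nearly 1 collapses to an ARIMA(0,1,1), i.e. simple exponential smoothing.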
None of these problems arose with the two models fitted here, because we were careful to start with plausible orders of differencing and appropriate numbers of AR and MA coefficients by studying the ACF and PACF plots. More detailed discussions of unit roots and cancellation effects between AR and MA terms can be found in the Mathematical Structure of ARIMA Models handout.
